
Selectors.

Dataflow Kit is a platform for web data extraction that requires no coding skills. In most cases, it is enough to point at and select the needed elements on the loaded page to scrape data.

Dataflow Kit uses CSS selectors to find HTML elements in web pages and extract data from them. The DFK engine makes its best guess at the CSS selector for the selected elements, but sometimes you may need to specify CSS selector values manually. At the bottom of the page is a list of links describing CSS selectors.

Selector types.


 Text

This selector type is used for extracting human-readable text from the selected element and from all its child elements. HTML tags are stripped and only text is returned.
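The behavior described for the Text selector can be illustrated with a minimal sketch in Python using only the standard library. It is not Dataflow Kit's implementation; the class `TextExtractor` and the function `extract_text` are hypothetical names chosen for this illustration.

```python
from html.parser import HTMLParser


class TextExtractor(HTMLParser):
    """Collects the text of an HTML fragment and discards the tags."""

    def __init__(self):
        super().__init__()
        self.parts = []

    def handle_data(self, data):
        # Called for text between tags, including text inside child elements.
        self.parts.append(data)


def extract_text(html_fragment):
    """Return the human-readable text of the fragment with tags stripped."""
    parser = TextExtractor()
    parser.feed(html_fragment)
    parser.close()  # flush any text still buffered by the parser
    return "".join(parser.parts)


fragment = '<div class="title">Blue <b>cotton</b> shirt</div>'
print(extract_text(fragment))  # Blue cotton shirt
```

The text of the nested `<b>` element is included in the result, while the tags themselves are dropped.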

 Link

This selector type is used for link extraction and website navigation. Capture the `href` attribute (URL) or the link text, or specify the special `Path` option for navigation only.
When the `Path` option is specified, all other selectors become disabled and no results from the current page are returned.

 Image

This selector extracts the `src` (URL) and `alt` attributes of image elements on a web page.
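As a rough illustration of the attributes that the Link and Image selector types capture, the following Python sketch collects `href` values from `<a>` elements and `src`/`alt` pairs from `<img>` elements. It uses only the standard library and is an assumption about the general idea, not the way Dataflow Kit works internally; `AttributeCollector` and `collect_attributes` are hypothetical names.

```python
from html.parser import HTMLParser


class AttributeCollector(HTMLParser):
    """Gathers link URLs and image src/alt pairs from an HTML fragment."""

    def __init__(self):
        super().__init__()
        self.links = []
        self.images = []

    def handle_starttag(self, tag, attrs):
        attributes = dict(attrs)
        if tag == "a" and "href" in attributes:
            # Link: keep the URL from the href attribute.
            self.links.append(attributes["href"])
        elif tag == "img":
            # Image: keep src (URL) and alt; a missing attribute becomes None.
            self.images.append(
                {"src": attributes.get("src"), "alt": attributes.get("alt")}
            )


def collect_attributes(html_fragment):
    """Return (links, images) found in the fragment."""
    collector = AttributeCollector()
    collector.feed(html_fragment)
    collector.close()
    return collector.links, collector.images


links, images = collect_attributes(
    '<a href="/item/1">First</a><img src="/img/1.png" alt="First item">'
)
print(links)
print(images)
```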

Filters.

Filters are used to manipulate text data when extracting.

The following filters are available:

Trim

Returns a copy of the extractor's text or attribute value with all leading and trailing white space removed.

Normal case

Leaves the case and capitalization of the text or attribute value exactly as is.

UPPERCASE

Makes all of the letters in the extractor's text or attribute value uppercase.

lowercase

Makes all of the letters in the extractor's text or attribute value lowercase.

Capitalize

Capitalizes the first letter of each word in the extractor's text or attribute value.

Filters are available for the Text, Link, and Image extractor types. They affect the Image `alt` attribute, the Link text, and the Text value.
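The effect of each filter can be sketched in a few lines of Python. This is only an illustration of the descriptions above: the function `apply_filter` and the filter keys are hypothetical, and the sketch assumes that Capitalize changes only the first letter of each space-separated word and leaves the remaining letters untouched.

```python
def apply_filter(value, filter_name):
    """Apply one of the described text filters to an extracted value."""
    if filter_name == "trim":
        # Remove leading and trailing white space.
        return value.strip()
    if filter_name == "normal":
        # Leave case and capitalization exactly as is.
        return value
    if filter_name == "uppercase":
        return value.upper()
    if filter_name == "lowercase":
        return value.lower()
    if filter_name == "capitalize":
        # Assumption: only the first letter of each word is changed.
        words = value.split(" ")
        return " ".join(word[:1].upper() + word[1:] for word in words)
    raise ValueError(f"unknown filter: {filter_name}")


print(apply_filter("  blue cotton shirt ", "trim"))
print(apply_filter("blue cotton shirt", "uppercase"))
print(apply_filter("blue cotton shirt", "capitalize"))
```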


Regex.

A regular expression can be used to extract a substring of the text that the selector extracts.
The whole match (group 0) is returned as the result.
Some useful examples are listed in the table.

RegExr is an online tool to learn, build, & test Regular Expressions.

| Text | Regex | Result |
| --- | --- | --- |
| price: 10.99$ | `[0-9]+\.[0-9]+` | 10.99 |
| id: H18JKDX4 | `[A-Z0-9]{8}` | H18JKDX4 |
| date: 2018-10-19 | `[0-9]{4}\-[0-9]{2}\-[0-9]{2}` | 2018-10-19 |
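The three examples can be reproduced with Python's `re` module, as a quick way to check a pattern before entering it. This is a sketch only: `extract_match` is a hypothetical helper, and the regular expression engine used by Dataflow Kit may differ from Python's in syntax details, although these simple patterns are widely portable.

```python
import re


def extract_match(text, pattern):
    """Return the whole match (group 0) of pattern in text, or None."""
    match = re.search(pattern, text)
    return match.group(0) if match else None


examples = [
    ("price: 10.99$", r"[0-9]+\.[0-9]+"),
    ("id: H18JKDX4", r"[A-Z0-9]{8}"),
    ("date: 2018-10-19", r"[0-9]{4}\-[0-9]{2}\-[0-9]{2}"),
]

for text, pattern in examples:
    print(text, "->", extract_match(text, pattern))
```

When the pattern does not match anything, the helper returns `None` rather than raising an error.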

Modify selectors.

Double-click a selector name to rename it while selecting elements on the page.

Note: In the output spreadsheet, the selector name will become the header for the column containing the data you collected.

To specify the CSS selector value for a web element manually, double-click the value and enter a new one.

Delete any selector from the collection at any time by clicking the `Trash` button if you no longer need it.