Dataflow kit is a web data extraction platform that requires no coding skills. In most cases it is enough to point at and select the needed elements on the loaded page to scrape data.
Dataflow kit uses CSS selectors to find HTML elements in web pages and extract data from them. The DFK engine makes its best guess at the CSS selector for the selected elements, but sometimes you may need to specify CSS selector values manually. At the bottom of the page is a list of links describing CSS selectors.
Selector type | Description |
---|---|
Text | Extracts human-readable text from the selected element and from all of its child elements. HTML tags are stripped and only text is returned. |
Link | Used for link extraction and website navigation. Capture the `href` attribute (URL) and the link text, or specify the special `Path` option for navigation only. When the `Path` option is specified, all other selectors become disabled and no results from the current page are returned. |
Image | Extracts the src (URL) and alt attributes of image elements on a web page. |
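
The three selector types can be illustrated with a minimal Python sketch. It is not the DFK engine: the `SelectorSketch` class, the sample HTML, and the whitespace normalization in `text()` are assumptions made for illustration, and only the Python standard library is used.

```python
from html.parser import HTMLParser


class SelectorSketch(HTMLParser):
    """Hypothetical helper that mimics the Text, Link and Image selector types."""

    def __init__(self):
        super().__init__()
        self.text_parts = []    # Text: character data with tags stripped
        self.links = []         # Link: href attribute plus link text
        self.images = []        # Image: src and alt attributes
        self._open_link = None  # link currently being read, if any

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        if tag == "a":
            self._open_link = {"href": attrs.get("href"), "text": ""}
        elif tag == "img":
            self.images.append({"src": attrs.get("src"), "alt": attrs.get("alt")})

    def handle_endtag(self, tag):
        if tag == "a" and self._open_link is not None:
            self.links.append(self._open_link)
            self._open_link = None

    def handle_data(self, data):
        # Text from the element and all of its children ends up here.
        self.text_parts.append(data)
        if self._open_link is not None:
            self._open_link["text"] += data

    def text(self):
        # Assumption: collapse runs of whitespace into single spaces.
        return " ".join("".join(self.text_parts).split())


SAMPLE_HTML = """
<div class="product">
  <h2>Blue <b>Mug</b></h2>
  <a href="/mugs/blue">Details</a>
  <img src="/img/blue.png" alt="A blue mug">
</div>
"""

parser = SelectorSketch()
parser.feed(SAMPLE_HTML)
parser.close()

print(parser.text())
print(parser.links)
print(parser.images)
```

Here the Text result covers the whole `div`, including the nested `b` and `a` elements, while the Link and Image results keep only the attributes named in the table.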
Filters are used to manipulate text data when extracting.
The following filters are available:
Filter | Description |
---|---|
Trim | Returns a copy of the Extractor's text or attribute with all leading and trailing white space removed. |
Normal case | Leaves the case and capitalization of the text or attribute exactly as is. |
UPPERCASE | Makes all of the letters in the Extractor's text or attribute uppercase. |
lowercase | Makes all of the letters in the Extractor's text or attribute lowercase. |
Capitalize | Capitalizes the first letter of each word in the Extractor's text or attribute. |
Filters are available for the Text, Link and Image extractor types. The specified filters apply to the Image alt attribute, the Link text and the Text value.
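
The filter behavior described above can be approximated in a few lines of Python. This is a rough sketch, not Dataflow kit code: the function names, the `FILTERS` mapping, and the idea of applying filters in the order given are all assumptions, and `capitalize` assumes that words are separated by single spaces.

```python
def trim(value: str) -> str:
    # Remove leading and trailing white space.
    return value.strip()


def normal_case(value: str) -> str:
    # Leave case and capitalization exactly as is.
    return value


def uppercase(value: str) -> str:
    return value.upper()


def lowercase(value: str) -> str:
    return value.lower()


def capitalize(value: str) -> str:
    # Upper-case the first letter of each space-separated word and
    # leave the remaining letters untouched (an assumption).
    return " ".join(word[:1].upper() + word[1:] for word in value.split(" "))


# Hypothetical lookup from the filter names in the table to functions.
FILTERS = {
    "Trim": trim,
    "Normal case": normal_case,
    "UPPERCASE": uppercase,
    "lowercase": lowercase,
    "Capitalize": capitalize,
}


def apply_filters(value: str, names: list[str]) -> str:
    # Apply the named filters one after another, in the order given.
    for name in names:
        value = FILTERS[name](value)
    return value


raw = "  handmade blue mug \n"
print(apply_filters(raw, ["Trim", "Capitalize"]))
```

In this sketch the same functions would be applied to a Text value, a Link text, or an Image alt attribute, since all three are plain strings.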
A regular expression can be used to extract a substring of the text that the selector extracts. The whole match (group 0) is returned as the result. Some useful examples are listed in the table.
RegExr is an online tool to learn, build, and test regular expressions.
text | regex | result |
---|---|---|
price: 10.99$ | [0-9]+\.[0-9]+ | 10.99 |
id: H18JKDX4 | [A-Z0-9]{8} | H18JKDX4 |
date: 2018-10-19 | [0-9]{4}\-[0-9]{2}\-[0-9]{2} | 2018-10-19 |
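
The table rows can be checked with a short Python sketch. Python's `re` module stands in for whatever regex engine Dataflow kit uses, which is an assumption, and the `extract` helper is hypothetical. It returns the whole match (group 0), as described above.

```python
import re


def extract(text: str, pattern: str):
    """Return the whole match (group 0) of the first hit, or None if nothing matches."""
    match = re.search(pattern, text)
    return match.group(0) if match else None


# The text / regex pairs from the table.
EXAMPLES = [
    ("price: 10.99$", r"[0-9]+\.[0-9]+"),
    ("id: H18JKDX4", r"[A-Z0-9]{8}"),
    ("date: 2018-10-19", r"[0-9]{4}\-[0-9]{2}\-[0-9]{2}"),
]

for text, pattern in EXAMPLES:
    print(text, "->", extract(text, pattern))
```

Returning `None` when nothing matches is a design choice of this sketch. The source does not say what Dataflow kit returns in that case.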
Double-click a selector name to rename it while selecting elements on the page.
Specify the CSS selector value for web elements. Double-click it to enter a new value manually.
Delete any selector from the collection at any time by clicking the `Trash` button if you no longer need it.