
HTML Scraping from any website.

Execute JavaScript code and render dynamic content to static HTML with Headless Chrome in the cloud.

We route HTTP requests through a worldwide proxy network according to the specified target geolocation.

Dataflow Kit Scraper

HTML proxy scraper as a service. Render JavaScript in the cloud.

Parameter descriptions.

api_key: The API key used to authenticate with the API. You can find it in your Account Dashboard.
URL: The URL of the web page to download.
Proxy: The country through which requests are routed on their way to the target website.
Render Javascript: Set to "Yes" if the content of the website depends on JavaScript. This is usually the case when a website is built with a framework like React or Angular. For static HTML pages, set the value to "No." Defaults to "Yes."
Wait Delay: A custom delay (in seconds) applied after the initial page load. Useful when certain elements of the website are rendered only after the initial load.
Initial cookies: Useful for crawling websites that require a login. The simplest way to obtain a cookie array for a specific website is to use a web browser with the EditThisCookie extension: copy the cookie array with EditThisCookie and paste it into the "Initial cookies" field. Read the article on passing cookies to a scraper to crawl websites that require a login.
Ignore HTTP status error codes: The HTTP 200 OK status code indicates that a request has succeeded. Sometimes, however, a server returns normal HTML content along with a non-200 status code. This option forces the return of HTML content regardless of the status code. Defaults to "false."
Actions: Use actions to automate manual workflows while rendering web pages. They simulate real-world human interaction with pages.
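
To make these parameters concrete, here is what a request payload could look like as a Python dictionary. The field names below are illustrative assumptions rather than the authoritative API schema; the code generator further down this page produces the exact names for you.

    # Illustrative payload only: field names are assumed, not the official schema.
    payload = {
        "api_key": "YOUR_API_KEY",              # from your Account Dashboard
        "url": "https://example.com/products",  # page to download
        "proxy": "country-us",                  # route via a proxy in the US
        "renderJS": True,                       # the site is built with React/Angular/etc.
        "waitDelay": 2,                         # extra seconds after the initial load
        "initialCookies": [                     # e.g. exported with EditThisCookie
            {"name": "sessionid", "value": "abc123", "domain": ".example.com"},
        ],
        "ignoreHTTPStatusCode": False,          # fail on non-200 responses
    }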

Add Actions

  1. Add actions by clicking the Actions menu items above.
  2. Available action parameters will be highlighted in the rows of the table for your convenience.
  3. Then right-click or double-click an action line for help on the current action and its options.

HTML Scraping API code generator.

Specify parameters to generate ready-to-run code for your preferred language.

Copy the snippet and give it a go!
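
For Python, for instance, the generated snippet might resemble the sketch below. The endpoint URL and field names here are assumptions for illustration; use the values the generator emits for your account.

    import requests  # third-party HTTP client: pip install requests

    # Hypothetical endpoint; the code generator shows the real one.
    ENDPOINT = "https://api.dataflowkit.com/fetch"

    payload = {
        "api_key": "YOUR_API_KEY",
        "url": "https://example.com",
        "renderJS": True,
    }

    resp = requests.post(ENDPOINT, json=payload, timeout=120)
    resp.raise_for_status()
    print(resp.text[:500])  # the rendered static HTML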

How to scrape HTML from a website built with JavaScript?

Base Fetcher (Render Javascript: No)

Server-side rendering (SSR) is a technique in which the whole document is generated on the server. Whenever a request comes in, the server generates the entire document and returns its content to the client. The browser on the client machine simply displays that document without any further rendering.

"Base Fetcher" is suitable for processing server-side rendered pages where the HTML in the HTTP response contains all content.

Crawling a URL with "Base Fetcher" takes fewer resources and works faster than rendering HTML with "Chrome Fetcher." A quick way to choose between them is sketched below.
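
To decide, download the raw HTML without executing JavaScript and check whether the content you need is already there. A minimal sketch using only the Python standard library; the target URL and marker text are placeholders:

    from urllib.request import Request, urlopen

    URL = "https://example.com/products"  # placeholder target page
    MARKER = "Add to cart"                # text that only a fully rendered page contains

    req = Request(URL, headers={"User-Agent": "Mozilla/5.0"})
    raw_html = urlopen(req, timeout=30).read().decode("utf-8", errors="replace")

    if MARKER in raw_html:
        print("Content is in the raw HTML: Base Fetcher is enough.")
    else:
        print("Content is rendered client-side: use Chrome Fetcher.")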

Requesting static HTML pages is always cheaper ...

But...

Chrome Fetcher (Render Javascript: Yes)

Client-side rendering means rendering content in the browser using JavaScript. Instead of getting all of the content from the HTML document itself, you get a bare-bones HTML document plus a JavaScript file that renders the rest of the site in the browser. Usually, additional AJAX calls from the client to the server refresh the page content or fetch extra data.

JavaScript frameworks like Angular, React, and Vue.js are widely used for building modern web applications. Such applications consist of HTML plus JavaScript code, and the initial HTML does not contain all the actual content: it loads dynamically as the JavaScript code runs.

So scraping such HTML pages 'as is' is useless in most cases.

"Chrome Fetcher" uses the headless Chrome browser to render dynamic content and return it as static HTML. It renders websites the same way a real browser would.
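
The difference is easy to observe by fetching the same JavaScript-driven page with rendering switched off and on. A sketch, again with an assumed endpoint and field names:

    import requests

    ENDPOINT = "https://api.dataflowkit.com/fetch"  # assumed endpoint
    base = {"api_key": "YOUR_API_KEY", "url": "https://example.com/spa"}

    static_html = requests.post(ENDPOINT, json={**base, "renderJS": False}, timeout=120).text
    rendered_html = requests.post(ENDPOINT, json={**base, "renderJS": True}, timeout=120).text

    # A client-side rendered page typically grows dramatically after rendering.
    print(len(static_html), "bytes without JavaScript")
    print(len(rendered_html), "bytes with headless Chrome rendering")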


Automation of manual workflows.

Of course, we don't intend only to render JavaScript-driven web pages but also to perform tasks on them.

Performing actions brings you closer to the desired data.

Actions are performed by the scraper upon visiting a web page. They simulate real-world human interaction with the page.

You can use the DFK API to execute simple actions after rendering a web page (a combined example follows this list):

"Input" action

Specify the Input CSS Selector and Input Text to perform search queries or fill in forms.

"Click" action

Click on an element with the specified CSS Selector.

"Wait" action

Wait for the specific DOM elements you want to manipulate.

"Scroll" action

Automatically scroll a page down to load more content, simulating user interaction with infinite scrolled pages.
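
Put together, a sequence of actions could be passed along with the request as a list like the one below. The structure is illustrative, not the exact schema; the Actions menu in the dashboard generates the precise form.

    # Illustrative action sequence: search, wait for results, load more items.
    actions = [
        {"input":  {"selector": "#search", "text": "dataflow kit"}},  # fill a form field
        {"click":  {"selector": "button[type=submit]"}},              # submit the form
        {"wait":   {"selector": ".results-list"}},                    # wait for the DOM element
        {"scroll": {"times": 5}},                                     # load infinite-scroll content
    ]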

Let machines do the grunt work and let humans do what they do best.

Proxy scraper.

The Dataflow Kit proxy scraper online service is useful for getting around content download restrictions imposed by specific websites.

Choose one of 100+ supported global locations from which to send your HTML scraping API requests.

Or select "country-any" to use random geo-targets.
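
In a request payload, geotargeting comes down to a single field (continuing the illustrative payload from above):

    payload["proxy"] = "country-de"    # route requests through a proxy in Germany
    # ...or pick a random location on every request:
    # payload["proxy"] = "country-any"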

Dataflow Kit API.

Render JavaScript web pages right from your application.

Just send an API request specifying the desired web page and parameters.

Easily integrate DFK API with your applications using your favorite framework or language.

Cloud file storage.

Store anything from a few records to a few hundred million, with the same low latency and high reliability, in our S3-compatible storage.

You can also easily upload your data to the following cloud storage services:

Google Drive,
Dropbox,
Microsoft OneDrive

Data Extraction from HTML.

The obvious next step after scraping a web page is to extract specific data from the rendered HTML.

Depending on the website, that data may be a single HTML element such as an image, a text fragment, or a link. E-commerce sites, for example, list several products on a page as blocks of data grouped by repeating patterns.

Another web scraping task is extracting prospects' email addresses and phone contacts from web pages for lead generation.

For automating such tasks, we offer a visual point-and-click web scraper.
