
HTML Scraping from any website.

Execute JavaScript code and render dynamic content to static HTML with Headless Chrome in the cloud.

We route HTTP requests through a worldwide proxy network according to the specified target geolocation.

Dataflow Kit Scraper

HTML proxy scraper as a service. Render JavaScript in the cloud.

Parameter descriptions.

api_key: The API key used to authenticate with the API. You can find it in your Account Dashboard.
URL: The URL of the web page to download.
Proxy: The country through which requests are routed on their way to the target website.
Render Javascript: Set to "Yes" if the content of the website depends on JavaScript. This is usually the case when a website is built with a framework like React or Angular. For static HTML pages, set the value to "No." Defaults to "Yes."
Wait Delay: A custom delay (in seconds) applied after the initial page load. Useful when certain elements of the website are rendered only after the initial load.
Initial cookies: Useful for crawling websites that require a login. The simplest way to obtain a cookie array for a specific website is to use a web browser with the EditThisCookie extension: copy the cookie array with EditThisCookie and paste it into the "Initial cookies" field. Read the article on passing cookies to a scraper to crawl websites that require a login.
Ignore HTTP status error codes: The HTTP 200 OK status code indicates that a request has succeeded. Sometimes, however, a server returns normal HTML content along with a non-200 status code. This option forces the return of HTML content regardless of the status code. Defaults to "false."
Actions: Use actions to automate manual workflows while rendering web pages. They simulate real-world human interaction with pages.
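
To make these parameters concrete, here is what a request payload could look like as a Python dictionary. The field names below are illustrative assumptions rather than the authoritative API schema; the code generator further down this page produces the exact names for you.

    # Illustrative payload only: field names are assumed, not the official schema.
    payload = {
        "api_key": "YOUR_API_KEY",              # from your Account Dashboard
        "url": "https://example.com/products",  # page to download
        "proxy": "country-us",                  # route via a proxy in the US
        "renderJS": True,                       # the site is built with React/Angular/etc.
        "waitDelay": 2,                         # extra seconds after the initial load
        "initialCookies": [                     # e.g. exported with EditThisCookie
            {"name": "sessionid", "value": "abc123", "domain": ".example.com"},
        ],
        "ignoreHTTPStatusCode": False,          # fail on non-200 responses
    }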

Add Actions

  1. Add actions by clicking the Actions menu items above.
  2. Available action parameters will be highlighted in the rows of the table for your convenience.
  3. Then right-click or double-click an action line for help on the current action and its options.

HTML Scraping API code generator.

Specify parameters to generate ready-to-run code for your preferred language.

Copy the snippet and give it a go!
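
For Python, for instance, the generated snippet might resemble the sketch below. The endpoint URL and field names here are assumptions for illustration; use the values the generator emits for your account.

    import requests  # third-party HTTP client: pip install requests

    # Hypothetical endpoint; the code generator shows the real one.
    ENDPOINT = "https://api.dataflowkit.com/fetch"

    payload = {
        "api_key": "YOUR_API_KEY",
        "url": "https://example.com",
        "renderJS": True,
    }

    resp = requests.post(ENDPOINT, json=payload, timeout=120)
    resp.raise_for_status()
    print(resp.text[:500])  # the rendered static HTML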

How to scrape HTML from a website built with JavaScript?

Base Fetcher (Render Javascript: No)

Server-side rendering (SSR) is a technique in which the whole document is generated on the server. Whenever a request comes in, the server generates the entire document and returns its content to the client. The browser on the client machine simply displays that document without any further rendering.

"Base Fetcher" is suitable for processing server-side rendered pages where the HTML in the HTTP response contains all content.

Crawling a URL with "Base Fetcher" takes fewer resources and works faster than rendering HTML with "Chrome Fetcher." A quick way to choose between them is sketched below.
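
To decide, download the raw HTML without executing JavaScript and check whether the content you need is already there. A minimal sketch using only the Python standard library; the target URL and marker text are placeholders:

    from urllib.request import Request, urlopen

    URL = "https://example.com/products"  # placeholder target page
    MARKER = "Add to cart"                # text that only a fully rendered page contains

    req = Request(URL, headers={"User-Agent": "Mozilla/5.0"})
    raw_html = urlopen(req, timeout=30).read().decode("utf-8", errors="replace")

    if MARKER in raw_html:
        print("Content is in the raw HTML: Base Fetcher is enough.")
    else:
        print("Content is rendered client-side: use Chrome Fetcher.")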

Requesting static HTML pages is always cheaper ...

But...

Chrome Fetcher (Render Javascript: Yes)

Client-side rendering means rendering content in the browser using JavaScript. Instead of getting all of the content from the HTML document itself, you get a bare-bones HTML document plus a JavaScript file that renders the rest of the site in the browser. Usually, additional AJAX calls from the client to the server refresh the page content or fetch extra data.

JavaScript frameworks like Angular, React, and Vue.js are widely used for building modern web applications. Such applications consist of HTML plus JavaScript code, and the initial HTML does not contain all the actual content: it loads dynamically as the JavaScript code runs.

So scraping such HTML pages 'as is' is useless in most cases.

"Chrome Fetcher" uses the headless Chrome browser to render dynamic content and return it as static HTML. It renders websites the same way a real browser would.
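
The difference is easy to observe by fetching the same JavaScript-driven page with rendering switched off and on. A sketch, again with an assumed endpoint and field names:

    import requests

    ENDPOINT = "https://api.dataflowkit.com/fetch"  # assumed endpoint
    base = {"api_key": "YOUR_API_KEY", "url": "https://example.com/spa"}

    static_html = requests.post(ENDPOINT, json={**base, "renderJS": False}, timeout=120).text
    rendered_html = requests.post(ENDPOINT, json={**base, "renderJS": True}, timeout=120).text

    # A client-side rendered page typically grows dramatically after rendering.
    print(len(static_html), "bytes without JavaScript")
    print(len(rendered_html), "bytes with headless Chrome rendering")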


Automation of manual workflows.

Of course, we don't intend only to render JavaScript-driven web pages but also to perform tasks on them.

Performing actions brings you closer to the desired data.

Actions are performed by the scraper upon visiting a web page. They simulate real-world human interaction with the page.

You can use the DFK API to execute simple actions after rendering a web page (a combined example follows this list):

"Input" action

Specify the Input CSS Selector and Input Text to perform search queries or fill in forms.

"Click" action

Click on an element with the specified CSS Selector.

"Wait" action

Wait for the specific DOM elements you want to manipulate.

"Scroll" action

Automatically scroll a page down to load more content, simulating user interaction with infinite scrolled pages.
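
Put together, a sequence of actions could be passed along with the request as a list like the one below. The structure is illustrative, not the exact schema; the Actions menu in the dashboard generates the precise form.

    # Illustrative action sequence: search, wait for results, load more items.
    actions = [
        {"input":  {"selector": "#search", "text": "dataflow kit"}},  # fill a form field
        {"click":  {"selector": "button[type=submit]"}},              # submit the form
        {"wait":   {"selector": ".results-list"}},                    # wait for the DOM element
        {"scroll": {"times": 5}},                                     # load infinite-scroll content
    ]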

Let machines do the grunt work and let humans do what they do best.

Proxy scraper.

The Dataflow Kit proxy scraper online service is useful for getting around content download restrictions imposed by specific websites.

Choose one of 100+ supported global locations from which to send your HTML scraping API requests.

Or select "country-any" to use random geo-targets.
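
In a request payload, geotargeting comes down to a single field (continuing the illustrative payload from above):

    payload["proxy"] = "country-de"    # route requests through a proxy in Germany
    # ...or pick a random location on every request:
    # payload["proxy"] = "country-any"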

Dataflow Kit API.

Render JavaScript web pages right from your application.

Just send an API request specifying the desired web page and parameters.

Easily integrate DFK API with your applications using your favorite framework or language.

Cloud file storage.

Store anything from a few records to a few hundred million, with the same low latency and high reliability, in our S3-compatible storage.

You can also easily upload your data to the following cloud storage services:

Google Drive,
Dropbox,
Microsoft OneDrive

Data Extraction from HTML.

The obvious next step after scraping a web page is to extract specific data from the rendered HTML.

Depending on the website, that data may be a single HTML element such as an image, a text fragment, or a link. E-commerce sites, for example, list several products on a page as blocks of data grouped by repeating patterns.

Another web scraping task is extracting prospects' email addresses and phone contacts from web pages for lead generation.

For automating such tasks, we offer a visual point-and-click web scraper.
