
Introduction

Welcome to the Dataflow Kit (DFK) API!

DFK’s API enables you to programmatically manage and run your web data extraction and SERP collection Tasks. You can easily retrieve the extracted data afterwards.

Rendering web pages, converting URLs to PDF and capturing web page screenshots can also be run in the Dataflow Kit cloud.

Quick links to DFK API services:

Curl, Go, Python, Node.js, and PHP code examples are available. You can view them in the dark area to the right, and you can switch the programming language of the examples with the tabs in the top right. By default, curl is selected so that you can try out the commands in your terminal.

Authentication

To authorize, use this code:

# With shell, you can just pass a valid API Key with each request
curl --request POST \
     --url https://api.dataflowkit.com/v1/{API-ENDPOINT}?api_key=YOUR_API_KEY -d \
'{
  "foo":"bar"
}'

API-ENDPOINT corresponds to the specific API endpoint you call. Make sure to replace YOUR_API_KEY with your API key.

After signing up, every user is assigned a personal API Access Key - a unique "password" used to make requests to the Dataflow Kit API.

Dataflow Kit API requests are authenticated by passing your secret API Key to the server as the api_key query parameter on every request.

It looks like the following: api_key=YOUR_API_KEY

The API Key can be found in the DFK Dashboard after registration.
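For example, here is a minimal Python sketch (using the requests library) that passes the API Key as the api_key query parameter; the /fetch endpoint and the JSON body are used purely for illustration:

import requests

API_KEY = "YOUR_API_KEY"   # taken from the DFK Dashboard

# Every request is authenticated with the api_key query parameter.
response = requests.post(
    "https://api.dataflowkit.com/v1/fetch",
    params={"api_key": API_KEY},
    json={"type": "base", "url": "https://example.com"},
)
response.raise_for_status()
print(response.text)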

Once you sign up, we grant you 1,000 free credits (equal to €5) for evaluation and testing.

Versioning

All Dataflow Kit API endpoints URLs start with https://api.dataflowkit.com/v1/.

The current API version 1 is available via the /v1 prefix.

If backward-incompatible changes need to be made to our API, we will release a new API version. The previous API version will be maintained for at least a year after the new version is released.

Tasks & Processes

Tasks and processes are central to the Dataflow Kit API.

A Task represents an instance of a web data extractor or search engine results (SERP) collector. Each run of a Task spawns a new process with a given set of parameters.

The Task endpoints are listed below.

Task endpoints Description Results
/task/create Create a new task. Returns a {JSON object} representing the task structure. Pass the task id to the run endpoint to launch it afterwards.
/task/{Task_ID}/run Run the task with the given task id. A new process spawned from the specified task is created and a {JSON object} representing the process structure is returned.
/task/{Task_ID}/info Get information about the task with the given task id. A {JSON object} containing the JSON payload and other meta information.
/task/{Task_ID}/results Retrieve the list of processes belonging to this task. Returns a [JSON array] containing the processes belonging to this task.
/task/{Task_ID}/update Update an existing task. Pass a {JSON object} task structure with updated fields. Returns a {JSON object} representing the updated task structure.
/task/{Task_ID}/delete Delete the task with the given task id. {"deleted":"ok"}

A Process is a single job spawned by a Task that performs a data extraction or conversion action.

Process endpoints Description
/Process/{Process ID}/info Returns a {JSON object} representing the process structure for the specified process id.
/Process/{Process ID}/cancel Cancels the process with the specified process id. Returns a {JSON object} representing the process structure for that process id.

The next sections list HTTP endpoints that can be used to manipulate Tasks & Processes.

Create a Task

Create a Web Data Extractor / SERP collection Task specifying a payload configuration

curl --request POST \
     --url https://api.dataflowkit.com/v1/task/create?api_key=YOUR_API_KEY \
     -d '{JSON Task Payload}'

The create task endpoint is used to create tasks with specified parameters so that they can be run multiple times afterwards. The same payload structure is used for both Web Data Extraction and Search Engine Results (SERP) collection tasks.

Send the JSON Task payload to the /task/create endpoint.

The create task endpoint returns a new Task object:

{
   "id":"1XtQA0Z15N3fqKZuzPKESUsTIW1",
   "name":"Task Name",
   "webhook":"https://your-web-site.com/webhook",
   "payload":{JSON Paylod},
   "description":"Task description...",
   "type":"extract"
}

Returned object

If successful, the task JSON object is returned; otherwise an error is returned.
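As an illustration, the following Python sketch (requests library) posts a task payload to /task/create and keeps the returned task id. The body shown here is only a placeholder and assumes the request mirrors the returned task structure:

import requests

API_KEY = "YOUR_API_KEY"

# Placeholder payload; see the "Extract data from web" section for a full
# collection scheme to put into "payload".
task_payload = {
    "name": "Task Name",
    "description": "Task description...",
    "type": "extract",
    "payload": {},
}

resp = requests.post(
    "https://api.dataflowkit.com/v1/task/create",
    params={"api_key": API_KEY},
    json=task_payload,
)
resp.raise_for_status()
task = resp.json()
print("created task:", task["id"])   # pass this id to /task/{Task_ID}/run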

Run a Task

Run the task

curl --request POST \
     --url https://api.dataflowkit.com/v1/task/{Task_ID}/run?api_key=YOUR_API_KEY

Posting a request to the /task/{Task_ID}/run endpoint starts a new process in the Dataflow Kit cloud, spawned from the previously created Task with {Task_ID}.

This method immediately returns a {JSON Process Object} generated by the current task, while the process continues in the background. Use webhooks or poll the /process/{Process_ID}/info endpoint to find out when the resulting data for this Process ID is ready to retrieve.
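A minimal Python sketch of running a previously created task might look like this (the task id is a placeholder):

import requests

API_KEY = "YOUR_API_KEY"
TASK_ID = "1XtQA0Z15N3fqKZuzPKESUsTIW1"   # id returned by /task/create

resp = requests.post(
    f"https://api.dataflowkit.com/v1/task/{TASK_ID}/run",
    params={"api_key": API_KEY},
)
resp.raise_for_status()

process = resp.json()   # returned immediately; the work continues in the background
print("process id:", process["id"], "status:", process["status"])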

Process object

Process object

{
  "id": "1PBhj5EGo2hAvBsytLDL363A6Mq",
  "status":"finished",
  "taskID":"1NGYaLJsY8Xf7RwO99Ew3yyt5rz",
  "startedAt": "1580302278",
  "finishedAt": "1580312522",
  "requestCount": 1000,
  "responseCount":1000,
  "results" : "Results File Name",
  "logFile" : "",
  "missingCredits":0,
  "cost":50,
}

Process object contains the following information:

Property Description
id A globally unique id that represents this Process.
status The status of the current process. Possible status values are described below.
taskID The Task ID which the current Process belongs to.
startedAt The time that this Process was started, in Unix time format.
finishedAt The time that this Process was completed or cancelled, in Unix time format. This field will be null if the run is either initialized or running.
requestCount The number of requests for web data / SERP extraction performed by this Process so far.
responseCount The number of successful responses for web data / SERP extraction received by this Process so far.
results The name of the results file in Dataflow Kit storage. The file format can be specified in the task payload as CSV, MS Excel, JSON, JSON Lines or XML.
logFile The link to the log file.
missingCredits The number of missing credits needed to complete a process. Partial data extracted so far will be available for download. The complete data set may be returned after replenishment of funds.
cost The number of credits that have been withdrawn for the current process.

Once a process spawned by a Task is completed, its status changes from running to one of the following statuses:

Process info

curl --request POST \
     --url https://api.dataflowkit.com/v1/processes/{Process_ID}/info?api_key=YOUR_API_KEY

The Process info endpoint returns the process object described above, containing all the details about a specific Process.

If the returned status is running, polling the process info endpoint will return updated request and response counts according to the actual progress.

Right after process completion, extra information such as startedAt, finishedAt, results and cost is returned.

If the process has been cancelled or has failed, either no results or an incomplete result set is returned.
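A minimal polling loop in Python could look like the sketch below; the poll interval and the process id are arbitrary, and webhooks are usually preferable for long-running processes:

import time
import requests

API_KEY = "YOUR_API_KEY"
PROCESS_ID = "1PBhj5EGo2hAvBsytLDL363A6Mq"

while True:
    resp = requests.post(
        f"https://api.dataflowkit.com/v1/processes/{PROCESS_ID}/info",
        params={"api_key": API_KEY},
    )
    resp.raise_for_status()
    process = resp.json()
    if process["status"] != "running":      # e.g. finished, failed or cancelled
        break
    print("progress:", process["responseCount"], "of", process["requestCount"])
    time.sleep(10)                          # arbitrary poll interval

print("final status:", process["status"], "| results file:", process.get("results"))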

Download results

Get Results download link

curl --request GET \
     --url https://api.dataflowkit.com/v1/getlink?api_key=YOUR_API_KEY \ 
     -d 'Results File Name'

Send a request containing the Results File Name from a Process to the /getlink endpoint to retrieve a download link.

As a result, the actual download link to the results file is returned:

https://dfk-storage.ams3.digitaloceanspaces.com/results/96d16bce_2019-05-15_19%3A02.json?X-Amz-Signature=1b321eb76325140fb85a2dfb0fbc4834a7d8b998d3054d84636a77ecdd8016ef

Run the script to download the results file

curl --request GET \
     --url  "https://dfk-storage.ams3.digitaloceanspaces.com/results/96d16bce_2019-05-15_19%3A02.json?X-Amz-Signature=1b321eb76325140fb85a2dfb0fbc4834a7d8b998d3054d84636a77ecdd8016ef"

Run the script on the right to download the results using the link above.
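The same two steps in Python might look like the sketch below; it mirrors the curl calls above and assumes the /getlink endpoint returns the signed link as plain text:

import requests

API_KEY = "YOUR_API_KEY"
results_file = "Results File Name"   # the "results" field of a finished process

# Step 1: ask /getlink for a signed download URL.
link_resp = requests.get(
    "https://api.dataflowkit.com/v1/getlink",
    params={"api_key": API_KEY},
    data=results_file,
)
link_resp.raise_for_status()
download_url = link_resp.text.strip()

# Step 2: download the results file itself.
data = requests.get(download_url)
data.raise_for_status()
with open("results.json", "wb") as f:
    f.write(data.content)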


Cancel a Process

curl --request POST \
     --url https://api.dataflowkit.com/v1/processes/{Process_ID}/cancel?api_key=YOUR_API_KEY

The cancel method stops the specified currently running Process. Credits will be withdrawn for requests already processed successfully.

Task info

curl --request POST \
     --url https://api.dataflowkit.com/v1/tasks/{Task_ID}/info?api_key=YOUR_API_KEY

Gets a Task object that contains all the details about a specific Task.

Task object

{
  "id": "1PBhaN1wLaqN8BINrsDXlZANpWN",
  "name": "taskName",
  "description":"Task description...",
  "type":"extract",
  "payload": {JSON Payload},
  "webhook" : "http://mywebsite.com/webhook/"
}

Task object has the following properties:

Property Description
id A globally unique id that represents this Task.
name The Task name. This parameter is optional.
description An optional Task description.
type Currently only the "extract" type is available for all tasks.
payload A JSON structure that describes a set of rules for the Task launch. The payload depends on the task type. Each type of payload is described in the corresponding section.
webhook If provided, Dataflow Kit API will send the results to the given URL (see the receiver sketch below).
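If you use a webhook, any HTTP server reachable from the internet will do. Below is a minimal, hypothetical Python receiver built on the standard library only; it simply logs whatever is posted to it and makes no assumptions about the payload format:

from http.server import BaseHTTPRequestHandler, HTTPServer

class WebhookHandler(BaseHTTPRequestHandler):
    def do_POST(self):
        # Read and log the raw body sent to the webhook URL.
        length = int(self.headers.get("Content-Length", 0))
        body = self.rfile.read(length)
        print("webhook received:", body[:200])
        self.send_response(200)
        self.end_headers()

HTTPServer(("", 8080), WebhookHandler).serve_forever()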

Get Task results

Get a Task's results after completion.

curl --request POST \
     --url https://api.dataflowkit.com/v1/task/{Task_ID}/results?api_key=YOUR_API_KEY

The response consists of an array of the processes that were created by the specified task.

[
  {
    "id": "1PBhj5EGo2hAvBsytLDL363A6Mq",
    "status":"finished",
    "taskID":"1NGYaLJsY8Xf7RwO99Ew3yyt5rz",
    "startedAt": "1580302278",
    "finishedAt": "1580312522",
    "requestCount": 1000,
    "responseCount":1000,
    "missingCredits":0,
    "cost":100,
    "results" : "Results File Name",
    "logFile" : ""
  },
  {
    "id":"1NotHmEj03c27QUn54dtgICziSy",
    "status":"failed",
    "taskID":"1NGYaLJsY8Xf7RwO99Ew3yyt5rz",
    "startedAt": "1580302278",
    "finishedAt": "1580312522",
    "requestCount": 8,
    "responseCount":8,
    "missingCredits":0,
    "cost":100,
    "results" : "Results File Name",
    "logFile" : ""
  }
]

Send a request to the /task/{Task_ID}/results endpoint to retrieve an array of the processes that were created by the specified task.

Depending on the data extraction settings, the resulting data may be either downloaded from DFK storage or uploaded directly to Google Cloud, Dropbox or Microsoft OneDrive.
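For example, a small Python sketch that lists the processes spawned by a task and their results file names (the task id is a placeholder):

import requests

API_KEY = "YOUR_API_KEY"
TASK_ID = "1NGYaLJsY8Xf7RwO99Ew3yyt5rz"

resp = requests.post(
    f"https://api.dataflowkit.com/v1/task/{TASK_ID}/results",
    params={"api_key": API_KEY},
)
resp.raise_for_status()

for process in resp.json():   # one entry per spawned process
    print(process["id"], process["status"], process["results"])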

Get a list of Tasks

Get a list of tasks.

curl --request POST \
     --url https://api.dataflowkit.com/v1/tasks?api_key=YOUR_API_KEY

This endpoint returns the list of all Tasks that the user has created or used. The response is a list of Tasks where each object contains basic information about a single Task.

As a response, a JSON array is returned with objects containing the user's tasks.

[
  {
   "id":"1XtQA0Z15N3fqKZuzPKESUsTIW1",
   "name":"SERP Task",
   "webhook":"https://your-web-site.com/webhook1",
   "payload":{JSON Paylod},
   "description":"SERP description...",
   "type":"extract"
  },
  {
   "id":"fg1QA0Z15N3fqKZuzPKESUsTIW1",
   "name":"Web Extraction Name",
   "webhook":"https://your-web-site.com/webhook2",
   "payload":{JSON Paylod},
   "description":"Web description...",
   "type":"extract"
  }
]

Delete a Task

Delete a Task

curl --request DELETE \
     --url https://api.dataflowkit.com/v1/task/{Task_ID}/delete?api_key=YOUR_API_KEY

Calling this endpoint deletes a specific Task along with the corresponding results data and log files.

As a response, the JSON object {"deleted":"ok"} is returned.

References

Refer to the corresponding sections for more information about specific task types:

Single Processes

A Single Process is intended for performing simple jobs like rendering/fetching HTML, capturing a screenshot or printing a web page to PDF. It is similar to a Task, but the general difference is that a Single Process can be run only once and returns its result immediately after finishing.

Examples of Single process types are listed here:

Fetch HTML

Base Fetcher

curl --request POST \
     --url https://api.dataflowkit.com/v1/fetch?api_key=YOUR_API_KEY -d \
'{
  "type":"base",
  "url":"https://anysite.com",
  "proxy": "country-any"
}'

Chrome Fetcher

curl --request POST \
     --url https://api.dataflowkit.com/v1/fetch?api_key=YOUR_API_KEY -d \
'{
  "type":"chrome",
  "url":"http://google.com",
  "proxy":"country-any",
  "waitDelay":0.5,
  "actions": [
        {
            "input": {
                "selector": "#search-box",
                "value": "Search Term"
            }
        },
        {
            "click": {
                "selector": "#button"
            }
        },
        {
            "waitVisible": {
                "selector": ":root"
            }
        },
        {
            "scroll": {
                "times": "10"
            }
        }
    ]
}'

The Fetch endpoint is used for downloading web pages. Regular pages are fetched "as is" using standard HTTP requests, while a real headless Chrome web browser is used for rendering dynamic JavaScript-driven web pages.

Base Fetcher

Base fetcher uses standard HTTP requests to download regular pages. It works faster than the Chrome fetcher.

Chrome Fetcher

Chrome fetcher is intended for rendering dynamic JavaScript-based content. It sends requests to Chrome running in headless mode.

Parameters

Parameter Description
type If set to "base", the Base fetcher is used for downloading web page content. Use "chrome" for fetching content with the headless Chrome browser.
url The URL to download.
proxy Specify a proxy, e.g. country-sk.
waitDelay Specify a custom delay (in seconds). This may be useful if certain elements of the web site need to be rendered after the initial page load. (Chrome fetcher only)
actions Use actions to automate manual workflows while rendering web pages. They simulate real-world human interaction with pages. (Chrome fetcher only)

Fetch Response

Fetch returns the UTF-8 encoded web page content.
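A minimal Python sketch of fetching a page and saving it to disk (the target URL is a placeholder):

import requests

API_KEY = "YOUR_API_KEY"

resp = requests.post(
    "https://api.dataflowkit.com/v1/fetch",
    params={"api_key": API_KEY},
    json={
        "type": "base",              # use "chrome" for JavaScript-driven pages
        "url": "https://example.com",
        "proxy": "country-any",
    },
)
resp.raise_for_status()

with open("page.html", "w", encoding="utf-8") as f:
    f.write(resp.text)               # UTF-8 encoded page content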

Capture a Screenshot

Create a PNG Screenshot from URL

curl --request POST \
    --url https://api.dataflowkit.com/v1/convert/url/screenshot?api_key=YOUR_API_KEY \
    -H "Content-Type: application/json" \
    -d '{
    "url": "https://dataflowkit.com",
    "proxy": "country-au",
    "width": 1920,
    "height": 1080,
    "offsetx": 50,
    "offsety": 50,
    "scale": 1,
    "format": "jpeg",
    "quality": 90,
    "waitDelay": 0.5,
    "actions":[]
}'

The Dataflow Kit Screenshot endpoint is intended for taking screenshots of web pages.

It returns a download link for the captured PNG/JPEG screenshot (see the sketch after the parameter table below).

Parameter Default Description
url - Remote web page URL to take a screenshot of.
format png Sets the format of the output image. Values: png, jpeg.
quality 80 Sets the quality of the output image. Compression quality in the range [0..100] (jpeg only).
fullPage false Takes a screenshot of the full web page. It ignores the offsetx, offsety, width and height values.
clipSelector - Captures a screenshot of the specified HTML element. For example, pass a CSS selector like "#clipped-element" as the value.
offsetx 0 X offset in device independent pixels (dip).
offsety 0 Y offset in device independent pixels (dip).
width 800 Rectangle width in device independent pixels (dip).
height 600 Rectangle height in device independent pixels (dip).
scale 1 Page scale factor, in the range [0.1..3]. Defaults to 1.
waitDelay - Specify a custom delay (in seconds) before taking the screenshot. This may be useful if certain elements of the web site need to be rendered after the initial page load (e.g. CSS animations, JavaScript effects, etc.).
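A Python sketch of capturing and saving a full-page screenshot might look as follows; it assumes the endpoint returns the download link as plain text:

import requests

API_KEY = "YOUR_API_KEY"

resp = requests.post(
    "https://api.dataflowkit.com/v1/convert/url/screenshot",
    params={"api_key": API_KEY},
    json={"url": "https://dataflowkit.com", "format": "png", "fullPage": True},
)
resp.raise_for_status()

# Follow the returned download link and save the image.
image = requests.get(resp.text.strip())
image.raise_for_status()
with open("screenshot.png", "wb") as f:
    f.write(image.content)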

Convert a URL to PDF

Create a Converter Task specifying a payload configuration

curl --request POST \
        --url https://api.dataflowkit.com/v1/convert/url/pdf?api_key=YOUR_API_KEY \
        -H "Content-Type: application/json" \
        -d '{
          "url": "https://dataflowkit.com",
          "proxy": "country-at",
          "paperSize": "A4",
          "landscape": false,
          "printBackground": false,
          "printHeaderFooter": true,
          "scale": 1,
          "pageRanges": "",
          "marginTop": 0.4,
          "marginLeft": 0.4,
          "marginRight": 0.4,
          "marginBottom": 0.4,
          "waitDelay": 0.5,
          "actions":[]
}'
Parameter Default Description
url - The full URL address (including HTTP/HTTPS) of the web page that you want to print to PDF.
proxy - Select the country of the proxy to pass requests through to target web sites.
landscape false Paper orientation. Set landscape to true for landscape orientation; the default is portrait.
paperSize "A4" The page size parameter accepts the most popular page formats. Possible values are: "A3", "A4", "A5", "A6", "Letter", "Legal", "Tabloid".
printBackground false Print background graphics in the PDF.
pageRanges - Specify page ranges to convert, e.g. '1-4, 6, 10-12'. Defaults to the empty value, which means convert all pages.
scale 1 By default, the PDF document content is generated according to the size and dimensions of the original web page content. Using the scale parameter you can specify a custom zoom factor of the web page rendering, from 0.1 to 5.0.
marginTop 0.4 inches Top margin of the PDF.
marginLeft 0.4 inches Left margin of the PDF.
marginRight 0.4 inches Right margin of the PDF.
marginBottom 0.4 inches Bottom margin of the PDF.
printHeaderFooter false Turn the header/footer on or off. They include the date, the name of the web page, the page URL and how many pages the document has.
waitDelay - Specify a custom delay (in seconds) before generating the PDF. This may be useful if certain elements of the web site need to be rendered after the initial page load (e.g. CSS animations, JavaScript effects, etc.).
actions - Actions simulate real-world human interaction with pages. They can be used to automate manual workflows before the PDF conversion is performed.

Extract data from web

The /extract endpoint crawls web pages and extracts data such as text, links or images following the specified rules. Dataflow Kit uses CSS selectors to find HTML elements in web pages and extract data from them. Extracted data is returned in CSV, MS Excel, JSON, JSON Lines or XML format.

Collection scheme

Here is a simple collection object:

'{
    "name":"test.dataflowkit.com",
    "request":{
        "url":"https://test.dataflowkit.com/persons/page-0",
        "type":"chrome",
        "proxy":"country-any"
    },
    "commonParent":".parent",
    "fields":[
        {
            "name":"Number",
            "selector":".badge-primary",
            "attrs":["text"],
            "type":1,
            "filters":[
                {
                    "name":"trim"
                }
            ]
        },
        {
            "name":"Name",
            "selector":"#cards a",
            "attrs":["href","text"],
            "type":2,
            "filters":[
                {
                    "name":"trim"
                }
            ]
        },
        {
            "name":"Picture",
            "selector":".card-img-top",
            "attrs":["src","alt"],
            "type":0,
            "filters":[
                {
                    "name":"trim"
                }
            ]
        }
    ],
    "paginator":{
        "nextPageSelector":".page-link",
        "pageNum":2
        },
    "path":false,
    "format":"JSON"
}'

The collection scheme represents the settings for data extraction from a specified web site. It has the following properties:

Property Description Required
name Collection name. required
request Request parameters for downloading HTML pages. Refer to the Fetch HTML section for more details about request parameters. required
url url holds the starting web page address to be downloaded. required
type type specifies the fetcher type, which may be either "base" or "chrome". If omitted, the "base" fetcher is used by default. optional
commonParent commonParent specifies the common ancestor block for all fields used to extract data from a web page. optional
fields A set of fields used to extract data from a web page. A Field represents a given chunk of data to be extracted from every block on each page. Read more about field types. required
name Field name, used to aggregate results. required
selector Selector represents a CSS selector for data extraction within the given block. required
attrs A set of attributes to extract from a Field. Find more information about attributes. required
type Selector type. (0 - image, 1 - text, 2 - link) required
filters Filters are used for pre-processing of text data during extraction. optional
details Details is an optional field strictly intended for the Link extractor type. Details themselves represent an independent collection used to extract data from linked pages. Read more at "details". optional
paginator Paginator is used to scrape multiple pages. If there is no paginator in the scheme, then no pagination is performed and it is assumed that the initial URL is the only page. Read more about paginators. optional
path Path is a special field for navigation only. It is used to collect information from detail pages. No results from the current page will be returned. Defaults to false. optional
format Extracted data is returned in CSV, MS Excel, JSON, JSON Lines or XML format. required
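A hedged Python sketch of running an extraction with a collection scheme like the one above (loaded here from a local file, e.g. one exported by the point-and-click toolkit):

import json
import requests

API_KEY = "YOUR_API_KEY"

# Collection scheme as described above, stored in a local JSON file.
with open("collection.json", encoding="utf-8") as f:
    collection = json.load(f)

resp = requests.post(
    "https://api.dataflowkit.com/v1/extract",
    params={"api_key": API_KEY},
    json=collection,
)
resp.raise_for_status()
print(resp.text)   # extracted data in the format requested by "format"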

Field types and attributes

There are 3 predefined field types:

Text extracts human-readable text from the selected element and from all its child elements. HTML tags are stripped and only text is returned.

Link is used for link extraction and website navigation. It captures the href (URL) attribute and the link text. Alternatively, specify the special Path option for navigation only. When the Path option is specified, all other selectors are ignored and no results from the current page are returned.

Image extracts the src (URL) and alt attributes of an image.

Filters

Filters are used to manipulate text data when extracting.

The following filters are available:

Trim returns a copy of the Field's text/ attribute, with all leading and trailing white space removed.

Normal leaves the case and capitalization of text/ attribute exactly as is.

UPPERCASE makes all of the letters in the Field's text/ attribute uppercase.

lowercase makes all of the letters in the Field's text/ attribute lowercase.

Capitalize capitalizes the first letter of each word in the Field's text/ attribute.

Concatenate joins text array elements into a single string.

Regular Expressions

"filters":[ 
    {  
      "name":"regex",
      "param":"[\\d.]+"
    }
]

For more advanced text formatting, regular expressions can be used.

For example, currency signs can be removed from product prices.

The whole match (group 0) will be returned as a result. Some useful examples are listed below:

Input text Regex Result
price: 10.99€ [0-9]+.[0-9]+ 10.99
phone: 0 (944) 244-18-22 \w+ 09442441822
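As a plain-Python illustration of the first row above, the whole match (group 0) is what ends up in the extracted value:

import re

# The regex filter keeps only the whole match (group 0).
print(re.search(r"[\d.]+", "price: 10.99€").group(0))   # -> 10.99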

Details

Some parts are omitted for brevity

...
"fields":[
    {
        "name":"link2details",
        "selector":"h3 a",
        "details":{
            "name":"DetailsPage",
            "request":{
                "url":"http://example.com/details1/index.html",
                "type":""
            },
            "fields":[
                {
                    "name":"title",
                    "selector":"h1",
                    "attrs":[
                        "text"
                    ]
                }
            ],
            "paginator":{},
            "path":false
        },
        "attrs":[
            "href",
            "text"
        ]
    }
],
...

The Link field type might serve as a navigation link to a details page containing additional data.

By following the links from the main page, elements on the detail pages can be gathered into a separate collection.

The special Path option is used for navigation only. When the Path option is specified, no results from the current page are returned; grouped results from the details pages are returned instead.

A details page has its own fields and may contain paginators and collections for deeper-level details pages.

Paginator

Paginator is used to scrape multiple pages. It extracts the next page from a document by querying a given CSS selector.

There are three paginator types.

"Next link" paginator type is used on pages containing link pointing to a next page. The next page link is extracted from a document by querying href attribute of a given element's CSS selector.

"Infinite scroll" paginator type automatically loads additional page content while user scrolls page down.

"Load more Button" paginator type looks like "Next link" but behaves as "Infinite scroll" paginator type. It loads additional page content on its click.

Point-and-click toolkit

The easiest way to define fields for extraction is to use the Dataflow Kit visual interface.

Just click elements on the loaded page and then export the collection to a file.

Select Elements

Export collection

Extract SERPs

To crawl search engine result pages (SERPs), you can either run a single process or create a task. The SERP collection service extracts a list of organic results, news, images and more. Specify advanced configuration parameters such as country or language to customize the output SERP data.

The following search engines are supported:

Google Web
Google Images
Google News
Google Shopping
Bing
DuckDuckGo
Baidu
Yandex

Search parameters

Create a SERP Extractor Task.

curl --request POST \
        --url https://api.dataflowkit.com/v1/extract?api_key=YOUR_API_KEY \
        -H 'Content-Type: application/json' \
        -d '{
    "name": "google",
    "request": {
        "url": "https://www.google.com/search?q=dataflow+kit&lr=lang_de&gl=at",
        "proxy": "country-at",
        "type": "chrome"
    },
    "fields": [
        {
            "name": "selector1",
            "selector": ".r>a:first-of-type",
            "attrs": [
                "href",
                "text"
            ],
            "type": 2,
            "filters": [
                {
                    "name": "trim"
                }
            ]
        }
    ],
    "paginator": {
        "nextPageSelector": ".b.navend:last-child a",
        "pageNum": 3
    },
    "format": "csv"
}'
Parameter Description Notes
name Collection name. required
url url holds the link to the Search Engine to use, along with other optional parameters such as languages or country. required. See the URL GET parameters description below.

URL GET parameters

q Parameter defines the encoded search term. You can use anything that you would use in a regular search engine search. (e.g. for Google, link:dataflowkit.com, site:twitter.com Bratislava, inurl:view/view.shtml, etc.) See The Complete List of 42 Advanced Google Search Operators. The q parameter is used by Google, Bing and DuckDuckGo; text is used as the query parameter by Yandex; Baidu uses wd for this purpose.
tbm tbm is a special Google parameter used to differentiate between search types: tbm=isch - Google Images, tbm=nws - Google News, tbm=shop - Google Shopping.
lr Restricts the search to documents written in particular languages. Google uses lang_{two-letter lang code} to specify languages and | as a delimiter. (e.g. lang_sk|lang_de will only search Slovak and German pages.) See the full list of possible values for Google. For Bing, specify the setLang parameter (e.g. setLang=en). For Yandex, use the lang parameter (e.g. lang=ca).
gl Specifies the country to search from. It is a two-letter country code. (e.g. sk for Slovakia, or us for the United States.) For Google, see the Country Codes page for a list of valid values. For Bing, the cc parameter is used (e.g. cc=at).
Parameter Description Notes
proxy Select the country of the proxy to pass requests through to target web sites. NOTE: you always have to use a proxy when requesting SERPs. Use country-{two-letter country code} to locate the proxy in a specified country, or country-any for a random proxy. (e.g. country-us passes all requests through a US proxy; country-any passes proxified requests through a random country.)
fields A set of CSS selectors (patterns) used to gather data from Search Engine Result Pages. Ready-made payloads for collecting search results (SERP data) from the most popular search engines are available. These payloads are fully customizable.
pageNum Specify the number of pages to crawl. Defaults to 1.
format Select the format of the output data. Possible values are CSV, JSON(Lines), XML.

Results

Extracted data is returned in CSV, JSON, JSON(Lines) or XML format.