Many large sites detect when a large number of requests arrives from a single IP address within a short period of time. Such a pattern usually indicates automated access, so the site blocks further requests from that client for a preset period of time.
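To make the detection mechanism concrete, here is a minimal sketch of how a site might count requests per IP address over a sliding time window and block an address that exceeds a threshold. The class name, threshold, window length and block duration are all hypothetical; real sites apply their own rules, which they rarely publish.

```python
from collections import defaultdict, deque


class SlidingWindowBlocker:
    """Hypothetical per-IP limiter: an IP that sends more than
    max_requests within window_seconds is blocked for block_seconds."""

    def __init__(self, max_requests=10, window_seconds=1.0, block_seconds=60.0):
        self.max_requests = max_requests
        self.window_seconds = window_seconds
        self.block_seconds = block_seconds
        self._hits = defaultdict(deque)  # ip -> timestamps of recent requests
        self._blocked_until = {}         # ip -> time at which the block expires

    def allow(self, ip, now):
        """Return True if a request from `ip` at time `now` is accepted."""
        # Refuse every request while the IP is inside its block period.
        if self._blocked_until.get(ip, float("-inf")) > now:
            return False
        hits = self._hits[ip]
        # Forget timestamps that have slid out of the time window.
        while hits and now - hits[0] >= self.window_seconds:
            hits.popleft()
        hits.append(now)
        if len(hits) > self.max_requests:
            # Too many requests in the window: block for a preset period.
            self._blocked_until[ip] = now + self.block_seconds
            hits.clear()
            return False
        return True


if __name__ == "__main__":
    blocker = SlidingWindowBlocker(max_requests=3, window_seconds=1.0, block_seconds=60.0)
    # Four requests within 0.3 s from one IP: the fourth one is rejected.
    print([blocker.allow("203.0.113.5", t) for t in (0.0, 0.1, 0.2, 0.3)])  # [True, True, True, False]
    print(blocker.allow("203.0.113.5", 30.0))   # False: still blocked
    print(blocker.allow("198.51.100.7", 0.3))   # True: a different IP is unaffected
```

Note that the limit is tracked per IP address, which is exactly why spreading requests over several addresses avoids triggering it.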
To get around this type of restriction when crawling or processing data from such sites, you need to spread your requests evenly across a number of proxy servers so that they come from a variety of IP addresses.
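A minimal sketch of spreading requests evenly across a pool of proxies with round-robin rotation. The proxy addresses are hypothetical placeholders from a documentation IP range, and the function only plans which proxy each URL should go through; it does not send any traffic.

```python
from itertools import cycle


def assign_proxies(urls, proxies):
    """Pair each URL with a proxy in round-robin order so that the
    requests are distributed evenly across the proxy pool."""
    if not proxies:
        raise ValueError("proxy pool must not be empty")
    rotation = cycle(proxies)
    return [(url, next(rotation)) for url in urls]


# Hypothetical proxy pool; replace with real proxy addresses.
PROXIES = [
    "http://192.0.2.10:8080",
    "http://192.0.2.11:8080",
    "http://192.0.2.12:8080",
]

if __name__ == "__main__":
    pages = [f"https://example.com/page/{i}" for i in range(1, 7)]
    # Six pages over three proxies: each proxy handles two of them.
    for url, proxy in assign_proxies(pages, PROXIES):
        print(url, "->", proxy)
```

Round-robin is the simplest way to keep the load per IP even; picking proxies at random or weighting them by recent success rate are common alternatives.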
Proxy servers are also often used to get around geo-IP-based content restrictions. For example, suppose someone in Europe wants to extract data from a website that is accessible only to users in the US. The obvious solution is to send requests through a proxy server located in the US, so that the traffic appears to come from a US IP address.
To obtain a country-specific version of a target website, specify the desired country in the request parameters of the Dataflow Kit fetch service.
When you make an HTTP request to a site through a proxy server, the request does not travel directly to that site. It first passes through the proxy server, which then forwards it to the target site.
In other words, the proxy server makes the request on your behalf ("by proxy") and then passes the response from the target site back to you.
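For a plain-HTTP target, the client opens its connection to the proxy instead of the target site and puts the full target URL in the request line (the "absolute-form" described in RFC 7230, section 5.3.2); the proxy then connects to the target host itself. The sketch below only builds the bytes a client would send to the proxy. The function name is hypothetical, and HTTPS targets are normally reached through a CONNECT tunnel instead, which is not shown here.

```python
from urllib.parse import urlsplit


def build_proxy_request(target_url, method="GET"):
    """Build the raw HTTP/1.1 request a client sends to an HTTP proxy
    for a plain-HTTP target. The request line carries the absolute URL
    so the proxy knows where to forward it; the Host header still names
    the target site, not the proxy."""
    parts = urlsplit(target_url)
    if parts.scheme != "http":
        raise ValueError("this sketch handles plain http:// targets only")
    lines = [
        f"{method} {target_url} HTTP/1.1",
        f"Host: {parts.netloc}",
        "Connection: close",
        "",  # empty line terminating the header block
        "",
    ]
    return "\r\n".join(lines).encode("ascii")


if __name__ == "__main__":
    print(build_proxy_request("http://example.com/products?page=2").decode())
```

In practice an HTTP client library does this for you; in Python's standard library, for example, `urllib.request.ProxyHandler` routes requests through a configured proxy.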
Dataflow Kit forwards web page fetch requests to proxy servers, and the proxies in turn send back responses containing the downloaded page content.
From the target site's perspective, there is usually no sign that the request is being proxied: it simply sees a normal web request arriving from the proxy server’s IP address.
To get around content download restrictions on certain websites, Dataflow Kit lets you route requests through proxy IPs.
Our default shared datacenter proxy servers work for most sites and at most volumes. Using these proxies incurs one additional request for each page processed.
If you would like to utilize private or residential proxies for specific sites or individual crawls, please contact Dataflow Kit Support.