
Spoofing your Scrapy bot IP using tsocks

November 12, 2010

It is well known that many websites show different content depending on the region where they’re accessed. For example, some retailer sites show products available only for the region (US, Europe) of the user accessing the site.

Although this can be quite convenient for the website's customers, it can be a pain for developers who write a spider for the site and run it from their local machines.

There is a simple way to route all requests so that they appear to come from another server. You only need SSH access to that server; there is no need to install an HTTP proxy on it. The tool for this is tsocks, a program that transparently redirects another program's TCP connections through a SOCKS proxy.

Here’s how to do it in Ubuntu, though this recipe should be easy to extend to other Linux distros.

First, install tsocks with:

$ sudo apt-get install tsocks

Then add this content to ~/.tsocksrc (update: in recent versions the settings are stored in ~/.tsocks.conf, though the location may vary across distributions):

server = 127.0.0.1
server_type = 5
server_port = 9999
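
The port in this file has to match the one passed to ssh in the next step. If you set this up on more than one machine, a small script can generate the settings from a single place. The following is only a sketch: the function name `render_tsocks_conf` and its defaults are made up for illustration and are not part of tsocks or Scrapy.

```python
def render_tsocks_conf(host="127.0.0.1", port=9999, socks_version=5):
    """Return the three tsocks settings used in this post as file content.

    Hypothetical helper: the names and defaults are illustrative only.
    """
    return (
        f"server = {host}\n"
        f"server_type = {socks_version}\n"
        f"server_port = {port}\n"
    )
```

Writing the returned string to the config file mentioned above (for example with `pathlib.Path.write_text`) should give the same result as editing the file by hand.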

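Next, SSH to the remote server you want to use. The -D 9999 option makes ssh open a local SOCKS proxy on port 9999 that forwards connections through that server: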

$ ssh -D 9999 some_remote_server
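
Before starting a crawl, it can be useful to confirm that the local SOCKS port is actually accepting connections, since tsocks has nothing to connect to if the SSH session has dropped. A minimal sketch using only the Python standard library follows; the function name `socks_port_open` is hypothetical, and the check only tells you that something is listening on the port, not that it speaks SOCKS.

```python
import socket


def socks_port_open(host="127.0.0.1", port=9999, timeout=2.0):
    """Return True if a TCP connection to host:port succeeds.

    Hypothetical helper: it only verifies that the port is listening,
    not that the listener is the ssh -D SOCKS proxy.
    """
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Connection refused, timed out, or host unreachable.
        return False
```

With the SSH session from the command above still open, `socks_port_open()` should return True; after closing it, the same call should return False.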

And finally, in another terminal (without closing the SSH console), just run Scrapy by prefixing it with the tsocks command, like this:

$ tsocks scrapy crawl myspider

That’s all. Your spider will run on your local machine, but all of its communication will be proxied through the remote server. No need to change any Scrapy settings or configuration.
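
To check that the tunnel is really in effect, you can ask an IP echo service which address it sees. The sketch below assumes the public service at https://httpbin.org/ip, which returns a JSON object with an "origin" field; that URL and the function names are assumptions for illustration, and any similar service would do.

```python
import json
import urllib.request

# Assumption: an IP echo service that replies with {"origin": "<ip>"}.
IP_ECHO_URL = "https://httpbin.org/ip"


def parse_origin(body):
    """Extract the caller's IP address from the echo service's JSON body."""
    return json.loads(body)["origin"]


def fetch_exit_ip(url=IP_ECHO_URL, timeout=10):
    """Return the IP address the remote service sees for this process."""
    with urllib.request.urlopen(url, timeout=timeout) as resp:
        return parse_origin(resp.read().decode("utf-8"))
```

If you save this as a script that ends with `print(fetch_exit_ip())`, running it with plain `python` should print your own address, while running it prefixed with `tsocks` should print the remote server's address if the setup is working.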

Written by Kevin McKinless