
How to use a proxy in Puppeteer

January 23, 2020

Puppeteer is a high-level API for headless Chrome. It’s one of the most popular tools for web automation and web scraping in Node.js. In web scraping, many developers use it to handle JavaScript rendering and web data extraction. In this article, we cover how to set up a proxy in Puppeteer and what your options are if you want to rotate proxies.

Puppeteer and proxies

In this section, we’re going to configure Puppeteer to use a proxy. For this, you will need a working proxy and a destination URL to send the request to.

'use strict';

const puppeteer = require('puppeteer');

(async () => {
  // Route all traffic of this browser instance through the proxy.
  const browser = await puppeteer.launch({
    args: ['--proxy-server=http://10.10.10.10:8000']
  });
  const page = await browser.newPage();
  // Destination URL; the request goes out via the proxy set above.
  await page.goto('http://toscrape.com');
  await browser.close();
})();

As simple as that. This code routes every request from the browser through the specified proxy. One downside of Puppeteer is that there is no simple way to set a proxy per request, so the proxy you specify applies to every request the browser instance makes.
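Because the proxy is tied to the browser instance, one simple workaround is to launch a separate browser for each proxy. The following is a minimal sketch of that idea, not a definitive implementation. The helper names `launchOptionsFor` and `fetchThroughProxy` are hypothetical and do not come from Puppeteer itself.

```javascript
'use strict';

// Hypothetical helper: builds Puppeteer launch options for one proxy URL.
function launchOptionsFor(proxyUrl) {
  return { args: [`--proxy-server=${proxyUrl}`] };
}

// Hypothetical helper: loads one page through one proxy using a dedicated
// browser instance, then closes that browser.
async function fetchThroughProxy(proxyUrl, targetUrl) {
  // Loaded lazily so the helper above can be used without Puppeteer installed.
  const puppeteer = require('puppeteer');
  const browser = await puppeteer.launch(launchOptionsFor(proxyUrl));
  try {
    const page = await browser.newPage();
    await page.goto(targetUrl);
    return await page.content();
  } finally {
    // Always release the browser, even if navigation fails.
    await browser.close();
  }
}
```

Starting a browser for every proxy is expensive, so this only makes sense for a small number of proxies.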

IP rotation with Puppeteer

When you scrape the web at scale, you need to rotate proxies to avoid bans. If you try to implement your own IP pool in Puppeteer, you will find that proxies can only be set at the browser level (as in the code above), not per request. This is not ideal if you need a different proxy for each request. See this GitHub issue for more information on the topic.

To rotate proxies in Puppeteer and use a different IP address for each request, you need a proxy server that sits between the browser and the upstream proxies. You can implement your own or use a backconnect proxy service. Be aware that implementing your own proxy server can lead you down a rabbit hole: you end up solving problems that have nothing to do with web scraping and get distracted from what you really want to achieve, which is extracting the data. So it’s not recommended. But if you decide to go this way, here is an example built with proxy-chain:

const ProxyChain = require('proxy-chain');

// Map each User-Agent value to the upstream proxy it should use.
const proxies = {
  'useragent1': 'http://user:pass@85.237.57.198:44959',
  'useragent2': 'http://user:pass@116.0.2.94:43379',
  'useragent3': 'http://user:pass@186.86.247.169:39168',
};

const server = new ProxyChain.Server({
  port: 8000,
  // Picks the upstream proxy based on the User-Agent header of the request.
  prepareRequestFunction: ({request}) => {
    const userAgent = request.headers['user-agent'];
    const proxy = proxies[userAgent];
    return {
      upstreamProxyUrl: proxy,
    };
  },
});

server.listen(() => console.log('Proxy server works!'));
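The example above selects the upstream proxy by user agent. If you would rather cycle through the pool in order, a round-robin picker is one option. This is a minimal sketch under that assumption: `createRotator` is a hypothetical helper, not part of proxy-chain, and the proxy URLs are placeholders.

```javascript
'use strict';

// Hypothetical helper: returns a function that yields the given upstream
// proxy URLs one after another, wrapping around at the end of the list.
function createRotator(upstreams) {
  if (upstreams.length === 0) {
    throw new Error('At least one upstream proxy is required');
  }
  let next = 0;
  return () => {
    const proxy = upstreams[next];
    next = (next + 1) % upstreams.length;
    return proxy;
  };
}

// Placeholder upstream proxies, for illustration only.
const rotate = createRotator([
  'http://user:pass@proxy-a.example:8000',
  'http://user:pass@proxy-b.example:8000',
]);

// Same shape as the prepareRequestFunction option in the example above,
// but ignoring the request and handing out proxies in turn.
const prepareRequestFunction = () => ({ upstreamProxyUrl: rotate() });
```

You would pass `prepareRequestFunction` to `new ProxyChain.Server(...)` in place of the user-agent lookup, and launch Puppeteer with `--proxy-server=http://localhost:8000` so that the browser talks to the local server.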

Puppeteer with Zyte Smart Proxy Manager (formerly Crawlera)

If you don’t want to implement your own JS proxy server, you can use a rotating proxy service, like Zyte Smart Proxy Manager. This is the simplest way to use proxies with Puppeteer. If you don’t want to struggle with IP rotation and just want successful requests, this is how to use Puppeteer with Zyte Smart Proxy Manager:

Note: It is recommended to use Puppeteer 1.17 with Chromium 76.0.3803.0. For newer versions of Puppeteer, the latest Chromium snapshot that can be used is r669921.

  1. Set ignoreHTTPSErrors to true in the puppeteer.launch method
  2. Specify Zyte Smart Proxy Manager's host and port in the --proxy-server flag
  3. Send your Zyte Smart Proxy Manager credentials in the Proxy-Authorization header

Here’s an example:

const puppeteer = require('puppeteer');

(async () => {
    const browser = await puppeteer.launch({
        ignoreHTTPSErrors: true,
        args: [
            '--proxy-server=proxy.crawlera.com:8010'
        ]
    });
    const page = await browser.newPage();

    // Replace <APIKEY> with your own API key; keep the trailing colon.
    await page.setExtraHTTPHeaders({
        'Proxy-Authorization': 'Basic ' + Buffer.from('<APIKEY>:').toString('base64'),
    });

    console.log('Opening page ...');
    try {
        await page.goto('https://httpbin.scrapinghub.com/redirect/6', {timeout: 180000});
    } catch (err) {
        console.log(err);
    }

    console.log('Taking a screenshot ...');
    await page.screenshot({path: 'screenshot.png'});
    await browser.close();
})();
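The Proxy-Authorization value in the example is an HTTP Basic credential: the string "username:password" encoded as base64. The sketch below isolates that step, assuming the API key is used as the username and the password is left empty. `proxyAuthHeader` is a hypothetical helper name, not part of Puppeteer or of the Zyte API.

```javascript
'use strict';

// Hypothetical helper: builds a Basic credential from an API key,
// assuming the key is the username and the password is empty.
function proxyAuthHeader(apiKey) {
  return 'Basic ' + Buffer.from(`${apiKey}:`).toString('base64');
}

// Hypothetical helper: returns the header object in the shape that
// page.setExtraHTTPHeaders() expects.
function proxyAuthHeaders(apiKey) {
  return { 'Proxy-Authorization': proxyAuthHeader(apiKey) };
}
```

Keeping the encoding in one small function makes it easy to read the key from an environment variable instead of hard-coding it in the script.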

With Zyte Smart Proxy Manager, you don’t have to struggle with IPs and rotation. It takes care of making your requests successful. For more tips on how to use Zyte Smart Proxy Manager with Puppeteer, see our support page. You can also try it for free.

Written by Attila Toth