Using Selenium and/or Zyte Smart Proxy Manager? You just stumbled upon the right blog.
I am thrilled to share some good news with all the Selenium users who are looking for an easy-to-integrate anti-ban solution, and with all the Zyte Smart Proxy Manager users who use Selenium (a web automation and headless browser library) to extract data from JavaScript-heavy websites. We have just launched a new library: Zyte SmartProxy Selenium.
At Zyte, the developer experience matters the most, and we wanted to give you a smooth experience of scraping dynamic websites with seamless integration between Selenium and our smart rotating proxy service, Zyte Smart Proxy Manager.
Let’s dive into how to get started. It’s super easy!
The Zyte SmartProxy Selenium library is a client library built on top of Selenium, an open-source framework for web automation across Chromium, Firefox, and WebKit with a single API. It is written to work seamlessly with Zyte Smart Proxy Manager.
With this library, you will be able to make the most of Selenium's headless browser capabilities and manage bans in your web scraping projects by unlocking Zyte Smart Proxy Manager, our powerful proxy management tool.
In this tutorial, I will demonstrate how to give your Selenium web scraping script superhero capabilities.
To run the script used in this tutorial, please make sure you have the following ready:
Before installing the library, you first need to install the browser driver for the browser you want to work with. We will be using Chromium/Chrome for this tutorial, so you need to download the ‘ChromeDriver’ that matches your current version of the Chrome browser. To find your current Chrome version, open Chrome's menu and go to Help > About Google Chrome. Then add the ChromeDriver location to your $PATH environment variable using these instructions.
Now, let’s install the Zyte SmartProxy Selenium library. Just run the following command using ‘pip’, and it will install all the dependencies along with the Selenium library itself.
$ python3 -m pip install zyte-smartproxy-selenium
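To confirm the installation worked, you can ask Python which versions of the relevant packages are installed. This is a small sketch using the standard library's importlib.metadata (Python 3.8+); the helper name installed_versions is hypothetical, and the distribution names are the ones used with pip.

```python
# Post-install check: report which of the packages this tutorial relies on are
# installed, and their versions. Names are pip distribution names.
from importlib.metadata import PackageNotFoundError, version

REQUIRED = ("zyte-smartproxy-selenium", "selenium")


def installed_versions(packages=REQUIRED):
    """Map each distribution name to its installed version, or None if missing."""
    found = {}
    for name in packages:
        try:
            found[name] = version(name)
        except PackageNotFoundError:
            found[name] = None
    return found


for name, ver in installed_versions().items():
    print(f"{name}: {ver or 'NOT INSTALLED'}")
```

If either line says NOT INSTALLED, re-run the pip command above in the same Python environment you use to run your scripts.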
Awesome, now that you are all set up and configured, let’s code!
To demonstrate the integration between Zyte Smart Proxy Manager and the headless browser library Selenium, we will write a script that makes our headless browser take a screenshot of ‘Web Scraping Sandbox’. Zyte developed this sandbox for demonstration purposes, so feel free to play around with it and experiment with new web scraping techniques.
Let’s start our Zyte SmartProxy Selenium tutorial with this basic example.
Create a new file named sample.py, open it in your favorite code editor, and import webdriver from zyte_smartproxy_selenium.
from zyte_smartproxy_selenium import webdriver
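Create a Chrome browser instance, passing your Smart Proxy Manager API key as 'spm_apikey' inside 'spm_options'. Replace the placeholder with your own spm_apikey, as mentioned in the prerequisites above.
browser = webdriver.Chrome(spm_options={'spm_apikey': '<Smart Proxy Manager API KEY>'})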
Open the Web Scraping Sandbox using the get function.
browser.get('https://toscrape.com')
Take a screenshot using the save_screenshot command. In its filename argument, give the path where you want to save the screenshot. The relative path used in this script saves the screenshot in your current working directory, which is the directory containing sample.py if you run the script from there.
browser.save_screenshot('screenshot.png')
Finally, close the browser.
browser.close()
The final code should look like this:
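from zyte_smartproxy_selenium import webdriver
# Start Chrome with its traffic routed through Zyte Smart Proxy Manager
browser = webdriver.Chrome(spm_options={'spm_apikey': '<Smart Proxy Manager API KEY>'})
# Load the Web Scraping Sandbox
browser.get('https://toscrape.com')
# Save a screenshot of the page in the current working directory
browser.save_screenshot('screenshot.png')
# Close the browser
browser.close()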
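The script above hard-codes the API key and only closes the browser if every step succeeds. Below is a minimal sketch of a slightly more defensive version, under a few assumptions: the environment variable name SPM_APIKEY and the helper names load_api_key and take_screenshot are hypothetical, and it calls quit() on the zyte_smartproxy_selenium driver on the assumption that, like standard Selenium drivers, it exposes that method. In standard Selenium, quit() ends the whole session, whereas close() only closes the current window.

```python
import os

ENV_VAR = "SPM_APIKEY"  # hypothetical variable name; use any name you like


def load_api_key(env_var=ENV_VAR):
    """Read the Smart Proxy Manager API key from the environment instead of hard-coding it."""
    api_key = os.environ.get(env_var, "").strip()
    if not api_key:
        raise RuntimeError(f"Set the {env_var} environment variable to your Smart Proxy Manager API key")
    return api_key


def take_screenshot(api_key, url="https://toscrape.com", path="screenshot.png"):
    """Open url through Smart Proxy Manager and save a screenshot to path."""
    # Imported here so load_api_key can be used without the library installed.
    from zyte_smartproxy_selenium import webdriver

    browser = webdriver.Chrome(spm_options={"spm_apikey": api_key})
    try:
        browser.get(url)
        return browser.save_screenshot(path)
    finally:
        # Assumption: the driver exposes quit(), like standard Selenium drivers.
        # It runs even if get() or save_screenshot() raised an exception.
        browser.quit()
```

The block only defines helpers. If you save it as sample.py, add a final line such as take_screenshot(load_api_key()) and set SPM_APIKEY in your shell before running the script.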
Execute the script on the command line.
$ python3 sample.py
If your script runs successfully, you should be able to see screenshot.png in your project folder.
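A PNG file always begins with the same 8-byte signature, so a quick way to confirm that screenshot.png is a real image (and not, say, an empty file) is to check its first bytes. The is_png helper below is a hypothetical convenience, not part of the library.

```python
from pathlib import Path

PNG_SIGNATURE = b"\x89PNG\r\n\x1a\n"  # fixed 8-byte header at the start of every PNG file


def is_png(path="screenshot.png"):
    """Return True if path is an existing file that starts with the PNG signature."""
    file = Path(path)
    if not file.is_file():
        return False
    with file.open("rb") as f:
        return f.read(len(PNG_SIGNATURE)) == PNG_SIGNATURE
```

Run it from your project folder after sample.py, for example with print(is_png()).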
In addition to easy integration and management of Selenium's headless capabilities with Zyte Smart Proxy Manager, our library provides functionalities such as:
Block ads: pass the 'block_ads' argument and set it to True, and the library will block ads defined by 'block_list'.
Static bypass: pass the 'static_bypass' argument and set it to True, and the library will skip the proxy for static assets defined by 'static_bypass_regex'. Pass False to send those assets through the proxy instead.
Important note: 'block_ads' and 'static_bypass' are enabled by default. Some websites may not work with 'block_ads' and 'static_bypass' enabled, so try disabling them if you encounter any issues. To learn more about these functionalities, read here.
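To make the two options more concrete, here is a simplified sketch of the kind of per-request decision they describe: ad requests matching a block list are dropped, static assets matching a regex bypass the proxy, and everything else goes through Smart Proxy Manager. This is not the library's implementation; the route function, the BLOCK_LIST domains, and the STATIC_BYPASS_REGEX pattern are hypothetical and chosen only for illustration.

```python
import re
from urllib.parse import urlsplit

# Illustrative values only: these are NOT the library's real defaults.
STATIC_BYPASS_REGEX = re.compile(r"\.(?:css|png|jpe?g|gif|svg|ico|woff2?|ttf)$", re.IGNORECASE)
BLOCK_LIST = ("doubleclick.net", "adservice.google.com")  # hypothetical ad domains


def route(url, block_ads=True, static_bypass=True):
    """Return 'block', 'direct', or 'proxy' for a request URL."""
    parts = urlsplit(url)
    host = parts.hostname or ""
    # Block the request if its host is a listed domain or one of its subdomains.
    if block_ads and any(host == d or host.endswith("." + d) for d in BLOCK_LIST):
        return "block"
    # Fetch static assets directly, matching on the path so query strings are ignored.
    if static_bypass and STATIC_BYPASS_REGEX.search(parts.path):
        return "direct"
    # Everything else goes through Smart Proxy Manager.
    return "proxy"
```

Disabling an option removes one branch of the decision, which is why a site that needs its ads or static files fetched through the proxy can break with them enabled. In your own script, you would turn the features off by passing 'block_ads': False and/or 'static_bypass': False next to 'spm_apikey' in spm_options; we assume here that the library reads these keys from spm_options, so please confirm against its documentation.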
Error:
You may encounter the following exception when Selenium cannot find ChromeDriver:
Resolution:
Double-check the path to ChromeDriver. You can set the path as shown in the screenshot, which follows these instructions.
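If you hit this error, a quick way to check your setup from Python is sketched below. It looks for chromedriver on your $PATH with shutil.which and reads the driver's major version from chromedriver --version, which should match the major version shown on Chrome's About page. The helper names are hypothetical, and the sketch assumes the executable is named chromedriver (on Windows, shutil.which also finds chromedriver.exe via PATHEXT).

```python
import re
import shutil
import subprocess


def find_chromedriver():
    """Return the full path of the first 'chromedriver' on $PATH, or None."""
    return shutil.which("chromedriver")


def major_version(version_text):
    """Extract the major version from text like 'ChromeDriver 114.0.5735.90 (...)'."""
    match = re.search(r"(\d+)\.\d+\.\d+", version_text)
    return int(match.group(1)) if match else None


def chromedriver_major_version():
    """Run 'chromedriver --version' and return its major version, or None if not found."""
    path = find_chromedriver()
    if path is None:
        return None
    result = subprocess.run([path, "--version"], capture_output=True, text=True, check=False, timeout=10)
    return major_version(result.stdout)
```

If find_chromedriver() returns None, ChromeDriver is not on your $PATH; if chromedriver_major_version() differs from your Chrome major version, download the matching ChromeDriver.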
Using libraries like Zyte SmartProxy Selenium makes it much easier to work with dynamic websites and to manage bans and proxies together in a single piece of code. Later this month, on the 22nd of June, I will be hosting a webinar to demonstrate the true power of this new integration and show you how to make the most of it. So be sure to join me!
This webinar will be a good opportunity for you to interact with our web scraping experts and get your questions answered on the fly while doing hands-on integration of these libraries.
If you are new to headless browsers, Selenium, or Zyte Smart Proxy Manager, here are some links to learn more about these topics. I hope you find them useful.