Zyte Developers


The latest from Valdir Stumm Junior

• How To: Deploy Your Scrapy Spiders From GitHub | Scrapy Cloud (April 19, 2017)
• Use case: Web Scraping Price Monitoring (November 24, 2016)
• How To: How to use XPath to extract web data (October 27, 2016)
• How To: How To Run Python Scripts In Scrapy Cloud (September 28, 2016)
• How To: How To Deploy Custom Docker Images For Your Web Crawlers (September 8, 2016)
• Open Source: How to crawl the web with Scrapy (August 25, 2016)
• Product Update: Introducing Scrapy Cloud with Python 3 support (August 17, 2016)
• Open Source: Meet Parsel: The Selector Library Behind Scrapy (July 28, 2016)
• Open Source: Scrapy Tips from the Pros (July 2016): Tips for Effective Scraping (July 20, 2016)
• Open Source: Scrapely: The Brains Behind Portia Spider (July 7, 2016)
• Product Update: Introducing Portia2Code: Transforming Portia Projects into Scrapy Spiders (June 29, 2016)
• How To: Scraping Infinite Scrolling Pages (June 22, 2016)
• Product Update: Data Extraction With Scrapy And Python 3 (May 25, 2016)
• How To: How To Debug Your Scrapy Spiders (May 18, 2016)
• Product Update: Scrapy + MonkeyLearn: Textual Analysis Of Web Data (May 11, 2016)
• Product Update: Introducing Scrapy Cloud 2.0 (May 4, 2016)
• Open Source: Scraping Websites Based On ViewStates With Scrapy (April 20, 2016)
• Open Source: Scrapy Tips from the Pros (March 2016 Edition): Mastering the Craft (March 23, 2016)
• Use case: How Web Scraping Reveals Lobbying and Corruption in Peru (March 9, 2016)
• Product Update: Splash 2.0: Powering Web Rendering with QT 5 and Python 3 (February 29, 2016)
• Product Update: Migrate Your Kimono Projects to Portia: Smooth Transition Guide (February 26, 2016)
• Open Source: Scrapy Tips from the Pros (February 2016 Edition): Continuous Learning (February 24, 2016)
• Open Source: Portia: The Open-source Alternative To Kimono Labs (February 17, 2016)
• How To: Scrapy Tips from the Pros (Part 1): Expert Advice for Better Scraping (January 19, 2016)
• Open Source: Parse Natural Language Dates With Dateparser (November 9, 2015)
• Open Source: Aduana: Link Analysis to Crawl the Web at Scale (September 29, 2015)
• Open Source: Scrapy on the Road to Python 3 Support: Modernizing the Framework (August 19, 2015)
• Product Update: Introducing JavaScript Support for Portia: Expanding Web Scraping Capabilities (August 19, 2015)
• How To: Link Analysis Algorithms Explained (June 19, 2015)
• Announcement: EuroPython Gold Sponsor (June 12, 2015)
• Open Source: Aduana: Link Analysis With Frontera | Zyte (June 8, 2015)
• Open Source: Skinfer: Inferring JSON Schemas Made Easy (March 5, 2015)
• How To: XPath Tips From The Web Scraping Trenches (July 17, 2014)
• Product Update: Introducing Data Reviews: Unlocking Insights with Zyte (June 27, 2014)
• How To: Extract Schema.Org Microdata with Scrapy Selectors (June 18, 2014)

Up until now, your deployment process using Scrapy Cloud has probably been something like this: code and test your spiders locally, commit and push your changes to a GitHub repository, and finally deploy them to Scrapy Cloud using shub deploy.
Computers are great at repetitive tasks. They don't get distracted, bored, or tired.
Let's start with the basics: what is XPath? XPath is a powerful language that is often used for scraping the web. It allows you to select nodes or compute values from an XML or HTML document, and it is one of the languages you can use to extract web data with Scrapy.
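To make the idea concrete, here is a minimal sketch of node selection with XPath expressions, using only Python's standard library (`xml.etree.ElementTree` supports a limited XPath subset; Scrapy's own selectors support the full language and real-world HTML):

```python
import xml.etree.ElementTree as ET

# A tiny, well-formed document to select from.
html = "<html><body><h1>Title</h1><ul><li>a</li><li>b</li></ul></body></html>"
root = ET.fromstring(html)

# Select the first <h1> anywhere in the tree and read its text.
title = root.find(".//h1").text

# Select every <li> and collect their text contents.
items = [li.text for li in root.findall(".//li")]

print(title)  # Title
print(items)  # ['a', 'b']
```

The same expressions (`//h1/text()`, `//li/text()`) work, with richer semantics, in Scrapy's `response.xpath(...)` calls.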
You can deploy, run, and maintain control over your Scrapy spiders in Scrapy Cloud, our production environment.
What if you could have complete control over your environment? Your crawling environment, that is...
The first rule of web crawling is you do not harm the website. The second rule of web crawling is you do NOT harm the website. We’re supporters of the democratization of web data, but not at the expense of the website’s owners.
It’s the end of an era. Python 2 is on its way out with only a few security and bug fixes forthcoming from now until its official retirement in 2020.
We eat our own spider food since Scrapy is our go-to workhorse on a daily basis. However, there are certain situations where Scrapy can be overkill and that’s when we use Parsel.
Fasten your seat belts, ladies and gentlemen: Scrapy 1.1 with Python 3 support is officially out! After a couple of months of hard work and four release candidates, this is the first official Scrapy release to support Python 3.
Welcome to Scrapy Tips from the Pros! Every month we release a few tricks and hacks to help speed up your web scraping and data extraction activities.
We recently announced our integration with MonkeyLearn, bringing machine learning to Scrapy Cloud. MonkeyLearn offers numerous text analysis services via its API. Since there are so many uses to this platform addon, we’re launching a series of tutorials to help get you started.
Welcome to the April Edition of Scrapy Tips from the Pros. Each month we’ll release a few tricks and hacks that we’ve developed to help make your Scrapy workflow go more smoothly.
Update: With the release of the Panama Papers, a reliable means of exposing corruption and the methods of money laundering and tax evasion are now even more important. Web scraping provides an avenue to find, collate, and organize data without relying on information leaks.
We’re pleased to announce that Splash 2.0 is officially live after many months of hard work.
Heads up, Kimono Labs users!
Today, we are releasing a tool to help you migrate your Kimono projects to Portia.
Welcome to the February Edition of Scrapy Tips from the Pros. Each month we’ll release a few tips and hacks that we’ve developed to help make your Scrapy workflow go more smoothly.
Imagine your business depended heavily on a third-party tool, and one day that company decided to shut down its service with only two weeks' notice. That, unfortunately, is what happened to users of Kimono Labs yesterday.
Today we released the latest version of Portia bringing with it the ability to crawl pages that require JavaScript. To celebrate this release we are making Splash available as a free trial to all Portia users so you can try it out with your projects.
When scraping content from the web, you often crawl websites which you have no prior knowledge of. Link analysis algorithms are incredibly useful in these scenarios to guide the crawler to relevant pages.
33 Zytans from 15 countries will be meeting (most of them for the first time) in Bilbao, for what is going to be our largest get-together event so far.
It's not uncommon to need to crawl a large number of unfamiliar websites when gathering content. Page ranking algorithms are incredibly useful in these scenarios as it can be tricky to determine which pages are relevant to the content you're looking for.
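To illustrate the idea, here is a simplified sketch of one such page-ranking algorithm, PageRank, in plain Python. This is an illustration only, not Aduana's implementation; it also skips refinements such as dangling-node handling that production crawl frontiers need:

```python
def pagerank(links, damping=0.85, iterations=50):
    """links: dict mapping each page to the list of pages it links to."""
    pages = set(links) | {p for outs in links.values() for p in outs}
    n = len(pages)
    rank = {p: 1.0 / n for p in pages}
    for _ in range(iterations):
        # Every page gets a base share, plus a portion of the rank
        # of each page that links to it.
        new_rank = {p: (1 - damping) / n for p in pages}
        for page, outs in links.items():
            if outs:
                share = damping * rank[page] / len(outs)
                for target in outs:
                    new_rank[target] += share
        rank = new_rank
    return rank

# A three-page cycle: each page links to the next, so ranks are equal.
ranks = pagerank({"a": ["b"], "b": ["c"], "c": ["a"]})
print(ranks)  # each page ends up with rank ~0.333
```

A crawler can use scores like these to prioritize which discovered links to fetch next.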
Imagine that you have a lot of samples for a certain kind of data in JSON format. Maybe you want to have a better feel of it, know which fields appear in all records, which appear only in some and what are their types. In other words, you want to know the schema for the data that you have.
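As a rough illustration of what schema inference means here (a hypothetical sketch in plain Python, not skinfer's actual API), consider counting which fields appear in every record and which types they carry:

```python
def infer_schema(records):
    """Summarize which fields are required and what types they hold."""
    fields = {}
    for rec in records:
        for key, value in rec.items():
            info = fields.setdefault(key, {"count": 0, "types": set()})
            info["count"] += 1
            info["types"].add(type(value).__name__)
    total = len(records)
    return {
        key: {"required": info["count"] == total, "types": sorted(info["types"])}
        for key, info in fields.items()
    }

samples = [
    {"name": "Ana", "age": 31},
    {"name": "Bo", "age": 28, "email": "bo@example.com"},
]
schema = infer_schema(samples)
print(schema["name"])   # appears in every record, so it is required
print(schema["email"])  # appears in only some records, so it is optional
```

A real tool like skinfer goes further, emitting a proper JSON Schema document, but the core question is the same: which fields always appear, and with what types.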
In the context of web scraping, XPath is a nice tool to have in your belt, as it allows you to write specifications of document locations more flexibly than CSS selectors.
One of the things that takes more time when building a spider is reviewing the scraped data and making sure it conforms to the requirements and expectations of your client or team.
We have released an lxml-based version of this code as an open-source library called extruct. The source code is on GitHub, and the package is available on PyPI. Enjoy!