The latest from Valdir Stumm Junior

How To
Deploy Your Scrapy Spiders From GitHub | Scrapy Cloud
Up until now, your deployment process using Scrapy Cloud has probably been something like this: code and test your spiders locally, commit and push your changes to a GitHub repository, and finally deploy them to Scrapy Cloud using shub deploy.
April 19, 2017

Use case
Web Scraping Price Monitoring
Computers are great at repetitive tasks. They don't get distracted, bored, or tired.
November 24, 2016

How To
How to use XPath to extract web data
Let's start with the basics: what is XPath? XPath is a powerful language that is often used for scraping the web. It lets you select nodes or compute values from an XML or HTML document, and it is one of the languages you can use to extract web data with Scrapy.
October 27, 2016

How To
How To Run Python Scripts In Scrapy Cloud
You can deploy, run, and maintain control over your Scrapy spiders in Scrapy Cloud, our production environment.
September 28, 2016

How To
How To Deploy Custom Docker Images For Your Web Crawlers
What if you could have complete control over your environment? Your crawling environment, that is...
September 8, 2016

Open Source
How to crawl the web with Scrapy
The first rule of web crawling is you do not harm the website. The second rule of web crawling is you do NOT harm the website. We’re supporters of the democratization of web data, but not at the expense of the website’s owners.
August 25, 2016

Product Update
Introducing Scrapy Cloud with Python 3 support
It’s the end of an era. Python 2 is on its way out with only a few security and bug fixes forthcoming from now until its official retirement in 2020.
August 17, 2016

Open Source
Meet Parsel: The Selector Library Behind Scrapy
We eat our own spider food since Scrapy is our go-to workhorse on a daily basis. However, there are certain situations where Scrapy can be overkill and that’s when we use Parsel.
July 28, 2016

Product Update
Introducing Portia2Code: Transforming Portia Projects into Scrapy Spiders
June 29, 2016

Product Update
Data Extraction With Scrapy And Python 3
Fasten your seat belts, ladies and gentlemen: Scrapy 1.1 with Python 3 support is officially out! After a couple of months of hard work and four release candidates, this is the first official Scrapy release to support Python 3.
May 25, 2016

How To
How To Debug Your Scrapy Spiders
Welcome to Scrapy Tips from the Pros! Every month we release a few tricks and hacks to help speed up your web scraping and data extraction activities.
May 18, 2016

Product Update
Scrapy + MonkeyLearn: Textual Analysis Of Web Data
We recently announced our integration with MonkeyLearn, bringing machine learning to Scrapy Cloud. MonkeyLearn offers numerous text analysis services via its API. Since there are so many uses for this platform add-on, we’re launching a series of tutorials to help get you started.
May 11, 2016

Open Source
Scraping Websites Based On ViewStates With Scrapy
Welcome to the April Edition of Scrapy Tips from the Pros. Each month we’ll release a few tricks and hacks that we’ve developed to help make your Scrapy workflow go more smoothly.
April 20, 2016

Use case
How Web Scraping Reveals Lobbying and Corruption in Peru
Update: With the release of the Panama Papers, a reliable means of exposing corruption and the methods of money laundering and tax evasion is now even more important. Web scraping provides an avenue to find, collate, and organize data without relying on information leaks.
March 9, 2016

Product Update
Splash 2.0: Powering Web Rendering with QT 5 and Python 3
We’re pleased to announce that Splash 2.0 is officially live after many months of hard work.
February 29, 2016

Product Update
Migrate Your Kimono Projects to Portia: Smooth Transition Guide
Heads up, Kimono Labs users!
Today, we are releasing a tool to help you migrate your Kimono projects to Portia.
February 26, 2016

Open Source
Scrapy Tips from the Pros (February 2016 Edition): Continuous Learning
Welcome to the February Edition of Scrapy Tips from the Pros. Each month we’ll release a few tips and hacks that we’ve developed to help make your Scrapy workflow go more smoothly.
February 24, 2016

Open Source
Portia: The Open-source Alternative To Kimono Labs
Imagine your business depended heavily on a third-party tool and one day that company decided to shut down its service with only two weeks' notice. That, unfortunately, is what happened to users of Kimono Labs yesterday.
February 17, 2016

Product Update
Introducing JavaScript Support for Portia: Expanding Web Scraping Capabilities
Today we released the latest version of Portia bringing with it the ability to crawl pages that require JavaScript. To celebrate this release we are making Splash available as a free trial to all Portia users so you can try it out with your projects.
August 19, 2015

How To
Link Analysis Algorithms Explained
When scraping content from the web, you often crawl websites which you have no prior knowledge of. Link analysis algorithms are incredibly useful in these scenarios to guide the crawler to relevant pages.
June 19, 2015

Announcement
EuroPython Gold Sponsor
33 Zytans from 15 countries will be meeting (most of them, for the first time) in Bilbao, for what is going to be our largest get-together event so far.
June 12, 2015

Open Source
Aduana: Link Analysis With Frontera | Zyte
It's not uncommon to need to crawl a large number of unfamiliar websites when gathering content. Page ranking algorithms are incredibly useful in these scenarios as it can be tricky to determine which pages are relevant to the content you're looking for.
June 8, 2015

Open Source
Skinfer: Inferring JSON Schemas Made Easy
Imagine that you have a lot of samples of a certain kind of data in JSON format. Maybe you want to get a better feel for it: which fields appear in all records, which appear only in some, and what their types are. In other words, you want to know the schema of the data that you have.
March 5, 2015

How To
XPath Tips From The Web Scraping Trenches
In the context of web scraping, XPath is a nice tool to have in your belt, as it allows you to write specifications of document locations more flexibly than CSS selectors.
July 17, 2014

Product Update
Introducing Data Reviews: Unlocking Insights with Zyte
One of the most time-consuming parts of building a spider is reviewing the scraped data and making sure it conforms to the requirements and expectations of your client or team.
June 27, 2014


