Price scraping is what you need to do if you want to extract pricing data from websites. It might look easy, just a minor technical detail to handle, but if you don't know the best way to get price values out of the HTML, it can become a headache over time.
In this article, I will first show you some examples where price scraping is essential for business success. Then we will learn how to use our open-source Python library, price-parser, which was made specifically to make it easy to extract clean price information from the raw price strings found on e-commerce sites.
If you're at the beginning of your web scraping journey, here are some examples to give you ideas on how price scraping can help you.
The e-commerce world has become very noisy and competitive. Companies are looking for ways to raise margins, cut expenses, and ultimately display the prices that increase their overall revenue the most. This is where competitor price monitoring comes in. Practically every serious online retailer monitors competitor prices daily, in one way or another. Price scraping is a big part of this task: regularly extracting up-to-date data from millions of price points.
Another major use case of price scraping is brand monitoring. When your brand is visible on multiple online platforms, maintaining price compliance for your products is as important as keeping an eye on competitors' pricing. Ideally, you would scrape the product pages that display your products (i.e., your resellers' pages) as well as competitors' product data, so your pricing strategy stays up to date. This helps you set a competitive price and keep pricing policy violators in check.
You would also want to scrape prices if you do any kind of e-commerce market research. Whether it's a one-time project or an ongoing one, if you scrape multiple web pages with differently formatted price strings, you need a reliable way to extract the pricing data.
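As a rough illustration of the compliance side, the sketch below flags reseller offers priced under a minimum advertised price. It assumes the prices have already been scraped and parsed into `Decimal` values in a single currency; the SKUs, the `MIN_ADVERTISED_PRICE` table, and `find_violations` are hypothetical names, not part of any library.

```python
from decimal import Decimal

# Hypothetical pricing policy: minimum advertised price per product SKU.
# Assumes every price is in the same currency.
MIN_ADVERTISED_PRICE = {
    "SKU-123": Decimal("19.99"),
    "SKU-456": Decimal("49.00"),
}


def find_violations(offers):
    """Return the (reseller, sku, price) tuples priced below the policy minimum.

    `offers` is an iterable of (reseller, sku, price) tuples where `price`
    is a Decimal that has already been scraped and parsed.
    """
    violations = []
    for reseller, sku, price in offers:
        minimum = MIN_ADVERTISED_PRICE.get(sku)
        # Products without a defined policy are skipped.
        if minimum is not None and price < minimum:
            violations.append((reseller, sku, price))
    return violations


offers = [
    ("shop-a", "SKU-123", Decimal("18.50")),
    ("shop-b", "SKU-123", Decimal("19.99")),
    ("shop-c", "SKU-456", Decimal("45.00")),
    ("shop-d", "SKU-789", Decimal("1.00")),  # no policy for this SKU
]
print(find_violations(offers))  # flags shop-a and shop-c
```

Using `Decimal` rather than `float` keeps the comparison exact at the cent boundary, so an offer at exactly the policy price is not flagged.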
At Zyte (formerly Scrapinghub) we've developed our own open-source library for price scraping. You can find it on GitHub as price-parser. It extracts price and currency values from raw text strings.
You want to use this library for two important reasons:
pip install price-parser
>>> price_string = response.css('span.price-tag::text').get()
>>> price_string
'22,90 €'
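The line above assumes a Scrapy `response` object. If you only have raw HTML and want to avoid extra dependencies, a rough stdlib-only stand-in for that selector could use `html.parser.HTMLParser`, as sketched below. The `price-tag` class comes from the example; `PriceTagParser` and `extract_price_strings` are hypothetical names, and a real selector library such as Scrapy or parsel handles messy markup far more robustly.

```python
from html.parser import HTMLParser


class PriceTagParser(HTMLParser):
    """Collect the text of every <span class="price-tag"> element (hypothetical markup)."""

    def __init__(self):
        super().__init__()
        self._depth = 0    # > 0 while inside a matching span (counts nested spans)
        self.prices = []   # raw price strings collected so far

    def handle_starttag(self, tag, attrs):
        if tag != "span":
            return
        if self._depth:
            # A nested span inside a price tag: keep collecting its text.
            self._depth += 1
        elif "price-tag" in (dict(attrs).get("class") or "").split():
            self._depth = 1
            self.prices.append("")

    def handle_endtag(self, tag):
        if self._depth and tag == "span":
            self._depth -= 1

    def handle_data(self, data):
        if self._depth:
            self.prices[-1] += data


def extract_price_strings(html):
    """Return the stripped text of each price-tag span in `html`."""
    parser = PriceTagParser()
    parser.feed(html)
    parser.close()
    return [p.strip() for p in parser.prices]
```

For example, `extract_price_strings('<div><span class="price-tag">22,90 €</span></div>')` returns `['22,90 €']`, the same raw string that the rest of this walkthrough passes to price-parser.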
2. Use this library to clean up the string
Normally, at this point, you would need to write a custom function, using regular expressions or other Python code, to get the numeric value out of the string. With price-parser, however, you just import the library and call the same function every time:
>>> from price_parser import Price
>>> price = Price.fromstring(price_string)
Then we can retrieve the amount and currency values using attributes:
>>> price.amount        # numeric price amount
Decimal('22.90')
>>> price.amount_text   # price amount, as it appears in the string
'22,90'
>>> price.amount_float  # price amount as a float, not a Decimal
22.9
>>> price.currency      # currency symbol, as it appears in the string
'€'
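The difference between `amount` (a `Decimal`) and `amount_float` matters once you start doing arithmetic on prices. The stdlib-only illustration below sums prices both ways; `basket_total` is a hypothetical helper, not part of price-parser. Summing `Decimal` values stays exact, while the same sum in binary floating point picks up rounding error.

```python
from decimal import Decimal, ROUND_HALF_UP


def basket_total(amounts):
    """Sum Decimal prices and round the result to cents (hypothetical helper)."""
    total = sum(amounts, Decimal("0"))
    return total.quantize(Decimal("0.01"), rounding=ROUND_HALF_UP)


decimal_total = basket_total([Decimal("0.10")] * 3)
float_total = sum([0.10] * 3)

print(decimal_total)  # 0.30
print(float_total)    # 0.30000000000000004
```

This is why `amount` is usually the safer attribute to store and compute with, keeping `amount_float` for cases such as plotting where tiny rounding errors are harmless.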
The library has been tested with more than 900 real-world price strings; see the price-parser repository on GitHub for examples of supported cases.
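To see why that breadth of testing matters, here is the kind of hand-rolled regular-expression function you would otherwise write. `naive_parse_price` is a hypothetical sketch, not how price-parser works internally: it treats a final `,` or `.` followed by exactly two digits as the decimal separator and every other separator as a thousands separator.

```python
import re
from decimal import Decimal


def naive_parse_price(text):
    """Hypothetical hand-rolled parser: return a Decimal amount, or None.

    Heuristic: take the first run of digits/separators, treat a trailing
    ',' or '.' followed by exactly two digits as the decimal point, and
    drop all other separators as thousands separators.
    """
    match = re.search(r"\d[\d.,\s]*", text)
    if match is None:
        return None
    number = re.sub(r"\s", "", match.group()).rstrip(".,")
    decimal_match = re.fullmatch(r"(.*?)[.,](\d{2})", number)
    if decimal_match:
        integer_part, fraction = decimal_match.groups()
    else:
        integer_part, fraction = number, "00"
    integer_part = re.sub(r"[.,]", "", integer_part)
    return Decimal(f"{integer_part}.{fraction}")


for raw in ["22,90 €", "$1,299.00", "1.299,00 €", "1.5 €", "3 for 10,00 €"]:
    print(repr(raw), "->", naive_parse_price(raw))
```

It copes with `'22,90 €'`, `'$1,299.00'` and `'1.299,00 €'`, but `'1.5 €'` comes out as 15.00 and `'3 for 10,00 €'` as 3.00. Each new site tends to add another special case like these, which is the maintenance burden a shared, well-tested library avoids.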