News and article extraction quality is vital for successful analyses and insights into brand awareness, product launches, topic and sentiment research, and keyword trending.
Zyte provides AI-powered data extraction of news and article data at scale, with the highest data quality on the market.
All the essential fields are extracted automatically: headline, article body (text of the article), cleaned HTML of the article, publication date, authors, and images. You can find an ever-expanding list in the documentation.
The Zyte Automatic Extraction news API supports a comprehensive list of metadata types, and the output is delivered directly to your AWS S3 bucket, giving you the flexibility to evolve with your needs.
Zyte uses deep learning to extract articles and news data from web pages. It understands what to include and, more importantly, what to exclude: links to related content, share buttons, ads, and other unnecessary information. The result is data that is 4 times more precise and clean than that of top competitors.
The quality of our solution is validated by rigorous studies and confirmed by what we hear from our customers.
[
  {
    "article": {
      "headline": "Article headline",
      "datePublished": "2019-06-19T00:00:00",
      "datePublishedRaw": "June 19, 2019",
      "dateModified": "2019-06-21T00:00:00",
      "dateModifiedRaw": "June 21, 2019",
      "author": "Article author",
      "authorsList": [
        "Article author"
      ],
      "inLanguage": "en",
      "breadcrumbs": [
        {
          "name": "Level 1",
          "link": "http://example.com"
        }
      ],
      "mainImage": "http://example.com/image.png",
      "images": [
        "http://example.com/image.png"
      ],
      "description": "Article summary",
      "articleBody": "Article body ...",
      "articleBodyHtml": "<article><p>Article body ... </p> ... </article>",
      "articleBodyRaw": "<div id=\"an-article\">Article body ...",
      "videoUrls": [
        "https://example.com/video.mp4"
      ],
      "audioUrls": [
        "https://example.com/audio.mp3"
      ],
      "probability": 0.95,
      "canonicalUrl": "https://example.com/article/article-about-something",
      "url": "https://example.com/article?id=24"
    },
    "webPage": {
      "inLanguages": [
        {"code": "en"},
        {"code": "es"}
      ]
    },
    "query": {
      "id": "1564747029122-9e02a1868d70b7a3",
      "domain": "example.com",
      "userQuery": {
        "pageType": "article",
        "url": "http://example.com/article?id=24"
      }
    },
    "algorithmVersion": "20.8.1"
  }
]
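To show how a response shaped like the sample above might be consumed, here is a minimal Python sketch that parses the JSON and keeps a few fields per article. It uses only the standard library and an abbreviated copy of the sample. The helper name `summarize_articles` and the `min_probability` threshold are hypothetical, and treating `probability` as a confidence score worth filtering on is an assumption, not documented behavior.

```python
import json
from datetime import datetime

# Abbreviated copy of the sample response; only a few fields are kept.
SAMPLE_RESPONSE = """
[
  {
    "article": {
      "headline": "Article headline",
      "datePublished": "2019-06-19T00:00:00",
      "authorsList": ["Article author"],
      "articleBody": "Article body ...",
      "probability": 0.95,
      "url": "https://example.com/article?id=24"
    },
    "query": {
      "userQuery": {"pageType": "article", "url": "http://example.com/article?id=24"}
    }
  }
]
"""


def summarize_articles(raw_json, min_probability=0.5):
    """Return simplified dicts for each result that contains an article.

    min_probability is a hypothetical cut-off: results whose "probability"
    value falls below it are skipped (assumed to be a confidence score).
    """
    summaries = []
    for result in json.loads(raw_json):
        article = result.get("article")
        if not article or article.get("probability", 0.0) < min_probability:
            continue
        published = article.get("datePublished")
        summaries.append({
            "headline": article.get("headline"),
            # The sample uses ISO 8601 timestamps without a timezone.
            "published": datetime.fromisoformat(published) if published else None,
            "authors": article.get("authorsList", []),
            "body": article.get("articleBody", ""),
            "url": article.get("url"),
        })
    return summaries


articles = summarize_articles(SAMPLE_RESPONSE)
print(articles[0]["headline"], articles[0]["published"].date())
```

Using `.get()` with defaults keeps the sketch tolerant of fields that are absent from a given result, which seems prudent given that the list of extracted fields is described as ever-expanding.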