Effortless Web Scraping with Cloudflare Workers
Some time ago I needed to grab visitor numbers from a website to display them in a widget powered by Scriptable. Unfortunately, the website had no API for those numbers; it only showed them in a table on the page. This is where a web scraper comes in.
There are countless approaches to web scraping and just as many articles about them, so I just want to share a little tool I used for the job. I like it because it’s easy to use, fast, and free. All you need is an account at Cloudflare.
The tool I want to highlight is web.scraper.workers.dev by Adam Schwartz. Give it a URL and a CSS selector and you’re done! If the website ever goes down, you can grab the code on GitHub and host it yourself.
For example, with the URL example.com and the CSS selector h1, you’d get this result:
{ "result": { "h1": ["Example Domain"] }}
You can also generate a permalink to it:
https://web.scraper.workers.dev/?url=example.com&selector=h1&scrape=text&pretty=true
I’m now using such a permalink as my API for those website numbers. I find using Cloudflare Workers for web scraping to be quite approachable and Adam’s tool makes it even easier.
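Consuming such a permalink from code could look roughly like the sketch below. It assumes the response always has the shape shown in the example.com result above and that the query parameters are exactly those in the permalink. The names ScrapeResponse, Fetcher, buildScrapeUrl, pickMatches and scrapeText are hypothetical helpers made up for this illustration; they are not part of Adam’s tool.

```typescript
// Assumed response shape, taken from the example.com result above:
// each selector maps to the list of texts it matched.
type ScrapeResponse = { result: Record<string, string[]> }

// Minimal description of the fetch-like function this sketch needs (hypothetical type).
type Fetcher = (url: string) => Promise<{
  ok: boolean
  status: number
  json(): Promise<unknown>
}>

// Build a permalink; the parameter names are copied from the example permalink.
function buildScrapeUrl(url: string, selector: string): string {
  const query = [
    `url=${encodeURIComponent(url)}`,
    `selector=${encodeURIComponent(selector)}`,
    "scrape=text",
    "pretty=true",
  ].join("&")
  return `https://web.scraper.workers.dev/?${query}`
}

// Return the matches for one selector, or an empty list if there are none.
function pickMatches(response: ScrapeResponse, selector: string): string[] {
  return response.result[selector] ?? []
}

// Request the permalink through the given fetcher and return the matched texts.
async function scrapeText(fetcher: Fetcher, url: string, selector: string): Promise<string[]> {
  const res = await fetcher(buildScrapeUrl(url, selector))
  if (!res.ok) {
    throw new Error(`Scraper responded with HTTP ${res.status}`)
  }
  // The cast relies on the assumed response shape above.
  return pickMatches((await res.json()) as ScrapeResponse, selector)
}
```

In a runtime with a global fetch, such as a browser or a recent Node.js, the call would be scrapeText(fetch, "example.com", "h1"). Passing the fetcher in as an argument keeps the sketch independent of the runtime, so a different HTTP client can be wrapped to fit the same shape.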
Oh, and in case you’re wondering how I use the table data, here’s a small code playground to show it:
const incoming = "Name Max Current PlaceA 144 51 PlaceB 50 25 PlaceC 200 130"

// Split a flat array into rows of `size` items each.
function chunk(arr: Array<string>, size = 3): Array<Array<string>> {
  const bulks: Array<Array<string>> = []
  for (let i = 0; i < Math.ceil(arr.length / size); i++) {
    bulks.push(arr.slice(i * size, (i + 1) * size))
  }
  return bulks
}

function parseString(input: string, size = 3): ParseStringResponse {
  const arr = input.split(" ")
  // The first `size` words are the column headings.
  const columns = arr.slice(0, size).map((i) => ({ heading: i, property: i.toLowerCase() }))
  // Everything after the headings is row data.
  const rawData = arr.slice(size)
  const chunkedData = chunk(rawData, size)
  // The row mapping assumes the three columns name, max and current.
  const data = chunkedData.map((item) => ({
    name: item[0],
    max: item[1],
    current: item[2],
  }))
  return { columns, data }
}

export const output = parseString(incoming)

type ParseStringResponse = {
  columns: Array<{
    heading: string
    property: string
  }>
  data: Array<{
    name: string
    max: string
    current: string
  }>
}