Scaling up scrapers with Prefect and Google Cloud

Yesterday I made an appearance on the Twitch stream Prefect Live to share how our team at Big Local News is using Python and Prefect's pipelining tools to scale up how we scrape data from the web.

I referenced a few tutorials we’ve made for others. One is this post showing how to kickstart a Prefect agent with Kubernetes in Google Cloud.

Another is this post unpacking how to authorize your GitHub repository to automatically release Docker images to Google Artifact Registry. The same trick can work for private Python packages.

A third is this template repository with all the fundamentals needed to develop and deploy a Python data-processing routine for Prefect pipelines. It’s what we’re using to stand up our scrapers. Maybe it can work for you too.
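For a rough sense of what a Python data-processing routine for Prefect can look like, here is a minimal sketch. It is not taken from the template repository. It assumes the decorator-style API from Prefect 2 (`flow` and `task`), and every function name and sample row in it is a hypothetical placeholder. If Prefect isn't installed, the decorators fall back to no-ops so the code still runs as plain Python.

```python
# Hypothetical sketch: the names below are illustrative, not from the template repository.
try:
    from prefect import flow, task  # assumes the Prefect 2 decorator API
except ImportError:
    # Fallback so the sketch runs as plain Python when Prefect isn't installed.
    def task(fn=None, **kwargs):
        return fn if fn is not None else (lambda f: f)

    flow = task


def normalize_row(row: dict) -> dict:
    """Lowercase the keys and trim whitespace in one scraped record."""
    return {key.strip().lower(): value.strip() for key, value in row.items()}


@task(retries=2)
def fetch_rows() -> list:
    """Stand-in for a real scraper. Returns hard-coded placeholder rows."""
    return [{" Name ": " Example Co. ", "City": "Anytown "}]


@task
def clean_rows(rows: list) -> list:
    """Apply the same cleanup to every scraped record."""
    return [normalize_row(row) for row in rows]


@flow
def scrape_flow() -> list:
    """Chain the scrape and cleanup steps into one pipeline."""
    return clean_rows(fetch_rows())
```

Calling `scrape_flow()` runs the steps in order on your own machine. Keeping the cleanup logic in a plain function like `normalize_row` means it can be tested without Prefect in the loop.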