The Big Local News site now offers an enriched dataset covering tens of thousands of animal-care inspections across America.

In partnership with the Data Liberation Project, our team has developed a software pipeline that, each day, gathers the latest records posted online by the U.S. Department of Agriculture’s Animal and Plant Health Inspection Service.

The USDA is charged with enforcing the Animal Welfare Act of 1966, a federal law signed by President Lyndon B. Johnson that sets standards for commercial dealers, zoos, research facilities and others.

Inspectors make routine visits to animal-care providers, follow up on complaints and, when they find violations, issue citations. Every visit generates a public record, but the agency's online search tool doesn't allow the public to download the full dataset. The interface also fails to provide structured versions of key data fields, such as the species of animals found at each site, which are buried in separate PDF documents.

The APHIS website

That’s where Big Local News and the Data Liberation Project have stepped in.

Working together, we created an open-source scraping system, released today, that collects all 80,000 published inspection reports and extracts data from the agency's PDFs. It updates daily.

All of the information we’ve gathered is available on GitHub and in the Big Local News portal. So far, we’re able to provide:

The agency's inspection records are routinely cited by local journalists. Earlier this month, USDA records were featured by Erik Sandoval at News6 in Orlando as part of an investigation into the shooting of an escaped rhino.

News6 rhino coverage

With greater access to the underlying data, we’re eager to see what reporters can do. If you’re interested in exploring opportunities, please reach out.

About Big Local News

From its base at Stanford University, Big Local News gathers data, builds tools and collaborates with reporters to produce journalism that makes an impact. Its website at biglocalnews.org offers a free archiving service for journalists to store and share data. Learn more by visiting our about page.

About the Data Liberation Project

The Data Liberation Project is an initiative to identify, obtain, reformat, clean, document, publish and disseminate government datasets of public interest.

The project was launched by Jeremy Singer-Vine, who previously served as the founding data editor for BuzzFeed News.