Every day, journalists have more data, more technology and more tools at their disposal. Yet access to these assets remains a significant challenge, especially for local newsrooms.

The Story Discovery at Scale conference, held at Stanford University on March 28-29, 2024, brought together journalists, researchers and technologists to collaborate on building sustainable and accessible infrastructure for every journalist, regardless of their data skills.

Attendees showcased solutions for making AI technology and computational methods more accessible to newsrooms, especially local newsrooms with fewer resources.

“We need to process it [data] and manage it in a way that will help us leverage the news, help us leverage the stories that we want to tell,” said Cheryl Phillips, the founder of Big Local News and a Stanford lecturer.

Organized by Big Local News and sponsored by the Knight Foundation, the conference featured presentations on cutting-edge techniques for story discovery in journalism, including automation, generative AI tools, and smarter ways to identify news tips or gather public records.

The challenges often revolve around how local newsrooms can use data and tools effectively despite limited resources and a lack of expertise, said Marc Lavallee, director of technology product and strategy for the journalism program at the Knight Foundation.

Lavallee and Phillips shared one approach to that challenge: the DART Matrix, which has four components: Data, Algorithms, Reporting Recipes and Tools. In this matrix, the level of data expertise in a newsroom determines the kind of support it needs. For example, in newsrooms lacking data experts, the emphasis is on news detection alerts, training and consulting. In newsrooms with a team of data specialists, the focus shifts to supporting data feeds and algorithmic tools.

The DART Matrix.

In all cases, from local newspapers with zero data experts to those with a technical team, data-driven journalism is a vital tool for uncovering stories, enriching storytelling and forging meaningful collaborations, said Phillips and Lavallee.

Yet for many smaller newsrooms, unlocking the potential of these data sources remains difficult. Whether they are covering routine local events or pursuing investigative projects, harnessing data effectively is a hurdle.

This is where collaboration offers a promising solution: by sharing data, providing journalist training, implementing automatic alerts for key issues such as campaign donations or mass layoffs, and developing user-friendly tools, smaller newsrooms can enhance their reporting capabilities.

“In the next 10 years we will see a new ecosystem of independent and interconnected news media, a mix of nonprofit and for profit. So each serves a distinct audience, but works in consortium with others at the local, state, and national level,” Phillips said.

At the conference, the attendees delved into ways to support and reinforce local journalism through shared infrastructure and editorial partnerships.

Enterprise data reporting is widely valued, but it is time-consuming because of the extensive work required to build scrapers and parsing pipelines. That work can take up to 90 percent of the effort, said Léopold Mebazaa, a fellow at the Brown Institute at Columbia University.

“Time that’s not spent on actual analysis, especially if the people doing these pipelines don’t necessarily have the technical know-how,” Mebazaa said. “And the corollary of this is that it makes newsrooms less ambitious.”

AI is a potential solution to this bottleneck, Mebazaa said. He introduced a tool that he is developing to simplify web scraping. The goal is to reduce coding hours and make web scraping more accessible to journalists with limited technical expertise.

Simon Willison, creator of Datasette, a tool for exploring and publishing data, and co-founder of the Django web framework, also explored practical applications of AI at the conference. Willison showed how AI can be used for multiple reporting tasks, such as extracting data from images, enriching existing datasets or finding story ideas.

Willison's demo drew murmurs of amazement and has since been shared on YouTube and on other platforms, including Hacker News, a social news website run by the startup incubator Y Combinator.

While AI can facilitate certain journalistic tasks, automation also plays a key role in streamlining the integration of data, particularly in smaller and independent newsrooms, through tools such as Slack-based automated news tips that alert journalists to potential stories.

Aron Pilhofer, Candice Mays, Heather Bryant and Katherine Ann Rowlands during one of the panels.

For instance, the Big Local News team described several of its current bots, available to its local partners, which send real-time alerts whenever a mass layoff occurs or an animal welfare inspection is carried out. In other words, the information flows directly to reporters without their having to search for it, find it and analyze it.

During the conference, speakers highlighted the challenge of leveraging existing data, which often demands substantial investment. They presented various approaches to using commonly available local datasets such as police stops, public meeting agendas, property tax records and restaurant inspections.

The Associated Press, in collaboration with the University of Missouri, introduced an AI-powered tool to help reporters uncover newsworthy stories in restaurant health inspection reports. The tool identifies anomalies and provides story tips for further investigation.

Alana Rocha, who leads the Rural News Network at the Institute for Nonprofit News (INN), highlighted a recent collaborative reporting effort that showed the power of sharing data and adding targeted training.

“We tried our first data recipe based collaboration in the fall publishing a series on Biden’s nursing home staffing minimums and what that would look like in each of the communities,” Rocha said.

USA TODAY investigative data reporter Jayme Fraser gathered and cleaned payroll data from the Centers for Medicare and Medicaid Services, which regulates nursing homes that accept public health insurance. The data showed that most nursing homes were understaffed.

Together with Big Local News, Fraser built a story recipe for local journalists and organized a webinar to train them on how to accurately interpret the numbers and tell local stories.

“We opened it up to all 70 plus newsrooms, they attended a webinar and seven turned a story. It was a huge success,” Rocha said, recounting some of the feedback: “‘I just love that they [Big Local News] checked my work, that I was able to make sure that I wasn’t putting bad information out there.’”

Attendees also talked about how journalism organizations need to work together on the business side, advocating for sharing insights and strategies for sustainable revenue generation within journalism.

Other ideas explored at the conference included collecting and organizing news archives, compiling public records from investigations, and standardizing databases to generate revenue. The focus should be on building collaborative projects that minimize redundant work across newsrooms and create new sources of income, attendees suggested.

Discussions centered on building a federated system to support and promote local journalism. Over the course of the two-day conference, attendees joined working groups to tackle key issues, including identifying the tools local journalists need, determining the most useful training opportunities, and brainstorming short- and long-term solutions to fill these needs.

“As we think about this common infrastructure and all the things that can be built on top of this, that’s one desired outcome of this [conference],” Lavallee said. “But also, you’re finding more connections peer to peer as well and realizing that you’re working on similar things. And that there’s opportunities to work together and reduce duplication of efforts as well.”

“So hopefully for Story Discovery at Scale three,” Lavallee continued, “probably in this room next year, we feel that it is even more enmeshed and networked and federated as a system.”

About Big Local News

From its base at Stanford University, Big Local News gathers data, builds tools and collaborates with reporters to produce journalism that makes an impact. Its website at biglocalnews.org offers a free archiving service for journalists to store and share data.
