Datathon for Democracy: Combining Public Data and Agentic AI to Analyze Speech on Social Media

Written by Miranda M. Yarowsky, SFS ’26, MDI Program Assistant

As part of Tech & Society Week, the Massive Data Institute, Bright Data, and IBM sponsored the Datathon for Democracy on March 29th, a one-day event that gave students access to social media data and agentic AI tools to develop strategies and policy-relevant research about online information pollution. The datathon brought together public web data and student-led analysis to explore how emerging technologies can be used not only to understand online harms, but also to develop practical responses to them.

This year’s datathon centered on three major challenges: Deepfake Detection, Online Toxicity / Harassment, and Counterspeech and Prebunking Communications. Together, these themes reflect some of the largest concerns related to information integrity on social media.

Using Data and AI to Make Sense of Digital Threats

Throughout the day, students worked with Bright Data’s public web data from the 2025–2026 election cycles and IBM’s watsonx AI tools to conduct deep data exploration, culminating in presentations in which groups of four translated their findings into analyses with policy implications.

As Joy Stoffer of IBM’s AI Productivity team explained, the datathon challenged students to think about “how we can automate and better define how things are being discerned,” making complex online information “easier to understand and to see and to find.” To Stoffer, this event illustrated how these tools are not just useful for processing large amounts of data, but also useful in helping decision-makers identify and respond to emerging digital threats.

The datathon also showed that AI is not simply a tool for processing more data; it can help researchers and decision-makers make sense of complex information. IBM’s platform helped students process data to determine the types of communication and information being shared by state election officials and political leaders. With this information, AI can help identify patterns, assess information more systematically, and support more effective responses to emerging digital threats, to name a few uses.

Moving From Data Analysis to Real-World Solutions

As the day progressed, participants were pushed to move beyond identifying trends in the data and think more deeply about what those patterns could mean in practice. One guiding question, for instance, was “How can policymakers and civil society organizations communicate in ways that build resilience against misinformation?” Answering it means looking more closely at what kinds of messages actually reach the general public.

Understanding What Messaging Reaches People

For many of the teams, these questions pointed to a broader concern: how institutions communicate with the general public in an increasingly complex information environment. This made the connection between technical analysis and thoughtful public communication a central theme throughout the datathon. Reflecting on the work students presented, Lisa Singh, Director of the Massive Data Institute, emphasized that the data and methods participants were using could provide deeper insight into what kinds of messaging are reaching people and what kinds are not. That question was especially important for teams focused on counterspeech and prebunking, who looked at how misleading content spreads and how institutions might respond more effectively.

For Ibadat Jarg ’26, an MDI Scholar, that meant thinking more critically about how government communication works in practice. “I think government messaging needs to be more self-reflective and critical,” Jarg said, “because currently it’s just about putting out messages rather than evaluating how they are [interpreted]….” His comment underscored a broader takeaway from the day: information integrity depends not just on putting out messages, but on understanding whether those messages are trustworthy and whether they reach the people they are intended for.

Pairing Technical Tools With Public Education

In addition to their analytical work, students developed evidence-based policy recommendations that connected technical findings to actionable interventions for policymakers. One recurring theme across team presentations, Stoffer noted, was the importance of public education. “We can do a lot more to head off [these challenges],” she said, but much of that work begins with “making sure that people understand what they’re seeing” and recognize that “a lot of stuff may not be real.” Her observation emphasized a key insight from the datathon: technological tools are essential, but they are most effective when paired with efforts to strengthen digital literacy.

The day ended with a networking dinner, offering students a chance to reflect on the work they had done and share experiences. By bringing together public-interest data analysis and policy thinking, the Datathon for Democracy generated ideas for addressing some of the most pressing challenges facing the public sphere today.

Left: (Rany Shalit), Right: (Dr. Lisa Singh)
