Posters

Fall 2023 Project Descriptions

Unpacking the Distribution of Environmental Pollution

Advisor: Le Bao, Postdoctoral Fellow in MDI, McCourt School of Public Policy
Description: As a part of the Environmental Impact Data Collaborative (EIDC), this project investigates the spatial distribution of environmental pollution, with a specific focus on the geographical concentration of toxic release facilities and air pollution in relation to community demographics and political factors. This involves mapping environmental pollution sources with census information and election data. The aim of the study is to uncover patterns in environmental pollution and investigate how various factors, including community characteristics and political influences, affect the distributional dynamics of environmental pollutants.
Poster: Exploring the Patterns of Toxic Release: Towards Environmental Equity, Raunak Advani ’24


Environmental Impact Data Collaborative

Advisor: Michael Bailey, Walsh Professor in the Department of Government and McCourt School of Public Policy
Description: The Massive Data Institute (MDI) is developing the Environmental Impact Data Collaborative (EIDC) to enable community groups, policymakers, and researchers to discover, access, merge, transform, analyze, visualize, and discuss data in ways that help them make environmental policy more effective and just. Learn more here: https://mdi.georgetown.edu/eidc/
Poster: Environmental Justice Data Solution: A Holistic Approach, Madhvi Malhotra ’24, Minh Quach ’24, and Fanni Varhelyi ’24
Poster: Automated pipeline to extract wetland damage data from US Army Corps of Engineers notices using LLMs, Himangshu Kumar ’24
Poster: Spatial and demographic patterns of building-level emissions in Washington, D.C., Himangshu Kumar ’24 and Anthony Moubarak ’24


Using LLMs to assess ideology

Advisor: Michael Bailey, Walsh Professor in the Department of Government and McCourt School of Public Policy
Description: Assessing the ideology of politicians is a useful way to summarize their political values. In this project, we build on the work of Wu (2023) to use LLMs to scale members of Congress based on their website or Twitter content. The goal is to build a tool that can compare ideology in many contexts based on what people say or write.
Poster: Deciphering Lawmakers’ Ideologies: Integrating Machine Learning and Large Language Models, Zhiqiang Ji ’24


Learning Lessons from Incident Reporting

Advisor: Robin Dillon-Merrill, Professor and the Operations and Analytics Area Chair in the McDonough School of Business
Description: Both public agencies and private industry are learning from prior incidents using incident reporting systems to improve safety performance and reduce accidents. Analyzing data from incident reporting systems can help an organization identify emerging risks, and this approach has been advocated for decades to prevent larger failures, but trends are difficult to find when there are thousands of reports per year. This project is currently examining data from commercial aviation and coal mining.
Poster: Learning Lessons from Incident Reporting, Brian Holland ’24


Tracking Cross-National Score Trajectory in Large-Scale Educational Assessments

Advisor: Qiwei Britt He, Associate Professor in Data Science and Analytics Program and Director of AI-Measurement and Data Science Lab
Description: International large-scale educational assessments provide a comprehensive data source that enables countries to improve their education policies and outcomes. Investigating cross-national score trends is an important way to identify countries with similar growth patterns, providing new evidence for comparative studies of education policy. In this project, we employed sequence mining methods to track the scores, ranks, and performance disparities between boys and girls in the Programme for International Student Assessment (PISA), the largest global assessment organized by the OECD for 15-year-old students. Meaningful trajectory patterns are extracted and compared across 37 countries in three core subjects (math, reading, and science) over the past two decades.
Poster: Tracking Cross-National Score Trajectory in PISA with Dynamic Time Warping Methods, Kefan Yu ’24


DistrictView: A Data Pipeline to Study School Boards and K-12 Inequality

Advisor: Rebecca Johnson, Assistant Professor, McCourt School of Public Policy, Affiliate with Massive Data Institute and Sociology
Description: Decisions in U.S. school boards impact K-12 inequality, from spending priorities to COVID-19 closures to curricular content. Applying approaches from the study of “digital trace data” to U.S. school districts as the object of study, we describe the data infrastructure for DistrictView, a dataset of N ~ 121,941 school board meeting transcripts and videos (as of July 2023) from N = 1,579 U.S. school districts, representing about 1 in 8 districts nationwide. We describe the data scraping using the YouTube API and manual validation, processing using text mining to find relevant videos, and representativeness of the districts that publish videos.
Poster: DistrictView: Building a First-of-Its-Kind Database of U.S. School Board Meeting Transcripts, Corrina Calanoc ’24 and Maggie Sullivan ’24


Evaluating the Reach & Efficacy of Head Start Locations

Advisors: Amy O’Hara, Research Professor in MDI and Executive Director of the Federal Statistical Research Data Center at the McCourt School of Public Policy; Gabriel Taylor, Research Specialist, MDI
Description: The National Head Start Association (NHSA) is a non-profit organization in the United States that advocates for and supports the Head Start program. In 2022, the NHSA released a report describing the top barriers to access to Head Start programs for children and families. This project uses the mapping of geospatial data to visualize and assess some of the identified barriers.
Poster: Evaluating the Reach & Efficacy of Head Start Locations, Amanda Hao ’26


Exploring Methods for Privacy-Protecting Administrative Record Linkage

Advisors: Amy O’Hara, Research Professor in MDI and Executive Director of the Federal Statistical Research Data Center at the McCourt School of Public Policy; Nathan Wycoff, Data Science Fellow at MDI
Description: Administrative data hold key insights into the efficacy of educational and social programs, but they are subject to strong privacy protections. In this poster, we conduct simulations to assess the efficacy of linking tax data with social program data under various noise assumptions, in order to estimate linkage rates for real-world data.
Poster: Exploring Methods for Privacy-Protecting Administrative Record Linkage, Alicia Gopal ’25


Forensic Use of Genetic Data

Advisors: Elissa Redmiles, Clare Boothe Luce Assistant Professor; Lisa Singh, MDI Director and Professor in the Department of Computer Science; Ioannis Ziogas, Assistant Teaching Professor at the McCourt School of Public Policy and Assistant Research Professor at MDI
Description: This project investigates the technologies used for genetic data analysis in forensic contexts. Through technical analysis, we seek to identify potential risks and limitations of existing and emerging genetic technology used in forensic analysis.


Long-Term Trends in the U.S. Federal Workforce

Advisor: Mark D. Richardson, Assistant Professor in the Department of Government
Description: This project involves wrangling data on 10 million careers in the civil service spanning hundreds of agencies and 40 years to create a data set useful for studying long-term trends in the U.S. federal workforce. Once cleaned and formatted, the data will be placed in a data commons to provide scholars and policymakers a single interface for accessing these data and the tools to put them to use improving our understanding of recruitment, retention, promotion, and internal labor markets.
Poster: The Graying of the Federal Workforce, Haiyang Chen ’24 and Linlin Wang ’25


Real Time School District Staffing and Enrollment Trends

Advisor: Marguerite Roza, Edunomics Lab Director, and Research Professor in the McCourt School of Public Policy
Description: School district finances are changing rapidly in the wake of the pandemic. To navigate these unstable finances, school district leaders need real-time trend data on major financial factors, including staffing and enrollment. This project uses data scraping tools and custom visualizations to build an automatically updating website that accesses localized trend data and produces visualizations for use by school district leaders, journalists, and policymakers.
Poster: Using Selenium to automate education finance data pipelines, Andrew Lee ’24


Private distributed data collection to inform policymaking

Advisors: Micah Sherr, Callahan Family Professor of Computer Science; Lisa Singh, MDI Director and Professor in the Department of Computer Science and McCourt School of Public Policy
Description: Useful information is oftentimes distributed among multiple independent data owners. While such data may be sensitive (e.g., exposing individuals’ particular health information or Internet usage), the ability to securely compute aggregate statistics over this data can be critical for understanding high-level patterns that can inform technical development and policymaking. This project aims to facilitate the secure collection of distributed data, while protecting the privacy of individuals in the distributed dataset.
Poster: A Data Collection Protocol that Protects Individual Privacy and is Resilient to Adversarial Data, Jamie Spoeri ’25 and Jason Yi ’26


Blending Data to Improve Prediction of Forced Migration

Advisor: Lisa Singh, MDI Director and Professor in the Department of Computer Science and McCourt School of Public Policy
Description: In 2022, the number of forcibly displaced people reached a record high of over 110 million according to the United Nations High Commissioner for Refugees (UNHCR). In an effort to respond efficiently and effectively to conflicts, sentiment expressed on social media has been used to predict movement. The objective of our research is to improve these predictive models by applying more nuanced emotion indicators. In our analysis, we considered three recent displacement events across three languages: Ukraine 2022–2023 (Ukrainian), Sudan 2023 (Arabic), and Venezuela 2014–2023 (Spanish).
Poster: Sentiment and Emotion as Tools for Forced Migration Prediction?, Kate Liggio ’24, Bernardo Medeiros ’24, and Rich Pihlstrom ’24


French Racism and Misrepresentation on Social Media

Advisor: Lisa Singh, MDI Director and Professor in the Department of Computer Science and McCourt School of Public Policy
Description: Our research examines the dynamics of online interactions, behaviors, and perceptions, with a specific focus on the interplay of gender, race, and identity in the digital realm. As social media platforms increasingly serve as arenas for cultural expression and societal discourse, our exploration delves into the representations and experiences related to identity in these virtual communities. Ultimately, we want to contribute to an understanding of how gender and race are negotiated and represented in the evolving landscape of online discourse.
Poster: Measuring French Racism and Misrepresentation on Social Media to Better Understand Online Perception, Xinyu Li ’24