The Massive Data Institute at the McCourt School of Public Policy is excited to welcome the Spring 2024 Cohort of the MDI Scholars Program. This semester, thirty Georgetown undergraduate and graduate students from five schools will be working alongside Georgetown faculty on seventeen research projects on topics including environmental justice, social media bias, civic tech, election dynamics, and education.
The MDI Scholars Program was launched in 2019 through MDI as an experiential learning opportunity for undergraduate and Master’s students to work alongside researchers and practitioners and engage in interdisciplinary data science and public policy research across Georgetown.
Spring 2024 MDI Scholar Research Teams
(listed in alphabetical order of advisor)
Environmental Impact Data Collaborative
Advisor: Michael Bailey, Walsh Professor in the Department of Government and McCourt School of Public Policy
Description: The Massive Data Institute (MDI) is developing the Environmental Impact Data Collaborative (EIDC) to enable community groups, policymakers, and researchers to discover, access, merge, transform, analyze, visualize, and discuss data in ways that help them make environmental policy more effective and just. Learn more here: https://mdi.georgetown.edu/eidc/
MDI Scholars: Aastha Jha ’25, Himangshu Kumar ’24, Katharyn Loweth ’25, Madhvi Malhotra ’24
Using LLMs to Assess Ideology
Advisor: Michael Bailey, Walsh Professor in the Department of Government and McCourt School of Public Policy
Description: Assessing the ideology of politicians is a useful way to summarize their political values. In this project, we build on the work of Wu (2023) to use LLMs to scale members of Congress based on their website or Twitter content. The goal is to build a tool that can compare ideology in many contexts based on what people say or write.
MDI Scholar: Zhiqiang Ji ’24
Unpacking the Distribution of Environmental Pollution
Advisor: Le Bao, Postdoctoral Fellow at MDI
Description: As a part of the Environmental Impact Data Collaborative (EIDC), this project investigates the spatial distribution of environmental issues, focusing particularly on how environmental impacts are unevenly distributed based on community demographics and political factors. This involves mapping environmental pollution sources with census information and election data. The study aims to reveal patterns in environmental issues and understand how different factors influence the distribution of environmental impacts.
MDI Scholar: Raunak Advani ’24
Text Analysis of Legal Agreements between the US Dept of Ed and School Districts
Advisor: NaLette Brodnax, Assistant Professor in the McCourt School of Public Policy
Description: Black students face significantly higher school discipline rates than their white peers. In investigating these disparities, this project considers a “top-down” perspective on school discipline, where schools create disciplinary policy regimes that shape student behavior. We characterize these policy regimes based on schools’ use of carceral ideology—the propensity to solve problems through surveillance, coercion, confinement, and correction. An accumulation of carceral practices in schools may alter the education, beliefs, and identities of poor and racially minoritized students in harmful ways. This project entails developing a school-level measure of carceral ideology based on computational analyses of text from approximately 30,000 school handbooks.
MDI Scholar: JaeHo Bahng ’25
Investment Returns and Distribution Policies of Non-Profit Endowment Funds
Advisor: Sandeep Dahiya, Akkaway Professor of Entrepreneurship in the McDonough School of Business
Description: The project is designed to efficiently process and condense downloaded IRS Form 990 XML files (which non-profit organizations are required to submit), estimated at 20GB per annum, into a single spreadsheet that extracts and presents only the specified elements. This involves iterating through each file within the zipped archives, selectively retaining relevant data while discarding the rest. The harvested data forms the basis for a research study on how well the endowments of non-profit organizations have performed.
MDI Scholar: Yunhan Zhang ’24
Learning Lessons from Incident Reporting
Advisor: Robin Dillon-Merrill, Professor and the Operations and Analytics Area Chair in the McDonough School of Business
Description: Both public agencies and private industry are learning from prior incidents using incident reporting systems to improve safety performance and reduce accidents. Analyzing data from incident reporting systems can help an organization identify emerging risks, and this approach has been advocated for decades to prevent larger failures, but trends are difficult to find when there are thousands of reports per year. This project is currently examining data from commercial aviation and coal mining.
MDI Scholars: Brian Holland ’24, Gabriel Soto ’25
Exploring Behavioral Patterns of Immigrants to Support Lifelong Learning
Advisors: Qiwei Britt He, Associate Professor in the Data Science and Analytics Program and Director of the AI-Measurement and Data Science Lab; Katherine Donato, Donald G. Herzberg Professor of International Migration, Sonneborn Chair for Interdisciplinary Collaboration
Description: International migration flows are changing the labor market in many countries. The cultural and linguistic distance between immigrants’ countries of origin and their destination countries may affect their choice of jobs, their social lives, and their sense of belonging in the community and beyond. This project will focus on exploring the behavioral patterns and problem-solving strategies of immigrants from 17 countries to better understand how their original background and new cultural environment influence their cognitive processes.
MDI Scholar: Kefan Yu ’24
Assessing the Impact of Civic Tech to Reduce Administrative Burdens and Increase Trust
Advisors: Sebastian Jilke, Provost’s Distinguished Associate Professor at the McCourt School of Public Policy; Donald Moynihan, McCourt Chair and Professor at the McCourt School of Public Policy; Pamela Herd, Professor at the McCourt School of Public Policy
Description: The project aims to analyze data from a collaboration with Code for America and the Better Government Lab on the efficacy of Civic Tech interventions, and how they impact access to social safety net programs. The project includes analyses of various RCTs conducted with state government, as well as survey data assessing the perceived burdens that beneficiaries experience when trying to (re-)enroll in social safety net programs.
MDI Scholars: Sunaina Kathpalia ’25, Holt Cochran ’25
DistrictView: A Data Pipeline to Study School Boards and K-12 Inequality
Advisor: Rebecca Johnson, Assistant Professor at the McCourt School of Public Policy, Affiliate with MDI and Sociology
Description: Decisions in U.S. school boards impact K-12 inequality, from spending priorities to COVID-19 closures to curricular content. Applying approaches from the study of “digital trace data” to U.S. school districts as the object of study, we describe the data infrastructure for DistrictView, a dataset of N ~ 121,941 school board meeting transcripts and videos (as of July 2023) from N = 1,579 U.S. school districts, representing about 1 in 8 districts nationwide. We describe the data scraping using the YouTube API and manual validation, processing using text mining to find relevant videos, and representativeness of the districts that publish videos.
MDI Scholars: Corrina Calanoc ’24, Maggie Sullivan ’24
Compiling Contextual Information to Inform Head Start Programs
Advisors: Amy O’Hara, Research Professor at MDI and Executive Director of the Georgetown Federal Statistical Research Data Center at the McCourt School of Public Policy; Gabriel Taylor, Research Specialist at MDI
Description: The National Head Start Association (NHSA) is a non-profit organization in the United States that advocates for and supports the Head Start program. Program directors need data about their communities, including information about potentially eligible families, area labor market conditions that affect labor supply and teacher retention, housing and transportation issues that could affect attendance, and characteristics of the environment, such as tree cover and air quality. This project indexes data sources and proposes methods to make the data accessible to Head Start program managers.
MDI Scholar: Amanda Hao ’26
Collaborator: Rosemary Rhodes MPP-E ’24, Richmond Fellow
Optimizing Match Rates in a Secure Query System for Federal Earnings Data
Advisors: Amy O’Hara, Research Professor at MDI and Executive Director of the Georgetown Federal Statistical Research Data Center at the McCourt School of Public Policy; Nathan Wycoff, Data Science Fellow at MDI
Description: MDI is exploring methods to securely and accurately match state administrative data to federal earnings data in an automated query system. We consider how data from state programs can be matched to earnings records using exact and probabilistic matching. We consider how to overcome the challenges faced when input files have outdated information in name, sex, and address fields. We note where additional data elements could improve the accuracy of linkages.
MDI Scholar: Alicia Gopal ’25
Forensic Use of Genetic Data
Advisors: Elissa Redmiles, Clare Boothe Luce Assistant Professor in the Department of Computer Science; Lisa Singh, MDI Director, Professor in the Department of Computer Science and McCourt School of Public Policy, Sonneborn Chair for Interdisciplinary Collaboration; Ioannis Ziogas, Assistant Teaching Professor at the McCourt School of Public Policy and an Assistant Research Professor at MDI
Description: In this project we seek to investigate the technologies used for genetic data analysis in forensic contexts. Through technical analysis, we seek to identify potential risks and limitations of existing and emerging genetic technology used in forensic analysis.
MDI Scholars: Roy Hwang ’25, Julia Nonnenkamp ’24
Long-Term Trends in the U.S. Federal Workforce
Advisor: Mark Richardson, Assistant Professor in the Department of Government
Description: This project involves wrangling data on 10 million careers in the civil service spanning hundreds of agencies and 40 years to create a data set useful for studying long-term trends in the U.S. federal workforce. Once cleaned and formatted, the data will be placed in a data commons to provide scholars and policymakers a single interface for accessing these data and the tools to put them to use improving our understanding of recruitment, retention, promotion, and internal labor markets.
MDI Scholars: Haiyang Chen ’24, Linlin Wang ’25
Private Distributed Data Collection to Inform Policymaking
Advisors: Micah Sherr, Callahan Family Professor of Computer Science in the Department of Computer Science; Lisa Singh, MDI Director, Professor in the Department of Computer Science and McCourt School of Public Policy, Sonneborn Chair for Interdisciplinary Collaboration
Description: Useful information is oftentimes distributed among multiple independent data owners. While such data may be sensitive (e.g., exposing individuals’ particular health information or Internet usage), the ability to securely compute aggregate statistics over this data can be critical for understanding high-level patterns that can inform technical development and policymaking. This project aims to facilitate the secure collection of distributed data, while protecting the privacy of individuals in the distributed dataset.
MDI Scholars: Jason Yi ’26, Alivia Castor ’25
Blending Data to Improve Prediction of Forced Migration
Advisors: Lisa Singh, MDI Director, Professor in the Department of Computer Science and McCourt School of Public Policy, Sonneborn Chair for Interdisciplinary Collaboration; Katherine Donato, Donald G. Herzberg Professor of International Migration, Sonneborn Chair for Interdisciplinary Collaboration; Ali Arab, Associate Professor in the Department of Mathematics and Statistics, Sonneborn Chair for Interdisciplinary Collaboration
Description: In 2022, the number of forcibly displaced people reached a record high of over 110 million according to the United Nations High Commissioner for Refugees (UNHCR). In an effort to respond efficiently and effectively to conflicts, sentiment expressed on social media has been used to predict movement. As such, the objective of our research is to improve predictive models by considering the application of more nuanced emotion indicators. In our analysis, we considered 3 recent displacement events across 3 languages: Ukraine 2022–2023 (Ukrainian), Sudan 2023 (Arabic), and Venezuela 2014–2023 (Spanish).
MDI Scholars: Kate Liggio ’24, Bernardo Medeiros ’24, Jenny Park ’24, Rich Pihlstrom ’24
French Racism and Misrepresentation on Social Media
Advisors: Lisa Singh, MDI Director, Professor in the Department of Computer Science and McCourt School of Public Policy, Sonneborn Chair for Interdisciplinary Collaboration; Andrew Sobanet, Professor in the Department of French and Francophone Studies; Rokhaya Diallo, Journalist, Writer, Filmmaker, Georgetown University Gender+ Justice Initiative Researcher in Residence
Description: Our research examines the dynamics of online interactions, behaviors, and perceptions, with a specific focus on the interplay of gender, race and identity in the digital realm. As social media platforms increasingly serve as arenas for cultural expression and societal discourse, our exploration delves into the representations and experiences related to identity in these virtual communities. Ultimately, we want to contribute to an understanding of how gender and race are negotiated and represented in the evolving landscape of online discourse.
MDI Scholar: Xinyu Li ’24
Collaborators: Amy Li ’26, Noah Aire ’24
Digital Transformation of Public Services
Advisors: Evagelia Tavoulareas, Managing Chair at the Georgetown Initiative on Tech + Society, Adjunct Professor at the McCourt School of Public Policy; Rajesh Veeraraghavan, Associate Professor in the Science, Technology and International Affairs (STIA) Program at the School of Foreign Service
Description: This project explores how approaches to “digital transformation” for the delivery of public services differ based on the system of government. Through the compilation of case studies from the United States, United Kingdom, India and Greece, this project seeks to document the actors involved in shaping the digital turn, historical and contingent factors that led to digitization, resistance and safeguards, and key turning points in the timeline. These case studies will form the basis of a comparative study of institutional histories, as well as provide a blueprint for studying digital transformation in other countries.
MDI Scholars: Jahnavi Mukul ’24, Liel Zino ’24