Massive Data Institute Announces Fall 2022 MDI Scholars

With the start of the academic year, the Massive Data Institute (MDI) is pleased to announce our Fall 2022 MDI Scholars Cohort. The MDI Scholars program was launched in 2019, and this year we have our biggest cohort yet, with more than 35 students from many undergraduate and graduate programs across the University, including Data Science in Public Policy, Data Science and Analytics, Computer Science, Math/Stats, Gender Studies, and Public Policy. This Fall, MDI Scholars will work with faculty and practitioners to help advance research in different areas, including forced migration, environmental science, social media measurement, scientific misinformation, and politics and polarization.

Kick-off meeting for the MDI Scholars Fall 2022 Cohort.

Fall 2022 Project Descriptions

Blending Data to Improve Prediction of Forced Migration 
Advisor: Lisa Singh, MDI Director and Professor in the Department of Computer Science
The Massive Data and Displacement (MaDD) project develops approaches for using big data in conjunction with traditional administrative and survey data to understand and eventually forecast mass movement of people who are forced to migrate. Students will work on developing new signals from organic data sources, as well as modeling migration in Ukraine, Venezuela, and Iraq. 

Civil Justice Data Commons 
Advisor: Amy O’Hara, Research Professor in MDI and Executive Director of the Federal Statistical Research Data Center at the McCourt School of Public Policy
MDI Scholars will be helping with the Civil Justice Data Commons project, where we clean and analyze civil court records to study eviction and consumer/medical debt. The insights from this project could improve access to civil justice in local communities, protect the privacy of data subjects, and help the courts become more efficient and accountable. This project will also study the American Community Survey, examining how its sample, data collection, and privacy protection policies affect data products.

Cost-Effectiveness Analysis of Preventive Healthcare Decisions 
Advisor: Maria Alva, Assistant Research Professor in MDI, McCourt School of Public Policy
This study will help inform the decision-making of industry players relating to preventive interventions by using simple decision tree models and Markov models. Members of the project will conduct literature reviews and synthesize information from the literature on costs and outcomes. This information will be used as input parameters in the models. The models will then be used to estimate the cost-effectiveness of novel healthcare interventions in the areas of diabetes and cancer prevention.
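As a rough illustration of the kind of Markov model described above, the sketch below runs a cohort through three hypothetical health states and compares a preventive intervention with usual care using an incremental cost-effectiveness ratio (ICER). Every state, probability, cost, and utility value here is made up for illustration and is not taken from the study; in practice these inputs would come from the literature review.

```python
def run_markov_cohort(transition, cost_per_state, qaly_per_state, cycles, discount=0.03):
    """Return discounted (cost, QALYs) per person for a simple Markov cohort model.

    Rewards are counted at the start of each cycle (no half-cycle correction).
    """
    n = len(transition)
    dist = [1.0] + [0.0] * (n - 1)  # the whole cohort starts in the first state
    total_cost = 0.0
    total_qaly = 0.0
    for t in range(cycles):
        weight = 1.0 / (1.0 + discount) ** t
        total_cost += weight * sum(d * c for d, c in zip(dist, cost_per_state))
        total_qaly += weight * sum(d * q for d, q in zip(dist, qaly_per_state))
        # Move the cohort one cycle forward: new_dist[j] = sum_i dist[i] * P[i][j]
        dist = [sum(dist[i] * transition[i][j] for i in range(n)) for j in range(n)]
    return total_cost, total_qaly


def icer(cost_new, qaly_new, cost_old, qaly_old):
    """Incremental cost per additional QALY of the new strategy versus the old one."""
    return (cost_new - cost_old) / (qaly_new - qaly_old)


# Hypothetical states: 0 = healthy, 1 = sick, 2 = dead (absorbing).
usual_care = [
    [0.90, 0.08, 0.02],
    [0.00, 0.85, 0.15],
    [0.00, 0.00, 1.00],
]
prevention = [
    [0.94, 0.04, 0.02],  # assumed effect: the intervention halves disease onset
    [0.00, 0.85, 0.15],
    [0.00, 0.00, 1.00],
]
costs_usual = [0.0, 5000.0, 0.0]    # assumed annual cost per state
costs_prev = [300.0, 5000.0, 0.0]   # assumed: prevention adds a yearly cost while healthy
qalys = [1.0, 0.6, 0.0]             # assumed utility weight per state

cost_usual, qaly_usual = run_markov_cohort(usual_care, costs_usual, qalys, cycles=20)
cost_prev, qaly_prev = run_markov_cohort(prevention, costs_prev, qalys, cycles=20)

print(f"Usual care: cost={cost_usual:.0f}, QALYs={qaly_usual:.2f}")
print(f"Prevention: cost={cost_prev:.0f}, QALYs={qaly_prev:.2f}")
print(f"ICER: {icer(cost_prev, qaly_prev, cost_usual, qaly_usual):.0f} per QALY gained")
```

The ICER is then typically compared against a willingness-to-pay threshold to judge whether the intervention is cost-effective.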

Environmental Impact Data Collaborative 
Advisor: Michael Bailey, Walsh Professor in the Department of Government and McCourt School of Public Policy
The Massive Data Institute (MDI) is developing the Environmental Impact Data Collaborative (EIDC) to enable community groups, policymakers, and researchers to discover, access, merge, transform, analyze, visualize, and discuss data in ways that help them make environmental policy more effective and just.

KL Divergence for Probabilistic Programming 
Advisor: Nathan Wycoff, Postdoctoral Fellow in MDI, McCourt School of Public Policy
This project will develop implementations of KL divergences for TensorFlow Probability, which will facilitate the next generation of statistical computing via Variational Bayes as well as general probabilistic programming via optimization. Software like TensorFlow, PyTorch, JAGS, and Stan enables complex statistical models by automating routine mathematical calculations such as differentiation and some integrals; however, as statistical methodology advances, gaps are being revealed in their abilities. The work of this project will address these gaps and allow statistical computing to become more powerful.
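To give a sense of what a KL divergence implementation involves, here is a minimal sketch in plain Python (it does not use TensorFlow Probability and is not the project's code). It computes the well-known closed-form KL divergence between two univariate normal distributions and checks it against a Monte Carlo estimate. The function names and example parameters are assumptions chosen for illustration.

```python
import math
import random


def kl_normal(mu_p, sigma_p, mu_q, sigma_q):
    """Closed-form KL(p || q) for p = N(mu_p, sigma_p^2) and q = N(mu_q, sigma_q^2)."""
    return (
        math.log(sigma_q / sigma_p)
        + (sigma_p ** 2 + (mu_p - mu_q) ** 2) / (2.0 * sigma_q ** 2)
        - 0.5
    )


def log_normal_pdf(x, mu, sigma):
    """Log density of N(mu, sigma^2) evaluated at x."""
    z = (x - mu) / sigma
    return -0.5 * z * z - math.log(sigma) - 0.5 * math.log(2.0 * math.pi)


def kl_monte_carlo(mu_p, sigma_p, mu_q, sigma_q, n_samples=20000, seed=0):
    """Estimate KL(p || q) as the average of log p(x) - log q(x) over samples x ~ p."""
    rng = random.Random(seed)
    total = 0.0
    for _ in range(n_samples):
        x = rng.gauss(mu_p, sigma_p)
        total += log_normal_pdf(x, mu_p, sigma_p) - log_normal_pdf(x, mu_q, sigma_q)
    return total / n_samples


exact = kl_normal(0.0, 1.0, 1.0, 2.0)
approx = kl_monte_carlo(0.0, 1.0, 1.0, 2.0)
print(f"closed form: {exact:.4f}, Monte Carlo: {approx:.4f}")
```

A closed form like `kl_normal` is exact and cheap to differentiate, whereas the sampling estimate is noisy; that difference is one reason analytic KL implementations are useful for optimization-based inference.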

Measuring the Spread of Scientific Misinformation
Advisors: Lisa Singh, MDI Director and Professor in the Department of Computer Science & Dewey Murdick, CSET Director
Scientific misinformation can lead to confusion about and distrust in the scientific process, distrust in critical, life-impacting findings, or misunderstanding of a scientific community’s agenda. This work aims to analyze the ways in which scientific research spreads (or does not spread) to the public via news articles and is shared by the public in social media posts.

Polarization in Congressional Elections
Advisor: Michael Bailey, Walsh Professor in the Department of Government and McCourt School of Public Policy
Using a comprehensive data set of social media and campaign websites for all general and primary election congressional candidates, this team of MDI Scholars is working to identify the ideology of candidates from their texts and then to analyze if and how ideology affects important political outcomes in the primary and general elections.

Prioritization within K-12 School Districts: Parents, Policy Categories, and Predictive Algorithms 
Advisor: Rebecca Johnson, Assistant Professor in the McCourt School of Public Policy
K-12 school districts are facing challenging questions in the wake of COVID-19 when providing high-dosage tutoring and deciding which students need those tutors most urgently. The MDI Scholars will be using automated text analysis and other techniques on two corpora that help us understand how parents view the fairness of different ways districts judge student need: open-ended responses from an NSF-funded, nationally representative survey experiment of K-12 parents and naturally occurring data from the news media and parent advocacy groups.

The Use of Online Obituaries as a Tool for Public Health Surveillance: Representativeness and Bias 
Advisor: Maria Alva, Assistant Research Professor in MDI, McCourt School of Public Policy
This study aims to evaluate the feasibility and accuracy of using open-source data for monitoring COVID-19 and to estimate demographic-specific excess mortality from all causes in 2020-2021 using official death records and obituary data. The study will use automated data collection, text mining openly available online obituaries, to derive quick, cost-effective predictions of the age and sex distribution of deaths by location.

Understanding the Measurement Properties of Social Media
Advisor: Lisa Singh, MDI Director and Professor in the Department of Computer Science
The use of organic data is becoming more prevalent in scientific research. However, for many social science research questions, the measurement properties of social media data are not understood well enough to use them to study individual behavior or beliefs. This research aims to improve our understanding of the measurement properties of social media, specifically Twitter. Students will work on understanding public opinion related to COVID-19, election dynamics, and firearms.


Mark your calendars for December 7, 2022, for the MDI Scholar Showcase, where MDI Scholars will present their research projects and current findings to the Georgetown and DC communities.

About the MDI Scholars Program: The MDI Scholars program is an experiential learning opportunity for undergraduate and Master’s students from across the university to work on interdisciplinary research projects with professors and practitioners. These projects connect new methods, new forms of data, and/or large-scale computing infrastructure to different societal-scale issues, leading to improved public policy.

Written by Elizabeth Ledwith ’24
