Uses of MDI’s Infrastructure

Georgetown faculty use the MDI compute infrastructure to advance research in many different ways. Here we highlight some of the projects that have used our infrastructure in recent years.

Mark Richardson, Assistant Professor in the Department of Government, is researching the long-term effects of politics on the livelihoods of civil servants. Using four decades of federal personnel data, comprising nearly 330 million person-quarters, the project evaluates the relative influence of merit, civil service rules, and changes in presidential administrations on civil servants’ career trajectories. The findings will improve our collective understanding of the formation of human capital in the federal government and identify opportunities for reform.

Through the Ries Lab of Butterfly Informatics, Leslie Ries, Associate Professor in the Department of Biology, combines environmental, physiological, species-trait, and large-scale distributional data to understand how human activity is driving global shifts in biodiversity and whether we can balance preserving ecosystem function with the needs of a growing human population.

Jennifer Tobin, Associate Professor in the McCourt School of Public Policy, created a searchable database of parliamentary debates, covering different periods, for Argentina, Belize, Chile, Costa Rica, Mexico, Peru, and Uruguay with the help of MDI. Some of the most widely studied questions in political science focus on what politicians say and how they say it. Scholars who study the United States and Great Britain have access to the daily records of the proceedings and debates in the US Congress and the UK Parliament. In developing countries, access to parliamentary proceedings is much more limited. In Latin America, many countries allow public access, but searching the proceedings tends to be quite onerous. The Latin American Parliamentary Debates Database is a first step towards creating the equivalent, for Latin America, of the US Congressional Record and the UK’s Official Report of Parliamentary Debates. To begin with, the project has scraped the available parliamentary debates for all Spanish- and English-speaking countries in Latin America.

Shareen Joshi, Associate Professor in the Edmund A. Walsh School of Foreign Service, is drawing on data from more than 10,000 World Bank projects to investigate whether the backing of multilateral institutions amplifies or mitigates the perverse incentive structure that often leads expensive investments to fail to achieve their ultimate purpose. The data include many details of the projects, including organization, finance, longevity, number of partners, and contributions made. Joshi’s team uses statistical models to explore the country-level factors that determine the size and structure of the public-private partnerships (PPPs) themselves, and the impact of the PPPs on long-term private investment.

Stipica Mudrazija, Adjunct Professor in the McCourt School of Public Policy, is examining the relationship between digital propaganda and change in attitudes toward political candidates over time. Starting from a set of Reddit users that the company identified as engaging in “coordinated, inauthentic activity” (that is, propaganda), Mudrazija’s team has identified several thousand users who were exposed to that activity at varying levels. By gathering those users’ comment histories for the year 2016, the team can classify statements that explicitly mention parties or candidates as positive or negative and then look for changing patterns over time. If exposure to propaganda had an impact on these opinions, it should show up as a valid predictor of change over time.

Nitin Vaidya, Professor in the Department of Computer Science, is using the MDI’s servers to perform distributed computation experiments on optimization tasks to test and compare different fault-tolerant algorithms. Vaidya’s team plans to experiment on privacy models of distributed optimization in the future.

Amy O’Hara, Research Professor in the Massive Data Institute, will obtain sensitive data from L2 Political and Infutor. This commercially available data will contain personally identifiable information (PII), which can be deduplicated and linked across variables. O’Hara’s team will review these linkage and deduplication methodologies and develop testing criteria, then compare the methods for publication.

Joel Simmons, Associate Professor in the Edmund A. Walsh School of Foreign Service, is researching whether it is possible to identify election fraud based on the modality of the election data. The intuition is that the joint density of voter turnout and the winner’s vote share will have a unimodal and approximately normal distribution in unproblematic elections, but will exhibit specific forms of bi- or multimodality in fraudulent elections. This intuition is formalized by Mebane (2016) in a finite mixture likelihood that can be estimated with an EM algorithm, with the maximization being handled by a combination of evolutionary processes and quasi-Newton methods (Mebane and Sekhon 2004).
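To make the modality intuition concrete, here is a minimal sketch of EM on a finite mixture. It is a deliberately simplified stand-in, not the Mebane (2016) model: it fits two plain Gaussian components to one-dimensional turnout values only, uses textbook EM updates rather than evolutionary or quasi-Newton optimization, and runs on simulated data. The function names (`simulate_turnout`, `fit_two_component_mixture`) and all numeric settings are hypothetical choices made for illustration.

```python
import math
import random


def normal_pdf(x, mean, sd):
    """Density of a normal distribution at x."""
    z = (x - mean) / sd
    return math.exp(-0.5 * z * z) / (sd * math.sqrt(2.0 * math.pi))


def simulate_turnout(seed=0, n_clean=300, n_suspect=60):
    """Hypothetical precinct turnout: a main cluster plus a small high-turnout cluster."""
    rng = random.Random(seed)
    clean = [rng.gauss(0.55, 0.05) for _ in range(n_clean)]
    suspect = [rng.gauss(0.95, 0.02) for _ in range(n_suspect)]
    return clean + suspect


def fit_two_component_mixture(xs, iterations=200):
    """Plain EM for a two-component 1-D Gaussian mixture.

    Returns (weights, means, sds), each a list of length 2.
    """
    lo, hi = min(xs), max(xs)
    weights = [0.5, 0.5]
    # Start the two means at the quartile points of the observed range.
    means = [lo + 0.25 * (hi - lo), lo + 0.75 * (hi - lo)]
    spread = (hi - lo) / 4.0 or 1.0
    sds = [spread, spread]
    for _ in range(iterations):
        # E-step: posterior probability that each point belongs to each component.
        resp = []
        for x in xs:
            p = [weights[k] * normal_pdf(x, means[k], sds[k]) for k in range(2)]
            total = sum(p) or 1e-300  # guard against underflow to zero
            resp.append([pk / total for pk in p])
        # M-step: re-estimate weight, mean, and sd from the weighted points.
        for k in range(2):
            nk = sum(r[k] for r in resp)
            if nk < 1e-12:
                continue  # component has collapsed; leave its parameters alone
            weights[k] = nk / len(xs)
            means[k] = sum(r[k] * x for r, x in zip(resp, xs)) / nk
            var = sum(r[k] * (x - means[k]) ** 2 for r, x in zip(resp, xs)) / nk
            sds[k] = max(math.sqrt(var), 1e-3)  # floor keeps the density finite
    return weights, means, sds
```

Calling `fit_two_component_mixture(simulate_turnout())` should recover two well-separated component means on this simulated data. In this toy setting, a second component with non-trivial weight far from the main cluster is the kind of bimodality the paragraph above describes. A real analysis would model turnout and vote share jointly.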

Shweta Bansal, Provost’s Distinguished Associate Professor in the Department of Biology, will use MDI servers for data storage, access, and extraction for large-scale healthcare datasets. Bansal’s team will also conduct exploratory data analytics on the server before extracting aggregated data for in-depth analysis. The team applies statistical and mathematical modeling to problems of health behaviors and infectious disease transmission. More information on her research can be found at

Sarah Bargal, Assistant Professor in the Department of Computer Science, is developing novel algorithms to train different network architectures (using MDI-provided GPU infrastructure) to better understand and address the challenge of occlusions in deep network image classification.

Rebecca Johnson, Assistant Professor in the McCourt School of Public Policy, is partnering with the relevant government entities to focus on a specific provision within an agreement signed by the U.S. Department of Housing and Urban Development (HUD), the U.S. Attorney’s Office for the Southern District of New York (SDNY, within DOJ), the New York City Housing Authority (NYCHA), and New York City (the City) to help NYCHA significantly improve housing conditions for its residents. Johnson’s team will use select results from this work to illustrate two challenges with the role of data in civil rights remedies: first, the tradeoff between direct measures of issues and more easily obtained proxies; second, the shifting population served by NYCHA, examined by simulating how different degrees of resident turnover may bias estimates over time. Ultimately, the team will show how these two factors could bias measurements of improvement and how to reduce this bias.

Krista Ruffini, Assistant Professor in the McCourt School of Public Policy, is using data from HHS and household surveys to evaluate how increases in the minimum wage affect the quality of care provided in nursing home facilities.

Neel Sukhatme, Professor of Law at the Georgetown University Law Center, is using consumer history and demographic data from the State of Florida to determine the long-term impact of criminal sanctions on formerly incarcerated individuals and their families.

Scott Ganz, Associate Teaching Professor of Strategy in the McDonough School of Business, is developing a new algorithm for evaluating U-shaped and inverse U-shaped relationships in data, which are exceedingly common subjects of hypothesis tests in management and strategy. Unlike existing methods, which are based on estimating moments from basis splines, Ganz’s approach relies on applying shape constraints to a piecewise-linear approximation of the data.

The Social Science and Social Media Collaborative (S3MC) is an interdisciplinary collaboration between the University of Michigan and Georgetown University with the goal of harnessing the opportunity to use new data to gain a better understanding of social and political phenomena. This collaborative has five broad sub-projects (Methodology, Misinformation, Political Communication, Parenting, and Economic Indicators) which have specific substantive focus areas, but are linked through the use of data science methods, big data resources, and the use of high-performance computing.

GU Faculty Involved: Lisa Singh, Leticia Bode, Rebecca Ryan, and Jonathan Ladd

This project seeks to dramatically improve the data available to better understand gun-related outcomes and, in turn, help support rigorous gun policy research through the novel use of social media data, mostly Twitter. Initial findings from this research suggest that Twitter data may hold value for tracking dynamics in gun-related outcomes in certain locations, and future directions for this project include expanding this work to learn more about so-called “ghost guns,” or firearms that are homemade.

GU Faculty Involved: Lisa Singh and Carole Roan Gresenz

Using interdisciplinary methods, this project evaluates popular online social movements with the goal of broadly understanding how they can lead to lasting change. The project compares Twitter data, protest data, and analysis of local, state, and national policy changes related to the #BlackLivesMatter and #MeToo movements to see which online movements are successful in driving on-the-ground change.

GU Faculty Involved: Lisa Singh, Jamillah Williams, and Naomi Mezey