McCourt School of Public Policy
Massive Data Institute
Featured Research

Building a Library of Past Research for the Environmental Impact Data Collaborative

The Massive Data Institute (MDI) is developing the Environmental Impact Data Collaborative (EIDC) to enable community groups, policymakers, and researchers to discover, access, merge, transform, analyze, visualize, and discuss data in ways that help them make environmental policy more effective and just.

To make the data more useful, I reviewed past literature to identify the most relevant datasets and studies in the following key research areas:

  • air quality (air toxics, particulate matter, and traffic density) 
  • energy accessibility (affordability, energy burdens on low-income households)
  • health-related issues (incidence of asthma, lead exposure, and cardiovascular disease)
  • urban sustainability (accessibility to public resources)
  • toxic waste (toxic releases from facilities, proximity to waste facilities)
  • water quality (groundwater quality, drinking water contamination)

Users of the data collaborative will not only have access to datasets, code templates, and transformation tools but will also be able to quickly find past research that used a specific dataset. This will help community groups, policymakers, and researchers understand both the potential and the limitations of the data they use.

For example, the Toxics Release Inventory (TRI), a database published by the U.S. Environmental Protection Agency, documents chemical releases reported by facilities. TRI data has been used in influential research on community demographics and housing markets. Using TRI, Bowen et al. (1995) found an association between race, income, and toxic emissions, linking pollution to the demographic characteristics of communities. Mastromonaco (2015) showed that listing an existing firm in the TRI decreases housing prices by up to 11% within approximately 1 mile, suggesting that the housing market responds to information from the TRI.

Linking specific datasets with existing research helps users to learn about methodologies and data characteristics, thereby helping them move rapidly toward generating new research.

The EIDC allows users to search for research literature by subject matter and by dataset. For instance, typing in “Toxics Release Inventory,” a frequently used database for chemical releases, links users to papers that have used this dataset.
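
To make the idea of dataset-linked literature concrete, here is a minimal Python sketch of such a lookup. It is an illustration under stated assumptions, not the EIDC's actual implementation: the `Paper` record, the `LIBRARY` list, the subject labels, and the `search` function are all hypothetical names, and the only entries are the two studies mentioned above.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass(frozen=True)
class Paper:
    """One library entry (hypothetical schema, not the EIDC's)."""
    citation: str
    datasets: Tuple[str, ...]
    subjects: Tuple[str, ...]


# Illustrative entries only: the two studies named in this post.
# The subject labels are assumptions made for this example.
LIBRARY = [
    Paper("Bowen et al. (1995)", ("Toxics Release Inventory",), ("toxic waste",)),
    Paper("Mastromonaco (2015)", ("Toxics Release Inventory",), ("toxic waste",)),
]


def search(library: List[Paper], query: str) -> List[str]:
    """Return citations whose dataset or subject tags contain the query.

    Matching is a case-insensitive substring test; an empty query
    returns no results.
    """
    needle = query.strip().lower()
    if not needle:
        return []
    return [
        paper.citation
        for paper in library
        if any(needle in tag.lower() for tag in paper.datasets + paper.subjects)
    ]


if __name__ == "__main__":
    for citation in search(LIBRARY, "Toxics Release Inventory"):
        print(citation)
```

Plain substring matching keeps the sketch short. A real search tool would likely also handle alternate spellings and abbreviations of a dataset's name, such as "TRI".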

Written by Ya Gao, an MDI Scholar working on the EIDC
