Data Blending

More and more unstructured, organic data related to human behavior, beliefs, and opinions are being shared online. Because these data are widely available and rich, they are an important source of information for social scientists attempting to characterize and predict human and societal dynamics. They give insights that traditional survey data can miss and are less costly to collect. To broaden the reach of text analytic methods and tools across social, behavioral, and economic research, we are creating a cross-disciplinary community of social scientists who will work on different data blending projects to advance their research.

Data Blending: Tackling the Obstacles

In April 2019, MDI hosted a panel discussion and conversation about data blending in the Bioethics Research Library, featuring scholars from multiple institutions. The panel discussion and a conversation moderated by Provost Robert Groves provided a unique look into both the promises and challenges of combining traditional and new forms of data, especially organic data, and resulted in a white paper outlining some of these findings. For more information on the event and to read the white paper, please visit the event page.

The MDI Data Blending Portal

Given the range of data maintained at the MDI, we have developed a portal that integrates data from different text data sources to create variables that social scientists can use within their traditional research portfolios. This allows researchers to blend knowledge obtained from unstructured text data, including social media data, with more structured variables. The portal gives researchers the flexibility to generate variables at different time scales (daily, monthly, annually) and for different subsets of data, using different data matching, data mining, and machine learning algorithms. The portal is being used by researchers analyzing the 2016 U.S. Presidential Election and researchers investigating movement patterns in Iraq and Syria.
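
As a rough illustration of the kind of variable generation described above, the sketch below counts keyword mentions in a handful of posts at a chosen time scale and joins the result to a structured variable keyed by the same period. It is not the portal's implementation: the function names (keyword_counts, blend), the sample posts, and the survey values are all hypothetical and made up for illustration, and simple keyword matching stands in for the data matching, data mining, and machine learning algorithms the portal offers.

```python
from collections import Counter
from datetime import date

# Hypothetical posts as (date posted, text). Made-up examples, not real data.
posts = [
    (date(2016, 10, 30), "Early voting lines were long today"),
    (date(2016, 11, 8), "Election day! Go vote"),
    (date(2016, 11, 8), "Watching the returns come in"),
    (date(2016, 11, 9), "Still processing the election result"),
]

# Hypothetical structured variable keyed by month (made-up values).
survey_by_month = {"2016-10": 0.46, "2016-11": 0.48}

# Each time scale maps to the date format used as the grouping key.
FORMATS = {"daily": "%Y-%m-%d", "monthly": "%Y-%m", "annually": "%Y"}


def keyword_counts(posts, keyword, scale):
    """Count posts that mention `keyword`, grouped at the given time scale."""
    fmt = FORMATS[scale]
    counts = Counter()
    for posted, text in posts:
        if keyword.lower() in text.lower():
            counts[posted.strftime(fmt)] += 1
    return dict(counts)


def blend(text_variable, structured_variable):
    """Join a text-derived variable to a structured one on shared period keys.

    Periods with no matching posts get a count of 0.
    """
    return {
        period: {"mentions": text_variable.get(period, 0), "survey": value}
        for period, value in structured_variable.items()
    }


monthly = keyword_counts(posts, "election", "monthly")
blended = blend(monthly, survey_by_month)
for period, row in sorted(blended.items()):
    print(period, row)
```

Changing the scale argument to "daily" or "annually" regroups the same posts without touching the matching step, which mirrors the flexibility across time scales that the portal is described as providing.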

The Social Science and Social Media Collaborative (S3MC)

Have established models of social and political processes lost their predictive power? Recent events, such as incorrect predictions of the 2016 election outcome and the spread of misinformation, present an opportunity to challenge old models with new sources of data. The abundance of data online and from social media allows social scientists to better understand today’s social and political phenomena. Research teams at the University of Michigan and Georgetown work on five parallel projects, each with its own substantive focus area, that are linked by their use of data science methods, big data resources, and high-performance computing. Explore more about S3MC here.