Fall 2023 MDI Workshops

Each semester, MDI sponsors technical workshops that introduce faculty, students, and staff to new methods, programming paradigms, and technologies. Learn more and RSVP for the Fall 2023 MDI Workshops below (please note that each month has a separate registration).

For more information on MDI events, check out our events calendar. Any questions, please reach out to

September 2023: Text as Data: Measurement and Inference Issues with Text Data with Dr. Le Bao

When: Monday, September 18 & Tuesday, September 19, 2023, 4:00 pm – 5:30 pm (each day)
Where: Main Campus — please note this event is only in-person

Abstract: Text as data has become a transformative approach to producing insights about human behavior and society. How do we use text as data? This workshop provides an overview of different applications of text as data. From constructing variables using text to employing Large Language Models (LLMs) to scale variables, we will discuss the strengths and weaknesses of using text as data in the contexts of measurement, statistical models, and causal inference. This workshop serves as an introductory session for the other fall MDI Data Workshops, which will focus on specific machine learning and natural language processing (NLP) techniques. Basic familiarity with programming and statistical methods is expected. No NLP background is required.

October 2023: Advanced Models Using Text with Dr. Helge Marahrens

When: Monday, October 23 & Tuesday, October 24, 2023, 4:00 pm – 5:30 pm (each day)
Where: Main Campus — please note this event is only in-person

Abstract: Textual data are abundant, but to extract meaningful insights from them we need strong tools. Even simple tasks such as identifying the most significant words in a text require careful pre-processing and modeling. In this workshop, we cover several advanced models for identifying the most impactful words, categorizing documents by their thematic content, and predicting emotions in text. Our agenda includes a deep dive into “fightin’ words,” a survey of topic and topic-noise models, and an exploration of word embeddings. To conclude, we will touch upon the role of neural networks, using sentiment analysis and emotion detection as examples.

November 2023: Cutting Large Language Models Down to Size with Dr. Nathan Wycoff

When: Monday, November 13 & Tuesday, November 14, 2023, 4:00 pm – 5:30 pm (each day)
Where: Main Campus — please note this event is only in-person

Abstract: Large Language Models such as ChatGPT have captured the zeitgeist and portend a new wave of digital disruption across all sectors. In this session, we will start with a close look at what large language models are and what makes these transformer-based architectures different from previous probabilistic language models. We will accomplish this by building our own tiny language model. Then we will discuss strategies for using existing LLMs as is, both as sophisticated text embeddings and as prompt answerers. We will conclude by fine-tuning LLMs for a specific task, highlighting their ability to achieve higher predictive accuracy on text-based applications than traditional machine learning methods.

About the Massive Data Institute (MDI): At Georgetown’s McCourt School of Public Policy, the Massive Data Institute is an interdisciplinary research institute that connects experts across computer science, data science, public health, public policy, and social science to tackle societal-scale issues and shape public policy in ways that improve people’s lives through responsible, evidence-based research. For more information on MDI, please visit To learn about additional upcoming MDI workshops and events, please visit: