Availability, collection and access to quantitative data, as well as its limitations, often make qualitative data the resource upon which development programs heavily rely. Both traditional interview data and social media analysis can provide rich contextual information and are essential for research, appraisal, monitoring and evaluation. These data may be difficult to process and analyze both systematically and at scale. This, in turn, limits the ability of timely data driven decision-making which is essential in fast evolving complex social systems. In this paper, we discuss the potential of using natural language processing to systematize analysis of qualitative data, and to inform quick decision-making in the development context. We illustrate this with interview data generated in a format of micro-narratives for the United Nations Development Program Fragments of Impact project.
There is surprisingly little known about agenda setting for international development in the United Nations (UN) despite it having a significant influence on the process and outcomes of development efforts. This paper addresses this shortcoming using a novel approach that applies natural language processing techniques to countries' annual statements in the UN General Debate. Every year UN member states deliver statements during the General Debate on their governments' perspective on major issues in world politics. These speeches provide invaluable information on state preferences on a wide range of issues, including international development, but have largely been overlooked in the study of global politics. This paper identifies the main international development topics that states raise in these speeches between 1970 and 2016, and examine the country-specific drivers of international development rhetoric.
We present a database of parliamentary debates that contains the complete record of parliamentary speeches from Dáil Éireann, the lower house and principal chamber of the Irish parliament, from 1919 to 2013. In addition, the database contains background information on all TDs (Teachta Dála, members of parliament), such as their party affiliations, constituencies and office positions. The current version of the database includes close to 4.5 million speeches from 1,178 TDs. The speeches were downloaded from the official parliament website and further processed and parsed. Background information on TDs was collected from the member database of the parliament website. Data on cabinet positions (ministers and junior ministers) was collected from the official website of the government. A record linkage algorithm and human coders were used to match TDs and ministers.
- “Topology Analysis of International Networks Based on Debates in the United Nations” (with Stefano Gurciullo), arXiv:1707.09491 [cs.CL], 29 July 2017.
In complex, high dimensional and unstructured data it is often difficult to extract meaningful patterns. This is especially the case when dealing with textual data. Recent studies in machine learning, information theory and network science have developed several novel instruments to extract the semantics of unstructured data, and harness it to build a network of relations. Such approaches serve as an efficient tool for dimensionality reduction and pattern detection. This paper applies semantic network science to extract ideological proximity in the international arena, by focusing on the data from General Debates in the UN General Assembly on the topics of high salience to international community. UN General Debate corpus (UNGDC) covers all high-level debates in the UN General Assembly from 1970 to 2014, covering all UN member states. The research proceeds in three main steps. First, Latent Dirichlet Allocation (LDA) is used to extract the topics of the UN speeches, and therefore semantic information. Each country is then assigned a vector specifying the exposure to each of the topics identified. This intermediate output is then used in to construct a network of countries based on information theoretical metrics where the links capture similar vectorial patterns in the topic distributions. Topology of the networks is then analyzed through network properties like density, path length and clustering. Finally, we identify specific topological features of our networks using the map equation framework to detect communities in our networks of countries.
- “Detecting Policy Preferences and Dynamics in the UN General Debate with Neural Word Embeddings” (with Stefano Gurciullo), arXiv:1707.03490 [cs.CL], 11 July 2017.