- “Artificial intelligence for the public sector: opportunities and challenges of cross-sector collaboration” (with Marc Esteve and Averill Campion), Philosophical Transactions of the Royal Society A, 2018, Volume 376, Issue 2128. [Pre-print version arXiv:1809.04399]
Abstract: Public sector organizations are increasingly interested in using data science and artificial intelligence capabilities to deliver policy and generate efficiencies in high-uncertainty environments. The long-term success of data science and artificial intelligence (AI) in the public sector relies on effectively embedding it into delivery solutions for policy implementation. However, governments cannot integrate AI into public service delivery on their own. The UK Government Industrial Strategy is clear that delivering on the AI grand challenge requires collaboration between universities and the public and private sectors. This cross-sectoral collaborative approach is the norm in applied AI centres of excellence around the world. Despite their popularity, cross-sector collaborations entail serious management challenges that hinder their success. In this article we discuss the opportunities and challenges of AI for the public sector. Finally, we propose a series of strategies to successfully manage these cross-sectoral collaborations.
- “Big Data & AI – A Transformational Shift for Government: So, What Next for Research?” (with Irina Pencheva and Marc Esteve), Public Policy and Administration, forthcoming.
Abstract: Big Data and Artificial Intelligence will have a profound transformational impact on governments around the world. Thus, it is important for scholars to provide a useful analysis of the topic to public managers and policymakers. This study offers an in-depth review of the Policy and Administration literature on the role of Big Data and advanced analytics in the public sector. It provides an overview of the key themes in the research field, namely the application and benefits of Big Data throughout the policy process, the challenges to its adoption, and the resulting implications for the public sector. It is argued that research on the subject is still nascent and more should be done to ensure that the theory adds real value to practitioners. A critical assessment of the strengths and limitations of the existing literature is developed, and a future research agenda to address these gaps and enrich our understanding of the topic is proposed.
- “The Lancet countdown on health and climate change: from 25 years of inaction to a global transformation for public health” (with Nick Watts et al.), The Lancet, 2018, 391(10120): 581-630. [This article is available free of charge from The Lancet]
Abstract: The Lancet Countdown tracks progress on health and climate change and provides an independent assessment of the health effects of climate change, the implementation of the Paris Agreement, and the health implications of these actions. It follows on from the work of the 2015 Lancet Commission on Health and Climate Change, which concluded that anthropogenic climate change threatens to undermine the past 50 years of gains in public health, and conversely, that a comprehensive response to climate change could be "the greatest global health opportunity of the 21st century".
- “Detecting Policy Preferences and Dynamics in the UN General Debate with Neural Word Embeddings” (with Stefano Gurciullo), IEEE Proceedings of the 2017 International Conference on the Frontiers and Advances in Data Science (FADS), 23-25 October 2017, Xi’an, China: 74-79. [Pre-print version arXiv:1707.03490]
Abstract: Foreign policy analysis has been struggling to find ways to measure policy preferences and paradigm shifts in international political systems. This paper presents a novel, potential solution to this challenge, through the application of a neural word embedding (Word2vec) model on a dataset featuring speeches by heads of state or government in the United Nations General Debate. The paper provides three key contributions based on the output of the Word2vec model. First, it presents a set of policy attention indices, synthesizing the semantic proximity of political speeches to specific policy themes. Second, it introduces country-specific semantic centrality indices, based on topological analyses of countries' semantic positions with respect to each other. Third, it tests the hypothesis that there exists a statistical relation between the semantic content of political speeches and UN voting behavior, falsifying it and suggesting that political speeches contain information of a different nature than that behind voting outcomes. The paper concludes with a discussion of the practical use of its results and consequences for foreign policy analysis, public accountability, and transparency.
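As a rough illustration of what "semantic proximity of a speech to a policy theme" can mean, the sketch below scores a speech by the cosine similarity between the average of its word vectors and the average of a theme's word vectors. This is only one plausible construction and is not taken from the paper: the toy two-dimensional embeddings stand in for a trained Word2vec model, and the function names `mean_vector`, `cosine` and `attention_index` are hypothetical.

```python
import math

# Toy embeddings standing in for a trained Word2vec model (assumption).
EMBEDDINGS = {
    "climate": [1.0, 0.0],
    "emissions": [0.9, 0.1],
    "security": [0.0, 1.0],
    "war": [0.1, 0.9],
}


def mean_vector(words, embeddings):
    """Average the vectors of all words that have an embedding."""
    vectors = [embeddings[w] for w in words if w in embeddings]
    if not vectors:
        raise ValueError("none of the words has an embedding")
    dim = len(vectors[0])
    return [sum(v[i] for v in vectors) / len(vectors) for i in range(dim)]


def cosine(u, v):
    """Cosine similarity between two non-zero vectors of equal length."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


def attention_index(speech_tokens, theme_tokens, embeddings):
    """Hypothetical index: proximity of a speech to a policy theme."""
    return cosine(
        mean_vector(speech_tokens, embeddings),
        mean_vector(theme_tokens, embeddings),
    )


if __name__ == "__main__":
    print(attention_index(["climate", "emissions"], ["climate"], EMBEDDINGS))
    print(attention_index(["war", "security"], ["climate"], EMBEDDINGS))
```

With these toy vectors, a speech made of climate-related words scores higher on the climate theme than a speech made of security-related words.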
- “Database of Parliamentary Speeches in Ireland, 1919-2013” (with Alexander Herzog), IEEE Proceedings of the 2017 International Conference on the Frontiers and Advances in Data Science (FADS), 23-25 October 2017, Xi’an, China: 29-34. [Pre-print version arXiv:1708.04557] [Database available from Harvard Dataverse]
Abstract: We present a database of parliamentary debates that contains the complete record of parliamentary speeches from Dáil Éireann, the lower house and principal chamber of the Irish parliament, from 1919 to 2013. In addition, the database contains background information on all TDs (Teachta Dála, members of parliament), such as their party affiliations, constituencies and office positions. The current version of the database includes close to 4.5 million speeches from 1,178 TDs. The speeches were downloaded from the official parliament website and further processed and parsed. Background information on TDs was collected from the member database of the parliament website. Data on cabinet positions (ministers and junior ministers) was collected from the official website of the government. A record linkage algorithm and human coders were used to match TDs and ministers.
- “Understanding State Preferences With Text As Data: Introducing the UN General Debate Corpus” (with Alexander Baturo and Niheer Dasandi), Research & Politics, 2017, 4(2). [Replication materials] [UN General Debate Corpus]
Abstract: Every year at the United Nations, member states deliver statements during the General Debate discussing major issues in world politics. These speeches provide invaluable information on governments' perspectives and preferences on a wide range of issues, but have largely been overlooked in the study of international politics. This paper introduces a new dataset consisting of over 7,300 country statements from 1970-2014. We demonstrate how the UN General Debate Corpus (UNGDC) can be used to derive country positions on different policy dimensions using text analytic methods. The paper provides applications of these estimates, demonstrating the contribution the UNGDC can make to the study of international politics.
Abstract: Empirical social science often relies on data that are not observed in the field, but are transformed into quantitative variables by expert researchers who analyze and interpret qualitative raw sources. While generally considered the most valid way to produce data, this expert-driven process is inherently difficult to replicate or to assess on grounds of reliability. Using crowd-sourcing to distribute text for reading and interpretation by massive numbers of non-experts, we generate results comparable to those using experts to read and interpret the same texts, but do so far more quickly and flexibly. Crucially, the data we collect can be reproduced and extended transparently, making crowd-sourced datasets intrinsically reproducible. This focuses researchers’ attention on the fundamental scientific objective of specifying reliable and replicable methods for collecting the data needed, rather than on the content of any particular dataset. We also show that our approach works straightforwardly with different types of political text, written in different languages. While findings reported here concern text analysis, they have far-reaching implications for expert-generated data in the social sciences.
Abstract: Examining the careers of democratic heads of state and government from 1960–2010, we find that one in every seven turns to the private sector after office. Distinguishing between the factors that attract leaders to business and those that render leaders attractive, we find that global CEO compensation rates, cultural norms, having served in office in Anglo-Saxon countries, and leaders' personal backgrounds all matter. We also find that certain economic outcomes and policies in office such as economic growth and reduction in state spending are often associated with post-tenure business careers. We do not find evidence, however, that leaders are able to implement policies with future careers in mind, which would in turn raise concerns over accountability.
Abstract: In the absence of public information on the inner workings of the Russian political regime, especially during Medvedev's presidency, outside observers often have to rely on politicians' unguarded comments or subjective analysis. Instead, we turn to quantitative text analysis of political rhetoric. Treating governors as a quasi-expert panel, we argue that policy positions revealed in regional legislative addresses explain how elites perceived the distribution of power between Putin and Medvedev. We find that governors moved from a neutral position in 2009 to a clearly pro-Putin position in 2011, and that policy initiatives advocated by Medvedev all but evaporated from the rhetoric of governors in 2012.
Abstract: The 2011 election in Ireland was one of the most dramatic elections in European post-war history in terms of net electoral volatility. In some respects the election overturned the traditional party system. Yet it was a conservative revolution, one in which the main players remained the same, and the switch in the major government party was merely one in which one centre-right party replaced another. Comparing voting behaviour over the last three elections we show that the 2011 election looks much like that of 2002 and 2007. The crisis did not result in the redefinition of the electoral landscape. While we find clear evidence of economic voting at the 2011 election, issue voting remained weak. We believe that this is due to the fact that parties have not offered clear policy alternatives to the electorate in the recent past and did not do so in 2011.
Abstract: Recent literature models leadership as a process of communication in which leaders’ rhetorical signals facilitate followers’ co-ordination. While some studies have explored the effects of leadership in experimental settings, there remains a lack of empirical research on the effectiveness of informational tools in real political environments. Using quantitative text analysis of federal and sub-national legislative addresses in Russia, this article empirically demonstrates that followers react to informational signals from leaders. It further theorizes that leaders use a combination of informational and non-informational tools to solve the co-ordination problem. The findings show that a mixture of informational and non-informational tools shapes followers’ strategic calculi. Ignoring non-informational tools, and particularly the interrelationship between informational and non-informational tools, can threaten the internal validity of causal inference in the analysis of leadership effects on co-ordination.
Abstract: We address leadership emergence and the possibility that there is a partially innate predisposition to occupy a leadership role. Employing twin design methods on data from the National Longitudinal Study of Adolescent Health, we estimate the heritability of leadership role occupancy at 24%. Twin studies do not point to specific genes or neurological processes that might be involved. We therefore also conduct association analysis on the available genetic markers. The results show that leadership role occupancy is associated with rs4950, a single nucleotide polymorphism (SNP) residing on a neuronal acetylcholine receptor gene (CHRNB3). We replicate this family-based genetic association result on an independent sample in the Framingham Heart Study. This is the first study to identify a specific genotype associated with the tendency to occupy a leadership position. The results suggest that what determines whether an individual occupies a leadership position is the complex product of genetic and environmental influences, with a particular role for rs4950.
Abstract: Coding non-manifesto documents as if they were genuine policy platforms produced at election time clearly raises serious issues with error when these codings are used in the standard manner to estimate left-right policy positions. In addition to the long-term solution of improving the document base of the Manifesto Project identified by Gemenis (2012), we argue that immediate gains in manifesto-based estimates of policy positions can be realised by using the confrontational logit scales from Lowe et al. (2011), which address the problems of scale content and scale construction that are exacerbated by but not unique to the problems found in proxy documents.
Abstract: The paper explores a question raised by the 2011 Irish election, which saw an almost unprecedented decline in support for a major governing party after an economic collapse that necessitated an ECB/IMF ‘bailout’. This seems a classic case of ‘economic voting’ in which a government is punished for incompetent performance. How did the government lose this support: gradually, as successive economic indicators appeared negative, or dramatically, following major shocks? The evidence points to losses at two critical junctures. This is consistent with an interpretation of the link between economics and politics that allows for qualitative judgements by voters in assigning credit and blame for economic performance.
Abstract: All methods for analyzing text require the identification of a fundamental unit of analysis. In expert-coded content analysis schemes such as the Comparative Manifesto Project, this unit is the ‘quasi-sentence’: a natural sentence or a part of a sentence judged by the coder to have an independent component of meaning. Because they are subjective constructs identified by individual coders, however, quasi-sentences make text analysis fundamentally unreliable. The justification for quasi-sentences is a supposed gain in coding validity. We show that this justification is unfounded: using quasi-sentences does not produce valuable additional information in characterizing substantive political content. Using natural sentences as text units, by contrast, delivers perfectly reliable unitization with no measurable loss in content validity of the resulting estimates.
Abstract: The Comparative Manifesto Project (CMP) provides the only time series of estimated party policy positions in political science and has been extensively used in a wide variety of applications. Recent work (e.g., Benoit, Laver, and Mikhaylov 2009; Klingemann et al. 2006) focuses on nonsystematic sources of error in these estimates that arise from the text generation process. Our concern here, by contrast, is with error that arises during the text coding process since nearly all manifestos are coded only once by a single coder. First, we discuss reliability and misclassification in the context of hand-coded content analysis methods. Second, we report results of a coding experiment that used trained human coders to code sample manifestos provided by the CMP, allowing us to estimate the reliability of both coders and coding categories. Third, we compare our test codings to the published CMP “gold standard” codings of the test documents to assess accuracy and produce empirical estimates of a misclassification matrix for each coding category. Finally, we demonstrate the effect of coding misclassification on the CMP’s most widely used index, its left–right scale. Our findings indicate that misclassification is a serious and systemic problem with the current CMP data set and coding process, suggesting the CMP scheme should be significantly simplified to address reliability issues.
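A misclassification matrix of the kind described above can be estimated by cross-tabulating gold-standard codes against test codings and normalising each row. The following is a minimal sketch under that assumption, not the authors' actual procedure; the function name `misclassification_matrix` and the category labels `econ` and `welfare` are hypothetical.

```python
from collections import Counter, defaultdict


def misclassification_matrix(gold, coded):
    """Estimate P(coded category | gold category) from paired codings.

    `gold` and `coded` are equal-length sequences of category labels,
    one pair per text unit. Returns a nested dict: matrix[gold][coded].
    """
    if len(gold) != len(coded):
        raise ValueError("gold and coded must have the same length")
    counts = defaultdict(Counter)
    for g, c in zip(gold, coded):
        counts[g][c] += 1
    matrix = {}
    for g, row in counts.items():
        total = sum(row.values())
        # Row-normalise so each gold category's probabilities sum to 1.
        matrix[g] = {c: n / total for c, n in row.items()}
    return matrix


if __name__ == "__main__":
    # Hypothetical codings of six text units.
    gold = ["econ", "econ", "econ", "econ", "welfare", "welfare"]
    coded = ["econ", "econ", "econ", "welfare", "welfare", "welfare"]
    print(misclassification_matrix(gold, coded))
```

The diagonal entries give the share of units coded correctly for each category; the off-diagonal entries show where misclassified units end up.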
Abstract: Scholars estimating policy positions from political texts typically code words or sentences and then build left-right policy scales based on the relative frequencies of text units coded into different categories. Here we reexamine such scales and propose a theoretically and linguistically superior alternative based on the logarithm of odds-ratios. We contrast this scale with the current approach of the Comparative Manifesto Project (CMP), showing that our proposed logit scale avoids widely acknowledged flaws in previous approaches. We validate the new scale using independent expert surveys. Using existing CMP data, we show how to estimate more distinct policy dimensions, for more years, than has been possible before, and make this dataset publicly available. Finally, we draw some conclusions about the future design of coding schemes for political texts.
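To make the contrast concrete, the sketch below compares a log odds-ratio scale with a scale based on relative frequencies. It is a minimal illustration, not the published implementation: the 0.5 added to each count is an assumed smoothing constant that keeps the logarithm defined when a count is zero, and the function names are hypothetical.

```python
import math


def logit_scale(right, left, smoothing=0.5):
    """Log odds-ratio position from counts of right- and left-coded units.

    The smoothing constant (assumed to be 0.5 here) avoids log(0) and
    division by zero when one side has no coded units.
    """
    return math.log((right + smoothing) / (left + smoothing))


def relative_frequency_scale(right, left, total):
    """Position as the difference in shares of all coded text units."""
    return (right - left) / total


if __name__ == "__main__":
    # Two hypothetical manifestos with the same right/left balance but
    # different amounts of unrelated content.
    print(logit_scale(30, 10), relative_frequency_scale(30, 10, 100))
    print(logit_scale(30, 10), relative_frequency_scale(30, 10, 400))
```

In this example the logit position depends only on the balance of right and left units, while the relative-frequency position shrinks towards zero as unrelated content is added to the document.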
Abstract: The decision to establish direct elections to the European Parliament was intended by many to establish a direct link between the individual citizen and decision making at the European level. Elections were meant to help to establish a common identity among the peoples of Europe, to legitimise policy through the normal electoral processes and provide a public space within which Europeans could exert a more direct control over their collective future. Critics disagreed, arguing that direct elections to the European Parliament would further undermine the sovereignty of member states, and might not deliver on the promise that so many were making on behalf of that process. In particular, some wondered whether elections alone could mobilise European publics to take a much greater interest in European matters, with the possibility of European elections being contested simply on national matters. To evaluate these divergent views, this article reviews the literature on direct elections to the European Parliament in the context of the role these elections play in the governance of the European Union. The seminal work by Reif and Schmitt serves as the starting point of our review. These authors were the first to discuss elections to the European Parliament as second-order national elections. Results of second-order elections are influenced not only by second-order factors, but also by the situation in the first-order arena at the time of the second-order election. In the 30 years and six more sets of European Parliament elections since the publication of their work, the concept has become the dominant one in any academic discussion of European elections. In this article we review that work in order to assess the continuing value of the second-order national election concept today, and to consider some of the more fruitful areas for research which might build on the advance made by Reif and Schmitt.
While the concept has proven useful in studies of a range of elections beyond just those for the European Parliament, including those for regional and local assemblies as well as referendums, this review will concentrate solely on EP elections. Concluding that Reif and Schmitt’s characterisation remains broadly valid today, the article allows that while this does not mean there is necessarily a democratic deficit within the EU, there may be changes that could be made to encourage a more effective electoral process.
Abstract: Political text offers extraordinary potential as a source of information about the policy positions of political actors. Despite recent advances in computational text analysis, human interpretative coding of text remains an important source of text-based data, ultimately required to validate more automatic techniques. The profession’s main source of cross-national, time-series data on party policy positions comes from the human interpretative coding of party manifestos by the Comparative Manifesto Project (CMP). Despite widespread use of these data, the uncertainty associated with each point estimate has never been available, undermining the value of the dataset as a scientific resource. We propose a remedy. First, we characterize processes by which CMP data are generated. These include inherently stochastic processes of text authorship, as well as of the parsing and coding of observed text by humans. Second, we simulate these error-generating processes by bootstrapping analyses of coded quasi-sentences. This allows us to estimate precise levels of nonsystematic error for every category and scale reported by the CMP for its entire set of 3,000-plus manifestos. Using our estimates of these errors, we show how to correct biased inferences, in recent prominently published work, derived from statistical analyses of error-contaminated CMP data.
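The bootstrapping idea can be sketched as follows: treat the observed category counts of a manifesto as proportions, repeatedly redraw the same number of quasi-sentences from them, and take the spread of the recomputed category percentage as its standard error. This is a simplified illustration under those assumptions rather than the authors' code; the function name `bootstrap_se`, the category labels and the number of draws are hypothetical.

```python
import random
import statistics


def bootstrap_se(counts, category, n_draws=500, seed=0):
    """Bootstrap standard error of one category's percentage share.

    `counts` maps category labels to the number of quasi-sentences coded
    into each category for a single manifesto.
    """
    rng = random.Random(seed)
    categories = list(counts)
    weights = [counts[c] for c in categories]
    n = sum(weights)
    estimates = []
    for _ in range(n_draws):
        # Redraw n quasi-sentences with the observed category proportions.
        sample = rng.choices(categories, weights=weights, k=n)
        estimates.append(100.0 * sample.count(category) / n)
    return statistics.stdev(estimates)


if __name__ == "__main__":
    short_manifesto = {"econ": 20, "welfare": 30, "other": 50}
    long_manifesto = {"econ": 200, "welfare": 300, "other": 500}
    print(bootstrap_se(short_manifesto, "econ"))
    print(bootstrap_se(long_manifesto, "econ"))
```

Both hypothetical manifestos give the same point estimate for `econ`, but the shorter one yields a larger bootstrap standard error, which is the kind of uncertainty a bare point estimate hides.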
Abstract: Following Easton’s conceptual framework discussed in the introductory chapter, a hierarchical relationship exists between three objects of support: output support, support for institutions, and support for the community. The latter two objects of support are examined in turn in two subsequent chapters on trust in European political institutions and the relationship between citizenship and identity in the European Community. This chapter focuses on the first object of support – support derived from the accrued material benefits of EU membership.
Abstract: In his recent article, Soderlund [Soderlund, P.J., 2003. The significance of structural power resources in the Russian bi-lateral treaty process 1994–1998. Communist and Post-Communist Studies 36, 311–324] tests structural factors that influence the order in which the Russian regions gained a bi-lateral agreement with the federal centre, emphasizing the importance of ethnicity, religion and economy. We replicate his results, and provide an extension where we argue instead that the only significant determinants of the bi-lateral process have been economic issues. Our results are substantiated by an improved methodology that addresses several debatable choices made by the author in the original article.