UN General Debate Corpus



Every year since 1947, representatives of UN member states gather at the annual sessions of the United Nations General Assembly. The centrepiece of each session is the General Debate. This is a forum at which leaders and other senior officials deliver statements that present their government’s perspective on the major issues in world politics.

These statements are akin to the annual legislative state-of-the-union addresses in domestic politics. No other international forum provides all member states with the opportunity to deliver their state-of-the-world addresses in a comparable format.


The General Debate provides a unique and invaluable source of information on state preferences across a wide range of policy issues of strategic importance for the world.

The UN General Debate Corpus (UNGDC) is a new dataset containing the texts of General Debate statements from 1970 (Session 25) to 2017 (Session 72). For a full description of the UNGDC, please see the accompanying paper.
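The year and session numbers quoted above differ by a constant 1945 (1970 is Session 25, 2017 is Session 72). As a small illustration, that arithmetic might be sketched as follows. The function names are hypothetical and are not part of the UNGDC distribution, and the range check simply reflects the coverage stated above.

```python
# Coverage of the corpus as stated in the text: 1970 (Session 25) to 2017 (Session 72).
FIRST_YEAR, LAST_YEAR = 1970, 2017

# Constant offset implied by both endpoints: 1970 - 25 == 2017 - 72 == 1945.
YEAR_SESSION_OFFSET = 1945


def session_for_year(year: int) -> int:
    """Return the General Assembly session number for a year covered by the corpus."""
    if not FIRST_YEAR <= year <= LAST_YEAR:
        raise ValueError(f"year {year} is outside the stated coverage {FIRST_YEAR}-{LAST_YEAR}")
    return year - YEAR_SESSION_OFFSET


def year_for_session(session: int) -> int:
    """Return the calendar year for a session number covered by the corpus."""
    year = session + YEAR_SESSION_OFFSET
    if not FIRST_YEAR <= year <= LAST_YEAR:
        raise ValueError(f"session {session} is outside the stated coverage")
    return year


if __name__ == "__main__":
    for y in (FIRST_YEAR, LAST_YEAR):
        print(y, "-> Session", session_for_year(y))
```

A helper like this can be handy when matching speeches to sessions, but check it against the identifiers actually used in the downloaded data before relying on it.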

The most up-to-date version of the United Nations General Debate Corpus data is available to download from Dataverse or GitHub.

A browsing and visualization tool is available here. It allows users to explore individual documents and the topics they cover, including the top words that characterise each topic, the evolution of topics over time, word distributions across topics, the underlying digitised texts of speeches, and the source PDFs.

When using the data, please cite: Alexander Baturo, Niheer Dasandi, and Slava Jankin Mikhaylov, “Understanding State Preferences With Text As Data: Introducing the UN General Debate Corpus,” Research & Politics, 2017, Volume 4, Issue 2.