UN General Debate Corpus

  • Every year since 1947, representatives of UN member states have gathered at the annual session of the United Nations General Assembly. The centrepiece of each session is the General Debate, a forum at which leaders and other senior officials deliver statements presenting their governments’ perspectives on the major issues in world politics. These statements are akin to the annual legislative state-of-the-union addresses in domestic politics. No other international forum gives all member states the opportunity to deliver state-of-the-world addresses in a comparable format. The General Debate therefore provides a unique and invaluable source of information on state preferences across a wide range of policy issues. The UN General Debate Corpus (UNGDC) is a new dataset containing the texts of General Debate statements from 1970 (Session 25) to 2016 (Session 71). For a full description of the UNGDC, please see the accompanying paper.


  • The most up-to-date version of the UNGDC data is available to download from the Harvard Dataverse: “United Nations General Debate Corpus”.
  • A browsing and visualization tool is available here. It allows users to explore individual documents and the topics they cover, including the top words that characterise each topic, the evolution of topics over time, word distributions across topics, the underlying digitised texts of speeches, and the source PDFs.
  • When using the data, please cite: Alexander Baturo, Niheer Dasandi, and Slava Mikhaylov, “Understanding State Preferences With Text As Data: Introducing the UN General Debate Corpus,” Research & Politics, 2017.
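
The coverage stated above (Session 25 in 1970 through Session 71 in 2016) implies a fixed offset of 1945 between session number and calendar year. The following is a minimal sketch of that arithmetic, assuming one General Debate session per year across the covered range. The function and constant names are hypothetical and are not part of the UNGDC distribution.

```python
# Hypothetical helpers, not part of the UNGDC distribution.
# Assumption: one session per year, so the endpoints given in the description
# (Session 25 <-> 1970, Session 71 <-> 2016) fix the offset at 1945.
FIRST_SESSION = 25
LAST_SESSION = 71
YEAR_OFFSET = 1970 - FIRST_SESSION  # 1945


def session_to_year(session: int) -> int:
    """Return the calendar year for a session number covered by the corpus."""
    if not FIRST_SESSION <= session <= LAST_SESSION:
        raise ValueError(f"session {session} is outside the covered range")
    return session + YEAR_OFFSET


def year_to_session(year: int) -> int:
    """Return the session number for a calendar year covered by the corpus."""
    session = year - YEAR_OFFSET
    if not FIRST_SESSION <= session <= LAST_SESSION:
        raise ValueError(f"year {year} is outside the covered range")
    return session


# Number of sessions spanned by the stated coverage (both endpoints included).
N_SESSIONS = LAST_SESSION - FIRST_SESSION + 1
```

A helper like this can be handy when filtering speeches by year while the source material is indexed by session number, or the reverse.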