Methodology

Unit of analysis

The unit of analysis in the UCDP GED is the ‘event’ – an instance of fatal organized violence. Specifically, an event is defined as:

The incidence of the use of armed force by an organized actor against another organized actor, or against civilians, resulting in at least 1 direct death in either the best, low or high estimate categories at a specific location and for a specific temporal duration (Sundberg and Melander, 2013)

Each instance of organized violence that meets these criteria is recorded as a single observation in the dataset and constitutes a unit of analysis. It follows from this definition that the dataset contains only events for which fatality estimates could be deduced; incidents where it is unclear how many fatalities occurred, or whether any occurred at all, are not included. These criteria are adapted from the UCDP’s base definitions of armed conflict, non-state conflict, and one-sided violence, but with the 25-deaths criterion (per calendar year) removed so as to place the definition on the level of the individual event.

The dataset contains events for all dyads and actors that, per calendar year, surpass the 25 deaths threshold for inclusion – the same threshold that is applied across all of the UCDP’s data on organized violence.
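The two levels of criteria described above, one applied to each event and one applied per dyad and calendar year, can be illustrated with a minimal sketch. The field names (`dyad_id`, `year`, `best`, `low`, `high`) are hypothetical and chosen only for illustration. The sketch also assumes that yearly totals are summed from the best estimate and that the threshold means "at least 25 deaths"; neither assumption is stated in the text.

```python
from collections import defaultdict

YEARLY_THRESHOLD = 25  # deaths per dyad and calendar year, as described above


def is_event(record):
    """Event-level criterion: at least one direct death in any estimate."""
    return max(record["best"], record["low"], record["high"]) >= 1


def included_events(records, threshold=YEARLY_THRESHOLD):
    """Keep events whose dyad reaches the yearly threshold.

    Assumptions (not from the source text): yearly totals are summed from
    the best estimate, and the threshold is read as "at least `threshold`".
    """
    events = [r for r in records if is_event(r)]

    # Sum best-estimate fatalities per (dyad, calendar year).
    totals = defaultdict(int)
    for ev in events:
        totals[(ev["dyad_id"], ev["year"])] += ev["best"]

    return [ev for ev in events
            if totals[(ev["dyad_id"], ev["year"])] >= threshold]
```

In this sketch, a record with no deaths in any estimate is dropped at the event level, while a qualifying event is still excluded if its dyad does not reach the threshold in that calendar year.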

Data Collection

The data collection is founded on search strings run through the Dow Jones Factiva aggregator. This events data search retrieves all articles within specified parameters, in this case all news reports that contain information about individuals killed or injured. The search is done globally, with “intelligent indexing” used for further filtering when feasible. Which media sources are consulted varies depending on how extensively they cover the specific conflict and/or region. UCDP always uses at least one of the global newswires (AFP, Reuters, Xinhua, or Agencia EFE) in addition to BBC Monitoring. BBC Monitoring was specifically chosen because it supplies the text of local news reports, thus providing UCDP with a mixture of reports from international news bureaus and from local sources, including non-English ones. Other local and specialized news sources are also added to further improve coverage (e.g. Radio Okapi for the DRC). We consider media reports to be an indispensable resource for identifying and documenting conflict, as they provide the only viable basis for annual, global coverage. Every year, this results in a pool of approximately 50,000 news reports, which are then evaluated by human coders. Because many of these reports duplicate coverage of the same incident or describe incidents that fall outside of UCDP’s inclusion criteria, approximately 10,000-12,000 events are coded annually.

UCDP also consults reports and data from non-governmental organizations (NGOs) and international organizations (like the UN), case studies, truth commission reports, historical archives and other sources of information. Many of these sources are local, such as INSEC (Informal Sector Service Centre) in Nepal or SOHR (Syrian Observatory for Human Rights) in Syria. For the period 2013-2016, approximately 20% of the events in the UCDP GED are coded using non-media sources.

Source Evaluation

Both the independence of the sources and the transparency of their origins are crucial. Each source is judged according to the context in which it was published, that is, according to the potential interests of the source in misrepresenting violent events. Since most information comes from secondary sources, the project attempts to trace reports back to the primary source (e.g. witness, warring party, journalist, etc.) in order to assess its reliability.

The fatality numbers given here are based on publicly accessible sources. Due to the lack of available information in many conflict zones, it is quite likely that there are more fatalities than given in the best estimate, but it is very unlikely that there are fewer. The fatality estimate is thus best interpreted as creating a baseline, and users should keep in mind that the precision of the numbers belies the uncertainty of the estimates.

Our data are likely to provide seemingly low estimates for two reasons. The first has to do with definitions. A number of factors can preclude a potential conflict event from inclusion in the UCDP GED: if it is unclear which actor was involved, or the status of that actor (e.g. level of organization); unclear status of the incompatibility (for intrastate and interstate conflict events); uncertainty about whether fatalities occurred; too little information to exclude the possibility of double-counting; or event descriptions which do not provide sufficient context to meet coding requirements.

The second reason relates to the reliability of other estimates. For many conflicts, commonly cited estimates are repeated so frequently as to become unquestioningly accepted as truth. In many cases the origin of these estimates is unknown, or they come from one of the warring parties; even where the origin is known, the methodology and definitional guidelines used in generating the estimates are rarely transparent. UCDP employs clear criteria and a systematic approach to data collection in order to increase transparency and reliability.

UCDP can only include events which are reported publicly. However, not all conflict events become public in the first place. All publicly available sources of conflict data (media as well as local and international organizations) are likely to be non-random with respect to which events they report. There is a nascent research field concerned with measuring journalistic coverage and unpacking its implications for the study of organized violence (Baum & Zhukov, 2015; Croicu & Kreutz, 2017; Davenport & Ball, 2002; Drakos & Gofas, 2006; Galtung & Ruge, 1965; Gohdes & Price, 2013; Price & Ball, 2014; Weidmann, 2016). Less attention has been paid to alternative sources of conflict information, such as NGOs and IOs, which must also make strategic decisions on how to allot limited resources. Ultimately, it is impossible to know how many conflict events go undocumented and are therefore missing from conflict datasets. Note, however, that the risk set for this uncertainty is circumscribed to conflict zones: we have high confidence that most countries truly have no or few violent events (e.g. Sweden, Norway), but in countries with ongoing armed conflict there is uncertainty regarding coverage. We encourage end-users to familiarize themselves with these reporting processes, and to consider the implications of this issue for their use of UCDP data.

Coding

Each news report or alternative source is read individually, and any event that contains information on organized violence is entered into the system. The sources are coded by staff who have extensive case knowledge, and vetted by project leaders with decades of experience working with the UCDP.

The data contained within the UCDP GED are those events of organized violence that, in their aggregated form, constitute the UCDP’s country-year datasets on (1) state-based armed conflict (Gleditsch et al., 2002), (2) non-state conflict (Sundberg, Eck & Kreutz, 2012), and (3) one-sided violence (Eck & Hultman, 2007). The UCDP GED thus contains information on three types of organized violence: violence between two organized actors of which at least one is the government of a state, violence between actors of which neither party is the government of a state, and lastly, violence against unarmed civilians perpetrated by organized non-state groups or governments. The UCDP GED contains each building block of the corresponding aggregated country-year datasets in the form of its constituent events. In order for an instance of violence to qualify for inclusion, the circumstances must not only live up to the previously listed definition of an event, but must also fulfill the definitional criteria of a specified type of violence (for instance, for state-based conflict two parties must dispute an incompatibility).

These categories of UCDP organized violence are mutually exclusive, and thus a single event cannot be coded as being an instance of, for example, both non-state and state-based conflict. If instances of violence occur on the same date in the same location, but take place between different actors, these instances will be coded as separate events. It is, for example, possible for police to kill unarmed civilians (one-sided violence) in location x on date y, while battles rage in the same location x on the same date y between rival criminal factions (non-state violence). Since these two instances fulfill different coding criteria, they are coded as separate events. These three categories of violence capture a wide spectrum of organized violence, but not every phenomenon that is sometimes associated with the general concept of ‘conflict’. The theoretical categories leave out phenomena such as ‘rioting’, as well as clashes between police/army and individuals and/or groups that are armed but not sufficiently organized.
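The mutual exclusivity of the three categories can be illustrated with a deliberately simplified sketch. The names `ViolenceType` and `classify`, and the three boolean inputs, are hypothetical. The sketch ignores requirements mentioned above, such as the level of organization of the actors and the disputed incompatibility for state-based conflict, so it should be read as an illustration and not as UCDP's coding procedure.

```python
from enum import Enum


class ViolenceType(Enum):
    STATE_BASED = "state-based"
    NON_STATE = "non-state"
    ONE_SIDED = "one-sided"


def classify(side_a_is_government, side_b_is_government, target_is_civilians):
    """Assign exactly one category to an instance of organized violence.

    Simplification: organization level and incompatibility are not checked.
    """
    if target_is_civilians:
        # Violence against unarmed civilians by a government or organized group.
        return ViolenceType.ONE_SIDED
    if side_a_is_government or side_b_is_government:
        # Two organized actors, at least one of which is a government.
        return ViolenceType.STATE_BASED
    # Two organized actors, neither of which is a government.
    return ViolenceType.NON_STATE


# The example from the text: same location and date, different actors,
# therefore two separate events with different categories.
police_vs_civilians = classify(True, False, True)
faction_vs_faction = classify(False, False, False)
```

Because the function returns a single value per instance, no event can carry two categories at once; the police killings and the factional battles in the example are recorded as two events.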

Each event comes complete with several spatial and temporal locators, such as place name, administrative division, and geographic coordinates, as well as start and end dates, to allow for fine-grained spatial and temporal analysis. Each event is also given a unique ID for easy reference, as well as conflict, dyad, and actor IDs. These IDs for conflicts, dyads, and actors correspond with identifying information found in other UCDP datasets and allow ample possibilities for dataset integration.
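As a rough illustration of how these locators and IDs fit together, the sketch below models an event as a record and aggregates events by conflict ID and year, which is the kind of key that could be used to link to a country-year dataset. All field names are hypothetical and are not taken from the dataset's codebook, and summing the best estimate by the year of the start date is an assumption made for the example.

```python
from collections import defaultdict
from dataclasses import dataclass
from datetime import date


@dataclass(frozen=True)
class Event:
    # Identifiers (hypothetical names)
    event_id: int
    conflict_id: int
    dyad_id: int
    side_a_id: int
    side_b_id: int
    # Spatial locators
    place_name: str
    admin_division: str
    latitude: float
    longitude: float
    # Temporal locators
    date_start: date
    date_end: date
    # Best fatality estimate
    best: int


def fatalities_by_conflict_year(events):
    """Sum best-estimate fatalities per (conflict_id, year of start date)."""
    totals = defaultdict(int)
    for ev in events:
        totals[(ev.conflict_id, ev.date_start.year)] += ev.best
    return dict(totals)
```

The resulting `(conflict_id, year)` keys are what would allow event-level totals to be compared with, or merged into, an aggregated dataset that uses the same conflict IDs.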

Post-Estimation Validation

The sheer number of events included and the many variables that accompany each event have necessitated the introduction of additional quality control instruments to ensure data accuracy and quality. Data quality is checked at least three times. First, the coder runs through a checklist of consistency and streamlining tests. Second, a project manager performs similar tests, and also checks the geocoding through a set routine of visualization. Third, PHP and Python scripts are run on the data to check consistency across IDs, coordinates, fatality counts, and more. While these routines cannot identify all errors in the base coding, they nevertheless ensure high quality and consistency in the final product.
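The scripted third step might resemble the following minimal sketch. It is not UCDP's actual script. The field names are hypothetical, and the specific rules (unique event IDs, coordinates within valid ranges, estimates ordered as low ≤ best ≤ high, start date not after end date) are assumptions about what "consistency" could mean here.

```python
from datetime import date


def check_events(events):
    """Return a list of (event_id, problem) tuples for inconsistent records."""
    problems = []
    seen_ids = set()
    for ev in events:
        eid = ev["id"]

        # IDs: every event should have a unique identifier.
        if eid in seen_ids:
            problems.append((eid, "duplicate event ID"))
        seen_ids.add(eid)

        # Coordinates: must be valid latitude/longitude values.
        if not (-90 <= ev["latitude"] <= 90 and -180 <= ev["longitude"] <= 180):
            problems.append((eid, "coordinates out of range"))

        # Fatality counts: assumed ordering of the three estimates.
        if not (0 <= ev["low"] <= ev["best"] <= ev["high"]):
            problems.append((eid, "fatality estimates not ordered"))

        # Dates: an event cannot end before it starts.
        if ev["date_start"] > ev["date_end"]:
            problems.append((eid, "start date after end date"))
    return problems
```

An empty list means that no record violated the assumed rules; it does not mean the underlying coding is correct, which is why the manual checks come first.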

References

This text was adapted from the following sources:

  • Eck, Kristine & Lisa Hultman (2007) One-Sided Violence against Civilians in War: Insights from New Fatality Data. Journal of Peace Research 44(2): 233-246.
  • Sundberg, Ralph, Kristine Eck & Joakim Kreutz (2012) Introducing the UCDP Non-State Conflict Dataset. Journal of Peace Research 49(2): 351-362.
  • Sundberg, Ralph & Erik Melander (2013) Introducing the UCDP Georeferenced Event Dataset. Journal of Peace Research 50(4): 523-532.

Additional sources cited:

  • Baum, Matthew A & Yuri M Zhukov (2015) Filtering revolution: Reporting bias in international newspaper coverage of the Libyan civil war. Journal of Peace Research 52(3): 384-400.
  • Croicu, Mihai & Joakim Kreutz (2017) Communication technology and reports on political violence: Cross-national evidence using African events data. Political Research Quarterly 70(1): 19-31.
  • Davenport, Christian & Patrick Ball (2002) Views to a kill: Exploring the implications of source selection in the case of Guatemalan state terror, 1977-1995. Journal of Conflict Resolution 46(3): 427-450.
  • Drakos, Konstantinos & Andreas Gofas (2006) The devil you know but are afraid to face: Underreporting bias and its distorting effects on the study of terrorism. Journal of Conflict Resolution 50(5): 714-735.
  • Galtung, Johan & Mari Holmboe Ruge (1965) The structure of foreign news: The presentation of the Congo, Cuba and Cyprus crises in four Norwegian newspapers. Journal of Peace Research 2(1): 64-90.
  • Gleditsch, Nils Petter, Peter Wallensteen, Mikael Eriksson, Margareta Sollenberg & Håvard Strand (2002) Armed conflict 1946-2001: A new dataset. Journal of Peace Research 39(5): 615-637.
  • Gohdes, Anita & Megan Price (2013) First things first: Assessing data quality before model quality. Journal of Conflict Resolution 57(6): 1090-1108.
  • Price, Megan & Patrick Ball (2014) Big data, selection bias, and the statistical patterns of mortality in conflict. SAIS Review of International Affairs 34(1): 9-20.
  • Urlacher, Brian R. (2009) Wolfowitz conjecture: a research note on civil war and news coverage. International Studies Perspectives 10(2): 186-197.
  • Weidmann, Nils B (2016) A closer look at reporting bias in conflict event data. American Journal of Political Science 60(1): 206-218.