T maze

## Motivation

• Conspicuous failures of existing methods
• Success of forecasting models in other behavioral domains
• Increased processing power

## Predicting vs forecasting

• We may have a sound theory but not know whether the antecedent conditions have been satisfied.
• Even with info + theory, randomness can play a role
• Prediction is possible without explanation

## A problem is that these excuses are often used to justify poor forecasts

• Explanation is possible without prediction:
• Pacifists do not abandon Gandhi's worldview just because he said in 1940 that Hitler is not as bad as "frequently depicted" and that "he seems to be gaining his victories without much bloodshed"
• Martin Feldstein predicted that the legacy of the Clinton 1993 budget would lead to stagnation for a decade.
• Prediction is possible without explanation when people have forecasting successes

## What is a good judge?

Two criteria:

• Getting it right
• Thinking the right way

## Getting it right

How do we measure it?

• Accuracy
• True positives at the cost of false alarms?
• Risks of overpredicting vs underpredicting

Should false alarms and hits be weighted equally? E.g., which was riskier in the 1980s:

• Underestimating the Soviet Union, tempting it to test the US's resolve?
• Overestimating it and paying high military costs?

I.e., the risk here is to treat as "wrong" those forecasters who have made value-driven decisions to exaggerate certain possibilities.

• How early?

## Thinking the right way

• Do not violate basic probability theory, i.e., the probabilities assigned to mutually exclusive and exhaustive outcomes should sum to 1
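The coherence requirement can be checked mechanically. The following is a minimal sketch, assuming a forecaster reports one probability per outcome and that the outcomes are mutually exclusive and exhaustive; the function name `is_coherent` and the example numbers are hypothetical.

```python
import math


def is_coherent(probabilities, tol=1e-9):
    """Return True if the values form a valid probability distribution.

    Assumes the outcomes are mutually exclusive and exhaustive.
    """
    in_range = all(0.0 <= p <= 1.0 for p in probabilities)
    return in_range and math.isclose(sum(probabilities), 1.0, abs_tol=tol)


# Hypothetical forecasts over three outcomes
# (status quo, change for the better, change for the worse).
coherent_forecast = [0.6, 0.3, 0.1]
incoherent_forecast = [0.6, 0.4, 0.3]  # adds up to 1.3

print(is_coherent(coherent_forecast))
print(is_coherent(incoherent_forecast))
```

A forecaster who fails this check is not "thinking the right way" regardless of how often the forecasts turn out to be right.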

## Ontological Skeptics

Indeterminacy is due to the properties of the external world: a world that would be just as unpredictable if we were smarter.

• Path dependency, aka increasing returns: QWERTY; Polya's urn (small initial advantages accumulate); the rise of the West (tiny advantages that Europe had: property rights, rule of law, market competition)
• Hard to know whether we face an increasing- or decreasing-returns world, i.e., does history have a diverging branching structure that leads to a variety of possible worlds, or a converging structure that channels us into destinations predetermined long ago?
• Cleopatra's nose

• Complexity theorists, aka the butterfly effect

• Gavrilo Princip
• Great oaks from little acorns. Problem: impossible to pick the influential little acorn before the fact.
• Game theorists: multiple or mixed-strategy equilibria
• Players will second-guess each other to the point where political outcomes, like financial markets, resemble random walks.
• Financial geniuses are statistical flukes

## Psychological Skeptics

We mispredict because of the way our minds work

• Preference for simplicity: "Bashar al-Assad is like Hitler"
• Aversion to ambiguity and dissonance
• People are overconfident in their counterfactual beliefs
• People dislike dissonance. They like to couple good causes with good effects. But detested policies can sometimes have positive effects. E.g., valued allies can have a frightful human rights record.
• People hate randomness.
• e.g., rat experiment
• When we know the base rate and not much else, we'd be better off predicting the most common outcome
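The base-rate point above can be checked with a little arithmetic. The sketch below assumes a hypothetical binary outcome that occurs 60% of the time (the figure is chosen for illustration and is not taken from the experiment). It compares always predicting the more common outcome with "probability matching", i.e., guessing each outcome as often as it occurs. The function names are made up for this example.

```python
import random


def always_most_common(base_rate):
    """Expected accuracy when always predicting the more common outcome."""
    return max(base_rate, 1.0 - base_rate)


def probability_matching(base_rate):
    """Expected accuracy when guessing each outcome as often as it occurs."""
    return base_rate ** 2 + (1.0 - base_rate) ** 2


def simulate_matching(base_rate, trials=100_000, seed=0):
    """Monte Carlo check of the probability-matching formula."""
    rng = random.Random(seed)
    hits = 0
    for _ in range(trials):
        outcome = rng.random() < base_rate  # what actually happens
        guess = rng.random() < base_rate    # independent, matched guess
        hits += outcome == guess
    return hits / trials


BASE_RATE = 0.6  # assumption for illustration only

print(f"always most common:   {always_most_common(BASE_RATE):.2f}")
print(f"probability matching: {probability_matching(BASE_RATE):.2f}")
print(f"simulated matching:   {simulate_matching(BASE_RATE):.2f}")
```

With a 60% base rate, the simple strategy is right 60% of the time, while matching is right only about 52% of the time (0.6 × 0.6 + 0.4 × 0.4). Looking for a pattern in random data costs accuracy.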

## Skeptics' views: 6 hypotheses

• Humans perform no better than chimps at predicting turbulence
• Diminishing marginal returns: casual reader of news will perform as well as expert
• Reversion to the mean: lucky streaks of predictions will not last
• As expertise rises, confidence in forecasts should rise faster than the accuracy of forecasts

## Metrics. The risk of rewarding overly cautious forecasters: calibration vs discrimination

• Calibration is perfect if there is a precise correspondence between subjective and objective probabilities. But calibration rewards cautious forecasters, i.e., those who follow a base-rate strategy
• Discrimination: forecasters score well when they assign higher probabilities to events that happen than to those that don't, with a perfect score for assigning 1 to the former and 0 to the latter
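One common way to put numbers on these two ideas is the Murphy decomposition of the Brier score, sketched below. This is an illustration under stated assumptions rather than the exact scoring procedure used in the study: forecasts are grouped by their exact probability value, outcomes are coded 0 or 1, and the function name and data are hypothetical. A calibration index of 0 is perfect, and a larger discrimination index is better.

```python
from collections import defaultdict


def calibration_discrimination(forecasts, outcomes):
    """Return (calibration_index, discrimination_index).

    calibration:    mean squared gap between each stated probability and
                    the observed frequency of events given that probability.
    discrimination: mean squared gap between those observed frequencies
                    and the overall base rate.
    """
    n = len(forecasts)
    base_rate = sum(outcomes) / n

    # Group outcomes by the probability that was assigned to them.
    groups = defaultdict(list)
    for p, o in zip(forecasts, outcomes):
        groups[p].append(o)

    calibration = 0.0
    discrimination = 0.0
    for p, observed in groups.items():
        freq = sum(observed) / len(observed)
        calibration += len(observed) * (p - freq) ** 2
        discrimination += len(observed) * (freq - base_rate) ** 2
    return calibration / n, discrimination / n


# Hypothetical data: the event occurs 3 times out of 10.
outcomes = [1, 0, 0, 0, 1, 0, 0, 0, 0, 1]

cautious = [0.3] * 10                     # always states the base rate
clairvoyant = [float(o) for o in outcomes]  # always certain, always right

print(calibration_discrimination(cautious, outcomes))
print(calibration_discrimination(clairvoyant, outcomes))
```

Both forecasters are perfectly calibrated, but the base-rate forecaster has zero discrimination. This is why calibration alone rewards caution.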

(optional) For those interested, see discussion on calibration vs discrimination:

## What is the right baseline of comparison?

• Crude algorithm: assign each outcome its historical base rate as its probability
• Predict continuation of past state
• Formal statistical equations
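The first two baselines are simple enough to sketch in a few lines. The code below is a hypothetical illustration: the yearly series of 0s and 1s is invented, the function names are assumptions, and the Brier score (mean squared error of the probability forecasts) is used as one possible yardstick. The third baseline, formal statistical equations, is not shown.

```python
def base_rate_forecast(history):
    """Crude algorithm: probability of the event = its historical frequency."""
    return sum(history) / len(history)


def persistence_forecast(history):
    """Predict that the most recent state continues."""
    return float(history[-1])


def brier_score(forecasts, outcomes):
    """Mean squared error of probability forecasts (0 is perfect)."""
    return sum((f - o) ** 2 for f, o in zip(forecasts, outcomes)) / len(outcomes)


def walk_forward(series, rule):
    """Forecast each period using only the periods that came before it."""
    forecasts = [rule(series[:t]) for t in range(1, len(series))]
    return brier_score(forecasts, series[1:])


# Invented series: 1 = conflict in that year, 0 = no conflict.
series = [0, 0, 0, 1, 1, 1, 1, 0, 0, 0, 0, 0]

print("base rate:  ", round(walk_forward(series, base_rate_forecast), 3))
print("persistence:", round(walk_forward(series, persistence_forecast), 3))
```

Any expert or model should be judged by how much it improves on scores like these, not by its raw hit rate.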

## Results

• Most existing research makes no effort to test its theories on future data
• "isms"
• statistical models
• Tetlock: let's see how well experts perform. 284 participants:
• most with doctorates, almost all with postgraduate training in polsci, econ, international law, diplomacy, journalism
• avg of 12 years of work experience
• academia, think tanks, governments, IOs
• Very thoughtful and articulate
• Broad cross-section of political, econ and national security outcomes

## Results

Source: Tetlock, p. 51

• Humans overpredict rare events
• Experts no better than dilettantes
• All humans far worse than algorithms, even simple ones

## The experts fight back

• Perhaps we didn't select the right experts? But little evidence of that: equally poor regardless of seniority or domain (academia, government, etc.).
• No better at short term vs long term, domestic v international, econ v political.
• Perhaps our dilettantes are really experts. I.e., slightly less specialized, but still well read.
• So let's look at briefly briefed undergraduate students. They are worse, so expertise does matter to an extent.
• Maybe experts are very cautious. I.e., better safe than sorry. So we can correct for various such mistakes. In short, we take out the difference between their average forecast and the base rate for the outcome.
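The correction described in the last bullet can be sketched as a simple recentering. This is a literal reading of the sentence above, not necessarily the adjustment used in the study: the function name is hypothetical, and clipping to the interval from 0 to 1 is an added assumption so that the results remain valid probabilities.

```python
def recenter_forecasts(forecasts, base_rate):
    """Shift forecasts so that their average matches the observed base rate.

    Assumption: results are clipped to [0, 1]; when clipping actually
    occurs, the adjusted mean may no longer equal the base rate exactly.
    """
    bias = sum(forecasts) / len(forecasts) - base_rate
    return [min(1.0, max(0.0, f - bias)) for f in forecasts]


# Hypothetical expert who systematically overpredicts an event
# that happens 30% of the time.
raw = [0.5, 0.6, 0.7, 0.6]
adjusted = recenter_forecasts(raw, base_rate=0.3)

print([round(a, 2) for a in adjusted])
print(round(sum(adjusted) / len(adjusted), 2))
```

The adjustment removes a uniform tendency to over- or underpredict but leaves the ranking of forecasts untouched, so it cannot improve discrimination.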

## Why hedgehogs are here to stay

• media attention

## What data to use?

• Structural indicators are too slow
• Social media too fast
• Event data

## Existing projects

• DARPA ICEWS (2007-present)
• IARPA's
• Peace Research Institute Oslo (PRIO) and Uppsala University UCDP models
• etc.

## Convergent results

• Temporal autoregressive effects are huge: the challenge is predicting onsets and cessations, not continuations
• Spatial autoregressive effects (“bad neighborhoods”) are also huge
• 80% accuracy (in the sense of AUC around 0.8) in the 6 to 24 month forecasting window occurs with remarkable consistency: few if any replicable models exceed this, and models below that level can usually be improved
• Measurement error on many of the dependent variables—for example casualties, coup attempts—is still very large
• Forecast accuracy does not decline very rapidly with increased forecast windows, suggesting that long-term structural factors rather than short-term “triggers” are dominant. Trigger models more generally do poorly except as post hoc “explanations.”
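For readers unfamiliar with AUC, the sketch below computes it directly from its definition: the probability that a randomly chosen case where the event occurred received a higher risk score than a randomly chosen case where it did not, with ties counted as half. The data are invented and the function name is hypothetical. An AUC of 0.5 corresponds to guessing and 1.0 to perfect ranking.

```python
def auc(scores, labels):
    """Area under the ROC curve via pairwise comparison.

    scores: predicted risk for each case; labels: 1 if the event occurred.
    """
    positives = [s for s, y in zip(scores, labels) if y == 1]
    negatives = [s for s, y in zip(scores, labels) if y == 0]
    if not positives or not negatives:
        raise ValueError("need at least one positive and one negative case")

    wins = 0.0
    for p in positives:
        for q in negatives:
            if p > q:
                wins += 1.0
            elif p == q:
                wins += 0.5  # ties count as half a win
    return wins / (len(positives) * len(negatives))


# Invented country-year risk scores and observed onsets.
scores = [0.9, 0.8, 0.35, 0.6, 0.2, 0.1]
labels = [1, 1, 1, 0, 0, 0]

print(round(auc(scores, labels), 3))
```

Because AUC depends only on how cases are ranked, it says nothing about calibration: a model can rank well while stating probabilities that are systematically too high or too low.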

## Where algorithms do well

• Nate Silver performed very well in the 2008 election (not so well in 2016…)
• Routine elections in rich countries like the United States are some of the softest targets in political forecasting. Rules are transparent; high-quality data, including surveys of would-be voters, are often available; and the connection between those data and the outcome of interest is fairly straightforward.

## Where algorithms do less well

• Nate Silver fails too, even for elections
• For international events, we often lack data. We might know the predictors, but be unable to get the data
• Even simple indicators are tricky
• GDP is produced by government agencies
• Some don't even report national economic statistics
• Events are rare
• Most states are "safe"
• Many states are obviously at risk
• a small set is uncertain
• Note: rare events $$\neq$$ Black swans
• Heterogeneous environment
• Is the system changing significantly while we are trying to model it? How far back are data still relevant?
• Changing nature of conflict

## Irreducible sources of errors

• Specification error: no model of a complex, open system can contain all of the relevant variables
• Measurement error: with very few exceptions, variables will contain some measurement error, presupposing there is even agreement on what the “correct” measurement is in an ideal setting
• Predictive accuracy is limited by the square root of the measure's reliability: in a bivariate model, if your reliability is 80%, your accuracy can’t be more than about 90% (√0.8 ≈ 0.89)
• Measurement error biases the coefficient estimates as well as the predictions
• Quasi-random structural error: complex and chaotic deterministic systems behave as if they were random under at least some parameter combinations, e.g., rabbit population

• Rational randomness such as that predicted by mixed strategies in zero-sum games
• Arational randomness attributable to free will. Rule of thumb from our rat-running colleagues: “A genetically standardized experimental animal, subjected to carefully controlled stimuli in a laboratory setting, will do whatever it wants.”
• The effects of natural phenomena: the 2004 Indian Ocean tsunami dramatically reduced violence in the long-running conflict in Aceh
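The rabbit-population example of quasi-random structural error can be illustrated with the logistic map, a textbook deterministic population model. That this is the model the notes have in mind is an assumption, and the parameter values below are chosen only for illustration. For a low growth parameter the population settles to a fixed level, while for a high one two almost identical starting points soon follow unrelated paths.

```python
def logistic_map(r, x0, steps):
    """Iterate x -> r * x * (1 - x), a simple deterministic population model.

    x is the population as a fraction of the maximum; r is the growth rate.
    """
    xs = [x0]
    for _ in range(steps):
        xs.append(r * xs[-1] * (1.0 - xs[-1]))
    return xs


# Stable regime: the population converges to the fixed point 1 - 1/r.
stable = logistic_map(r=2.8, x0=0.2, steps=200)
print("stable regime, final value:", round(stable[-1], 4))

# Chaotic regime: start two runs a billionth apart and compare them.
run_a = logistic_map(r=4.0, x0=0.2, steps=100)
run_b = logistic_map(r=4.0, x0=0.2 + 1e-9, steps=100)
largest_gap = max(abs(a - b) for a, b in zip(run_a, run_b))
print("chaotic regime, largest gap between runs:", round(largest_gap, 4))
```

No randomness enters the model at any point, yet in the chaotic regime a measurement error in the ninth decimal place is enough to make long-range forecasts useless.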

## Feed-forward

Effective policy response: in at least some instances organizations will have taken steps to head off a crisis that would have otherwise occurred.

## Going further

• Nassim Nicholas Taleb. The Black Swan
• Daniel Kahneman. Thinking, Fast and Slow
• Philip Tetlock. Expert Political Judgment
• Nate Silver. The Signal and the Noise