Chance and Counterfactuals[1]

On those interpretations of Quantum Mechanics according to which the wave function for a system delivers probabilities of location, it seems that in any mundane situation, there is always a small chance of some extremely bizarre course of events unfolding. Suppose I drop a plate. The wave function that describes the plate will reckon there to be a tiny chance of the particles comprising that plate flying off sideways.

Suppose we embrace some such scientific theory. What then should we make of counterfactuals? We shall certainly be tempted to think that most ordinary counterfactuals are false. After all, having assimilated the theory, we shall be led to accept:

(1) If I had dropped the plate, it might have flown off sideways.

This in turn will induce us to think that (2) is incorrect:

(2) If I had dropped the plate, it would have fallen to the floor.

It would seem that we should instead embrace:

(3) If I had dropped the plate, it would very likely have fallen to the floor.

We then conclude that those propositions expressed by ordinary counterfactuals like (2) are false.

The threat can be recast in terms of the standard semantics for counterfactuals,[2] which tells us that

(4) ‘P > Q’ is true iff all closest P worlds are Q worlds

Don’t we learn from our chancy science that while at most of the closest worlds where I drop the plate, it falls to the floor, there are a few worlds just as close but where the plate flies off sideways? It thus seems that we should conclude that the ordinary counterfactual is false.
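To make the worry concrete, here is a minimal sketch of semantics (4) evaluated over a small finite model. The worlds, their integer distances, and the helper names (`closest`, `would`, `might`) are hypothetical illustrations. The model simply stipulates, as the chancy science seems to suggest, that one of the equally close drop-worlds is a world where the plate flies off sideways.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class World:
    name: str
    distance: int          # hypothetical distance from the actual world
    drops_plate: bool
    falls_to_floor: bool

def closest(worlds, antecedent):
    """Return the antecedent-worlds at minimal distance (assumes at least one exists)."""
    candidates = [w for w in worlds if antecedent(w)]
    best = min(w.distance for w in candidates)
    return [w for w in candidates if w.distance == best]

def would(worlds, antecedent, consequent):
    """Semantics (4): 'P > Q' is true iff all closest P worlds are Q worlds."""
    return all(consequent(w) for w in closest(worlds, antecedent))

def might(worlds, antecedent, consequent):
    """'Might' read as 'not would not'."""
    return not would(worlds, antecedent, lambda w: not consequent(w))

# Hypothetical model: a sideways world is just as close as the ordinary drop-worlds.
worlds = [
    World("actual", 0, False, False),
    World("w1", 1, True, True),
    World("w2", 1, True, True),
    World("w3", 1, True, False),   # the plate flies off sideways
]
dropped = lambda w: w.drops_plate
floor = lambda w: w.falls_to_floor

print(would(worlds, dropped, floor))                            # False: (2) fails
print(might(worlds, dropped, lambda w: not w.falls_to_floor))   # True: (1) holds
```

On this stipulation (2) comes out false while (1), read as ‘not would not’, comes out true, which is the error-theoretic verdict just described.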

How do we resist this pressure towards an error theory of ordinary counterfactual judgments while retaining the relevant type of scientific theory? I know of two strategies: the one replaces ‘all’ in (4) with ‘most’; the other suggests a particular understanding of ‘closest’ according to which the possibility expressed in (1) can be discounted. After a few remarks about the first strategy, I will devote the bulk of this paper to discussing the second.

First Strategy

The first strategy maintains that the truth of (3) is in fact sufficient for the truth of the English sentence (2). In effect the first strategy suggests replacing the standard semantics by

(5) ‘P > Q’ is true iff most of the closest P worlds are Q worlds

(We need not worry here about exactly what threshold corresponds to ‘most’. Presumably it is vague. Most likely it will also be reckoned context-dependent.) There is a striking intuitive cost of the first strategy, one which flows from the fact that it is perfectly possible that most of the closest P worlds are Q worlds and that most of the closest P worlds are R worlds without it being the case that most of the closest P worlds are Q and R worlds. (This is just an instance of the more general fact that for any n less than 100, it is perfectly possible that n% or more of a given class is F and that n% or more of that class is G, while less than n% is F and G.) The cost is that we have to deny the following inference rule:

Agglomeration

(6) P > Q, P > R ⊢ P > (Q and R)

Agglomeration is overwhelmingly intuitive.[3] A speech of the form ‘If I had dropped the cup, thus-and-so would have happened and if I had dropped the cup, such-and-such would have happened, but it is not the case that if I had dropped the cup, thus-and-so and such-and-such would have happened’ strikes us as profoundly odd.
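The failure of Agglomeration under (5) can be checked by brute force. The sketch below rests on two hypothetical assumptions: there are four equally close P worlds, and ‘most’ means at least three quarters. It enumerates every way Q and R can be distributed over those worlds and collects the distributions on which P > Q and P > R hold but P > (Q and R) fails. Under (4) there are none; under (5) there are twelve.

```python
from fractions import Fraction
from itertools import product

def all_semantics(extension):
    """Semantics (4): P > Q holds iff every closest P world is a Q world.
    `extension` records, for each closest P world, whether the consequent holds there."""
    return all(extension)

def most_semantics(extension, threshold=Fraction(3, 4)):
    """Semantics (5): P > Q holds iff at least `threshold` of the closest P worlds
    are Q worlds. The 3/4 cutoff is a hypothetical stand-in for the vague 'most'."""
    return Fraction(sum(extension), len(extension)) >= threshold

def agglomeration_failures(semantics, n_worlds=4):
    """Return every (Q, R) distribution over n equally close P worlds on which
    P > Q and P > R hold but P > (Q and R) fails."""
    failures = []
    for q in product([False, True], repeat=n_worlds):
        for r in product([False, True], repeat=n_worlds):
            q_and_r = tuple(a and b for a, b in zip(q, r))
            if semantics(q) and semantics(r) and not semantics(q_and_r):
                failures.append((q, r))
    return failures

print(len(agglomeration_failures(all_semantics)))    # 0
print(len(agglomeration_failures(most_semantics)))   # 12
```

Each failure under (5) is a case where Q and R are each missing from exactly one closest P world, but from different ones, so their conjunction holds at only half of them.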

Second Strategy

The aim of this paper is to examine in some detail a second strategy, one which offers the hope of saving Agglomeration. It is inspired by some remarks of David Lewis (1986), and is based on the simple idea that worlds with bizarre low probability outcomes are, ceteris paribus, more distant than worlds without such outcomes. Thus even if the bizarre outcomes do not violate laws of nature, they will be reckoned too distant to undermine ordinary counterfactual judgments. Lewis introduces the notion of a “quasi-miracle”, an event which is both of low probability and has a pattern that is, by our lights, remarkable. Mere low probability does not by itself make an event a quasi-miracle. A good thing too. Assuming our scientific theory, any high probability outcome will divide into a set of low probability subcases. The notion of quasi-miracle will not help if any sequence of events counts as a quasi-miracle on account of its being of low probability that that very sequence, in all its detail, would occur. Rather, quasi-miraculousness consists of low probability in combination with remarkableness. Here is one of Lewis’ examples:

If the monkey at the typewriter produces a 950-page dissertation on the varieties of anti-realism, that is at least somewhat quasi-miraculous….. If the monkey instead types 950 pages of jumbled letters, that is not at all quasi-miraculous. But, given suitable assumptions about what sort of chance device the monkey is, the one text is exactly as improbable as the other (1986, p. 60).

Lewis’ thesis is that if a world contains a quasi-miracle, that detracts from its similarity to this world.

We now have a recipe for salvaging mundane counterfactuals like (2). Worlds in which a bizarre chance event unfolds are quasi-miraculous worlds and as such are, ceteris paribus, further from the actual world than worlds in which such events do not occur. The worlds in which the plate flies off sideways are thus, by virtue of containing a quasi-miracle, more distant from the actual world than worlds in which the plate falls to the floor. Thus (2) is not undermined by the facts postulated by our scientific theory. What of (1)? Lewis is happy to concede that if I had dropped the plate, there would have been a small chance of it flying off sideways. Suppose we read (1) as:

(7) If I had dropped the plate, it would have been the case that its flying off sideways was (nomically) possible

On the semantic proposal at hand, this is perfectly compatible with (2). Lewis points out that there is another reading of claims like (1), where ‘might’ is equivalent to ‘not would not’. On that reading, (1) is false. But on that reading (1) is not secured by the scientific theory and in particular is not secured by (7).

Some will reckon the notion of “remarkableness” too woolly to serve as the basis for an account of the truth value of counterfactuals. Others will have a principled objection to any semantics that appears to tie the truth value of counterfactuals to the contingent make-up of human psychology – which will inevitably be the basis of any articulate distinction between remarkableness and unremarkableness that can do the job here. I shall not pursue these general methodological concerns. Rather I wish to point to four (related) problems that any development of Lewis’ view will run into. For my purposes then, I shall treat “remarkableness” as something of a primitive, assuming a rough and ready sense on the part of readers as to what Lewis had in mind.

Problem 1

Recall that neither remarkableness nor low probability is alone sufficient to render an event a quasi-miracle. Consider then a remarkable event of reasonably high probability. Suppose, as it happens, some monkey at a typewriter is currently so configured that, if left alone, there would be a 20 per cent chance of it typing something that looks much like a novel. You take away the typewriter. Nothing remarkable actually happens. Clearly, the counterfactual

(8) If you hadn’t taken away the typewriter, the monkey wouldn’t have typed something that looks much like a novel

is false. So far, no problem. A monkey’s writing something much like a novel is remarkable but in the situation described does not seem to count as a quasi-miracle, since it is not of low probability. But there is a problem lurking. Recall that high probability outcomes invariably divide into low probability subcases. The same will be true of remarkable high probability outcomes. So consider each particular nomically possible sequence of events (e¹ … eⁿ) in which the monkey types something that looks much like a novel. The disjunction of e¹ … eⁿ is of reasonably high probability. But each of e¹ to eⁿ is of very low probability. Moreover, each of e¹ to eⁿ is remarkable. After all, each of e¹ to eⁿ is a sequence of events in which the monkey writes something very much like a novel. Let us compare those closest worlds w in which the typewriter is not taken away and the monkey types something much like a novel to those closest worlds w* in which the typewriter isn’t taken away and nothing looking much like a novel is produced. Each world w will contain some particular one of the sequences e¹ … eⁿ. Thus each world w will contain a quasi-miracle. Apply Lewis’ similarity metric and we will reckon various w* worlds as closer than any w world on account of the occurrence of a quasi-miracle in each w world. But now (8) comes out true. An intolerable result. General lesson: Whenever remarkable non-low probability outcomes divide into remarkable low probability subcases, Lewis’ account, as it stands, will deliver unacceptable results.
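The structure of Problem 1 can be mimicked in a toy model. All the numbers are hypothetical: a thousand novel-like sequences share the 20 per cent chance, a thousand jumbled sequences share the rest, ‘low probability’ means a chance below 1/1000, and a world's distance is simply 1 if it contains a quasi-miracle and 0 otherwise. This crude metric is only an assumed stand-in for Lewis’ similarity ordering.

```python
from fractions import Fraction

QM_THRESHOLD = Fraction(1, 1000)   # hypothetical cutoff for 'low probability'

def is_quasi_miracle(prob, remarkable, threshold=QM_THRESHOLD):
    """Lewis: a quasi-miracle is low probability *and* remarkable."""
    return remarkable and prob < threshold

def closest_worlds(outcomes):
    """Hypothetical metric: distance 1 if the world's outcome is a quasi-miracle, else 0."""
    dist = {name: int(is_quasi_miracle(p, rem)) for name, (p, rem, _) in outcomes.items()}
    best = min(dist.values())
    return [name for name, d in dist.items() if d == best]

n = 1000
outcomes = {}   # name -> (chance, remarkable?, novel-like?)
for i in range(n):
    outcomes[f"e{i + 1}"] = (Fraction(20, 100) / n, True, True)    # novel-like subcases
for i in range(n):
    outcomes[f"j{i + 1}"] = (Fraction(80, 100) / n, False, False)  # unremarkable jumbles

novel_chance = sum(p for p, _, novel in outcomes.values() if novel)
closest = closest_worlds(outcomes)
sentence_8 = all(not outcomes[w][2] for w in closest)   # (8): no novel at closest worlds

print(novel_chance)   # 1/5: the disjunction is not of low probability
print(sentence_8)     # True: the intolerable verdict on (8)
```

Each novel-like subcase (chance 1/5000) is a quasi-miracle even though their disjunction (chance 1/5) is not, so every closest world is a jumble world and (8) is wrongly verified.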

(Proposals for a fix should be tested against the following simple counterexample recipe. Properties that make for remarkableness in a long sequence of coin flips include: All heads; All tails; Being all of the same orientation. For any fixed number of fair coin tosses, having the last property will be twice as likely as having either of the first two. Suppose being a quasi-miracle requires remarkableness plus a chance below threshold N. Then one can easily describe a case in which a counterfactual sequence of coin flips is just long enough that the chance of that sequence being all of the same orientation is not below N but where the chance of each subcase, All heads and All tails, is lower than N, so that each subcase is quasi-miraculous. . . .)
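The arithmetic of this recipe is easy to make exact. In the sketch below the threshold N = 1/1000 is a hypothetical choice, and `recipe_length` is an illustrative helper that finds the number of tosses at which each of All heads and All tails falls below N while Being all of the same orientation does not.

```python
from fractions import Fraction

def chances(n_flips):
    """Exact chances, for n fair and independent tosses, of All heads, All tails,
    and Being all of the same orientation (the disjunction of the first two)."""
    all_heads = Fraction(1, 2 ** n_flips)
    all_tails = Fraction(1, 2 ** n_flips)
    return all_heads, all_tails, all_heads + all_tails

def recipe_length(threshold):
    """Smallest number of tosses at which All heads and All tails each fall below
    the (hypothetical) quasi-miracle threshold while 'all the same orientation' does not."""
    if not 0 < threshold <= 1:
        raise ValueError("threshold must lie in (0, 1]")
    n = 1
    while True:
        heads, tails, same = chances(n)
        if heads < threshold and tails < threshold and same >= threshold:
            return n
        n += 1

N = Fraction(1, 1000)          # hypothetical threshold N
n = recipe_length(N)
heads, tails, same = chances(n)
print(n, heads, same)          # 10 1/1024 1/512
```

For N = 1/1000 the recipe needs ten tosses: each of All heads and All tails then has chance 1/1024, below N, while their disjunction has chance 1/512, above it.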

Problem 2

Lewis tells a story according to which (7) is perfectly compatible with (2). But it also predicts other compatibilities that are far more jarring, intuitively. Consider the following case. A coin flipper is poised to flip a fair coin a million times. You steal the coin. Consider the counterfactual:

(9) If you hadn’t stolen the coin, the coin flipper wouldn’t have tossed all heads.

A natural enough claim to make in the midst of ordinary thought and talk.[4] And Lewis tells us that it is true. (Granted, there is no nomic prohibition on a world containing a sequence of coin flips which comes up heads each time. But that is a world where a paradigmatically remarkable low probability event occurs, and that world thus contains a quasi-miracle.) Digest Lewis’ similarity metric and we can happily assert (9) while also being willing to assert

(10) If you hadn’t stolen the coin, there would have been a small chance of the coin flipper’s tossing all heads.

There are, obviously, many possible Heads/Tails sequences that the coin flipper might have produced (2^1,000,000, in fact). The sequence All Heads is a remarkable sequence. But there are plenty of other particular sequences that are unremarkable. Call one such sequence S.[5] (All Heads is to S as the monkey’s 950-page dissertation is to the 950-page jumble in Lewis’ original example.) Even if the combination of (9) and (10) does not immediately strike one as strange, the same cannot be said for various combinations of counterfactuals involving relative likelihood claims in their consequents.

Consider

(11) If you hadn’t stolen the coin, the coin flipper’s tossing all heads would have been exactly as likely as his tossing S.

(12) If you hadn’t stolen the coin, the coin flipper’s tossing either all heads or all tails would have been twice as likely as his tossing S.
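The likelihood claims in (11) and (12) can be checked exactly on a scaled-down version of the case. The sketch below assumes twelve tosses rather than a million, so that every sequence can be enumerated, and uses an arbitrary jumbled string as a hypothetical stand-in for S.

```python
from fractions import Fraction
from itertools import product

N_FLIPS = 12   # scaled down from a million so every sequence can be enumerated

def chance(event, n=N_FLIPS):
    """Exact chance of an event (a predicate on H/T strings) under n fair, independent tosses."""
    outcomes = ["".join(seq) for seq in product("HT", repeat=n)]
    return Fraction(sum(1 for s in outcomes if event(s)), len(outcomes))

S = "HTTHHHTHTTHT"   # hypothetical 'unremarkable' sequence standing in for S
all_heads = chance(lambda s: s == "H" * N_FLIPS)
heads_or_tails = chance(lambda s: s in ("H" * N_FLIPS, "T" * N_FLIPS))
tossing_s = chance(lambda s: s == S)

print(all_heads == tossing_s)        # (11): exactly as likely
print(heads_or_tails / tossing_s)    # (12): twice as likely
```

Nothing here depends on the number of tosses: with a million tosses each particular sequence still has the same chance, and the two uniform sequences together still have twice that chance.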

Both (11) and (12) are incontrovertibly true.[6] Further, Lewis’ account tells us that

(13) It is not the case that: If you hadn’t stolen the coin, the coin flipper wouldn’t have tossed S

(since S does not constitute a quasi-miracle). And we have already accepted

(9) If you hadn’t stolen the coin, the coin flipper wouldn’t have tossed all heads.

(since tossing all heads does constitute a quasi-miracle). Further, assuming Agglomeration (whose consistency with the second strategy is to my mind the main positive virtue of Lewis’ proposal), we have

(14) If you hadn’t stolen the coin, the coin flipper wouldn’t have tossed either all heads or all tails.

The trio of (11), (9) and (13) strikes me as very odd indeed. The trio of (12), (13) and (14) strikes me as even worse. Having claimed that one outcome of a non-actual process would have been twice as likely as another, it seems absurd to outright assert that the more likely outcome would not have occurred while denying that the less likely outcome would not have occurred. General lesson: Once the relative likelihoods of remarkable events, as compared with other (less or equally likely) unremarkable events, are fully in view, Lewis’ proposal, as it stands, delivers unacceptable results.

Problem 3

Sometimes we realize that it would be pretty surprising if a remarkable thing never happened. Our scientific theory might say that it is very unlikely indeed that an atom will perform a certain patterned motion of geometrical significance at a particular time but that it is pretty likely that sooner or later some atom will perform that patterned motion. Suppose, to illustrate, there are 2^10,000,000 coin flippers, f¹ … fⁿ. Each is poised to flip a coin a million times. I arrange for their coins to be stolen. It is quite clearly false that

(15) If I hadn’t stolen the coins, none of the coin flippers would have flipped all heads.

It would, after all, have been quite surprising if none of them had flipped all heads. However, bearing in mind (9), Lewis’ account would have it true that

(16) If I hadn’t stolen the coins, f¹ wouldn’t have flipped all heads.

(After all, when (9) is asserted, one needn’t worry, it would seem, about whether there are other similar coin flippers elsewhere, in distant lands or times.)

Similarly,

(17) If I hadn’t stolen the coins, f² wouldn’t have flipped all heads.

And so on, for each individual coin flipper.

By Agglomeration we get

(18) If I hadn’t stolen the coins, none of f¹ to fⁿ would have flipped all heads.

But this contradicts what we noticed earlier, namely that (15) is obviously false. General lesson: There are remarkable event types such that it would be surprising if an event of that type never occurred in some suitably long patch of history. Combine this observation with Agglomeration and Lewis’ theory, as it stands, delivers unacceptable results.
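The probabilistic point behind the falsity of (15) can be sketched numerically, assuming fair coins and independent flippers. Since floating point cannot represent a chance of 2^-1,000,000, the sketch uses hypothetical scaled-down numbers (2^24 flippers, 20 tosses each) and only records the exponent for the numbers in the text.

```python
import math

def chance_none_all_heads(n_flips, n_flippers):
    """Chance that no flipper tosses all heads, i.e. (1 - 2**-n_flips) ** n_flippers,
    assuming fair coins and independent flippers; computed in log space."""
    return math.exp(n_flippers * math.log1p(-(2.0 ** -n_flips)))

# Scaled-down hypothetical: 2**24 flippers, each tossing 20 times.
single = 1 - 2.0 ** -20                   # chance a given flipper avoids all heads
none = chance_none_all_heads(20, 2 ** 24)
print(f"{single:.7f}  {none:.2e}")

# The text's own numbers cannot be evaluated as floats (2**-1,000,000 underflows),
# but the expected number of all-heads runs is 2**(10,000,000 - 1,000,000), and since
# (1 - p)**m <= exp(-p*m), the chance that none occurs is below exp(-2**9,000,000).
log2_expected = 10_000_000 - 1_000_000
```

Each individual ‘wouldn’t’ concerns an outcome with a chance of roughly one in a million, yet even in this small version the chance that no flipper at all tosses all heads is on the order of 10^-7; with the numbers in the text it is vanishingly smaller.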

Problem 4

A related worry. In passing, Lewis tells us that we needn’t concern ourselves very much with the possibility that the actual world contains lots of quasi-miracles:

What if, contrary to what we believe, our own world is full of quasi-miracles? Then other-worldly quasi-miracles would not make other worlds dissimilar to ours. But if so, we would be very badly wrong about our own world, so why should we not turn out to be wrong also about which counterfactuals it makes true? I say that the case needn’t worry us (1986, p. 61).

But isn’t it obvious that the world is full of quasi-miracles, construed as low probability events that we would find remarkable once pointed out? Consider, for example, the fact that the apparent size of the sun is that of the moon, the often discussed coincidences between the lives of Kennedy and Lincoln,[7] the fact that the acceleration of gravity at the surface of the earth multiplied by one period of the earth's orbit is roughly equal to the speed of light, Bode’s Law concerning the relationship of the mean distances of the planets from the sun (misnamed because it describes a coincidence, not a law), facts describing a particular person’s getting thirteen cards of the same suit in a bridge hand (at odds of 4 in 635,013,559,600), the fact that such and such drew in a single breath one or more molecules from each of the last gasps of the twelve disciples … and so on.[8] It is hard to see why facts such as these should not count as quasi-miracles. But if they do, then the world contains lots and lots of them. General lesson: If low probability remarkable events make for dissimilarity, that had better not be because one supposes that the actual world does not itself contain plenty of them.
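Two of the figures in this list are easy to verify. The sketch below checks the bridge odds, using the fact that there are C(52, 13) possible hands of which four are single-suited. It also compares g multiplied by one year with c, assuming standard gravity (9.80665 m/s²) and a Julian year of 365.25 days; both constants are choices made for illustration.

```python
import math

# Chance that a given 13-card bridge hand is all one suit: 4 suits over C(52, 13) hands.
hands = math.comb(52, 13)
print(hands)                  # 635013559600
p_one_suit = 4 / hands

# g times one (Julian) year, compared with the speed of light.
g = 9.80665                   # standard gravity, m/s^2 (assumed value)
year_s = 365.25 * 24 * 3600   # Julian year in seconds (assumed period)
c = 299_792_458               # speed of light, m/s
ratio = g * year_s / c
print(f"{ratio:.3f}")
```

The ratio comes out at about 1.03, so the gravitational ‘coincidence’ holds to within a few per cent rather than exactly.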

Conclusion

Can Lewis’ account be fixed? I hope that the problems make clear that any satisfactory development of Lewis’ approach will require selective appeal to contextualism concerning remarkableness, perhaps even in combination with a denial of Agglomeration.[9] We can well anticipate progress being made on our problems by suitable appeal to a context-dependent grain of description that determines which quasi-miracles are relevant, or to some rule of attention according to which the salience of some low probability event enhances its closeness.[10] One is reminded here of discussions of knowledge where selective appeal to contextualism, sometimes in combination with a denial of epistemic closure, is used to ward off the threat of scepticism. I hesitate to claim that no such package can be made more palatable than the error-theoretic alternative.

In closing I might mention that my own preference is to opt for the most straightforward version of Robert Stalnaker’s semantics for counterfactuals, in which, for any possibility that P and any world w, there is a unique closest world to w where P. I realize, of course, that this is to give up altogether on the Lewisian idea of analyzing counterfactual closeness in terms of similarity, and to give up on a thesis of Humean Supervenience (since it becomes hard to resist allowing for pairs of worlds which are intrinsic duplicates but not counterfactual duplicates). It is also to give up on all neo-verificationist analyses of counterfactual discourse, since the closeness relation between worlds and the counterfactual operator on propositions form a family into which there is no entering reductive wedge. From this perspective, of course, matters are very different when it comes to the problems at hand. Suppose I do not drop a plate at t and the world is chancy. There is a closest world where I drop the plate at t. If the plate flies off sideways at that world, the counterfactual is false. Otherwise it is true.

But how then can we know the truth of the counterfactual that if I had dropped the plate it would have fallen to the floor? Doesn’t this require an utterly mysterious kind of modal insight? Consider a happy case in which I make a counterfactual judgment of this sort and the closest world where the antecedent is true is one where the plate I am speaking of falls to the floor. Consider an unhappy case in which the plate I am speaking of flies off sideways at the closest world at which the plate is dropped (at the time I am speaking of). Suppose that there are many more happy cases than unhappy cases, but that there are unhappy cases. The skeptical challenge, when articulated, will have a familiar shape: ‘We cannot discriminate the happy case from the unhappy one. Since we are making a mistake in the unhappy case, we do not know in the happy one.’ It remains to be explained why, if we are not skeptics, we should nevertheless succumb to skepticism in this particular case.

I hope that this paper has provided at least some motivation for my preferred orientation. For those intent on pursuing other strategies, I hope at least to have illustrated some of the myriad pitfalls that any such strategy must try to avoid.

References

Hawthorne, John. 2004. Knowledge and Lotteries. Oxford University Press.

Kahneman, Daniel and Tversky, Amos. 1982. ‘Subjective Probability: A Judgment of Representativeness,’ in Judgment under Uncertainty: Heuristics and Biases. Kahneman, Slovic, and Tversky (eds.). Cambridge University Press.

Lewis, David. 1973. Counterfactuals. Blackwell.

Lewis, David. 1986. ‘Counterfactual Dependence and Time’s Arrow,’ in Philosophical Papers Volume II. Oxford University Press. 32-66.

Lewis, David. 1999. ‘Elusive Knowledge,’ in Papers in Metaphysics and Epistemology. Cambridge University Press. 418-445.

Stalnaker, Robert. 1968. ‘A Theory of Conditionals,’ in Studies in Logical Theory, N. Rescher (ed.). Blackwell: Oxford.

Stalnaker, Robert and Thomason, Richard. 1970. ‘A Semantic Analysis of Conditional Logic,’ Theoria 36: 23-42.

[1] I am grateful here for discussions with and comments from Frank Arntzenius, Adam Elga, Tamar Gendler and David Manley.

[3] The standard semantics for counterfactuals reckons Agglomeration valid. That is one intuitive virtue of that semantics.

[4] Though even here, of course, it is not so hard to induce retraction by, e.g., telling a story in which we build a fair lottery around our coin flipper, such that for each particular sequence S of coin flips, there would be a lottery ticket which wins iff S occurs. Suppose we decide not to build a fair lottery around the coin flipper. We steal the coin instead. Still, we could have built a fair lottery around the coin flipper in which, say, ticket number 1 corresponds to All Heads. And it seems very bad to outright assert that had we run such a lottery, ticket number 1 would have lost. (Thanks to Adam Elga here.) Moreover, it is also quite easy to get into a frame of mind in which one thinks that, in effect, a wave function is a fair lottery, and thus a frame of mind in which an assertion of mundane counterfactuals like (2) is tantamount to an assertion that if a certain fair lottery had been run, certain tickets would have lost. The puzzles that arise are similar to those that I have written about at length elsewhere (Hawthorne 2004).

[5] While it is not my purpose here to unpack the notion of remarkableness, it is obvious enough that our comparative sense of remarkableness is connected to the fact that our conditional probability that a series of coin flips is the outcome of a chance process is much lower on it being All Heads than it is on it being S.

[6] Notwithstanding the fact that ordinary people are notoriously bad at comparative likelihood judgments when one sequence seems more “representative” of the randomness of the process than another to which it is being compared. See Kahneman and Tversky (1982).

[7] See, for example, the Skeptical Inquirer, Sept/Oct 1998.

[8] A rich source of examples like these can be found on the Dartmouth College “Chance” website at http://www.dartmouth.edu/~chance.

[9] Though the latter concession would lead one to wonder whether there was any issue of substance between the first and second strategies.

[10] Cf. the “Rule of Attention” in ‘Elusive Knowledge,’ in Lewis (1999). Note, though, that adding a dose of contextualism along these lines might very well indict various of the claims made in Lewis (1986), such as ‘If Nixon had pressed the button, there would not have been a quasi-miracle,’ where the nomic possibility of a quasi-miracle is obviously very salient and yet discounted.