Tuesday, February 05, 2013

Preliminary Evidence That the World Is Simple (An Exercise in Stupid Epistemology)

Is the world simple or complex? Is a simple hypothesis more likely to be true than a complex hypothesis that fits the data equally well? The issue is gnarly. Sometimes the best approach to a gnarly issue is crude stupidity. Crude stupidity is my plan today.

Here's what I did. I thought up 30 pairs of variables that would be easy to measure and that might relate in diverse ways. Some variables were physical (the distance vs. apparent brightness of nearby stars), some biological (the length vs. weight of sticks found in my back yard), and some psychological or social (the S&P 500 index closing value vs. number of days past). Some I would expect to show no relationship (the number of pages in a library book vs. how high up it is shelved in the library), some I would expect to show a roughly linear relationship (distance of McDonald's franchises from my house vs. MapQuest estimated driving time), and some I expected to show a curved or complex relationship (forecasted temperature vs. time of day, size in KB of a JPG photo of my office vs. the angle at which the photo was taken). See here for the full list of variables. I took 11 measurements of each variable pair. Then I analyzed the resulting data.

Now, if the world is massively complex, then it should be difficult to predict a third datapoint from any two other data points. Suppose that two measurements of some continuous variable yield values of 27 and 53. What should I expect the third measured value to be? Why not 1,457,002? Or 3.22 x 10^-17? There are just as many functions (that is, infinitely many) containing 27, 53, and 1,457,002 as there are containing 27, 53, and some more pedestrian-seeming value like 44. On at least some ways of thinking about massive complexity, we ought to be no more surprised to discover that third value to be over a million than to discover that third value to be around 40. Call the thesis that a wildly distant third value is no less likely than a nearby third value the Wild Complexity Thesis.
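To make the point about 27, 53, and an arbitrary third value concrete, here is a minimal sketch in Python. It assumes, purely for illustration, that the three measurements sit at x = 1, 2, 3, and it builds just one of the infinitely many functions through them: the unique quadratic. The name `quadratic_through` is hypothetical, not something from the original analysis.

```python
from fractions import Fraction


def quadratic_through(y1, y2, y3):
    """Return the unique quadratic f with f(1)=y1, f(2)=y2, f(3)=y3.

    Assumption for illustration: the measurements are taken at x = 1, 2, 3.
    Built from the Lagrange basis polynomials for those three x-values.
    """
    y1, y2, y3 = Fraction(y1), Fraction(y2), Fraction(y3)

    def f(x):
        x = Fraction(x)
        return (y1 * (x - 2) * (x - 3) / 2      # equals y1 at x=1, 0 at x=2,3
                - y2 * (x - 1) * (x - 3)        # equals y2 at x=2, 0 at x=1,3
                + y3 * (x - 1) * (x - 2) / 2)   # equals y3 at x=3, 0 at x=1,2

    return f


# A pedestrian third value and a wild one are fitted equally easily.
tame = quadratic_through(27, 53, 44)
wild = quadratic_through(27, 53, 1457002)
for name, f in (("tame", tame), ("wild", wild)):
    print(name, [int(f(x)) for x in (1, 2, 3)])
```

Both curves pass exactly through the first two measurements, so the first two data points alone give no mathematical reason to prefer one third value over the other.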

I can use my data to test the Wild Complexity Thesis, on the assumption that the variables I have chosen are at least roughly representative of the kinds of variables we encounter in the world, in day-to-day human lives as experienced in a technologically advanced Earthly society. (I don't generalize to the experiences of aliens or to aspects of the world that are not salient to experience, such as Planck-scale phenomena.) The denial of Wild Complexity might seem obvious to you. But that is an empirical claim, and it deserves empirical test. As far as I know, no philosopher has formally conducted this test.

To conduct the test, I took the dependent variable from each of the 30 pairs and used each two consecutive observations to predict the value of the next observation in the series (the 1st and 2nd observations predicting the value of the 3rd, the 2nd and 3rd predicting the value of the 4th, etc.), yielding 9 predictions per variable, or 270 predictions for the 30 variables. I counted an observation "wild" if its absolute value was more than 10 times the maximum of the absolute values of the two previous observations or if its absolute value was below 1/10 of the minimum of the absolute values of the two previous observations. Separately, I also looked for flipped signs (either two negative values followed by a positive or two positive values followed by a negative), though most of the variables only admitted positive values. This measure of wildness yielded three wild observations out of 270 (1%) plus another three flipped-sign cases (total 2%). (A few variables were capped, either top or bottom, in a way that would make an above-10x or below-1/10th observation analytically unlikely, but excluding such variables wouldn't affect the result much.)
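For readers who want to try the wildness count on their own data, here is a rough Python sketch of the rule just described. It is a reconstruction under stated assumptions, not the spreadsheet I actually used: the function name `count_wild` and the `factor` parameter are hypothetical, the upper bound is read as "more than `factor` times the max", and the example series is made up.

```python
def count_wild(series, factor=10.0):
    """Count wild and flipped-sign observations in one series of measurements.

    Each observation from the 3rd onward is compared with the two before it.
    Assumed reading of the rule: an observation is wild if its absolute value
    exceeds factor * max(|a|, |b|) or falls below min(|a|, |b|) / factor.
    Returns (wild, flipped, either, predictions).
    """
    wild = flipped = either = 0
    for a, b, c in zip(series, series[1:], series[2:]):
        hi = max(abs(a), abs(b))
        lo = min(abs(a), abs(b))
        is_wild = abs(c) > factor * hi or abs(c) < lo / factor
        # Flipped sign: two negatives then a positive, or two positives then a negative.
        is_flipped = (a < 0 and b < 0 and c > 0) or (a > 0 and b > 0 and c < 0)
        wild += is_wild
        flipped += is_flipped
        either += is_wild or is_flipped
    return wild, flipped, either, max(len(series) - 2, 0)


# Made-up series, for illustration only.
example = [27, 53, 44, 1457002, 50, 60, -5]
print(count_wild(example))            # criterion of ten
print(count_wild(example, factor=2))  # a looser criterion
```

A series of 11 measurements gives 9 predictions, so 30 such series give the 270 predictions mentioned above. Tracking "either" separately avoids double-counting an observation that is both wild and sign-flipped.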

So it looks like the Wild Complexity Thesis might be in trouble. Now admittedly a caveat is in order: If the world is wild enough, then I probably shouldn't trust my memory of having conducted this test (since maybe my mind with all its apparent memories just popped into existence out of a disordered past), or maybe I shouldn't trust the representativeness of this sample (I got 2% wild this time, but maybe in the next test I'll get 50% wild). However, if we are doubtful about the results for either of those reasons, it might be difficult to escape collapse into radical skepticism. If we set aside radically skeptical worries, we might still wonder how wild the world is. These results give us a preliminary estimate. To the extent the variables are representative, the answer seems to be: not too wild -- though with some surprises, such as the $20,000 listed value of the uncirculated 1922 Lincoln wheat penny. (No, I didn't know about that before seeking the data.)

If we use a Wildness criterion of two (two times the max or 1/2 the min), then there are 33 wild instances in 270 observations, or about 12%, overlapping in one case with the three flipped-sign cases, for 13% total. I wouldn't take this number too seriously, since it will presumably vary considerably depending on the variables chosen for analysis -- but still it's smaller than it might have been, and maybe okay as a first approximation to the extent the variables of interest resemble those on my list.

I had meant to do some curve fitting in this post, too -- comparing linear and quadratic predictions with more complex predictions -- but since this is already a good-sized post, we'll curve fit another day.

I admit, this is a ham-handed approach. It uses crude methods, it doesn't really establish anything we didn't already know, and I'm sure it won't touch the views of those philosophers who deny that the world is simple (who probably aren't committed to the Wild Complexity Thesis). I highlight these concessions by calling the project "stupid epistemology". If we jump too quickly to clever, though, sometimes we miss the necessary groundwork of stupid.

Note: This post was substantially revised Feb. 6.


Michael Caton said...

I read this when you posted it and I've been thinking about it since then. It's a pretty interesting experiment, but I think there is a clear and non-mysterious answer here.

You state: "I can use my data to test the Wild Complexity Thesis, on the assumption that the variables I have chosen are at least roughly representative of the kinds of variables we encounter in the world, in day-to-day human lives as experienced in a technologically advanced Earthly society. (I don't generalize to the experiences of aliens or to aspects of the world that are not salient to experience, such as Planck-scale phenomena.)"

We only perceive and understand a very narrow slice of the universe. By restricting the variables to those things relevant to human experience, you're introducing a massive bias. And that bias is likely to be toward non-wild variables. Why?

It seems trivial to say that the relationships among the extremely man-made phenomena you've listed (library books, McDonald's) are likely to present as simple, highly clustered sets of datapoints. That this should be so even for the non-man-made things like stars is more interesting. I think the reason has to do with the way our nervous systems came to be and how they're designed. Something as complex as a tissue capable of representing these things, i.e. the nervous system, is more likely to organize itself along the lines of simpler, easier-to-predict (more clustered, less wild) variables.

That is to say, evolution is more likely to produce replicators that get and act on information about non-wild variables, and that restricts what we as products of evolution perceive in the first place. For some variable where the next value is likely to be wildly distant, what's the advantage in developing sense organs to detect it or a nervous system that can store and compare it? Why bother? Consequently even by picking the natural objects that we can notice, we can't escape enriching the set of chosen variables for non-wildness, because we're not built to experience or notice the patterns of wild variables in the first place.

The obvious next question is how we would go about decreasing our bias toward non-wildness when selecting these variables. We might want to do exactly the opposite of what you suggested, and include ONLY Planck-scale phenomena. If we're able to choose variables far from the domain of human experience and they're still non-wild, then that does a better job of making the Wild Complexity Thesis unlikely.

Eric Schwitzgebel said...

Michael: Yes! I have been thinking a similar thing. I see two options: Find a rigorous, non-humanocentric way of choosing variables. That seems hard, and the technique seems likely to be contentious, but maybe it could be done. Or stick with the "stupid" haphazard choice of confessedly humanocentric variables and confess the relativity of the conclusion.

Eric Schwitzgebel said...

(That last option might be broadly Kantian.)

Michael Caton said...

I think critics of trying to find less humanocentric variables (your first option) would be hard-pressed to say that we must assume any one set of variables is just as humanocentric as any other, or that we can't tell which is more so. Or at least, they couldn't assert this without also making a pretty strong statement about the possibility of knowledge.