During discussions in various algorithms communities, and with friends and colleagues, I have found that many people hold skewed notions of the words “probability”, “randomness” and “non-determinism”, using them out of context or with inaccurate semantics.
This blog post was motivated in part by a desire to dispel some of those inaccurate notions, and also to present some interesting examples that may help you appreciate the subtlety of expression that science requires. I am writing this on a very tight schedule, so it may end abruptly (I shall follow up with more entries in the near future).
This topic is highly “academic”, and you may appreciate that academicians do not throw fancy words around all the time: when they do use a word, they mean it! I hope to cultivate in CS students some regard for the highly precise nature of the mathematical science they chose to study.
Let’s begin with a fun example my dad (a statistician) used to tell me. Imagine two people reciting numbers one at a time, and two observers writing down the numbers as they are spoken. Here is what the candidates have produced so far:
1. Candidate one has said: 1, 5, 2, 9, 7, 3, 2, 1
2. Candidate two has said: 2, 2, 2, 2, 2, 2, 2
Checkpoint 1: Now I ask you: can you predict the next number that either candidate will utter? Please also write down the reason(s) for your answer.
Keep that answer to yourself for the moment. In the meantime, some more info about our contestants:
1. Candidate one is a professor of statistics who specialises in pseudo-random number generation.
2. Candidate two has a severe intellectual disability (so far as our contemporary understanding of his condition goes).
Checkpoint 2: Now I ask again: can you predict the next number either of them will utter? Keep your answer to yourself once more.
In the meantime, some more gossip about these candidates: both people were initially told to produce random numbers.
Checkpoint 3: Again, note down your answer and your reason for it.
Let us now debate which of the next numbers, from either candidate, would qualify as “random”. Let us take randomness to be the property of being unable to predict the next number that will be produced.
1. Candidate one: He knew he was supposed to generate “random numbers”. He is a professor, aware of the statistical properties a random sequence of numbers should exhibit, and the sequence he has generated does exhibit some of those properties. I shall venture a comment here, and you can make up your own mind: given enough of the numbers he has generated, and assuming he continues to follow the “statistical properties of randomness”, the possibilities for his next number narrow considerably (since he would try to maximise the sequence’s adherence to those properties).
2. Candidate two: He probably has no idea what he is supposed to do beyond generating numbers. While we know he has generated a run of ‘2’s consecutively, we cannot assume any properties of, or any process behind, the sequence. Maybe he has OCD and will generate a specific number of ‘2’s before switching to another number? Maybe the only number he knows is ‘2’? Maybe he’ll stop at the next number just for the heck of it?
Based on the definition of randomness stated above (conceding that it is my subjective definition), I am compelled to find that the series of ‘2’s is a random series, whereas the first series is not, in fact, random.
This is a critical property of randomness that very few people seem to grasp. Randomness of an outcome is a property of the process used to generate the outcome, not of the outcome that was actually generated.
To provide a simpler analogy: will you assume that, just because your home has had tap water for the previous hour, you will get tap water indefinitely? Or, conversely, that because the tap water at your house keeps getting cut off abruptly, it will never come back? As you can see, the way you’d predict the future supply of tap water is based on the process used to provide it. It is the same with random series: making assumptions based on the numbers themselves is akin to predicting tap water availability based on how much water you have collected in the bucket.
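Since randomness lives in the process, a sequence that passes every eyeball test can still be perfectly predictable. As a minimal sketch (using Python’s standard `random` module, whose Mersenne Twister generator is a fully deterministic process once seeded):

```python
import random

# Eight digits that "look random" -- but the process behind them
# (Mersenne Twister plus a known seed) is completely deterministic.
rng = random.Random(42)
looks_random = [rng.randint(0, 9) for _ in range(8)]

# Anyone who knows the process can predict every single digit:
rng2 = random.Random(42)
prediction = [rng2.randint(0, 9) for _ in range(8)]

assert looks_random == prediction
```

Judged by the outcome alone, the first list might pass for random; judged by the process, it is anything but.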
Sidebar: Frequently, people use the word “random” to indicate “high variance”, which is technically inaccurate.
Law of Averages:
Another fun statistician joke: A guy goes to his doctor for heart surgery. The doctor tells him his chances of survival are 100%. The happy yet skeptical patient asks, “How come? I was told that the survival rate for this surgery is only 10%.” The doctor promptly replies, “That is precisely correct. It is because all nine of my previous patients died during this surgery that I am supremely confident of your survival!”
This example shows, in part, the danger of predicting the future outcomes of a process purely from its past outcomes, without looking into the process itself. Look at it this way: would you go to a doctor who has had a 100% fatality rate across all his prior patients? On the other hand, if he were the best heart surgeon in the world and had taken on extreme-stage cases the past nine times, would you rather trust him, or another doctor who has a zero fatality rate but has left patients with permanent disabilities where any other doctor would have achieved a full recovery?
Essentially, I want to reinforce two points:
1. Randomness is a property of the process, not of its outcome. Never make assumptions of randomness based purely on the outcomes of the process (cryptographers will tell you of the extreme dangers of doing so).
2. Randomness is about unpredictability! It is not about high variance in outcomes; it is about not being able to predict the outcome. Even if you get a million ‘2’s in a row, if you are unable to predict the million-and-first number, it is a random sequence. Conversely, if you find a sequence of four statistically random-looking numbers and can predict the fifth, it is NOT a random sequence.
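To make the “million ‘2’s” point concrete, here is a hypothetical process of my own devising that is genuinely random by our definition even though its output is monotonous: it emits 2 almost always, switching to another digit with a tiny probability at each step.

```python
import random

def stubborn_twos(rng, switch_prob=1e-6):
    """Emit 2 at each step, except with probability switch_prob emit
    some other digit. The process is random: no run of 2s, however
    long, lets you predict the next value with certainty."""
    while True:
        if rng.random() < switch_prob:
            yield rng.choice([0, 1, 3, 4, 5, 6, 7, 8, 9])
        else:
            yield 2

gen = stubborn_twos(random.Random())
first_thousand = [next(gen) for _ in range(1000)]
# Almost certainly all 2s -- yet still an unpredictable, random sequence.
```

The outcome looks like candidate two’s output, but knowing the process tells you the sequence is random; the outcome alone could never tell you that.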
Non-determinism of outcome vs. Non-determinism of process:
Everyone, I hope, is familiar with Schrödinger’s cat. I keep seeing queries in algorithms forums about “non-deterministic” algorithms from people who cannot distinguish between determinism and accuracy.
Schrödinger’s cat is a perfect example of non-determinism, or unpredictability, or randomness. There is no parameter in the experiment that allows us to determine the state of the cat while the box is closed. The outcome of the experiment is non-deterministic, and since the outcome is discrete and binary, any decision based on it can change drastically from run to run.
Another example of non-determinism is probabilistic primality testing. Given a number, the test may return a wrong verdict: in the common tests, a composite number can be mistakenly declared “probably prime”.
Let me put it this way:
In non-deterministic outcomes, the outcomes themselves are well-defined and you know all possible outcomes; you just don’t know whether the outcome you obtained was correct or incorrect. A non-deterministic Turing machine is a good example of this. Like Schrödinger’s cat, an NTM can be in multiple states at the same time; yet for each state, the transition to be taken next is well-defined and deterministic. Similarly, in a probabilistic primality-testing algorithm, each step is well-defined and accurately executed. The algorithm itself has no non-determinism built into it: given the same input, it will behave exactly the same every time (although the answer you get may be incorrect). To put it another way, an algorithm to test for the survival of the cat calls for opening the box. This algorithm is invariant: you always deterministically open the box and look inside to obtain the outcome. The outcome itself is something you don’t know until the algorithm has completed.
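As a sketch of a deterministic process with an uncertain outcome, here is a Fermat primality test with a fixed set of bases. This is my own illustrative code, not a production-grade test: every run on the same input takes exactly the same steps, yet the “probably prime” verdict can be wrong (341 = 11 × 31 fools base 2).

```python
def fermat_test(n, bases=(2, 3, 5, 7)):
    """Deterministic process: same n, same bases, same steps every run.
    False means n is proven composite; True means "probably prime",
    which can be wrong."""
    if n < 2:
        return False
    for a in bases:
        if a % n == 0:
            continue  # skip bases that are multiples of n (tiny n)
        if pow(a, n - 1, n) != 1:
            return False  # Fermat witness found: definitely composite
    return True  # survived every base: probably prime

print(fermat_test(97))               # True: 97 really is prime
print(fermat_test(100))              # False: proven composite
print(fermat_test(341, bases=(2,)))  # True, yet 341 = 11 * 31
```

The process here is as invariant as opening the box; the uncertainty lives entirely in the outcome.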
Now don’t get me wrong: I’m not saying a non-deterministic-outcome algorithm has to be deterministic in its process. The two are orthogonal properties. Let’s look at non-deterministic-process algorithms:
Randomised algorithms, on the other hand, use some source of entropy within the algorithm itself; their processes are themselves non-deterministic. Ideally, until you reach a decision point in the algorithm, you have no way of forecasting which decision will be taken at that point. Monte Carlo methods are an example of such algorithms (though pseudo-random numbers do, in fact, allow us to predict their decision points). Put another way: if you run the same Monte Carlo method twice, you follow different steps. A Schrödinger’s-cat experiment run with Monte Carlo methods would vary based on the parameter you make non-deterministic; for example, it might open the box after a different time interval each run, and you would (ideally) have no way of predicting when the box would be opened.
An example of a non-deterministic process leading to a deterministic outcome is the use of Monte Carlo methods to approximate π (the most common example), or, to be more concrete, to find roots of polynomials. Such an algorithm will generally never follow the same state transitions twice, yet it always leads to the same deterministic result (and even if the algorithm never quite reaches the exact answer, a deterministic answer exists and the algorithm gets _closer_ to it progressively).
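A minimal sketch of that idea (my own example, assuming nothing beyond Python’s standard library): estimate π by sampling uniform points in the unit square and counting how many fall inside the quarter circle. Two runs take different paths through the sample space but converge on the same deterministic answer.

```python
import random

def monte_carlo_pi(n_samples, seed=None):
    """Non-deterministic process, deterministic target: the sampled
    points differ run to run, but the estimate converges to pi."""
    rng = random.Random(seed)
    inside = 0
    for _ in range(n_samples):
        x, y = rng.random(), rng.random()
        if x * x + y * y <= 1.0:
            inside += 1
    return 4.0 * inside / n_samples

# Different state transitions each run, (nearly) the same answer:
est_a = monte_carlo_pi(200_000)
est_b = monte_carlo_pi(200_000)
print(est_a, est_b)  # both close to 3.14159...
```

The randomness is entirely in the process; the answer the process is chasing was fixed all along.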
Which brings us finally to:
Approximation, or near-optimality, is the third orthogonal property we need to consider. The notion of being “close to” the answer comes into play here, and it is easy to confuse with the property of an answer holding with high probability.
An easy way to understand this contrast is to think of prime number generation vs. primality testing.
1. Assume a function F(x) which returns the x-th prime number. Due to computational limitations, there will be a loss of numerical accuracy, and the number we obtain won’t be exactly a prime itself (in all likelihood it won’t be an integer at all). In this case, the better the numerical accuracy of our computations, the “closer” we get to the prime number; with infinite numerical accuracy, we would obtain the exact integer. But in every case we know, without any doubt, that there is in fact a prime number in a neighbourhood of the number we obtain. You could bet your cat’s life expectancy in proportion to the closeness of the outcome, and you’d keep a pretty healthy cat.
2. On the other hand, consider an F(x) that returns 1 if x is prime, 0 if x is composite, and some value in (0, 1) attributing a confidence level to the primality of x. Essentially, if you tested a hundred numbers Xi (defining Yi = F(Xi)), then among the numbers with higher Yi you would, on average, find more primes. However, increasing Yi doesn’t make the number Xi any “more prime”: there is no such thing as “more prime” or “less prime”. Even with a Yi of 0.99, Xi may turn out to be composite. Hence, this is not an “approximation”. It is non-deterministic, probabilistic and perhaps random, but it is not approximate. To say that “X is approximately a prime number” would be a gross misrepresentation (and I’ve heard this statement made more than once). If you bet your cat’s life on this, you’d be playing Russian roulette with your cat.
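To see the contrast in code, here is a sketch of my own, using the textbook Miller-Rabin round, of how repeated random trials raise our confidence in primality without making the number any “closer” to prime: the number was simply prime or composite all along.

```python
import random

def miller_rabin_round(n, rng):
    """One random round of Miller-Rabin (assumes n odd, n > 3).
    False = n proven composite; True = n survived this witness."""
    d, s = n - 1, 0
    while d % 2 == 0:
        d //= 2
        s += 1
    a = rng.randrange(2, n - 1)
    x = pow(a, d, n)
    if x in (1, n - 1):
        return True
    for _ in range(s - 1):
        x = pow(x, 2, n)
        if x == n - 1:
            return True
    return False

rng = random.Random(0)
n = 1_000_003  # genuinely prime, so every round must pass
rounds = 0
while rounds < 20 and miller_rabin_round(n, rng):
    rounds += 1
# Surviving 20 rounds drops the error probability below 4**-20,
# but confidence is all that grew: n did not become "more prime".
print(rounds)  # 20
```

A composite number, by contrast, is either exposed by some round or sneaks through undetected; it is never “approximately prime”.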
Essentially, in a surgery with a 20% chance of tissue damage, all patients would come out with (up to a maximum of) 20% damage to their muscles. In a surgery with a 20% mortality rate, 80% of patients would come out alive.
Such distinctions become very important when you’re in any business that supports decision-making (programming being one of them). Imagine someone sold you an investment-advice program with a 0.01% chance that a prediction fails outright, while another sells you a program whose predictions are always off by up to 10%. It’s important to understand these distinctions before evaluating any decision choices. Don’t misunderstand me: I’m not taking either side. I personally would buy the first one (since I’d like to make lots of money fast and can risk the total loss of all my money). But that doesn’t mean one shouldn’t understand and appreciate the distinction between the two.
This concludes the post for now. I need to go watch He-Man for a while and get motivated and punch in a lot of code.