Tell me twice

A few days ago, I explained to a colleague why certain communications protocols have a “tell me twice” policy - i.e. to allow for any command to have any effect, the same command - or a command to the same effect - has to be received twice (from the same master). In human parlance, this would be the equivalent of Jean-Luc Picard saying “ensign, I’m about to tell you to lower the shields” … “ensign, shields down!” in which the ensign (Wesley Crusher?) wouldn’t be allowed to obey the second command unless he had heard, understood and acknowledged (HUA!) the first. Now for the math..

This kind of “tell me twice” policy is usually found in serial protocols and is based on the idea that if a bit flips in a message, the chance of the same bit flipping in two consecutive messages is far lower. Bit flipping is a common error in serial communications - which is why parity bits exist, for example - and even if the chance for any given bit flipping is relatively low, the chance of a bit flipping somewhere in any message is relatively high. I’ll show you how that works:

Say the chance of any given bit flipping is $p$, the number of bits in a byte is $b$ and the number of bytes in a message is $n$. This means that the chance of a bit not flipping is $1-p$ and the chance of getting a message through without any flipped bits is $(1-p)^{bn}$, which, depending on the values of $p$, $b$, and $n$ can either be very high, or very low, so let’s fill in a few values to get an idea of what this means. Let’s say that the chances of a bit flipping are about one in a million, we have eight bits per byte and our messages are 256 bytes in length. The message may be a bit larger than what you’d usually find on a serial connection, but we’re going for a case in which there’s an actual payload to send, so I think it’s reasonable. With these numbers, our little formula, $P = (1-p)^{bn}$ works out to a value for $P$ of 0.997954. That is: with a one-in-a-million chance of any bit flipping, one message out of ever 500 is corrupt. If that one message was “shields down” and, due to the bit flip, becomes “eject the warp core”, and the Enterprise is dead in space.

So, the idea of “tell me twice” is to reduce the chances of such errors happening by reducing the chance that a bit flip would have any effect. This is based on the idea that the chances of the same bit flipping in two consecutive chances are far lower than the one-in-five-hundred we just came up with, even it the chances of a bit flipping are still the same. To understand this, we need to know what the chances are that a specific bit in a message is flipped. That chance is $P = p(1-p)^{bn-1}$, which is smaller that the chance of any individual bit flipping (because all the other bits have to not flip). In our case, this works out to $9.980 * 10^{-7}$ which is about one in 1,002,049.

Now, the probability that a message is followed by another message that has the same bit flipped is the product of the probability that a message has a flipped bit (one in 500) and the probability that a message has a given bit flipped (one in a little over a million): $P = (1 - ((1-p)^{bn}))(p((1-p)^{bn-1}))$, which in our case works out to $2.03963 * 10^{-9}$ which is about one in a half billion.

Of course, added to this is the fact that most serial protocols also have some kind of CRC checking, there’s usually a parity bit in there somewhere, etc. All these individually rather weak methods, together, make it nearly impossible for a corrupt message to come through undetected - because for any broken message to come through undetected, all of the error detection features have to fail, so as long as they don’t all check the same thing the same way, the chances of that happening is the product of the chances of each individual method failing (i.e. checking the parity five times doesn’t make your chances of finding an undetected error any better, but checking the parity and four other things does).