Consider a random event \(X\) having probability \(p(X)=0\). Does \(p(X)=0\) mean \(X\) cannot occur, i.e. is impossible?
At face value, the question seems harmless, but as soon as one attempts an answer using mathematical logic alone, one realizes that probability theory is a symphony of both logic and philosophy.
The notion of probabilities is subject to interpretation, and can be argued in a philosophical sense, as it is an abstraction of reality. Consequently, there are various interpretations and approaches to probability theory. However, given a specific interpretation of what probability is, we base our logic on certain axioms of probability, which are mathematical constructs and not subject to interpretation.
An informal approach
I would like to motivate an answer to this question using a relatively informal approach, while still allowing the use of elementary mathematical notation for probabilities.
An example random process
Let’s consider a random process which assigns values to the random variable \(X\). Let \(\Omega\) be the set of all possible outcomes of \(X\), i.e. we draw \(X\) from \(\Omega\) or that \(X \in \Omega\). Now, \(\Omega\) can be a finite set with only \(N\) elements or an infinite set.
Given this information alone, we cannot say what the probability of \(X\) is, i.e. \(P(X)\). The probability function is one we assign to the random process, one we are confident will describe the relative frequency of different \(X\) values over repeated trials of the random process. It therefore depends on which elements are in \(\Omega\).
A finite number of possible outcomes
Suppose that \(\Omega\) contains all natural numbers between 1 and 6, i.e. \(\Omega = \{1, 2, \dots , 6\}\). This could describe the possible outcomes of a single standard die. The random variable \(X\) then describes the value obtained in each trial, i.e. each die cast. The probability of \(X\) taking a specific value in \(\Omega\) depends on the die itself and how it is cast.
Let’s assume the die to be perfect and the casting process to be fair; then no value in \(\Omega\) will occur more often than any other in repeated trials. For such a process, the relative frequency of every value approaches the same number as the number of trials increases, so a discrete uniform probability distribution is an appropriate probability function. We would then say that the die value in each cast follows a uniform distribution, i.e. \(X \sim \mathcal{U}(N_{\Omega})\), and the probability of obtaining a value \(X=a\), where \(a \in \Omega\), is \(P(X=a)=1/N_{\Omega} = 1/6\).
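As a quick sanity check, this behavior can be simulated. The sketch below (plain Python, with a hypothetical trial count) casts a fair die many times and compares the empirical frequency of each face to the theoretical \(1/6\):

```python
import random
from collections import Counter

random.seed(0)

# Simulate repeated casts of a fair six-sided die and compare the
# empirical frequency of each face to the theoretical 1/6 ≈ 0.1667.
n_trials = 100_000
counts = Counter(random.randint(1, 6) for _ in range(n_trials))

for face in range(1, 7):
    freq = counts[face] / n_trials
    print(f"P(X={face}) ≈ {freq:.4f}  (theory: {1/6:.4f})")
```

With enough trials, every frequency settles near \(1/6\), which is exactly the sense in which the uniform distribution describes this process.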
This is the same interpretation as the classical definition of probability à la Bernoulli and Laplace:
The probability of an event is the ratio of the number of cases favorable to it, to the number of all cases possible when nothing leads us to expect that any one of these cases should occur more than any other, which renders them, for us, equally possible.
Back to the question…
Note that each possible outcome of \(X\) is equally probable, i.e. \(P(X=a)=1/6\) for all \(a \in \Omega\). So only values not in \(\Omega\) have probability zero, i.e. \(P(X=a)=0\) only if \(a \notin \Omega\). That is not an interesting result, and I would say it is trivial. The only such outcome which one could possibly argue to be theoretically possible is for the die not to return any element in \(\Omega\), i.e. to return the empty set. That would describe the case where the die ends up balanced on an edge, without any particular side facing up. Such an outcome has most likely never been observed, and it is accurately assigned a zero probability by the uniform distribution, i.e. \(P(X=\emptyset)=0\). An outcome of \(X>6\), on the other hand, is absurd and has zero probability, since \(P(X>6) = 1 - P(X \leq 6) = 1-1 =0\), so this result is indeed impossible.
Therefore, for the discrete random variable case, i.e. \(X\) is discrete, with finite possible outcomes in \(\Omega\), the only outcomes with zero probability are those which are impossible by definition.
What if \(N_{\Omega} \to \infty\)?
Consider now that the size of the finite outcome set \(\Omega\) increases towards \(\infty\). We can do that in essentially two ways: either \(\Omega\) includes more and more natural numbers until it equals the set of all natural numbers, i.e. \(\Omega = \mathbb{N}\), or it includes more and more values between \(1\) and \(6\) such that \(\Omega \to [1, 6]\).
A key difference between these two options is that in the first case, where \(\Omega \to \mathbb{N}\), \(\Omega\) remains countably infinite, i.e. there exists a one-to-one mapping between the elements of \(\Omega\) and the natural numbers. In the latter case, where \(\Omega \to [1, 6]\), \(\Omega\) becomes uncountably infinite.
In both cases, the number of possible outcomes approaches infinity with increasing size of \(\Omega\). As the uniform probability assumption still holds, each element in the set is equally likely to be obtained in each trial. Therefore, we can state that \[ P(X=a) = \lim_{N_{\Omega} \to \infty} \frac{1}{N_{\Omega}} = 0. \]
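To see this limit numerically, here is a minimal sketch that prints \(1/N_{\Omega}\) for a few growing (illustrative) outcome-set sizes:

```python
# As the number of equally likely outcomes N grows, the probability
# 1/N of any single outcome shrinks toward zero.
for n in [6, 100, 10_000, 1_000_000]:
    print(f"N = {n:>9}: P(X = a) = 1/N = {1 / n:.2e}")
```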
However, as long as \(N_{\Omega}\) remains finite, the probability \(1/N_{\Omega}\) is defined and finite, possibly very small, but not zero. Once \(\Omega\) has become either \(\mathbb{N}\) or \([1,6]\), the discrete uniform distribution is no longer defined.
When \(\Omega = [1,6]\), however, the outcome set is a continuous range of reals. For such a problem, we can use a continuous uniform distribution, with a probability density function \(f_X(x) = 1/(6-1)=1/5\), where the probability of obtaining a value \(X\) such that \(1 \leq X \leq a\) is \[ P(1 \leq X \leq a) = \int_{1}^{a}f_X(x)\,dx. \]
Note here that the probability of obtaining \(X=a\) is \[ P(X = a) = P(a \leq X \leq a) = \int_{a}^{a} f_X(x)\,dx = F_X(a) - F_X(a) = 0. \]
So the probability of obtaining exactly a specific value remains zero, but the probability of obtaining a value in an interval is non-zero, with \(P(1 \leq X \leq 6) = 1\). This matches our previous conclusion: as the number of possible outcomes increases to infinity, the probability of any specific value approaches zero.
Applying the same logic to the case when \(\Omega = \mathbb{N}\) is not as well defined, since \(\Omega\) has now become a countably infinite set of discrete values.
A discrete uniform distribution cannot be applied to such a set, since the probability of each value would be zero while the probabilities of all values must sum to unity. Nor can a density function be defined, since the possible values are not continuous. Therefore, such a probability distribution cannot satisfy all the axioms of probability. However, for the sake of this argument, we can use the limit approach as \(\Omega \to \mathbb{N}\), i.e. as a finite \(\Omega\) approaches a countably infinite \(\Omega\), and say that \(P(X=a) = 1/N_{\Omega} \to 0\) for all \(a \in \Omega\).
This conclusion for countably infinite sets applies only to the discrete uniform probability distribution. Any non-uniform distribution can be defined on such a set, since certain values are then more probable than others, which lets us satisfy all of the axioms.
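For example, a geometric distribution assigns a strictly positive probability to every natural number and still sums to one. A minimal sketch, with an assumed success probability \(p = 0.5\):

```python
# Geometric distribution on the natural numbers:
# P(X = k) = (1 - p)**(k - 1) * p, for k = 1, 2, 3, ...
# Every k has non-zero probability, yet the probabilities sum to 1.
p = 0.5

def pmf(k: int) -> float:
    return (1 - p) ** (k - 1) * p

print(pmf(1), pmf(10))                      # each probability is positive
print(sum(pmf(k) for k in range(1, 200)))   # partial sum approaches 1.0
```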
But something must happen... right?
Even though our outcome set becomes infinite, either countably or uncountably, we will get outcomes from repeated trials of our random process. Therefore, even though a specific value has zero probability of being obtained, it can still be the outcome.
Using our example process, obtaining, say, a value of \(X=4\), regardless of whether \(\Omega = \mathbb{N}\) or \(\Omega = [1,6]\), is possible, yet has probability \(P(X=4)=0\). This outcome is absolutely reasonable, just extremely improbable. It is extremely improbable to obtain this value, or any specific value, more than once in repeated trials when the possible outcomes are infinitely many and none is more likely to occur than the others.
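This can be illustrated with a sketch that draws repeatedly from the continuous uniform distribution on \([1, 6]\): every trial produces some exact value whose individual probability is zero, and exact repeats are practically never observed.

```python
import random

random.seed(42)

# Every draw from U(1, 6) yields a concrete value a with P(X = a) = 0,
# yet an outcome is produced on every single trial.
n_trials = 10_000
samples = [random.uniform(1, 6) for _ in range(n_trials)]

print(samples[0])                    # a concrete outcome that had probability zero
print(len(set(samples)), n_trials)   # distinct values vs. trials: repeats are practically absent
```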
Conclusion
Given the preceding informal discussion, we can argue that an event with zero probability does not have to be impossible, but an impossible event will have zero probability. Impossible events are those which are not in the outcome set \(\Omega\) of a random process.
Instead, a value of \(P=0\) for an event in the set of possible outcomes should be read as improbable, not impossible.