In the previous post I explained how Copulas can be used to sample distributions with correlation. A nice extention of that logic is to simulate discrete outcomes with correlation as well.
Independent Events
The modelling of independent and identically distributed trials is easily done with the Binomial distribution. For example, we can simulate tossing a coin 50 times like this:
The above plot shows the number of heads thrown out of 50 total tosses, repeated 5000 times.
This can be made a bit more flexible by generating our own random uniform samples instead of directly using numpy’s random.binomial function:
Code
# Simulate 2D array of 0-1 values with dims (tosses, n)uniform_simdata = np.random.uniform(0, 1, size=(tosses, n))# Convert to a boolean array based on the p valueoutcomes = np.where( uniform_simdata < p,True,False)# Sum along the tosses axis to get total heads per experimentheads = np.sum(outcomes, axis=0)df = pd.DataFrame({"Heads": heads})plot_disrete_histogram(df, "Heads", color="coral").show()
This might seem a bit inconvenient, but when simulating correlated events it becomes an effective way of getting the desired result.
Correlated Events
Because we are now using uniform samples to create the outcomes, we can just feed correlated uniform samples into the process instead. This will correlate the outcomes.
Here is a correlation of \(0.25\) shared across all coin tosses:
The correlation between events has the effect of bifurcating the results, because events are firing together more often. A correlation of \(1\) using this method would return 0 heads or 100 heads, with nothing in between.
These examples are using a simple correlation matrix filled with the same value, but richer correlation structures can be modelled using covariance functions. The correlations might vary through time or space (or both), which means trials closer together may be more tightly related, for example. This would lead to sampling from a “binomial process” rather than the regular binomial distribution.
Bringing Things Together
The framework outlined in this post and the previous provides a toolkit for numerically simulating many 0-dimensional problems (one where the answer is one number, not a higher dimensional array like a 2D map or 3D grid). Because these problems in simulation return a 1D array of results, I often refer to them as 1D models, but this can be a confusing term as the input space may have many dimensions – 1D refers only to the output space.
Many real world problems boil down to what are called Zero-inflated models, where an event occurs with some probability \(p\), returning a random variable, or else returns zero:
\[
X \sim
\begin{cases}
0, & \text{with probability } 1 - p \\
Y \sim \mathcal{D}(\theta), & \text{with probability } p
\end{cases}
\]
I’ll make things a bit more concrete with an example in my next post.