A common “gotcha” when building simple Monte Carlo scripts in Python or Excel is the request to add correlation between the distributions being sampled. This is difficult to do in Excel without plugins like AtRisk, but in Python, correlated sampling can be done conveniently with copulas.
The fundamental intuition to build when working with copulas is that any continuous distribution can be remapped to a \(\text{Uniform}[0, 1]\) distribution with its Cumulative Distribution Function (CDF), and transformed back with the inverse CDF, also called the Percent Point Function (PPF). Furthermore, sampling from any such distribution can be done by running uniform samples through its PPF (this is inverse transform sampling, a standard way of generating random variates).
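The round trip described above can be sketched in a few lines. This is a minimal illustration, assuming SciPy is available; the Gamma target (shape 2, scale 5), the seed and the sample count are arbitrary choices made for the example.

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(42)

# Uniform samples on [0, 1)
u = rng.uniform(size=10_000)

# Push the uniform samples through the PPF (inverse CDF) of a target
# distribution. The Gamma(shape=2, scale=5) target is an arbitrary choice.
target = stats.gamma(a=2, scale=5)
x = target.ppf(u)

# Mapping the draws back through the CDF recovers the uniform samples.
u_back = target.cdf(x)

print(np.allclose(u, u_back))
print(x.mean())  # should land near shape * scale = 10
```

The same two calls, `ppf` and `cdf`, exist on every continuous distribution in `scipy.stats`, which is what makes the copula approach generic.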
With that trick in mind, we can make use of the multivariate Gaussian distribution and its CDF to easily generate correlated samples of any distributions we want, which is very useful when creating numerical simulation models.
Example
Generate Correlated Samples
Let’s say we have three parameters \(a\), \(b\) and \(c\) that we want to sample with some correlation coefficient \(\rho\). Before we worry about the specifics of those distributions, we can sample from a multivariate Gaussian with that correlation structure.
Code
    import numpy as np

    # Number of samples
    n = 2500

    # Correlation matrix
    rho = 0.80
    corr = np.array([
        [1.0, rho, rho],
        [rho, 1.0, rho],
        [rho, rho, 1.0],
    ])

    # Mean vector (zeros with standard mv normal)
    mu = [0.0, 0.0, 0.0]

    # Generate samples with dims (n, parameter)
    samples = np.random.multivariate_normal(
        mean=mu, cov=corr, size=n
    )
This generates 2500 samples from the standard multivariate Gaussian distribution with correlation \(\rho = 0.8\).
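Passing each column through the standard normal CDF transforms the samples to the uniform domain. Importantly, their correlation structure is largely preserved: the sample correlations come out just under the \(\rho = 0.8\) that was specified.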
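The step that maps the Gaussian samples to the uniform domain is not shown in the text, so here is a minimal sketch of how it could look. It assumes `scipy.stats.norm.cdf` is used for the mapping and that the result is stored in a pandas DataFrame named `uniform_df` with columns `a`, `b` and `c`; the seed is arbitrary.

```python
import numpy as np
import pandas as pd
from scipy import stats

rng = np.random.default_rng(0)

# Same correlation structure as in the example: rho on every off-diagonal
rho = 0.80
corr = np.full((3, 3), rho)
np.fill_diagonal(corr, 1.0)

# Standard multivariate Gaussian samples with shape (n, parameter)
samples = rng.multivariate_normal(mean=np.zeros(3), cov=corr, size=2500)

# The standard normal CDF maps each marginal to Uniform[0, 1]
uniform_df = pd.DataFrame(stats.norm.cdf(samples), columns=["a", "b", "c"])

print(uniform_df.describe())
```

For a Gaussian copula, the Pearson correlation of the uniform samples is \(\frac{6}{\pi}\arcsin(\rho/2)\), which is about 0.786 for \(\rho = 0.8\). That is why the correlations come out slightly below 0.8.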
Code
uniform_df.corr()
              a         b         c
    a  1.000000  0.785667  0.783459
    b  0.785667  1.000000  0.785617
    c  0.783459  0.785617  1.000000
Transform to Distributions
Now we have a bunch of uniformly distributed random samples that have our desired correlation structure. All that is left is to map the samples to our desired distributions using their respective PPFs.
Let’s define our distributions as:
\[
\begin{align}
A &\sim \text{Normal}(500, 50) \\
B &\sim \text{Gamma}(2, 5) \\
C &\sim \text{Beta}(5, 8)
\end{align}
\]
Notice that I’m not using normal distributions exclusively; I can use the copula to map to whatever distributions I want. Note only that the more skewed and kurtotic the distributions, the more their correlations will be warped coming out the other end. There are other types of copula that can handle this better than the Gaussian copula used here.
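A minimal sketch of the mapping step follows, rebuilt from scratch so that it runs on its own. The parameterisation is an assumption: 50 is read as the standard deviation of \(A\), and \(\text{Gamma}(2, 5)\) as shape 2 and scale 5 (it could equally be meant as a rate). The name `correlated_df` and the seed are also just illustrative.

```python
import numpy as np
import pandas as pd
from scipy import stats

rng = np.random.default_rng(0)

# Correlated uniforms from a Gaussian copula (rho = 0.8 on every pair)
rho = 0.80
corr = np.full((3, 3), rho)
np.fill_diagonal(corr, 1.0)
z = rng.multivariate_normal(mean=np.zeros(3), cov=corr, size=2500)
u = stats.norm.cdf(z)

# Target marginals (parameterisation assumed, see the note above)
dists = {
    "a": stats.norm(loc=500, scale=50),
    "b": stats.gamma(a=2, scale=5),
    "c": stats.beta(a=5, b=8),
}

# Run each uniform column through the PPF of its target distribution
correlated_df = pd.DataFrame(
    {name: dist.ppf(u[:, i]) for i, (name, dist) in enumerate(dists.items())}
)

# Pearson correlation is warped somewhat by the skewed marginals,
# while rank (Spearman) correlation is unaffected by the monotonic PPFs.
print(correlated_df.corr())
print(correlated_df.corr(method="spearman"))
```

Comparing the two correlation tables is a quick way to see how much the marginals have warped the linear correlation.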
The correlated samples show a wider uncertainty range in the result than the uncorrelated samples, as is expected. This may be an important detail to capture depending on the sensitivity of the analysis. The effect of correlation can also be quite unintuitive, so it is always worth checking the effect it has on results.
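One way to check the effect is to run the same model on correlated and uncorrelated inputs and compare the spread of the results. The sketch below uses a hypothetical toy model, the product \(a \cdot b \cdot c\), chosen only for illustration. The helper names `sample_inputs`, `toy_model` and `p5_p95_width` and the distribution parameterisation are likewise assumptions.

```python
import numpy as np
from scipy import stats


def sample_inputs(rho, n, rng):
    """Draw (a, b, c) through a Gaussian copula with pairwise correlation rho."""
    corr = np.full((3, 3), rho)
    np.fill_diagonal(corr, 1.0)
    z = rng.multivariate_normal(mean=np.zeros(3), cov=corr, size=n)
    u = stats.norm.cdf(z)
    a = stats.norm(loc=500, scale=50).ppf(u[:, 0])
    b = stats.gamma(a=2, scale=5).ppf(u[:, 1])
    c = stats.beta(a=5, b=8).ppf(u[:, 2])
    return a, b, c


def toy_model(a, b, c):
    # Hypothetical model, used only to illustrate the comparison
    return a * b * c


def p5_p95_width(x):
    # Width of the central 90% interval of the results
    lo, hi = np.percentile(x, [5, 95])
    return hi - lo


rng = np.random.default_rng(1)
n = 10_000

uncorrelated = toy_model(*sample_inputs(0.0, n, rng))
correlated = toy_model(*sample_inputs(0.8, n, rng))

print("P5-P95 width, uncorrelated:", p5_p95_width(uncorrelated))
print("P5-P95 width, correlated:  ", p5_p95_width(correlated))
```

With positively correlated inputs, high values tend to occur together, so a product or sum of them spreads out further than it would with independent inputs.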