Trivariate Probabilities

The standard trivariate normal distribution function is defined by

 \begin{displaymath}
\Phi({\bf b}, \Sigma) =
\frac{1}{{(2\pi)}^{3/2}{\vert\Sigma...
...{b_3} e^{\frac{{\bf x}^T\Sigma^{-1}{\bf x}}{2}}
dx_3dx_2dx_1,
\end{displaymath} (8)

where $\Sigma$ is a symmetric positive definite covariance matrix given by

\begin{displaymath}\Sigma = \left[
\begin{array}{ccc}
\rho_{11} & \rho_{21} & ...
...\\
\rho_{31} & \rho_{32} & \rho_{33} \\
\end{array}\right]
\end{displaymath}

Without loss of generality, it is assumed that an appropriate scaling of the variables and limits has been made so that $\rho_{11}=\rho_{22}=\rho_{33}=1$.

Owen (1956) briefly discussed the evaluation of the trivariate integral for the general case and gave the formula for the standard trivariate normal integral in terms of the bivariate normal integral as:

 \begin{displaymath}
\Phi({\bf b}, \Sigma) =
\frac{1}{\sqrt{2\pi}} \int_{-\infty}^{b_1} e^{-x^2/2} F(x)dx,
\end{displaymath} (9)

where

\begin{displaymath}F(x) = \Phi\bigg(\frac{b_2-\rho_{21}x}{(1-\rho^2_{21})^\frac{...
...(1-\rho^2_{21})^\frac{1}{2}(1-\rho^2_{31})^\frac{1}{2}}\bigg),
\end{displaymath}

and $\Phi(h,k,\rho)$ is the standard bivariate normal distribution function.

Several methods were investigated for calculating (9) using numerical integration. The simplest method that was considered cuts the infinite integration interval to a finite one, so that (9) is replaced with

 \begin{displaymath}
\Phi({\bf b}, \Sigma) =
\int_{c}^{b_1} \frac{1}{\sqrt{2\pi}} e^{-x^2/2} F(x)dx,
\end{displaymath} (10)

and then a Gauss-Legendre integration rule is used for the interval [c, b1]. Assuming $0 \leq F(x) \leq 1$, c = -5.5 and c = -8.5 were found to be sufficient for respective single and double precision absolute accuracy levels.

The second method considered first transforms (9) using $x = \Phi^{-1}(t)$, so that

 \begin{displaymath}
\Phi({\bf b}, \Sigma) =
\int_{0}^{\Phi(b_1)} F(\Phi^{-1}(t)) dt,
\end{displaymath} (11)

and then uses a Gauss-Legendre rule for the interval $[0, \Phi(b_1)]$.

The third method considered, which has been described in more general form by Drezner (1992), uses modified Gauss-Hermite rules (see Steen, Byrne and Gelhard, 1969). These rules are suitable for integrals in the form $\frac{1}{\sqrt{2\pi}}\int_0^\infty e^{-x^2/2} f(x) dx$, so (9) needs to be transformed to a $[0, \infty]$ integral. Let y = x - b1, and then

\begin{displaymath}\Phi({\bf b}, \Sigma) =
\int_{-\infty}^0 \frac{1}{\sqrt{2\pi...
..._{-\infty}^0 \frac{1}{\sqrt{2\pi}}
e^{-y^2/2-yb_1}F(y+b_1)dy.
\end{displaymath}

Now let z = -y, so that

 \begin{displaymath}
\Phi(b, \Sigma)
= e^{-b_1^2/2} \int_0^{\infty} \frac{1}{\sqrt{2\pi}}
e^{-z^2/2+zb_1}F(b_1-z)dz.
\end{displaymath} (12)

It was found that all of the methods were often more accurate for a particular integration rule if the integration limits were arranged so that the shortest integration interval was associated with the outermost integration. In order to accomplish this, first permute the integration limits (and associated $\Sigma$ entries) so that $\vert b_1\vert \geq max(\vert b_2\vert, \vert b_3\vert$. After this rearrangement, if $b_1 \leq 0$, compute $\Phi(b, \Sigma)$numerically using (10), (11) or (12). If b1 > 0, compute $\Phi(b, \Sigma) = \Phi(b_2, b_3, \rho_{32}) -
\Phi((-b_1, b_2, b_3),\hat{\Sigma})$, where $\hat{\Sigma}$ is the same as $\Sigma$ except the signs are changed on the (2,1), (1,2), (3,1) and (1,3) entries. The first term is a BVN probability and the second term is computed numerically using (10), (11) or (12).

In order to test the algorithms a source of highly accurate TVN probabilities was needed for arbitrary ${\bf b}$ and $\Sigma$. Initially, Schervish's MULNOR was tried, but a number of inconsistencies were found, including some clear discrepancies between some exactly known results and the output from MULNOR. Kennedy and Wang (1992) also reported erroneous results from MULNOR, although the version of MULNOR used for these tests (emailed from statlib, and supported by the statlib $\Phi(x)$) did not fail for those TVN cases reported as failures by Kennedy and Wang.

If ${\bf b}= (0, 0, 0)$, $\Phi({\bf b}, \Sigma)$ is an orthant probability that can be accurately computed using the standard formula $\Phi({\bf b}, \Sigma) = 1/2 -
( arcos(\rho_{21}) + arcos(\rho_{31}) + arcos(\rho_{32}))/(4\pi)$. Some limited testing was done with this kind of special case. An orthant probability example of a MULNOR failure occurs when $\rho_{21} = 0.99992$, $\rho_{31} = 0.64627$, and $\rho_{32} = 0.63975$. In this case $\Phi({\bf b}, \Sigma) = 0.3601519...$, but MULNOR, with requested accuracy set at $5\cdot10^{-7}$, gives $\Phi({\bf b}, \Sigma) = 0.3601110$. However, the $\Sigma$ matrix for this example is almost singular, so MULNOR, which depends on accurate matrix inversion, might be expected to lose accuracy for this type of illconditioned problem.

If $\rho_{21} = \rho_{31} = \rho_{32} = \rho \geq = 0$ then (see Tong, 1990)

\begin{displaymath}\Phi({\bf b}, \Sigma)
= \frac{1}{\sqrt{2\pi}}\int_{-\infty}^...
...-t^2/2}
\prod_{i=1}^3\Phi((b_i-t\sqrt{\rho})/\sqrt{1-\rho})dt,
\end{displaymath}

but the numerical evaluation of $\Phi(b, \Sigma)$ in this form is not significantly easier than the numerical evaluation of (10), (11) or (12), although it can be used to provide a partially independent check on the results, and this type of problem was used for some checking of the three algorithms.

In order to carry out a careful tests of the different algorothms, a large set of general problems was needed with variation in all of the six parameters in ${\bf b}$ and $\Sigma$. Let $\Sigma = CC^T$, with

\begin{displaymath}C = \left[
\begin{array}{ccc}
1 & 0 & 0 \\
\cos(\pi\theta_...
...sin(\pi\theta_3)
& \sin(\pi\theta_2)
\end{array} \right] .
\end{displaymath}

The test that was completed used $\theta_1 = .02, .06, ..., .98$, $\theta_2 = .02, .06, ..., .98$, $\theta_3 = .02, .06, ..., .98$; b1 = -5, -4, ..., 5, b2 = -5, -4, ..., 5, b3 = b2, b2+1, ..., 5. A complete test with all of the different combinations of parameters checks approximately 10 million values for $\Phi({\bf b}, \Sigma)$. This test compared the single precision methods with a double precision method, to be discussed later and believed to be highly accurate.

With the initial testing of methods based on (10), (11) and (12) it was found that an integration rule with a fixed number K of integration rule points, with K in the range 15-25, could not reliably achieve results that were single precision (differences < 10-6) consistent with the double precision algorithm. For example, when (12) is used with a fifteen point modified Gauss-Hermite rule, there are some values of $\Sigma$ and b where there were absolute errors as large as $7\cdot10^{-4}$. Increasing the number of points to twenty-five reduces the maximum errors by an order of magnitude, but further progress is difficult because accurate modified Gauss-Hermite rule points and weights are hard to compute when the number of points is large. Similar problems arose with Gauss-Legendre rules applied to (10) and (11), although the maximum errors were somewhat smaller. Therefore the use of (12) was abandoned, and also the goal of trying to attain single precision accuracy with a fixed moderate degree rule applied to (10) or (11).

In order to try to produce a single precision accurate algorithm for all possible ${\bf b}$ and $\Sigma$ an adaptive integration was tried. A simple globally adaptive algorithm similar to what is used in many of the algorithms for QUADPACK (Piessens, deDoncker, Uberhuber and Kahaner, 1983) was selected. This type of algorithm uses a fixed local integration rule that also provides an error estimate. The rule is first applied to the integrand over the whole integration interval. If the error estimate is larger than the requested accuracy, then the interval, with associated error and integral estimates, is used to initialize a list. The algorithm then proceeds in stages. At each stage an interval with largest error is removed from the list and divided in half. The local integration rule is used on each half of the selected interval and results from the two halves are added to the list. The algorithm terminates when the sum of the local error estimates is less than the requested accuracy. The sum of the local integral estimates is returned as the result. After some experimentation, a 23-point Kronrod rule was selected for the local integration rule. This degree thirty-five rule includes an imbedded eleven point Gauss-Legendre rule. The difference between results from the two rules is used to provide a local error estimate.

The adaptive algorithm used with (10) or (11) produced maxiumum absolute errors $ < 3\times 10^{-7}$) for all of the problems in the major test with (10) slightly more accurate and a little faster (with average time approximately 0.0001 seconds per problem using an 800 MHZ Pentium III processor). Algorithms based on Drezner's (1994) paper were also implemented and tested in order to see if it would be easy to find fixed Gauss-Legendre rule algorithms that could be implemented for single precision work. This experiment failed, and it is the author's opinion, based on extensive testing, that algorithms that use (10) with adaptive integration, can provide, overall, the most reliable and efficient methods for TVN probabilities. The double precision TVN algorithm that was used for all of the tests is based on (10), with the double precision BVN algorithm). Single precision BVN, and double precision BVN (BVND) and TVN (TVND) Fortran implementations, with supporting functions, are listed in the appendix for this paper. A single precision TVN can be produced by simple modifications to TVND.



Alan C Genz
2001-02-25