Covariance analysis
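Covariance analysis, also called principal component analysis or essential dynamics [169], can find correlated motions. It uses the covariance matrix $\mathbf{C}$ of the atomic coordinates:

$$
C_{ij} = \left\langle M_{ii}^{\frac{1}{2}} \left(x_i - \langle x_i \rangle\right) M_{jj}^{\frac{1}{2}} \left(x_j - \langle x_j \rangle\right) \right\rangle
$$

where $\mathbf{M}$ is a diagonal matrix containing the masses of the atoms (mass-weighted analysis) or the unit matrix (non-mass-weighted analysis). $\mathbf{C}$ is a symmetric $3N \times 3N$ matrix, which can be diagonalized with an orthonormal transformation matrix $\mathbf{R}$:

$$
\mathbf{R}^T \mathbf{C} \mathbf{R} = \mathrm{diag}(\lambda_1, \lambda_2, \ldots, \lambda_{3N}), \quad \text{where } \lambda_1 \geq \lambda_2 \geq \ldots \geq \lambda_{3N}
$$

The columns of $\mathbf{R}$ are the eigenvectors, also called principal or essential modes. $\mathbf{R}$ defines a transformation to a new coordinate system. The trajectory can be projected on the principal modes to give the principal components $p_i(t)$:

$$
\mathbf{p}(t) = \mathbf{R}^T \mathbf{M}^{\frac{1}{2}} \left(\mathbf{x}(t) - \langle \mathbf{x} \rangle\right)
$$

The eigenvalue $\lambda_i$ is the mean square fluctuation of principal component $i$. The first few principal modes often describe collective, global motions in the system. The trajectory can be filtered along one (or more) principal modes. For one principal mode $i$ this goes as follows:

$$
\mathbf{x}^f(t) = \langle \mathbf{x} \rangle + \mathbf{M}^{-\frac{1}{2}} \mathbf{R}_{*i} \, p_i(t)
$$

where $\mathbf{R}_{*i}$ is the $i$-th column of $\mathbf{R}$.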
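The construction of the covariance matrix, its diagonalization, the projection onto the principal modes and the single-mode filter can be sketched in a few lines of NumPy. This is a minimal illustration under stated assumptions, not the gmx covar implementation: it assumes the frames are already fitted to a reference and stored as an array of shape (n_frames, N, 3), it averages over frames with a 1/n_frames normalization, it uses 0-based mode indices, and the function names `covariance_pca` and `filter_mode` are hypothetical.

```python
import numpy as np


def covariance_pca(traj, masses=None):
    """Covariance analysis of a trajectory that is already fitted to a reference.

    traj   : array of shape (n_frames, N, 3)
    masses : optional array of shape (N,); None gives a non-mass-weighted analysis
    Returns eigenvalues (descending), eigenvector matrix R (columns are modes),
    principal components p (one row per frame), the average <x> and diag(M^(1/2)).
    """
    n_frames, n_atoms, _ = traj.shape
    x = traj.reshape(n_frames, 3 * n_atoms)       # each frame as a 3N vector
    mean = x.mean(axis=0)                          # <x>
    if masses is None:
        sqrt_m = np.ones(3 * n_atoms)              # M is the unit matrix
    else:
        sqrt_m = np.sqrt(np.repeat(masses, 3))     # one mass per x, y, z coordinate
    dx = (x - mean) * sqrt_m                       # M^(1/2) (x(t) - <x>)
    C = dx.T @ dx / n_frames                       # C_ij as a time average
    lam, R = np.linalg.eigh(C)                     # eigh returns ascending eigenvalues
    order = np.argsort(lam)[::-1]                  # sort so that lambda_1 >= lambda_2 >= ...
    lam, R = lam[order], R[:, order]
    p = dx @ R                                     # row t holds p(t) = R^T M^(1/2) (x(t) - <x>)
    return lam, R, p, mean, sqrt_m


def filter_mode(i, R, p, mean, sqrt_m):
    """Filtered trajectory x^f(t) = <x> + M^(-1/2) R_{*i} p_i(t) for mode i (0-based)."""
    return mean + np.outer(p[:, i], R[:, i]) / sqrt_m
```

Because $\mathbf{R}$ is orthonormal, the deviations from $\langle \mathbf{x} \rangle$ contributed by all $3N$ modes add up to the original trajectory, and the variance of each column of `p` equals the corresponding eigenvalue.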
When the analysis is performed on a macromolecule, one often wants to remove the overall rotation and translation to look at the internal motion only. This can be achieved by least-squares fitting to a reference structure. Care has to be taken that the reference structure is representative of the ensemble, since the choice of reference structure influences the covariance matrix.
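Removing overall translation and rotation by least-squares fitting can be illustrated with the Kabsch algorithm, a standard SVD-based superposition method. This is a sketch, not necessarily the routine GROMACS uses internally; the function name `fit_to_reference` is hypothetical, and the optional per-atom weights (for example masses) are an assumption of the example.

```python
import numpy as np


def fit_to_reference(coords, ref, weights=None):
    """Least-squares superposition of coords (N, 3) onto ref (N, 3) with the Kabsch algorithm.

    weights : optional per-atom weights (e.g. masses); None means equal weights.
    Returns the rotated and translated copy of coords.
    """
    w = np.ones(len(ref)) if weights is None else np.asarray(weights, dtype=float)
    w = w / w.sum()
    # remove overall translation: move both weighted centres to the origin
    c_com = w @ coords
    r_com = w @ ref
    P = coords - c_com
    Q = ref - r_com
    # weighted cross-covariance matrix and its singular value decomposition
    H = (P * w[:, None]).T @ Q
    U, _, Vt = np.linalg.svd(H)
    # correct for a possible reflection so that the result is a proper rotation
    sign = np.sign(np.linalg.det(Vt.T @ U.T))
    D = np.diag([1.0, 1.0, sign])
    rot = Vt.T @ D @ U.T
    # rotate the centred coordinates and place them on the reference centre
    return P @ rot.T + r_com
```

Every frame would be fitted to the same reference before the covariance matrix is built, which is why the choice of reference affects the result.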
One should always check if the principal modes are well defined. If the first principal component resembles a half cosine and the second resembles a full cosine, you might be filtering noise (see below). A good way to check the relevance of the first few principal modes is to calculate the overlap of the sampling between the first and second half of the simulation. Note that this can only be done when the same reference structure is used for the two halves.
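A good measure for the overlap has been defined in [170]. The elements of the covariance matrix are proportional to the square of the displacement, so we need to take the square root of the matrix to examine the extent of sampling. The square root can be calculated from the eigenvalues $\lambda_i$ and the eigenvectors, which are the columns of the rotation matrix $\mathbf{R}$. For a symmetric positive semi-definite matrix $\mathbf{A}$ of size $3N \times 3N$ (every covariance matrix is of this kind), the square root is:

$$
\mathbf{A}^{\frac{1}{2}} = \mathbf{R} \, \mathrm{diag}\left(\lambda_1^{\frac{1}{2}}, \lambda_2^{\frac{1}{2}}, \ldots, \lambda_{3N}^{\frac{1}{2}}\right) \mathbf{R}^T
$$

It can easily be verified that the product of this matrix with itself gives $\mathbf{A}$, because $\mathbf{R}^T \mathbf{R} = \mathbf{I}$. Now we can define a difference $d$ between covariance matrices $\mathbf{A}$ and $\mathbf{B}$ as follows:

$$
\begin{aligned}
d(\mathbf{A}, \mathbf{B}) &= \sqrt{\mathrm{tr}\left(\left(\mathbf{A}^{\frac{1}{2}} - \mathbf{B}^{\frac{1}{2}}\right)^2\right)} \\
&= \sqrt{\mathrm{tr}\left(\mathbf{A} + \mathbf{B} - 2 \mathbf{A}^{\frac{1}{2}} \mathbf{B}^{\frac{1}{2}}\right)} \\
&= \left(\sum_{i=1}^{3N} \left(\lambda_i^A + \lambda_i^B\right) - 2 \sum_{i=1}^{3N} \sum_{j=1}^{3N} \sqrt{\lambda_i^A \lambda_j^B} \left(\mathbf{R}_i^A \cdot \mathbf{R}_j^B\right)^2\right)^{\frac{1}{2}}
\end{aligned}
$$

where tr is the trace of a matrix and $\mathbf{R}_i^A$ is the $i$-th eigenvector of $\mathbf{A}$. We can now define the overlap $s$ as:

$$
s(\mathbf{A}, \mathbf{B}) = 1 - \frac{d(\mathbf{A}, \mathbf{B})}{\sqrt{\mathrm{tr}\,\mathbf{A} + \mathrm{tr}\,\mathbf{B}}}
$$

The overlap is 1 if and only if matrices $\mathbf{A}$ and $\mathbf{B}$ are identical. It is 0 when the sampled subspaces are completely orthogonal.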
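The matrix square root and the overlap $s(\mathbf{A}, \mathbf{B})$ translate directly into NumPy. This sketch evaluates $d$ through the trace form rather than the double sum over eigenpairs (the two are equal); the names `sqrt_psd` and `covariance_overlap` are hypothetical, and clipping tiny negative eigenvalues to zero is an assumption made to absorb round-off.

```python
import numpy as np


def sqrt_psd(A):
    """A^(1/2) = R diag(lambda^(1/2)) R^T for a symmetric positive semi-definite matrix."""
    lam, R = np.linalg.eigh(A)
    lam = np.clip(lam, 0.0, None)   # round-off can give tiny negative eigenvalues
    return (R * np.sqrt(lam)) @ R.T  # scales column k of R by sqrt(lambda_k)


def covariance_overlap(A, B):
    """s(A, B) = 1 - d(A, B) / sqrt(tr A + tr B), with d = sqrt(tr((A^1/2 - B^1/2)^2))."""
    diff = sqrt_psd(A) - sqrt_psd(B)
    d2 = np.trace(diff @ diff)
    d = np.sqrt(max(d2, 0.0))
    return 1.0 - d / np.sqrt(np.trace(A) + np.trace(B))
```

For the convergence check, `A` and `B` would be the covariance matrices of the first and second half of the simulation, both computed after fitting to the same reference structure.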
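A commonly used measure is the subspace overlap of the first few eigenvectors of covariance matrices. The overlap of the subspace spanned by $m$ orthonormal vectors $\mathbf{w}_1, \ldots, \mathbf{w}_m$ with a reference subspace spanned by $n$ orthonormal vectors $\mathbf{v}_1, \ldots, \mathbf{v}_n$ can be quantified as follows:

$$
\mathrm{overlap}(\mathbf{v}, \mathbf{w}) = \frac{1}{n} \sum_{i=1}^{n} \sum_{j=1}^{m} \left(\mathbf{v}_i \cdot \mathbf{w}_j\right)^2
$$

The overlap will increase with increasing $m$ and will be 1 when set $\mathbf{v}$ is a subspace of set $\mathbf{w}$. The disadvantage of this method is that it does not take the eigenvalues into account. All eigenvectors are weighted equally, and when degenerate subspaces are present (equal eigenvalues), the calculated overlap will be too low.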
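The subspace overlap only needs the matrix of inner products between the two sets of eigenvectors. In this sketch the vectors are assumed to be stored as columns of NumPy arrays (the same layout as the columns of $\mathbf{R}$), and the name `subspace_overlap` is hypothetical.

```python
import numpy as np


def subspace_overlap(V, W):
    """overlap(v, w) = (1/n) sum_i sum_j (v_i . w_j)^2.

    V : (3N, n) array whose columns are the n orthonormal reference vectors v_i
    W : (3N, m) array whose columns are the m orthonormal vectors w_j
    """
    n = V.shape[1]
    inner = V.T @ W                 # inner[i, j] = v_i . w_j
    return float(np.sum(inner ** 2) / n)
```

With unit vectors as a toy basis, two reference vectors that both lie in the second set give an overlap of 1, and sharing only one of the two gives 0.5.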
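Another useful check is the cosine content. It has been proven that the principal components of random diffusion are cosines with the number of periods equal to half the principal component index [170, 171]. The eigenvalues are proportional to the index to the power $-2$. The cosine content is defined as:

$$
\frac{2}{T} \left(\int_0^T \cos\left(\frac{i \pi t}{T}\right) p_i(t) \, \mathrm{d}t\right)^2 \left(\int_0^T p_i^2(t) \, \mathrm{d}t\right)^{-1}
$$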
When the cosine content of the first few principal components is close to 1, the largest fluctuations are not connected with the potential, but with random diffusion.
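For a principal component sampled at equally spaced times, the cosine content can be approximated with the trapezoidal rule. This is a sketch under that assumption; gmx analyze may discretize the integrals differently, and the names `cosine_content` and `_trapezoid` are hypothetical.

```python
import numpy as np


def _trapezoid(y, dt):
    """Trapezoidal rule for samples y taken at equal spacing dt."""
    return dt * (np.sum(y) - 0.5 * (y[0] + y[-1]))


def cosine_content(p, i, dt=1.0):
    """Cosine content of principal component p_i(t), sampled at t = 0, dt, ..., T.

    i is the 1-based principal component index used in cos(i * pi * t / T).
    """
    p = np.asarray(p, dtype=float)
    t = np.arange(len(p)) * dt
    T = t[-1]
    num = _trapezoid(np.cos(i * np.pi * t / T) * p, dt) ** 2
    den = _trapezoid(p ** 2, dt)
    return 2.0 / T * num / den
```

A pure half cosine, $p_1(t) = \cos(\pi t / T)$, gives a cosine content of 1 for $i = 1$, while a full cosine gives 0 for $i = 1$; this is the pattern that signals diffusion-like principal components.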
The covariance matrix is built and diagonalized by gmx covar. The principal components and overlap (and many more things) can be plotted and analyzed with gmx anaeig. The cosine content can be calculated with gmx analyze.