Description
CCA
Wine Data-set
Latent Variables
Loadings
Correlation Circles
\(\DeclareMathOperator{\cor}{cor}\) \(\newcommand{\matrice}[1]{\mathbf{#1}}\) \(\newcommand{\X}{\matrice{X}}\) \(\newcommand{\Y}{\matrice{Y}}\) \(\newcommand{\p}{\matrice{p}}\) \(\newcommand{\q}{\matrice{q}}\)
Originally defined by Hotelling (1936), canonical correlation analysis (CCA) is a statistical method whose goal is to extract the information common to two data tables that measure quantitative variables on the same set of observations. To do so, CCA computes two sets of linear combinations, called latent variables (one set for each data table), that have maximum correlation. To visualize this common information extracted by the analysis, a convenient way is
CCA generalizes many standard statistical techniques (e.g., multiple regression, analysis of variance, discriminant analysis) and has several related variants that address slightly different types of problems (e.g., different normalization conditions, different types of data).
If the two data matrices are called \( \X \) and \( \Y \), CCA looks for loading vectors \(\p\) and \(\q\) that solve the following maximization problem:
\begin{equation} \max_{\p, \q} \cor(\X\p, \Y\q) \label{cca} \end{equation}
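As an illustration, here is a minimal NumPy sketch of one common way to find the first pair of loadings, i.e., the \(\p\) and \(\q\) whose latent variables \(\X\p\) and \(\Y\q\) are maximally correlated. It is not necessarily the algorithm used elsewhere in this document. It uses a QR decomposition of each centered table followed by a singular value decomposition, and it assumes that both tables have full column rank and more observations than variables. The function name `first_canonical_pair` and the simulated data are hypothetical, chosen only for this example.

```python
import numpy as np


def first_canonical_pair(X, Y):
    """Return (rho, p, q) such that cor(X p, Y q) = rho is maximal.

    Assumes X and Y have the same number of rows, full column rank,
    and more rows than columns (assumptions of this sketch).
    """
    # Center each column so that correlations reduce to inner products.
    Xc = X - X.mean(axis=0)
    Yc = Y - Y.mean(axis=0)
    # Thin QR: the columns of Qx and Qy are orthonormal bases of the
    # column spaces of the centered tables.
    Qx, Rx = np.linalg.qr(Xc)
    Qy, Ry = np.linalg.qr(Yc)
    # The singular values of Qx^T Qy are the canonical correlations,
    # sorted in decreasing order.
    U, s, Vt = np.linalg.svd(Qx.T @ Qy)
    # Map the first singular vectors back to loadings on the original
    # variables: Xc p = Qx u and Yc q = Qy v.
    p = np.linalg.solve(Rx, U[:, 0])
    q = np.linalg.solve(Ry, Vt[0, :])
    return s[0], p, q


# Simulated example: both tables share one latent signal z.
rng = np.random.default_rng(0)
n = 200
z = rng.normal(size=n)
X = np.column_stack([z + rng.normal(size=n),
                     rng.normal(size=n),
                     rng.normal(size=n)])
Y = np.column_stack([z + rng.normal(size=n),
                     rng.normal(size=n)])

rho, p, q = first_canonical_pair(X, Y)
latent_x = X @ p  # latent variable for the first table
latent_y = Y @ q  # latent variable for the second table
observed_cor = np.corrcoef(latent_x, latent_y)[0, 1]
print(rho, observed_cor)  # the two values should agree up to rounding
```

The QR step is used here because it avoids explicitly inverting the within-table covariance matrices. With rank-deficient or very wide tables, a regularized or pseudo-inverse variant would be needed instead.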