PCA

PCA is especially helpful when dealing with datasets with a large number of variables, as it allows you to transform these variables into a new set of uncorrelated variables, called principal components

  • Principal components are new variables that are constructed as linear combinations or mixtures of the initial variables
  • These combinations are done in such a way that the new variables (i.e., principal components) are uncorrelated and most of the information within the initial variables is squeezed or compressed into the first components

Principal components

n-dimensional data gives you n principal components

  • PCA tries to put the maximum possible information in the first component, then the maximum remaining information in the second, and so on
  • Organising information in principal components this way allows you to reduce dimensionality without losing much information: you discard the components with low information and keep the remaining components as your new variables
  • An important thing to realise here is that the principal components are less interpretable and don’t have any real meaning since they are constructed as linear combinations of the initial variables

Example

In this example, all training instances lie close to a plane: this is a lower dimensional (2D) subspace of the high-dimensional (3D) space.
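
A minimal sketch of this setup (the data generation below is illustrative, not the original example's data): we create 3D points that lie close to a plane and let PCA recover the 2D subspace.

```python
import numpy as np
from sklearn.decomposition import PCA

rng = np.random.default_rng(42)

# Illustrative 3D data: two directions carry most of the variance,
# the third direction only adds small noise, so points lie near a plane
m = 200
basis = rng.normal(size=(2, 3))                            # spans the plane
coords = rng.normal(size=(m, 2)) * [3, 1]                  # spread within the plane
X = coords @ basis + rng.normal(scale=0.05, size=(m, 3))   # small off-plane noise

pca = PCA(n_components=2)
X2D = pca.fit_transform(X)               # project onto the 2D subspace

print(X2D.shape)                         # (200, 2)
print(pca.explained_variance_ratio_)     # first two components capture nearly all variance
```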


PCA’s main idea

  • PCA finds the axes (principal components) along which the data varies the most
  • These principal components are orthogonal (uncorrelated) and are ranked by the amount of variance they explain in the data
  • The main idea behind principal components is to select the axis that preserves the maximum amount of variance, i.e., the line that maximises the average of the squared distances from the projected points to the origin
  • The second axis is the one that accounts for the largest amount of the remaining variance, and so on (a quick numerical check follows below)
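
A quick numerical check of this idea on synthetic 2D data: projecting onto the first principal axis yields a larger variance than projecting onto an arbitrary direction.

```python
import numpy as np
from sklearn.decomposition import PCA

rng = np.random.default_rng(0)
# Correlated 2D data, centred, so one direction clearly dominates
X = rng.multivariate_normal(mean=[0, 0], cov=[[3.0, 1.5], [1.5, 1.0]], size=500)
X = X - X.mean(axis=0)

pca = PCA().fit(X)
pc1 = pca.components_[0]                 # unit vector along the first principal axis

# Variance of the projections onto the first axis vs. an arbitrary direction
rand_dir = rng.normal(size=2)
rand_dir /= np.linalg.norm(rand_dir)
print((X @ pc1).var())                   # the largest achievable projection variance
print((X @ rand_dir).var())              # never larger than the value above
```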

How PCA works

  • Standardisation: The first step in PCA is to standardise the data to have a mean of 0 and a standard deviation of 1
  • Calculating Covariance Matrix: PCA calculates the covariance matrix of the standardised data. The covariance matrix represents the relationships between all pairs of variables in the dataset
    • The aim of this step is to understand how the variables (features) of the input data set are varying from the mean with respect to each other, or in other words, to see if there is any relationship between them

$$\sigma(\mathbf{x}_1, \mathbf{x}_2) = \frac{1}{m-1} \sum_{i=1}^{m} (\mathbf{x}_1^{(i)} - \bar{\mathbf{x}}_1)(\mathbf{x}_2^{(i)} - \bar{\mathbf{x}}_2) \text{ where } m \text{ is the number of samples}$$
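
A short NumPy sketch of these two steps (the feature matrix X is assumed to hold samples in rows and features in columns):

```python
import numpy as np

def standardise(X):
    """Scale each feature to zero mean and unit standard deviation."""
    return (X - X.mean(axis=0)) / X.std(axis=0)

def covariance_matrix(X):
    """Covariance between every pair of features, using the 1/(m-1) estimator."""
    m = X.shape[0]
    X_centred = X - X.mean(axis=0)
    return (X_centred.T @ X_centred) / (m - 1)   # same as np.cov(X, rowvar=False)

# Example usage
X = np.random.default_rng(1).normal(size=(100, 3))
X_std = standardise(X)
print(covariance_matrix(X_std))   # n x n matrix; diagonal entries are close to 1 after standardisation
```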

How PCA works

Eigenvectors are special vectors that, when a matrix is applied to them, only get scaled (stretched or compressed) by a scalar factor called the eigenvalue: Av = λv. An n × n symmetric matrix always has exactly n orthogonal eigenvectors.
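
A small numerical check of this definition, using NumPy's eigensolver for symmetric matrices:

```python
import numpy as np

# A symmetric (covariance-like) matrix
A = np.array([[2.0, 1.0],
              [1.0, 2.0]])

# eigh is the solver for symmetric matrices: eigenvalues in ascending order,
# eigenvectors in the columns of V
eigenvalues, V = np.linalg.eigh(A)

for lam, v in zip(eigenvalues, V.T):
    # Applying A only rescales the eigenvector by its eigenvalue: A v = lambda v
    print(np.allclose(A @ v, lam * v))   # True
```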


Eigendecomposition: PCA then performs eigendecomposition on the covariance matrix. This step yields eigenvalues and eigenvectors

  • The eigenvectors are unit vectors representing the direction of the largest variance of the data, while the eigenvalues represent the magnitude of this variance in the corresponding directions
  • Selecting Principal Components: PCA sorts the eigenvalues in descending order. The eigenvectors corresponding to the top k eigenvalues (where k is the desired number of dimensions) are selected as the principal components
  • Projection: The original data is projected onto the selected principal components, creating a new lower-dimensional representation of the data (a from-scratch sketch of these steps follows below)
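
Putting the steps together, a compact from-scratch sketch (variable and function names are illustrative; in practice you would typically use sklearn.decomposition.PCA):

```python
import numpy as np

def pca_project(X, k):
    """Project X (shape (m, n)) onto its top-k principal components."""
    # 1. Standardise
    X_std = (X - X.mean(axis=0)) / X.std(axis=0)
    # 2. Covariance matrix of the standardised data
    cov = np.cov(X_std, rowvar=False)
    # 3. Eigendecomposition (eigh returns ascending eigenvalues for symmetric matrices)
    eigenvalues, eigenvectors = np.linalg.eigh(cov)
    # 4. Sort eigenvalues in descending order and keep the top-k eigenvectors
    order = np.argsort(eigenvalues)[::-1]
    components = eigenvectors[:, order[:k]]          # shape (n, k)
    # 5. Project the data onto the selected principal components
    return X_std @ components                        # shape (m, k)

X = np.random.default_rng(2).normal(size=(200, 5))
print(pca_project(X, k=2).shape)   # (200, 2)
```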

Source: Géron, A. (2022). Hands-On Machine Learning with Scikit-Learn, Keras, and TensorFlow. O'Reilly Media, Inc.

Choosing the right number of dimensions

Instead of arbitrarily choosing the number of dimensions to reduce down to, one might want to choose the number of dimensions that add up to a sufficiently large portion of the variance (e.g., 95%)

Unless, of course, you are reducing dimensionality for data visualization; in that case, you will want to reduce the dimensionality down to 2 or 3

The code sketch below computes the minimum number of dimensions d required to preserve 95% of the training set's variance; alternatively, instead of passing n_components=d, you can set n_components to a float between 0.0 and 1.0, indicating the ratio of variance you wish to preserve.
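
A sketch of both approaches with scikit-learn (X_train here is a random stand-in for the real training set):

```python
import numpy as np
from sklearn.decomposition import PCA

# Stand-in for the real training set
X_train = np.random.default_rng(3).normal(size=(500, 20))

# Option 1: compute the minimum number of dimensions d that preserves 95% of the variance
pca = PCA()
pca.fit(X_train)
cumsum = np.cumsum(pca.explained_variance_ratio_)
d = np.argmax(cumsum >= 0.95) + 1

# Option 2: pass the variance ratio directly and let PCA pick d
pca = PCA(n_components=0.95)
X_reduced = pca.fit_transform(X_train)
print(d, pca.n_components_)   # both report the number of dimensions kept
```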

Plotting the variance

Yet another option is to plot the explained variance as a function of the number of dimensions

  • There will usually be an elbow in the curve, where the explained variance stops growing fast (a plotting sketch follows below)
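
A plotting sketch for this curve (again using stand-in data):

```python
import numpy as np
import matplotlib.pyplot as plt
from sklearn.decomposition import PCA

X_train = np.random.default_rng(3).normal(size=(500, 20))  # stand-in data

pca = PCA()
pca.fit(X_train)
cumsum = np.cumsum(pca.explained_variance_ratio_)

plt.plot(range(1, len(cumsum) + 1), cumsum, marker="o")
plt.xlabel("Number of dimensions")
plt.ylabel("Cumulative explained variance")
plt.grid(True)
plt.show()   # look for the elbow where the curve flattens out
```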


Benefits of PCA

  • Dimensionality Reduction: PCA reduces the number of variables, making computations more efficient and effective
  • Visualization: Data with reduced dimensions can be visualized more easily, allowing for better understanding and interpretation
  • Feature Engineering: PCA can be used for feature engineering, creating new features that capture the most important information in the dataset

Let’s see how PCA works with the network traffic!