What is PCA in Machine Learning?



Principal Component Analysis (PCA) is a dimensionality reduction technique used in data science. It is an unsupervised learning technique, meaning it does not rely on labeled data. It has several applications, such as image compression, data visualization, and exploratory data analysis.

To understand PCA, we have to understand projection.

Here, the projection of OX onto OZ is P. Treating X and Z as the vectors from the origin O to the points X and Z, P is given by

$$\frac{X^TZ}{||Z||^2}Z$$
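As a quick check of the formula above, here is a minimal sketch in plain Python. The function name `project` and the example vectors are assumptions chosen for illustration; they are not from the original figure.

```python
def project(x, z):
    """Project vector x onto the line through the origin spanned by z."""
    dot_xz = sum(xi * zi for xi, zi in zip(x, z))  # X^T Z
    norm_sq = sum(zi * zi for zi in z)             # ||Z||^2
    scale = dot_xz / norm_sq
    return [scale * zi for zi in z]                # (X^T Z / ||Z||^2) Z

# Hypothetical example vectors
x = [3.0, 1.0]
z = [1.0, 1.0]
p = project(x, z)
print(p)  # [2.0, 2.0]
```

Here X^T Z = 4 and ||Z||^2 = 2, so P is 2 times Z.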

Why do we use projection?

The shortest distance from the point X to the line through O and Z is ||XP||, the length of the perpendicular from X to its projection P.
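The claim that the projection is the closest point on the line can be checked numerically. The sketch below is only an illustration under assumed values: it compares the distance from X to a handful of candidate points on the line and confirms that the nearest one is the projection. The vectors, the helper names, and the candidate grid are all hypothetical.

```python
import math

def project(x, z):
    """Project vector x onto the line through the origin spanned by z."""
    scale = sum(xi * zi for xi, zi in zip(x, z)) / sum(zi * zi for zi in z)
    return [scale * zi for zi in z]

def distance(u, v):
    """Euclidean distance between two points."""
    return math.sqrt(sum((ui - vi) ** 2 for ui, vi in zip(u, v)))

# Hypothetical example vectors
x = [3.0, 1.0]
z = [1.0, 1.0]
p = project(x, z)

# Candidate points on the line: multiples of z from -2.0 to 6.0 in steps of 0.5
candidates = [[0.5 * t * zi for zi in z] for t in range(-4, 13)]
closest = min(candidates, key=lambda q: distance(x, q))

print(closest == p)  # True
```

Every other candidate on the line is farther from X than P is, which is why the projection is the natural choice when replacing a point by a point on a line.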

PCA uses this concept: each data point is replaced by its projection onto a chosen direction.

Now let's see how we can reduce the dimensionality of data. Imagine we have a dataset of 5 data points in 2D. Let's say we need 1 byte of memory to store one integer. Each point has 2 coordinates, so we need 5 × 2 = 10 bytes of memory. If we use PCA, we can reduce this to 7 bytes. How?

To store these 5 data points in a file, we can simply store the direction vector

$$\overrightarrow{AB} = \begin{pmatrix} 1 \\ 1 \end{pmatrix}$$

Then we store one constant for each of the points a, b, c, d, and e. Why? Because every point lies along this vector, we can recover a, b, c, d, or e by multiplying its constant by the vector:

$$p = \text{constant}_p \cdot \begin{pmatrix} 1 \\ 1 \end{pmatrix}, \quad p \in \{a, b, c, d, e\}$$

So we have to store 5 constants and one vector with 2 integers, which is 5 + 2 = 7 bytes in total. Previously we had to store 10 bytes.
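The storage count above can be sketched in a few lines of Python. The five points below are hypothetical (the original figure is not available), and they are assumed to lie exactly on the line through the origin along (1, 1), which is what makes the reconstruction exact.

```python
# Hypothetical dataset: five 2D points lying exactly on the line along (1, 1)
points = [(1, 1), (2, 2), (3, 3), (4, 4), (5, 5)]
direction = (1, 1)

def dot(u, v):
    return sum(ui * vi for ui, vi in zip(u, v))

# One constant per point, from the projection formula (p . d) / (d . d).
# Integer division is exact here only because each point is an integer
# multiple of the direction vector.
constants = [dot(p, direction) // dot(direction, direction) for p in points]

# Rebuild every point from its constant and the shared direction vector
reconstructed = [(c * direction[0], c * direction[1]) for c in constants]

original_count = sum(len(p) for p in points)     # 5 points x 2 coordinates
stored_count = len(constants) + len(direction)   # 5 constants + 1 vector

print(constants)                    # [1, 2, 3, 4, 5]
print(reconstructed == points)      # True
print(original_count, stored_count) # 10 7
```

If the points did not lie exactly on one line, the constants would still give each point's projection onto the direction, but the reconstruction would only approximate the original data.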