Matt Brems (he/him)
1 min read · May 9, 2020


Thank you for the question, Adam! PCA doesn’t know how well each feature will predict Y because, as you correctly point out, Y isn’t involved when we fit PCA to our data.

Each eigenvalue is defined to be the importance of its corresponding new (Z) feature. That's because we're equating importance with how much variability there is in that direction. Imagine a scatterplot of dots. If there's a lot of spread in one direction, that is likely to indicate something meaningful (i.e., signal). If there's very little spread in a direction, that may mean there's nothing meaningful there (i.e., noise). The eigenvalue measures the spread (the variance) of the data along that principal component. Thus, a larger eigenvalue → more spread → more importance.
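To make this concrete, here is a small NumPy sketch (my own illustration, not part of the original reply): the eigenvalues of the data's covariance matrix are exactly the variances along each principal-component direction, so the direction with more spread gets the larger eigenvalue.

```python
import numpy as np

rng = np.random.default_rng(0)

# Anisotropic 2-D data: lots of spread along x ("signal"),
# very little spread along y ("noise").
X = rng.normal(size=(500, 2)) * np.array([5.0, 0.5])

# Center the data and compute its covariance matrix.
Xc = X - X.mean(axis=0)
cov = np.cov(Xc, rowvar=False)

# Eigendecomposition: each eigenvalue is the variance (spread)
# of the data along its eigenvector, i.e., along that principal component.
eigenvalues, eigenvectors = np.linalg.eigh(cov)
eigenvalues = eigenvalues[::-1]  # sort largest first

# The first eigenvalue (≈ 5² = 25) dwarfs the second (≈ 0.5² = 0.25),
# matching the visible spread in each direction.
print(eigenvalues)
```

Note that Y never appears anywhere in this computation: the eigenvalues rank directions purely by spread, not by predictive power.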

Written by Matt Brems (he/him)

Chair, Executive Board @ Statistics Without Borders. Distinguished Faculty @ General Assembly. Co-Founder @ BetaVector.
