It depends on how you define “effective.” If you use “effective” to mean model performance (like accuracy or R-squared), then interpretability should have no impact here: PCA behaves the same whether the input variables are more or less interpretable, and so does a model fit to the PCA-transformed variables.
However, when you apply PCA to a set of input variables (often denoted X), the newly transformed variables are no longer interpretable in the same way. For example, if two of your input variables are `time` and `money`, and you apply PCA to them, the new variables you get, `Z1` and `Z2`, are no longer interpretable in the same way.
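To see why, here is a minimal sketch (using scikit-learn and made-up `time`/`money` data, both of which are illustrative assumptions, not part of the original answer) showing that each principal component is a weighted blend of all the original columns:

```python
import numpy as np
from sklearn.decomposition import PCA

rng = np.random.default_rng(0)
time = rng.normal(10, 2, size=100)         # hypothetical "time" input
money = 3 * time + rng.normal(0, 1, 100)   # hypothetical "money" input
X = np.column_stack([time, money])

pca = PCA(n_components=2)
Z = pca.fit_transform(X)  # Z[:, 0] is Z1, Z[:, 1] is Z2

# Each row of components_ gives the weights of (time, money) in one
# component: Z1 and Z2 mix BOTH inputs, so neither means "time" or
# "money" on its own anymore.
print(pca.components_)
```

The key takeaway is `pca.components_`: every component carries nonzero weight on both original variables, which is exactly why the new axes lose their original meaning.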
For example, if you build two linear regression models:
Y = b0 + b1*time + b2*money
Y = c0 + c1*Z1 + c2*Z2
then it is significantly easier to interpret the marginal effect of `time` on `Y` in the first model (without PCA applied) than in the second. In the first model, b1 is the expected change in Y per one-unit increase in time, holding money fixed; in the second, c1 describes the effect of a unit move along `Z1`, which is a mixture of time and money with no direct real-world unit.
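A short sketch of the two models side by side (synthetic data; the coefficient values and variable names are illustrative assumptions):

```python
import numpy as np
from sklearn.decomposition import PCA
from sklearn.linear_model import LinearRegression

rng = np.random.default_rng(1)
time = rng.normal(10, 2, size=200)
money = rng.normal(50, 5, size=200)
y = 2.0 * time + 0.5 * money + rng.normal(0, 1, size=200)
X = np.column_stack([time, money])

# Model 1: raw inputs. The first coefficient recovers roughly 2.0 and
# reads directly as "expected change in Y per extra unit of time."
m1 = LinearRegression().fit(X, y)

# Model 2: PCA-transformed inputs. Each coefficient is the effect of a
# unit move along one component, a blend of time and money, so it has
# no direct interpretation in terms of either original variable.
Z = PCA(n_components=2).fit_transform(X)
m2 = LinearRegression().fit(Z, y)

print(m1.coef_)  # per-variable effects (time, money)
print(m2.coef_)  # per-component effects (Z1, Z2)
```

Both models can predict equally well here; the difference is only in what the coefficients mean.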
Applying PCA to your independent variables makes it significantly harder to interpret those variables later. So if you are tackling a data science problem and want to interpret how your inputs are associated with your output, PCA does make that more challenging.
I hope that this makes sense!