It depends on how you define “effective.” If you use “effective” to mean model performance (like accuracy or R-squared), then interpretability should have no impact here: PCA behaves the same whether the input variables are more or less interpretable, and so does a model fit to the PCA-transformed variables.
However, when you apply PCA to a set of input variables (often denoted X), the newly transformed variables are no longer interpretable in the same way. For example, if two of your input variables are `time` and `money`, and you apply PCA to them, the new variables you get, `Z1` and `Z2`, are no longer interpretable in the same way.
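To see why, here is a minimal sketch (using scikit-learn and made-up `time`/`money` data, both of which are illustrative assumptions, not part of the original answer) showing that each principal component is a weighted blend of all the original columns:

```python
import numpy as np
from sklearn.decomposition import PCA

rng = np.random.default_rng(0)
time = rng.normal(10, 2, size=100)         # hypothetical "time" input
money = 3 * time + rng.normal(0, 1, 100)   # hypothetical "money" input
X = np.column_stack([time, money])

pca = PCA(n_components=2)
Z = pca.fit_transform(X)  # Z[:, 0] is Z1, Z[:, 1] is Z2

# Each row of components_ gives the weights of (time, money) in one
# component: Z1 and Z2 mix BOTH inputs, so neither means "time" or
# "money" on its own anymore.
print(pca.components_)
```

The key takeaway is `pca.components_`: every component carries nonzero weight on both original variables, which is exactly why the new axes lose their original meaning.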
For example, if you build two linear regression models:
Y = b0 + b1*time + b2*money
Y = c0 + c1*Z1 + c2*Z2
then it is significantly easier to interpret the marginal effect of `time` on `Y` in the first model (without PCA applied) than in the second. In the first model, b1 is the expected change in Y per one-unit increase in time, holding money fixed; in the second, c1 describes the effect of a unit move along `Z1`, which is a mixture of time and money with no direct real-world unit.
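A short sketch of the two models side by side (synthetic data; the coefficient values and variable names are illustrative assumptions):

```python
import numpy as np
from sklearn.decomposition import PCA
from sklearn.linear_model import LinearRegression

rng = np.random.default_rng(1)
time = rng.normal(10, 2, size=200)
money = rng.normal(50, 5, size=200)
y = 2.0 * time + 0.5 * money + rng.normal(0, 1, size=200)
X = np.column_stack([time, money])

# Model 1: raw inputs. The first coefficient recovers roughly 2.0 and
# reads directly as "expected change in Y per extra unit of time."
m1 = LinearRegression().fit(X, y)

# Model 2: PCA-transformed inputs. Each coefficient is the effect of a
# unit move along one component, a blend of time and money, so it has
# no direct interpretation in terms of either original variable.
Z = PCA(n_components=2).fit_transform(X)
m2 = LinearRegression().fit(Z, y)

print(m1.coef_)  # per-variable effects (time, money)
print(m2.coef_)  # per-component effects (Z1, Z2)
```

Both models can predict equally well here; the difference is only in what the coefficients mean.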
Applying PCA to your independent variables makes it significantly harder to interpret those variables later. So if you are tackling a data science problem and want to interpret how your inputs are associated with your output, PCA does make that more challenging.
I hope that this makes sense!