November 9, 2015

IDR 03: Finding the best features

I'm reviewing some training materials on Statistical Learning, working from an index of materials here.

I'm going over Chapter 4 again, refreshing my memory (and learning a bunch of new stuff) around classification problems in machine learning (Chapter 4 playlist).

The main problem that I'm having in my research right now is that I'm not sure WHICH of the 1000s of possible features actually contribute to my dependent variable.

So far, I've been reminded that I need to investigate my features with:

  • Box plots
  • Logistic Regression (glm in R) and Multiple Logistic Regression
  • Linear Discriminant Analysis
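
To make the idea behind that list concrete, here is a minimal sketch of one way to screen features one at a time. It is not from the course videos. It uses Fisher's discriminant ratio, which is the single-feature intuition behind LDA and roughly what a pair of box plots shows by eye: how far apart the two class means are, relative to the spread within each class. The feature names, the toy numbers, and the helper functions `fisher_score` and `rank_features` are all hypothetical, and I'm assuming a binary 0/1 dependent variable.

```python
from statistics import mean, pvariance


def fisher_score(values, labels):
    """Fisher's discriminant ratio for one feature and a binary (0/1) label.

    Computed as (difference of class means)^2 / (sum of class variances).
    Larger scores mean the two classes are better separated on this feature.
    """
    group0 = [v for v, y in zip(values, labels) if y == 0]
    group1 = [v for v, y in zip(values, labels) if y == 1]
    gap = (mean(group1) - mean(group0)) ** 2
    spread = pvariance(group0) + pvariance(group1)
    if spread == 0:
        # No variation inside either class: perfect separation if the means
        # differ, otherwise the feature carries no information at all.
        return float("inf") if gap > 0 else 0.0
    return gap / spread


def rank_features(features, labels):
    """Return (name, score) pairs sorted from most to least separating."""
    scores = {name: fisher_score(vals, labels) for name, vals in features.items()}
    return sorted(scores.items(), key=lambda item: item[1], reverse=True)


# Made-up toy data: feature_a shifts with the label, feature_b does not.
labels = [0, 0, 0, 0, 1, 1, 1, 1]
features = {
    "feature_a": [1.0, 2.0, 1.5, 2.5, 6.0, 7.0, 6.5, 7.5],
    "feature_b": [3.0, 5.0, 4.0, 6.0, 4.0, 6.0, 3.0, 5.0],
}

ranking = rank_features(features, labels)
for name, score in ranking:
    print(f"{name}: {score:.2f}")
```

A score like this only looks at one feature at a time, so it can miss features that matter only in combination. That is where multiple logistic regression and full LDA come in.
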

Also, these videos name-drop some famous statisticians, which is always really inspiring. Tukey, Fisher, and friends ... maybe my next degree (Master's) will be in Statistics!

jk, family. But this really is fascinating.

