November 21, 2015

IDR 10: Test data initialization

Yesterday, I was humbly reminded just how much I still *don't* know about R.

First, I learned that there is apparently no straightforward way to map a row from one data.frame into the columns of another. That meant it would be much simpler for my code to store all training and "test" examples for each experiment in a single shared data frame, with a new column indicating training vs. test. I finally managed to rewrite that code, and it now dumps a single CSV containing all examples plus the new column.
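A minimal sketch of that combined layout, assuming hypothetical `train` and `test` data frames with matching columns and a hypothetical indicator column named `set` (the post doesn't say what the real column is called or what the real variables look like):

```r
# Hypothetical training and test examples with identical columns
train <- data.frame(x1 = c(0, 1, 1), x2 = c(1, 0, 1))
test  <- data.frame(x1 = c(1, 0),    x2 = c(0, 0))

# Tag each row with its role, then stack everything into one data frame
train$set <- "train"
test$set  <- "test"
combined <- rbind(train, test)

# One CSV holds every example; the indicator column keeps them separable
out <- tempfile(fileext = ".csv")
write.csv(combined, out, row.names = FALSE)

# Either subset can be recovered later by filtering on the indicator
train_again <- combined[combined$set == "train", ]
```

Because both sets share one set of columns, there is no need to map rows from one data.frame into another; filtering on the indicator does the split.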

Next, I dealt with the fact that all of my data was being loaded with 0s and NAs instead of 0s and 1s. I was initializing a data.frame from a matrix of 0s, which led the data.frame to conclude that the only "level" in every variable in my model was "0" (the variables were at least correctly being interpreted as categorical variables, or factors, in R). So when I later tried to assign a new value, R rejected it as an invalid factor level and stored "NA" instead.

There are apparently some ways around this, but it was easier for me to rewrite my code to set the 0 and 1 values in the matrix before creating the data.frame. Data frame creation now appears to work correctly, though with a two-hour cycle time I still have to confirm that on a real dataset.
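The original code isn't shown, so this is a hypothetical reconstruction: it assumes the factor columns came from a character matrix of "0"s converted with `stringsAsFactors = TRUE`, and the column names `var1`/`var2` are made up. It reproduces the NA problem, then the fix described above (fill in the matrix first), plus one alternative (declaring the levels explicitly):

```r
# Problem: every column starts out with "0" as its only factor level
m <- matrix("0", nrow = 3, ncol = 2, dimnames = list(NULL, c("var1", "var2")))
bad <- data.frame(m, stringsAsFactors = TRUE)
levels(bad$var1)            # only "0"

# "1" is not a known level, so R warns
# ("invalid factor level, NA generated") and stores NA instead
bad[1, "var1"] <- "1"

# Fix used here: set the 0/1 values in the matrix first, then build the
# data.frame, so both levels exist when the factors are created
m[1, "var1"] <- "1"
good <- data.frame(m, stringsAsFactors = TRUE)
levels(good$var1)           # "0" "1"

# One alternative: declare the allowed levels explicitly up front
fixed <- factor(rep("0", 3), levels = c("0", "1"))
fixed[1] <- "1"             # accepted, since "1" is already a level
```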

So by the end this turned out to be another frustrating day, but it did have its moments of hope before these problems crept up. Here's hoping that today is a better one!

PS: I also got the dental work I mentioned a couple of posts ago taken care of. It actually hurt a bit more than the same procedure did last time, but hopefully that will go away.

IDR Series
