Tuesday, October 09, 2012

Big data, big jobs?

See the article "Big data, big jobs?" in a recent issue of Computerworld.

I teach an undergraduate course in data mining, and it is definitely one of my favorite courses to teach. For the past couple of years I've been telling students that this area is a huge opportunity in the job market, and that it will continue to blossom for years to come.

Yet, as the article itself suggests, data mining is not for everyone.

Here are my observations. Unfortunately, I am not seeing enough students ready or willing to dive into the statistical math required to really understand how to get the most interesting information out of the data. Being a good programmer is not enough for getting into data mining. Knowing a programming language that is popular for data mining tasks (such as R) helps, but what really matters is knowing how to analyze, visualize, and extract relevant information from large data sets. To do this, you need to know which models to apply, when and why to apply them, and what to expect from them. You need to know how to evaluate model performance and how to select appropriate parameters to improve it. You need to understand the causes of poor performance (e.g., noisy data or a lack of preprocessing). I see too many students blindly meandering around the data mining landscape with programs such as SAS, SPSS, or even Weka, not really understanding why a particular model is behaving the way it is. (BTW, Weka is an absolutely wonderful piece of software for exploring data mining!)
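To make the point about evaluating model performance and selecting parameters a bit more concrete, here is a minimal sketch in plain Python. Everything in it is my own illustration rather than something from the article: the synthetic two-class data (with some labels deliberately flipped to mimic noisy data), the choice of a k-nearest-neighbor classifier, and the helper names make_data, knn_predict, and cross_val_accuracy are all assumptions. It uses 5-fold cross-validation to compare a few values of k and keeps the one with the best held-out accuracy.

```python
import random
from collections import Counter


def make_data(n=120, noise=0.15, seed=0):
    """Two overlapping 2-D classes; a fraction of labels is flipped to mimic noisy data."""
    rng = random.Random(seed)
    data = []
    for _ in range(n):
        label = rng.randint(0, 1)
        center = 0.0 if label == 0 else 2.0
        point = (rng.gauss(center, 1.0), rng.gauss(center, 1.0))
        if rng.random() < noise:
            label = 1 - label  # label noise (hypothetical rate)
        data.append((point, label))
    return data


def knn_predict(train, point, k):
    """Majority vote among the k nearest training points (squared Euclidean distance)."""
    nearest = sorted(
        train,
        key=lambda row: (row[0][0] - point[0]) ** 2 + (row[0][1] - point[1]) ** 2,
    )[:k]
    votes = Counter(label for _, label in nearest)
    return votes.most_common(1)[0][0]


def cross_val_accuracy(data, k, folds=5):
    """Mean accuracy on held-out folds; each row is tested exactly once."""
    scores = []
    for f in range(folds):
        test = data[f::folds]
        train = [row for i, row in enumerate(data) if i % folds != f]
        correct = sum(knn_predict(train, p, y_k := k) == y for p, y in test)
        scores.append(correct / len(test))
    return sum(scores) / folds


data = make_data()
candidates = [1, 3, 5, 9, 15]  # odd values avoid tied votes with two classes
results = {k: cross_val_accuracy(data, k) for k in candidates}
best_k = max(results, key=results.get)

for k in candidates:
    print(f"k={k:2d}  cv accuracy={results[k]:.3f}")
print("selected k:", best_k)
```

The habit matters more than the particular model: score each parameter setting on data the model did not see during training, and ask why the scores differ (here, very small k tends to chase the flipped labels) instead of accepting whatever number a tool reports.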

The best preparation for a career in data mining is a solid foundation in statistics and probability. Then, take a course in data mining! I find that failure to fully embrace these topics makes it difficult for a student to understand the strengths and weaknesses of the wide range of statistical models and algorithms used for inference and induction in data mining tasks. More importantly, without that foundation you will likely miss many of the hidden gems buried in the data, and those gems are what your potential employer is after.