Wednesday, November 23, 2011

Hottest major on campus? Computer science


Check out this link: Hottest major on campus? Computer science:

It would seem, at least among the very best computer science undergraduate programs in the country, that computer science is becoming a very hot major again. We are finally seeing enrollments approach numbers reminiscent of the Y2K era. While this is fantastic news for those of us who teach this discipline, I cannot help but wonder why I am not reading more about how we are going to avoid repeating the past problems in higher ed. We saw a dramatic decline in enrollments between 2001 and 2005. The numbers stayed depressed, causing a large number of departments to struggle to stay in existence. Many programs failed to find their niche in the post-Y2K era and folded. Some CS departments merged with others to survive. (It happened at my alma mater -- SUNY Albany. Thankfully, they survived and are doing well.) Many struggled to figure out how to reach a student body very different from the one they remembered from their own experience learning CS. They tried (and continue to try) to embrace many new approaches to teaching computer science in an attempt to lure more students. A large fraction of these techniques involved introducing media- and game-based approaches to solving problems. Many found this difficult, and these approaches frequently yielded students who lacked the depth in abstraction required to do well with the more difficult material. (A topic for another time.) Regardless, this continued until around 2009-2010, when we started to see a slight rise in enrollments. And now, here we are today -- the press has once again taken notice of us! Just as the Whos shouted from Whoville, "We are here! We are here! We are here!"

Alas, we are becoming the saving grace of colleges and universities nationwide. Soon we will have the opposite problem: rapidly increasing enrollments. How can we handle this? It is a good problem to have, except when it isn't. I strongly advise and encourage department chairs to remember the past! During the Y2K glut in enrollments, many departments forgot the importance of maximizing student learning because they were up to their foreheads in paperwork and other infrastructure management issues, trying to deal with problems such as introductory computer science classes with 400-500 students!

My opinion is that the increase in interest is a very good thing. Let's find a balance: embrace this surge of interest without adding students to our programs with no thought for managing their long-term growth and sustainability.

There are very few entities out there that can experience unprecedented growth without a correction of some sort. It happens with the stock market. It happens with individual businesses. And it happens in higher ed and in the job market. Without controlled, managed growth, the infrastructure required to support that growth will surely fail. It happens every time. All bubbles eventually pop.

Let's see if we can make this good news last as long as possible. Let's remember why we are here. We are not here to increase our enrollment numbers just to keep our programs surviving and our pockets loaded with cash. We are here to maximize learning among our students, which will yield the very best computer scientists our country has seen to date. We want them not only to find excellent jobs, but also to be well prepared for graduate school. Yes, perhaps many will argue that it would be nice to graduate the next Jobs, Zuckerberg, or Gates from our departments. But how about we embrace a bigger challenge: Let's ask ourselves what we can do to graduate the next Dijkstra, Hamming, McCarthy, Knuth, Liskov, etc. -- those who have set remarkable milestones in computer science. We do not need more people with their minds set on padding their pockets with millions of dollars, nor do we need more people who bend to every whim of the shareholder. We need people who can think about the science of computing itself, and dream big.

Wednesday, November 09, 2011

MIT Center for Mobile Learning

MIT Center for Mobile Learning @ The Media Lab

I definitely need to watch what this group does. If there is an obvious area where Computer Science departments should be investing their time and resources, it is mobile development. I know it is something that some of us here at BU have discussed. It certainly represents a growing niche that companies are investing their resources in. The desktop/laptop model of computing continues to fade. We are doing a disservice to our students by not providing opportunities for them to learn more about mobile development before they graduate.

Thursday, November 03, 2011

DARPA Shredder Challenge

DARPA Shredder Challenge:

Now, this is a bit of a different data challenge. Come up with an algorithm that can reassemble a shredded document. The individual shredded pieces are available in TIF format. Then, you need to prove to DARPA that you successfully reassembled the document by answering a question about information contained on the original document and providing them with a reassembled image.

The award: $50,000

If interested, click on the link above.
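To get a feel for the problem, here is a toy sketch of the core idea -- entirely my own construction, not anything DARPA provides. Treat each strip as rows of pixel values, score how well two strips abut by comparing their facing edge columns, and greedily chain strips together, trying every strip as the leftmost one and keeping the cheapest ordering. The real challenge is enormously harder (torn edges, rotations, handwriting, multiple documents), but the edge-matching intuition is the same:

```python
def edge_cost(left_strip, right_strip):
    """Sum of squared differences between the rightmost pixel column of
    left_strip and the leftmost pixel column of right_strip.
    Each strip is a list of rows; each row is a list of pixel values."""
    return sum((a[-1] - b[0]) ** 2 for a, b in zip(left_strip, right_strip))

def reassemble(strips):
    """Greedily order strips left-to-right: try every strip as the
    leftmost one, repeatedly append the unused strip whose left edge
    best matches the current right edge, and keep the cheapest result."""
    best_order, best_cost = None, float("inf")
    for start in range(len(strips)):
        order, remaining, cost = [start], set(range(len(strips))) - {start}, 0
        while remaining:
            nxt = min(remaining, key=lambda j: edge_cost(strips[order[-1]], strips[j]))
            cost += edge_cost(strips[order[-1]], strips[nxt])
            order.append(nxt)
            remaining.remove(nxt)
        if cost < best_cost:
            best_order, best_cost = order, cost
    return best_order
```

A serious entry would need far richer features than raw edge pixels, but even this toy recovers the order of strips cut from a smooth image.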

Monday, October 10, 2011

NIH Grant Success Rate Likely Hit Historic Low in 2011 - ScienceInsider

I am only slightly surprised by this:

NIH Grant Success Rate Likely Hit Historic Low in 2011

I am an absolute cynic when it comes to our government's involvement in education and research, particularly with respect to funding STEM (Science, Tech, Engineering, Math) fields. (Okay, I'll be honest -- I am cynical over all things related to government and politics.) There was some hyped-up hope coming from the liberal voices of our country preaching about how the Obama administration would turn things around with respect to increased funding in these areas. My question was always, "With what resources?" As the saying goes, "You can't get blood from a stone." Yet, my observations have opened my eyes a bit more, and shown me that we have much bigger problems. Do not let anyone in government fool you into thinking we do not have resources. We do. They are highly misdirected. My concern is not over the availability of resources, but over the governing bodies in charge of directing those resources.

Historically, we have proven time and again that substantial investment in STEM education and research yields substantial economic gains, not to mention establishing us as a formidable player in STEM work. We are rapidly losing ground in the worldwide race in STEM education and research. Not only is the educational system suffering from its substantial loss of funding but, as evidenced by the current historically low fraction of NIH grants awarded, groups conducting research are suffering as well.

I have rising fears over the future of this great country. I find myself often pondering where we will be in 10 years without a major overhaul of our entire governing body. We spend enormous amounts of taxpayer money guiding the country toward the aims of lobbyists looking out for their big blue chip companies. And, let us not neglect the enormous voice we increasingly give large religious organizations, with the Tea Party currently being their #1 voice. I personally do not have a problem with big companies making money, nor do I have a problem with religion and spirituality. My problem is their persuasive involvement in government leading to unnecessary funding of special interest groups and other unnecessary pork. Their involvement ties up extraordinary amounts of resources and does not reflect objective decision making on behalf of the general public of this country. Specifically in my field, they lead us even further away from fixing our current education and research crisis. We need people in government who can honestly represent current and future general public interests and concerns, and can objectively consider what is best for the long-term placement of our country in the world economy.

I believe that our current spending is one of the most important predictors of the direction of our country. How much is going toward education and research?

For those interested in this growing concern, visit http://www.researchamerica.org. (Yes, I know... I accept the irony of sending you to another lobbying group. But do your own investigation, and let the facts speak for themselves. Simply put, the current spending patterns are wrong for our future.)

Friday, September 02, 2011

Android... in space?

Are you thinking that there must be a better use of that dual-core processor in your Android-based phone than simply handling phone calls and text messages, and playing yet another round of Angry Birds? Indeed, there is. Here is one awesome example:



Wednesday, July 13, 2011

Computer Science Tops List of Best Major for Jobs

As reported here, computer science currently tops the list of the best majors for landing a job after graduation. This is certainly good news for those who choose to pursue this path in college.

Remember that the path can be a difficult one. Fortunately, I do not see as many students coming into the major with the belief that computer science equals game development. However, I still see quite a number of students that are not ready for the required math courses; more often than not, they are not willing to work hard enough for the advanced math classes required for the BS, and often decide to pursue the BA.  I assure you that, for those that work hard and do well, the rewards are plenty. There are quite a number of development positions that prefer students with a solid math and science background.

Computer science is still one of the very few majors where you can land a decent job with good pay without a graduate degree. However, please do not take that as a statement suggesting that I believe grad school is a waste for a computer science student. On the contrary, the right graduate program can substantially help prepare a student for a solid career in a wide range of software development positions, particularly at companies whose primary product is software itself. I pursued a master's degree in the early 90s, and it substantially helped me land a position developing scientific instrumentation software. There are many more similar positions available today than there were back then. Though these are available for the hard-working student pursuing a BS in Computer Science, I strongly urge good students to consider graduate school. It will open up opportunities and raise your starting pay quite a bit.

It is good to be beyond the Y2K bubble burst. It is my hope that we do not repeat the mistakes of the past...

Tuesday, June 14, 2011

Interested in exploring your own genome?

Interpretome is an interesting new project coming out of Stanford University designed to help people analyze their own genomes (assuming the data comes from 23andMe or Lumigenix). The site lets you perform a variety of genotyping analyses, such as exploring your ancestry, searching for known single-nucleotide polymorphisms (SNPs) to determine your risk for a wide range of diseases, or even performing a pharmacogenomic analysis to determine your response to a wide range of drugs. Of course, the web site places a big disclaimer on its main page, indicating that its results are for educational purposes only and should not be used for diagnostic purposes.

Saturday, May 21, 2011

Raising awareness of the need for science and math students

The article I am referring to appeared today in CNN.com:

Why would-be engineers end up as English majors - CNN.com

Sure, I've taken some tough courses as an undergrad. In fact, I recall being in classes where the professor came right out during the first class and said, "About 25% of you will drop this course after the first exam, and only half of you will make it to the end." At first, I recall thinking, "Geez, this guy has a great bedside manner. Thankfully he is not the type of doctor that takes care of people." It was sobering. Despite that very first impression, at least for me, it served as a nice warning to make sure I kept up with all of the material. I got the point he was trying to make. So, how did the rest of that semester go? Did he jump through hoops to make sure I was engaged, interested, and motivated? No. Did he pay attention to everything I was doing to make sure I was keeping up with the material? Nope. Did we watch lots of cool videos? No. He pretty much lectured for the rest of the semester, gave extensive, stressful assignments, and taught with the expectation that the majority of our learning would happen outside of the classroom.

Though I might not adopt this scare tactic for classes I teach, I also do not see it as completely out of line. Let's get something straight: college is not supposed to be easy! Earning a bachelor's degree needs to mean something. Listen, if colleges were handing out these degrees with minimal effort on the part of the student, then what value would the degree hold? Learning is work, and sometimes it is damn hard work. If you aren't working hard in college, then you're probably not learning much. If you are not learning about your field of interest, then how can we adequately prepare you to get the best job offers in your field? The fact of the matter is that most employers pay attention to which colleges offer the best programs, and they weigh this information when evaluating prospective employees. Employers and graduate schools want students who graduate well prepared for their next phase of life. Thankfully, it is not the only information they consider, but it surely is a big part of it. Having hired many people during my years in industry, and having worked with other employers and "head hunters" shopping for talent, I know for certain that where you obtain your degree matters. Not all computer science programs are created equal, and most employers are quite aware of this. If you are at a school with a low-ranked computer science program, then you had better supplement your degree with quality internships. Look for opportunities to do research. Find a temporary job doing IT-related projects during your summers. (Even if you are at a very good school, you should still consider this!)

OK, going back to this article. It's a well-known fact that our country continues to struggle with STEM education. The article is certainly pointing the finger at STEM education in our school system, right from K through 12. But the problem is also within higher education. Educators need to be more aware of just how much students have changed the way they learn and acquire knowledge. Like it or not, we are in a highly interactive, visual, media-based culture. Young kids today are using iPads, cell phones, and laptops for the majority of their communication and education. Educators (I'm preaching to myself here as well) need to carefully consider how students acquire knowledge today, and embrace it in our classrooms, our assignments, and our overall communication with students. I know personally how difficult this is for many of us Ph.D.'s. We busted our behinds for many years in graduate school. We had to teach ourselves an extensive amount of material to do well in research. But we are a unique breed. We should never expect our students to have the same drive for learning that we do. Again, the burden lies with us. We need to provide the motivation for our students to learn, and we need to do it in an interesting, relevant way.

Here was my big "take-away" from this article:

Hrabowski said many people assume they're not smart enough to study science or math. His response?

"No. Your teacher wasn't innovative enough."

I mostly agree with this. Again, I do not necessarily like it. It makes me uncomfortable. It requires me to step out of my comfort zone and keep thinking of new ways to approach the material in my classroom. I need to consider ways to connect with the students right in the classroom, to encourage open discussion of topics, to relate the material to them and their world so they understand its relevance. I need to help them better understand and appreciate why my topic is so important and relevant. I love what I teach (most of the time!). So, how can I convey this better? More consistently?

On a somewhat related note, I need to better address the dichotomy of students that I have in many of my classes with respect to their background and "comfort level" in the class. I so often find that in a typical class of 20 or so students, I have about 5 that answer the majority of the questions. I need to level this out.

I think it is extremely important to raise awareness of the problem. The article does that. How about some solutions?

Friday, May 13, 2011

Overstock.com $1 million data mining prize

It is good to be a data miner these days. Here is yet another opportunity to take part in a data mining competition. The focus of this one is to develop a predictive model based on shopping patterns. The patterns are largely composed of contextual and behavioral data collected from users while they surf for products (and hopefully make purchases) on online shopping sites.

For this contest, Overstock.com is partnering with RichRelevance to develop improved algorithms that can suggest the best product recommendations to present to their online shoppers while on the site. The recommendations are based on an initial model induced from collected past patterns. The model is continually improved and refined based on present patterns collected while the customer is shopping. The central idea is essentially to predict the customer's future behavior on your retail website, thereby allowing your website to quickly present the items they are most likely shopping for in the first place. My understanding of the contest is that participants will need to work with the base RecLab code for the contest, and data is supplied at the contest website after registration. See the contest website for more details.

The award: $100,000 for each 1% performance improvement over their existing model, up to a maximum of $1,000,000!

See Overstock RecLab Prize for more information.
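To make the idea concrete, here is a bare-bones sketch of one classic approach to this kind of recommendation: item-to-item co-occurrence counting. This is my own illustrative toy, not the RecLab framework or anything Overstock actually uses; real contest entries are far more sophisticated.

```python
from collections import defaultdict, Counter

def build_cooccurrence(sessions):
    """Count how often each pair of items appears together in the same
    browsing session. sessions is a list of lists of item names."""
    co = defaultdict(Counter)
    for session in sessions:
        for a in session:
            for b in session:
                if a != b:
                    co[a][b] += 1
    return co

def recommend(co, current_items, n=3):
    """Score candidate items by total co-occurrence with everything the
    shopper has viewed so far, and return the top n."""
    scores = Counter()
    for item in current_items:
        scores.update(co[item])
    for item in current_items:
        scores.pop(item, None)  # don't re-recommend what they're viewing
    return [item for item, _ in scores.most_common(n)]
```

A production system would also weight by recency, normalize away globally popular items, and refine the model online as the session unfolds -- which is exactly the continual refinement the contest description talks about.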

Sunday, May 08, 2011

You are not a Software Engineer, except when you are...

You'll have to read this post to understand where I'm going with this:

You are NOT a Software Engineer! - chrisaitchison.com

This is definitely worth a repost. It challenges the notion of the "engineering" aspect of software development. He makes many good arguments. However, I must disagree with some of his reasoning for using gardening as a metaphor for software development. I cannot reject the notion of software engineering. I agree that most other engineering disciplines are much more regulated than software engineering, that's for sure. Perhaps that is why software bugs cost us billions of dollars every year? Other engineering disciplines are much more rigid in their processes, and rightfully so -- most of the time, they need to be. As he argues, engineering a bridge surely needs to be right the first time! Many fields of software development can tolerate a certain level of "mistakes" or "bugs" that other engineering fields cannot... well... can they really?

In justifying the engineering label to my students, I often ask them to reflect on the software that might be running aircraft, spacecraft, missiles, automobiles, or medical devices, to name a few examples. Consider medical devices. A pacemaker, for example, runs software developed by a team of software engineers. I would like to think they took their job just as seriously as an engineering team building a bridge. In many fields of software development, bugs cost not only money but, much more importantly, lives. A gardener metaphor will not help with this.

Perhaps we need more regulation in certain areas of software engineering?

Sunday, April 24, 2011

Tech jobs boom like it's 1999 - USATODAY.com

This is great news for students of computer science and related disciplines.

However, as one who worked in the IT industry through the 90s and the turn of the millennium, one must ask -- are we going to repeat the past? Is this going to be another bubble, causing an overabundance of IT employees in, say, 5 years or so? Well, I do not see that happening. Let me explain.

My justification is based on working in the industry for 11 years, followed by teaching computer science at three different institutions since then. First, as an academic discipline, computer science is surely the most dynamic field there is; this has presented a challenge to higher ed. Many programs failed to survive the dismal enrollments of the past 5-7 years. Moreover, many of those that have survived are failing to keep themselves relevant, allowing themselves to teach outdated skills using even more outdated technology. Even if student interest returned to late-90s levels, I would say that the majority of institutions offering bachelor's and master's programs in computer science do not have the infrastructure in place to handle the increased enrollments, causing students to consider alternative paths. Fortunately, enrollments in computer science programs have rebounded and will continue to rebound nicely over the next several years. Still, the glut of students running into IT after their degrees will not match what we saw in the late 90s, simply because we do not have the academic infrastructure and standing in computer science in this country that we had in the 80s and 90s.

A second problem (and a bigger problem, in my opinion) is the lack of adequate STEM education in secondary education. We might have much larger enrollments on our doorstep. But, will they be coming in with the same mathematical foundation that they did 15 years ago? My observations are not giving me high confidence that this will be the case. 

I appreciate the fact that in my current position, we meet on a semi-regular basis to review our curriculum. As a department, we bring our experiences and observations of the industry to the table, and discuss what needs to change in the curriculum. Sure, we might conclude that, for this year, we are OK. And that is fine. But, one reason I chose to come here was because my institution values the importance of being dynamic.  We strive to do what we can to ensure that we maintain a certain level of relevancy across the curriculum. Of course, the counterpoint to that is that we must maintain stability in the curriculum. And, there are various factors, both internal and external, that enforce this. But, for individual instructors, I believe that it is essential for us to consider what we might be able to alter to ensure the students have the best chance to learn what they can to get the best jobs and get into the best graduate schools. I am quite aware that, as a selective private liberal arts college with a good engineering school, we are getting very good students with well-above average STEM skills. I am aware that we will not face some of the issues that many other colleges will, though even the best schools are not immune to the continual decay of quality STEM education. I'm much more concerned for our field as a whole, outside of my little academic bubble. I spent many more years outside of this bubble than inside it, and have strong reason to be concerned.

We have a great opportunity before us to embrace the surging interest in CS and get our higher education programs in place to handle it. This is important, not only for higher education, but for ensuring the technological placement of our country. If we hope to regain some of this country's lost respect and placement with respect to technology, this is something we all need to work hard at. I don't care whether you are at a community college, a bottom-tier state school, or a prominent R-1 research university. There is a golden opportunity on the doorstep of computer science. I know so many of us are weary. We have had countless budget issues, department mergers, and threats to close programs, many of which were carried out. We have every reason to be weary, and yet every reason to shake off the past and do what we can to make it right.

With that said, I have comparatively few answers, and many questions. I'm researching the options...

Wednesday, April 20, 2011

Data mining your gut

A study conducted recently by EMBL involved a large-scale data analysis of, well, the DNA of your gut. Yes, at first glance you might think that this is not exactly computer science -- that is, until you understand the methods.

The article from the NY Times is a decent summary of the work: Gut Bacteria Divide People Into 3 Types

The research team had an enormous amount of data to work with after obtaining DNA sequence fragments from tissue sampled from the guts of 22 people. They then mapped the fragments to the genomes of 1,511 species of bacteria that have a publicly available reference genome. Clustering the results revealed an interesting pattern -- every person's gut community fell into one of just three types.
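The clustering step is conceptually simple. Here is a minimal k-means sketch in Python -- purely illustrative, since the actual study used a more careful distance measure and cluster-validation procedure -- where each point would be one person's vector of bacterial abundances:

```python
def dist2(a, b):
    """Squared Euclidean distance between two abundance vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))

def kmeans(points, k, iters=20):
    """Plain k-means. Farthest-first initialization keeps this toy
    deterministic: start from the first point, then repeatedly pick the
    point farthest from all chosen centers."""
    centers = [points[0]]
    while len(centers) < k:
        centers.append(max(points, key=lambda p: min(dist2(p, c) for c in centers)))
    for _ in range(iters):
        # Assign each point to its nearest center.
        clusters = [[] for _ in range(k)]
        for p in points:
            clusters[min(range(k), key=lambda i: dist2(p, centers[i]))].append(p)
        # Move each center to the mean of its cluster (keep it if empty).
        centers = [tuple(sum(d) / len(cl) for d in zip(*cl)) if cl else centers[i]
                   for i, cl in enumerate(clusters)]
    return centers, clusters
```

Run on the 22 abundance vectors with k = 3, a procedure along these lines is what would reveal the three gut types.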

It will be interesting to see whether this pattern holds in a much larger sample of people. I'd also be interested in seeing whether there is any relationship between these clusters and the rate of occurrence of various diseases. Finally, the study suggests that the researchers may have discovered some previously unknown species of bacteria. That's not very surprising, given that every one of us hosts 100 trillion microbes. There's a pretty good chance that some of our microbes are generating some interesting mutations over time.

Once again, we have another example of data mining helping medical researchers discover more information about us.


Monday, April 04, 2011

Tech sector faces "serious and pervasive" skills shortage

It's not just the U.S. that is suffering from a lack of people with skills in the IT sector. Canada also is suffering.

We need to see students returning to IT degrees at a rate that matches the rate at which they ran away during the dot-com bust in the early 2000s. Currently, the derivative is going in the right direction, but the rate of increase in our programs' enrollments will do little to fill the ever-increasing career opportunities for good computer science majors. We have a dire need for not only more CS majors, but more math and science majors in general -- but perhaps not for the reasons you might think. Our need for more majors is no longer so much about the survival of our academic programs. More importantly, it is about the survival of the prominence of the IT sector in this country. I fear that the position of the U.S. as a major technological player will continue its decline against many other countries. Many countries saw the opportunity, and the substantial importance of STEM education for today's students, right from K-12 and certainly well through higher education. Did the U.S.? Have we learned from our mistakes? What are we doing today to increase awareness and improve the education of these important skills?


Thursday, March 31, 2011

Rock-Paper-Scissors: You vs. the Computer

Rock-Paper-Scissors: You vs. the Computer - NYTimes.com

This is a very simple application of artificial intelligence, but a great illustration of what so much of AI, machine learning, and data mining is based on -- inference from prior experience. In this application, the algorithm keeps track of your previous sequence of choices. It then searches for the same pattern of plays you have made against its database of 200,000 rounds of the game. Of course, it adds your own sequence to its database of experience, hopefully improving its outcome against you, as well as others. The more rounds you play, the longer the sequence it can match against, and the better its guess of what you're most likely to play next.
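The pattern-matching idea is easy to sketch. Here is a toy version in Python -- my own construction, not the Times bot's actual code. The bot matches your play against its full 200,000-round database; this version only learns from the current opponent:

```python
from collections import defaultdict, Counter

class RPSPredictor:
    """Predict an opponent's next throw from their recent history.

    Remember which throw tends to follow each short sequence of previous
    throws, then play the counter to the most likely one.
    """
    BEATS = {"rock": "paper", "paper": "scissors", "scissors": "rock"}

    def __init__(self, window=3):
        self.window = window
        self.history = []
        self.followers = defaultdict(Counter)  # context tuple -> next-throw counts

    def observe(self, throw):
        """Record 'throw' as the follower of every recent context."""
        for n in range(1, min(self.window, len(self.history)) + 1):
            self.followers[tuple(self.history[-n:])][throw] += 1
        self.history.append(throw)

    def predict(self):
        """Counter the most likely next throw, preferring longer contexts."""
        for n in range(min(self.window, len(self.history)), 0, -1):
            context = tuple(self.history[-n:])
            if self.followers[context]:
                likely = self.followers[context].most_common(1)[0][0]
                return self.BEATS[likely]
        return "rock"  # no data yet: arbitrary opener
```

After watching you throw rock a few times in a row, it answers with paper -- exactly the "inference from prior experience" the article describes.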

Wednesday, March 16, 2011

"We need to see ahead" - Dept. of Homeland Security

The government is aware of the enormous need for people who can help analyze large amounts of data and extract information. Yes, you read that correctly -- even the U.S. government is aware of the problem! That is how bad the problem has become.

Janet Napolitano spoke at MIT recently, noting that U.S. Intelligence sifts through more digital information than is contained in the Library of Congress' collection of printed texts. She stated,

"It's about discovering meaning and information from millions and billions of data points. ... We therefore can't overstate the need for software engineers, and information systems engineers.... And we need that kind of talent working together to find new and faster ways to identify and separate relevant data from non-relevant data."

I cannot express how refreshing it is to have someone in government speak about the data and information crisis that we are already in the middle of. I applaud her efforts to start reaching out to the academic community.

As noted in the article, the Dept. of Homeland Security is making a renewed effort to support the research of young scholars in academia whose work is relevant to the country's security goals. They have already started a program, awarding 100 scholarships, fellowships, and internships to students pursuing research projects that can help.

Excellent.


Friday, March 04, 2011

20th Century Classroom Techniques with 21st Century Students

From the perspective of a professor:

Teaching is the practice of conveying knowledge and information to students.

Learning is the process a student follows to acquire said knowledge imparted by the professor.

The art of teaching and learning has always, and always will, come down to these definitions. They are independent of time. Whether we are discussing teaching and learning in the year 10,000 B.C. or as they are practiced in the 21st century we currently live in, teaching and learning have always worked this way.

For the time being, I will define the degree of learning that a student achieves as a qualitative measure that is inversely proportional to rote, short-term memorization. We know that mere memorization does little for lasting learning; the material immediately departs the student's brain after the exam, quiz, or course is finished. (Trying to assess learning in a way that yields an honest, objective, quantitative measure is an interesting topic as well -- one that I'll save for another time.)

Pedagogy - the methods and practices behind teaching - has substantially changed. (And if it hasn't for you, then it probably needs to.) This is true regardless of whether you are teaching computer science, biology, music, or Asian history. That is what this article in the Chronicle is focused on. Professors (including myself) are continually challenged every year to adapt our methods in a way that will maximize the degree of learning for the 21st century student.

Not surprisingly, this is not an easy task to accomplish. Even though I have never had a student ask me the question, "Uhhh, do I really need to show up?", I have had students consider not coming to class when they learn that I might be putting slides up online. In fact, students often expect their professors to use PowerPoint. And though I don't put up all of my slides, students definitely expect it. (And they seem to appreciate it... mostly.) They certainly expect you to use technology. They appreciate good hi-tech demonstrations, gadgets, and discussions of topics that are relevant in their world, which is infiltrated with modern technology. (Don't dare use an overhead projector!) With respect to course content management, they expect you to use the campus learning management system. (We use Blackboard and Moodle.) I have strived to learn how to use Blackboard effectively to maximize student learning... or is that student convenience? Students expect to have a central point of access for all class material, available to them 24/7. Students expect your class to have an online presence. They appreciate good topic-related videos and other engaging exercises. One might even argue that students expect to be entertained to some degree. The course model of the 20th century is long gone!

I've both read and listened to talks from experts about the importance of active learning. I am an absolute believer in active learning in all disciplines, but I admit that it is something I have a difficult time incorporating in the classroom. I am still trying to figure out how to come up with engaging, active exercises in an introductory computer science lecture focused on learning to program in Java, or in teaching relational algebra in a database systems class. I can come up with several, but setting up the exercise or activity requires far more time than I have available. (The time management dilemma of the dedicated professor vs. the outsider's ignorant perspective that we have it easy -- yet another topic for another day.) I try to abstain from pure preaching, and instead pause repeatedly for exercises, have students come up to the board, and so on. I am quite certain there is more I can do, but it goes back to that time resource!

I attend classes on effective teaching strategies for the 21st century student, offered by the institutions I've taught at. It's interesting to hear the wide range of ideas out there. My colleague (Dr. Perrone) passed this article on to me, and it has caused me to reflect on my current practices, as well as on how students learn today. For me, this article has a few ideas that I had not seriously considered. First, give in-class quizzes that ask students to assess their own learning and commitment to the course. Sure, I have asked students to reflect on what they have learned, but actually giving them periodic "quizzes" on it is an interesting idea. It makes students actively realize that their commitment level to the class affects their learning, and it opens the floor for periodic discussion about the topic.

Another idea involves creating a central, course-wide blog and requiring students to participate in it. I have students keep individual journals online, but they are private, viewable only by me and the student. A blog would be interesting, but it would certainly require some moderation on my part. The article also suggests that a course-focused Twitter account might be useful, inviting students to tweet about course topics, even during lecture! This is an interesting way of openly recognizing that students are going to use their phones whether you like it or not. So, why not encourage them to do it in a productive way, discussing course topics? I'm not sure how I feel about that one, though.

I think the big take-home for me is a reminder of how students learn today. It is something I need to consider carefully before I jump on any bandwagons. This article suggests that most learning is accomplished outside of the classroom. I don't disagree with that; I've always believed it. After all, we preach that students ought to be spending three hours outside of class for every hour of lecture. By no means should anyone read the article and conclude that the lecture and classroom experience are irrelevant -- I'm not sure the author is saying that. We just need to keep considering new pedagogy that keeps our lectures relevant, modern, engaging, and active without robbing students of the opportunity to learn both inside and outside of the classroom.

It is not too hard for a professor to set aside time once in a while to assess the current tools and technologies out there, and to reflect on ways to adopt new methods. My observation has been that students appreciate it.

It's certainly hard to figure out how to teach to different learning styles, especially today. I was a classically trained student. I learned through lectures, and I was expected to pay attention during a time when students were kicked out of class for disruption. And I was hardly entertained! I was expected to work outside of the classroom. In fact, I wanted to work outside of the classroom. But why? Why do so many students today expect so much more from their professors? Why do many fail to recognize their own responsibility in their learning? (Why do I get such blank stares when I suggest that they work on additional exercises to deepen their learning, even when I offer to help?) Alas, let's face it: professors are a very odd breed! We are just different. We went down the grueling path of obtaining a doctorate because we love learning, and we find genuine enjoyment in challenging ourselves with what we've learned.

Our students are no better or worse than us. They, like us, are just different. They have their own learning styles, agenda, priorities, interest level in our subject, and so on.

Damn, why can't everyone just be like me? :-)

Wednesday, March 02, 2011

2011 KDD Cup - Predict Music Interests

Well, it's another year, and that means another opportunity to take part in one of the biggest data mining competitions held every year -- the KDD Cup, part of the ACM SIGKDD conference. (FYI, KDD-2011 is in San Diego this year, August 21-24.) This year, the competition is based on analyzing music ratings from users of Yahoo! Music. The challenge is to analyze the ratings and discover how songs are grouped, which hidden patterns link various albums together, which artists complement each other, and so on. The ultimate goal is to develop a classifier that can analyze a user's music ratings and come up with the best suggestions for which songs that user would like to hear next.

So, where does "data mining" come in?  The data consist of 300 million ratings performed by over 1 million anonymized users. The ratings are given to different types of items -- songs, albums, artists, genres -- all tied together within a known taxonomy.
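That taxonomy is what makes the data interesting: when a track has few or no direct ratings, a predictor can back off to its album, artist, or genre. As a rough illustration of the idea -- certainly not a competitive entry, and with made-up item names and toy data -- here is a minimal back-off predictor in Python:

```python
from collections import defaultdict

# Toy data in the spirit of the KDD Cup set: (user, item, rating on 0-100),
# plus a taxonomy mapping each item to its parent (track -> album -> artist -> genre).
ratings = [
    ("u1", "track_a", 90), ("u2", "track_a", 70),
    ("u1", "track_b", 30), ("u3", "track_c", 80),
]
parent = {"track_a": "album_x", "track_b": "album_x",
          "track_c": "album_y", "album_x": "artist_1",
          "album_y": "artist_1", "artist_1": "pop"}

# Accumulate the mean rating for every node at every taxonomy level.
sums, counts = defaultdict(float), defaultdict(int)
for user, item, r in ratings:
    node = item
    while node is not None:
        sums[node] += r
        counts[node] += 1
        node = parent.get(node)

global_mean = sum(r for _, _, r in ratings) / len(ratings)

def predict(item):
    """Back off up the taxonomy until some level has observed ratings."""
    node = item
    while node is not None:
        if counts[node] > 0:
            return sums[node] / counts[node]
        node = parent.get(node)
    return global_mean

print(predict("track_a"))   # mean of its direct ratings: 80.0
print(predict("track_d"))   # unseen item falls back to the global mean
```

The winning entries will no doubt use far richer models (matrix factorization, temporal effects, and so on), but back-off through the taxonomy captures why Yahoo! bothered to include it.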

The prize:
- 1st place: $5,000
- 2nd place: $2,000
- 3rd place: $1,000

(Yes, this is a bit less money than the Netflix Prize a few years ago!)

Good luck!

Virtual Choir Joins Voices from 58 Countries

Wow. I know this is not related to my usual topics, but I recommend that you pause for a moment and listen to these beautiful voices. The clip posted [click here for it on YouTube] is only a few minutes long. And for the audiophiles out there, yes, there is certainly a LOT of audio editing and additional effects applied to the raw vocal tracks. This is a must in order to smooth out the noise and deal with the enormous issues that come with recording audio over a webcam! I'm sure some of the volunteers have slightly better microphones, but from what I understand, if you listen to some of the individual recordings available on YouTube, even the very best singers sound pretty rough. Thus, audio processing is a must. And besides, who cares! The result is nothing short of wonderful. The music is peaceful, melancholy, and a bit haunting... just wonderful. I'm looking forward to the final product soon.

Tuesday, March 01, 2011

Cutting commutes with data mining - bootstrappr - Blogs

There are enormous opportunities for data mining to help us better understand our data-centric world. Here's another example -- analyzing traffic data in order to predict commute times.


Wednesday, February 16, 2011

Another problem Watson can't solve


The Brown Daily Herald

I.B.M.’s Watson - Computers Close In on the ‘Paris Hilton’ Problem - NYTimes.com

I suppose this is a little premature, given that we have not seen the final showing. Despite this, most people are speculating (rightfully so) that Watson will surely win the competition.

I enjoyed this article. What does this competition mean for us? What does IBM's feat say about the current state of artificial intelligence (AI)? Artificial intelligence strives to develop algorithms that mimic human intelligence.

Let's view ourselves as a model of computation. Like most computational systems, we have inputs, computation based on those inputs, and subsequent outputs. We have five classes of input -- vision, audio, touch, taste, and smell. Our brains are constantly executing millions of "algorithms" that process these input streams into knowledge and information. We do some sifting to store important information in a more readily available way (e.g. touching a hot burner -- BAD!). Don't believe me? Read on.

Think about something simple, and keep track of the first thought that comes to mind... ready? Think about the term "dog". What was your first thought? Your current dog, related pictures of dogs, other four-legged animals, types of dogs you've seen in the past, perhaps your dog from your childhood, the smell of wet dog, common names of dogs, doggie messes on the floor, companionship, sadness over the loss of a dog, various breeds, your favorite breed, the fact that you need to pick up dog food for Fido, and so on. Over your lifetime to date, your brain has assembled enormous quantities of data about dogs. It inherently maintains an incredibly efficient knowledgebase of information related to dogs so that you can immediately retrieve such information when the need arises.

I gave you new input: a statement. You read a sentence telling you to think about dogs and note your first thought. As you read on, you were easily able to recall a vast pool of information in the form of memories, experiences, literature read, and so on. When we want to know more about dogs, we easily consult the library, surf online, or speak with friends and neighbors, continually feeding our five senses with more information. And I brought up just one topic. How many other topics has your brain acquired information about?

Define intelligence. (Go ahead... look it up in Google, and see if you can come up with a consensus definition!) When you think of intelligence, what comes to mind? I would define it as the ability to acquire and store knowledge in a way such that it can be used to reason -- i.e. to think, understand and make sound judgments -- when faced with decisions. Under this definition, surely the majority of us are intelligent to some degree.

Quantifying intelligence is another topic of debate for many.

The current system of quantification being used is a competition on Jeopardy. At face value, it seems to be a fair premise. But before jumping to the enormous conclusion that this is the end of mankind, please carefully consider Watson's fine-tuned set of inputs, processing capabilities, knowledgebase, and output.

WATSON -- 2800 CPUs, 15 TB of memory, 10 primary clusters. (I'm sure I'm missing details; most of this was assembled from a couple of presentations on Watson.) It was fed an initial seed of approximately 200,000,000 pages of text documents, which it needed to sift through and store knowledge from. This was NOT an easy feat! It needed to extract what is important and discard what is not -- for a computer, a truly remarkable accomplishment. It used this initial seed of knowledge and continued to improve itself. How? As it played repeated matches of Jeopardy over the past couple of years, its algorithms continued to learn from its mistakes, making corrections in its knowledgebase and, more importantly, tuning the parameters of its underlying models and information retrieval systems to improve performance. Performance is a typical measure in machine learning -- get the best outcome with respect to minimizing risk. For Jeopardy, we want to maximize the number of questions answered correctly while minimizing the risk of losing money by buzzing in on questions whose answers it is not confident about.
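That risk trade-off can be captured with a simple expected-value rule. To be clear, this is only a sketch of the underlying idea, not Watson's actual (and far more sophisticated) confidence machinery; the function and its parameters are my own invention:

```python
def should_buzz(confidence, clue_value, penalty=None, threshold=0.0):
    """Buzz only when the expected dollar gain is positive.

    confidence -- estimated probability that the top-ranked answer is correct
    penalty    -- dollars lost on a wrong answer (defaults to the clue value,
                  as in Jeopardy)
    threshold  -- extra margin of expected value required before buzzing
    """
    if penalty is None:
        penalty = clue_value
    expected_gain = confidence * clue_value - (1 - confidence) * penalty
    return expected_gain > threshold

# With symmetric stakes, the rule reduces to "buzz when confidence > 50%":
print(should_buzz(0.90, 800))  # True
print(should_buzz(0.40, 800))  # False
```

Raising the threshold makes the player more conservative -- the kind of knob a learning system could tune over repeated matches to balance winnings against losses.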

With our definition of intelligence, Watson has surely demonstrated an artificial form of intelligence. It can acquire knowledge, store it, and efficiently reason with it for personal gain. A computer can far surpass our capability to recall specific facts about an extremely wide range of topics, limited only by computational resources. I argue that the model for knowledge management in Watson is not the real win here. The only reason Watson is able to store as much as it has is owed mostly to the availability of powerful systems that we did not have 30 or more years ago -- a time when AI was a real hot area. The real win with Watson is not so much its ability to recall information from a database of topics, terms, and concepts; again, you can thank its computational resources for that. The real win with Watson is its natural language processing!

Let's consider the NLP. Watson was fed every question and document in pure text form. It had no audio input to listen with (which is why it would repeat mistakes already made by its opponents). We still have a way to go before a computer can effectively understand spoken language.

My final thoughts: given its restricted system for deep analytics, in this context, Watson was a wonderful demonstration of AI for these specific purposes. Its display of deep analytics, and of knowledge storage and retrieval through natural language processing, was amazing. The real-world applications are obvious, ranging from helping doctors make medical decisions based on symptoms, to insurance decisions, technical support assistance, military decisions, court document management, and so on. But is it on par with all facets of human intelligence? Can it process the same types of inputs? Absolutely not. Watson has no capability for image recognition. (Notice, there were no pictures displayed, unlike real Jeopardy.) It has no ability to recall the audio of famous speeches, which so often brings new light to a standard text (think about MLK's 'I have a dream...' speech). It cannot reason about important parts of life such as smell or taste (and sometimes I wish I couldn't either!). And most importantly, humans have other needs and face far more complex problems beyond factual recall. These decisions often require a good measure of intelligence applied without emotion, and yet the emotion cannot be discarded from the process. Many of these decisions have no one specific right answer for all mankind; they carry an enormous amount of individual context and experience that must be considered.

Consider the complexities of relationships... Watson, help! Perhaps IBM's next project should be a new system called Freud.

PS - I hope to learn more about the algorithms behind Watson soon...

Tuesday, February 08, 2011

Challenges in Data Mining

I always enjoy reading material from Han. If you're interested in learning about some of the challenges still remaining in data mining, I encourage you to read through this paper.

ngdm09_han_gao.pdf