“Big Data” has definitely become an overused term, eclipsing even the barrage of vendors pitching new solutions. It seems like any group that is dealing with data now calls it “Big Data,” and in some situations, like large research data sets, the term is technically correct. The actual definition, “data sets that are too large and complex to manipulate or interrogate with standard methods or tools,” does create a broad category. I think of “Big Data” from the manipulate-or-interrogate standpoint: data that requires techniques to manage it (Hadoop) and to process it (MapReduce) using computers with large amounts of RAM. And it gets very confusing as we apply our traditional relational DB and BI concepts. But I’m not the one worrying about how it works; I’m trying to figure out the most effective way to make it work, and that relates to skills, budgets and data centers.
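For readers who have not seen the MapReduce pattern before, here is a minimal single-machine sketch of the idea in Python (a hypothetical illustration, not how Hadoop itself is invoked): a map step emits key/value pairs, a shuffle groups them by key, and a reduce step aggregates each group. The classic word-count example makes the three phases concrete.

```python
from collections import defaultdict
from itertools import chain

def map_phase(document):
    """Map: emit a (word, 1) pair for every word in one document."""
    return [(word.lower(), 1) for word in document.split()]

def shuffle(pairs):
    """Shuffle: group values by key, as the framework does between phases."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups

def reduce_phase(groups):
    """Reduce: sum the counts collected for each word."""
    return {word: sum(counts) for word, counts in groups.items()}

# Toy input standing in for a large distributed data set.
documents = ["big data is big", "data about data"]
pairs = chain.from_iterable(map_phase(d) for d in documents)
word_counts = reduce_phase(shuffle(pairs))
# word_counts == {'big': 2, 'data': 3, 'is': 1, 'about': 1}
```

The point of the pattern is that map and reduce are independent per key, so a framework like Hadoop can spread them across many machines; this sketch just runs the same contract in one process.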
A major stimulus for “Big Data” visibility at Missouri S&T is our commitment to offer new Graduate Certificate Programs through Distance and Continuing Education. This has created a flurry of activity in the Computer Science, Computer Engineering and Business Information Technology programs around the creation of new courses and the associated “Big Data” teaching resources. We also have significant growth in the need for high-performance and high-throughput computing, so I ask: why can’t all of this computing hardware be utilized more effectively across these disciplines? Maybe it can, but today we approach the challenge with our traditional operational methodologies, and the solutions don’t play well together. I was recently encouraged to find out that others in higher ed are exploring this terrain of combined HPC and Hadoop operations. One of our collaborators, Kansas State’s Beocat, is running into resource scheduling challenges, but they hold out hope that solutions exist.
So what can we do with a meager budget and limited infrastructure to become a player in “Big Data”? We start by enhancing our skill sets: adapting our traditional DBA talent to Hadoop concepts and steering our analytics specialists to experiment with these new BI tools. Luckily, it is affordable to venture into Hadoop-based data management, and there are plenty of open source BI add-ons to get your feet wet. This builds a strong foundation that may produce valuable breakthroughs for more effective teaching and research. But we are going to take this one step further.
Working with Hadoop may establish some “Big Data” concepts that carry over to the commercial space, much as working with MySQL may approximate Oracle DB principles. But is that enough? Does higher education need to offer teaching and research built on what our employers actually use? I recognized a disconnect a few years back when I took over teaching an “Information Services” class for our business school. They had been teaching basic concepts of spreadsheets, programming and databases to students being groomed as bean counters. I instead taught them basic concepts of ERP, CRM, BI and DW, and had them actively participate in the web through blogging and by learning SEO. The motivated students thrived and the others survived. I got some validation for this approach when one of those students, now pursuing her MBA, commented on how far ahead she was because she understood these real-world solutions.
I mention this correlation between what we teach and research versus what the commercial world relies upon to explain why I am purchasing an SAP HANA platform to support teaching and research at S&T. Today I would call HANA the leader for the utilization of “Big Data” in the commercial sector. To be clear, HANA is not built on Hadoop; it is an in-memory database delivered as a fine-tuned appliance, specifically designed to produce results for the “Big Data” marketplace. I am finally ready to make the purchase, but it has not been an easy process. I first got the idea when corporate partners, who are always trying to hire our SAP ERP-trained business students, mentioned their need for HANA experience. I then connected that to “Big Data” research partnerships, especially with our engineering projects producing large amounts of diverse data; we uncovered some of this with our visualization efforts. But I could not find anyone at SAP who knew how to sell me a HANA solution that was not based on a commercial vertical market. Thankfully, Hewlett-Packard, which has a strong relationship with the HANA hardware appliance business, saw the opportunity. They had customers all around us who were cautious about committing to HANA because of the lack of qualified talent to drive it. HP saw the potential of S&T graduating students with actual HANA experience, so they helped connect us with the right people at SAP to make this happen.
Is this investment in HANA strategic? That is my hope, but at a minimum I do believe there will be tremendous value in the exploration. Any exposure for the students will be a win, at least as long as HANA remains a commercial leader. And I believe having our own HANA system will open doors for corporate research collaboration by helping us overcome licensing and intellectual property challenges. A side benefit may be helping us understand how to fit “Big Data” processing into our HPC mentality, or applying this experience to challenges we face in managing our own cyber security, learning analytics, retention and recruiting. Maybe the greatest value is helping our academic culture explore a different path.
Update – 6/27/14 – Support for the HANA purchase is strong, so we have moved forward with it.