No, Hadoop Doesn’t Own Big Data Analytics!

December 12, 2011 at 1:41 pm

By Terence Craig

A number of folks have asked me if I was concerned about Microsoft's recent announcement that it would partner with Hortonworks and abandon its own distributed processing technology in favor of Hadoop. I thought this was an unfortunate choice on Microsoft's part (the Dryad project's implementation of multi-server LINQ was pretty compelling), but HPC is a small part of Microsoft's business, so it probably made sense from a business standpoint. In any case, we (as in all of us at PatternBuilders) are not concerned, and just to be clear: we don't believe that this announcement (or any other) means that the many players in the Hadoop ecosystem own the still-forming big data analytics market.

That is not to say that the announcement isn't proof of the strength of the Hadoop ecosystem. Hadoop is a nifty technology that offers one of the best distributed batch processing frameworks available, although there are other very good ones that don't get nearly as much press, including Condor and Globus. All of these systems fit broadly into the High Performance, Parallel, or Grid computing categories, and all have been or are currently used to perform analytics on large data sets (as well as other problems that benefit from bringing the power of multiple computers to bear). The SETI@home project is probably the best-known (and IMHO, the coolest) application of these technologies outside of that little company in Mountain View indexing the Internet.

But just because a system can be used for analytics doesn't make it an analytics system. Many Hadoop users discover this after seeing the cost of the professional services required to turn a Hadoop distribution into an enterprise analytics system. Keep in mind that storing and performing big data analytics requires many things. Being able to scale across machines to deliver computing power is certainly a significant issue, but it is just the tip of the iceberg.

PatternBuilders is in the real-time analytics business. We built our own technology on both the front and back ends to support analytics on large streaming data sets. We did this because, while service companies had built their businesses around a lot of interesting technologies, there weren't any PRODUCTS focused on meeting the enterprise's real-time big data analytics needs that could be implemented and maintained without a huge services effort. Since our engine was deliberately focused on real-time/streaming analytics, we built in integration points that make it easy to bring in data from batch systems (including Hadoop) when it makes sense. For example, an individual's influence in a social graph is an ideal calculation to do in batch mode.
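To illustrate why influence scoring is a batch job, here is a minimal sketch that uses PageRank as a stand-in influence metric. This is an assumption: the post does not say how PatternBuilders measures influence, and the function name `pagerank` and the "who follows whom" dictionary format are hypothetical. The point it shows is that every iteration touches every node and edge in the graph, so the score is naturally computed over a full snapshot rather than updated per streaming event.

```python
from typing import Dict, List


def pagerank(graph: Dict[str, List[str]], damping: float = 0.85,
             iterations: int = 50) -> Dict[str, float]:
    """Batch PageRank as a hypothetical stand-in for social influence.

    graph[u] lists the accounts that u follows (assumed representation).
    Each iteration scans the entire graph, which is why this kind of score
    fits a batch system better than a per-event streaming engine.
    """
    # Collect every node, including accounts that only appear as targets.
    nodes = set(graph)
    for targets in graph.values():
        nodes.update(targets)
    n = len(nodes)
    rank = {v: 1.0 / n for v in nodes}

    for _ in range(iterations):
        # Rank held by nodes that follow no one is spread evenly over all nodes.
        dangling = sum(rank[v] for v in nodes if not graph.get(v))
        new_rank = {v: (1.0 - damping) / n + damping * dangling / n for v in nodes}
        # Each node passes an equal share of its rank to the accounts it follows.
        for v, targets in graph.items():
            if targets:
                share = damping * rank[v] / len(targets)
                for t in targets:
                    new_rank[t] += share
        rank = new_rank
    return rank
```

For example, with `follows = {"alice": ["bob"], "bob": ["carol"], "carol": ["bob"], "dave": ["bob"]}`, calling `pagerank(follows)` ranks `bob` highest because three accounts point at him. In a real deployment the batch system would recompute these scores periodically, and the streaming engine would simply read the latest results through an integration point.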

In summary: Hadoop is a great batch-focused distributed processing engine, and I am glad that the work of that community is paying off, but it is not an enterprise analytics system! BTW, if this very mild piece gets any Hadoop loyalists screaming "hatchet job," go read my Twitter buddy Colin Clark's piece on killing the elephant.

Entry filed under: Data, General Analytics, PatternBuilders Technology, Technology.
