Posts filed under ‘Data’

“Hadoopla”

© Marqin Cook

By Terence Craig

I had to miss Strata due to a family emergency. While Mary picked up the slack for me at our privacy session, and by all reports did her usual outstanding job, I also had to cancel a Tuesday night Strata session sponsored by 10Gen on how PatternBuilders has used Mongo and Azure to create a next-generation big data analytics system. The good news is that I should have some time to catch up on my writing this week, so look for a version of what would have been my 10Gen talk shortly. In the meantime, to get me back in the groove, here is a very short post inspired by a Forbes post written by Dan Everett of SAP on “Hadoopla.”

As the CEO of a real-time big data analytics company that occasionally competes with parts of the Hadoop ecosystem, I may have some biases (you think?). But I certainly agree that there is too much Hadoopla (a great term). If our goal as an industry is to move Big Data out of the lab and into mainstream use by anyone other than the companies that thrive on, and have the staff to support, high-maintenance and very high-skill technologies, Hadoop is not the answer – it has too many moving parts and is simply too complex.

To quote from a blog post I wrote a year ago:

“Hadoop is a nifty technology that offers one of the best distributed batch processing frameworks available, although there are other very good ones that don’t get nearly as much press, including Condor and Globus. All of these systems fit broadly into the High Performance, Parallel, or Grid computing categories and all have been or are currently used to perform analytics on large data sets (as well as other types of problems that can benefit from bringing the power of multiple computers to bear on a problem). The SETI project is probably the best known (and IMHO, the coolest) application of these technologies outside of that little company in Mountain View indexing the Internet. But just because a system can be used for analytics doesn’t make it an analytics system…”

Why is the industry so focused on Hadoop? Given the huge amount of venture capital that has been poured into various members of the Hadoop ecosystem and that ecosystem’s failure to find a breakout business model that isn’t hampered by Hadoop’s intrinsic complexity, there is ample incentive for a lot of very savvy folks to attempt to market around these limitations. But no amount of marketing can change the fact that Hadoop is a tool for companies with elite programmers and top-of-the-line computing infrastructures. And in that niche, it excels. But it was not designed for, and in my opinion will never see, broad adoption outside of that niche despite the seemingly endless growth of Hadoopla.

October 24, 2012 at 1:39 pm 1 comment

Big Data is Coming of Age in the Capital Markets—Wall Street and Technology’s Deep Dive into “Everything You Need to Know to Unlock Big Data’s Secrets” is a Must Read for All

By Mary Ludloff

In the “it’s a small world” category: while we were in the midst of launching FinancePBI, the first financial services big data solution built for the cloud and designed to address the needs of the industry, Terence chatted with Melanie Rodier (@mrodier), a Senior Editor at Wall Street and Technology. The topic: big data and the capital markets. The resulting 33-page report is now available, and it’s a must read for anyone interested in big data and business.

Why a must read for all? Well, similar to the McKinsey report on big data in 2011, Wall Street and Technology’s big data deep dive covers a lot of ground that applies to any business or organization. In other words, specific industry requirements may be different but big data technology and process challenges are very similar. For example, Wall Street firms—like so many others—find themselves dealing with unstructured data from a variety of sources, including the web, social media, and mobile devices. While there’s value in that data, there are infrastructure issues and a looming talent shortage. Sound familiar? (more…)

May 3, 2012 at 6:41 pm 1 comment

Big Data and Cloud not a fit? Comments on InfoWorld Article

By Terence Craig

Since Disqus seems to have completely eaten (bleh) my comment on @davidlinthicum’s very interesting InfoWorld post, “Big data and the cloud: A far from perfect fit,” I decided to just expand my comments and make a short blog post out of them. IMHO, the problems that David is describing are more a reflection of problems with batch-oriented technologies like Hadoop (more on my take on Hadoop here) in the cloud than a general problem for cloud-based big data solutions.

Computing has always had, and probably always will have, a bias towards creating batch-focused technologies at the beginning of any large paradigm shift. But as new technologies are absorbed, understood, and move from early adopter to more mainstream use, the batch paradigm will inevitably start to shift to streaming and real-time. We have seen this again and again (from punch cards to touch-sensitive tablets, downloaded media to streaming media, DOM to SAX parsers, HTML to Ajax, paper maps to real-time GPS). The reason this evolution almost always occurs is simple: humans live and think in real-time, and when our tools do as well we are more productive and happier. So why do we have this bias for batch processing in our first-generation computational technologies? Simply put, because batch processing is a lot easier.
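To make the batch versus streaming contrast a little more concrete, here is a minimal Python sketch. It is only an illustration, not code from Hadoop or any PatternBuilders product; the function names `batch_mean` and `streaming_mean` and the sample readings are hypothetical. The batch version needs the whole data set before it can answer, while the streaming version produces an updated answer as each value arrives.

```python
from typing import Iterable, Iterator


def batch_mean(values: list[float]) -> float:
    """Batch style: wait for the complete data set, then compute once."""
    return sum(values) / len(values)


def streaming_mean(values: Iterable[float]) -> Iterator[float]:
    """Streaming style: emit an updated mean after every new value."""
    count = 0
    mean = 0.0
    for value in values:
        count += 1
        # Incremental update: earlier values do not need to be kept in memory.
        mean += (value - mean) / count
        yield mean


# Hypothetical sample data, used only for illustration.
readings = [10.0, 20.0, 30.0, 40.0]

print(batch_mean(readings))            # a single answer, once all data is in
print(list(streaming_mean(readings)))  # a running answer after each reading
```

The batch function is the shorter and more obvious of the two, which is roughly the point: even in a toy case, the streaming version has to carry state between values, and real systems add ordering, late data, and failure handling on top of that.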

(more…)

February 23, 2012 at 3:01 pm Leave a comment

McKinsey Study: Location, Location, Location, Part 2

By Mary Ludloff

Greetings one and all and happy new year! As promised, here is part 2 of my post on McKinsey’s drill-down into the tremendous benefits location data offers to new businesses (and business models) as well as to all of us. If you need to refresh your memory (since the author was a wee bit late in meeting her stated publishing date), part 1 is available here. Certainly, the report, “Big data: The next frontier for innovation, competition, and productivity,” is chock-full of illuminating ways that big data can be leveraged within specific industries, but personal location data is a somewhat different beast as it cuts across industries. For example, telecom, retail, and media (through location-based advertising) all stand to reap tremendous rewards.

Now, as I said in part 1 and will state again in part 2: I have a bit of angst around the collection and use of personal location data (see my many posts on privacy or our book on “Privacy and Big Data”). But that does not negate what can be gained if it is properly collected and used, with the appropriate regulations and guidance in place (my gosh, I am beginning to sound like one of the privacy policies I hate to read!). Put simply: every company’s data collection and usage policies should be clearly stated and always offered on an opt-in basis. Okay, privacy issues have been dealt with so let’s move on! (more…)

January 9, 2012 at 8:23 pm 2 comments

No, Hadoop Doesn’t Own Big Data Analytics!

By Terence Craig

A number of folks have asked me if I was concerned about Microsoft’s recent announcement that they would be partnering with Hortonworks and abandoning their own distributed processing technology for Hadoop. While I thought this was an unfortunate choice on Microsoft’s part (the Dryad project’s implementation of multi-server LINQ was pretty compelling), since HPC is a small part of Microsoft’s business, it probably made sense from a business standpoint. In any case, we (as in all of us at PatternBuilders) are not concerned and just to be clear: we don’t believe that this announcement (or any other) means that the many Hadoop ecosystem players own the still-forming big data analytics market.

That is not to say that the announcement isn’t proof of the strength of the Hadoop ecosystem. Hadoop is a nifty technology that offers one of the best distributed batch processing frameworks available, although there are other very good ones that don’t get nearly as much press, including Condor and Globus. All of these systems fit broadly into the High Performance, Parallel, or Grid computing categories and all have been or are currently used to perform analytics on large data sets (as well as other types of problems that can benefit from bringing the power of multiple computers to bear on a problem). The SETI project is probably the best known (and IMHO, the coolest) application of these technologies outside of that little company in Mountain View indexing the Internet. (more…)

December 12, 2011 at 1:41 pm 3 comments

Roundup: About 4 Tech Giants, All Things Private, Social Media Stats, Maps, and Big Data!

By Mary Ludloff

Greetings one and all! It’s been a while since I posted about the more interesting articles, blogs, videos, etc., that I have come across and I thought that now is as good a time as any to cover some interesting items you may have missed in the past few weeks. The topics are far-ranging, thoughtful, illuminating, and at times, contentious, but that’s why they are interesting. So without further ado, let’s get to it!

Four Tech Giants Battle It Out

If you haven’t already, set aside some time to read Fast Company’s take on the (coming soon) great tech war of 2012. The combatants? Apple, Facebook, Google, and Amazon. The prize? Us, I think! This thoughtful piece by Farhad Manjoo looks at how these four goliaths will battle it out on the technology innovation field to, essentially, win the hearts, minds, and wallets of all of us:

“Think of this: You have a family desktop computer, but you probably don’t have a family Kindle. E-books are tied to a single Amazon account and can be read by one person at a time. The same for phones and apps. For the Fab Four, this is a beautiful thing because it means that everything done on your phone, tablet, or e-reader can be associated with you. Your likes, dislikes, and preferences feed new products and creative ways to market them to you. Collectively, the Fab Four have all registered credit-card info on a vast cross-section of Americans. They collect payments (Apple through iTunes, Google with Checkout, Amazon with Amazon Payments, Facebook with in-house credits). Both Google and Amazon recently launched Groupon-like daily-deals services, and Facebook is pursuing deals through its check-in service (after publicly retreating from its own offers product).”

(more…)

October 20, 2011 at 7:45 am Leave a comment

All Together Now: All You Need is a Text Box!

By Terence Craig

All you need is text, text is all you need (sing to the tune of The Beatles’ “All You Need Is Love”). If you are one of our regular readers you will remember that several months ago I wrote a manifesto on what the perfect analytics system would look like. One of the last points was:

It must be as accessible as Excel (still the number one analytics tool in the world).

I was wrong – Excel is the number one non-specialized analytics tool in the world but in terms of usage, it is dwarfed by a very well-known specialized analytics toolkit. The creator of this tool is a little company that you may have heard of: it does no evil and analyzes the Internet to bring you back everything on the web based on a simple text query. But behind that simple text box, Google has one of the most sophisticated analytics infrastructures in the world:

  • It can deduce your interests.
  • It can give you the most relevant results.
  • It can show you appropriate information based on those interests, as well as bring back highly personalized ads.

Google is not only the largest big data analytics company in the world, but it also has the easiest-to-use tools: proof that text is all you really need!

(more…)

October 14, 2011 at 3:22 pm 4 comments

Why I Dislike GPS Tracking (and My SmartPhone): Wired’s Article on Telecoms’ Retention of Personal Data

By Mary Ludloff

Before I begin, I must admit my own personal bias: I have a love/hate relationship with personal devices and technology. Yes, I love that all the devices I now use have made my life so much easier in more ways than I can count (and keep track of). At the same time, I really do hate how much more information is captured about me and how there are so few regulations regarding the use of it. Now, if you read our (Terence and I co-authored) book on Privacy and Big Data or listened to our recent O’Reilly webcast you might not be surprised by this but, just in case, I needed to come clean before I dived into Wired’s article on how much data our major mobile providers are keeping about all of us. Put simply, it’s a lot.

The ACLU of North Carolina managed, under a Freedom of Information Act claim, to obtain a Department of Justice document entitled “Retention Periods of Major Cellular Service Providers.” This document (one page) was designed to help law enforcement agencies understand what information they could get from the major cellular service providers—Verizon, T-Mobile, AT&T/Cingular, Sprint, Nextel, Virgin Mobile—as well as how long that data was retained:

“Verizon, for example, keeps a list of everyone you’ve exchanged text messages with for the past year, according to the document. But T-Mobile stores the same data up to five years. It’s 18 months for Sprint, and seven years for AT&T… That makes Verizon appear to have the most privacy-friendly policy. Except that Verizon is alone in retaining the actual contents of text messages. It allegedly stores the messages for five days, while T-Mobile, AT&T, and Sprint don’t store them at all.”

(more…)

September 30, 2011 at 3:03 pm 1 comment

Real-time Analytics: It’s Always Decision Time!

By Mary Ludloff

Greetings all! I just came across a great video from eWEEK talking about the growing need for real-time (aka streaming) analytics:

“For years, business intelligence has provided valuable information to help executives and managers make decisions to increase sales, improve operations, and seize new business opportunities. With the quickening pace of business today and the need to make faster decisions based on more timely data, companies are complementing this data using information mined from social networks, mobile sensors, and even location-based information from smartphones. To get the best value from this wealth of new data sources, the data analysis must be done in real time. This allows decisions to be made based on the true conditions at that particular time.”

(more…)

September 23, 2011 at 12:04 pm 4 comments

Privacy and Big Data: Post-Book Thoughts, Mary’s POV

By Mary Ludloff

Well, our book is almost done: it’s now in the production phase and Terence and I are finished with most of the heavy writing (unless our editor has some additional thoughts!). In terms of time, it really has not been that long since we signed on to do it: less than six months from initial concept to publication date. In terms of thought and brain-power, well now, that’s a very different story!

It has been a long, arduous, sometimes acrimonious (in the nicest possible way, of course) journey. You know, working for a small, privately held company means that even in the best of times, you already have multiple jobs so when you add writing a book on top of those, you tend to get a little fractured. This means that your family and friends may get a wee bit irritated with you because you simply do not have time and even when you do, you are usually talking about some aspect of privacy. So, to all my friends and family (Terence can mea culpa in his own post) thank you for being so understanding and for reading and reviewing our chapters! (more…)

August 29, 2011 at 7:46 am 4 comments
