Once reserved for the Internet empires like Google and Yahoo, the
most popular and well-known big data management system is now creeping
into the enterprise. There are two big reasons for that:
1) Businesses have a lot more data to manage, and Hadoop is a great
platform for it, especially for combining legacy data with new,
unstructured data.
2) A lot of vendors are jumping into the game of offering support and
services around Hadoop, making it more palatable for enterprises.
"Hadoop is unstoppable as its open source roots grow wildly and
deeply into enterprise data management architectures," Forrester
analysts Mike Gualtieri and Noel Yuhanna wrote recently in the company's
Wave Report on the Hadoop marketplace. "Forrester believes that Hadoop
is a must-have data platform for large enterprises, forming the
cornerstone of any flexible future data management platform. If you have
lots of structured, unstructured, and/or binary data, there is a sweet
spot for Hadoop in your organization."
So where do you start? Forrester says there are a variety of places
to go, and it evaluated nine vendors offering Hadoop services to find
the pros and cons of each. Forrester concluded that there is no clear
market leader at this point, with relatively young companies in this
market offering compelling services alongside the tech titans.
First, some background: Hadoop is an open source Apache project, and
anyone can freely download its core components. These include Hadoop
Common, the Hadoop Distributed File System (HDFS), Hadoop YARN, and
Hadoop MapReduce. Many companies, from IBM to Amazon Web Services,
Microsoft and Teradata, have packaged Hadoop into more easily consumable
distributions or services. Each company takes a slightly different
strategy, but the key selling point for all of them is Hadoop's ability
to distribute workloads across potentially thousands of servers, making
big data manageable.
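To make the idea of distributing a workload concrete, here is a minimal sketch of the map, shuffle and reduce pattern that Hadoop MapReduce is named for. It is plain Python, not Hadoop's actual API, and the function names (map_phase, shuffle, reduce_phase, word_count) and the sample text are illustrative assumptions. On a real cluster each chunk of input would be handled by a different server; here the chunks are simply processed in one loop.

```python
from collections import defaultdict


def map_phase(chunk):
    """Map step: emit a (word, 1) pair for every word in one chunk of text."""
    return [(word.lower(), 1) for word in chunk.split()]


def shuffle(pairs):
    """Shuffle step: group all emitted values by their key."""
    grouped = defaultdict(list)
    for key, value in pairs:
        grouped[key].append(value)
    return grouped


def reduce_phase(key, values):
    """Reduce step: combine all values for one key into a single result."""
    return key, sum(values)


def word_count(chunks):
    # Hypothetical driver: a real cluster would map each chunk on a
    # separate server; this sketch processes them sequentially.
    pairs = [pair for chunk in chunks for pair in map_phase(chunk)]
    grouped = shuffle(pairs)
    return dict(reduce_phase(key, values) for key, values in grouped.items())


if __name__ == "__main__":
    sample_chunks = ["big data big clusters", "big data small servers"]
    print(word_count(sample_chunks))
```

Because each chunk is mapped independently and each key is reduced independently, both steps can in principle be spread across many machines, which is the property the vendors below build on.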
Note: This list is based on vendors listed in Forrester's Wave report
and is not meant to be all-encompassing of Hadoop and big data
management platforms. Vendors are listed in alphabetical order.
Amazon Web Services
Customers looking for a public cloud hosted Hadoop platform needn't
look much further than the company Forrester calls the "King of the
cloud" - Amazon Web Services. The company's Hadoop product is named
Elastic MapReduce (EMR), which AWS says uses Hadoop to offer big data
management services. It is not pure open source Hadoop, though; it has
been modified to run specifically on AWS's cloud.
Forrester says that EMR has the largest adoption of the Hadoop
platforms in the market. It already has a wide variety of partners that
offer services on top of EMR, such as ones that specialize in query,
modeling, integration and management. And AWS is innovating; on the
roadmap, according to Forrester, is the ability for EMR to automatically
scale and resize based on workload needs. The company also plans to roll
out more robust support for using EMR with its other products and
services, including its Redshift data warehouse and its newly announced
Kinesis real-time processing engine, and to add support for additional
NoSQL databases and business intelligence tools. The one thing AWS does
not have is a Hadoop distribution that users can run on their own
premises, but the next two companies specialize in that.
Cloudera
Cloudera offers a distribution of open source Hadoop that uses many
aspects of the Apache project but adds a number of advancements on top.
Cloudera has developed several features for its product, from a
management and monitoring tool named Cloudera Manager to a SQL engine
named Impala that runs relational queries on Hadoop. Cloudera uses open
source Hadoop as the basis of its distribution, but it is not a pure
open source product. When Cloudera's customers need something that open
source Hadoop doesn't have, Cloudera builds it or finds a partner who
has it. "Cloudera's approach to innovation is to be loyal to
core Hadoop but to innovate quickly and aggressively to meet customer
demands and differentiate its solution from those of other vendors,"
Forrester says. The result has been steady adoption of Cloudera's
platform: more than 200 paying customers, according to Forrester, some
of whom have more than 1 petabyte under management across more than
1,000 nodes.
Hortonworks
Like Cloudera, Hortonworks is a pure-play Hadoop company. Unlike
Cloudera, Hortonworks sticks to the open source Hadoop code more closely
than perhaps any other vendor. Its goal is to build up the Hadoop
ecosystem and user base and to advance the open source code. Company
officials say this benefits users because it prevents vendor lock-in: if
a Hortonworks customer ever needed to leave the platform, it could
easily port its applications onto the open source code. That's not to
say Hortonworks does not innovate on top of the open source code, but
the company gives all of its platform development work back to the open
source community. One example is Ambari, a tool Hortonworks developed to
fill a hole in the project around cluster management. This approach has
earned Hortonworks strong partnerships with vendors like Teradata,
Microsoft, Red Hat and SAP.
IBM
When enterprises think of big IT projects, many think of IBM, and
rightly so. Because of that, IBM has become a major player in the world
of Hadoop projects. Forrester says IBM already has more than 100 Hadoop
deployments, and many customers with petabytes' worth of data. The
company brings its vast experience in grid computing, its global data
center presence and its enterprise implementation experience to its big
data projects. "IBM's road map includes continuing to integrate the
BigInsights Hadoop solution with related IBM assets like SPSS advanced
analytics, workload management for high performance computing, BI tools,
and data management and modeling tools," Forrester says.
Intel
Like Amazon Web Services, Intel is leveraging and optimizing its
version of Hadoop to run on its hardware, specifically its Xeon chips.
Customers looking to push the limits of their Hadoop system, with the
closest affinity between the software and the hardware, may find that
Intel's distribution of Hadoop is the one for them. Forrester notes that
Intel only recently rolled this product out, though, so the company is
expected to innovate quite a bit on top of the version it has in the
market now. Forrester listed Intel and Microsoft as "strong performers"
in the Hadoop marketplace; the other seven companies on this list were
rated "leaders."
MapR Technologies
MapR Technologies is perhaps the best Hadoop distribution company
that many people haven't heard of. In Forrester's survey of Hadoop users
that is used to compile its Wave report, MapR rated the highest for its
current offering, with the highest scores for its distribution's
architecture and data processing capabilities. The company's secret
sauce is a set of unique capabilities MapR has managed to work into its
version of Hadoop. For example, MapR's distribution supports the Network
File System (NFS) protocol, and MapR has built disaster recovery and
high availability features into its distribution. Forrester says MapR
just doesn't have the brand-name recognition of Cloudera and Hortonworks
in the Hadoop market. Increased partnerships and marketing could turn
MapR into a major Hadoop company, though, Forrester suggests.
Microsoft
Microsoft isn't historically known as a company that embraces open
source software, but in this case it is making strides not only to
enable Hadoop to run on Windows, but also to contribute code to the open
source project to advance the Hadoop ecosystem more broadly. The fruits
of that labor are seen in HDInsight, a product on Microsoft's Windows
Azure public cloud. It's a Hadoop-as-a-service offering based on
Hortonworks' distribution of the platform but specifically designed to
run on Azure.
Microsoft has some other nifty projects too, including a
production-ready feature named PolyBase that allows information in
SQL Server to also be searched during Hadoop queries. "Microsoft's
significant presence in the database, data warehouse, cloud, OLAP, BI,
spreadsheet (PowerPivot), collaboration, and development tools markets
offers an advantage when it comes to delivering a growing Hadoop stack
to Microsoft customers," Forrester says. Like Intel, Microsoft was
listed as a "strong performer," but not a leader in this industry yet.
Pivotal Software
Last year EMC and VMware combined a handful of assets from each company
to form Pivotal, which is basically a spin-out from the two companies.
One of Pivotal's big projects, along with the Cloud Foundry PaaS, is a
Hadoop distribution.
In doing so, Pivotal has added some tooling on top of the open source
code, specifically a SQL engine named HAWQ and a Hadoop appliance made
specifically for running the big data platform. Forrester says the
leading advantage of Pivotal's Hadoop platform is the integration
between its distro and other Pivotal, EMC and VMware products. Pivotal
will benefit from its EMC and VMware backing as well. Thus far, however,
the company has fewer than 100 installations, mostly at small to
midsized customers, according to Forrester.
Teradata
A company like Teradata could see Hadoop as a threat or an
opportunity. The company specializes in data management, particularly on
the SQL and relational database side. So the rise of a NoSQL platform
like Hadoop could threaten the company. Instead, Teradata has embraced
Hadoop. By partnering with Hortonworks, Teradata now offers customers
the ability to use a Hadoop platform that's integrated with its SQL
offerings, giving existing Teradata customers a plug and play-ready