cluster services are required for the distribution and cannot be disabled. The
available Hadoop cluster services include the following:
HDFS—A file system that spans all nodes in a Hadoop cluster for data storage. This service replicates data across multiple nodes to avoid data loss.
CLDB—A MapR-specific management service that keeps track of container locations and the root of volumes.
YARN—A resource-management platform responsible for managing compute resources in clusters and using them for scheduling users' applications.
ZooKeeper—An infrastructure for cross-node synchronization that can be used by applications to ensure that tasks across the cluster are serialized or synchronized.
HBase—A high-speed read and write column-oriented database.
Hive—A query engine framework for Hadoop that facilitates easy data summarization, ad-hoc queries, and the analysis of large datasets stored in HDFS and HBase. With its SQL-like semantics, Hive makes it easy for RDBMS users to transition into querying unstructured data in Hadoop.
Oozie—An environment for coordinating complex data processing operations.
Hue—A Web interface that aggregates the most common Hadoop components to improve user experience and to enable users to avoid the underlying complexity and the command-line interface.
Spark—An open-source data analytics cluster computing framework.
Indexer—A method for indexing data across the cluster.
SOLR—A method for
searching data across the cluster.
Sqoop—A client-server tool that transfers bulk data between Hadoop and structured datastores, such as relational databases.
Impala—A massively parallel processing (MPP) SQL query engine that runs natively in Apache Hadoop.
Flume—A distributed, reliable, and available service for efficiently collecting, aggregating, and moving large amounts of streaming data into the Hadoop Distributed File System (HDFS).
Pig—A platform for analyzing large data sets that consists of a high-level language for expressing data analysis programs. It allows developers to write complex MapReduce transformations using a simple scripting language.
Mahout—A library of scalable machine-learning algorithms that provides the data science tools for clustering, classification, and collaborative filtering to automatically find meaningful patterns.
Falcon—A framework that simplifies the development and management of data processing pipelines and provides out-of-the-box data management services.
Tez—An extensible framework for building high-performance batch and interactive data processing applications, coordinated by YARN in Apache Hadoop.
Storm—A distributed real-time computation system for processing large volumes of high-velocity data.
Ganglia—A scalable distributed system monitoring tool for high-performance computing systems such as clusters and grids.
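Several of the services above (Hive, Pig, and Tez among them) are higher-level abstractions over the same map-shuffle-reduce pattern that MapReduce pioneered. The following is a minimal sketch of that pattern in plain Python, using a word count as the canonical example; the function names and sample records are illustrative only and are not part of any Hadoop API.

```python
from collections import defaultdict

def map_phase(records):
    # Map: emit an intermediate (word, 1) pair for every word in
    # every input record, as a mapper task would per input split.
    for record in records:
        for word in record.split():
            yield word.lower(), 1

def shuffle(pairs):
    # Shuffle: group intermediate values by key, which the Hadoop
    # framework does between the map and reduce phases.
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups

def reduce_phase(groups):
    # Reduce: aggregate each key's list of values into a final count.
    return {word: sum(counts) for word, counts in groups.items()}

# Illustrative stand-in for records read from HDFS.
records = ["Hadoop stores data", "Hadoop processes data"]
counts = reduce_phase(shuffle(map_phase(records)))
print(counts)  # → {'hadoop': 2, 'stores': 1, 'data': 2, 'processes': 1}
```

A Hive query such as a GROUP BY, or a Pig script with a GROUP and COUNT, compiles down to jobs of exactly this shape, which is why those services can run on data already stored in HDFS without a separate database engine.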