Hadoop Basics

At a high level, Hadoop is made up of two key components: the Hadoop Distributed File System (HDFS) and MapReduce.

Tyler Chessman

October 16, 2014

In late 2011, Dr. David DeWitt presented a Big Data keynote session, focused primarily on Hadoop, at the Professional Association for SQL Server (PASS) Summit. Dr. DeWitt's keynote is a great primer for learning more about Hadoop. At a high level, Hadoop starts with two key components:

  1. Hadoop Distributed File System (HDFS) – a distributed, fault-tolerant file system.

  2. MapReduce – a framework for writing and executing distributed, fault-tolerant algorithms (see the sketch after this list). Note that MapReduce has recently been overhauled: resource management has been split out into a separate layer called YARN, and the rewritten framework is referred to as MapReduce 2.0 (MRv2).
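
To make the Map and Reduce phases concrete, here is a minimal sketch of the classic word-count job written against the org.apache.hadoop.mapreduce (MRv2) API. The input and output paths are illustrative, and it assumes the hadoop-mapreduce-client libraries are on the classpath; it is meant to show the shape of a job, not a production implementation.

```java
import java.io.IOException;
import java.util.StringTokenizer;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.Mapper;
import org.apache.hadoop.mapreduce.Reducer;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;

public class WordCount {

    // Map phase: for each line of input stored in HDFS, emit (word, 1).
    public static class TokenizerMapper
            extends Mapper<LongWritable, Text, Text, IntWritable> {
        private static final IntWritable ONE = new IntWritable(1);
        private final Text word = new Text();

        @Override
        protected void map(LongWritable key, Text value, Context context)
                throws IOException, InterruptedException {
            StringTokenizer tokens = new StringTokenizer(value.toString());
            while (tokens.hasMoreTokens()) {
                word.set(tokens.nextToken());
                context.write(word, ONE);
            }
        }
    }

    // Reduce phase: sum the counts for each word. The framework shuffles
    // all values for a given key to the same reducer.
    public static class IntSumReducer
            extends Reducer<Text, IntWritable, Text, IntWritable> {
        private final IntWritable result = new IntWritable();

        @Override
        protected void reduce(Text key, Iterable<IntWritable> values, Context context)
                throws IOException, InterruptedException {
            int sum = 0;
            for (IntWritable v : values) {
                sum += v.get();
            }
            result.set(sum);
            context.write(key, result);
        }
    }

    public static void main(String[] args) throws Exception {
        Configuration conf = new Configuration();
        Job job = Job.getInstance(conf, "word count");
        job.setJarByClass(WordCount.class);
        job.setMapperClass(TokenizerMapper.class);
        job.setCombinerClass(IntSumReducer.class);   // optional local pre-aggregation
        job.setReducerClass(IntSumReducer.class);
        job.setOutputKeyClass(Text.class);
        job.setOutputValueClass(IntWritable.class);
        FileInputFormat.addInputPath(job, new Path(args[0]));    // e.g. an HDFS input directory
        FileOutputFormat.setOutputPath(job, new Path(args[1]));  // output directory must not exist yet
        System.exit(job.waitForCompletion(true) ? 0 : 1);
    }
}
```

The fault tolerance comes from the framework rather than the code: if a node running a map or reduce task fails, the task is simply re-executed elsewhere against the HDFS replicas of its input.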

Other components, such as Hive and Pig, build on top of these two.

Main article: Integrating Hadoop with SQL Server
