In late 2011, Dr. David DeWitt presented a Big Data keynote session, focused primarily on Hadoop, at the Professional Association for SQL Server (PASS) Summit. Dr. DeWitt's keynote is a good primer on Hadoop. At a high level, Hadoop starts with two key components:
- Hadoop Distributed File System (HDFS) – a distributed, fault-tolerant file system.
- MapReduce – a framework for writing and executing distributed, fault-tolerant algorithms. Note that MapReduce has recently undergone an overhaul: the new version is referred to as MapReduce 2.0 (MRv2) and runs on YARN, the resource management layer that now handles cluster scheduling.
Other components, such as Hive and Pig, build on top of these two.
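To make the MapReduce model described above more concrete, here is a minimal single-process sketch of a word count in Python. It is not Hadoop's API and does not distribute any work; the function names (`map_phase`, `shuffle`, `reduce_phase`, `word_count`) are hypothetical and chosen only for illustration. It shows the three conceptual steps: mapping each record to key/value pairs, grouping values by key, and reducing each group to a result.

```python
from collections import defaultdict


def map_phase(document):
    """Emit a (word, 1) pair for each word in one input record."""
    for word in document.lower().split():
        yield word, 1


def shuffle(pairs):
    """Group values by key, the step a framework performs between map and reduce."""
    grouped = defaultdict(list)
    for key, value in pairs:
        grouped[key].append(value)
    return grouped


def reduce_phase(word, counts):
    """Combine all values collected for one key into a single result."""
    return word, sum(counts)


def word_count(documents):
    """Run map, shuffle, and reduce in sequence over a list of strings."""
    pairs = (pair for doc in documents for pair in map_phase(doc))
    return dict(reduce_phase(word, counts) for word, counts in shuffle(pairs).items())


if __name__ == "__main__":
    docs = ["big data big clusters", "data on clusters"]
    print(word_count(docs))
```

In a real Hadoop job, the map and reduce functions would run in parallel on many nodes against data stored in HDFS, and the framework would handle the grouping step and recover from node failures.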
Main article: Integrating Hadoop with SQL Server