
Improving Hive Performance

Like SQL Server, Hive has a number of performance-related features. Hive supports indexes and, analogous to the clustered columnstore index introduced in SQL Server 2014, an optimized table format called Optimized Row Columnar (ORC). Note: using ORC likely means abandoning the external table format and explicitly loading the data into a table stored as ORC.
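As a rough illustration of that explicit load, the following HiveQL sketch copies an existing external table into a managed ORC table with a CREATE TABLE AS SELECT statement. The target name qwi2_orc is hypothetical, and the sketch assumes the external table censusdb.qwi2 already exists and is queryable.

```sql
-- Minimal sketch: materialize the external table's rows in ORC format.
-- qwi2_orc is a hypothetical name chosen for illustration only.
CREATE TABLE censusdb.qwi2_orc
STORED AS ORC
AS
SELECT *
FROM censusdb.qwi2;
```

Queries would then target censusdb.qwi2_orc instead of the external table. The ORC copy does not pick up files added to the external location later, so it has to be reloaded whenever the source data changes.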

Hive also supports table partitioning. Partitions can be applied to external tables: store the files in subfolders, then issue ALTER TABLE statements after creating the table. For example, we can store each state's QWI file in its own subfolder and then add partitions as follows:

ALTER TABLE censusdb.qwi2 ADD PARTITION (state = 'TX') LOCATION '/user/hadoop/censusqwi/TX';

After the partition is created, the partition column (e.g., state) appears in the table schema and is available for use in queries, provided the table was declared with a matching PARTITIONED BY clause.
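To put the ALTER TABLE statement in context, here is a hedged end-to-end sketch of a partitioned external table. The data columns (geography, industry, qwi_year, qwi_quarter, emp), the comma delimiter, and the base LOCATION are assumptions made for illustration; only the table name, the state partition, and the TX subfolder come from the example above.

```sql
-- Hypothetical column list and file format: adjust to the real QWI layout.
CREATE EXTERNAL TABLE censusdb.qwi2 (
  geography   STRING,
  industry    STRING,
  qwi_year    INT,
  qwi_quarter INT,
  emp         BIGINT
)
PARTITIONED BY (state STRING)          -- partition column, not stored in the data files
ROW FORMAT DELIMITED FIELDS TERMINATED BY ','
LOCATION '/user/hadoop/censusqwi';     -- assumed parent folder of the state subfolders

-- Register the subfolder that holds the Texas file as a partition.
ALTER TABLE censusdb.qwi2 ADD PARTITION (state = 'TX')
LOCATION '/user/hadoop/censusqwi/TX';

-- List the registered partitions.
SHOW PARTITIONS censusdb.qwi2;

-- Filtering on the partition column lets Hive read only the TX subfolder.
SELECT qwi_year, qwi_quarter, SUM(emp) AS total_emp
FROM censusdb.qwi2
WHERE state = 'TX'
GROUP BY qwi_year, qwi_quarter;
```

Because the state value comes from the partition definition rather than from the files, each subfolder should contain data for that state only.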

Main article: Integrating Hadoop with SQL Server
