Polybase Expansion, Big Data Clusters Are Key Features in the New SQL Server 2019

Microsoft’s annual Ignite conference kicked off this week, and, in what some saw as a surprise, Microsoft announced the next version of its flagship database platform--to be branded as SQL Server 2019.

In previous years, SQL Server updates were released two to five years after their respective predecessors. More recently, however, there has been a yearly cadence of updates. Many had expected that to continue, but it's easy to see why the next version is SQL Server 2019 and not SQL Server 2018.

Indeed, with SQL Server 2019--the public preview of which is available for download--Microsoft is expanding its bet on both open source technologies and Apache Spark. SQL Server 2017 ushered in the ability to deploy SQL Server in containers and added support for running on the Linux platform. This marked the first time that any of Microsoft’s SQL Server releases had been developed for any OS other than Windows, and served as the kickoff to Microsoft's adoption (or perhaps adaptation is a better term) of open source. That release also added Python alongside R for in-database analytics, aimed squarely at emerging use in the field of data science. The 2016 release had already embedded R, added support for JSON and introduced the Polybase feature, which allowed for querying or importing data from sources including Hadoop, Azure Data Lake Store and Azure Blob Storage.

SQL Server 2019 sees an expansion of the external sources open to Polybase to include relational and non-relational data sources such as Oracle, SAP HANA, DB2, Postgres, MySQL, MongoDB, Cosmos DB, Teradata and Spark.

Figure 1. The Expansion of Polybase in SQL Server 2019 – Image courtesy of Microsoft

Speaking of Spark, the recent collaboration between Databricks (a company founded by the creators of Apache Spark that aims to assist clients with cloud-based big data processing using Spark) and Microsoft produced Azure Databricks, an Apache Spark-based collaborative analytics service that has been rapidly adopted in the brief time it’s been publicly available. Databricks has set the stage for expansion of the integration between Apache Spark and Microsoft SQL Server in the SQL Server 2019 release, including a feature called Big Data Clusters.

Figure 2. Big Data Cluster – Image courtesy of Microsoft

Big Data Clusters merges many of these innovations and is the first big feature announcement surrounding Microsoft SQL Server 2019. Each cluster is composed of Spark, the SQL Server relational engine and an HDFS storage layer deployed in Kubernetes containers. Big Data Clusters are aimed at enabling improvements in intelligent application development using big data. With Big Data Clusters you can run Spark jobs to analyze both structured and unstructured data, develop and train models from data hosted virtually anywhere using Spark ML or SQL Server Machine Learning Services, and subsequently query the data from anywhere using notebooks in Azure Data Studio.

I never expected to see the day I’d be discussing release features of Microsoft SQL Server in the same sentence as Linux, Oracle and Apache Spark, but it’s a brave new world. Microsoft’s SQL Server development is moving at a pace none of its competitors is matching. Now it’s a waiting game to see if the gamble is going to pay off.
