PolyBase Guide

SQL Server 2016 and later

Updated: December 8, 2016

THIS TOPIC APPLIES TO:yesSQL Server (starting with 2016)noAzure SQL DatabaseyesAzure SQL Data Warehouse yesParallel Data Warehouse

PolyBase is a technology that accesses and combines both non-relational and relational data, all from within SQL Server. In SQL Server 2016, it allows you to run queries on external data in Hadoop or Azure Blob Storage. Queries are optimized to push computation to Hadoop. In Azure SQL Data Warehouse, you can import data from Azure Blob Storage and Azure Data Lake Store.

Use Transact-SQL (T-SQL) statements to import and export data back and forth between relational tables in SQL Server and non-relational data stored in Hadoop or Azure Blob Storage. You can also query the external data from within a T-SQL query and join it with relational data.

To use PolyBase, see Get started with PolyBase.

PolyBase logical

To make good decisions, you want to analyze both relational data and other data that is not structured into tables —notably Hadoop. This is difficult to do unless you have a way to transfer data among the different types of data stores. PolyBase bridges this gap by operating on data that is external to SQL Server.

To keep it simple, PolyBase does not require you to install additional software to your Hadoop or Azure environment. Querying external data uses the same syntax as querying a database table. This all happens transparently. PolyBase handles all the details behind-the-scenes, and no knowledge about Hadoop or Azure is required to use PolyBase successfully.

PolyBase can:

  • Query data stored in Hadoop. Users are storing data in cost-effective distributed and scalable systems, such as Hadoop. PolyBase makes it easy to query the data by using T-SQL.

  • Query data stored in Azure blob storage. Azure blob storage is a convenient place to store data for use by Azure services. PolyBase makes it easy to access the data by using T-SQL.

  • Import data from Hadoop, Azure blob storage, or Azure Data Lake Store Leverage the speed of Microsoft SQL's columnstore technology and analysis capabilities by importing data from Hadoop, Azure Blob Storage, or Azure Data Lake Store into relational tables. There is no need for a separate ETL or import tool.

  • Export data to Hadoop, Azure Blob Storage, or Azure Data Lake Store. Archive data to Hadoop, Azure Blob Storage, or Azure Data Lake Store to achieve cost-effective storage and keep it online for easy access.

  • Integrate with BI tools. Use PolyBase with Microsoft’s business intelligence and analysis stack, or use any third party tools that is compatible with SQL Server.

  • Push computation to Hadoop.The query optimizer makes a cost-based decision to push computation to Hadoop when doing so will improve query performance. It uses statistics on external tables to make the cost-based decision. Pushing computation creates MapReduce jobs and leverages Hadoop's distributed computational resources.

  • Scale compute resources. To improve query performance, you can use SQL Server PolyBase scale-out groups. This enables parallel data transfer between SQL Server instances and Hadoop nodes, and it adds compute resources for operating on the external data.

This guide includes topics to help you use PolyBase efficiently and effectively.

Get started with PolyBaseBasic steps to install and configure PolyBase. This shows how to create external objects that point to data in Hadoop or Azure blob storage, and gives query examples.
PolyBase Versioned Feature SummaryDescribes which PolyBase features are supported on SQL Server, SQL Database, and SQL Data Warehouse.
PolyBase scale-out groupsScale out parallelism between SQL Server and Hadoop by using SQL Server scale-out groups.
PolyBase installationReference and steps for installing PolyBase with the installation wizard or with a command-line tool.
PolyBase configurationConfigure SQL Server settings for PolyBase. For example, configure computation pushdown and kerberos security.
PolyBase T-SQL objectsCreate the T-SQL objects that PolyBase uses to define and access external data.
PolyBase QueriesUse T-SQL statements to query, import, or export external data.
PolyBase troubleshootingTechniques to manage PolyBase queries. Use dynamic management views (DMVs) to monitor PolyBase queries, and learn to read a PolyBase query plan to find performance bottlenecks.

Community Additions