Export (0) Print
Expand All

SQL Server Data Warehousing in Azure Virtual Machines

Updated: April 27, 2015

Use an optimized data warehouse image in Azure Virtual Machines to improve performance for data warehousing workloads by up to 20 percent over non-optimized SQL Server Enterprise virtual machine images.

This information is applicable for images created prior to 6/22/2014.

The instructions for provisioning, configuring, and using an optimized data warehouse image are slightly different than the instructions for SQL Server Enterprise images. This topic describes how to provision and use an optimized data warehouse built on one of the following SQL Server Enterprise images from the Azure Virtual Machine Gallery:

  • SQL Server 2012 SP1 for Data Warehousing on WS 2012

  • SQL Server 2014 CTP2 Evaluation for Data Warehousing on WS 2012

In this topic:

Use the following PowerShell script to provision an optimized data warehousing image. The PowerShell script provisions the virtual machine and attaches the disks.

For best performance:

  • Choose Size A6 for the SQL Server 2012 image.

  • Choose Size A7 for the SQL Server 2014 image.

  1. Before you begin, ensure you have the following prerequisites:

    • Valid active Azure subscription

    • Azure Powershell Version 3 or higher.

  2. Download the New-AzureSqlDwIaasVM.zip file from the Deploy a SQL Server Data Warehouse in Azure Virtual Machines page on the Azure Scripting Center.

  3. In Windows Explorer, right-click New-AzureSqlDwIaasVM.zip nd choose Extract All…. Extract all the files to the directory where you will run the script. The zip file contains the files:

    • New-AzureSqlDwIaasVM.ps1

    • New-AzureSqlDwIaasVM_DwIaasConfigGeneral.xml

  4. Run the script using the parameters and examples described in the script usage.

  5. To troubleshoot the New-AzureSqlDWDWIaasVM.ps1 script, see the ProvisionDetail.log file. This is located on your Windows client in the same folder where the script runs.

  6. Before using the virtual machine, some additional steps are required. You need to verify the data warehousing optimizations have completed successfully, and you need to set Geo-replication OFF. To finish the setup, perform Step 2 and Step 3 in Finish Setup for SQL Server Data Warehouse in Azure Virtual Machines.

To achieve high performance for your SQL Server data warehouse, we recommend the following configuration choices.

  • Use page compression for your data. This conforms to the Fast Track specification for data warehouses up to 400 GB.

  • Use only a single file per filegroup to prevent multilevel striping that can negatively affect throughput performance.

  • Use Windows Server Storage Pools to expose multiple data disks to SQL Server as a single mount point c:\Mount\Data. To use the mount point, store the database files under c:\Mount\Data which is also the default location of your database objects.

  • For most databases under 1 TB, use one filegroup and store it under c:\Mount\Data. This is the default file location. To use a different location, you need to attach a different disk or specifically create a file on the c: or d: or e: drive.

To gain additional benefits, you can explore using multiple filegroups to:

  • Load data faster by loading multiple tables or multiple partitions at the same time. Putting those tables in separate filegroups will prevent fragmentation during the parallel loads.

  • Use the sliding window technique to archive data partitions..

  • Store staged data separately from production data.

  • Store fast-changing data and slowly changing data in different locations.

The following example creates a database with multiple filegroups. Each filegroup has one file and all of the files are located under the Windows Storage Pools mount point c:\Mount\Data.

--If you want to explore multiple filegroups, this shows how to
-create a database with multiple filegroups, one file per filegroup, and 
--all files stored under the Windows Server Storage Pools mount point C:\Mount\Data.
IF EXISTS ( SELECT name from master.dbo.sysdatabases WHERE name = 'DWDB') 
         PRIMARY (
              NAME          = DWDB_root, 
              FILENAME      = 'C:\Mount\Data\DWDB\DWDB_root.mdf', 
              SIZE          = 10MB, 
              FILEGROWTH    = 1GB), 
         FILEGROUP FACT_TABLES (      
              NAME          = FACT_TABLES1,
              FILENAME      = 'C:\Mount\Data\DWDB\DWDB_fact_tables1.mdf',
              SIZE          = 300GB,
              FILEGROWTH    = 1GB),
              NAME          = NONVOLATILE_FG1, 
              FILENAME      = 'C:\Mount\Data\DWDB\DWDB_load1.mdf',
              SIZE          = 100GB,
              FILEGROWTH    = 1GB)
        LOG ON (      
              NAME              = DWDB_log1, 
              FILENAME             = 'C:\Mount\Data\DWDB\DWDB_log1.ldf',
              SIZE                 = 25GB,
              FILEGROWTH           = 1GB)

Before transferring data, both the on-premise SQL Server and the Cloud virtual machine must be able to see each other as if they were in the same network.

Migrating to SQL Server in an Azure Virtual Machine describes several options for migrating data to your virtual machine.

For efficient data transfer, we recommend using a point-to-site or a site-to-site VPN connection. Both connection types are efficient. The point-to-site VPN connection is a direct VPN between your on-premise SQL Server and the Cloud, and is easier to configure than the site-to-site connection. By using the site-to-site connection, you can expand your domain to the Cloud; this might be hard to do if your company has a large IT infrastructure. Site-to-site setup also requires specific hardware and help from your IT team.

For more information, see Azure Virtual Network Configuration Tasks on MSDN. Under that topic, see Configure a Site-to-Site VPN in the Management Portal or Configure a Site-to-Site VPN in the Management Portal.

To migrate data from on-premise to your virtual machine, use either the BCP utility, or SQL Server Integration Services (SSIS). These tools use bulk insert operations to move your data quickly. By using SSIS in combination with Point-to-Site or Site-to-Site connectivity, you can leverage existing SSIS packages to load data or to run your existing ETL packages after your database is running. If you already are using SSIS packages to deploy a production database, you can also use them to deploy your data warehouse in the cloud.

For more information, see the Azure Virtual Network Configuration Tasks on MSDN.

To restore a database backup, use Restore (Transact-SQL) and the WITH MOVE option. Restoring a database backup on a Windows Storage Spaces volume simplifies this procedure for databases that have complex on-premise file and filegroup layouts (e.g. different files sitting on different volumes). You can just move all the files to that single volume (using the WITH MOVE option) and let Storage Spaces stripe IO operations between disks.

© 2015 Microsoft