Analytics Platform System processing and storage capacity

Barbara Kess|Last Updated: 1/6/2017

Your business requirements determine the number of Data Scale Units and the size of the Compute node disks that you need in your Analytics Platform System (APS) appliance. Use these processing and storage calculations to guide your capacity purchasing and planning decisions.

Planning for processing capacity

Query performance for SQL Server Parallel Data Warehouse (PDW) depends heavily on the number of CPU cores working on your data in parallel. Within limits, increasing parallelism improves massively parallel processing (MPP) query performance. Even if your data size is relatively small, the MPP query engine benefits from greater parallelism.

For example, an appliance with 12 Compute nodes has 192 CPU cores that process your data in parallel. That’s 192-way parallelism! An appliance with 56 Compute nodes has 896 cores all working in parallel. This magnitude of parallelism is not achievable without MPP computing.
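
The arithmetic behind these figures can be sketched in a few lines of Python. This is only an illustration: the value of 16 CPU cores per Compute node is an assumption inferred from the examples above (192 cores / 12 nodes), and the function name `parallel_cores` is hypothetical.

```python
# Assumption: 16 CPU cores per Compute node, inferred from the examples
# above (192 cores / 12 nodes). Check your hardware vendor's specification.
CORES_PER_COMPUTE_NODE = 16


def parallel_cores(compute_nodes: int) -> int:
    """Return the number of CPU cores that process data in parallel."""
    if compute_nodes < 1:
        raise ValueError("an appliance needs at least one Compute node")
    return compute_nodes * CORES_PER_COMPUTE_NODE


for nodes in (12, 56):
    print(f"{nodes} Compute nodes -> {parallel_cores(nodes)}-way parallelism")
```

Under that assumption, 12 Compute nodes yield 192 cores and 56 Compute nodes yield 896 cores, matching the figures above.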

As the number of Compute nodes increases, scaling out the appliance requires adding more than one Compute node at a time to get a noticeable benefit. Hardware vendors support only specific configurations of Data Scale Units to ensure that the benefit of scaling the appliance outweighs the cost of redistributing the data across more Compute nodes.

Data Scale Unit configuration examples - HPE

These are examples of the supported HPE configurations for Data Scale Units. They might vary from the most current supported configurations, but are provided as an example of how to increase capacity by approximately 20 percent.

Uplift is the percent capacity gain from increasing the Data Scale Units from one row to the next. For example, increasing the Data Scale Units from 6 to 8 gives a 33% uplift in CPU cores and memory. It also increases the disk space, which isn't shown in this table.

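| Data Scale Units | Compute nodes | CPU cores | Memory (GB) | Uplift |
|---|---|---|---|---|
| 1 | 2 | 32 | 512 | - |
| 2 | 4 | 64 | 1,024 | 100% |
| 3 | 6 | 96 | 1,536 | 50% |
| 4 | 8 | 128 | 2,048 | 33% |
| 5 | 10 | 160 | 2,560 | 25% |
| 6 | 12 | 192 | 3,072 | 20% |
| 8 | 16 | 256 | 4,096 | 33% |
| 10 | 20 | 320 | 5,120 | 25% |
| 12 | 24 | 384 | 6,144 | 20% |
| 16 | 32 | 512 | 8,192 | 33% |
| 20 | 40 | 640 | 10,240 | 25% |
| 24 | 48 | 768 | 12,288 | 20% |
| 28 | 56 | 896 | 14,336 | 17% |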
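
As a rough check of the Uplift column, the percentages can be recomputed from the Data Scale Unit counts alone. The sketch below is illustrative only: the function names are hypothetical, and it assumes uplift is rounded half up to a whole percent, which is consistent with the published values.

```python
# Data Scale Unit counts from the HPE example configurations.
HPE_DATA_SCALE_UNITS = [1, 2, 3, 4, 5, 6, 8, 10, 12, 16, 20, 24, 28]


def uplift_percent(previous: int, current: int) -> int:
    """Percent capacity gain when growing from `previous` to `current` units."""
    gain = (current - previous) / previous * 100
    # Round half up (assumption); Python's built-in round() rounds halves
    # to the nearest even number, which would turn 12.5 into 12.
    return int(gain + 0.5)


def uplifts(units: list[int]) -> list[int]:
    """Uplift of each configuration relative to the one before it."""
    return [uplift_percent(a, b) for a, b in zip(units, units[1:])]


print(uplifts(HPE_DATA_SCALE_UNITS))
# [100, 50, 33, 25, 20, 33, 25, 20, 33, 25, 20, 17]
```

Because CPU cores and memory both scale linearly with the number of Data Scale Units, the same percentages apply to each of them.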

Data Scale Unit configuration examples - Dell, Quanta

These are examples of the supported Dell and Quanta configurations for Data Scale Units. They might vary from the most current supported configurations, but are provided as an example of how to increase capacity by approximately 20 percent.

Uplift is the percent capacity gain from increasing the Data Scale Units from one row to the next. For example, increasing the Data Scale Units from 9 to 12 gives a 33% uplift in CPU cores and memory. It also increases the disk space, which isn't shown in this table.

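| Data Scale Units | Compute nodes | CPU cores | Memory (GB) | Uplift |
|---|---|---|---|---|
| 1 | 3 | 48 | 768 | - |
| 2 | 6 | 96 | 1,536 | 100% |
| 3 | 9 | 144 | 2,304 | 50% |
| 4 | 12 | 192 | 3,072 | 33% |
| 5 | 15 | 240 | 3,840 | 25% |
| 6 | 18 | 288 | 4,608 | 20% |
| 7 | 21 | 336 | 5,376 | 17% |
| 8 | 24 | 384 | 6,144 | 14% |
| 9 | 27 | 432 | 6,912 | 13% |
| 12 | 36 | 576 | 9,216 | 33% |
| 15 | 45 | 720 | 11,520 | 25% |
| 18 | 54 | 864 | 13,824 | 20% |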
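
For planning purposes, the example configurations can be searched for the smallest one that meets a target number of CPU cores. The following is a minimal sketch, not a sizing tool: the helper `smallest_configuration` is hypothetical, and the per-unit and per-node constants are assumptions inferred from the example configurations (2 Compute nodes per Data Scale Unit for HPE, 3 for Dell and Quanta, 16 cores and 256 GB of memory per Compute node). Current supported configurations may differ.

```python
from typing import Optional

# Data Scale Unit counts from the example configurations for each vendor.
SUPPORTED_UNITS = {
    "HPE": [1, 2, 3, 4, 5, 6, 8, 10, 12, 16, 20, 24, 28],
    "Dell/Quanta": [1, 2, 3, 4, 5, 6, 7, 8, 9, 12, 15, 18],
}
# Assumptions inferred from the example configurations.
NODES_PER_UNIT = {"HPE": 2, "Dell/Quanta": 3}
CORES_PER_NODE = 16
MEMORY_GB_PER_NODE = 256


def smallest_configuration(vendor: str, min_cores: int) -> Optional[dict]:
    """Return the smallest listed configuration with at least `min_cores`."""
    for units in SUPPORTED_UNITS[vendor]:  # lists are in ascending order
        nodes = units * NODES_PER_UNIT[vendor]
        cores = nodes * CORES_PER_NODE
        if cores >= min_cores:
            return {
                "data_scale_units": units,
                "compute_nodes": nodes,
                "cpu_cores": cores,
                "memory_gb": nodes * MEMORY_GB_PER_NODE,
            }
    return None  # target exceeds the largest listed configuration


print(smallest_configuration("HPE", 200))
print(smallest_configuration("Dell/Quanta", 200))
```

With a target of 200 cores, this picks 8 Data Scale Units (256 cores) for HPE and 5 Data Scale Units (240 cores) for Dell or Quanta.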

Planning for storage capacity

This table estimates that you could load and store more than 6 petabytes (up to 6,720 TB) of uncompressed user data on a fully built Analytics Platform System appliance.

| Vendor | Drive size | Physical data storage per Compute node | Maximum Compute nodes per rack | Physical maximum data storage per rack | Estimated maximum user data storage per rack | Maximum racks | Estimated maximum user data storage per appliance |
|---|---|---|---|---|---|---|---|
| HPE | 1 TB | 16 TB | 8 | 128 TB | 320 TB | 7 | 2,240 TB |
| HPE | 2 TB | 32 TB | 8 | 256 TB | 640 TB | 7 | 4,480 TB |
| HPE | 3 TB | 48 TB | 8 | 384 TB | 960 TB | 7 | 6,720 TB |
| Dell | 1 TB | 16 TB | 9 | 144 TB | 360 TB | 6 | 2,160 TB |
| Dell | 2 TB | 32 TB | 9 | 288 TB | 720 TB | 6 | 4,320 TB |
| Dell | 3 TB | 48 TB | 9 | 432 TB | 1,080 TB | 6 | 6,480 TB |

Explanation:

  • Drive size is 1, 2, or 3 TB for each hardware vendor.

  • Physical data storage per Compute node = (Drive size) * (16 disks per Compute node). The mirrored disks are not included since they are for redundancy.

  • Maximum Compute nodes per rack is specific to the hardware vendor.

  • Physical maximum data storage per rack = (Physical data storage per Compute node) * (Maximum Compute nodes per rack).

  • Estimated maximum user data storage per rack = (Physical maximum data storage per rack) * (5 for a 5:1 compression ratio) * (50%, because half of the space is set aside for logs and tempdb). This is a conservative estimate of the uncompressed user data that can be loaded and stored on the appliance; it is not enforced by software. The actual user data storage depends on your data and your configuration.

  • Maximum racks is specific to the hardware vendor.

  • Estimated maximum user data storage per appliance = (Estimated maximum user data storage per rack) * (Maximum racks). This is a conservative estimate of the grand total size of user data that you could load and store on a fully built appliance.
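
The formulas above can be chained into a short calculation. The sketch below is illustrative only: the function and constant names are hypothetical, and the vendor limits are taken from the table in this section. As noted above, the result is an estimate and is not enforced by software.

```python
DISKS_PER_COMPUTE_NODE = 16  # data disks only; mirrored disks are excluded
COMPRESSION_RATIO = 5        # 5:1 compression ratio
USABLE_FRACTION = 0.5        # the other half is left for logs and tempdb

# (maximum Compute nodes per rack, maximum racks) for each vendor.
VENDOR_LIMITS = {"HPE": (8, 7), "Dell": (9, 6)}


def storage_estimate_tb(vendor: str, drive_size_tb: int) -> dict:
    """Estimate storage in TB for a fully built appliance."""
    nodes_per_rack, max_racks = VENDOR_LIMITS[vendor]
    physical_per_node = drive_size_tb * DISKS_PER_COMPUTE_NODE
    physical_per_rack = physical_per_node * nodes_per_rack
    user_per_rack = physical_per_rack * COMPRESSION_RATIO * USABLE_FRACTION
    return {
        "physical_per_node": physical_per_node,
        "physical_per_rack": physical_per_rack,
        "user_per_rack": user_per_rack,
        "user_per_appliance": user_per_rack * max_racks,
    }


for vendor in VENDOR_LIMITS:
    for drive_size_tb in (1, 2, 3):
        estimate = storage_estimate_tb(vendor, drive_size_tb)
        print(vendor, f"{drive_size_tb} TB drives:", estimate)
```

For example, HPE with 3 TB drives gives 48 TB per Compute node, 384 TB per rack, 960 TB of estimated user data per rack, and 6,720 TB per appliance.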

© 2017 Microsoft