
Run MPI Applications on the A8 and A9 Compute Intensive Instances

Updated: March 5, 2015

You can run parallel Message Passing Interface (MPI) applications in Azure by choosing the A8 and A9 compute intensive instances for your cloud compute resources. When you configure these instances in clusters of Windows-based worker roles or VMs that run a supported MPI implementation (such as Microsoft MPI, or MS-MPI), MPI applications communicate efficiently over a low-latency, high-throughput network in Azure that is based on remote direct memory access (RDMA) technology.

  • RDMA connectivity is not currently supported in Linux VMs created in the A8 or A9 size.

  • Azure also provides A10 and A11 compute intensive instances with processing capabilities identical to the A8 and A9 instances, but without a connection to the RDMA backend network. For MPI workloads in Azure, you will generally get the best performance with the A8 and A9 instances.


Microsoft’s MS-MPI for Windows, starting with MS-MPI 2012 R2, is currently the only MPI implementation supported to access the Azure RDMA network from the A8 and A9 instances. The latest version is available from the Microsoft Download Center. RDMA communication between the Azure instances is based on the Microsoft Network Direct interface.

MS-MPI v5 is installed automatically with HPC Pack 2012 R2 Update 1 and can also be installed separately.
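A node's MS-MPI installation can be detected programmatically: the MS-MPI installer sets the MSMPI_BIN environment variable to the folder containing mpiexec.exe. The sketch below is a minimal illustration of that check; `find_msmpi` is a hypothetical helper name, not part of MS-MPI or HPC Pack.

```python
import os

def find_msmpi(env=None):
    """Return the MS-MPI binary directory if the runtime appears to be
    installed, otherwise None. The MS-MPI installer sets MSMPI_BIN to
    the folder that contains mpiexec.exe."""
    env = os.environ if env is None else env
    msmpi_bin = env.get("MSMPI_BIN")
    if msmpi_bin and os.path.isdir(msmpi_bin):
        return msmpi_bin
    return None
```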

You can use HPC Pack to deploy a cluster of A8 or A9 instances in a cloud service (PaaS) or in virtual machines (IaaS). Download the HPC Pack 2012 R2 Update 1 installation package from the Microsoft Download Center. For introductory checklists and links to detailed guidance to deploy the A8 and A9 instances, see A8 and A9 Compute Intensive Instances: Quick Start with HPC Pack.

In IaaS deployments, the HpcVmDrivers extension must be added to the compute node VMs to install the drivers needed for RDMA connectivity.

To verify an HPC Pack deployment of the compute intensive instances, you can run the mpipingpong command on the cluster. mpipingpong sends packets of data between paired nodes repeatedly to calculate latency and throughput measurements and statistics for the RDMA-enabled application network. This example shows a typical pattern for running an MPI job (in this case, mpipingpong) by using the cluster mpiexec command.

This example assumes you added Azure nodes in a “burst to Azure” configuration. If you deployed HPC Pack on a cluster of Azure VMs, you’ll need to modify the command syntax to specify a different node group and set additional environment variables to direct network traffic to the RDMA network.
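The measurement pattern behind mpipingpong — bounce a fixed-size packet between two endpoints many times, then derive latency and throughput from the elapsed time — can be sketched in plain Python. This is only a local illustration using an OS pipe between two processes, not MPI over RDMA; `ping_pong` and `echo` are hypothetical names.

```python
import time
from multiprocessing import Pipe, Process

def echo(conn, iterations):
    # The "pong" side: return every packet straight back to the sender.
    for _ in range(iterations):
        conn.send_bytes(conn.recv_bytes())
    conn.close()

def ping_pong(packet_size, iterations):
    """Round-trip a packet of packet_size bytes `iterations` times and
    return (average one-way latency in microseconds, throughput in MB/s)."""
    parent, child = Pipe()
    worker = Process(target=echo, args=(child, iterations))
    worker.start()
    payload = b"x" * packet_size
    start = time.perf_counter()
    for _ in range(iterations):
        parent.send_bytes(payload)
        parent.recv_bytes()
    elapsed = time.perf_counter() - start
    worker.join()
    latency_us = elapsed / iterations / 2 * 1e6        # half a round trip
    throughput = packet_size * iterations * 2 / 1e6 / elapsed  # both ways
    return latency_us, throughput

if __name__ == "__main__":
    lat, _ = ping_pong(1, 2000)          # small packets, many iterations
    _, tput = ping_pong(4_000_000, 5)    # large packets, few iterations
    print(f"latency ~{lat:.1f} us, throughput ~{tput:.1f} MB/s")
```

As in the mpipingpong runs below, small packets with many iterations isolate latency, while large packets isolate throughput.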

  1. On the head node or on a properly configured client computer, start a Command Prompt.

  2. To estimate latency between pairs of nodes in an Azure burst deployment of 4 nodes, type the following command to submit a job to run mpipingpong with a small packet size and a large number of iterations:

    job submit /nodegroup:azurenodes /numnodes:4 mpiexec -c 1 -affinity mpipingpong -p 1:100000 -op -s nul

    The command returns the ID of the job that is submitted.

    On an HPC Pack cluster deployed on Azure VMs, specify a node group that contains compute node VMs deployed in a single cloud service, and modify the mpiexec command as follows:

    job submit /nodegroup:vmcomputenodes /numnodes:4 mpiexec -c 1 -affinity -env MSMPI_DISABLE_SOCK 1 -env MSMPI_PRECONNECT all -env MPICH_NETMASK <RdmaAddressRange> mpipingpong -p 1:100000 -op -s nul

    where <RdmaAddressRange> is the address range of the RDMA-enabled application network. The MPICH_NETMASK environment variable restricts MPI traffic to that network.

  3. When the job completes, to view the output (in this case, the output of task 1 of the job), type the following:

    task view <JobID>.1

    where <JobID> is the ID of the job that was submitted.

    The output will include latency results similar to the following.

    Latency results from mpipingpong

  4. To estimate throughput between pairs of Azure burst nodes, type the following command to submit a job to run mpipingpong with a large packet size and a small number of iterations:

    job submit /nodegroup:azurenodes /numnodes:4 mpiexec -c 1 -affinity mpipingpong -p 4000000:1000 -op -s nul

    The command returns the ID of the job that is submitted.

    On an HPC Pack cluster deployed on Azure VMs, modify the command as noted in step 2.

  5. When the job completes, to view the output (in this case, the output of task 1 of the job), type the following:

    task view <JobID>.1

    The output will include throughput results similar to the following.

    Throughput results from mpipingpong
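The two runs above differ only in the -p argument, which the steps read as packet size and iteration count. A quick calculation of the data volume each run moves shows why one isolates latency and the other throughput (`mpipingpong_volume` is a hypothetical helper for illustration):

```python
def mpipingpong_volume(p_arg):
    """Total bytes sent in one direction per node pair for a given
    mpipingpong -p argument, read as "packet_bytes:iterations"."""
    packet_bytes, iterations = (int(x) for x in p_arg.split(":"))
    return packet_bytes * iterations

# Latency run: 1-byte packets, 100,000 iterations -> only ~100 KB moved,
# so elapsed time is dominated by per-message latency.
print(mpipingpong_volume("1:100000"))        # 100000
# Throughput run: 4 MB packets, 1,000 iterations -> ~4 GB moved,
# so elapsed time is dominated by link bandwidth.
print(mpipingpong_volume("4000000:1000"))    # 4000000000
```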

The following are considerations for running MPI applications on Azure instances. Some apply only to deployments of Azure nodes (worker role instances added in a “burst to Azure” configuration).

  • Worker role instances in a cloud service are periodically reprovisioned without notice by Azure (for example, for system maintenance, or in case an instance fails). If an instance is reprovisioned while it is running an MPI job, the instance will lose all its data and return to the state in which it was first deployed, which can cause the MPI job to fail. The more nodes you use for a single MPI job, and the longer the job runs, the more likely it is that one of the instances will be reprovisioned while the job is running. You should also consider this if you designate a single node in the deployment as a file server.

  • You do not have to use the A8 and A9 instances to run MPI jobs in Azure. You can use any instance size that is supported by HPC Pack. However, the A8 and A9 instances are recommended for running relatively large-scale MPI jobs that are sensitive to the latency and bandwidth of the network that connects the nodes. If you use instances other than A8 and A9 to run latency- and bandwidth-sensitive MPI jobs, we recommend running small jobs, in which a single task runs on only a few nodes.

  • Applications deployed to Azure instances are subject to the licensing terms associated with the application. Check with the vendor of any commercial application for licensing or other restrictions for running in the cloud. Not all vendors offer pay-as-you-go licensing.

  • Azure instances cannot access on-premises nodes, shares, and license servers without additional setup. For example, to enable the Azure nodes to access an on-premises license server, you can configure a site-to-site Azure virtual network.

  • To run MPI applications on Azure instances, you must register each MPI application with Windows Firewall on the instances by running the hpcfwutil command. This allows MPI communications to take place on a port that is assigned dynamically by the firewall.

    For burst to Azure deployments, you can also configure a firewall exception command to run automatically on all new Azure nodes that are added to your cluster. After you run the hpcfwutil command and verify that your application works, you can add the command to a startup script for your Azure nodes. For more information, see Use a Startup Script for Azure Nodes.

  • HPC Pack uses the CCP_MPI_NETMASK cluster environment variable to specify a range of acceptable addresses for MPI communication. Starting in HPC Pack 2012 R2, the CCP_MPI_NETMASK cluster environment variable only affects MPI communication between domain-joined cluster compute nodes (either on-premises or in Azure VMs). The variable is ignored by nodes added in a burst to Azure configuration.

  • MPI jobs cannot run across Azure instances that are deployed in different cloud services (for example, in burst to Azure deployments with different node templates, or Azure VM compute nodes deployed in multiple cloud services). If you have multiple Azure node deployments that are started with different node templates, the MPI job must run on only one set of Azure nodes.

  • When you add Azure nodes to your cluster and bring them online, the HPC Job Scheduler Service immediately tries to start jobs on the nodes. If only a portion of your workload can run on Azure, ensure that you update or create job templates to define what job types can run on Azure. For example, to ensure that jobs submitted with a job template only run on Azure nodes, you can add the Node Groups property to the job template and select AzureNodes as the required value. To create custom groups for your Azure nodes, you can use the Add-HpcGroup Windows HPC PowerShell cmdlet.
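The reprovisioning risk noted above is usually mitigated by checkpointing long-running jobs so a restarted task can resume rather than start over. The sketch below shows the resume-or-start pattern with an atomic write (so a restart never reads a half-written file); the function names are hypothetical, and in practice the checkpoint must be written to durable storage off the worker instance (for example, Azure storage or a file share), since reprovisioning wipes the instance's local disk.

```python
import json
import os
import tempfile

def save_checkpoint(path, state):
    """Write state atomically: write to a temp file in the same
    directory, then rename it into place."""
    directory = os.path.dirname(os.path.abspath(path))
    fd, tmp = tempfile.mkstemp(dir=directory)
    with os.fdopen(fd, "w") as f:
        json.dump(state, f)
    os.replace(tmp, path)

def load_checkpoint(path, default):
    """Resume from the last checkpoint, or start fresh after a rebuild."""
    try:
        with open(path) as f:
            return json.load(f)
    except (FileNotFoundError, ValueError):
        return default

# Resume-or-start pattern for a long-running iterative job.
state = load_checkpoint("job.ckpt", {"iteration": 0})
for i in range(state["iteration"], state["iteration"] + 3):
    # ... do one unit of work for iteration i ...
    state["iteration"] = i + 1
    save_checkpoint("job.ckpt", state)
```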

© 2015 Microsoft