Appendix A: Performing Common Administrative Tasks

This appendix shows you how to perform some common tasks that will help you to manage a LINQ to HPC cluster.

Adding nodes to the DSC

LINQ to HPC performs tasks on persistent data that is stored on compute nodes, which are registered with the DSC. Nodes that are registered with the DSC are called DSC nodes. LINQ to HPC schedules jobs so that a DSC node runs only a single task at a time, which means that jobs are scheduled sequentially. When you configure your system, either dedicate the entire cluster to LINQ to HPC workloads, or create a node group that is dedicated to those workloads. The following PowerShell script shows how to register such a node group. It adds all the nodes in the LinqToHpcNodes group to the DSC.

$nodes = get-hpcnode -groupname "LinqToHpcNodes"
foreach ($n in $nodes) 
{
  $name = $n.NetBiosName
  dsc node add $name /temppath:c:\L2H\Temp /datapath:c:\L2H\Data /service:MyHeadNode
}


The samples download includes an HPC PowerShell function that is named dsc-nodes-add. This function adds nodes that are in a specific node group to the DSC. It takes a node group, and the other DSC parameters, as arguments. The function is located in the Admin.ps1 file. You can either use this function, or write your own HPC PowerShell script based on the above code. For more information about HPC PowerShell, see Appendix 6: Using HPC PowerShell.

Listing all the files in a DSC file set

To list all the files in a DSC file set, execute the following DSC commands.

  1. The following command lists all the files in the file set.

    DSC FILESET VIEW [fileset name] /FILES /service:HeadNodeName
    
    
  2. For each file, use the following command to list the file’s read paths on nodes in the cluster.

    DSC FILE VIEW [file name] /service:HeadNodeName
    
    

The following Windows PowerShell script shows how to automate this process.

function dsc-file-view
{
  if ($args[0] -eq $null -or $args.length -ne 1)
  {
    echo "Usage: dsc-file-view [file]"
    return
  }
  if ($env:CCP_SCHEDULER -eq $null) { (echo "Usage: Set the DSC service using dsc-service-set before calling this function."); return }

  $out = (dsc file view $args[0])
  
  if ($out.length -ge 1)
  {
    echo ("        " + $args[0].Trim() + "    " + $out[1])
    foreach ($f in $out[4..$out.length]) { echo ("            " + $f.Trim()) }
  }
}

function dsc-fileset-view
{
  if ($args[0] -eq $null -or $args.length -ne 1)
  {
    echo "Usage: dsc-fileset-view [fileset]"
    return
  }
  if ($env:CCP_SCHEDULER -eq $null) { (echo "Usage: Set the DSC service using dsc-service-set before calling this function."); return }

  $out = (dsc fileset view $args[0] /files)
  $found = $False

  foreach ($line in $out)
  {
    if ($line -like "*Files:") 
    { 
      $found = $True 
      echo "    Files in this file set:"
      continue
    }
    if (-not $found) { echo $line } else { dsc-file-view $line }
  }
}

The samples download includes a Windows PowerShell function that is named dsc-fileset-view. This function lists all of the files in a file set. It takes the name of a file set as an argument. This function is located in the User.ps1 file. You can use this function, or write your own Windows PowerShell script based on the preceding code.

Copying or renaming a DSC file set

The Dsc.exe command-line application does not provide a way to rename or copy a DSC file set. However, the following script uses the DscService API to create a copy of an existing file set.

[System.Reflection.Assembly]::Load("Microsoft.Hpc.Dsc, Version=3.0.0.0, Culture=neutral, PublicKeyToken=31bf3856ad364e35")
function dsc-fileset-copy
{
  if ($env:CCP_SCHEDULER -eq $null) { (echo "Usage: Set the DSC service using dsc-service-set before calling this function."); return }
  if ($args[0] -eq $null -or $args.length -ne 2)
  { 
    echo "Usage: dsc-fileset-copy [source fileset] [destination fileset]"
    echo "e.g. dsc-fileset-copy myFileSet myNewFileSet"
    return
  }

  $dsc = new-object Microsoft.Hpc.Dsc.DscService($env:CCP_SCHEDULER)

  if (-not $dsc.FileSetExists($args[0])) { (echo "Error: Source fileset does not exist."); return }
  if ($dsc.FileSetExists($args[1])) { (echo "Error: Destination fileset already exists."); return }

  $original = $dsc.GetFileSet($args[0])
  $copy = $dsc.CreateFileSet($args[1], $original.CompressionScheme)

  foreach ($f in $original.GetFiles())
  {
    $copy.AddExistingFile($f)
  }
  $copy.Seal()

  echo ("File set " + $args[0] + " copied to " + $args[1] + ".")
}

This code also demonstrates how to access the Microsoft.Hpc.Dsc API directly from a PowerShell script. After you have copied the file set, you can use the DSC FILESET REMOVE command to delete the original file set.

It is important to realize that the AddExistingFile method creates a new entry in the DSC, and references the file within the new file set. It does not create new copies of the NTFS files. Similarly, the DSC FILESET REMOVE command removes the file set from the DSC, but the underlying files are not deleted if they are referenced by other file sets.

The samples download includes an additional PowerShell function that is named dsc-fileset-copy. This function copies a file set. It is located in the User.ps1 file.

Listing DSC file sets

You can use PowerShell to filter the output of the DSC FILESET LIST command. For example, the following command lists all DSC file sets, but ignores the temporary file sets that are created by LINQ to HPC queries.

DSC fileset list /service:HeadNodeName | where { $_ -notlike "__HpcLinq_Temp*" }

The samples download includes an additional PowerShell function that is named dsc-fileset-list. This function lists all file sets in the DSC, and ignores all temporary files that are created by LINQ to HPC queries, as well as all temporary file sets that are created by the samples. It is located in the User.ps1 file.

Viewing node disk space usage

To view the free and allocated space on nodes in the DSC, execute the following DSC commands.

  1. The following command lists all of the nodes in the DSC.

    DSC NODE LIST /service:HeadNodeName
    
  2. The following command lets you view the allocated and free space for each node.

    DSC NODE VIEW [node name] /service:HeadNodeName
    

The samples download includes an additional PowerShell function that is named dsc-node-view. This function lists all the nodes in the DSC, together with their allocated and free space. By default it produces output in list format; with the –t flag it produces tabular output. It is located in the User.ps1 file.

Removing a node from the DSC

Removing a node from the DSC is a multi-step process. Before you remove a node, you must ensure that any data it holds is replicated on other nodes in the cluster. The following procedures list the steps to follow. These procedures assume that there is a node group that is named LinqToHpcNodes, which contains all of the DSC nodes.

Replicating data

  1. To ensure that there are copies of the data, the replication factor must be greater than 1. The default replication factor is 3. If you have changed the replication factor to 1, you will need to set it to a higher number and make new copies of file sets to ensure that they have the new replication factor. The following command sets the replication factor to 3.

    DSC PARAMS SET ReplicationFactor 3
    
    
  2. Use the following command to disable background replication.

    DSC PARAMS SET PeriodicMaintenanceEnabled False
    
    
  3. Take all DSC nodes offline to ensure that there are no incoming jobs. Use the HPC Cluster Manager to do this. On the Node Management tab, select the DSC nodes, right-click the nodes, and then select Take Offline.

  4. Use the following command to ensure that replication is up-to-date by initiating background replication. Wait for the command to complete.

    CLUSRUN /nodegroup:LinqToHpcNodes HpcDscNodeAdmin.exe /r
    
    

If HpcDscNodeAdmin runs successfully, it should produce output that is similar to the following:

-------------------------- CLUSTER-HN returns 0 --------------------------
Replication starting.
267 out of 267 files replicated successfully.
All tasks successful
-------------------------- CLUSTER-CN1 returns 0 --------------------------
Replication starting.
396 out of 396 files replicated successfully.
All tasks successful
-------------------------- Summary --------------------------
2 Nodes succeeded
0 Nodes failed

If the CLUSRUN command fails, do not proceed with the next procedure, because doing so may destroy some data.
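The check described above can be automated by inspecting the captured CLUSRUN output before you continue. The following is a minimal sketch, not part of the samples download. Test-ReplicationOutput is a hypothetical helper name, and the patterns assume that the output has the same shape as the sample shown above.

```powershell
# Hypothetical helper: returns $true only if the captured output reports
# that every file was replicated and that no node failed.
# Assumes the output format matches the sample output shown above.
function Test-ReplicationOutput
{
  param([string[]] $Lines)

  $sawSummary = $false
  foreach ($line in $Lines)
  {
    # "267 out of 267 files replicated successfully." -> both counts must match.
    if ($line -match '^\s*(\d+) out of (\d+) files replicated')
    {
      if ([int]$Matches[1] -ne [int]$Matches[2]) { return $false }
    }
    # "0 Nodes failed" -> any other count means at least one node failed.
    elseif ($line -match '^\s*(\d+) Nodes failed')
    {
      $sawSummary = $true
      if ([int]$Matches[1] -ne 0) { return $false }
    }
  }

  # If no summary line was seen, treat the run as incomplete.
  return $sawSummary
}

# Example use on a cluster (commented out so that the sketch runs anywhere):
# $out = clusrun /nodegroup:LinqToHpcNodes HpcDscNodeAdmin.exe /r
# if (-not (Test-ReplicationOutput $out)) { echo "Replication incomplete. Do not remove the node." }
```

The function deliberately returns False when it cannot find the summary line, so that truncated output is not mistaken for success.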

Removing the node

  1. The following command removes a node from the DSC.

    DSC NODE REMOVE [node name]
    
  2. To remove the node from the LinqToHpcNodes group, use the HPC Cluster Manager.

  3. Use the following command to initiate background replication. Wait for it to complete.

    CLUSRUN /nodegroup:LinqToHpcNodes HpcDscNodeAdmin.exe /r
    
  4. Bring all the DSC nodes back online.

  5. The following command enables background replication.

    DSC PARAMS SET PeriodicMaintenanceEnabled True
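The "Replicating data" and "Removing the node" procedures can be combined into one ordered checklist. The sketch below only builds text and runs nothing. Get-NodeRemovalSteps is a hypothetical helper name, the commands are the ones shown in the procedures, and the steps marked MANUAL are performed in HPC Cluster Manager. It omits the replication factor step, which applies only if the factor was previously changed to 1.

```powershell
# Hypothetical helper: returns the removal steps in the order given by the
# two procedures. It does not execute any command.
function Get-NodeRemovalSteps
{
  param([string] $NodeName, [string] $NodeGroup = "LinqToHpcNodes")

  # The same replication command is run before and after the node is removed.
  $replicate = "CLUSRUN /nodegroup:$NodeGroup HpcDscNodeAdmin.exe /r"

  return @(
    "DSC PARAMS SET PeriodicMaintenanceEnabled False",
    "MANUAL: take all DSC nodes offline in HPC Cluster Manager",
    $replicate,
    "DSC NODE REMOVE $NodeName",
    "MANUAL: remove $NodeName from the $NodeGroup group in HPC Cluster Manager",
    $replicate,
    "MANUAL: bring all DSC nodes back online",
    "DSC PARAMS SET PeriodicMaintenanceEnabled True"
  )
}

# Print a numbered checklist for a node named CLUSTER-CN1.
$i = 1
foreach ($step in (Get-NodeRemovalSteps -NodeName "CLUSTER-CN1"))
{
  echo ("{0}. {1}" -f $i, $step)
  $i++
}
```

Keeping the steps as data rather than executing them makes it easy to review the order, and to stop after the first CLUSRUN step if replication did not succeed.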
    

The DSC NODE REMOVE command fails if any of the files on the node have a replication count of 1. In this case, you need to carry out the following procedure.

Handling node removal failure

  1. Set the replication factor to 3 or greater (see step 1 in the “Replicating data” procedure at the beginning of this topic).

  2. Identify any file sets that have a replication factor of 1, and that contain files stored on the node that you want to remove.

  3. For each of these file sets:

    1. Create a new copy of the file set. The copy will have the replication factor that you set in step 1 of the “Replicating data” procedure. You can create the copy programmatically by using the AddExistingFile method, or you can create a new file set from the original data by using the DSC FILESET ADD command.

    2. Delete the original file set by using the DSC FILESET REMOVE command.

  4. Finally, retry the “Removing the node” procedure.

Node removal may also fail if there is insufficient room on the remaining nodes to store fully replicated copies of all the data in a file set. In this case, you need to free or add storage. You can either add nodes to replace the ones that you want to remove, or delete unused file sets to make space.

Handling node failure

If a node fails, follow the steps outlined in the previous section, Removing a node from the DSC. In the unlikely event that the cluster simultaneously loses as many nodes as the replication factor, or more, every replica of a file may be gone and you may lose data. In this case, delete the affected file sets, remove the nodes, replace or repair them, and then recreate the data.

Rebalancing file sets

A file set is considered unbalanced if its data is not uniformly distributed across the DSC nodes. You can use the dsc-fileset-view Windows PowerShell command that is located in the User.ps1 file to list all the files in a file set, as well as the location of the file replicas on each DSC node. For more information, see Listing all the files in a DSC file set.
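One simple way to judge how uniformly a file set is distributed is to compare the number of file replicas held by each node. The following is a minimal sketch under stated assumptions: Get-FileSetBalance is a hypothetical helper, and the per-node counts are assumed to have been tallied by hand or by script from the dsc-fileset-view output. What size of spread counts as unbalanced is a judgment call that this documentation does not define.

```powershell
# Hypothetical helper: summarizes how evenly file replicas are spread.
# $ReplicasPerNode maps a node name to the number of file replicas it holds.
function Get-FileSetBalance
{
  param([hashtable] $ReplicasPerNode)

  $stats = $ReplicasPerNode.Values | Measure-Object -Minimum -Maximum -Average

  # A spread of 0 means that every node holds the same number of replicas.
  New-Object PSObject -Property @{
    Minimum = $stats.Minimum
    Maximum = $stats.Maximum
    Average = $stats.Average
    Spread  = $stats.Maximum - $stats.Minimum
  }
}

# Example with made-up counts: two nodes hold 10 replicas each, one holds 2.
$balance = Get-FileSetBalance @{ "CLUSTER-CN1" = 10; "CLUSTER-CN2" = 10; "CLUSTER-CN3" = 2 }
echo ("Spread between fullest and emptiest node: " + $balance.Spread)
```

Replica counts ignore file sizes, so a file set with a few very large files can still be unbalanced by volume even when the spread is small.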

If data is not evenly distributed across the nodes in a cluster, you can use a hash partitioning LINQ to HPC query to manually create a new, rebalanced file set. The following code shows a simple query that rebalances a file set by creating 5 files per DSC node.

HpcLinqConfiguration config = new HpcLinqConfiguration(SampleConfiguration.HeadNode);
using (HpcLinqContext context = new HpcLinqContext(config))
{
    int partitions = context.GetNodeCount() * 5;
    context.FromDsc<LineRecord>("FileSet")
        .HashPartition(r => r, partitions)
        .ToDsc("NewFileSet")
        .SubmitAndWait();
}

Changing the DSC cluster replication factor

The recommended DSC replication factor is 3, which is the default. This can be set to values between 1 and 4, allowing the cluster to be configured to maximize reliability or storage capacity. It is possible to change the replication factor by using the DSC PARAMS SET command.

If you change the cluster replication factor, any new file sets use the new factor; existing file sets and files are unchanged. If you create a new file set and add existing files to it by using AddExistingFile, the replication count of each file is raised to the current cluster replication factor if that factor is higher than the file's current replication count.

The replication factor shown by DSC FILESET VIEW is a lower bound. It is the minimum replication count that is guaranteed for every file in the file set.

This means that if you decrease the cluster replication factor, any existing files that are added to a new file set will retain their existing (higher) replication count. Newly created files will use the new (lower) factor. If you use the DSC FILESET VIEW command, you will see this lower factor. If you want to reduce the replication factor and force a new file set to contain only files with the new factor, then you must add them as new files, rather than using AddExistingFile.
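The rules in this section reduce to two small calculations, which the following sketch models. It is an illustration of the behavior described above, not a call into the DSC. Both function names are hypothetical.

```powershell
# Replication count of an existing file after it is added to a new file set
# with AddExistingFile: the count is raised to the cluster factor if that is
# higher, and is never lowered.
function Get-ReplicationCountAfterAdd
{
  param([int] $CurrentCount, [int] $ClusterFactor)
  return [Math]::Max($CurrentCount, $ClusterFactor)
}

# Factor reported for a file set: the lowest replication count among its files.
function Get-FileSetReplicationFactor
{
  param([int[]] $FileCounts)
  return [int]($FileCounts | Measure-Object -Minimum).Minimum
}

# Cluster factor lowered from 3 to 2:
$existing = Get-ReplicationCountAfterAdd -CurrentCount 3 -ClusterFactor 2   # 3 (kept)
$newFile  = 2                                                               # new files use the new factor
echo (Get-FileSetReplicationFactor @($existing, $newFile))                  # 2

# Cluster factor raised from 1 to 3:
echo (Get-ReplicationCountAfterAdd -CurrentCount 1 -ClusterFactor 3)        # 3 (raised)
```

The first example shows why a mixed file set reports the lower factor even though some of its files still have three replicas.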

Securing data sets

DSC file sets are made up of three elements: the file set’s properties (see Understanding DSC File Set Properties), the metadata that is associated with the file set, and the files that are stored on the cluster in NTFS. The DSC controls the permissions on the file set and its metadata, while NTFS secures access to the underlying files.

By default, users can read all DSC file set properties, and can enumerate file sets by using the DSC FILESET LIST and DSC FILESET VIEW commands. The following table lists the DSC permissions and their associated actions.

 

DSC permission level    Available actions
--------------------    -----------------
None                    This is the default permission. You can read a file set's properties, but not the associated metadata.
Read                    You can read the file set's properties and metadata, but you cannot modify the metadata.
ReadOrModify            You can read and modify a file set's properties and metadata. You must have this permission level to delete a file set.

The files that make up a DSC file set also have NTFS permissions associated with them. Because files within a file set can be modified only while the file set is being created, before it is sealed, only two NTFS permissions are possible. The first is None, which means that a user cannot access the file set’s files. The other is Read, which means that a user can read the file set's files. Files created within the DSC, either by using the DscService API or the DSC FILESET ADD command, inherit the access control lists (ACLs) of the HpcData share on the node. File sets created as the output of LINQ to HPC queries have ACLs set on the files that allow only the job owner and administrators to read them.

The samples download includes a PowerShell function that is named dsc-fileset-permission. This function allows the user to set the ACLs on all files in a file set. It is located in the Admin.ps1 file.

Data management

Each DSC node has two shares. One is the HpcData share, and the other is the HpcTemp share. The first stores DSC data, user-generated file sets, and intermediate file sets that are created during query execution. The second stores information that is associated with particular LINQ to HPC jobs. Use of the HpcData share is governed by the size of the file sets that are created by users, and by the replication factor. The recommended value for the replication factor is 3, which is the default. Replication factors of 1 through 4 are supported, but you should weigh the tradeoffs between storage overhead and tolerance to node failures before you select a value. In addition to the file sets that users create, temporary file sets are created as a query executes. These temporary file sets have a lease time of 24 hours and are removed automatically.
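The tradeoff between storage overhead and the replication factor can be estimated with simple arithmetic: the space consumed is roughly the total size of the file sets multiplied by the replication factor. The sketch below is a rough planning aid only. Get-HpcDataEstimateGB is a hypothetical helper, and the estimate ignores compression and the temporary file sets that queries create.

```powershell
# Hypothetical helper: estimates cluster-wide HpcData usage in GB as
# (sum of file set sizes) x (replication factor).
function Get-HpcDataEstimateGB
{
  param([double[]] $FileSetSizesGB, [int] $ReplicationFactor = 3)

  # Replication factors of 1 through 4 are supported.
  if ($ReplicationFactor -lt 1 -or $ReplicationFactor -gt 4)
  {
    throw "Replication factor must be between 1 and 4."
  }

  $raw = ($FileSetSizesGB | Measure-Object -Sum).Sum
  return $raw * $ReplicationFactor
}

# Two file sets of 100 GB and 250 GB:
echo (Get-HpcDataEstimateGB @(100, 250) 3)   # 1050
echo (Get-HpcDataEstimateGB @(100, 250) 1)   # 350
```

With these example sizes, the default factor of 3 needs 700 GB more than a factor of 1 in exchange for tolerance to node failures.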

The HpcTemp share stores information related to LINQ to HPC queries. Information for each job is stored in a folder named {username}\{job ID}. This share is cleaned up automatically, and files relating to jobs that are more than 24 hours old are removed. Typically, each job creates approximately 10 MB of data.

The recommended configuration for these shares is to associate both of them with one volume, where that volume is constructed by using software striping across multiple physical disks. The striped disks should not be the system disk.



© 2014 Microsoft