Saving the Results of a LINQ to HPC Query to a New DSC File Set
When you use a LINQ to HPC query to process data items, you can call the ToDsc method to create a new DSC file set that contains the results of the query. The LINQ to HPC runtime partitions the results into DSC files on a per-vertex basis. This means that if you query a DSC file set that is partitioned into ten DSC files, transform the records of that file set, and then save the results into a new file set, the new file set will also use ten DSC files.
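The per-file behavior described above can be pictured with a small sketch. It uses ordinary in-memory LINQ to Objects rather than the LINQ to HPC API: each inner list is a stand-in for one DSC file, and the class name PartitionSketch and the method name LineLengthsPerPartition are hypothetical names chosen for illustration. The sketch only models a record-by-record transformation, which is the case the paragraph describes.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public static class PartitionSketch
{
    // Applies the same transformation to each partition independently,
    // so the output has exactly one partition per input partition.
    public static List<List<int>> LineLengthsPerPartition(List<List<string>> inputPartitions)
    {
        return inputPartitions
            .Select(partition => partition.Select(line => line.Length).ToList())
            .ToList();
    }

    public static void Main()
    {
        // Hypothetical input: three "files" of text lines.
        var input = new List<List<string>>
        {
            new List<string> { "alpha", "be" },
            new List<string> { "gamma" },
            new List<string> { "", "delta", "epsilon" }
        };

        List<List<int>> output = LineLengthsPerPartition(input);

        // Both counts are the same because no records move between partitions.
        Console.WriteLine("Input partitions:  {0}", input.Count);
        Console.WriteLine("Output partitions: {0}", output.Count);
    }
}
```

In this model the number of output partitions always equals the number of input partitions, which mirrors the ten-files-in, ten-files-out example in the text.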
The data type of the result determines whether the new DSC file set is in binary format or text format. If the objects being saved are instances of the LineRecord class, the data is written in text format. Otherwise, the LINQ to HPC binary serialization format is used. For types that are saved in binary format, you can control whether the LINQ to HPC default serialization is used, or whether LINQ to HPC uses custom serialization that you provide. See Creating DSC File Sets for more information on serialization formats.
The following code converts text data into binary format. The example reads text lines from a DSC file set and writes out the length of each line into a new binary format DSC file set.
var config = new HpcLinqConfiguration("MyHpcClusterHeadNode");
var context = new HpcLinqContext(config);

string textFileSetName = ...
string lengthsFileSetName = ...

// Read the text file set, compute the length of each line, and
// write the lengths to a new binary format DSC file set.
HpcLinqJobInfo info = context.FromDsc<LineRecord>(textFileSetName)
                             .Select(r => r.Line.Length)
                             .ToDsc(lengthsFileSetName)
                             .Submit();

// Wait for the job, then confirm that it finished.
info.Wait();
JobState jobState = info.GetJobState();
if (jobState != JobState.Finished)
{
    Console.WriteLine("Job {0} did not finish.", info.JobId);
    return;
}

// Read the saved lengths back from the new file set.
Console.WriteLine("Line lengths are:");
var lineLengths = context.FromDsc<int>(lengthsFileSetName);
foreach (var length in lineLengths)
    Console.WriteLine(" {0}", length);
The ToDsc method is a LINQ to HPC operator that does not return data to the user. Instead, it stores the results of a previous LINQ to HPC query into a new DSC file set. To execute the query, you must invoke the Submit method.
The Submit method blocks only until the HPC job that will execute the query has been created; it does not wait for that job to complete. If Submit cannot create a valid HPC job, it throws an exception. Otherwise, it returns a status record, which is an instance of the HpcLinqJobInfo class.
If your program needs to wait for the query to complete, use the HpcLinqJobInfo object to access the underlying HPC job. You can make calls to HPC administrative methods to wait for the job to finish. The example calls info.Wait() and then checks the job's state with the GetJobState helper method, which is defined in the HpcLinqExtras project. See Monitoring a Running LINQ to HPC Query for its definition.
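The submit, wait, then check-state pattern can be illustrated with a loose analogy built on the standard .NET Task type. This is not the LINQ to HPC API: Task.Run stands in for Submit, the returned Task stands in for the HpcLinqJobInfo status record, and the names SubmitThenWaitSketch and StartWork are hypothetical. It is a minimal sketch of the control flow only.

```csharp
using System;
using System.Threading;
using System.Threading.Tasks;

public static class SubmitThenWaitSketch
{
    // Stand-in for Submit: queues the work and returns a handle
    // without waiting for the work itself to complete.
    public static Task<int> StartWork(int simulatedMilliseconds)
    {
        return Task.Run(() =>
        {
            Thread.Sleep(simulatedMilliseconds); // simulated long-running job
            return simulatedMilliseconds;
        });
    }

    public static void Main()
    {
        Task<int> handle = StartWork(200);
        Console.WriteLine("Work started; the caller has not waited for it yet.");

        try
        {
            // Stand-in for waiting on the job: blocks until the work ends.
            handle.Wait();
        }
        catch (AggregateException)
        {
            // A failed task rethrows here; the status check below reports it.
        }

        // Stand-in for the job state check in the example above.
        if (handle.Status != TaskStatus.RanToCompletion)
        {
            Console.WriteLine("Work did not finish: {0}", handle.Status);
            return;
        }
        Console.WriteLine("Work finished with result {0}", handle.Result);
    }
}
```

As in the LINQ to HPC example, starting the work and waiting for it are separate steps, and the final state is checked explicitly before the results are used.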