Share via


Using LINQ to Query Tables in Excel 2007

Summary:  Learn how to use LINQ to query data that is stored in Excel 2007 worksheets.

Office Visual How To

Applies to:  2007 Microsoft Office System, Microsoft Office Excel 2007, Microsoft Visual Studio 2008

Michael Case, iSoftStone

June 2009

Overview

Microsoft LINQ allows for database style queries against object data. Although you cannot run LINQ queries directly against data stored in Microsoft Office Excel 2007 workbooks, it is possible to load the data into objects that do support LINQ.

This Visual How To illustrates how to use the Open XML Formats to load information stored in Excel worksheets into generic lists that support LINQ.

This Visual How To uses the Open XML Format SDK 2.0, which you need to install before you begin.

See It Dd920313.2ed23dc4-f8d3-4708-828d-aa4390957491(en-us,office.12).bmp

Watch the Video

Length: 07:23 | Size: 8.08 MB | Type: WMV file

Code It | Read It | Explore It

Code It

Download the sample code

This Visual How To walks through the following steps to illustrate the process of loading data from an Excel 2007 workbook into generic lists that support LINQ queries.

  1. Creating an Excel Workbook with Sample Data.

  2. Creating a Windows console application solution in Visual Studio 2008.

  3. Adding references to the OpenXml assemblies.

  4. Importing the OpenXml namespaces.

  5. Adding the sample code to the solution.

Creating an Excel Workbook with Sample Data

The sample code presented with this Visual How To uses an Excel workbook named LinqSample.xlsx that contains two worksheets named Customer and Order that exists in the C:\Temp directory. You need to first create this sample workbook.

To create the LinqSample.xlsx sample workbook

  1. Start Microsoft Excel 2007.

  2. Create a new workbook named LinqSample.xlsx.

  3. Rename the Sheet1 worksheet to Customer.

  4. Insert a table into the Customer worksheet that has three columns named Name, City, and State. When you enter data into the table, keep the company names in the Names column unique.

    Table 1. Sample Customer Information

    Name

    City

    State

    Company A

    Seattle

    WA

    Company B

    Bellevue

    WA

    Company C

    Richland

    WA

    Figure 1. Creating the Sample Customer Worksheet

    Creating the Sample Customer Worksheet

  5. Rename the Sheet2 worksheet to Order.

  6. Insert a table into the Order worksheet that has five columns named Number, Date, Customer, Amount, and Status. Enter data into the table using the company names entered in the Name column of the table in the Customer worksheet.

    Table 2. Sample Order Information

    Number

    Date

    Customer

    Amount

    Status

    1111

    1/15/2009

    Company A

    100

    Closed

    1112

    2/12/2009

    Company A

    200

    Closed

    1113

    3/17/2009

    Company A

    150

    New

    2221

    12/11/2008

    Company B

    15

    Closed

    2222

    12/12/2008

    Company B

    20

    Closed

    3331

    3/5/2009

    Company C

    2000

    Shipped

    3332

    3/14/2009

    Company C

    430

    New

    Figure 2. Sample Order Worksheet

    Sample Order Worksheet

  7. Click OK to save the workbook.

Creating a Windows Console Application in Visual Studio 2008

This Visual How To uses a Windows console application as the framework for the sample code. The console application type was selected only for its simplicity. Other application types could use the same approach presented here.

To create a Windows Console Application in Visual Studio 2008

  1. Start Microsoft Visual Studio 2008.

  2. From the Project menu, click File, then click New.

  3. In the New Project dialog box, click the Visual Basic Windows or Visual C# Windows type in the Project types pane.

  4. In the Templates pane, click Console Application.

  5. Name the project and solution LINQWithExcelTables.

    Figure 3. Creating the Solution

    Creating the Solution

  6. Click OK to create the solution.

Adding a Reference to Required Assemblies

This Visual How To uses the Open XML Format SDK 2.0 to access data in an Excel workbook. The Open XML Format SDK 2.0 provides strongly typed classes for use with Open XML documents.

To add a reference to the OpenXml assemblies

  1. In Visual Studio, select the Project and then the Add Reference menu.

  2. From the .NET tab in the Add Reference dialog box, select the [DocumentFormat.OpenXml] assembly.

  3. Click OK to add the reference.

    Figure 4. Adding the Reference

    Adding the Reference

Importing the OpenXml Namespaces

Importing the OpenXml Namespaces allows code to use strongly typed part and content classes when working with Open XML documents.

For a Visual Basic project, add the following lines to the top of the Module1.vb source file.

Imports DocumentFormat.OpenXml.Packaging
Imports Documentformat.OpenXml.Spreadsheet

For a C# project, add the following lines to the top of the Program.cs source file below the other using statements generated by Visual Studio.

using DocumentFormat.OpenXml.Packaging;
using DocumentFormat.OpenXml.Spreadsheet;

Adding the Sample Code to the Solution

The following sample code uses the existing workbook named LinqSample.xlsx in the C:\Temp folder.

For a Visual Basic project, replace the contents of the Module1 module in the Module1.vb source file with the following code.

Sub Main()
  ' Declare variables to hold refernces to Excel objects.
  Dim workBook As Workbook
  Dim sharedStrings As SharedStringTable
  Dim workSheets As IEnumerable(Of Sheet)
  Dim custSheet As WorksheetPart
  Dim orderSheet As WorksheetPart

  ' Declare helper variables.
  Dim custID As String
  Dim orderID As String
  Dim customers As List(Of Customer)
  Dim orders As List(Of Order)

  ' Open the Excel workbook. 
  Using document As SpreadsheetDocument = _
    SpreadsheetDocument.Open("C:\Temp\LinqSample.xlsx", True)

    ' References to the workbook and Shared String Table. 
    workBook = document.WorkbookPart.Workbook
    workSheets = workBook.Descendants(Of Sheet)()
    sharedStrings = _
      document.WorkbookPart.SharedStringTablePart.SharedStringTable

    ' Reference to Excel Worksheet with Customer data. 
    custID = _
      workSheets.First(Function(sheet) sheet.Name = "Customer").Id()
    custSheet = DirectCast( _
      document.WorkbookPart.GetPartById(custID), WorksheetPart)

    ' Load Customer data to business object. 
    customers = _
      Customer.LoadCustomers(custSheet.Worksheet, sharedStrings)

    ' Reference to Excel Worksheet with Order data. 
    orderID = _
      workSheets.First(Function(sheet) sheet.Name = "Order").Id
    orderSheet = DirectCast( _
      document.WorkbookPart.GetPartById(orderID), WorksheetPart)

    ' Load Order data to business object. 
    orders = Order.LoadOrders(orderSheet.Worksheet, sharedStrings)


    ' List all customers to the console. 
    ' Write header information to the console. 
    Console.WriteLine("All Customers")
    Console.WriteLine("{0, -15} {1, -15} {2, -5}", _
      "Customer", "City", "State")

    ' LINQ Query for all Customers. 
    Dim allCustomers As IEnumerable(Of Customer) = _
     From customer In customers _
     Select customer

    ' Execute query and write customer information to the console. 
    For Each c As Customer In allCustomers
      Console.WriteLine("{0, -15} {1, -15} {2, -5}", _
        c.Name, c.City, c.State)
    Next
    Console.WriteLine()
    Console.WriteLine()


    ' Write all Orders over $100 to the console. 
    ' Write header information to the console. 
    Console.WriteLine("All Orders over $100")
    Console.WriteLine("{0, -15} {1, -10} {2, 10} {3, -5}", _
      "Customer", "Date", "Amount", "Status")

    ' LINQ Query for all Order over $100. 
    ' Join used to display Customer information for the order. 
    Dim highOrders = From customer In customers _
     Join order In orders On customer.Name Equals order.Customer _
     Where order.Amount > 100.0R _
     Select customer.Name, order.Date, order.Amount, order.Status

    ' Execute query and write information to the console.
    For Each result In highOrders
      Console.WriteLine("{0, -15} {1, -10} {2, 10} {3, -5}", _
        result.Name, result.[Date].ToShortDateString(), _
        result.Amount, result.Status)
    Next
    Console.WriteLine()
    Console.WriteLine()


    ' Report on customer orders by status. 
    ' Write header information to the console. 
    Console.WriteLine("Customer Orders by Status")

    ' LINQ Query for summarizing customer order information by status.
    ' There are two LINQ Queries. 
    ' Internal query is used to group orders together by status and 
    '   calculates the total order amount and number of orders. 
    ' External query is used to join Customer information. 
    Dim sumoforders = _
      From customer In customers _
        Select New With {customer.Name, _
          .statusTotals = _
            From order In orders _
            Where order.Customer = customer.Name _
            Group order.Amount By order.Status Into Group _
            Select New With {.status = Status, _
                             .orderAmount = Group.Sum(), _
                             .orderCount = Group.Count() _
                             } _
       }

    ' Execute query and write information to the console. 
    For Each customer In sumoforders
      ' Write customer name to the console. 
      Console.WriteLine("-{0}-", customer.Name)
      For Each x In customer.statusTotals
        Console.WriteLine(" {0, -10}: {2,2} orders totaling {1, 7}", _
          x.status, x.orderAmount, x.orderCount)
      Next
      Console.WriteLine()
    Next

    ' Keep console window open.
    Console.Read()
  End Using
End Sub

''' <summary> 
''' Used to store Customer information for analysis. 
''' </summary> 
Public Class Customer
  'Properties. 
  Private _Name As String
  Public Property Name() As String
    Get
      Return _Name
    End Get
    Set(ByVal value As String)
      _Name = value
    End Set
  End Property
  Private _City As String
  Public Property City() As String
    Get
      Return _City
    End Get
    Set(ByVal value As String)
      _City = value
    End Set
  End Property
  Private _State As String
  Public Property State() As String
    Get
      Return _State
    End Get
    Set(ByVal value As String)
      _State = value
    End Set
  End Property

  ''' <summary> 
  ''' Helper method for creating a list of Customers 
  ''' from an Excel worksheet.
  ''' </summary> 
  Public Shared Function LoadCustomers(ByVal worksheet As Worksheet, _
    ByVal sharedString As SharedStringTable) As List(Of Customer)

    ' Initialize Customer List. 
    Dim result As New List(Of Customer)()

    ' LINQ query to skip first row with column names. 
    Dim dataRows As IEnumerable(Of Row) = _
      From row In worksheet.Descendants(Of Row)() _
        Where row.RowIndex.Value > 1 _
        Select row

    For Each row As Row In dataRows
      ' LINQ Query to return the Row's Cell values. 
      ' Where clause filters out any cells that do not contain a value. 
      ' Select returns the value of the cell unless the cell has
      ' a Shared String.
      ' If the Cell contains a Shared String its value will be a 
      ' reference id which will be used to look up the value in the 
      ' Shared String table. 
      Dim textValues As IEnumerable(Of String) = _
        From cell In row.Descendants(Of Cell)() _
        Where cell.CellValue IsNot Nothing _
        Select ( _
          If(cell.DataType IsNot Nothing _
            AndAlso cell.DataType.HasValue _
            AndAlso cell.DataType.Value = CellValues.SharedString, _
          sharedString.ChildElements( _
            Integer.Parse(cell.CellValue.InnerText)).InnerText, _
          cell.CellValue.InnerText) _
        )

      ' Check to verify the row contains data. 
      If textValues.Count() > 0 Then
        'Create a Customer and add it to the list. 
        Dim textArray = textValues.ToArray()
        Dim customer As New Customer()
        customer.Name = textArray(0)
        customer.City = textArray(1)
        customer.State = textArray(2)
        result.Add(customer)
      Else
        ' If no cells, you have reached the end of the table.
        Exit For
      End If
    Next

    ' Return populated list of customers. 
    Return result
  End Function
End Class

''' <summary> 
''' Used to store Order information for analysis. 
''' </summary> 
Public Class Order
  'Properties. 
  Private _Number As String
  Public Property Number() As String
    Get
      Return _Number
    End Get
    Set(ByVal value As String)
      _Number = value
    End Set
  End Property
  Private _Date As DateTime
  Public Property [Date]() As DateTime
    Get
      Return _Date
    End Get
    Set(ByVal value As DateTime)
      _Date = value
    End Set
  End Property
  Private _Customer As String
  Public Property Customer() As String
    Get
      Return _Customer
    End Get
    Set(ByVal value As String)
      _Customer = value
    End Set
  End Property
  Private _Amount As Double
  Public Property Amount() As Double
    Get
      Return _Amount
    End Get
    Set(ByVal value As Double)
      _Amount = value
    End Set
  End Property
  Private _Status As String
  Public Property Status() As String
    Get
      Return _Status
    End Get
    Set(ByVal value As String)
      _Status = value
    End Set
  End Property

  ''' <summary> 
  ''' Helper method for creating a list of Orders 
  ''' from an Excel worksheet.
  ''' </summary> 
  Public Shared Function LoadOrders(ByVal worksheet As Worksheet, _
    ByVal sharedString As SharedStringTable) As List(Of Order)

    ' Initialize Order List. 
    Dim result As New List(Of Order)()

    ' LINQ query to skip first row with column names. 
    Dim dataRows As IEnumerable(Of Row) = _
      From row In worksheet.Descendants(Of Row)() _
        Where row.RowIndex.Value > 1 _
        Select row

    For Each row As Row In dataRows
      ' LINQ Query to return the value of the cells in the row. 
      ' Where clause filters out any cells that do not contain a value.
      ' Select the value of the return cellunless the cell
      '  has a Shared String.
      ' If the cell has a Shared String, its value is a reference id
      '  which is used to look up the value in the Shared String table.
      Dim textValues As IEnumerable(Of String) = _
        From cell In row.Descendants(Of Cell)() _
        Where cell.CellValue IsNot Nothing _
        Select ( _
          If(cell.DataType IsNot Nothing _
            AndAlso cell.DataType.HasValue _
            AndAlso cell.DataType.Value = CellValues.SharedString, _
          sharedString.ChildElements( _
            Integer.Parse(cell.CellValue.InnerText)).InnerText, _
          cell.CellValue.InnerText) _
        )

        ' Check to verify the row contains data. 
      If textValues.Count() > 0 Then
        ' Create an order and add it to the list. 
        Dim textArray = textValues.ToArray()
        Dim order As New Order()
        order.Number = textArray(0)
        order.[Date] = New DateTime(1900, 1, 1).AddDays( _
          Double.Parse(textArray(1)) - 2)
        order.Customer = textArray(2)
        order.Amount = [Double].Parse(textArray(3))
        order.Status = textArray(4)
        result.Add(order)
      Else
        ' If no cells, then you have reached the end of the table. 
        Exit For
      End If
    Next

    ' Return populated list of orders. 
    Return result
  End Function
End Class

For a C# project, replace the contents of the Program class in the Program.cs source file with the following code.

static void Main(string[] args)
{
  //Declare variables to hold refernces to Excel objects.
  Workbook workBook;
  SharedStringTable sharedStrings;
  IEnumerable<Sheet> workSheets;
  WorksheetPart custSheet;
  WorksheetPart orderSheet;

  //Declare helper variables.
  string custID;
  string orderID;
  List<Customer> customers;
  List<Order> orders;

  //Open the Excel workbook.
  using (SpreadsheetDocument document =
    SpreadsheetDocument.Open(@"C:\Temp\LinqSample.xlsx", true))
  {
    //References to the workbook and Shared String Table.
    workBook = document.WorkbookPart.Workbook;
    workSheets = workBook.Descendants<Sheet>();
    sharedStrings =
      document.WorkbookPart.SharedStringTablePart.SharedStringTable;

    //Reference to Excel Worksheet with Customer data.
    custID =
      workSheets.First(s => s.Name == @"Customer").Id;
    custSheet =
      (WorksheetPart)document.WorkbookPart.GetPartById(custID);

    //Load customer data to business object.
    customers =
      Customer.LoadCustomers(custSheet.Worksheet, sharedStrings);

    //Reference to Excel worksheet with order data.
    orderID =
      workSheets.First(sheet => sheet.Name == @"Order").Id;
    orderSheet =
      (WorksheetPart)document.WorkbookPart.GetPartById(orderID);

    //Load order data to business object.
    orders =
      Order.LoadOrders(orderSheet.Worksheet, sharedStrings);

    //List all customers to the console.
    //Write header information to the console.
    Console.WriteLine("All Customers");
    Console.WriteLine("{0, -15} {1, -15} {2, -5}",
      "Customer", "City", "State");

    //LINQ Query for all customers.
    IEnumerable<Customer> allCustomers =
        from customer in customers
        select customer;

    //Execute query and write customer information to the console.
    foreach (Customer c in allCustomers)
    {
      Console.WriteLine("{0, -15} {1, -15} {2, -5}",
        c.Name, c.City, c.State);
    }
    Console.WriteLine();
    Console.WriteLine();


    //Write all orders over $100 to the console.
    //Write header information to the console.
    Console.WriteLine("All Orders over $100");
    Console.WriteLine("{0, -15} {1, -10} {2, 10} {3, -5}",
      "Customer", "Date", "Amount", "Status");

    //LINQ Query for all orders over $100.
    //Join used to display customer information for the order.
    var highOrders =
      from customer in customers
      join order in orders on customer.Name equals order.Customer
      where order.Amount > 100.00
      select new
      {
        customer.Name,
        order.Date,
        order.Amount,
        order.Status
      };

    //Execute query and write information to the console.
    foreach (var result in highOrders)
    {
      Console.WriteLine("{0, -15} {1, -10} {2, 10} {3, -5}",
        result.Name, result.Date.ToShortDateString(),
        result.Amount, result.Status);
    }
    Console.WriteLine();
    Console.WriteLine();


    //Report on customer orders by status.
    //Write header information to  the console.
    Console.WriteLine("Customer Orders by Status");

    //LINQ Query for summarizing customer order information by status.
    //There are two LINQ queries.  
    //Internal query is used to group orders together by status and 
    //calculates the total order amount and number of orders.
    //External query is used to join Customer information.
    var sumoforders =
      from customer in customers
      select new
      {
        customer.Name,
        statusTotals =
            from order in orders
            where order.Customer == customer.Name
            group order.Amount by order.Status into statusGroup
            select new
            {
              status = statusGroup.Key,
              orderAmount = statusGroup.Sum(),
              orderCount = statusGroup.Count()
            }
      };

    //Execute query and write information to the console.
    foreach (var customer in sumoforders)
    {
      //Write Customer name to the console.
      Console.WriteLine("-{0}-", customer.Name);
      foreach (var x in customer.statusTotals)
      {
        Console.WriteLine("  {0, -10}: {2,2} orders totaling {1, 7}",
          x.status, x.orderAmount, x.orderCount);
      }
      Console.WriteLine();
    }

    //Keep the console window open.
    Console.Read();
  }
}

/// <summary>
/// Used to store customer information for analysis.
/// </summary>
public class Customer
{
  //Properties.
  public string Name { get; set; }
  public string City { get; set; }
  public string State { get; set; }

  /// <summary>
  /// Helper method for creating a list of customers 
  /// from an Excel worksheet.
  /// </summary>
  public static List<Customer> LoadCustomers(Worksheet worksheet,
    SharedStringTable sharedString)
  {
    //Initialize the customer list.
    List<Customer> result = new List<Customer>();

    //LINQ query to skip first row with column names.
    IEnumerable<Row> dataRows =
      from row in worksheet.Descendants<Row>()
      where row.RowIndex > 1
      select row;

    foreach (Row row in dataRows)
    {
      //LINQ query to return the row's cell values.
      //Where clause filters out any cells that do not contain a value.
      //Select returns the value of a cell unless the cell contains
      //  a Shared String.
      //If the cell contains a Shared String, its value will be a 
      //  reference id which will be used to look up the value in the 
      //  Shared String table.
      IEnumerable<String> textValues =
        from cell in row.Descendants<Cell>()
        where cell.CellValue != null
        select
          (cell.DataType != null
            && cell.DataType.HasValue
            && cell.DataType == CellValues.SharedString
          ? sharedString.ChildElements[
            int.Parse(cell.CellValue.InnerText)].InnerText
          : cell.CellValue.InnerText)
        ;

      //Check to verify the row contained data.
      if (textValues.Count() > 0)
      {
        //Create a customer and add it to the list.
        var textArray = textValues.ToArray();
        Customer customer = new Customer();
        customer.Name = textArray[0];
        customer.City = textArray[1];
        customer.State = textArray[2];
        result.Add(customer);
      }
      else
      {
        //If no cells, then you have reached the end of the table.
        break;
      }
    }

    //Return populated list of customers.
    return result;
  }
}

/// <summary>
/// Used to store order information for analysis.
/// </summary>
public class Order
{
  //Properties.
  public string Number { get; set; }
  public DateTime Date { get; set; }
  public string Customer { get; set; }
  public Double Amount { get; set; }
  public string Status { get; set; }

  /// <summary>
  /// Helper method for creating a list of orders 
  /// from an Excel worksheet.
  /// </summary>
  public static List<Order> LoadOrders(Worksheet worksheet,
    SharedStringTable sharedString)
  {
    //Initialize order list.
    List<Order> result = new List<Order>();

    //LINQ query to skip first row with column names.
    IEnumerable<Row> dataRows =
      from row in worksheet.Descendants<Row>()
      where row.RowIndex > 1
      select row;

    foreach (Row row in dataRows)
    {
      //LINQ query to return the row's cell values.
      //Where clause filters out any cells that do not contain a value.
      //Select returns cell's value unless the cell contains
      //  a shared string.
      //If the cell contains a shared string its value will be a 
      //  reference id which will be used to look up the value in the 
      //  shared string table.
      IEnumerable<String> textValues =
        from cell in row.Descendants<Cell>()
        where cell.CellValue != null
        select
          (cell.DataType != null
            && cell.DataType.HasValue
            && cell.DataType == CellValues.SharedString
          ? sharedString.ChildElements[
            int.Parse(cell.CellValue.InnerText)].InnerText
          : cell.CellValue.InnerText)
        ;

      //Check to verify the row contains data.
      if (textValues.Count() > 0)
      {
        //Create an Order and add it to the list.
        var textArray = textValues.ToArray();
        Order order = new Order();
        order.Number = textArray[0];
        order.Date = new DateTime(1900, 1, 1).AddDays(
          Double.Parse(textArray[1]) - 2);
        order.Customer = textArray[2];
        order.Amount = Double.Parse(textArray[3]);
        order.Status = textArray[4];
        result.Add(order);
      }
      else
      {
        //If no cells, then you have reached the end of the table.
        break;
      }
    }

    //Return populated list of orders.
    return result;
  }
}

When you run the solution, customer and order information in the workbook is loaded into generic lists of customers and orders. You then run LINQ queries against the generic lists and send the output to the console.

Figure 5. Console Output

Console Output

Read It

Using the Open XML SDK to load data from Excel workbooks into business objects allows you to use LINQ queries for analysis. The sample code shown with this Visual How To opens an existing workbook, loads data from worksheets into business objects, and then uses LINQ to analyze the information and display results to the console.

To see another approach to querying data stored in Excel tables, check out Eric White's Using LINQ to Query Excel Tables blog post. It provides sample code that includes a set of extension methods and classes that allow queries to be performed against Excel tables using LINQ in a manner that is similar to querying a SQL database.

Explore It