Using LINQ to Query Tables in Excel 2007
Summary: Learn how to use LINQ to query data that is stored in Excel 2007 worksheets.
Applies to: 2007 Microsoft Office System, Microsoft Office Excel 2007, Microsoft Visual Studio 2008
Michael Case, iSoftStone
June 2009
Microsoft LINQ allows for database style queries against object data. Although you cannot run LINQ queries directly against data stored in Microsoft Office Excel 2007 workbooks, it is possible to load the data into objects that do support LINQ. This Visual How To illustrates how to use the Open XML Formats to load information stored in Excel worksheets into generic lists that support LINQ. This Visual How To uses the Open XML Format SDK 2.0, which you need to install before you begin. |
Length: 07:23 | Size: 8.08 MB | Type: WMV file |
Code It | Read It | Explore It
Code It
This Visual How To walks through the following steps to illustrate the process of loading data from an Excel 2007 workbook into generic lists that support LINQ queries.
Creating an Excel Workbook with Sample Data.
Creating a Windows console application solution in Visual Studio 2008.
Adding references to the OpenXml assemblies.
Importing the OpenXml namespaces.
Adding the sample code to the solution.
Creating an Excel Workbook with Sample Data
The sample code presented with this Visual How To uses an Excel workbook named LinqSample.xlsx that contains two worksheets named Customer and Order that exists in the C:\Temp directory. You need to first create this sample workbook.
To create the LinqSample.xlsx sample workbook
Start Microsoft Excel 2007.
Create a new workbook named LinqSample.xlsx.
Rename the Sheet1 worksheet to Customer.
Insert a table into the Customer worksheet that has three columns named Name, City, and State. When you enter data into the table, keep the company names in the Names column unique.
Table 1. Sample Customer Information
Name
City
State
Company A
Seattle
WA
Company B
Bellevue
WA
Company C
Richland
WA
Figure 1. Creating the Sample Customer Worksheet
Rename the Sheet2 worksheet to Order.
Insert a table into the Order worksheet that has five columns named Number, Date, Customer, Amount, and Status. Enter data into the table using the company names entered in the Name column of the table in the Customer worksheet.
Table 2. Sample Order Information
Number
Date
Customer
Amount
Status
1111
1/15/2009
Company A
100
Closed
1112
2/12/2009
Company A
200
Closed
1113
3/17/2009
Company A
150
New
2221
12/11/2008
Company B
15
Closed
2222
12/12/2008
Company B
20
Closed
3331
3/5/2009
Company C
2000
Shipped
3332
3/14/2009
Company C
430
New
Figure 2. Sample Order Worksheet
Click OK to save the workbook.
Creating a Windows Console Application in Visual Studio 2008
This Visual How To uses a Windows console application as the framework for the sample code. The console application type was selected only for its simplicity. Other application types could use the same approach presented here.
To create a Windows Console Application in Visual Studio 2008
Start Microsoft Visual Studio 2008.
From the Project menu, click File, then click New.
In the New Project dialog box, click the Visual Basic Windows or Visual C# Windows type in the Project types pane.
In the Templates pane, click Console Application.
Name the project and solution LINQWithExcelTables.
Figure 3. Creating the Solution
Click OK to create the solution.
Adding a Reference to Required Assemblies
This Visual How To uses the Open XML Format SDK 2.0 to access data in an Excel workbook. The Open XML Format SDK 2.0 provides strongly typed classes for use with Open XML documents.
To add a reference to the OpenXml assemblies
In Visual Studio, select the Project and then the Add Reference menu.
From the .NET tab in the Add Reference dialog box, select the [DocumentFormat.OpenXml] assembly.
Click OK to add the reference.
Figure 4. Adding the Reference
Importing the OpenXml Namespaces
Importing the OpenXml Namespaces allows code to use strongly typed part and content classes when working with Open XML documents.
For a Visual Basic project, add the following lines to the top of the Module1.vb source file.
Imports DocumentFormat.OpenXml.Packaging
Imports Documentformat.OpenXml.Spreadsheet
For a C# project, add the following lines to the top of the Program.cs source file below the other using statements generated by Visual Studio.
using DocumentFormat.OpenXml.Packaging;
using DocumentFormat.OpenXml.Spreadsheet;
Adding the Sample Code to the Solution
The following sample code uses the existing workbook named LinqSample.xlsx in the C:\Temp folder.
For a Visual Basic project, replace the contents of the Module1 module in the Module1.vb source file with the following code.
Sub Main()
' Declare variables to hold refernces to Excel objects.
Dim workBook As Workbook
Dim sharedStrings As SharedStringTable
Dim workSheets As IEnumerable(Of Sheet)
Dim custSheet As WorksheetPart
Dim orderSheet As WorksheetPart
' Declare helper variables.
Dim custID As String
Dim orderID As String
Dim customers As List(Of Customer)
Dim orders As List(Of Order)
' Open the Excel workbook.
Using document As SpreadsheetDocument = _
SpreadsheetDocument.Open("C:\Temp\LinqSample.xlsx", True)
' References to the workbook and Shared String Table.
workBook = document.WorkbookPart.Workbook
workSheets = workBook.Descendants(Of Sheet)()
sharedStrings = _
document.WorkbookPart.SharedStringTablePart.SharedStringTable
' Reference to Excel Worksheet with Customer data.
custID = _
workSheets.First(Function(sheet) sheet.Name = "Customer").Id()
custSheet = DirectCast( _
document.WorkbookPart.GetPartById(custID), WorksheetPart)
' Load Customer data to business object.
customers = _
Customer.LoadCustomers(custSheet.Worksheet, sharedStrings)
' Reference to Excel Worksheet with Order data.
orderID = _
workSheets.First(Function(sheet) sheet.Name = "Order").Id
orderSheet = DirectCast( _
document.WorkbookPart.GetPartById(orderID), WorksheetPart)
' Load Order data to business object.
orders = Order.LoadOrders(orderSheet.Worksheet, sharedStrings)
' List all customers to the console.
' Write header information to the console.
Console.WriteLine("All Customers")
Console.WriteLine("{0, -15} {1, -15} {2, -5}", _
"Customer", "City", "State")
' LINQ Query for all Customers.
Dim allCustomers As IEnumerable(Of Customer) = _
From customer In customers _
Select customer
' Execute query and write customer information to the console.
For Each c As Customer In allCustomers
Console.WriteLine("{0, -15} {1, -15} {2, -5}", _
c.Name, c.City, c.State)
Next
Console.WriteLine()
Console.WriteLine()
' Write all Orders over $100 to the console.
' Write header information to the console.
Console.WriteLine("All Orders over $100")
Console.WriteLine("{0, -15} {1, -10} {2, 10} {3, -5}", _
"Customer", "Date", "Amount", "Status")
' LINQ Query for all Order over $100.
' Join used to display Customer information for the order.
Dim highOrders = From customer In customers _
Join order In orders On customer.Name Equals order.Customer _
Where order.Amount > 100.0R _
Select customer.Name, order.Date, order.Amount, order.Status
' Execute query and write information to the console.
For Each result In highOrders
Console.WriteLine("{0, -15} {1, -10} {2, 10} {3, -5}", _
result.Name, result.[Date].ToShortDateString(), _
result.Amount, result.Status)
Next
Console.WriteLine()
Console.WriteLine()
' Report on customer orders by status.
' Write header information to the console.
Console.WriteLine("Customer Orders by Status")
' LINQ Query for summarizing customer order information by status.
' There are two LINQ Queries.
' Internal query is used to group orders together by status and
' calculates the total order amount and number of orders.
' External query is used to join Customer information.
Dim sumoforders = _
From customer In customers _
Select New With {customer.Name, _
.statusTotals = _
From order In orders _
Where order.Customer = customer.Name _
Group order.Amount By order.Status Into Group _
Select New With {.status = Status, _
.orderAmount = Group.Sum(), _
.orderCount = Group.Count() _
} _
}
' Execute query and write information to the console.
For Each customer In sumoforders
' Write customer name to the console.
Console.WriteLine("-{0}-", customer.Name)
For Each x In customer.statusTotals
Console.WriteLine(" {0, -10}: {2,2} orders totaling {1, 7}", _
x.status, x.orderAmount, x.orderCount)
Next
Console.WriteLine()
Next
' Keep console window open.
Console.Read()
End Using
End Sub
''' <summary>
''' Used to store Customer information for analysis.
''' </summary>
Public Class Customer
'Properties.
Private _Name As String
Public Property Name() As String
Get
Return _Name
End Get
Set(ByVal value As String)
_Name = value
End Set
End Property
Private _City As String
Public Property City() As String
Get
Return _City
End Get
Set(ByVal value As String)
_City = value
End Set
End Property
Private _State As String
Public Property State() As String
Get
Return _State
End Get
Set(ByVal value As String)
_State = value
End Set
End Property
''' <summary>
''' Helper method for creating a list of Customers
''' from an Excel worksheet.
''' </summary>
Public Shared Function LoadCustomers(ByVal worksheet As Worksheet, _
ByVal sharedString As SharedStringTable) As List(Of Customer)
' Initialize Customer List.
Dim result As New List(Of Customer)()
' LINQ query to skip first row with column names.
Dim dataRows As IEnumerable(Of Row) = _
From row In worksheet.Descendants(Of Row)() _
Where row.RowIndex.Value > 1 _
Select row
For Each row As Row In dataRows
' LINQ Query to return the Row's Cell values.
' Where clause filters out any cells that do not contain a value.
' Select returns the value of the cell unless the cell has
' a Shared String.
' If the Cell contains a Shared String its value will be a
' reference id which will be used to look up the value in the
' Shared String table.
Dim textValues As IEnumerable(Of String) = _
From cell In row.Descendants(Of Cell)() _
Where cell.CellValue IsNot Nothing _
Select ( _
If(cell.DataType IsNot Nothing _
AndAlso cell.DataType.HasValue _
AndAlso cell.DataType.Value = CellValues.SharedString, _
sharedString.ChildElements( _
Integer.Parse(cell.CellValue.InnerText)).InnerText, _
cell.CellValue.InnerText) _
)
' Check to verify the row contains data.
If textValues.Count() > 0 Then
'Create a Customer and add it to the list.
Dim textArray = textValues.ToArray()
Dim customer As New Customer()
customer.Name = textArray(0)
customer.City = textArray(1)
customer.State = textArray(2)
result.Add(customer)
Else
' If no cells, you have reached the end of the table.
Exit For
End If
Next
' Return populated list of customers.
Return result
End Function
End Class
''' <summary>
''' Used to store Order information for analysis.
''' </summary>
Public Class Order
'Properties.
Private _Number As String
Public Property Number() As String
Get
Return _Number
End Get
Set(ByVal value As String)
_Number = value
End Set
End Property
Private _Date As DateTime
Public Property [Date]() As DateTime
Get
Return _Date
End Get
Set(ByVal value As DateTime)
_Date = value
End Set
End Property
Private _Customer As String
Public Property Customer() As String
Get
Return _Customer
End Get
Set(ByVal value As String)
_Customer = value
End Set
End Property
Private _Amount As Double
Public Property Amount() As Double
Get
Return _Amount
End Get
Set(ByVal value As Double)
_Amount = value
End Set
End Property
Private _Status As String
Public Property Status() As String
Get
Return _Status
End Get
Set(ByVal value As String)
_Status = value
End Set
End Property
''' <summary>
''' Helper method for creating a list of Orders
''' from an Excel worksheet.
''' </summary>
Public Shared Function LoadOrders(ByVal worksheet As Worksheet, _
ByVal sharedString As SharedStringTable) As List(Of Order)
' Initialize Order List.
Dim result As New List(Of Order)()
' LINQ query to skip first row with column names.
Dim dataRows As IEnumerable(Of Row) = _
From row In worksheet.Descendants(Of Row)() _
Where row.RowIndex.Value > 1 _
Select row
For Each row As Row In dataRows
' LINQ Query to return the value of the cells in the row.
' Where clause filters out any cells that do not contain a value.
' Select the value of the return cellunless the cell
' has a Shared String.
' If the cell has a Shared String, its value is a reference id
' which is used to look up the value in the Shared String table.
Dim textValues As IEnumerable(Of String) = _
From cell In row.Descendants(Of Cell)() _
Where cell.CellValue IsNot Nothing _
Select ( _
If(cell.DataType IsNot Nothing _
AndAlso cell.DataType.HasValue _
AndAlso cell.DataType.Value = CellValues.SharedString, _
sharedString.ChildElements( _
Integer.Parse(cell.CellValue.InnerText)).InnerText, _
cell.CellValue.InnerText) _
)
' Check to verify the row contains data.
If textValues.Count() > 0 Then
' Create an order and add it to the list.
Dim textArray = textValues.ToArray()
Dim order As New Order()
order.Number = textArray(0)
order.[Date] = New DateTime(1900, 1, 1).AddDays( _
Double.Parse(textArray(1)) - 2)
order.Customer = textArray(2)
order.Amount = [Double].Parse(textArray(3))
order.Status = textArray(4)
result.Add(order)
Else
' If no cells, then you have reached the end of the table.
Exit For
End If
Next
' Return populated list of orders.
Return result
End Function
End Class
For a C# project, replace the contents of the Program class in the Program.cs source file with the following code.
static void Main(string[] args)
{
//Declare variables to hold refernces to Excel objects.
Workbook workBook;
SharedStringTable sharedStrings;
IEnumerable<Sheet> workSheets;
WorksheetPart custSheet;
WorksheetPart orderSheet;
//Declare helper variables.
string custID;
string orderID;
List<Customer> customers;
List<Order> orders;
//Open the Excel workbook.
using (SpreadsheetDocument document =
SpreadsheetDocument.Open(@"C:\Temp\LinqSample.xlsx", true))
{
//References to the workbook and Shared String Table.
workBook = document.WorkbookPart.Workbook;
workSheets = workBook.Descendants<Sheet>();
sharedStrings =
document.WorkbookPart.SharedStringTablePart.SharedStringTable;
//Reference to Excel Worksheet with Customer data.
custID =
workSheets.First(s => s.Name == @"Customer").Id;
custSheet =
(WorksheetPart)document.WorkbookPart.GetPartById(custID);
//Load customer data to business object.
customers =
Customer.LoadCustomers(custSheet.Worksheet, sharedStrings);
//Reference to Excel worksheet with order data.
orderID =
workSheets.First(sheet => sheet.Name == @"Order").Id;
orderSheet =
(WorksheetPart)document.WorkbookPart.GetPartById(orderID);
//Load order data to business object.
orders =
Order.LoadOrders(orderSheet.Worksheet, sharedStrings);
//List all customers to the console.
//Write header information to the console.
Console.WriteLine("All Customers");
Console.WriteLine("{0, -15} {1, -15} {2, -5}",
"Customer", "City", "State");
//LINQ Query for all customers.
IEnumerable<Customer> allCustomers =
from customer in customers
select customer;
//Execute query and write customer information to the console.
foreach (Customer c in allCustomers)
{
Console.WriteLine("{0, -15} {1, -15} {2, -5}",
c.Name, c.City, c.State);
}
Console.WriteLine();
Console.WriteLine();
//Write all orders over $100 to the console.
//Write header information to the console.
Console.WriteLine("All Orders over $100");
Console.WriteLine("{0, -15} {1, -10} {2, 10} {3, -5}",
"Customer", "Date", "Amount", "Status");
//LINQ Query for all orders over $100.
//Join used to display customer information for the order.
var highOrders =
from customer in customers
join order in orders on customer.Name equals order.Customer
where order.Amount > 100.00
select new
{
customer.Name,
order.Date,
order.Amount,
order.Status
};
//Execute query and write information to the console.
foreach (var result in highOrders)
{
Console.WriteLine("{0, -15} {1, -10} {2, 10} {3, -5}",
result.Name, result.Date.ToShortDateString(),
result.Amount, result.Status);
}
Console.WriteLine();
Console.WriteLine();
//Report on customer orders by status.
//Write header information to the console.
Console.WriteLine("Customer Orders by Status");
//LINQ Query for summarizing customer order information by status.
//There are two LINQ queries.
//Internal query is used to group orders together by status and
//calculates the total order amount and number of orders.
//External query is used to join Customer information.
var sumoforders =
from customer in customers
select new
{
customer.Name,
statusTotals =
from order in orders
where order.Customer == customer.Name
group order.Amount by order.Status into statusGroup
select new
{
status = statusGroup.Key,
orderAmount = statusGroup.Sum(),
orderCount = statusGroup.Count()
}
};
//Execute query and write information to the console.
foreach (var customer in sumoforders)
{
//Write Customer name to the console.
Console.WriteLine("-{0}-", customer.Name);
foreach (var x in customer.statusTotals)
{
Console.WriteLine(" {0, -10}: {2,2} orders totaling {1, 7}",
x.status, x.orderAmount, x.orderCount);
}
Console.WriteLine();
}
//Keep the console window open.
Console.Read();
}
}
/// <summary>
/// Used to store customer information for analysis.
/// </summary>
public class Customer
{
//Properties.
public string Name { get; set; }
public string City { get; set; }
public string State { get; set; }
/// <summary>
/// Helper method for creating a list of customers
/// from an Excel worksheet.
/// </summary>
public static List<Customer> LoadCustomers(Worksheet worksheet,
SharedStringTable sharedString)
{
//Initialize the customer list.
List<Customer> result = new List<Customer>();
//LINQ query to skip first row with column names.
IEnumerable<Row> dataRows =
from row in worksheet.Descendants<Row>()
where row.RowIndex > 1
select row;
foreach (Row row in dataRows)
{
//LINQ query to return the row's cell values.
//Where clause filters out any cells that do not contain a value.
//Select returns the value of a cell unless the cell contains
// a Shared String.
//If the cell contains a Shared String, its value will be a
// reference id which will be used to look up the value in the
// Shared String table.
IEnumerable<String> textValues =
from cell in row.Descendants<Cell>()
where cell.CellValue != null
select
(cell.DataType != null
&& cell.DataType.HasValue
&& cell.DataType == CellValues.SharedString
? sharedString.ChildElements[
int.Parse(cell.CellValue.InnerText)].InnerText
: cell.CellValue.InnerText)
;
//Check to verify the row contained data.
if (textValues.Count() > 0)
{
//Create a customer and add it to the list.
var textArray = textValues.ToArray();
Customer customer = new Customer();
customer.Name = textArray[0];
customer.City = textArray[1];
customer.State = textArray[2];
result.Add(customer);
}
else
{
//If no cells, then you have reached the end of the table.
break;
}
}
//Return populated list of customers.
return result;
}
}
/// <summary>
/// Used to store order information for analysis.
/// </summary>
public class Order
{
//Properties.
public string Number { get; set; }
public DateTime Date { get; set; }
public string Customer { get; set; }
public Double Amount { get; set; }
public string Status { get; set; }
/// <summary>
/// Helper method for creating a list of orders
/// from an Excel worksheet.
/// </summary>
public static List<Order> LoadOrders(Worksheet worksheet,
SharedStringTable sharedString)
{
//Initialize order list.
List<Order> result = new List<Order>();
//LINQ query to skip first row with column names.
IEnumerable<Row> dataRows =
from row in worksheet.Descendants<Row>()
where row.RowIndex > 1
select row;
foreach (Row row in dataRows)
{
//LINQ query to return the row's cell values.
//Where clause filters out any cells that do not contain a value.
//Select returns cell's value unless the cell contains
// a shared string.
//If the cell contains a shared string its value will be a
// reference id which will be used to look up the value in the
// shared string table.
IEnumerable<String> textValues =
from cell in row.Descendants<Cell>()
where cell.CellValue != null
select
(cell.DataType != null
&& cell.DataType.HasValue
&& cell.DataType == CellValues.SharedString
? sharedString.ChildElements[
int.Parse(cell.CellValue.InnerText)].InnerText
: cell.CellValue.InnerText)
;
//Check to verify the row contains data.
if (textValues.Count() > 0)
{
//Create an Order and add it to the list.
var textArray = textValues.ToArray();
Order order = new Order();
order.Number = textArray[0];
order.Date = new DateTime(1900, 1, 1).AddDays(
Double.Parse(textArray[1]) - 2);
order.Customer = textArray[2];
order.Amount = Double.Parse(textArray[3]);
order.Status = textArray[4];
result.Add(order);
}
else
{
//If no cells, then you have reached the end of the table.
break;
}
}
//Return populated list of orders.
return result;
}
}
When you run the solution, customer and order information in the workbook is loaded into generic lists of customers and orders. You then run LINQ queries against the generic lists and send the output to the console.
Figure 5. Console Output
Read It
Using the Open XML SDK to load data from Excel workbooks into business objects allows you to use LINQ queries for analysis. The sample code shown with this Visual How To opens an existing workbook, loads data from worksheets into business objects, and then uses LINQ to analyze the information and display results to the console.
To see another approach to querying data stored in Excel tables, check out Eric White's Using LINQ to Query Excel Tables blog post. It provides sample code that includes a set of extension methods and classes that allow queries to be performed against Excel tables using LINQ in a manner that is similar to querying a SQL database.