Brief Intro and Solution
A common expectation is that the Merge method will correctly do the following:
- Append new columns from the second table to an existing row with the same information (ID) or shared primary key in the first table.
- Update new values from the second table to an existing row with a shared primary key in the first table.
After merging, the developer finds the result to be incorrect. Instead of merging matching rows and keeping the original number of rows (where a match exists), the merged DataTable contains an entirely new row for each merged entry, resulting in duplicate and possibly incomplete rows.
Solution: In short, the likely reason is that the primary key was never specified when the DataTable was set up. See the PrimaryKey property page for more information on how to set it: http://msdn.microsoft.com/en-us/library/system.data.datatable.primarykey.aspx.
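Since a missing key is the likely cause, a quick check before merging can catch it early. The following is only a minimal sketch: the class name MergeGuard and the helper HasPrimaryKey are hypothetical names introduced for illustration. It relies on the PrimaryKey property returning an empty array when no key has been set.

```csharp
using System;
using System.Data;

public static class MergeGuard
{
    // Hypothetical helper: true when the table has at least one primary-key column.
    // DataTable.PrimaryKey returns an empty array when no key has been set.
    public static bool HasPrimaryKey(DataTable table)
    {
        return table.PrimaryKey.Length > 0;
    }

    public static void Main()
    {
        DataTable orders = new DataTable("Orders");
        orders.Columns.Add("ID", typeof(int));

        // No key yet: a Merge into this table would append rows instead of matching them.
        Console.WriteLine("Has key before: " + HasPrimaryKey(orders));

        orders.PrimaryKey = new DataColumn[] { orders.Columns["ID"] };
        Console.WriteLine("Has key after: " + HasPrimaryKey(orders));
    }
}
```

Calling a check like this on the target table just before Merge makes the missing-key case visible instead of silently producing duplicate rows.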
Example Output of the Intended and Unintended Behaviors
Consider the following DataTables:
DataTable #1
ID, Customer, OrderID, Price
1, Peter Parker, 321, 200.00
42, Bruce Wayne, 120, 5000.00
DataTable #2
ID, OrderID, Price, Courier
1, 321, 1500.00, UPS
42, 120, 5000.00, DHL
As you can see, DT#2 has a new column called "Courier" and no "Customer" column. DT#2 also has a different price set for Peter Parker (1500.00 instead of 200.00). The expected result of merging DT#2 into DT#1, using ID (and optionally OrderID as well) as the primary key, is shown below:
Expected Merged DataTable
ID, Customer, OrderID, Price, Courier
1, Peter Parker, 321, 1500.00, UPS
42, Bruce Wayne, 120, 5000.00, DHL
The above correctly maintains the number of rows and updates the price for Peter Parker.
The unintended result that occurs when no PrimaryKey is set is shown below:
Unexpected Merged DataTable
ID, Customer, OrderID, Price, Courier
1, Peter Parker, 321, 200.00, (empty)
42, Bruce Wayne, 120, 5000.00, (empty)
1, (empty), 321, 1500.00, UPS
42, (empty), 120, 5000.00, DHL
It is apparent that the above result is incorrect and has resulted in duplication of rows (duplicate IDs) with incomplete information (missing Customer and Courier details).
As mentioned earlier, the solution is to set the PrimaryKey property. Below is code to demonstrate the solution.
Code Example (C#)
The following code demonstrates how to get the expected merged DataTable. Some of the code is redundant for the sake of clarity.
// Requires: using System; using System.Data;
// set up the 1st DataTable
DataTable dt1 = new DataTable();
dt1.Columns.Add("ID", typeof(int));
dt1.Columns.Add("Customer", typeof(string));
dt1.Columns.Add("OrderID", typeof(int));
dt1.Columns.Add("Price", typeof(float));
// set primary key(s) here
dt1.PrimaryKey = new DataColumn[] { dt1.Columns["ID"] };
// Another option: multiple keys
// dt1.PrimaryKey = new DataColumn[] { dt1.Columns["ID"], dt1.Columns["OrderID"] };
// populate with some data
dt1.Rows.Add(new Object[] { 1, "Peter Parker", 321, 200.00 });
dt1.Rows.Add(new Object[] { 42, "Bruce Wayne", 120, 5000.00 });
// set up the 2nd DataTable: similar to the 1st, but without a Customer column and with a new Courier column
DataTable dt2 = new DataTable();
dt2.Columns.Add("ID", typeof(int));
dt2.Columns.Add("OrderID", typeof(int));
dt2.Columns.Add("Price", typeof(float));
dt2.Columns.Add("Courier", typeof(string));
// set primary key(s) here
// The merge will work provided one of the DataTables has its primary key set, so this step is optional but keeps both tables consistent
// dt2.PrimaryKey = new DataColumn[] { dt2.Columns["ID"] };
// dt2.PrimaryKey = new DataColumn[] { dt2.Columns["ID"], dt2.Columns["OrderID"] };
// populate with some data, same IDs & OrderIDs as before
dt2.Rows.Add(new Object[] { 1, 321, 1500.00, "UPS" }); // new price for P. Parker
dt2.Rows.Add(new Object[] { 42, 120, 5000.00, "DHL" }); // same price for B. Wayne
// show the DataTables
ShowDataTable(dt1, "Showing DataTable 1");
ShowDataTable(dt2, "Showing DataTable 2");
// merge and show
dt1.Merge(dt2);
ShowDataTable(dt1, "Showing Merged DataTable");
ShowDataTable Code
// Simple routine to display the DataTable
public static void ShowDataTable(DataTable dt, string caption)
{
    Console.WriteLine(caption);
    // show column names
    foreach (DataColumn col in dt.Columns)
        Console.Write("\t" + col.ColumnName);
    Console.WriteLine();
    // show values
    foreach (DataRow row in dt.Rows)
    {
        foreach (DataColumn col in dt.Columns)
            Console.Write("\t" + row[col]);
        Console.WriteLine();
    }
    Console.WriteLine();
}
Closing Comments
To see the unexpected result using the above code, simply comment out all the lines where the PrimaryKey property is set.
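As an alternative to commenting lines in and out, the two behaviors can be compared side by side. The sketch below is an illustration only: the class MergeComparison and its methods are hypothetical names, and the withKey flag is an assumption introduced to switch the PrimaryKey assignment on and off. It rebuilds the two sample tables from this article and merges them both ways.

```csharp
using System;
using System.Data;

public static class MergeComparison
{
    // Hypothetical helper: marks ID as the primary key only when requested.
    static void SetKey(DataTable table, bool withKey)
    {
        if (withKey)
            table.PrimaryKey = new DataColumn[] { table.Columns["ID"] };
    }

    // Builds DataTable #1 from the article (ID, Customer, OrderID, Price).
    static DataTable BuildFirst(bool withKey)
    {
        DataTable dt = new DataTable();
        dt.Columns.Add("ID", typeof(int));
        dt.Columns.Add("Customer", typeof(string));
        dt.Columns.Add("OrderID", typeof(int));
        dt.Columns.Add("Price", typeof(float));
        SetKey(dt, withKey);
        dt.Rows.Add(new object[] { 1, "Peter Parker", 321, 200.00f });
        dt.Rows.Add(new object[] { 42, "Bruce Wayne", 120, 5000.00f });
        return dt;
    }

    // Builds DataTable #2 from the article (ID, OrderID, Price, Courier).
    static DataTable BuildSecond(bool withKey)
    {
        DataTable dt = new DataTable();
        dt.Columns.Add("ID", typeof(int));
        dt.Columns.Add("OrderID", typeof(int));
        dt.Columns.Add("Price", typeof(float));
        dt.Columns.Add("Courier", typeof(string));
        SetKey(dt, withKey);
        dt.Rows.Add(new object[] { 1, 321, 1500.00f, "UPS" });
        dt.Rows.Add(new object[] { 42, 120, 5000.00f, "DHL" });
        return dt;
    }

    // Merges table #2 into table #1 and returns the merged table.
    public static DataTable MergeSamples(bool withKey)
    {
        DataTable dt1 = BuildFirst(withKey);
        dt1.Merge(BuildSecond(withKey));
        return dt1;
    }

    public static void Main()
    {
        // Without a key, the rows from table #2 are appended: 4 rows.
        Console.WriteLine("Rows without PrimaryKey: " + MergeSamples(false).Rows.Count);
        // With a key, matching rows are updated in place: 2 rows.
        Console.WriteLine("Rows with PrimaryKey: " + MergeSamples(true).Rows.Count);
    }
}
```

Comparing row counts is a cheap sanity check after any merge: if the count grows by the size of the second table, the key was most likely not set.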