April 2013

Volume 28 Number 04

Data Points - Why Does Entity Framework Reinsert Existing Objects into My Database?

By Julie Lerman | April 2013

Julie LermanJust as it was time to come up with an idea for this column, three people asked me, via twitter and e-mail, why Entity Framework reinserts existing objects into their databases. Deciding what to write about became easy.

Because of its state management capabilities, when Entity Framework works with graphs, its entity state behavior doesn’t always align with your ideas of how it should work. Let’s look at a typical example.

Suppose I have two classes, Screencast and Topic, where each Screencast is assigned a single Topic, as shown in Figure 1.

Figure 1 The Screencast and Topic Classes

public class Screencast
{
  public int Id { get; set; }
  public string Title { get; set; }
  public string Description { get; set; }
  public Topic Topic { get; set; }
  public int TopicId { get; set; }
}
public class Topic
{
  public int Id { get; set; }
  public string Name { get; set; }
}

If I were to retrieve a list of Topics, assign one of them to a new Screencast and then save—with the entire set of operations contained in a single context—there would be no problem, as the following example shows:

using (var context = new ScreencastContext())
{
  var dataTopic = 
    context.Topics.FirstOrDefault(t=>t.Name.Contains("Data"));
  context.Screencasts.Add(new Screencast
                               {
                                 Title="EF101",
                                 Description = "Entity Framework 101",
                                 Topic = dataTopic
                               });
  context.SaveChanges();
}

A single Screencast would be inserted into the database with the proper foreign key for the chosen Topic.

When you’re working in client applications, or performing these steps within a single Unit of Work where the context is tracking all of the activity, this is the behavior you’d probably expect. However, if you’re working with disconnected data the behavior is quite different, and this has surprised many developers.

Added Graphs in Disconnected Scenarios

A common pattern that I use for handling reference lists is to use a separate context, which would no longer be in scope by the time you saved any user modifications. This situation is common for Web applications and services, but it can also occur in a client-side app. Here’s an example that uses a repository for reference data with the following GetTopicList method for retrieving a list of Topics:

public class SimpleRepository
{
  public List<Topic> GetTopicList()
  {
    using (var context = new ScreencastContext())
    {
      return context.Topics.ToList();
    }
  }
 ...
}

You might then present the Topics in a list on a Windows Presentation Foundation (WPF) form that lets users create a new Screencast, such as the one shown in Figure 2.

A Windows Presentation Foundation Form for Entering New Screencasts
Figure 2 A Windows Presentation Foundation Form for Entering New Screencasts

In a client app, such as the WPF form in Figure 2, you might then set the selected item from the dropdown to the new Screencast’s Topic property with code such as this:

private void Save_Click(object sender, RoutedEventArgs e)
{
  repo.SaveNewScreencast(new Screencast
                {
                  Title = titleTextBox.Text,
                  Description = descriptionTextBox.Text,
                  Topic = topicListBox.SelectedItem as Topic
                });
}

Now the Screencast variable is a graph containing the new Screencast and the Topic instance. Passing that variable into the repository’s SaveNewScreencast method adds the graph to a new context instance and then saves it to the database, like so:

public void SaveNewScreencast(Screencast screencast)
{
  using (var context = new ScreencastContext())
  {
    context.Screencasts.Add(screencast);
    context.SaveChanges();
  }
}

Profiling the database activity reveals that not only is the Screencast inserted but, before that, a new row is inserted for the Data Dev topic into the Topics table, even though that topic already existed:

exec sp_executesql N'insert [dbo].[Topics]([Name])
values (@0)
select [Id]
from [dbo].[Topics]
where @@ROWCOUNT > 0 and [Id] = 
  scope_identity()',N'@0 nvarchar(max) ',@0=N'Data Dev'

This behavior has confounded many developers. The reason it happens is that when you use the DbSet.Add method (that is, Screencasts.Add), not only is the state of the root entity marked “Added,” but everything in the graph that the context was not previously aware of is marked Added as well. Even though the developer may be aware that the Topic has an existing Id value, Entity Framework honors its EntityState (Added) and creates an Insert database command for the Topic, regardless of the existing Id.

While many developers may anticipate this behavior, there are many who don’t. And in that case, if you aren’t profiling the database activity, you may not realize it’s occurring until the next time you (or a user) discover duplicate items in the Topics list.

Note: If you’re not familiar with how EF inserts new rows, you may be curious about the select in the middle of the preceding SQL. That’s to ensure EF will get back the Id value of the newly created Screencast so it can set the value in the Screencast instance.

Not Just a Problem When Adding Entire Graphs

Let’s look at another scenario where this problem might occur.

What if, instead of passing a graph to the repository, the repository method requests both the new Screencast and the selected Topic as parameters? Instead of adding a full graph, it adds the Screencast entity and then sets its Topic navigation property:

public void SaveNewScreencastWithTopic(Screencast screencast,
  Topic topic)
{
  using (var context = new ScreencastContext())
  {
    context.Screencasts.Add(screencast);
    screencast.Topic = topic;
    context.SaveChanges();
  }
}

In this case, the SaveChanges behavior is the same as with the Added graph. You might be familiar with using the EF Attach method to attach an untracked entity to a context. In that case, the entity’s state starts out as Unchanged. But here, where we’re assigning the Topic to the Screencast instance, not to the context, EF  considers this to be an unrecognized entity and its default behavior for unrecognized entities with no state is to mark them as Added. So, again, the Topic will be inserted into the database when SaveChanges is called.

It’s possible to control the state, but this requires a deeper under­standing of EF behavior. For example, if you were to attach the Topic directly to the context, instead of to the Added Screencast, its EntityState would start out as Unchanged. Setting it to screencast.Topic wouldn’t alter the state because the context is already aware of the Topic. Here’s the modified code that demonstrates this logic:

using (var context = new ScreencastContext())
{
  context.Screencasts.Add(screencast);
  context.Topics.Attach(topic);
  screencast.Topic = topic;
  context.SaveChanges();
}

Alternatively, in lieu of context.Topics.Attach(topic), you could set the state of the Topic before or after the fact, explicitly setting its state to Unchanged:

context.Entry(topic).State = EntityState.Unchanged

Calling this code before the context is aware of the Topic will cause the context to attach the Topic and then set the state.

Though these are correct patterns for handling this problem, they’re not obvious. Unless you’ve learned about this behavior and the required code patterns in advance, you’re more apt to write code that seems logical, then run into this problem and only at that point start trying to figure out what the heck is going on.

Save the Grief and Use That Foreign Key

But there’s a much simpler way to avoid this state of confusion (pardon my pun), which is to take advantage of the foreign key properties.

Rather than setting the navigation property and having to worry about the state of the Topic, just set the TopicId property, because you do have access to that value in the Topic instance. This is something I find myself frequently suggesting to developers. Even on Twitter, I see the question: “Why is EF inserting data that already exists?” And I often guess correctly in reply: “Any chance u r setting a navigation property on a new entity instead of an FK? J”

So let’s revisit the Save_Click method in the WPF form and set the TopicId property instead of the Topic navigation property:

repo.SaveNewScreencast(new Screencast
               {
                 Title = titleTextBox.Text,
                 Description = descriptionTextBox.Text,
                 TopicId = (int)topicListBox.SelectedValue)
               });

The Screencast that’s sent to the repository method is now just the single entity, not a graph. Entity Framework can use the foreign key property to directly set the table’s TopicId. Then it’s simple (and faster) for EF to create an insert method for the Screencast entity including the TopicId (in my case, the value 2):

exec sp_executesql N'insert [dbo].[Screencasts]([Title], [Description], [TopicId])
values (@0, @1, @2)
select [Id]
from [dbo].[Screencasts]
where @@ROWCOUNT > 0 and [Id] = scope_identity()',
N'@0 nvarchar(max) ,@1 nvarchar(max) ,@2 int',
  @0=N'EFFK101',@1=N'Using Foreign Keys When Setting Navigations',@2=2

If you wanted to keep the construction logic in the repository and not force the UI developer to worry about setting the foreign key, you could specify a Screencast and the Topic’s Id as parameters for the repository method and set the value in the method as follows:

public void SaveNewScreencastWithTopicId(Screencast screencast, 
  int topicId)
{
  using (var context = new ScreencastContext())
  {
    screencast.TopicId = topicId;
    context.Screencasts.Add(screencast);
    context.SaveChanges();
  }
}

In our never-ending worries about what could happen, we need to consider the possibility that a developer might set the Topic navigation property anyway. In other words, even though we want to use the foreign key to avoid the EntityState problem, what if the Topic instance is part of the graph, such as in this alternative code for the Save_Click button:

repo.SaveNewScreencastWithTopicId(new Screencast
    {
      Title = titleTextBox.Text,
      Description = descriptionTextBox.Text,
      Topic=topicListBox.SelectedItem as Topic
    },
  (int) topicListBox.SelectedValue);

Unfortunately, this brings you back to the original problem: EF sees the Topic entity in the graph and adds it to the context along with Screencast—even though the Screencast.TopicId property has been set. And again, the EntityState of the Topic instance creates confusion: EF will insert a new Topic and use the value for that new row’s Id when it inserts the Screencast.

The safest way to avoid this is to set the Topic property to null when you set the foreign key value. If the repository method will be used by other UIs where you can’t be sure that only existing Topics will be used, you might even want to provide for the possibility of a newly created Topic being passed in. Figure 3 shows the repository method modified yet again to perform this task.

Figure 3 Repository Method Designed to Protect Against Accidental Navigation Property Insertion into Database

public void SaveNewScreencastWithTopicId(Screencast screencast, 
  int topicId)
{
  if (topicId > 0)
  {
    screencast.Topic = null;
    screencast.TopicId = topicId;
  }
  using (var context = new ScreencastContext())
  {
    context.Screencasts.Add(screencast);
    context.SaveChanges();
  }
}

Now I have a repository method that covers a number of scenarios, even providing logic to accommodate new Topics being passed in to the method.

ASP.NET MVC 4 Scaffolding-Generated Code Avoids the Problem

Although this is a problem that’s inherent in disconnected apps, it’s worth pointing out that if you’re using ASP.NET MVC 4 scaffolding to generate views and MVC controllers, you’ll avoid the problem of duplicate navigation entities being inserted into the database.

Given the one-to-many relationship between Screencast and Topic, as well as the TopicId property that’s the foreign key in the Screencast type, the scaffolding generates the following Create method in the controller:

public ActionResult Create()
{
  ViewBag.TopicId = new SelectList(db.Topics, "Id", "Name");
  return View();
}

It has built a list of Topics to pass to the view and named that list TopicId—the same name as the foreign key property.

The scaffolding has also included the following List in the markup for the Create view:

<div class="editor-field">
  @Html.DropDownList("TopicId", String.Empty)
  @Html.ValidationMessageFor(model => model.TopicId)
</div>

When the view posts back, the HttpRequest.Form includes a query-string value named TopicId that comes from the ViewBag property. TopicId’s value is that of the selected item from the DropDownList. Because the query string name matches the Screencast’s property name, ASP.NET MVC model binding uses the value for the TopicId property of the Screencast instance it creates for the method parameter, as you can see in Figure 4.

The New Screencast Gets Its TopicId from the Matching HttpRequest Query-String Value
Figure 4 The New Screencast Gets Its TopicId from the Matching HttpRequest Query-String Value

To verify this, you could change the controller’s TopicId variables to something else, such as TopicIdX, and make the same change to the “TopicId” string in the view’s @Html.DropDownList, and the query-string value (now TopicIdX) would be ignored and screencast.TopicId would be 0.

There’s no Topic instance being passed back through the pipeline. So ASP.NET MVC depends on the foreign key property by default and avoids the particular problem of reinserting an existing duplicate Topic into the database.

It’s Not You! Disconnected Graphs Are Complicated

While the EF team has done a lot to make working with disconnected data easier from one version of EF to the next, it’s still a problem that daunts many developers who aren’t well-versed in the expected behavior of EF. In our book, “Programming Entity Framework: DbContext” (O’Reilly Media, 2012), Rowan Miller and I devoted an entire chapter to working with disconnected entities and graphs. And when creating a recent Pluralsight course, I added in an unplanned 25 minutes that focused on the complexity of disconnected graphs in repositories.

It’s very convenient to work with graphs when you’re querying and interacting with data, but when it comes to building relationships with existing data, foreign keys are your friends! Take a look at my January 2012 column, “Making Do with Absent Foreign Keys” (msdn.microsoft.com/magazine/hh708747), which is also about some of the pitfalls of coding without foreign keys.

In an upcoming column, I’ll continue on my quest to alleviate some of the pain developers encounter when working with graphs in disconnected scenarios. That column, which will be Part 2 on this topic, will focus on controlling the EntityState in many-to-many relationships and navigation collections.


Julie Lerman is a Microsoft MVP, .NET mentor and consultant who lives in the hills of Vermont. You can find her presenting on data access and other Microsoft .NET topics at user groups and conferences around the world. She blogs at thedatafarm.com/blog and is the author of “Programming Entity Framework” (2010) as well as a Code First edition (2011) and a DbContext edition (2012), all from O’Reilly Media. Follow her on Twitter at twitter.com/julielerman.

Thanks to the following technical expert for reviewing this article: Diego Vega (Microsoft)