August 2018

Volume 33 Number 8

[Data Points]

Deep Dive into EF Core HasData Seeding

By Julie Lerman | August 2018 | Get the Code

Julie LermanThe ability to seed data when migrations are run is a feature that disappeared in the transition from Entity Framework 6 (EF6) to Entity Framework Core (EF Core). With the latest version of EF Core, 2.1, seeding has made a comeback, yet in a very different form. In this article, you’ll learn how the new seeding feature works, as well as scenarios where you may or may not want to use it. Overall, the new mechanism of seeding is a really handy feature for models that aren’t overly complex and for seed data that remains mostly static once it’s been created.

Basic Seeding Functionality

In EF6 (and earlier versions) you added logic into the DbMigration­Configuration.Seed method to push data into the database any time migrations updated the database. For a reminder of what that looks like, check out the Microsoft ASP.NET documentation on seeding with EF6 at bit.ly/2ycTAIm.

In EF Core 2.1, the seeding workflow is quite different. There is now Fluent API logic to define the seed data in OnModelCreating. Then, when you create a migration, the seeding is transformed into migration commands to perform inserts, and is eventually transformed into SQL that that particular migration executes.  Further migrations will know to insert more data, or even perform updates and deletes, depending on what changes you make in the OnModelCreating method.

If you happened to read the July 2018 Data Points (msdn.com/magazine/mt847184), you may recall the Publications model I used to demonstrate query types. I’ll use that same model for these examples. In fact, I slid some data seeding into the July download sample! I’ll start from a clean slate here, though.

The three classes in my model are Magazine, Article and Author. A magazine can have one or more articles and an article can have one author. There’s also a PublicationsContext that uses SQLite as its data provider and has some basic SQL logging set up.

Seeding Data for a Single Entity Type

Let’s start by seeing what it looks like to provide seed data for a magazine—at its simplest.

The key to the new seeding feature is the HasData Fluent API method, which you can apply to an Entity in the OnModelCreating method.

Here’s the structure of the Magazine type:

public class Magazine
{
  public int MagazineId { get; set; }
  public string Name { get; set; }
  public string Publisher { get; set; }
  public List<Article> Articles { get; set; }
}

It has a key property, MagazineId, two strings and a list of Article types. Now let’s seed it with data for a single magazine:

protected override void OnModelCreating (ModelBuilder modelBuilder)
{
  modelBuilder.Entity<Magazine> ().HasData
    (new Magazine { MagazineId = 1, Name = "MSDN Magazine" });
}

A couple things to pay attention to here: First, I’m explicitly setting the key property, MagazineId. Second, I’m not supplying the Publisher string.

Next, I’ll add a migration, my first for this model. I happen to be using Visual Studio Code for this project, which is a .NET Core app, so I’m using the CLI migrations command, “dotnet ef migrations add init.” The resulting migration file contains all of the usual CreateTable and other relevant logic, followed by code to insert the new data, specifying the table name, columns and values:

migrationBuilder.InsertData(
  table: "Magazines",
  columns: new[] { "MagazineId", "Name", "Publisher" },
  values: new object[] { 1, "MSDN Magazine", null });

Inserting the primary key value stands out to me here—especially after I’ve checked how the MagazineId column was defined further up in the migration file. It’s a column that should auto-increment, so you may not expect that value to be explicitly inserted:

MagazineId = table.Column<int>(nullable: false)
                        .Annotation("Sqlite:Autoincrement", true)

Let’s continue to see how this works out. Using the migrations script command, “dotnet ef migrations script,” to show what will be sent to the database, I can see that the primary key value will still be inserted into the key column:

INSERT INTO "Magazines" ("MagazineId", "Name", "Publisher")
VALUES (1, 'MSDN Magazine', NULL);

That’s because I’m targeting SQLite. SQLite will insert a key value if it’s provided, overriding the auto-increment. But what about with a SQL Server database, which definitely won’t do that on the fly?

I switched the context to use the SQL Server provider to investigate and saw that the SQL generated by the SQL Server provider includes logic to temporarily set IDENTITY_INSERT ON. That way, the supplied value will be inserted into the primary key column. Mystery solved!

You can use HasData to insert multiple rows at a time, though keep in mind that HasData is specific to a single entity. You can’t combine inserts to multiple tables with HasData. Here, I’m inserting two magazines at once:

modelBuilder.Entity<Magazine>()
           .HasData(new Magazine{MagazineId=2, Name="New Yorker"},
                    new Magazine{MagazineId=3, Name="Scientific American"}
           );

What About Non-Nullable Properties?

Remember that I’ve been skipping the Publisher string property in the HasData methods, and the migration inserts null in its place. However, if I tell the model builder that Publisher is a required property, in other words, that the database column is non-nullable, HasData will enforce that.

Here’s the OnModelBuilding code I’ve added to require Publisher:

modelBuilder.Entity<Magazine>().Property(m=>m.Publisher).IsRequired();

Now, when I try to add a migration to account for these new changes (the IsRequired method and seeding two more magazines), the migrations add command fails, with a very clear error message:

"The seed entity for entity type 'Magazine' cannot be added because there was no value provided for the required property 'Publisher'."

This happened because the two new magazines I’m adding don’t have a Publisher value.

The same would happen if you tried to skip the MagazineId because it’s an integer, even though you know that the database will provide the value. EF Core also knows that the database will generate this value, but you’re still required to provide it in HasData.

The need to supply required values leads to another interesting limitation of the HasData feature, which is that there’s a possibility it will conflict with a constructor. Imagine I have a constructor for Magazine that takes the magazine’s name and publisher’s name:

public Magazine(string name, string publisher)
{
  Name=name;
  Publisher=publisher;
}

As the database will create the key value (MagazineId), there’s no reason I’d have MagazineId as a parameter of such a constructor.

Thanks to another new feature of EF Core 2.1, I no longer have to add a parameterless constructor to the class in order for queries to materialize magazine objects. That means the constructor is the only option for me to use in my HasData method:

modelBuilder.Entity<Magazine>()
  .HasData(new Magazine("MSDN Magazine", "1105 Media"));

But, again, this will fail because I’m not supplying a value for the non-nullable MagazineId property. There’s a way around this, however, which takes advantage of the EF Core shadows property feature—using anonymous types instead of explicit types.

HasData with Anonymous Types

The ability to seed with anonymous types instead of explicit types solves a lot of potential roadblocks with HasData. 

The first is the one I just explained, where I created a constructor for the Magazine class and there’s no way to set the non-nullable MagazineId when seeding with HasData. Instead of instantiating a Magazine, you can instantiate an anonymous type and supply the MagazineId, without worrying about whether the property is public or private, or even exists! That’s what I’m doing in the following method call:

modelBuilder.Entity<Magazine>()
.HasData(new {MagazineId=1, Name="MSDN Mag", Publisher="1105 Media"});

The migration code will correctly insert that data into the magazines table, and running the migrations update database command works as expected:

migrationBuilder.InsertData(
                table: "Magazines",
                columns: new[] { "MagazineId", "Name", "Publisher" },
                values: new object[] { 1, "MSDN Mag", "1105 Media" });

You’ll see a few more roadblocks that the anonymous type solves further on.

What About Private Setters?

The limitation posed by the required primary key stands out because Magazine uses an integer as a key property. I’ve written many solutions, however, that use Guids for keys and my domain logic ensures that a Guid value is created when I instantiate an entity. With this setup, I can protect any properties by using private setters, yet still get the key property populated without exposing it. But there’s a problem for HasData. First, let’s see the effect and then explore (and solve) the problem.

As an example, I’ve transformed Magazine in Figure 1 so that MagazineId is a Guid, the setters are private and the only way (so far) to set their values is through the one and only constructor.

Figure 1 The Magazine Type with a Guid Key, Private Setters and a Parameter Constructor

public class Magazine
{
  public Magazine(string name, string publisher)
  {
    Name=name;
    Publisher=publisher;
    MagazineId=Guid.NewGuid();
  }
  public Guid MagazineId { get; private set; }
  public string Name { get; private set; }
  public string Publisher { get; private set; }
  public List<Article> Articles { get; set; }
}

Now I’m assured that when I create a new Magazine object a MagazineId value will be created, as well:

modelBuilder.Entity<Magazine>().HasData(new Magazine("MSDN Mag", "1105 Media");

The migration generates the following InsertData method for Magazine, using the Guid created in the constructor:

migrationBuilder.InsertData(
  table: "Magazines",
  columns: new[] { "MagazineId", "Name", "Publisher" },
  values: new object[] { new Guid("8912aa35-1433-48fe-ae72-de2aaa38e37e"),
                        "MSDN Mag", "1105 Media" });

However, this can cause a problem for the migration’s ability to detect changes to the model. That Guid was auto-generated when I created the new migration. The next time I create a migration a different Guid will be generated and EF Core will see this as a change to the data, delete the row created from the first migration and insert a new row using the new Guid. Therefore, you should use explicit Guids when seeding with HasData, never generated ones. Also, you’ll need to use the anonymous type again, rather than the constructor, because MagazineId doesn’t have a public setter:

var mag1=new {MagazineId= new Guid("0483b59c-f7f8-4b21-b1df-5149fb57984e"),
              Name="MSDN Mag", Publisher="1105 Media"};
modelBuilder.Entity<Magazine>().HasData(mag1);

Keep in mind that explicitly creating Guids in advance could get cumbersome with many rows.

Using HasData to seed related data is very different from inserting related data with DbSets. I stated earlier that HasData is specific to a single entity. That means you can’t build graphs as parameters of HasData. You can only provide values for properties of that Entity.

Therefore, if you want to add a Magazine and an Article, these tasks need to be performed with separate HasData methods—one on Entity<Magazine> and one on Entity<Article>. Take a look at the schema of the Article class:

public class Article
{
  public int ArticleId { get; set; }
  public string Title { get; set; }
  public int MagazineId { get; set; }
  public DateTime PublishDate { get;  set; }
  public int? AuthorId { get; set; }
}

Notice that the MagazineId foreign key is an int and, by default, that’s non-nullable. No article can exist without a Magazine identified. However, the AuthorId is a nullable int, therefore it’s possible to have an article that hasn’t yet had an author assigned. This means that when seeding an Article, in addition to the required ArticleId value, you must supply the MagazineId. But you’re not required to supply the AuthorId. Here’s code to add an article where I’ve supplied the key value (1), the value of an existing magazine’s ID (1) and a Title—I didn’t provide an AuthorId or a date:

modelBuilder.Entity<Article>().HasData(
  new Article { ArticleId = 1, MagazineId = 1,
    Title = "EF Core 2.1 Query Types"});

The resulting migration code is as follows:

migrationBuilder.InsertData(
  table: "Articles",
  columns: new[] { "ArticleId", "AuthorId", "MagazineId", "PublishDate", "Title" },
  values: new object[] { 1, null, 1, 
    new DateTime(1, 1, 1, 0, 0, 0, 0, 
    DateTimeKind.Unspecified), "EF Core 2.1 Query Types" });

The migration is adding null for AuthorId, which is fine. I didn’t supply a PublishDate, so it’s defaulting to the minimal .NET date value (01/01/0001). If I were to add a MagazineId or an AuthorId that doesn’t yet exist, it won’t be caught until the SQL is run against the database, triggering a referential integrity error.

If you’ve followed my work for a while, you may know that I’m a big fan of using foreign key properties for related data, rather than navigation properties. But there are scenarios where you may prefer not to have the foreign key property in your dependent type. EF Core can handle that thanks to shadow properties. And once again, anonymous types come to the rescue with HasData to seed the related data that requires you to supply the value of the foreign key column.

Seeding Owned Entities

Owned entities, also known as owned types, are the way EF Core lets you map non-entity types, replacing the complex type feature of Entity Framework. I wrote about this new support in the April 2018 Data Points column (msdn.com/magazine/mt846463). Because an owned type is specific to the entity that owns it, you’ll need to do the data seeding as part of the definition of the type as a property of an entity. You can’t just populate it from modelBuilder the way you do for an entity.

To demonstrate, I’ll introduce a new type in my model, Publisher:

public class Publisher
{
  public string Name { get; set; }
  public int YearFounded { get; set; }
}

Notice it has no key property. I’ll use Publisher as a property of Magazine in place of the Publisher string and, at the same time, revert to a simpler Magazine class:

public class Magazine
{ 
  public int MagazineId { get;  set; }
  public string Name { get;  set; }
  public Publisher Publisher { get;  set; }
  public List<Article> Articles { get; set; }
}

Two important points to remember are that you can only provide properties for one entity type with HasData and that the Model Builder treats an owned type as a separate entity. In this case, that means you can’t populate a magazine and its publisher in a single Entity<Magazine>.HasData method. Instead, you have to identify the owned property (even if you’ve configured it elsewhere) and append HasData to it.

I’ll first provide some Magazine data:

modelBuilder.Entity<Magazine> ()
            .HasData (new Magazine { MagazineId = 1, Name = "MSDN Magazine" });

Seeding the owned type is a little tricky only because it may not be something you can intuit. Because the model builder will treat Publisher as a related object in order to persist it properly, it needs to know the value of the MagazineId that owns it. As there’s no MagazineId property in Publisher—EF Core uses its shadow property feature to infer a MagazineId property. In order to set that property, you’ll need to instantiate an anonymous type rather than a Publisher. If you tried to instantiate a Publisher, it wouldn’t accept the MagazineId property in the initializer:

modelBuilder.Entity<Magazine> ()
    .OwnsOne (m => m.Publisher)
    .HasData (new { Name = "1105 Media", YearFounded = 2006, MagazineId=1 });

When I create a migration to take this pairing into account, the resulting InsertData method knows to insert all of the values—the properties of Magazine and its owned type, Publisher—into the Magazine table:

migrationBuilder.InsertData(
  table: "Magazines",
  columns: new[] { "MagazineId", "Name", "Publisher_Name", "Publisher_YearFounded" },
  values: new object[] { 1, "MSDN Magazine", "1105 Media", 2006 });

This works out easily enough when my classes are simple, although you may reach some limitations with more complicated classes.

No Migrations? EnsureCreated Does the Job

Finally, we’ve reached the point where you get to see the dual nature of the new seeding mechanism. When you’re using database providers with migrations commands, the migrations will contain the logic to insert, update or delete seed data in the database. But at run time, there’s only one way to trigger HasData to be read and acted upon, and that’s in response to the DbContext.Database.EnsureCreated method. Keep in mind that EnsureCreated won’t run migrations if the database already exists. The provider that really benefits from this is the InMemory provider. You can explicitly create and seed InMemory databases in your tests by calling EnsureCreated. Unlike the Migrate command, which runs migrations—and will execute and seed methods in those migrations—EnsureCreated creates a database using the model described by the context class. And whatever provider you’re using, that will also cause HasData methods to insert data at the same time.

To demonstrate, I’ve modified the PublicationsContext by adding a new constructor to allow for injecting a provider by adding an explicit public parameterless constructor to allow for passing in pre-configured options:

public PublicationsContext (DbContextOptions<PublicationsContext> options) :
  base (options) { }
public PublicationsContext () { }

And I’ve added logic to skip the UseSqlite method in OnConfiguring if the options have already been configured:

protected override void OnConfiguring (DbContextOptionsBuilder optionsBuilder)
{
  if (!optionsBuilder.IsConfigured)
  {
    optionsBuilder.UseSqlite (@"Filename=Data/PubsTracker.db");
  }
  optionsBuilder.UseLoggerFactory (MyConsoleLoggerFactory);
}

Note that I moved the UseLoggerFactory command to run after the IsConfigured check. If it comes before the check, IsConfigured returns true. I started out that way and it took me a while to figure out what was wrong.

My automated test sets up the options to use the InMemory provider. Next, critical to the seeding, it calls EnsureCreated and then tests to see if there is, indeed, some data already available:

public void CanRetrieveDataCreatedBySeeding ()
{
  var options = new DbContextOptionsBuilder<PublicationsContext> ()
                    .UseInMemoryDatabase ("RetrieveSeedData").Options;
  using (var context = new PublicationsContext (options))
  {
    context.Database.EnsureCreated();
    var storedMag = context.Magazines.FirstOrDefault ();
    Assert.Equal ("MSDN Magazine", storedMag.Name);
  }
}

The test passes because EnsureCreated forced the HasData methods to push the seed data into the InMemory database.

A Variety of Use Cases, but Not All of Them

Even though you’ve seen some of the limitations of using HasData in a few more-complex scenarios, it’s definitely a nice improvement over the workflow that existed in earlier versions of EF. I really appreciate that I now have more control over the data flow by tying the insert, update and delete statements to individual migrations, rather than having to worry about upserts on every migration. The syntax is much cleaner, as well. Most important is the dual nature of this feature that not only allows you to get the seed data into your development (or even production) database, but also means that by calling EnsureCreated, you can seed the InMemory data to provide a consistent base of seed data that will be relevant for each test.

But HasData isn’t a silver bullet. Keep in mind that this feature is best for seed data that will remain static once it’s been inserted into the database. Also, watch out for HasData migrations that could override data you’ve seeded outside of migrations. As I explained earlier with the Guids, HasData doesn’t work well with computed data. Andriy Svyryd from the EF team adds, “For seeding testing data that’s not expected to be maintained between migrations or to have more complex logic, like computing seed values from the current state of the database, it’s still possible and encouraged to just create a new instance of the context and add the data using SaveChanges.” As another alternative, I’ve heard from readers that my method for seeding with JSON data is still working nicely, even with EF Core 2.1. I wrote about this in a blog post at bit.ly/2MvTyhM.

If you want to stay informed on how HasData will evolve, or even on issues that users are discovering, keep an eye on the GitHub repository at bit.ly/2l8VrEy and just filter on HasData.


Julie Lerman is a Microsoft Regional Director, Microsoft MVP, software team coach and consultant who lives in the hills of Vermont. You can find her presenting on data access and other topics at user groups and conferences around the world. She blogs at the thedatafarm.com/blog and is the author of “Programming Entity Framework,” as well as a Code First and a DbContext edition, all from O’Reilly Media. Follow her on Twitter: @julielerman and see her Pluralsight courses at bit.ly/PS-Julie.

Thanks to the following Microsoft technical expert for reviewing this article: Andriy Svyryd


Discuss this article in the MSDN Magazine forum