Generating Test Data for Databases by Using Data Generators

You can use Visual Studio Premium to create test data that you need to verify the behavior of the database objects in your database project. You can generate data that is appropriate for the schema of the database but unrelated to the production data. This approach can help you protect the privacy or security of the production data.

You define a data generation plan that specifies the details of how you want data to be generated for specific tables and columns. For each column, you specify a data generator that produces data of a particular type.

When you create a data generation plan in a database project, the plan is based on the schema of the project. If you create a data generation plan and then the schema changes, you are prompted to update the plan.

Important

If you do not update the data generation plan based on the schema changes, you cannot continue to modify the plan. You can only close and reopen the data generation plan to display the prompt again.

When you run database unit tests, you can specify a different data generation plan for each test project. Therefore, you can initialize the database to a different state for each group of tests.

Data Generators

Visual Studio Premium includes several built-in data generators for generating different kinds of data. For example, the integer generator provides random integer values, the string generator provides random strings, and the regular expression generator provides strings that match a pattern that you specify. For more information and the complete list of standard data generators, see Standard Data Generator Types and Specify Details of Data Generation for a Column.

If the generators that are included with Visual Studio Premium do not provide data that meets your requirements, you can create custom generators. For example, you might want to create a custom data generator if your database contains a column that has a check constraint that references another column. For more information, see Generate Specialized Test Data with a Custom Data Generator.

You might receive errors if you try to generate data for a table that contains SPARSE columns and a COLUMN_SET. For more information, see Troubleshooting Data Generation Issues.

Data Generation Plans in a Team Environment

A data generation plan is an XML file that contains information about the database schema and configuration information that controls the data generation for each column in each table that you specify. A data generation plan for a database that contains approximately 40 tables can be approximately 8 MB and more than 100,000 lines long.

You cannot automatically merge changes to a .dgen file by using Team Foundation version control or another version control system. The process that you must use to manually merge large XML files is difficult and prone to errors. You can minimize problems that relate to your data generation plan by using exclusive checkout on the plan when you must change it.

Security of Data Generators

You can share data generation plans and custom data generators in a team environment. Before you share or use shared data generation files, you should consider the following security risks:

  • Schema Information in Data Generation Plans
    When you create a data generation plan, the .dgen file contains the schema of the tables, which might be a sensitive trade secret. When you share a .dgen file, the person with whom you share the file can see your schema. You should share data generation plans only with trusted sources.

  • Malicious Code in Data Generation Plans
    When a data generation plan contains a data bound generator, you write a Transact-SQL query that runs when the plan is run. This approach allows arbitrary Transact-SQL to run from within a data generation plan. You should obtain data generation plans only from trusted sources, and you should warn end users not to run data generation plans that they receive from non-trusted sources.

  • Connection Information and Custom Data Generators
    All custom data generators have access to the database connection string at run time. A malicious custom generator could expose the connection string information. You should obtain custom data generators only from trusted sources, and you should warn end users not to use custom data generators that they receive from non-trusted sources.

  • Malicious Code in Custom Data Generators
    Custom data generators are classes that can contain arbitrary code. When you use a custom data generator, it runs with the same permissions as the active user. This approach can run malicious code in FullTrust mode. You should obtain custom data generators only from trusted sources, and you should warn end users not to use custom data generators that they receive from non-trusted sources.

  • Malicious Code in Installers for Custom Data Generators
    You can create deployment projects to install custom data generators. Deployment projects can contain arbitrary code. When you run an installation program for a custom data generator, the program runs with elevated permissions. This approach can run malicious code with elevated permissions. You should obtain custom data generator installers only from trusted sources, and you should warn end users not to run installation programs for custom data generators that they receive from non-trusted sources.

Common Tasks

Common Tasks

Supporting Content

Get hands-on practice: You can follow an introductory walkthrough to become familiar with how to create and run a simple database unit test.

Create a data generation plan: Create a data generation plan for each unit test project or for specific application tests. You can also define a data generation plan that defaults all columns to reuse data from another database that you specify.

Upgrade a data generation plan from a previous release: You can use data generation plans that you created with a previous release of Visual Studio Premium. However, you must upgrade the plan immediately after you open it.

Specify the tables that you want to generate: You can generate data for all tables or for only those tables that you specify. For each table, you can specify a number of rows to generate. As an alternative, you can generate rows in a ratio that is based on the number of rows that you generate for another table.

Configure how data is generated for each column in the tables that you specify: You specify the type of data generator for each column. You can also provide constraints on the range of data values. For more-advanced data generators, you can provide additional configuration information to further refine the data that you generate.

Preview the generated data: Before you write data to your database, you can preview the data for any table that is included in your data generation plan. By previewing the data, you can adjust your data generation plan without generating data iteratively.

Generate test data: After you have previewed and verified the data that you will generate, you can run the data generation plan to generate the test data. You can also configure your unit test project so that data is generated automatically whenever you run your unit tests.

Check in the data generation plans: Just as you must check in your unit tests to version control to share them with your team, you must also check in your data generation plans.

Change data generator defaults and options: You can specify default values for the random seed and the number of rows to insert. You can also specify options that control aspects of the data generators, such as the number of rows to display in the preview window and the maximum number of acceptable errors that can be displayed while data is generated for an individual table before the operation fails.

Troubleshoot problems: You can learn more about how to troubleshoot common problems with creating and using data generation plans.

  • Creating and Defining Database Unit Tests
    Provides information and links to additional topics about how you can define unit tests to verify the behavior of your database objects. You can associate each test project with a different data generation plan, deployment configuration, and connection string.

  • Running Database Unit Tests
    Describes how, after you create your database unit tests, you run them to see the results.

  • Define Custom Conditions for Database Unit Tests
    Describes how you can test for a special condition that you cannot verify by using the default test conditions.