Domain Modeling and Persistence in Entity Framework 6

Entity Framework is a two-headed beast. For most developers, the main difficulty with Entity Framework isn't so much mastering the technical aspects of each head but putting each head in the most appropriate perspective.

Entity Framework's two heads are represented by the name of the technology: Entity and Framework. One head, the "Entity," refers to domain modeling; it ignores physical persistence and winks at advanced patterns such as domain-driven design (DDD). The other head, the "Framework," is about the actual mapping of entities to relational tables.

The two heads reach a good level of cooperation in Entity Framework 6, which makes modeling domain entities and persistence of modeled classes smoother than ever before. The combination of the two heads makes Entity Framework more than simply an Object/Relational Mapper (O/RM) tool. There's something in Entity Framework 6 that makes it a plain O/RM with an object-based vision of the data. Some other flavors of Entity Framework, however, seem to suggest a vision of persistence that's still database-centered. As you might already understand, these two aspects are somewhat in contrast. However, they don't exist to generate a conflict but rather to offer at least two approaches to data persistence that go beyond the basic abstraction level of ADO.NET and SQL commands.

This article focuses on the three flavors of Entity Framework: database-first, model-first, and code-first. I identify realistic scenarios for each flavor in an attempt to reveal Entity Framework's true nature and, hopefully, to clear the clouds of fear surrounding auto-generated SQL and the widespread doubts about its effectiveness.

Database-First

The database-first approach was the first method that Entity Framework engineers pushed; it's likely the most natural approach to introduce SQL developers to the dazzling new world of objects. In a nutshell, database-first consists of pointing a wizard integrated in Visual Studio at an existing SQL Server database. The wizard scans the existing database structure and builds a model of it in which tables, views, indexes, stored procedures, and relationships are faithfully reported. The Visual Studio tooling also allows you to update the model as changes to the underlying structure occur. You can import into the model the entire database structure or just a section of it, such as a subset of the tables or just the tables and none of the additional database objects.

Database-first requires an existing SQL Server database or an existing database for which an Entity Framework provider exists. For example, you can connect to an Oracle database and infer an object model out of it. For more information, see the Oracle article "Distinctive Database Development." The inferred model is just the description of a class model created in a meta language and saved to a bunch of XML files. The primary file has the popular EDMX extension. As a successive step, the developer instructs the Visual Studio tooling to create real C# classes based on the class model. The EDMX viewer offers various options for the code artifacts, as you can see in Figure 1.

Figure 1: Turning an Entity Framework Model into Code Artifacts

The self-tracking entities option produces classes with the inherent ability to carry their state as they move across the wire. This is a feature that DataSet nostalgics might recall. The EntityObject option is a legacy option that has existed since the much-maligned first version of Entity Framework. You don't want to have persistence-aware classes in the model because these classes create a dependency on the Entity Framework core assemblies. But is this really a bad idea? Well, yes and no.

Simply put, it's not bad if you envision the data layer as a set of actions around a bunch of core tables and devise it as tightly coupled to the business layer. If this is your architectural vision, using a representation of data that's persistence-aware makes total sense. What makes less sense in this context is just the use of Entity Framework and an object model. The benefits of using EntityObject-derived classes are all in the code, which is far more readable than when it's filled with SQL commands and open/close connection calls.

When it comes to modeling, the database is realistically almost always a constraint. So the question shifts away from how you persist data and becomes how you model and programmatically handle the data. This has a lot to do with the business logic you're asked to implement. If it's a plain CRUD, you have practically no logic to deal with—and there's little benefit in spending time on an object model. Many systems, though, just look like they're plain CRUD systems. Let's consider even the simplest booking system for the least critical resources you can think of. It looks like a CRUD because key operations are just insertions of new bookings, updates, and deletions. But the point is that you hardly have plain "operations"; you likely have more or less sophisticated "workflows." Operations simply deal with data; workflows deal with data and logic.

The crucial point of organizing the back end of a system is how you organize the data and logic. Modeling data and logic has little to do with the Entity Framework approach, whether database-first, model-first, or code-first. The "Framework" head is concerned with how you define the database; the "Entity" head is concerned with how you model.

Model-First

The model-first option, which is now seldom used, refers to scenarios in which developers actually prefer to create the schema of a relational storage within the Visual Studio Entity Framework designer rather than in SQL Server Management Studio (SSMS). You define entities as abstractions that represent the real objects in the application domain. You also define relationships. When you're done, you get the same EDMX description that in a database-first approach is simply inferred from an existing database.

You still need to turn the EDMX model into classes, and you have the same options for doing so that Figure 1 illustrates. The model just provides a high-level view of what you want to achieve. In addition, you need to take care of the actual mapping between entity properties and physical tables to be created. On the upside, you can tweak and refine the model at will before hitting the database server. However, once you hit the database, making changes to the model still poses the same problems as with an existing database. Either it's acceptable for you to drop the database and repopulate it after every change or you should proceed by tweaking the schema as softly as you can.

Code-First

The model-first approach was introduced as a way to help developers abstract the design of the database structure a bit. However, model-first didn't capture anyone's heart. Although it sounded intriguing at first, in the end it wasn't really what developers wanted. What developers truly wanted—I think—was what came next: code-first. Code-first refers to writing C# classes initially free of any persistence concerns—although such classes will at some point need persistence anyway.

A huge difference exists between model-first and code-first. Model-first makes you think of entities and relationships in relational terms. The focus is on data alone, and the database schema is your primary concern.

With code-first you can still maintain the focus on data and create classes that map one-to-one to tables, with public properties mapped to table columns. However, because you manage C# code, the code-first approach has the tremendous potential to make it far easier to create an object model that faithfully represents the real business. This can only happen when the focus is primarily on logic and behavior instead of data.

Let's consider a scoring system for a sport such as basketball. You might need a Match class with a minimal structure, such as the following:

public class Match
{
   public String Team1 {get; set;}
   public String Team2 {get; set;}
   public Int32 Score1 {get; set;}
   public Int32 Score2 {get; set;}
   public Int32 Period {get; set;}
   public MatchState CurrentState {get; set;}
}

Does this really model the business domain faithfully? From what I know about basketball (and sports in general), this only nicely models persistence for a sports match. It focuses on the data available and blissfully ignores the process you observe while scoring. In action, you see an initial setup phase in which names of the teams are determined. Next, the match officially begins and enters a warm-up state. Then a period begins and ends, repeatedly. During a period the score might change as one of the teams scores field goals, threes, or free throws. Finally, the match ends. As you can see, the domain has mostly actions and events; data is used under the hood but shows up only in read mode—current score and current period.

You can still use the above definition of Match, but then the entire domain logic must go elsewhere, in a separate service. Worse yet, your domain model is quite anemic and requires care to avoid inconsistency. As the class Match is defined above, what if you assign a negative value to Score1 or Period? You still need a strong validation layer, either in the class setters or in a separate service. As I see it, this approach just doesn't scale well with the complexity of the domain and of individual entities. It's a catch-me-if-you-can kind of scenario. You'll continually be adding checks to the validation layer to ensure that instances of entities are consistent with the business rules.
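To make the catch-me-if-you-can problem concrete, here is a minimal sketch of the kind of separate validation service the anemic Match would need. The MatchValidator class, its Validate method, and the members of MatchState are hypothetical names chosen for illustration; the article only names the MatchState type.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical states; the article names the type but not its members.
public enum MatchState { Setup, WarmUp, InPlay, Finished }

// The anemic class from the listing above: public setters, no behavior.
public class Match
{
    public String Team1 { get; set; }
    public String Team2 { get; set; }
    public Int32 Score1 { get; set; }
    public Int32 Score2 { get; set; }
    public Int32 Period { get; set; }
    public MatchState CurrentState { get; set; }
}

// Hypothetical validation service living outside the entity.
public static class MatchValidator
{
    // Returns one message per broken rule; an empty list means "consistent".
    public static IList<string> Validate(Match match)
    {
        var errors = new List<string>();
        if (String.IsNullOrWhiteSpace(match.Team1)) errors.Add("Team1 is required.");
        if (String.IsNullOrWhiteSpace(match.Team2)) errors.Add("Team2 is required.");
        if (match.Score1 < 0) errors.Add("Score1 cannot be negative.");
        if (match.Score2 < 0) errors.Add("Score2 cannot be negative.");
        if (match.Period < 0) errors.Add("Period cannot be negative.");
        return errors;
    }
}
```

Note that nothing here prevents a caller from putting the object in an invalid state; the validator can only detect the problem afterward, and every new business rule means another check to remember.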

Let's be honest. This is just the way in which many of us have written, and maybe still write, code. So to some extent an anemic model (just properties, no methods) plus separate business objects is an approach that works. But how well it holds up as complexity grows is yet to be proven.

The game changer is viewing persistence and modeling as distinct aspects of the system while being aware that any model needs persistence at some point. In the end, code comes first and persistence comes later. With code-first you create the model by looking at behavior, and you expose data via public properties with private setters. The resulting classes therefore have a lot of methods that map more or less directly to the observable behavior of the entities in the domain. At this point, thanks to the Entity Framework machinery, properties are easily mappable, even by convention, to columns in relational tables. Here's a possible way to rewrite the Match class:

public class Match
{
   public static Match Create(String team1, String team2) { ... }
   protected Match() {...}
   public String Team1 {get; private set;}
   public String Team2 {get; private set;}
   public Int32 Score1 {get; private set;}
   public Int32 Score2 {get; private set;}
   public Int32 Period {get; private set;}
   public MatchState CurrentState {get; private set;}
   public Match Start() { ... }
   public Match End() { ... }
   public Match StartPeriod() { ... }
   public Match EndPeriod() { ... }
   public Match Goal1(GoalType type) { ... }
   public Match Goal2(GoalType type) { ... }
}

All setters are now private, and a factory method replaces the default constructor. A lot of methods have magically appeared. Because each of those methods returns the Match instance to support fluent code, I can now write tests similar to the following:

var match = Match.Create("Home", "Visitors");
match.Start()
     .StartPeriod()
     .Goal1(GoalType.Field)
     .Goal2(GoalType.FreeThrow)
     // ... more scoring ...
     .EndPeriod()
     .StartPeriod()
     // ... more scoring ...
     .End();
Assert.AreEqual(expected1, match.Score1);
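The listings above elide the method bodies. As a rough illustration only, the following sketch shows one way they might be filled in so that the class enforces the workflow itself. The members of MatchState and GoalType, the point values, the rule that End() is legal only between periods, and the private Require helper are all assumptions made for this example; the article does not specify them.

```csharp
using System;

// Hypothetical members; the article names these types without defining them.
public enum MatchState { Setup, WarmUp, InPlay, BetweenPeriods, Finished }
public enum GoalType { FreeThrow = 1, Field = 2, Three = 3 }   // value = points awarded

public class Match
{
    // Factory method: the only public way to obtain a Match.
    public static Match Create(String team1, String team2)
    {
        if (String.IsNullOrWhiteSpace(team1) || String.IsNullOrWhiteSpace(team2))
            throw new ArgumentException("Both team names are required.");
        return new Match { Team1 = team1, Team2 = team2, CurrentState = MatchState.Setup };
    }

    protected Match() { }

    public String Team1 { get; private set; }
    public String Team2 { get; private set; }
    public Int32 Score1 { get; private set; }
    public Int32 Score2 { get; private set; }
    public Int32 Period { get; private set; }
    public MatchState CurrentState { get; private set; }

    public Match Start()
    {
        Require(MatchState.Setup);
        CurrentState = MatchState.WarmUp;
        return this;
    }

    public Match StartPeriod()
    {
        if (CurrentState != MatchState.WarmUp && CurrentState != MatchState.BetweenPeriods)
            throw new InvalidOperationException("A period cannot start in state " + CurrentState);
        Period++;
        CurrentState = MatchState.InPlay;
        return this;
    }

    public Match EndPeriod()
    {
        Require(MatchState.InPlay);
        CurrentState = MatchState.BetweenPeriods;
        return this;
    }

    public Match End()
    {
        Require(MatchState.BetweenPeriods);
        CurrentState = MatchState.Finished;
        return this;
    }

    // Scoring is legal only while a period is in play.
    public Match Goal1(GoalType type) { Require(MatchState.InPlay); Score1 += (int)type; return this; }
    public Match Goal2(GoalType type) { Require(MatchState.InPlay); Score2 += (int)type; return this; }

    // Guard shared by the methods above: throws unless the match is in the expected state.
    private void Require(MatchState expected)
    {
        if (CurrentState != expected)
            throw new InvalidOperationException("Expected state " + expected + " but was " + CurrentState);
    }
}
```

With this shape, a negative score or an out-of-sequence period simply cannot be produced from outside the class, so no external validation layer has to catch it.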

Usage of the class now reflects the real business a lot more closely. And Entity Framework can still persist the class to a database. Attributes or the fluent API will still let you shape the table at will. Because persistence is a constraint to the model, some compromise might be necessary. If you use a factory method, then you also need a protected constructor to use in the factory. If you instead use a parameterized constructor, then Entity Framework forces you to also have a parameterless constructor, which is used to instantiate objects in queries. Thankfully, you're not forced to make it public. Other concessions to persistence come in the form of extra properties that hide arrays of objects or enums in older versions of Entity Framework.
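As an illustration of shaping the table through the fluent API, here is a minimal mapping sketch for Entity Framework 6. It assumes the EntityFramework 6.x package is referenced. The Id key property, the table and column names, the length limits, and the MatchConfiguration and ScoringContext class names are hypothetical additions for this example; Match is trimmed to its data members to keep the listing short.

```csharp
using System;
using System.Data.Entity;                      // EntityFramework 6.x
using System.Data.Entity.ModelConfiguration;

public class Match
{
    protected Match() { }                      // non-public constructor for materializing query results

    public Int32 Id { get; private set; }      // hypothetical key; not part of the article's class
    public String Team1 { get; private set; }
    public String Team2 { get; private set; }
    public Int32 Score1 { get; private set; }
    public Int32 Score2 { get; private set; }
    public Int32 Period { get; private set; }
}

// Keeps all persistence details out of the domain class.
public class MatchConfiguration : EntityTypeConfiguration<Match>
{
    public MatchConfiguration()
    {
        ToTable("Matches");                                     // assumed table name
        HasKey(m => m.Id);
        Property(m => m.Team1).IsRequired().HasMaxLength(50);   // assumed length limit
        Property(m => m.Team2).IsRequired().HasMaxLength(50);
        Property(m => m.Score1).HasColumnName("HomeScore");     // assumed column names
        Property(m => m.Score2).HasColumnName("VisitorScore");
    }
}

public class ScoringContext : DbContext
{
    public DbSet<Match> Matches { get; set; }

    protected override void OnModelCreating(DbModelBuilder modelBuilder)
    {
        // Registers the mapping class with the model builder.
        modelBuilder.Configurations.Add(new MatchConfiguration());
    }
}
```

Putting the mapping in its own configuration class, rather than in attributes on Match, is one way to keep the domain class free of references to the Entity Framework assemblies.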

Code-First Is a First Choice

The primary benefit of Entity Framework is making code easier to read and write. Two aspects contribute to this: SQL code hidden in the folds of the embedded O/RM and the possibility of modeling the business domain to focus on behavior rather than just data. The code-first approach seems like the natural choice to improve modeling. The database-first method is also a viable starting point because it gives you anemic classes that you can extend with methods through the partial class mechanism. Conveniently enough, Entity Framework 6.1 also offers a code-first reverse engineering facility that makes the database-first method even more obsolete, even when you have a clear database constraint to deal with.
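The partial class mechanism mentioned above can be sketched as follows. The first declaration stands in for a tool-generated file and the second for a hand-written one; the Id property and the ScoreTeam1 and IsTied members are hypothetical and serve only to illustrate the idea.

```csharp
using System;

// Stand-in for a generated file: anemic, and overwritten when the model is regenerated.
public partial class Match
{
    public Int32 Id { get; set; }
    public String Team1 { get; set; }
    public String Team2 { get; set; }
    public Int32 Score1 { get; set; }
    public Int32 Score2 { get; set; }
}

// Hand-written file: the compiler merges it with the generated half into one class.
public partial class Match
{
    // Hypothetical behavior added on top of the generated data members.
    public Match ScoreTeam1(Int32 points)
    {
        if (points < 1 || points > 3)
            throw new ArgumentOutOfRangeException("points", "A basket is worth 1 to 3 points.");
        Score1 += points;
        return this;
    }

    public Boolean IsTied
    {
        get { return Score1 == Score2; }
    }
}
```

The generated setters remain public, so this approach adds behavior without closing the door on inconsistent assignments; that is the main trade-off compared with a class written code-first.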
