When designing relational databases for different business purposes, the usual approach is to create a table for each kind of entity we intend to use and to define relationships between them.
Each property of an entity becomes a table column and is represented by an Entity Framework Core class property. Adding or changing properties is a natural part of the development process of such systems. But sometimes the business domain requires that some entities have a variable set of generic properties. Such scenarios often emerge when the entities have a large number of possible properties, with each entity usually using a smaller subset of properties. By creating such a system, it’s also possible to change the properties of entities without actually changing the database design or the code. When such an approach is required, an implementation of the Entity-Attribute-Value model comes into play.
Entity-Attribute-Value model (EAV)
When creating an Entity-Attribute-Value database design, the idea is to recognize which entities require a scalable list of attributes (properties), and to create metatables for those attributes. The basic tables needed are Entity, Attribute and Value in the simplest form. The design can be extended to your needs.
To describe what a metatable means in this sense, we can take a look at an example scenario where we need to describe different products (with generic properties) and keep data about items of those product-types in the database.
Example of an EAV model
The generic entity we need is the Product. Instead of having a Product table that, with a constant set of columns, represents actual products, we can create a ProductDefinitions table. This table will contain only the columns needed to define a product type, rather than columns that define actual instances of that product.
Next, we need a scalable set of Product attributes. These attributes could be shared among different products, but for the sake of keeping it ’simple’, let’s assume that each product has its own attributes. Each attribute contains the fixed columns for properties that are needed for its definition.
Now that we have tables that describe the Product “class” with a scalable set of attributes, we need the tables to keep the actual data.
Let’s name the main table the Items table. This table shouldn’t contain much more than an Id and a reference to a ProductDefinition, basically describing a class-instance sort of a relationship.
All other property values of an Item should exist as separate objects, in the form of a Value table. And it is this soon that things can get a bit ugly…
When defining ProductAttributes, we’ll probably come to the realization that an attribute needs to have a data type. The system will need to know whether the product attribute should behave like an integer, decimal, bool, string, datetime or some – perhaps also generic – value from a defined finite set (defined in some other table). To keep things generic, let’s create an AttributeTypes table.
Now that we have the attribute types defined, we can create an ItemValues table (or tables), which will hold the values for each Item. There are different approaches and none of them are, unfortunately, particularly appealing:
1. Make a single string (varchar) column for holding any kind of data type, then save the data as JSON or some other similar format.
Although this approach does reduce the table’s complexity, serializing, and deserializing data to and from the single string column every time you access the value can be a little tedious. There’s also no automatic insurance that the string contains value as defined by the AttributeType of its ProductAttribute.
2. Make each row of ItemValues contain a column for each of the AttributeTypes.
This approach allows a ProductAttribute to change its AttributeType without much fuss. But as for drawbacks, the first one is obviously the redundancy in having empty columns for the types that are not used. It’s also worth noting that, if needed, making sure that these columns are really NULL has to be done in code. Finally, adding a new AttributeType requires changes to the database itself, even if such additions are not likely to happen very often.
3. Make separate tables for each AttributeType, then create a ‘value’ column only for the type that each table refers to.
The advantage here is that we have no empty columns. The downside is that because we have separate tables, we also have separate collections of ‘value’” in the EF Core entities. All the same, with some code adjustments it’s possible to use all the values in the same collection. Another thing to note is that adding new AttributeType tables involves making some adjustments to the code.
Once we’ve chosen our approach, we then have our EAV model in the database. It can hold Items of a selected ProductDefinition. Each Item holds a scalable collection of Values, which refer to a ProductAttribute, and have an AttributeType.
Things to consider
Before opting for an EAV as the model of choice for your application, there are some things to consider.
As opposed to just selecting a number of rows from a single table, fetching Items with all their values – which is probably something you’ll need to do – can become a massive SQL join if you use. Include() statements without care, especially queries. Creating the references for child-objects (Item -> ItemValue -> ProductAttribute) in memory can cause bad performance. There might be places where you’ll need to consider what you actually need to fetch. You also might have to consider whether it’ll be quicker to manually create the references for child objects, instead of just calling. Include().ThenInclude().ThenInclude()…
Simple tasks like filtering and grouping Items become far less simple when you have an EAV model. Filtering a list of Items is no longer a matter of a simple. Where() on some collection. You have to find the ItemValue by the ProductAttribute it references, take the ItemValue’s AttributeType into consideration and then do the comparison of the Value itself. In the background that will again cause JOINs to be created. If you’re not concentrating, with dynamically created queries (concatenating Where() statements) you can easily end up with horror-queries.
EAV is all fun and games until you need some specific business logic. It is very likely that the logic will need some of the generic attributes that are identified as specific attributes.
What this means in a practical sense is that a generic attribute – Price – will be used to do some special calculation, and you will need a way to find that data.
At that point, it’ll be a valid design question whether it’ll be better to add those attributes as fixed columns in the Item table.
It’ll certainly be simple, but if done so, these specific attributes won’t be able to interact with other generic attributes, in case they need to – which might just defeat the purpose of having an EAV model.
Another option would be to add a property to the ProductAttribute, specifying which system-type that property represents.
If you use a fixed enum, this is the point where your ProductAttribute table stops being completely generic, but it’s relatively easy to identify special attributes.
Probably, it would be best to create a system that uses foreign key references to find properties with certain specific identities – for example, a ProductGenericBridge table with fields such as ProductAttributeIdUsedAsPrice and ProductAttributeIdUsedAsTax.
In that case, you can keep the EAV tables completely generic, and you can keep the ‘bridge’ completely separate. Changes to the ProductGenericBridge occur only when there are changes to the ‘system properties’ used by the business logic. The generic tables can be completely unaware of that.
This, of course, means that after the code has been developed, it hasn’t necessarily been done. When deploying such a system, the bridge has to be set up ‘manually’ for each Product that the user creates. Otherwise, there might be some mission-critical properties missing from the database.
The EAV model embodies a lot of possibilities but also creates the complexity and performance implications that are not appropriate for all types of systems.