This is yet another variation on the Entity-Attribute-Value design.
A more recognizable EAV table looks like the following:
CREATE TABLE vehicleEAV (
vid INTEGER,
attr_name VARCHAR(20),
attr_value VARCHAR(100),
PRIMARY KEY (vid, attr_name),
FOREIGN KEY (vid) REFERENCES vehicles (vid)
);
Some people force attr_name
to reference a lookup table of predefined attribute names, to limit the chaos.
What you've done is simply spread an EAV table over three tables, but without improving the order of your metadata:
CREATE TABLE vehicleTag (
vid INTEGER,
cid INTEGER,
tid INTEGER,
PRIMARY KEY (vid, cid),
FOREIGN KEY (vid) REFERENCES vehicles(vid),
FOREIGN KEY (cid) REFERENCES categories(cid),
FOREIGN KEY (tid) REFERENCES tags(tid)
);
CREATE TABLE categories (
cid INTEGER PRIMARY KEY,
category VARCHAR(20) -- "attr_name"
);
CREATE TABLE tags (
tid INTEGER PRIMARY KEY,
tag VARCHAR(100) -- "attr_value"
);
If you're going to use the EAV design, you only need the vehicleTags
and categories
tables.
CREATE TABLE vehicleTag (
vid INTEGER,
cid INTEGER, -- reference to "attr_name" lookup table
tag VARCHAR(100, -- "attr_value"
PRIMARY KEY (vid, cid),
FOREIGN KEY (vid) REFERENCES vehicles(vid),
FOREIGN KEY (cid) REFERENCES categories(cid)
);
But keep in mind that you're mixing data with metadata. You lose the ability to apply certain constraints to your data model.
- How can you make one of the categories mandatory (a conventional column uses a
NOT NULL
constraint)?
- How can you use SQL data types to validate some of your tag values? You can't, because you're using a long string for every tag value. Is this string long enough for every tag you'll need in the future? You can't tell.
- How can you constrain some of your tags to a set of permitted values (a conventional table uses a foreign key to a lookup table)? This is your "softtop" vs. "soft top" example. But you can't make a constraint on the
tag
column because that constraint would apply to all other tag values for other categories. You'd effectively restrict engine size and paint color to "soft top" as well.
SQL databases don't work well with this model. It's extremely difficult to get right, and querying it becomes very complex. If you do continue to use SQL, you will be better off modeling the tables conventionally, with one column per attribute. If you have need to have "subtypes" then define a subordinate table per subtype (Class-Table Inheritance), or else use Single-Table Inheritance. If you have an unlimited variation in the attributes per entity, then use Serialized LOB.
Another technology that is designed for these kinds of fluid, non-relational data models is a Semantic Database, storing data in RDF and queried with SPARQL. One free solution is RDF4J (formerly Sesame).