Database schema which can support specialized properties

Question

Welcome To Ask or Share your Answers For Others

Database schema which can support specialized properties

posted Oct 17, 2021 in Technique[技术] by 深蓝 (71.8m points)

Database schema which can support specialized properties

I need to store a set of entities, of which there are several specialized versions. They have some common properties, but the specialized ones contain properties specific for that entity.

Solutions

The data store is a relational DBMS, and this is not for discussion :-) Specifically, it is the Microsoft SQL Server 2005.

I could easily create a table for the common properties and then a table for each of the specialized versions. However, it is likely that new entities will have to be added to the solution later and I don't want to maintain both an object model and a database schema.

Another idea is to create a table

reading(<common properties>, extended_properties)

and have the extended_propertiesfield be some kind of serialization of the extended properties. I was thinking either JSON or XML. I will most likely be using an ORM framework, but I haven't decided yet. Either way, the object representation of a specialized entity from the reading could expose a dictionary {extended_property_name, value} containing the parsed key/value pairs from the extended_properties field.

From this http://msdn.microsoft.com/en-us/library/ms345117(SQL.90).aspx I gather that XML fields, combined with schemas for these, give the notion of typed XML inside the DBMS. Also, queries involving the XML contents in the extended_propertiesfield can take these into account, too.

What I want

Feedback on my solution suggestions, primarily the one with the reading table and serialization of the extended properties.

Also, I realize this is one of the limitations of relational DBMS' compared to key/value based stores. However, there surely must be some modelling techniques to accommodate this.

Any feedback is greatly appreciated!

See Question&Answers more detail:os

与恶龙缠斗过久,自身亦成为恶龙；凝视深渊过久,深渊将回以凝视…

1 Reply

深蓝 · Answer 1 · 2021-10-17T00:06:54+0000

Anders, do not give up any integrity or hardness, eg type safety.

(Response coming).

@Anders. No, not at all, subtyping is fine (the question is which form you use and what are the dis/advantages). Do not give up any strength or Integrity or type safety or checks or DRI. The form you choose will demand additional Checks and maybe a bit of code (depends on your platform).

This subject is coming up frequently, but the seeker always has a narrow perspective; I keep making the same statements (a subset) from an unchanging set. The idea is to evaluate all the options. So I am writing a doc. Unfortunately it is taking longer. Maybe 4 pages. Not ready to post. But the diagrams are finished, I think you are on the ball, and you can use it right away.

Warning: Experienced Project Construction Engineers Only
Road not suitable for caravans or readers with a high Eek factor

Link to ?Four Alternative Data Models? in Document Under Construction. Apologies for the mess on the floor; I will clean up soon.

?Link to IDEF1X Notation? for anyone who is unfamiliar with the Standard for modelling Relational databases.

They are all Relational, with full integrity.
The 6NF options. Relational today (SQL) does not provide support for 6NF; it does not disallow it, it just does not provide the 5NF?6NF structures. Therefore you need to build a small catalogue, what some people call "metadata". Really, it is just an extension of the standard SQL catalogue (sys tables). The level of control required is modelled in each option.
Essentially EAV done properly, with full control and integrity (type safety, Declarative Referential Integrity, etc) rather than the mess it usually is.

You may be interested in these related question/answers (in particular, look at the Data Models):

Multiple Fixed vs Abstract Flexible

Database Schema-Related Problem

"Simple" Database Design Problem

Response to Comments

... That way, we can easily grab "Comment" rows associated with a given specialized type instance. Is this the way to do that, or will I regret that decision later? Is there any other pattern we're missing?

Not sure what you mean. Comments, Notes, Addresses, end up being used (columns resident in) in many tables, so the correct method is to Normalise them; provide One Table for Comment; that is referenced from any table that requires it. Here is a generic Comment table. It is used in Product (the supertype) because you stated any Product. It can just as easily be used in some of the Product subtypes, and not others; in which case the FK will be in said Product Subtypes.

Your Data Model

What is the purpose of the ProductType table in your Product 5NF/subtype example? Does it contain a row corresponding to each specialized Product, e.g., ProductCPU? I assume it indicates which specialization the base product is.

(Small critical mistake in the diagram, corrected.)

Yes, exactly.

In Standard Relational terms (not the uncontrolled messes passing off as databases), the ProductType is the Discriminator; it identifies which of the Product Subtypes apply to this Product. Tells you which Product Subtype table you need to join with. The pair together make a logical Product. Do not forget to produce the Views, one for each ProductType.

(Do evaluate how ProductType changes, exactly what role it plays, for each of the four Data Models.)
"Generalisation-specialisation" is all mumbo jumbo, OO terminology; without crossing the line and learning what Relational has been capable of for 30 years. If you learn a little about Relational, you will have the full power; otherwise you are limited to the very limited OO approach to everything (Ambler and Fowler have a lot to answer for). Please read this post, from 11 Dec 10 onwards. Relational databases model Entities, not objects; not classes.

For example, when adding a new product you'll want to provide, say, a dropdown selection of which product types it is possible to add. Based on this selection, it can be deduced which tables to put the data in. Correct? I'm sorry for talking about application code, but I just need to put it into perspective

Yes. And what page (with fields) to provide next, for the user to enter data.

No problem talking about the app code that will use the Rdb, they go together like husband and wife (not husband and slave).

For your OO classes, map the Class tree to the Rdb, once you have finished modelling the Rdb, independent of any app that will use it. Not the other way around. And not dependent on one app.
Forget about "persisting", it has many problems (Lost Updates; damaged data integrity; problematic debugging; massive contention; etc). All updates to the Rdb should be in Transactions, with ACID compliance, available for 30 years, but Fowler and Ambler have not read about it yet. Usually that means one stored proc pre xact.

The discriminant is a FK to a Type-table as we established earlier. It denotes which spec. sub type the base type adheres to. But what does the discriminant table contain in detail?

Is that not clear from the data model ? ProducType CHAR(1) or (2). Name Char(30).

Could be a display-friendly text stating the type for UI-purposes,

Yes, among other things, such as the control, contraint, etc, elimination of ambiguity when coding or reporting.

but does it also contain the exact table name which contains the specialized type?

No. That would be a little too physical to be placed in data. Disallowed on principle.

But it is not necessary.

Say I'm interested in the Product with ID = 1. It has a discriminant indicating that it is a ProductCPU. How would you go about retrieving this ProductCPU from your app code?

That will be easy if you take the provided model, and implement it (all the tables) as classes, correctly, etc. The example you request will not use Views (which are for lists, and more generic use). The pseudo-code would be:

given the ProductId (Subtype unknown, therefore your should not be sitting a a Subtype-specific window), load the Product supertype only
based on the Discriminator Product.ProductType, set indicators, etc, and load the applicable subtype, one of ProductCPU; ProductMemory; ProductDisk; ProductTape; etc.
I have seen (and do not agree with) OO methods that load all subtypes for the given ProductId at once: one subtype is valid; and the rest are invalid. The code still has to constrain itself to the valid class for the Product based on Product.ProductType.

Alternately, eg. where the context is, the user is sitting in a Subtype-specific window, eg. ProductCPU, with that class set up, and requests ProductId xxx. Then use the ProductCPU View. If it returns zero rows, it does not exist.

There may be a ProductDisk xxx, but not a ProductCPU xxx. How you handle that, whether you indicate there is a Product`xxx but it isn't a CPU, or not, that depends on the app requirements.

For lists, where the app fills in a grid, without regard to the ProductId, use the views (one each) to load each grid. That SQL is based on the join, and does not need to refer to ProductType.

Categories

Database schema which can support specialized properties

Database schema which can support specialized properties

Solutions

What I want

Please log in or register to add a comment.

Please log in or register to reply this article.

1 Reply

Response to Comments

Please log in or register to add a comment.

Just Browsing Browsing

Most popular tags