Update 2009.04.24
The main point of my question is not developer confusion and what to do about it.
The point is to understand when delimited values are the right solution.
I've seen delimited data used in commercial product databases (Ektron lol).
SQL Server even has an XML datatype, so that could be used for the same purpose as delimited fields.
/end Update
The application I'm designing has some many-to-many relationships. In the past, I've often used associative tables to represent these in the database. This has caused some confusion to the developers.
Here's an example DB structure:
Document
---------------
ID (PK)
Title
CategoryIDs (varchar(4000))
Category
------------
ID (PK)
Title
There is a many-to-many relationship between Document and Category.
In this implementation, Document.CategoryIDs is a big pipe-delimited list of CategoryIDs.
To me, this is bad because it requires use of substring matching in queries -- which cannot make use of indexes. I think this will be slow and will not scale.
With that model, to get all Documents for a Category, you would need something like the following:
select * from documents where categoryids like '%|' + @targetCategoryId + '|%'
My solution is to create an associative table as follows:
Document_Category
-------------------------------
DocumentID (PK)
CategoryID (PK)
This is confusing to the developers. Is there some elegant alternate solution that I'm missing?
I'm assuming there will be thousands of rows in Document. Category may be like 40 rows or so. The primary concern is query performance. Am I over-engineering this?
Is there a case where it's preferred to store lists of IDs in database columns rather than pushing the data out to an associative table?
Consider also that we may need to create many-to-many relationships among documents. This would suggest an associative table Document_Document. Is that the preferred design or is it better to store the associated Document IDs in a single column?
Thanks.
See Question&Answers more detail:
os