Suppose we have a songs table, and each song can have any number of labels (0, 1, 2, 3 or more).
Two approaches to storing this information in a database are:
- Two tables: a songs table and a categories table, where each row in the categories table has a song_id and a category (where category is "Rock", "Country", "Metal", etc.). If a song belongs to multiple categories, there are multiple rows with that song_id in the categories table.
- Three tables: a songs table, a songscategories table, and a categories table. The songscategories table has just two columns, song_id and category_id, and the categories table also has just two columns, category_id and category_name.
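To make the two layouts concrete, here is a minimal sketch using Python's built-in sqlite3 module. The title column and the categories_v1 table name are assumptions for illustration only (the suffix just lets both approaches sit side by side in one in-memory database); the other table and column names come from the description above.

```python
import sqlite3

# In-memory database, so nothing is written to disk.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE songs (
    song_id INTEGER PRIMARY KEY,
    title   TEXT NOT NULL            -- assumed column, not in the question
);

-- Approach 1: the category name is repeated on every row.
-- (categories_v1 is a hypothetical name for approach 1's categories table.)
CREATE TABLE categories_v1 (
    song_id  INTEGER NOT NULL REFERENCES songs(song_id),
    category TEXT NOT NULL,
    PRIMARY KEY (song_id, category)
);

-- Approach 2: each category name is stored once and referenced by id.
CREATE TABLE categories (
    category_id   INTEGER PRIMARY KEY,
    category_name TEXT NOT NULL UNIQUE
);
CREATE TABLE songscategories (
    song_id     INTEGER NOT NULL REFERENCES songs(song_id),
    category_id INTEGER NOT NULL REFERENCES categories(category_id),
    PRIMARY KEY (song_id, category_id)
);
""")

# List the tables that were created.
table_names = [row[0] for row in conn.execute(
    "SELECT name FROM sqlite_master WHERE type = 'table' ORDER BY name")]
print(table_names)
```

The composite primary keys are a design choice of this sketch, not part of the question; they simply stop the same label from being attached to a song twice.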
The goal is to avoid future problems that could arise from failing to think carefully about the best schema now.
What I know so far:
- The first approach uses fewer tables and is therefore simpler
- The first approach could require more storage, since each category name is stored many times (rather than just once, as with the second approach). If more information is stored for each category, it will have to go in extra columns in the categories table, meaning even more duplicated information.
- The second approach requires more joins to retrieve a song and its category (2 joins rather than 1), so it could be slower
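The join difference in the last point can be sketched as follows, again with Python's sqlite3 module. The title column, the sample rows, and the categories_v1 table name (standing in for the first approach's categories table) are assumptions made up for this example.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE songs (song_id INTEGER PRIMARY KEY, title TEXT NOT NULL);

-- Approach 1 (hypothetical table name categories_v1)
CREATE TABLE categories_v1 (song_id INTEGER, category TEXT);

-- Approach 2
CREATE TABLE categories (category_id INTEGER PRIMARY KEY, category_name TEXT UNIQUE);
CREATE TABLE songscategories (song_id INTEGER, category_id INTEGER);

-- Made-up sample data: one song with two labels, stored both ways.
INSERT INTO songs VALUES (1, 'Example Song');
INSERT INTO categories_v1 VALUES (1, 'Rock'), (1, 'Metal');
INSERT INTO categories VALUES (1, 'Rock'), (2, 'Metal');
INSERT INTO songscategories VALUES (1, 1), (1, 2);
""")

# Approach 1: a single join from songs to the label rows.
one_join = conn.execute("""
    SELECT s.title, c.category
    FROM songs AS s
    JOIN categories_v1 AS c ON c.song_id = s.song_id
    WHERE s.song_id = ?
    ORDER BY c.category
""", (1,)).fetchall()

# Approach 2: two joins, going through the songscategories link table.
two_joins = conn.execute("""
    SELECT s.title, c.category_name
    FROM songs AS s
    JOIN songscategories AS sc ON sc.song_id = s.song_id
    JOIN categories AS c ON c.category_id = sc.category_id
    WHERE s.song_id = ?
    ORDER BY c.category_name
""", (1,)).fetchall()

print(one_join)
print(two_joins)
```

Both queries return the same rows here; whether the extra join is measurably slower depends on the database engine, the indexes, and the data size, which this sketch does not measure.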
So the question is: should we optimise for fewer tables and joins, or for consuming less storage space? What do other applications do in this situation, and are there considerations I haven't noted above?
question from:
https://stackoverflow.com/questions/65641491/best-practice-database-schema-for-multi-label-on-a-resource