In SQL and Relational Theory (C.J. Date, 2009) chapter 4 advocates avoiding duplicate rows, and also to avoid NULL
attributes in the data we store. While I have no troubles avoiding duplicate rows, I am struggling to see how I can model data without making use of NULL
. Take the following, for example - which is a bit from work.
We have an artist
table, which has, amongst other columns, a gender
column. This is a foreign key to the gender
table. However, for some artists, we don't know their gender - for example we've been given a list of new music which has no descriptions of the artist. How, without using NULL
is one meant to represent this data? The only solution I see is to add a new gender, "unknown", to the gender
table.
While I am thoroughly enjoying this book, I was really disappointed when the chapter concluded with:
Of course, if nulls are prohibited, then missing information will have to be handled by some other means. Unfortunately, those other means are much too complex to be discussed in detail here.
Which is a real shame - because this was the solution I was waiting to read about! There is a reference to read the appendix which has lots of publications to read, but I was hoping for a little bit more of a down to earth summary before I dived into reading these.
I'm getting a few people commenting that they don't understand why I wish to avoid 'NULL' so I will quote the book again. Take the following query:
SELECT s.sno, p.pno
FROM s, p
WHERE s.city <> p.city
OR p.city <> 'Paris'
Now, take the example that s.city is London, and p.city is Paris. In this case, London <> Paris, so the query is true. Now take the case that p.city is not Paris, and is infact xyz. In this case, (London <> xyz) OR (xyz <> Paris) is also True. So, given any data - this query is true. However, if xyz is 'NULL' the scenario changes. In this case both of these expressions are neither True nor False, they are in fact, Unknown. And in this case because the result is unknown you will not get any rows returned.
The move from 2 value logic to 3 value logic can easily introduce bugs like this. Infact, I just introduced one at work which motivated this very post. I wanted all rows where the type != 0
However, this actually ends up matching type == 0 OR type IS NULL
- confusing behavior.
Whether or not I model my data with or without NULL
in the future is unclear, but I'm very curious what the other solutions are. (I too have always been of the argument that if you don't know, you should use NULL
).
See Question&Answers more detail:
os