There are two concepts that are close but should not be confused: IDENTITY
and PRIMARY KEY
Every table (except for the rare conditions) should have a PRIMARY KEY
, that is a value or a set of values that uniquely identify a row.
See here for discussion why.
IDENTITY
is a property of a column in SQL Server
which means that the column will be filled automatically with incrementing values.
Due to the nature of this property, the values of this column are inherently UNIQUE
.
However, no UNIQUE
constraint or UNIQUE
index is automatically created on IDENTITY
column, and after issuing SET IDENTITY_INSERT ON
it's possible to insert duplicate values into an IDENTITY
column, unless it had been explicity UNIQUE
constrained.
The IDENTITY
column should not necessarily be a PRIMARY KEY
, but most often it's used to fill the surrogate PRIMARY KEY
s
It may or may not be useful in any particular case.
Therefore, the answer to your question:
The question: should every table in a database have an IDENTITY field that's used as the PK?
is this:
No. There are cases when a database table should NOT have an IDENTITY
field as a PRIMARY KEY
.
Three cases come into my mind when it's not the best idea to have an IDENTITY
as a PRIMARY KEY
:
- If your
PRIMARY KEY
is composite (like in many-to-many link tables)
- If your
PRIMARY KEY
is natural (like, a state code)
- If your
PRIMARY KEY
should be unique across databases (in this case you use GUID
/ UUID
/ NEWID
)
All these cases imply the following condition:
You shouldn't have IDENTITY
when you care for the values of your PRIMARY KEY
and explicitly insert them into your table.
Update:
Many-to-many link tables should have the pair of id
's to the table they link as the composite key.
It's a natural composite key which you already have to use (and make UNIQUE
), so there is no point to generate a surrogate key for this.
I don't see why would you want to reference a many-to-many
link table from any other table except the tables they link, but let's assume you have such a need.
In this case, you just reference the link table by the composite key.
This query:
CREATE TABLE a (id, data)
CREATE TABLE b (id, data)
CREATE TABLE ab (a_id, b_id, PRIMARY KEY (a_id, b_id))
CREATE TABLE business_rule (id, a_id, b_id, FOREIGN KEY (a_id, b_id) REFERENCES ab)
SELECT *
FROM business_rule br
JOIN a
ON a.id = br.a_id
is much more efficient than this one:
CREATE TABLE a (id, data)
CREATE TABLE b (id, data)
CREATE TABLE ab (id, a_id, b_id, PRIMARY KEY (id), UNIQUE KEY (a_id, b_id))
CREATE TABLE business_rule (id, ab_id, FOREIGN KEY (ab_id) REFERENCES ab)
SELECT *
FROM business_rule br
JOIN a_to_b ab
ON br.ab_id = ab.id
JOIN a
ON a.id = ab.a_id
, for obvious reasons.