It is generally good to store multiple entity types per collection. Whether to store multiple entity types within a single document takes a bit more thought.
As David mentioned, how to model data is a bit subjective.
Storing Multiple Entity Types within a Collection
First... let's talk about storing multiple entities in a collection. DocumentDB collections are not tables. Collections do not enforce schema; in other words, you can store different types of documents with different schemas in the same collection. You can track different types of entities simply by adding a type attribute to your document.
You should think of Collections as a unit of partitioning and a boundary for the execution of queries and transactions. Thus, a huge perk of storing different entity types within the same collection is that you get transaction support right out of the box via stored procedures (sprocs).
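For example, here's a minimal sketch of a sproc (sprocs are written in server-side JavaScript) that writes a Company document and an Employee document to the same collection in one transaction; the sproc name and document shapes are just for illustration. If either write fails, both are rolled back:

function createCompanyWithEmployee(company, employee) {
    var collection = getContext().getCollection();

    // First write: the Company document.
    var accepted = collection.createDocument(collection.getSelfLink(), company, function (err) {
        if (err) throw err; // throwing aborts the sproc and rolls back both writes

        // Second write: the Employee document, in the same transaction.
        collection.createDocument(collection.getSelfLink(), employee, function (err2) {
            if (err2) throw err2;
            getContext().getResponse().setBody("both documents created");
        });
    });

    if (!accepted) throw new Error("First createDocument was not accepted; try again.");
}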
Storing Multiple Entity Types within a Document
Whether you store multiple entity types within a single document takes a bit more thought. This is commonly referred to as de-normalizing (capturing relationships between data by embedding data in a single document) and normalizing (capturing relationships between data by creating weak links to other documents) your data.
Typically de-normalizing provides better read performance.
The application may need to issue fewer queries and updates to complete common operations.
In general, use de-normalized data models when:
- there are “contains” relationships between entities
- there are one-to-few relationships between entities
- the de-normalized data changes infrequently
- the de-normalized data won’t grow without bound
- the de-normalized data is integral to data in the document
Example of a de-normalized data model:
{
    "Id": 1001,
    "Type": "Company",
    "Industry": "Software",
    "Employees": [
        {
            "Id": 10011,
            "Type": "Employee",
            "Name": "John Doe"
        },
        {
            "Id": 10012,
            "Type": "Employee",
            "Name": "Jane Doe"
        }
    ]
}
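With everything embedded, a single query brings back the company and its employees in one round trip. Here's a rough sketch using the Node.js "documentdb" SDK (the account endpoint, key, and collection link are placeholders):

var DocumentClient = require("documentdb").DocumentClient;

var client = new DocumentClient("https://<your-account>.documents.azure.com", { masterKey: "<your-key>" });
var collectionLink = "dbs/<your-db>/colls/<your-collection>";

// One round trip: the embedded Employees come back inside the Company document.
client.queryDocuments(collectionLink, "SELECT * FROM c WHERE c.Id = 1001")
    .toArray(function (err, results) {
        if (err) throw err;
        var company = results[0];
        console.log(company.Employees); // John Doe, Jane Doe
    });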
Typically normalizing provides better write performance.
It also provides more flexibility than de-normalizing.
However, client-side applications must issue follow-up queries to resolve the references. In other words, normalized data models can require more round trips to the server (sketched after the example below).
In general, use normalized data models when:
- de-normalizing would result in duplication of data, but would not provide sufficient read performance advantages to outweigh the implications of the duplication
- representing one-to-many relationships
- representing many-to-many relationships
- related data changes frequently
Example of a normalized data model:
{
    "Id": 1001,
    "Type": "Company",
    "Industry": "Software"
}

{
    "Id": 10011,
    "Type": "Employee",
    "Name": "John Doe",
    "CompanyId": 1001
}

{
    "Id": 10012,
    "Type": "Employee",
    "Name": "Jane Doe",
    "CompanyId": 1001
}
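Here the client has to resolve the relationship itself, so reading a company together with its employees takes two round trips instead of one. A rough sketch, reusing the client and collectionLink placeholders from the sketch above:

// First query: fetch the company.
client.queryDocuments(collectionLink, "SELECT * FROM c WHERE c.Type = 'Company' AND c.Id = 1001")
    .toArray(function (err, companies) {
        if (err) throw err;
        var company = companies[0];

        // Follow-up query: fetch the employees that point back at the company.
        client.queryDocuments(collectionLink, {
            query: "SELECT * FROM c WHERE c.Type = 'Employee' AND c.CompanyId = @companyId",
            parameters: [{ name: "@companyId", value: company.Id }]
        }).toArray(function (err2, employees) {
            if (err2) throw err2;
            console.log(company, employees);
        });
    });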
Hybrid Approaches
Choosing between normalizing and de-normalizing doesn't have to be a black and white choice. I've often found that a winning design pattern is a hybrid approach, in which you may choose to normalize a partial set of an object's fields and de-normalize the others.
In other words, you could choose to de-normalize frequently read, stable (or immutable) properties to reduce the need for follow-up queries, while normalizing frequently written / mutating fields to reduce the need to fan out writes (there's a sketch of such a fan-out after the example below).
Example of a hybrid approach:
// Author documents:
[{
    "id": 1,
    "firstName": "Thomas",
    "lastName": "Andersen",
    "countOfBooks": 3,
    "books": [1, 2, 3],
    "images": [{
        "thumbnail": "http://....png"
    }, {
        "profile": "http://....png"
    }, {
        "large": "http://....png"
    }]
}, {
    "id": 2,
    "firstName": "William",
    "lastName": "Wakefield",
    "countOfBooks": 3,
    "books": [1, 4, 5],
    "images": [{
        "thumbnail": "http://....png"
    }]
}]
// Book documents:
[{
    "id": 1,
    "name": "DocumentDB 101",
    "authors": [{
        "id": 1,
        "name": "Thomas Andersen",
        "thumbnailUrl": "http://....png"
    }, {
        "id": 2,
        "name": "William Wakefield",
        "thumbnailUrl": "http://....png"
    }]
}, {
    "id": 2,
    "name": "DocumentDB for RDBMS Users",
    "authors": [{
        "id": 1,
        "name": "Thomas Andersen",
        "thumbnailUrl": "http://....png"
    }]
}]
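The flip side of de-normalizing the author's name and thumbnail into each book is that a write to one of those fields has to fan out. Here's a rough sketch of what renaming author 1 would involve, again reusing the client and collectionLink placeholders from above (the new name is made up for illustration):

// Find every book that embeds author 1, then rewrite the embedded copy of the name.
client.queryDocuments(collectionLink, {
    query: "SELECT VALUE b FROM b JOIN a IN b.authors WHERE a.id = @authorId",
    parameters: [{ name: "@authorId", value: 1 }]
}).toArray(function (err, books) {
    if (err) throw err;
    books.forEach(function (book) {
        book.authors.forEach(function (author) {
            if (author.id === 1) { author.name = "Thomas B. Andersen"; } // made-up new name
        });
        // One replace per affected book document -- this is the write fan-out.
        client.replaceDocument(book._self, book, function (err2) {
            if (err2) throw err2;
        });
    });
});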