In MongoDB, a unique index ensures that a particular value in a field is not present in more than one document. It does not guarantee that a value is unique within an array inside a single document. This is explained in the MongoDB Manual where it discusses unique multikey indexes.
Thus, a unique index will not satisfy your requirement. It will prevent separate documents from containing duplicate combinations, but it will still allow a single document to contain duplicate values in its array.
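To illustrate (a minimal sketch, assuming an Employee collection and the technologyEmployeeRef array from your document), the insert below succeeds even though a unique multikey index is in place:
// Unique multikey index on the embedded technologyCd field
db.Employee.createIndex( { "technologyEmployeeRef.technologyCd" : 1 }, { unique : true } )

// The unique constraint only applies across documents, so this insert
// succeeds even though "Java" is repeated within the one document's array
db.Employee.insertOne( {
    "email" : "[email protected]",
    "technologyEmployeeRef" : [
        { "technologyCd" : "Java" },
        { "technologyCd" : "Java" }
    ]
} )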
The best option you have is to change your data model to split the array of technologyEmployeeRef objects into separate documents. Splitting them into separate documents allows you to use a unique index to enforce uniqueness.
The particular implementation to take for this data model change would depend upon your access patterns (which are out of the scope of this question).
One way this could be done is to create an EmployeeTechnology collection that has all of the fields that currently exist in the technologyEmployeeRef array. Additionally, this EmployeeTechnology collection would have a field, such as email, that would allow you to associate each of its documents with a document in the Employee collection.
Sample Employee Document
{
    ....
    ....
    "firstName" : "John",
    "lastName" : "Doe",
    "email" : "[email protected]",
    .....
    .....
    .....
}
Sample EmployeeTechnology Document
{
    "email" : "[email protected]",
    "technologyCd" : "Java",
    "technologyName" : "Java8",
    ....
    .....
    "status" : "A"
}
Index in EmployeeTechnology collection
db.EmployeeTechnology.createIndex( { "email" : 1, "technologyCd" : 1 }, { unique : true } )
The disadvantage of this approach is that you would need to read from two collections to have all of the data. This drawback may not be a big deal if you rarely need to retrieve the data from both collections at the same time. If you do need all the data, the reads can be sped up through the use of indexes, and further sped up through the use of covered queries.
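As a sketch, assuming the collection and field names used above, a covered query against EmployeeTechnology and a $lookup aggregation that pulls both collections together in one command could look like this:
// Covered query: it filters and projects only fields in the
// { email: 1, technologyCd: 1 } index, so no documents need to be fetched
db.EmployeeTechnology.find(
    { "email" : "[email protected]" },
    { "_id" : 0, "email" : 1, "technologyCd" : 1 }
)

// Joining both collections in a single aggregation via $lookup on the shared email field
db.Employee.aggregate( [
    { $match : { "email" : "[email protected]" } },
    { $lookup : {
        from : "EmployeeTechnology",
        localField : "email",
        foreignField : "email",
        as : "technologies"
    } }
] )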
Another option is to denormalize the data. You would do this by duplicating the Employee data that you need to access at the same time as the Technology data.
Sample Documents
[
  {
    ....
    "firstName" : "John",
    "lastName" : "Doe",
    "email" : "[email protected]",
    .....
    "technologyCd" : "Java",
    "technologyName" : "Java8",
    ....
    "status" : "A"
  },
  {
    ....
    "firstName" : "John",
    "lastName" : "Doe",
    "email" : "[email protected]",
    .....
    "technologyCd" : "Spring",
    "technologyName" : "Spring Boot2",
    ....
    "status" : "A"
  }
]
In this MongoDB blog post, they say that
You’d do this only for fields that are frequently read, get read much more often than they get updated, and where you don’t require strong consistency, since updating a denormalized value is slower, more expensive, and is not atomic.
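To illustrate that update cost, if an employee's last name changed, every denormalized copy would need to be rewritten. A sketch, assuming the denormalized documents live in a hypothetical EmployeeTechnology collection and the new name is just an example value:
// Every denormalized copy of the employee fields must be updated;
// this touches multiple documents, so it is slower and not atomic
db.EmployeeTechnology.updateMany(
    { "email" : "[email protected]" },
    { $set : { "lastName" : "Smith" } }
)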
Or as you've already mentioned, it may make sense to leave the data model as it is and to perform the check for uniqueness on the application side. This could likely give you the best read performance, but it does come with some disadvantages. First, it will slow down write operations because the application will need to run some checks before it can update the database.
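As a rough sketch of what that application-side check might look like (assuming the array field is named technologyEmployeeRef, as in your Employee document):
// Check whether this employee already has the technology
var alreadyExists = db.Employee.findOne( {
    "email" : "[email protected]",
    "technologyEmployeeRef.technologyCd" : "Java"
} )

// Only push the new entry if the check passed; the gap between the read
// above and the write below is where the race described next can occur
if ( alreadyExists === null ) {
    db.Employee.updateOne(
        { "email" : "[email protected]" },
        { $push : { "technologyEmployeeRef" : { "technologyCd" : "Java", "technologyName" : "Java8", "status" : "A" } } }
    )
}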
It may be unlikely, but there is also a possibility that you could still end up with duplicates. If there are two back-to-back requests to insert the same EmployeeTechnology object into the array, then the validation of the second request may finish (and pass) before the first request has written to the database. I have seen a similar scenario myself with an application I worked on. Even though the application was checking for uniqueness, if a user double-clicked a submit button there would end up being duplicate entries in the database. In this case, disabling the button on the first click drastically reduced the risk. This small risk may be tolerable, depending on your requirements and the impact of having duplicate entries.
Which approach makes the most sense largely depends on your access pattern and requirements. Hope this helps.