By default all "numbers" are stored as "double" in MongoDB unless generally cast overwise.
Take the following samples:
db.sample.insert({ "a": 1 })
db.sample.insert({ "a": NumberLong(1) })
db.sample.insert({ "a": NumberInt(1) })
db.sample.insert({ "a": 1.223 })
This yields a collection like this:
{ "_id" : ObjectId("559bb1b4a23c8a3da73e0f76"), "a" : 1 }
{ "_id" : ObjectId("559bb1bba23c8a3da73e0f77"), "a" : NumberLong(1) }
{ "_id" : ObjectId("559bb29aa23c8a3da73e0f79"), "a" : 1 }
{ "_id" : ObjectId("559bb30fa23c8a3da73e0f7a"), "a" : 1.223 }
Despite the different constructor functions note how several of the data points there look much the same. The MongoDB shell itself doesn't always clearly distinquish between them, but there is a way you can tell.
There is of course the $type
query operator, which allows selection of BSON Types.
So testing this with Type 1 - Which is "double":
> db.sample.find({ "a": { "$type": 1 } })
{ "_id" : ObjectId("559bb1b4a23c8a3da73e0f76"), "a" : 1 }
{ "_id" : ObjectId("559bb30fa23c8a3da73e0f7a"), "a" : 1.223 }
You see that both the first insert and the last are selected, but of course not the other two.
So now test for BSON Type 16 - which is a 32-bit integer
> db.sample.find({ "a": { "$type": 16 } })
{ "_id" : ObjectId("559bb29aa23c8a3da73e0f79"), "a" : 1 }
That was the "third" insertion which used the NumberInt()
function in the shell. So that function and other serialization from your driver can set this specific BSON type.
And for the BSON Type 18 - which is 64-bit integer
> db.sample.find({ "a": { "$type": 18 } })
{ "_id" : ObjectId("559bb1bba23c8a3da73e0f77"), "a" : NumberLong(1) }
The "second" insertion which was contructed via NumberLong()
.
If you wanted to "weed out" things that were "not a double" then you would do:
db.sample.find({ "$or": [{ "a": { "$type": 16 } },{ "a": { "$type": 18 } }]})
Which are the only other valid numeric types other than "double" itself.
So to "convert" these in your collection, you can "Bulk" process like this:
var bulk = db.sample.initializeUnorderedBulkOp(),
count = 0;
db.sample.find({
"$or": [
{ "a": { "$type": 16 } },
{ "a": { "$type": 18 } }
]
}).forEach(function(doc) {
bulk.find({ "_id": doc._id })
.updateOne({
"$set": { "b": doc.a.valueOf() } ,
"$unset": { "a": 1 }
});
bulk.find({ "_id": doc._id })
.updateOne({ "$rename": { "b": "a" } });
count++;
if ( count % 1000 == 0 ) {
bulk.execute()
bulk = db.sample.initializeUnOrderedBulkOp();
}
})
if ( count % 1000 != 0 ) bulk.execute();
What that does is performed in three steps "in bulk":
- Re-cast the value to a new field as a "double"
- Remove the old field with the unwanted type
- Rename the new field to the old field name
This is necessary since the BSON type information is "sticky" to the field element once created. So in order to "re-cast" you need to completely remove the old data which includes the original field assignment.
So that should explain how to "detect" and also "re-cast" unwanted types in your documents.