Welcome to OGeek Q&A Community for programmer and developer-Open, Learning and Share
Welcome To Ask or Share your Answers For Others


MongoDB: How to move a big chunk of data from one collection to another without interrupting concurrent querying?

I need to move a big chunk of data (about 100 MB) from FirstCollection to SecondCollection in MongoDB.

Both collections contain millions of other documents that should remain intact.

The SecondCollection already contains similar documents. These documents need to be removed.

Meanwhile, the SecondCollection is being actively queried by users. The scenario where a user queries the SecondCollection and receives no results or partially-replaced results is not acceptable.

How do I do that?

So far the $out aggregation stage seems like a good candidate, but there seems to be no way to delete data within the same operation before $out.

db.FirstCollection.aggregate([{ $match: {...} }, { $out: "SecondCollection" }])

The session/transaction approach seems to be designed for different scenarios, not for transferring this much data: the default transaction lifetime limit is 60 seconds, which is not enough. This approach also requires pulling the whole chunk of data from MongoDB into the Node.js app and then writing it back.

Here are some examples of the data in FirstCollection:

{
  _id: ..., // just a regular mongodb ObjectId, it's not important
  productName: "Product1",
  productId: "product_001", // persistent unique identifier
  category: "firstCategory", // only "firstCategory" products should be updated
  quantity: 10
  // and hundreds of other changing properties like quantity 
}
{
  _id: ...,
  productName: "Product2",
  productId: "product_002",
  category: "firstCategory",
  productQuantity: 20
  ...
}
{
  _id: ...,
  productName: "Product3",
  productId: "product_003",
  category: "firstCategory",
  productQuantity: 30
  ...
}

SecondCollection:

{
  _id: ...,
  productName: "Product1",
  productId: "product_001",
  category: "firstCategory",
  quantity: 11 // <= this will change to 10
  // and hundreds of other changing properties like quantity 
}
{
  _id: ...,
  productName: "Product2",
  productId: "product_002",
  category: "firstCategory",
  productQuantity: 20 // <= this will remain the same 
  ...
}
{
  _id: ...,
  productName: "Product4",
  productId: "product_004", // <= this whole document will be deleted, since there is no "product_004" in the FirstCollection.
  category: "firstCategory",
  productQuantity: 40
  ...
}

After the update, the SecondCollection should look exactly like the FirstCollection:

{
  _id: ...,
  productName: "Product1",
  productId: "product_001",
  category: "firstCategory",
  quantity: 10
  ...
}
{
  _id: ...,
  productName: "Product2",
  productId: "product_002",
  category: "firstCategory",
  productQuantity: 20
  ...
}
{
  _id: ...,
  productName: "Product3",
  productId: "product_003",
  category: "firstCategory",
  productQuantity: 30
  ...
}
Question from: https://stackoverflow.com/questions/65914565/mongodb-how-to-move-a-big-chunk-of-data-from-one-collection-to-another-without


1 Reply


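Adjust all queries so that they run inside transactions that use the snapshot read concern. A snapshot transaction reads from a single point in time, so each query sees the SecondCollection either entirely before or entirely after the update commits, never a partially replaced state.

Perform the update itself in another, separate transaction.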

https://github.com/p-mongo/tests/tree/master/query-tx-write
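
A minimal sketch of the two steps above with the Node.js driver, assuming a connected MongoClient named client and a deployment that supports transactions (a replica set or sharded cluster, not a standalone server). The function names, the dbName parameter, and the withoutId helper are hypothetical and are not taken from the answer or the linked repository. Note that this writer still pulls the matching documents into the application, and a batch of this size may need the server's transactionLifetimeLimitSeconds raised above its 60-second default.

```javascript
// Transaction options shared by readers and the writer.
// The "snapshot" read concern gives each transaction one point-in-time view.
const SNAPSHOT_TXN_OPTIONS = {
  readConcern: { level: "snapshot" },
  writeConcern: { w: "majority" },
};

// Filter for the documents being replaced (taken from the question's sample data).
const CATEGORY_FILTER = { category: "firstCategory" };

// Hypothetical helper: drop _id so the copies get fresh ObjectIds on insert.
// The question states that _id is not important.
function withoutId(doc) {
  const { _id, ...rest } = doc;
  return rest;
}

// Reader: every user query against SecondCollection runs in a snapshot transaction.
async function findInSnapshot(client, dbName, filter) {
  const session = client.startSession();
  try {
    let docs = [];
    await session.withTransaction(async () => {
      docs = await client
        .db(dbName)
        .collection("SecondCollection")
        .find(filter, { session })
        .toArray();
    }, SNAPSHOT_TXN_OPTIONS);
    return docs;
  } finally {
    await session.endSession();
  }
}

// Writer: delete the old documents and insert the new ones in one transaction,
// so the replacement becomes visible to readers only when it commits.
async function replaceCategory(client, dbName) {
  const session = client.startSession();
  try {
    await session.withTransaction(async () => {
      const db = client.db(dbName);
      const fresh = await db
        .collection("FirstCollection")
        .find(CATEGORY_FILTER, { session })
        .toArray();
      const second = db.collection("SecondCollection");
      await second.deleteMany(CATEGORY_FILTER, { session });
      // insertMany rejects an empty array, so skip it when nothing matched.
      if (fresh.length > 0) {
        await second.insertMany(fresh.map(withoutId), { session });
      }
    }, SNAPSHOT_TXN_OPTIONS);
  } finally {
    await session.endSession();
  }
}
```

With these in place, application code would call findInSnapshot(client, dbName, filter) wherever it currently queries the SecondCollection directly, and replaceCategory(client, dbName) whenever the data needs to be refreshed.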

