Welcome to OGeek Q&A Community for programmer and developer-Open, Learning and Share
Welcome To Ask or Share your Answers For Others

Categories

0 votes
224 views
in Technique[技术] by (71.8m points)

python - pymongo.errors.CursorNotFound: cursor id '...' not valid at server

I am trying to fetch some ids that exist in a mongo database with the following code:

client = MongoClient('xx.xx.xx.xx', xxx)
db = client.test_database
db = client['...']
collection = db.test_collection
collection = db["..."]


for cursor in collection.find({ "$and" : [{ "followers" : { "$gt" : 2000 } }, { "followers" : { "$lt" : 3000 } }, { "list_followers" : { "$exists" : False } }] }): 
    print cursor['screenname']
    print cursor['_id']['uid']
    id = cursor['_id']['uid']

However, after a short while, I am receive this error:

pymongo.errors.CursorNotFound: cursor id '...' not valid at server.

I found this article which refers to that problem. Nevertheless it is not clear to me which solution to take. Is it possible to use find().batch_size(30)? What exactly does the above command do? Can I take all the database ids using batch_size?

See Question&Answers more detail:os

与恶龙缠斗过久,自身亦成为恶龙;凝视深渊过久,深渊将回以凝视…
Welcome To Ask or Share your Answers For Others

1 Reply

0 votes
by (71.8m points)

You're getting this error because the cursor is timing out on the server (after 10 minutes of inactivity).

From the pymongo documentation:

Cursors in MongoDB can timeout on the server if they’ve been open for a long time without any operations being performed on them. This can lead to an CursorNotFound exception being raised when attempting to iterate the cursor.

When you call the collection.find method it queries a collection and it returns a cursor to the documents. To get the documents you iterate the cursor. When you iterate over the cursor the driver is actually making requests to the MongoDB server to fetch more data from the server. The amount of data returned in each request is set by the batch_size() method.

From the documentation:

Limits the number of documents returned in one batch. Each batch requires a round trip to the server. It can be adjusted to optimize performance and limit data transfer.

Setting the batch_size to a lower value will help you with the timeout errors?errors, but it will increase the number of times you're going to get access the MongoDB server to get all the documents.

The default batch size:

For most queries, the first batch returns 101 documents or just enough documents to exceed 1 megabyte. Batch size will not exceed the maximum BSON document size (16 MB).

There is no universal "right" batch size. You should test with different values and see what is the appropriate value for your use case i.e. how many documents can you process in a 10 minute window.

The last resort will be that you set no_cursor_timeout=True. But you need to be sure that the cursor is closed after you finish processing the data.

How to avoid it without try/except:

cursor = collection.find(
     {"x": 1},
     no_cursor_timeout=True
)
for doc in cursor:
    # do something with doc
cursor.close()

与恶龙缠斗过久,自身亦成为恶龙;凝视深渊过久,深渊将回以凝视…
OGeek|极客中国-欢迎来到极客的世界,一个免费开放的程序员编程交流平台!开放,进步,分享!让技术改变生活,让极客改变未来! Welcome to OGeek Q&A Community for programmer and developer-Open, Learning and Share
Click Here to Ask a Question

...