Welcome to OGeek Q&A Community for programmer and developer-Open, Learning and Share
Welcome To Ask or Share your Answers For Others


multithreading - Atomic UPDATE .. SELECT in Postgres

I'm building a queuing mechanism of sorts. There are rows of data that need processing, and a status flag. I'm using an UPDATE .. RETURNING statement to manage it:

UPDATE stuff
SET computed = 'working'
WHERE id = (SELECT id FROM stuff WHERE computed IS NULL LIMIT 1)
RETURNING *

Is the nested SELECT covered by the same lock as the UPDATE, or do I have a race condition here? If so, does the inner SELECT need to be a SELECT ... FOR UPDATE?


1 Reply


While Erwin's suggestion is possibly the simplest way to get correct behavior (as long as you retry the transaction whenever it fails with SQLSTATE 40001), queuing applications by their nature tend to work better when requests block and wait for their turn at the queue. The PostgreSQL implementation of SERIALIZABLE transactions takes the opposite approach: it allows higher concurrency and is somewhat more "optimistic" about the chances of collision.

As it stands, the example query in the question, run at the default READ COMMITTED transaction isolation level, would allow two (or more) concurrent connections to "claim" the same row from the queue. What will happen is this:

  • T1 starts and gets as far as locking the row in the UPDATE phase.
  • T2 overlaps T1 in execution time and attempts to update that row. It blocks pending the COMMIT or ROLLBACK of T1.
  • T1 commits, having successfully "claimed" the row.
  • T2 tries to update the row, finds that T1 already has, looks for the new version of the row, finds that it still satisfies the selection criteria (which is just that id matches), and also "claims" the row.
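
The interleaving above can be reproduced by hand in two psql sessions. The following is only a sketch: it assumes the table stuff from the question holds at least one row with computed IS NULL, and that both subqueries happen to pick the same row (likely, but not guaranteed, without an ORDER BY).

```sql
-- Session 1 (T1)
BEGIN;
UPDATE stuff
SET computed = 'working'
WHERE id = (SELECT id FROM stuff WHERE computed IS NULL LIMIT 1)
RETURNING *;   -- claims a row and keeps its row lock until COMMIT

-- Session 2 (T2), started while session 1 is still open
BEGIN;
UPDATE stuff
SET computed = 'working'
WHERE id = (SELECT id FROM stuff WHERE computed IS NULL LIMIT 1)
RETURNING *;   -- the subquery has already produced the same id; the UPDATE now blocks

-- Session 1 (T1)
COMMIT;

-- Session 2 (T2) unblocks here. The outer condition is only "id = <value>",
-- which the new row version still satisfies, so T2 claims the row as well.
COMMIT;
```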

The query can be modified to work correctly (if you are using a version of PostgreSQL which allows the FOR UPDATE clause in a subquery). Just add FOR UPDATE to the end of the subquery which selects the id, and this will happen:

  • T1 starts and now locks the row before selecting the id.
  • T2 overlaps T1 in execution time and blocks while trying to select an id, pending the COMMIT or ROLLBACK of T1.
  • T1 commits, having successfully "claimed" the row.
  • By the time T2 is able to read the row to see the id, it sees that it has been claimed, so it finds the next available id.
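
Written out, the modification described above might look like this. It is the statement from the question with FOR UPDATE appended to the subquery; nothing else is changed.

```sql
UPDATE stuff
SET computed = 'working'
WHERE id = (
    SELECT id
    FROM stuff
    WHERE computed IS NULL
    LIMIT 1
    FOR UPDATE   -- lock the candidate row before its id is handed to the UPDATE
)
RETURNING *;
```

Because the row is locked while the id is being selected, a concurrent caller waits inside the subquery rather than in the outer UPDATE, and re-checks computed IS NULL once the lock is released.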

At the REPEATABLE READ or SERIALIZABLE transaction isolation level, the write conflict would throw an error, which you could catch and determine was a serialization failure based on the SQLSTATE, and retry.
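
A rough sketch of what that looks like at the SQL level, using the unmodified query from the question. The retry itself has to be driven by the client application; it is shown here only as comments.

```sql
BEGIN ISOLATION LEVEL REPEATABLE READ;

UPDATE stuff
SET computed = 'working'
WHERE id = (SELECT id FROM stuff WHERE computed IS NULL LIMIT 1)
RETURNING *;
-- If a concurrent transaction updated the same row and committed first,
-- this statement fails with SQLSTATE 40001 (serialization_failure)
-- instead of silently claiming the row a second time.

COMMIT;

-- On SQLSTATE 40001: issue ROLLBACK, then rerun the whole transaction
-- starting again from BEGIN.
```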

If you generally want SERIALIZABLE transactions but you want to avoid retries in the queuing area, you might be able to accomplish that by using an advisory lock.
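
One way this might be arranged is sketched below. The key 42 is an arbitrary, hypothetical value that all queue consumers would have to agree on. The session-level lock is taken before BEGIN on the assumption that the transaction's snapshot should only be taken once the lock is already held.

```sql
-- Run outside any transaction block; blocks until no other session holds key 42.
SELECT pg_advisory_lock(42);

BEGIN ISOLATION LEVEL SERIALIZABLE;

UPDATE stuff
SET computed = 'working'
WHERE id = (SELECT id FROM stuff WHERE computed IS NULL LIMIT 1)
RETURNING *;

COMMIT;

-- Release the lock only after COMMIT, so the next consumer sees the claimed row.
SELECT pg_advisory_unlock(42);
```

With this arrangement, consumers take turns at the queue by blocking on the advisory lock, while the rest of the application can keep using SERIALIZABLE transactions.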



...