Welcome to OGeek Q&A Community for programmer and developer-Open, Learning and Share
Welcome To Ask or Share your Answers For Others

ruby on rails - Long running delayed_job jobs stay locked after a restart on Heroku

When a Heroku worker is restarted (either on command or as the result of a deploy), Heroku sends SIGTERM to the worker process. In the case of delayed_job, the SIGTERM signal is caught and the worker stops executing once the current job (if any) has finished.

If the worker takes too long to finish, Heroku sends SIGKILL. In the case of delayed_job, this leaves a locked job in the database that won't get picked up by another worker.

I'd like to ensure that jobs eventually finish (unless there's an error). Given that, what's the best way to approach this?

I see two options, but I'd like to get other input:

  1. Modify delayed_job to stop working on the current job (and release the lock) when it receives a SIGTERM.
  2. Figure out a (programmatic) way to detect orphaned locked jobs and then unlock them.

Any thoughts?

1 Reply

Abort Job Cleanly on SIGTERM

A much better solution is now built into delayed_job. Use this setting to throw an exception on TERM signals by adding this in your initializer:

Delayed::Worker.raise_signal_exceptions = :term

With that setting, the job cleans up and exits before Heroku issues the final KILL signal it reserves for non-cooperating processes. The delayed_job documentation describes the option as follows:

You may need to raise exceptions on SIGTERM signals, Delayed::Worker.raise_signal_exceptions = :term will cause the worker to raise a SignalException causing the running job to abort and be unlocked, which makes the job available to other workers. The default for this option is false.

Possible values for raise_signal_exceptions are:

  • false - No exceptions will be raised (Default)
  • :term - Will only raise an exception on TERM signals but INT will wait for the current job to finish.
  • true - Will raise an exception on TERM and INT
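What this means for job code can be sketched in plain Ruby. The example below is a minimal illustration, not delayed_job's own implementation: `ExportJob` and its log entries are hypothetical, and the `SignalException` is raised by hand to stand in for what the worker does on TERM, so it runs without delayed_job installed.

```ruby
# Hypothetical job class; the name and the logged steps are illustrative only.
class ExportJob
  attr_reader :log

  def initialize
    @log = []
  end

  # The block stands in for the long-running work of a real job.
  def perform
    @log << :started
    yield if block_given?
    @log << :finished
  ensure
    # Runs even when the work is interrupted, so temporary state
    # (files, external locks, partial rows) can be released here.
    @log << :cleaned_up
  end
end

job = ExportJob.new
begin
  # Simulate the exception a worker would raise inside the running job.
  job.perform { raise SignalException, "TERM" }
rescue SignalException
  job.log << :aborted
end

p job.log
```

`SignalException` inherits from `Exception` rather than `StandardError`, so a bare `rescue => e` inside a job does not intercept it, while `rescue Exception` does. A job that rescues `Exception` without re-raising would therefore swallow the signal and keep running until the KILL arrives.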

Available since Version 3.0.5.

See this commit where it was introduced.
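For the second option in the question (detecting orphaned locks), the selection logic could look roughly like the sketch below. It assumes the `locked_at` and `locked_by` columns used by the ActiveRecord backend; `JobRow`, `stale_locks`, and `release!` are hypothetical names, and an in-memory Struct stands in for the database table so the example is self-contained.

```ruby
# Hypothetical in-memory stand-in for rows of the delayed_jobs table.
JobRow = Struct.new(:id, :locked_at, :locked_by)

# Returns the rows whose lock is older than max_age seconds.
def stale_locks(rows, now:, max_age:)
  rows.select { |row| row.locked_at && row.locked_at < now - max_age }
end

# Clears the lock fields so another worker could reserve the job.
def release!(rows)
  rows.each do |row|
    row.locked_at = nil
    row.locked_by = nil
  end
end

now  = Time.utc(2024, 1, 1, 12, 0, 0)
rows = [
  JobRow.new(1, now - 6 * 3600, "host:worker.1 pid:4"), # locked 6 hours ago
  JobRow.new(2, now - 60,       "host:worker.2 pid:9"), # locked 1 minute ago
  JobRow.new(3, nil,            nil)                    # not locked
]

stale = stale_locks(rows, now: now, max_age: 4 * 3600)
release!(stale)

p stale.map(&:id)
```

The four-hour threshold mirrors the default of `Delayed::Worker.max_run_time`. As far as I know, delayed_job already treats a lock older than `max_run_time` as expired when reserving jobs, so an orphaned job should eventually be retried even without a sweep like this; lowering `max_run_time` shortens that wait, at the cost of also capping how long any single job may run.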

