ruby on rails - Long running delayed_job jobs stay locked after a restart on Heroku

Question


posted Oct 17, 2021 in Technique by 深蓝 (71.8m points)

When a Heroku worker is restarted (either on command or as the result of a deploy), Heroku sends SIGTERM to the worker process. In the case of delayed_job, the SIGTERM signal is caught and then the worker stops executing after the current job (if any) has stopped.

If the worker takes too long to finish, Heroku sends SIGKILL. In the case of delayed_job, this leaves a locked job in the database that won't get picked up by another worker.

I'd like to ensure that jobs eventually finish (unless there's an error). Given that, what's the best way to approach this?

I see two options, but I'd like to get other input:

Modify delayed_job to stop working on the current job (and release the lock) when it receives a SIGTERM.
Figure out a (programmatic) way to detect orphaned locked jobs and then unlock them.

Any thoughts?

1 Reply

深蓝 · Answer 1 · 2021-10-17T00:20:53+0000

Abort Job Cleanly on SIGTERM

A much better solution is now built into delayed_job. To raise an exception on TERM signals, add this setting to an initializer:

Delayed::Worker.raise_signal_exceptions = :term

With that setting, the job will properly clean up and exit before Heroku issues the final KILL signal it reserves for non-cooperating processes:

You may need to raise exceptions on SIGTERM signals. Delayed::Worker.raise_signal_exceptions = :term will cause the worker to raise a SignalException, causing the running job to abort and be unlocked, which makes the job available to other workers. The default for this option is false.

Possible values for raise_signal_exceptions are:

false - No exceptions will be raised (Default)
:term - Will only raise an exception on TERM signals, but INT will wait for the current job to finish.
true - Will raise an exception on TERM and INT
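
Because the job is aborted by an exception, any cleanup it needs should live in an ensure block. SignalException is not a StandardError, so a bare rescue will not see it. Here is a minimal sketch of that behaviour in plain Ruby. ExportJob and its cleaned_up flag are hypothetical names used only for illustration, and the exception is raised by hand to stand in for what the worker would do on TERM:

```ruby
# Hypothetical job class; the name and the cleanup step are illustrative only.
class ExportJob
  attr_reader :cleaned_up

  def initialize(work)
    @work = work        # callable standing in for the long-running work
    @cleaned_up = false
  end

  # delayed_job calls #perform on the job object.
  def perform
    @work.call
  ensure
    # ensure runs for every exception class, including SignalException,
    # so temp files, connections, etc. can be released here.
    @cleaned_up = true
  end
end

# Simulate the TERM exception arriving in the middle of the job.
job = ExportJob.new(-> { raise SignalException, "TERM" })
begin
  job.perform
rescue SignalException
  # A real worker handles this itself; here it is swallowed for the demo.
end

# A bare rescue only catches StandardError, so it misses SignalException.
caught_by_bare_rescue =
  begin
    raise SignalException, "TERM"
  rescue => e
    true
  rescue SignalException
    false
  end
```

The practical consequence is that a job which wraps its work in a bare rescue and carries on will still be interrupted, while its ensure blocks run as usual.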

Available since Version 3.0.5.

See this commit where it was introduced.
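
For the second option in the question (detecting orphaned locks), the idea can be sketched in plain Ruby. The locked_at and locked_by fields mirror the columns delayed_job uses for locking. JobRow, stale_locked, unlock! and the 4-hour threshold are assumptions made for this example, not part of delayed_job. In a real app the same filter would be a query against the jobs table. I believe workers already treat locks older than Delayed::Worker.max_run_time as expired, so it is worth checking that setting before adding a separate sweep.

```ruby
# Plain-Ruby stand-in for a row in the jobs table (hypothetical type).
JobRow = Struct.new(:id, :locked_at, :locked_by)

# Return the rows whose lock is older than max_lock_seconds.
# Rows with no lock (locked_at is nil) are never considered stale.
def stale_locked(rows, now:, max_lock_seconds:)
  rows.select { |r| r.locked_at && (now - r.locked_at) > max_lock_seconds }
end

# Clear the lock fields so another worker can pick the job up.
def unlock!(rows)
  rows.each do |r|
    r.locked_at = nil
    r.locked_by = nil
  end
end

now = Time.utc(2021, 10, 17, 12, 0, 0)
rows = [
  JobRow.new(1, now - 6 * 3600, "host:worker.1 pid:4"), # locked 6 hours ago
  JobRow.new(2, now - 60,       "host:worker.2 pid:9"), # locked 1 minute ago
  JobRow.new(3, nil,            nil)                    # not locked
]

stale = stale_locked(rows, now: now, max_lock_seconds: 4 * 3600)
stale_ids = stale.map(&:id)
unlock!(stale)
```

The threshold has to be longer than the longest legitimate job. Otherwise a job that is still running would be unlocked and started a second time.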
