I have a Django 2 application deployed on AWS Elastic Beanstalk, and I'm trying to configure Celery to execute async tasks on the same machine.
My files:
02_packages.config
files:
  "/usr/local/share/pycurl-7.43.0.tar.gz" :
    mode: "000644"
    owner: root
    group: root
    source: https://pypi.python.org/packages/source/p/pycurl/pycurl-7.43.0.tar.gz
packages:
  yum:
    python34-devel: []
    libcurl-devel: []
commands:
  01_download_pip3:
    # run this before PIP installs requirements as it needs to be compiled with OpenSSL
    command: 'curl -O https://bootstrap.pypa.io/get-pip.py'
  02_install_pip3:
    # run this before PIP installs requirements as it needs to be compiled with OpenSSL
    command: 'python3 get-pip.py'
container_commands:
  03_pycurl_reinstall:
    # run this before PIP installs requirements as it needs to be compiled with OpenSSL
    # the upgrade option is because it will run after PIP installs the requirements.txt file.
    # and it needs to be done with the virtual-env activated
    command: 'source /opt/python/run/venv/bin/activate && pip3 install /usr/local/share/pycurl-7.43.0.tar.gz --global-option="--with-nss" --upgrade'
03_django.config
container_commands:
  01_migrate_db:
    command: "django-admin.py migrate --noinput"
    leader_only: true
  02_createsu: # custom django-admin command to create the "admin" superuser
    command: "source /opt/python/run/venv/bin/activate && python manage.py createsu"
    leader_only: true
  03_update_permissions: # custom django-admin command to update user perms
    command: "source /opt/python/run/venv/bin/activate && python manage.py update_permissions"
    leader_only: true
  04_collectstatic:
    command: "django-admin.py collectstatic --noinput"
  05_pip_upgrade:
    command: /opt/python/run/venv/bin/pip install --upgrade pip
    ignoreErrors: false
option_settings:
  aws:elasticbeanstalk:application:environment:
    DJANGO_SETTINGS_MODULE: "my_proj.settings_prod"
    APP_ENV: "test"
    PYCURL_SSL_LIBRARY: "nss"
  aws:elasticbeanstalk:container:python:
    WSGIPath: myproj/wsgi.py
    NumProcesses: 3
    NumThreads: 20
  aws:elasticbeanstalk:container:python:staticfiles:
    "/static/": "static/"
requirements.txt
boto3==1.6.3
botocore==1.9.3
Django==2.0.3
django-cors-headers==2.2.0
django-filter==1.1.0
django-storages==1.6.5
djangorestframework==3.7.7
djangorestframework-jwt==1.11.0
docutils==0.14
jmespath==0.9.3
Markdown==2.6.11
olefile==0.44
Pillow==5.0.0
psycopg2==2.7.3.2
PyJWT==1.5.3
python-dateutil==2.6.1
pytz==2018.3
reportlab==3.4.0
s3transfer==0.1.13
six==1.11.0
Wand==0.4.4
uwsgi==2.0.17 # WSGI for production deployment
gevent==1.2.2 # Non-blocking Python network library, required by uWSGI
celery==4.1.0
django_celery_beat==1.1.1
django_celery_results==1.0.1
celery_conf/config.py
AWS_ACCESS_KEY_ID = ...
AWS_SECRET_ACCESS_KEY = ...
CELERY_BROKER_TRANSPORT = 'sqs'
CELERY_BROKER_URL = 'sqs://' # 'sqs://%s:%s@' % (AWS_ACCESS_KEY_ID, AWS_SECRET_ACCESS_KEY)
CELERY_BROKER_USER = AWS_ACCESS_KEY_ID
CELERY_BROKER_PASSWORD = AWS_SECRET_ACCESS_KEY
CELERY_WORKER_STATE_DB = '/var/run/celery/worker.db'
CELERY_BEAT_SCHEDULER = 'django_celery_beat.schedulers:DatabaseScheduler'
CELERY_WORKER_PREFETCH_MULTIPLIER = 0 # See https://github.com/celery/celery/issues/3712
CELERY_ACCEPT_CONTENT = ['application/json']
CELERY_RESULT_SERIALIZER = 'json'
CELERY_TASK_SERIALIZER = 'json'
CELERY_DEFAULT_QUEUE = 'myproj-django' # Queue name
CELERY_QUEUES = {
    CELERY_DEFAULT_QUEUE: {
        'exchange': CELERY_DEFAULT_QUEUE,
        'binding_key': CELERY_DEFAULT_QUEUE,
    }
}
CELERY_BROKER_TRANSPORT_OPTIONS = {
    "region": "us-east-1", # US East (N. Virginia)
    'visibility_timeout': 360,
    'polling_interval': 1
}
CELERY_RESULT_BACKEND = 'django-db'
myproj/celery.py
from __future__ import absolute_import, unicode_literals
import os
from celery import Celery
from celery.schedules import crontab
# set the default Django settings module for the 'celery' program.
os.environ.setdefault('DJANGO_SETTINGS_MODULE', 'myproj.settings_prod')
app = Celery('myproj')
# Using a string here means the worker doesn't have to serialize
# the configuration object to child processes.
# - namespace='CELERY' means all celery-related configuration keys
# should have a `CELERY_` prefix.
app.config_from_object('django.conf:settings', namespace='CELERY')
# Load task modules from all registered Django app configs.
app.autodiscover_tasks()
if __name__ == '__main__':
    app.start()
@app.task(bind=True)
def debug_task(self):
    print('Request: {0!r}'.format(self.request))
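To make the comment about namespace='CELERY' in myproj/celery.py concrete, here is a minimal sketch of how I understand the prefix handling: keys that start with CELERY_ are picked up, the prefix is dropped, and the rest is lowercased. This is a simplified illustration and not Celery's actual implementation; the function name strip_namespace and the sample dictionary are made up for the example.

```python
def strip_namespace(settings, namespace="CELERY"):
    """Simplified view of namespaced settings: keep only keys with the
    given prefix, drop the prefix, and lowercase the remainder.
    (Hypothetical helper, not part of Celery.)"""
    prefix = namespace + "_"
    return {
        key[len(prefix):].lower(): value
        for key, value in settings.items()
        if key.startswith(prefix)
    }


# Sample values copied from celery_conf/config.py, plus one non-Celery key.
example_settings = {
    "CELERY_BROKER_URL": "sqs://",
    "CELERY_RESULT_BACKEND": "django-db",
    "CELERY_DEFAULT_QUEUE": "myproj-django",
    "DEBUG": False,
}

mapped = strip_namespace(example_settings)
for name in sorted(mapped):
    print(name, "=", repr(mapped[name]))
```

In this simplified view CELERY_DEFAULT_QUEUE becomes default_queue, while the lowercase name Celery documents for that setting is task_default_queue (and task_queues for the queue map). I have not verified whether Celery still accepts the old-style names when a namespace is used, so those two keys may be worth checking.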
myproj/myapp/tasks.py
from __future__ import absolute_import, unicode_literals
from celery.decorators import task
from celery.utils.log import get_task_logger
logger = get_task_logger(__name__)
@task()
def do_something():
    logger.info('******** CALLING ASYNC TASK WITH CELERY **********')
settings_prod.py
# Importing base settings
from .settings import *
DEBUG = False
# Importing Celery configurations
from celery_conf.config import *
INSTALLED_APPS += ('django_celery_beat',)
UPDATE 1
According to /var/log/celery-beat.log, Celery does not seem to be able to find my project module. I think my project structure is not the one Celery expects. How can I make it work without changing the whole project structure?
My project structure is the following:
-- myproj-folder/
   -- requirements.txt
   -- .ebextensions/
   -- celery_conf/
      -- __init__.py
      -- config.py
   -- myproj/
      -- __init__.py
      -- settings.py # base settings
      -- settings_prod.py # production settings
      -- urls.py
      -- wsgi.py
      -- myapp1/
         -- models.py
         -- urls.py
         -- apps.py
         -- views.py
         -- tasks.py # here my app's tasks
         -- ...
      -- myapp2/
      -- myapp3/
      -- ...
      -- myappN/
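To illustrate why the module lookup depends on where the command runs: as far as I understand, `celery -A myproj` needs the directory that contains the myproj/ package to be on Python's import path, which normally means starting from the project folder. The sketch below reproduces that with throwaway directories. The package name myproj_demo and the helper names are made up so the example cannot clash with a real project.

```python
import importlib
import importlib.util
import os
import sys
import tempfile


def can_import(module_name, search_dir):
    """Return True if module_name can be found with search_dir
    placed first on sys.path (sys.path is restored afterwards)."""
    original_path = list(sys.path)
    sys.path.insert(0, search_dir)
    try:
        importlib.invalidate_caches()
        return importlib.util.find_spec(module_name) is not None
    finally:
        sys.path[:] = original_path


def demo():
    """Create a package in one temp dir and look it up from two places."""
    with tempfile.TemporaryDirectory() as app_dir, \
            tempfile.TemporaryDirectory() as other_dir:
        package_dir = os.path.join(app_dir, "myproj_demo")
        os.makedirs(package_dir)
        with open(os.path.join(package_dir, "__init__.py"), "w"):
            pass
        found_from_app_dir = can_import("myproj_demo", app_dir)
        found_from_other_dir = can_import("myproj_demo", other_dir)
    return found_from_app_dir, found_from_other_dir


if __name__ == "__main__":
    print(demo())
```

The package is only found when the lookup starts from the directory that holds it, which is the situation I would expect when the worker is started from an unrelated working directory.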
UPDATE 2
99_celery.config was using the --workdir option with /tmp as the directory. That option is not needed. I also applied a few changes to that file.
99_celery.config
files:
  "/opt/elasticbeanstalk/hooks/appdeploy/post/run_supervised_celeryd.sh":
    mode: "000755"
    owner: root
    group: root
    content: |
      #!/usr/bin/env bash
      # Create required directories
      sudo mkdir -p /var/log/celery/
      sudo mkdir -p /var/run/celery/
      # Create group called 'celery'
      sudo groupadd -f celery
      # add the user 'celery' if it doesn't exist and add it to the group with same name
      id -u celery &>/dev/null || sudo useradd -g celery celery
      # add permissions to the celery user for r+w to the folders just created
      sudo chown -R celery:celery /var/log/celery/
      sudo chown -R celery:celery /var/run/celery/
      # Get django environment variables
      celeryenv=`cat /opt/python/current/env | tr '\n' ',' | sed 's/export //g' | sed 's/$PATH/%(ENV_PATH)s/g' | sed 's/$PYTHONPATH//g' | sed 's/$LD_LIBRARY_PATH//g' | sed 's/%/%%/g'`
      celeryenv=${celeryenv%?}
      # Create celery configuration script
      celeryconf="[program:celeryd-worker]
      ; Set full path to celery program if using virtualenv
      command=/opt/python/run/venv/bin/celery worker -A myproj --loglevel=INFO --logfile="/var/log/celery/%%n%%I.log" --pidfile="/var/run/celery/%%n.pid"
      directory=/opt/python/current/app
      user=celery
      numprocs=1
      stdout_logfile=/var/log/celery-worker.log
      stderr_logfile=/var/log/celery-worker.log
      autostart=true
      autorestart=true
      startsecs=10
      ; Need to wait for currently executing tasks to finish at shutdown.
      ; Increase this if you have very long running tasks.
      stopwaitsecs = 600
      ; When resorting to send SIGKILL to the program to terminate it
      ; send SIGKILL to its whole process group instead,
      ; taking care of its children as well.
      killasgroup=true
      ; if rabbitmq is supervised, set its priority higher
      ; so it starts first
      priority=998
      environment=$celeryenv
      [program:celeryd-beat]
      ; Set full path to celery program if using virtualenv
      command=/opt/python/run/venv/bin/celery beat -A myproj --loglevel=INFO --logfile="/var/log/celery/celery-beat.log" --pidfile="/var/run/celery/celery-beat.pid"
      directory=/opt/python/current/app
      user=celery
      numprocs=1
      stdout_logfile=/var/log/celery-beat.log
      stderr_logfile=/var/log/celery-beat.log
      autostart=true
      autorestart=true
      startsecs=10
      ; Need to wait for currently executing tasks to finish at shutdown.
      ; Increase this if you have very long running tasks.
      stopwaitsecs = 600
      ; When resorting to send SIGKILL to the program to terminate it
      ; send SIGKILL to its whole process group instead,
      ; taking care of its children as well.
      killasgroup=true
      ; if rabbitmq is supervised, set its priority higher
      ; so it starts first
      priority=998
      environment=$celeryenv"
      # Create the celery supervisord conf script
      echo "$celeryconf" | tee /opt/python/etc/celery.conf
      # Add configuration script to supervisord conf (if not there already)
      if ! grep -Fxq "[include]" /opt/python/etc/supervisord.conf
      then
        echo "[include]" | tee -a /opt/python/etc/supervisord.conf
        echo "files: celery.conf" | tee -a /opt/python/etc/supervisord.conf
      fi
      # Enable supervisor to listen for HTTP/XML-RPC requests.
      # supervisorctl will use XML-RPC to communicate with supervisord over port 9001.
      # Source: https://askubuntu.com/questions/911994/supervisorctl-3-3-1-http-localhost9001-refused-connection
      if ! grep -Fxq "[inet_http_server]" /opt/python/etc/supervisord.conf
      then
        echo "[inet_http_server]" | tee -a /opt/python/etc/supervisord.conf
        echo "port = 127.0.0.1:9001" | tee -a /opt/python/etc/supervisord.conf
      fi
      # Reread the supervisord config
      supervisorctl -c /opt/python/etc/supervisord.conf reread
      # Update supervisord in cache without restarting all services
      supervisorctl -c /opt/python/etc/supervisord.conf update
      # Start/Restart celeryd through supervisord
      supervisorctl -c /opt/python/etc/supervisord.conf restart celeryd-beat
      supervisorctl -c /opt/python/etc/supervisord.conf restart celeryd-worker
container_commands:
  00_celery_tasks_run:
    command: "/opt/elasticbeanstalk/hooks/appdeploy/post/run_supervised_celeryd.sh"
    leader_only: true
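Since the celeryenv pipeline in the script above is hard to read, here is a rough Python equivalent of what I understand it to do: it turns the `export KEY="value"` lines of /opt/python/current/env into the comma-separated string that supervisord expects in `environment=`. The function name to_supervisor_environment and the sample input are made up for illustration; the steps mirror the tr/sed chain one by one.

```python
def to_supervisor_environment(env_file_text):
    """Mirror the shell pipeline step by step (hypothetical helper)."""
    text = env_file_text.replace("\n", ",")       # tr '\n' ','
    text = text.replace("export ", "")            # sed 's/export //g'
    text = text.replace("$PATH", "%(ENV_PATH)s")  # sed 's/$PATH/%(ENV_PATH)s/g'
    text = text.replace("$PYTHONPATH", "")        # sed 's/$PYTHONPATH//g'
    text = text.replace("$LD_LIBRARY_PATH", "")   # sed 's/$LD_LIBRARY_PATH//g'
    text = text.replace("%", "%%")                # sed 's/%/%%/g'
    return text[:-1]                              # ${celeryenv%?} drops the last character


# Made-up sample input in the same shape as the env file.
sample = 'export DJANGO_SETTINGS_MODULE="myproj.settings_prod"\nexport APP_ENV="test"\n'
print(to_supervisor_environment(sample))
```

One thing this makes visible: because the `%` doubling runs last, it also doubles the `%` in the `%(ENV_PATH)s` placeholder that an earlier step inserted, so a line containing $PATH ends up with `%%(ENV_PATH)s` in the generated string.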
My logs:
I SSHed into my EC2 instance; these are the log files:
/var/log/celery-worker.log
Traceback (most recent call last):
File "/opt/python/run/venv/bin/celery", line 11, in <module>
sys.exit(main())
File "/opt/python/run/venv/local/lib/python3.6/site-packages/celery/__main__.py", line 14, in main
_main()
File "/opt/python/run/venv/local/lib/python3.6/site-packages/celery/bin/celery.py", line 326, in main
cmd.execute_f