As shown in this answer, you can use the threading package to perform an asynchronous task. Everyone seems to recommend Celery, but it is often overkill for simple but long-running tasks. I think it's actually easier and more transparent to use threading.
Here's a simple example that runs a crawler asynchronously:
#views.py
import threading

from django.http import JsonResponse

from .models import Crawl

def startCrawl(request):
    task = Crawl()
    task.save()
    t = threading.Thread(target=doCrawl, args=[task.id])
    t.setDaemon(True)
    t.start()
    return JsonResponse({'id': task.id})

def checkCrawl(request, id):
    task = Crawl.objects.get(pk=id)
    return JsonResponse({'is_done': task.is_done, 'result': task.result})

def doCrawl(id):
    task = Crawl.objects.get(pk=id)
    # Do crawling, etc., then store the outcome
    result = ...  # placeholder for whatever the crawl produces
    task.result = result
    task.is_done = True
    task.save()
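This assumes a Crawl model with is_done and result fields. A minimal sketch of such a model (the field types here are assumptions, not part of the original example):

#models.py -- minimal sketch; field types are assumptions
from django.db import models

class Crawl(models.Model):
    is_done = models.BooleanField(default=False)
    result = models.TextField(null=True, blank=True)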
Your front end can make a request to startCrawl
to start the crawl, and then poll it with Ajax requests to checkCrawl,
which will return is_done: true along with the result once the crawl has finished.
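For those views to be reachable from the front end they need URL routes. A minimal sketch, assuming the paths and route names below (they are not part of the original example):

#urls.py -- sketch; the paths and route names are assumptions
from django.urls import path

from . import views

urlpatterns = [
    path('crawl/start/', views.startCrawl, name='start_crawl'),
    path('crawl/<int:id>/status/', views.checkCrawl, name='check_crawl'),
]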
Update for Python 3:
The documentation for the threading library recommends passing the daemon property as a keyword argument rather than using the setter:
t = threading.Thread(target=doCrawl, args=[task.id], daemon=True)
t.start()
Update for Python < 3.7:
As discussed here, this bug can cause a slow memory leak that can overflow a long-running server. The bug was fixed for Python 3.7 and above.