I have to send a lot of HTTP requests; once all of them have returned, the program can continue. That sounds like a perfect match for asyncio. A bit naively, I wrapped my calls to requests in an async function and handed them to asyncio. This doesn't work.
After searching online, I found two solutions:
- use a library like aiohttp, which is made to work with asyncio
- wrap the blocking code in a call to run_in_executor
To understand this better, I wrote a small benchmark. The server side is a Flask program that waits 0.1 seconds before answering a request:
from flask import Flask
import time

app = Flask(__name__)

@app.route('/')
def hello_world():
    time.sleep(0.1)  # heavy calculations here :)
    return 'Hello World!'

if __name__ == '__main__':
    app.run()
The client is my benchmark:
import requests
from time import perf_counter, sleep
# this is the baseline, sequential calls to requests.get
start = perf_counter()
for i in range(10):
r = requests.get("http://127.0.0.1:5000/")
stop = perf_counter()
print(f"synchronous took {stop-start} seconds") # 1.062 secs
# now the naive asyncio version
import asyncio
loop = asyncio.get_event_loop()
async def get_response():
    r = requests.get("http://127.0.0.1:5000/")
start = perf_counter()
loop.run_until_complete(asyncio.gather(*[get_response() for i in range(10)]))
stop = perf_counter()
print(f"asynchronous took {stop-start} seconds") # 1.049 secs
# the fast asyncio version
start = perf_counter()
loop.run_until_complete(asyncio.gather(
    *[loop.run_in_executor(None, requests.get, 'http://127.0.0.1:5000/') for i in range(10)]))
stop = perf_counter()
print(f"asynchronous (executor) took {stop-start} seconds") # 0.122 secs
# finally, aiohttp
import aiohttp
async def get_response(session):
    async with session.get("http://127.0.0.1:5000/") as response:
        return await response.text()
async def main():
    async with aiohttp.ClientSession() as session:
        await get_response(session)
start = perf_counter()
loop.run_until_complete(asyncio.gather(*[main() for i in range(10)]))
stop = perf_counter()
print(f"aiohttp took {stop-start} seconds") # 0.121 secs
So an intuitive implementation with asyncio doesn't deal with blocking I/O code. But if you use asyncio correctly, it is just as fast as the special aiohttp framework. The docs for coroutines and tasks don't really mention this. Only when you read up on loop.run_in_executor() does it say:
# File operations (such as logging) can block the
# event loop: run them in a thread pool.
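The same effect can be reproduced without the Flask server. The following is a minimal sketch, assuming time.sleep(0.1) as a stand-in for requests.get; the names blocking_call, naive, with_executor and timed are made up for this illustration and are not part of the benchmark above.

```python
import asyncio
import time
from time import perf_counter


def blocking_call():
    # Hypothetical stand-in for requests.get: blocks the calling thread for 0.1 s.
    time.sleep(0.1)
    return "done"


async def naive():
    # Nothing is awaited here, so the blocking call runs on the event loop
    # thread and no other coroutine can make progress until it returns.
    return blocking_call()


async def with_executor():
    # The blocking call is handed to the default thread pool; awaiting the
    # resulting future lets the event loop schedule the other coroutines.
    loop = asyncio.get_running_loop()
    return await loop.run_in_executor(None, blocking_call)


async def timed(make_coro, n=5):
    # Run n coroutines concurrently and return (elapsed seconds, results).
    start = perf_counter()
    results = await asyncio.gather(*[make_coro() for _ in range(n)])
    return perf_counter() - start, results


async def main():
    naive_time, naive_results = await timed(naive)
    executor_time, executor_results = await timed(with_executor)
    return naive_time, executor_time, naive_results, executor_results


naive_time, executor_time, naive_results, executor_results = asyncio.run(main())
print(f"naive coroutines took {naive_time:.3f} seconds")
print(f"run_in_executor took {executor_time:.3f} seconds")
```

With five calls of 0.1 seconds each, the naive version should need at least 0.5 seconds because the calls run one after another, while the executor version should finish in roughly the time of a single call, since the default pool has enough threads to run all five at once.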
I was surprised by this behaviour. The purpose of asyncio is to speed up blocking I/O calls. Why is an additional wrapper, run_in_executor, necessary to do this?
The whole selling point of aiohttp seems to be support for asyncio. But as far as I can see, the requests module works perfectly, as long as you wrap it in an executor. Is there a reason to avoid wrapping something in an executor?