aiohttp with Native Coroutines (async/await)
Here is a typical pattern that accomplishes what you're trying to do (Python 3.7+). One major change is that you will need to move from `requests`, which is built for synchronous IO, to a package such as `aiohttp` that is built specifically to work with `async`/`await` (native coroutines):
```python
import asyncio
import aiohttp  # pip install aiohttp aiodns


async def get(
    session: aiohttp.ClientSession,
    color: str,
    **kwargs
) -> dict:
    url = f"https://api.com/{color}/"
    print(f"Requesting {url}")
    resp = await session.request('GET', url=url, **kwargs)
    # Note that this may raise an exception for non-2xx responses
    # You can either handle that here, or pass the exception through
    data = await resp.json()
    print(f"Received data for {url}")
    return data


async def main(colors, **kwargs):
    # Asynchronous context manager.  Prefer this rather
    # than using a different session for each GET request
    async with aiohttp.ClientSession() as session:
        tasks = []
        for c in colors:
            tasks.append(get(session=session, color=c, **kwargs))
        # asyncio.gather() will wait on the entire task set to be
        # completed.  If you want to process results greedily as they
        # come in, loop over asyncio.as_completed()
        htmls = await asyncio.gather(*tasks, return_exceptions=True)
        return htmls


if __name__ == '__main__':
    colors = ['red', 'blue', 'green']  # ...
    # Either take colors from stdin or make some default here
    asyncio.run(main(colors))  # Python 3.7+
```
There are two distinct elements to this, one being the asynchronous aspect of the coroutines and one being the concurrency introduced on top of that when you specify a container of tasks (futures):

- You create one coroutine, `get`, that uses `await` with two awaitables: the first being `.request` and the second being `.json`. This is the async aspect. The purpose of `await`ing these IO-bound responses is to tell the event loop that other `get()` calls can take turns running through that same routine.
- The concurrent aspect is encapsulated in `await asyncio.gather(*tasks)`. This maps the awaitable `get()` call to each of your `colors`. The result is an aggregate list of returned values. Note that this wrapper will wait until all of your responses come in and call `.json()`. If, alternatively, you want to process them greedily as they are ready, you can loop over `asyncio.as_completed()`: each Future object returned represents the earliest result from the set of the remaining awaitables.
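The greedy-processing alternative can be sketched without any network at all: below, `asyncio.sleep()` with staggered delays stands in for the real `session.request()` call, and the color/delay values are made up for illustration. The point is that `asyncio.as_completed()` yields results in completion order, not submission order:

```python
import asyncio


async def get_color(color: str, delay: float) -> str:
    # Simulated request; a real version would await session.request(...)
    await asyncio.sleep(delay)
    return color


async def main() -> list:
    coros = [
        get_color("red", 0.03),
        get_color("blue", 0.01),
        get_color("green", 0.02),
    ]
    finished = []
    # as_completed() yields awaitables in the order they finish
    for coro in asyncio.as_completed(coros):
        finished.append(await coro)
    return finished


print(asyncio.run(main()))  # ['blue', 'green', 'red']
```

Even though "red" was submitted first, it finishes last because its (simulated) IO takes the longest, so it comes out of the loop last.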
Lastly, take note that `asyncio.run()` is a high-level "porcelain" function introduced in Python 3.7. In earlier versions, you can mimic it (roughly) like:

```python
# The "full" version makes a new event loop and calls
# loop.shutdown_asyncgens(), see link above
loop = asyncio.get_event_loop()
try:
    loop.run_until_complete(main(colors))
finally:
    loop.close()
```
Limiting Requests
There are a number of ways to limit the rate of concurrency. For instance, see `asyncio.Semaphore` in async-await function or large numbers of tasks with limited concurrency.
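A minimal sketch of the `asyncio.Semaphore` approach, with `asyncio.sleep()` standing in for the real aiohttp request and an arbitrary limit of 3 concurrent tasks (both values are placeholders, not anything prescribed by aiohttp):

```python
import asyncio


async def fetch(sem: asyncio.Semaphore, i: int, active: list, peaks: list) -> int:
    # Acquire the semaphore before starting the (simulated) request;
    # at most 3 coroutines can hold it at once
    async with sem:
        active[0] += 1
        peaks.append(active[0])    # record the concurrency level seen
        await asyncio.sleep(0.01)  # stand-in for the real IO call
        active[0] -= 1
    return i


async def main() -> int:
    sem = asyncio.Semaphore(3)     # at most 3 requests in flight
    active, peaks = [0], []
    results = await asyncio.gather(
        *(fetch(sem, i, active, peaks) for i in range(10))
    )
    assert max(peaks) <= 3         # the limit was never exceeded
    return len(results)


print(asyncio.run(main()))  # 10
```

All 10 tasks still complete, but the semaphore ensures no more than 3 are ever inside the `async with` block at the same time, which is usually enough to keep a remote API happy.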