Welcome to OGeek Q&A Community for programmer and developer-Open, Learning and Share
Welcome To Ask or Share your Answers For Others

Categories

0 votes
444 views
in Technique[技术] by (71.8m points)

concurrency - Fastest parallel requests in Python

I need to keep making many requests to about 150 APIs, on different servers. I work with the trading, time is crucial, I can not waste 1 millisecond.

The solution and problems I found were these:

  • Async using Asyncio: I do not want to rely on a single thread, for some reason it may get stucked.
  • Threads: Is it really reliable on Python to use threads? Do I have the risk of 1 thread make
    other get stucked?
  • Multiprocesses: If a have on process controlling the others, would I loose to much time in interprocess communication?

Maybe a solution that uses all of that.

If there is no really good solution in Python, what should I use instead?

# Using Asyncio
import asyncio
import requests

async def main():
    loop = asyncio.get_event_loop()
    future1 = loop.run_in_executor(None, requests.get, 'http://www.google.com')
    future2 = loop.run_in_executor(None, requests.get, 'http://www.google.co.uk')
    response1 = await future1
    response2 = await future2
    print(response1.text)
    print(response2.text)

loop = asyncio.get_event_loop()
loop.run_until_complete(main())


# Using Threads
from threading import Thread

def do_api(url):
    #...
    #...

#...
#...
for i in range(50):
    t = Thread(target=do_apis, args=(url_api[i],))
    t.start()
See Question&Answers more detail:os

与恶龙缠斗过久,自身亦成为恶龙;凝视深渊过久,深渊将回以凝视…
Welcome To Ask or Share your Answers For Others

1 Reply

0 votes
by (71.8m points)

Instead of using multithreading or asyncio.executor, you should use aiohttp instead, which is the equivalent of requests but with asynchronous support.

import asyncio
import aiohttp
import time

websites = """https://www.youtube.com
https://www.facebook.com
https://www.baidu.com
https://www.yahoo.com
https://www.amazon.com
https://www.wikipedia.org
http://www.qq.com
https://www.google.co.in
https://www.twitter.com
https://www.live.com
http://www.taobao.com
https://www.bing.com
https://www.instagram.com
http://www.weibo.com
http://www.sina.com.cn
https://www.linkedin.com
http://www.yahoo.co.jp
http://www.msn.com
http://www.uol.com.br
https://www.google.de
http://www.yandex.ru
http://www.hao123.com
https://www.google.co.uk
https://www.reddit.com
https://www.ebay.com
https://www.google.fr
https://www.t.co
http://www.tmall.com
http://www.google.com.br
https://www.360.cn
http://www.sohu.com
https://www.amazon.co.jp
http://www.pinterest.com
https://www.netflix.com
http://www.google.it
https://www.google.ru
https://www.microsoft.com
http://www.google.es
https://www.wordpress.com
http://www.gmw.cn
https://www.tumblr.com
http://www.paypal.com
http://www.blogspot.com
http://www.imgur.com
https://www.stackoverflow.com
https://www.aliexpress.com
https://www.naver.com
http://www.ok.ru
https://www.apple.com
http://www.github.com
http://www.chinadaily.com.cn
http://www.imdb.com
https://www.google.co.kr
http://www.fc2.com
http://www.jd.com
http://www.blogger.com
http://www.163.com
http://www.google.ca
https://www.whatsapp.com
https://www.amazon.in
http://www.office.com
http://www.tianya.cn
http://www.google.co.id
http://www.youku.com
https://www.example.com
http://www.craigslist.org
https://www.amazon.de
http://www.nicovideo.jp
https://www.google.pl
http://www.soso.com
http://www.bilibili.com
http://www.dropbox.com
http://www.xinhuanet.com
http://www.outbrain.com
http://www.pixnet.net
http://www.alibaba.com
http://www.alipay.com
http://www.chrome.com
http://www.booking.com
http://www.googleusercontent.com
http://www.google.com.au
http://www.popads.net
http://www.cntv.cn
http://www.zhihu.com
https://www.amazon.co.uk
http://www.diply.com
http://www.coccoc.com
https://www.cnn.com
http://www.bbc.co.uk
https://www.twitch.tv
https://www.wikia.com
http://www.google.co.th
http://www.go.com
https://www.google.com.ph
http://www.doubleclick.net
http://www.onet.pl
http://www.googleadservices.com
http://www.accuweather.com
http://www.googleweblight.com
http://www.answers.yahoo.com"""


async def get(url, session):
    try:
        async with session.get(url=url) as response:
            resp = await response.read()
            print("Successfully got url {} with resp of length {}.".format(url, len(resp)))
    except Exception as e:
        print("Unable to get url {} due to {}.".format(url, e.__class__))


async def main(urls):
    async with aiohttp.ClientSession() as session:
        ret = await asyncio.gather(*[get(url, session) for url in urls])
    print("Finalized all. Return is a list of len {} outputs.".format(len(ret)))


urls = websites.split("
")
start = time.time()
asyncio.run(main(urls))
end = time.time()

print("Took {} seconds to pull {} websites.".format(end - start, len(urls)))

Outputs:

Successfully got url http://www.msn.com with resp of length 47967.
Successfully got url http://www.google.com.br with resp of length 14823.
Successfully got url https://www.t.co with resp of length 0.
Successfully got url http://www.google.es with resp of length 14798.
Successfully got url https://www.wikipedia.org with resp of length 66691.
Successfully got url http://www.google.it with resp of length 14805.
Successfully got url http://www.googleadservices.com with resp of length 1561.
Successfully got url http://www.cntv.cn with resp of length 3232.
Successfully got url https://www.example.com with resp of length 1256.
Successfully got url https://www.google.co.uk with resp of length 14184.
Successfully got url http://www.accuweather.com with resp of length 269.
Successfully got url http://www.google.ca with resp of length 14172.
Successfully got url https://www.facebook.com with resp of length 192898.
Successfully got url https://www.apple.com with resp of length 75422.
Successfully got url http://www.gmw.cn with resp of length 136136.
Successfully got url https://www.google.ru with resp of length 14803.
Successfully got url https://www.bing.com with resp of length 70314.
Successfully got url http://www.googleusercontent.com with resp of length 1561.
Successfully got url https://www.tumblr.com with resp of length 37500.
Successfully got url http://www.googleweblight.com with resp of length 1619.
Successfully got url https://www.google.co.in with resp of length 14230.
Successfully got url http://www.qq.com with resp of length 101957.
Successfully got url http://www.xinhuanet.com with resp of length 113239.
Successfully got url https://www.twitch.tv with resp of length 105014.
Successfully got url http://www.google.co.id with resp of length 14806.
Successfully got url https://www.linkedin.com with resp of length 90047.
Successfully got url https://www.google.fr with resp of length 14777.
Successfully got url https://www.google.co.kr with resp of length 14797.
Successfully got url http://www.google.co.th with resp of length 14783.
Successfully got url https://www.google.pl with resp of length 14769.
Successfully got url http://www.google.com.au with resp of length 14228.
Successfully got url https://www.whatsapp.com with resp of length 84551.
Successfully got url https://www.google.de with resp of length 14767.
Successfully got url https://www.google.com.ph with resp of length 14196.
Successfully got url https://www.cnn.com with resp of length 1135447.
Successfully got url https://www.wordpress.com with resp of length 216637.
Successfully got url https://www.twitter.com with resp of length 61869.
Successfully got url http://www.alibaba.com with resp of length 282210.
Successfully got url https://www.instagram.com with resp of length 20776.
Successfully got url https://www.live.com with resp of length 36621.
Successfully got url https://www.aliexpress.com with resp of length 37388.
Successfully got url http://www.uol.com.br with resp of length 463614.
Successfully got url https://www.microsoft.com with resp of length 230635.
Successfully got url http://www.pinterest.com with resp of length 87012.
Successfully got url http://www.paypal.com with resp of length 103763.
Successfully got url https://www.wikia.com with resp of length 237977.
Successfully got url http://www.sina.com.cn with resp of length 530525.
Successfully got url https://www.amazon.de with resp of length 341222.
Successfully got url https://www.stackoverflow.com with resp of length 190878.
Successfully got url https://www.ebay.com with resp of length 263256.
Successfully got url http://www.diply.com with resp of length 557848.
Successfully got url http://www.office.com with resp of length 111909.
Successfully got url http://www.imgur.com with resp of length 6223.
Successfully got url https://www.amazon.co.jp with resp of length 417751.
Successfully got url http://www.outbrain.com with resp of length 54481.
Successfully got url https://www.amazon.co.uk with resp of length 362057.
Successfully got url http://www.chrome.com with resp of length 223832.
Successfully got url http://www.popads.net with resp of length 14517.
Successfully got url https://www.youtube.com with resp of length 571028.
Successfully got url http://www.doubleclick.net with resp of length 130244.
Successfully got url https://www.yahoo.com with resp of length 510721.
Successfully got url http://www.tianya.cn with resp of length 7619.
Successfully got url https://www.netflix.com with resp of length 422277.
Successfully got url https://www.naver.com with resp of length 210175.
Successfully got url http://www.blogger.com with resp of length 94478.
Successfully got url http://www.soso.com with resp of length 5816.
Successfully got url http://www.github.com with resp of length 212285.
Successfully got url https://www.amazon.com with resp of length 442097.
Successfully got url http://www.go.com with resp of length 598355.
Successfully got url http://www.chinadaily.com.cn with resp of length 102857.
Successfully got url http://www.sohu.com with resp of length 216027.
Successfully got url https://www.amazon.in with resp of length 417175.
Successfully got url http://www.answers.yahoo.com with resp of length 104628.
Successfully got url http://www.jd.com with resp of length 18217.
Successfully got url http://www.blogspot.com with resp of length 94478.
Successfully got url http://www.fc2.com with resp of length 16997.
Successfully got url https://www.baidu.com with resp of length 301922.
Successfully got url http://www.craigslist.org with resp of length 59438.
Successfully got url http://www.imdb.com with resp of length 675494.
Successfully got url http://www.yahoo.co.jp with resp of length 37036.
Successfully got url http://www.onet.pl with resp of length 854384.
Successfully got url http://www.dropbox.com with resp of length 200591.
Successfully got url http://www.zhihu.com with resp of length 50543.
Successfully got url http://www.yandex.ru with resp of length 174347.
Successfully got url http://www.ok.ru with resp of length 206604.
Successfully got url http://www.163.com with resp of length 588036.
Successfully got url http://www.bbc.co.uk with resp of length 303267.
Successfully got url http://www.nicovideo.jp with resp of length 116124.
Successfully got url http://www.pixnet.net with resp of length 6448.
Successfully got url http://www.bilibili.com with resp of length 96941.
Successfully got url https://www.reddit.com with resp of length 718393.
Successfully got url http://www.booking.com with resp of length 472655.
Successfully got url https://www.360.cn with resp of length 79943.
Successfully got url http://www.taobao.com with resp of length 384755.
Successfully got url http://www.youku.com with resp of length 326873.
Successfully got url http://www.coccoc.com with resp of length 64687.
Successfully got url http://www.tmall.com with resp of length 137527.
Successfully got url http://www.hao123.com with resp of length 331222.
Successfully got url http://www.weibo.com with resp of length 93712.
Successfully got url http://www.alipay.com with resp of length 24057.
Finalized all. Return is a list of len 100 outputs.
Took 3.9256999492645264 seconds to pull 100 websites.

As you can see 100 websites from across the world were successfully reached (with or without https) in about 4 seconds with aiohttp on my internet connection (Miami, Florida). Keep in mind the following can slow down the program by a few ms:

  • print statements (yes, including the ones placed in the code above).
  • Reaching servers further away from your geographical location.

The example above has both instances of the above, and therefore it is arguably the least-optimized way of doing what you have asked. However, I do believe it is a great start for what you are looking for.


Edit: April 6th, 2021

Please note that in the above code we are querying multiple (different) servers, and therefore the use of a single ClientSession might degrade performance:

Session encapsulates a connection pool (connector instance) and supports keepali


与恶龙缠斗过久,自身亦成为恶龙;凝视深渊过久,深渊将回以凝视…
OGeek|极客中国-欢迎来到极客的世界,一个免费开放的程序员编程交流平台!开放,进步,分享!让技术改变生活,让极客改变未来! Welcome to OGeek Q&A Community for programmer and developer-Open, Learning and Share
Click Here to Ask a Question

...