Welcome to OGeek Q&A Community for programmer and developer-Open, Learning and Share
Welcome To Ask or Share your Answers For Others

python - 'list' object has no attribute 'timeout'

I am trying to download PDFs linked from a page using urllib.request.urlopen, but it raises an error: 'list' object has no attribute 'timeout':

def get_hansard_data(page_url):
    #Read base_url into Beautiful soup Object
    html = urllib.request.urlopen(page_url).read()
    soup = BeautifulSoup(html, "html.parser")
    #grab <div class="itemContainer"> that hold links and dates to all hansard pdfs
    hansard_menu = soup.find_all("div","itemContainer")

    #Get all hansards
    #write to a tsv file
    with open("hansards.tsv","a") as f:
        fieldnames = ("date","hansard_url")
        output = csv.writer(f, delimiter="\t")

        for div in hansard_menu:
            hansard_link = [HANSARD_URL + div.a["href"]]
            hansard_date = div.find("h3", "catItemTitle").string

            #download
            
            with urllib.request.urlopen(hansard_link) as response:
                data = response.read()
                r = open("/Users/Parliament Hansards/"+hansard_date +".txt","wb")
                r.write(data)
                r.close()

            print(hansard_date)
            print(hansard_link)
            output.writerow([hansard_date,hansard_link])
        print ("Done Writing File")

1 Reply

A bit late, but this might still be helpful to someone else (if not to the original poster). I found the solution while solving the same problem.

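The problem is that the value passed to urllib.request.urlopen is a list rather than a string. urlopen accepts a URL string or a Request object; anything that is not a string is treated as a Request, and setting its timeout attribute fails on a list. In my case page_url was a list because it came from argparse.parse_args(), and that is most likely the reason in your case too. Note also that in the posted code hansard_link is itself built as a one-element list ([HANSARD_URL + div.a["href"]]), so urllib.request.urlopen(hansard_link) raises the same error. Doing page_url[0] should work, but it is not nice to do that inside the def get_hansard_data(page_url) function. It would be better to check the type of the parameter and return an appropriate error to the function caller if the type does not match.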
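To illustrate how argparse can hand back a list, here is a minimal sketch. It assumes the argument was declared with nargs (the original post does not show the parser, so this is a guess at the cause), and EXAMPLE_URL is a hypothetical placeholder:

```python
import argparse

# Hypothetical URL, used only for illustration.
EXAMPLE_URL = "http://example.com/hansards"

# With nargs=1 (or "+", "*"), argparse stores the value as a list.
list_parser = argparse.ArgumentParser()
list_parser.add_argument("page_url", nargs=1)
list_args = list_parser.parse_args([EXAMPLE_URL])

# Without nargs, the same positional argument is stored as a plain string.
str_parser = argparse.ArgumentParser()
str_parser.add_argument("page_url")
str_args = str_parser.parse_args([EXAMPLE_URL])

print(type(list_args.page_url).__name__)  # list
print(type(str_args.page_url).__name__)   # str
```

If the parser really is declared with nargs, dropping it (when only one URL is expected) or unpacking the list at the call site avoids indexing inside the function.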

The type of an argument can be checked by calling type(page_url) and comparing the result, for example: type("") == type(page_url). A more idiomatic check is isinstance(page_url, str), which also accepts subclasses of str.
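A type check of that kind could look roughly like the sketch below. The helper name require_url_string is hypothetical (it is not part of the original code), and raising TypeError is one reasonable choice rather than the only one:

```python
def require_url_string(page_url):
    """Return page_url unchanged if it is a str, otherwise raise TypeError."""
    if not isinstance(page_url, str):
        # Report the actual type so the caller can see what was passed in.
        raise TypeError(
            f"page_url must be a str, got {type(page_url).__name__}"
        )
    return page_url
```

Calling it as the first line of get_hansard_data(page_url) would turn the confusing "'list' object has no attribute 'timeout'" into an error that names the real problem.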
