Welcome to OGeek Q&A Community for programmer and developer-Open, Learning and Share
Welcome To Ask or Share your Answers For Others

scrapy Pipeline TypeError: can only concatenate str (not "dict") to str

I have a Scrapy spider that I want to connect to an item pipeline. The spider's parse method yields items like this:

def parse(self, response):
    x = response.xpath("//script[starts-with(.,'window._sharedData')]/text()").extract_first()
    json_string = x.strip().split('= ')[1][:-1]
    data = json.loads(json_string)
    edges = data['entry_data']['ProfilePage'][0]['graphql']['user']['edge_owner_to_timeline_media']['edges']
    for i in edges:
        url = 'https://www.instagram.com/p/' + i['node']['shortcode']
        video = i['node']['is_video']
        date_posted_timestamp = i['node']['taken_at_timestamp']
        date_posted_human = datetime.fromtimestamp(date_posted_timestamp).strftime("%d/%m/%Y %H:%M:%S")
        like_count = i['node']['edge_media_preview_like']['count'] if "edge_media_preview_like" in i['node'].keys() else ''
        comment_count = i['node']['edge_media_to_comment']['count'] if 'edge_media_to_comment' in i['node'].keys() else ''
        handle = i['node']['owner']['id'] if 'owner' in i['node'].keys() else ''
        usernameid = i['node']['owner']['username']
        captions = ""
        if i['node']['edge_media_to_caption']:
            for i2 in i['node']['edge_media_to_caption']['edges']:
                captions += i2['node']['text'] + "\n"
        if video:
            image_url = i['node']['display_url']
        else:
            image_url = i['node']['thumbnail_resources'][-1]['src']
        item = {'handleid': handle,'usernameid': usernameid,'postURL': url, 'isVideo': video, 'date_posted': date_posted_human,
                'timestamp': date_posted_timestamp, 'likeCount': like_count, 'commentCount': comment_count, 'image_url': image_url,
                'captions': captions[:-1]}
        if video:
            yield scrapy.Request(get_url(url), callback=self.get_video, meta={'item': item})
        else:
            item['videoURL'] = ''
            yield item

Content of pipelines.py:

class InstascraperPipeline:
    def process_item(self, item, spider):
        print("pipeline :" + item, ['handleid'][0])
        return item

It raises the error below, and I don't know where to go from here:

2021-01-11 17:48:45 [scrapy.core.scraper] ERROR: Error processing {'handleid': '40501747559', 'usernameid': 'omnesinfluencers', 'postURL': 'https://www.instagram.com/p/CIk88MzouYm', 'isVideo': False, 'date_posted': '09/12/2020 16:41:47', 'timestamp': 1607517707, 'likeCount': 732, 'commentCount': 2, 'image_url': 'https://instagram.fbsb8-1.fna.fbcdn.net/v/t51.2885-15/sh0.08/e35/s640x640/130284179_224818179237248_5049337129452224360_n.jpg?_nc_ht=instagram.fbsb8-1.fna.fbcdn.net&_nc_cat=108&_nc_ohc=G5LBpPMpPvsAX_C8YfB&tp=1&oh=d42c038193b615d50e11d75edc367217&oe=6025DA66', 'captions': 'OMNES Influencers’ platform is influencer approved! \nWe take care of our beloved OMNESians. \nSign up now and enjoy being one!', 'videoURL': ''}
Traceback (most recent call last):
  File "c:\users\wanna\pycharmprojects\pythonproject\venv\lib\site-packages\twisted\internet\defer.py", line 654, in _runCallbacks
    current.result = callback(current.result, *args, **kw)
  File "c:\users\wanna\pycharmprojects\pythonproject\venv\lib\site-packages\scrapy\utils\defer.py", line 150, in f
    return deferred_from_coro(coro_f(*coro_args, **coro_kwargs))
  File "C:\Users\wanna\PycharmProjects\pythonProject\instascraper\instascraper\pipelines.py", line 11, in process_item
    print("pipeline :" + item, ['handleid'][0])
TypeError: can only concatenate str (not "dict") to str
2021-01-11 17:48:45 [scrapy.core.engine] INFO: Closing spider (finished)

Can anyone help me see what I am doing wrong?



1 Reply


In this line:

print("pipeline :" + item, ['handleid'][0])

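You first have a string ("pipeline :") and then a dict: because of the comma, "pipeline :" + item is evaluated on its own, and ['handleid'][0] is passed to print as a separate argument. Index the dict with item['handleid'] and convert the value to a string so that you can append it: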

print("pipeline :" + str(item['handleid'][0]))
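To see why the original line fails and what the fix prints, here is a minimal sketch that runs without Scrapy. The item is a trimmed copy of the one in the log, and the helper names (error_message, second_argument, first_char, whole_id) are made up for illustration. One thing to watch, assuming 'handleid' is always a string as it is in the log: [0] then selects only its first character, so dropping [0] is probably closer to what was intended.

```python
# Sample item, trimmed from the log in the question.
item = {'handleid': '40501747559', 'usernameid': 'omnesinfluencers'}

# The comma splits the call into two arguments, so the first one is
# "pipeline :" + item, which adds a str to a dict and raises TypeError.
try:
    print("pipeline :" + item, ['handleid'][0])
    error_message = None
except TypeError as exc:
    error_message = str(exc)
print(error_message)

# The second argument never touches item: it indexes a one-element list literal.
second_argument = ['handleid'][0]  # the string 'handleid'

# The fix from the answer runs, but [0] on a string keeps one character.
first_char = "pipeline :" + str(item['handleid'][0])

# Dropping [0] keeps the whole id.
whole_id = "pipeline :" + str(item['handleid'])
print(first_char)
print(whole_id)


class InstascraperPipeline:
    def process_item(self, item, spider):
        # An f-string formats the value directly, so no str() call is needed.
        print(f"pipeline : {item['handleid']}")
        return item
```

The pipeline class at the end is one possible rewrite of process_item using an f-string, not the only correct form.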
