So I want my Spark app to read some text from Amazon S3. I wrote the following simple script:
```python
import boto3

s3_client = boto3.client('s3')
text_keys = ["key1.txt", "key2.txt"]
# sc is the existing SparkContext
data = sc.parallelize(text_keys).flatMap(
    lambda key: s3_client.get_object(Bucket="my_bucket", Key=key)['Body'].read().decode('utf-8'))
```
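Separately from the error, note how `flatMap` treats the lambda's return value: it flattens whatever iterable comes back, and a Python `str` iterates character by character, so each character of the file becomes its own record. The sketch below models that with plain Python so it runs without Spark. `flat_map` and `bodies` are hypothetical stand-ins, and it assumes the goal is one record per line.

```python
from itertools import chain

# Hypothetical file contents standing in for the S3 objects' bodies.
bodies = {"key1.txt": "alpha\nbeta", "key2.txt": "gamma"}

def flat_map(func, items):
    """Plain-Python model of RDD.flatMap: apply func to each item, then flatten one level."""
    return list(chain.from_iterable(func(item) for item in items))

keys = ["key1.txt", "key2.txt"]

# Returning the whole decoded string yields one record per character.
as_chars = flat_map(lambda k: bodies[k], keys)

# Returning a list of lines yields one record per line.
as_lines = flat_map(lambda k: bodies[k].splitlines(), keys)
```

If one record per line is the goal, appending `.splitlines()` to the decoded body inside the lambda gives that shape.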
When I call data.collect(), I get the following error:
TypeError: can't pickle thread.lock objects
I can't seem to find any help online. Has anyone managed to solve this?
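A minimal sketch of what the traceback points to, under stated assumptions. The lambda references the module-level `s3_client`, so Spark must serialize that client to ship it to the executors. The error suggests the client holds a thread lock, and locks cannot be pickled. A common workaround is to create the client inside the function that runs on the executors, once per partition. `FakeS3Client`, `make_client`, and `read_partition` below are hypothetical stand-ins so the example runs without boto3 or Spark.

```python
import io
import pickle
import threading

class FakeS3Client:
    """Hypothetical stand-in for boto3.client('s3'); assumed to hold a lock internally."""
    def __init__(self):
        self._lock = threading.Lock()
        self._objects = {"key1.txt": b"alpha\nbeta", "key2.txt": b"gamma"}

    def get_object(self, Bucket, Key):
        # Mimics the response shape used in the question: a dict whose 'Body' has .read().
        return {"Body": io.BytesIO(self._objects[Key])}

def make_client():
    # Hypothetical factory; in a real job this would return boto3.client("s3").
    return FakeS3Client()

def read_partition(keys):
    """Create one client per partition so no client object has to be pickled."""
    client = make_client()
    for key in keys:
        body = client.get_object(Bucket="my_bucket", Key=key)["Body"].read()
        yield from body.decode("utf-8").splitlines()

def is_picklable(obj):
    try:
        pickle.dumps(obj)
        return True
    except (TypeError, pickle.PicklingError):
        return False

client_picklable = is_picklable(FakeS3Client())            # the lock makes this fail
keys_picklable = is_picklable(["key1.txt", "key2.txt"])    # plain data pickles fine
lines = list(read_partition(["key1.txt", "key2.txt"]))
```

In PySpark the same function could be passed as `sc.parallelize(text_keys).mapPartitions(read_partition)`, with `make_client` returning a real `boto3.client("s3")`; this assumes boto3 and AWS credentials are available on every executor. Creating the client per partition rather than per key avoids constructing a new client for every record.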