Welcome to OGeek Q&A Community for programmer and developer-Open, Learning and Share
Welcome To Ask or Share your Answers For Others

Categories

0 votes
1.2k views
in Technique[技术] by (71.8m points)

amazon web services - Loading JSON data to AWS Redshift results in NULL values

I am trying to perform a load/copy operation to import data from JSON files in an S3 bucket directly to Redshift. The COPY operation succeeds, and after the COPY, the table has the correct number of rows/records, but every record is NULL !

It takes the expected amount of time for the load, the COPY command returns OK, the Redshift console reports successful and no errors... but if I perform a simple query from the table, it returns only NULL values.

The JSON is very simple + flat, and formatted correctly (according to examples I found here: http://docs.aws.amazon.com/redshift/latest/dg/r_COPY_command_examples.html)

Basically, it is one row per line, formatted like:

{ "col1": "val1", "col2": "val2", ... }
{ "col1": "val1", "col2": "val2", ... }
{ "col1": "val1", "col2": "val2", ... }

I have tried things like rewriting the schema based on values and data types found in the JSON objects and also copying from uncompressed files. I thought perhaps the JSON was not being parsed correctly upon load, but it should presumably raise an error if the objects cannot be parsed.

My COPY command looks like this:

copy events from 's3://mybucket/json/prefix' 
with credentials 'aws_access_key_id=xxx;aws_secret_access_key=xxx'
json 'auto' gzip;

Any guidance would be appreciated! Thanks.

See Question&Answers more detail:os

与恶龙缠斗过久,自身亦成为恶龙;凝视深渊过久,深渊将回以凝视…
Welcome To Ask or Share your Answers For Others

1 Reply

0 votes
by (71.8m points)

So I have discovered the cause - This would not have been evident from the description I provided in my original post.

When you create a table in Redshift, the column names are converted to lowercase. When you perform a COPY operation, the column names are case sensitive.

The input data that I have been trying to load is using camelCase for column names, and so when I perform the COPY, the columns do not match up with the defined schema (which now uses all lowercase column names)

The operation does not raise an error, though. It just leaves NULLs in all the columns that did not match (in this case, all of them)

Hope this helps somebody to avoid the same confusion!


与恶龙缠斗过久,自身亦成为恶龙;凝视深渊过久,深渊将回以凝视…
OGeek|极客中国-欢迎来到极客的世界,一个免费开放的程序员编程交流平台!开放,进步,分享!让技术改变生活,让极客改变未来! Welcome to OGeek Q&A Community for programmer and developer-Open, Learning and Share
Click Here to Ask a Question

...