Welcome to OGeek Q&A Community for programmer and developer-Open, Learning and Share
Welcome To Ask or Share your Answers For Others

ruby - Importing CSV quoting error is driving me nuts

I've been having an unbelievable time trying to import a CSV file in ruby-1.9.2.

The file I am trying to parse has:

  • commas within columns
  • quotes within columns
  • an '@' as the :col_sep

csv.txt (representative input, real one is 101k lines):

?@?@jié@"seal" radical in Chinese characters, (Kangxi radical 26)

My code:

require 'csv'

CSV.foreach("/Users/adam/Desktop/csvtest.txt", {:col_sep => "@"}) do |row|
    puts row.to_s 
end

My desired output:

["?", "?", "jié", ""seal" radical in Chinese characters, (Kangxi radical 26)"]

What I get for output:

CSV::MalformedCSVError: Unclosed quoted field on line 1.
from /Users/adam/.rvm/rubies/ruby-1.9.2-p290/lib/ruby/1.9.1/CSV.rb:1910:in `block in shift'
from /Users/adam/.rvm/rubies/ruby-1.9.2-p290/lib/ruby/1.9.1/CSV.rb:1825:in `loop'
from /Users/adam/.rvm/rubies/ruby-1.9.2-p290/lib/ruby/1.9.1/CSV.rb:1825:in `shift'
from /Users/adam/.rvm/rubies/ruby-1.9.2-p290/lib/ruby/1.9.1/CSV.rb:1767:in `each'
from /Users/adam/.rvm/rubies/ruby-1.9.2-p290/lib/ruby/1.9.1/CSV.rb:1202:in `block in foreach'
from /Users/adam/.rvm/rubies/ruby-1.9.2-p290/lib/ruby/1.9.1/CSV.rb:1340:in `open'
from /Users/adam/.rvm/rubies/ruby-1.9.2-p290/lib/ruby/1.9.1/CSV.rb:1201:in `foreach'
from (irb):31
from /Users/adam/.rvm/rubies/ruby-1.9.2-p290/bin/irb:16:in `<main>'

It says there are unclosed quoted fields, but I can see that the quotes open and close.

Escaping the quotes by doubling them (...@""seal"" r...) does nothing; I get the same error. Changing them to single quotes (...@'seal' r...) makes it work, but I NEED them to be double quotes.

Any ideas?

1 Reply

I think the problem is that CSV treats the double quote in "seal" as the start of a quoted column. A quoted column has to be wrapped entirely in quotes, as in @"seal"@, but here more text follows the closing quote, so the parser gets confused. I don't see any option to tell CSV that the columns aren't quoted, but you can kludge around it by setting :quote_char to a character that will never occur in your data. If you're using UTF-8, you can safely use a zero byte as your "quote character that will never occur":

# "\x00" (a single zero byte) never appears in the data, so '"' is read as ordinary text
CSV.foreach(filename, :col_sep => "@", :quote_char => "\x00") do |row|
    #...
end

This should work as long as none of your columns are quoted.
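To check the workaround without touching the real file, here is a minimal sketch that parses a single in-memory line with CSV.parse_line. The line is a stand-in for the question's sample row: "a" and "b" are placeholder values I made up for the first two columns, and the variable names are only for illustration.

```ruby
# encoding: utf-8
require 'csv'

# Stand-in for one line of the input file; "a" and "b" are placeholder values.
line = %q{a@b@jié@"seal" radical in Chinese characters, (Kangxi radical 26)}

# With the default quote character ("), the parser rejects the line because
# text follows the closing quote of what it takes to be a quoted column.
default_failed = begin
  CSV.parse_line(line, :col_sep => "@")
  false
rescue CSV::MalformedCSVError
  true
end

# With a quote character that never occurs in the data (a zero byte),
# the double quotes are kept as ordinary text inside the column.
row = CSV.parse_line(line, :col_sep => "@", :quote_char => "\x00")

puts default_failed
p row
```

The same two options can be passed to CSV.foreach unchanged. The trade-off is the one noted above: a column that really is wrapped in double quotes would keep its quotes as literal text.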

