Welcome to OGeek Q&A Community for programmer and developer-Open, Learning and Share
Welcome To Ask or Share your Answers For Others

linux - scan and merge two lines in huge file (>5 Gb)

I have a huge file (>5 GB) in which a number of lines have been split in two. To fix it I need to:

  1. Find the 'split' lines
  2. Merge the 'split' lines of code together into the intended 'single' line of code
  3. Save the corrected file

Original file: (Notice the 'split' code in lines #113 and #114)

...
#109=CARTESIAN_POINT('',(1.705232012855E0,-7.756877070089E-1,2.48166921056E0));
#110=CARTESIAN_POINT('',(1.705861274751E0,-7.7602308423645E-1,2.480686063358E0));
#111=CARTESIAN_POINT('',(1.705767565089E0,-7.764706427305E-1,2.472310353831E0));
#112=CARTESIAN_POINT('',(1.70570123242E0,-7.767839147852E-1,2.478226532593E0));
#113=CARTESIAN_POINT('',(1.7015612304515E0,-7.96452125292859E-1,
2.416457902634E0));
#114=CARTESIAN_POINT('',(1.701554931826E0,-7.9649012320387E-1,
2.4163429213930E0));
#115=CARTESIAN_POINT('',(1.705923512855E0,-7.756877070089E-1,2.481645657056E0));
#116=CARTESIAN_POINT('',(1.7058612374751E0,-7.7600123423645E-1,2.48068604563358E0));
...

Expected result:

...
#109=CARTESIAN_POINT('',(1.705232012855E0,-7.756877070089E-1,2.48166921056E0));
#110=CARTESIAN_POINT('',(1.705861274751E0,-7.7602308423645E-1,2.480686063358E0));
#111=CARTESIAN_POINT('',(1.705767565089E0,-7.764706427305E-1,2.472310353831E0));
#112=CARTESIAN_POINT('',(1.70570123242E0,-7.767839147852E-1,2.478226532593E0));
#113=CARTESIAN_POINT('',(1.7015612304515E0,-7.96452125292859E-1,2.416457902634E0));
#114=CARTESIAN_POINT('',(1.701554931826E0,-7.9649012320387E-1,2.4163429213930E0));
#115=CARTESIAN_POINT('',(1.705923512855E0,-7.756877070089E-1,2.481645657056E0));
#116=CARTESIAN_POINT('',(1.7058612374751E0,-7.7600123423645E-1,2.48068604563358E0));
...

I think this is possible with some combination of cut/paste/sed commands in a Unix/Linux terminal, but I don't know how.

Thanks in advance!

Question from: https://stackoverflow.com/questions/65641776/scan-and-merge-two-lines-in-huge-file-5-gb

1 Reply

With GNU sed, you can use N to append the next line to the pattern space, check whether the newline is followed by a character other than #, and remove that newline if so:

sed -E 'N;s/\n([^#])/\1/;P;D;' file
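
The command prints to standard output, so the corrected file still has to be saved somewhere. As a rough sketch, the same sed program can be tried on a tiny sample first and redirected to a new file. The helper name merge_split_lines and the file names sample.step and sample.fixed.step are made up for illustration. It assumes GNU sed and that every intended line starts with #, as in the question.

```shell
#!/usr/bin/env bash
# Sketch only: assumes GNU sed and that every real record starts with '#'.

# merge_split_lines is a hypothetical helper name, not part of any tool.
# It reads lines from stdin and writes the merged lines to stdout.
merge_split_lines() {
  # N   : append the next input line to the pattern space
  # s   : drop the newline when the following line does not start with '#'
  # P;D : print and delete up to the first newline, then restart the cycle
  sed -E 'N;s/\n([^#])/\1/;P;D'
}

# Build a tiny sample in a temporary directory; record #2 is split in two.
tmpdir=$(mktemp -d)
sample="$tmpdir/sample.step"
fixed="$tmpdir/sample.fixed.step"
printf '%s\n' \
  "#1=CARTESIAN_POINT('',(1.0E0,2.0E0,3.0E0));" \
  "#2=CARTESIAN_POINT('',(4.0E0,5.0E0," \
  "6.0E0));" \
  "#3=CARTESIAN_POINT('',(7.0E0,8.0E0,9.0E0));" > "$sample"

# Stream the input and save the result to a new file instead of editing in place.
merge_split_lines < "$sample" > "$fixed"
cat "$fixed"
```

sed processes the input as a stream, so memory use should stay small even for a file of several GB. Writing to a new file keeps the original intact until the result has been checked. As written, the program joins one continuation line per record, which matches the two-line splits shown in the question.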
