Welcome to OGeek Q&A Community for programmer and developer-Open, Learning and Share

regex - How to concatenate multiple lines under the first occurrence of the header using Unix commands?

I have a file like this:

Query=scaffold1_size75580
lcl|Os10t0535800-01
Query=scaffold1_size75580
lcl|Os10t0536000-02
Query=scaffold1_size75580
lcl|Os10t0536100-01
Query=scaffold1_size75580
lcl|Os10t0536400-01
Query=scaffold1_size75580
lcl|Os10t0536700-01
Query=scaffold1_size75580
lcl|Os10t0536700-01
Query=scaffold1_size75580
lcl|Os10t0536900-00
Query=scaffold1_size75580
lcl|Os10t0536700-01
Query=scaffold1_size75580
lcl|Os10t0536700-01
Query=scaffold2_size74975
lcl|Os11t0637501-00
Query=scaffold2_size74975
lcl|Os11t0637600-00
Query=scaffold2_size74975
lcl|Os11t0637800-01
Query=scaffold2_size74975
lcl|Os11t0637800-01
Query=scaffold2_size74975
lcl|Os11t0638200-00
Query=scaffold2_size74975
lcl|Os11t0638700-00
Query=scaffold2_size74975
lcl|Os11t0638700-00
Query=scaffold2_size74975
lcl|Os11t0638700-00
Query=scaffold2_size74975
lcl|Os11t0638700-00
Query=scaffold2_size74975
lcl|Os11t0638700-00
Query=scaffold2_size74975
lcl|Os11t0638900-01
Query=scaffold2_size74975
lcl|Os11t0638900-01
Query=scaffold3_size69500
lcl|Os06t0725100-01
Query=scaffold3_size69500
lcl|Os06t0724900-01
Query=scaffold3_size69500
lcl|Os06t0724900-01
Query=scaffold3_size69500
lcl|Os06t0724700-01
Query=scaffold3_size69500
lcl|Os06t0724700-01
Query=scaffold3_size69500
lcl|Os06t0724600-01
Query=scaffold3_size69500
lcl|Os06t0724100-02
Query=scaffold3_size69500
lcl|Os06t0724100-02
Query=scaffold3_size69500
lcl|Os06t0724100-02
Query=scaffold3_size69500
lcl|Os06t0724100-02
Query=scaffold4_size68019
lcl|Os01t0627550-00
Query=scaffold4_size68019
lcl|Os01t0626900-01
Query=scaffold4_size68019
lcl|Os01t0626400-01
Query=scaffold4_size68019
lcl|Os01t0626400-01
Query=scaffold4_size68019
lcl|Os01t0626400-01
Query=scaffold4_size68019
lcl|Os01t0626100-01
Query=scaffold4_size68019
lcl|Os01t0626100-01
Query=scaffold4_size68019
lcl|Os01t0626100-01
Query=scaffold4_size68019
lcl|Os01t0626032-01
Query=scaffold5_size66739
lcl|Os04t0653200-01
Query=scaffold5_size66739
lcl|Os04t0653400-01
Query=scaffold5_size66739
lcl|Os04t0653400-01
Query=scaffold5_size66739
lcl|Os04t0653600-01
Query=scaffold5_size66739
lcl|Os04t0654600-01
Query=scaffold5_size66739
lcl|Os04t0654600-01
Query=scaffold5_size66739
lcl|Os04t0654600-01
Query=scaffold5_size66739
lcl|Os04t0654600-01
Query=scaffold5_size66739
lcl|Os04t0654600-01
Query=scaffold5_size66739
lcl|Os04t0654600-01
Query=scaffold5_size66739
lcl|Os04t0654600-01
Query=scaffold5_size66739
lcl|Os04t0654600-01
Query=scaffold5_size66739
lcl|Os04t0654600-01
Query=scaffold6_size65486
lcl|Os01t0259900-00
Query=scaffold6_size65486
lcl|Os01t0259400-01
Query=scaffold6_size65486
lcl|Os01t0259400-01
Query=scaffold6_size65486
lcl|Os01t0259400-01
Query=scaffold6_size65486
lcl|Os01t0259400-01
Query=scaffold6_size65486
lcl|Os01t0259200-01
Query=scaffold7_size64123
lcl|Os04t0162100-01
Query=scaffold7_size64123
lcl|Os05t0325000-00
Query=scaffold7_size64123
lcl|Os05t0325000-00
Query=scaffold7_size64123
lcl|Os05t0325000-00
Query=scaffold7_size64123
lcl|Os05t0324600-01
Query=scaffold7_size64123
lcl|Os05t0324600-01

and so on, for some 66000 scaffolds. I want the duplicate headers removed so that all the corresponding entries appear under a single header, i.e., I want something like this:

Query=scaffold1_size75580
lcl|Os10t0535800-01
lcl|Os10t0536000-02
lcl|Os10t0536100-01
lcl|Os10t0536400-01
lcl|Os10t0536700-01
lcl|Os10t0536700-01
lcl|Os10t0536900-00
lcl|Os10t0536700-01
lcl|Os10t0536700-01
Query=scaffold2_size74975
lcl|Os11t0637501-00
lcl|Os11t0637600-00
lcl|Os11t0637800-01
lcl|Os11t0637800-01
lcl|Os11t0638200-00
lcl|Os11t0638700-00
lcl|Os11t0638700-00
lcl|Os11t0638700-00
lcl|Os11t0638700-00
lcl|Os11t0638700-00
lcl|Os11t0638900-01
lcl|Os11t0638900-01
Query=scaffold3_size69500
lcl|Os06t0725100-01
lcl|Os06t0724900-01
lcl|Os06t0724900-01
lcl|Os06t0724700-01
lcl|Os06t0724700-01
lcl|Os06t0724600-01
lcl|Os06t0724100-02
lcl|Os06t0724100-02
lcl|Os06t0724100-02
lcl|Os06t0724100-02

and so on. How can I do this?


1 Reply


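If you don't mind running this through multiple passes, I might suggest the search-and-replace below. Each pass only merges neighboring pairs of identical headers, so repeat it until nothing matches: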

Search:

^(Query=.*\n)((?:(?!Query=).*\n)+)\1

Replace:

\1\2
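
For anyone who wants to script the repeated passes rather than run them by hand, here is a minimal sketch in Python. It is not part of the original answer: the function name `merge_headers` is made up for illustration, and the pattern is the search expression above with backslashes restored (`\n` for the line break, `\1` for the back-reference). It assumes the whole file fits in memory and that entries for the same header are grouped together, as in the sample.

```python
import re

# Search pattern from the answer: a header line, one or more non-header
# lines, then the very same header line again.
PATTERN = re.compile(r"^(Query=.*\n)((?:(?!Query=).*\n)+)\1", re.MULTILINE)


def merge_headers(text: str) -> str:
    """Apply the replacement repeatedly until no duplicate header remains."""
    # The pattern needs every line, including the last, to end with "\n".
    if text and not text.endswith("\n"):
        text += "\n"
    while True:
        # Keep the first header and its entries, drop the repeated header.
        text, replaced = PATTERN.subn(r"\1\2", text)
        if replaced == 0:
            return text
```

Each pass removes roughly every other repeated header within a group, so the loop should finish after a handful of passes even for long runs of duplicates.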

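
If multiple passes are a concern for a file with some 66000 scaffolds, a single-pass alternative is possible. This is a different technique from the regex in this answer, sketched under the assumption that identical headers always sit next to each other (separated only by their entries), as they do in the sample. The function name `merge_adjacent_headers` and the `prefix` parameter are hypothetical.

```python
from typing import Iterable, Iterator, Optional


def merge_adjacent_headers(lines: Iterable[str], prefix: str = "Query=") -> Iterator[str]:
    """Yield lines, skipping a header that repeats the most recent header."""
    previous_header: Optional[str] = None
    for raw in lines:
        line = raw.rstrip("\n")
        if line.startswith(prefix):
            if line == previous_header:
                continue  # same header as the current group: drop it
            previous_header = line  # a new group starts here
        yield line
```

Because it only remembers the last header seen, it can stream a file line by line, for example by passing it an open file object and printing each yielded line. Unlike the regex, it also collapses two identical headers that have no entry between them.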

