unix - split file on Nth occurrence of delimiter

Question

Welcome To Ask or Share your Answers For Others

unix - split file on Nth occurrence of delimiter

posted Oct 17, 2021 in Technique[技术] by 深蓝 (71.8m points)

unix - split file on Nth occurrence of delimiter

Is there a one-liner to split a text file into pieces / chunks after every Nth occurrence of a delimiter?

example: the delimiter below is "+"

entry 1
some more
+
entry 2
some more
even more
+
entry 3
some more
+
entry 4
some more
+
...

There are several million entries, so splitting on every occurrence of delimiter "+" is a bad idea. I want to split on, say, every 50,000th instance of delimiter "+".

Unix commands "split" and "csplit" just don't seem to do this...

See Question&Answers more detail:os

与恶龙缠斗过久,自身亦成为恶龙；凝视深渊过久,深渊将回以凝视…

1 Reply

深蓝 · Answer 1 · 2021-10-17T01:11:11+0000

Using awk you could:

awk '/^+$/ { delim++ } { file = sprintf("chunk%s.txt", int(delim / 50000)); print >> file; }' < input.txt

Update:

To not include the delimiter, try this:

awk '/^+$/ { if(++delim % 50000 == 0) { next } } { file = sprintf("chunk%s.txt", int(delim / 50000)); print > file; }' < input.txt

The next keyword causes awk to halt processing rules for this record and and advance to the next (line). I also changed the >> to > since if you run it more than once you probably don't want to append the old chunk files.

Categories

unix - split file on Nth occurrence of delimiter

unix - split file on Nth occurrence of delimiter

Please log in or register to add a comment.

Please log in or register to reply this article.

1 Reply

Please log in or register to add a comment.

Just Browsing Browsing

Most popular tags