Welcome to OGeek Q&A Community for programmer and developer-Open, Learning and Share
Welcome To Ask or Share your Answers For Others

sed recipe: how to do stuff between two patterns that can be either on one line or on two lines?

Let's say we want to make some substitutions only between two patterns; call them <a> and </a> for clarity... (all right, all right, they're start and end! Jeez!)

So I know what to do if start and end always occur on the same line: just design a proper regex.
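For the same-line case, one possible "proper regex" is a substitute-and-loop pattern. The sketch below is only an illustration, not something from the question: it assumes single-character delimiters { and } standing in for start and end, no nesting, and a replacement that does not itself contain the search text (otherwise the loop would never terminate). The function name same_line_upper is made up for the example.

```shell
# Replace "this" with "THIS" only inside {...} on a single line.
# Each pass rewrites one occurrence; "ta" loops while a substitution succeeded.
same_line_upper() {
  sed -e ':a' -e 's/\({[^}]*\)this/\1THIS/' -e 'ta' "$@"
}

printf 'this { this and this } this\n' | same_line_upper
```

Because [^}]* cannot cross a closing brace, text outside the block is left alone, and several blocks on one line are each handled in turn.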

I also know what to do if they're guaranteed to be on different lines, I don't care about anything on the line containing end, and I'm OK with the commands also being applied to the part of the start line that precedes start: just specify the address range /start/,/end/.
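A small illustration of that address range and of why it is coarse: the substitution hits every line from the one matching start through the one matching end, including the text before start and after end on those lines. The sample input and the name range_demo are invented for the example.

```shell
# Apply s/foo/bar/g to whole lines in the range /start/,/end/.
range_demo() {
  sed '/start/,/end/ s/foo/bar/g' "$@"
}

printf 'foo start foo\nfoo\nfoo end foo\nfoo\n' | range_demo
```

Only the last line, which lies outside the range, keeps its foo.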

This, however, doesn't sound very useful. What if I need to do a smarter job, for instance, introduce changes inside a {...} block?

One thing I can think of is breaking the input on { and } before processing and putting it back together afterwards:

sed 's/{\|}/\n&\n/g' input | sed 'main stuff' | sed ':a $!{N;ba}; s/\n\(}\|{\)\n/\1/g'
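To make the idea concrete, here is a sketch of that three-stage pipeline with the "main stuff" filled in by an uppercase substitution. It assumes GNU sed (for \|, \n in the replacement, and \U) and non-nested braces; the function name split_process_join and the choice of substitution are placeholders.

```shell
split_process_join() {
  # 1) put every brace on a line of its own
  sed 's/{\|}/\n&\n/g' "$@" |
    # 2) work only on the lines between a lone "{" and a lone "}"
    sed '/^{$/,/^}$/ s/this\|that/\U&/g' |
    # 3) slurp everything and glue the braces back into place
    sed ':a;$!{N;ba};s/\n\(}\|{\)\n/\1/g'
}

printf 'this\nthis { this needs\nchanging to\nthat } that\nthat\n' | split_process_join
```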

Another option is the opposite:

cat input | tr '\n' '#' | sed 'whatever; s/#/\n/g'

Both of these are ugly, mainly because the operations are not confined within a single command. The second one is even worse because one has to use some character or substring as a "newline holder" assuming it isn't present in the original text.

So the question is: are there better ways or can the above-mentioned ones be optimized? This is quite a regular task from what I read in recent SO questions, so I'd like to choose the best practice once and for all.

P.S. I'm mostly interested in pure sed solutions: can the job be done with one invocation of sed and nothing else? Please no awk, Perl, etc.: this is more of a theoretical question, not a "need the job done asap" one.


1 Reply


This might work for you:

# create multiline test data
cat <<! >/tmp/a
this
this { this needs
changing to
that } that
that
!
sed '/{/!b;:a;/}/!{$q;N;ba};h;s/[^{]*{//;s/}.*//;s/this\|that/\U&/g;x;G;s/{[^}]*}\([^\n]*\)\n\(.*\)/{\2}\1/' /tmp/a
this
this { THIS needs
changing to
THAT } that
that
# convert multiline test data to a single line
tr '\n' ' ' </tmp/a >/tmp/b
sed '/{/!b;:a;/}/!{$q;N;ba};h;s/[^{]*{//;s/}.*//;s/this\|that/\U&/g;x;G;s/{[^}]*}\([^\n]*\)\n\(.*\)/{\2}\1/' /tmp/b
this this { THIS needs changing to THAT } that that

Explanation:

  • Read the data into the pattern space (PS): lines without { are printed as-is; otherwise lines are appended until PS contains }. /{/!b;:a;/}/!{$q;N;ba}
  • Copy the data into the hold space (HS). h
  • Strip non-data from the front and back of the string. s/[^{]*{//;s/}.*//
  • Convert the data, e.g. s/this\|that/\U&/g
  • Swap to HS and append the converted data. x;G
  • Replace the old data with the converted data. s/{[^}]*}\([^\n]*\)\n\(.*\)/{\2}\1/
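The steps above can also be laid out as a commented script, which may be easier to follow than the one-liner. This is only a sketch assuming GNU sed (for \|, \U and \n inside a bracket expression); the wrapper name upper_first_block is made up. As far as I can tell it rewrites only the first {...} block found in each accumulated chunk.

```shell
upper_first_block() {
  sed '
    # lines without "{" are printed unchanged
    /{/!b
    :a
    # keep appending lines until the pattern space contains "}"
    /}/!{
      $q
      N
      ba
    }
    # save the untouched text in the hold space
    h
    # cut away everything up to "{" and from "}" onwards
    s/[^{]*{//
    s/}.*//
    # the actual conversion of the block contents
    s/this\|that/\U&/g
    # bring the original back and append the converted contents
    x
    G
    # put the converted contents in place of the old block
    s/{[^}]*}\([^\n]*\)\n\(.*\)/{\2}\1/
  ' "$@"
}

printf 'this\nthis { this needs\nchanging to\nthat } that\nthat\n' | upper_first_block
```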

EDIT:

A more complicated answer which I think caters for more than one block per line.

# slurp file into pattern space (PS)
:a
$! {
N
ba
}
# check for presence of \v if so quit with exit value 1
/\v/q1
# replace original newlines with \v's
y/\n/\v/
# append a newline to PS as a delimiter
G
# copy PS to hold space (HS)
h
# starting from right to left delete everything but blocks
:b
s/\(.*\)\({.*}\).*\n/\1\n\2/
tb
# delete any non-block details from the start of the file
s/.*\n//
# PS contains only block details
# do any block processing here e.g. uppercase this and that
s/th\(is\|at\)/\U&/g
# append PS to HS
H
# swap to HS
x
# replace each original block with its processed one from right to left
:c
s/\(.*\){.*}\(.*\)\n\n\(.*\)\({.*}\)/\1\n\n\4\2\3/
tc
# delete newlines
s/\n//g
# restore original newlines
y/\v/\n/
# done!

N.B. This uses GNU-specific features (such as \U, \v and q with an exit code) but could be tweaked to work with generic seds.
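A quick way to try the longer script on input with several blocks per line is to wrap it in a shell function. The sketch below assumes GNU sed; the script body is the same command sequence with the comments stripped, and the name blocks_upper and the sample input are made up.

```shell
blocks_upper() {
  sed '
    :a
    $!{
      N
      ba
    }
    /\v/q1
    y/\n/\v/
    G
    h
    :b
    s/\(.*\)\({.*}\).*\n/\1\n\2/
    tb
    s/.*\n//
    s/th\(is\|at\)/\U&/g
    H
    x
    :c
    s/\(.*\){.*}\(.*\)\n\n\(.*\)\({.*}\)/\1\n\n\4\2\3/
    tc
    s/\n//g
    y/\v/\n/
  ' "$@"
}

# two blocks on the first line, one block spanning the next two lines
printf 'this { this } that { that } this\nthat { this\nthat } this\n' | blocks_upper
```

If the input already contains a vertical tab, the function prints the input untouched and returns exit status 1, so the caller can detect that the placeholder character was not safe to use.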

