Welcome to OGeek Q&A Community for programmer and developer-Open, Learning and Share
Welcome To Ask or Share your Answers For Others

linux - SED replacing with 'possible' newline

I have a sed command that works fine, except when the text it should match is split across a newline somewhere in the file. Here is my command:

sed -i 's,<a href="\(.*\)">\(.*\)</a>,\2 - \1,g'

Now, it works perfectly, but I just ran across this file that has the a tag like so:

<a href="link">Click
        here now</a>

Of course it didn't find this one. So I need to modify the command to allow for line breaks in the match. But I have no clue how to do that, short of going over the entire file first and removing all the newlines beforehand. The problem there is that I lose all formatting in the file.
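For anyone who wants to reproduce this, here is a minimal sketch. The temporary file and its two links are made up for illustration. Since sed reads its input one line at a time, the pattern never sees the opening tag and the closing tag of the second link together.

```shell
# Hypothetical sample input: one link on a single line, one split across two.
sample=$(mktemp)
cat > "$sample" <<'EOF'
<a href="one">First</a>
<a href="two">Click
        here now</a>
EOF

# Same substitution as above, without -i so the file is left untouched.
result=$(sed 's,<a href="\(.*\)">\(.*\)</a>,\2 - \1,g' "$sample")
printf '%s\n' "$result"
# First - one
# <a href="two">Click
#         here now</a>

rm -f "$sample"
```

The single-line link is rewritten, while the split one passes through unchanged.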


1 Reply

You can do this by inserting a loop into your sed script:

sed -e '/<a href/{;:next;/<\/a>/!{N;b next;};s,<a href="\(.*\)">\(.*\)</a>,\2 - \1,g;}' yourfile

As-is, that will leave an embedded newline in the output, and it wasn't clear if you wanted it that way or not. If not, just substitute out the newline:

sed -e '/<a href/{;:next;/<\/a>/!{N;b next;};s/\n//g;s,<a href="\(.*\)">\(.*\)</a>,\2 - \1,g;}' yourfile

And maybe clean up extra spaces:

sed -e '/<a href/{;:next;/<\/a>/!{N;b next;};s/\n//g;s/\s\{2,\}/ /g;s,<a href="\(.*\)">\(.*\)</a>,\2 - \1,g;}' yourfile

Explanation: The /<a href/{...} block lets us ignore lines we don't care about. Once we find a line that opens a tag, we check whether it also contains the end marker. If not (/<\/a>/!), we append a newline and the next input line to the pattern space (N) and branch (b) back to :next to check again. Once the end marker is found, we continue on with the substitutions.
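If you want to try the loop on something small, here is a sketch with the same steps written one command per line instead of as a one-liner. The temporary file and its contents are made up for illustration, and I've used [[:space:]] for the whitespace class here on the assumption that it is more portable than the GNU-specific escape.

```shell
# Hypothetical sample input with a link split across two lines.
sample=$(mktemp)
cat > "$sample" <<'EOF'
<p>before</p>
<a href="link">Click
        here now</a>
<p>after</p>
EOF

# Only lines that open a link enter the block. Inside it, keep appending
# the next line (N) until the closing tag is in the pattern space, then
# drop the embedded newlines, squeeze whitespace runs, and rewrite the link.
result=$(sed -e '
/<a href/{
    :next
    /<\/a>/!{
        N
        b next
    }
    s/\n//g
    s/[[:space:]]\{2,\}/ /g
    s,<a href="\(.*\)">\(.*\)</a>,\2 - \1,g
}' "$sample")

printf '%s\n' "$result"
# <p>before</p>
# Click here now - link
# <p>after</p>

rm -f "$sample"
```

One thing to watch: if a link is opened but never closed, the loop keeps reading until the end of the input, so everything after that point ends up in a single pattern space.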

