Welcome to OGeek Q&A Community for programmer and developer-Open, Learning and Share

Is it possible to pass a backreference to a function from inside sed?

TL;DR It is not possible, as "the backreference inside command substitution is not and will not be parsed by sed, but by shell, before running sed" (see answer below).

I want to replace the Unicode code points of International Phonetic Alphabet characters with the corresponding characters inside a big text file (>50MB).

My test.txt input example:

<CHARSET c="T">02C8;</CHARSET>ku:p<CHARSET c="T">0252;</CHARSET>n] noun<BR>

Expected result:

<CHARSET c="T">ˈ</CHARSET>ku:p<CHARSET c="T">ɒ</CHARSET>n] noun<BR>

I am able to convert a given Unicode code point with this command (e.g.):

echo -e "\u02C8"

But I'm failing with the escaping inside my sed command. I got the idea to create a function from here, like this:

codeToChar() { $( echo -e "\u$1"); }
sed -r -i 's#(<CHARSET c="T">)(....)#\1'"$(codeToChar \2)"'#g' test.txt

But it seems the "\2" backreference is not passed to the function:

codeToChar() { $( echo -e "\u$1"); }
sed -r -i 's#(<CHARSET c="T">)(....)#\1'"$(codeToChar \2)"'#g' test.txt
++ codeToChar '2'
+++ echo -e 'u2'
++ 'u2'
./replace.sh: line 2: u2: command not found
+ sed -r -i 's#(<CHARSET c="T">)(....)#1#g' test.tx
question from:https://stackoverflow.com/questions/65946617/is-it-possible-to-pass-a-backreference-to-a-function-from-inside-sed

1 Reply

How to properly escape a backreference in sed to pass it to a function?

The presented code properly handles the backreference. The backreference inside command substitution is not and will not be parsed by sed, but by shell, before running sed. The arguments to a program have to be expanded before running the program.
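To see the order of events, you can replace sed with printf and look at the argument the shell actually builds. This is only a minimal sketch: show_arg is a hypothetical helper made up for the illustration, standing in for codeToChar.

```shell
# show_arg is a hypothetical stand-in for codeToChar: it only reports what it received.
show_arg() { printf 'got:%s' "$1"; }

# The shell runs the command substitution while it builds the argument string,
# long before sed could match anything. The unquoted \2 has already become 2.
script='s#(<CHARSET c="T">)(....)#\1'"$(show_arg \2)"'#g'

# Print the script that sed would receive.
printf '%s\n' "$script"
# prints: s#(<CHARSET c="T">)(....)#\1got:2#g
```

The function has already run once, with the literal argument 2, and sed only ever sees its output as fixed text.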

You could potentially use a GNU extension to sed: the e flag of the s command, which runs the pattern space resulting from the substitution as a command via /bin/sh and replaces it with the output. Using this flag is strongly discouraged: getting the quoting and escaping right is very hard, so it only "works" in very simple cases. Because the input string contains the shell special characters ; < > and ", I doubt it is possible here.

I suggest picking a full-fledged programming language, such as Python or Perl, to solve your task. sed is not a utility for dynamically executing actions depending on the contents of a file; it is a simple stream replacement utility.

In sed, it is possible to build a static list of strings to replace, like so:

sed -r '
     s/(<CHARSET c="T">)02C8/\1'"$(echo -e "\u02C8")"'/
     s/(<CHARSET c="T">)0252/\1'"$(echo -e "\u0252")"'/
     # ... one s/// command for each character to replace ...
' test.txt
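If the list of characters is long, the static list can be generated instead of typed by hand. The following is a sketch under assumptions, not a tested solution for the full file. The mapping table and the build_script name are made up for the example, the characters are written literally rather than produced by echo -e, and the trailing ";" is removed together with the code. It uses -E, which GNU sed accepts as an equivalent of -r.

```shell
# Hypothetical mapping table, one "CODE CHARACTER" pair per line.
mapping='02C8 ˈ
0252 ɒ'

# Emit one s/// command per pair; the trailing ";" after the code is consumed too.
build_script() {
  while read -r code char; do
    printf 's/(<CHARSET c="T">)%s;/\\1%s/g\n' "$code" "$char"
  done
}

script=$(printf '%s\n' "$mapping" | build_script)

# Apply the generated script to the sample line from the question.
sample='<CHARSET c="T">02C8;</CHARSET>ku:p<CHARSET c="T">0252;</CHARSET>n] noun<BR>'
result=$(printf '%s\n' "$sample" | sed -E "$script")
printf '%s\n' "$result"
```

For the real file, the generated script could be saved to a file and applied with sed -E -f, so that sed reads the large input only once.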
