Welcome to OGeek Q&A Community for programmer and developer-Open, Learning and Share
Welcome To Ask or Share your Answers For Others

bash - How to remove all of the diacritics from a file?

I have a file containing many vowels with diacritics. I need to make these replacements:

  • Replace ā, á, ǎ, and à with a.
  • Replace ē, é, ě, and è with e.
  • Replace ī, í, ǐ, and ì with i.
  • Replace ō, ó, ǒ, and ò with o.
  • Replace ū, ú, ǔ, and ù with u.
  • Replace ǖ, ǘ, ǚ, and ǜ with ü.
  • Replace Ā, Á, Ǎ, and À with A.
  • Replace Ē, É, Ě, and È with E.
  • Replace Ī, Í, Ǐ, and Ì with I.
  • Replace Ō, Ó, Ǒ, and Ò with O.
  • Replace Ū, Ú, Ǔ, and Ù with U.
  • Replace Ǖ, Ǘ, Ǚ, and Ǜ with Ü.

I know I can replace them one at a time with this:

sed -i 's/ā/a/g' ./file.txt

Is there a more efficient way to replace all of these?

1 Reply

The man page for iconv describes the //TRANSLIT suffix:

//TRANSLIT
When the string "//TRANSLIT" is appended to --to-code, transliteration is activated. This means that when a character cannot be represented in the target character set, it can be approximated through one or several similarly looking characters.

So we can do:

kent$  cat test1
    Replace ā, á, ǎ, and à with a.
    Replace ē, é, ě, and è with e.
    Replace ī, í, ǐ, and ì with i.
    Replace ō, ó, ǒ, and ò with o.
    Replace ū, ú, ǔ, and ù with u.
    Replace ǖ, ǘ, ǚ, and ǜ with ü.
    Replace Ā, Á, Ǎ, and À with A.
    Replace Ē, É, Ě, and È with E.
    Replace Ī, Í, Ǐ, and Ì with I.
    Replace Ō, Ó, Ǒ, and Ò with O.
    Replace Ū, Ú, Ǔ, and Ù with U.
    Replace Ǖ, Ǘ, Ǚ, and Ǜ with U.


kent$  iconv -f utf8 -t ascii//TRANSLIT test1
    Replace a, a, a, and a with a.
    Replace e, e, e, and e with e.
    Replace i, i, i, and i with i.
    Replace o, o, o, and o with o.
    Replace u, u, u, and u with u.
    Replace u, u, u, and u with u.
    Replace A, A, A, and A with A.
    Replace E, E, E, and E with E.
    Replace I, I, I, and I with I.
    Replace O, O, O, and O with O.
    Replace U, U, U, and U with U.
    Replace U, U, U, and U with U.
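
Note that the transliteration above also turns ü into u, while the question asks for ǖ, ǘ, ǚ, and ǜ to become ü. The exact result of //TRANSLIT can also vary with the iconv implementation and the locale. If the umlaut has to survive, a sed-only approach is one possible alternative. The following is a minimal sketch, not part of the original answer: the function name strip_tones is hypothetical, and it assumes the file is UTF-8 with precomposed characters (not base letter plus combining mark).

```shell
# Strip the tone marks from pinyin vowels but keep the umlaut on ü / Ü.
# Alternation (a|b|c) is used instead of a bracket expression so that each
# accented letter is matched as a literal string, whatever the locale is.
# Any arguments (options or file names) are passed straight through to sed.
strip_tones() {
  sed -E \
    -e 's/(ā|á|ǎ|à)/a/g' -e 's/(Ā|Á|Ǎ|À)/A/g' \
    -e 's/(ē|é|ě|è)/e/g' -e 's/(Ē|É|Ě|È)/E/g' \
    -e 's/(ī|í|ǐ|ì)/i/g' -e 's/(Ī|Í|Ǐ|Ì)/I/g' \
    -e 's/(ō|ó|ǒ|ò)/o/g' -e 's/(Ō|Ó|Ǒ|Ò)/O/g' \
    -e 's/(ū|ú|ǔ|ù)/u/g' -e 's/(Ū|Ú|Ǔ|Ù)/U/g' \
    -e 's/(ǖ|ǘ|ǚ|ǜ)/ü/g' -e 's/(Ǖ|Ǘ|Ǚ|Ǜ)/Ü/g' \
    "$@"
}
```

Running strip_tones ./file.txt prints the converted text. With GNU sed, strip_tones -i ./file.txt should edit the file in place, like the single-character command in the question.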
