Welcome to OGeek Q&A Community for programmer and developer-Open, Learning and Share
Welcome To Ask or Share your Answers For Others

How to remove duplicate words from a string in a Bash script?

I have a string containing duplicate words, for example:

abc, def, abc, def

How can I remove the duplicates? The string that I need is:

abc, def
Question from: https://stackoverflow.com/questions/65857452/remove-duplicates-words-in-same-line

Fight evil dragons for too long and you become one yourself; gaze into the abyss for too long and the abyss gazes back…
1 Reply

We have this test file:

$ cat file
abc, def, abc, def

To remove duplicate words:

$ sed -r ':a; s/\b([[:alnum:]]+)\b(.*)\b\1\b/\1\2/g; ta; s/(, )+/, /g; s/, *$//' file
abc, def

How it works

  • :a

    This defines a label named a, which marks the start of the loop.

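  • s/\b([[:alnum:]]+)\b(.*)\b\1\b/\1\2/g

    This looks for a whole word (a run of alphanumeric characters between word boundaries, \b) that appears again later in the line. Because (.*) is greedy, the match extends to the last repeat of that word, which is deleted; the first occurrence (\1) and the text in between (\2) are kept.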

  • ta

    If the last substitution command resulted in a change, this jumps back to label a to try again.

    In this way, the code keeps looking for duplicates until none remain.

  • s/(, )+/, /g; s/, *$//

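    These two substitution commands clean up the leftover separators: the first collapses each run of ", " left behind by the deletions into a single ", ", and the second removes a trailing comma and any spaces after it.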
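
The same GNU sed program can also be written one command per line, which leaves room for a comment on each step described above. This is a sketch that assumes GNU sed (for \b and for # comments anywhere in a script); the function name dedup_words_commented is hypothetical. Like the one-liner, it reads the files named as arguments, or standard input if none are given.

```shell
#!/usr/bin/env bash
# Sketch: the answer's GNU sed program, one command per line.
# Assumes GNU sed; dedup_words_commented is a hypothetical name.
# Reads the files given as arguments, or standard input if there are none.
dedup_words_commented() {
  sed -E '
    # Label "a" marks the top of the loop.
    :a
    # Delete a later copy of a whole word: keep the first copy (\1) and
    # the text between the two copies (\2), drop the later copy.
    s/\b([[:alnum:]]+)\b(.*)\b\1\b/\1\2/g
    # If a substitution succeeded since the last t, jump back to "a".
    ta
    # Collapse the runs of ", " left where words were removed.
    s/(, )+/, /g
    # Delete a trailing comma and any spaces after it.
    s/, *$//
  ' "$@"
}

printf 'abc, def, abc, def\n' | dedup_words_commented   # prints: abc, def
```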

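Mac OS X or other BSD systems

BSD sed does not understand \b, so this version marks word boundaries with [[:<:]] and [[:>:]] instead. It also gives the label and branch commands their own -e arguments, because BSD sed reads a label name up to the end of the line, semicolons included. For Mac OS X or other BSD systems, try:

$ sed -E -e ':a' -e 's/[[:<:]]([[:alnum:]]+)[[:>:]](.*)[[:<:]]\1[[:>:]]/\1\2/g' -e 'ta' -e 's/(, )+/, /g' -e 's/, *$//' file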
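
A script that has to run on both GNU/Linux and BSD/macOS can pick between the two spellings at run time. This is only a sketch under stated assumptions: that GNU sed is the one that accepts --version, and that BSD sed accepts [[:<:]] / [[:>:]] word boundaries and \1 back-references with -E. The function name dedup_words_portable is hypothetical.

```shell
#!/usr/bin/env bash
# Sketch: choose the GNU or BSD form of the command at run time.
# Assumptions: only GNU sed accepts --version; BSD sed uses [[:<:]] and
# [[:>:]] for word boundaries. dedup_words_portable is a hypothetical name.
# Reads the files given as arguments, or standard input if there are none.
dedup_words_portable() {
  if sed --version >/dev/null 2>&1; then
    # GNU sed: \b marks a word boundary, and ";" may follow a label.
    sed -E ':a; s/\b([[:alnum:]]+)\b(.*)\b\1\b/\1\2/g; ta; s/(, )+/, /g; s/, *$//' "$@"
  else
    # BSD sed: a label runs to the end of its -e argument, so ":a" and
    # "ta" each get their own -e.
    sed -E -e ':a' \
      -e 's/[[:<:]]([[:alnum:]]+)[[:>:]](.*)[[:<:]]\1[[:>:]]/\1\2/g' \
      -e 'ta' -e 's/(, )+/, /g' -e 's/, *$//' "$@"
  fi
}

echo 'ab, cd, cd, ab, ef' | dedup_words_portable   # prints: ab, cd, ef
```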

Using a string instead of a file

sed reads its input either from a file, as shown above, or from standard input, so a shell string can simply be piped in:

$ echo 'ab, cd, cd, ab, ef' | sed -r ':a; s/\b([[:alnum:]]+)\b(.*)\b\1\b/\1\2/g; ta; s/(, )+/, /g; s/, *$//'
ab, cd, ef
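
In a Bash script the string often already sits in a variable. A different technique from the sed loop, shown here only as a sketch, is to split the string into items and keep the first occurrence of each one with an associative array. It assumes Bash 4 or later and single-word items separated by ", "; the function name dedup_list is hypothetical.

```shell
#!/usr/bin/env bash
# Sketch of a sed-free alternative (not the answer's method).
# Assumes Bash 4+ (associative arrays) and single-word items separated by ", ".
# dedup_list is a hypothetical name.
dedup_list() {
  local -a items=() out=()
  local -A seen=()
  local item joined
  # With IFS=', ', a comma plus its surrounding spaces counts as one separator.
  IFS=', ' read -r -a items <<< "$1"
  for item in "${items[@]}"; do
    # Keep an item only the first time it appears.
    if [[ -z ${seen[$item]+set} ]]; then
      seen[$item]=1
      out+=("$item")
    fi
  done
  # Join with ", " and drop the separator left after the last item.
  printf -v joined '%s, ' "${out[@]}"
  printf '%s\n' "${joined%, }"
}

s='ab, cd, cd, ab, ef'
s=$(dedup_list "$s")
echo "$s"   # prints: ab, cd, ef
```

Because it compares whole items rather than searching the line with a regex, this version needs no word-boundary syntax; note, though, that the /bin/bash shipped with macOS is version 3.2, which has no associative arrays.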

...