Welcome to OGeek Q&A Community for programmer and developer-Open, Learning and Share
Welcome To Ask or Share your Answers For Others


regex - How do I remove duplicate characters and keep the unique one only in Perl?

How do I remove duplicate characters and keep only the unique ones? For example, my input is:

EFUAHUU
UUUEUUUUH
UJUJHHACDEFUCU

Expected output is:

EFUAH
UEH
UJHACDEF

I came across perl -pe's/$1//g while/(.).*/' which is wonderful, but it removes even the single occurrence of the character in the output.



1 Reply


This can be done using a positive lookahead:

perl -pe 's/(.)(?=.*?\1)//g' FILE_NAME

The regex used is: (.)(?=.*?\1)

  • . : matches any character (except a newline).
  • first () : captures the matched character.
  • (?=...) : positive lookahead; asserts that the pattern inside follows, without consuming it.
  • .*? : matches anything in between (non-greedily).
  • \1 : backreference to the captured character.
  • (.)(?=.*?\1) : matches and captures a character only if it appears again later in the string.
  • s/// : the Perl substitution operator.
  • g : substitute globally, that is, don't stop after the first substitution.
  • s/(.)(?=.*?\1)//g : deletes a character from the input string only if that character appears again later in the string.

This does not preserve the order of the characters in the input, because for every unique character we retain its last occurrence rather than its first.

To keep the relative order intact, we can do what KennyTM suggests in one of the comments:

  • reverse the input line
  • do the substitution as before
  • reverse the result before printing

The Perl one-liner for this is:

perl -ne '$_=reverse;s/(.)(?=.*?\1)//g;print scalar reverse;' FILE_NAME

Since we print manually after reversing, we use the -n flag instead of -p.

I'm not sure if this is the best one-liner to do this. I welcome others to edit this answer if they have a better alternative.

