I am trying to create a 'normalized' copy of a string, to help reduce duplicate names in a database. The names contain many international characters (i.e. accented letters), and I want to create a copy with the accents removed.
I did come across the method below, but cannot get it to work. I can't seem to find what the Unicode Hacks plugin is.
# Utility method that returns an ASCIIfied, downcased, and sanitized string.
# It relies on the Unicode Hacks plugin by means of String#chars. We assume
# $KCODE is 'u' in environment.rb. By now we support a wide range of latin
# accented letters, based on the Unicode Character Palette bundled in Macs.
def self.normalize(str)
n = str.chars.downcase.strip.to_s
n.gsub!(/[? ???¢?£?¤?¥???]/u, 'a')
n.gsub!(/?|/u, 'ae')
n.gsub!(/[???]/u, 'd')
n.gsub!(/[?§???????]/u, 'c')
n.gsub!(/[?¨???a????????????]/u, 'e')
n.gsub!(/??/u, 'f')
n.gsub!(/[??????£]/u, 'g')
n.gsub!(/[?¥?§]/, 'h')
n.gsub!(/[?????-???ˉ?????-]/u, 'i')
n.gsub!(/[?ˉ?±?3?μ]/u, 'j')
n.gsub!(/[?·??]/u, 'k')
n.gsub!(/[?????o????]/u, 'l')
n.gsub!(/[?±??????????]/u, 'n')
n.gsub!(/[?2?3?′?μ?????????]/u, 'o')
n.gsub!(/??/u, 'oe')
n.gsub!(/??/u, 'q')
n.gsub!(/[??????]/u, 'r')
n.gsub!(/[???????è?]/u, 's')
n.gsub!(/[?¥?£?§è?]/u, 't')
n.gsub!(/[?1?o???????ˉ?±?-???3]/u,'u')
n.gsub!(/?μ/u, 'w')
n.gsub!(/[?????·]/u, 'y')
n.gsub!(/[?????o]/u, 'z')
n.gsub!(/\s+/, ' ')
n.gsub!(/[^\sa-z0-9_-]/, '')
n
end
Do I need to 'require' a particular library/gem? Or maybe someone could recommend another way to go about this.
I am not using Rails, nor do I plan on doing so.
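For comparison, here is a minimal sketch of one alternative that needs neither Rails nor the plugin. It assumes Ruby 2.2 or later, where `String#unicode_normalize` is part of the standard library, and the helper name `strip_accents` is hypothetical. It is not the method quoted above: instead of listing accented letters by hand, it decomposes each character (NFD) and drops the combining marks. Letters with no decomposition, such as æ, ø, or ł, are not transliterated and are simply removed by the final cleanup step, so those would still need an explicit mapping.

```ruby
# Hypothetical helper (assumes Ruby >= 2.2 and UTF-8 input).
def strip_accents(str)
  n = str.unicode_normalize(:nfd)   # decompose, e.g. "é" becomes "e" + U+0301
  n = n.gsub(/\p{Mn}/, '')          # remove the combining (nonspacing) marks
  n = n.downcase.strip
  n = n.gsub(/\s+/, ' ')            # collapse runs of whitespace
  n.gsub(/[^\sa-z0-9_\-]/, '')      # drop anything still outside the allowed set
end

puts strip_accents("  José  Núñez ")
```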