There is a very similar question already. One of its answers uses code like this:
string.mb_chars.normalize(:kd).gsub(/[^x00-x7F]/n, '').to_s
It works wonders until you notice that it also removes spaces, dots, dashes, and who knows what else.
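A likely cause of the collateral damage, assuming the regex really was typed without backslashes (they may also have been lost when the code was pasted): inside brackets, `x00-x7F` is not a hex range. It is the literal characters `x`, `0`, `7`, `F` plus the range `0` to `x`, so the negated class also matches spaces, `.`, `-`, `y`, `z`, and so on. The sketch below compares it with the escaped form `\x00-\x7F` (all 7-bit ASCII). `sample` is an arbitrary test string, and the `/n` flag is dropped because it isn't needed for UTF-8 strings in Ruby 1.9+.

```ruby
sample = "Caf\u00e9 au lait - 3.50"

# As quoted (no backslashes): the class is the characters x, 0, 7, F plus
# the range '0'..'x', so [^...] also matches ' ', '.', '-', 'y', 'z', ...
broken = /[^x00-x7F]/

# Intended: \x00-\x7F covers every 7-bit ASCII character, so [^...]
# matches only non-ASCII characters.
intended = /[^\x00-\x7F]/

puts sample.gsub(broken, '')    # => Cafaulait350
puts sample.gsub(intended, '')  # => Caf au lait - 3.50
```

Note that without decomposing first, even the escaped version deletes the precomposed "é" outright instead of leaving "e", which is why the original snippet normalizes with `:kd` before the `gsub`.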
I'm not really sure how that code works, but could it be made to strip only accents? Or, at the very least, could it be given a list of characters to preserve? My knowledge of regexes is limited, but I tried this (to no avail):
/[^-x00-x7F]/n # So it would leave the dash alone
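To strip only accents, one option is to decompose to NFD and then delete only combining marks (Unicode category `Mn`) rather than everything outside ASCII. This is a sketch assuming Ruby 2.2+, where core `String#unicode_normalize` stands in for ActiveSupport's `mb_chars.normalize`; `strip_accents` is a hypothetical helper name.

```ruby
# Hypothetical helper: remove only combining accent marks.
def strip_accents(str)
  # NFD splits "é" into "e" + U+0301 COMBINING ACUTE ACCENT.
  # \p{Mn} then removes just the combining marks; spaces, dots, dashes
  # and every other character are left untouched.
  str.unicode_normalize(:nfd).gsub(/\p{Mn}/, '')
end

puts strip_accents("Crème brûlée - 3.50 €")  # => Creme brulee - 3.50 €
```

Letters that have no decomposition, such as "ø", "ł" or "ß", are kept as-is. Using `:nfkd` instead of `:nfd` would additionally fold compatibility characters (for example the "ﬁ" ligature becomes "fi").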
I'm about to resort to something like this:
string.mb_chars.normalize(:kd).gsub('-', '__DASH__').gsub(/[^x00-x7F]/n, '').gsub('__DASH__', '-').to_s
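With the hex escapes in place, the dash is already inside `\x00-\x7F`, so it needs no placeholder round-trip. For a list of extra (non-ASCII) characters to preserve, they can be added to the negated class. This is a sketch assuming Ruby 2.2+; `to_ascii` and its `keep:` parameter are hypothetical, and `Regexp.escape` is used so characters like `-`, `]` or `^` are safe inside the brackets.

```ruby
# Hypothetical helper: decompose, then delete every non-ASCII character
# except those listed in +keep+.
def to_ascii(str, keep: "")
  # Regexp.escape turns '-', ']', '^', '\' etc. into literal escapes,
  # so they cannot break the character class.
  removable = /[^\x00-\x7F#{Regexp.escape(keep)}]/
  str.unicode_normalize(:nfkd).gsub(removable, '')
end

puts to_ascii("Café: 5€")             # => Cafe: 5
puts to_ascii("Café: 5€", keep: "€")  # => Cafe: 5€
```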
Atrocious? Yes...
I've also tried:
iconv = Iconv.new('UTF-8', 'US-ASCII//TRANSLIT') # Also tried ISO-8859-1
iconv.iconv 'Café' # Throws an error: Iconv::IllegalSequence: "é"
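`Iconv.new` takes the target encoding first (`Iconv.new(to, from)`), so the call above converts *from* US-ASCII, and "é" is not valid US-ASCII input, which would explain the `IllegalSequence`. Swapping the arguments should get further, although `//TRANSLIT` output depends on the system's iconv and locale, and Iconv left the standard library in Ruby 2.0. As a core-Ruby alternative (not Iconv's own transliteration), here is a sketch using `String#encode` with a `fallback:` proc that decomposes each unconvertible character and keeps its ASCII part. `ASCII_FALLBACK`, `transliterate`, and the `'?'` substitute for characters with no ASCII decomposition are all hypothetical choices.

```ruby
# Called by String#encode for each character that has no US-ASCII mapping;
# it receives that character as a UTF-8 string.
ASCII_FALLBACK = lambda do |char|
  # Decompose (é -> e + U+0301) and keep only the ASCII part.
  ascii = char.unicode_normalize(:nfkd).gsub(/[^\x00-\x7F]/, '')
  # Arbitrary choice: substitute '?' when nothing ASCII is left (e.g. for €).
  ascii.empty? ? '?' : ascii
end

# Hypothetical helper: transcode to US-ASCII using the fallback above.
def transliterate(str)
  str.encode('US-ASCII', fallback: ASCII_FALLBACK)
end

puts transliterate("Café - 3.50")  # => Cafe - 3.50
puts transliterate("5 €")          # => 5 ?
```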
Help please?