Welcome to OGeek Q&A Community for programmer and developer-Open, Learning and Share

MySQL regex with UTF-8 characters

I am trying to fetch data from a MySQL database with REGEXP so that a search matches words with or without special UTF-8 (accented) characters.

Let me explain with an example:

If the user enters a word like sirena, it should return rows that contain words like sirena, siréna, ?íreňá, and so on. It should also work the other way round: when the user enters siréná, it should return the same results.

I am trying to search with REGEXP, and my query looks like this:

SELECT * FROM `content` WHERE `text` REGEXP '[s??][iíí][r????][eééěě][nň?][Aaáá??0]'

It works only when the database contains the word sirena, but not when it contains siréňa.

Is this something to do with UTF-8 and MySQL? (The collation of the column is utf8_general_ci.)

Thank you!

1 Reply

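MySQL's regular expression library does not support UTF-8. REGEXP works byte by byte rather than character by character, so it is not safe with multibyte character sets.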

See Bug #30241, "Regular expression problems", which has been open since 2007. MySQL will have to change the regular expression library it uses before that can be fixed, and I haven't found any announcement of when, or whether, they will do this.
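To see why the query in the question matches sirena but not siréňa, here is a small Python sketch that mimics a byte-oriented engine by running the same character class against UTF-8 bytes instead of characters. Python's `re` module is only a stand-in here, not MySQL's actual regex library, but it shows the same effect: inside `[eé]`, the two bytes of é (0xC3 0xA9) become two separate one-byte alternatives, so the class can consume only half of é and the rest of the pattern no longer lines up.

```python
import re

pattern = "sir[eé]na"

# Character-oriented matching: [eé] is a set of two characters.
print(bool(re.search(pattern, "siréna")))  # True

# Byte-oriented matching: the UTF-8 pattern is b"sir[e\xc3\xa9]na",
# so [eé] becomes the one-byte set {0x65, 0xC3, 0xA9}.
byte_pattern = pattern.encode("utf-8")
print(bool(re.search(byte_pattern, "sirena".encode("utf-8"))))  # True
print(bool(re.search(byte_pattern, "siréna".encode("utf-8"))))  # False
```

The byte version fails on siréna because `[eé]` consumes 0xC3 and then `n` is compared against 0xA9.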

The only workaround I've seen is to search the hexadecimal form of the column for the exact UTF-8 byte sequence you want, for example C3A9C588 for "éň":

mysql> SELECT * FROM `content` WHERE HEX(`text`) REGEXP 'C3A9C588';
+----------+
| text     |
+----------+
| siréňa   |
+----------+
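The HEX() approach can be extended to accent variants by turning each character class into an alternation of hex byte sequences. The Python sketch below builds such a pattern string; the names `VARIANTS`, `hex_alternation` and `hex_pattern` are hypothetical helpers, the variant lists are only illustrative, and it assumes MySQL's REGEXP accepts plain POSIX-style groups, `|` and `^`. Because HEX() compares raw bytes, the column's case-insensitive collation no longer applies, so upper-case variants would have to be listed explicitly. The `^(..)*` prefix keeps matches on byte boundaries, since a short sequence like `65` could otherwise match across two bytes.

```python
import re

# Hypothetical equivalence classes for "sirena"; list every accented
# (and, if needed, upper-case) variant you want to accept.
VARIANTS = ["sšś", "ií", "rřŕ", "eéě", "nň", "aáä"]


def hex_alternation(chars: str) -> str:
    """Regex group matching the UTF-8 hex encoding of any character in chars."""
    codes = sorted({c.encode("utf-8").hex().upper() for c in chars})
    return "(" + "|".join(codes) + ")"


def hex_pattern(classes: list[str]) -> str:
    """Build a pattern intended for HEX(`text`) REGEXP '...'."""
    # '^(..)*' forces the match to start at an even hex offset (a byte
    # boundary), so e.g. '65' cannot match the middle of '4653' ("FS").
    return "^(..)*" + "".join(hex_alternation(c) for c in classes)


print("éň".encode("utf-8").hex().upper())  # C3A9C588, as in the query above
print(hex_pattern(VARIANTS))

# Simulate HEX(`text`) for a few sample values and test them locally.
for word in ["sirena", "siréňa", "šíreňá", "siren"]:
    hex_text = word.encode("utf-8").hex().upper()
    print(word, bool(re.search(hex_pattern(VARIANTS), hex_text)))
```

The printed pattern would then be pasted into `WHERE HEX(`text`) REGEXP '...'`. Every row has to be converted by HEX() before matching, so no index can help with this query.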

Re your comment:

No, I don't know of any solution with MySQL.

You might have to switch to PostgreSQL, because its regular expression syntax supports \u escapes for Unicode characters.
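For comparison, here is the question's kind of character-class pattern in a Unicode-aware engine, with the accented letters written as \u escapes. Python's `re` is used only as a stand-in; this is not PostgreSQL code, although PostgreSQL's advanced regular expressions accept a similar `\uwxyz` escape (exactly four hex digits). The variant lists are illustrative, not complete.

```python
import re

# \uXXXX escapes name the accented letters explicitly:
# U+0161 š, U+00ED í, U+00E9 é, U+011B ě, U+0148 ň, U+00E1 á.
PATTERN = re.compile(
    r"[s\u0161][i\u00ed]r[e\u00e9\u011b][n\u0148][a\u00e1]",
    re.IGNORECASE,  # Unicode-aware case folding, so É and Ň match too
)

for word in ["sirena", "Siréna", "šíreňá", "SIRÉŇA", "serena"]:
    print(word, bool(PATTERN.search(word)))
```

Because the engine works on characters, each bracket expression consumes a whole accented letter, which is exactly what the byte-oriented MySQL library cannot do.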


...