collation - SQLite 3: Character Issue While Ordering By Records

Question

posted Oct 24, 2021 in Technique[技术] by 深蓝 (71.8m points)

In my SQLite 3 database, I have some records with Turkish characters such as "Ö", "ü", "?" etc. When I select my values with a SELECT * FROM TABLE ORDER BY COLUMN_NAME query, the records that begin with these characters come at the end.

Normally, each of them should come right after its dotless counterpart: "Ö" after "O", "ü" after "U".

Is it something about regional settings? Is there a way to control these settings?

I use SQLite Manager in Firefox to manage my DB.

Thanks in advance.

P.S. I know it's not a solution for SQLite itself, but those who need to use an SQLite DB from Objective-C can sort the data array after fetching it from the DB. Here's a good solution: How to sort an NSMutableArray with custom objects in it?

1 Reply

深蓝 · Answer 1 · 2021-10-23T18:00:03+0000

Unfortunately, there seems to be no direct solution for this, at least on iOS. But there are workarounds.

After I subscribed to the SQLite mailing list, a user named Jean-Christophe Deschamps came up with this reply:

"In my SQLite 3 Database, I have some records with Turkish characters such as "Ö", "ü", "?" etc. When I select my values with 'SELECT * FROM TABLE ORDER BY COLUMN_NAME' query, the records that begin with these characters are coming at the end."

Bare-bones SQLite only collates correctly on the lower ASCII charset. While that's fine for plain English, it doesn't work for most of us.
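A minimal sketch of this default behaviour, using Python's built-in sqlite3 module. The table name `words`, the column `word` and the sample values are made up for illustration. With no collation specified, SQLite falls back to its BINARY collation, which compares the stored bytes, so every letter outside ASCII ends up after "Z":

```python
import sqlite3

def order_with_default_collation(names):
    """Return the names in the order SQLite gives them with no COLLATE clause."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE words (word TEXT)")  # hypothetical table
    conn.executemany("INSERT INTO words (word) VALUES (?)", [(n,) for n in names])
    rows = conn.execute("SELECT word FROM words ORDER BY word").fetchall()
    conn.close()
    return [row[0] for row in rows]

sample = ["Zeytin", "Ünlü", "Oda", "Uzun", "Ölçü", "Pazar"]
print(order_with_default_collation(sample))
# → ['Oda', 'Pazar', 'Uzun', 'Zeytin', 'Ölçü', 'Ünlü']
```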

"Normally, they should've come after the letter that is dot-less version of each. Like "Ö" is after "O", "ü" is after "U". Is it something about regional settings? Is there a way to control these settings?"

You have the choice among some ways to get it right or close to right for your language(s):

o) Use ICU, either as an extension (for third-party managers) or linked to your application. Advantages: it works 100% correctly for one given language at a time in each operation. Drawbacks: it's huge and slow, and it requires you to register a collation for every language you deal with. Also, it won't work well for columns containing several non-English languages.

o) Write your own collation(s) invoking your OS' ICU routines to collate strings. Advantages: doesn't bloat your code with huge libraries. Drawbacks: requires you to write this extension (in C or something), plus the same other drawbacks as ICU.

o) If you use Windows, download and use the functions in the extension I wrote for a close-to-correct result. Advantages: it's small, fairly fast and ready to use; it is language-independent yet works decently well for many languages at the same time; it also offers a number of Unicode-aware string manipulation functions (unaccenting or not), a fuzzy search function and much more. Comes as a C source and an x86 DLL, free for any purpose. Drawback: it probably doesn't work 100% correctly for any language using more than "vanilla English letters": your dotless i will collate along with dotted i, for instance. It's a good compromise between absolute correctness for ONE language and "fair" correctness for most languages (including some Asian languages using diacritics). Download: http://dl.dropbox.com/u/26433628/unifuzz.zip
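As a rough illustration of the "write your own collation" option, here is a sketch that registers a hand-written collation through Python's sqlite3 `create_collation`. It hard-codes the 29-letter Turkish alphabet instead of calling ICU or any OS routine, so it is a stand-in for the idea and not the approach from the reply. The names `turkish_key`, `turkish_collation`, `TURKISH` and the `words` table are hypothetical.

```python
import sqlite3

# Turkish alphabet in dictionary order. Assumption of this sketch: only these
# letters need language-specific ranks; everything else sorts after them.
TURKISH_LOWER = "abcçdefgğhıijklmnoöprsştuüvyz"
TURKISH_UPPER = "ABCÇDEFGĞHIİJKLMNOÖPRSŞTUÜVYZ"

# Both cases of a letter share one rank, so the comparison ignores case.
RANK = {}
for index, (lower, upper) in enumerate(zip(TURKISH_LOWER, TURKISH_UPPER)):
    RANK[lower] = index
    RANK[upper] = index

def turkish_key(text):
    # Characters outside the alphabet (q, w, x, digits, ...) fall back to
    # their code point, shifted past the last letter rank.
    return [RANK.get(ch, len(TURKISH_LOWER) + ord(ch)) for ch in text]

def turkish_collation(a, b):
    # SQLite expects a negative, zero or positive integer.
    key_a, key_b = turkish_key(a), turkish_key(b)
    return (key_a > key_b) - (key_a < key_b)

conn = sqlite3.connect(":memory:")
conn.create_collation("TURKISH", turkish_collation)
conn.execute("CREATE TABLE words (word TEXT)")
conn.executemany(
    "INSERT INTO words (word) VALUES (?)",
    [(w,) for w in ["Zeytin", "Ünlü", "Oda", "Uzun", "Ölçü", "Pazar"]],
)
turkish_order = [
    row[0]
    for row in conn.execute("SELECT word FROM words ORDER BY word COLLATE TURKISH")
]
print(turkish_order)
# → ['Oda', 'Ölçü', 'Pazar', 'Uzun', 'Ünlü', 'Zeytin']
```

An application-defined collation lives in the connection, not in the database file, so it has to be registered again on every connection that sorts with it.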

"I use SQLite Manager in Firefox to manage my DB."

My little extension will work with this one. You might also want to try SQLite Expert which has ICU built-in (at least in its Pro version) and much more.
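To get a feel for the language-independent compromise mentioned in the reply, where accented letters collate alongside their base letters, the following sketch strips combining marks before comparing. It relies on Python's unicodedata module and is only an approximation of the idea, not the code of the extension; `unaccent`, `unaccented_collation`, `UNACCENTED` and the `words` table are hypothetical names.

```python
import sqlite3
import unicodedata

def unaccent(text):
    # Decompose each character (e.g. "Ö" becomes "O" plus a combining
    # diaeresis), then drop the combining marks.
    decomposed = unicodedata.normalize("NFD", text)
    return "".join(ch for ch in decomposed if not unicodedata.combining(ch))

def unaccented_collation(a, b):
    # Compare the unaccented, case-folded forms of both strings.
    key_a, key_b = unaccent(a).casefold(), unaccent(b).casefold()
    return (key_a > key_b) - (key_a < key_b)

conn = sqlite3.connect(":memory:")
conn.create_collation("UNACCENTED", unaccented_collation)
conn.execute("CREATE TABLE words (word TEXT)")
conn.executemany(
    "INSERT INTO words (word) VALUES (?)",
    [(w,) for w in ["Zeytin", "Ünlü", "Oda", "Uzun", "Ölçü", "Pazar"]],
)
unaccented_order = [
    row[0]
    for row in conn.execute("SELECT word FROM words ORDER BY word COLLATE UNACCENTED")
]
print(unaccented_order)
# → ['Oda', 'Ölçü', 'Pazar', 'Ünlü', 'Uzun', 'Zeytin']
```

Here "Ünlü" sorts before "Uzun" because "ü" is treated like "u". That is acceptable across many languages at once, but it is not the strict Turkish order, where every word starting with "U" precedes every word starting with "Ü".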
