node.js - How to handle GET parameters containing non-utf8 characters?

Question

posted Oct 24, 2021 in Technique[技术] by 深蓝 (71.8m points)

In a Node.js/Express-based application, I need to handle GET requests that may contain umlauts encoded in the ISO-8859-1 charset.

Unfortunately, Node's querystring parser seems to handle only plain ASCII and UTF-8:

> qs.parse('foo=bar&xyz=foo%20bar')
{ foo: 'bar', xyz: 'foo bar' } # works fine
> qs.parse('foo=bar&xyz=T%FCt%20T%FCt')
{ foo: 'bar', xyz: 'T%FCt%20T%FCt' } # iso-8859-1 breaks, should be "Tüt Tüt"
> qs.parse('foo=bar&xyz=m%C3%B6p')
{ foo: 'bar', xyz: 'möp' } # UTF-8 works fine

Is there a hidden option or another clean way to make this work with other charsets, too? The major problem with the default behaviour is that I have no way of knowing whether there was a decoding error: after all, the input could have been something that simply decoded to something that still looks like a URL-encoded string.

1 Reply

深蓝 · Answer 1 · 2021-10-23T18:23:27+0000

Well, URL encoding should always be UTF-8; anything else can be treated as an encoding attack and the request simply rejected. There is no such thing as a "non-UTF-8 character": UTF-8 can encode every Unicode character. I don't know why your application would receive query strings in other encodings, but with browsers you will be fine if you serve your pages with a charset header (e.g. Content-Type: text/html; charset=utf-8), because browsers encode form submissions in the page's charset. For API requests and other clients, you can require UTF-8 and reject invalid UTF-8 with 400 Bad Request.
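A minimal sketch of the "reject invalid UTF-8" approach in plain JavaScript (no Express-specific API assumed). It relies on the fact that `decodeURIComponent` throws a `URIError` when a `%XX` escape is malformed or the decoded bytes are not valid UTF-8, which also addresses the question's concern about not knowing whether decoding failed. The names `decodeComponentStrict`, `parseQueryStrictUtf8` and `tryParseQuery` are hypothetical, and the parser is simplified: a repeated key keeps only its last value.

```javascript
// Hypothetical helpers: strict UTF-8 query-string parsing that signals
// malformed input instead of silently returning the raw escaped text.

// Decode one component of application/x-www-form-urlencoded data.
// '+' means space; decodeURIComponent throws URIError on bad %XX escapes
// or on byte sequences that are not valid UTF-8.
function decodeComponentStrict(component) {
  return decodeURIComponent(component.replace(/\+/g, ' '));
}

// Parse "a=1&b=2" into a prototype-less object. Repeated keys: last wins.
function parseQueryStrictUtf8(query) {
  const result = Object.create(null);
  for (const pair of query.split('&')) {
    if (pair === '') continue; // tolerate "a=1&&b=2" and an empty query
    const eq = pair.indexOf('=');
    const key = eq === -1 ? pair : pair.slice(0, eq);
    const value = eq === -1 ? '' : pair.slice(eq + 1);
    result[decodeComponentStrict(key)] = decodeComponentStrict(value);
  }
  return result;
}

// Returns the parsed object, or null if the query is not valid UTF-8,
// so a request handler can answer with 400 Bad Request.
function tryParseQuery(query) {
  try {
    return parseQueryStrictUtf8(query);
  } catch (err) {
    if (err instanceof URIError) return null;
    throw err; // unexpected errors are not swallowed
  }
}
```

In a request handler you might pass it the part of `req.url` after the first `?` and respond with status 400 whenever it returns `null`; `'foo=bar&xyz=T%FCt%20T%FCt'` from the question is rejected, while `'foo=bar&xyz=m%C3%B6p'` parses to `xyz: 'möp'`.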

If you really do mean ISO-8859-1, decoding is very simple, because every ISO-8859-1 byte value (0x00 to 0xFF) equals the Unicode code point of the character it encodes:

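// ISO-8859-1 byte values equal Unicode code points, so each %XX escape
// can be turned directly into the character with that code point.
const decoded = 'T%FCt%20T%FCt'.replace(/%([a-f0-9]{2})/gi, function (match, hex) {
    return String.fromCharCode(parseInt(hex, 16));
});
// decoded === 'Tüt Tüt' (note: '+' as a space is not handled here)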
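The one-liner above decodes a single value. Percent-decoding really yields bytes, and the charset is a separate, explicit choice; this sketch makes that split visible for a whole query string, using Node's `Buffer` with its built-in `'latin1'` and `'utf8'` encodings. The names `percentDecodeToBytes` and `parseQueryWithCharset` are hypothetical, and the caller is assumed to know (or decide) the charset in advance, since the bytes alone do not reveal it.

```javascript
// Hypothetical helper: percent-decode one query component to raw bytes.
// '+' is treated as a space (form encoding); the charset is NOT applied here.
function percentDecodeToBytes(component) {
  const s = component.replace(/\+/g, ' ');
  const bytes = [];
  for (let i = 0; i < s.length; i++) {
    const hex = s.slice(i + 1, i + 3);
    if (s[i] === '%' && /^[0-9a-fA-F]{2}$/.test(hex)) {
      bytes.push(parseInt(hex, 16));
      i += 2; // skip the two hex digits
    } else {
      // Literal character (normally ASCII): keep its UTF-8 bytes.
      // A stray '%' without two hex digits is kept as a literal '%'.
      const ch = String.fromCodePoint(s.codePointAt(i));
      bytes.push(...Buffer.from(ch, 'utf8'));
      i += ch.length - 1; // skip the low surrogate of an astral character
    }
  }
  return Buffer.from(bytes);
}

// Hypothetical helper: parse a query string, decoding every key and value
// with the given Buffer encoding ('latin1' for ISO-8859-1, or 'utf8').
function parseQueryWithCharset(query, charset) {
  const result = Object.create(null);
  for (const pair of query.split('&')) {
    if (pair === '') continue;
    const eq = pair.indexOf('=');
    const key = eq === -1 ? pair : pair.slice(0, eq);
    const value = eq === -1 ? '' : pair.slice(eq + 1);
    result[percentDecodeToBytes(key).toString(charset)] =
      percentDecodeToBytes(value).toString(charset);
  }
  return result;
}
```

For example, `parseQueryWithCharset('xyz=T%FCt%20T%FCt', 'latin1')` gives `xyz: 'Tüt Tüt'`, while decoding the UTF-8 input `'xyz=m%C3%B6p'` as `'latin1'` silently produces the mojibake `'mÃ¶p'`, which is why the charset has to be known rather than guessed.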

In practice, though, text on the web labelled ISO-8859-1 is almost always Windows-1252, which differs from ISO-8859-1 only in the byte range 0x80 to 0x9F.
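Because Windows-1252 differs from ISO-8859-1 only in bytes 0x80 to 0x9F (where it places characters such as €, curly quotes and dashes instead of C1 control codes), a Latin-1 decoder can be turned into a Windows-1252 decoder with a 32-entry table. A minimal sketch, assuming you already have the raw bytes of a percent-decoded parameter as an array, `Uint8Array` or `Buffer`; `decodeWindows1252` is a hypothetical name, and the five bytes Windows-1252 leaves unassigned (0x81, 0x8D, 0x8F, 0x90, 0x9D) map to the same-numbered control code points, as in the WHATWG Encoding Standard's table.

```javascript
// Unicode code points for Windows-1252 bytes 0x80..0x9F (index = byte - 0x80).
// Unassigned bytes map to the identical C1 control code point.
const CP1252_HIGH = [
  0x20ac, 0x0081, 0x201a, 0x0192, 0x201e, 0x2026, 0x2020, 0x2021,
  0x02c6, 0x2030, 0x0160, 0x2039, 0x0152, 0x008d, 0x017d, 0x008f,
  0x0090, 0x2018, 0x2019, 0x201c, 0x201d, 0x2022, 0x2013, 0x2014,
  0x02dc, 0x2122, 0x0161, 0x203a, 0x0153, 0x009d, 0x017e, 0x0178,
];

// Hypothetical helper: decode a sequence of byte values as Windows-1252.
// Outside 0x80..0x9F the byte value is the code point, exactly as in Latin-1.
function decodeWindows1252(bytes) {
  let out = '';
  for (const b of bytes) {
    const codePoint = b >= 0x80 && b <= 0x9f ? CP1252_HIGH[b - 0x80] : b;
    out += String.fromCharCode(codePoint);
  }
  return out;
}
```

For instance, `decodeWindows1252([0x80, 0x31, 0x30])` returns `'€10'`, whereas a plain Latin-1 decoding of the same bytes would start with the invisible control character U+0080.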
