Welcome to OGeek Q&A Community for programmer and developer-Open, Learning and Share
Welcome To Ask or Share your Answers For Others

Categories

0 votes
1.2k views
in Technique[技术] by (71.8m points)

what characters are allowed in HTTP header values?

After studying HTTP/1.1 standard, specifically page 31 and related I came to conclusion that any 8-bit octet can be present in HTTP header value. I.e. any character with code from [0,255] range.

And yet HTTP servers I tried refuse to take anything with code > 127 (or most US-ASCII non-printable chars).

Here is dried out excerpt of grammar used in standard:

message-header = field-name ":" [ field-value ]
field-name     = token
field-value    = *( field-content | LWS )
field-content  = <the OCTETs making up the field-value and consisting of
                  either *TEXT or combinations of token, separators, and
                  quoted-string>

CR             = <US-ASCII CR, carriage return (13)>
LF             = <US-ASCII LF, linefeed (10)>
SP             = <US-ASCII SP, space (32)>
HT             = <US-ASCII HT, horizontal-tab (9)>
CRLF           = CR LF
LWS            = [CRLF] 1*( SP | HT )
OCTET          = <any 8-bit sequence of data>
CHAR           = <any US-ASCII character (octets 0 - 127)>
CTL            = <any US-ASCII control character (octets 0 - 31) and DEL (127)>
TEXT           = <any OCTET except CTLs, but including LWS>

token          = 1*<any CHAR except CTLs or separators>
separators     = "(" | ")" | "<" | ">" | "@" | "," | ";" | ":" | ""
               | <"> | "/" | "[" | "]" | "?" | "=" | "{" | "}" | SP | HT

quoted-string  = ( <"> *(qdtext | quoted-pair ) <"> )
qdtext         = <any TEXT except <">>
quoted-pair    = "" CHAR

As you can see field-content can be a quoted-string, which is an enquoted sequence of TEXT (i.e. any 8-bit octet with exception of " and values from [0-8, 11-12, 14-31, 127] range) or quoted-pair ( followed by any value from [0, 127] range). I.e. any 8-bit char sequence can be passed by en-quoting it and prefixing special symbols with ).

(Note that standard doesn't treat NUL(0x00) char in any special way)

But, obviously either all servers I tried are not conforming or standard has changed since 1999 or I can't read it properly.

So... which characters are allowed in HTTP header values and why?

P.S. Reason behind all of this: I am looking for a way to pass utf-8-encoded sequence in HTTP header value (without additional encoding, if possible).

See Question&Answers more detail:os

与恶龙缠斗过久,自身亦成为恶龙;凝视深渊过久,深渊将回以凝视…
Welcome To Ask or Share your Answers For Others

1 Reply

0 votes
by (71.8m points)

RFC 2616 is obsolete, the relevant part has been replaced by RFC 7230.

The NUL octet is no longer allowed in comment and quoted-string text, and handling of backslash-escaping in them has been clarified. The quoted-pair rule no longer allows escaping control characters other than HTAB. Non-US-ASCII content in header fields and the reason phrase has been obsoleted and made opaque (the TEXT rule was removed). (Section 3.2.6)

In essence, RFC 2616 defaulted to ISO-8859-1, and this was both insufficient and not interoperable anyway. Thus, RFC 7230 has deprecated non-ASCII octets in field values. The recommendation is to use an escaping mechanism on top of that (such as defined in RFC 8187, or plain URI-percent-encoding).


与恶龙缠斗过久,自身亦成为恶龙;凝视深渊过久,深渊将回以凝视…
OGeek|极客中国-欢迎来到极客的世界,一个免费开放的程序员编程交流平台!开放,进步,分享!让技术改变生活,让极客改变未来! Welcome to OGeek Q&A Community for programmer and developer-Open, Learning and Share
Click Here to Ask a Question

...