Welcome to OGeek Q&A Community for programmer and developer-Open, Learning and Share
Welcome To Ask or Share your Answers For Others


php - Number in the top-level domain?

Can top-level domains contain a number at the end? I don't know anything about DNS rules, but when I use PHP's filter_var() function with FILTER_VALIDATE_EMAIL on [email protected], it returns true.


1 Reply


Can a top-level domain contain a number at the end?

Technically, yes. A purely numeric label, however, cannot be a TLD under current rules, for an easily understood reason: it would be ambiguous with IP addresses. In practice, a TLD also cannot contain a digit at the end unless it is an IDN TLD, because of rules enforced by ICANN.

Let us go back to some RFCs to get clearer definitions:

RFC 952: DOD INTERNET HOST TABLE SPECIFICATION (October 1985)

This is the definition of an Internet "hostname" back then:

A "name" (Net, Host, Gateway, or Domain name) is a text string up to 24 characters drawn from the alphabet (A-Z), digits (0-9), minus sign (-), and period (.). Note that periods are only allowed when they serve to delimit components of "domain style names". (See RFC-921, "Domain Name System Implementation Schedule", for background). No blank or space characters are permitted as part of a name. No distinction is made between upper and lower case. The first character must be an alpha character. The last character must not be a minus sign or period.

Note that the same RFC also states the following:

Single character names or nicknames are not allowed.

Hence at that point:

  • com1 is a valid TLD
  • 3com is not ("The first character must be an alpha character.")
  • 42 is not (same reason)
  • 1 is not (same reason)
  • a is not ("Single character names or nicknames are not allowed.")
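The RFC 952 rules quoted above can be sketched as a small check. This is only an illustration, not an authoritative validator: the function name is hypothetical, and the 24-character limit (which RFC 952 applies to the whole name) is applied here to a single label because only the TLD is being tested.

```python
import re

# Hypothetical helper illustrating the RFC 952 rules for a single label:
# first character is a letter, last character is a letter or digit,
# interior characters are letters, digits or hyphens.
_RFC952_LABEL = re.compile(r"[A-Za-z][A-Za-z0-9-]*[A-Za-z0-9]")


def is_valid_rfc952_label(label: str) -> bool:
    """Return True if the label satisfies the RFC 952 rules sketched above."""
    # Single-character names are not allowed, and names are at most 24 characters.
    if not 2 <= len(label) <= 24:
        return False
    return _RFC952_LABEL.fullmatch(label) is not None


for candidate in ["com1", "3com", "42", "1", "a"]:
    print(candidate, is_valid_rfc952_label(candidate))
```

Only com1 passes; the four other candidates fail for the reasons given in the list.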

RFC 1034: DOMAIN NAMES - CONCEPTS AND FACILITIES (November 1987)

This is one of the RFCs that created the DNS as we know it today. For compatibility reasons, it defined hostnames as a sequence of labels, where a label is defined as follows:

They must start with a letter, end with a letter or digit, and have as interior characters only letters, digits, and hyphen. There are also some restrictions on the length. Labels must be 63 characters or less.

The TLD is simply one label among the others, namely the rightmost one. Per the above rule, com1 is a valid label, and hence a valid TLD, whereas 3com would not have been. This brings us directly to the following amendment.

RFC 1123: Requirements for Internet Hosts -- Application and Support (October 1989)

This amends the previous RFC by changing one rule:

The syntax of a legal Internet host name was specified in RFC-952 [DNS:4]. One aspect of host name syntax is hereby changed: the restriction on the first character is relaxed to allow either a letter or a digit. Host software MUST support this more liberal syntax.

So at that point:

  • com1 is a valid TLD
  • 3com is also valid
  • 42 is valid
  • 1 is valid
  • a is valid
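To make the effect of the amendment concrete, here is a minimal sketch comparing the RFC 1034 label rule with the relaxed RFC 1123 one. The pattern and function names are assumptions made for illustration; the 63-character limit comes from the RFC 1034 text quoted earlier.

```python
import re

# RFC 1034 label: starts with a letter, ends with a letter or digit,
# interior characters are letters, digits or hyphens.
RFC1034_LABEL = re.compile(r"[A-Za-z](?:[A-Za-z0-9-]*[A-Za-z0-9])?")

# RFC 1123 relaxation: the first character may also be a digit.
RFC1123_LABEL = re.compile(r"[A-Za-z0-9](?:[A-Za-z0-9-]*[A-Za-z0-9])?")


def label_matches(pattern: re.Pattern, label: str) -> bool:
    """Check a single label against one of the patterns above (hypothetical helper)."""
    return len(label) <= 63 and pattern.fullmatch(label) is not None


for candidate in ["com1", "3com", "42", "1", "a"]:
    print(
        candidate,
        "RFC 1034:", label_matches(RFC1034_LABEL, candidate),
        "RFC 1123:", label_matches(RFC1123_LABEL, candidate),
    )
```

Under the RFC 1034 pattern only com1 and a match, while under the RFC 1123 pattern all five candidates match.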

For the case of "numerical" TLDs, the following passages from the same RFC apply:

Whenever a user inputs the identity of an Internet host, it SHOULD be possible to enter either (1) a host domain name or (2) an IP address in dotted-decimal ("#.#.#.#") form. The host SHOULD check the string syntactically for a dotted-decimal number before looking it up in the Domain Name System.

and

If a dotted-decimal number can be entered without such identifying delimiters, then a full syntactic check must be made, because a segment of a host domain name is now allowed to begin with a digit and could legally be entirely numeric (see Section 6.1.2.4). However, a valid host name can never have the dotted-decimal form #.#.#.#, since at least the highest-level component label will be alphabetic.
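The syntactic check described in these two passages could look roughly like the sketch below. It is an assumption-laden illustration: the function name and return values are invented here, and the pattern only tests the "#.#.#.#" shape without checking that each number fits in 0-255.

```python
import re

# Four groups of ASCII digits separated by dots: the "#.#.#.#" form.
DOTTED_DECIMAL = re.compile(r"[0-9]+\.[0-9]+\.[0-9]+\.[0-9]+")


def classify_host_input(text: str) -> str:
    """Decide, by syntax alone, whether user input is an address or a name.

    Hypothetical helper: the dotted-decimal check runs first, before any
    DNS lookup would be attempted.
    """
    if DOTTED_DECIMAL.fullmatch(text):
        return "ip-address"
    return "domain-name"


for text in ["192.0.2.1", "example.com", "1.2.3.4.com", "host.example.42"]:
    print(text, classify_host_input(text))
```

A name such as 1.2.3.42 would be classified as an address and never looked up in the DNS, which is why a purely numeric TLD is ambiguous.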

RFC 1738: Uniform Resource Locators (URL) (December 1994)

This also addresses the TLD, stating:

The fully qualified domain name of a network host, or its IP address as a set of four decimal digit groups separated by ".". Fully qualified domain names take the form as described in Section 3.5 of RFC 1034 [13] and Section 2.1 of RFC 1123 [5]: a sequence of domain labels separated by ".", each domain label starting and ending with an alphanumerical character and possibly also containing "-" characters. The rightmost domain label will never start with a digit, though, which syntactically distinguishes all domain names from the IP addresses.

RFC 3696: Application Techniques for Checking and Transformation of Names (February 2004)

This was needed to introduce IDNs (Internationalized Domain Names) and it has this to say:

Any characters, or combination of bits (as octets), are permitted in DNS names. However, there is a preferred form that is required by most applications. This preferred form has been the only one permitted in the names of top-level domains, or TLDs. In general, it is also the only form permitted in most second-level names registered in TLDs, although some names that are normally not seen by users obey other rules. It derives from the original ARPANET rules for the naming of hosts (i.e., the "hostname" rule) and is perhaps better described as the "LDH rule", after the characters that it permits. The LDH rule, as updated, provides that the labels (words or strings separated by periods) that make up a domain name must consist of only the ASCII [ASCII] alphabetic and numeric characters, plus the hyphen. No other symbols or punctuation characters are permitted, nor is blank space. If the hyphen is used, it is not permitted to appear at either the beginning or end of a label. There is an additional rule that essentially requires that top-level domain names not be all-numeric.

In fact, as soon as IDNs are involved, and there are IDN TLDs (both ccTLDs and gTLDs now), the chosen encoding generates an ASCII string of the form xn--something, where the something can contain digits, including at the end, as shown in other answers.
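As a small illustration of how digits end up in the ASCII form of an IDN TLD, the sketch below converts the Cyrillic ccTLD of Russia with Python's built-in idna codec. The helper name is hypothetical, and the built-in codec implements the older IDNA 2003 rules, so treat this as a demonstration rather than a reference conversion.

```python
def to_a_label(u_label: str) -> str:
    """Convert a Unicode label to its ASCII (xn--) form (hypothetical helper)."""
    return u_label.encode("idna").decode("ascii")


# "\u0440\u0444" is the two Cyrillic letters of the Russian IDN ccTLD.
a_label = to_a_label("\u0440\u0444")
print(a_label)  # xn--p1ai
print(any(ch.isdigit() for ch in a_label))  # True: the ASCII form contains a digit
```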

However, it is not really clear where the "additional rule" in the last sentence comes from.

RFC 4697: Observed DNS Resolution Misbehavior (October 2006)

This RFC does not define anything, but it provides some interesting facts:

The root name servers receive a significant number of A record queries where the QNAME looks like an IPv4 address.

and

A possible solution is to delegate these numeric TLDs from the root zone to a separate set of servers to absorb the traffic.

This clearly shows that, in the wild, some applications send queries for names formatted like IPv4 addresses, and hence with a fully numeric "TLD". They may do so by mistake, but it shows at least that it works technically.

There was in fact an experiment to launch a .42 registry, obviously completely outside of the ICANN ecosystem. You can see a summary of it at http://www.dotsauce.com/experimental-numeric-tld-42-domain/ and an archive of their main explanations at https://web.archive.org/web/20101222151118/http://register.42registry.org:80/ (in French).

It did not go far, even though it technically worked.

It showed, for example, that Microsoft-based operating systems did not by default accept purely numeric TLDs at all, although a patch was provided for that: https://support.microsoft.com/en-us/help/947228/error-message-when-you-try-to-join-a-windows-vista-based-client-comput The article states: "When you try to join a Windows Vista-based client computer to a top level domain (TLD) that has a purely numeric suffix, the Windows Vista-based client computer cannot join the domain. [..] This behavior is by design."

Internet-Draft draft-liman-tld-names-06: Top Level Domain Name Specification (November 2011)

This finally gives some explanation of why purely numeric TLDs, or even TLDs containing a single digit, are sometimes considered invalid even though that is not a clear consequence of the above specifications:

(section 2.1 below refers to content in RFC 1123, quoted above)

In addition, the DISCUSSION section of Section 2.1 says:

 'However, a valid host name can never have the dotted-decimal form
 #.#.#.#, since at least the highest-level component label will be
 alphabetic.'  [Section 2.1]

Some implementers may have understood the above phrase 'will be alphabetic' to be a protocol restriction.

But it basically just recommends going with the flow and keeping the same restrictions:

Neither [RFC0952] nor [RFC1123] explicitly states the reasons for these restrictions. It might be supposed that human factors were a consideration; [RFC1123] appears to suggest that one of the reasons was to prevent confusion between dotted-decimal IPv4 addresses and host domain names. In any case, it is reasonable to believe that the restrictions have been assumed in some deployed software, and that changes to the rules should be undertaken with caution.

Hence it offered this definition:

traditional-tld-label = 1*63(ALPHA)
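Read as a regular expression, that ABNF rule is simply "1 to 63 ASCII letters". The sketch below is one possible translation; the pattern and function names are assumptions made for illustration.

```python
import re

# ABNF: traditional-tld-label = 1*63(ALPHA), where ALPHA is A-Z / a-z.
TRADITIONAL_TLD_LABEL = re.compile(r"[A-Za-z]{1,63}")


def is_traditional_tld_label(label: str) -> bool:
    """Return True if the label consists of 1 to 63 ASCII letters only."""
    return TRADITIONAL_TLD_LABEL.fullmatch(label) is not None


for candidate in ["com", "com1", "3com", "42", "a"]:
    print(candidate, is_traditional_tld_label(candidate))
```

Under this definition com and a qualify, while any label containing a digit, such as com1, 3com or 42, does not.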

This draft never became an RFC because not everyone agreed with it. You can find a thread with dissenting voices at https://www.ietf.org/mail-archive/web/dnsop/current/msg08866.html ; basically, it was not clear whether there had been a restriction in the past that people were now trying to relax a little, or whether there had never been a restriction to begin with and people had simply implemented systems wrongly.

For an example, see this Chromium/Chrome bug report: https://bugs.chromium.org/p/chromium/issues/detail?id=31405 Browsing failed when using a TLD that started with a digit or was purely numeric (it worked if the TLD ended with a digit preceded by letters). This was not considered a bug and has not been fixed, because the browser ships with a list of TLDs so that, besides testing their syntax, it can know which ones are valid and which are not.

ICANN Application Guidebook for new TLDs (June 2012)

Available at https://newgtlds.icann.org/en/applicants/agb/guidebook-full-04jun12-en.pdf , it says the following, starting on page 64:

The ASCII label (i.e., the label as transmitted on the wire) must be valid as specified in technical standards Domain Names: Implementation and Specification (RFC 1035), and Clar

