There is a lot of redundancy in your regular expression, and it also suffers from leaning toothpick syndrome (LTS). This, though, should produce results:
$rx = '~
^(?:https?://)? # Optional protocol
(?:www[.])? # Optional sub-domain
(?:youtube[.]com/watch[?]v=|youtu[.]be/) # Mandatory domain name (w/ query string in .com)
([^&]{11}) # Video id of 11 characters as capture group 1
~x';
$has_match = preg_match($rx, $url, $matches);
// if matching succeeded, $matches[1] would contain the video ID
Some notes:
- use the tilde character ~ as delimiter, to avoid LTS
- use [.] instead of \. to improve visual legibility and avoid LTS. (Most "special" characters, such as the dot ., lose their special meaning inside character classes, i.e. within square brackets.)
- to make regular expressions more "readable" you can use the x modifier, which also allows for comments in regular expressions (it has further implications; see the docs on Pattern modifiers)
- capturing can be suppressed using non-capturing groups: (?: <pattern> ). This makes the expression slightly more efficient, since no submatch has to be stored.
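To illustrate the notes above, here is a small sketch that writes the same pattern twice: once with / delimiters and backslash escapes, once with ~ delimiters, [.] classes and the x modifier. The sample URL and the variable names are made up for the example, and only the youtube.com form is covered to keep the comparison short.

```php
<?php
// "Leaning toothpick" version: "/" as delimiter forces every slash to be escaped.
$toothpicks = '/^(?:https?:\/\/)?(?:www\.)?youtube\.com\/watch\?v=([^&]{11})/';

// Same pattern with "~" as delimiter, [.] instead of \. and the x modifier,
// which ignores whitespace in the pattern and allows "#" comments.
$tidy = '~
    ^(?:https?://)?            # Optional protocol (non-capturing)
    (?:www[.])?                # Optional sub-domain (non-capturing)
    youtube[.]com/watch[?]v=   # Domain name and query string
    ([^&]{11})                 # Video id as capture group 1
~x';

// Hypothetical sample URL with a made-up 11-character id.
$url = 'https://www.youtube.com/watch?v=abcdefghijk&t=42';

preg_match($toothpicks, $url, $a);
preg_match($tidy, $url, $b);

// Both patterns capture the same id; because the other groups are
// non-capturing, the id is always at index 1.
var_dump($a[1] === $b[1]);
```

Both arrays contain exactly two entries: the full match at index 0 and the video ID at index 1.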
Optionally, to extract values from a (more or less complete) URL, you might want to make use of parse_url():
$url = 'http://youtube.com/watch?v=VIDEOID';
$parts = parse_url($url);
print_r($parts);
Output:
Array
(
[scheme] => http
[host] => youtube.com
[path] => /watch
[query] => v=VIDEOID
)
Validating the domain name and extracting the video ID is left as an exercise to the reader.
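For readers who would rather not do the exercise, one possible approach is sketched below. The function name youtube_id_from_url is hypothetical, and the assumption that an ID consists of exactly 11 characters from A-Z, a-z, 0-9, _ and - is mine; the answer above only relies on the length. Note that parse_url() does not report a host when the URL has no scheme, so this sketch rejects such input.

```php
<?php
// Hypothetical helper: returns the video id, or null if the URL is not a
// recognised YouTube URL. Requires PHP 7.1+ (nullable return type, ??).
function youtube_id_from_url(string $url): ?string
{
    $parts = parse_url($url);
    if ($parts === false || !isset($parts['host'])) {
        return null; // malformed URL, or no scheme so no host was detected
    }

    // Normalise the host: lower-case it and drop an optional "www." prefix.
    $host = strtolower($parts['host']);
    if (strpos($host, 'www.') === 0) {
        $host = substr($host, 4);
    }

    if ($host === 'youtube.com' && ($parts['path'] ?? '') === '/watch') {
        // Long form: the id is the "v" query parameter.
        parse_str($parts['query'] ?? '', $query);
        $id = $query['v'] ?? '';
    } elseif ($host === 'youtu.be') {
        // Short form: the id is the path without its leading slash.
        $id = ltrim($parts['path'] ?? '', '/');
    } else {
        return null; // some other domain
    }

    // Assumption: ids are exactly 11 characters from [A-Za-z0-9_-].
    return (is_string($id) && preg_match('~^[\w-]{11}\z~', $id)) ? $id : null;
}

var_dump(youtube_id_from_url('https://youtu.be/abcdefghijk'));
var_dump(youtube_id_from_url('http://example.com/watch?v=abcdefghijk'));
```

Compared with the single regular expression, this tolerates extra query parameters in any order, at the cost of being stricter about the scheme.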
I gave in to the comment war below; thanks to Toni Oriol, the regular expression now works on short (youtu.be) URLs as well.