There is already a question with almost the same title:
What is the best regular expression to check if a string is a valid URL
I don't quite understand how Stack Overflow works here. It seems I need reputation to comment on an answer, and since I don't have it, I have no way to say (or ask) that the proposed solution doesn't seem to work. So am I forced to post a new question and ask for the solution this way?
UPDATE: It seems the regular expression does support IPv6, and the mistake was mine: an IPv6 address in a URL has to be enclosed in square brackets, like http://[2620:0:1cfe:face:b00c::3]/.
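The brackets come from the RFC 3986 IP-literal rule: without them, the colons inside an IPv6 address could not be told apart from the host:port separator. A short sketch of formatting a host for a URL under that rule; url_host() is a hypothetical helper, and PHP's filter_var() with FILTER_VALIDATE_IP and FILTER_FLAG_IPV6 is used only to detect IPv6 addresses:

```php
<?php
// Hypothetical helper: format a host for use in the authority part of a URL.
// IPv6 addresses must be enclosed in [ ] (RFC 3986 IP-literal);
// host names and IPv4 addresses are returned unchanged.
function url_host($host)
{
    // With only FILTER_FLAG_IPV6 set, IPv4 addresses and host names are rejected
    if (filter_var($host, FILTER_VALIDATE_IP, FILTER_FLAG_IPV6) !== false) {
        return '[' . $host . ']';
    }
    return $host;
}

echo 'http://' . url_host('2620:0:1cfe:face:b00c::3') . '/', "\n"; // http://[2620:0:1cfe:face:b00c::3]/
echo 'http://' . url_host('www.example.org') . ':80/', "\n";       // http://www.example.org:80/
```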
So the only real problem I know of now is that it accepts example.org: as a valid URL.
Or is PHP to blame?
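This may be the grammar rather than PHP. In RFC 3986, which RFC 3987 builds on, a scheme is a letter followed by any mix of letters, digits, "+", "-" and ".", and the part after the colon is allowed to be an empty path. Under that reading, example.org: is scheme "example.org" followed by an empty path. A minimal sketch that checks only the scheme production (starts_with_scheme() is a hypothetical helper, and its pattern is a simplification, not the full IRI regex):

```php
<?php
// Hypothetical helper: does the string begin with an RFC 3986 scheme and ":"?
// scheme = ALPHA *( ALPHA / DIGIT / "+" / "-" / "." )
function starts_with_scheme($str)
{
    return (bool) preg_match('/^[a-z][a-z0-9+.\-]*:/i', $str);
}

var_dump(starts_with_scheme('example.org:'));        // bool(true): "example.org" is a legal scheme name
var_dump(starts_with_scheme('http://example.org/')); // bool(true)
var_dump(starts_with_scheme('example.org'));         // bool(false): no ":" after the candidate scheme
```

Whether such a string should count as "valid" then depends on whether every scheme is acceptable for the application.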
/**
 * Validate URL - RFC 3987 (IRI)
 *
 * https://stackoverflow.com/questions/161738/what-is-the-best-regular-expression-to-check-if-a-string-is-a-valid-url
 *
 * @param string $str_url
 * @return boolean
 */
function is_url($str_url)
{
    // RFC 3987 for absolute IRIs (internationalized), assembled from the ABNF rules
    $ucs  = '\x{A0}-\x{D7FF}\x{F900}-\x{FDCF}\x{FDF0}-\x{FFEF}\x{10000}-\x{1FFFD}\x{20000}-\x{2FFFD}\x{30000}-\x{3FFFD}'
          . '\x{40000}-\x{4FFFD}\x{50000}-\x{5FFFD}\x{60000}-\x{6FFFD}\x{70000}-\x{7FFFD}\x{80000}-\x{8FFFD}\x{90000}-\x{9FFFD}'
          . '\x{A0000}-\x{AFFFD}\x{B0000}-\x{BFFFD}\x{C0000}-\x{CFFFD}\x{D0000}-\x{DFFFD}\x{E1000}-\x{EFFFD}'; // ucschar
    $pct  = '%[0-9a-f][0-9a-f]';                                    // pct-encoded
    $base = '\-a-z0-9._~' . $ucs . '!$&\'()*+,;=';                  // iunreserved + sub-delims (class body)
    $pchr = '(?:' . $pct . '|[' . $base . ':@])';                   // ipchar
    $oct  = '(?:[0-9]|[1-9][0-9]|1[0-9][0-9]|2[0-4][0-9]|25[0-5])'; // dec-octet
    $ipv4 = $oct . '(?:\.' . $oct . '){3}';                         // IPv4address
    $ls32 = '(?:[0-9a-f]{1,4}:[0-9a-f]{1,4}|' . $ipv4 . ')';        // ls32
    $ipv6 = strtr('(?:H:){6}L|::(?:H:){5}L|(?:H)?::(?:H:){4}L|(?:H:H)?::(?:H:){3}L|(?:(?:H:){0,2}H)?::(?:H:){2}L'
          . '|(?:(?:H:){0,3}H)?::H:L|(?:(?:H:){0,4}H)?::L|(?:(?:H:){0,5}H)?::H|(?:(?:H:){0,6}H)?::',
            array('H' => '[0-9a-f]{1,4}', 'L' => $ls32));           // IPv6address (H = h16, L = ls32)
    $host = '(?:\[(?:(?:' . $ipv6 . ')|v[0-9a-f]+[\-a-z0-9._~!$&\'()*+,;=:]+)\]|' . $ipv4
          . '|(?:' . $pct . '|[' . $base . '@])*)';                 // IP-literal | IPv4address | ireg-name
    $path = '(?:\/' . $pchr . '*)*';                                // ipath-abempty
    $nz   = $pchr . '+' . $path;                                    // non-empty first segment, then more segments
    $hier = '(?:\/\/(?:(?:' . $pct . '|[' . $base . ':])*@)?' . $host . '(?::[0-9]*)?' . $path
          . '|\/(?:' . $nz . ')?|' . $nz . '|(?!' . $pchr . '))';   // authority+path | absolute | rootless | empty
    $re   = '/^[a-z][\-a-z0-9+.]*:' . $hier
          . '(?:\?(?:' . $pchr . '|[\x{E000}-\x{F8FF}\x{F0000}-\x{FFFFD}|\x{100000}-\x{10FFFD}\/?])*)?' // iquery
          . '(?:#(?:' . $pchr . '|[\/?])*)?$/iu';                   // ifragment
    return (bool) preg_match($re, $str_url);
}
Here is the test for it:
$urls = array(
    'http://www.example.org/', 'http://www.example.org:80/', 'example.org',
    'ftp://user:[email protected]/', 'http://example.org/?cat=5&test=joo',
    'http://www.fi/?cat=5&test=joo', 'http://[::1]/', 'http://[2620:0:1cfe:face:b00c::3]/',
    'http://[2620:0:1cfe:face:b00c::3]:80/', '',
);
foreach ($urls as $url)
{
    echo $url . "\n";
    var_dump(is_url($url));
}
And that outputs:
"http://www.example.org/" bool(true)
"http://www.example.org:80/" bool(true)
"example.org" bool(false)
"ftp://user:[email protected]/" bool(true)
"http://example.org/?cat=5&test=joo" bool(true)
"http://www.fi/?cat=5&test=joo" bool(true)
"http://[::1]/" bool(true)
"http://[2620:0:1cfe:face:b00c::3]/" bool(true)
"http://[2620:0:1cfe:face:b00c::3]:80/" bool(true)
"" bool(false)
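Since the pattern implements the generic IRI syntax, it accepts any scheme. If only web-style URLs should pass, one option (an assumption about the requirements, not something the RFCs mandate) is a second check on the parsed components after the regex. has_web_host() and its scheme list are hypothetical; parse_url() is PHP's built-in URL parser and is not a validator on its own:

```php
<?php
// Hypothetical extra check on top of a grammar-based validator:
// require one of a few known schemes and a non-empty host.
function has_web_host($url, $schemes = array('http', 'https', 'ftp'))
{
    $parts = parse_url($url);
    if ($parts === false || !isset($parts['scheme'], $parts['host'])) {
        return false; // unparseable, or no scheme/host component
    }
    return in_array(strtolower($parts['scheme']), $schemes, true)
        && $parts['host'] !== '';
}

var_dump(has_web_host('http://www.example.org:80/')); // bool(true)
var_dump(has_web_host('example.org:'));               // bool(false)
```

It would be combined with the regex, for example is_url($u) && has_web_host($u), so the regex checks syntax and the second step checks the components.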
So what is an RFC-compliant regular expression that actually works?