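Because HTML's syntactic rules are so loose, it's pretty difficult to do this with any reliability using regexes (unless, say, you know for absolute certain that all your tags will use double quotes around their attribute values). Here's some fairly general regex-based code for the purpose. It collects the href of every <a> tag that points at a .wmv file, handling double-quoted, single-quoted, and unquoted attribute values in turn: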
function extract_urls($html) {
    $urls = array();
    // Strip HTML comments so commented-out links are ignored.
    $html = preg_replace('/<!--.*?-->/s', '', $html);
    // Pass 1: href values in double quotes.
    preg_match_all('/<a\s+[^>]*href="([^"]+)"[^>]*>/is', $html, $matches);
    foreach ($matches[1] as $url) {
        $url = str_replace('&amp;', '&', trim($url));
        if (preg_match('/\.wmv/i', $url) && !in_array($url, $urls)) {
            $urls[] = $url;
        }
    }
    // Pass 2: href values in single quotes.
    preg_match_all('/<a\s+[^>]*href=\'([^\']+)\'[^>]*>/is', $html, $matches);
    foreach ($matches[1] as $url) {
        $url = str_replace('&amp;', '&', trim($url));
        if (preg_match('/\.wmv/i', $url) && !in_array($url, $urls)) {
            $urls[] = $url;
        }
    }
    // Pass 3: unquoted href values, which end at the first space or ">".
    preg_match_all('/<a\s+[^>]*href=([^"\'][^> ]*)[^>]*>/is', $html, $matches);
    foreach ($matches[1] as $url) {
        $url = str_replace('&amp;', '&', trim($url));
        if (preg_match('/\.wmv/i', $url) && !in_array($url, $urls)) {
            $urls[] = $url;
        }
    }
    return $urls;
}
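
If the regexes turn out to be too brittle for your input, one alternative is to let a real parser deal with the quoting rules. The following is only a rough sketch of that idea using PHP's bundled DOM extension, not a drop-in replacement: the name extract_urls_dom is made up for illustration, it assumes the dom extension is enabled, and it does nothing about character encodings (loadHTML() guesses when the markup doesn't declare one).

```php
<?php
// Hypothetical parser-based variant of extract_urls(); assumes ext-dom is available.
function extract_urls_dom($html) {
    $urls = array();
    if (trim($html) === '') {
        return $urls; // loadHTML() rejects empty input, so bail out early
    }

    $doc = new DOMDocument();
    // Collect parse warnings internally instead of emitting them for sloppy markup.
    $previous = libxml_use_internal_errors(true);
    $doc->loadHTML($html);
    libxml_clear_errors();
    libxml_use_internal_errors($previous);

    // Links inside HTML comments never become <a> elements, so no stripping is needed.
    foreach ($doc->getElementsByTagName('a') as $a) {
        // getAttribute() returns the value with entities such as &amp; already decoded.
        $url = trim($a->getAttribute('href'));
        if (preg_match('/\.wmv/i', $url) && !in_array($url, $urls)) {
            $urls[] = $url;
        }
    }
    return $urls;
}

// Usage: the same three quoting styles the regex version handles in separate passes.
$sample = '<a href="a.wmv?x=1&amp;y=2">one</a> <a href=\'b.WMV\'>two</a> <a href=c.wmv>three</a>';
print_r(extract_urls_dom($sample));
```

The trade-off is that the parser decides how to repair malformed markup, so on badly broken pages the two approaches may not return exactly the same list.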