Disclaimer: You're maybe better off using a library, because this is tougher than you think. I'll start this answer with how to match URLs that are not within @font-face {}. I also assume that the braces {} are balanced within @font-face {}.
Note: I'm going to use "~" as delimiters instead of "/", which relieves me from escaping slashes later on in my expressions. Also note that I will be posting online demos from regex101.com; on that site I'll be using the g modifier. In PHP you should drop the g modifier and just use preg_match_all().
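A quick illustration of why the ~ delimiter helps (the sample string and variable names are made up for this sketch): with / as the delimiter every slash inside the pattern has to be escaped, with ~ it doesn't, and preg_match_all() takes the role of the g modifier.

```php
<?php
// Hypothetical sample string, only for this illustration.
$subject = 'url(http://example.com/a.png) url(https://example.com/b.png)';

// With "/" as the delimiter, each "/" inside the pattern must be escaped.
$slashPattern = '/https?:\/\//';
// With "~" as the delimiter, the same pattern needs no escaping.
$tildePattern = '~https?://~';

// PHP has no "g" modifier: preg_match_all() returns every match instead.
$slashCount = preg_match_all($slashPattern, $subject, $m1);
$tildeCount = preg_match_all($tildePattern, $subject, $m2);

print_r($m2[0]); // the two scheme prefixes: http:// and https://
```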
Let's use some regex Fu !!!
Part 1: matching URLs that are not within @font-face {}
1.1 Matching @font-face {}
Oh yes, this might sound "weird", but you'll see why later on :)
We'll need a recursive regex here:
@font-face\s*                     # Match @font-face and some spaces
(                                 # Start group 1
    \{                            # Match {
    (?:                           # A non-capturing group
        [^{}]+                    # Match anything except {} one or more times
        |                         # Or
        (?1)                      # Recurse/rerun the expression of group 1
    )*                            # Repeat 0 or more times
    \}                            # Match }
)                                 # End group 1
demo
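If you want to try the recursive part in PHP instead of on regex101, here's a minimal sketch. The CSS string is made up, and its nested block only exists to show the (?1) recursion at work; the x modifier lets the comments stay inside the pattern.

```php
<?php
// Recursive @font-face pattern from section 1.1, backslashes restored.
$fontface_regex = '~
    @font-face\s*      # Match @font-face and some spaces
    (                  # Start group 1
        \{             # Match {
        (?:            # A non-capturing group
            [^{}]+     # Anything except braces, one or more times
            |          # Or
            (?1)       # Recurse into group 1 (a nested {...} block)
        )*             # Repeat 0 or more times
        \}             # Match }
    )                  # End group 1
~x';

// Hypothetical sample CSS; the nested block only exercises the recursion.
$css = '@font-face { font-family: A; src: url(a.woff); x { y: z } } body { color: red; }';

preg_match_all($fontface_regex, $css, $m);
print_r($m[0]); // one match: the whole @font-face block, nested braces included
```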
1.2 Escaping @font-face {}
We'll put (*SKIP)(*FAIL) right after the previous regex, so whatever it matches (a whole @font-face {} block) gets skipped and the engine carries on after it. See this answer to get an idea how it works.
demo
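To see what (*SKIP)(*FAIL) does in isolation, here's a small sketch that swaps the url() part for a plain \burl\b (a simplification of mine, just to keep it short). The url inside the @font-face block is never reported, because the engine jumps past the whole block.

```php
<?php
// Left branch: a complete @font-face block, then (*SKIP)(*FAIL).
// When (*FAIL) forces a backtrack into (*SKIP), this match attempt is abandoned
// and the next attempt starts where (*SKIP) was reached: after the block.
// The right branch (\burl\b) is a simplification used only for this sketch.
$regex = '~
    @font-face\s*(\{(?:[^{}]+|(?1))*\})   # A complete @font-face block
    (*SKIP)(*FAIL)                        # is skipped as a whole
    |                                     # Or
    \burl\b                               # the word url anywhere else
~x';

// Hypothetical sample CSS.
$css = '@font-face { src: url(a.woff); } body { background: url(b.png); }';

$count = preg_match_all($regex, $css, $m, PREG_OFFSET_CAPTURE);
echo $count, PHP_EOL; // 1: only the url outside @font-face is found
```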
1.3 Matching url()
We'll use something like this:
url\s*\(                          # Match url, optionally some whitespace and then (
\s*                               # Match optionally some whitespace
("|'|)                            # It seems that the quotes are optional according to http://www.w3.org/TR/CSS2/syndata.html#uri
(?!["']?(?:https?://|ftp://))     # Put your negative rules here (do not match URLs with http, https or ftp)
(?:[^\\]|\\.)*?                   # Match anything except a backslash, or a backslash and a character, zero or more times (ungreedy)
\2                                # Match what was matched in group 2
\s*                               # Match optionally some whitespace
\)                                # Match )
Note that I'm using \2 because I've appended this to the previous regex, which already has group 1.
Here's another use of ("|')(?:[^\\]|\\.)*?\1.
demo
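Here's a quick sketch of the url() part on its own in PHP. Since it's used standalone here, the quote is group 1 and the backreference becomes \1; inside the single-quoted PHP string the ' is escaped and the backslashes are doubled. The sample CSS and the variable name are my own.

```php
<?php
// Standalone version of the url() pattern from section 1.3 (backslashes restored).
// The quote is group 1 here, so the backreference is \1 instead of \2.
$urlRegex = '~
    url\s*\(                         # url, optional whitespace, then (
    \s*                              # Optional whitespace
    ("|\'|)                          # Optional opening quote: group 1
    (?!["\']?(?:https?://|ftp://))   # Negative rule: skip http, https and ftp URLs
    (?:[^\\\\]|\\\\.)*?              # Any char but a backslash, or an escaped char, lazily
    \1                               # The same quote that opened the value
    \s*                              # Optional whitespace
    \)                               # Closing )
~x';

// Hypothetical sample CSS: unquoted, double-quoted, single-quoted and absolute URLs.
$css = 'a { background: url(b.png); } b { background: url("c.png"); }'
     . ' c { background: url(\'d.png\'); } d { background: url(http://example.com/e.png); }';

preg_match_all($urlRegex, $css, $m);
print_r($m[0]); // url(b.png), url("c.png") and url('d.png'); the http one is skipped
```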
1.4 Matching the value inside url()
You might have guessed that we need some lookaround-fu. The problem is with a lookbehind, since it needs to be fixed length and url\s*\(\s*("|'|) is not. I've got a workaround for that: the \K escape sequence. It resets the start of the reported match to the current position, so everything matched before \K is still required but is left out of the result. more-info
Well, let's drop \K somewhere in our expression and use a lookahead for the closing part. Our final regex will be:
@font-face\s*                     # Match @font-face and some spaces
(                                 # Start group 1
    \{                            # Match {
    (?:                           # A non-capturing group
        [^{}]+                    # Match anything except {} one or more times
        |                         # Or
        (?1)                      # Recurse/rerun the expression of group 1
    )*                            # Repeat 0 or more times
    \}                            # Match }
)                                 # End group 1
(*SKIP)(*FAIL)                    # Skip it
|                                 # Or
url\s*\(                          # Match url, optionally some whitespace and then (
\s*                               # Match optionally some whitespace
("|'|)                            # It seems that the quotes are optional according to http://www.w3.org/TR/CSS2/syndata.html#uri
\K                                # Reset the match
(?!["']?(?:https?://|ftp://))     # Put your negative rules here (do not match URLs with http, https or ftp)
(?:[^\\]|\\.)*?                   # Match anything except a backslash, or a backslash and a character, zero or more times (ungreedy)
(?=                               # Lookahead
    \2                            # Match what was matched in group 2
    \s*                           # Match optionally some whitespace
    \)                            # Match )
)                                 # End lookahead
demo
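A stripped-down sketch of the \K plus lookahead combination, leaving out the @font-face part and the negative rules purely for brevity: everything before \K has to match, but only the value ends up in $m[0]. The sample CSS is hypothetical.

```php
<?php
// Minimal \K + lookahead sketch (no @font-face part, no negative rules).
// Everything before \K must match, but $m[0] starts at \K;
// the lookahead checks the closing quote and ) without consuming them.
$pattern = '~url\s*\(\s*("|\'|)\K.*?(?=\1\s*\))~';

// Hypothetical sample CSS.
$css = 'a { background: url( "img/a.png" ); } b { background: url(img/b.png); }';

preg_match_all($pattern, $css, $m);
print_r($m[0]); // img/a.png and img/b.png, without url( or the quotes
```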
1.5 Using the pattern in PHP
In PHP we'll need to escape a few things: the single quotes (' becomes \') and the backslashes (\\ becomes \\\\). Then use the right function and the right modifiers:
$regex = '~
@font-face\s*                     # Match @font-face and some spaces
(                                 # Start group 1
    \{                            # Match {
    (?:                           # A non-capturing group
        [^{}]+                    # Match anything except {} one or more times
        |                         # Or
        (?1)                      # Recurse/rerun the expression of group 1
    )*                            # Repeat 0 or more times
    \}                            # Match }
)                                 # End group 1
(*SKIP)(*FAIL)                    # Skip it
|                                 # Or
url\s*\(                          # Match url, optionally some whitespace and then (
\s*                               # Match optionally some whitespace
("|\'|)                           # It seems that the quotes are optional according to http://www.w3.org/TR/CSS2/syndata.html#uri
\K                                # Reset the match
(?!["\']?(?:https?://|ftp://))    # Put your negative rules here (do not match URLs with http, https or ftp)
(?:[^\\\\]|\\\\.)*?               # Match anything except a backslash, or a backslash and a character, zero or more times (ungreedy)
(?=                               # Lookahead
    \2                            # Match what was matched in group 2
    \s*                           # Match optionally some whitespace
    \)                            # Match )
)                                 # End lookahead
~xs';
$input = file_get_contents($css_file);
preg_match_all($regex, $input, $m);
echo '<pre>'. print_r($m[0], true) . '</pre>';
demo
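If you'd rather test the final pattern without a CSS file, here's a sketch that runs a compact (comment-free, otherwise the same) version of $regex against an inline, made-up stylesheet. Only the two relative URLs outside @font-face should come back.

```php
<?php
// Compact, comment-free version of $regex from section 1.5 (same groups: 1 = braces, 2 = quote).
$regex = '~@font-face\s*(\{(?:[^{}]+|(?1))*\})(*SKIP)(*FAIL)'
       . '|url\s*\(\s*("|\'|)\K(?!["\']?(?:https?://|ftp://))(?:[^\\\\]|\\\\.)*?(?=\2\s*\))~s';

// Hypothetical sample stylesheet instead of file_get_contents($css_file).
$input = '@font-face { font-family: F; src: url("font.woff"); }'
       . ' .a { background: url(img/a.png); }'
       . ' .b { background: url(\'img/b.png\'); }'
       . ' .c { background: url(http://example.com/c.png); }';

preg_match_all($regex, $input, $m);
print_r($m[0]); // img/a.png and img/b.png: font.woff is skipped, the http URL too
```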
Part 2: matching URLs that are within @font-face {}
2.1 Different approach
I want to do this part with two regexes, because it would be a pain to match URLs that are within @font-face {} while keeping track of the state of the braces {} in a recursive regex.
And since we already have the pieces we need, we only need to apply them in some code:
- Match all @font-face {} instances
- Loop through these and match all url()'s
2.2 Putting it into code
$results = array(); // Just an empty array
$fontface_regex = '~
@font-face\s*                     # Match @font-face and some spaces
(                                 # Start group 1
    \{                            # Match {
    (?:                           # A non-capturing group
        [^{}]+                    # Match anything except {} one or more times
        |                         # Or
        (?1)                      # Recurse/rerun the expression of group 1
    )*                            # Repeat 0 or more times
    \}                            # Match }
)                                 # End group 1
~xs';
$url_regex = '~
url\s*\(                          # Match url, optionally some whitespace and then (
\s*                               # Match optionally some whitespace
("|\'|)                           # It seems that the quotes are optional according to http://www.w3.org/TR/CSS2/syndata.html#uri
\K                                # Reset the match
(?!["\']?(?:https?://|ftp://))    # Put your negative rules here (do not match URLs with http, https or ftp)
(?:[^\\\\]|\\\\.)*?               # Match anything except a backslash, or a backslash and a character, zero or more times (ungreedy)
(?=                               # Lookahead
    \1                            # Match what was matched in group 1
    \s*                           # Match optionally some whitespace
    \)                            # Match )
)                                 # End lookahead
~xs';
$input = file_get_contents($css_file);
preg_match_all($fontface_regex, $input, $fontfaces); // Get all font-face instances
if (!empty($fontfaces[0])) {                         // If there is a match then
    foreach ($fontfaces[0] as $fontface) {           // For each instance
        preg_match_all($url_regex, $fontface, $r);   // Match the URLs inside it
        if (!empty($r[0])) {                         // If there is at least one hit
            $results[] = $r[0];                      // Then add it to the results array
        }
    }
}
echo '<pre>'. print_r($results, true) . '</pre>'; // Show the results
demo
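Same idea for Part 2: a sketch with compact versions of $fontface_regex and $url_regex, run against an inline, made-up stylesheet instead of $css_file. Only the URLs inside @font-face should end up in $results.

```php
<?php
// Compact, comment-free versions of the two regexes from section 2.2.
$fontface_regex = '~@font-face\s*(\{(?:[^{}]+|(?1))*\})~s';
$url_regex = '~url\s*\(\s*("|\'|)\K(?!["\']?(?:https?://|ftp://))(?:[^\\\\]|\\\\.)*?(?=\1\s*\))~s';

// Hypothetical sample stylesheet with one @font-face block.
$input = '@font-face { font-family: F; src: url("a.woff") format("woff"), url(b.ttf); }'
       . ' .x { background: url(c.png); }';

$results = array();
preg_match_all($fontface_regex, $input, $fontfaces); // All @font-face blocks
foreach ($fontfaces[0] as $fontface) {
    preg_match_all($url_regex, $fontface, $r);       // URLs inside this block only
    if (!empty($r[0])) {
        $results[] = $r[0];
    }
}
print_r($results); // one inner array: a.woff and b.ttf (c.png is outside @font-face)
```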