PCRE and newlines
PCRE has a superfluity of newline-related escape sequences and alternatives.
Well, a nifty escape sequence that you can use here is \R. By default, \R matches Unicode newline sequences, but it can be configured using different alternatives.
To match any Unicode newline sequence in the single-byte range (the ASCII newline characters plus NEL, \x85), use \R without any modifier:
preg_match('~\R~', $string);
This is equivalent to the following group:
(?>\r\n|\n|\r|\f|\x0b|\x85)
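As a rough illustration of this default behaviour, the sketch below splits a string with mixed line endings on \R. The helper name split_lines is hypothetical, and the result assumes a PCRE build that keeps the default Unicode meaning of \R (a build configured for ANYCRLF would not split on form feed or vertical tab).

```php
<?php
// Hypothetical helper: split a byte string on any newline sequence \R recognises.
function split_lines(string $text): array
{
    // No u modifier: the subject is treated as single bytes.
    return preg_split('~\R~', $text);
}

// Mixed endings: CRLF, LF, CR, form feed, vertical tab.
$text = "one\r\ntwo\nthree\rfour\ffive\x0bsix";

// CRLF is consumed as a single separator, so no empty element appears between "one" and "two".
print_r(split_lines($text));
```

Because \R behaves like an atomic group, the pair CR LF is never split into two separate line breaks.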
To match any Unicode newline sequence, including newline characters outside the ASCII range such as the line separator (U+2028) and paragraph separator (U+2029), turn on the u (Unicode) flag:
preg_match('~\R~u', $string);
The u (Unicode) modifier turns on additional PCRE functionality, and pattern and subject strings are treated as UTF-8.
This is equivalent to the following group:
(?>\r\n|\n|\r|\f|\x0b|\x85|\x{2028}|\x{2029})
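A small sketch of the difference the u modifier makes, assuming the subject is valid UTF-8. The variable names are illustrative only. The line separator U+2028 is written out as its three UTF-8 bytes.

```php
<?php
// "a", LINE SEPARATOR (U+2028, UTF-8 bytes E2 80 A8), "b".
$text = "a\xE2\x80\xA8b";

// Without u the subject is a sequence of bytes; none of E2, 80, A8 is a newline byte.
$withoutU = preg_match('~\R~', $text);

// With u the three bytes are decoded as U+2028, which \R recognises.
$withU = preg_match('~\R~u', $text);

var_dump($withoutU, $withU);
```

The same check should behave identically for the paragraph separator U+2029 (bytes E2 80 A9).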
It is possible to restrict \R to match CR, LF, or CRLF only:
preg_match('~(*BSR_ANYCRLF)\R~', $string);
This is equivalent to the following group:
(?>\r\n|\n|\r)
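To make the restriction concrete, here is a minimal sketch comparing the two behaviours on the same input. The variable names are illustrative, and the unrestricted result again assumes a PCRE build with the default Unicode meaning of \R.

```php
<?php
// A form feed between "a" and "b", then CRLF before "c".
$text = "a\fb\r\nc";

// Default \R: the form feed counts as a newline sequence.
$default = preg_split('~\R~', $text);

// (*BSR_ANYCRLF) must be at the very start of the pattern; only CR, LF and CRLF match.
$restricted = preg_split('~(*BSR_ANYCRLF)\R~', $text);

print_r($default);
print_r($restricted);
```

With the restriction in place the form feed stays inside the first element instead of acting as a separator.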
Additional
Five different conventions for indicating line breaks in strings are supported. Each is selected by placing one of the following at the very start of the pattern:
(*CR) carriage return
(*LF) linefeed
(*CRLF) carriage return, followed by linefeed
(*ANYCRLF) any of the three above
(*ANY) all Unicode newline sequences
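These settings control where ^ and $ match in multiline mode and what the dot refuses to match; they do not change what \R matches. A brief sketch follows, where the helper name count_lines is hypothetical and the convention is passed in as a string placed at the very start of the pattern.

```php
<?php
// Hypothetical helper: count whole-line words under a given newline convention.
function count_lines(string $convention, string $text): int
{
    // The m modifier lets ^ and $ match at line breaks, as defined by the convention.
    return preg_match_all('~' . $convention . '^\w+$~m', $text);
}

// Lines separated by bare carriage returns.
$text = "alpha\rbeta\rgamma";

// Under (*LF) a bare CR is not a line break, so no line consists solely of word characters.
var_dump(count_lines('(*LF)', $text));

// Under (*CR) each CR ends a line, so all three words match.
var_dump(count_lines('(*CR)', $text));
```

Stating the convention explicitly avoids depending on whatever default the PCRE library was compiled with.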
Note: \R does not have special meaning inside a character class. Like other unrecognized escape sequences, it is treated as the literal character "R" by default in the original PCRE library; PCRE2 (used by PHP 7.3 and later) reports a compile error instead.