What \R matches
By default, the sequence \R in a pattern matches any Unicode newline
sequence, whatever has been selected as the line ending sequence. If
you specify
--enable-bsr-anycrlf
the default is changed so that \R matches only CR, LF, or CRLF. Whatever
is selected when PCRE is built can be overridden when the library
functions are called.
Newline sequences
Outside a character class, by default, the escape sequence \R matches
any Unicode newline sequence. In non-UTF-8 mode \R is equivalent to the
following:
(?>\r\n|\n|\x0b|\f|\r|\x85)
This is an example of an "atomic group", details of which are given
below. This particular group matches either the two-character sequence
CR followed by LF, or one of the single characters LF (linefeed,
U+000A), VT (vertical tab, U+000B), FF (formfeed, U+000C), CR (carriage
return, U+000D), or NEL (next line, U+0085). The two-character sequence
is treated as a single unit that cannot be split.
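
The effect of the atomic group can be illustrated outside PCRE. The
following is a minimal sketch in Python, whose re module has no \R of
its own; the names ATOMIC_R and PLAIN_R are hypothetical. Atomicity is
emulated with a lookahead followed by a backreference, which assumes
that no earlier capture group is present in the pattern. The sketch
shows that the atomic form refuses to split CRLF, whereas a plain
alternation would backtrack and match CR alone.

```python
import re

# Emulation of the non-UTF-8 \R group. A lookahead captures the newline
# sequence and \1 then consumes it, so the choice cannot be revisited
# by backtracking (this stands in for PCRE's (?>...) atomic group).
ATOMIC_R = r"(?=(\r\n|\n|\x0b|\f|\r|\x85))\1"

# The same alternatives in an ordinary non-capturing group, which
# allows backtracking into the alternation.
PLAIN_R = r"(?:\r\n|\n|\x0b|\f|\r|\x85)"

subject = "\r\n"

# Atomic form: CRLF is taken as one unit, so a following \n cannot match.
atomic_hit = re.search(ATOMIC_R + r"\n", subject)

# Plain form: the engine backtracks to CR alone, then \n matches LF.
plain_hit = re.search(PLAIN_R + r"\n", subject)

# Each newline sequence in a longer subject is reported as a single unit.
units = [m.group(0) for m in re.finditer(ATOMIC_R, "a\r\nb\nc\x85")]
```

Here atomic_hit is None while plain_hit is a match object, which is the
difference the atomic group is there to make.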
In UTF-8 mode, two additional characters whose codepoints are greater
than 255 are added: LS (line separator, U+2028) and PS (paragraph separator, U+2029). Unicode character property support is not needed for
these characters to be recognized.
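
As a rough model of the two modes, the sketch below builds the set of
alternatives with and without the two extra UTF-8 characters. It is
written in Python for illustration only; the helper bsr() and the
constants are hypothetical names and are not part of PCRE.

```python
import re

# Alternatives for \R in non-UTF-8 mode, as listed above.
BASE = r"\r\n|\n|\x0b|\f|\r|\x85"

# LS (U+2028) and PS (U+2029), added only in UTF-8 mode.
UTF8_EXTRA = r"|\u2028|\u2029"

def bsr(utf8):
    """Return a compiled stand-in for \\R (hypothetical helper).

    A lookahead plus backreference emulates the atomic group.
    """
    alts = BASE + (UTF8_EXTRA if utf8 else "")
    return re.compile(r"(?=(" + alts + r"))\1")

# LS is a newline sequence only when the UTF-8 additions are enabled.
ls_in_utf8 = bsr(True).fullmatch("\u2028") is not None
ls_in_non_utf8 = bsr(False).search("\u2028") is not None
```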
It is possible to restrict \R to match only CR, LF, or CRLF (instead of
the complete set of Unicode line endings) by setting the option
PCRE_BSR_ANYCRLF either at compile time or when the pattern is matched.
(BSR is an abbreviation for "backslash R".) This can be made the default
when PCRE is built; if this is the case, the other behaviour can be
requested via the PCRE_BSR_UNICODE option. It is also possible to
specify these settings by starting a pattern string with one of the
following sequences:
(*BSR_ANYCRLF) CR, LF, or CRLF only
(*BSR_UNICODE) any Unicode newline sequence
These override the default and the options given to pcre_compile() or
pcre_compile2(), but they can be overridden by options given to
pcre_exec() or pcre_dfa_exec(). Note that these special settings, which
are not Perl-compatible, are recognized only at the very start of a
pattern, and that they must be in upper case. If more than one of them
is present, the last one is used. They can be combined with a change of
newline convention; for example, a pattern can start with:
(*ANY)(*BSR_ANYCRLF)
They can also be combined with the (*UTF8) or (*UCP) special sequences.
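
The order of precedence described above (build default, then compile
options, then settings at the start of the pattern, then match-time
options) can be sketched as follows. This is an illustrative model in
Python, not PCRE's implementation; the function names are hypothetical,
and the set KNOWN_LEADING is an assumption that lists only some of the
leading sequences PCRE may accept.

```python
import re

# One leading item of the form (*NAME); upper case only, as required.
_ITEM = re.compile(r"\(\*([A-Z0-9_]+)\)")

# Assumed, incomplete set of sequences allowed at the start of a pattern.
KNOWN_LEADING = {
    "CR", "LF", "CRLF", "ANYCRLF", "ANY",
    "UTF8", "UCP", "BSR_ANYCRLF", "BSR_UNICODE",
}

def leading_bsr(pattern):
    """Return 'ANYCRLF', 'UNICODE', or None; the last BSR setting wins."""
    pos, found = 0, None
    while True:
        m = _ITEM.match(pattern, pos)
        if m is None or m.group(1) not in KNOWN_LEADING:
            return found  # settings are recognized only at the very start
        if m.group(1).startswith("BSR_"):
            found = m.group(1)[4:]
        pos = m.end()

def effective_bsr(build_default, compile_opt, pattern, exec_opt):
    """Apply the overrides in order; None means 'not specified'."""
    choice = build_default
    for override in (compile_opt, leading_bsr(pattern), exec_opt):
        if override is not None:
            choice = override
    return choice
```

For example, effective_bsr("UNICODE", None, "(*ANY)(*BSR_ANYCRLF)a", None)
returns "ANYCRLF", whereas passing "UNICODE" as the last argument (the
match-time option) overrides the setting in the pattern.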
Inside a character class, \R is treated as an unrecognized escape
sequence, and so matches the letter "R" by default, but causes an error
if PCRE_EXTRA is set.