Welcome to OGeek Q&A Community for programmer and developer-Open, Learning and Share
Welcome To Ask or Share your Answers For Others


How to replace different newline styles in PHP the smartest way?

I have a text which might have different newline styles. I want to replace all newlines ('\r\n', '\n', '\r') with the same newline (in this case \r\n).

What's the fastest way to do this? My current solution looks like this which is way sucky:

    $sNicetext = str_replace("\r\n", '%%%%somthing%%%%', $sNicetext);
    $sNicetext = str_replace(array("\r", "\n"), array("\r\n", "\r\n"), $sNicetext);
    $sNicetext = str_replace('%%%%somthing%%%%', "\r\n", $sNicetext);

The problem is that you can't do this with one replace, because an existing \r\n would be duplicated to \r\n\r\n.
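To make the pitfall concrete, here is a minimal sketch. The helper names and the sample string are hypothetical, not from the original post. It shows that str_replace() with arrays applies its pairs one after another, so text inserted by the first pair is rewritten by the second. For comparison it also shows strtr() with an array, which is a different technique from the one asked about: it tries the longest key first and does not rescan replaced text.

```php
<?php
// Hypothetical helpers for illustration only.

// Naive attempt: a single str_replace() call with arrays.
// "\r" is replaced first, then "\n", including the "\n" just inserted.
function naiveNormalize(string $text): string
{
    return str_replace(array("\r", "\n"), array("\r\n", "\r\n"), $text);
}

// Single-pass alternative: strtr() matches the longest key ("\r\n") first
// and never looks again at text it has already replaced.
function strtrNormalize(string $text): string
{
    return strtr($text, array("\r\n" => "\r\n", "\r" => "\r\n", "\n" => "\r\n"));
}

$sample = "a\r\nb\nc\rd";
var_dump(naiveNormalize($sample) === "a\r\nb\r\nc\r\nd"); // bool(false)
var_dump(strtrNormalize($sample) === "a\r\nb\r\nc\r\nd"); // bool(true)
```

With the naive call, even a lone "a\r\nb" comes out as "a\r\r\n\r\nb", which is why the placeholder step was needed.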

Thank you for your help!

question from:https://stackoverflow.com/questions/7836632/how-to-replace-different-newline-styles-in-php-the-smartest-way


1 Reply

    $string = preg_replace('~\R~u', "\r\n", $string);

If you don't want to replace all Unicode newlines but only the CR, LF, and CRLF styles, use:

    $string = preg_replace('~(*BSR_ANYCRLF)\R~', "\r\n", $string);

\R matches these newlines; u is a modifier that treats the input string as UTF-8.
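As a usage sketch, the call can be wrapped in a small function. The name normalizeToCrlf() and the sample string are hypothetical. The null check is there because preg_replace() returns null on failure, which with the u modifier includes input that is not valid UTF-8.

```php
<?php
// Hypothetical wrapper around the one-liner, for illustration only.
function normalizeToCrlf(string $text): string
{
    // \R matches "\r\n" as a single unit, so an existing CRLF is not doubled.
    // The u modifier makes PCRE treat pattern and subject as UTF-8.
    $result = preg_replace('~\R~u', "\r\n", $text);

    // preg_replace() returns null on error, e.g. malformed UTF-8 input.
    if ($result === null) {
        throw new InvalidArgumentException('preg_replace failed, error code ' . preg_last_error());
    }

    return $result;
}

// Mixed input: one CRLF, one bare LF, one bare CR.
var_dump(normalizeToCrlf("a\r\nb\nc\rd") === "a\r\nb\r\nc\r\nd"); // bool(true)
```

Note that a "\n\r" pair is not treated as one newline: \R matches the LF and the CR separately, so it becomes two CRLFs.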


From the PCRE docs:

What \R matches

By default, the sequence \R in a pattern matches any Unicode newline sequence, whatever has been selected as the line ending sequence. If you specify

     --enable-bsr-anycrlf

the default is changed so that \R matches only CR, LF, or CRLF. Whatever is selected when PCRE is built can be overridden when the library functions are called.

and

Newline sequences

Outside a character class, by default, the escape sequence \R matches any Unicode newline sequence. In non-UTF-8 mode \R is equivalent to the following:

    (?>\r\n|\n|\x0b|\f|\r|\x85)

This is an example of an "atomic group", details of which are given below. This particular group matches either the two-character sequence CR followed by LF, or one of the single characters LF (linefeed, U+000A), VT (vertical tab, U+000B), FF (formfeed, U+000C), CR (carriage return, U+000D), or NEL (next line, U+0085). The two-character sequence is treated as a single unit that cannot be split.

In UTF-8 mode, two additional characters whose codepoints are greater than 255 are added: LS (line separator, U+2028) and PS (paragraph separator, U+2029). Unicode character property support is not needed for these characters to be recognized.
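The difference between the two modes can be checked from PHP with a small sketch. The function names are hypothetical, and the second result assumes the PCRE build uses the default \R setting (any Unicode newline), as described above.

```php
<?php
// Hypothetical helpers for illustration only.

// Without the u modifier the subject is matched byte by byte.
function replaceNewlinesBytes(string $text): string
{
    return preg_replace('~\R~', "\n", $text);
}

// With the u modifier the subject is decoded as UTF-8 before matching.
function replaceNewlinesUtf8(string $text): string
{
    return preg_replace('~\R~u', "\n", $text);
}

// U+2028 (line separator) is the three bytes E2 80 A8 in UTF-8.
// The "\u{...}" escape needs PHP 7.0 or later.
$text = "a\u{2028}b";

// None of the bytes E2, 80, A8 is in the non-UTF-8 \R set, so nothing changes.
var_dump(replaceNewlinesBytes($text) === $text);  // bool(true)

// In UTF-8 mode the bytes are read as U+2028, which \R matches.
var_dump(replaceNewlinesUtf8($text) === "a\nb");  // bool(true)
```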

It is possible to restrict \R to match only CR, LF, or CRLF (instead of the complete set of Unicode line endings) by setting the option PCRE_BSR_ANYCRLF either at compile time or when the pattern is matched. (BSR is an abbreviation for "backslash R".) This can be made the default when PCRE is built; if this is the case, the other behaviour can be requested via the PCRE_BSR_UNICODE option. It is also possible to specify these settings by starting a pattern string with one of the following sequences:

    (*BSR_ANYCRLF)   CR, LF, or CRLF only
    (*BSR_UNICODE)   any Unicode newline sequence

These override the default and the options given to pcre_compile() or pcre_compile2(), but they can be overridden by options given to pcre_exec() or pcre_dfa_exec(). Note that these special settings, which are not Perl-compatible, are recognized only at the very start of a pattern, and that they must be in upper case. If more than one of them is present, the last one is used. They can be combined with a change of newline convention; for example, a pattern can start with:

    (*ANY)(*BSR_ANYCRLF)

They can also be combined with the (*UTF8) or (*UCP) special sequences. Inside a character class, \R is treated as an unrecognized escape sequence, and so matches the letter "R" by default, but causes an error if PCRE_EXTRA is set.
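The two pattern prefixes described above can be compared with a short sketch. The function names and the sample string are hypothetical. A vertical tab (U+000B) is used because it is in the Unicode \R set but not in the CR, LF, CRLF set.

```php
<?php
// Hypothetical helpers for illustration only.

// (*BSR_ANYCRLF) has to be the very first thing in the pattern, in upper case.
function replaceAnyCrlf(string $text): string
{
    return preg_replace('~(*BSR_ANYCRLF)\R~', "\n", $text);
}

// (*BSR_UNICODE) requests the full set explicitly, whatever the build default is.
function replaceUnicodeNewlines(string $text): string
{
    return preg_replace('~(*BSR_UNICODE)\R~', "\n", $text);
}

// One CRLF and one vertical tab.
$sample = "a\r\nb\x0Bc";

var_dump(replaceAnyCrlf($sample) === "a\nb\x0Bc");       // bool(true): VT is left alone
var_dump(replaceUnicodeNewlines($sample) === "a\nb\nc"); // bool(true): VT is replaced too
```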

