Welcome to OGeek Q&A Community for programmer and developer-Open, Learning and Share
Welcome To Ask or Share your Answers For Others


string - Explode a paragraph into sentences in PHP

I have been using

explode(".",$mystring)

to split a paragraph into sentences. However, this doesn't cover sentences that end with other punctuation such as ! ? : ;

Is there a way of using an array as a delimiter instead of a single character? Alternatively, is there another neat way of splitting on various punctuation marks?

I tried

explode(("." || "?" || "!"),$mystring)

in the hope that it would work, but it didn't.


1 Reply


You can use preg_split() with a PCRE lookbehind assertion to split the string on the whitespace that follows each occurrence of ., ;, :, ? or !, while keeping the punctuation itself intact:

Code:

$subject = 'abc sdfs.    def ghi; this is [email protected]! asdasdasd? abc xyz';
// split on whitespace between sentences preceded by a punctuation mark
$result = preg_split('/(?<=[.?!;:])\s+/', $subject, -1, PREG_SPLIT_NO_EMPTY);
print_r($result);

Result:

Array
(
    [0] => abc sdfs.
    [1] => def ghi;
    [2] => this is [email protected]!
    [3] => asdasdasd?
    [4] => abc xyz
)
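
The question also asked about passing an array of delimiters. One way to approximate that, as a minimal sketch, is to build the character class from an array. The helper name split_sentences() and its default list are assumptions made for illustration, not part of PHP:

```php
<?php
// Hypothetical helper (not a PHP built-in): split $text on the whitespace
// that follows any of the given single-character terminators, keeping the
// terminator attached to its sentence.
function split_sentences(string $text, array $terminators = ['.', '?', '!', ';', ':']): array
{
    if ($terminators === []) {
        return [$text]; // nothing to split on
    }

    // preg_quote() escapes every character that is special in a pattern,
    // so the joined string is safe inside a character class.
    $class = preg_quote(implode('', $terminators), '/');

    // Lookbehind: the whitespace must directly follow one of the terminators.
    $parts = preg_split('/(?<=[' . $class . '])\s+/', $text, -1, PREG_SPLIT_NO_EMPTY);

    return $parts === false ? [$text] : $parts;
}

print_r(split_sentences('One. Two! Three? Four'));
```

Because the terminators end up in a character class, each array entry should be a single character; multi-character delimiters would need an alternation instead.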

You can also add a blacklist of abbreviations (Mr., Mrs., Dr., ...) that should not end a sentence by putting a negative lookbehind assertion in front of the pattern:

$subject = 'abc sdfs.   Dr. Foo said he is not a sentence; asdasdasd? abc xyz';
// split on whitespace that follows a punctuation mark, unless that mark ends a blacklisted abbreviation
$result = preg_split('/(?<!Mr\.|Mrs\.|Dr\.)(?<=[.?!;:])\s+/', $subject, -1, PREG_SPLIT_NO_EMPTY);
print_r($result);

Result:

Array
(
    [0] => abc sdfs.
    [1] => Dr. Foo said he is not a sentence;
    [2] => asdasdasd?
    [3] => abc xyz
)
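
If the abbreviation list is likely to grow, the lookbehind can be assembled from an array rather than typed by hand. The following is a rough sketch under that assumption; split_sentences_except() is a hypothetical name, not a PHP function:

```php
<?php
// Hypothetical helper (not a PHP built-in): same splitting rule as above,
// but the abbreviations that must not end a sentence come from an array.
function split_sentences_except(string $text, array $abbreviations): array
{
    $guard = '';
    if ($abbreviations !== []) {
        // Escape each abbreviation and join with "|" so every entry becomes
        // one fixed-length alternative inside the negative lookbehind.
        $quoted = array_map(
            function (string $abbr): string {
                return preg_quote($abbr, '/');
            },
            $abbreviations
        );
        $guard = '(?<!' . implode('|', $quoted) . ')';
    }

    $pattern = '/' . $guard . '(?<=[.?!;:])\s+/';
    $parts = preg_split($pattern, $text, -1, PREG_SPLIT_NO_EMPTY);

    return $parts === false ? [$text] : $parts;
}

print_r(split_sentences_except(
    'Ask Mrs. Smith. She knows! Dr. Foo agrees',
    ['Mr.', 'Mrs.', 'Dr.']
));
```

One limitation of this approach: a sentence that genuinely ends with a listed abbreviation will not be split from the sentence that follows it.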
