With a regular expression this can be done in a single call to preg_replace:
$long_text = "This is a phrase that has 10 words in it. But
more text is following and should be ignored";
$first10 = preg_replace("/^(\s*(\S+\s+){0,9}\S+).*/s", '$1', $long_text);
The content of $first10 will be:
This is a phrase that has 10 words in it.
Here is what the regular expression does:
^
: match only at the start of the string, not anywhere else
\s*
: allow 0 or more spaces at the start
(...)
: capture group, whatever is matched is referenced later with $1
\S+
: one or more non-space characters (i.e. a word)
\s+
: one or more spaces
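(...){0,9}
: repeat the inner group (a word followed by white-space) as many times as possible, up to 9 times. The \S+ that follows it then matches one more word, so at most 10 words are captured.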
.*
: anything else up until the end (as nothing else is specified after).
s
: the modifier after the slash dictates that the dot should also match new lines
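Note that this matches any string that contains at least one non-space character: such a string starts with zero or more blanks, has at least one word in it, and has zero or more characters following that. A string that is empty or consists only of white-space does not match, and preg_replace then returns it unchanged.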
The replacement is:
$1
: whatever was captured by the outer parentheses, thereby omitting whatever was matched by .*
See it working in this fiddle.
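The same pattern can be generalized to an arbitrary number of words. The following is only a minimal sketch: the function name first_words and its $n parameter are hypothetical and not part of the original solution, and it assumes PHP 7 or later for the type declarations and the ?? operator.

```php
<?php
// Hypothetical helper (an illustration, not part of the original answer):
// returns the first $n white-space separated words of $text.
function first_words(string $text, int $n = 10): string
{
    if ($n < 1) {
        return '';
    }
    // Same pattern as above; the inner group may repeat up to $n - 1 times,
    // and the trailing \S+ matches the last word that is kept.
    $pattern = '/^(\s*(\S+\s+){0,' . ($n - 1) . '}\S+).*/s';

    // preg_replace returns null on error; fall back to the input in that case.
    return preg_replace($pattern, '$1', $text) ?? $text;
}

$long_text = "This is a phrase that has 10 words in it. But\nmore text is following and should be ignored";

echo first_words($long_text), "\n";     // This is a phrase that has 10 words in it.
echo first_words($long_text, 3), "\n";  // This is a
```

The pattern is written in single quotes here so that PHP does not try to interpret the backslashes itself.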
Note that this solution works correctly when:
- words are numbers;
- words are separated by more than one blank;
- words are separated by a line-break, tab, or other white-space;
- the sentence starts with blanks.
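These cases can be checked with a short script. This is just a sketch with made-up sample strings (the labels and inputs are assumptions chosen for illustration); it applies the pattern to each input and counts the words that were kept.

```php
<?php
$pattern = '/^(\s*(\S+\s+){0,9}\S+).*/s';

// Illustrative inputs, one per case listed above (12 words each).
$cases = [
    'numbers'        => "1 2 3 4 5 6 7 8 9 10 11 12",
    'several blanks' => "a  b   c d e f g h i j k l",
    'mixed space'    => "a\tb\nc d e f g h i j k l",
    'leading blanks' => "   a b c d e f g h i j k l",
];

$results = [];
foreach ($cases as $label => $text) {
    $first10 = preg_replace($pattern, '$1', $text);
    $results[$label] = $first10;

    // Count the words that survived the replacement.
    $count = count(preg_split('/\s+/', trim($first10)));

    // json_encode makes tabs and newlines visible in the output.
    printf("%-15s %d words: %s\n", $label, $count, json_encode($first10));
}
```

Each case should report 10 words. Note that the leading blanks are kept in the result, because \s* sits inside the capture group; wrap the call in trim() if that is not wanted.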