Simple approach
This first approach works if your definition of "word" matches JavaScript's. A more customizable approach is below.
Try test.split(/\s*\b\s*/). It splits on word boundaries (\b) and eats whitespace.
"hello how are you all doing, I hope that it's good! and fine. Looking forward to see you."
.split(/\s*\b\s*/);
// Returns:
["hello",
"how",
"are",
"you",
"all",
"doing",
",",
"I",
"hope",
"that",
"it",
"'",
"s",
"good",
"!",
"and",
"fine",
".",
"Looking",
"forward",
"to",
"see",
"you",
"."]
How it works.
var test = "This is. A test?"; // Test string.
// First consider splitting on word boundaries (\b).
test.split(/\b/); //=> ["This"," ","is",". ","A"," ","test","?"]
// This almost works but there is some unwanted whitespace.
// So we change the split regex to gobble the whitespace using \s*
test.split(/\s*\b\s*/); //=> ["This","is",".","A","test","?"]
// Now the whitespace is included in the separator
// and not included in the result.
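As a minimal sketch, the split above can be wrapped in a small helper. The name splitWords is hypothetical and not part of the original answer. The trim and filter steps are an added assumption: they drop the empty or whitespace-only tokens that leading or trailing whitespace would otherwise leave behind.

```javascript
// Hypothetical helper (the name is illustrative only).
function splitWords(text) {
  return text
    .trim()                // assumption: ignore whitespace at the edges
    .split(/\s*\b\s*/)     // split on word boundaries, swallowing whitespace
    .filter(function (token) { return token.length > 0; }); // drop empty tokens
}

console.log(splitWords("  This is. A test?  "));
// ["This", "is", ".", "A", "test", "?"]
```

Note that punctuation separated only by whitespace, with no word in between, is not split apart by this regex, since there is no word boundary between the two.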
More involved solution.
If you want words like "isn't" and "one-thousand" to be treated as a single word, while JavaScript's regex engine considers each of them to be two words, you will need to create your own definition of a word.
test.match(/[\w-']+|[^\w\s]+/g) //=> ["This","is",".","A","test","?"]
How it works
This matches the actual words and punctuation characters separately using an alternation. The first half of the regex, [\w-']+, matches whatever you consider to be a word, and the second half, [^\w\s]+, matches whatever you consider punctuation. In this example I just used whatever isn't a word character or whitespace. I also put a + on the end so that multi-character punctuation (such as ?!, which is properly written ‽) is treated as a single token; if you don't want that, remove the +.
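A short sketch of the custom definition in use, assuming you want apostrophes and hyphens to count as word characters. The helper name tokenize and its groupPunctuation flag are hypothetical, added only for illustration. The hyphen is written last in the character class here, which is equivalent to the pattern above but keeps it unambiguously literal.

```javascript
// Hypothetical helper; the name and the optional flag are illustrative only.
function tokenize(text, groupPunctuation) {
  // First alternative: word characters plus apostrophe and hyphen.
  // Second alternative: anything that is neither a word character nor whitespace.
  var pattern = groupPunctuation === false
    ? /[\w'-]+|[^\w\s]/g    // without the +: one token per punctuation character
    : /[\w'-]+|[^\w\s]+/g;  // with the +: runs of punctuation stay together
  return text.match(pattern) || []; // match() returns null when nothing matches
}

console.log(tokenize("It isn't one-thousand, is it?!"));
// ["It", "isn't", "one-thousand", ",", "is", "it", "?!"]
console.log(tokenize("is it?!", false));
// ["is", "it", "?", "!"]
```

Because the word alternative is tried first, an apostrophe or hyphen is absorbed into the word token and never reaches the punctuation alternative.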