int n = sscanf("string", "%s %[^, ]%*[, ]%s", word1, word2, word3);
The return value in n tells you how many assignments were made successfully. The %[^, ] is a negated character-class match (a scanset) that finds a word not including either commas or blanks (add tabs if you like). The %*[, ] matches a run of one or more commas or spaces but suppresses the assignment.
I'm not sure I'd use this in practice, but it should work. It is, however, untested.
Maybe a tighter specification is:
int n = sscanf("string", "%s %[^, ]%*[,]%s", word1, word2, word3);
The difference is that the non-assigning character class only accepts a comma. sscanf() stops at any space (or EOS, end of string) after word2, and skips spaces before assigning to word3. The previous version allowed a space between the second and third words in lieu of a comma, which the question does not strictly allow.
As pmg suggests in a comment, the assigning conversion specifications should be given a length to prevent buffer overflow. Note that the length does not include the null terminator, so the value in the format string must be one less than the size of the arrays in bytes. Also note that whereas printf() allows you to specify sizes dynamically with *, sscanf() et al. use * to suppress assignment. That means you have to create the string specifically for the task at hand:
char word1[20], word2[32], word3[64];
int n = sscanf("string", "%19s %31[^, ]%*[,]%63s", word1, word2, word3);
(Kernighan & Pike suggest formatting the format string dynamically in their (excellent) book 'The Practice of Programming', 1999.)
Just found a problem: given "word1 word2 ,word3", it doesn't read word3. Is there a cure?
Yes, there's a cure, and it is actually trivial, too. Add a space in the format string before the non-assigning, comma-matching conversion specification. Thus:
#include <stdio.h>

/* Parse one input string and report what sscanf() extracted. */
static void tester(const char *data)
{
    char word1[20], word2[32], word3[64];
    /* The space before %*[,] skips any white space ahead of the comma. */
    int n = sscanf(data, "%19s %31[^, ] %*[,]%63s", word1, word2, word3);
    printf("Test data: <<%s>>\n", data);
    printf("n = %d; w1 = <<%s>>, w2 = <<%s>>, w3 = <<%s>>\n", n, word1, word2, word3);
}

int main(void)
{
    const char *data[] =
    {
        "word1 word2 , word3",
        "word1 word2 ,word3",
        "word1 word2, word3",
        "word1 word2,word3",
        "word1 word2 , word3",
    };
    enum { DATA_SIZE = sizeof(data)/sizeof(data[0]) };
    size_t i;
    for (i = 0; i < DATA_SIZE; i++)
        tester(data[i]);
    return(0);
}
Example output:
Test data: <<word1 word2 , word3>>
n = 3; w1 = <<word1>>, w2 = <<word2>>, w3 = <<word3>>
Test data: <<word1 word2 ,word3>>
n = 3; w1 = <<word1>>, w2 = <<word2>>, w3 = <<word3>>
Test data: <<word1 word2, word3>>
n = 3; w1 = <<word1>>, w2 = <<word2>>, w3 = <<word3>>
Test data: <<word1 word2,word3>>
n = 3; w1 = <<word1>>, w2 = <<word2>>, w3 = <<word3>>
Test data: <<word1 word2 , word3>>
n = 3; w1 = <<word1>>, w2 = <<word2>>, w3 = <<word3>>
Once the 'non-assigning character class' only accepts a comma, you can abbreviate that to a literal comma in the format string:
int n = sscanf(data, "%19s %31[^, ] , %63s", word1, word2, word3);
Plugging that into the test harness produces the same result as before. Note that all code benefits from review; it can often (essentially always) be improved even after it is working.