Welcome to OGeek Q&A Community for programmer and developer-Open, Learning and Share
Welcome To Ask or Share your Answers For Others

Categories

0 votes
465 views
in Technique[技术] by (71.8m points)

c - How to get scanf to continue with empty scanset

I am currently trying to parse UnicodeData.txt with this format: ftp://ftp.unicode.org/Public/3.0-Update/UnicodeData-3.0.0.html However, I am hitting a problem in that when I try to read, say a line like the following.

something;123D;;LINE TABULATION;

I try to get the data from the fields by code such as the following. The problem is that fields[3] is not getting filled in, and scanf is returning 2. in is the current line.

char fields[4][256];
sscanf(in, "%[^;];%[^;];%[^;];%[^;];%[^;];",
    fields[0], fields[1], fields[2], fields[3]);

I know this is the correct implementation of scanf(), but is there a way to get this to work, short of making my own scanf()?

See Question&Answers more detail:os

与恶龙缠斗过久,自身亦成为恶龙;凝视深渊过久,深渊将回以凝视…
Welcome To Ask or Share your Answers For Others

1 Reply

0 votes
by (71.8m points)

scanf does not handle "empty" fields. So you will have to parse it on your own.

The following solution is:

  • fast, as it uses strchr rather than the quite slow sscanf
  • flexible, as it will detect an arbitrary number of fields, up to a given maximum.

The function parse extracts fields from the input str, separated by semi-colons. Four semi-colons give five fields, some or all of which can be blank. No provision is made for escaping the semi-colons.

#include <stdio.h>
#include <string.h>

static int parse(char *str, char *out[], int max_num) {
    int num = 0;
    out[num++] = str;
    while (num < max_num && str && (str = strchr(str, ';'))) {
        *str = 0;           // nul-terminate previous field
        out[num++] = ++str; // save start of next field
    }
    return num;
}

int main(void) {
    char test[] = "something;123D;;LINE TABULATION;";
    char *field[99];
    int num = parse(test, field, 99);
    int i;
    for (i = 0; i < num; i++)
        printf("[%s]", field[i]);
    printf("
");
    return 0;
}

The output of this test program is:

[something][123D][][LINE TABULATION][]

Update: A slightly shorter version, which doesn't require an extra array to store the start of each substring, is:

#include <stdio.h>
#include <string.h>

static int replaceSemicolonsWithNuls(char *p) {
    int num = 0;
    while ((p = strchr(p, ';'))) {
        *p++ = 0;
        num++; 
    }
    return num;
}

int main(void) {
    char test[] = "something;123D;;LINE TABULATION;";
    int num = replaceSemicolonsWithNuls(test);
    int i;
    char *p = test;
    for (i = 0; i < num; i++, p += strlen(p) + 1)
        printf("[%s]", p);
    printf("
");
    return 0;
}

与恶龙缠斗过久,自身亦成为恶龙;凝视深渊过久,深渊将回以凝视…
OGeek|极客中国-欢迎来到极客的世界,一个免费开放的程序员编程交流平台!开放,进步,分享!让技术改变生活,让极客改变未来! Welcome to OGeek Q&A Community for programmer and developer-Open, Learning and Share
Click Here to Ask a Question

...