
list - Going Through multiple columns of a csv file in C

I want to write a program that reads a very large CSV file. The program should look up columns by name and then print the entire column. However, it currently prints only one column of the data: the Unix Timestamp column. I want the code to be able to print the other columns as well. The columns are: Unix Timestamp,Date,Symbol,Open,High,Low,Close,Volume BTC,Volume USD

csv file:

Unix Timestamp,Date,Symbol,Open,High,Low,Close,Volume BTC,Volume USD
1605139200.0,2020-11-12,BTCUSD,15710.87,15731.73,15705.58,15710.01,1.655,26014.29
1605052800.0,2020-11-11,BTCUSD,15318,16000,15293.42,15710.87,1727.17,27111049.25
1604966400.0,2020-11-10,BTCUSD,15348.2,15479.49,15100,15318,1600.04,24521694.72
1604880000.0,2020-11-09,BTCUSD,15484.55,15850,14818,15348.2,2440.85,37356362.78
1604793600.0,2020-11-08,BTCUSD,14845.5,15672.1,14715.98,15484.55,987.72,15035324.13

Current code:

#include<stdio.h>
#include<stdlib.h>
void main()
{
    char buffer[1001]; //get line
    float timestampfile;
    FILE *fp;
    int i=1; //line
    fp = fopen("filename.csv", "r"); //used to read csv
    if(!fp)
    {
        printf("file not found"); //file not found
        exit(0);
    }
    fgets(buffer,1000, fp); //read line
    printf("Expected output print the first column:\n");
    while(feof(fp) == 0)
    {
        sscanf(buffer,"%f",&timestampfile); //read data line
        printf("%d: %f\n",i,timestampfile); //used to print data
        i++;
        fgets(buffer, 1000, fp);
    }
    printf("end of the column");
    fclose(fp);
}

Current output:

1: 1605139200.000000
2: 1605052800.000000
3: 1604966400.000000
4: 1604880000.000000
5: 1604793600.000000
end of the column


1 Reply


You have started in the right direction, but you stumbled a bit when separating the comma-separated values. The standard C library provides everything you need to separate them.

Simple Implementation Using strtok()

The easiest implementation takes the filename to read and the index of the column to extract as the first two arguments to your program. You can then discard the heading row and, for each remaining line, output the value at the requested column index. That can be done with a simple loop that keeps track of the token number while calling strtok(). Recall that on the first call to strtok() the string to tokenize is passed as the first argument; every successive call passes NULL as the first argument until no more tokens are found.

A short example would be:

#include <stdio.h>
#include <stdlib.h>
#include <string.h>

#define MAXC 1024       /* if you need a constant, #define one (or more) */
#define DELIM ",\n"

int main (int argc, char **argv) {
    
    if (argc < 3) { /* validate filename and column given as arguments */
        fprintf (stderr, "usage: %s filename column\n", argv[0]);
        return 1;
    }
    
    char buf[MAXC];                                 /* buffer to hold line */
    size_t ndx = strtoul (argv[2], NULL, 0);        /* column index to retrieve */
    FILE *fp = fopen (argv[1], "r");                /* file pointer */
    
    if (!fp) {  /* validate file open for reading */
        perror ("file open failed");
        return 1;
    }
    
    if (!fgets (buf, MAXC, fp)) {                   /* read / discard headings row */
        fputs ("error: empty file.\n", stderr);
        return 1;
    }
    
    while (fgets (buf, MAXC, fp)) {                 /* read / validate each line */
        char *p = buf;
        size_t i = 0;
        /* loop until the ndx token found */
        for (p = strtok(p, DELIM); p && i < ndx; p = strtok (NULL, DELIM))
            i++;
        if (i == ndx && p)  /* validate token found */
            puts (p);
        else {              /* handle error */
            fputs ("error: invalid index\n", stderr);
            break;
        }
    }
    
    fclose (fp);    /* close file */
}

(note: strtok() treats a run of consecutive delimiters as a single delimiter, so it cannot be used when empty fields are possible, such as field1,field2,,field4,.... strsep() was suggested as a replacement for strtok(); it does handle empty fields, but it has shortcomings of its own.)

Example Use/Output

first column (index 0):

$ ./bin/readcsvbycol_strtok dat/largecsv.csv 0
1605139200.0
1605052800.0
1604966400.0
1604880000.0
1604793600.0

second column (index 1):

$ ./bin/readcsvbycol_strtok dat/largecsv.csv 1
2020-11-12
2020-11-11
2020-11-10
2020-11-09
2020-11-08

third column (index 2):

$ ./bin/readcsvbycol_strtok dat/largecsv.csv 2
BTCUSD
BTCUSD
BTCUSD
BTCUSD
BTCUSD

fourth column (index 3):

$ ./bin/readcsvbycol_strtok dat/largecsv.csv 3
15710.87
15318
15348.2
15484.55
14845.5

request out of range:

$ ./bin/readcsvbycol_strtok dat/largecsv.csv 9
error: invalid index

More Involved Example Displaying Headings as Menu

If you want to provide a short interface that lets the user choose which column to output, first count the available columns: count the commas in the heading row and add one. You can then save the headings by allocating one pointer per column, allocating storage for each heading, and copying the heading to that storage. Finally, display the headings as a menu for the user to select from.

After determining which column to print, read each line into your buffer and tokenize it with either strtok() or strcspn(). The downside to strtok() is that it modifies the buffer, so make a copy if you need to preserve the original. strcspn() returns the length of the token, so it leaves the original unmodified and also tells you the number of characters in the token. You can then output the column value and repeat until you run out of lines.

A complete example would be:

#include <stdio.h>
#include <stdlib.h>
#include <string.h>

#define MAXC 1024       /* if you need a constant, #define one (or more) */

int main (int argc, char **argv) {
    
    char buf[MAXC], *p = buf, **headings = NULL;
    size_t cols = 1, ndx = 0, nchr;
    /* use filename provided as 1st argument (stdin by default) */
    FILE *fp = argc > 1 ? fopen (argv[1], "r") : stdin;
    
    if (!fp) {  /* validate file open for reading */
        perror ("file open failed");
        return 1;
    }
    
    if (!fgets (buf, MAXC, fp)) {                       /* read / validate headings row */
        fputs ("error: empty file.\n", stderr);
        return 1;
    }
    
    while (*p && (p = strchr (p, ','))) {               /* loop counting ',' */
        cols++;
        p++;
    }
    p = buf;    /* reset p to start of buf */
    
    /* allocate cols pointers for headings */
    if (!(headings = malloc (cols * sizeof *headings))) {
        perror ("malloc-heading pointers");
        return 1;
    }
    
    /* loop separating headings, allocate/assign storage for each, copy to storage */
    while (*p && *p != '\n' && (nchr = strcspn (p, ",\n"))) {
        if (!(headings[ndx] = malloc (nchr + 1))) {     /* allocate/validate */
            perror ("malloc headings[ndx]");
            return 1;
        }
        memcpy (headings[ndx], p, nchr);                /* copy to storage */
        headings[ndx++][nchr] = 0;                      /* nul-terminate */
        p += nchr;                                      /* advance past heading */
        if (*p == ',')                                  /* and past ',' if present */
            p++;
    }
    
    if (ndx != cols) {  /* validate ndx equals cols */
        fputs ("error: mismatched cols & ndx\n", stderr);
        return 1;
    }
    
    puts ("\nAvailable Columns:");                      /* display available columns */
    for (size_t i = 0; i < cols; i++)
        printf (" %2zu) %s\n", i, headings[i]);
    while (ndx >= cols) {                               /* get / validate selection */
        fputs ("\nSelection: ", stdout);
        if (!fgets (buf, MAXC, stdin)) {                /* read input (same buffer) */
            puts ("(user canceled input)");
            return 0;
        }
        if (sscanf (buf, "%zu", &ndx) != 1 || ndx >= cols)  /* convert/validate */
            fputs ("  error: invalid index.\n", stderr);
    }
    
    printf ("\n%s values:\n", headings[ndx]);           /* display column name */
    
    while (fgets (buf, MAXC, fp)) {                     /* loop displaying column */
        char column[MAXC];
        p = buf;
        /* skip forward ndx ',' */
        for (size_t col = 0; col < ndx && (p = strchr (p, ',')); col++, p++) {}
        /* read column value into column */
        /* p is NULL if the line has fewer than ndx+1 columns */
        if (p && (nchr = strcspn (p, ",\n"))) {
            memcpy (column, p, nchr);                   /* copy */
            column[nchr] = 0;                           /* nul-terminate */
            puts (column);                              /* output */
        }
    }
    
    if (fp != stdin)   /* close file if not stdin */
        fclose (fp);
    
    for (size_t i = 0; i < cols; i++)   /* free all allocated memory */
        free (headings[i]);
    free (headings);
}

Example Use/Output

$ ./bin/readcsvbycol dat/largecsv.csv

Available Columns:
  0) Unix Timestamp
  1) Date
  2) Symbol
  3) Open
  4) High
  5) Low
  6) Close
  7) Volume BTC
  8) Volume USD

Selection: 1

Date values:
2020-11-12
2020-11-11
2020-11-10
2020-11-09
2020-11-08

Or the open values:

$ ./bin/readcsvbycol dat/largecsv.csv

Available Columns:
  0) Unix Timestamp
  1) Date
  2) Symbol
  3) Open
  4) High
  5) Low
  6) Close
  7) Volume BTC
  8) Volume USD

Selection: 3

Open values:
15710.87
15318
15348.2
15484.55
14845.5

Column out of range, then canceling input with Ctrl + d (Ctrl + z on Windows):

$ ./bin/readcsvbycol dat/largecsv.csv

Available Columns:
  0) Unix Timestamp
  1) Date
  2) Symbol
  3) Open
  4) High
  5) Low
  6) Close
  7) Volume BTC
  8) Volume USD

Selection: 9
  error: invalid index.

Selection: (user canceled input)

Both approaches accomplish the same thing; which one to use depends on what your program needs. Look things over and let me know if you have further questions.

