Welcome to OGeek Q&A Community for programmer and developer-Open, Learning and Share
Welcome To Ask or Share your Answers For Others


c - Word counting program using multithreading: large file size

I am trying to write a program that counts the words in a large file using multiple threads, but it crashes with a segmentation fault and I am stuck. Any advice would be appreciated.

Input: a file name. Output: segmentation fault.

The code is:

#include <stdio.h>
#include <pthread.h>
#include <stdlib.h>


struct thread_data{
    FILE *fp;
    long int offset;
    int start;
    int blockSize;
};

int words=0;  

void *countFrequency(void* data){

    struct thread_data* td=data;
    char *buffer = malloc(td->blockSize);

    int i,c;
    i=0;c=0;
    enum states { WHITESPACE, WORD };
    int state = WHITESPACE;

    fseek(td->fp, td->offset, td->start);

    char last = ' ';
    while ((fread(buffer, td->blockSize, 1, td->fp))==1){

        if ( buffer[0]== ' ' || buffer[0] == '\t'  ){
            state = WHITESPACE;
        }
        else if (buffer[0]=='\n'){
            //newLine++;
            state = WHITESPACE;
        }
        else {
            if ( state == WHITESPACE ){
                words++;
            }
            state = WORD;
        }
        last = buffer[0];
    }
    free(buffer);

    pthread_exit(NULL);

    return NULL;
}

int main(int argc, char **argv){

    int nthreads, x, id, blockSize,len;
    //void *state;
    FILE *fp;
    pthread_t *threads;

    struct thread_data data[nthreads];

    if (argc < 2){
        fprintf(stderr, "Usage: ./a.out <file_path>");
        exit(-1);
    }

    if((fp=fopen(argv[1],"r"))==NULL){
        printf("Error opening file");
        exit(-1);
    }  

    printf("Enter the number of threads: ");
    scanf("%d",&nthreads);
    threads = malloc(nthreads*sizeof(pthread_t));

    fseek(fp, 0, SEEK_END);
    len = ftell(fp);  
    printf("len= %d\n",len);

    blockSize=(len+nthreads-1)/nthreads;
    printf("size= %d\n",blockSize);

    for(id = 0; id < nthreads; id++){

        data[id].fp=fp;
        data[id].offset = blockSize;
        data[id].start = id*blockSize+1;

    }
    //LAST THREAD
    data[nthreads-1].start=(nthreads-1)*blockSize+1;

    for(id = 0; id < nthreads; id++)
        pthread_create(&threads[id], NULL, &countFrequency,&data[id]);

    for(id = 0; id < nthreads; id++)
        pthread_join(threads[id],NULL);

    fclose(fp);
    //free(threads);

    //pthread_exit(NULL);

    printf("%d\n",words);
    return 0;  
}

He who fights with dragons for too long becomes a dragon himself; gaze into the abyss for too long, and the abyss gazes back…
1 Reply


Typecasting does not fix wrong code; it only disguises it, or makes it even more wrong. Let's look at those errors (the lines quoted below appear to come from an earlier revision of the code in the question):

struct thread_data* td=(struct thread_data)data; /* wrong */

You can't cast a struct thread_data * to a struct thread_data, nor can you assign a struct thread_data to a struct thread_data *. That incorrect and unnecessary cast is the sole cause of the compiler error.

x = pthread_create(&threads[id], NULL, &countFrequency, (void *)data); /* wrong */

Second, you can't cast a struct thread_data to a void * either; you need an actual pointer, such as the address of data:

x = pthread_create(&threads[id], NULL, &countFrequency, &data);

No cast is needed there either, because any object pointer converts to void * implicitly. Of course, since there is only one copy of data, all the threads will share it and all work on whatever values were last written to it. That's not going to go well; you'll want one struct thread_data per thread.

Third, the compiler warnings are telling you that your thread function has the wrong signature; pthread_create expects a function that takes a void * and returns a void *:

void *countFrequency(struct thread_data *data) /* wrong */

Combined with the first point: get all the types right, and once again no casts are needed:

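void *countFrequency(void *data) {
    struct thread_data* td = data; /* void * converts implicitly, so no cast */
    /* ... the rest of the function works through td ... */ return NULL;
}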
