
multithreading - How to integrate my calculation C code with OpenMP

I am using a dual-channel DAQ card in data-stream mode. I wrote some analysis/calculation code and added it to the main acquisition code. However, a FIFO overflow warning always appears once the total acquired data reaches around 6000 MSamples (the DAQ's on-board memory is 8 GB). I am aware that a complicated calculation can slow the system down and cause the overflow, but everything I compute is necessary for my experiment and cannot be dropped (unless there is more efficient code that gives the same result). I have heard that OpenMP might be a way to speed things up, but I am only a beginner in C. How could I apply it to my calculation code?

My computer has 64 GB of RAM and an Intel Core i7 processor. I always close other unnecessary software when running the data-stream code. The code has been optimized as much as I can, for example by simplifying hilbert() and using memcpy to pick out a specific range of data points.

This is how I process the data:

1. Install FFTW (built from source) to do the Hilbert transform.
2. Use a for loop to de-interleave the pi16Buffer data into ch2Buffer.
3. memcpy the range of data points that I am interested in into another array called ch2newBuffer.
4. Run hilbert() on ch2newBuffer and take its absolute value.
5. Find the max value of ch1 and the max value of abs(hilbert(ch2newBuffer)).
6. Calculate max(abs(hilbert(ch2))) / max(ch1).

Here is the part of my DAQ code that is in charge of the calculation:

void hilbert(const int16* in, fftw_complex* out, fftw_plan plan_forward, fftw_plan plan_backward)
{
    // copy the data to the complex array
    for (int i = 0; i < N; ++i) {
        out[i][REAL] = in[i];
        out[i][IMAG] = 0;
    }
    // execute the forward DFT plan (created by the caller and reused)
    //fftw_plan plan = fftw_plan_dft_1d(N, out, out, FFTW_FORWARD, FFTW_ESTIMATE);
    fftw_execute(plan_forward);
    // (the plan is destroyed by the caller to prevent a memory leak)
    //fftw_destroy_plan(plan_forward);
    int hN = N>>1; // half of the length (N/2)
    int numRem = hN; // the number of remaining elements
    // multiply the appropriate values by 2
    // (those that should be multiplied by 1 are left intact because they would not change)
    for (int i = 1; i < hN; ++i) {
        out[i][REAL] *= 2;
        out[i][IMAG] *= 2;
    }
    // if the length is even, the number of remaining elements decreases by 1
    if (N % 2 == 0)
        numRem--;
    else if (N > 1) {
        out[hN][REAL] *= 2;
        out[hN][IMAG] *= 2;
    }
    // set the remaining value to 0
    // (multiplying by 0 gives 0, so we don't care about the multiplicands)
    memset(&out[hN + 1][REAL], 0, numRem * sizeof(fftw_complex));
    // execute the inverse DFT plan (also created by the caller and reused)
    //plan = fftw_plan_dft_1d(N, out, out, FFTW_BACKWARD, FFTW_ESTIMATE);
    fftw_execute(plan_backward);
    // cleanup is done by the caller
    //fftw_destroy_plan(plan_backward);
    //fftw_cleanup();
    // scaling of the IDFT output by 1/N is done later (see the division by N in SumBufferData)
    //for (int i = 0; i < N; ++i) {
        //out[i][REAL] /= N;
        //out[i][IMAG] /= N;
    //}
}


float SumBufferData(void* pBuffer, uInt32 u32Size, uInt32 u32SampleBits)
{
    // In this routine we sum up all the samples in the buffer. This function
    // should be replaced with the user's analysis function


    if ( 8 == u32SampleBits )
    {
        pu8Buffer = (uInt8 *)pBuffer;
        for (i = 0; i < u32Size; i++)
        {
            i64Sum += pu8Buffer[i];
        }
    }
    else
    {
        pi16Buffer = (int16 *)pBuffer;

        // FFTW working array and plans, created here and reused for the forward
        // and inverse transforms inside hilbert()
        fftw_complex hilbertedch2[N];
        fftw_plan plan_forward = fftw_plan_dft_1d(N, hilbertedch2, hilbertedch2, FFTW_FORWARD, FFTW_ESTIMATE);
        fftw_plan plan_backward = fftw_plan_dft_1d(N, hilbertedch2, hilbertedch2, FFTW_BACKWARD, FFTW_ESTIMATE);

        // buffers for the de-interleaved channel-2 data and the selected range
        ch2Buffer = (int16*)calloc(u32Size / 2, sizeof *ch2Buffer);
        ch2newBuffer = (int16*)calloc(u32Size / 2, sizeof *ch2newBuffer);

        // De-interleave the data from pi16Buffer
        for (i = 0; i < u32Size/2; i++)
        {
            ch2Buffer[i] = pi16Buffer[i*2+1];
        }

        // Pick out the range of data points that we are interested in
        memcpy(ch2newBuffer, &ch2Buffer[6944], 1024 * sizeof(ch2Buffer[0]));
        // Do the Hilbert transform on these data points
        hilbert(ch2newBuffer, hilbertedch2, plan_forward, plan_backward);
        fftw_destroy_plan(plan_forward);
        fftw_destroy_plan(plan_backward);

        
        // Find the max value in the segments of interest of ch1 and ch2
        for (i = 128; i < 200 ; i++)
        {
            if (pi16Buffer[i*2] > max1)
                max1 = pi16Buffer[i*2];
        }

        for (i = 0; i < 1024; i++)
        {
            if (fabs(hilbertedch2[i][IMAG]) > max2)
                max2 = fabs(hilbertedch2[i][IMAG]);
        }
        
        Corrected = max2 / max1 / N; // Calculate the signal correction (the /N undoes FFTW's unnormalized inverse transform)
    }
    free(ch2Buffer);
    free(ch2newBuffer);
    return Corrected;
}


1 Reply


Loops are typically a good starting point for parallelism, for instance:

#pragma omp parallel for
for (int i = 0; i < N; ++i) {
    out[i][REAL] = in[i];
    out[i][IMAG] = 0;
}
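Every iteration here writes only to its own out[i] element, so there are no data races and the iterations can simply be distributed across threads.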

or

#pragma omp parallel for reduction(max:max2)
for (i = 0; i < 1024; i++)
{
     double tmp = fabs(hilbertedch2[i][IMAG]);
     max2 = (max2 > tmp) ? max2 : tmp;
}
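Note that these pragmas only take effect when the compiler is invoked with OpenMP enabled (for example gcc -fopenmp, or /openmp with MSVC); otherwise they are ignored and the loops run serially.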

That being said, you need to profile your code to find out where the execution spends the most time and try to parallelize those parts if possible. However, looking at what you have posted, I do not see many parallelism opportunities there.
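For illustration, here is a small self-contained toy sketch of the same two patterns (my own made-up example with dummy data, not your DAQ code): it de-interleaves channel 2 from an interleaved buffer and then finds the maximum absolute value with a max reduction. Compile it with OpenMP enabled as described above.

#include <math.h>
#include <stdio.h>
#include <stdlib.h>
#include <omp.h>

#define NSAMPLES 16384   /* number of interleaved ch1/ch2 sample pairs (made up) */

int main(void)
{
    short *interleaved = malloc(2 * NSAMPLES * sizeof *interleaved);
    short *ch2 = malloc(NSAMPLES * sizeof *ch2);
    double max2 = 0.0;

    /* fill the interleaved buffer with some dummy data */
    for (int i = 0; i < 2 * NSAMPLES; i++)
        interleaved[i] = (short)(i % 1000 - 500);

    /* de-interleave channel 2: every iteration writes a different element,
       so the loop can be split across threads */
    #pragma omp parallel for
    for (int i = 0; i < NSAMPLES; i++)
        ch2[i] = interleaved[2 * i + 1];

    /* max reduction: each thread keeps a private max2 and the partial
       results are combined when the loop ends */
    #pragma omp parallel for reduction(max:max2)
    for (int i = 0; i < NSAMPLES; i++) {
        double tmp = fabs((double)ch2[i]);
        if (tmp > max2)
            max2 = tmp;
    }

    printf("max |ch2| = %f (up to %d threads)\n", max2, omp_get_max_threads());

    free(interleaved);
    free(ch2);
    return 0;
}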

