Welcome to OGeek Q&A Community for programmer and developer-Open, Learning and Share

c++ - Incomplete output from printf() called on device

To test printf() calls on the device, I wrote a simple program that copies a moderately sized array to the device and prints the values of the device array to the screen. The array is copied to the device correctly, but printf() does not behave as expected: the first several hundred numbers are missing from the output. The array size in the code is 4096. Is this a bug, or am I not using the function properly? Thanks in advance.

EDIT: My GPU is a GeForce GTX 550 Ti, with compute capability 2.1.

My code:

#include<stdio.h>
#include<stdlib.h>
#define N 4096

__global__ void Printcell(float *d_Array , int n){
    int k = 0;

    printf("\n=========== data of d_Array on device==============\n");
    for( k = 0; k < n; k++ ){
        printf("%f  ", d_Array[k]);
        if((k+1)%6 == 0) printf("\n");
    }
    printf("\n\nTotally %d elements has been printed", k);
}

int main(){

    int i =0;

    float Array[N] = {0}, rArray[N] = {0};
    float *d_Array;
    for(i=0;i<N;i++)
        Array[i] = i;


    cudaMalloc((void**)&d_Array, N*sizeof(float));
    cudaMemcpy(d_Array, Array, N*sizeof(float), cudaMemcpyHostToDevice);
    cudaDeviceSynchronize();
    Printcell<<<1,1>>>(d_Array, N);    //Print the device array by a kernel
    cudaDeviceSynchronize();

    /* Copy the device array back to host to see if it was correctly copied */   
    cudaMemcpy(rArray, d_Array, N*sizeof(float), cudaMemcpyDeviceToHost);

    printf("\n\n");

    for(i=0;i<N;i++){
        printf("%f  ", rArray[i]);
        if((i+1)%6 == 0) printf("\n");
    }
}

1 Reply

Device-side printf() writes to a queue of limited size. It is intended for small-scale, debug-style output, not for large-scale output.

From the CUDA C Programming Guide:

The output buffer for printf() is set to a fixed size before kernel launch (see Associated Host-Side API). It is circular and if more output is produced during kernel execution than can fit in the buffer, older output is overwritten.

Your in-kernel printf() output overran the buffer, so the first printed elements were lost (overwritten) before the buffer was flushed to standard output.
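
One way to stay within the buffer without changing its size is to print the array in slices and synchronize after each launch, since the device printf() buffer is flushed at synchronization points such as cudaDeviceSynchronize(). The following is only a minimal sketch: the kernel name printSlice and the chunk size of 256 are assumptions made for illustration, not something taken from the question.

```cuda
#include <cstdio>
#include <cuda_runtime.h>

// Hypothetical kernel: a single thread prints elements [begin, end) of d_data.
__global__ void printSlice(const float *d_data, int begin, int end) {
    for (int k = begin; k < end; ++k) {
        printf("%f  ", d_data[k]);
        if ((k + 1) % 6 == 0) printf("\n");
    }
}

int main() {
    const int n = 4096;
    const int chunk = 256;  // assumption: small enough to fit in the printf buffer

    float h_data[n];
    for (int i = 0; i < n; ++i) h_data[i] = static_cast<float>(i);

    float *d_data = nullptr;
    cudaMalloc(&d_data, n * sizeof(float));
    cudaMemcpy(d_data, h_data, n * sizeof(float), cudaMemcpyHostToDevice);

    for (int begin = 0; begin < n; begin += chunk) {
        int end = (begin + chunk < n) ? begin + chunk : n;
        printSlice<<<1, 1>>>(d_data, begin, end);
        // Synchronizing here flushes this slice's output before the next
        // launch can overwrite it in the circular buffer.
        cudaDeviceSynchronize();
    }
    printf("\n");

    cudaFree(d_data);
    return 0;
}
```

Each launch now produces only a small amount of output, so nothing is overwritten before it reaches the screen.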

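The same section of the guide notes that the buffer size can be increased from the host by calling cudaDeviceSetLimit() with cudaLimitPrintfFifoSize before the kernel is launched.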
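
For reference, here is a rough sketch of enlarging the buffer with cudaDeviceSetLimit() and cudaLimitPrintfFifoSize before the launch. The 8 MB value and the kernel name printAll are arbitrary choices for illustration; pick a size that suits the amount of output you expect.

```cuda
#include <cstdio>
#include <cuda_runtime.h>

// Hypothetical kernel: a single thread prints all n elements of d_data.
__global__ void printAll(const float *d_data, int n) {
    for (int k = 0; k < n; ++k) {
        printf("%f  ", d_data[k]);
        if ((k + 1) % 6 == 0) printf("\n");
    }
}

int main() {
    const int n = 4096;

    // Query the current size of the device printf buffer.
    size_t fifoBytes = 0;
    cudaDeviceGetLimit(&fifoBytes, cudaLimitPrintfFifoSize);
    printf("printf buffer before: %zu bytes\n", fifoBytes);

    // Assumption: 8 MB is an arbitrary, generous size for this example.
    // The limit has to be set before the kernel is launched.
    cudaError_t err = cudaDeviceSetLimit(cudaLimitPrintfFifoSize,
                                         8 * 1024 * 1024);
    if (err != cudaSuccess) {
        fprintf(stderr, "cudaDeviceSetLimit failed: %s\n",
                cudaGetErrorString(err));
        return 1;
    }

    float h_data[n];
    for (int i = 0; i < n; ++i) h_data[i] = static_cast<float>(i);

    float *d_data = nullptr;
    cudaMalloc(&d_data, n * sizeof(float));
    cudaMemcpy(d_data, h_data, n * sizeof(float), cudaMemcpyHostToDevice);

    printAll<<<1, 1>>>(d_data, n);
    cudaDeviceSynchronize();  // flushes the device printf buffer to stdout

    cudaFree(d_data);
    return 0;
}
```

Even with a larger buffer, device-side printf() is still best kept for debugging; for bulk output, copying the data back to the host and printing it there is the more robust route.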

