I just had the same problem you had (took me a whole day to fix).
I'm sure people with the same problem will stumble upon this, that's why I'm posting to this old question.
You propably didn't check for the maximum work group size of the kernel.
This is how you do it:
size_t kernel_work_group_size;
clGetKernelWorkGroupInfo(kernel, device, CL_KERNEL_WORK_GROUP_SIZE, sizeof(size_t), &kernel_work_group_size, NULL);
My devices (2x NVIDIA GTX 460 & Intel i7 CPU) support a maximum work group size of 1024, but the above code returns something around 500 when I pass my Path Tracing kernel.
When I used a workgroup size of 1024 it obviously failed and gave me the CL_OUT_OF_RESOURCES error.
The more complex your kernel becomes, the smaller the maximum workgroup size for it will become (or that's at least what I experienced).
Edit:
I just realized you said "clEnqueueReadBuffer" instead of "clEnqueueNDRangeKernel"...
My answer was related to clEnqueueNDRangeKernel.
Sorry for the mistake.
I hope this is still useful to other people.
与恶龙缠斗过久,自身亦成为恶龙;凝视深渊过久,深渊将回以凝视…