Welcome to OGeek Q&A Community for programmer and developer-Open, Learning and Share
Welcome To Ask or Share your Answers For Others

algorithm - Solve small symmetric positive definite Ax = b on GPU only

I'm attempting to optimise a real-time 3D modelling application. The compute part of the application runs almost entirely on the GPU in CUDA. The application needs to solve a small (6x6) double-precision symmetric positive definite linear system Ax = b 500+ times per second. Currently this is done with an efficient CPU-based linear algebra library using Cholesky factorisation, but that requires copying data from the GPU to the CPU and back to the GPU hundreds of times per second, plus the overhead of the extra kernel launches each time.

How can I solve the linear system entirely on the GPU, without moving the data to the CPU at all? I've read a little about the MAGMA library, but it seems to use hybrid CPU/GPU algorithms rather than GPU-only algorithms.

I'm prepared for the fact that solving an individual linear system on the GPU is going to be a lot slower than with the existing CPU-based library, but I want to see whether that can be made up for by removing the host-device data transfers and the extra kernel launches that currently happen hundreds of times per second. If there is no GPU-only LAPACK-like alternative out there, how would I go about implementing something to solve this particular 6x6 case on the GPU only? Could it be done without a huge time investment, for example with GPU BLAS libraries?

1 Reply

NVIDIA posted code for a batched Ax = b solver to the registered developer website last fall. This code works for general matrices, and should work well enough for your needs provided you can expand the symmetric matrices to full matrices (which should not be an issue for a 6x6 system). Because the code performs pivoting, which is unnecessary for positive definite matrices, it is not optimal for your case, but the code is under a BSD license, so you may be able to modify it for your purposes.
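To illustrate what a pivot-free solve for the 6x6 symmetric positive definite case could look like, here is a minimal sketch of a Cholesky factorisation followed by forward and backward substitution, with one thread handling one system. This is not NVIDIA's batched solver code; the names `chol_solve6` and `solve_batch`, the row-major storage layout, and the `HD` macro are assumptions made for this example. The macro lets the same function compile as plain host C++ for testing.

```cpp
#include <cmath>

// Assumption for this sketch: compile the solver for host and device under
// nvcc, and as ordinary host code under a plain C++ compiler.
#ifdef __CUDACC__
#define HD __host__ __device__
#else
#define HD
#endif

constexpr int N = 6;

// Solve A x = b for a 6x6 symmetric positive definite A (row-major; only the
// lower triangle is read). Returns false if a non-positive pivot shows that
// A is not positive definite; x is then left unspecified.
HD inline bool chol_solve6(const double* A, const double* b, double* x)
{
    double L[N][N];  // lower-triangular factor, A = L * L^T

    // Cholesky factorisation, column by column (no pivoting needed for SPD).
    for (int j = 0; j < N; ++j) {
        double d = A[j * N + j];
        for (int k = 0; k < j; ++k) d -= L[j][k] * L[j][k];
        if (d <= 0.0) return false;
        L[j][j] = sqrt(d);
        for (int i = j + 1; i < N; ++i) {
            double s = A[i * N + j];
            for (int k = 0; k < j; ++k) s -= L[i][k] * L[j][k];
            L[i][j] = s / L[j][j];
        }
    }

    // Forward substitution: L y = b.
    double y[N];
    for (int i = 0; i < N; ++i) {
        double s = b[i];
        for (int k = 0; k < i; ++k) s -= L[i][k] * y[k];
        y[i] = s / L[i][i];
    }

    // Backward substitution: L^T x = y.
    for (int i = N - 1; i >= 0; --i) {
        double s = y[i];
        for (int k = i + 1; k < N; ++k) s -= L[k][i] * x[k];
        x[i] = s / L[i][i];
    }
    return true;
}

#ifdef __CUDACC__
// One thread per system; A holds count matrices of 36 doubles each,
// b and x hold count vectors of 6 doubles each, all in device memory.
__global__ void solve_batch(const double* A, const double* b, double* x, int count)
{
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < count) chol_solve6(A + i * N * N, b + i * N, x + i * N);
}
#endif
```

For a single system per frame, `chol_solve6` could also be called directly from inside an existing kernel by one thread, which avoids a separate launch as well as the host round trip. Whether a single-threaded device solve is fast enough for a given application would need to be measured.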

NVIDIA's standard developer website is experiencing some issues at the moment. Here is how you can download the batched solver code at this time:

(1) Go to http://www.nvidia.com/content/cuda/cuda-toolkit.html

(2) If you have an existing NVdeveloper account (e.g. via partners.nvidia.com) click on the green "Login to nvdeveloper" link on the right half of the screen. Otherwise click on "Join nvdeveloper" to apply for a new account; requests for new accounts are typically approved within one business day.

(3) Log in at the prompt with your email address and password.

(4) There is a section on the right hand side titled "Newest Downloads". The fifth item from the top is "Batched Solver". Click on that and it will bring you to the download page for the code.

(5) Click on the "download" link, then click "Accept" to accept the license terms. Your download should start.

