nvcc has many formats by which the code generation options can be specified. A read of section 6 of the nvcc manual may be instructive.
When using this format:
nvcc -gencode arch=compute_13,code=sm_13 ...
only the SASS code for an sm_13 (cc 1.3) device will be retained. No PTX will be retained in the executable object, so the code can only run on a device capable of running cc 1.3 SASS.
With the above command format, embedding a PTX version of the code in the executable object requires a virtual architecture specification (compute_XX rather than sm_XX) as the value given to code=. Since each -gencode switch written this way names a single target, we must pass the -gencode switch to nvcc multiple times, once for each target we want embedded in the executable object.
So extending the above example, we could use the following:
nvcc -gencode arch=compute_13,code=sm_13 -gencode arch=compute_13,code=compute_13 ...
This would embed both cc 1.3 SASS (from the first -gencode switch) and cc 1.3 PTX (from the second -gencode switch) in the executable. Devices capable of running cc 1.3 SASS directly will use that. On other devices (of compute capability greater than cc 1.3), the driver will perform a JIT-compile step to convert the cc 1.3 PTX into SASS for an architecture suitable for the device in question.
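The "one -gencode switch per target" pattern can be sketched as a small shell helper. This is only an illustration: gencode_flags is a hypothetical function name, not part of nvcc, and it assumes two-digit compute capabilities such as 13 or 20. It emits one SASS switch per requested target and one PTX switch for the highest target, so that newer devices can fall back to JIT compilation.

```shell
# Hypothetical helper (not part of nvcc): build the repeated -gencode switches.
# Arguments are two-digit compute capabilities, e.g. 13 for cc 1.3.
gencode_flags() {
    if [ "$#" -eq 0 ]; then
        return 1                      # at least one target is required
    fi
    flags=""
    highest=0
    for cc in "$@"; do
        # One switch per real architecture: embeds SASS for sm_<cc>
        flags="$flags -gencode arch=compute_$cc,code=sm_$cc"
        if [ "$cc" -gt "$highest" ]; then
            highest=$cc
        fi
    done
    # One extra switch with a virtual architecture: embeds PTX for JIT
    flags="$flags -gencode arch=compute_$highest,code=compute_$highest"
    printf '%s\n' "${flags# }"        # drop the leading space
}
```

It could then be used as nvcc $(gencode_flags 13 20) ..., with the rest of the command line unchanged.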
I agree that the GTC 2013 presentation (e.g. slide 37) seems to suggest that
nvcc -gencode arch=compute_13,code=sm_13 ...
is sufficient for all devices of compute capability 1.3 or higher. It is not, and this is easy to demonstrate. If you compile code using the above format and attempt to run it on a cc 2.0 device, it will fail with an "invalid device function" error on any kernel in your code.
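The rule behind that failure can be written down as a toy model in shell. This is a sketch, not a query of a real binary: can_run is a hypothetical function, compute capabilities are assumed to be two-digit numbers (13, 20), and the SASS check assumes the usual CUDA binary-compatibility rule that SASS runs only on devices with the same major version and an equal or higher minor version. PTX is treated as usable on any device of equal or higher compute capability.

```shell
# Hypothetical model: can_run <device_cc> <arch=...,code=...> ...
# Prints how the device would run the code, or the expected failure.
can_run() {
    dev=$1
    shift
    # First pass: embedded SASS that the device can execute directly
    for target in "$@"; do
        case $target in
            arch=compute_*,code=sm_*)
                sass=${target##*code=sm_}
                # Assumed rule: same major version, device minor >= SASS minor
                if [ $((sass / 10)) -eq $((dev / 10)) ] && [ "$sass" -le "$dev" ]; then
                    printf 'yes (SASS sm_%s)\n' "$sass"
                    return 0
                fi
                ;;
        esac
    done
    # Second pass: embedded PTX that the driver can JIT-compile
    for target in "$@"; do
        case $target in
            arch=compute_*,code=compute_*)
                ptx=${target##*code=compute_}
                if [ "$ptx" -le "$dev" ]; then
                    printf 'yes (JIT from PTX compute_%s)\n' "$ptx"
                    return 0
                fi
                ;;
        esac
    done
    printf 'no (invalid device function)\n'
    return 1
}
```

For example, can_run 20 arch=compute_13,code=sm_13 reports the failure, while adding arch=compute_13,code=compute_13 as a second argument reports a JIT path from the cc 1.3 PTX.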
Again, nvcc has a variety of command formats and "shortcuts" for specifying code generation. Some relatively simple ones, such as:
nvcc -arch=sm_13 ...
will embed both a PTX and a SASS version of the code in the executable object, resulting in the kind of forward compatibility the presentation suggests.
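To make the shortcut concrete, the following sketch spells out the explicit switches that -arch=sm_XX stands for, based on the description above (SASS plus PTX for the same compute capability). expand_arch is a hypothetical helper used only for illustration; nvcc performs this expansion internally.

```shell
# Hypothetical helper: show the explicit -gencode form of an -arch=sm_XX shortcut.
expand_arch() {
    cc=${1#-arch=sm_}                 # "-arch=sm_13" -> "13"
    # SASS for the real architecture, plus PTX for the matching virtual one
    printf '%s\n' "-gencode arch=compute_$cc,code=sm_$cc -gencode arch=compute_$cc,code=compute_$cc"
}
```

Calling expand_arch -arch=sm_13 prints the same pair of -gencode switches used in the two-switch example earlier in this answer.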