This question on StackExchange was put on hold as primarily opinion-based: “…answers to this question will tend to be almost entirely based on opinions, rather than facts, references, or specific expertise.”
The content of StackExchange is usually high quality, but in this case, while the design decision was based on opinion, the answer to the question needn’t be… you just need to ask the people who know! And the inimitable talonmies, who is poised to crack 30k on StackExchange’s points-based reputation system, compounded the problem by saying that CUdeviceptr is a handle to a device memory allocation, not a pointer.
I don’t think I have ever seen talonmies give an incorrect answer before; but in this case, he’s off the mark. CUdeviceptr has always represented a pointer in the CUDA address space. In fact, though it was frowned upon to mix driver API and CUDA runtime code, even in CUDA 1.0 you could convert between CUDART’s void * and the driver API’s CUdeviceptr by writing something like:
void *p;            /* CUDA runtime (CUDART) device pointer */
CUdeviceptr dptr;   /* driver API device pointer */
p = (void *) (uintptr_t) dptr;      /* driver API -> runtime */
dptr = (CUdeviceptr) (uintptr_t) p; /* runtime -> driver API */
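The cast is not just a formality: both APIs see the same address. As a minimal sketch (mine, not from any NVIDIA sample; it assumes a modern toolkit, where mixing the runtime and driver APIs on a single thread is supported, and omits error checking), you can allocate with the runtime and then operate on the same memory through the driver API:

#include <cuda.h>          /* driver API: CUdeviceptr, cuMemsetD8 */
#include <cuda_runtime.h>  /* runtime API: cudaMalloc, cudaFree */
#include <stdint.h>

int main(void)
{
    void *p = NULL;
    cudaMalloc(&p, 4096);                           /* runtime allocation, typed void * */
    CUdeviceptr dptr = (CUdeviceptr) (uintptr_t) p; /* same address, driver API type */
    cuMemsetD8(dptr, 0, 4096);                      /* driver API call on the runtime allocation */
    cudaFree(p);
    return 0;
}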
We could have made device pointers void *, but there was a desire to make it easy for compilers to distinguish between host and device pointers at compile time instead of runtime. Furthermore, SM 1.x hardware only supported 32-bit pointers, so using void * would have created a difference in pointer size on 64-bit host platforms. It’s a distant memory now, since so much great compiler work has gone into CUDA since then, but at the time “pointer-squashing” (having CUDA’s compiler transform 64-bit pointers into 32-bit pointers on 64-bit host systems) was a big issue in early versions of CUDA.
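To make the distinction concrete, here’s a small illustration (mine, not from the CUDA headers or documentation): because CUdeviceptr is an integer typedef rather than void *, its size comes from cuda.h rather than from the host ABI, and accidentally treating it as a host pointer draws a compiler diagnostic instead of compiling silently:

#include <cuda.h>
#include <stdio.h>

int main(void)
{
    CUdeviceptr dptr = 0;
    void *p = NULL;

    /* The size of CUdeviceptr is whatever the typedef in cuda.h says it is
       (32-bit in the earliest toolkits, 64-bit today), not necessarily the
       host pointer size: */
    printf("sizeof(CUdeviceptr) = %zu, sizeof(void *) = %zu\n",
           sizeof(dptr), sizeof(p));

    /* And mixing the two without an explicit cast draws a diagnostic
       (an outright error in C++) rather than compiling silently: */
    /* p = dptr; */
    /* dptr = p; */

    return 0;
}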
For the record, not making the driver API’s device pointer type void * is one of my bigger regrets about early CUDA development. It took months to refactor the driver to support 64-bit device pointers when hardware support for that feature became available in SM 2.x class hardware.
In fact, some weeks before we released CUDA 1.0, we had a meeting and a serious discussion about replacing CUdeviceptr with void *, and decided not to take the schedule hit. We weren’t going to let perfect be the enemy of done, and we paid the price later.
While we’re on the topic of regrettable design decisions in early CUDA, I wish I had done a search-and-replace to convert cuFunction to cuKernel, and put cuLaunchKernel in the first release (in place of the stateful, chatty and not-thread-safe cuParamSet* family of functions). But we had scant engineering resources to spend on fit and finish, a constraint that is no less true for CUDA than for many other successful software projects in history.
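To make “stateful, chatty and not-thread-safe” concrete, here is a rough sketch (my reconstruction, for a hypothetical kernel that takes a CUdeviceptr and an int; the cuParamSet* calls are long deprecated) of how a launch looked with the old family versus cuLaunchKernel:

#include <cuda.h>

/* Legacy launch path (CUDA 1.x): each call mutates launch state attached to
   the CUfunction itself, so two host threads launching the same CUfunction
   can race with each other. */
void launch_legacy(CUfunction f, CUdeviceptr dst, int n)
{
    cuFuncSetBlockShape(f, 256, 1, 1);
    cuParamSetv(f, 0, &dst, sizeof(dst));
    cuParamSeti(f, (int) sizeof(dst), n);
    cuParamSetSize(f, (unsigned int) (sizeof(dst) + sizeof(n)));
    cuLaunchGrid(f, (n + 255) / 256, 1);
}

/* cuLaunchKernel: everything the launch needs travels in a single,
   thread-safe call. */
void launch_modern(CUfunction f, CUdeviceptr dst, int n, CUstream stream)
{
    void *params[] = { &dst, &n };
    cuLaunchKernel(f,
                   (n + 255) / 256, 1, 1,   /* grid dimensions  */
                   256, 1, 1,               /* block dimensions */
                   0, stream,               /* shared memory, stream */
                   params, NULL);
}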