Welcome to OGeek Q&A Community for programmer and developer-Open, Learning and Share
Welcome To Ask or Share your Answers For Others

Categories

0 votes
87 views
in Technique[技术] by (71.8m points)

python - CUDA Data Structure for Jagged, Static, and Sparse Data

I wanted to get some advice on the most efficient way to interact with my data on the GPU. The data is stored in a Python dictionary of Sorted Dict. My plan was to transfer/ transform the dictionary data into the following format to the device:

Row i: [key_index_0: float_0 ,key_index_1: float_1 ,...,key_index_n: float_n],

where key_index_0: float_0 are stored in a struct and all key_index are already in sorted order. Then I want to perform some calculation (for example the dot product) on all matching key values present in both A_i, and B_j. Below is an example:

Let A = [ [key_0: float_0  ,key_1: float_1  ,...,key_3: float_3 ] ] 
Let B = [ [key_0: float_0  ,key_1: float_1  ,…,key_5: float_5 ],
        [key_0: float_0  ,key_1: float_1  ,…,key_10:float_10 ] ]
Dot_product_results = [value, value] = (A_i. * B_j) 

I’m currently considering the following three approaches:

  1. Manually refactor/flatten the jagged array into a flat (1D) array. In this approach, I believe I would need one array containing all categories and one (1D) arrays that stores the start of each individual sub-array. For example, the following array [(1:0.5), (2:3.43), (3:0.00032)], [(2:1.0), (6:1.5)] would be transformed as data = [(1:0.5), (2:3.43), (3:0.00032), (2:1.0), (6:1.5)] and index = [0,3]. I am hoping that there is a way to use this data structure without pointer chasing, use coalesced memory, and somehow take advantage of shared memory.
  2. The second approach would be to create a 2D matrix that is equal to the largest array of my data, and, then append a bunch of empty structs to ensure uniformity. In theory, the B matrix will be larger than the A matrix, and thus I will keep it in column-major order to ensure memory coalescing on the GPU. Which means, the A matrix would need to be transposed; meaning that the x-dimension would contain categories and the y-dimension contain key-values. In this scenario then, the calculation operation would be more like (B_i. * A_j). I think that this approach would be horrible for GPU memory as less than 0.05% of columns would be filled with non-empty structs. To fix that, I could use a sparse matrix representation to reduce the memory overhead for this approach.
  3. My final approach would be to utilize a more traditional approach of using (SpMV) or (SpMM) representation, such as ELLPACK (ELL). Basically, my data would no longer be stored inside a struct, rather the columns indexes value would be mapped to the keys, the row indexes would mapped the category index, and the data stored inside each cell would be the float values.

So, any suggestions, any alternative ideas, or any help be greatly appreciated.

question from:https://stackoverflow.com/questions/65865724/cuda-data-structure-for-jagged-static-and-sparse-data

与恶龙缠斗过久,自身亦成为恶龙;凝视深渊过久,深渊将回以凝视…
Welcome To Ask or Share your Answers For Others

1 Reply

0 votes
by (71.8m points)
Waitting for answers

与恶龙缠斗过久,自身亦成为恶龙;凝视深渊过久,深渊将回以凝视…
OGeek|极客中国-欢迎来到极客的世界,一个免费开放的程序员编程交流平台!开放,进步,分享!让技术改变生活,让极客改变未来! Welcome to OGeek Q&A Community for programmer and developer-Open, Learning and Share
Click Here to Ask a Question

...