nvidia - GPU shared memory size is very small - what can I do about it?


The size of shared memory ("local memory" in OpenCL terms) is only 16 KiB on most of today's NVIDIA GPUs.
I have an application in which I need to create an array of 10,000 integers. The amount of memory needed to fit 10,000 integers is 10,000 * 4 B = 40 KB.

  • How can I work around this?
  • Is there any GPU that has more than 16 KiB of shared memory?

Think of shared memory as an explicitly managed cache. You will need to store your array in global memory and cache parts of it in shared memory as needed, either by making multiple passes or by some other scheme which minimises the number of loads and stores to/from global memory.
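As a minimal sketch of that multi-pass approach, the kernel below stages the global array into shared memory one tile at a time. The kernel name, tile size, and the per-tile operation are illustrative assumptions, not part of the original question:

```cuda
#include <cuda_runtime.h>

#define TILE_SIZE 256   // 256 ints = 1 KiB of shared memory per block (assumed tile size)

__global__ void processInTiles(const int *in, int *out, int n)
{
    __shared__ int tile[TILE_SIZE];   // explicitly managed cache, one tile per block

    // Each block walks over the array one tile at a time (grid-stride loop),
    // so an array far larger than 16 KiB never has to fit in shared memory at once.
    for (int base = blockIdx.x * TILE_SIZE; base < n;
         base += gridDim.x * TILE_SIZE) {
        int idx = base + threadIdx.x;

        // 1. Load one tile from global memory into shared memory.
        if (idx < n)
            tile[threadIdx.x] = in[idx];
        __syncthreads();   // tile fully loaded before anyone reads it

        // 2. Operate on the tile in fast shared memory
        //    (a trivial placeholder computation here).
        if (idx < n)
            out[idx] = tile[threadIdx.x] * 2;
        __syncthreads();   // tile no longer in use before it is overwritten
    }
}
```

Launched as e.g. `processInTiles<<<40, TILE_SIZE>>>(d_in, d_out, 10000)`, this covers the 10,000-element array while using only 1 KiB of the 16 KiB budget per block.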

How you implement this will depend on your algorithm - if you can give some details of what you are trying to implement, you may get more concrete suggestions.

One last point - be aware that shared memory is shared between all the threads in a block, so you have way less than 16 KiB per thread, unless you have a single data structure which is common to all the threads in a block.
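To make that concrete: with 256 threads per block, a 16 KiB budget leaves only 64 bytes (16 ints) per thread if each thread needs its own private slice. A hypothetical sketch (the names and sizes are assumptions for illustration):

```cuda
#define THREADS_PER_BLOCK 256
#define INTS_PER_THREAD   16   // 256 threads * 16 ints * 4 B = 16 KiB, the whole budget

__global__ void perThreadSlices(int *out)
{
    // One declaration, shared by the entire block - not 16 KiB per thread.
    __shared__ int scratch[THREADS_PER_BLOCK * INTS_PER_THREAD];

    // Each thread carves out its own 16-int slice of the block's budget.
    int *mySlice = &scratch[threadIdx.x * INTS_PER_THREAD];

    mySlice[0] = threadIdx.x;          // example use of the private slice
    out[blockIdx.x * blockDim.x + threadIdx.x] = mySlice[0];
}
```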
