OpenCL zero copy with user data

Hi @srkreddy1238,

I’m trying to achieve zero copy of my input data using OpenCL host pointer support. However, I don’t want to create a new empty NDArray, as the input data is already allocated. Currently, I’ve modified AllocDataSpace() to create a cl_mem from an existing data pointer using the CL_MEM_USE_HOST_PTR flag, and exposed it as a packed function so that I can call it directly. With this, I’m able to execute successfully with zero copy.

But I have a few queries.

  1. The BufferDescriptor pointer returned by AllocDataSpace() is not aligned to the KAllocAlign value, so I need to turn off TVMAlignChecks when using NDArray::FromDLPack().
  2. Is there a better or more optimal way to handle this zero copy?
  3. Is there a plan to move the OpenCLWorkspace class into the tvm/include folder? Currently, I need to expose it as a packed function and register it (the same applies to GetNativePtr()).
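
For reference on query 1, the alignment requirement can be sketched as below. This is a minimal illustration under stated assumptions, not TVM's implementation: `kAssumedAllocAlignment` (64), `IsAlignedTo`, `HypotheticalDescriptor`, and `MakeDescriptor` are made-up names for this example. It only shows that a heap-allocated descriptor can be placed on a fixed boundary with C++17 `alignas`, so that a check on the descriptor's address would pass.

```cpp
#include <cstddef>
#include <cstdint>
#include <memory>

// Assumption: 64 stands in for the runtime's allocation alignment constant.
constexpr std::size_t kAssumedAllocAlignment = 64;

// Returns true when (ptr + byte_offset) lies on an `alignment`-byte boundary.
inline bool IsAlignedTo(const void* ptr, std::size_t byte_offset,
                        std::size_t alignment) {
  const auto addr = reinterpret_cast<std::uintptr_t>(ptr) + byte_offset;
  return addr % alignment == 0;
}

// Hypothetical stand-in for a buffer descriptor (not TVM's real type).
// With alignas, C++17 `new` returns storage on a 64-byte boundary.
struct alignas(kAssumedAllocAlignment) HypotheticalDescriptor {
  void* host_ptr = nullptr;  // user-owned memory wrapped for zero copy
  std::size_t nbytes = 0;    // size of that memory in bytes
};

// Wraps existing host memory without copying it; the caller keeps ownership
// of `host_ptr` and must keep it alive while the descriptor is in use.
inline std::unique_ptr<HypotheticalDescriptor> MakeDescriptor(
    void* host_ptr, std::size_t nbytes) {
  auto desc = std::make_unique<HypotheticalDescriptor>();
  desc->host_ptr = host_ptr;
  desc->nbytes = nbytes;
  return desc;
}
```

With a layout like this, the address that ends up in the tensor's data field (the descriptor, not the user buffer) would sit on the assumed boundary. Whether the real descriptor can be declared this way depends on the runtime's own code.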

It’s about time this requirement for CL_MEM_USE_HOST_PTR was raised.

Internally, I am working on similar functionality with OpenCLWorkspace::SetNativePtr. This basically replaces the internal cl_mem and creates a new cl_mem backed by the provided host_ptr.

Packed functions are one way to customize. In this case, since the DeviceAPI is already globally accessible, we can extend the OpenCL DeviceAPI as needed through tvm::runtime::cl::OpenCLWorkspace::Global().

Thanks for pointing out this requirement. It helps me design the new API to be flexible for any kind of host pointer, such as ion, etc. Will try upstreaming soon.


Oh great :). I’m also experimenting with the same feature (CL_MEM_USE_HOST_PTR) for set_output_zero_copy. I do have one more requirement.

desc->host_ptr = reinterpret_cast<cl_uchar*>(
      clEnqueueMapBuffer(this->GetQueue(dev), desc->buffer, CL_TRUE, CL_MAP_WRITE, 0,
                         sizeof(cl_uchar) * size, 0, NULL, NULL, &err_code));

Can we have a separate API or support for CL_MAP_READ, so that reading the output can also be achieved with zero copy? :)

One query regarding set_output_zero_copy: if I set the outputs in a similar fashion (USE_HOST_PTR), I observe that GetNativePtr() [i.e. `clEnqueueMapBuffer` with the `CL_MAP_READ | CL_MAP_WRITE` flags] returns 0.0 as the output. However, if I use TVMArrayCopyToBytes(zero_copied_array, host_ptr) [i.e. `clEnqueueReadBuffer`], I’m able to read the data. @srkreddy1238, is there a way to achieve true zero copy in this scenario?

Here is the test case for the data loop with host_ptr. Check whether these tests pass.


Okay, so here’s the problem.

For input memory:

  1. Zero copy works fine, from cl_mem and back to my void*.
  2. Any changes made directly to the native ptr are reflected in the cl_mem.

For output memory:

  1. The memory copy kind of works. Here’s my flow:
    1.1 Malloc the output data.
    1.2 Create a cl::BufferDescriptor and assign this data to it for zero copy (through AllocDataSpace() using USE_HOST_PTR), then assign the descriptor to the data ptr of my NDArray.
    1.3 Call set_output_zero_copy(output_name, NDArray).
    1.4 Run the model.

My observations are:

  1. After the run, get_native_ptr gives me the same value that was initialized during malloc.
  2. If I use CopyTo() or CopyToBytes() (with the NDArray used in steps 1.2 and 1.3), both eventually call clEnqueueReadBuffer, and in this case the outputs are fine.

So basically, for output pointers, the native ptr returned still holds the value it was initialized with, even after Run(), whereas CopyTo() returns the proper value. The same is not observed for input pointers. :)

Let me reiterate my understanding.

We have two cases of memory sharing:

  1. The CL driver allocates and shares with the host (via the clMap API): my test cases cover this approach, and data consistency is verified (host writing → CL driver reading, CL driver writing → host reading).
  2. The host allocates and shares with the CL driver (using USE_HOST_PTR while creating the cl_mem): this is a new requirement. As said earlier, I have implemented this feature internally via the SetNativePtr extension for the cl_qcom_ion_host_ptr and cl_qcom_dmabuf_host_ptr extensions. Let me add this use case, where the data is allocated via malloc.

Will probably raise a PR for this.