More proprietary runtime for Adreno GPU

@echuraev @elvin-n

We are looking at adding new features to the Adreno (OpenCL) runtime, such as:

  • clImage1D instead of clBuffers as the default allocation (global scope), since Adreno can benefit from the texture path.
  • Recordable queue support
  • Tighter integration between CLML and OpenCL during context switches (reuse CLML/TVM-allocated cl_mem objects across both without additional copies)

Features like these need changes in both codegen and the runtime.

The codegen part is fairly easy: defining a virtual target “adreno” as shown below, and extending CodeGenOpenCL as CodeGenAdrenoCL to generate sampler-based loads/stores for 1D buffers, can achieve this. We reuse most of the OpenCL codegen here.

TVM_REGISTER_TARGET_KIND("adreno", kDLOpenCL)
    . . . . .
    .set_default_keys({"opencl", "gpu"});

On the runtime side, we don’t have the information needed to tell regular OpenCL apart from an Adreno-specific memory allocation strategy and management.

I see two options here:

  • Define kDLAdreno as a native target
  • Alter the graph runtime to supply device-specific options via the graph JSON attributes
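
As a rough sketch of the second option, assuming the graph JSON attributes have already been parsed into a string map: the `DeviceOptions` struct, the helper names, and the attribute keys `image1d_default` and `recordable_queue` are all hypothetical and only illustrate the shape of the idea, not anything that exists in TVM today.

```cpp
#include <string>
#include <unordered_map>

// Hypothetical per-device options; neither this struct nor the keys below exist in TVM.
struct DeviceOptions {
  bool image1d_default = false;   // allocate global-scope buffers as 1D images
  bool recordable_queue = false;  // record and replay command queues
};

using AttrMap = std::unordered_map<std::string, std::string>;

// Returns true when `key` is present in `attrs` with the value "1" or "true".
inline bool AttrIsTrue(const AttrMap& attrs, const std::string& key) {
  auto it = attrs.find(key);
  return it != attrs.end() && (it->second == "1" || it->second == "true");
}

// Builds the options from already-parsed graph attributes.
// Missing keys keep the plain OpenCL defaults (everything off).
inline DeviceOptions ParseDeviceOptions(const AttrMap& attrs) {
  DeviceOptions opts;
  opts.image1d_default = AttrIsTrue(attrs, "image1d_default");
  opts.recordable_queue = AttrIsTrue(attrs, "recordable_queue");
  return opts;
}
```

With this shape, a graph that carries no such attributes behaves exactly like regular OpenCL, so no new device type is needed just to pick the allocation strategy.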

Any thoughts?

Thanks, Siva

I would like to avoid introducing kDLAdreno for now. The examples you showed can be covered without introducing a new compilation flag:

  1. clImage1D: it is just another memory scope that will be marked in the network and handled universally in the TVM OpenCL runtime; it is not specific to Adreno.
  2. Recordable queue support: the TVM OpenCL runtime can check for the relevant OpenCL extension and follow a dedicated flow, reusing, extending, or copying the CUDA stream capture graph executor. We are prototyping this feature.
  3. For the third item, I need more context on what happens here and why the interaction between compilation and runtime would not fit into the standard OpenCL flow.
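
The extension check in item 2 could look roughly like the sketch below. It assumes the extension list has already been fetched from the device (OpenCL reports it as a space-separated string via `clGetDeviceInfo` with `CL_DEVICE_EXTENSIONS`), and it assumes the extension is named `cl_qcom_recordable_queues`; both helper functions are hypothetical and not part of the TVM runtime.

```cpp
#include <sstream>
#include <string>

// Returns true if `name` appears as a whole token in a space-separated
// extension list. Matching whole tokens avoids false positives from
// extensions whose names merely start with `name`.
inline bool HasExtension(const std::string& extensions, const std::string& name) {
  std::istringstream tokens(extensions);
  std::string token;
  while (tokens >> token) {
    if (token == name) return true;
  }
  return false;
}

// Assumption: recordable queues are advertised as "cl_qcom_recordable_queues".
inline bool SupportsRecordableQueues(const std::string& extensions) {
  return HasExtension(extensions, "cl_qcom_recordable_queues");
}
```

A runtime could call `SupportsRecordableQueues` once per device at initialization and fall back to the regular launch path when it returns false.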

Avoiding a new target compilation flag makes sense.

1D image enablement is not proprietary to Adreno, but not everybody wants it by default. I think we can define a target attribute (e.g. use_textures) to enable or disable it at compilation. The runtime can then inspect the device, detect Adreno, and enable it.
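
A minimal sketch of how that decision might be combined at runtime, assuming the compiled module carries a `use_textures` flag and that an Adreno device can be recognized by the substring "Adreno" in its reported device name. `AllocKind`, `IsAdreno`, and `ChooseGlobalAlloc` are hypothetical names used only for illustration.

```cpp
#include <string>

// Hypothetical allocation kinds for global-scope memory.
enum class AllocKind { kBuffer, kImage1D };

// Assumption: Adreno devices report a name containing "Adreno".
inline bool IsAdreno(const std::string& device_name) {
  return device_name.find("Adreno") != std::string::npos;
}

// Uses 1D images only when the target asked for textures at compile time
// AND the runtime device is Adreno; otherwise keeps plain buffers.
inline AllocKind ChooseGlobalAlloc(bool use_textures, const std::string& device_name) {
  return (use_textures && IsAdreno(device_name)) ? AllocKind::kImage1D
                                                 : AllocKind::kBuffer;
}
```

Keeping plain buffers as the fallback means a module built with `use_textures` would still run on non-Adreno OpenCL devices, provided the generated kernels are compatible with buffer arguments, which this sketch does not address.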

Good to hear that recordable queue support is being prototyped. Looking forward to seeing the PR.
