More proprietary runtime for Adreno GPU

srkreddy1238 · February 17, 2023, 11:19am

In the context of adding new features for Adreno (OpenCL runtime), such as:

clImage1D instead of clBuffers as the default allocation (global scope), since Adreno can benefit from the texture path.
Recordable queue support.
Tighter integration between CLML and OpenCL during context switches (reuse CLML/TVM-allocated cl_mem objects across both without additional copies).

Features like these need changes in both codegen and the runtime.

The codegen part is fairly easy: defining a virtual target “adreno” as shown below and extending CodeGenOpenCL as CodeGenAdrenoCL to generate sampler-based load/store for 1D buffers can achieve this. This reuses most of the OpenCL codegen.

TVM_REGISTER_TARGET_KIND("adreno", kDLOpenCL)
    . . . . .
    .set_default_keys({"opencl", "gpu"});

On the runtime side, we don’t have the information needed to differentiate between the regular OpenCL and the Adreno-specific memory allocation strategy and management.

I see two options here:

Define kDLAdreno as a native target.
Alter the graph runtime to supply device-specific options via the graph JSON attributes.

Any thoughts?

Thanks, Siva

elvin-n · February 17, 2023, 12:09pm

I would like to avoid introducing kDLAdreno for now. The examples you showed can be covered without introducing a compilation flag:

clImage1D - it is just another memory scope that will be marked in the network and handled universally in the TVM OpenCL runtime; it is not specific to Adreno.
Recordable queue support - in the TVM OpenCL runtime we can check for the existence of a certain OpenCL extension and follow a dedicated flow, reusing/extending/copying the CUDA stream-capture graph executor. We are prototyping this feature.
CLML and OpenCL integration - I need more context on what happens here and why we need interaction between compilation and runtime that would not fit into the standard OpenCL flow.
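
The extension check described for recordable queues could look roughly like the sketch below. This is only an illustration, not code from the TVM runtime: HasExtension, ExecFlow, and SelectExecFlow are hypothetical names, and the extension name cl_qcom_recordable_queues is an assumption. The input is the space-separated string that clGetDeviceInfo returns for CL_DEVICE_EXTENSIONS.

```cpp
#include <sstream>
#include <string>

// Returns true if `name` appears as a whole token in the space-separated
// extension string (the format of CL_DEVICE_EXTENSIONS).
bool HasExtension(const std::string& extensions, const std::string& name) {
  std::istringstream tokens(extensions);
  std::string token;
  while (tokens >> token) {
    if (token == name) return true;
  }
  return false;
}

// Hypothetical execution paths a runtime could choose between.
enum class ExecFlow { kPlainQueue, kRecordedQueue };

// Picks the recorded-queue path only when the device advertises the
// extension; every other device keeps the standard OpenCL flow.
ExecFlow SelectExecFlow(const std::string& extensions) {
  // Assumed extension name, used here for illustration only.
  const std::string kRecordableQueueExt = "cl_qcom_recordable_queues";
  return HasExtension(extensions, kRecordableQueueExt)
             ? ExecFlow::kRecordedQueue
             : ExecFlow::kPlainQueue;
}
```

With this shape, a device that does not list the extension simply stays on the plain per-kernel enqueue path, so no compilation flag is involved. Matching whole tokens rather than substrings avoids false positives from extensions that share a prefix.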

srkreddy1238 · February 17, 2023, 12:56pm

Avoiding a new target compilation flag makes sense.

1D enablement is not proprietary to Adreno, but not everybody wants it by default. I think we can define and use a target attribute (use_textures) to enable/disable it at compilation. The runtime can detect that the device is Adreno and enable it.
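
A minimal sketch of that decision, assuming the use_textures attribute reaches the runtime as a boolean and that an Adreno device can be recognized by "Adreno" appearing in its CL_DEVICE_NAME string. IsAdrenoDevice, AllocKind, and SelectGlobalAlloc are hypothetical names for illustration, not existing TVM APIs.

```cpp
#include <algorithm>
#include <cctype>
#include <string>

// Assumption: Adreno devices report a CL_DEVICE_NAME containing "Adreno".
// The match is case-insensitive to tolerate vendor string variations.
bool IsAdrenoDevice(const std::string& device_name) {
  std::string lower(device_name);
  std::transform(lower.begin(), lower.end(), lower.begin(),
                 [](unsigned char c) { return static_cast<char>(std::tolower(c)); });
  return lower.find("adreno") != std::string::npos;
}

// Hypothetical allocation kinds for global-scope memory.
enum class AllocKind { kBuffer, kImage1D };

// Uses the texture path only when the model was compiled with use_textures
// and the runtime device is detected as Adreno; otherwise plain buffers.
AllocKind SelectGlobalAlloc(bool use_textures, const std::string& device_name) {
  return (use_textures && IsAdrenoDevice(device_name)) ? AllocKind::kImage1D
                                                       : AllocKind::kBuffer;
}
```

Requiring both conditions keeps the default behaviour unchanged for everyone who does not opt in at compilation.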

Good to hear that recordable queue support is being prototyped. Looking forward to seeing the PR.