InternalError: Check failed: (op && result == CL_SUCCESS) is false: Pad Error:-30 running CLML model

@srkreddy1238 I'm trying to run a TFLite model that uses NHWC layout by converting it to the NCHW layout supported by CLML in TVM, compiling it to a tvm.so, and running it on a Qualcomm device (Galaxy S24 Ultra). I'm getting the following error:

InternalError: Check failed: (op && result == CL_SUCCESS) is false: Pad Error:-30

from this check in the CLML runtime (OpenCL error -30 is CL_INVALID_VALUE):

result = CLML_INTF->clCreateMLOpPadQCOM(CLML_CTX, nullptr, &pad_desc, input->tensor,
                                        output->tensor, &op, layer_.tuning_cache);
ICHECK(op && result == CL_SUCCESS) << "Pad Error:" << result;

If I offload all of the padding layers in the model to the CPU or OpenCL targets, it runs fine, but I get the above error as soon as the padding layers are included in the CLML partition. The padding layers used in the model are ZeroPadding2D.
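For reference, here is a minimal sketch of what a ZeroPadding2D layer lowers to after the NHWC-to-NCHW conversion (the shape and padding below are made-up placeholders); pushing this through the same partition/build flow should hit the same clCreateMLOpPadQCOM path if pad offload is the problem:

import tvm
from tvm import relay

data = relay.var("data", shape=(1, 32, 56, 56), dtype="float32")
# ZeroPadding2D((1, 1)) becomes an nn.pad over the two spatial axes in NCHW.
pad = relay.nn.pad(data, pad_width=((0, 0), (0, 0), (1, 1), (1, 1)))
mod = tvm.IRModule.from_expr(relay.Function([data], pad))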

Below is the code used to generate the tvm.so and run the model.

import tvm
from tvm import relay
from tvm.contrib import ndk
from tvm.relay.op.contrib import clml

# `tflite_model` is the tflite.Model parsed earlier from the .tflite file.
mod, params = relay.frontend.from_tflite(
    tflite_model, shape_dict={input_tensor: input_shape}, dtype_dict={input_tensor: input_dtype}
)
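# `seq` referenced below is not defined in this snippet; a sketch of the layout-conversion
# sequence it presumably refers to (assumption: NHWC to NCHW via ConvertLayout, as in the
# TVM Adreno/CLML examples):
desired_layouts = {"nn.conv2d": ["NCHW", "default"]}
seq = tvm.transform.Sequential(
    [relay.transform.ConvertLayout(desired_layouts), relay.transform.FoldConstant()]
)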
with tvm.transform.PassContext(opt_level=3):
#    mod = seq(mod)
    print("----------ir after layout change------------")
    print(mod)
    print("----------------------")

    if not local_demo and enable_clml:
        print("partition clml")
        print(clml.is_clml_runtime_enabled())
        # preprocess_module applies the CLML-specific layout transforms and
        # partition_for_clml carves out the subgraphs to be offloaded to CLML.
        mod = clml.preprocess_module(mod)
        mod = clml.partition_for_clml(mod, params)

    print("-------------------After PArtition-------------")
    print(mod)
    print("-----------------------")
    # `test_target` is the device target (e.g. OpenCL for Adreno/CLML) and `target` the
    # llvm host target for Android, both defined earlier.
    target = tvm.target.Target(test_target, host=target)

    #mod = seq(mod)
    lib = relay.build(mod, target=target, params=params)
lib_fname = "dummy_model.tvm.so"
print(ndk)
print(ndk.create_shared)
# ndk.create_shared cross-compiles the library with the compiler pointed to by the
# TVM_NDK_CC environment variable.
fcompile = ndk.create_shared if run_on_device else None
lib.export_library(lib_fname, fcompile)

import tvm
import numpy as np
from tvm.contrib import graph_executor as runtime
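
# How `remote` is obtained is not shown above; a sketch of the usual RPC-tracker setup
# (assumption: the tracker host/port and the "android" device key are placeholders).
from tvm import rpc

tracker = rpc.connect_tracker("127.0.0.1", 9190)
remote = tracker.request("android", priority=0, session_timeout=600)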

ctx = remote.cl(0)
#ctx = remote.cpu(0)

# Transfer the model lib to remote device
remote.upload(lib_fname)
# Load the remote module
rlib = remote.load_module(lib_fname)

# Create a runtime executor module
module = runtime.GraphModule(rlib["default"](ctx))
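
# Feed a dummy input so the run below exercises real data (assumption: `input_tensor`,
# `input_shape` and `input_dtype` are the same names used at model import time).
module.set_input(
    input_tensor, tvm.nd.array(np.random.uniform(size=input_shape).astype(input_dtype))
)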

# Run
module.run()

# Benchmark the performance

ftime = module.module.time_evaluator("run", ctx, number=1, repeat=10)
prof_res = np.array(ftime().results) * 1000
print("Mean inference time (std dev): %.2f ms (%.2f ms)" % (np.mean(prof_res), np.std(prof_res)))

Do you think you can share the CLML codegen output?

Check this reference about how to dump the CLML codegen.
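In case it helps while locating that reference, here is a rough sketch for peeking at the per-backend modules inside the built lib (whether the CLML module exposes its generated graph through get_source() may depend on the TVM version):

for m in lib.get_lib().imported_modules:
    print(m.type_key)
    try:
        # Some BYOC modules can print their generated source/graph this way.
        print(m.get_source())
    except Exception:
        pass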

Also, let me know whether disabling pad offload works fine. You may comment out the line below.
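If editing the TVM sources is not convenient, here is a sketch of one way to keep nn.pad out of the CLML partition from the user script instead, under the assumption that nn.pad is advertised to the partitioner through the "target.clml" op attribute (if it is matched through the CLML pattern table instead, removing the pad pattern there has the same effect):

import tvm

# Re-register the attribute at a higher level (11 > default 10) so nn.pad stays on the
# fallback target instead of being offloaded to CLML; apply before partition_for_clml.
@tvm.ir.register_op_attr("nn.pad", "target.clml", level=11)
def _pad_clml_not_supported(expr):
    return False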

Yes, disabling pad offload works fine.