BYOC for ARM Ethos-N Fails

Thanks a lot. After following these tips, I could run with ethos-n and llvm. However, after inspecting the graph.json from module.tar, I found there is no compute operator using ethos-n. I think that is because of the model I used; I should switch to a different model to test my ethos-n. Is that right?

Here is my code which can run successfully.

Here is the model I used now: https://github.com/onnx/models/raw/b9a54e89508f101a1611cd64f4ef56b9cb62c7cf/vision/classification/resnet/model/resnet50-v2-7.onnx
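As an aside, one way to check what was offloaded is to read the graph.json inside module.tar and count the operator names: anything running on the NPU shows up as an ethos-n external function rather than a plain fused CPU op. A rough sketch using only the standard library (the exact file layout inside the tvmc package archive can vary between TVM versions, so treat this as an assumption):

```python
import json
import tarfile

def count_ops(graph: dict) -> dict:
    """Count operator/function names in a graph executor graph.json dict."""
    counts = {}
    for node in graph.get("nodes", []):
        if node.get("op") == "null":  # inputs and parameters, not compute
            continue
        name = node.get("attrs", {}).get("func_name", node.get("name", "?"))
        counts[name] = counts.get(name, 0) + 1
    return counts

def offload_report(package_path: str) -> dict:
    """Extract graph.json from a tvmc module.tar and summarise its operators."""
    with tarfile.open(package_path) as tar:
        member = next(m for m in tar.getmembers() if m.name.endswith("graph.json"))
        graph = json.load(tar.extractfile(member))
    return count_ops(graph)

if __name__ == "__main__":
    for name, n in sorted(offload_report("module.tar").items()):
        marker = "NPU" if "ethos" in name else "CPU"
        print(f"{marker}  {name}: {n}")
```

If every function name is a plain fused CPU op and nothing contains "ethos", then no operator was offloaded.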

Hi, yes it looks like the model you’re using is in NCHW format which is not supported. You could try to convert the model to NHWC by adding desired_layout="NHWC" to the compile function. This should insert layout_transform operations where necessary so that the graph can be offloaded to the NPU.
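For reference, the conversion that layout_transform performs is just an axis permutation; a minimal NumPy illustration of NCHW to NHWC:

```python
import numpy as np

# A dummy NCHW activation: batch=1, channels=3, height=2, width=2.
x_nchw = np.arange(12).reshape(1, 3, 2, 2)

# NCHW -> NHWC: move the channel axis to the end.
x_nhwc = np.transpose(x_nchw, (0, 2, 3, 1))

print(x_nchw.shape)  # (1, 3, 2, 2)
print(x_nhwc.shape)  # (1, 2, 2, 3)
```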

I have tried several different scripts for testing. However, I found that ethos-n is never actually used at runtime. My code is as follows:

from tvm.driver import tvmc

model = tvmc.load("my_model.onnx")
print(model.summary())
package = tvmc.compile(
    model,
    target="ethos-n -variant=n78, llvm",
    dump_code="relay",
    package_path="module.tar",
    desired_layout="NHWC",
)
result = tvmc.run(package, device="cpu")
print(result.outputs)

The output Relay graph is as follows (partial). (I found there is no use of ethos-n.)

  1. My Ethos-N is at /dev/ethosn0. I found tvmc.run does not actually use it.

    (I found that even after I deleted /dev/ethosn0, the runtime executed without any error message.)

  2. Every time I tested the model, I got a different result.

Apologies for missing this before, it looks like the graph you’re providing is in float32 format. Only int8 and uint8 formats can be offloaded to the NPU, so this explains why no operations are being offloaded.
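For context, an int8/uint8 model stores each tensor through an affine mapping real ≈ scale * (q - zero_point); a float32 graph carries no such quantization parameters, so there is nothing for the NPU to execute. A toy illustration of uint8 quantization (generic, not TVM code):

```python
import numpy as np

def quantize_uint8(x: np.ndarray):
    """Affine-quantize a float32 array to uint8: real ≈ scale * (q - zero_point)."""
    lo, hi = float(x.min()), float(x.max())
    lo, hi = min(lo, 0.0), max(hi, 0.0)        # the representable range must include 0
    scale = (hi - lo) / 255.0 or 1.0
    zero_point = int(round(-lo / scale))
    q = np.clip(np.round(x / scale) + zero_point, 0, 255).astype(np.uint8)
    return q, scale, zero_point

x = np.array([-1.0, 0.0, 0.5, 1.0], dtype=np.float32)
q, scale, zp = quantize_uint8(x)
x_hat = scale * (q.astype(np.float32) - zp)    # dequantize to check the round-trip
print(q, x_hat)
```

Pre-quantized TFLite models (like the mobilenet quant models) already contain these parameters, which is why they can be offloaded.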

Thanks a lot. I will test other models now.

I have tested my mobilenet model on cpu and on cpu+npu using tvmc, but it still fails: the results are not the same between cpu and cpu+npu.

My code is here: https://github.com/guanjen375/EthosN-tvmc

By running run_llvm.sh, you can run the mobilenet model with cpu. (target = “llvm”)

By running run_ethosn.sh, you can run the mobilenet model with cpu and npu. (target = "ethos-n -variant=n78, llvm")

The results are also printed while running; you can see that the two results are different.

What should I do to make them consistent?

Thanks @guanjen375, I believe I was able to reproduce the mismatch (232 vs 231, 1 vs 2). This can simply be attributed to differences in the rounding behaviour of the LLVM backend and the NPU integration.
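Such off-by-one differences can come from nothing more than the rounding mode applied when requantizing intermediate values, and backends are allowed to differ here. A toy illustration in plain Python (not the actual NPU behaviour):

```python
import math

def round_half_to_even(v: float) -> int:
    """Python's built-in round(): ties go to the nearest even integer."""
    return round(v)

def round_half_away_from_zero(v: float) -> int:
    """Rounding mode some fixed-point hardware paths use: ties move away from zero."""
    return int(math.floor(v + 0.5)) if v >= 0 else int(math.ceil(v - 0.5))

v = 230.5  # an intermediate value sitting exactly on the rounding boundary
print(round_half_to_even(v), round_half_away_from_zero(v))  # 230 231
```

Two backends that agree on every multiply-accumulate can still disagree by one unit in the final quantized output because of this.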

So what you mean is the following:

cpu result : [[0 0 231 … 0 0 0]] (source run_llvm.sh)

cpu+npu result : [[0 0 232 … 0 0 0]] (source run_ethosn.sh)

Is that right?

Here is my result:

cpu result : [[0 0 231 … 0 0 0]]

npu result : [[0 0 0 … 0 0 0]]

If the above is correct, can you tell me which TVM version and which NPU library version you are using?

Yes that’s correct, your first example is what I got when running the provided scripts. I’m using the latest main f2a740331f21106787a29566185d8924e5dcb25a and the NPU driver stack version 22.08.

It seems as though something could be incorrect with your setup. Just to confirm, does this occur with other networks/operators as well or just the network you’re trying here (mobilenet v2)? It might be worth trying out just the average pool you tested previously to make sure that gives the expected result.

Thanks a lot. I will check my setup.

After following the tips from here: https://github.com/apache/tvm/issues/13191

I can run using my Python file.

I also want to run the model using tvmc with cpu and npu.

Do you know how to modify my code to achieve this?

The Original Code

python3 -m tvm.driver.tvmc compile --target="ethos-n -variant=n78, llvm"

That’s good news, I believe the issue could be due to the TVMC target string not being specific enough for the variant you’re compiling for. If you’re compiling for the 4TOPS_4PLE_RATIO variant, you can change the target string to ethos-n -variant=n78 -tops=4 -ple_ratio=4, llvm. Hopefully that helps!

Thanks a lot. After using target = "ethos-n -variant=n78 -tops=4 -ple_ratio=4, llvm",

I can run mobilenet correctly on cpu and npu, both with tvmc and without it.

I can now run mobilenet and inceptionet correctly with cpu and npu. (tested on 1000 images)

Nevertheless, when I try to run resnet with cpu and npu, it fails.

I found that when I dispatch fully connected to the cpu, the result is correct.

The code is at /tvm/python/tvm/relay/op/contrib/ethosn.py

Is there any computational difference between cpu and npu for fully connected?

If not, I think I should check my ethos-n setup just as before.

Besides, this is my resnet model: resnet50_uint8_tf2.1_20200911_quant.tflite - Google Drive

From the snippet you show, it looks as though a slightly outdated version of TVM is being used, so I suspect that you don’t have this patch which fixes the weight transformation in fully connected: https://github.com/apache/tvm/pull/12970
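To see why a weight-transformation bug shows up exactly as "fully connected wrong on NPU, right on CPU": a dense layer computes y = x · Wᵀ, so if one backend consumes the weight buffer in a re-laid-out form without the matching transform, the outputs silently differ while everything else still runs. A generic NumPy illustration (not the actual TVM code in question):

```python
import numpy as np

rng = np.random.default_rng(0)
x = rng.standard_normal((1, 4)).astype(np.float32)  # one input row with 4 features
w = rng.standard_normal((3, 4)).astype(np.float32)  # weights stored as (out_units, in_features)

y_correct = x @ w.T               # dense layer: y = x · Wᵀ, matching the stored layout
y_mislaid = x @ w.reshape(4, 3)   # layout bug: same bytes reinterpreted without the transpose
print(y_correct)
print(y_mislaid)
```

Both products have the same shape, so nothing crashes; only the values are wrong, which matches the symptom described above.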

I have tested with the following branch.

Unfortunately, it still fails.

Can I use the newest TVM instead?

I have updated my TVM to the newest version.

However, when I run the model, I get the following error message:

Thus, I modified the file /home/sunplus/project/tvm/relay/op/contrib/ethosn.py as follows:

After the modification, I could run the model with cpu and npu successfully.

Resnet can also be run.

Do you have any comments on this modification?

That’s great to hear. With later versions of TVM we don’t officially support the 22.05 (3.0.1) version of the NPU driver stack, hence this error. However, it seems to work for your use case, so that should be okay for now.

I have successfully tested mobilenet / inceptionet / vgg-16 / resnet / squeezenet with cpu and npu using tvmc. Unfortunately, when I try to test yolo, compilation fails. (I have also tested running it with cpu only, which succeeds.)

Here is my code: https://github.com/guanjen375/tvmc_debug

The error message:

I am already using the newest tvm.

tvm version: 5364e5a39a5e33728b7f5a26ddb40543a544ea02

Hi @guanjen375, I just gave this a try and can reproduce the issue. It seems as though it is occurring while compiling for qnn.add. Although not perfect, perhaps you could try with qnn.add offloading commented out in the pattern table? The yolo model you have seems a bit different from the one I’ve tested with in the past.

I’ll be away for a few days now but I’ll dig into this when I get back, apologies for the inconvenience.