[MetaSchedule][TensorCore] Please help check out this error

I am trying to use TensorCore to tune a network. To use TensorCore, I set the datatype to "float16", and I get the error below. When I set the datatype to "float32", it runs normally.

    Traceback (most recent call last):
      File "resnet_meta.py", line 58, in <module>
        database = ms.tune.tune_tasks(
      File "/home/pan/tvm/python/tvm/meta_schedule/tune.py", line 117, in tune_tasks
        task_scheduler.tune(
      File "/home/pan/tvm/python/tvm/meta_schedule/task_scheduler/task_scheduler.py", line 132, in tune
        _ffi_api.TaskSchedulerTune(  # type: ignore # pylint: disable=no-member
      File "/home/pan/tvm/python/tvm/_ffi/_ctypes/packed_func.py", line 237, in __call__
        raise get_last_ffi_error()
    tvm.tir.schedule.schedule.ScheduleError: Traceback (most recent call last):

    ScheduleError: An error occurred in the schedule primitive 'compute-at' ... Error message: The scope tir.Block#0 is not a stage pipeline.

I imitated the testing file to write my ResNet MetaSchedule tuning script; here is the code:

    mod, params = testing.resnet.get_workload(
        num_layers=50, batch_size=batch_size, image_shape=image_shape, dtype="float16"
    )

    tune_tasks = ms.relay_integration.extract_tasks(mod, tgt, params)

    tasks, task_weights = ms.relay_integration.extracted_tasks_to_tune_contexts(
        extracted_tasks=tune_tasks, work_dir=work_dir,
        space=ms.space_generator.PostOrderApply(
            sch_rules="cuda-tensorcore", postprocs="cuda-tensorcore", mutator_probs="cuda-tensorcore"
        ),
    )

    database = ms.tune.tune_tasks(
        tasks=tasks, task_weights=task_weights, work_dir=work_dir,
        max_trials_per_task=4, max_trials_global=150,
    )
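
For context, a typical way to consume the tuned database afterwards looks roughly like this (a sketch, not part of my failing script: the graph-executor setup, the input name "data", and the CUDA device index are assumptions; mod, params, tgt, database, and batch_size come from the snippet above):

    # Sketch: compile the Relay module with the tuned database and run it once.
    import numpy as np
    import tvm
    from tvm import meta_schedule as ms
    from tvm.contrib import graph_executor

    lib = ms.relay_integration.compile_relay(
        database=database, mod=mod, target=tgt, params=params
    )
    dev = tvm.cuda(0)
    runtime = graph_executor.GraphModule(lib["default"](dev))
    data = np.random.uniform(size=(batch_size, 3, 224, 224)).astype("float16")
    runtime.set_input("data", data)  # "data" is the input name of the relay.testing ResNet workload
    runtime.run()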

Please help me figure out why this error happens.

Many thanks.

Could you please share the TIR you are tuning?

Here it is. Please check @junrushao

Also, here is the original file.

I have personally switched to Relax in the unity branch, and my bandwidth for supporting Relay is very limited at the moment, so I cannot guarantee whether Relay works or not.

In your specific case, it seems that the stage-pipeline assumption is broken. Would you like to attach the log file for this workload so that we can investigate without going through Relay?

Here are the tuning logs and print logs; I just uploaded them.

Thanks for your patience! Please take a look.

@junrushao The log files are in tune_tmp and log.log. Would you mind having a look?

Many thanks

Did you solve this problem? I am hitting a similar problem when tuning fused_matmul_add.

Hello @MasterJianxing ,

I ran the resnet_meta.py code you shared on my system. I did not get the same error as you, but I received this one:

    InternalError: Check failed: original_producers.size() == 1u (0 vs. 1) :

The full diagnostic is given below:

Can you please take a look and help figure out why this InternalError is occurring?

Thanks and regards,

Krishna

Hi @MasterJianxing, were you able to fix this? I read about a somewhat similar issue here; please take a look. Thanks.

@zxybazh @junrushao @AndrewZhaoLuo @comaniac Please help.

It has been a long time… But I can now use MetaSchedule to tune BERT with the latest TVM version. You can update and give it a try.
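
For anyone looking for a starting point, the standard MetaSchedule Relay entry points look roughly like this (a sketch, not my exact script: the ResNet workload is only a self-contained stand-in, so substitute your own Relay module, e.g. BERT imported from ONNX or PyTorch; the target tag and trial count are assumptions):

    # Sketch: end-to-end MetaSchedule tuning through the high-level Relay API.
    import tvm
    from tvm import meta_schedule as ms
    from tvm.relay import testing

    # Stand-in workload; replace `mod`/`params` with your own Relay module (e.g. BERT).
    mod, params = testing.resnet.get_workload(
        num_layers=50, batch_size=16, image_shape=(3, 224, 224), dtype="float16"
    )
    target = tvm.target.Target("nvidia/geforce-rtx-3090")  # assumption: use the tag for your GPU
    work_dir = "./tune_tmp"

    database = ms.relay_integration.tune_relay(
        mod=mod, params=params, target=target, work_dir=work_dir,
        max_trials_global=2000,  # assumption: adjust to your time budget
    )
    lib = ms.relay_integration.compile_relay(
        database=database, mod=mod, target=target, params=params
    )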

Okay, thank you for the response. Did you tune BERT in fp16 by targeting cuda-tensorcore?

Hi, sorry to bother you, but could you please share the MetaSchedule part of your code? I just want to know how you got MetaSchedule to work with BERT. It would greatly help my current work. Thanks in advance.

Hi @MasterJianxing ,

Thank you so much for sharing the code. I ran the transformer.py file for the BERT model and I am currently facing this RuntimeError in thread binding:

    RuntimeError: parallel_for_dynamic error with [16:48:39] /home/name/tvm/src/tir/transforms/unify_thread_binding.cc:112:
    Check failed: (ana.CanProveEqual(dom->extent, new_iter_var->dom->extent)) is false: ValueError: All loops that are bound to ``threadIdx.y`` should have the same extent. However, there are two loops with extent T.int64(6) and T.int64(2), which are not equal

What am I doing wrong here? I have not made any modifications to your code; kindly let me know what I am missing and how I can fix it.

TIA

Regards, Krishna

Hello, I ran into the same problem. Have you figured it out?

Hi, I haven't figured it out yet; it's been quite some time since I last picked it up. I am still waiting for a response from the author on this.

Thank you for your response. Have you raised an issue about this?

Hi, I have not raised an issue. My use case here is actually ResNet, not BERT itself.

The broken stage-pipeline error goes away when we change the batch size to 16 or 32. I am currently unsure why the batch-size change fixes it, but it does.
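
Concretely, the only change is the batch_size argument when building the workload (a minimal sketch reusing the get_workload call from earlier in this thread):

    # Sketch: same ResNet workload as earlier in the thread, with batch_size set to 16.
    # On my setup this avoids the "not a stage pipeline" ScheduleError (32 works as well).
    from tvm.relay import testing

    mod, params = testing.resnet.get_workload(
        num_layers=50, batch_size=16, image_shape=(3, 224, 224), dtype="float16"
    )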

I tried MetaSchedule tuning on the ResNet workloads as well as the ResNet-50 ONNX model on my GPU. Using the MixedPrecisionPass() together with batch size = 16 results in the following error: "Block no longer exists in IRModule".
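
For reference, the mixed-precision conversion here is applied roughly like this (a sketch, assuming the usual relay.transform.ToMixedPrecision pass; the workload shape and opt level are illustrative):

    # Sketch: convert an fp32 Relay module to mixed fp16 precision before tuning.
    import tvm
    from tvm import relay
    from tvm.relay import testing

    mod, params = testing.resnet.get_workload(
        num_layers=50, batch_size=16, image_shape=(3, 224, 224), dtype="float32"
    )
    with tvm.transform.PassContext(opt_level=3):
        mod = relay.transform.InferType()(mod)
        mod = relay.transform.ToMixedPrecision("float16")(mod)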

While troubleshooting this, I found this bug report from @zxybazh: [Bug] Tensorization Failure During Multilevel Tiling with Tensor Intrin · Issue #16614 · apache/tvm · GitHub

TL;DR: the issue reports that multi-level tiling with tensor intrinsics fails in MetaSchedule. This apparently still needs to be resolved; I will post here if it gains any traction.

Regards,

Krishna

Thank you again. Actually, I tried tensorizing ResNet-50 and it is fine with a batch size of 16. However, I come across the runtime error

    Check failed: (ana.CanProveEqual(dom->extent, new_iter_var->dom->extent)) is false: ValueError: All loops that are bound to ``threadIdx.y`` should have the same extent. However, there are two loops with extent T.int64(6) and T.int64(2), which are not equal

which you mentioned before. MetaSchedule has restrictions on the input sizes of convolutions when computing with TensorCores (the wmma intrinsics operate on fixed 16×16×16 tiles), so maybe that is the reason.