[MetaSchedule][TensorCore] Please help check out this error

I am trying to use TensorCore to tune a network. To use TensorCore, I set the datatype to "float16", and I get the error below. When I set the datatype to "float32", it runs normally.

    Traceback (most recent call last):
      File "resnet_meta.py", line 58, in <module>
        database = ms.tune.tune_tasks(
      File "/home/pan/tvm/python/tvm/meta_schedule/tune.py", line 117, in tune_tasks
        task_scheduler.tune(
      File "/home/pan/tvm/python/tvm/meta_schedule/task_scheduler/task_scheduler.py", line 132, in tune
        _ffi_api.TaskSchedulerTune(  # type: ignore # pylint: disable=no-member
      File "/home/pan/tvm/python/tvm/_ffi/_ctypes/packed_func.py", line 237, in __call__
        raise get_last_ffi_error()
    tvm.tir.schedule.schedule.ScheduleError: Traceback (most recent call last):

    ScheduleError: An error occurred in the schedule primitive 'compute-at' ... Error message: The scope tir.Block#0 is not a stage pipeline.

I imitated the testing file to write my ResNet MetaSchedule tuning script; here is the code:

    mod, params = testing.resnet.get_workload(
        num_layers=50, batch_size=batch_size, image_shape=image_shape, dtype="float16"
    )

    tune_tasks = ms.relay_integration.extract_tasks(mod, tgt, params)

    tasks, task_weights = ms.relay_integration.extracted_tasks_to_tune_contexts(
        extracted_tasks=tune_tasks, work_dir=work_dir,
        space=ms.space_generator.PostOrderApply(
            sch_rules="cuda-tensorcore", postprocs="cuda-tensorcore", mutator_probs="cuda-tensorcore"
        ),
    )

    database = ms.tune.tune_tasks(
        tasks=tasks, task_weights=task_weights, work_dir=work_dir,
        max_trials_per_task=4, max_trials_global=150,
    )
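
For context, a typical way to consume the tuned database afterwards looks roughly like this (a sketch, not part of my failing script: the graph-executor setup, the input name "data", and the CUDA device index are assumptions; mod, params, tgt, database, and batch_size come from the snippet above):

    # Sketch: compile the Relay module with the tuned database and run it once.
    import numpy as np
    import tvm
    from tvm import meta_schedule as ms
    from tvm.contrib import graph_executor

    lib = ms.relay_integration.compile_relay(
        database=database, mod=mod, target=tgt, params=params
    )
    dev = tvm.cuda(0)
    runtime = graph_executor.GraphModule(lib["default"](dev))
    data = np.random.uniform(size=(batch_size, 3, 224, 224)).astype("float16")
    runtime.set_input("data", data)  # "data" is the input name of the relay.testing ResNet workload
    runtime.run()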

Please help me figure out why this error happens.

Many thanks.

Could you please share the TIR you are tuning?

Here it is. Please check @junrushao

Also, here is the original file.

I have personally switched to Relax in the unity branch, and my bandwidth for supporting Relay is very limited at the moment, so I cannot guarantee whether Relay works or not.

In your specific case, it seems that the stage-pipeline assumption is broken. Would you like to attach the log file for this workload so that we can investigate without going through Relay?

Here are the tuning logs and print logs; I just uploaded them.

Thanks for your patience! Please take a look.

@junrushao The log files are in tune_tmp and log.log. Would you mind having a look?

Many thanks

Did you solve this problem? I am hitting a similar problem when tuning fused_matmul_add.

Hello @MasterJianxing ,

I ran the resnet_meta.py code you shared on my system. I did not get the same error as you, but I received this one:

    InternalError: Check failed: original_producers.size() == 1u (0 vs. 1) :

The full diagnostic is given below:

Can you please take a look and help figure out why this InternalError is occurring?

Thanks and regards,

Krishna

Hi @MasterJianxing, were you able to fix this? I read about a somewhat similar issue here; please take a look. Thanks.

@zxybazh @junrushao @AndrewZhaoLuo @comaniac Please help.

It has been a long time… But I can now use MetaSchedule to tune BERT with the latest TVM version. You can update and give it a try.
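
For anyone looking for a starting point, the standard MetaSchedule Relay entry points look roughly like this (a sketch, not my exact script: the ResNet workload is only a self-contained stand-in, so substitute your own Relay module, e.g. BERT imported from ONNX or PyTorch; the target tag and trial count are assumptions):

    # Sketch: end-to-end MetaSchedule tuning through the high-level Relay API.
    import tvm
    from tvm import meta_schedule as ms
    from tvm.relay import testing

    # Stand-in workload; replace `mod`/`params` with your own Relay module (e.g. BERT).
    mod, params = testing.resnet.get_workload(
        num_layers=50, batch_size=16, image_shape=(3, 224, 224), dtype="float16"
    )
    target = tvm.target.Target("nvidia/geforce-rtx-3090")  # assumption: use the tag for your GPU
    work_dir = "./tune_tmp"

    database = ms.relay_integration.tune_relay(
        mod=mod, params=params, target=target, work_dir=work_dir,
        max_trials_global=2000,  # assumption: adjust to your time budget
    )
    lib = ms.relay_integration.compile_relay(
        database=database, mod=mod, target=target, params=params
    )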

Okay, thank you for the response. Did you tune BERT in fp16 by targeting cuda-tensorcore?

Hi, sorry to bother you, but could you please share the MetaSchedule part of your code? I just want to know how you got MetaSchedule to work with BERT. It would greatly help my current work. Thanks in advance.

Hi @MasterJianxing ,

Thank you so much for sharing the code. I ran the transformer.py file for the BERT model and I am currently facing this RuntimeError in thread binding:

    RuntimeError: parallel_for_dynamic error with [16:48:39] /home/name/tvm/src/tir/transforms/unify_thread_binding.cc:112:
    Check failed: (ana.CanProveEqual(dom->extent, new_iter_var->dom->extent)) is false: ValueError: All loops that are bound to ``threadIdx.y`` should have the same extent. However, there are two loops with extent T.int64(6) and T.int64(2), which are not equal

What am I doing wrong here? I have not made any modifications to your code; kindly let me know what I am missing and how I can fix it.

TIA

Regards, Krishna

Hello, I ran into the same problem. Have you figured it out?

Hi, I haven't figured it out yet; it's been quite some time since I last picked it up. I am still waiting for a response from the author on this.

Thank you for your response. Have you raised an issue about this?

Hi, I have not raised an issue. My use case here is actually ResNet, not BERT itself.

The broken stage-pipeline error goes away when we change the batch size to 16 or 32. I am currently unsure why the batch-size change fixes it, but it does.
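
Concretely, the only change is the batch_size argument when building the workload (a minimal sketch reusing the get_workload call from earlier in this thread):

    # Sketch: same ResNet workload as earlier in the thread, with batch_size set to 16.
    # On my setup this avoids the "not a stage pipeline" ScheduleError (32 works as well).
    from tvm.relay import testing

    mod, params = testing.resnet.get_workload(
        num_layers=50, batch_size=16, image_shape=(3, 224, 224), dtype="float16"
    )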

I tried MetaSchedule tuning on the ResNet workloads as well as the ResNet-50 ONNX model on my GPU. Using the MixedPrecisionPass() together with batch size = 16 results in the following error: "Block no longer exists in IRModule".
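
For reference, the mixed-precision conversion here is applied roughly like this (a sketch, assuming the usual relay.transform.ToMixedPrecision pass; the workload shape and opt level are illustrative):

    # Sketch: convert an fp32 Relay module to mixed fp16 precision before tuning.
    import tvm
    from tvm import relay
    from tvm.relay import testing

    mod, params = testing.resnet.get_workload(
        num_layers=50, batch_size=16, image_shape=(3, 224, 224), dtype="float32"
    )
    with tvm.transform.PassContext(opt_level=3):
        mod = relay.transform.InferType()(mod)
        mod = relay.transform.ToMixedPrecision("float16")(mod)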

While troubleshooting this, I found this bug report from @zxybazh: [Bug] Tensorization Failure During Multilevel Tiling with Tensor Intrin · Issue #16614 · apache/tvm · GitHub

TL;DR: the issue reports that multi-level tiling with tensor intrinsics fails in MetaSchedule. This apparently still needs to be resolved; I will post here if it gains any traction.

Regards,

Krishna

Thank you again. Actually, I tried tensorizing ResNet-50 and it is fine with a batch size of 16. However, I come across the runtime error

    Check failed: (ana.CanProveEqual(dom->extent, new_iter_var->dom->extent)) is false: ValueError: All loops that are bound to ``threadIdx.y`` should have the same extent. However, there are two loops with extent T.int64(6) and T.int64(2), which are not equal

which you mentioned before. MetaSchedule has restrictions on the input sizes of convolutions when computing with TensorCores (the wmma intrinsics operate on fixed 16×16×16 tiles), so maybe that is the reason.