Quantized models and legalization pass

Hi @FrozenGene, @anijain2305
I can confirm that this works :partying_face:! Very good! Now we can implement algorithms like QNNPack and let the tuner try them together. Thanks to both of you!

As for the API change, I agree with @FrozenGene that it would probably be cleaner to add tinfos to the qnn_conv2d_legalize signature.

I have a related question that I have always meant to ask: in conv2d_alter_layout, we bail out (return None) without altering the op if the current configuration is a fallback. Do you know why? And what should the behavior be in legalize? I am referring to this code:

    _, outs = relay.backend.compile_engine.select_implementation(
        relay.op.get("nn.conv2d"), attrs, tinfos, out_type, target
    )
    workload = autotvm.task.get_workload(outs)
    if workload is None:
        # The best implementation is not an AutoTVM template,
        # we then assume it's not necessary to alter this op.
        return None
    cfg = dispatch_ctx.query(target, workload)
    if cfg.is_fallback:  # if is fallback, clear query cache and return None
        autotvm.task.clear_fallback_cache(target, workload)
        return None

    topi_tmpl = workload[0]

In theory, we could get topi_tmpl directly from the first value returned by relay.backend.compile_engine.select_implementation. Instead, conv2d_alter_layout queries the dispatch_ctx for the current configuration and returns None if it is a fallback. To sum up, the follow-up questions are:

  • Why is this behavior there?
  • What should we do in legalize? Simply return a default legalization?
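
To make the second question concrete, here is a minimal sketch of the two behaviors I am comparing. It is plain Python and does not use TVM: Config, DispatchContext, alter_layout and legalize are hypothetical stand-ins that only mimic the control flow of the snippet above, and the "default"/"template:..." return values are placeholders for whatever the real passes would produce.

```python
from dataclasses import dataclass
from typing import Iterable, Optional, Tuple

# Hypothetical stand-in: a workload is a tuple whose first item names the template,
# mirroring `topi_tmpl = workload[0]` in the snippet above.
Workload = Tuple


@dataclass
class Config:
    """Stand-in for a tuning config; only the fallback flag matters here."""
    is_fallback: bool


class DispatchContext:
    """Stand-in dispatch context: tuned (target, workload) pairs get a real config,
    everything else gets a fallback config."""

    def __init__(self, tuned: Iterable[Tuple[str, Workload]]):
        self._tuned = set(tuned)

    def query(self, target: str, workload: Workload) -> Config:
        return Config(is_fallback=(target, workload) not in self._tuned)


def alter_layout(ctx: DispatchContext, target: str,
                 workload: Optional[Workload]) -> Optional[str]:
    """Mimics the alter-layout flow quoted above: None means 'leave the op as is'."""
    if workload is None:
        return None  # not a tunable template
    if ctx.query(target, workload).is_fallback:
        return None  # untuned: do not alter the op
    return "altered:" + workload[0]


def legalize(ctx: DispatchContext, target: str,
             workload: Optional[Workload]) -> str:
    """One possible answer to the question: fall back to a default legalization
    instead of returning None (an assumption, not the confirmed behavior)."""
    if workload is None or ctx.query(target, workload).is_fallback:
        return "default"
    return "template:" + workload[0]
```

With a context that only knows one tuned workload, alter_layout returns None for anything untuned, while legalize would still return the default legalization. That is the option I am asking about.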

Thanks once more for your help!