Hey all, I wanted to discuss a new feature proposal with the community. I haven't contributed to TVM before, so please let me know if there is a strict process to follow. When loading a traced PyTorch model using tvm.relay.frontend.from_pytorch, TVM auto-assigns span names (since 0.14.0) based on the operator kind (like aten::conv2d). However, for a large number of applications, preserving the true PyTorch scope name is important.
Consider a BERT model being imported: we may not want a dense operator to be renamed to `aten::dense…` but would rather preserve its true PyTorch scope name, e.g. "bert.encoder.layer.11.output.dense". Scope-name preservation is important for TVM projects where users wish to specify additional configs, quantization information, etc. that are tied to the names/layers they see in the original PyTorch input; that mapping is lost in conversion today.
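To make the mapping concrete, here is a minimal, self-contained sketch of recovering the dotted module path from a TorchScript-style scope string. The scope string below is illustrative (not taken from a real trace); it assumes the usual TorchScript layout of "/"-separated segments, each prefixed with `__module.`:

```python
# Illustrative TorchScript-style scope string (hypothetical example).
scope = "__module.bert/__module.bert.encoder/__module.bert.encoder.layer.11.output.dense"

# The last "/"-separated segment names the innermost module; stripping the
# "__module." prefix recovers the dotted path users see in their model.
innermost = scope.split("/")[-1]
if innermost.startswith("__module."):
    innermost = innermost[len("__module."):]

print(innermost)  # bert.encoder.layer.11.output.dense
```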
Our company has a fork of TVM where we already have a fix for this that I'd love to bring into open-source TVM. Basically, tvm.relay.frontend.from_pytorch could accept an additional boolean, preserve_pytorch_scope_names, defaulting to False (i.e. the current behavior); when True, the scope names are preserved. The change required is something as simple as:
Current Code:
```python
def _rename_outputs(node, source_map, op_type_dict, use_parser_friendly_name):
    """Rewrite debug name of node outputs with its operator type"""

    def _get_source_name(op_type):
        op_idx = 0
        if op_type in op_type_dict:
            op_idx = op_type_dict[op_type] + 1
        op_type_dict[op_type] = op_idx
        return "_".join([op_type, str(op_idx)])

    # get source name of operator and rename all of its outputs
    # e.g. node.kind(): aten::adaptive_max_pool2d
    #      node_src_name -> aten::adaptive_max_pool2d_x
    #      output_1 -> aten::adaptive_max_pool2d_x_0
    #      output_2 -> aten::adaptive_max_pool2d_x_1
    if node.kind() != "prim::GetAttr":
        node_src_name = _get_source_name(node.kind())
        for index, output in enumerate(node.outputs()):
            output.setDebugName("_".join([node_src_name, str(index)]))
        # update source map
        # if use_parser_friendly_name is True: e.g. prim::Constant_0 -> prim__Constant_0
        if use_parser_friendly_name:
            node_src_name = re.sub(r":|\.", "_", node_src_name)
        source_map[node] = node_src_name
```
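The per-kind counter above is what produces names like aten::dense_0, aten::dense_1, and so on. A standalone sketch of that counter logic, lifted out of the frontend for illustration (the helper name and dict here are just local stand-ins):

```python
def get_source_name(op_type, op_type_dict):
    # Assign a running index per operator kind, mirroring _get_source_name:
    # the first aten::dense becomes aten::dense_0, the next aten::dense_1, ...
    op_idx = 0
    if op_type in op_type_dict:
        op_idx = op_type_dict[op_type] + 1
    op_type_dict[op_type] = op_idx
    return "_".join([op_type, str(op_idx)])

counts = {}
print(get_source_name("aten::dense", counts))   # aten::dense_0
print(get_source_name("aten::dense", counts))   # aten::dense_1
print(get_source_name("aten::conv2d", counts))  # aten::conv2d_0
```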
New code to preserve PyTorch scopes:
```python
def _rename_outputs(node, source_map, op_type_dict, use_parser_friendly_name):
    """Rewrite debug name of node outputs with its PyTorch scope name"""

    def _get_source_name(op_type):
        op_idx = 0
        if op_type in op_type_dict:
            op_idx = op_type_dict[op_type] + 1
        op_type_dict[op_type] = op_idx
        return "_".join([op_type, str(op_idx)])

    if node.kind() != "prim::GetAttr":
        # use the innermost PyTorch scope as the source name
        node_src_name = node.scopeName().split("/")[-1]
        if node_src_name.startswith("__module."):
            node_src_name = node_src_name[len("__module.") :]
        for index, output in enumerate(node.outputs()):
            output.setDebugName("_".join([_get_source_name(node_src_name), str(index)]))
        # update source map
        # if use_parser_friendly_name is True: e.g. prim::Constant_0 -> prim__Constant_0
        if use_parser_friendly_name:
            node_src_name = re.sub(r":|\.", "_", node_src_name)
        source_map[node] = node_src_name
```
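One interaction worth noting: with scope names preserved, the dotted module path contains "." characters, so the existing use_parser_friendly_name pass (the re.sub at the end of the function above) would rewrite them to "_". A small sketch of that transformation in isolation:

```python
import re

# With scope names preserved, the source name is a dotted module path;
# use_parser_friendly_name maps both ":" and "." to "_" so the name is
# friendly to Relay's text parser.
name = "bert.encoder.layer.11.output.dense"
friendly = re.sub(r":|\.", "_", name)
print(friendly)  # bert_encoder_layer_11_output_dense
```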
If this is something of interest to the community, I can clean this up and make a formal pull request to the TVM project. Thanks!