[VM] An error from Context Analysis pass

@zhiics

I tried to enable VM GPU tests for our PyTorch frontend, and I found that the VM GPU PR https://github.com/apache/incubator-tvm/pull/6337 broke one of the tests at https://github.com/apache/incubator-tvm/blob/master/tests/python/frontend/pytorch/lstm_test.py#L261

I realized that the test is not being run on CI because it is not picked up by pytest (my mistake, the file should be renamed to test_custom_lstm.py). I confirmed that the test passes if I go back to the commit before #6337.

When I run that test locally, I get an assertion error at https://github.com/apache/incubator-tvm/blob/master/python/tvm/relay/transform/memory_alloc.py#L102.

  File "/home/masa/projects/dev/tvm/python/tvm/relay/transform/memory_alloc.py", line 102, in get_context
    assert exp in self.context_analysis_map, exp.astext(False)
AssertionError: #[version = "0.0.5"]
free_var %t: Tensor[(2, 4), float32];
%0 = fn (%p0: Tensor[(2, 4), float32], Primitive=1) -> Tensor[(1, 2, 4), float32] {
  expand_dims(%p0, axis=0) /* ty=Tensor[(1, 2, 4), float32] */
};

Looking at the result of context analysis, it seems tensor_expand_dims_float32_2_4 below is not annotated with a device type and id. All other functions are correctly annotated, and I get the assertion error only for the function wrapping expand_dims above.

...
def @hd[A](%xs1: List[A]) -> A {
  let %x45: A = match? (%xs1) {
    Cons(%x46: A, _) => %x46,
  }<cpu(0)>;
  %x45<cpu(0)>
}

def @tensor_expand_dims_float32_2_4(%x47: static_tensor_float32_2_4_t[]) -> static_tensor_float32_?_2_4_t[] {
  let %x48: static_tensor_float32_?_2_4_t[] = match? (%x47) {
    tensor_constructor_float32_2_4(%t: Tensor[(2, 4), float32]) => {
      %51 = fn (%p018: Tensor[(2, 4), float32], Primitive=1) -> Tensor[(1, 2, 4), float32] {
        expand_dims(%p018, axis=0)
      };
      let %x49: Tensor[(1, 2, 4), float32] = %51(%t);
      let %x50: static_tensor_float32_?_2_4_t[] = tensor_constructor_float32_?_2_4(%x49);
      %x50
    },
  };
  %x48
}

def @tl[A](%xs2: List[A]) -> List[A] {
  let %x51: List[A] = match? (%xs2) {
    Cons(_, %rest: List[A]) => %rest,
  }<cpu(0)>;
  %x51<cpu(0)>
}
...

It seems the problem is due to memoization that kicks in at the line https://github.com/apache/incubator-tvm/blob/master/src/relay/analysis/context_analysis.cc#L619

The unannotated function tensor_expand_dims_float32_2_4 is an argument to the prelude map function. But since map is also used in an earlier place, the second visit to map won’t happen due to memoization. This prevents tensor_expand_dims_float32_2_4 from being visited.
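For reference, the base-class memoization looks roughly like this (paraphrased from ExprVisitor in expr_functor.cc; the exact code may differ slightly). Once a node has been visited, later calls only bump a counter and never dispatch to the VisitExpr_ overrides again:

  // Paraphrased sketch of ExprVisitor::VisitExpr.
  void ExprVisitor::VisitExpr(const Expr& expr) {
    auto it = visit_counter_.find(expr.get());
    if (it != visit_counter_.end()) {
      ++it->second;  // already visited: only count, do not traverse again
    } else {
      using TParent = ExprFunctor<void(const Expr&)>;
      TParent::VisitExpr(expr);  // first visit: dispatch to VisitExpr_ overrides
      visit_counter_.insert({expr.get(), 1});
    }
  }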

@masahi you are right. The memoization made the visitor too restrictive. We may want to remove it. I believe it would not affect efficiency much, though, because we are working on ANF. Interestingly, this was not revealed even by TF MobileNet SSD, which we run in the CI. The reason I added it was to avoid an infinite loop on mutual recursion.


Thanks @zhiics for the reply. To be clear, by memoization I meant the memoization done by ExprVisitor. I understand that ContextAnalyzer does its own memoization (via visited_, checked at https://github.com/apache/incubator-tvm/blob/master/src/relay/analysis/context_analysis.cc#L327), but I don’t think that is the cause of my problem.

On the second visit to the prelude map (with a different closure as its argument), ExprVisitor::VisitExpr(func) at https://github.com/apache/incubator-tvm/blob/master/src/relay/analysis/context_analysis.cc#L619 returns immediately without calling any VisitExpr_ override in ContextAnalyzer. This leaves tensor_expand_dims_float32_2_4 unvisited and thus not annotated.

Since ContextAnalyzer is built on top of ExprVisitor, I don’t see a simple way to “remove” the memoization. I have a feeling that, to work around this problem, we may need to restructure how ContextAnalyzer is implemented.

@masahi can you try https://github.com/apache/incubator-tvm/pull/6403?

I just spent some time migrating it to the mixed mode visitor, which has a visit limit. It seems LSTM works on GPU.

Thanks, it works!! I also tried a GPU run, and it works too.

I didn’t know what MixedModeVisitor was for; it’s great.

Does it make sense to make visit_limit configurable? This works since map appears just twice in my test, but if it appears more than twice it would break, right?


Yes, it is configurable from a pass’s point of view. I am not sure how many visits would be enough, though. 2 was chosen to avoid mutual recursion. Maybe we can relax it a bit.
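For context, the knob is just the constructor argument of the base class (MixedModeVisitor(int visit_limit = 1)). A hypothetical pass visitor would opt in roughly like this; MyAnalyzer and its empty constructor are made up for illustration, and the real ContextAnalyzer constructor takes more arguments:

  // Hypothetical sketch: a pass visitor asking for a redundant-visit budget of 2.
  class MyAnalyzer : public MixedModeVisitor {
   public:
    MyAnalyzer() : MixedModeVisitor(2) {}  // allow each leaf to be visited twice

    // ... VisitExpr_ overrides for the pass go here ...
  };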

My LSTM test doesn’t involve mutual recursion, but there are two uses of map, shown below. I can imagine a situation where there are more than two maps, in which case we would need a visit_limit bigger than 2 to allow visiting map as many times as required (assuming my understanding of visit_limit is correct).

  ...
  %45 = @map(tensor_constructor_float32_2_4, %44) /* ty=List[static_tensor_float32_2_4_t[]] */;
  %46 = @tensor_array_stack_float32_2_4(%45) /* ty=static_tensor_float32_?_2_4_t[] */;
  ...

def @tensor_array_stack_float32_2_4(%tensor_array: List[static_tensor_float32_2_4_t[]]) -> static_tensor_float32_?_2_4_t[] {
  %49 = @map(@tensor_expand_dims_float32_2_4, %tensor_array) /* ty=List[static_tensor_float32_?_2_4_t[]] */;
  %50 = @hd(%49) /* ty=static_tensor_float32_?_2_4_t[] */;
  %51 = @tl(%49) /* ty=List[static_tensor_float32_?_2_4_t[]] */;
  @foldl(@tensor_concatenate_float32_?_2_4, %50, %51) /* ty=static_tensor_float32_?_2_4_t[] */
}

Yeah, it should be fine, because only leaf nodes are checked against the visit limit (a call node is not a leaf node).
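Roughly, the visit limit shows up in the leaf helpers (paraphrased from MixedModeVisitor in expr_functor.cc; details may differ):

  // Paraphrased sketch: the leaf helpers consult visit_limit_.
  void MixedModeVisitor::VisitLeaf(const Expr& expr) {
    if (visit_counter_[expr.get()] < visit_limit_) {
      ExprFunctor::VisitExpr(expr);  // still under budget: really visit it
    }
    visit_counter_[expr.get()]++;
  }

  bool MixedModeVisitor::CheckVisited(const Expr& expr) {
    if (visit_counter_[expr.get()] < visit_limit_) {
      return false;  // budget left, the caller should visit
    } else {
      visit_counter_[expr.get()]++;
      return true;   // budget exhausted, the caller should skip
    }
  }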

@masahi I think your concern is right. Maybe I can relax the bound to 100 and set it to 99 for this pass. This should normally be fine. Otherwise, we can reset the visit_counter_ when entering a new function. This would allow us to visit the nodes any number of times, but it would cause problems for mutual recursion.

@mbrookhart do you think it is okay to relax the limit?

I haven’t had time to read this fully, I’ll give it a closer look in an hour or two when my kids are napping.

As a brief history, though, I wrote the MixedModeVisitor to provide iterative graph traversals over dataflow regions of the graph to prevent stack overflows. I added the visit_limit > 1 option to support dead code elimination, which needs to visit nodes twice. I wasn’t aware of any passes that required more redundant visitation at the time, so I capped it at something that seemed absurdly high (10). If this is such a pass, I think boosting that limit is fine.

Like I said, I’ll take a closer look at the issue in a couple of hours

Reading through this a little bit more, I think cranking the visit limit in the MixedModeVisitor for this pass is a valid solution, given that the ContextAnalyzer does its own memoization on top of the base Visitor.

@masahi, this is probably my bigger concern. I don’t fully understand what the prelude is doing; I haven’t looked into it very much. If we have two map nodes with different arguments, but they are both memoized to the same value, that seems to indicate that we constructed the map node and then altered its arguments in place? This violates Relay’s assumption of a functional IR with constant nodes after construction.

What exactly is this map object, and how is it non-functionally inserted into the Relay IR? If it’s a function operating on inputs (which makes sense given the name and the apparent use), is there a particular reason it’s not implemented as an Op wrapped in a Function that we Call?

Thanks! Matthew

@mbrookhart

map is just like any other Relay function, defined in the prelude:

def @map[A, B](%f: fn (A) -> B, %xs: List[A]) -> List[B] {
  match (%xs) {
    Cons(%x: A, %rest: List[A]) => {
      %0 = %f(%x) /* ty=B */;
      %1 = @map(%f, %rest) /* ty=List[B] */;
      Cons(%0, %1) /* ty=List[B] */
    },
    Nil => Nil /* ty=List[B] */,
  }
}

The LSTM test, which caused the error from the Context Analysis pass, has two uses of map:

@map(tensor_constructor_float32_2_4, %40)
@map(@tensor_expand_dims_float32_2_4, %tensor_array)

Both function arguments to map need to be visited and thus annotated (this is a requirement of the Context Analysis pass). The visit to the function argument of map seems to happen during the visit to map’s body (when ExprVisitor finds the use of the variable %f, the first parameter of map), and this is the only chance for the function argument to map to be annotated.

The problem is that the body of map is memoized on the first visit, so the second visit to map won’t happen, leaving the function argument to the second map call, @tensor_expand_dims_float32_2_4, unvisited and thus not annotated.

Hope this makes sense.

I do wonder, though, why the visit to the arguments of map doesn’t happen when we visit the arguments of the CallNode corresponding to the call to map, https://github.com/apache/incubator-tvm/blob/master/src/relay/analysis/context_analysis.cc#L591.

It seems ExprVisitor::VisitExpr(arg) there finds @tensor_expand_dims_float32_2_4 but it immediately returns, without looking inside the function body.

There’s something missing here. If that actually is a Call(op=map, args), where map is a Function, the visitor will visit op (and its body and its parameters) independently of the Call args, so I don’t see why the args would be skipped…

:slight_smile: You responded with my exact confusion while I was typing. I’ll try to figure out what’s going on.

Yeah, they are just like callbacks there. It is invoked inside map, so we don’t really know the args at that moment.

This would seem to indicate that @tensor_expand_dims_float32_2_4 has already been visited in another part of the pass. Is it referred to elsewhere in your model?

Anyway, I’m reading this pass more carefully, I’m probably missing something about how this pass traverses the IR.

I do think the increase to visit_limit with MixedModeVisitor is a valid work-around.

No, it is only used once.

I think I see the problem.

@tensor_expand_dims_float32_2_4 is an argument to map, so it gets visited as an argument here:

Unfortunately, at that point it’s just a GlobalVar; we’re not calling it, so the Call expansion doesn’t kick in. Thus, we just get:

  void VisitExpr_(const GlobalVarNode* gvn) final { DeviceFor(GetRef<GlobalVar>(gvn)); }

Which doesn’t properly unify the device, but does memoize the result for that Function.

Later, we attempt to traverse the body of the map, which would see a call(GlobalVar…) and would properly unify the devices, but things have been memoized away, so :cry:

Unless there’s a strong reason to be visiting the arguments before the body, I think you can fix this just by visiting the body of the function before the args.
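Something along these lines, as a hypothetical and heavily simplified sketch of that reordering; LookupFunction is a made-up helper standing in for however the pass resolves the callee, and the real CallNode handling in context_analysis.cc has more cases (device_copy and friends), so this is not the actual patch:

  // Hypothetical sketch: walk the callee's body before the arguments, so that a
  // GlobalVar passed as an argument is reached only after the body has had a
  // chance to unify the corresponding parameter's device.
  void VisitExpr_(const CallNode* cn) final {
    Function func = LookupFunction(cn->op);  // made-up helper: resolve the callee
                                             // (e.g. a GlobalVar) to its Function
    ExprVisitor::VisitExpr(func);            // body (and params) first
    for (const auto& arg : cn->args) {
      ExprVisitor::VisitExpr(arg);           // arguments afterwards
    }
  }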

Unfortunately, doing so causes a failure in this check in a way I don’t understand at the moment and refuse to debug on a holiday weekend :wink:

@masahi @zhiics I think the error is strictly a result of the fact that map is a higher-order function and we’re dealing with GlobalVars. The number of maps in the model shouldn’t matter, but the depth of references to GlobalVars in higher-order functions will. We can probably get away with a visit_limit of ~10 if we want to use that workaround; I don’t think there’s a need to go to 100.
