Round-tripping objects through the FFI

Hi,

I’ve been implementing graph transformations in Python, and sometimes it is handy to add annotations to nodes. Now, running these through an ExprMutator gives me all-new nodes, so the annotations are gone. But actually, they’re gone earlier than that:

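import tvm.relay  # makes tvm.relay available

x = tvm.relay.var('x', shape=(1,1))
x.my_annotation = 'something'  # stored on this Python wrapper only
y = x * tvm.relay.const(2)
assert y.args[0] == x  # compares equal to x...
y.args[0].my_annotation  # ...but raises AttributeError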

The underlying reason is that TVM’s custom FFI doesn’t attach the Python object to the C object but just returns a new Python object pointing to the same C object. I can work around it by keeping a dict originalizer = {o: o for o in all_original_objects} and then looking up originalizer[y.args[0]], but that seems clumsy.
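To make the behaviour concrete without needing TVM installed, here is a toy model of it. The Node class and the fetch function are made up for illustration and are not TVM's actual FFI classes; the only assumption is that wrappers compare and hash by their underlying handle, which is what makes the dict workaround possible.

```python
class Node:
    """Stand-in for a Python wrapper around a C++ object (not TVM's real class)."""

    def __init__(self, handle):
        self.handle = handle  # plays the role of the C pointer

    def __eq__(self, other):
        return isinstance(other, Node) and self.handle == other.handle

    def __hash__(self):
        return hash(self.handle)


def fetch(handle):
    """Mimic the FFI handing an object back: a fresh wrapper on every call."""
    return Node(handle)


x = Node(handle=0x1234)
x.my_annotation = "something"

again = fetch(0x1234)  # same underlying handle, new Python object
same_handle = again == x
same_object = again is x
has_annotation = hasattr(again, "my_annotation")

# The workaround: map every wrapper back to the original Python object.
originalizer = {o: o for o in [x]}
recovered = originalizer[again].my_annotation
```

The lookup works because the fresh wrapper hashes and compares equal to the original, so the dict returns the original Python object that still carries the attribute.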

For reference, the same thing works better in other frameworks:

import torch

x = torch.tensor(1.0, requires_grad=True)
x.my_annotation = 'something'
y = x * 2
y.grad_fn.next_functions[0][0].variable.my_annotation  # works!

Should TVM do the same?

Best regards

Thomas

In this particular case, because most of the FFI objects are IR nodes, we want to keep them self-contained. Putting opaque objects inside an IR node would also make IR-related features harder, such as printing the code in the text format.

On the other hand, the need to attach metadata to a specific node is valid. Right now we encourage doing so via a Map from the object to the attribute of interest, e.g.

my_annotation[x] = "something"
my_annotation[y.args[0]]

It would be great to hear more about your use case: which annotations you want to attach, and in what situation.

The difficulty with the map is that you need to pass the map around or make it global state. My use case is the PyTorch frontend (which could be made object-oriented rather than purely functional, so that it keeps track of the map and the prelude, which is currently tracked through indirection). Concretely tracking types:

The case is even more delicate as I’m close to (but not quite, probably) wanting checked_type as the attribute.

I see. If we build a complete function, then we can attach the map as an attribute of the function. But in your case the usage seems to be in the conversion phase. In this case the map (context) can usually be kept as a member of the Mutator (if we make most transformations member functions of that mutator) or passed around as a member of a context variable, like you mentioned.
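A minimal sketch of the "map as a member" idea follows. Everything in it is hypothetical: the Converter class, its method names, and the use of plain tuples as stand-ins for IR nodes are assumptions made so the example runs without TVM. Tuples hash by value, which plays the role of the handle-based hashing of the real objects.

```python
class Converter:
    """Hypothetical frontend converter that owns its annotation map."""

    def __init__(self):
        # node -> annotation; lives with the converter, so it is neither
        # global state nor an extra argument on every helper function
        self.inferred_type = {}

    def make_var(self, name, type_name):
        node = ("var", name)
        self.inferred_type[node] = type_name
        return node

    def make_mul(self, lhs, rhs):
        node = ("mul", lhs, rhs)
        # Look the operand's annotation up in the map instead of on the node.
        self.inferred_type[node] = self.inferred_type[lhs]
        return node


conv = Converter()
x = conv.make_var("x", "float32")
y = conv.make_mul(x, x)
result = conv.inferred_type[y]
```

The annotations never touch the nodes themselves, so the nodes stay self-contained while the conversion code can still look up whatever it recorded.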

However, your example code brings up a great point: we should perhaps raise a warning when users try to assign attributes to subclasses of the FFI objects (perhaps by overloading setattr or by adding empty slots to all subclasses).

I don’t think guarding against specific use cases is more than a hack. In the end, you’ll have many funny effects with this type of logic. If you add something to a set, do you expect the exact object to be in the set afterwards? In the end, one would need to think about whether having the Python interface be Pythonic is a goal.

Thanks @t-vi, I agree that keeping the Python interface Pythonic is one goal we should strive to achieve. One principle there is to cause fewer surprises.

At the same time, we do have other design constraints that require us to make sure core data structures are self-contained, without opaque fields. For example, we want to be able to serialize the entire IR into a text format, which does not play well with opaque fields. Many of the backend passes also need to know all the possible types and fields beforehand.

So one way to view these IR data structures is as typed, immutable Python objects with fixed slots, where attaching new fields is not permitted (that is why I suggested that we should perhaps provide a clear error message when a user attempts to do so). Such a restricted setting is still valid and Pythonic (e.g. as with namedtuples and others), while also giving us the properties needed for an IR (self-contained, can be round-tripped to a text format).
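As a rough illustration of what "fixed slots plus a clear error message" could look like in plain Python, here is a sketch. The class names IRNode and Var are invented for the example and do not reflect how TVM's object classes are actually written.

```python
class IRNode:
    """Hypothetical wrapper base class; names are illustrative, not TVM's."""

    __slots__ = ("handle",)

    def __init__(self, handle):
        # Bypass our own guard for the one field the wrapper really owns.
        object.__setattr__(self, "handle", handle)

    def __setattr__(self, name, value):
        raise AttributeError(
            f"cannot set {name!r} on {type(self).__name__}: IR nodes have "
            "fixed fields; keep extra data in a separate map keyed by the node"
        )


class Var(IRNode):
    __slots__ = ()  # empty __slots__ keeps the subclass from growing a __dict__


v = Var(handle=1)
try:
    v.my_annotation = "something"
    message = None
except AttributeError as err:
    message = str(err)
```

With this in place the assignment from the first post would fail immediately with an explanation, rather than succeeding and silently losing the attribute later.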

@tqchen Thank you for providing the rationale here. I have so much to learn.

I don’t quite understand yet why serialization would necessarily preclude you from having a 1-1 relation between C++ and Python objects: deserializing would create a new C++ object, so I would not expect to get back the same Python object. Now I can see how we would not serialize all the stuff people attach to it, but I think the much more modest goal of having a better correspondence at runtime would be feasible by linking the Python object to the C++ object if the Python object exists and, if it does not, setting up the link when we first create a Python object from it.

Thanks @t-vi. To achieve what you are describing, we might need to use PyObject as the primary container and source of reference counting. Notably, due to the design constraint of making Object language-invariant, we cannot do that. The plus side, though, is that the object can be accessed in deployment from other languages.

On the comment about a 1-1 relation between C++ and Python objects: if we indeed restrict the object to be immutable after construction (or simply forbid setting unknown fields), then we can view it as a 1-1 relation. In such cases two Python objects a and b will come with the same handle (strictly speaking they don’t have the same id, but everything else, hash, equality and field content, all works out). While I understand that such a restriction loses certain flexibility, like attaching opaque information to an object, it does on the other hand improve the clarity and expectations of the IR itself, since we know that there won’t be additional information attached and all the info is clearly in the typed fields.

So the current design rationale comes from both: (1) the complexity of supporting such a feature while retaining the language-invariance property; (2) the desire to keep IR objects cleanly typed and immutable while remaining as Pythonic as possible.
