[RFC] Relay Containers Array/Map/String

@tqchen @MarisaKirisame @jroesch

Would you like to enhance the Relay type system by adding types like string, array and map?

Yes, that will help convert TorchScript models and bring more business use cases to the TVM stack. We will need to discuss the best way to support them.

I can understand the motivation for adding string support to help NLP-related models. Just a few questions out of curiosity:

  1. Embedding lookup can be implemented as a map lookup, but TVM does not provide tokenization. Does this imply that tokenization is done outside while the lookup has to be done inside the TVM module? Is there any specific reason that it must be done inside the TVM module?
  2. TVM String uses “char” as storage and is not aimed at Unicode handling. Does this imply that for now we only focus on ASCII strings?
  3. What is the plan for the array type and the map type in Relay? Do you plan to introduce an ADT, or to expand the type system to support arrays and maps, for example with type inference over arrays and maps? Are the element types inside arrays and maps homogeneous?
  4. How does this lower to TIR? Into an intrinsic call that performs array indexing and map lookup?
  5. Map is not inside the runtime right now because of binary size concerns, but to use maps in generated code it seems that we need to move it into the runtime. Is that correct?

First of all, thank you very much for your questions. In my opinion:

  1. Both the tokenizer and the lookup can be implemented within TVM by adding a tokenizer op or substring operations, similar to TorchScript. We want to place all the NLP logic inside TVM mainly for convenience of deployment and adoption.

  2. We now focus on UTF-8 strings.

  3. Relay already implements List as an ADT. However, we intend to expand the type system because the ADT list may have performance issues. The item types of arrays and maps are homogeneous, which already meets our business needs.

  4. We plan to add some Relay VM instructions to support operations on the string, array and map containers.

  5. Yes, we need Map in the runtime.
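To make the concern in point 3 more concrete, here is a minimal Python sketch of why indexing an ADT-style list can be slower than indexing an array. It is an illustration only: the `Cons` class and the helper names are hypothetical stand-ins, not Relay's actual List ADT, and the thread itself does not say which operations are the bottleneck.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple


@dataclass(frozen=True)
class Cons:
    """One cell of a singly linked, ADT-style list (hypothetical stand-in)."""
    head: int
    tail: Optional["Cons"]


def from_items(items: List[int]) -> Optional[Cons]:
    """Build the linked list back to front so the order is preserved."""
    node: Optional[Cons] = None
    for item in reversed(items):
        node = Cons(item, node)
    return node


def adt_nth(node: Optional[Cons], index: int) -> Tuple[int, int]:
    """Return (value, cells_walked) for the element at `index`."""
    steps = 0
    while index > 0:
        if node is None:
            raise IndexError("index out of range")
        node = node.tail
        index -= 1
        steps += 1
    if node is None:
        raise IndexError("index out of range")
    return node.head, steps


items = list(range(1000))
adt_list = from_items(items)

# Reaching the last element of the linked list walks every preceding cell,
# while the array (a Python list here) is indexed directly.
value, steps = adt_nth(adt_list, 999)
array_value = items[999]
```

In this sketch the linked list needs a number of steps proportional to the index, which is one plausible reason to prefer a dedicated array type for lookup-heavy workloads.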

The above is still at the discussion stage. We hope to get more support and feedback from the community. Thank you!


Thank you for your quick response. It certainly makes sense to me!


“We now focus on UTF-8 strings.”

I don’t follow. Are you planning on expanding support to include utf8 as well?

“Yes, we need Map in the runtime.”

Runtime binary size implications are increasingly important to e.g. uTVM. CC @areusch. Could this support be optionally compiled into the runtime? Though that might get confusing when models run in some runtimes but not others. Is there any way for a runtime to advertise what capabilities it supports and then validate models against their needed set of capabilities?
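As a rough illustration of the capability idea raised here, the Python sketch below checks a model's required features against what a runtime advertises. Everything in it is an assumption for discussion: the capability names, `FULL_RUNTIME`, `MINIMAL_RUNTIME` and both functions are hypothetical and do not correspond to any existing TVM API.

```python
from typing import FrozenSet, Iterable

# Hypothetical capability sets; TVM does not define these names.
FULL_RUNTIME: FrozenSet[str] = frozenset({"string", "array", "map"})
MINIMAL_RUNTIME: FrozenSet[str] = frozenset({"array"})


def missing_capabilities(required: Iterable[str],
                         supported: Iterable[str]) -> FrozenSet[str]:
    """Return the capabilities a model needs that the runtime lacks."""
    return frozenset(required) - frozenset(supported)


def validate_model(required: Iterable[str], supported: Iterable[str]) -> None:
    """Raise before execution if the runtime cannot run the model."""
    missing = missing_capabilities(required, supported)
    if missing:
        raise RuntimeError(
            "runtime lacks required capabilities: " + ", ".join(sorted(missing))
        )


# A model that uses map lookup loads on the full runtime...
validate_model({"array", "map"}, FULL_RUNTIME)

# ...and is rejected up front on a minimal one.
try:
    validate_model({"array", "map"}, MINIMAL_RUNTIME)
    rejected = False
except RuntimeError:
    rejected = True
```

Failing at load time rather than at the first map lookup would at least make the divergence between runtimes visible to the user.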


Yeah, I agree with the point about runtime binary size, and that is the reason why I ultimately decided not to put tvm::Map into the runtime before. In the case of uTVM, it implements its own pure C runtime IIRC @areusch, so it won’t be affected by this proposal.

But if TVM’s various runtime capabilities start to diverge, isn’t that a bad thing for TVM user experience?

Good point, I agree! Another approach might be to have a compilation flag that moves the related functionality in or out of the runtime, for example a CMake flag “USE_MAP_IN_RUNTIME”; if we need minimal binary size, we can turn the flag off.


Good point, FYI :airplane:

Maybe we can refer to the Lua extension lutf8 and provide some UTF-8 functions in Relay?

Let's follow @junrushao's proposal.

Thank you and best regards!

UTF-8 string operations are highly non-trivial. If no other operations are needed besides look-up, a better way might be that we treat them as an opaque buffer.
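A small Python sketch of the opaque-buffer idea, assuming lookup is the only operation needed: keys are stored and compared as raw UTF-8 bytes and are never decoded. The `vocab` table, `UNKNOWN_ID` and `lookup` are hypothetical names used only for illustration.

```python
# Hypothetical vocabulary keyed by raw UTF-8 bytes (illustrative data only).
vocab = {
    "hello".encode("utf-8"): 0,
    "世界".encode("utf-8"): 1,
}
UNKNOWN_ID = -1


def lookup(token: bytes) -> int:
    """Map a token to its id by exact byte comparison, without decoding."""
    return vocab.get(token, UNKNOWN_ID)


known_id = lookup("世界".encode("utf-8"))
unknown_id = lookup(b"missing")

# The buffer length counts bytes, not characters: two CJK characters
# occupy six bytes in UTF-8.
byte_length = len("世界".encode("utf-8"))
```

Hashing and equality work on the bytes alone, so no Unicode-aware code is needed in the runtime. The trade-off is that anything character-based, such as substring or length in characters, is not available on such a buffer.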


I modified the two files, but it didn’t work; the problem is still there. I think I will go crazy.

An opaque buffer is also OK with me.

About the previous problem: I cannot reply because I am limited to three replies. Do you have an email address? I need your help. I used the method and modified the two files, but it does not work. I think I will go crazy; I have tried lots of methods, but none of them succeeded.

Maybe we can also add a UnicodeString type and then add two conversion functions:

  1. encode(str: UnicodeString) -> String

  2. decode(str: String) -> UnicodeString
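The two conversions can be sketched in Python, using the built-in `str` as a stand-in for the proposed UnicodeString and `bytes` as a stand-in for the byte-oriented String. UTF-8 as the encoding is an assumption taken from the earlier posts; the proposal itself does not fix one.

```python
def encode(text: str) -> bytes:
    """UnicodeString -> String: serialize code points to UTF-8 bytes."""
    return text.encode("utf-8")


def decode(data: bytes) -> str:
    """String -> UnicodeString: parse UTF-8 bytes into code points.

    Raises UnicodeDecodeError if `data` is not valid UTF-8.
    """
    return data.decode("utf-8")


original = "h\u00e9llo"          # "héllo", five code points
raw = encode(original)           # six bytes, since "é" takes two in UTF-8
round_trip = decode(raw)
```

Keeping the two types separate makes the cost explicit: code that only needs lookup can stay on the byte-oriented String, and only code that needs per-character operations pays for decoding.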

Thanks for the proposal. There are a few things that we need to figure out besides the types themselves:

  • P0: mutability of the container
    • Right now all the containers are immutable, which also makes the Relay analysis simpler. My understanding is that immutable containers are good for our use case (as we mainly do lookup), but please confirm if that is the case.
  • P1: the type signature of the Types themselves. There are a few alternatives. Having a concrete Type for Array/Map/String is certainly one possibility, but we will need to confirm that we won’t add additional types besides these, and that any additional ones should be added through ADT.
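To illustrate the P0 point about immutability, here is a minimal Python sketch of lookup-only containers where an "update" produces a new container instead of modifying the old one. It uses Python's `tuple` and `types.MappingProxyType` as stand-ins; `with_entry` is a hypothetical helper, and none of this reflects how TVM's containers are implemented.

```python
from types import MappingProxyType
from typing import Mapping

# Read-only stand-ins for an immutable Array and Map.
array = (10, 20, 30)
table: Mapping[str, int] = MappingProxyType({"a": 1, "b": 2})


def with_entry(mapping: Mapping[str, int], key: str, value: int) -> Mapping[str, int]:
    """Return a new read-only map with one extra entry; the input is untouched."""
    updated = dict(mapping)
    updated[key] = value
    return MappingProxyType(updated)


# Lookup works as usual.
second = array[1]
b_value = table["b"]

# In-place mutation is rejected.
try:
    table["c"] = 3  # type: ignore[index]
    mutated = True
except TypeError:
    mutated = False

# "Updating" yields a new value, so earlier references stay valid.
bigger = with_entry(table, "c", 3)
```

Because a value never changes after it is created, an analysis pass can reason about a container from its definition alone, which is the simplification mentioned above.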

Thank you very much for your reply. I need to think about your two questions again.

We have created a new project to support this demand. Details: Acknowledgement