[BYOC] How backwards compatible does the TensorRT partition_for_tensorrt function need to be?

Hi all, I’m looking to do for the TensorRT BYOC integration what I just did for CUTLASS, namely make sure all compilation configuration is captured within a “tensorrt” Target instance rather than the current combination of PassContext options and environment variables. This helps Collage, both because the overall configuration becomes just a list of Targets, and because of some infrastructure issues internal to us at OctoML.

(I’ll also switch TensorRT to be IRModule-at-a-time instead of function-at-a-time; however, since TensorRT engines can only have one entry point, this won’t have any performance or sharing benefits, just an internal engineering cleanup.)

Just want to check whether there are any existing users of the partition_for_tensorrt function, and how sensitive I should be to maintaining backwards compatibility.

Given we’ve broken large parts of the integration at various stages over the last few months I suspect this is not being actively used, but please give me a shout otherwise.

Best, -m

cc @comaniac @Laurawly

Thanks for asking. Some AWS teams may have some dependencies. I’ll inform them to comment on this thread if they have any concerns.

Thanks Cody. It looks like the API will be:

  trt_target = tvm.target.Target("tensorrt -use_fp16=True -implicit_batch_mode=False")
  mod = partition_for_tensorrt(mod, params=params, target=trt_target)
  exe = vm.compile(mod, target=["cuda", trt_target], params=params)

(and similarly for the other build APIs).
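To make the shape of this concrete, here is a minimal stand-alone sketch (plain Python, not TVM code) of how flags in a target string like the one above can be parsed into a configuration dict. The parser and the coercion rules are illustrative assumptions; TVM’s real Target parser is far richer (typed attributes, defaults, validation, etc.):

```python
def parse_target_options(target_str):
    """Split a target string such as
    'tensorrt -use_fp16=True -implicit_batch_mode=False'
    into (kind, options dict). Illustrative only; not TVM's parser."""
    kind, *flags = target_str.split()
    options = {}
    for flag in flags:
        key, _, value = flag.lstrip("-").partition("=")
        # Coerce the common boolean spellings; keep anything else as a string.
        if value in ("True", "False"):
            options[key] = value == "True"
        else:
            options[key] = value
    return kind, options

kind, opts = parse_target_options(
    "tensorrt -use_fp16=True -implicit_batch_mode=False")
print(kind, opts)  # tensorrt {'use_fp16': True, 'implicit_batch_mode': False}
```

The point is just that every TRT option rides along on the single Target object, so downstream machinery (Collage included) only ever sees a list of Targets rather than Targets plus side-channel configuration.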

If the default TRT options are good then there’s no need for any additional targets at all:

  mod = partition_for_tensorrt(mod, params=params)
  exe = vm.compile(mod, target="cuda", params=params)

Thanks @comaniac.

The compilation phase changes look good. Could you maintain the runtime environment variable compatibility? Otherwise it will impact our existing users.

Good to know, will do, thanks.