Model parallism of inference for large model like GPT2 with TVM

hjiang · April 27, 2021, 8:08pm

this topic is very interesting, currently we have a pending RFC/PR([RFC] Compute graph pipeline with new subgraph executor) related Model parallelism , it not designed for model papalism but it do some of the work what model parallelism ask like horizon split model, pipeline execution, reduce memory requirement, cross device memory movement, with tvm RPC help, the device/target also can be distributed.

I think it should can help for large model deploy, after just by pass communication operator.