Can somebody shed some light on whether it's technically feasible to do model parallelism of inference for large models on TVM?
From my personal perspective, the following areas should be investigated:
- Whether the computation graph (e.g., captured via torch.jit.trace) can represent communication operators like all_gather.
- TVM would need to support an all_gather op for the target device, which means integrating with an underlying communication library such as MPI or NCCL.
- A launcher provided by the original framework (e.g., PyTorch) would be needed so that it can supply the communication context to each communicator (process).

Anything else?