[RFC][VTA] Support for Cloud Devices (OpenCL-compatible)

Hi,

I think the runtime support here (https://github.com/apache/incubator-tvm/pull/3554) covers uop and instruction sync via PCIe. However, if we want to run a full network (e.g., ResNet), we're still missing layer-wise synchronization/device_copy when two adjacent layers reside on different devices.

For example, in the above figure, we have to auto-insert a device_copy op between maxpool and conv2d.
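
To make the idea concrete, here is a minimal sketch in plain Python (not TVM code) of what such an auto-insertion pass could look like. It assumes the network is a simple linear list of layers, each already tagged with a device; the `insert_device_copies` helper, the `(op, device)` tuple layout, and the `"cpu"`/`"vta"` placement are all hypothetical and only for illustration.

```python
from typing import List, Optional, Tuple

# (op name, device label). Both are hypothetical labels, not TVM objects.
Layer = Tuple[str, str]


def insert_device_copies(layers: List[Layer]) -> List[Layer]:
    """Return a new layer list with a device_copy entry at every device boundary."""
    result: List[Layer] = []
    prev_device: Optional[str] = None
    for op, device in layers:
        if prev_device is not None and device != prev_device:
            # Adjacent layers are on different devices, so the intermediate
            # tensor has to be moved before the next layer can run.
            result.append(("device_copy", f"{prev_device}->{device}"))
        result.append((op, device))
        prev_device = device
    return result


# Assumed placement: maxpool on the host CPU, conv2d on the accelerator.
network = [("maxpool", "cpu"), ("conv2d", "vta"), ("relu", "vta")]
annotated = insert_device_copies(network)
for op, where in annotated:
    print(op, where)
```

With this assumed placement, one device_copy entry appears between maxpool and conv2d and none between conv2d and relu, since those two stay on the same device. A real pass would have to work on the graph IR rather than a flat list, so treat this only as an illustration of the boundary check.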