tvm.build() is for building an individual operator (or a few fused operators), while relay.build() builds the whole model and calls into tvm.build() under the hood. So to build a TF model, you need relay.build().
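For contrast, here is a minimal sketch of what tvm.build() operates on, using the TE (tensor expression) API of the TVM version discussed in this thread; the names n, A, B, C, vadd are just for illustration:

import tvm
from tvm import te

# One operator: elementwise vector add, expressed and scheduled by hand.
n = te.var("n")
A = te.placeholder((n,), name="A")
B = te.placeholder((n,), name="B")
C = te.compute(A.shape, lambda i: A[i] + B[i], name="C")
s = te.create_schedule(C.op)
# tvm.build() lowers and compiles just this one scheduled op.
vadd = tvm.build(s, [A, B, C], target="llvm", name="vadd")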
I’m trying to understand the relationship between relay.build() and tvm.build().
@vinx13, thanks for your reply. Does this mean a neural network imported from a certain framework, say MXNet (e.g. this tutorial), can only use relay.build() for full-model compilation? Is there a way to optimize further using tvm.build(), or is it already doing that under the hood? I would appreciate it if anyone could point out the relevant code. Thanks.
import numpy as np
import tvm
from tvm import relay

# x, block, and synset come from the tutorial's setup code (elided here).
shape_dict = {'data': x.shape}
mod, params = relay.frontend.from_mxnet(block, shape_dict)
## we want a probability so add a softmax operator
func = mod["main"]
func = relay.Function(func.params, relay.nn.softmax(func.body), None, func.type_params, func.attrs)

target = 'llvm'
with relay.build_config(opt_level=3):
    graph, lib, params = relay.build(func, target, params=params)
##########################
# Anything we can do here to further optimize individual operators/steps
# in the imported resnet using tvm.build()?
##########################
...
from tvm.contrib import graph_runtime
ctx = tvm.cpu(0)  # use the CPU context to match the 'llvm' target above
dtype = 'float32'
m = graph_runtime.create(graph, lib, ctx)
# set inputs
m.set_input('data', tvm.nd.array(x.astype(dtype)))
m.set_input(**params)
# execute
m.run()
# get outputs
tvm_output = m.get_output(0)
top1 = np.argmax(tvm_output.asnumpy()[0])
print('TVM prediction top-1:', top1, synset[top1])
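On the "under the hood" part of the question, one quick check is to inspect the lib that relay.build() returns: it is already a compiled runtime module, i.e. the product of an internal build of all the lowered operators. A sketch (output varies by target):

# lib is a tvm.runtime.Module holding the compiled operators.
print(lib.type_key)             # e.g. 'llvm' for the CPU target above
print(lib.get_source()[:1000])  # a peek at the generated host code/IR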
It seems to me that relay.build() and tvm.build() are processed through completely different paths in the TVM source repo. I would appreciate it if anyone could correct or confirm this. Thanks in advance.
The relay.build() call in Python quickly crosses over to the C++ side and is mainly processed in src/relay/backend/build_module.cc:
/*!
 * \brief Compile a Relay IR module to runtime module.
 *
 * \param relay_module The Relay IR module.
 * \param params The parameters.
 */
void BuildRelay(
    IRModule relay_module,
    const std::unordered_map<std::string, tvm::runtime::NDArray>& params) {
  // Relay IRModule -> IRModule optimizations.
  relay_module = Optimize(relay_module, targets_, params);  // <== various graph-level optimizations
  // Get the updated function.
  auto func = Downcast<Function>(relay_module->Lookup("main"));
  // Generate code for the updated function.
  graph_codegen_ = std::unique_ptr<GraphCodegen>(new GraphCodegen());
  graph_codegen_->Init(nullptr, targets_);
  graph_codegen_->Codegen(func);

  ret_.graph_json = graph_codegen_->GetJSON();
  ret_.params = graph_codegen_->GetParams();

  auto lowered_funcs = graph_codegen_->GetLoweredFunc();
  if (lowered_funcs.size() == 0) {
    LOG(WARNING) << "no lowered funcs exist in the compiled module";
  } else {
    ret_.mod = tvm::build(  // <== calls into tvm::build()???
        lowered_funcs,
        target_host_,
        BuildConfig::Current());
  }
  ...
}
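Incidentally, the Optimize(...) step above is also exposed to Python, so you can look at the graph-level result before codegen. A small sketch, assuming mod, target, and params from the MXNet example above and that your TVM version exports relay.optimize:

# Run only the graph-level (Relay -> Relay) optimizations.
opt_mod, opt_params = relay.optimize(mod, target=target, params=params)
print(opt_mod["main"])  # the optimized (e.g. operator-fused) function that codegen then lowers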
In contrast, the tvm.build(...) Python call is processed mainly on the Python side, in python/tvm/driver/build_module.py:
...
fhost_all = []
device_modules = []
for tar, flist in target_flist.items():
    fhost, mdev = _build_for_device(flist, tar, target_host)
    # Save the current lowered functions of the host and the device module.
    fhost_all += fhost
    device_modules.append(mdev)

# Generate a unified host module.
mhost = codegen.build_module(fhost_all, str(target_host))

# Import all modules.
for mdev in device_modules:
    if mdev:
        mhost.import_module(mdev)
return mhost
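The mhost.import_module(mdev) step is easy to see from user code when building a single op for a GPU target. A hedged sketch (names and the split factor are illustrative; running it needs a CUDA-enabled TVM build):

import tvm
from tvm import te

n = te.var("n")
A = te.placeholder((n,), name="A")
B = te.compute(A.shape, lambda i: A[i] * 2.0, name="B")
s = te.create_schedule(B.op)
# GPU targets require the iteration axes to be bound to thread/block indices.
bx, tx = s[B].split(B.op.axis[0], factor=64)
s[B].bind(bx, te.thread_axis("blockIdx.x"))
s[B].bind(tx, te.thread_axis("threadIdx.x"))
fmul = tvm.build(s, [A, B], target="cuda", target_host="llvm", name="fmul")
print(fmul)                   # the unified host module (llvm)
print(fmul.imported_modules)  # the CUDA device module imported into it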