Why "c --unpacked-api" not work anymore?

Hi @SebastianBoblestETAS,

Sorry about this: I added the deprecation logic to relay.build but not to tvm.build; I'll put a PR together with the warnings enabled there as well. Adding unpacked-api to the Graph executor options would involve adding support for running the operators with the correct arguments when the JSON is parsed, which leads me to think we shouldn't re-add the argument: the two aren't compatible, and doing so leaves users in a weird state. What is your motivation for not using the Ahead-of-Time executor here?

I agree that this is confusing. The motivation for not using the AOT executor is that I have my own AOT solution. An additional benefit of the graph executor is that I can analyze the graph.json outside of TVM. What would your proposal be? An additional drop_graph_to_json option for the AOT executor would probably also work.

Ah, I saw these updates only after I reviewed the PR @SebastianBoblestETAS put together. Perhaps it makes sense to agree on a strategy here and then iterate on the PR.

I agree the PR needs additional work in GraphExecutor to be viable. It could be somewhat challenging to build calls to the unpacked API at runtime within the constraints of the CRT.

We could also implement a graph export by either a) extracting it from the AOT-generated TIR or b) generating it during AOTExecutorCodegen. I think this might be a useful tool, but I’m also not convinced we should do this. Below are some thoughts about the overall problem.

Another approach would be for @SebastianBoblestETAS to parse the AOT-generated TIR instead of consulting the JSON. I think this might be a bit more complex for someone consuming the output of TVM, which seems worse. On the other hand, comparing the two approaches above, extracting from TIR is the more stable one in the long run, as additional TIR-to-TIR compiler passes (e.g. USMP and anything else that wants to operate whole-program) are added after AOT code generation. In that situation, we're essentially requiring ourselves to translate TIR into a JSON format and then document this transformation well enough that users could make sense of it. In that light, I wonder if it is easier to improve the docs of our TIR parsing library and make it easy for folks to consult the TIR rather than some kind of graph.json format.

We could also discuss this at the microTVM meetup on Wednesday, if folks are interested.

cc @jroesch @tqchen


Sure, a discussion in the microTVM meetup would be great!

Great, I've added that to the agenda.

Following some discussion, we think it makes sense to add a facility to export the pre-codegen TIR in Model Library Format. I previously created ModelLibraryFormatPrinter to print TIR using consistent variable names. It might be worth looking into tvm.module.Module.AsTVMScript() as well, since that output should parse better.
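For instance, a quick way to sanity-check the TVMScript route from Python (just a sketch, assuming mod holds the pre-codegen IRModule; IRModule.script() wraps the same script.AsTVMScript printer):

# Print the IRModule as TVMScript, which can be parsed back into TVM.
print(mod.script())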

@SebastianBoblestETAS could you take a look into this and create a new pre-RFC thread with a proposal?

@miraclezqc apologies, I think this thread got a bit hijacked. I think the conclusion here was that --unpacked-api is not expected to produce functional code with --executor=graph, because GraphExecutor doesn't know how to invoke the unpacked API. Did you have a different use for this? Sebastian also had a use for this that didn't follow the checked-in code path, so I'd like to be sure we address your concern as well.

Sure, thanks for the links so that I know where to start :grinning: It will take me a few days to get started on this, I'm afraid.


Hi, we had a first look into this. For our purposes it actually seems sufficient to add support for emitting the graph.json in the AOT executor. What is your opinion on that?

Two concerns about that:

  • maintenance of the JSON exporter. It would need to be robust to whatever type of TIR is generated by AOTExecutorCodegen. Since both the generator and the consumer are in the same codebase, it might not be that bad.
  • encoding the TIR program in JSON. For example, with USMP we would move to offset-into-memory-pool memory planning. However, graph.json expects a set of storage_id entries that correspond to buffers not necessarily placed in any particular section of memory, so graph.json could limit us there (see the sketch below for how storage_id appears today). We could define our own JSON format, or declare that we would adapt graph.json to accommodate the AOT program. We should make an RFC and solicit feedback from @manupa-arm and @mousius about that.
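To illustrate the second concern, here is a minimal sketch of how storage_id surfaces today, assuming module is the factory returned by relay.build with the graph executor:

import json

# get_graph_json() is the existing accessor on the graph executor factory.
graph = json.loads(module.get_graph_json())
# The [1] skips the "list_int" type tag in graph.json's attrs encoding.
storage_ids = graph["attrs"]["storage_id"][1]
# Node outputs that share a storage_id reuse one buffer; there is no notion
# of an offset into a fixed memory pool, which is the USMP concern above.
print(storage_ids)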

OK, great. I must admit that at first glance a general TIR exporter seems like a very complex task for me. Still, I would be glad to implement it once I have some guidance on exactly where to start.

Sorry for the long time that has passed. I would like to understand your proposal better so that I can finally start writing a pre-RFC.

Is it correct that you propose to subclass TVMScriptPrinter to create a TIR exporter? That looks manageable. We already have a CodeGenerator that does this to some extent for individual kernels. I have trouble figuring out where in the workflow I would create that instance and use it. Could you maybe give me a hint here? If this class then gets the TIR representation of the entire network after AOTExecutorCodegen, the topology of the network should be reconstructable from the input/output relations of the individual kernels, right?

Hey @SebastianBoblestETAS, sorry I missed your earlier message. Thinking about this a bit more, there are a couple of places this could be added.

Right now we generate a map ir_module_by_target in tvm.build. We export a similar thing in tvm.relay.build, but it's not present on all ExecutorFactory implementations, so we can't use it downstream.

Model Library Format currently exports the TIR representation of this using its own printer. If we had the IRModule from tvm.relay.build, we could export its TIR representation as well (e.g. in src/tir-N.txt). Ideally, we would first add the IRModule to the tvm.relay.build output, and then modify the Model Library Format artifact to contain that output as well.

Two caveats here:

  • Model Library Format can’t handle heterogeneous execution, so this might not work for you.
  • There might be some limitations with TVMScript (cc @junrushao @shingjan), but based on initial discussion with them we think this should work fine.

Maybe you could try doing the first bit and see whether it's useful to you and permits the approach you described above (that's indeed what I was suggesting: analyze the TIR to construct the producer/consumer relationships between tensors). If so, we could explore adding it to Model Library Format.
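To make that concrete, here is a hedged sketch of both steps. It assumes module is the factory returned by tvm.relay.build, and that operator calls in the lowered TIR appear as tir.Call nodes whose arguments include the tensor handles; the exact call form can vary between TVM versions, so treat this as a starting point rather than a definitive recipe:

from tvm import micro
from tvm.tir import Call, Var, stmt_functor

# Step 1: export Model Library Format; the tar contains the generated
# sources plus a TIR dump under src/.
micro.export_model_library_format(module, "model.tar")

# Step 2: walk a PrimFunc and record which tensor vars each operator call
# touches; calls that share a var give the producer/consumer edges.
def collect_tensor_uses(prim_func):
    uses = []

    def visit(node):
        if isinstance(node, Call):
            handles = [a for a in node.args if isinstance(a, Var)]
            if handles:
                uses.append((node.op, handles))

    stmt_functor.post_order_visit(prim_func.body, visit)
    return uses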


If there is any bug in TVMScript, please feel free to open an issue with a minimal reproducible example :slight_smile:

For me, target="c --unpacked-api=1" in tvm.build is also not working on the latest main commit. It does work when I use relay.build, though. Is this the expected behavior?

Hi, we are currently working on a prototype of the TIR export functionality. After pulling the latest changes from main, we are no longer able to create C code with the AOT executor. We tried many different options, and they all fail.

This works:

target = "c" executor = "graph" executor_options = {}

with tvm.transform.PassContext(
    opt_level=opt_level,
    config={
        "tir.disable_vectorize": True,
    },
):
    module = relay.build(
        loaded_model.mod,
        executor=tvm.relay.backend.Executor(name=executor, options=executor_options),
        target=tvm.target.Target(
            target,
            host=target,
        ),
        params=loaded_model.params,
    )
source = module.lib.get_source()

This

target = "c" executor = "aot" executor_options = {}

and this

target = "c" executor = "aot" executor_options = {"unpacked-api": 0}

and this

target = "c" executor = "aot" executor_options = {"unpacked-api": 0, "interface-api": "packed"}

all fail with

File "$HOME/MyGithub/tvm/src/runtime/module.cc", line 104 TVMError: Module[metadata_module] does not support GetSource

This

target = "c" executor = "aot" executor_options = {"unpacked-api": 1}

and this

target = "c" executor = "aot" executor_options = {"unpacked-api": 1, "interface-api": "packed"}

fail with

File "$HOME/MyGithub/tvm/src/relay/backend/aot_executor_codegen.cc", line 921
TVMError:

An error occurred during the execution of TVM. For more information, please see: Handle TVM Errors — tvm 0.9.dev0 documentation

Check failed: (static_cast<bool>(use_unpacked_api_) == false) is false: Need unpacked-api == false (got: 1) and interface-api == "packed" (got: packed) when targeting c++ runtime

This target = "c --unpacked-api" executor = "aot" executor_options = {}

fails with

File "/home/bbo9fe/MyGithub/tvm/python/tvm/_ffi/_ctypes/packed_func.py", line 260, in __init_handle_by_constructor__ raise get_last_ffi_error() tvm._ffi.base.TVMError: TVMError: Executor "graph": Attribute "unpacked-api" not in "[link-params]".

(On our branch the error message is more detailed than on main, but that is the only difference in our code.)

The reason for this is that in python/tvm/relay/build_module.py, in the function _reconstruct_from_deprecated_options(deprecated_params_target), the executor is replaced by a graph executor if deprecated_executor_args contains any elements. Is this behavior intended? In our opinion it is hard for users to understand what happens here.

This target = "llvm" executor = "aot" executor_options = {"unpacked-api": 0, "interface-api": "packed"}

fails with

4: tvm::codegen::CodeGenLLVM::VisitStmt_(tvm::tir::SeqStmtNode const*)
3: tvm::tir::ExprFunctor<llvm::Value* (tvm::PrimExpr const&)>::VisitExpr(tvm::PrimExpr const&)
2: tvm::codegen::CodeGenLLVM::VisitExpr_(tvm::tir::CallNode const*)
1: tvm::codegen::CodeGenCPU::CreateIntrinsic(tvm::tir::CallNode const*)
0: tvm::codegen::CodeGenLLVM::CreateIntrinsic(tvm::tir::CallNode const*)
File "$HOME/MyGithub/tvm/src/target/llvm/codegen_llvm.cc", line 1091
TVMError: unknown intrinsic Op(tir.tvm_call_cpacked_lowered)

This error message is hard to understand:

Check failed: (static_cast<bool>(use_unpacked_api_) == false) is false: Need unpacked-api == false (got: 1) and interface-api == "packed" (got: packed) when targeting c++ runtime

Why is the C++ runtime referenced in aot_executor_codegen.cc?

The current behavior is really hard to understand. We would like to work on this issue but do not have a deep enough technical understanding, so any input is highly welcome.


Hi @SebastianBoblestETAS, can you try building with runtime = "crt"? This should mirror the previous AOT configuration. PR 10283 added support for the C++ runtime, so your previous config may actually have been under-specified (but TVM was probably not complaining properly before). You should be able to use the previous options with runtime = "crt".

Hi @areusch, thanks for helping me here. I can actually run everything now with

runtime=tvm.relay.backend.Runtime("crt"),
executor=tvm.relay.backend.Executor(name="aot", options={"unpacked-api": 1})

However, source = module.lib.get_source() now gives me only the main function:

#include "tvm/runtime/c_runtime_api.h" #ifdef __cplusplus extern "C" { #endif TVM_DLL int32_t tvmgen_default___tvm_main__(void* conv1d_input,void* output0); int32_t tvmgen_default_run(void* args, void* type_code, int num_args, void* out_value, void* out_type_code, void* resource_handle) { return tvmgen_default___tvm_main__(((DLTensor*)(((TVMValue*)args)[0].v_handle))[0].data,((DLTensor*)(((TVMValue*)args)[1].v_handle))[0].data); } #ifdef __cplusplus } #endif

The code for the kernels is no longer there, and I do not know how to find it. Maybe a little context on what we want to do is helpful: we are currently working on a prototype of the TIR exporter we recently discussed. We added a new function get_json() to Module.

The goal is to be able to extract the full TIR representation of the host module and all device modules, regardless of the specific executor, target, or runtime.

We are currently writing a JsonScriptPrinter class similar to the TVMScriptPrinter class. We use it in driver_api.cc in the build function like this:

const auto* bf = runtime::Registry::Get("script.AsJSONScript");
std::string jsonTir = (*bf)(mhost_all, "T", false);
mhost->SetJson(jsonTir);
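On the Python side, retrieval would then mirror the existing get_source() (note that get_json() is our prototype addition, not an existing TVM API):

# Hypothetical accessor from our prototype, analogous to get_source().
json_tir = module.lib.get_json()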

Currently we think this is a good approach to provide an API for this, but it can be discussed in an RFC once we have a working version. For the moment it would be sufficient to regain access to the kernel code for executor = Executor("aot"). For executor = Executor("graph") everything works fine. @UlrikHjort @Khoi @MJKlaiber


We discussed this a bit at the TVM Community Meeting this morning:

  • Sebastian raised three concerns:

    1. Can we add a clean set of unit tests to document how tvm.relay.build is called and catch API changes?
    2. Error messages are not detailed enough. The error message Sebastian received didn’t make sense unless he already knew in advance that the default executor was graph.
    3. Deprecated parameters are not handled in an ideal way. It's hard to know what the available options are.
  • We discussed that this breakage was likely due to relay.build allowing invalid build options in the past. PR 10283 tightened the set of allowed options.

  • We gave some context on PR 9352, which originally introduced these runtime and executor parameters. This change is a half-step toward the desired end state discussed in [pre-RFC] Compilation Configuration Representation. That RFC needs to be resolved before we can proceed here. We want to resolve this before releasing 0.9.

  • In the meantime, it seems like we should fix up _reconstruct_from_deprecated_options to provide better error messages, since a few different folks have run into problems here.

  • @gromero raised that tvmgen_default_run_model was renamed to tvmgen_default___tvm_main__, and this was an externally-visible change. This shouldn't have been an API change, but @gromero was working around a limitation in AOT where --link-params=0 was not supported. @gromero will reproduce this and raise a GH issue to capture it.


@SebastianBoblestETAS glad you were able to get this working again.

I think we changed the module hierarchy with that PR. You can explore the module hierarchy like so:

def print_mod_tree(m, indent=0):
    print(f"{' ' * indent} - {m!r}")
    for i in m.imported_modules:
        print_mod_tree(i, indent + 2)
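
For example, starting from the factory returned by relay.build in the earlier snippets:

print_mod_tree(module.lib)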

You might need to pick through imported_modules to find the kernels you’re looking for.

I think it would be great to discuss this use case with folks working more closely on TIR. But, in order to avoid hijacking this thread away from the tvm.relay.build API incompatibilities, could you start another thread and outline your use case there? This could be a "pre-RFC" (but a very simple one: no need to follow the RFC template, just give a brief summary of your motivations, goals, and approach). I can cc some relevant folks there.