CodeGenCHost and CodeGenCBase and relation to internal and external compilers?

aca88 · September 17, 2020, 7:40am

Hello all,

While studying the DNNL external compiler, I came across differences in how the external (DNNL) and the internal (target='c') C code is generated:

- In the codegen triggered by tvm.build(..., target='c'), the process eventually creates a CodeGenCHost object. The code stream of this object is then passed to the CSourceModuleCreate function, but with only 2 arguments.
- In the DNNL external compiler example, CodegenDNNL derives from CodegenCBase, and the CSourceModuleCreate constructor is fetched from the registry and called with 4 arguments.

Q1: Why does it seem to be two different “base” C codegen classes?

Q2: More generally, what exactly does CodeGenCHost imply, such that CodegenDNNL could not simply derive from it? In other words, when is it wrong to derive an external compiler from CodeGenCHost?

Note that some of the includes in CodegenDNNL are also included in CodeGenCHost.

Q3: Why is CSourceModuleCreate in CodeGenCHost given only 2 arguments?

Thanks

ramana-arm · September 17, 2020, 8:17pm

@comaniac @lhutton1 - maybe you could help?

comaniac · September 18, 2020, 12:16am

For Q1, sorry for the confusion. CodegenCBase is not related to CodeGenC at all; it is the base class we created for BYOC.

For Q2, CodegenDNNL does not have to worry about the C host as it only processes subgraphs.

For Q3, CSourceModuleCreate takes 4 arguments, but the last two are optional.

Source (github.com): apache/incubator-tvm/blob/eacfe890669d026c3d3aea4d03f4f773819242dd/src/target/source/codegen_source_base.h#L143

```cpp
runtime::Module SourceModuleCreate(std::string code, std::string fmt);

/*!
 * \brief Create a C source module for viewing and compiling GCC code.
 * \param code The code to be viewed.
 * \param fmt The code format.
 * \param symbol The symbol that the c source module represents.
 * \param const_vars. The constant variables that the c source module needs.
 * \return The created module.
 */
runtime::Module CSourceModuleCreate(const String& code, const String& fmt,
                                    const String& symbol = "",
                                    const Array<String>& const_vars = {});

/*!
 * \brief Wrap the submodules in a metadata module.
 * \param params The variable to constant mapping that is collected by the host
 *        module.
 * \param dso_module The host module to be wrapped.
 * \param modules The modules to be wrapped.
 * \return The wrapped module.
 */
// ... (excerpt ends here)
```
@zhiics may comment more on the details.
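Because `symbol` and `const_vars` have default values in the declaration, the 2-argument call in CodeGenCHost is legal C++. One plausible reason the DNNL codegen passes all four is that it fetches the constructor through the registry, where C++ default arguments no longer apply. The toy below illustrates only that language-level effect; `ToyModule`, `ToyCSourceModuleCreate`, `ToyRegistry`, the key `"toy.CSourceModuleCreate"`, and the symbol `"dnnl_0"` are hypothetical stand-ins, not TVM APIs.

```cpp
#include <cassert>
#include <functional>
#include <map>
#include <string>
#include <vector>

// Hypothetical stand-in for runtime::Module: it only records its inputs.
struct ToyModule {
  std::string code;
  std::string fmt;
  std::string symbol;
  std::vector<std::string> const_vars;
};

// Same parameter shape as the CSourceModuleCreate declaration quoted above:
// the last two parameters have defaults, so a direct call may omit them.
ToyModule ToyCSourceModuleCreate(const std::string& code, const std::string& fmt,
                                 const std::string& symbol = "",
                                 const std::vector<std::string>& const_vars = {}) {
  return ToyModule{code, fmt, symbol, const_vars};
}

// Hypothetical name-keyed registry of type-erased functions. Default arguments
// are not part of a function's type, so they are lost once the function is
// stored in a std::function: registry callers must pass all four arguments.
using ToyCreateFn = std::function<ToyModule(
    const std::string&, const std::string&, const std::string&,
    const std::vector<std::string>&)>;

std::map<std::string, ToyCreateFn>& ToyRegistry() {
  static std::map<std::string, ToyCreateFn> registry;
  return registry;
}

void RegisterToyCreate() {
  ToyRegistry()["toy.CSourceModuleCreate"] = ToyCSourceModuleCreate;
}
```

With this setup, `ToyCSourceModuleCreate(code, "c")` compiles, while calling the registry entry with only two arguments does not, since the `std::function` exposes exactly the four declared parameters.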

aca88 · September 18, 2020, 6:06am

Thanks for the comments

I looked further into the code and this is what I came up with:

CodeGenC (the base class of CodeGenCHost) also deals with subgraphs. The main difference between the two kinds of subgraphs (AFAIK) is that for DNNL they are in Relay, while for CodeGenC they are in TIR, because they have gone through the standard Relay → TOPI → TE → TIR lowering process. BYOC is an alternative to that process. Therefore, a BYOC C generator should not derive from CodeGenC, since its inputs are not in TIR form (CodeGenC's node-visiting routines expect TIR nodes).
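The reasoning above can be pictured as two generators that each accept only one IR. This is a deliberately simplified sketch assuming toy node types: `ToyTirStore`, `ToyRelayCall`, `ToyTirCodeGen`, `ToyRelayCodeGen`, and the emitted `library_call(...)` string are all hypothetical and far simpler than the real TIR, Relay, CodeGenC, or CodegenCBase classes.

```cpp
#include <cassert>
#include <stdexcept>
#include <string>
#include <variant>

// Hypothetical toy IR nodes; real TIR and Relay node types are much richer.
struct ToyTirStore { std::string buffer; int index; };  // TIR-like: explicit buffer access
struct ToyRelayCall { std::string op; };                // Relay-like: call to an operator
using ToyNode = std::variant<ToyTirStore, ToyRelayCall>;

// Stand-in for a CodeGenC-style generator: it only knows how to print TIR-like nodes.
struct ToyTirCodeGen {
  std::string Emit(const ToyNode& node) const {
    if (const auto* store = std::get_if<ToyTirStore>(&node)) {
      return store->buffer + "[" + std::to_string(store->index) + "] = ...;";
    }
    throw std::invalid_argument("TIR codegen cannot visit a Relay node");
  }
};

// Stand-in for a BYOC generator (the role CodegenCBase plays for CodegenDNNL):
// it maps each Relay-like call onto a call into a hypothetical external library.
struct ToyRelayCodeGen {
  std::string Emit(const ToyNode& node) const {
    if (const auto* call = std::get_if<ToyRelayCall>(&node)) {
      return "library_call(\"" + call->op + "\");";
    }
    throw std::invalid_argument("BYOC codegen expects Relay nodes");
  }
};
```

Deriving the Relay-side generator from the TIR-side one would inherit visitors that never match its inputs, which is the mismatch the paragraph above describes.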

Yeah, I was aware that the last two were optional, but I was wondering why CodeGenCHost doesn't set the other arguments. I have a hypothesis, but I'm unsure how correct it is, since it only concerns the third argument, symbol:

When generating C code with CodeGenCHost, the process collects all operators beforehand and bundles them into a single call to CodeGenCHost. This means the resulting runtime::Module can have the empty symbol '', since it is assumed that there will not be a second module with that name. In BYOC, and therefore in the DNNL example, each subgraph is sent to CodegenDNNL separately, generating n runtime::Modules. If they were all given the same symbol, the compiler would throw a name-collision error.
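The hypothesis above can be modeled with a toy symbol table. This only mirrors the reasoning in this post, not TVM's actual bookkeeping: `ToySymbolTable`, `BuildHostFlow`, `BuildByocFlow`, and the names `"dnnl_0"`/`"dnnl_1"` are hypothetical.

```cpp
#include <cassert>
#include <cstddef>
#include <map>
#include <stdexcept>
#include <string>
#include <vector>

// Hypothetical table of generated modules keyed by their symbol.
class ToySymbolTable {
 public:
  void Add(const std::string& symbol, const std::string& code) {
    // emplace() returns {iterator, false} when the key already exists.
    if (!modules_.emplace(symbol, code).second) {
      throw std::runtime_error("duplicate module symbol: '" + symbol + "'");
    }
  }
  std::size_t size() const { return modules_.size(); }

 private:
  std::map<std::string, std::string> modules_;
};

// Host flow (per the hypothesis): all operators are bundled into ONE module,
// so a single symbol, even an empty one, can never collide.
ToySymbolTable BuildHostFlow(const std::vector<std::string>& ops) {
  std::string bundled;
  for (const auto& op : ops) bundled += "/* " + op + " */\n";
  ToySymbolTable table;
  table.Add("", bundled);
  return table;
}

// BYOC flow (per the hypothesis): each partitioned subgraph becomes its own
// module, so each needs a distinct symbol, e.g. its external function name.
ToySymbolTable BuildByocFlow(const std::vector<std::string>& subgraph_names) {
  ToySymbolTable table;
  for (const auto& name : subgraph_names) {
    table.Add(name, "/* code for " + name + " */");
  }
  return table;
}
```

Giving every BYOC module the same symbol (for example, `''` twice) makes `Add` throw, which is the collision the hypothesis predicts.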

What this hypothesis does not answer is:

Why, in the CodeGenCHost case, is symbol=''? How does the runtime know that the module containing all the routines is called '' and not some other string?
- In the BYOC example, the symbol is the name of the Relay composite function that was partitioned out of the original Relay program. This makes sense: the Relay program calls that function, so using the same name as the symbol gives a simple mapping.
- By the same logic, I would expect the host module to have the default name 'main'.
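One possible model, stated purely as an assumption and not confirmed in this thread: if the runtime resolves calls by function name within a module and its imported modules, the module-level symbol would not need to match anything for host code to be found. The toy below sketches only that model; `ToyLookupModule`, `GetFunction` as written here, and the names `"fused_add"`/`"dnnl_0"` are hypothetical and are not the TVM runtime API.

```cpp
#include <cassert>
#include <functional>
#include <map>
#include <memory>
#include <string>
#include <vector>

// Hypothetical toy module: a symbol, the functions it defines, and imported modules.
struct ToyLookupModule {
  std::string symbol;
  std::map<std::string, std::function<int(int)>> functions;
  std::vector<std::shared_ptr<ToyLookupModule>> imports;

  // Resolve a call by FUNCTION name: search this module first, then its imports.
  // In this toy model the module-level symbol plays no part in the lookup.
  const std::function<int(int)>* GetFunction(const std::string& name) const {
    auto it = functions.find(name);
    if (it != functions.end()) return &it->second;
    for (const auto& imported : imports) {
      if (const auto* fn = imported->GetFunction(name)) return fn;
    }
    return nullptr;
  }
};
```

Under this model, a host module with symbol `''` still exposes its functions by name, and an imported BYOC module is reached through the function name it defines. It does not settle how the real runtime uses `symbol`; it only shows one arrangement in which an empty host symbol would be harmless.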