Hi,
I have a small test case where I do a 2-d matrix addition.
import tvm
A = tvm.placeholder((4, 16), name='input1')
B = tvm.placeholder((4, 16), name='input2')
C = tvm.compute((4, 16),
                lambda m, n: A[m, n] + B[m, n],
                name='output')
I tile on the leading dimension by 2 and parallelize as follows:
I then match the inner matrix add with a tensorize routine:
And I add my own C function that does the matrix add, registered as a packed func.
I print the addresses of the pointers passed into the routine. With a single thread (TVM_NUM_THREADS=1), the pointers are offset correctly before being passed to the function.
But with two or more threads, the input pointers are not offset for any thread other than the first one.
After tensorize:
produce output {
  parallel (m.outer, 0, 2) {
    tvm_call_packed("test_intrin", tvm_address_of(input1[(m.outer*32)]), tvm_address_of(input2[(m.outer*32)]), tvm_address_of(output[(m.outer*32)]))
  }
}
In the IR above, tvm_address_of points to the right offset in the buffer. Can the implementation of tvm_address_of be modified so that it also yields the right offset at run time, taking parallelized loops into account?
OK, this might be caused by a bug in TVM code generation for packed function calls inside a parallel body. Can you try swapping call_packed for call_extern for now, and directly provide an extern "C" function with the same signature? This will get around the problem.
Can you give an example of how to use call_extern, please? I'm passing the output in as a pointer to the external function, so I have a void return type. Is this supported?
Some questions:
Do you use TVM_REGISTER_GLOBAL to register the extern function?
How do you export the extern function? By loading a .so?
Do you need get_global_func?
Are the arguments to the function C++ datatypes, or TVMArgValues as for a packed func?
For a normal extern function, you don't need to register it. It is enough to make sure it is exported as extern "C" and is available in the runtime. You cannot get such functions through tvm.get_global_func, because they are typed C functions rather than packed funcs.
Thank you for the response. The issue I’m facing is that I want to use call_extern to invoke a function in a dynamic .so library which I link by using tvm.module.load.
In your approach you implement the extern function as LLVM inline assembly, but in my case the function is implemented in the dynamic .so library and I want to call that… Is there a way to do this?
The .so library is an external library which I have compiled using a C++ compiler. I load it using tvm.module.load to expose it to the TVM runtime, but the external function implemented in the library is not resolved, as indicated by the error I reported earlier.
Please check out the example here: https://github.com/dmlc/tvm/pull/2156. As I said in my last post, it is fine as long as the .so library is loaded in the TVM runtime.
However, we do need to load the .so as an RTLD_GLOBAL object, so that its symbols are visible to other .so libraries. Alternatively, you can re-expose the function in the TVM runtime and have the TVM runtime link against your DLL.
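One way to get RTLD_GLOBAL loading from Python is the standard ctypes module. This is a sketch rather than the method used in the linked PR. The helper name load_library_globally and the path libmyadd.so are placeholders that do not come from this thread, and it assumes the library is loaded before the TVM-generated module that calls into it.

```python
import ctypes


def load_library_globally(path):
    """Load a shared library with RTLD_GLOBAL so that libraries loaded
    afterwards can resolve its exported symbols."""
    return ctypes.CDLL(path, mode=ctypes.RTLD_GLOBAL)


# Hypothetical usage; "./libmyadd.so" is a placeholder path:
# lib = load_library_globally("./libmyadd.so")
```

The function in the library still has to be exported with extern "C" so that its symbol name is not mangled by the C++ compiler.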