Hi there,
I am trying to run llama-2 through mlc_chat_cli with the VirtualMachineProfiler in order to get per-op profiling.
Judging from the reply here, this should be possible.
So far, I have changed the following:
// Load the executable into the profiling VM instead of the regular one:
auto fload_exec = executable->GetFunction("vm_profiler_load_executable");
// [...]
// Grab the profiler's "profile" entry point from the VM module:
PackedFunc profile_func_ = vm_->GetFunction("profile");
// [...]
Then, to profile an operation, my understanding (from here) is that I need to call the profile function with the name of the function I want to profile as the first argument, followed by the arguments that function normally takes. However, so far I have had no real success.
I have tested this against the decode function as well as softmax_with_temperature, but I keep running into issues with the passed arguments.
The code I am trying looks like this:
NDArray Softmax(NDArray input, float temperature) {
  // [...]
  tvm::runtime::profiling::Report report =
      profile_func_("softmax_with_temperature", input, temperature_arr);
  std::cout << report->AsTable() << "\n";
  // [...]
}
The error I am getting is:
mlc-llm/3rdparty/tvm/include/tvm/runtime/packed_func.h:1890: InternalError: Check failed: type_code_ == kTVMObjectHandle (11 vs. 8) : expected Object but got str
My questions:
- Is there a way to get a per-op report for the whole execution graph? If so, how?
- If not, do I need to do the profiling per PackedFunc exposed through the executable?
- In that case, am I doing something wrong with respect to the passed arguments?
Thanks in advance.