Also, if TVM enables int4 computation somehow, it would need to be simulated in software since int4 is not a native CPU primitive type. Adding this implementation may require some tedious handling such as promotion, truncation, overflow detection, etc. In that case, I think implementing arbitrary-precision type handling is more reasonable than an int4-specific one; it would make it easier for others to support additional precisions and more specialized hardware backends.
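To make the tedium concrete, here is a minimal sketch (hypothetical helper names, not TVM code) of what software-simulated int4 arithmetic involves: promote to a native integer, do the operation, optionally detect overflow, and truncate back with two's-complement wrap-around:

```python
INT4_MIN, INT4_MAX = -8, 7  # two's-complement range of a 4-bit integer

def wrap_int4(x):
    """Truncate an integer to 4 bits with two's-complement wrap-around."""
    return ((x + 8) & 0xF) - 8

def add_int4(a, b, detect_overflow=True):
    """Add two int4 values by promoting to a native int first."""
    assert INT4_MIN <= a <= INT4_MAX and INT4_MIN <= b <= INT4_MAX
    wide = a + b  # promotion: exact addition in the wider type
    if detect_overflow and not (INT4_MIN <= wide <= INT4_MAX):
        raise OverflowError(f"int4 overflow: {a} + {b} = {wide}")
    return wrap_int4(wide)  # truncation back to 4 bits
```

Every operator (mul, shift, compare) needs the same promote/truncate dance, which is exactly why a generic arbitrary-precision framework would pay off over hand-writing it per width.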
Intel’s method also requires a calibration dataset.
I know the calibration needs a dataset, but it does not have to be the one that was used to train the model. I think this approach is similar to the quantization TVM currently supports (KL divergence mode).
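For readers unfamiliar with that mode: KL-divergence calibration only needs unlabeled sample inputs, because it compares the distribution of observed activations against a clipped/quantized version of the same distribution and picks the clipping threshold that distorts it least. The core measure is just this (a generic sketch, not TVM's implementation):

```python
import math

def kl_divergence(p, q):
    """KL(P || Q) between two discrete distributions given as lists of
    probabilities. Returns inf if Q has no mass somewhere P does."""
    kl = 0.0
    for pi, qi in zip(p, q):
        if pi > 0:
            if qi == 0:
                return math.inf  # Q cannot represent this bin at all
            kl += pi * math.log(pi / qi)
    return kl
```

Calibration then histograms the activations from the (unlabeled) sample batches and evaluates this measure for each candidate threshold, so no training labels are involved at any point.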
We can consider supporting the Bfloat16 data type natively. More and more hardware will support this type starting this year. Because it is not a standard type in generic programming languages (C++ / NumPy) or traditional compilers (GCC / LLVM), developers face a challenge when writing kernels with it. It would be a huge advantage if TVM could generate Bfloat16 kernels efficiently, and TVM is well designed to do this.
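For context, Bfloat16 is simply float32 with the low 16 mantissa bits dropped (same exponent range), which is why software handling is tractable even without native compiler support. A minimal sketch of the two conversions, using round-to-nearest-even on narrowing (function names are my own):

```python
import struct

def f32_bits(x):
    """Reinterpret a Python float as its IEEE-754 float32 bit pattern."""
    return struct.unpack("<I", struct.pack("<f", x))[0]

def to_bf16(x):
    """float32 -> bfloat16 bit pattern, round-to-nearest-even."""
    bits = f32_bits(x)
    # Rounding bias: 0x7FFF plus the lowest kept bit implements ties-to-even.
    bits += 0x7FFF + ((bits >> 16) & 1)
    return (bits >> 16) & 0xFFFF

def bf16_to_f32(h):
    """Widen a bfloat16 bit pattern back to float32 (always exact)."""
    return struct.unpack("<f", struct.pack("<I", (h & 0xFFFF) << 16))[0]
```

Plain truncation (just `bits >> 16`) also works and is what some hardware does, at the cost of a small systematic rounding error.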
Thanks for all the suggestions so far, we will summarize the roadmap near the end of the month. The current goal is to aim for a three month timeframe (April).
Besides the set of features we want to see, we would love to hear what our community members want to work on (either new proposals or some of these existing ones). It would help us coordinate and estimate the feasibility of these items.
I will mainly work on the automated quantization project in the next three months, see the RFC for details:
In summary, I hope that with this project we can provide an easy-to-use quantization tool for users, one that can be adapted to different models and different hardware quickly.
It would be very helpful to support dynamic batch size.
Thanks everyone for a great discussion, a draft roadmap is posted here https://github.com/apache/incubator-tvm/issues/4845
Can you add NVDLA to TVM/VTA too, as a milestone?
I see an increasing demand for replacing the Relay recursive visitors/mutators with non-recursive ones (due to the stack limit). Do you think it is doable in v0.7?
I am against this idea. Let me explain.
It is definitely doable using continuation-passing style / a trampoline.
However, they require rewriting the code in a much uglier manner. Also, since most calls are not tail calls, there won't be much memory saving; we would only trade stack space for discontinuous heap space.
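For readers following along, the CPS/trampoline rewrite being discussed looks roughly like this toy sketch (not Relay code): every step returns a thunk instead of recursing, and a driver loop keeps invoking thunks, so the native stack stays flat while the continuation chain lives on the heap:

```python
def count_nodes(expr):
    """Count the nodes of a deeply nested unary chain without native recursion.

    `expr` is a toy IR node: either None (empty) or a 1-tuple wrapping its child.
    """
    def step(e, k):
        # Return a thunk instead of recursing; the driver loop will call it.
        if e is None:
            return lambda: k(0)
        return lambda: step(e[0], lambda n: lambda: k(n + 1))

    result = []
    thunk = lambda: step(expr, lambda n: result.append(n))
    while callable(thunk):   # trampoline: bounce until a non-thunk comes back
        thunk = thunk()
    return result[0]
```

This handles a chain 100,000 nodes deep without a RecursionError, but it illustrates the trade-off above: every continuation is a heap-allocated closure, and the code is considerably harder to read than the direct recursive version.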
A better solution imo is to call setrlimit(RLIMIT_STACK, &R);
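In Python land the equivalent knob is the stdlib `resource` module (Unix only). A sketch of raising the soft limit toward the hard limit; note that on most platforms a raised RLIMIT_STACK only takes full effect for newly created threads, so it is commonly paired with `threading.stack_size` and `sys.setrecursionlimit`:

```python
import resource

def raise_stack_limit(nbytes=512 * 1024 * 1024):
    """Raise the soft stack limit toward the hard limit (Unix only).

    The soft limit can be raised freely up to the hard limit without
    privileges; going beyond the hard limit would require root.
    """
    soft, hard = resource.getrlimit(resource.RLIMIT_STACK)
    new_soft = nbytes if hard == resource.RLIM_INFINITY else min(nbytes, hard)
    resource.setrlimit(resource.RLIMIT_STACK, (new_soft, hard))
```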
How is GNN going? I am thinking of doing some program optimization in GNN training.
but increasing stack size might be against internal security policies.
There is the raw IRVisitor, which doesn't recurse. The difficulty is migrating all the passes… I imagine whoever is against setrlimit can help by migrating the passes.
How about we aim to have at least an RFC and the infra landed in the next release cycle?
cc @jroesch @mbrookhart, I agree that it is important to introduce a non-recursive version for most cases, in particular the PostOrderRewriting case, where we can visit the dataflow part of the Expr and use a callback to rewrite it. As long as we manually manage the stack, there won't be a stack overflow problem.
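To illustrate what a manually managed stack looks like (a generic sketch, not the actual Relay PostOrderRewriting implementation): a post-order rewrite can be driven by an explicit Python list, memoized so shared DAG nodes are rewritten only once:

```python
def post_order_rewrite(root, children, rewrite):
    """Post-order rewrite with an explicit stack instead of native recursion.

    `children(node)` lists a node's inputs; `rewrite(node, new_children)`
    builds the replacement node from its already-rewritten children.
    """
    memo = {}                      # id(old node) -> rewritten node
    stack = [(root, False)]        # (node, children already expanded?)
    while stack:
        node, expanded = stack.pop()
        if id(node) in memo:
            continue
        if expanded:
            # All children were processed first, so the memo is complete.
            new_kids = [memo[id(c)] for c in children(node)]
            memo[id(node)] = rewrite(node, new_kids)
        else:
            stack.append((node, True))   # revisit once children are done
            for c in children(node):
                if id(c) not in memo:
                    stack.append((c, False))
    return memo[id(root)]
```

Because the traversal state lives in a plain list on the heap, the expression depth is bounded by available memory rather than by the native stack.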
I agree that it is super ugly (and a great amount of work) to migrate, but it is an existential problem when we want to optimize a medium-sized network. I also agree that setrlimit is a good workaround if I am working alone on my personal laptop. However, industrial settings may require a different solution, as @yzhliu has mentioned.
If there is a better approach (less ugly, less work) to manually managing the stack, I think I would vote for it. So, why not think about it?
I am open to a better solution than CPS; I just personally don't know of any.
I hope a name argument can be added to Relay ops. The absence of a name makes debugging difficult and loses the connection with ops in frontend frameworks.