As discussed by the TVM PMC, our goal is to provide a monthly summary of the project so users and developers can get a better understanding of the goings on of the TVM community.
Feedback and suggestions are welcome so that we can further improve these updates.
Community
- #17018 - New committer: Balint Cristian
RFCs
Add the Khronos Neural Network Exchange Format (NNEF) as a frontend to TVM Relay and Relax (link to the discussion).
- #108 - [RFC] Add NNEF frontend
We continue to improve Relax, TIR, Frontend and other Runtimes.
Disco
- #17035 - [QoL] Implement broadcast/scatter methods for Session
- #16997 - [Runtime]Restore checks for hangup of disco pipe
- #16992 - [Bugfix]Handle NDArray larger than OS buffer for pipe
- #16978 - Implement `num_workers` property for `disco.Session`
- #16989 - Treat hangup of disco worker process as kShutdown
- #16993 - Allow allocation that only exists on worker0
- #16979 - Expose disco.Session.shutdown through the python API
Dlight
- #17026 - Perf improvement for low_batch_gemv on Metal
- #17016 - Update Adreno GEMV Rules
- #16972 - [GPU] Enhance opencl thread limit for schedules
- #16973 - [GPU] Improved gemv outer fallback schedule
- #16958 - Check for target in function attributes
Frontend
- #16961 - [Bugfix][ONNX] Improve broadcast and batch_matmul conversion
LLVM
- #16966 - [SVE] Add support for representing and creating buffer-level predicates
- #17001 - [SVE] Use only powers of two as possible vscale values
- #16962 - [SVE] Add codegen support for `vscale_range()` function attribute
- #16968 - Stringref API deprecation fixes
- #16965 - [SVE] Add get_active_lane_mask builtin
Metal
- #17025 - Support metal device profiling
ROCm
- #17037 - [Runtime]Enable ROCm host memory support
Relax
- #17033 - [Bugfix] Apply FuseOps to nested DataflowBlock
- #17032 - [Bugfix] Annotate ComputePrimValue output as host function
- #17034 - [Bugfix] Bind symbolic variables in R.match_cast
- #16960 - [UnitTest] Validate IRModule with multiple targets
- #16995 - [KVCache] Support KVCache decode from forked sequence and pop more tokens
- #16959 - [Transform] Handle identical PrimFunc with distinct VDevice
- #16589 - [Unity] Check for transpose and dynamic shape in AdjustMatmulOrder
- #16988 - [KVCache] Fix the aux data syncing order of paged KV cache
- #16922 - [BugFix]change FuseOpsByPattern strategy to pattern-match maximal subgraph
- #16982 - [Unity][BYOC] Use arith.Analyzer to check batch equality of matmul in cublas
- #16955 - Implement relax.op.view
- #16971 - Support nested ModuleList in nn.Module
- #16826 - Express dynamic arguments of strided_slice as arguments
Relay
- #16996 - fixed to make TupleGetItem inherits the previous span
Runtime
- #17045 - Fix PagedKVCache for PopN and enhance tests
- #16998 - Compatibility with dmlc::Stream API changes
- #17036 - Use preferred host memory (pinned memory) in KV cache
- #16994 - Allow query of available device memory through DeviceAPI
TIR
- #17039 - Fix Bug in VectorizeLoop
- #17030 - Fix Shuffle rewrite
- #16947 - Support narrow dtype for let binding
TOPI
- #17040 - Fix SME conv2d schedule import and intrin argument
- #17003 - [SME]Add conv2d NHWC SME fp32 schedule
- #17007 - [Testing] Enable conv2d NHWC fp16 topi testing for `arm_cpu`
- #16977 - Remove `blockIdx.z` in topi sort
TVMScript
- #16967 - Fix error reporting inside Macro func
cuda & cutlass & tensorrt
- #16980 - [Cuda] Skip FreeDataSpace when CUDA driver is in inconsistent state
web
- #17031 - Fix string to uint8 array for special characters
- #17028 - Add dtype and offset for CreateView in runtime
Misc
- #16981 - [SME] Add scalable fp16->fp32 dense schedule
- #17029 - [Contrib] Implement NDArray cache update
- #17027 - [picojson] Let objects be ordered when serializing
- #17021 - [WebGPU] Update error messages to be more user-friendly
- #17010 - Support multinomial_from_uniform dispatch
- #16999 - [USMP] add missing const specifier for global_const_workspace
- #17005 - [WebGPU] Handle device OOM in createBuffer
- #16921 - [SME] Introduce scalable fp32 dense schedule
- #16957 - chore: remove repetitive words
- #16909 - [QoL][IR] Provide std::hash and std::equal_to for IR Variable types
- #16987 - [JVM] Automatic Compatibility of JVM AttachCurrentThread
- #16974 - [CUBLAS][FP8] Enable R.matmul + R.multiply offloading
- #16896 - [CUBLAS] Enable offloading of R.matmul + R.dequantize