Note: There may be some duplicates or missing items because, before the branch switch, (1) some pull requests were merged into both branches, and (2) reports are based on the pull request records of the main branch.
As discussed by the TVM PMC, our goal is to provide a monthly summary of the project so that users and developers can better understand what is happening in the TVM community.
Feedback and suggestions are welcome so that we can further improve these updates.
Community
- Announcement
- Discuss
- preRFC
- #16419 - Add new key for release signing
RFCs
- #104 - [RFC] Scalable vectors in TIR
We continue to improve Relax, TIR, Frontend, and other Runtimes.
BugFix
- #16436 - Ensure that bf16 arrays are created as expected
- #16361 - Disable SingleEnvThreadVerifier
- #16289 - [AUTOTVM][FIX] Typo fixes and add a warning in the Droplet Search
- #16269 - Update pillow usage
CI
- #16435 - Update image tag to 20240126-070121-8ade9c30e
- #16384 - Remove NVIDIA_DISABLE_REQUIRE
- #16382 - In jenkins.cmd_utils.Sh.tee, check for failing subprocess
- #16366 - Upgrade sccache version to 0.7.*
- #16369 - Upgrade Unity ci images
- #16344 - Update docker images tag to 20240105-165030-51bdaec6
- #16340 - [Unity][UnitTest] Increase atol to resolve flaky CI failure
- #16337 - [Hexagon][UnitTest] Disable flaky quantization test
- #16336 - Upgrade cmake version to 3.24.0
Docker
- #16348 - Upgrade pip in i386 container
Docs
- #16482 - [Doc] Fix Docstring in extern.py for Sphinx
- #16346 - [Doc] Fix minor error in "Expressions in Relay"
- #16282 - [Doc] Fix minor error in doc (Add an operator to Relay)
Frontend
- #16483 - [Unity]Add Sigmoid and Square Op
- #16478 - [PaddlePaddle] Fixed the bug that prevented the model from being successfully converted to microTVM on MacOS
- #16427 - [Unity][NN] Better support for dynamic convolutions
- #16417 - [Relay][Torch] fix pytorch frontend linspace op
- #16400 - [Relay][Torch] fix pytorch frontend not support logical or
- #16395 - [Relax][ONNX]fix onnx frontend parse
- #16390 - [Relay][Torch] fix a typo mistake in nonzero_numpy
- #16319 - [Relay][Torch] add aten:broadcast_to
- #16316 - [Unity]Introducing Object
Hexagon
- #16448 - [VM]Implement dma_copy and dma_wait builtin for hexagon
LLVM
- #16373 - Update Host.h path
Metal
- #16438 - Dispatch numerically stable tanh for metal
OpenCL & CLML
- #16328 - [RUNTIME][CLML] Fix for Softmax op for 4D tensors
- #16394 - [OpenCL][CMake] Fix OpenCL tests compilation
ROCm
Relax
- #16467 - [Unity][MSC][Refactor] Reconstruct BYOC and runner
- #16422 - [Unity][CodeGen] RunCodegen based on externally-exposed functions
- #16472 - [Unity] Improved error message in tvm::relax::UpdateStructInfo
- #16473 - [Unity] Improve error message in tensor_to_shape struct inference
- #16466 - Memory planning for "partially dynamic" shapes
- #16464 - NDArray Cache Update with DLTensor Support
- #16379 - [Unity][TVMScript] Update call_packed semantics to support empty sinfo_args
- #16315 - [Unity][Transform] Implement relax.transform.ReorderTakeAfterMatmul
- #16313 - [Unity][Transform] Implement relax.transform.ExpandMatmulOfSum
- #16411 - [Unity][Transform] Handle symbolic variables in LambdaLift
- #16443 - [Unity][FIX] fix thread dtype mismatch
- #16442 - Revert "[Unity] Split DecomposeOpsForTraining into two steps"
- #16437 - [Unity] Improve buffer allocation for handling duplicated buffer names.
- #16439 - [Unity] Support cumsum with pure int32
- #16432 - [Unity] downgrade cmake version requirement
- #16429 - [Unity][Dlight][Fix] Reduction rule support dyn-shape epilogue
- #16418 - [Unity][Fix] Fix mismatched intrinsic name
- #16129 - [Unity][Transform] Replace eligible operators with in-place versions in dataflow blocks
- #16414 - [Bugfix][Unity] Recover MSVC/NVCC/ROCm/Vulkan
- #15954 - [Unity] Split DecomposeOpsForTraining into two steps
- #16111 - [Unity][Transform] Memory planning for dynamic-shape func return
- #16396 - [Unity] PagedKVCache supporting on-the-fly RoPE calculation
- #16385 - [Unity][Op] Add Conv3D Operator
- #16284 - [Unity][nnModule] Dynamic shape support in nn Module
- #16378 - [Unity][BlockBuilder] Restore bb.get()
- #16374 - [Unity] Support TIR kernel for PagedKVCache
- #16314 - [Unity][Transform] Implement relax.transform.AdjustMatmulOrder
- #16349 - [Unity][MSC] Avoid depending on trivial bindings in Relax intermediate
- #16376 - [Unity][Contrib] Fix a bug due to typo in vllm reconstruct_from_cache kernel and add test
- #16375 - [Unity] Fix creation of disco ProcessSession
- #16388 - [Unity] Update dispatch test cases following the merge from main
- #16335 - [Unity] Set CMAKE_CUDA_ARCHITECTURES default to native
- #16351 - [Unity] Add dlight.gpu.Fallback in DispatchSortScan, add argsort, topk, and cumprod
- #16306 - [Unity][Transform] Update LambdaLift to use name of lifted lambda
- #16310 - [Unity][Analysis] Show objects instead of names in WellFormedChecker
- #16362 - [Unity][Fix] Memory planning check value type of 'tir_var_upper_bound'
- #16367 - [Unity][Transform] Handle replacement at both var binding and usage
- #16309 - [Unity][Transform] Use parameter name in BundleModelParams
- #16307 - [Unity] Improved error message in ExprMutator::ReEmitBinding
- #16308 - [Unity] Improved error message for matmul shape mismatch
- #16338 - [Unity][DLight] Introduce Specific Rule for RMSNorm
- #16360 - [Unity] Enhance Torch-consistency in rehsape
- #16350 - [Unity][Contrib] Add vLLM paged attention kernel
- #16303 - [Unity][NN] Use Linear name for nn.op.permute_dims
- #16325 - [Unity][MSC][Legalize] legalize codes and mute logging
- #16251 - [Unity][Dlight] Support dlight gemv rule on nested inner block
- #16312 - [Unity][Analysis] Add utility for collecting compile-time bindings
- #16330 - [Unity][WEBGPU] Enable wasm exception propagation
- #16304 - [Unity][Analysis] Handle PrimStructInfo in EraseToWellDefined
- #16305 - [Unity][Transform] Implement UpdateParamStructInfo
- #16331 - [Unity] Alter op impl handling empty transform for output
- #16254 - [Unity] Dispatch cumsum and sort
- #16120 - [Unity][Transform] Extract partial-tuple-usage from FuseTIR
- #16311 - [Unity] Validate struct info in relax::Call constructor
- #16333 - [Unity] Fix nn.op.tensor_ir_op signature
- #16302 - [Unity] Cutlass kernel compatibility with cmake 3.18+
- #16323 - [Unity] Upgrade flashinfer 3rdparty submodule
- #16317 - [Unity] Fix PagedKVCache per FlashInfer update
- #16327 - [Unity][nn.Module] Introduce operator empty
Relay
- #16324 - make "ToScalar" support directly obtaining "int64_t"
Runtime
- #16486 - KV cache providing workspace for attn kernel
- #16456 - [KVCache] AttentionWithFusedQKV and RoPE mode
- #16415 - [Memory] Implement support for non-zero offset within a storage object in AllocNDArr…
- #16387 - [RPC] Enable RPCObjectRef return in RPC
- #16377 - Use cudaGetDeviceCount to check if device exists
TIR
- #16406 - Fix of inter thread reduction with shared memory prefetch
- #16293 - Extend DP4A tensor intrin
- #16345 - Allow sync threads inside condition
- #16250 - In SplitHostDevice, check for variables in thread extents
- #16184 - [Transform] Implement InlinePrivateFunctions
TOPI
- #16383 - [Target] Add fp16 SIMD support for conv2d on arm_cpu targets
TVMC
- #16261 - Add tvmc flag to print ir before and print ir after named pass
cuda & cutlass & tensorrt
- #16342 - [CUDA] Simple extend to optimize reuse for static shared memory.
web
- #16485 - [wasm] Enlarge initial memory for emcc
- #16444 - [Unity]Temp disable wasm exception
- #16420 - [CI][WASM] Update emsdk and nodejs version
- #16294 - [Unity][Fix] Fix fetchNDArray for f32-to-bf16
Misc
- #16453 - Bump pillow from 10.0.1 to 10.2.0 in /apps/microtvm
- #16454 - [BugTIR] fix thread_sync occurs in letstmt
- #16468 - [LINT] Fix pylint issues in test_dma_builtin.py
- #16413 - [Contrib] Workspace for cuBLAS backend
- #16460 - [Cherry-pick][MSC][M4.1] Add plugin && plugin_builder, enable build and test in different frameworks (#16397)
- #16461 - [Minor] Fix Docstring for sphinx-build
- #16431 - [Schedule] Loop-Partition Scheduling Primitive
- #16451 - Bump pillow from 10.0.1 to 10.2.0 in /apps/microtvm/ethosu
- #16452 - Bump pillow from 10.0.1 to 10.2.0 in /apps/microtvm/cmsisnn
- #16445 - [skip ci] update branch rule to prepare for unity transition
- #16426 - [CMake] Enable cuda lang if USE_CUDA is on
- #16407 - Add NVIDIA Hopper H100 target tag
- #16398 - [DeviceAPI] Support querying total global memory
- #16357 - [RPC] Fix tuning on macOS and Windows (#15771)
- #16386 - [Thrust] Use no sync exec policy and caching allocator
- #16321 - [DLight] Skip rule if target is not suitable
- #16343 - [CMake][MSVC] Disable permissive mode for MSVC builds
- #16242 - [Codegen] Fix if_then_else codegen
- #16341 - [CMake] Use ccache as CMAKE_CUDA_COMPILER_LAUNCHER
- #16332 - Change metal dtype of ceil_log2 to fp32
- #16326 - [release][Dont Squash] Update version to 0.15.0 and 0.16.0.dev on main branch