TVM Monthly - January 2024

ysh329 · February 3, 2024, 6:05am

Note: This report may contain duplicate or missing items because, before the branch switch, (1) some pull requests were merged into both branches, and (2) reports are based on the pull request records of the main branch.

As discussed by the TVM PMC, our goal is to provide a monthly summary of the project so that users and developers can better understand what is happening in the TVM community.

Feedback and suggestions are welcome so that we can further improve these updates.

Community

Announcement
Discuss
preRFC
- [RFC] Combine Ansor and AutoTVM to Improve Scheduling
- [RFC] Sphinx and the Documentation Refactor
#16419 - Add new key for release signing

RFCs

#104 - [RFC] Scalable vectors in TIR

We continue to improve Relax, TIR, the frontends, and the runtimes.

BugFix

#16436 - Ensure that bf16 arrays are created as expected
#16361 - Disable SingleEnvThreadVerifier
#16289 - [AUTOTVM][FIX] Typo fixes and add a warning in the Droplet Search
#16269 - Update pillow usage

CI

#16435 - Update image tag to 20240126-070121-8ade9c30e
#16384 - Remove NVIDIA_DISABLE_REQUIRE
#16382 - In jenkins.cmd_utils.Sh.tee, check for failing subprocess
#16366 - Upgrade sccache version to 0.7.*
#16369 - Upgrade Unity ci images
#16344 - Update docker images tag to 20240105-165030-51bdaec6
#16340 - [Unity][UnitTest] Increase atol to resolve flaky CI failure
#16337 - [Hexagon][UnitTest] Disable flaky quantization test
#16336 - Upgrade cmake version to 3.24.0

Docker

#16348 - Upgrade pip in i386 container

Docs

#16482 - [Doc] Fix Docstring in extern.py for Sphinx
#16346 - [Doc] Fix minor error in “Expressions in Relay”
#16282 - [Doc] Fix minor error in doc (Add an operator to Relay)

Frontend

#16483 - [Unity]Add Sigmoid and Square Op
#16478 - [PaddlePaddle] Fixed the bug that prevented the model from being successfully converted to microTVM on MacOS
#16427 - [Unity][NN] Better support for dynamic convolutions
#16417 - [Relay][Torch] fix pytorch frontend linspace op
#16400 - [Relay][Torch] fix pytorch frontend not support logical or
#16395 - [Relax][ONNX]fix onnx frontend parse
#16390 - [Relay][Torch] fix a typo mistake in nonzero_numpy
#16319 - [Relay][Torch] add aten:broadcast_to
#16316 - [Unity]Introducing Object

Hexagon

#16448 - [VM]Implement dma_copy and dma_wait builtin for hexagon

LLVM

#16373 - Update Host.h path

Metal

#16438 - Dispatch numerically stable tanh for metal

OpenCL & CLML

#16328 - [RUNTIME][CLML] Fix for Softmax op for 4D tensors
#16394 - [OpenCL][CMake] Fix OpenCL tests compilation

ROCm

#16441 - [WebGPU] Intrin Dispatch: tanh, erf, log
#16404 - Some fixes of ROCm codegen

Relax

#16467 - [Unity][MSC][Refactor] Reconstruct BYOC and runner
#16422 - [Unity][CodeGen] RunCodegen based on externally-exposed functions
#16472 - [Unity] Improved error message in tvm::relax::UpdateStructInfo
#16473 - [Unity] Improve error message in tensor_to_shape struct inference
#16466 - Memory planning for “partially dynamic” shapes
#16464 - NDArray Cache Update with DLTensor Support
#16379 - [Unity][TVMScript] Update call_packed semantics to support empty sinfo_args
#16315 - [Unity][Transform] Implement relax.transform.ReorderTakeAfterMatmul
#16313 - [Unity][Transform] Implement relax.transform.ExpandMatmulOfSum
#16411 - [Unity][Transform] Handle symbolic variables in LambdaLift
#16443 - [Unity][FIX] fix thread dtype mismatch
#16442 - Revert “[Unity] Split DecomposeOpsForTraining into two steps”
#16437 - [Unity] Improve buffer allocation for handling duplicated buffer names.
#16439 - [Unity] Support cumsum with pure int32
#16432 - [Unity] downgrade cmake version requirement
#16429 - [Unity][Dlight][Fix] Reduction rule support dyn-shape epilogue
#16418 - [Unity][Fix] Fix mismatched intrinsic name
#16129 - [Unity][Transform] Replace eligible operators with in-place versions in dataflow blocks
#16414 - [Bugfix][Unity] Recover MSVC/NVCC/ROCm/Vulkan
#15954 - [Unity] Split DecomposeOpsForTraining into two steps
#16111 - [Unity][Transform] Memory planning for dynamic-shape func return
#16396 - [Unity] PagedKVCache supporting on-the-fly RoPE calculation
#16385 - [Unity][Op] Add Conv3D Operator
#16284 - [Unity][nnModule] Dynamic shape support in nn Module
#16378 - [Unity][BlockBuilder] Restore bb.get()
#16374 - [Unity] Support TIR kernel for PagedKVCache
#16314 - [Unity][Transform] Implement relax.transform.AdjustMatmulOrder
#16349 - [Unity][MSC] Avoid depending on trivial bindings in Relax intermediate
#16376 - [Unity][Contrib] Fix a bug due to typo in vllm reconstruct_from_cache kernel and add test
#16375 - [Unity] Fix creation of disco ProcessSession
#16388 - [Unity] Update dispatch test cases following the merge from main
#16335 - [Unity] Set CMAKE_CUDA_ARCHITECTURES default to native
#16351 - [Unity] Add dlight.gpu.Fallback in DispatchSortScan, add argsort, topk, and cumprod
#16306 - [Unity][Transform] Update LambdaLift to use name of lifted lambda
#16310 - [Unity][Analysis] Show objects instead of names in WellFormedChecker
#16362 - [Unity][Fix] Memory planning check value type of ‘tir_var_upper_bound’
#16367 - [Unity][Transform] Handle replacement at both var binding and usage
#16309 - [Unity][Transform] Use parameter name in BundleModelParams
#16307 - [Unity] Improved error message in ExprMutator::ReEmitBinding
#16308 - [Unity] Improved error message for matmul shape mismatch
#16338 - [Unity][DLight] Introduce Specific Rule for RMSNorm
#16360 - [Unity] Enhance Torch-consistency in reshape
#16350 - [Unity][Contrib] Add vLLM paged attention kernel
#16303 - [Unity][NN] Use Linear name for nn.op.permute_dims
#16325 - [Unity][MSC][Legalize] legalize codes and mute logging
#16251 - [Unity][Dlight] Support dlight gemv rule on nested inner block
#16312 - [Unity][Analysis] Add utility for collecting compile-time bindings
#16330 - [Unity][WEBGPU] Enable wasm exception propagation
#16304 - [Unity][Analysis] Handle PrimStructInfo in EraseToWellDefined
#16305 - [Unity][Transform] Implement UpdateParamStructInfo
#16331 - [Unity] Alter op impl handling empty transform for output
#16254 - [Unity] Dispatch cumsum and sort
#16120 - [Unity][Transform] Extract partial-tuple-usage from FuseTIR
#16311 - [Unity] Validate struct info in relax::Call constructor
#16333 - [Unity] Fix nn.op.tensor_ir_op signature
#16302 - [Unity] Cutlass kernel compatibility with cmake 3.18+
#16323 - [Unity] Upgrade flashinfer 3rdparty submodule
#16317 - [Unity] Fix PagedKVCache per FlashInfer update
#16327 - [Unity][nn.Module] Introduce operator empty

Relay

#16324 - make “ToScalar” support directly obtaining “int64_t”

Runtime

#16486 - KV cache providing workspace for attn kernel
#16456 - [KVCache] AttentionWithFusedQKV and RoPE mode
#16415 - [Memory] Implement support for non-zero offset within a storage object in AllocNDArr…
#16387 - [RPC] Enable RPCObjectRef return in RPC
#16377 - Use cudaGetDeviceCount to check if device exists

TIR

#16406 - Fix of inter thread reduction with shared memory prefetch
#16293 - Extend DP4A tensor intrin
#16345 - Allow sync threads inside condition
#16250 - In SplitHostDevice, check for variables in thread extents
#16184 - [Transform] Implement InlinePrivateFunctions

TOPI

#16383 - [Target] Add fp16 SIMD support for conv2d on arm_cpu targets

TVMC

#16261 - Add tvmc flag to print ir before and print ir after named pass

cuda & cutlass & tensorrt

#16342 - [CUDA] Simple extend to optimize reuse for static shared memory.

web

#16485 - [wasm] Enlarge initial memory for emcc
#16444 - [Unity]Temp disable wasm exception
#16420 - [CI][WASM] Update emsdk and nodejs version
#16294 - [Unity][Fix] Fix fetchNDArray for f32-to-bf16

Misc

#16453 - Bump pillow from 10.0.1 to 10.2.0 in /apps/microtvm
#16454 - [BugTIR] fix thread_sync occurs in letstmt
#16468 - [LINT] Fix pylint issues in test_dma_builtin.py
#16413 - [Contrib] Workspace for cuBLAS backend
#16460 - [Cherry-pick][MSC][M4.1] Add plugin && plugin_builder, enable build and test in different frameworks (#16397)
#16461 - [Minor] Fix Docstring for sphinx-build
#16431 - [Schedule] Loop-Partition Scheduling Primitive
#16451 - Bump pillow from 10.0.1 to 10.2.0 in /apps/microtvm/ethosu
#16452 - Bump pillow from 10.0.1 to 10.2.0 in /apps/microtvm/cmsisnn
#16445 - [skip ci] update branch rule to prepare for unity transition
#16426 - [CMake] Enable cuda lang if USE_CUDA is on
#16407 - Add NVIDIA Hopper H100 target tag
#16398 - [DeviceAPI] Support querying total global memory
#16357 - [RPC] Fix tuning on macOS and Windows (#15771)
#16386 - [Thrust] Use no sync exec policy and caching allocator
#16321 - [DLight] Skip rule if target is not suitable
#16343 - [CMake][MSVC] Disable permissive mode for MSVC builds
#16242 - [Codegen] Fix if_then_else codegen
#16341 - [CMake] Use ccache as CMAKE_CUDA_COMPILER_LAUNCHER
#16332 - Change metal dtype of ceil_log2 to fp32
#16326 - [release][Dont Squash] Update version to 0.15.0 and 0.16.0.dev on main branch