TVM Monthly - January 2024

Note: There may be some duplicate or missing items. Because of the recent branch switch, (1) some pull requests were merged into both branches, and (2) this report is based on the pull request records of the main branch.

As discussed by the TVM PMC, our goal is to provide a monthly summary of the project so users and developers can get a better understanding of the goings-on in the TVM community.

Feedback and suggestions are welcome so that we can further improve these updates.

Community

RFCs

  • #104 - [RFC] Scalable vectors in TIR

We continue to improve Relax, TIR, the frontends, and the runtimes.

BugFix

  • #16436 - Ensure that bf16 arrays are created as expected
  • #16361 - Disable SingleEnvThreadVerifier
  • #16289 - [AUTOTVM][FIX] Typo fixes and add a warning in the Droplet Search
  • #16269 - Update pillow usage

CI

  • #16435 - Update image tag to 20240126-070121-8ade9c30e
  • #16384 - Remove NVIDIA_DISABLE_REQUIRE
  • #16382 - In jenkins.cmd_utils.Sh.tee, check for failing subprocess
  • #16366 - Upgrade sccache version to 0.7.*
  • #16369 - Upgrade Unity ci images
  • #16344 - Update docker images tag to 20240105-165030-51bdaec6
  • #16340 - [Unity][UnitTest] Increase atol to resolve flaky CI failure
  • #16337 - [Hexagon][UnitTest] Disable flaky quantization test
  • #16336 - Upgrade cmake version to 3.24.0

Docker

  • #16348 - Upgrade pip in i386 container

Docs

  • #16482 - [Doc] Fix Docstring in extern.py for Sphinx
  • #16346 - [Doc] Fix minor error in “Expressions in Relay”
  • #16282 - [Doc] Fix minor error in doc (Add an operator to Relay)

Frontend

  • #16483 - [Unity] Add Sigmoid and Square Op
  • #16478 - [PaddlePaddle] Fixed the bug that prevented the model from being successfully converted to microTVM on MacOS
  • #16427 - [Unity][NN] Better support for dynamic convolutions
  • #16417 - [Relay][Torch] fix pytorch frontend linspace op
  • #16400 - [Relay][Torch] fix pytorch frontend not support logical or
  • #16395 - [Relax][ONNX] fix onnx frontend parse
  • #16390 - [Relay][Torch] fix a typo mistake in nonzero_numpy
  • #16319 - [Relay][Torch] add aten:broadcast_to
  • #16316 - [Unity] Introducing Object

Hexagon

  • #16448 - [VM] Implement dma_copy and dma_wait builtin for hexagon

LLVM

Metal

  • #16438 - Dispatch numerically stable tanh for metal

OpenCL & CLML

  • #16328 - [RUNTIME][CLML] Fix for Softmax op for 4D tensors
  • #16394 - [OpenCL][CMake] Fix OpenCL tests compilation

ROCm

  • #16441 - [WebGPU] Intrin Dispatch: tanh, erf, log
  • #16404 - Some fixes of ROCm codegen

Relax

  • #16467 - [Unity][MSC][Refactor] Reconstruct BYOC and runner
  • #16422 - [Unity][CodeGen] RunCodegen based on externally-exposed functions
  • #16472 - [Unity] Improved error message in tvm::relax::UpdateStructInfo
  • #16473 - [Unity] Improve error message in tensor_to_shape struct inference
  • #16466 - Memory planning for “partially dynamic” shapes
  • #16464 - NDArray Cache Update with DLTensor Support
  • #16379 - [Unity][TVMScript] Update call_packed semantics to support empty sinfo_args
  • #16315 - [Unity][Transform] Implement relax.transform.ReorderTakeAfterMatmul
  • #16313 - [Unity][Transform] Implement relax.transform.ExpandMatmulOfSum
  • #16411 - [Unity][Transform] Handle symbolic variables in LambdaLift
  • #16443 - [Unity][FIX] fix thread dtype mismatch
  • #16442 - Revert “[Unity] Split DecomposeOpsForTraining into two steps”
  • #16437 - [Unity] Improve buffer allocation for handling duplicated buffer names.
  • #16439 - [Unity] Support cumsum with pure int32
  • #16432 - [Unity] downgrade cmake version requirement
  • #16429 - [Unity][Dlight][Fix] Reduction rule support dyn-shape epilogue
  • #16418 - [Unity][Fix] Fix mismatched intrinsic name
  • #16129 - [Unity][Transform] Replace eligible operators with in-place versions in dataflow blocks
  • #16414 - [Bugfix][Unity] Recover MSVC/NVCC/ROCm/Vulkan
  • #15954 - [Unity] Split DecomposeOpsForTraining into two steps
  • #16111 - [Unity][Transform] Memory planning for dynamic-shape func return
  • #16396 - [Unity] PagedKVCache supporting on-the-fly RoPE calculation
  • #16385 - [Unity][Op] Add Conv3D Operator
  • #16284 - [Unity][nnModule] Dynamic shape support in nn Module
  • #16378 - [Unity][BlockBuilder] Restore bb.get()
  • #16374 - [Unity] Support TIR kernel for PagedKVCache
  • #16314 - [Unity][Transform] Implement relax.transform.AdjustMatmulOrder
  • #16349 - [Unity][MSC] Avoid depending on trivial bindings in Relax intermediate
  • #16376 - [Unity][Contrib] Fix a bug due to typo in vllm reconstruct_from_cache kernel and add test
  • #16375 - [Unity] Fix creation of disco ProcessSession
  • #16388 - [Unity] Update dispatch test cases following the merge from main
  • #16335 - [Unity] Set CMAKE_CUDA_ARCHITECTURES default to native
  • #16351 - [Unity] Add dlight.gpu.Fallback in DispatchSortScan, add argsort, topk, and cumprod
  • #16306 - [Unity][Transform] Update LambdaLift to use name of lifted lambda
  • #16310 - [Unity][Analysis] Show objects instead of names in WellFormedChecker
  • #16362 - [Unity][Fix] Memory planning check value type of ‘tir_var_upper_bound’
  • #16367 - [Unity][Transform] Handle replacement at both var binding and usage
  • #16309 - [Unity][Transform] Use parameter name in BundleModelParams
  • #16307 - [Unity] Improved error message in ExprMutator::ReEmitBinding
  • #16308 - [Unity] Improved error message for matmul shape mismatch
  • #16338 - [Unity][DLight] Introduce Specific Rule for RMSNorm
  • #16360 - [Unity] Enhance Torch-consistency in reshape
  • #16350 - [Unity][Contrib] Add vLLM paged attention kernel
  • #16303 - [Unity][NN] Use Linear name for nn.op.permute_dims
  • #16325 - [Unity][MSC][Legalize] legalize codes and mute logging
  • #16251 - [Unity][Dlight] Support dlight gemv rule on nested inner block
  • #16312 - [Unity][Analysis] Add utility for collecting compile-time bindings
  • #16330 - [Unity][WEBGPU] Enable wasm exception propagation
  • #16304 - [Unity][Analysis] Handle PrimStructInfo in EraseToWellDefined
  • #16305 - [Unity][Transform] Implement UpdateParamStructInfo
  • #16331 - [Unity] Alter op impl handling empty transform for output
  • #16254 - [Unity] Dispatch cumsum and sort
  • #16120 - [Unity][Transform] Extract partial-tuple-usage from FuseTIR
  • #16311 - [Unity] Validate struct info in relax::Call constructor
  • #16333 - [Unity] Fix nn.op.tensor_ir_op signature
  • #16302 - [Unity] Cutlass kernel compatibility with cmake 3.18+
  • #16323 - [Unity] Upgrade flashinfer 3rdparty submodule
  • #16317 - [Unity] Fix PagedKVCache per FlashInfer update
  • #16327 - [Unity][nn.Module] Introduce operator empty

Relay

  • #16324 - make “ToScalar” support directly obtaining “int64_t”

Runtime

  • #16486 - KV cache providing workspace for attn kernel
  • #16456 - [KVCache] AttentionWithFusedQKV and RoPE mode
  • #16415 - [Memory] Implement support for non-zero offset within a storage object in AllocNDArr…
  • #16387 - [RPC] Enable RPCObjectRef return in RPC
  • #16377 - Use cudaGetDeviceCount to check if device exists

TIR

  • #16406 - Fix of inter thread reduction with shared memory prefetch
  • #16293 - Extend DP4A tensor intrin
  • #16345 - Allow sync threads inside condition
  • #16250 - In SplitHostDevice, check for variables in thread extents
  • #16184 - [Transform] Implement InlinePrivateFunctions

TOPI

  • #16383 - [Target] Add fp16 SIMD support for conv2d on arm_cpu targets

TVMC

  • #16261 - Add tvmc flag to print ir before and print ir after named pass

cuda & cutlass & tensorrt

  • #16342 - [CUDA] Simple extend to optimize reuse for static shared memory.

web

  • #16485 - [wasm] Enlarge initial memory for emcc
  • #16444 - [Unity] Temp disable wasm exception
  • #16420 - [CI][WASM] Update emsdk and nodejs version
  • #16294 - [Unity][Fix] Fix fetchNDArray for f32-to-bf16

Misc

  • #16453 - Bump pillow from 10.0.1 to 10.2.0 in /apps/microtvm
  • #16454 - [BugTIR] fix thread_sync occurs in letstmt
  • #16468 - [LINT] Fix pylint issues in test_dma_builtin.py
  • #16413 - [Contrib] Workspace for cuBLAS backend
  • #16460 - [Cherry-pick][MSC][M4.1] Add plugin && plugin_builder, enable build and test in different frameworks (#16397)
  • #16461 - [Minor] Fix Docstring for sphinx-build
  • #16431 - [Schedule] Loop-Partition Scheduling Primitive
  • #16451 - Bump pillow from 10.0.1 to 10.2.0 in /apps/microtvm/ethosu
  • #16452 - Bump pillow from 10.0.1 to 10.2.0 in /apps/microtvm/cmsisnn
  • #16445 - [skip ci] update branch rule to prepare for unity transition
  • #16426 - [CMake] Enable cuda lang if USE_CUDA is on
  • #16407 - Add NVIDIA Hopper H100 target tag
  • #16398 - [DeviceAPI] Support querying total global memory
  • #16357 - [RPC] Fix tuning on macOS and Windows (#15771)
  • #16386 - [Thrust] Use no sync exec policy and caching allocator
  • #16321 - [DLight] Skip rule if target is not suitable
  • #16343 - [CMake][MSVC] Disable permissive mode for MSVC builds
  • #16242 - [Codegen] Fix if_then_else codegen
  • #16341 - [CMake] Use ccache as CMAKE_CUDA_COMPILER_LAUNCHER
  • #16332 - Change metal dtype of ceil_log2 to fp32
  • #16326 - [release][Dont Squash] Update version to 0.15.0 and 0.16.0.dev on main branch