TVM Monthly - February 2024

As discussed by the TVM PMC, our goal is to provide a monthly summary of the project so that users and developers can better understand what is happening in the TVM community.

Feedback and suggestions are welcome so that we can further improve these updates.

RFCs

None


We continue to improve Relax, TIR, Frontend and other Runtimes.

BYOC

  • #16567 - Skip processed functions in FuseOpsByPattern and RunCodegen

BugFix

  • #16649 - [FFI] Add a missing default for datatype lanes
  • #16492 - [Executor] fix debug_executor function debug_get_output
  • #16598 - [Transform] Handle non-composite lambda functions in FuseOps
  • #16565 - [Transform] Keep private non-primitive functions in FuseTIR
  • #16518 - Use x*x*x instead of pow(x,3)

CI

  • #16611 - [AOT][Testing] Print output values on test failure
  • #16546 - Disable testing that downloads from mxnet
  • #16521 - Fix CI Script and Broken Tests
  • #16502 - Support tvm-bot rerun for tvm-unity task

Docs

  • #16610 - [Doc] Fixed Docstring usage example in tvm.ir.make_node
  • #16572 - [Doc] Remove MxNet related tutorials
  • #16514 - [Unity][Doc] Document passes that depend on DataflowBlocks and encourage using ConvertToDataflow

Frontend

  • #16604 - [Relax][Onnx] fix clip unsqueeze opset implement
  • #16616 - [PaddlePaddle] Support conv2d when data_format is NHWC
  • #16526 - [Keras] Enable Dense operator for any input dims

LLVM

  • #16612 - [SVE] Add support for scalable data type strings
  • #16523 - [SVE] Change the dtype of Ramp and Broadcast lanes to PrimExpr

Metal

  • #16605 - [RUNTIME] Fix multithreading access of metal runtime

ROCm

  • #16550 - [RUNTIME] Properly align rocm parameter buffer

Relax

  • #16591 - [Unity][Transform] Handle dynamic shapes in CombineParallelMatmul
  • #16594 - [Transform] Preserve param names in LiftTransformParams
  • #16575 - [Unity] GPU sampling
  • #16574 - Additional unit tests for RemoveUnusedParameters
  • #16585 - [Unity][Analysis] Include impure call in VerifyWellFormed errors
  • #16421 - [Unity][Transform] Raise error in FuseOpsByPattern for SSA violation
  • #16629 - Fix error message in BlockBuilder
  • #16592 - Handle dynamic arguments in legalization of nn.attention
  • #16590 - [Unity][Transform] Check for permute_dims in ExpandMatmulOfSum
  • #16563 - Implement operators to read runtime DLTensor* information
  • #16581 - [Unity][MSC][M4.2][Step2] Enable plugin with manager, test plugins in compile pipeline
  • #16600 - Expose name_hint field for BlockBuilder.match_cast
  • #16601 - [Transform] Canonicalize let var = R.const bindings
  • #16583 - [Unity][VM] Recursively visit match bindings in VMShapeLowerMutator
  • #16586 - Ignore non-relax functions in relax.transform.RunCodegen
  • #16573 - [VM] Re-implementation of callback functions
  • #16561 - [Bugfix] Remove call to tvm.build for empty TIR module
  • #16564 - [Unity] Check for symbolic vars in PrimValue when lowering to TIR
  • #16558 - Minor updates for NN frontend
  • #16542 - Support callback as argument
  • #16487 - [Unity][Transform] Handle call_tir_inplace in FuseTIR and FuseOps
  • #16355 - [Unity] Infer struct info for relax.op.split on dynamic-sized index
  • #16465 - [Redo][Unity] Split DecomposeOpsForTraining into two steps
  • #16495 - [Unity][MSC][M4.2][Step1] Enable plugin with manager, test plugins in compile pipeline
  • #16498 - [Frontend] “tensor_ir_inplace” op
  • #16500 - [Unity] Support storage reuse for dynamic shapes

Relay

  • #16622 - [ONNX] Fix the attribute mode parse of operator Upsample
  • #16626 - [ONNX] Fix the Resize operator in ONNX frontend
  • #16624 - [ONNX] fix the wrong default value about dtype in Multinomial converter

Runtime

  • #16635 - [RPC] Enable RPCObjectRef over multi-hop RPC
  • #16630 - Add TVM_DLL to threading backend funcs
  • #16568 - [Relax] RNNState for State Space Models
  • #16541 - Add “TVM_DLL” to NDArray cache load func
  • #16545 - Fix dtype conversion for bf16 and fp8
  • #16508 - ParallelFor skipping thread backend for unit extent

TIR

  • #16544 - Expand debug symbol output for CodeGenLLVM
  • #16553 - Fix get_block_access_region for let bindings
  • #16515 - Require exactly same-dtype matching for Vulkan smem reuse

TVMScript

  • #16640 - Represent tir::builtin::ret() using python “return”
  • #16562 - [Bugfix] Handle R.match_cast as last binding in if/else
  • #16593 - [Unity] Parse R.Object return type from call_pure_packed
  • #16356 - [Unity] Optionally hide StructInfo that can be inferred

cuda & cutlass & tensorrt

  • #16619 - [Bugfix][Cutlass] Check if function attributes is None

microNPU

  • #16401 - [microNPU][ETHOSU] Add fixed point for matmul

web

  • #16631 - Fix NDArrayCache loading report callback
  • #16525 - Move ArtifactCache to Interface, Support Cache delete and Batch Delete, Remove typo
  • #16554 - Compatibility with PagedKVCache in WebGPU
  • #16527 - Revert “[Unity]Temp disable wasm exception (#16444)”
  • #16504 - [Relax] Add ApplyPresenceAndFrequencyPenalty

Misc

  • #16595 - [Transform] Check for zero-param operators in LiftTransformParams
  • #16639 - [Disco] Expose functions to query the per-worker device/rank
  • #16617 - [Disco] Implement Session.import_python_module method
  • #16599 - [Transform] De-duplicate MatchCast nodes in EliminateCommonSubexpr
  • #16596 - [Transform] Implement relax.transform.ReorderPermuteDimsAfterConcat
  • #16597 - [Transform] Allow explicit name of bundled model parameters
  • #16602 - [Transform] Improvements to LazyTransformParams
  • #16579 - [Dlight] Scheduling Low batch GEMM using GEMV-like rule
  • #16606 - [KVCache] Support passing in attn_score_scaling_factor into KV cache
  • #16608 - Extend gpu memory bandwidth test to work through RPC
  • #16587 - [Debug] Improve error message for codegen pattern mismatches
  • #16570 - [Marvell BYOC]: Marvell AI Accelerator Integration - Phase 1
  • #16576 - Update the 3rdparty/libflash_attn submodule
  • #16580 - [KVCache] Support mode “None” for Rotary Embedding
  • #16578 - [KVCache] Support returning query positions
  • #16571 - Fix compile warnings
  • #16540 - [Upd] Enable lld search to include /opt/rocm/llvm/bin for rocm
  • #16539 - Improve error message in NDArray::CopyFromTo
  • #16524 - [Build] Improving debug and build-dir options
  • #16551 - [KVCache] Fix attention kernel for ROCm
  • #16512 - Cut pytest-lazy-fixture
  • #16506 - Bump 3rdparty/cutlass_fpA_intB_gemm version
  • #16511 - [Minor] Fix Clang compilation warning in fuse_tir.cc and codegen_c_host.cc
  • #16516 - Add Relax, Unity Tags in make_notes.py
  • #16497 - [Instrument] Add default instrument to print all passes
  • #16494 - [DPL] Support tir_vars field in is_call_tir pattern