TVM Monthly - July 2024

As discussed by the TVM PMC, our goal is to provide a monthly summary of the project so users and developers can get a better understanding of the goings-on of the TVM community.

Feedback and suggestions are welcome so that we can further improve these updates.

RFCs

None


We continue to improve Relax, TIR, the frontends, and other runtimes.

BugFix

  • #17142 - Allow import of TVM when current directory is read-only
  • #17138 - [Fix][TIR] Fix outdated call to create extern buffer in make_extern
  • #17132 - Restrict CopyOnWrite to _type_final

CI

  • #17221 - Reduce logging level when checking if docker image exists
  • #17206 - Update dummy-variable regex for pylint
  • #17117 - [CLML]Fix for few clml regression issues
  • #17155 - Remove lint step from unity/pr-head step

Disco

  • #17182 - Implement SocketSession
  • #17191 - Cross-group and p2p send/receive primitives
  • #17180 - Group-wise operation

Dlight

  • #17187 - [GPU] Add OpenCL dequant matmul schedule

Docs

  • #17146 - [DOC] Fix typo for the “We utilize the intermediate representation of nn.Graph to convert the OneFlow model to Reley.”

Hexagon

  • #17204 - Fix LWP assembly handler (predicate register)
  • #17169 - [CMake] Fix v66 build issue
  • #17162 - Support RPC execution of existing shared lib
  • #17123 - Add support for v75

LLVM

  • #17199 - Fix for getHostCPUFeatures API change

MetaSchedule

  • #17166 - Replace xgboost.rabit with xgboost.collective because it’s deprecated
  • #17171 - Add a testcase for padded conv2d in meta_schedule

ROCm

  • #17141 - [Backend]Fix error when building TVM with LLVM 19

Relax

  • #17201 - [Transform]Handle is_group argument in IPC AllReduce
  • #17198 - Disable fusion for fetching from the packed params in FuseOps
  • #17149 - Implement Rewriter class for pattern-rewrite
  • #17189 - [PyTorch] Add support for torch.nn.functional.max_pool2d
  • #17192 - [KVCache] Partial layers support
  • #17186 - [PyTorch] Add support for torch.einsum
  • #17184 - [PyTorch] Add support for torch.permute
  • #17157 - Integrate cuDNN attention
  • #17167 - [ONNX] Add support for Sign and Not
  • #17121 - [BugFix] Fix a bug about the IR construction in test file
  • #17160 - Fix fuseOps via pattern
  • #17139 - Fix cublas dispatch for corner cases
  • #17127 - [KVCache] Support fork in sliding window sink part

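As a quick illustration of the PyTorch frontend additions above (e.g. #17189), here is a minimal sketch of importing a model through the fx-based Relax importer; the module, input shapes, and dtypes are illustrative and assume a recent TVM build with torch installed.

```python
import torch
from torch import fx
from tvm.relax.frontend.torch import from_fx


class MaxPoolModel(torch.nn.Module):
    def forward(self, x):
        # torch.nn.functional.max_pool2d is one of the newly supported ops.
        return torch.nn.functional.max_pool2d(x, kernel_size=2)


# Trace the model and describe its inputs as (shape, dtype) pairs.
graph_module = fx.symbolic_trace(MaxPoolModel())
input_info = [((1, 3, 32, 32), "float32")]

with torch.no_grad():
    mod = from_fx(graph_module, input_info)

mod.show()  # prints the imported Relax IRModule as TVMScript
```
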
Relay

  • #17177 - [FQ2I]: Use appropriate dtype while quantizing relay.op.nn.pad…

Runtime

  • #17208 - Allow aborting fetchNDArray through AbortSignal

TIR

  • #17158 - [Analyzer] Simplify x==x expressions for all dtypes
  • #17134 - [Schedule] Remove @type_check for set_axis_separator

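As a small, hedged sketch of the analyzer entry point touched by #17158 (the int32 variable below is illustrative):

```python
import tvm
from tvm import tir

analyzer = tvm.arith.Analyzer()
x = tir.Var("x", "int32")

# x == x is expected to simplify to the boolean constant True;
# #17158 extends this folding to all dtypes.
print(analyzer.simplify(tir.EQ(x, x)))
```
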
TOPI

  • #17091 - Add dense schedule for fp16 and fp32 using gemm

Misc

  • #17190 - [Cython][FFI] Fix crash when call del operator for handle
  • #17170 - Pass to eliminate redundant branch and overcompute
  • #17185 - Remove and replace deprecated distutils.util.strtobool()
  • #17188 - Add packaging to python/gen_requirements.py
  • #17181 - [FFI] Add python signal handler for ctypes FFI
  • #17173 - Use packaging.version.parse instead of distutils.version.LooseVersion
  • #17174 - [TVMJS] Check DataType.NUMPY2STR when saving array
  • #17168 - [Meta Schedule][XGBoost] enable custom callback func test with xgboost>=1.6.0
  • #17156 - [release][Dont Squash] Update version to 0.17.0 and 0.18.0.dev on main branch
  • #17135 - [QoL][IR] Provide default constructor for NameSupply/GlobalVarSupply
  • #17125 - [Utils] Define line-length for “ruff format”
  • #17152 - GraphExecutor: Fix wild pointer assign when input and output are reshape
  • #17150 - [WebGPU] Fall back to 256MB for maxBufferSize if needed
  • #17128 - [Compute-inline] Prefer T.where for reverse compute-inlined block with predicate
  • #16976 - [WebGPU] Implement tir.dp4a with WGSL built-in function dot4I8Packed
  • #17124 - [WebGPU] Add tir.dp4a