TVM Monthly - May 2024

ysh329 · June 2, 2024, 1:56pm

As discussed by the TVM PMC, our goal is to provide a monthly summary of the project so that users and developers can better understand what is happening in the TVM community.

Feedback and suggestions are welcome so that we can further improve these updates.

Community

#17018 - New committer: Balint Cristian

RFCs

Proposal to add the Khronos Neural Network Exchange Format (NNEF) as a frontend to TVM Relay and Relax. Link to the discussion.

#108 - [RFC] Add NNEF frontend

This month we continued to improve Relax, TIR, the frontends, and other runtimes.

Disco

#17035 - [QoL] Implement broadcast/scatter methods for Session
#16997 - [Runtime] Restore checks for hangup of disco pipe
#16992 - [Bugfix] Handle NDArray larger than OS buffer for pipe
#16978 - Implement num_workers property for disco.Session
#16989 - Treat hangup of disco worker process as kShutdown
#16993 - Allow allocation that only exists on worker0
#16979 - Expose disco.Session.shutdown through the python API

Dlight

#17026 - Perf improvement for low_batch_gemv on Metal
#17016 - Update Adreno GEMV Rules
#16972 - [GPU] Enhance opencl thread limit for schedules
#16973 - [GPU] Improved gemv outer fallback schedule
#16958 - Check for target in function attributes

Frontend

#16961 - [Bugfix][ONNX] Improve broadcast and batch_matmul conversion

LLVM

#16966 - [SVE] Add support for representing and creating buffer-level predicates
#17001 - [SVE] Use only powers of two as possible vscale values
#16962 - [SVE] Add codegen support for vscale_range() function attribute
#16968 - Stringref API deprecation fixes
#16965 - [SVE] Add get_active_lane_mask builtin

Metal

#17025 - Support metal device profiling

ROCm

#17037 - [Runtime] Enable ROCm host memory support

Relax

#17033 - [Bugfix] Apply FuseOps to nested DataflowBlock
#17032 - [Bugfix] Annotate ComputePrimValue output as host function
#17034 - [Bugfix] Bind symbolic variables in R.match_cast
#16960 - [UnitTest] Validate IRModule with multiple targets
#16995 - [KVCache] Support KVCache decode from forked sequence and pop more tokens
#16959 - [Transform] Handle identical PrimFunc with distinct VDevice
#16589 - [Unity] Check for transpose and dynamic shape in AdjustMatmulOrder
#16988 - [KVCache] Fix the aux data syncing order of paged KV cache
#16922 - [BugFix] Change FuseOpsByPattern strategy to pattern-match maximal subgraph
#16982 - [Unity][BYOC] Use arith.Analyzer to check batch equality of matmul in cublas
#16955 - Implement relax.op.view
#16971 - Support nested ModuleList in nn.Module
#16826 - Express dynamic arguments of strided_slice as arguments

Relay

#16996 - Fix TupleGetItem to inherit the previous span

Runtime

#17045 - Fix PagedKVCache for PopN and enhance tests
#16998 - Compatibility with dmlc::Stream API changes
#17036 - Use preferred host memory (pinned memory) in KV cache
#16994 - Allow query of available device memory through DeviceAPI

TIR

#17039 - Fix Bug in VectorizeLoop
#17030 - Fix Shuffle rewrite
#16947 - Support narrow dtype for let binding

TOPI

#17040 - Fix SME conv2d schedule import and intrin argument
#17003 - [SME] Add conv2d NHWC SME fp32 schedule
#17007 - [Testing] Enable conv2d NHWC fp16 topi testing for arm_cpu
#16977 - Remove blockIdx.z in topi sort

TVMScript

#16967 - Fix error reporting inside Macro func

CUDA & CUTLASS & TensorRT

#16980 - [Cuda] Skip FreeDataSpace when CUDA driver is in inconsistent state

Web

#17031 - Fix string to uint8 array for special characters
#17028 - Add dtype and offset for CreateView in runtime

Misc

#16981 - [SME] Add scalable fp16->fp32 dense schedule
#17029 - [Contrib] Implement NDArray cache update
#17027 - [picojson] Let objects be ordered when serializing
#17021 - [WebGPU] Update error messages to be more user-friendly
#17010 - Support multinomial_from_uniform dispatch
#16999 - [USMP] add missing const specifier for global_const_workspace
#17005 - [WebGPU] Handle device OOM in createBuffer
#16921 - [SME] Introduce scalable fp32 dense schedule
#16957 - chore: remove repetitive words
#16909 - [QoL][IR] Provide std::hash and std::equal_to for IR Variable types
#16987 - [JVM] Automatic Compatibility of JVM AttachCurrentThread
#16974 - [CUBLAS][FP8] Enable R.matmul + R.multiply offloading
#16896 - [CUBLAS] Enable offloading of R.matmul + R.dequantize