TVM Monthly - May 2021

junrushao · May 31, 2021, 9:43pm

As discussed by the TVM PMC, our goal is to provide a monthly summary of the project so that users and developers can better understand what is happening in the TVM community.

Feedback and suggestions are welcome so that we can further improve these updates.

Community

During May 2021 we welcomed many new contributors to the project. In particular, we welcomed @slyubomirsky, @leandron, and @trevor-m as new committers, and @manupa-arm as a new reviewer. Thanks to everyone for the hard work and contributions!

We continue to improve TOPI and frontend support, especially the ONNX importer and dynamic support in various operators. Vulkan support was greatly enhanced this month, including better codegen and runtime. TensorIR, tracked in the GitHub issue, is making steady progress, while AutoTensorIR, the new auto-scheduling system on top of TensorIR, is officially in the community discussion phase of the new RFC process. Moreover, we landed various improvements to Relay, TIR, the executor, AOT, TVMC, and CI.

This forum got 117k pageviews and 2.7k user visits in the last month.

Pull Requests

Below is a high-level summary of the PRs closed in the last month, grouped by area.

TensorIR

Add TIR Level Legalization Function Registration And Update Intrinsic Lowering Pass #7936
FlattenBuffer #7962
CreatePrimFunc from TE #7987
Add storage scope to PointerType #8017
Lower and build TensorIR #8044
change IntRV to ExprRV #8077
Verification of cached flags #8114
Structural Error Reporting #8121

Relay

Update SimplifyTranspose to correctly simplify rank changing layout transforms #7807
Pass instrument framework #7952
Fix parsing hierarchical attribute names #7976
Allow printing annotation in the Relay text printer for var #8000
Enable registering op with python #8002
Dismantler: Added handling of packed func #8004
add removeUnusedFunctions pass in vm memoryopt #8040
Add fast_softmax support in fast_math pass #8138

Executor & AOT

Introducing AOT in TVM #7785
Fix get_outputs on the vm with a single output #7902
Fix parameter dump #7903
Improved MLF to contain workspace info #7938
Turn reshape into nop in graph executor backend. #7945
Fix a memory leak in SetParams #7960
Remove lookup parameter function in AOT #7988
AOT Demo #8075
Fix executor for different compilers #8006

Frontend

Improve dtype detection in loop to fix onnx tests. #7934
Support gather_nd batch_dims attribute for TF/ONNX #8084
Fix bug with non-fp32 gemm in onnx frontend. #8011
More Unit Tests! #7956
QLinearConv Support #8007
add onnx reverse sequence op #7771
Fix Dense with 3d inputs #7753
add batch_dim support for gatherV2 #7951
Support nested layers recursively in keras frontend #7949
Move infer_value to _get_list_param #8051
Use axis.size instead of len(axis) #8060
Added test infrastructure for TF2 frozen models #8074
Pytorch Conv Transpose Padding Fix #7958
Quantized TANH operator support in TF Lite Frontend #8024
update ops and add MobileNet #7972

AutoScheduler & AutoTVM & RPC

Fix autoscheduler matmul without units. #7957
Support AutoTVM for int4 tensorcore #7831
Explicitly set HardwareParams in test_auto_scheduler_sketch_generation. #8018
Remove minimum seed constraint on XGB Tuner #7992
Add sparse conv2d(1*1) support for auto_scheduler #8065
Make RecordReader error-free #8066
Add workaround to alter op layout bug in task extraction. #8143
Fix autoscheduler tuning on sparse matrices where there are multiple with the same shape #7974
Remove warning which is adding too much noise #7975
Bugfix. Removed server forcing IPv4 protocol #7953
Replace 0.0.0.0 with 127.0.0.1 for client connections #7766
Make tracker jupyter friendly via PopenWorker #7961

TOPI & Operators

Fix recast of relay ops without attributes #8043
Support dilations in pooling operators #7928
Support dynamic slicing on first few axes, keeping the rest static #8068
Add uniform distribution generator wrt threefry PRNG #8041
Support generating data of any shape in threefry_generate #8085
Support dynamic indices size in gather_nd and scatter_nd #8105
Fix compute and schedule bugs for conv2d_winograd_nhwc on mali device. #8091
Fix strided slice type change. #8022
Fix arm_cpu bitserial schedule with elemwise ops. #7929
Custom schedule for standalone transpose in cuda #8030
Remove deprecated CUBLAS_TENSOR_OP_MATH flag #8130
Fix conv2d HWNC type strategy #8147
sort.cc added to runtime for nms compatibility #7942
Support concat in recast #8028

Vulkan

Added dummy implementations for TVMStreamHandle operations #7969
Uniform buffer bugfix, minor cleanup #7966
Call VulkanDeviceAPI destructor on program exit #7997
Spir-V codegen, correct labels/blocks in WhileNode. #8013
Remove some interface block decoration #8102
Added spvValidate check after vulkan shader generation #8098
Broke out implicit device requirements into SPIRVSupport #8048
Add device capabilities to Target, use in codegen #8127
Split out vulkan.cc into separate distinct functionality #8157
Add a default warp size 1 for vulkan and opencl #8109

Codegen

Fix assertion errors in llvm backend when using llvm debug build #7959
Fix make_int4x cuda codegen vectorize #8137
Check for cuda include dir in /usr/include. #8135
Refactor cl_program generation #7834
Fix codegen for inf and erf #8054
Metal: Split kernels and compile them separately #7980
Custom dyld linker for iOS mach-o executable files #7875
Bugfix nvcc command tool that relies on the compile time env #7964
Correctly build with -runtime=c without -system-lib #7954

CI & Build

Ignore invalid git tags when running "git describe" in version.py #8009
Added llvm-12 to ubuntu1804_install_llvm.sh #8008
Add PAPI to docker images #8016
set environment variables for UTF-8, to prevent errors when running black #8089
Bump gpu image to cuda 11.0.3 #8119
Cleanup stale logs for auto-tuning #8160
Hotfix the CI after image update #8164
Zephyr: Add mps2_an521 board to the CI #7914
CI QEMU Install libpython3.8 #8020
Update CI images #8031
rev jenkins containers for #7995 #8155
Pin black version #8139
Fix black whitespace errors #8124
Removed unnecessary file creation from unit tests. #7998
Bumped Ubuntu version to 18.04 for ci_gpu #7970
Always docker/build.sh with --no-cache. #8038
Fix C-style cast linting errors #8106
Remove clang-7 compiler pin for vulkan #8107
Add flag to build static version of TVM runtime #8059
Update CMake warning flags #8152
Fix requires_gpu #8050
Fix post-merge conflict between #7785 and #7945. #7982
Mark zephyr install world-writable in docker image to unblock #7995. #8037
Fix AttributeError when TEST_DATA_ROOT_PATH is set #8047
Infinite recursive device_api.ext_dev call fix #7985

TVMC

A simplified TVMC API for python scripting (Part 1). #7823
Add support for the MLF to ‘compile’ command #8086
Fix tvmc for cases when uTVM is not enabled #8153
Fix minor issues in the tvmc tune CLI #8039
add the support of the cross compiler options #7922

BYOC

TensorRT: Fixes for explicit batch mode, Support reduce to scalar, Support split op #7967
Remove ext params stored in metadata from params to avoid duplication #7977
TensorRT: Add nn.batch_matmul, nn.layer_norm, erf #8005
TensorRT: Only allow 4d or 5d inputs to TRT nn.pad #8073
Verilator: Skip mobilenet test if Verilator is not available #8094

Docs

Fixes a link in doc. #8064
Fix some typos #8101
Update to show github version #7948
Update links and fix typos in docs and readme #7965
Update stale links #8111
Added developer documentation for DeviceAPI and Target. #8082
Fix docs of threefry_split and threefry_generate #8035
Fix Relay build docstring #7963
Fix typos and format in comments #8132
Fix typo in a comment #8129
Change a, n, l to A, N, L in tutorials/get_started/autotvm_matmul.py #8027
doc: fix description of stop_fusion annotation #8095
Add how to enable IR debug messages. #7978
Fix some syntax errors #8116
Fix deploy_sparse tutorial #7939

Misc

Improve sparse performance on ROCM #7935
Rename gpu to cuda, and bump dlpack to v0.5 #8032
Rename gpu to cuda in java/rust/typescript #8036
Support the new python array api with DLPack #7993
Add default python iterator for Map. #8061
Improve signal handling in python env. #7919
Rename asnumpy → numpy in NDArray #8083
Increase host memory size #7933
Avoid round-trip Target-str-Target conversions #8161
Use flatBuffersBuffer_ in EdgeTPURuntime::Init() #8034
remove self-include in runtime/container.h #8117
Add logging to the bundle. #8115
allow Module exits without del #8063
Adding workspace byte alignment #8019
Add shape, structural hash, and layout information to profiling #7894

Contributors Who Reviewed Pull Requests

Note: The format is name (number of activities). Disclaimer: the number of activities does not directly correspond to the community’s view of the significance of contributions.

tqchen (74), comaniac (45), areusch (41), leandron (26), masahi (24), tkonolige (22), jcf94 (19), junrushao1994 (18), mbrookhart (16), jwfromm (11), manupa-arm (10), jroesch (9), tmoreau89 (9), merrymercy (8), FrozenGene (7), altanh (7), zhiics (5), trevor-m (5), mbaret (4), giuseros (4), Hzfengsy (4), u99127 (4), gromero (4), csullivan (4), hogepodge (4), icemelon9 (3), ZihengJiang (3), mehrdadh (3), elvin-n (3), liangfu (2), yongwww (2), lhutton1 (2), Lunderberg (2), zxybazh (2), MarisaKirisame (1), vinx13 (1), kevinthesun (1), vegaluisjose (1), t-vi (1), d-smirnov (1), ANSHUMAN87 (1), yidawang (1), xqdan (1), electriclilies (1), zxy844288792 (1), echuraev (1), mdw-octoml (1), zackcquic (1), apeskov (1), bernhardklein (1), AlexanderSerov (1), Mousius (1)

Contributors Whose Pull Requests were Updated

Note: The format is name (number of activities)

Lunderberg (17), tkonolige (15), tqchen (11), mehrdadh (10), areusch (9), jwfromm (8), trevor-m (7), d-smirnov (7), csullivan (6), YuchenJin (6), masahi (5), giuseros (5), Hzfengsy (4), gromero (4), echuraev (4), NicolaLancellotti (4), zhuzilin (4), icemelon9 (3), mbrookhart (3), junrushao1994 (3), kparzysz-quic (3), leandron (3), rkimball (3), zackcquic (3), rohanmukh (3), akmaru (3), tmoreau89 (2), vegaluisjose (2), jcf94 (2), xqdan (2), hypercubestart (2), manupa-arm (2), wyc-ruiker (2), zxy844288792 (2), leeexyz (2), huochaitiantang (2), Johnson9009 (2), AndrewZhaoLuo (2), alter-xp (2), cgerum (2), elvin-n (2), rafzi (2), lmxyy (2), hsuanguo (2), apeskov (2), t-vi (1), mbaret (1), huajsj (1), altanh (1), electriclilies (1), hogepodge (1), gussmith23 (1), tristan-arm (1), CircleSpin (1), zxybazh (1), Beya2019 (1), zhanghaohit (1), anwang2009 (1), yuchaoli (1), llehtahw (1), mherkazandjian (1), vinceab (1), AlexanderSerov (1), Mousius (1), ekalda (1), fantasyRqg (1), JackYoustra (1), Jeffrey-Sima (1), nodeav (1), rijulg (1)