TVM Monthly - May 2021

As discussed by the TVM PMC, our goal is to provide a monthly summary of the project so users and developers can get a better understanding of the goings on of the TVM community.

Feedback and suggestions are welcome so that we can further improve these updates.

Community

During May 2021 we welcomed many new contributors to the project. In particular, we welcomed @slyubomirsky, @leandron, and @trevor-m as new committers, and @manupa-arm as a new reviewer. Thanks to everyone for the hard work and contributions!

We continue to improve TOPI and frontend support, especially the ONNX importer and dynamic support in various operators. Vulkan support was greatly enhanced this month, including better codegen and runtime. TensorIR, tracked in the GitHub issue, is making steady progress, while AutoTensorIR, the new auto-scheduling system on top of TensorIR, is officially in the community discussion phase under the new RFC process. Moreover, we landed various improvements to Relay, TIR, executor, AOT, TVMC and CI.

This forum received 117k pageviews and 2.7k user visits in the last month.

Pull Requests

Below is a high-level summary of the PRs closed in the last month, grouped by area.

TensorIR

  • Add TIR Level Legalization Function Registration And Update Intrinsic Lowering Pass #7936
  • FlattenBuffer #7962
  • CreatePrimFunc from TE #7987
  • Add storage scope to PointerType #8017
  • Lower and build TensorIR #8044
  • change IntRV to ExprRV #8077
  • Verification of cached flags #8114
  • Structural Error Reporting #8121

Relay

  • Update SimplifyTranspose to correctly simplify rank changing layout transforms #7807
  • Pass instrument framework #7952
  • Fix parsing hierarchical attribute names #7976
  • Allow printing annotation in the Relay text printer for var #8000
  • Enable registering op with python #8002
  • Dismantler: Added handling of packed func #8004
  • add removeUnusedFunctions pass in vm memoryopt #8040
  • Add fast_softmax support in fast_math pass #8138

Executor & AOT

  • Introducing AOT in TVM #7785
  • Fix get_outputs on the vm with a single output #7902
  • Fix parameter dump #7903
  • Improved MLF to contain workspace info #7938
  • Turn reshape into nop in graph executor backend. #7945
  • Fix a memory leak in SetParams #7960
  • Remove lookup parameter function in AOT #7988
  • AOT Demo #8075
  • Fix executor for different compilers #8006

Frontend

  • Improve dtype detection in loop to fix onnx tests. #7934
  • Support gather_nd batch_dims attribute for TF/ONNX #8084
  • Fix bug with non-fp32 gemm in onnx frontend. #8011
  • More Unit Tests! #7956
  • QLinearConv Support #8007
  • add onnx reverse sequence op #7771
  • Fix Dense with 3d inputs #7753
  • add batch_dim support for gatherV2 #7951
  • Support nested layers recursively in keras frontend #7949
  • Move infer_value to _get_list_param #8051
  • Use axis.size instead of len(axis) #8060
  • Added test infrastructure for TF2 frozen models #8074
  • Pytorch Conv Transpose Padding Fix #7958
  • Quantized TANH operator support in TF Lite Frontend #8024
  • update ops and add MobileNet #7972

AutoScheduler & AutoTVM & RPC

  • Fix autoscheduler matmul without units. #7957
  • Support AutoTVM for int4 tensorcore #7831
  • Explicitly set HardwareParams in test_auto_scheduler_sketch_generation. #8018
  • Remove minimum seed constraint on XGB Tuner #7992
  • Add sparse conv2d(1*1) support for auto_scheduler #8065
  • Make RecordReader error-free #8066
  • Add workaround to alter op layout bug in task extraction. #8143
  • Fix autoscheduler tuning on sparse matrices where there are multiple with the same shape #7974
  • Remove warning which is adding too much noise #7975
  • Bugfix. Removed server forcing IPv4 protocol #7953
  • Replace 0.0.0.0 with 127.0.0.1 for client connections #7766
  • Make tracker jupyter friendly via PopenWorker #7961

TOPI & Operators

  • Fix recast of relay ops without attributes #8043
  • Support dilations in pooling operators #7928
  • Support dynamic slicing on first few axes, keeping the rest static #8068
  • Add uniform distribution generator wrt threefry PRNG #8041
  • Support generating data of any shape in threefry_generate #8085
  • Support dynamic indices size in gather_nd and scatter_nd #8105
  • Fix compute and schedule bugs for conv2d_winograd_nhwc on mali device. #8091
  • Fix strided slice type change. #8022
  • Fix arm_cpu bitserial schedule with elemwise ops. #7929
  • Custom schedule for standalone transpose in cuda #8030
  • Remove deprecated CUBLAS_TENSOR_OP_MATH flag #8130
  • Fix conv2d HWNC type strategy #8147
  • sort.cc added to runtime for nms compatability #7942
  • Support concat in recast #8028

Vulkan

  • Added dummy implementations for TVMStreamHandle operations #7969
  • Uniform buffer bugfix, minor cleanup #7966
  • Call VulkanDeviceAPI destructor on program exit #7997
  • Spir-V codegen, correct labels/blocks in WhileNode. #8013
  • Remove some interface block decoration #8102
  • Added spvValidate check after vulkan shader generation #8098
  • Broke out implicit device requirements into SPIRVSupport #8048
  • Add device capabilities to Target, use in codegen #8127
  • Split out vulkan.cc into separate distinct functionality #8157
  • Add a default warp size 1 for vulkan and opencl #8109

Codegen

  • Fix assertion errors in llvm backend when using llvm debug build #7959
  • Fix make_int4x cuda codegen vectorize #8137
  • Check for cuda include dir in /usr/include. #8135
  • Refactor cl_program generation #7834
  • Fix codegen for inf and erf #8054
  • Metal: Split kernels and compile them separately #7980
  • Custom dyld linker for iOS mach-o executable files #7875
  • Bugfix nvcc command tool that relies on the compile time env #7964
  • Correctly build with -runtime=c without -system-lib #7954

CI & Build

  • Ignore invalid git tags when running "git describe" in version.py #8009
  • Added llvm-12 to ubuntu1804_install_llvm.sh #8008
  • Add PAPI to docker images #8016
  • set environment variables for UTF-8, to prevent errors when running black #8089
  • Bump gpu image to cuda 11.0.3 #8119
  • Cleanup stale logs for auto-tuning #8160
  • Hotfix the CI after image update #8164
  • Zephyr: Add mps2_an521 board to the CI #7914
  • CI QEMU Install libpython3.8 #8020
  • Update CI images #8031
  • rev jenkins containers for #7995 #8155
  • Pin black version #8139
  • Fix black whitespace errors #8124
  • Removed unnecessary file creation from unit tests. #7998
  • Bumped Ubuntu version to 18.04 for ci_gpu #7970
  • Always docker/build.sh with --no-cache. #8038
  • Fix C-style cast linting errors #8106
  • Remove clang-7 compiler pin for vulkan #8107
  • Add flag to build static version of TVM runtime #8059
  • Update CMake warning flags #8152
  • Fix requires_gpu #8050
  • Fix post-merge conflict between #7785 and #7945. #7982
  • Mark zephyr install world-writable in docker image to unblock #7995. #8037
  • Fix AttributeError when TEST_DATA_ROOT_PATH is set #8047
  • Infinite recursive device_api.ext_dev call fix #7985

TVMC

  • A simplified TVMC API for python scripting (Part 1). #7823
  • Add support for the MLF to ‘compile’ command #8086
  • Fix tvmc for cases when uTVM is not enabled #8153
  • Fix minor issues in the tvmc tune CLI #8039
  • add the support of the cross compiler options #7922

BYOC

  • TensorRT: Fixes for explicit batch mode, Support reduce to scalar, Support split op #7967
  • Remove ext params stored in metadata from params to avoid duplication #7977
  • TensorRT: Add nn.batch_matmul, nn.layer_norm, erf #8005
  • TensorRT: Only allow 4d or 5d inputs to TRT nn.pad #8073
  • Verilator: Skip mobilenet test if Verilator is not available #8094

Docs

  • Fixes a link in doc. #8064
  • Fix some typos #8101
  • Update to show github version #7948
  • Update links and fix typos in docs and readme #7965
  • Update stale links #8111
  • Added developer documentation for DeviceAPI and Target. #8082
  • Fix docs of threefry_split and threefry_generate #8035
  • Fix Relay build docstring #7963
  • Fix typos and format in comments #8132
  • Fix typo in a comment #8129
  • Change a, n, l to A, N, L in tutorials/get_started/autotvm_matmul.py #8027
  • doc: fix description of stop_fusion annotation #8095
  • Add how to enable IR debug messages. #7978
  • Fix some syntax errors #8116
  • Fix deploy_sparse tutorial #7939

Misc

  • Improve sparse performance on ROCM #7935
  • Rename gpu to cuda, and bump dlpack to v0.5 #8032
  • Rename gpu to cuda in java/rust/typescript #8036
  • Support the new python array api with DLPack #7993
  • Add default python iterator for Map. #8061
  • Improve signal handling in python env. #7919
  • Rename asnumpy → numpy in NDArray #8083
  • Increase host memory size #7933
  • Avoid round-trip Target-str-Target conversions #8161
  • Use flatBuffersBuffer_ in EdgeTPURuntime::Init() #8034
  • remove self-include in runtime/container.h #8117
  • Add logging to the bundle. #8115
  • allow Module exits without del #8063
  • Adding workspace byte alignment #8019
  • Add shape, structural hash, and layout information to profiling #7894

Contributors Who Reviewed Pull Requests

Note: The format is name (number of activities). Disclaimer: the number of activities does not directly correspond to the community’s view of the significance of contributions.

tqchen (74), comaniac (45), areusch (41), leandron (26), masahi (24), tkonolige (22), jcf94 (19), junrushao1994 (18), mbrookhart (16), jwfromm (11), manupa-arm (10), jroesch (9), tmoreau89 (9), merrymercy (8), FrozenGene (7), altanh (7), zhiics (5), trevor-m (5), mbaret (4), giuseros (4), Hzfengsy (4), u99127 (4), gromero (4), csullivan (4), hogepodge (4), icemelon9 (3), ZihengJiang (3), mehrdadh (3), elvin-n (3), liangfu (2), yongwww (2), lhutton1 (2), Lunderberg (2), zxybazh (2), MarisaKirisame (1), vinx13 (1), kevinthesun (1), vegaluisjose (1), t-vi (1), d-smirnov (1), ANSHUMAN87 (1), yidawang (1), xqdan (1), electriclilies (1), zxy844288792 (1), echuraev (1), mdw-octoml (1), zackcquic (1), apeskov (1), bernhardklein (1), AlexanderSerov (1), Mousius (1)

Contributors Whose Pull Requests were Updated

Note: The format is name (number of activities)

Lunderberg (17), tkonolige (15), tqchen (11), mehrdadh (10), areusch (9), jwfromm (8), trevor-m (7), d-smirnov (7), csullivan (6), YuchenJin (6), masahi (5), giuseros (5), Hzfengsy (4), gromero (4), echuraev (4), NicolaLancellotti (4), zhuzilin (4), icemelon9 (3), mbrookhart (3), junrushao1994 (3), kparzysz-quic (3), leandron (3), rkimball (3), zackcquic (3), rohanmukh (3), akmaru (3), tmoreau89 (2), vegaluisjose (2), jcf94 (2), xqdan (2), hypercubestart (2), manupa-arm (2), wyc-ruiker (2), zxy844288792 (2), leeexyz (2), huochaitiantang (2), Johnson9009 (2), AndrewZhaoLuo (2), alter-xp (2), cgerum (2), elvin-n (2), rafzi (2), lmxyy (2), hsuanguo (2), apeskov (2), t-vi (1), mbaret (1), huajsj (1), altanh (1), electriclilies (1), hogepodge (1), gussmith23 (1), tristan-arm (1), CircleSpin (1), zxybazh (1), Beya2019 (1), zhanghaohit (1), anwang2009 (1), yuchaoli (1), llehtahw (1), mherkazandjian (1), vinceab (1), AlexanderSerov (1), Mousius (1), ekalda (1), fantasyRqg (1), JackYoustra (1), Jeffrey-Sima (1), nodeav (1), rijulg (1)