TVM Monthly - August 2021

vinx13 · September 1, 2021, 9:31pm

As discussed by the TVM PMC, our goal is to provide a monthly summary of the project so that users and developers can better follow what is happening in the TVM community.

Feedback and suggestions are welcome so that we can further improve these updates.

Community

During August 2021 we welcomed many new contributors to the project. In particular, we welcomed @manupa-arm as a new committer, and @electriclilies, @Mousius, @gromero, @Lunderberg, and @mdw-octoml as new reviewers. Thanks to everyone for the hard work and contributions!

We continue to improve TOPI and frontend support, especially the ONNX importer and new frontends (PaddlePaddle, OneFlow). TensorIR is making steady progress, and several schedule primitives have been added. We started adding features for Meta Schedule (AutoTIR), the new auto-scheduling system built on top of TensorIR. We improved Relay with better profilers, executors, and mixed-precision support. We landed Project API, an infrastructure for MicroTVM platforms. The community has also made various improvements to CI and documentation.

This forum received 122k pageviews and 2.8k user visits in the last month.

Pull Requests

Below is a high-level summary of the PRs closed in the last month, grouped by area.

TensorIR

Fix a typo in include/tvm/ir/function.h #8617
Add from_legacy_te_schedule attr to TE PrimFuncs #8641
LowerWarpMemory: remove unneeded shuffle when accessing from the same thread #8681
Storage Align #8693
Improve the error message in module.cc #8694
Parallel, Vectorize, Bind & Unroll #8716
Reorder #8767
CacheRead/Write #8863
enhance tir signed-unsigned cast #8706
Change Integer Implicit Conversion Rule to C Standard Way #8733
Support fold constants in specialize process #8803
Fix buffer scope in structural equal #8768
Add LowerTEPass, and convert calls to LowerTE to application of LowerTEPass #8802
GetBlockReadWriteRegion #8875
Bug fix for a floormod rewrite simplify rule #8852
Fix opaque access in buffer locator pass and match_buffer in region detector #8855
Fix printing ForNode annotations #8891

Relay

Extend FakeQuantizationToInteger to more ops #8241
Change Default "opt_level" of Sequential from 2 to 0 #8634
Support for non scalar zero points in qnn.conv2d #8620
Remove redundant cuda kernels caused by fusion of less & logical or #8618
Replace compile engine with TE compiler in the VM #8501
Dense alter layout fixed for packed input #8669
Refactor Interpreter to treat lowering as IRModule->IRModule rewrite. #8597
Extract dataflow matcher data structure into header #8774
Support of depthwise conv2d NHWC for Mali/Bifrost. #8584
Avoid Override Generic Op Strategy in "hls.py" #8614
Add batch_matmul conversion to FQ2I pass #8635
Expose FTVMInferCorrectLayout Python interface #8755
ToBasicBlockNormalForm immutability #8778
Disallow fp16 conversion for summation-like ops #8810
Add an option to rewrite the graph only once #8843

Frontend

add support for 'aten::upsample_bicubic2d' #8648
Support for nn.SiLU added #8753
Implement fake quant #8780
GRU layer #8781
Unified LSTM cell #8599
Add onnx opset v13 support for softmax, logsoftmax #8625
Add a PaddlePaddle Frontend #8645
Support TensorFlow < 1.13 for test_sparse_add #8647
add support for half_pixel_centers in resize #8689
in-place methods (sigmoid_ and tanh_) used by Tacotron2 were added #8692
Fix ELU conversion #8699
chunk and unsafe chunk #8718
Make from_tensorflow.py more GPU memory friendly. #8763
Add support for QLinearMul ONNX op #8773
Increased tolerance on onnx test_forward::test_aten #8798
extend repeat_interleave op for relay.Expr #8839
Simplify onnx input since name accesses are not reliable. #8867

Topi & Operators

Improve the performance of scatter_nd #8479
Float16 unittests for dense, conv2d, depthwise conv2d #8529
Sparse Conv2d Implementation for 3x3 kernels #8605
Add transpose_a/b for TensorRT batch_matmul #8607
minor bugs #8622
CMSIS-NN graph partitioner for softmax #8653
remove wrong fix in x86's dense_nopack operator #8687
densenet implementation fix #8704
Celu #8741
Bug fix for batch_matmul parameters mismatch #8785
Support select_last_index for argmin/max #8816

Executor & AOT

add set_output_zero_copy #8497
Remove unused parameter. #8580
Add get_input_index support. #8661
Add graph_executor get_input_index API. #8633
Remove unused variables in AOT tests #8686
Refactor AOT Test Utils parameters into object #8650
Convert AOT to TECompiler #8697
Run AOT tests against reference system #8744
Remove old AOT Executor code #8758
Change AOT from ExprVisitor to MixedModeVisitor #8856
Better reflect allocator names in CRT tests #8828
Remove unused allocated memory in crt initialization #8819
Switch profile flag to use new profiler #8710
Add benchmarking function to graph executor and vm #8807
Add end to end benchmarking of models #8858
Correctly link to PAPI #8691

AutoTVM & AutoScheduler & MetaSchedule

Fix deserialization of workload registry entry #8662
Fix FLOPS estimation #8695
Use PopenPool instead of multiprocessing.pool #8492
Update AutoScheduler Docs – Units for cooldown_interval #8736
Fix exception handling in measure.py #8754
Configurable workload keys #8862
Fix use of fallback AutoTVM knobs in default scheduling #8707
Updated tolerances to avoid flaky unit test. #8723
Add parameter to allow caller to supply a Runner #8747
Extend tune_relay_x86 tutorial to measure default and kernel level tune #8794
Use PopenPool in XGBoostCostModel #8820
Traced Schedule #8623
Linear Congruential Random Engine #8642
Add Sampling Primitive SampleCategorical. #8817
Instruction and Trace #8615

Target & Codegen

[Texture support] TIR lowering and OpenCL support #7686
Allow spaces in target attributes #8587
Add support for AOT in external code generation tests #8591
Framework for device querying for all targets. #8602
Fix test_external_codegen, broken by #8591 #8630
Add __launch_bounds__ directive as part of the CUDA code generation #8678
Disallow fp16 conversion for arange op #8644
Several minor corrections to the device property query #8651
Correct passing of target-queried bool/int parameters #8660
Support fp16 input in cpu sort #8672
Fix builtin_fp16.h path according to: https://discuss.tvm.apache.org/… #8705
fix tir.erf codegen to opencl directly #8756
Check at codegen if the shader is within shared memory limits. #8746
Fix Vulkan runtime support #8791
Remote target.h #include #8813
Remove uses of LLVM from simulator runtime #8821
Reuse Hexagon SDK analysis across cmake files #8822
Rework tvm.target.hexagon() interface #8823
Change target string to Target object in the TE compiler and interpreter #8835
Add support for llvm parameter -mabi (aka -target-abi) #8860
Added the driver name to the vulkan target string. #8882

MicroTVM

Introduce --interface-api={c,packed} parameter #8280
Set the number of cores based on the VM sizing #8624
Fix platform name in base-box-tool #8612
Add skip for AOT test #8628
Project API infrastructure #8380
Add Arduino CLI support to ci-qemu #8504
Rev ci-qemu to 0.07 (add arduino-cli to ci-qemu) #8698
Zephyr Test Refactor #8713
Remove QEMU installation from RVM #8701
Fix warnings on Zephyr tests #8740
Fix ci-qemu Arduino install dir #8766
Project API Arduino support #8708
Fix base-box-tool command in README.md #8613
Fix: Test fails on hardware because of short timeout #8677
Fix platform name for qemu_x86 in Zephyr AOT tests #8762
skip aot checks when USE_MICRO=OFF #8772
Increase timeout to fix flaky tests #8846
Add Arduino RVM #8748
Update QemuTransport#write() to match new write API contract. #8761
Remove AOT Executor header from Arduino project #8857

VTA

Fix vta rpc server, refactor launch cond to not depend on sys.argv #8671
Make vta graph_pack compatible with latest TVM, and bring back object detection tutorials. #8731
VTA cmake change to include Verilator header for building tsim library #8797

Rust

Fix rust rt link #8631
Allow rust tvm build configuration through cargo features #8665
Memory leak #8714
Fix memory leak #2 #8725

Docs

Fix scipy docs inv #8619
Fix the usage of executors in tutorials #8586
TVM install addenda for M1 Macs #8568
Added documentation on pytest target parametrization. #8638
Updated target parametrization documentation #8724
Moved the generated tutorials folders into a _staging folder. #8735
refactor optimize GEMM on CPU tutorial #8825
Correct function signatures for CreateXPass functions in docs #8829
Add link to docs and tutorials in the README. #8832

CI & Build

Add caching to CMake #8373
Add pre-commit configuration to perform minimal checks locally #8382
Docker env for Arm® Ethos™-U55 Port #8514
Add USE_PAPI configuration to config.cmake #8567
Fix global pip cache disable change #8590
Move flake8 to ci_lint #8652
Refactor RPC test to isolate runs into a sub-function #8656
Restore the Rust CI testing after Docker image update #8657
Refactor/clean-up of docker/bash.sh #8670
Fix error when compile tvm with latest llvm14git #8682
Increase atol for CI #8712
Add Arm Compute Library to Arm CI unit test pipeline #8734
Enable custom images to be set in TVM Jenkinsfile #8721
Add PaddlePaddle dependency in docker file #8742
Rev ci-cpu to v0.76 #8786
Move Rust Format Script #8726
Install rust in ci-lint so cargo fmt can move to lint stage #8727
Allow Linker script files to be committed #8745
Add params.* to Jenkins file parameters #8771
Rev ci-qemu to v0.08 #8776
Allow Vulkan GPU access in docker container. #8784
Remove leftover instances of USE_GRAPH_EXECUTOR_DEBUG #8796
Update CPU and GPU Image #8853
Add synr==0.3.0 dependency for Docker images and Python dependency. #8801
A small bug fix on the CmakeLists #8826
Support for CMSIS-NN in Corstone300 Makefile #8831
Force CMake targets in top-level Makefile to run #8840
Update CI Lint Image Version #8841
make pre-commit hooks to run on every push instead of every commit #8888

Unit tests

Added cuDNN to default test targets #8383
Expose TVM pytest helpers as plugin #8532
Apply correct requires_gpu() pytest marks for parametrized target #8542
Parametrize ONNX Unit tests #8621
Use CTest for C++ tests #8809
Apply CPPLint to C++ Unit Tests #8827
Apply CPPLint to CRT Tests #8844
Bump up tolerance on flaky test #8850
Require cached fixtures to be copy-able, with opt-in. #8451
Remove duplicated PackedFunc C++ test #8812

Misc

Rename .asnumpy() to .numpy() #8659
Add DictAttrs to IRModule and refactor DictAttrs utility functions #8750
Force a gc between sphinx-gallery items to reclaim GPU memory. #8722
Restore License #8779
Remove reference to Apache Incubator status. #8837
Allow customized initializer in PopenPool #8789
Fix typos #8787
Remove unnecessary memset in TVMMutableFuncRegistry initialization #8818
Fix threadpool reset by killing threads before destroying their shared queue #8658
Fix ios_rpc build #8864
Change declaration order of unique_ptr objects to fix crash #8859

Contributors Who Reviewed Pull Requests

Note: The format is name (number of activities)
Disclaimer: the number of activities does not directly correspond to the community’s view of the significance of contributions.

junrushao1994 (119), tqchen (56), comaniac (56), jroesch (50), areusch (46), leandron (39), tkonolige (28), masahi (26), Mousius (25), jcf94 (24), mbrookhart (22), tmoreau89 (19), mehrdadh (15), mbs-octoml (11), vinx13 (10), Lunderberg (10), mbaret (8), manupa-arm (8), jwfromm (7), gromero (7), electriclilies (7), hogepodge (6), YuchenJin (6), MasterJH5574 (6), u99127 (5), csullivan (5), guberti (5), FrozenGene (4), Hzfengsy (4), AndrewZhaoLuo (4), icemelon (3), trevor-m (3), xqdan (3), zxybazh (3), mikepapadim (3), zhiics (2), yzhliu (2), vegaluisjose (2), ANSHUMAN87 (2), huajsj (2), altanh (2), elvin-n (2), Johnson9009 (2), ZihengJiang (1), MarisaKirisame (1), anijain2305 (1), kevinthesun (1), liangfu (1), mshawcroft (1), t-vi (1), weberlo (1), lhutton1 (1), echuraev (1), cclauss (1), maheshambule (1), eric-haibin-lin (1), mdw-octoml (1), leeexyz (1), ganler (1), tom-gall (1), zhanghaohit (1), mwillsey (1), shingjan (1), CaptainDuke (1), grant-arm (1), Lyken17 (1), Wheest (1), chiwwang (1), robo-corg (1)

Contributors Whose Pull Requests were Updated

Note: The format is name (number of activities)

Lunderberg (24), Mousius (20), mehrdadh (15), gromero (11), areusch (10), tkonolige (9), vvchernov (9), leandron (6), Johnson9009 (6), guberti (6), masahi (5), mbrookhart (5), jroesch (5), electriclilies (5), AndrewZhaoLuo (5), elvin-n (5), shingjan (5), tqchen (4), kparzysz-quic (4), jcf94 (4), huajsj (4), Hzfengsy (4), manupa-arm (4), mikepapadim (4), ganler (4), mbs-octoml (4), vinx13 (3), tmoreau89 (3), MasterJH5574 (3), lygztq (3), comaniac (2), jwfromm (2), junrushao1994 (2), slyubomirsky (2), u99127 (2), echuraev (2), mdw-octoml (2), zxybazh (2), euntaik (2), hgt312 (2), jtuyls (2), AnastasiaStulova (2), anwang2009 (2), ekalda (2), jiangjiajun (2), schell (2), kueitang (2), ryujaehun (2), alprnbg (2), zhiics (1), icemelon (1), ZihengJiang (1), MarisaKirisame (1), yzhliu (1), anijain2305 (1), kazum (1), apivovarov (1), yongwww (1), mbaret (1), altanh (1), lhutton1 (1), rkimball (1), csullivan (1), codeislife99 (1), wyc-ruiker (1), hogepodge (1), cclauss (1), maheshambule (1), ymwangg (1), CircleSpin (1), hzfan (1), zhanghaohit (1), wrongtest (1), alter-xp (1), monklof (1), sunjiweiswift (1), zhuwenxi (1), ashutosh-arm (1), grant-arm (1), jinhongyii (1), Lyken17 (1), syang-ng (1), lsy643 (1), Tantalus13A98B5F (1), chiwwang (1), adstraw (1), aasorokiin (1), ArmageddonKnight (1), ya0guang (1)