TVM Monthly - November and December 2020

As discussed by the TVM PPMC, our goal is to provide a monthly summary of the project so users and developers can get a better understanding of the goings-on of the TVM community.

Feedback and suggestions are welcome so that we can further improve these updates.

Community

The months of November and December were a busy time for the TVM community, with many end-of-year improvements leading up to the TVM Conference in early December. The conference was fully digital this year due to COVID-19, which allowed more people to attend; we had over 900 people register! Thank you to Chris Hoge and the many others who planned and participated, especially the speakers, who went through extra planning to pre-record their talks and prepare their talk materials.

During this period we welcomed many new contributors to the project.

Importantly, we welcomed @leandron and @tkonolige as reviewers; @mbaret, @mbrookhart, @jcf94, and @jwfromm as committers; and finally @Laurawly, who has joined the PMC.

Thanks to everyone for their hard work and contributions!

On the technical side, the past few months have had a few major themes:

  • Improved coverage and operator support at the Relay level and in the importers.
  • Major improvements and features for the new auto-scheduler, based on the Ansor paper (see the sketch after this list).
  • Work on productionizing µTVM, including better CI integration, better board support, and a variety of other features detailed below.
  • Lots of stability and bug fixes.
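
For readers who haven't tried the new auto-scheduler yet, the single-operator tuning flow (refactored in #7028 and covered by the tutorials listed further down) looks roughly like the following. This is a minimal sketch, assuming a recent TVM checkout; the matmul workload, trial count, and log-file name are only illustrative:

```python
import tvm
from tvm import te, auto_scheduler

# Describe the computation to tune; the auto-scheduler searches for a
# schedule on its own, so no manual schedule template is required.
@auto_scheduler.register_workload
def matmul(N, M, K):
    A = te.placeholder((N, K), name="A")
    B = te.placeholder((K, M), name="B")
    k = te.reduce_axis((0, K), name="k")
    C = te.compute((N, M), lambda i, j: te.sum(A[i, k] * B[k, j], axis=k), name="C")
    return [A, B, C]

target = tvm.target.Target("llvm")
task = auto_scheduler.SearchTask(func=matmul, args=(1024, 1024, 1024), target=target)

# Run the search, logging measured candidates so tuning can resume later.
log_file = "matmul.json"
task.tune(auto_scheduler.TuningOptions(
    num_measure_trials=64,
    measure_callbacks=[auto_scheduler.RecordToFile(log_file)],
))

# Apply the best schedule found so far and build as usual.
sch, args = task.apply_best(log_file)
func = tvm.build(sch, args, target)
```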

In January we hope to see even more progress on stability and coverage, and hopefully some fresh RFCs and new features landing. Happy new year everyone!

The forum got 113k page views and 2k user visits in the last month.

Pull Requests

Below is a high-level summary of the PRs closed in the last month, grouped by area.

CI

  • Move back Keras to 2.4.3 #6810

  • Update to latest #6812

  • Torch 1.7 update staging #6825

  • Torch 1.7 update to mainline #6828

  • remove unused environment var #6824

  • Disable flaky tests #6841

  • Add python setup script #6844

  • Add more guidelines about additional local setup #6848

  • Update actions miniconda #6926

  • Install libc6-dev-i386 to compile wasm32 #6886

  • Disable ASF header checking on untracked files #6975

  • Hotfix CI (see #7010) #7025

  • Update docs style dependency. #7034

  • Switched to ACL 20.11 #7106

  • Add ACL to the CI for AArch64 #7122

  • make sure submodule checkout in clean state #7228

  • add Verilator regression test to CI #7098

Fixes

  • Update types slots for baseexpr and primexpr #6814

  • TF frontend: add softsign op #6799

  • Sparse2Dense support #5767

  • Update stale link #6820

  • Update stale link to new location #6819

  • Improve AArch64 depthwise convolution through smlal/smlal2 intrinsic #6711

  • Fix a bug in _stridedSlice() #6829

  • Fix Annotate Target to support free vars (relay.zeros, relay.ones, etc.) of any size (including zero) #6826

  • Update tophub link to new location #6838

  • Register shape functions for some image related ops #6373

  • Update version #6837

  • Support nested tuples #6809

  • Syntax error String::fromwe() should be String::from() #6846

  • Update SimplifyInference documentation #6853

  • Dynamic scale, zero point in qnn.op.dequantize #6849

  • Update path in arm_compute_lib.rst #6861

  • Using diagnostics for TVM Script #6797

  • ‘tvmc tune’ --rpc-tracker and --rpc-key fail due to argparse misconfiguration #6822

  • Add smmla/ummla support in quantized Conv2d #6802

  • Support for more ops (conv1d) #6731

  • conv1d_transpose speedup #6840

  • Fix bug in processing script #6867

  • Update search for bitcode files for rocm 3.9 #6865

  • More flexible conv2d_NCHWc_int8 generic operator. #6714

  • Fix the build error for wasm-standalone app #6862

  • Improve the order of tutorials within a subsection #6880

  • TF frontend: add rint op #6818

  • Fix GCC8.1 and GCC8.2 template dispatch compilation issue #6893

  • Fix bug of generate-unmatched-brackets in CodeGenC::PrintSSAAssign #6887

  • Bump the versions #6896

  • Fix the cmake error for TensorRT #6902

  • Dynamic gpu tests, add dynamic strided slice to topi #6870

  • Bump up tophub cuda version #6908

  • Allow to set number of threads to TFLite interpreter #6901

  • Consolidate RPC Context helper functions #6915

  • Make TVMLogf platform-independent #6916

  • Handle int64 dtype in range #6918

  • Handle weights in shape func #6912

  • Minor improvements for auto-tuning tutorials #6919

  • Extract tasks via compile engine #6903

  • Make AutoScheduler handling of errors during measure consistent with AutoTVM #6909

  • timeout is not passed correctly #6924

  • Fix typo #6920

  • Add Handling of Zero Len Arguments #6923

  • Explicitly use new to avoid exit-time destruction of global state #6938

  • fix tvm.relay.build() docs #6940

  • Fix tir allocation with multiple lanes #6941

  • AArch64 base algorithm refactoring in LLVM #6907

  • Cleanup comments in partition pass #6951

  • Lazy import XGBoost #6939

  • Fix #6954: uTVM fix when building the runtime for native hardware #6957

  • Raise ImportError for XGBoost #6969

  • Bug fix for debug builds in micro_session.cc #6968

  • Don’t fuse take with dynamic inputs #6979

  • bumping vta version #6977

  • Add Relay option to link parameters into runtime Modules #6917

  • Add initial support for quantized transpose convolution in Relay #6899

  • Fix GraphRuntime with -link-params over RPC #6985

  • Fix C runtime NDArray allocation bug #6991

  • fix rust installation in CI #7004

  • use target_host when it is set #6855

  • Dynamic Batch Support for TRT #6955

  • Fix call mkl gemm in mkldnn.py #7007

  • Fix trt Test #7016

  • Fix edge cases in const_int_bound and fold_scale_axis #6911

  • Add environment variable for controlling top-level printing and fix issue with pretty printing/parsing roundtrip. #6874

  • Save PyTorch frontend state in object #7023

  • Prefer IPv4 between IPv4 and IPv6 #7013

  • Add version 11.1 in finding CUDA libdevice #7033

  • remove print from GetInputIndex #7027

  • Implement Keras Conv1D #7035

  • Attach span information to tir nodes in tvmscript #6910

  • Part.1 metal default hardware params #7022

  • Support atomic for GPU backend (NVPTX, ROCm) #7051

  • fix missing ffi binding of relay.attrs.DequantizeAttrs #7054

  • Fix nvcc compile option to be compatible with older cuda #7065

  • PyTorch frontend: make type inference incremental #6900

  • Compatibility improvement with XGBoost v1.3.0 #7069

  • Fix QNN type inference #7074

  • Add a standalone regression test for running MergeComposite on a QNN graph #7080

  • Rollback changes to SSA begin/end scope for Store in C codegen #7073

  • Fix missing header inclusion. #7097

  • clean standalone CRT files in microTVM VM rebuild script #7095

  • support int64 #7105

  • update language version #7116

  • Remove support of double type #7118

  • add support for StridedSlice to input a single constant #6949

  • Fix spelling in some comments #7124

  • add cl support in tvmc runner #6831

  • Add is_floating_point and div_ PyTorch ops #7128

  • Fix a bug in batch_matmul that te.max should be used #7111

  • Update is_floating_point to handle bfloat16 #7133

  • PopenPoolExecutor #6959

  • Compatibility improvement with XGBoost v1.3.0 #7076

  • Added additional information to the from_onnx tutorial #7127

  • Fix a few OpNode argument field descriptions when registered #7140

  • Created CSourceMetaData module for model metadata #7002

  • Add is_floating_point() test and better type support in verify_model_vm() #7134

  • Add a FunctionPattern, remove unused attributes in CallPattern #7151

  • missing header for GraphRuntimeFactory in android_rpc #7160

  • Slightly optimize the default injective schedule #7158

  • Fix PyTorch NMS conversion for negative scores #7137

  • Update the docs stale links #7169

  • Support hard_swish op #7174

  • Asymmetric padding and dilation in conv2d workload #7142

  • Makes sure g_last_error is null terminated. #7190

  • Fix ICHECK_NOTNULL in logging.h #7193

  • Fixed temporary lock_guard instances. #7199

  • Parallel Cuda Mergesort #7099

  • Support dynamic batch size #7194

  • slice_like support #7184

  • Simplify cast #7045

  • Support transpose #7214

  • Fix Get Valid Counts when the number of boxes is zero #7229

  • Fix code to work with cmake 3.2 #6952

  • avoid unexpected value(1) of search space when get length for uninitiated search space #7175

  • Revert “avoid unexpected value(1) of search space when get length for uninitiated search space” #7236

  • Add index boundary check in ConfigSpace.get() #7234

  • Add autoscheduler support to tvmc #7070

  • Do not use ICHECK in nnvm #7255

  • Add op_name in error message for Pool #7243

  • Remove check_correctness in AutoTVM, which is busted #7250

  • Restore class-aware NMS for detection models by graph rewrite #7154

  • add default value for leaky relu alpha #7259

  • Faster multi dimensional argsort by segmented sort #7195

  • Change the all #pragma once to ifdef include guard #7264

  • Reorder dynamic to static and simplify inference, lower DynamicToStatic Opt Level #7213

  • batch_matmul tensorcore schedule #7146

  • Ensuring at least one thread block to handle empty tensor #7273

  • switch to more portable bash pipeline syntax #7274

  • Add MicroTVM support for the STM32F746 Discovery board #7225

  • Bring back numbered lists to TVM docs. #7290

  • Per-input, data dependence specification for shape func #7210

  • Initial BYOC support with c-source module #6950

  • A few typo fixes in the uTVM design doc. #7291

  • Change const to used dtype if it is passed in #7285

  • Import errors in deploy_detection.py and deploy_classification.py #7059

  • Fix test_topi_batch_matmul_tensorcore.py:test_batch_matmul requirement #7294

  • Add QEMU setup to uTVM tutorial. #7296

  • Add gpu instructions and results to deploy_sparse #7298

  • Parallelize cumsum in get_valid_counts #7123

  • Adding aten::unsqueeze_ to PT Frontend #7231

  • Fix an issue with dynamic functions overwriting call arg types #7295

  • Made tensorflow IsNan actually work #7320

  • Add a shape function and dynamic test for round #7324

  • Relax tolerance for dlpack <-> pytorch test #7325

  • get_top_results works on a copy of output #7327

  • Autoscheduler on ARM devices #7326

  • Fix warning showed with GCC10 #7336

  • Fix use of wrong flag name #7341

  • Add resource_handle to TVM_DLL_EXPORT_TYPED_FUNC. #7338

Rust

  • Add initial boilerplate for Rust diagnostic interface. #6656

  • Maintain error sources when propagating errors #6815

  • Flesh out IRModule methods #6741

  • Impl IsObjectRef for Array #7138

  • More Rust bindings for Attrs #7082

Bugfix

  • Fix leak when Packed callback arg is ndarray. #6821

  • Fix recursive GetFunction in runtime::Module #6866

  • Fix recursive GetFunction in runtime::Module #6859

  • Change debug_runtime to represent times in seconds internally #7227

  • Ensure CallNode attrs are not undefined before checking #7278

AutoScheduler

  • New layout rewrite option: Weight pre-transpose #6750

  • Bug fix for layout rewrite CI error in i386 #6830

  • Register auto-scheduler to more ops #6879

  • Fix the occasional crash caused by split memo #6883

  • Improve tuning with random cost model #6835

  • Add winograd support in tuning networks #6877

  • Tutorial on auto-scheduling a network for GPU #6882

  • Fix task scheduler restoring #6934

  • Improve warning messages #6935

  • Make SearchTask and ComputeDAG serializable #6842

  • Register workload when deserializing tasks #6927

  • Strictly select impl using plevel #6956

  • Task scheduler callbacks #6945

  • Fix task extraction #6965

  • Print the time used for measurement #6972

  • Check duplicated names in the compute dag #6973

  • Accelerate feature extraction for winograd #6981

  • Skip useless calls to RewriteLayout #6993

  • Use a smaller retry number #6996

  • Use a smaller iteration number for GA to accelerate the search #6994

  • Support layout rewrite for whole networks #6987

  • Add a tutorial on auto-scheduling a network for x86 CPU #7019

  • Misc update to hardware parameter and task scheduler #7020

  • Refactor task interface for tuning single operators #7028

  • Improve CPU matmul tutorial #7037

  • Remove max_registers_per_block in HardwareParams #7040

  • Add tips on resuming the search from a log file #7039

  • Delete deprecated file auto_schedule.py #7071

  • Fix winograd infer tile size #7092

  • Support string processing to records #7144

  • Improve SearchTask and ComputeDAG serialization #7145

  • Python based measure callbacks #7143

  • Fix the conflict of thread pool in measurement #7166

  • Improve hyperlinks in tutorials #7167

  • Enable winograd for conv2d and layout rewrite for conv3d #7168

  • Update layout rewrite option setting for measuring #7156

  • Use VM to extract tasks for dynamic models #7173

  • Add custom build function #7185

  • Control compile engine cache via PassContext #7220

  • Costmodel enhancement #7197

  • Do not return naive schedule in tracing mode #7226

  • Fix for zero-rank output #7180

  • Add layout rewrite support for dense and batch matmul on CPU #7161

  • Fix layout rewrite for iterator with extent=1 #7279

  • Fix typos in feature extraction and cost model #7280

  • Bug fix & Custom sketch support #7260

  • Fix conv3d’s op strategy for auto-scheduler #7328

  • Separate shapes from DAG hash and enable schedule sharing #7317

Docs

  • Enable theme with header and footer. #6834

  • Improve windows build instruction via conda #6944

  • Update to reflect the repo name change #6967

  • Document cloudpickle dependency in tutorials #7049

  • Fix figure links #7268

BYOC

  • FTVMAnnotateTarget method signature update #6786

  • 20.05 memory corruption temporary fix #6724

  • Vitis-AI codegen integration #6343

  • Allocate GPU data buffers and transfer data when needed #6872

  • handling dynamism in TensorRT to support OD models #6905

  • Use channels attribute in Conv2D op converter #7011

  • Support batch norm for all ranks <=5, and all axes #7026

  • Added “include_non_call_ops” parameter to AnnotateTarget pass #6655

  • include_non_call_ops = False #7121

  • Fix TRT conversion for reshape op - ReshapeAttrs no longer has reverse #7205

  • Depthwise convolution support #7206

  • Fix weight conversion when first dim of weight shape is 1 #7253

  • Handle empty tuples in annotation pass #7288

  • removed ACL 20.05 limitations #7251

  • add support to dynamically load hardware library #7286

Relay

  • SparseTensorDenseMatMul support for Tensorflow #6685

  • If Operator Support #6730

  • Support MXNet-style attributes for reshape_like #6851

  • Fix first-order AD on tuple arguments #6827

  • Mix mode type inference #6704

  • roi_pool operator alter layout #6516

  • Keep node name in span #6885

  • Add dynamic SparseToDense #6892

  • Add space_to_batch_nd and batch_to_space_nd operators #6477

  • fix unparsable yolo formals #6963

  • Add scatter_nd op #6854

  • Add DefuseOps pass #6946

  • Clean up DCE tests in preparation for refactoring. #7029

  • Add support for Size op in Onnx frontend. #7031

  • Fix GPU NMS when return_indices is True #7005

  • MaxUnpool Operator #7036

  • Support deformable Conv2D NHWC #7075

  • Allow cuda cross compilation without physical device. #7063

  • Add softplus operator conversion to Onnx. #7089

  • Support deformable conv2d #7087

  • Auto extract onnx input shapes when possible. #7115

  • Add Sort Op to Relay #6978

  • Add fast_softmax #7163

  • Stack should take exprs that evaluate to tuples #7130

  • Remove reverse attribute from reshape and reverse_reshape operators. #7086

  • Fix mismatch between Onnx Prelu definition and importer. #7208

  • Allow condition in if op to be an array. #7215

  • Fix reshape header file #7218

  • Threefry PRNG: splittable and stateless #7083

  • Compare against onnxruntime more consistently during testing #7300

  • Add more gradients #7323

  • Fix tanh gradient and update tests to use downstream gradient #7340

  • Add numpy style cumsum op #7334

Fix

  • Add task_ci_python_setup.sh to the arm CI #6850

  • Skip RPC tests when using multiprocessing’s spawn method #6858

  • disable cuda test for argwhere #7042

  • Improve error messages and docs #7064

  • Remove debugging print statement #7072

  • Update tune_relay_vta.py to support single board #7100

  • Fix using num_workers in omp #7078

  • Add dense strategy for mali #7181

  • Tensor core type issue for dense #7187

  • Import tvm.testing in tutorials that use it #7248

  • Remove leftovers from check_correctness #7272

  • Add flop counts to cublas #7297

  • Infer input shape in sparse_dense_padded’s alter_op if one does not exist #7308

µTVM

  • Add virtual machine, test zephyr runtime on real hardware #6703

  • Fix problems with the debug flow #6930

  • Remove binutils module, no longer needed after microTVM refactor. #6947

  • Demote session traffic logs to DEBUG log level #6989

  • Include required CMSIS headers in Cortex-M micro kernel. #6988

  • Minor fixes to the Reference VM tutorial #7012

  • Modify reference VMs to support new µTVM demo #7001

  • Fix paths in the reference VM tutorial and add vbguest recommendation #7015

  • Allow for platform-specific malloc in runtime #6948

  • Add platform timer and RPCTimeEvaluator to enable AutoTVM #6964

  • Raise a better error when project_dir does not exist #7165

  • Add documentation #7164

  • Avoid listing links when probing serial ports #7265

  • Fix two warnings when deprecated forms are used #7269

  • Remove need for -mcpu=native #7276

  • Add ST STM32F746 disco board to tflite tutorial script #7254

  • Add TVMPlatformGenerateRandom, a non-cryptographic random number generator. #7266

TOPI

  • Enable scatter_add on GPU #6856

  • deformable_conv2d in NHWC #6999

  • Fix GPU Dynamic Topk by Improving Dynamic Strided Slice in Topi #7018

  • cuda for argwhere #6868

  • GPU scatter_add using atomic #7044

  • GPU scatter 1D via sorting based approach #7056

  • sparse_dense Op sparse_data input added #6889

  • Fix GPU Dynamic Op Schedule #7117

  • Simplify GPU NMS IR and optimize a bit #7136

  • cuda reduction schedule #7131

  • GPU sort IR refactor to enable sort by keys #7157

  • Parallelize GPU NMS inner loop #7172

  • Treat undefined elements as constants in Array #7232

  • Improve memory layout inside GPU NMS kernel #7257

  • Minor perf improvement for GPU scatter #7233

  • Make cumsum IR reusable, add thrust scan #7303

  • Rewrite GPU argwhere using exclusive scan #7314

TIR

  • Make loop unrolling in LoopPartition optional #6823

  • Do not show meta data when printing TIR #6881

  • Add spans to all ExprNodes #6860

  • Enforce allocate to use the correct var pointer hint. #7216

  • Support Return in TIR #7084

  • ForNode introduce thread binding and remove legacy field #7306

Frontend

  • Support NonMaxSuppressionV5 #6933

  • Prevent tflite frontend from producing int64 shape/parameters #7030

  • Handle case where output of model is python list #7088

  • Unnecessary default warning msg changed to debug #7119

  • Support mode=instance,spatial for l2_normalize #7062

  • Remove seemingly invalid SoftPlus #7189

  • add _npi_subtract_scalar #7191

  • add _npi_stack, issue #7186 #7209

  • Densify Op added #7048

  • Sparse_Dense Op CSR scheduling issue resolved for CUDA & x86 #7148

PatternLang

  • Remove unnecessary check #6958

  • Add Syntactic Sugar to the C++ pattern API and support DataType Attribute Matching #7120

  • Add If pattern #7282

  • Add a relay LetPattern #7332

Verilator

  • Integrating and simulating hardware accelerators in TVM #6971

  • Multiple fixes #6995

  • regression tests #7000

  • Separate Verilator dependency from Chisel dependencies #6986

VTA

  • Fix the shape check for vta dense strategy #6983

  • add device_annot support in graphpack #6125

  • update 3rdparty submodule #7081

  • update version of 3rdparty vta-hw submodule #7271

Auto Scheduler

  • Support Auto scheduler and NHWC convolution on ROCm #7038

  • Add target host to measure record #7046

  • Fix infer tile size for NHWC winograd for CUDA #7068

  • Mali Support #7132

TFLite

  • added scalar axis value handling in reduce #6970

  • add support for float16 #7093

  • pack operation extended with const args #6984

  • Reshape - support different qnn params for input and output #7159

  • Quantized version of unit test for Dense #7113

  • Added ability to infer shapes for arguments #7293

  • Strided slice handling of shrink_axis_mask improved #6998

ONNX

  • NMS in ONNX #6839

  • Fix a bug with reshape imports when an initialized target shape is used more than once #7109

  • Fix issues for Clip and RoiAlign #7237

Contributors Who Reviewed Pull Requests

Note: The format is name (number of activities)

Disclaimer: the number of activities does not directly correspond to the community’s view of the significance of contributions.

tqchen (109), comaniac (75), zhiics (68), masahi (40), junrushao1994 (33), FrozenGene (31), merrymercy (30), tmoreau89 (23), giuseros (23), mbrookhart (20), jroesch (19), kevinthesun (19), tkonolige (17), anijain2305 (15), jcf94 (14), mbaret (12), Laurawly (11), jwfromm (10), vinx13 (9), icemelon9 (8), MarisaKirisame (8), leandron (8), trevor-m (8), ZihengJiang (7), areusch (7), liangfu (6), u99127 (6), electriclilies (5), altanh (5), yzhliu (4), ANSHUMAN87 (4), lhutton1 (4), manupa-arm (4), siju-samuel (3), wweic (3), t-vi (3), yongwww (3), srkreddy1238 (2), lixiaoquan (2), apivovarov (2), soiferj (2), antinucleon (2), cbalint13 (2), spectrometerHBH (2), insop (2), adelbertc (2), kparzysz-quic (1), cchung100m (1), rkimball (1), gussmith23 (1), jmorrill (1), mwillsey (1), Hzfengsy (1), hogepodge (1), wrongtest (1), samskalicky (1), anwang2009 (1), jtuyls (1)

Contributors Whose Pull Requests were Updated

Note: The format is name (number of activities)

merrymercy (33), mbrookhart (27), comaniac (22), tqchen (18), areusch (17), yongwww (14), masahi (13), tkonolige (11), jwfromm (8), lixiaoquan (7), trevor-m (7), antinucleon (7), d-smirnov (7), codeislife99 (7), jroesch (6), giuseros (6), zhiics (5), kevinthesun (5), FrozenGene (5), ANSHUMAN87 (5), alexgl-github (5), anijain2305 (4), vegaluisjose (4), liangfu (4), t-vi (4), tobegit3hub (4), zhanghaohit (4), euntaik (4), tmoreau89 (3), jcf94 (3), rkimball (3), hypercubestart (3), altanh (3), manupa-arm (3), alter-xp (3), TylerADavis (3), siju-samuel (2), ZihengJiang (2), kazum (2), junrushao1994 (2), slyubomirsky (2), mbaret (2), leandron (2), cbalint13 (2), gussmith23 (2), insop (2), hogepodge (2), wrongtest (2), CaramelFc (2), lsy643 (2), Wheest (2), icemelon9 (1), yzhliu (1), vinx13 (1), Laurawly (1), wweic (1), mshawcroft (1), u99127 (1), xqdan (1), electriclilies (1), csullivan (1), tom-gall (1), hzfan (1), tristan-arm (1), leonwanghui (1), solin319 (1), Beya2019 (1), Meteorix (1), hgt312 (1), samskalicky (1), anilmartha (1), leowang1225 (1), bernhardklein (1), Xuxue1 (1), domin1985 (1), rohanmukh (1), 0x00-pl (1), adelbertc (1), BhushanIMG (1), corehalt (1), dsteger (1), echuraev (1), Light-of-Hers (1), TaylorZowtuk (1)
