TVM Monthly - June 2020

As discussed with the TVM PPMC, we would like to give a monthly summary of the project so that people can get a better sense of what is going on in the community.

Feedback and suggestions are welcome so that we can further improve the report.

Community

The community welcomes new committer Siju Samuel (@siju-samuel) and new reviewers @wpan11nv and Matthew Brookhart (@mbrookhart).

The forum got 103k pageviews and 2.8k user visits in the last month (down from 112k pageviews and 3.1k user visits in May).

Logan Weber and Andrew Reusch published a new TVM blog post, How TVM is Taming Tiny, a.k.a. Micro-TVM.

Relatedly, there was an Embedded Focused Online Meetup on June 18th, 2020. You can check out the video of the meetup here. Look for posts labeled “Meetup” if you don’t want to miss out on the next online meetup!

Features and Improvements

Over the previous month, the community made good progress on performance improvements, operator/backend coverage, and codebase refactoring.

Here are a few highlights.

  • The addition of a CoreML codegen using the BYOC feature to offload subgraphs to Apple’s Neural Engine on iOS devices #5634

  • Addition of bfloat16 #5601

  • A new TVM Target ID registry to streamline target specification #5838

  • Improved quantized convolution performance for armv8 architectures #5754

  • Rust bindings refactor in TVM #5527, #5769, #5830

  • New Micro-TVM tutorials and documentation #5655
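To make the bfloat16 highlight (#5601) more concrete: bfloat16 is simply the top 16 bits of an IEEE float32, keeping the full 8-bit exponent while dropping mantissa precision. The sketch below is a minimal, dependency-free illustration of that format, not TVM's implementation:

```python
import struct

def f32_to_bf16_bits(x: float) -> int:
    """Truncate an IEEE float32 to bfloat16 (its top 16 bits),
    rounding the discarded lower bits to nearest-even."""
    (bits,) = struct.unpack("<I", struct.pack("<f", x))
    # Rounding bias: 0x7FFF, plus 1 when the kept LSB is odd (ties to even).
    bias = 0x7FFF + ((bits >> 16) & 1)
    return ((bits + bias) >> 16) & 0xFFFF

def bf16_bits_to_f32(b: int) -> float:
    """Widen bfloat16 bits back to float32 by zero-filling the low mantissa bits."""
    (x,) = struct.unpack("<f", struct.pack("<I", b << 16))
    return x
```

Because the exponent width matches float32, the dynamic range is preserved; only precision drops (to roughly 3 decimal digits), which is why bfloat16 is attractive for deep-learning workloads.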

More improvements along with details are listed below.
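On the Target ID registry highlight (#5838): the idea is to declare each target kind and its legal options once, in a central registry, instead of scattering string parsing across the codebase. The following is a hypothetical, simplified sketch of that registry pattern; the class and option names here are illustrative and do not reflect TVM's actual API:

```python
class TargetKind:
    """A toy registry of target kinds, loosely mirroring the idea
    behind a target registry (all names here are illustrative)."""
    _registry = {}

    def __init__(self, name):
        self.name = name
        self.attrs = {}  # declared option name -> expected type

    def add_attr_option(self, key, ty):
        # Declare a legal option for this kind; chainable for fluent use.
        self.attrs[key] = ty
        return self

    @classmethod
    def register(cls, name):
        kind = cls(name)
        cls._registry[name] = kind
        return kind

    @classmethod
    def get(cls, name):
        return cls._registry[name]

# Declaring a kind once gives every consumer a single place to
# validate target options against.
TargetKind.register("cuda") \
    .add_attr_option("arch", str) \
    .add_attr_option("max_threads", int)
```

The benefit of this design is that adding a new backend or option becomes a single registration call, and invalid target strings can be rejected uniformly at parse time.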

Arith

  • ExtendedEuclidean merge impl to int_operator #5625

  • Rewrite simplify fix for Vectorized Cooperative Fetching #5924

Fixes

  • fix typo: anchor windoes should be anchor windows #5706

  • ReplicationPad support added #5708

  • Simplify Pattern API Implementations #5703

  • Remove deprecated opengl files #5711

  • Remove opengl runtime and cmake #5712

  • Rename tvm_dso_op to libtvm_dso_op #5714

  • Unify StrMapNode and MapNode #5687

  • Introduce runtime::String::CanConvertFrom #5718

  • Restore the StrMap behavior in JSON/SHash/SEqual #5719

  • Fix generating types like float44 and float88 #5722

  • Avoid downloading when TOPHUB_LOCATION is NONE #5720

  • codegen llvm: move nvptx-specific intrinsic handling into codegen_nvptx #5726

  • ROCm warp shuffles and reductions #5727

  • Fix new/delete mismatches in Relay VM #5735

  • Fix flaky test_topi_pooling.py:test_adaptive_pool #5736

  • Fix the values for test_fmod since it fails way too often otherwise #5723

  • fix small bug about dense_grad #5695

  • Clarify downstream consistency of TVMArgTypeCode #5742

  • Add Scatter to Topi/Relay/ONNX via hybrid script #5619

  • Clean WASM environment before build #5759

  • Second stage of Rust Refactor #5527

  • Fix gelu in PyTorch frontend, tighten numerical checks #5763

  • Add ShapePattern and DataTypePattern #5760

  • Make batch matrix multiplication on GPU tunable #5752

  • Remove an overstrict assert in MakeAllreduce (fixes #5686) #5785

  • CoreML codegen #5634

  • update vulkan build rule #5777

  • aten::norm support added #5776

  • @wpan11nv -> Reviewer #5790

  • Edit onnx parser to infer values in post order #5755

  • Support symbolic inputs of Fill #5762

  • support aten::type_as in the pytorch frontend #5787

  • Temporary disable fp16 type_as test for PyTorch Frontend #5799

  • Add config switch for nn.dense layer type. #5801

  • Move cpu-only frontend tests to a CPU stage #5807

  • Pin hand landmark network to version 0.7.4. #5813

  • Limit number of threads in all jobs #5815

  • Siju Samuel -> Committer #5817

  • Error msg update #5818

  • fix relay.build to not change the module argument in place #5822

  • Fix InferType when module contains Prelude #5797

  • Fix v0.6 CI #5832

  • Add a combine batch_matmul pass #5791

  • RepeatVector, Conv3DTranspose op support added #5833

  • Fix converting serialized quantized models #5839

  • ffi (Object): make class dict visible in instances #5843

  • tvm crate stage 3 of Rust refactor #5769

  • Additional canonicalization added for AddNode #5846

  • Suppress the warning messages when compile engine selects impls #5821

  • Fix #5849 (process termination routine on Windows) #5851

  • Introduce POD-C Compliant tvm::Map #5740

  • Add bfloat16 #5601

  • Add Python Classes for all Attrs #5853

  • Fix map assign issue in CI test #5854

  • Introduce Target Id Registry #5838

  • Update has_dtype/has_shape to pattern lang doc #5847

  • Add nn.batch_flatten as quantizable. #5805

  • Fail early before running invalid dynamic graphs #5856

  • Improve type handling in PyTorch frontend #5834

  • Matthew Brookhart -> Reviewer #5886

  • keep parameter names from PyTorch #5887

  • Improve quantized convolution performance for armv8 architectures #5754

  • HotFix the python intrin rule #5895

  • Rust Refactor Stage 4: Rewrite Rust graph runtime to use new APIs #5830

  • add a few gradients #5899

  • Add Binary Intrinsic ops to TIR Ops in C++ #5900

  • Allow implicit conversion in TVM FFI to tvm::Bool #5907

  • PyTorch frontend: fix handling of duplicate use of a model weight #5897

  • Don't multiply by constant 1 uselessly in dense #5911

  • Support any index matching for TupleGetItem #5909

  • Add MicroTVM tutorial using the STM32F746 discovery board #5655

  • Fix serialization of inf float value #5912

  • Fix CPU Thread Binding for Multiple Sockets #5918

  • CUDA device API & VerifyGPUCode pass update #5898

  • Update install.rst #5858

  • Two small fixes to AMDCPU codegen for LLVM 10+ and ROCm 3.5+ #5920

  • Add LegalizeInvalidAttach to legalize the compute_at location after split or fuse #5917

  • Update code_review.rst #5923

  • Don't rewrite expressions used outside of the pattern #5930

  • Add TupleGetItem to CSE #5931

  • Various update for CoreML codegen #5934

  • Update date in the NOTICE #5943

  • Update date in the NOTICE #5942

  • minor fix for release doc #5948

  • raise right error in tensorflow split op #5951

  • add rm xla attributes in tf docs #5950

  • Fix some typo errors in license header #5957

  • Fix OpenCL get_valid_counts errors due to intrinsic atomic_add #5857

  • Fix some typo errors in license header #5956

  • Amendments for gradients #5941

  • Fix the meaning of conv{1,2}d_transpose output_padding parameter. #5758

  • Make first order gradient graphs more efficient #5959

  • Raise an exception when extern function does not return Stmt #5964

  • Fix small typo in nn.conv2d_gemm_weight_transform #5925

  • Improve docker/bash.sh to handle git worktrees #5970

  • Install DNNL (OneDNN) to CI Environment #5936

  • Add Dynamic reshape to a dynamic namespace and add DynamicToStatic Pass #5826

  • Add meshgrid op in Relay, TOPI, Pytorch frontend #5961

  • Print right number of parentheses for LoadNode #5965

  • fix tvm relay testing tf.py typo error #5977

  • Migrate data structure of TargetNode #5960

  • Remove redundant function CreateBufferVecPtr #5982

  • Fix string argument mismatch in GraphRuntimeCodegen #5933

  • Demo showing how to run a pruned Hugging Face model #5975

  • VectorType::get with two parameters is deprecated in LLVM 11+ #5984

Refactor

  • relay.op.Op -> tvm.ir.Op #5705

  • Separate ArgTypeCode from DLDataTypeCode #5730

  • Remove legacy compute_expr.h #5738

  • Call::Halide => ProducerLoad, DSL/TIR decouple. #5743

  • Provide->ProducerStore, Realize->ProducerRealize. #5750

  • Migrate the tvm/tir/expr.h to constructor #5773

  • Migrate tir/stmt.h to use constructor. #5778

  • Migrate all Object construction to constructor. #5784

  • Cleanup unused classes #5789

  • Finish std::string->String updates #5793

  • Add tir prefix to type keys #5802

  • Deprecate FreeStmt #5890

  • Change Call.name to Call.op(RelayExpr) #5863

  • Range/IntSet API style consistency. #5953

Bugfix

  • Fix Compilation Error in CRT #5713

  • Fix runtime::String backward compatibility in JSON #5725

  • Allow RPCWrappedFunc to rewrite runtime::String as std::string #5796

  • Fix reshape #5739

  • Fix building with LLVM-10 on macOS #5859

  • Add cuda 11 to contrib.nvcc.find_libdevice_path() #5902

MXNet

  • Softmin, trunc op support added #5715

  • conv3d and conv3d_transpose added #5814

  • Add parser for contrib.box_decode #5967

ONNX

  • ReduceL1, ReduceL2, ReduceSumSquare, ReduceLogSum ops added #5721

  • MaxRoiPool, Mod & Xor op support added #5729

  • Skip multiply with 1.0f constant for GEMM import #5800

  • Fix an issue with #5755 and add Batch norm unit tests. #5845

TensorFlow

  • StatefulPartitionedCall/PartitionedCall Ops support added #5617

  • Don't add cast for batch norm when type isn't changing #5731

  • Conv3d Transpose OP added #5775

Relay

  • Clear compile engine after task extraction #5724

  • Sparse to dense operator #5447

  • Support dynamic NMS (Non-Maximum Suppression), symbolic begin, end, and strides for strided_slice #4312

  • Conv3d_transpose op support added #5737

  • Fix for recursive let #5757

  • Fix Calibration Pass to Support Modules with Multiple Functions #5768

  • Add storage_order ignore in pooling layer. #5781

  • Tweak cublas/cudnn priority level #5820

  • ReverseSequence operator #5495

  • Add operation gather to relay. #5716

  • Skip Unknown Function Symbols #5888

  • Allow every runtime module to handle constants #5885

  • Some performance improvement to VM #5901

  • Add shape_of instruction #5855

  • symbolic max_output_size #5844

  • handle Tuple/TupleGetItem in first order gradient #5946

  • Add resnet-3d & Update network definitions for NHWC layout #5945

Frontend

  • Add parser support for shape and range #5329

  • Darknet support batch size for yolo #5688

  • Improve Control Flow and TensorArray #5699

  • Improve TF Parser to keep output nodes for saved_model #5794

  • Add parser support for relu6, leaky_relu, relu_n1_to_1, log_softmax #4805

  • Fix TF Dynamic input shape #5825

  • Support a few contrib ops in mxnet #5819

  • Check all unsupported ops before raising an exception #5929

TOPI

  • Fix reshape usage in ARM schedule #5732

  • block sparse dense on cuda #5746

  • pass-by-value -> pass-by-const-reference #5783

  • fix sparse dense schedule on cuda #5803

  • fix strategy for sparse dense cuda #5782

  • Fix x86 conv2d template when tuning with unpacked layout #5938

Fix

  • Fix sequential cpp test #5745

  • Infer types in MergeComposite #5766

  • Fix some typos in git-clang-format.sh #5786

  • Fix recursive let for well formed check #5780

  • Recover global state after test_util.py #5824

Backport-0.6

  • fix a min/max simplify bug #5749

  • fix a min/max simplify bug #5761

  • Fix alpha_equal bug #5829

  • fix RemoveUnusedFunctions pass #5828

  • Add ConstantNode to IsAtomic #5831

  • Fix search path for libtvm_topi.so #5836

  • Fix Python debugger segfaults with TVM built with LLVM #5837

  • Fixed process termination routine in windows #5849

  • Fix annotation for multiply op (#4458) #5850

  • Fix NDArray SaveDLTensor declaration and implementation signature different #5852

  • fix serialization precision loss in float #5860

  • fix _parse_param bug #5861

  • Fix bias_add gradient #5862

  • Make sure to visit the arguments of inlined functions #5864

  • Fix Python syntax error in start_rpc_server_to_tracker.py #5865

  • Fixed crash caused by reversing bitwise operations #5866

  • Fix copy constructor #5867

  • fix small bug about dense_grad #5868

  • Fix compile errors of OpenCL FPGA backend #5869

  • Some Windows and MSVC fixes #5870

  • LRN only supports 4D tensors, remove it from alter_op #5871

  • fix topi.nn.global_pool layout=NHWC #5872

  • Fix hasattr by extracting Python error type from Windows error message #5873

  • Export GraphRuntime in tvm_runtime.dll #5874

  • Fix Base64OutStream portability issue #5875

  • Fix a bug in generating the search space #5876

  • Fix compilation of If-Elses #5877

  • Fix FuseBatchNorm output cast error if need_cast is True #5878

  • fskip of EliminateCommonSubexpr cannot always return false #5879

  • Fix multiple transfer issue in LoadUop module #5882

  • Enable streamlined GEMM execution #5893

  • Fixed a crash issue in TSIM driver #5894

  • Fix lambda lift pass for recursive call #5903

  • Fix conv2d alter op for arm cpu #5906

  • Fix alter op layout when calling a global var #5904

  • Fix dense x86 schedule #5905

  • End-to-end Inference with Chisel VTA #5896

  • keep div_mode during floordiv simplify #5927

  • keep div_mode during floordiv simplify #5922

  • fskip of EliminateCommonSubexpr cannot always return false #5880

Runtime

  • Add compile_shared option to linux compile utility fn #5751

  • Overload string operators #5806

  • Introduce MetadataModule to separate code compilation/interpretation and weight initialization #5770

  • Only initialize required module #5926

TIR

  • Remove CallNode.call_type in favor of attribute. #5937

  • Remove legacy HoistIfThenElse #5944

  • Improve Let/LetStmt support. #5949

  • Refine side effect analysis. #5954

TFLite

  • QNN support for TFLite 2.1.0 quantized models #5848

Contributors Who Reviewed Pull Requests

Note: The format is name (number of activities)

Disclaimer: the number of activities does not directly correspond to the community’s view of the significance of contributions.

tqchen (114), junrushao1994 (45), zhiics (43), masahi (28), comaniac (18), siju-samuel (14), mbrookhart (13), icemelon9 (12), anijain2305 (11), vinx13 (10), MarisaKirisame (9), ZihengJiang (8), yzhliu (8), kevinthesun (8), jroesch (8), ANSHUMAN87 (6), merrymercy (5), liangfu (5), FrozenGene (5), u99127 (5), cbalint13 (5), tmoreau89 (4), lixiaoquan (4), mbaret (4), srkreddy1238 (3), jwfromm (3), wpan11nv (3), robo-corg (3), Laurawly (2), kazum (2), t-vi (2), yidawang (2), areusch (2), Hzfengsy (2), binarybana (2), leonwanghui (2), mshawcroft (1), cchung100m (1), yongwww (1), ajtulloch (1), abergeron (1), antinucleon (1), sxjscience (1), shoubhik (1), maheshambule (1), wyc-ruiker (1), eric-haibin-lin (1), lhutton1 (1), roastduck (1), ehsanmok (1), zxy844288792 (1), notoraptor (1), yongfeng-nv (1), jackwish (1), dhruvaray (1), Lyken17 (1), ZhennanQin (1), Shawn-Inspur (1)

Contributors Whose Pull Requests were Updated

Note: The format is name (number of activities)

tqchen (32), yzhliu (26), t-vi (19), icemelon9 (13), comaniac (13), junrushao1994 (13), siju-samuel (11), zhiics (10), mbrookhart (8), ANSHUMAN87 (7), liangfu (5), kevinthesun (4), jroesch (4), lixiaoquan (4), abergeron (4), anijain2305 (3), FrozenGene (3), mbaret (3), cbalint13 (3), xqdan (3), Meteorix (3), jcf94 (3), leonwanghui (3), masahi (2), kazum (2), yongwww (2), inadob (2), antinucleon (2), maheshambule (2), trevor-m (2), lhutton1 (2), notoraptor (2), hypercubestart (2), badenh (2), dhruvaray (2), randxie (2), ceruleangu (2), merrymercy (1), vinx13 (1), hlu1 (1), jwfromm (1), kparzysz-quic (1), wpan11nv (1), leandron (1), tobegit3hub (1), gussmith23 (1), windclarion (1), LiangHao151941 (1), giuseros (1), Menooker (1), hzfan (1), tom-gall (1), deepakbabel23 (1), lsy643 (1), ymwangg (1), seanlatias (1), akosik-anyvision (1), handar423 (1), majiang31312 (1), wrongtest (1)
