TVM Monthly - June 2020

As discussed with the TVM PPMC, we would like to give a monthly summary of the project so that people can get a better sense of what is going on in the community.

Feedback and suggestions are welcome so that we can further improve the report.

Community

The community welcomes new committer Siju Samuel (@siju-samuel) and new reviewers @wpan11nv and Matthew Brookhart (@mbrookhart).

The forum got 103k pageviews and 2.8k user visits in the last month (down from 112k pageviews and 3.1k user visits in May).

Logan Weber and Andrew Reusch published a new TVM blog post, How TVM is Taming Tiny, a.k.a. Micro-TVM.

Relatedly, there was an Embedded Focused Online Meetup on June 18th, 2020. You can check out the video of the meetup here. Look for posts labeled “Meetup” if you don’t want to miss out on the next online meetup!

Features and Improvements

Over the previous month, the community made good progress on performance improvements, operator/backend coverage, and codebase refactoring.

Here are a few highlights.

  • The addition of a CoreML codegen using the BYOC feature to offload subgraphs to Apple’s Neural Engine on iOS devices #5634

  • Addition of bfloat16 #5601

  • A new TVM Target ID registry to streamline target specification #5838

  • Improved quantized convolution performance for armv8 architectures #5754

  • Rust bindings refactor in TVM #5527, #5769, #5830

  • New Micro-TVM tutorials and documentation #5655
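To make the bfloat16 highlight (#5601) more concrete: bfloat16 is simply the top 16 bits of an IEEE float32, keeping the full 8-bit exponent while dropping mantissa precision. The sketch below is a minimal, dependency-free illustration of that format, not TVM's implementation:

```python
import struct

def f32_to_bf16_bits(x: float) -> int:
    """Truncate an IEEE float32 to bfloat16 (its top 16 bits),
    rounding the discarded lower bits to nearest-even."""
    (bits,) = struct.unpack("<I", struct.pack("<f", x))
    # Rounding bias: 0x7FFF, plus 1 when the kept LSB is odd (ties to even).
    bias = 0x7FFF + ((bits >> 16) & 1)
    return ((bits + bias) >> 16) & 0xFFFF

def bf16_bits_to_f32(b: int) -> float:
    """Widen bfloat16 bits back to float32 by zero-filling the low mantissa bits."""
    (x,) = struct.unpack("<f", struct.pack("<I", b << 16))
    return x
```

Because the exponent width matches float32, the dynamic range is preserved; only precision drops (to roughly 3 decimal digits), which is why bfloat16 is attractive for deep-learning workloads.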

More improvements along with details are listed below.
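On the Target ID registry highlight (#5838): the idea is to declare each target kind and its legal options once, in a central registry, instead of scattering string parsing across the codebase. The following is a hypothetical, simplified sketch of that registry pattern; the class and option names here are illustrative and do not reflect TVM's actual API:

```python
class TargetKind:
    """A toy registry of target kinds, loosely mirroring the idea
    behind a target registry (all names here are illustrative)."""
    _registry = {}

    def __init__(self, name):
        self.name = name
        self.attrs = {}  # declared option name -> expected type

    def add_attr_option(self, key, ty):
        # Declare a legal option for this kind; chainable for fluent use.
        self.attrs[key] = ty
        return self

    @classmethod
    def register(cls, name):
        kind = cls(name)
        cls._registry[name] = kind
        return kind

    @classmethod
    def get(cls, name):
        return cls._registry[name]

# Declaring a kind once gives every consumer a single place to
# validate target options against.
TargetKind.register("cuda") \
    .add_attr_option("arch", str) \
    .add_attr_option("max_threads", int)
```

The benefit of this design is that adding a new backend or option becomes a single registration call, and invalid target strings can be rejected uniformly at parse time.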

Arith

  • ExtendedEuclidean merge impl to int_operator #5625

  • Rewrite simplify fix for Vectorized Cooperative Fetching #5924

Fixes

  • fix typo: anchor windoes should be anchor windows #5706

  • ReplicationPad support added #5708

  • Simplify Pattern API Implementations #5703

  • Remove deprecated opengl files #5711

  • Remove opengl runtime and cmake #5712

  • Rename tvm_dso_op to libtvm_dso_op #5714

  • Unify StrMapNode and MapNode #5687

  • Introduce runtime::String::CanConvertFrom #5718

  • Restore the StrMap behavior in JSON/SHash/SEqual #5719

  • Fix generating types like float44 and float88 #5722

  • Avoid downloading when TOPHUB_LOCATION is NONE #5720

  • codegen llvm: move nvptx-specific intrinsic handling into codegen_nvptx #5726

  • ROCm warp shuffles and reductions #5727

  • Fix new/delete mismatches in Relay VM #5735

  • Fix flaky test_topi_pooling.py:test_adaptive_pool #5736

  • Fix the values for test_fmod since it fails way too often otherwise #5723

  • fix small bug about dense_grad #5695

  • Clarify downstream consistency of TVMArgTypeCode #5742

  • Add Scatter to Topi/Relay/ONNX via hybrid script #5619

  • Clean WASM environment before build #5759

  • Second stage of Rust Refactor #5527

  • Fix gelu in PyTorch frontend, tighten numerical checks #5763

  • Add ShapePattern and DataTypePattern #5760

  • Make batch matrix multiplication on GPU tunable #5752

  • Remove an overstrict assert in MakeAllreduce (fixes #5686) #5785

  • CoreML codegen #5634

  • update vulkan build rule #5777

  • aten::norm support added #5776

  • @wpan11nv -> Reviewer #5790

  • Edit onnx parser to infer values in post order #5755

  • Support symbolic inputs of Fill #5762

  • support aten::type_as in the pytorch frontend #5787

  • Temporary disable fp16 type_as test for PyTorch Frontend #5799

  • Add config switch for nn.dense layer type. #5801

  • Move cpu-only frontend tests to a CPU stage #5807

  • Pin hand landmark network to version 0.7.4. #5813

  • Limit number of threads in all jobs #5815

  • Siju Samuel -> Committer #5817

  • Error msg update #5818

  • fix relay.build to not change the module argument in place #5822

  • Fix InferType when module contains Prelude #5797

  • Fix v0.6 CI #5832

  • Add a combine batch_matmul pass #5791

  • RepeatVector, Conv3DTranspose op support added #5833

  • Fix converting serialized quantized models #5839

  • ffi (Object): make class dict visible in instances #5843

  • tvm crate stage 3 of Rust refactor #5769

  • Additional canonicalization added for AddNode #5846

  • Suppress the warning messages when compile engine selects impls #5821

  • Fix #5849 (process termination routine on Windows) #5851

  • Introduce POD-C Compliant tvm::Map #5740

  • Add bfloat16 #5601

  • Add Python Classes for all Attrs #5853

  • Fix map assign issue in CI test #5854

  • Introduce Target Id Registry #5838

  • Update has_dtype/has_shape to pattern lang doc #5847

  • Add nn.batch_flatten as quantizable. #5805

  • Fail early before running invalid dynamic graphs #5856

  • Improve type handling in PyTorch frontend #5834

  • Matthew Brookhart -> Reviewer #5886

  • keep parameter names from PyTorch #5887

  • Improve quantized convolution performance for armv8 architectures #5754

  • HotFix the python intrin rule #5895

  • Rust Refactor Stage 4: Rewrite Rust graph runtime to use new APIs #5830

  • add a few gradients #5899

  • Add Binary Intrinsic ops to TIR Ops in C++ #5900

  • Allow implicit conversion in TVM FFI to tvm::Bool #5907

  • PyTorch frontend: fix handling of duplicate use of a model weight #5897

  • Don't multiply by constant 1 uselessly in dense #5911

  • Support any index matching for TupleGetItem #5909

  • Add MicroTVM tutorial using the STM32F746 discovery board #5655

  • Fix serialization of inf float value #5912

  • Fix CPU Thread Binding for Multiple Sockets #5918

  • CUDA device API & VerifyGPUCode pass update #5898

  • Update install.rst #5858

  • Two small fixes to AMDCPU codegen for LLVM 10+ and ROCm 3.5+ #5920

  • Add LegalizeInvalidAttach to legalize the compute_at location after split or fuse #5917

  • Update code_review.rst #5923

  • Don't rewrite expressions used outside of the pattern #5930

  • Add TupleGetItem to CSE #5931

  • Various update for CoreML codegen #5934

  • Update date in the NOTICE #5943

  • Update date in the NOTICE #5942

  • minor fix for release doc #5948

  • raise right error in tensorflow split op #5951

  • add rm xla attributes in tf docs #5950

  • Fix some typo errors in license header #5957

  • Fix OpenCL get_valid_counts errors due to intrinsic atomic_add #5857

  • Fix some typo errors in license header #5956

  • Amendments for gradients #5941

  • Fix the meaning of conv{1,2}d_transpose output_padding parameter. #5758

  • Make first order gradient graphs more efficient #5959

  • Raise an exception when extern function does not return Stmt #5964

  • Fix small typo in nn.conv2d_gemm_weight_transform #5925

  • Improve docker/bash.sh to handle git worktrees #5970

  • Install DNNL (OneDNN) to CI Environment #5936

  • Add Dynamic reshape to a dynamic namespace and add DynamicToStatic Pass #5826

  • Add meshgrid op in Relay, TOPI, Pytorch frontend #5961

  • Print right number of parentheses for LoadNode #5965

  • fix tvm relay testing tf.py typo error #5977

  • Migrate data structure of TargetNode #5960

  • Remove redundant function CreateBufferVecPtr #5982

  • Fix string argument mismatch in GraphRuntimeCodegen #5933

  • Demo showing how to run a pruned Hugging Face model #5975

  • VectorType::get with two parameters is deprecated in LLVM 11+ #5984

Refactor

  • relay.op.Op -> tvm.ir.Op #5705

  • Separate ArgTypeCode from DLDataTypeCode #5730

  • Remove legacy compute_expr.h #5738

  • Call::Halide => ProducerLoad, DSL/TIR decouple. #5743

  • Provide->ProducerStore, Realize->ProducerRealize. #5750

  • Migrate the tvm/tir/expr.h to constructor #5773

  • Migrate tir/stmt.h to use constructor. #5778

  • Migrate all Object construction to constructor. #5784

  • Cleanup unused classes #5789

  • Finish std::string->String updates #5793

  • Add tir prefix to type keys #5802

  • Deprecate FreeStmt #5890

  • Change Call.name to Call.op(RelayExpr) #5863

  • Range/IntSet API style consistency. #5953

Bugfix

  • Fix Compilation Error in CRT #5713

  • Fix runtime::String backward compatibility in JSON #5725

  • Allow RPCWrappedFunc to rewrite runtime::String as std::string #5796

  • Fix reshape #5739

  • Fix building with LLVM-10 on macOS #5859

  • Add cuda 11 to contrib.nvcc.find_libdevice_path() #5902

MXNet

  • Softmin, trunc op support added #5715

  • conv3d and conv3d_transpose added #5814

  • Add parser for contrib.box_decode #5967

ONNX

  • ReduceL1, ReduceL2, ReduceSumSquare, ReduceLogSum ops added #5721

  • MaxRoiPool, Mod & Xor op support added #5729

  • Skip multiply with 1.0f constant for GEMM import #5800

  • Fix an issue with #5755 and add Batch norm unit tests. #5845

TensorFlow

  • StatefulPartitionedCall/PartitionedCall Ops support added #5617

  • Don't add cast for batch norm when type isn't changing #5731

  • Conv3d Transpose OP added #5775

Relay

  • Clear compile engine after task extraction #5724

  • Sparse to dense operator #5447

  • Support dynamic NMS (Non-Maximum Suppression), symbolic begin, end, and strides for strided_slice #4312

  • Conv3d_transpose op support added #5737

  • Fix for recursive let #5757

  • Fix Calibration Pass to Support Modules with Multiple Functions #5768

  • Add storage_order ignore in pooling layer. #5781

  • Tweak cublas/cudnn priority level #5820

  • ReverseSequence operator #5495

  • Add operation gather to relay. #5716

  • Skip Unknown Function Symbols #5888

  • Allow every runtime module to handle constants #5885

  • Some performance improvement to VM #5901

  • Add shape_of instruction #5855

  • symbolic max_output_size #5844

  • handle Tuple/TupleGetItem in first order gradient #5946

  • Add resnet-3d & Update network definitions for NHWC layout #5945

Frontend

  • Add parser support for shape and range #5329

  • Darknet support batch size for yolo #5688

  • Improve Control Flow and TensorArray #5699

  • Improve TF Parser to keep output nodes for saved_model #5794

  • Add parser support for relu6, leaky_relu, relu_n1_to_1, log_softmax #4805

  • Fix TF Dynamic input shape #5825

  • Support a few contrib ops in mxnet #5819

  • Check all unsupported ops before raising an exception #5929

TOPI

  • Fix reshape usage in ARM schedule #5732

  • block sparse dense on cuda #5746

  • pass-by-value -> pass-by-const-reference #5783

  • fix sparse dense schedule on cuda #5803

  • fix strategy for sparse dense cuda #5782

  • Fix x86 conv2d template when tuning with unpacked layout #5938

Fix

  • Fix sequential cpp test #5745

  • Infer types in MergeComposite #5766

  • Fix some typos in git-clang-format.sh #5786

  • Fix recursive let for well formed check #5780

  • Recover global state after test_util.py #5824

Backport-0.6

  • fix a min/max simplify bug #5749

  • fix a min/max simplify bug #5761

  • Fix alpha_equal bug #5829

  • fix RemoveUnusedFunctions pass #5828

  • Add ConstantNode to IsAtomic #5831

  • Fix search path for libtvm_topi.so #5836

  • Fix Python debugger segfaults with TVM built with LLVM #5837

  • Fixed process termination routine in windows #5849

  • Fix annotation for multiply op (#4458) #5850

  • Fix NDArray SaveDLTensor declaration and implementation signature different #5852

  • fix serialization precision loss in float #5860

  • fix _parse_param bug #5861

  • Fix bias_add gradient #5862

  • Make sure to visit the arguments of inlined functions #5864

  • Fix Python syntax error in start_rpc_server_to_tracker.py #5865

  • Fixed crash caused by reversing bitwise operations #5866

  • Fix copy constructor #5867

  • fix small bug about dense_grad #5868

  • Fix compile errors of OpenCL FPGA backend #5869

  • Some Windows and MSVC fixes #5870

  • LRN only supports 4D tensors, remove it from alter_op #5871

  • fix topi.nn.global_pool layout=NHWC #5872

  • Fix hasattr by extracting Python error type from Windows error message #5873

  • Export GraphRuntime in tvm_runtime.dll #5874

  • Fix Base64OutStream portability issue #5875

  • Fix a bug in generating the search space #5876

  • Fix compilation of If-Elses #5877

  • Fix FuseBatchNorm output cast error if need_cast is True #5878

  • fskip of EliminateCommonSubexpr cannot always return false #5879

  • Fix multiple transfer issue in LoadUop module #5882

  • Enable streamlined GEMM execution #5893

  • Fixed a crash issue in TSIM driver #5894

  • Fix lambda lift pass for recursive call #5903

  • Fix conv2d alter op for arm cpu #5906

  • Fix alter op layout when calling a global var #5904

  • Fix dense x86 schedule #5905

  • End-to-end Inference with Chisel VTA #5896

  • keep div_mode during floordiv simplify #5927

  • keep div_mode during floordiv simplify #5922

  • fskip of EliminateCommonSubexpr cannot always return false #5880

Runtime

  • Add compile_shared option to linux compile utility fn #5751

  • Overload string operators #5806

  • Introduce MetadataModule to separate code compilation/interpretation and weight initialization #5770

  • Only initialize required module #5926

TIR

  • Remove CallNode.call_type in favor of attribute. #5937

  • Remove legacy HoistIfThenElse #5944

  • Improve Let/LetStmt support. #5949

  • Refine side effect analysis. #5954

TFLite

  • QNN support for TFLite 2.1.0 quantized models #5848

Contributors Who Reviewed Pull Requests

Note: The format is name (number of activities)

Disclaimer: the number of activities does not directly correspond to the community’s view of the significance of contributions.

tqchen (114), junrushao1994 (45), zhiics (43), masahi (28), comaniac (18), siju-samuel (14), mbrookhart (13), icemelon9 (12), anijain2305 (11), vinx13 (10), MarisaKirisame (9), ZihengJiang (8), yzhliu (8), kevinthesun (8), jroesch (8), ANSHUMAN87 (6), merrymercy (5), liangfu (5), FrozenGene (5), u99127 (5), cbalint13 (5), tmoreau89 (4), lixiaoquan (4), mbaret (4), srkreddy1238 (3), jwfromm (3), wpan11nv (3), robo-corg (3), Laurawly (2), kazum (2), t-vi (2), yidawang (2), areusch (2), Hzfengsy (2), binarybana (2), leonwanghui (2), mshawcroft (1), cchung100m (1), yongwww (1), ajtulloch (1), abergeron (1), antinucleon (1), sxjscience (1), shoubhik (1), maheshambule (1), wyc-ruiker (1), eric-haibin-lin (1), lhutton1 (1), roastduck (1), ehsanmok (1), zxy844288792 (1), notoraptor (1), yongfeng-nv (1), jackwish (1), dhruvaray (1), Lyken17 (1), ZhennanQin (1), Shawn-Inspur (1)

Contributors Whose Pull Requests were Updated

Note: The format is name (number of activities)

tqchen (32), yzhliu (26), t-vi (19), icemelon9 (13), comaniac (13), junrushao1994 (13), siju-samuel (11), zhiics (10), mbrookhart (8), ANSHUMAN87 (7), liangfu (5), kevinthesun (4), jroesch (4), lixiaoquan (4), abergeron (4), anijain2305 (3), FrozenGene (3), mbaret (3), cbalint13 (3), xqdan (3), Meteorix (3), jcf94 (3), leonwanghui (3), masahi (2), kazum (2), yongwww (2), inadob (2), antinucleon (2), maheshambule (2), trevor-m (2), lhutton1 (2), notoraptor (2), hypercubestart (2), badenh (2), dhruvaray (2), randxie (2), ceruleangu (2), merrymercy (1), vinx13 (1), hlu1 (1), jwfromm (1), kparzysz-quic (1), wpan11nv (1), leandron (1), tobegit3hub (1), gussmith23 (1), windclarion (1), LiangHao151941 (1), giuseros (1), Menooker (1), hzfan (1), tom-gall (1), deepakbabel23 (1), lsy643 (1), ymwangg (1), seanlatias (1), akosik-anyvision (1), handar423 (1), majiang31312 (1), wrongtest (1)
