TVM Monthly - May 2020

As discussed by the TVM PPMC, our goal is to provide a monthly summary of the project so that users and developers can get a better sense of what is happening in the TVM community.

Feedback and suggestions are welcome so that we can further improve these updates.

Community

The community welcomes new PPMC members Masahiro Masuda (@masahi) and Zhi Chen (@zhiics). Meanwhile, the discuss forum received 112K page views and 3.1K user visits last month.

Features and Improvements Highlights

Over the past month, the community made good progress on performance improvements, operator/backend coverage, and codebase refactoring. Here are a few highlights.

  • A brand-new TVM web runtime based on the WASM standard API (see the blog post).
  • Use the pattern language to merge subgraphs in BYOC (Bring Your Own Codegen) #5656; a short sketch of the pattern language follows this list.
  • Optimize conv2d Winograd algorithm with TensorCore #5485
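
To give a flavor of the new pattern language (introduced in #5231), here is a minimal sketch using the tvm.relay.dataflow_pattern module. It matches a conv2d followed by an elementwise add, the kind of composite subgraph that MergeComposite and the BYOC flow now operate on; the shapes and variable names below are illustrative only.

```python
from tvm import relay
from tvm.relay.dataflow_pattern import is_op, wildcard

# Pattern: nn.conv2d(*, *) whose result feeds an add(*, *).
conv = is_op("nn.conv2d")(wildcard(), wildcard())
pattern = is_op("add")(conv, wildcard())

# A small Relay expression to match against: conv2d + bias add.
data = relay.var("data", shape=(1, 3, 224, 224))
weight = relay.var("weight", shape=(16, 3, 3, 3))
bias = relay.var("bias", shape=(1, 16, 1, 1))
out = relay.add(relay.nn.conv2d(data, weight), bias)

assert pattern.match(out)  # the expression matches the pattern
```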

Pull Requests

Below is a high-level summary of the PRs closed in the last month, grouped by area.

BYOC (Bring Your Own Codegen)

  • Add additional check before re-using the cached match #5552
  • Remove kCompiler attr from external functions #5615
  • Pattern Language MergeComposite #5656
  • Support Tuple Output in C/DNNL Codegen #5701
  • Infer types in MergeComposite #5766

Pattern Language

  • Convert PatternGrouper to do pre-order, non-recursive analysis #5653
  • Remove constants from partitioned functions #5663
  • Add a check for null function attributes #5674
  • Add ConstantPattern #5689
  • Conditionally Embedding Constants in Partitioned Functions #5693
  • Simplify Pattern API Implementations #5703
  • Add ShapePattern and DataTypePattern #5760
  • Remove unnecessary print #5642

AutoTVM

  • Fix the runtime raise error #5586
  • Update XGBoost verbosity option #5649

TIR

  • text format printer considering future parsing use #5483
  • Remove buffer params from pass config. #5652

Relay

  • ReduceLogSumExp Operator support #5453
  • Math ops added #5502
  • enable blocking format in x86 conv2d and fold scale axis #5357
  • Fixed bug in attribute parsing for pool layers. #5582
  • Support symbolic newshape for Reshape #5429
  • Specify additional layouts in convert layout pass #5422
  • Safe check added for Merge Composite Call Node #5562
  • Memory planner (part 1) #5144
  • Improve Shape Func handling for Tuple inputs #5467
  • Relay updated with String #5578
  • Fix the creation of tuple of tuples in PartitionGraph #5616
  • Resize3d, Upsample3d op support #5633
  • Preserve type information in Merge Composite #5640
  • Add operator Correlation #5628
  • Move compiler_begin/end_op to local static objects #5622
  • affine_grid and grid_sample #5657
  • Support symbolic TopK, Ones, Zeros and Full #5459
  • Fix dataflow_pattern.rewrite() hang if Match in IR #5680
  • Fix segfault in pretty print when ObjectRef is null #5681
  • move fallback_device to config #5690
  • Replace build_config with PassContext #5698
  • Clear compile engine after task extraction #5724
  • Sparse to dense operator #5447
  • Support dynamic NMS (Non Maximum Suppression), symbolic begin, end, and strides for strided_slice #4312
  • Conv3d_transpose op support added #5737
  • Fix for recursive let #5757
  • Add operation gather to relay. #5716
  • Fix Calibration Pass to Support Modules with Multiple Functions #5768
  • Add storage_order ignore in pooling layer. #5781

TOPI

  • Optimization of Conv2d Winograd algorithm on TensorCore #5485
  • AutoTVM incorrect measurement #5511
  • Fix bifrost spatial packing conv2d auto tune #5684
  • Fix reshape usage in ARM schedule #5732
  • block sparse dense on cuda #5746
  • Improve CUDA softmax scheduling #5600
  • pass-by-value -> pass-by-const-reference #5783
  • fix sparse dense schedule on cuda #5803
  • fix strategy for sparse dense cuda #5782

Arithmetic

  • Handle likely in IRMutatorWithAnalyzer #5665
  • ExtendedEuclidean merge impl to int_operator #5625
  • fix a min/max simplify bug #5749
  • fix a min/max simplify bug #5761

Runtime

  • Fix workspace #5503
  • Store nullptr PackedFunc as nullptr for better error propagation #5540
  • WebGPU support #5545
  • Hexagon driver for offloading kernels to simulator #5492
  • Setup lint, doc, test #5556
  • TVM WebAssembly JS Runtime #5506
  • Improve PackedFunc robustness #5517
  • Seg fault in WorkspacePool's destructor (#5632) #5636
  • Introduce runtime::Array #5585
  • Resolve constexpr issue in debug mode. #5651
  • Add compile_shared option to linux compile utility fn #5751

RPC

  • Call sync in CopyFromRemote and CopyToRemote #5512
  • Fix the multihop cpu case #5522
  • Improve RPCServer AsyncIO support. #5544

ONNX

  • LpPool Support added #5696
  • Skip ADD inside Gemm op when vector is zero #5697
  • ReduceL1, ReduceL2, ReduceSumSquare, ReduceLogSum ops added #5721
  • MaxRoiPool, Mod & Xor op support added #5729
  • Skip multiply with 1.0f constant for GEMM import #5800

TensorFlow

  • StatefulPartitionedCall/PartitionedCall Ops support added #5617
  • Don't add cast for batch norm when type isn't changing #5731
  • Conv3d Transpose OP added #5775

PyTorch

  • expand bug fix #5576
  • Support max_pool2d_with_indices #5549
  • Add prim::device op #5584
  • ImplicitTensorToNum support added #5603
  • Matmul fix for batch_matmul #5604
  • ReflectionPad2d op #5624
  • Padding op support #5638
  • Minor bug fixes #5683
  • floor_divide support for squeezenet #5702
  • ReplicationPad support added #5708
  • aten::norm support added #5776

MXNet

  • broadcast and logical op support #5461
  • MaxPool3d and AvgPool3d Ops support added #5614
  • Softmin, trunc op support added #5715
  • conv3d and conv3d_transpose added #5814

TFLite

  • Model importer to be compatible with tflite 2.1.0 #5497
  • Nit: Function names made consistent #5515
  • Select op support for tflite frontend #5486
  • GATHER_ND #5508
  • Quantize & Dequantize op #5394

Other Frontends

  • Fully connected op conversion made in sync with TFLite #5510
  • ADD_N operator #5474
  • onnx, mxnet, pytorch mathops added #5561
  • abs, round, reciprocal, sign, softsign, hard_sigmoid ops support #5587
  • Gather nd bug fix for one dim support in tensorflow #5588
  • Add parser support for shape and range #5329
  • Darknet support batch size for yolo #5688
  • Improve Control Flow and TensorArray #5699
  • Improve TF Parser to keep output nodes for saved_model #5794
  • Add parser support for relu6, leaky_relu, relu_n1_to_1, log_softmax #4805

Docs

  • Fix bad restructured text formatting for VTA install guide #5541
  • Improve document in reflection #5593
  • Move the api docs to the api subfolder #5626
  • Fix the QNN TFLite tutorial build #5641
  • Clarify downstream consistency of TVMArgTypeCode #5742

CI

  • Install wasmtime for WebAssembly tests #5494
  • Update Jenkins ci-cpu to bionic #5555
  • Update the ci-gpu to the latest build with the new vulkansdk. #5571
  • Fix clang-format error #5577
  • Enable llvm-11 and llvm-10 in build tests, recover webdocs. #5579
  • Update ci-lint to use the latest image that contains clang-format #5568
  • reintroduce docker stage for wasm tests #5565
  • Allow CI_PYTEST_ADD_OPTIONS to be unbound. #5644
  • Add log check to the sphinx gallery docs #5643
  • Move cpu-only frontend tests to a CPU stage #5807
  • Limit number of threads in all jobs #5815

Refactor

  • Non recursive partitioning #5493
  • Modularize the RPC infra #5484
  • IRModule is updated with String #5523
  • IR is updated with String #5547
  • Streamline ir/op Registry #5609
  • Migrate IRModule ObjectRef to not-null #5654
  • Migrate BuildConfig to PassContext #5668 (a short sketch of the new usage follows this list)
  • std::string -> String Migration in TIR nodes #5596
  • relay.op.Op -> tvm.ir.Op #5705
  • Separate ArgTypeCode from DLDataTypeCode #5730
  • Remove legacy compute_expr.h #5738
  • Call::Halide => ProducerLoad, DSL/TIR decouple. #5743
  • Provide->ProducerStore, Realize->ProducerRealize. #5750
  • Migrate the tvm/tir/expr.h to constructor #5773
  • Migrate tir/stmt.h to use constructor. #5778
  • Migrate all Object construction to constructor. #5784
  • Cleanup unused classes #5789
  • Finish std::string->String updates #5793
  • Add tir prefix to type keys #5802
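
For readers updating their code for the BuildConfig-to-PassContext migration, here is a minimal before/after sketch, assuming the post-refactor Python API; the workload is illustrative only.

```python
import tvm
from tvm import relay
from tvm.relay import testing

# An illustrative workload (any Relay module works).
mod, params = testing.mlp.get_workload(batch_size=1)

# Old style (now removed): with relay.build_config(opt_level=3): ...
# New style: build and pass options flow through a unified PassContext.
with tvm.transform.PassContext(opt_level=3):
    lib = relay.build(mod, target="llvm", params=params)
```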

Bug Fixes

  • Fix bug in rpc ring buffer shrink #5516
  • Fix remote device sync #5538
  • Fix bug in rpc ring buffer shrink (#5516) #5537
  • RPC Server error fix on Pynq FPGA #5607
  • Fix FloorMod Simplifier #5509
  • Fix FloorMod Simplifier #5505
  • Fix Python debugger segfaults with TVM built with LLVM #5685
  • Fix Compilation Error in CRT #5713
  • Fix runtime::String backward compatibility in JSON #5725
  • Allow RPCWrappedFunc to rewrite runtime::String as std::string #5796
  • Fix reshape #5739
  • Make "none" DataType explicit #5491
  • Change "scalar" and "stack" in IDL from "inrout" to "in" #5487
  • Link necessary libraries when building runtime for Android #5496
  • Fixes for wasm32 target #5489
  • Reset target and wait for runtime initialization on connect. #5499
  • bump tophub rocm version #5504
  • Support CallNode inputs in qnn.concatenate #5360
  • Improve commentary for RingBuffer #5518
  • Add unit tests for ONNX PRelu and fix importer to pass them. #5521
  • LRN only supports 4D tensors, remove it from alter_op_layout #5520
  • Fix an issue with ONNX Upsample #5530
  • Cache PrimExpr instead of raw pointers in bound analyzer #5533
  • fix a few bugs with shape inference and types in the ONNX importer #5534
  • FP32 and Quantized Object Detection Model #5479
  • Add Onnx Pad v11 #5539
  • Changes to cpp_rpc to make it work on Android (+ Hexagon offloading) #5535
  • Fix to reduce RAM usage during model loading #5507
  • Fix MakeLoopNest for warp memory #5382
  • Add first stage of updating and rewriting Rust bindings. #5526
  • Load platform specific lib for tvmdsoop instead of the hard-coded tvm_dso_op.so #5542
  • Add tests for running micro on native arm hardware #5546
  • Apparently, ONNX Conv with no 'pads' defaults to zero padding #5548
  • clang-format the h,cc,m files. #5557
  • Fix conv2d alter op for arm cpu #5532
  • Fix topi test (/topi/tests/python/test_topi_conv2d_nhwc_winograd.py) for non tensorcore CI. #5563
  • Add clang-format and nodejs to ci-lint #5567
  • Enable clang-format. #5572
  • Allow ubuntu_install_darknet.sh to work in both 18.04 and 16.04 #5574
  • Add a quantized conv2d unit test for the tflite front-end #5558
  • Fix JSON graph dumping. #5591
  • Warp level reduction support for CUDA #5498
  • One more fix for concurrency count #5589
  • Improve robustness of the docs build #5583
  • Phase out WebGL #5570
  • Fix vulkansdk in the ci-gpu and upgrade to 1.2.135 #5566
  • Update ci-cpu to bionic #5554
  • Overestimate binary size for microTVM compiled binaries. #5590
  • Fix bug and re-enable RPC execution test #5436
  • Add ostream formatters for TargetPtr/TargetVal. #5592
  • Pattern Language, Matcher, Rewriter, and Function Partitioner #5231
  • Fix cross thread reduction #5551
  • Fix TVMArray layout on device #5599
  • Add debug mode to tempdir() #5581
  • Represent alignment information in LLVM IR #5598
  • Fix codegen for warp shuffle intrinsics #5606
  • Fix Topological Order calculation for DFPattern Language #5612
  • Fix a typo. #5611
  • Global MaxPool3d and AvgPool3d support #5098
  • Fix build error of iOS RPC #5621
  • Fix three typos #5620
  • isn't a CallNode sometimes #5623
  • Introduce config to PassContext. #5631
  • CMAKE fix #5630
  • Fix typo in test script #5635
  • Label Pattern Partitions #5627
  • Extend AttrPattern to support CallNode and FunctionNode attributes #5637
  • TFLite QNN Tutorial #5595
  • Increase bss section size. #5660
  • Upgrade XGBoost to latest #5658
  • Add buffer name when creating tensor bindings #5670
  • µtvm debug improvements #5648
  • enable amd_apu device on vulkan target #5659
  • Support TupleWrapper as direct ancestor of control flow ops #5639
  • add tvm.micro pydoc to sphinx #5661
  • Add a regression testcase for #5674 #5677
  • Fix C++ RPC build problem on Linux #5671
  • Misc doc fix #5672
  • Add a check Callback to the Pattern Partitioner #5646
  • Call previous excepthook in tvm_excepthook. #5675
  • Fix the shift column for scale_shift_nchw and scale_shift_nhwc in C topi #5679
  • Support more dtypes for TVMDSOOp #5694
  • fix typo: anchor windoes should be anchor windows #5706
  • Remove deprecated opengl files #5711
  • Remove opengl runtime and cmake #5712
  • In memory_plan, check if value is not None, instead of just checking value as boolean. #5700
  • Rename tvm_dso_op to libtvm_dso_op #5714
  • Unify StrMapNode and MapNode #5687
  • Introduce runtime::String::CanConvertFrom #5718
  • Restore the StrMap behavior in JSON/SHash/SEqual #5719
  • Fix generating types like float44 and float88 #5722
  • Avoid downloading when TOPHUB_LOCATION is NONE #5720
  • codegen llvm: move nvptx-specific intrinsic handling into codegen_nvptx #5726
  • ROCm warp shuffles and reductions #5727
  • Fix new / delete mismatches in Relay VM #5735
  • Fix flaky test_topi_pooling.py:test_adaptive_pool #5736
  • Fix the values for test_fmod since it fails way too often otherwise #5723
  • fix small bug about dense_grad #5695
  • Fix sequential cpp test #5745
  • Add Scatter to Topi/Relay/ONNX via hybrid script #5619
  • Clean WASM environment before build #5759
  • Second stage of Rust Refactor #5527
  • Fix gelu in PyTorch frontend, tighten numerical checks #5763
  • Make batch matrix multiplication on GPU tunable #5752
  • Remove an overstrict assert in MakeAllreduce (fixes #5686) #5785
  • CoreML codegen #5634
  • update vulkan build rule #5777
  • Fix some typos in git-clang-format.sh #5786
  • Edit onnx parser to infer values in post order #5755
  • Support symbolic inputs of Fill #5762
  • support aten::type_as in the pytorch frontend #5787
  • Temporary disable fp16 type_as test for PyTorch Frontend #5799
  • Add config switch for nn.dense layer type. #5801
  • Pin hand landmark network to version 0.7.4. #5813
  • Siju Samuel -> Committer #5817
  • Improve Pattern Language Docs #5676
  • Error msg update #5818

Contributors Who Reviewed Pull Requests

Note: The format is name (number of activities). Disclaimer: The number of activities does not directly correspond to the community's view of the significance of contributions.

tqchen (99), masahi (43), zhiics (35), tmoreau89 (31), comaniac (19), u99127 (15), junrushao1994 (14), ZihengJiang (13), FrozenGene (13), jroesch (12), anijain2305 (10), siju-samuel (8), kevinthesun (7), liangfu (7), Hzfengsy (7), yzhliu (6), mbaret (6), areusch (6), roastduck (6), icemelon9 (5), mbrookhart (5), MarisaKirisame (3), kazum (3), cchung100m (3), wpan11nv (3), ehsanmok (3), merrymercy (2), nhynes (2), yongwww (2), weberlo (2), maheshambule (2), binarybana (2), tom-gall (2), vinx13 (1), wweic (1), mshawcroft (1), vegaluisjose (1), soiferj (1), lixiaoquan (1), jwfromm (1), ajtulloch (1), inadob (1), kparzysz-quic (1), antinucleon (1), xqdan (1), hcho3 (1), yongfeng-nv (1), manupa-arm (1), TaoLv (1), robo-corg (1)

Contributors Whose Pull Requests were Updated

Note: The format is name (number of activities).

tqchen (46), siju-samuel (21), mbrookhart (16), areusch (11), ANSHUMAN87 (9), tmoreau89 (8), kparzysz-quic (8), kevinthesun (7), roastduck (6), kazum (5), jroesch (5), antinucleon (5), zhiics (4), cchung100m (4), u99127 (4), anijain2305 (3), vinx13 (3), comaniac (3), lixiaoquan (3), lhutton1 (3), tobegit3hub (3), icemelon9 (2), FrozenGene (2), junrushao1994 (2), mbaret (2), wpan11nv (2), maheshambule (2), dhruvaray (2), littlefish0123 (2), mei-ye (2), masahi (1), hlu1 (1), liangfu (1), jwfromm (1), t-vi (1), shoubhik (1), cbalint13 (1), hcho3 (1), trevor-m (1), notoraptor (1), spectrometerHBH (1), windclarion (1), LiangHao151941 (1), michalpiszczek (1), manupa-arm (1), huochaitiantang (1), samwyi (1), Xuxue1 (1), giuseros (1), Menooker (1), lsy643 (1), tom-gall (1), zhanghaohit (1), wsl-inspur (1)
