TVM Monthly - May 2020

As discussed by the TVM PPMC, our goal is to provide a monthly summary of the project so that users and developers can get a better sense of what is happening in the TVM community.

Feedback and suggestions are welcome so that we can further improve these updates.

Community

The community welcomes new PPMC members Masahiro Masuda (@masahi) and Zhi Chen (@zhiics). Meanwhile, the discuss forum received 112K page views and 3.1K user visits last month.

Features and Improvements Highlights

Over the past month, the community made good progress on performance improvements, operator/backend coverage, and codebase refactoring. Here are a few highlights.

  • A brand-new TVM web runtime based on the WASM standard API (see the blog post).
  • Use the pattern language to merge subgraphs in BYOC (Bring Your Own Codegen) #5656; a short sketch of the pattern language follows this list.
  • Optimize conv2d Winograd algorithm with TensorCore #5485
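
To give a flavor of the new pattern language (introduced in #5231), here is a minimal sketch using the tvm.relay.dataflow_pattern module. It matches a conv2d followed by an elementwise add, the kind of composite subgraph that MergeComposite and the BYOC flow now operate on; the shapes and variable names below are illustrative only.

```python
from tvm import relay
from tvm.relay.dataflow_pattern import is_op, wildcard

# Pattern: nn.conv2d(*, *) whose result feeds an add(*, *).
conv = is_op("nn.conv2d")(wildcard(), wildcard())
pattern = is_op("add")(conv, wildcard())

# A small Relay expression to match against: conv2d + bias add.
data = relay.var("data", shape=(1, 3, 224, 224))
weight = relay.var("weight", shape=(16, 3, 3, 3))
bias = relay.var("bias", shape=(1, 16, 1, 1))
out = relay.add(relay.nn.conv2d(data, weight), bias)

assert pattern.match(out)  # the expression matches the pattern
```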

Pull Requests

Below is a high-level summary of the PRs closed in the last month, grouped by area.

BYOC (Bring Your Own Codegen)

  • Add additional check before re-using the cached match #5552
  • Remove kCompiler attr from external functions #5615
  • Pattern Language MergeComposite #5656
  • Support Tuple Output in C/DNNL Codegen #5701
  • Infer types in MergeComposite #5766

Pattern Language

  • Convert PatternGrouper to do pre-order, non-recursive analysis #5653
  • Remove constants from partitioned functions #5663
  • Add a check for null function attributes #5674
  • Add ConstantPattern #5689
  • Conditionally Embedding Constants in Partitioned Functions #5693
  • Simplify Pattern API Implementations #5703
  • Add ShapePattern and DataTypePattern #5760
  • Remove unnecessary print #5642

AutoTVM

  • Fix the runtime raise error #5586
  • Update XGBoost verbosity option #5649

TIR

  • text format printer considering future parsing use #5483
  • Remove buffer params from pass config. #5652

Relay

  • ReduceLogSumExp Operator support #5453
  • Math ops added #5502
  • enable blocking format in x86 conv2d and fold scale axis #5357
  • Fixed bug in attribute parsing for pool layers. #5582
  • Support symbolic newshape for Reshape #5429
  • Specify additional layouts in convert layout pass #5422
  • Safe check added for Merge Composite Call Node #5562
  • Memory planner (part 1) #5144
  • Improve Shape Func handling for Tuple inputs #5467
  • Relay updated with String #5578
  • Fix the creation of tuple of tuples in PartitionGraph #5616
  • Resize3d, Upsample3d op support #5633
  • Preserve type information in Merge Composite #5640
  • Add operator Correlation #5628
  • Move compiler_begin/end_op to local static objects #5622
  • affine_grid and grid_sample #5657
  • Support symbolic TopK, Ones, Zeros and Full #5459
  • Fix dataflow_pattern.rewrite() hang if Match in IR #5680
  • Fix segfault in pretty print when ObjectRef is null #5681
  • move fallback_device to config #5690
  • Replace build_config with PassContext #5698
  • Clear compile engine after task extraction #5724
  • Sparse to dense operator #5447
  • Support dynamic NMS (Non Maximum Suppression), symbolic begin, end, and strides for strided_slice #4312
  • Conv3d_transpose op support added #5737
  • Fix for recursive let #5757
  • Add operation gather to relay. #5716
  • Fix Calibration Pass to Support Modules with Multiple Functions #5768
  • Add storage_order ignore in pooling layer. #5781

TOPI

  • Optimization of Conv2d Winograd algorithm on TensorCore #5485
  • AutoTVM incorrect measurement #5511
  • Fix bifrost spatial packing conv2d auto tune #5684
  • Fix reshape usage in ARM schedule #5732
  • block sparse dense on cuda #5746
  • Improve CUDA softmax scheduling #5600
  • pass-by-value -> pass-by-const-reference #5783
  • fix sparse dense schedule on cuda #5803
  • fix strategy for sparse dense cuda #5782

Arithmetic

  • Handle likely in IRMutatorWithAnalyzer #5665
  • ExtendedEuclidean merge impl to int_operator #5625
  • fix a min/max simplify bug #5749
  • fix a min/max simplify bug #5761

Runtime

  • Fix workspace #5503
  • Store nullptr PackedFunc as nullptr for better error propagation #5540
  • WebGPU support #5545
  • Hexagon driver for offloading kernels to simulator #5492
  • Setup lint, doc, test #5556
  • TVM WebAssembly JS Runtime #5506
  • Improve PackedFunc robustness #5517
  • Seg fault in WorkspacePool's destructor (#5632) #5636
  • Introduce runtime::Array #5585
  • Resolve constexpr issue in debug mode. #5651
  • Add compile_shared option to linux compile utility fn #5751

RPC

  • Call sync in CopyFromRemote and CopyToRemote #5512
  • Fix the multihop cpu case #5522
  • Improve RPCServer AsyncIO support. #5544

ONNX

  • LpPool Support added #5696
  • Skip ADD inside Gemm op when vector is zero #5697
  • ReduceL1, ReduceL2, ReduceSumSquare, ReduceLogSum ops added #5721
  • MaxRoiPool, Mod & Xor op support added #5729
  • Skip multiply with 1.0f constant for GEMM import #5800

TensorFlow

  • StatefulPartitionedCall/PartitionedCall Ops support added #5617
  • Don't add cast for batch norm when type isn't changing #5731
  • Conv3d Transpose OP added #5775

PyTorch

  • expand bug fix #5576
  • Support max_pool2d_with_indices #5549
  • Add prim::device op #5584
  • ImplicitTensorToNum support added #5603
  • Matmul fix for batch_matmul #5604
  • ReflectionPad2d op #5624
  • Padding op support #5638
  • Minor bug fixes #5683
  • floor_divide support for squeezenet #5702
  • ReplicationPad support added #5708
  • aten::norm support added #5776

MXNet

  • broadcast and logical op support #5461
  • MaxPool3d and AvgPool3d Ops support added #5614
  • Softmin, trunc op support added #5715
  • conv3d and conv3d_transpose added #5814

TFLite

  • Model importer to be compatible with tflite 2.1.0 #5497
  • Nit: Function names made consistent #5515
  • Select op support for tflite frontend #5486
  • GATHER_ND #5508
  • Quantize & Dequantize op #5394

Other Frontends

  • Fully connected op conversion made in sync with TFLite #5510
  • ADD_N operator #5474
  • onnx, mxnet, pytorch mathops added #5561
  • abs, round, reciprocal, sign, softsign, hard_sigmoid ops support #5587
  • Gather nd bug fix for one dim support in tensorflow #5588
  • Add parser support for shape and range #5329
  • Darknet support batch size for yolo #5688
  • Improve Control Flow and TensorArray #5699
  • Improve TF Parser to keep output nodes for saved_model #5794
  • Add parser support for relu6, leaky_relu, relu_n1_to_1, log_softmax #4805

Docs

  • Fix bad restructured text formatting for VTA install guide #5541
  • Improve document in reflection #5593
  • Move the api docs to the api subfolder #5626
  • Fix the QNN TFLite tutorial build #5641
  • Clarify downstream consistency of TVMArgTypeCode #5742

CI

  • Install wasmtime for WebAssembly tests #5494
  • Update Jenkins ci-cpu to bionic #5555
  • Update the ci-gpu to the latest build with the new vulkansdk. #5571
  • Fix clang-format error #5577
  • Enable llvm-11 and llvm-10 in build tests, recover webdocs. #5579
  • Update ci-lint to use the latest image that contains clang-format #5568
  • reintroduce docker stage for wasm tests #5565
  • Allow CI_PYTEST_ADD_OPTIONS to be unbound. #5644
  • Add log check to the sphinx gallery docs #5643
  • Move cpu-only frontend tests to a CPU stage #5807
  • Limit number of threads in all jobs #5815

Refactor

  • Non recursive partitioning #5493
  • Modularize the RPC infra #5484
  • IRModule is updated with String #5523
  • IR is updated with String #5547
  • Streamline ir/op Registry #5609
  • Migrate IRModule ObjectRef to not-null #5654
  • Migrate BuildConfig to PassContext #5668 (a short sketch of the new usage follows this list)
  • std::string -> String Migration in TIR nodes #5596
  • relay.op.Op -> tvm.ir.Op #5705
  • Separate ArgTypeCode from DLDataTypeCode #5730
  • Remove legacy compute_expr.h #5738
  • Call::Halide => ProducerLoad, DSL/TIR decouple. #5743
  • Provide->ProducerStore, Realize->ProducerRealize. #5750
  • Migrate the tvm/tir/expr.h to constructor #5773
  • Migrate tir/stmt.h to use constructor. #5778
  • Migrate all Object construction to constructor. #5784
  • Cleanup unused classes #5789
  • Finish std::string->String updates #5793
  • Add tir prefix to type keys #5802
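
For readers updating their code for the BuildConfig-to-PassContext migration, here is a minimal before/after sketch, assuming the post-refactor Python API; the workload is illustrative only.

```python
import tvm
from tvm import relay
from tvm.relay import testing

# An illustrative workload (any Relay module works).
mod, params = testing.mlp.get_workload(batch_size=1)

# Old style (now removed): with relay.build_config(opt_level=3): ...
# New style: build and pass options flow through a unified PassContext.
with tvm.transform.PassContext(opt_level=3):
    lib = relay.build(mod, target="llvm", params=params)
```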

Bug Fixes

  • Fix bug in rpc ring buffer shrink #5516
  • Fix remote device sync #5538
  • Fix bug in rpc ring buffer shrink (#5516) #5537
  • RPC Server error fix on Pynq FPGA #5607
  • Fix FloorMod Simplifier #5509
  • Fix FloorMod Simplifier #5505
  • Fix Python debugger segfaults with TVM built with LLVM #5685
  • Fix Compilation Error in CRT #5713
  • Fix runtime::String backward compatibility in JSON #5725
  • Allow RPCWrappedFunc to rewrite runtime::String as std::string #5796
  • Fix reshape #5739
  • Make "none" DataType explicit #5491
  • Change "scalar" and "stack" in IDL from "inrout" to "in" #5487
  • Link necessary libraries when building runtime for Android #5496
  • Fixes for wasm32 target #5489
  • Reset target and wait for runtime initialization on connect. #5499
  • bump tophub rocm version #5504
  • Support CallNode inputs in qnn.concatenate #5360
  • Improve commentary for RingBuffer #5518
  • Add unit tests for ONNX PRelu and fix importer to pass them. #5521
  • LRN only supports 4D tensors, remove it from alter_op_layout #5520
  • Fix an issue with ONNX Upsample #5530
  • Cache PrimExpr instead of raw pointers in bound analyzer #5533
  • fix a few bugs with shape inference and types in the ONNX importer #5534
  • FP32 and Quantized Object Detection Model #5479
  • Add Onnx Pad v11 #5539
  • Changes to cpp_rpc to make it work on Android (+ Hexagon offloading) #5535
  • Fix to reduce RAM usage during model loading #5507
  • Fix MakeLoopNest for warp memory #5382
  • Add first stage of updating and rewriting Rust bindings. #5526
  • Load platform specific lib for tvmdsoop instead of the hard-coded tvm_dso_op.so #5542
  • Add tests for running micro on native arm hardware #5546
  • Apparently, ONNX Conv with no 'pads' defaults to zero padding #5548
  • clang-format the h,cc,m files. #5557
  • Fix conv2d alter op for arm cpu #5532
  • Fix topi test (/topi/tests/python/test_topi_conv2d_nhwc_winograd.py) for non tensorcore CI. #5563
  • Add clang-format and nodejs to ci-lint #5567
  • Enable clang-format. #5572
  • Allow ubuntu_install_darknet.sh to work in both 18.04 and 16.04 #5574
  • Add a quantized conv2d unit test for the tflite front-end #5558
  • Fix JSON graph dumping. #5591
  • Warp level reduction support for CUDA #5498
  • One more fix for concurrency count #5589
  • Improve robustness of the docs build #5583
  • Phase out WebGL #5570
  • Fix vulkansdk in the ci-gpu and upgrade to 1.2.135 #5566
  • Update ci-cpu to bionic #5554
  • Overestimate binary size for microTVM compiled binaries. #5590
  • Fix bug and re-enable RPC execution test #5436
  • Add ostream formatters for TargetPtr/TargetVal. #5592
  • Pattern Language, Matcher, Rewriter, and Function Partitioner #5231
  • Fix cross thread reduction #5551
  • Fix TVMArray layout on device #5599
  • Add debug mode to tempdir() #5581
  • Represent alignment information in LLVM IR #5598
  • Fix codegen for warp shuffle intrinsics #5606
  • Fix Topological Order calculation for DFPattern Language #5612
  • Fix a typo. #5611
  • Global MaxPool3d and AvgPool3d support #5098
  • Fix build error of iOS RPC #5621
  • Fix three typos #5620
  • isn't a CallNode sometimes #5623
  • Introduce config to PassContext. #5631
  • CMAKE fix #5630
  • Fix typo in test script #5635
  • Label Pattern Partitions #5627
  • Extend AttrPattern to support CallNode and FunctionNode attributes #5637
  • TFLite QNN Tutorial #5595
  • Increase bss section size. #5660
  • Upgrade XGBoost to latest #5658
  • Add buffer name when creating tensor bindings #5670
  • µtvm debug improvements #5648
  • enable amd_apu device on vulkan target #5659
  • Support TupleWrapper as direct ancestor of control flow ops #5639
  • add tvm.micro pydoc to sphinx #5661
  • Add a regression testcase for #5674 #5677
  • Fix C++ RPC build problem on Linux #5671
  • Misc doc fix #5672
  • Add a check Callback to the Pattern Partitioner #5646
  • Call previous excepthook in tvm_excepthook. #5675
  • Fix the shift column for scale_shift_nchw and scale_shift_nhwc in C topi #5679
  • Support more dtypes for TVMDSOOp #5694
  • fix typo: anchor windoes should be anchor windows #5706
  • Remove deprecated opengl files #5711
  • Remove opengl runtime and cmake #5712
  • In memory_plan, check if value is not None, instead of just checking value as boolean. #5700
  • Rename tvm_dso_op to libtvm_dso_op #5714
  • Unify StrMapNode and MapNode #5687
  • Introduce runtime::String::CanConvertFrom #5718
  • Restore the StrMap behavior in JSON/SHash/SEqual #5719
  • Fix generating types like float44 and float88 #5722
  • Avoid downloading when TOPHUB_LOCATION is NONE #5720
  • codegen llvm: move nvptx-specific intrinsic handling into codegen_nvptx #5726
  • ROCm warp shuffles and reductions #5727
  • Fix new / delete mismatches in Relay VM #5735
  • Fix flaky test_topi_pooling.py:test_adaptive_pool #5736
  • Fix the values for test_fmod since it fails way too often otherwise #5723
  • fix small bug about dense_grad #5695
  • Fix sequential cpp test #5745
  • Add Scatter to Topi/Relay/ONNX via hybrid script #5619
  • Clean WASM environment before build #5759
  • Second stage of Rust Refactor #5527
  • Fix gelu in PyTorch frontend, tighten numerical checks #5763
  • Make batch matrix multiplication on GPU tunable #5752
  • Remove an overstrict assert in MakeAllreduce (fixes #5686) #5785
  • CoreML codegen #5634
  • update vulkan build rule #5777
  • Fix some typos in git-clang-format.sh #5786
  • Edit onnx parser to infer values in post order #5755
  • Support symbolic inputs of Fill #5762
  • support aten::type_as in the pytorch frontend #5787
  • Temporary disable fp16 type_as test for PyTorch Frontend #5799
  • Add config switch for nn.dense layer type. #5801
  • Pin hand landmark network to version 0.7.4. #5813
  • Siju Samuel -> Committer #5817
  • Improve Pattern Language Docs #5676
  • Error msg update #5818

Contributors Who Reviewed Pull Requests

Note: The format is name (number of activities). Disclaimer: The number of activities does not directly correspond to the community's view of the significance of contributions.

tqchen (99), masahi (43), zhiics (35), tmoreau89 (31), comaniac (19), u99127 (15), junrushao1994 (14), ZihengJiang (13), FrozenGene (13), jroesch (12), anijain2305 (10), siju-samuel (8), kevinthesun (7), liangfu (7), Hzfengsy (7), yzhliu (6), mbaret (6), areusch (6), roastduck (6), icemelon9 (5), mbrookhart (5), MarisaKirisame (3), kazum (3), cchung100m (3), wpan11nv (3), ehsanmok (3), merrymercy (2), nhynes (2), yongwww (2), weberlo (2), maheshambule (2), binarybana (2), tom-gall (2), vinx13 (1), wweic (1), mshawcroft (1), vegaluisjose (1), soiferj (1), lixiaoquan (1), jwfromm (1), ajtulloch (1), inadob (1), kparzysz-quic (1), antinucleon (1), xqdan (1), hcho3 (1), yongfeng-nv (1), manupa-arm (1), TaoLv (1), robo-corg (1)

Contributors Whose Pull Requests were Updated

Note: The format is name (number of activities).

tqchen (46), siju-samuel (21), mbrookhart (16), areusch (11), ANSHUMAN87 (9), tmoreau89 (8), kparzysz-quic (8), kevinthesun (7), roastduck (6), kazum (5), jroesch (5), antinucleon (5), zhiics (4), cchung100m (4), u99127 (4), anijain2305 (3), vinx13 (3), comaniac (3), lixiaoquan (3), lhutton1 (3), tobegit3hub (3), icemelon9 (2), FrozenGene (2), junrushao1994 (2), mbaret (2), wpan11nv (2), maheshambule (2), dhruvaray (2), littlefish0123 (2), mei-ye (2), masahi (1), hlu1 (1), liangfu (1), jwfromm (1), t-vi (1), shoubhik (1), cbalint13 (1), hcho3 (1), trevor-m (1), notoraptor (1), spectrometerHBH (1), windclarion (1), LiangHao151941 (1), michalpiszczek (1), manupa-arm (1), huochaitiantang (1), samwyi (1), Xuxue1 (1), giuseros (1), Menooker (1), lsy643 (1), tom-gall (1), zhanghaohit (1), wsl-inspur (1)
