TVM Monthly - March 2021

As discussed by the TVM PPMC, our goal is to provide a monthly summary of the project so users and developers can get a better understanding of what is happening in the TVM community.

Feedback and suggestions are welcome so that we can further improve these updates.

Community

During March 2021 we welcomed Andrew Reusch (@areusch) as a new committer, and Bohan Hou (@spectrometerHBH) and Siyuan Feng (@Hzfengsy) as new reviewers to the project. Thanks to everyone for their hard work and contributions!

In the last month, this forum received 122k page views and 2.9k user visits.

Pull Requests

Below is a high-level summary of the PRs closed in the last month, grouped by area.

Relay

  • Fix foldconstant involving dropout #7550
  • Modify some passes to not stack overflow on many lets. #7558
  • BiasAddRel does not check for a negative index being out of bounds #7554
  • Fix Bug Which Cause Negative Left Shift Op #7432
  • add ShapeFunc for tanh #6898
  • Fix relay op strategy for cuda dense int8 #7586
  • add ShapeFunc for one_hot op #7490
  • Simulated Quantize and Dequantize #7613
  • Fix issue when group attribute isn't defined in convtranspose. #7655
  • Simplify consecutive transpose/layout_transform #7656
  • Relax simulated qnn tests to prevent flakiness. #7684
  • Add TopPattern to nn.dropout #7685
  • Factor out first-order AD to a module pass #7677
  • Raise error when user provides an input not in the onnx graph. #7699
  • Add cumprod #7722
  • Add a converter for ATen Nodes #7747
  • ConcretizeLike and EliminateIdentity rewrites for SimplifyExpr #7731
  • Remove pop that interferes with nested loops. #7781
  • Logical Not Shape Function #7820
  • Avoid stack overflow when using PostOrderRewrite #7588
  • SimplifyCastLike/Cast and ConcretizeFullLikeRewrite rewrites for SimplifyExpr #7827
  • A new NMS op variant for ONNX NMS / TF Combined NMS #7796

AutoScheduler

  • Autoscheduler layout rewrite pass to VM #7516
  • Querying and sampling in task extraction #7571
  • Fix incorrectly array context device and hide info at the beginning #7632
  • Add function name in message #7703
  • Add sparse dense end to end model tuning support for x86/arm cpu & Some bug fix #7635
  • Add task.desc for its function name #7794

Fixes

  • fuse constant padding into conv kernels #7515
  • Fix: cuda codegen vectorize cast #7561
  • Add TIR While node #7425
  • Support conds depend on outer loop vars inside tensorize scope #7497
  • Add SPIR-V lowering for While node #7574
  • compile engine dump tir and shape funcs #7552
  • Fix a flaky test #7580
  • Fix: install script regarding get-pip.py during docker build #7579
  • Add support for 20.11 Ethos-N driver stack release #7506
  • Fixes for using Python APIs from Rust. #7085
  • Add segment sum Op to relay and 7 corresponding TF Ops , fix scatter_add dynamic bug #7562
  • Make TRT runtime robust to empty or weird subgraphs #7581
  • Support Bool buffer argument #7591
  • Fix for dynamic batch size conv2d nhwc #7598
  • Fix groups cannot divide output channel count error for deconv when groups>1 #7595
  • Guarantee data input is the first argument #7592
  • Support negative axis for gather #7600
  • Support passing 64 bit scalar #7572
  • Fix autotuning, broken in #7337 #7566
  • Sparse dense tuning support with custom sketch rule #7313
  • BF16 support #7014
  • Fix bug in AutoInlineElemWise and implement AutoInlineBroadcast #7602
  • Add logging to diagnose flaky ci-qemu test #7610
  • Move SimplifyConvPad to a new pass and don't enable it by default #7603
  • Added MaybeAlign to CreateAtomicRMW calls to fix build for LLVM13 #7617
  • Minor update to TIR sort to make it work on VK/SPIR-V #7607
  • Allow cuDNN in non-CUDA non-system dir #7608
  • Fix RelayVM for 32-bit platforms #7605
  • Fix TVM compile without LLVM #7621
  • Prevent host Vulkan SDK blocking cross-compilation #7609
  • Fix pushconstants offset calculation for 32 bit values #7620
  • Introduce Model Library Format export format #7533
  • Improve tensor mismatch ICHECK message #7335
  • Add PreOrderVisit and VisitPrimFuncs #7627
  • Fix CALL16 reloc at 0x290 not against global symbol #7634
  • Add Test Case to Cover Bug Fix by PR#7432 #7601
  • Add HW param for Vulkan tuning #7626
  • Introduce Apple BNNS backend #7299
  • Combine USE_VM_PROFILER and USE_GRAPH_RUNTIME_DEBUG into a single flag USE_PROFILER #7637
  • fix missing qparams in aten::upsample_nearest2d #7646
  • fixed ci-gpu docker environment path typo. #7648
  • fix build break in Android rpc #7664
  • Fixed strided_slice size after nms into TFLite frontend #7659
  • Remove pytest dependency in arm_compute_lib.py #7556
  • Add nvcc support for c source module #7668
  • Fix relay.testing.darknet convert_image #7667
  • Declare int64 capability by default #7681
  • fix:getcwd not work on android platform #7390
  • Default value for graph_runtime Init lookup_linked_param_func #7676
  • allow user supplied work dir #7670
  • Cast operator adapted for MLIR-based convertor #7639
  • Explicitly free TensorRT engine and context in destructor. #7702
  • Workaround for zero size allocation #7691
  • Fix auto scheduler crash when set with consumers is empty #7708
  • Fix graph_tuner ancestor duplication #7704
  • Fix memory leaks in Metal runtime #7714
  • Rev ci-qemu to 0.02 (Introduce onnx python dependency) #7728
  • fix heap corruption from bad allocation #7735
  • Fix missing <cassert> header, caused compilation failure. #7740
  • Better grouped convolution for CPU targets #6137
  • Rename TVMContext to Device #7721
  • Add support for Ethos-N 21.02 driver stack release. #7628
  • Bump ci-cpu and ci-arm container versions #7745
  • detect iter affine map with predicate #7752
  • Bring back the stack size optimization #7756
  • Clean up uTVM demo runtime, add ONNX model test and tutorial #7557
  • Make Autopad static when possible #7755
  • Make more explicit error message during sim lib loading failures. #7761
  • Grammar fix #7622
  • Rename GraphRuntime to GraphExecutor #7653
  • normalize iter affine map expr to PrimExpr #7759
  • Add support for using the VM across the RPC boundary. #7746
  • Fix typo in include/tvm/runtime/crt/crt.h and NEWS.md #7770
  • Fix go bindings #7696
  • fix shift out of type bounds #7733
  • Add support for target object with host field compatible with previous api #7534
  • Make rpc proxy jupyter friendly via PopenWorker. #7757
  • Bugfix for reduction that involves multi-outs with where cond #7692
  • Limit OpenCL built-in vector lanes to 2, 3, 4, 8, 16. #7777
  • Subspace division #7760
  • Profiling interface for VM and Graph Runtime #7624
  • Fix RVM onnx dependency and Zephyr document update #7774
  • Reenable compilation of TVM runtime for Hexagon #7784
  • Support matching tuples, call nodes, and functions with variable numbers of inputs #7754
  • Disable Rust CI #7793
  • Fix empty target and host for autotvm task #7791
  • Try to fix qemu hangs in the CI #7590 #7769
  • Fix compilation errors with clang 11 #7783
  • apps: Fix Zephyr code example for STM32F746 boards #7772
  • @kevinthesun -> PMC #7803
  • Scaffolding ScheduleState data structure #7765
  • Make TVM Rust bindings installable via Cargo. #7503
  • Added missing include file #7808
  • Zephyr: RISCV support for Zephyr QEMU RISCV-32/64 #7804
  • Update Zephyr 2.5 #7786
  • Update nrfjprog on reference virtual machine #7723
  • Support uniform buffer object for passing many scalar arguments #7717
  • Support uniform buffer object for passing many scala… #7821
  • Grammar fix #7824
  • Fix Metal accuracy problem caused by <dtype>3 vectors usage #7830
  • Add support for mps2_an521 board #7813
  • Add quantization support for the vision transform model in GPU #7814
  • Fix PyTorch matmul conversion when given (2-dim, N-dim) input pair #7845
  • Squeeze and reduce ops #7835
  • fix compiling warning in simplify_expr.h #7828
  • Allow microTVM Reference VM to be launched when TVM is a submodule. #7854
  • Fix Zephyr flashing on physical hardware, busted in #7813 #7853
  • Fix typos in comments #7862
  • Support uniform buffer object for passing many scalar arguments (Take 2) #7833
  • Add a new intrinsic count leading zeros for LLVM and SPIR-V #7825

Torch

  • Fix converting torch slice op with dynamic slice length #7549
  • Add linear operator support #7569
  • Support quantized mobilenet v3 from torch 1.8 #7606
  • Remove unnecessary reshapes for batch_matmul #7675
  • Use try_infer_value for clamp min/max #7712
  • Implement avg_pool1d #7694

Pass

  • Profiling TVM compiler passes #7500

TensorIR

  • introduce Block and BlockRealize #7553
  • TVMScript Parser/Printer #7630
  • add TIRTextPrinter support for Block and BlockRealize #7716
  • Fix parser autocompletion mode #7737
  • LowerInitBlock #7806
  • adding support for opaque block #7829

Runtime

  • Move Map into runtime #7570
  • Add device specific timers #7472
  • Unify load params interface #7559
  • Add Object::unique() #7615
  • remove explicit destructor call #7485
  • Switch time evaluator to use device specific timing. #7631
  • Extend Graph Runtime To Support Cuda Graph Launch #7616
  • Cleanup build for libbacktrace #7706
  • Fix GraphRuntime.load_params to allow passing parameters that are not an input #7665
  • Cleanup logging for web runtime. #7750
  • Add libbacktrace for backtraces with line numbers #7153
  • Add clear() function in tvm::Map class #7826

CI

  • Bump ARM image version #7584
  • Update CI Vitis AI PyXIR version #7575
  • Improve docker/build.sh to accept a docker tag parameter. #7707
  • Temp disable rust docs build #7743
  • Rust CI Changes #7773
  • add the --net=host cmd line arg to the docker/bash.it script #7780
  • docker images build script cmd line args optional #7776

Frontend

  • Fix default value for is_ascend in topk #7568

TOPI

  • disable test_shift with i8 datatype #7597
  • Fix CUDA_ERROR_LAUNCH_OUT_OF_RESOURCES with NMS for certain GPUs #7623
  • Sparse Add Op added #7435
  • Mergepath sort with odd-even block sort #7611
  • Improve dynamism for BatchMatmul and Dense #7496
  • Fix 0 valid boxes case for NMS when return_indices=False #7700
  • Use fixed thread block size in unique op for Vulkan #7718

Bugfix

  • Properly return and unflatten outputs from GraphExecutor #7604
  • Correctly resume status #7614
  • Fix usages of some logging-related macros #7748
  • Avoid making a new node when already has span info #7789
  • Fix the race condition issue of packed func. (#7246). #7619
  • Print doubles with precision 17 in SaveJSON and TVM script printer #7846
  • Thread local handle for rocblas #7851

Fix

  • Fix clang12 warnings #7593
  • Fix temporary allocation size in threefry #7709
  • Fix android projects #7764
  • tvm.testing.parametrize_targets documentation for arguments does not match what it is actually using #7778
  • Make HashCombine stable across platforms #7801
  • Fix howto_deploy #7841
  • Fix RPC for the VM #7810

ONNX

  • Use take instead of min in NMS conditions #7633
  • init the NMS output tensor with 1s and then slice them away after the loop #7666
  • Onnx node tests #7720
  • Enable GPU in ONNX importer tests #7438
  • Dynamic Gather #7787
  • Bitshift Operator #7800
  • Initial work to import pre-quantized ONNX Models #7802
  • Support optional outputs for ONNX nodes #7818
  • Make input shape immutable #7844

Docs

  • Getting Started with TVM: Auto Scheduler and matmul #7644
  • Set USE_LLVM OFF when build VTA on pynq board #7657
  • Getting Started with TVM: TVMC Tutorial #7640
  • Getting Started with TVM: AutoTVM and Matrix Multiply #7643
  • Getting Started: Introduction and Installation #7638
  • Getting Started With TVM: Tensor Expressions #7768
  • Getting Started with TVM: Auto Tuning with Python #7767
  • Small improvements to documentation/build setup for first-time builds #7840

TVMC

  • Allow options on --target to contain dots. #7651
  • Refactoring to document the --target regex and simplify test cases #7654
  • Fix to check whether a path passed to --target is strictly a file #7663
  • Allow optional arguments to be passed to importers #7674
  • Python Scripting Init Files #7698
  • Separate model loading from model compilation in TVMC. #7739
  • Allow direct numpy inputs to run_module #7788
  • Runner.py Updates #7779
  • Enable Vitis AI target through TVMC #7577
  • --disable-pass option added to compile mode #7816
  • bugfix: disabled_pass -> disable_pass #7850

Contributors Who Reviewed Pull Requests

Note: The format is name (number of activities). Disclaimer: the number of activities does not directly correspond to the community's view of the significance of contributions.

tqchen (85), comaniac (39), masahi (34), junrushao1994 (25), mbrookhart (20), tkonolige (19), areusch (16), tmoreau89 (13), zhiics (12), anijain2305 (12), merrymercy (10), jroesch (10), jwfromm (10), FrozenGene (10), leandron (9), jcf94 (8), MarisaKirisame (6), trevor-m (6), u99127 (6), icemelon9 (5), vinx13 (5), ANSHUMAN87 (5), mbaret (5), kevinthesun (4), csullivan (4), manupa-arm (4), yzhliu (3), apivovarov (3), antinucleon (3), lhutton1 (3), giuseros (3), altanh (3), electriclilies (3), Hzfengsy (3), mdw-octoml (3), zxybazh (3), ZihengJiang (2), wweic (2), liangfu (2), d-smirnov (2), hzfan (2), echuraev (2), MasterJH5574 (2), Leo-arm (2), siju-samuel (1), Laurawly (1), lixiaoquan (1), vegaluisjose (1), ajtulloch (1), codeislife99 (1), rkimball (1), hypercubestart (1), hogepodge (1), ymwangg (1), tom-gall (1), mehrdadh (1), leeexyz (1), SWu (1), vizero1 (1), adelbertc (1), LuukOddity (1)

Contributors Whose Pull Requests were Updated

Note: The format is name (number of activities)

masahi (16), tkonolige (13), comaniac (11), hogepodge (10), codeislife99 (9), mbrookhart (8), areusch (8), apivovarov (8), trevor-m (7), csullivan (6), tqchen (5), jwfromm (5), leandron (5), junrushao1994 (4), altanh (4), Hzfengsy (4), jroesch (3), slyubomirsky (3), d-smirnov (3), electriclilies (3), monklof (3), merrymercy (2), Laurawly (2), jcf94 (2), rkimball (2), spectrometerHBH (2), tristan-arm (2), CircleSpin (2), jtuyls (2), euntaik (2), Johnson9009 (2), leeexyz (2), echuraev (2), zhuochenKIDD (2), rafzi (2), cgerum (2), fantasyRqg (2), siju-samuel (1), zhiics (1), icemelon9 (1), ZihengJiang (1), yzhliu (1), srkreddy1238 (1), tmoreau89 (1), lixiaoquan (1), vegaluisjose (1), FrozenGene (1), t-vi (1), ANSHUMAN87 (1), PENGUINLIONG (1), mdw-octoml (1), hgt312 (1), huochaitiantang (1), zxybazh (1), michalpiszczek (1), alter-xp (1), jackwish (1), NicolaLancellotti (1), lsy643 (1), Wheest (1), yuchaoli (1), mvermeulen (1), akmaru (1), ambroise-arm (1), AndrewZhaoLuo (1), LeiWang1999 (1), LuukOddity (1), luyaor (1), brianlan (1), intheworld (1)
