TVM Monthly - March 2021

As discussed by the TVM PPMC, our goal is to provide a monthly summary of the project so users and developers can get a better understanding of what is happening in the TVM community.

Feedback and suggestions are welcome so that we can further improve these updates.

Community

During March 2021 we welcomed Andrew Reusch (@areusch) as a new committer, and Bohan Hou (@spectrometerHBH) and Siyuan Feng (@Hzfengsy) as new reviewers to the project. Thanks to everyone for their hard work and contributions!

In the last month, this forum received 122k page views and 2.9k user visits.

Pull Requests

Below is a high-level summary of the PRs closed in the last month, grouped by area.

Relay

  • Fix foldconstant involving dropout #7550
  • Modify some passes to not stack overflow on many lets. #7558
  • BiasAddRel does not check for a negative index being out of bounds #7554
  • Fix Bug Which Cause Negative Left Shift Op #7432
  • add ShapeFunc for tanh #6898
  • Fix relay op strategy for cuda dense int8 #7586
  • add ShapeFunc for one_hot op #7490
  • Simulated Quantize and Dequantize #7613
  • Fix issue when group attribute isn't defined in convtranspose. #7655
  • Simplify consecutive transpose/layout_transform #7656
  • Relax simulated qnn tests to prevent flakiness. #7684
  • Add TopPattern to nn.dropout #7685
  • Factor out first-order AD to a module pass #7677
  • Raise error when user provides an input not in the onnx graph. #7699
  • Add cumprod #7722
  • Add a converter for ATen Nodes #7747
  • ConcretizeLike and EliminateIdentity rewrites for SimplifyExpr #7731
  • Remove pop that interferes with nested loops. #7781
  • Logical Not Shape Function #7820
  • Avoid stack overflow when using PostOrderRewrite #7588
  • SimplifyCastLike/Cast and ConcretizeFullLikeRewrite rewrites for SimplifyExpr #7827
  • A new NMS op variant for ONNX NMS / TF Combined NMS #7796

AutoScheduler

  • Autoscheduler layout rewrite pass to VM #7516
  • Querying and sampling in task extraction #7571
  • Fix incorrectly array context device and hide info at the beginning #7632
  • Add function name in message #7703
  • Add sparse dense end to end model tuning support for x86/arm cpu & Some bug fix #7635
  • Add task.desc for its function name #7794

Fixes

  • fuse constant padding into conv kernels #7515
  • Fix: cuda codegen vectorize cast #7561
  • Add TIR While node #7425
  • Support conds depend on outer loop vars inside tensorize scope #7497
  • Add SPIR-V lowering for While node #7574
  • compile engine dump tir and shape funcs #7552
  • Fix a flaky test #7580
  • Fix: install script regarding get-pip.py during docker build #7579
  • Add support for 20.11 Ethos-N driver stack release #7506
  • Fixes for using Python APIs from Rust. #7085
  • Add segment sum Op to relay and 7 corresponding TF Ops , fix scatter_add dynamic bug #7562
  • Make TRT runtime robust to empty or weird subgraphs #7581
  • Support Bool buffer argument #7591
  • Fix for dynamic batch size conv2d nhwc #7598
  • Fix groups cannot divide output channel count error for deconv when groups>1 #7595
  • Guarantee data input is the first argument #7592
  • Support negative axis for gather #7600
  • Support passing 64 bit scalar #7572
  • Fix autotuning, broken in #7337 #7566
  • Sparse dense tuning support with custom sketch rule #7313
  • BF16 support #7014
  • Fix bug in AutoInlineElemWise and implement AutoInlineBroadcast #7602
  • Add logging to diagnose flaky ci-qemu test #7610
  • Move SimplifyConvPad to a new pass and don't enable it by default #7603
  • Added MaybeAlign to CreateAtomicRMW calls to fix build for LLVM13 #7617
  • Minor update to TIR sort to make it work on VK/SPIR-V #7607
  • Allow cuDNN in non-CUDA non-system dir #7608
  • Fix RelayVM for 32-bit platforms #7605
  • Fix TVM compile without LLVM #7621
  • Prevent host Vulkan SDK blocking cross-compilation #7609
  • Fix pushconstants offset calculation for 32 bit values #7620
  • Introduce Model Library Format export format #7533
  • Improve tensor mismatch ICHECK message #7335
  • Add PreOrderVisit and VisitPrimFuncs #7627
  • Fix CALL16 reloc at 0x290 not against global symbol #7634
  • Add Test Case to Cover Bug Fix by PR#7432 #7601
  • Add HW param for Vulkan tuning #7626
  • Introduce Apple BNNS backend #7299
  • Combine USE_VM_PROFILER and USE_GRAPH_RUNTIME_DEBUG into a single flag USE_PROFILER #7637
  • fix missing qparams in aten::upsample_nearest2d #7646
  • fixed ci-gpu docker environment path typo. #7648
  • fix build break in Android rpc #7664
  • Fixed strided_slice size after nms into TFLite frontend #7659
  • Remove pytest dependency in arm_compute_lib.py #7556
  • Add nvcc support for c source module #7668
  • Fix relay.testing.darknet convert_image #7667
  • Declare int64 capability by default #7681
  • fix:getcwd not work on android platform #7390
  • Default value for graph_runtime Init lookup_linked_param_func #7676
  • allow user supplied work dir #7670
  • Cast operator adapted for MLIR-based convertor #7639
  • Explicitly free TensorRT engine and context in destructor. #7702
  • Workaround for zero size allocation #7691
  • Fix auto scheduler crash when set with consumers is empty #7708
  • Fix graph_tuner ancestor duplication #7704
  • Fix memory leaks in Metal runtime #7714
  • Rev ci-qemu to 0.02 (Introduce onnx python dependency) #7728
  • fix heap corruption from bad allocation #7735
  • Fix missing <cassert> header, caused compilation failure. #7740
  • Better grouped convolution for CPU targets #6137
  • Rename TVMContext to Device #7721
  • Add support for Ethos-N 21.02 driver stack release. #7628
  • Bump ci-cpu and ci-arm container versions #7745
  • detect iter affine map with predicate #7752
  • Bring back the stack size optimization #7756
  • Clean up uTVM demo runtime, add ONNX model test and tutorial #7557
  • Make Autopad static when possible #7755
  • Make more explicit error message during sim lib loading failures. #7761
  • Grammar fix #7622
  • Rename GraphRuntime to GraphExecutor #7653
  • normalize iter affine map expr to PrimExpr #7759
  • Add support for using the VM across the RPC boundary. #7746
  • Fix typo in include/tvm/runtime/crt/crt.h and NEWS.md #7770
  • Fix go bindings #7696
  • fix shift out of type bounds #7733
  • Add support for target object with host field compatible with previous api #7534
  • Make rpc proxy jupyter friendly via PopenWorker. #7757
  • Bugfix for reduction that involves multi-outs with where cond #7692
  • Limit OpenCL built-in vector lanes to 2, 3, 4, 8, 16. #7777
  • Subspace division #7760
  • Profiling interface for VM and Graph Runtime #7624
  • Fix RVM onnx dependency and Zephyr document update #7774
  • Reenable compilation of TVM runtime for Hexagon #7784
  • Support matching tuples, call nodes, and functions with variable numbers of inputs #7754
  • Disable Rust CI #7793
  • Fix empty target and host for autotvm task #7791
  • Try to fix qemu hangs in the CI #7590 #7769
  • Fix compilation errors with clang 11 #7783
  • apps: Fix Zephyr code example for STM32F746 boards #7772
  • @kevinthesun -> PMC #7803
  • Scaffolding ScheduleState data structure #7765
  • Make TVM Rust bindings installable via Cargo. #7503
  • Added missing include file #7808
  • Zephyr: RISCV support for Zephyr QEMU RISCV-32/64 #7804
  • Update Zephyr 2.5 #7786
  • Update nrfjprog on reference virtual machine #7723
  • Support uniform buffer object for passing many scalar arguments #7717
  • Support uniform buffer object for passing many scala… #7821
  • Grammar fix #7824
  • Fix Metal accuracy problem caused by <dtype>3 vectors usage #7830
  • Add support for mps2_an521 board #7813
  • Add quantization support for the vision transform model in GPU #7814
  • Fix PyTorch matmul conversion when given (2-dim, N-dim) input pair #7845
  • Squeeze and reduce ops #7835
  • fix compiling warning in simplify_expr.h #7828
  • Allow microTVM Reference VM to be launched when TVM is a submodule. #7854
  • Fix Zephyr flashing on physical hardware, busted in #7813 #7853
  • Fix typos in comments #7862
  • Support uniform buffer object for passing many scalar arguments (Take 2) #7833
  • Add a new intrinsic count leading zeros for LLVM and SPIR-V #7825

Torch

  • Fix converting torch slice op with dynamic slice length #7549
  • Add linear operator support #7569
  • Support quantized mobilenet v3 from torch 1.8 #7606
  • Remove unnecessary reshapes for batch_matmul #7675
  • Use try_infer_value for clamp min/max #7712
  • Implement avg_pool1d #7694

Pass

  • Profiling TVM compiler passes #7500

TensorIR

  • introduce Block and BlockRealize #7553
  • TVMScript Parser/Printer #7630
  • add TIRTextPrinter support for Block and BlockRealize #7716
  • Fix parser autocompletion mode #7737
  • LowerInitBlock #7806
  • adding support for opaque block #7829

Runtime

  • Move Map into runtime #7570
  • Add device specific timers #7472
  • Unify load params interface #7559
  • Add Object::unique() #7615
  • remove explicit destructor call #7485
  • Switch time evaluator to use device specific timing. #7631
  • Extend Graph Runtime To Support Cuda Graph Launch #7616
  • Cleanup build for libbacktrace #7706
  • Fix GraphRuntime.load_params to allow passing parameters that are not an input #7665
  • Cleanup logging for web runtime. #7750
  • Add libbacktrace for backtraces with line numbers #7153
  • Add clear() function in tvm::Map class #7826

CI

  • Bump ARM image version #7584
  • Update CI Vitis AI PyXIR version #7575
  • Improve docker/build.sh to accept a docker tag parameter. #7707
  • Temp disable rust docs build #7743
  • Rust CI Changes #7773
  • add the --net=host cmd line arg to the docker/bash.it script #7780
  • docker images build script cmd line args optional #7776

Frontend

  • Fix default value for is_ascend in topk #7568

TOPI

  • disable test_shift with i8 datatype #7597
  • Fix CUDA_ERROR_LAUNCH_OUT_OF_RESOURCES with NMS for certain GPUs #7623
  • Sparse Add Op added #7435
  • Mergepath sort with odd-even block sort #7611
  • Improve dynamism for BatchMatmul and Dense #7496
  • Fix 0 valid boxes case for NMS when return_indices=False #7700
  • Use fixed thread block size in unique op for Vulkan #7718

Bugfix

  • Properly return and unflatten outputs from GraphExecutor #7604
  • Correctly resume status #7614
  • Fix usages of some logging-related macros #7748
  • Avoid making a new node when already has span info #7789
  • Fix the race condition issue of packed func. (#7246). #7619
  • Print doubles with precision 17 in SaveJSON and TVM script printer #7846
  • Thread local handle for rocblas #7851

Fix

  • Fix clang12 warnings #7593
  • Fix temporary allocation size in threefry #7709
  • Fix android projects #7764
  • tvm.testing.parametrize_targets documentation for arguments does not match what it is actually using #7778
  • Make HashCombine stable across platforms #7801
  • Fix howto_deploy #7841
  • Fix RPC for the VM #7810

ONNX

  • Use take instead of min in NMS conditions #7633
  • init the NMS output tensor with 1s and then slice them away after the loop #7666
  • Onnx node tests #7720
  • Enable GPU in ONNX importer tests #7438
  • Dynamic Gather #7787
  • Bitshift Operator #7800
  • Initial work to import pre-quantized ONNX Models #7802
  • Support optional outputs for ONNX nodes #7818
  • Make input shape immutable #7844

Docs

  • Getting Started with TVM: Auto Scheduler and matmul #7644
  • Set USE_LLVM OFF when build VTA on pynq board #7657
  • Getting Started with TVM: TVMC Tutorial #7640
  • Getting Started with TVM: AutoTVM and Matrix Multiply #7643
  • Getting Started: Introduction and Installation #7638
  • Getting Started With TVM: Tensor Expressions #7768
  • Getting Started with TVM: Auto Tuning with Python #7767
  • Small improvements to documentation/build setup for first-time builds #7840

TVMC

  • Allow options on --target to contain dots. #7651
  • Refactoring to document the --target regex and simplify test cases #7654
  • Fix to check whether a path passed to --target is strictly a file #7663
  • Allow optional arguments to be passed to importers #7674
  • Python Scripting Init Files #7698
  • Separate model loading from model compilation in TVMC. #7739
  • Allow direct numpy inputs to run_module #7788
  • Runner.py Updates #7779
  • Enable Vitis AI target through TVMC #7577
  • --disable-pass option added to compile mode #7816
  • bugfix: disabled_pass -> disable_pass #7850

Contributors Who Reviewed Pull Requests

Note: The format is name (number of activities). Disclaimer: the number of activities does not directly correspond to the community's view of the significance of contributions.

tqchen (85), comaniac (39), masahi (34), junrushao1994 (25), mbrookhart (20), tkonolige (19), areusch (16), tmoreau89 (13), zhiics (12), anijain2305 (12), merrymercy (10), jroesch (10), jwfromm (10), FrozenGene (10), leandron (9), jcf94 (8), MarisaKirisame (6), trevor-m (6), u99127 (6), icemelon9 (5), vinx13 (5), ANSHUMAN87 (5), mbaret (5), kevinthesun (4), csullivan (4), manupa-arm (4), yzhliu (3), apivovarov (3), antinucleon (3), lhutton1 (3), giuseros (3), altanh (3), electriclilies (3), Hzfengsy (3), mdw-octoml (3), zxybazh (3), ZihengJiang (2), wweic (2), liangfu (2), d-smirnov (2), hzfan (2), echuraev (2), MasterJH5574 (2), Leo-arm (2), siju-samuel (1), Laurawly (1), lixiaoquan (1), vegaluisjose (1), ajtulloch (1), codeislife99 (1), rkimball (1), hypercubestart (1), hogepodge (1), ymwangg (1), tom-gall (1), mehrdadh (1), leeexyz (1), SWu (1), vizero1 (1), adelbertc (1), LuukOddity (1)

Contributors Whose Pull Requests were Updated

Note: The format is name (number of activities)

masahi (16), tkonolige (13), comaniac (11), hogepodge (10), codeislife99 (9), mbrookhart (8), areusch (8), apivovarov (8), trevor-m (7), csullivan (6), tqchen (5), jwfromm (5), leandron (5), junrushao1994 (4), altanh (4), Hzfengsy (4), jroesch (3), slyubomirsky (3), d-smirnov (3), electriclilies (3), monklof (3), merrymercy (2), Laurawly (2), jcf94 (2), rkimball (2), spectrometerHBH (2), tristan-arm (2), CircleSpin (2), jtuyls (2), euntaik (2), Johnson9009 (2), leeexyz (2), echuraev (2), zhuochenKIDD (2), rafzi (2), cgerum (2), fantasyRqg (2), siju-samuel (1), zhiics (1), icemelon9 (1), ZihengJiang (1), yzhliu (1), srkreddy1238 (1), tmoreau89 (1), lixiaoquan (1), vegaluisjose (1), FrozenGene (1), t-vi (1), ANSHUMAN87 (1), PENGUINLIONG (1), mdw-octoml (1), hgt312 (1), huochaitiantang (1), zxybazh (1), michalpiszczek (1), alter-xp (1), jackwish (1), NicolaLancellotti (1), lsy643 (1), Wheest (1), yuchaoli (1), mvermeulen (1), akmaru (1), ambroise-arm (1), AndrewZhaoLuo (1), LeiWang1999 (1), LuukOddity (1), luyaor (1), brianlan (1), intheworld (1)
