TVM Monthly - September 2021

As discussed by the TVM PPMC, our goal is to provide a monthly summary of the project so users and developers can get a better understanding of the goings-on in the TVM community.

Feedback and suggestions are welcome so that we can further improve these updates.

Community

During August of 2021 we welcomed many new contributors to the project. Importantly, we welcomed @Hzfengsy and @giuseros as new committers, and @AndrewZhaoLuo and @jtuyls as new reviewers. Thanks to everyone for the hard work and contributions!

This forum received 107k pageviews and 2.5k user visits in the last month.

Pull Requests

Below is a high-level summary of the PRs closed in the last month, grouped by area.

Fixes

  • Fix printing ForNode annotations #8891
  • CMSIS-NN graph partitioner for softmax #8653
  • Re-enabled automatic --tty flag when running bash. #8861
  • Trivial uTVM -> microTVM "spelling" fix to align with branding. #8905
  • Set default value of p in LpPool as 2 #8866
  • Sanitize names of input tensors in interface header #8720
  • Move to new style issue template system #8898
  • Remove LoweredModule #8886
  • Improve adaptive and global pool schedule #8936
  • Unify dense op input layout #8921
  • Improve local_response_norm schedule #8946
  • Fix printing of schedule operations #8949
  • Add manupa-arm to CODEOWNERS #8911
  • Fix incorrect AOT Memory Planning #8926
  • CMSIS-NN code generator for softmax #8833
  • Use popenpool in local_executor #8851
  • Arm(R) Ethos™-U NPU Relay passes and Conv2D op #8795
  • Profiling over RPC #8885
  • Add sse4/avx2 support for fast x86 int8 (vpmaddubsw/vpmaddwd/vpaddd) #8897
  • Sort PrimFuncs when creating LLVM module #8958
  • Set tvm.micro.project_api as a Python Module #8963
  • Document rewrite_once option #8900
  • Clean up IRModule attrs and LowerTEPass #8914
  • Update TVM VTA (VTA Chisel Wide memory interface) #8973
  • Move external codegen test helpers into utils #9008
  • Arm(R) Ethos™-U NPU TIR compiler with conv2d support #8806
  • Add an option to make imported models compatible with the Relay text parser #9015
  • Disable cuda int8 schedule for non-cuda gpu target #9014
  • Pipeline Executor Initial patch. #8702
  • Add standalone_crt/ to be part of the wheel package, when available. #9005
  • Implementation of relay_to_tir target hook #8423
  • Fix dense tensorcore legalize type error when units is specified #9030
  • Fix line break in setup.py #9029
  • Add bindings for StaticMemoryPlan and DensePackAttrs #9034
  • Support match pvar with dtype constraint #9016
  • Moved TIR generation from Python to C++ for CMSIS-NN #8951
  • Add back missing __init__.py to unbreak CI. #9052
  • Shape Func of Split Op Error #8887
  • DataType Bug In SplitRel #8899
  • Add parallel_for_dynamic with dynamic schedules #9056
  • Migrate flake8 from workflow to lint script #9062
  • Add flake8 to docker/lint.sh #9055
  • Speed Up get DataType #9072
  • Add tracker support into ios-rpc application #7876
  • Always use VM compiler for task extraction #9069
  • Enable python debug runtime for exported network libraries #8793
  • Ensure AOT passes all intermediary storages to function calls #9064
  • Swap block x and z dimension for conv2d NHWC schedule #9087
  • BUG #8013: Remove register_alter_op_layout example from dev/use_pass_infra.py #9076
  • Add PaddlePaddle to python/gen_requirements.py #9098
  • Support colons in input-shapes tvmc command line arguments #9080
  • Add extern "C" to C Interface API header #9094
  • Fix hang while tune model through rpc #9032
  • Add random to ios_rpc #8935
  • Move the allocates of AoT codegen to be TVMBAWs #9065
  • Skip numpy.ascontiguousarray if C_CONTIGUOUS == True #9073
  • Fix padding calculation in Transpose Conv #9089
  • Fix missing dtype of tir.Shuffle #9131
  • wrong annotation of tir generic operators #9119
  • Corrected warning message about USE_GRAPH_EXECUTOR_DEBUG #9006
  • Arm(R) Ethos™-U NPU TIR to CS for Conv2D #8811
  • Frontend: add onnx GlobalLpPool op #8845
  • Ensure google-mock is installed and setup #9107
  • Swap out analyzer when outlining #9117
  • Support returning quantized weights and bias for BYOC use cases #9135
  • Add nn.global_avgpool to FQ2i #9137
  • Remove redundant visit statement in CodeGen. #9144
  • Arm(R) Ethos™-U NPU codegen integration #8849
  • Fix flaky NMS test by making sure scores are unique #9140
  • Update find cublas so it searches the default path if needed. #9149
  • Fix typo #9156
  • Arm(R) Ethos™-U NPU codegen integration with tvmc #8854
  • Add vectorization to cuda conv2d_nhwc schedule #8636
  • Introduce centralised name transformation functions #9088
  • Issue8717 x86 dws conv2d schedule #9092
  • add PaddlePaddle tutorial #9124
  • Move llvm import test away from minimum test #9171
  • BUG: Fix core-dump in crt graph_executor.c #9155
  • Enhance printer #8934
  • Script namespace changes #9115
  • Fix error reporting on Store #8895
  • Add while node support in TVMScript #9004
  • Add comments with origins of various runtime/backend types, NFC #9177
  • Fix flaky LocalRunner test due to restrictive timeout #9181
  • Support quantised RSQRT operator in TFLite #9165
  • Support "all" and "any" op #9185
  • Wait for RPCServer to be established #9150
  • A follow up PR for 5/6 of Arm(R) Ethos™-U NPU codegen #9147
  • Initial operator support for Mul #9163
  • Add USE_ETHOSU for the config.cmake #9162
  • Correct fast_tanh description #9193
  • Fix custom_address serialization in c++ tracker client. #9192
  • Add cleanup for localrunner #9191
  • Update ci-cpu to v0.78 #9199
  • Initial operator support for Add #9167
  • Update Vitis AI integration to 1.4 release #8815
  • Fix end to end benchmark with rpc devices #9175
  • Reset op attributes before registering them #9202
  • Add TVMC Frontend for PaddlePaddle #9083
  • Contributing the STM32 port #7742
  • Update ci_i386 to v0.74 #9211
  • Add dilation to MaxPool2DAttrs Rust bindings #9215
  • Adding annotations for tir.allocate #9168
  • Update virtual_machine.rst #9222
  • Add cache flush for arm #9170
  • Migrate C Interface API Generation to C++ #9106
  • Run full build when no files were changed over main #9221
  • BUG #9216: Don't disable FuseOps pass since required by GraphExecutor #9227
  • Use find_package to locate GTest files #9208
  • Documentation Refactor #9203
  • Initial Implementation of TIRToRuntime Target hook #9190
  • Add importer for ONNX QLinearMatMul op #8952
  • Arm(R) Ethos™-U NPU Depthwise2d operator support #9209
  • Strided slice layout transform fix (disallow NCHW4c -> NCHW etc properly) #9245
  • Fix Server connecting to RPC Tracker through a Proxy #9210
  • Add stage to ICHECK error message #9249
  • Enforce bias when pattern matching conv2d #9244
  • Address review comments on Arm(R) Ethos™-U PR 3/6 #9159
  • Add printing of SplitExprNode and SumExprNode #9262
  • Propagate tvm target through graph tuning setup #9248
  • Bumping up CMSIS-NN version to be in sync with TFLu #9247
  • Hexagon conv2d full output slice #9198
  • Pipeline Executor Second patch, configuration load and executor export/import. #9108
  • Fix USMP parallel to serial loop transform test #9254
  • Skip onnx test cases if no onnx #9272
  • Update TVM_LOG_DEBUG for IR tracing. #9278
  • Move build module transformations and utilities to C++ #9103
  • Sort unit tests before running. #9188
  • Fix VTA vision detection tutorial 'sphinx' style error. #9279
  • Ensure MyPy type checks pass #9284
  • Add option to overwrite OperatorConverter class in relay.frontend.from_tflite #9256
  • Reset sphinx-gallery version to 0.4.0 #9280
  • Do not aggregate frames with different devices #9290
  • Sort columns in table and csv output #9300
  • Light refactoring of TE -> TIR paths. #9263
  • Rename build module helper func #9297
  • Fix build issues on the latest XCode and iOS #9298
  • fix a bug in the comment of function :fixed_point_multiply #9304
  • llvm 14 and above move TargetRegistry.h into MC #9305
  • Test run triage #9308
  • Add significant VM instructions to profiling report #9292
  • Fix direct and broken links #9314
  • Support return_sequences in LSTM #9303
  • fix missing span arg #9318
  • Arm(R) Cortex(R)-M55 CPU and Arm(R) Ethos™-U55 NPU Demo App #8922
  • Adjust Hexagon conv2d schedule to split channel out (k) and move to outer loop #9287
  • Add conv1d support in BYOC TRT by converting conv1d to conv2d #9324

microTVM

  • Add Arduino RVM #8748
  • Temporarily remove mps2_an521 from CI #8927
  • Remove Arduino aot code #8869
  • Zephyr: Fix option name in PROJECT_OPTIONS #8884
  • Add platform to build directory name #8945
  • Fix Arduino Versions in RVM Build #8938
  • Add method to query template info without creating a project #8950
  • Add support for AutoTVM #8715
  • Refactor platform used as board name in microTVM #8940
  • Zephyr: implement 'west_cmd' server option #8941
  • Zephyr: Set 'choices' for ProjectOption 'verbose' #8968
  • Hot Fix Bad Merge #8980
  • Fix board names #8998
  • Fix autotvm bug and tests #9003
  • Follow up fixes to #9003 #9018
  • Add 'config_main_stack_size' option to API server #9026
  • Add MIMXRT1050 board support #9068
  • Add wrapper for creating project using a MLF #9090
  • Update support for ARMv7m intrinsic #8990
  • Always destroy the VM if all tests pass #8739

ONNX

  • Support Negative Log Loss #8872
  • Add index_put operator #8894
  • Turn off flaky nllloss test for now #8919
  • Add OpSet 13 implementation for Hardmax #8924
  • fix GRU modification and reduce tolerance for RNN tests #8923
  • Add support for QLinearConcat contrib op #8907
  • Pow support for other types #8933
  • support slicing with out of order axes #8959
  • Remove unnecessary converters for greater and lesser #8967
  • Support depth_to_space op for FQ2I #8966
  • Turn off one more flaky test #8972
  • Add Adagrad #9001
  • Add Adam #9002
  • Add Einsum converter #8985
  • Add momentum #9000
  • Fix NLL Loss tests #8971
  • enable the onnx tests after PR #8274 merged #9019
  • QLinearAveragePool and QLinearGlobalAveragePool contrib op #9017
  • Add SoftmaxCrossEntropyLoss #8906
  • QLinearSigmoid contrib op and Bug Fix for DequantizeLinear #9028
  • LessOrEqual and GreaterOrEqual ops #9066
  • Increase tolerances for reduce tests #9054
  • Add dynamic unsqueeze / expand_dims op #9039
  • Add Compress Support #9067
  • enable more *_expanded tests #9051
  • Dynamic squeeze #9095
  • Fix unsqueeze constant expr #9146
  • support additional nllloss tests #9045
  • QLinearLeakyRelu contrib op #9063
  • Handle removal of onnx.utils.polish_model #9178
  • Resize Opset 13 #9265

TOPI

  • Parametrizing additional topi tests, marking vulkan failures #8904
  • Fix CUDA pooling schedule #8957
  • Fix more pooling schedule #9021
  • Fix compiling batch_matmul and dense when two args are the same tensor #9207
  • Fix direct SIMD conv2d schedule name #9225

TVMC

  • Add ROCm to the TVMC driver #8896
  • Treat invalid FILE arguments #9213
  • Split common tvmc test file into more specific files #9206
  • Compose target options from target registry #9218
  • Support dot inside of TVMC input shape name arguments #9294

Unittests

  • Enable contrib tensorrt/coreml unit tests #8902
  • Mark unit tests as requiring Ethos-N #8873
  • Enable minimum testing on Vulkan target in CI #9093
  • Mark CMSISNN with skipif they are missing libraries #9179
  • Mark Binary Ops CMSIS NN tests as skipped #9200
  • Skip import of tvm.micro if micro-TVM was not enabled #9301

Autoscheduler

  • Propagate global autotvm state to PopenPool workers #8913
  • Fix custom build func in PopenWorker #8939
  • Change the doc to reflect previous code change #8970
  • Fix task scheduler after 8478 #8984
  • Reduce task weight coercion overhead #8995

Hexagon

  • Add trivial conv2d operator to Hexagon relay strategy #8915
  • Add support for linked-in model parameters #8865
  • Fix VTCM allocation #8954
  • llvm-options attribute is an array of strings #9011
  • Add contrib tests for blocked conv2d and maxpool2d #8960
  • Implement model launcher #8986
  • Allow undefined symbols in libtvm_runtime.so on Hexagon #9024
  • Disable thread_local on Hexagon #9025
  • Don't use {} initialization with FastRPC structures #9033
  • Treat floats as float32 when passing args to offloaded kernels #9010
  • Pytestify Hexagon unit test #8955
  • Fix compilation errors in Hexagon launcher #9189
  • Add hexagon launcher to apps and add to TVM's build system #9220
  • Fix addressing TVMValue array #9302

Relay

  • Make Softmax op fusible with elemwise ops #8909
  • Add a non-recursive LetNode VisitExpr_ for LabelOps Pass to avoid stack overflow #8917
  • Per-Channel FQ2I #8883
  • Make expressions in the DynamicToStatic pass tests more dynamic #8989
  • Removed redundant unit test. #8993
  • fix conv transpose weight dtype inference #8962
  • Remove memory planning from LowerTEPass #8974
  • Add ExtractOperators pass #8996
  • Fix compiler warning in ExtractOperators #9075
  • Prepare for merging context_analysis.cc and device_annotation.cc #9077
  • Register layout conversion function to more reduce ops #9048
  • Prepare for new plan_devices.cc (part II) #9130
  • Support for qnn.conv2d_transpose #9139
  • Merge analysis/context_analysis.cc and transforms/device_annotation.cc #9038
  • Use a uint64_t to serialize primitive_attrs in the Relay VM to fix 32bit RPC #9169
  • Remove DeviceMap from LowerTE #8788
  • Remove unnecessary Optional<IRModule> argument to ToANormalForm and friends #9197
  • Gather op dynamic input support #9240
  • Improve reduction op layout propagation for packed input #9253
  • VLOG for finer grained control of hyper-detailed logging #9012

TensorIR

  • Allow Tuple/Array in TE lowering #8916
  • Compute-At #8943
  • Decompose-Reduction #9041
  • update block syntax #9286

TIR

  • Fixed LowerThreadallreduce not remapping Store buffer var #8931
  • Add conversion from FloatImm to float in Python #9009
  • Revert a change to lower_tvm_builtin.cc from #6126 #8274
  • add loop partition hint pragma #9121
  • Fix lowering strides when source region has higher dimension than the buffer #9145
  • tir.transform.StorageFlatten refactor #9091
  • Fix lowering strides when source buffer has non-empty strides #9166
  • Fix FlattenBuffer computing size for buffer with strides #9195
  • Add a parallel to serial for loop converter pass #8469
  • Minor refactor to tir.transform.StorageFlatten #9260
  • Added PrettyPrint of ProducerStore/ProducerRealize nodes #9259
  • Add support for 0-dim buffer #9224

Bugfix

  • Add check to avoid calling back() on an empty container #8930
  • Fix visit_attrs error if its function pointer is equal to nullptr #8920
  • Add a nullptr check to tir.Buffer to fix the illegal memory access #8910
  • Fix div zero error in rewrite_simplify #8961
  • Fix other div zero errors also in rewrite_simplify #8983
  • Disallow fusing loops with dependency #9112
  • Prevent casting handle to other types #9114
  • Add nullptr checking for AttrStmt with coproc_uop_scope attr key #9123
  • Fix meta_schedule.testing.local_rpc #9172
  • Fix a predicate bug in TIR schedule primitive rfactor #9228
  • Add IterRangeSanityCheck in DetectIterMap #9205
  • Fix to allow zero-copy between numpy and TVM NDArrays #9230
  • fix API doc URLs #9266
  • Fix typo in error message in CMakeLists.txt #9251

Unittest

  • Runnable relay unit tests on Vulkan #8947
  • Parametrized test_conv2d_int8_intrinsics #9143
  • Fixing unittest #9180
  • Removed vulkan from CI run of task_python_topi.sh #9219

CI

  • Update CI Vitis AI PyXIR version to v0.3.1 #8814
  • Update ci environment #9082
  • bash.sh, build.sh: add option to set the container name and hostname #9110
  • Split Integration tests out of first phase of pipeline #9128
  • Fix Google Mock differences between Ubuntu 18.04 and 16.04 #9141
  • Prevent the complete Jenkins pipeline to run when files committed only to /docs #9031
  • Use correct tag in Docker --cache-from #9234
  • Pre-build Reference System Dependencies #9270

Docs

  • Add github issue template for documentation #8982
  • Update code review guideline #8999
  • Minor Change in Create Prim Func function Docs #9070
  • Fix installation from source link some text #9238

BYOC

  • Add TensorRT own int8 calibration support to TensorRT BYOC integration #8808
  • Fix build with TensorRT 8 #9047
  • Fix DNNL Conv2D in JSON runtime #9043
  • Fix incorrect conv2d padding handling in dnnl c_src codegen #9097
  • add multiply and remove subtract for dnnl json runtime #9120
  • support arbitrary input dims for add/mul/relu of dnnl c_src codegen #9127
  • Support arbitrary input dims in DNNL ReLU #9122

Meta schedule

Frontend

  • fix #9078 #9099
  • support for quantized conv_transpose2d op #9133
  • Semantic difference of 'bias_add' between relay and pytorch #9204
  • Add 100+ operators for PaddlePaddle #9126
  • Fix bug for paddle frontend #9236
  • Remove unused parameters and fix doc string #9283

LLVM

  • Refactor MakeCallPacked, NFC #9118
  • Rename t_tvm_context_ to t_tvm_device_, NFC #9176
  • Make changes needed for opaque pointers #9138
  • Treat scalars as single-lane vectors in CreateVecConcat #9264
  • Add ability to turn on fast math flags #9223

Contributors Who Reviewed Pull Requests

Note: The format is name (number of activities). Disclaimer: the number of activities does not directly correspond to the community's view about the significance of contributions.

areusch (64), masahi (49), junrushao1994 (46), comaniac (37), tqchen (34), leandron (31), mbrookhart (30), jroesch (26), vinx13 (20), jwfromm (20), Mousius (20), tkonolige (19), Hzfengsy (19), jcf94 (16), manupa-arm (16), tmoreau89 (15), AndrewZhaoLuo (15), mbs-octoml (12), csullivan (9), elvin-n (8), shingjan (7), mehrdadh (6), mbaret (5), gromero (4), zxybazh (4), Lunderberg (3), FrozenGene (3), echuraev (3), MasterJH5574 (3), ZihengJiang (2), Laurawly (2), trevor-m (2), u99127 (2), huajsj (2), electriclilies (2), YuchenJin (2), anwang2009 (2), ashutosh-arm (2), ekalda (2), guberti (2), siju-samuel (1), yzhliu (1), kparzysz-quic (1), kevinthesun (1), vegaluisjose (1), t-vi (1), lhutton1 (1), giuseros (1), xqdan (1), hogepodge (1), mikepapadim (1), NicolaLancellotti (1), zhanghaohit (1), grant-arm (1), adstraw (1), apeskov (1), tiandiao123 (1), denise-k (1)

Contributors Whose Pull Requests were Updated

Note: The format is name (number of activities)

echuraev (21), AndrewZhaoLuo (17), kparzysz-quic (16), masahi (14), mehrdadh (12), Mousius (11), Lunderberg (10), leandron (9), gromero (9), junrushao1994 (8), syang-ng (8), jiangjiajun (8), mbrookhart (5), anwang2009 (5), vvchernov (5), wangxiang2713 (5), vinx13 (4), manupa-arm (4), zxybazh (4), mbs-octoml (4), elvin-n (4), mikepapadim (4), areusch (3), tkonolige (3), electriclilies (3), shingjan (3), sunwayforever (3), ashutosh-arm (3), guberti (3), alter-xp (3), abraham-arun (3), tqchen (2), icemelon (2), comaniac (2), tmoreau89 (2), Hzfengsy (2), jtuyls (2), wrongtest (2), ekalda (2), apeskov (2), sergey-grovety (2), ZihengJiang (1), t-vi (1), mbaret (1), u99127 (1), huajsj (1), rkimball (1), csullivan (1), codeislife99 (1), hypercubestart (1), hogepodge (1), CircleSpin (1), euntaik (1), AnastasiaStulova (1), ganler (1), tom-gall (1), rafzi (1), michalpiszczek (1), mvermeulen (1), grant-arm (1), jinhongyii (1), binarybana (1), aasorokiin (1), fernchen (1), Leo-arm (1), UniverseFly (1), chiwwang (1), quic-sanirudh (1), chunit-quic (1), tiandiao123 (1), denise-k (1), yanggg1997 (1), idoudali (1)
