TVM Monthly - January 2021

As discussed by the TVM PPMC, our goal is to provide a monthly summary of the project so that users and developers can better understand the goings-on of the TVM community.

Feedback and suggestions are welcome so that we can continue to improve these updates.

Community

During the first month of 2021, we welcomed many new contributors to the project. Notably, we welcomed @trevor-m and @hzfan as reviewers.

Thanks to everyone for their hard work and contributions!

On the technical side, we kept improving operator and frontend support, and several performance optimizations landed for NVIDIA GPUs. On the IR side, the pattern language can now match If expressions, and TIR supports Return. Auto-scheduler improvements include cost-model enhancements and better end-to-end stability. MicroTVM gained a non-cryptographic random number generator and support for more hardware.
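
As a quick illustration, the new If support in the pattern language (#7282, listed below) lets patterns match conditional expressions. Here is a minimal sketch, assuming the `is_if` helper exposed by `tvm.relay.dataflow_pattern`; treat it as a sketch rather than canonical usage:

```python
import tvm
from tvm import relay
from tvm.relay.dataflow_pattern import is_if, is_op, wildcard

# Build a small Relay expression: if (x < y) then x else y.
x = relay.var("x", shape=(), dtype="float32")
y = relay.var("y", shape=(), dtype="float32")
expr = relay.If(relay.less(x, y), x, y)

# Pattern: an If whose condition is a call to "less", with any branches.
pattern = is_if(is_op("less")(wildcard(), wildcard()), wildcard(), wildcard())
print(pattern.match(expr))  # true when the pattern matches
```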

The forum received 119k page views and 2.3k user visits over the last month.

Relay IR and TIR

  • [PatternLang] Add If pattern #7282
  • [PatternLang] Add Syntatic Sugar to the C++ pattern API and support DataType Attribute Matching #7120
  • [Relay][PatternLang] Fuzzy Function Matching #7355
  • [PatternLang] Add a relay LetPattern #7332
  • [TIR][REFACTOR] Enforce allocate to use the correct var pointer hint. #7216
  • [TIR][REFACTOR] ForNode introduce thread binding and remove legacy field #7306
  • [Arith] Simplify cast #7045
  • [Autodiff] Deterministic gradient compute #7321
  • [TIR] Support Return in TIR #7084
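
For the new TIR Return support (#7084), here is a minimal sketch of a scalar PrimFunc that returns a value via `tir.ret`, loosely following the unit test added in that PR; the exact build plumbing may differ:

```python
import tvm
from tvm import tir

# A scalar PrimFunc whose body evaluates tir.ret(a + b), i.e. it
# returns a value instead of writing into an output buffer.
a = tir.Var("a", "float32")
b = tir.Var("b", "float32")
body = tir.Evaluate(tir.ret(a + b))
func = tir.PrimFunc([a, b], body).with_attr("global_symbol", "scalar_add")

mod = tvm.build(tvm.IRModule({"scalar_add": func}), target="llvm")
print(mod["scalar_add"](1.0, 2.0))  # expect 3.0
```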

Operator support

  • [Relay, TOPI] Add numpy style cumsum op #7334
  • [TOPI] Make cumsum IR reusable, add thrust scan #7303
  • [THRUST] Faster multi dimensional argsort by segmented sort #7195
  • Add a shape function and dynamic test for round #7324
  • [PRNG] Add check to PRNG to make sure that unsigned integer arithmetic is wrapping #7287
  • [RELAY,TOPI] Threefry PRNG: splittable and stateless #7083 (see the sketch after this list)
  • [ConvertLayout] slice_like support #7184
  • [ConvertLayout] Support transpose #7214
  • [Relay][Training] Add more gradients #7323
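
The Threefry PRNG (#7083) is splittable and stateless: instead of mutating hidden generator state, you split a key into independent subkeys and thread the returned key through subsequent calls. A minimal sketch, assuming the `tvm.relay.random` namespace introduced by that PR:

```python
import tvm
from tvm import relay

# Derive a PRNG key from an integer seed.
key = relay.random.threefry_key(0)

# Split the key so two consumers draw independent, reproducible streams.
new_key, subkey = relay.TupleWrapper(relay.random.threefry_split(key), 2)

# Generate a (2, 4) tensor of random bits; the op returns
# (next_key, tensor) so generation never reuses state.
next_key, rand = relay.TupleWrapper(relay.random.threefry_generate(subkey, (2, 4)), 2)

print(relay.Function([], rand))
```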

Frontend

Torch

  • [Torch] Restore class-aware NMS for detection models by graph rewrite #7154
  • [Torch] Various updates for PyTorch frontend #7348 (see the sketch after this list)
  • [Torch] More graph rewrites for Faster RCNN / MaskRCNN #7346
  • Adding aten::unsqueeze_ to PT Frontend #7231
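
These changes all land behind the same entry point. As a reminder of the flow they improve, here is a minimal sketch of bringing a TorchScript model into Relay via `relay.frontend.from_pytorch` (the model choice and input name are illustrative):

```python
import torch
import torchvision
from tvm import relay

# Trace a torchvision model into TorchScript.
model = torchvision.models.resnet18(pretrained=False).eval()
inp = torch.randn(1, 3, 224, 224)
scripted = torch.jit.trace(model, inp)

# (input name, shape) pairs for the graph inputs.
shape_list = [("input0", inp.shape)]
mod, params = relay.frontend.from_pytorch(scripted, shape_list)
print(mod["main"])
```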

ONNX

  • [Relay][Frontend][Onnx] Robustify Loop Importer #7353
  • [Relay][Frontend][ONNX] Allow condition in if op to be an array. #7215
  • [FRONTEND][ONNX] Remove seemingly invalid SoftPlus #7189
  • [ONNX Frontend] add default value for leaky relu alpha #7259

MXNet

  • [Frontend][MXNet] add _npi_stack, issue #7186 #7209
  • [Frontend][MXNet] add _npi_subtract_scalar #7191

Tensorflow and TFLite

  • [TFLite] Strided slice handling of shrink_axis_mask improved #6998
  • [TFLite] Quantized version of unit test for Dense #7113
  • [TFLite] Added check for dynamic range quantization #7114
  • [TFLite] Added ability to infer shapes for arguments #7293
  • [Frontend][TFLite] Densify Op added #7048
  • [Frontend][Tensorflow] Sparse dense matmul adjoint option added #7267
  • [Frontend][Tensorflow] Sparse_Dense Op CSR scheduling issue resolved for Cuda & X86 #7148
  • Made tensorflow IsNan actually work #7320

Backend

  • [CUBLAS, CUDNN] Support dynamic batch size #7194
  • [CUDA] [Codegen] Ensuring atleast one thread block to handle empty tensor #7273

BYOC

  • [BYOC][Verilator] add support to dynamically load hardware library #7286
  • [BYOC][ACL] removed ACL 20.05 limitations #7251
  • [BYOC][ACL] Depthwise convolution support #7206

Ansor, Autoscheduler and AutoTVM

  • [AutoScheduler] Do not return naive schedule in tracing mode #7226
  • [AutoScheduler] Separate shapes from DAG hash and enable schedule sharing #7317
  • [AutoScheduler][Relay] Control compile engine cache via PassContext #7220
  • [AutoScheduler] Enable schedule sharing in dispatch context #7344
  • [AutoScheduler] Add layout rewrite support for dense and batch matmul on CPU #7161
  • [AutoScheduler] Bug fix & Custom sketch support #7260
  • [Fix][Autoscheduler] Costmodel enhancement #7197
  • [AutoTVM] Add index boundary check in ConfigSpace.get() #7234
  • [AutoScheduler] Add custom build function #7185
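
For context, these changes all plug into the auto-scheduler tuning flow. Below is a minimal sketch using the `SearchTask` interface; note that the task API was being refactored around this release, so exact entry points may differ:

```python
import tvm
from tvm import te, auto_scheduler

@auto_scheduler.register_workload
def matmul(N, L, M, dtype):
    A = te.placeholder((N, L), name="A", dtype=dtype)
    B = te.placeholder((L, M), name="B", dtype=dtype)
    k = te.reduce_axis((0, L), name="k")
    C = te.compute((N, M), lambda i, j: te.sum(A[i, k] * B[k, j], axis=k), name="C")
    return [A, B, C]

target = tvm.target.Target("llvm")
task = auto_scheduler.SearchTask(func=matmul, args=(128, 128, 128, "float32"), target=target)

log_file = "matmul.json"
tune_option = auto_scheduler.TuningOptions(
    num_measure_trials=64,
    measure_callbacks=[auto_scheduler.RecordToFile(log_file)],
)
task.tune(tune_option)                 # search for schedules
sch, args = task.apply_best(log_file)  # load the best schedule found
print(tvm.lower(sch, args, simple_mode=True))
```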

MicroTVM

  • [µTVM] Add TVMPlatformGenerateRandom, a non-cryptographic random number generator. #7266
  • Update uTVM code to work with the nRF5340DK dev board. #7331
  • [µTVM] Add ST STM32F746 disco board to tflite tutorial script #7254
  • [uTVM] Initial BYOC support with c-source module #6950
  • Add MicroTVM support for the STM32F746 Discovery board #7225

Performance

  • [TOPI] Minor perf improvement for GPU scatter #7233
  • [TOPI] Parallelize GPU NMS inner loop #7172
  • [TOPI] Improve memory layout inside GPU NMS kernel #7257
  • [TOPI] Rewrite GPU argwhere using exclusive scan #7314
  • Parallelize cumsum in get_valid_counts #7123
  • [Relay] Fold If when the Condition is Constant #7354
  • [CUDA] Parallel Cuda Mergesort #7099
  • [CUDA][PASS]Legalize tensorcore #7147
  • [CUDA]batch_matmul tensorcore schedule #7146

Runtime

  • [VM] Per-input, data dependence specification for shape func #7210
  • Remove MemoryPlan from VM passes #7361

Refactor and API changes

  • [Refactor][VM] Port memory_alloc to c++ #7369

Build and CI

  • [BUILD] Don’t add $TVM_HOME/… to the include path when compiling code #7342
  • switch to more portable bash pipeline syntax #7274
  • [CI] make sure submodule checkout in clean state #7228
  • [DOCS] Fix figure links #7268
  • [CI][BYORTL] add Verilator regression test to CI #7098
  • [Hardware][Verilator] Separate Verilator dependency from Chisel dependencies #6986
  • [CMake] use wrong flag name #7341
  • Do not use ICHECK in nnvm #7255

Doc

  • [TUTORIAL] Add gpu instructions and results to deploy_sparse #7298
  • [µTVM] Add documentation #7164
  • Bring back numbered lists to TVM docs. #7290
  • Add QEMU setup to uTVM tutorial. #7296
  • [Tutorial] Autoscheduler on ARM devices #7326

Improvement and Bugfix

  • [TEST] Another attempt to fix flaky segfaults from torch detection test #7371
  • [TEST] Swap pytorch and tvm import order to fix flaky segfaults #7380
  • [TEST] Relax tolerance for dlpack <-> pytorch test #7325
  • [TEST] Disable one of graph rewrites in torch detection test #7365
  • [Relay] Type Relation Fixes #7362
  • Fix Get Valid Counts when the number of boxes is zero #7229
  • Reorder dynamic to static and simplify inference, lower DynamicToStatic Opt Level #7213
  • Fix an issue with dynamic functions overwritting call arg types #7295
  • [FIX] Remove leftovers from check_correctness #7272
  • [FIX] Infer input shape in sparse_dense_padded’s alter_op if one does not exist #7308
  • [FIX,AUTOTVM] Add flop counts to cublas #7297
  • [FIX,TUTORIALS] Import tvm.testing in tutorials that use it #7248
  • [RUNTIME] Improve error messages for TypedPackedFunc #7152
  • [BUGFIX] Change debug_runtime to represent times in seconds internally #7227
  • [µTVM] Raise a better error when project_dir does not exist #7165
  • Remove check_correctness in AutoTVM, which is busted #7250
  • Add resource_handle to TVM_DLL_EXPORT_TYPED_FUNC. #7338
  • [µTVM] Remove need for -mcpu=native #7276
  • Fix ICHECK_NOTNULL in logging.h #7193
  • [TOPI] Treat undefined elements as constants in Array #7232
  • [VTA] update version of 3rdparty vta-hw submodule #7271
  • [VTA] update 3rdparty submodule #7081
  • [BYOC][Verilator] change runtime registry function name #7351
  • [FIX, AutoScheduler] Fix conv3d’s op strategy for auto-scheduler #7328
  • [AutoScheduler] Fix layout rewrite for iterator with extent=1 #7279
  • [AutoScheduler] Fix typos in feature extraction and cost model #7280
  • [AutoScheduler] Fix for zero-rank output #7180
  • [Relay][Frontend][Onnx] Fix mismatch between Onnx Prelu definition and importer. #7208
  • [Relay][Frontend][Onnx] Add testing for output datatypes and fix related bugs. #7364
  • [Relay][Frontend][Onnx] Compare against onnxruntime more consistently during testing #7300
  • Change the all #pragma once to ifdef include guard #7264
  • Change const to used dtype if it is passed in #7285
  • A few typo fixes in the uTVM design doc. #7291
  • Some docstring fixes. #7367
  • [PatternLang][Bugfix] Ensure CallNode attrs are not undefined before checking #7278
  • [BYOC][bugfix] Handle empty tuples in annotation pass #7288
  • [BYOC][TRT] Fix TRT conversion for reshape op - ReshapeAttrs no longer has reverse #7205
  • [BYOC][TRT] Fix weight conversion when first dim of weight shape is 1 #7253
  • [Relay][Training] fix grad for zeros and ones #7357
  • [Relay][Training] Fix tanh gradient and update tests to use downstream gradient #7340
  • [µTVM] Fix two warnings when deprecated forms are used #7269
  • [µTVM] Avoid listing links when probing serial ports #7265
  • [TEST] Fix test_topi_batch_matmul_tensorcore.py:test_batch_matmul requirement #7294
  • [Relay][PatternLang] Bug fix of rewrite func attr #7358
  • Add op_name in error message for Pool #7243
  • [Fix] Tensor core type issue for dense #7187
  • [ONNX] Fix issues for Clip and RoiAlign #7237
  • [RELAY] Fix reshape header file #7218
  • Fixed temporary lock_guard instances. #7199
  • Add resource_handle to both TVM_DLL_EXPORT_TYPED_FUNC and TVM_DLL_EXP… #7343
  • get_top_results works on a copy of output #7327