TVM Monthly - November and December 2020

As discussed by the TVM PPMC, our goal is to provide a monthly summary of the project so users and developers can get a better understanding of the goings-on of the TVM community.

Feedback and suggestions are welcome so that we can further improve these updates.

Community

The months of November and December were a busy time for the TVM community, with many end-of-year improvements leading up to the TVM Conference in early December. The conference was fully digital this year due to COVID-19, which allowed more people to attend; we had over 900 people register! Thank you to Chris Hoge and the many others who planned and participated, especially the speakers, who went through extra planning to pre-record their talks and prepare their talk materials.

During this period we welcomed many new contributors to the project.

Importantly, we welcomed @leandron and @tkonolige as reviewers; @mbaret, @mbrookhart, @jcf94, and @jwfromm as committers; and finally @Laurawly, who has joined the PMC.

Thanks to everyone for their hard work and contributions!

On the technical side, the past few months have had a few major themes:

  • Improved coverage and operator support at the Relay level and in the importers.
  • Major improvements and features for the new auto-scheduler, based on the Ansor paper (see the sketch after this list).
  • Work on productionizing µTVM, including better CI integration, better board support, and a variety of other features detailed below.
  • Lots of stability and bug fixes.
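
For readers who haven't tried the new auto-scheduler yet, the single-operator tuning flow (refactored in #7028 and covered by the tutorials listed further down) looks roughly like the following. This is a minimal sketch, assuming a recent TVM checkout; the matmul workload, trial count, and log-file name are only illustrative:

```python
import tvm
from tvm import te, auto_scheduler

# Describe the computation to tune; the auto-scheduler searches for a
# schedule on its own, so no manual schedule template is required.
@auto_scheduler.register_workload
def matmul(N, M, K):
    A = te.placeholder((N, K), name="A")
    B = te.placeholder((K, M), name="B")
    k = te.reduce_axis((0, K), name="k")
    C = te.compute((N, M), lambda i, j: te.sum(A[i, k] * B[k, j], axis=k), name="C")
    return [A, B, C]

target = tvm.target.Target("llvm")
task = auto_scheduler.SearchTask(func=matmul, args=(1024, 1024, 1024), target=target)

# Run the search, logging measured candidates so tuning can resume later.
log_file = "matmul.json"
task.tune(auto_scheduler.TuningOptions(
    num_measure_trials=64,
    measure_callbacks=[auto_scheduler.RecordToFile(log_file)],
))

# Apply the best schedule found so far and build as usual.
sch, args = task.apply_best(log_file)
func = tvm.build(sch, args, target)
```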

In January we hope to see even more progress on stability and coverage, and hopefully some fresh RFCs and new features landing. Happy new year everyone!

The forum got 113k page views and 2k user visits in the last month.

Pull Requests

Below is a high-level summary of the PRs closed in the last month, grouped by area.

CI

  • Move back Keras to 2.4.3 #6810

  • Update to latest #6812

  • Torch 1.7 update staging #6825

  • Torch 1.7 update to mainline #6828

  • remove unused environment var #6824

  • Disable flaky tests #6841

  • Add python setup script #6844

  • Add more guidelines about additional local setup #6848

  • Update actions miniconda #6926

  • Install libc6-dev-i386 to compile wasm32 #6886

  • Disable ASF header checking on untracked files #6975

  • Hotfix CI (see #7010) #7025

  • Update docs style dependency. #7034

  • Switched to ACL 20.11 #7106

  • Add ACL to the CI for AArch64 #7122

  • make sure submodule checkout in clean state #7228

  • add Verilator regression test to CI #7098

Fixes

  • Update types slots for baseexpr and primexpr #6814

  • TF frontend: add softsign op #6799

  • Sparse2Dense support #5767

  • Update stale link #6820

  • Update stale link to new location #6819

  • Improve AArch64 depthwise convolution through smlal/smlal2 intrinsic #6711

  • Fix a bug in _stridedSlice() #6829

  • Fix Annotate Target to support free vars (relay.zeros, relay.ones, etc.) of any size (including zero) #6826

  • Update tophub link to new location #6838

  • Register shape functions for some image related ops #6373

  • Update version #6837

  • Support nested tuples #6809

  • Syntax error String::fromwe() should be String::from() #6846

  • Update SimplifyInference documentation #6853

  • Dynamic scale, zero point in qnn.op.dequantize #6849

  • Update path in arm_compute_lib.rst #6861

  • Using diagnostics for TVM Script #6797

  • ‘tvmc tune’ --rpc-tracker and --rpc-key fail due to argparse misconfiguration #6822

  • Add smmla/ummla support in quantized Conv2d #6802

  • Support for more ops (conv1d) #6731

  • conv1d_transpose speedup #6840

  • Fix bug in processing script #6867

  • Update search for bitcode files for rocm 3.9 #6865

  • More flexible conv2d_NCHWc_int8 generic operator. #6714

  • Fix the build error for wasm-standalone app #6862

  • Improve the order of tutorials within a subsection #6880

  • TF frontend: add rint op #6818

  • Fix GCC8.1 and GCC8.2 template dispatch compilation issue #6893

  • Fix bug of generate-unmatched-brackets in CodeGenC::PrintSSAAssign #6887

  • Bump the versions #6896

  • Fix the cmake error for TensorRT #6902

  • Dynamic gpu tests, add dynamic strided slice to topi #6870

  • Bump up tophub cuda version #6908

  • Allow to set number of threads to TFLite interpreter #6901

  • Consolidate RPC Context helper functions #6915

  • Make TVMLogf platform-independent #6916

  • Handle int64 dtype in range #6918

  • Handle weights in shape func #6912

  • Minor improvements for auto-tuning tutorials #6919

  • Extract tasks via compile engine #6903

  • Make AutoScheduler handling of errors during measure consistent with AutoTVM #6909

  • timeout is not passed correctly #6924

  • Fix typo #6920

  • Add Handling of Zero Len Arguments #6923

  • Explicitly use new to avoid exit-time destruction of global state #6938

  • fix tvm.relay.build() docs #6940

  • Fix tir allocation with multiple lanes #6941

  • AArch64 base algorithm refactoring in LLVM #6907

  • Cleanup comments in partition pass #6951

  • Lazy import XGBoost #6939

  • Fix #6954: uTVM fix when building the runtime for native hardware #6957

  • Raise ImportError for XGBoost #6969

  • Bug fix for debug builds in micro_session.cc #6968

  • Don’t fuse take with dynamic inputs #6979

  • bumping vta version #6977

  • Add Relay option to link parameters into runtime Modules #6917

  • Add initial support for quantized transpose convolution in Relay #6899

  • Fix GraphRuntime with -link-params over RPC #6985

  • Fix C runtime NDArray allocation bug #6991

  • fix rust installation in CI #7004

  • use target_host when it is set #6855

  • Dynamic Batch Support for TRT #6955

  • Fix call mkl gemm in mkldnn.py #7007

  • Fix trt Test #7016

  • Fix edge cases in const_int_bound and fold_scale_axis #6911

  • Add environment variable for controlling top-level printing and fix issue with pretty printing/parsing roundtrip. #6874

  • Save PyTorch frontend state in object #7023

  • Prefer IPv4 between IPv4 and IPv6 #7013

  • Add version 11.1 in finding CUDA libdevice #7033

  • remove print from GetInputIndex #7027

  • Implement Keras Conv1D #7035

  • Attach span information to tir nodes in tvmscript #6910

  • Part.1 metal default hardware params #7022

  • Support atomic for GPU backend (NVPTX, ROCm) #7051

  • fix missing ffi binding of relay.attrs.DequantizeAttrs #7054

  • Fix nvcc compile option to be compatible with older cuda #7065

  • PyTorch frontend: make type inference incremental #6900

  • Compatibility improvement with XGBoost v1.3.0 #7069

  • Fix QNN type inference #7074

  • Add a standalone regression test for running MergeComposite on a QNN graph #7080

  • Rollback changes to SSA begin/end scope for Store in C codegen #7073

  • Fix missing header inclusion. #7097

  • clean standalone CRT files in microTVM VM rebuild script #7095

  • support int64 #7105

  • update language version #7116

  • Remove support of double type #7118

  • add support for StridedSlice to input a single constant #6949

  • Fix spelling in some comments #7124

  • add cl support in tvmc runner #6831

  • Add is_floating_point and div_ PyTorch ops #7128

  • Fix a bug in batch_matmul that te.max should be used #7111

  • Update is_floating_point to handle bfloat16 #7133

  • PopenPoolExecutor #6959

  • Compatibility improvement with XGBoost v1.3.0 #7076

  • Added additional information to the from_onnx tutorial #7127

  • Fix a few OpNode argument field descriptions when registered #7140

  • Created CSourceMetaData module for model metadata #7002

  • Add is_floating_point() test and better type support in verify_model_vm() #7134

  • Add a FunctionPattern, remove unused attributes in CallPattern #7151

  • missing header for GraphRuntimeFactory in android_rpc #7160

  • Slightly optimize the default injective schedule #7158

  • Fix PyTorch NMS conversion for negative scores #7137

  • Update the docs stale links #7169

  • Support hard_swish op #7174

  • Asymmetric padding and dilation in conv2d workload #7142

  • Makes sure g_last_error is null terminated. #7190

  • Fix ICHECK_NOTNULL in logging.h #7193

  • Fixed temporary lock_guard instances. #7199

  • Parallel Cuda Mergesort #7099

  • Support dynamic batch size #7194

  • slice_like support #7184

  • Simplify cast #7045

  • Support transpose #7214

  • Fix Get Valid Counts when the number of boxes is zero #7229

  • Fix code to work with cmake 3.2 #6952

  • avoid unexpected value(1) of search space when get length for uninitiated search space #7175

  • Revert “avoid unexpected value(1) of search space when get length for uninitiated search space” #7236

  • Add index boundary check in ConfigSpace.get() #7234

  • Add autoscheduler support to tvmc #7070

  • Do not use ICHECK in nnvm #7255

  • Add op_name in error message for Pool #7243

  • Remove check_correctness in AutoTVM, which is busted #7250

  • Restore class-aware NMS for detection models by graph rewrite #7154

  • add default value for leaky relu alpha #7259

  • Faster multi dimensional argsort by segmented sort #7195

  • Change the all #pragma once to ifdef include guard #7264

  • Reorder dynamic to static and simplify inference, lower DynamicToStatic Opt Level #7213

  • batch_matmul tensorcore schedule #7146

  • Ensuring at least one thread block to handle empty tensor #7273

  • switch to more portable bash pipeline syntax #7274

  • Add MicroTVM support for the STM32F746 Discovery board #7225

  • Bring back numbered lists to TVM docs. #7290

  • Per-input, data dependence specification for shape func #7210

  • Initial BYOC support with c-source module #6950

  • A few typo fixes in the uTVM design doc. #7291

  • Change const to used dtype if it is passed in #7285

  • Import errors in deploy_detection.py and deploy_classification.py #7059

  • Fix test_topi_batch_matmul_tensorcore.py:test_batch_matmul requirement #7294

  • Add QEMU setup to uTVM tutorial. #7296

  • Add gpu instructions and results to deploy_sparse #7298

  • Parallelize cumsum in get_valid_counts #7123

  • Adding aten::unsqueeze_ to PT Frontend #7231

  • Fix an issue with dynamic functions overwriting call arg types #7295

  • Made tensorflow IsNan actually work #7320

  • Add a shape function and dynamic test for round #7324

  • Relax tolerance for dlpack <-> pytorch test #7325

  • get_top_results works on a copy of output #7327

  • Autoscheduler on ARM devices #7326

  • Fix warning showed with GCC10 #7336

  • Fix use of wrong flag name #7341

  • Add resource_handle to TVM_DLL_EXPORT_TYPED_FUNC. #7338

Rust

  • Add initial boilerplate for Rust diagnostic interface. #6656

  • Maintain error sources when propagating errors #6815

  • Flesh out IRModule methods #6741

  • Impl IsObjectRef for Array #7138

  • More Rust bindings for Attrs #7082

Bugfix

  • Fix leak when Packed callback arg is ndarray. #6821

  • Fix recursive GetFunction in runtime::Module #6866

  • Fix recursive GetFunction in runtime::Module #6859

  • Change debug_runtime to represent times in seconds internally #7227

  • Ensure CallNode attrs are not undefined before checking #7278

AutoScheduler

  • New layout rewrite option: Weight pre-transpose #6750

  • Bug fix for layout rewrite CI error in i386 #6830

  • Register auto-scheduler to more ops #6879

  • Fix the occasional crash caused by split memo #6883

  • Improve tuning with random cost model #6835

  • Add winograd support in tuning networks #6877

  • Tutorial on auto-scheduling a network for GPU #6882

  • Fix task scheduler restoring #6934

  • Improve warning messages #6935

  • Make SearchTask and ComputeDAG serializable #6842

  • Register workload when deserializing tasks #6927

  • Strictly select impl using plevel #6956

  • Task scheduler callbacks #6945

  • Fix task extraction #6965

  • Print the time used for measurement #6972

  • Check duplicated names in the compute dag #6973

  • Accelerate feature extraction for winograd #6981

  • Skip useless calls to RewriteLayout #6993

  • Use a smaller retry number #6996

  • Use a smaller iteration number for GA to accelerate the search #6994

  • Support layout rewrite for whole networks #6987

  • Add a tutorial on auto-scheduling a network for x86 CPU #7019

  • Misc update to hardware parameter and task scheduler #7020

  • Refactor task interface for tuning single operators #7028

  • Improve CPU matmul tutorial #7037

  • Remove max_registers_per_block in HardwareParams #7040

  • Add tips on resuming the search from a log file #7039

  • Delete deprecated file auto_schedule.py #7071

  • Fix winograd infer tile size #7092

  • Support string processing to records #7144

  • Improve SearchTask and ComputeDAG serialization #7145

  • Python based measure callbacks #7143

  • Fix the conflict of thread pool in measurement #7166

  • Improve hyperlinks in tutorials #7167

  • Enable winograd for conv2d and layout rewrite for conv3d #7168

  • Update layout rewrite option setting for measuring #7156

  • Use VM to extract tasks for dynamic models #7173

  • Add custom build function #7185

  • Control compile engine cache via PassContext #7220

  • Costmodel enhancement #7197

  • Do not return naive schedule in tracing mode #7226

  • Fix for zero-rank output #7180

  • Add layout rewrite support for dense and batch matmul on CPU #7161

  • Fix layout rewrite for iterator with extent=1 #7279

  • Fix typos in feature extraction and cost model #7280

  • Bug fix & Custom sketch support #7260

  • Fix conv3d’s op strategy for auto-scheduler #7328

  • Separate shapes from DAG hash and enable schedule sharing #7317

Docs

  • Enable theme with header and footer. #6834

  • Improve windows build instruction via conda #6944

  • Update to reflect the repo name change #6967

  • Document cloudpickle dependency in tutorials #7049

  • Fix figure links #7268

BYOC

  • FTVMAnnotateTarget method signature update #6786

  • 20.05 memory corruption temporary fix #6724

  • Vitis-AI codegen integration #6343

  • Allocate GPU data buffers and transfer data when needed #6872

  • handling dynamism in TensorRT to support OD models #6905

  • Use channels attribute in Conv2D op converter #7011

  • Support batch norm for all ranks <=5, and all axes #7026

  • Added “include_non_call_ops” parameter to AnnotateTarget pass #6655

  • include_non_call_ops = False #7121

  • Fix TRT conversion for reshape op - ReshapeAttrs no longer has reverse #7205

  • Depthwise convolution support #7206

  • Fix weight conversion when first dim of weight shape is 1 #7253

  • Handle empty tuples in annotation pass #7288

  • removed ACL 20.05 limitations #7251

  • add support to dynamically load hardware library #7286

Relay

  • SparseTensorDenseMatMul support for Tensorflow #6685

  • If Operator Support #6730

  • Support MXNet-style attributes for reshape_like #6851

  • Fix first-order AD on tuple arguments #6827

  • Mix mode type inference #6704

  • roi_pool operator alter layout #6516

  • Keep node name in span #6885

  • Add dynamic SparseToDense #6892

  • Add space_to_batch_nd and batch_to_space_nd operators #6477

  • fix unparsable yolo formals #6963

  • Add scatter_nd op #6854

  • Add DefuseOps pass #6946

  • Clean up DCE tests in preparation for refactoring. #7029

  • Add support for Size op in Onnx frontend. #7031

  • Fix GPU NMS when return_indices is True #7005

  • MaxUnpool Operator #7036

  • Support deformable Conv2D NHWC #7075

  • Allow cuda cross compilation without physical device. #7063

  • Add softplus operator conversion to Onnx. #7089

  • Support deformable conv2d #7087

  • Auto extract onnx input shapes when possible. #7115

  • Add Sort Op to Relay #6978

  • Add fast_softmax #7163

  • Stack should take exprs that evaluate to tuples #7130

  • Remove reverse attribute from reshape and reverse_reshape operators. #7086

  • Fix mismatch between Onnx Prelu definition and importer. #7208

  • Allow condition in if op to be an array. #7215

  • Fix reshape header file #7218

  • Threefry PRNG: splittable and stateless #7083

  • Compare against onnxruntime more consistently during testing #7300

  • Add more gradients #7323

  • Fix tanh gradient and update tests to use downstream gradient #7340

  • Add numpy style cumsum op #7334

Fix

  • Add task_ci_python_setup.sh to the arm CI #6850

  • Skip RPC tests when using multiprocessing’s spawn method #6858

  • disable cuda test for argwhere #7042

  • Improve error messages and docs #7064

  • Remove debugging print statement #7072

  • Update tune_relay_vta.py to support single board #7100

  • Fix using num_workers in omp #7078

  • Add dense strategy for mali #7181

  • Tensor core type issue for dense #7187

  • Import tvm.testing in tutorials that use it #7248

  • Remove leftovers from check_correctness #7272

  • Add flop counts to cublas #7297

  • Infer input shape in sparse_dense_padded’s alter_op if one does not exist #7308

µTVM

  • Add virtual machine, test zephyr runtime on real hardware #6703

  • Fix problems with the debug flow #6930

  • Remove binutils module, no longer needed after microTVM refactor. #6947

  • Demote session traffic logs to DEBUG log level #6989

  • Include required CMSIS headers in Cortex-M micro kernel. #6988

  • Minor fixes to the Reference VM tutorial #7012

  • Modify reference VMs to support new µTVM demo #7001

  • Fix paths in the reference VM tutorial and add vbguest recommendation #7015

  • Allow for platform-specific malloc in runtime #6948

  • Add platform timer and RPCTimeEvaluator to enable AutoTVM #6964

  • Raise a better error when project_dir does not exist #7165

  • Add documentation #7164

  • Avoid listing links when probing serial ports #7265

  • Fix two warnings when deprecated forms are used #7269

  • Remove need for -mcpu=native #7276

  • Add ST STM32F746 disco board to tflite tutorial script #7254

  • Add TVMPlatformGenerateRandom, a non-cryptographic random number generator. #7266

TOPI

  • Enable scatter_add on GPU #6856

  • deformable_conv2d in NHWC #6999

  • Fix GPU Dynamic Topk by Improving Dynamic Strided Slice in Topi #7018

  • cuda for argwhere #6868

  • GPU scatter_add using atomic #7044

  • GPU scatter 1D via sorting based approach #7056

  • sparse_dense Op sparse_data input added #6889

  • Fix GPU Dynamic Op Schedule #7117

  • Simplify GPU NMS IR and optimize a bit #7136

  • cuda reduction schedule #7131

  • GPU sort IR refactor to enable sort by keys #7157

  • Parallelize GPU NMS inner loop #7172

  • Treat undefined elements as constants in Array #7232

  • Improve memory layout inside GPU NMS kernel #7257

  • Minor perf improvement for GPU scatter #7233

  • Make cumsum IR reusable, add thrust scan #7303

  • Rewrite GPU argwhere using exclusive scan #7314

TIR

  • Make loop unrolling in LoopPartition optional #6823

  • Do not show meta data when printing TIR #6881

  • Add spans to all ExprNodes #6860

  • Enforce allocate to use the correct var pointer hint. #7216

  • Support Return in TIR #7084

  • ForNode introduce thread binding and remove legacy field #7306

Frontend

  • Support NonMaxSuppressionV5 #6933

  • Prevent tflite frontend from producing int64 shape/parameters #7030

  • Handle case where output of model is python list #7088

  • Unnecessary default warning msg changed to debug #7119

  • Support mode=instance,spatial for l2_normalize #7062

  • Remove seemingly invalid SoftPlus #7189

  • add _npi_subtract_scalar #7191

  • add _npi_stack, issue #7186 #7209

  • Densify Op added #7048

  • Sparse_Dense Op CSR scheduling issue resolved for CUDA & x86 #7148

PatternLang

  • Remove unnecessary check #6958

  • Add Syntactic Sugar to the C++ pattern API and support DataType Attribute Matching #7120

  • Add If pattern #7282

  • Add a relay LetPattern #7332

Verilator

  • Integrating and simulating hardware accelerators in TVM #6971

  • Multiple fixes #6995

  • regression tests #7000

  • Separate Verilator dependency from Chisel dependencies #6986

VTA

  • Fix the shape check for vta dense strategy #6983

  • add device_annot support in graphpack #6125

  • update 3rdparty submodule #7081

  • update version of 3rdparty vta-hw submodule #7271

Auto Scheduler

  • Support Auto scheduler and NHWC convolution on ROCm #7038

  • Add target host to measure record #7046

  • Fix infer tile size for NHWC winograd for CUDA #7068

  • Mali Support #7132

TFLite

  • added scalar axis value handling in reduce #6970

  • add support for float16 #7093

  • pack operation extended with const args #6984

  • Reshape - support different qnn params for input and output #7159

  • Quantized version of unit test for Dense #7113

  • Added ability to infer shapes for arguments #7293

  • Strided slice handling of shrink_axis_mask improved #6998

ONNX

  • NMS in ONNX #6839

  • Fix a bug with reshape imports when an initialized target shape is used more than once #7109

  • Fix issues for Clip and RoiAlign #7237

Contributors Who Reviewed Pull Requests

Note: The format is name (number of activities)

Disclaimer: the number of activities does not directly correspond to the community’s view of the significance of contributions.

tqchen (109), comaniac (75), zhiics (68), masahi (40), junrushao1994 (33), FrozenGene (31), merrymercy (30), tmoreau89 (23), giuseros (23), mbrookhart (20), jroesch (19), kevinthesun (19), tkonolige (17), anijain2305 (15), jcf94 (14), mbaret (12), Laurawly (11), jwfromm (10), vinx13 (9), icemelon9 (8), MarisaKirisame (8), leandron (8), trevor-m (8), ZihengJiang (7), areusch (7), liangfu (6), u99127 (6), electriclilies (5), altanh (5), yzhliu (4), ANSHUMAN87 (4), lhutton1 (4), manupa-arm (4), siju-samuel (3), wweic (3), t-vi (3), yongwww (3), srkreddy1238 (2), lixiaoquan (2), apivovarov (2), soiferj (2), antinucleon (2), cbalint13 (2), spectrometerHBH (2), insop (2), adelbertc (2), kparzysz-quic (1), cchung100m (1), rkimball (1), gussmith23 (1), jmorrill (1), mwillsey (1), Hzfengsy (1), hogepodge (1), wrongtest (1), samskalicky (1), anwang2009 (1), jtuyls (1)

Contributors Whose Pull Requests were Updated

Note: The format is name (number of activities)

merrymercy (33), mbrookhart (27), comaniac (22), tqchen (18), areusch (17), yongwww (14), masahi (13), tkonolige (11), jwfromm (8), lixiaoquan (7), trevor-m (7), antinucleon (7), d-smirnov (7), codeislife99 (7), jroesch (6), giuseros (6), zhiics (5), kevinthesun (5), FrozenGene (5), ANSHUMAN87 (5), alexgl-github (5), anijain2305 (4), vegaluisjose (4), liangfu (4), t-vi (4), tobegit3hub (4), zhanghaohit (4), euntaik (4), tmoreau89 (3), jcf94 (3), rkimball (3), hypercubestart (3), altanh (3), manupa-arm (3), alter-xp (3), TylerADavis (3), siju-samuel (2), ZihengJiang (2), kazum (2), junrushao1994 (2), slyubomirsky (2), mbaret (2), leandron (2), cbalint13 (2), gussmith23 (2), insop (2), hogepodge (2), wrongtest (2), CaramelFc (2), lsy643 (2), Wheest (2), icemelon9 (1), yzhliu (1), vinx13 (1), Laurawly (1), wweic (1), mshawcroft (1), u99127 (1), xqdan (1), electriclilies (1), csullivan (1), tom-gall (1), hzfan (1), tristan-arm (1), leonwanghui (1), solin319 (1), Beya2019 (1), Meteorix (1), hgt312 (1), samskalicky (1), anilmartha (1), leowang1225 (1), bernhardklein (1), Xuxue1 (1), domin1985 (1), rohanmukh (1), 0x00-pl (1), adelbertc (1), BhushanIMG (1), corehalt (1), dsteger (1), echuraev (1), Light-of-Hers (1), TaylorZowtuk (1)
