TVM Monthly - April 2020

As discussed by the TVM PPMC, our goal is to provide a monthly summary of the project so that users and developers can better understand the goings-on of the TVM community.

Feedback and suggestions are welcomed so that we can further improve these updates.

Community

We welcomed one new committer, @liangfu, and two new reviewers, @mbaret and @kparzysz, to the community in the previous month.

Meanwhile, the forum received 102k pageviews and 3.1k user visits.

Features and Improvements

In the previous month, the community continued to enhance the runtime Object system, which now has better support for Null, String, and Integer. The Tensor Level IR and its namespace are being refactored to prepare for further enhancements. At the Relay level, Bring Your Own Codegen (BYOC) gained better support for partitioning and external code generation. A Static Tensor Array was introduced to enable easier type inference and optimization for dynamism in models. Moreover, developers can now use a non-recursive visitor to traverse large graphs. On the mobile and edge device front, the Hexagon runtime is now included in the TVM runtime, and MicroTVM added AutoTVM support for the Cortex-M7. As usual, there are plenty of updates to the framework parsers, and MLIR support is under discussion.
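To illustrate why the non-recursive visitor matters: the idea is to replace call-stack recursion with an explicit work stack, so traversing a very deep graph no longer risks a stack overflow. The sketch below is a minimal, language-agnostic illustration of that technique, not TVM's actual implementation (the `Node` class and its `children()` accessor are hypothetical stand-ins for the real IR node types).

```python
class Node:
    """A hypothetical stand-in for an IR expression node."""
    def __init__(self, name, children=()):
        self.name = name
        self._children = list(children)

    def children(self):
        return self._children


def post_order(root, visit):
    """Visit every reachable node in post-order using an explicit stack.

    Each node is pushed twice: once unexpanded (to schedule its children)
    and once expanded (to actually visit it after its children).
    """
    stack = [(root, False)]
    seen = set()
    while stack:
        node, expanded = stack.pop()
        if expanded:
            visit(node)
        elif id(node) not in seen:
            seen.add(id(node))
            stack.append((node, True))       # revisit after children
            for child in node.children():
                stack.append((child, False))
```

Because no Python-level recursion is involved, this handles graphs far deeper than the default recursion limit, which is the same benefit the non-recursive Relay visitor (#4886) and the BYOC non-recursive mutator (#5410) bring to large models.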

IR

  • [TIR] Enhance Substitute, python bindings for Substitute/PostOrderVisit (#5400)
  • [TIR] Remove ProducerConsumer and AllocateNode::new_expr (#5333)
  • [IR][TRANSFORM] Enable CopyOnWrite for TIR passes. (#5309)
  • [NODE][IR] Introduce StructuralHash for the Unified IR. (#5160)
  • [TIR] Introduce BufferLoad/Store (#5205)
  • [NODE] General serialzation of leaf objects into bytes. (#5299)
  • [POC][IR] Initial stab at std::string->String upgrade (#5438)
  • [NODE][IR] Introduce StructuralEqual Infra for the unified IR. (#5154)
  • [TIR] Make lower_warp_memory support extent(threadIdx.x) < warp_size (#5307)
  • [TE] Support mixing normal and cross-thread reduction (#5193)
  • [TIR][PASS] dtype rewrite for indexing variables (#5092)

Arithmetic

  • [Arith] linear system and equation solver (#5171)
  • Improve IntervalSet’s floormod (#5367)

Relay

  • [BYOC] Bind constant tuples in graph partitioner (#5476)
  • [RELAY][BYOC] Add support for composite functions in BYOC (#5261)
  • [RELAY][BYOC] Register pattern tables from external codegens (#5262)
  • [BYOC] Enhance partitioning and external codegen (#5310)
  • [Relay][ADT]Static Tensor Array (#5103)
  • [BYOC] Refine AnnotateTarget and MergeCompilerRegion Passes (#5277)
  • [BYOC] Use Non-Recursive Visitor/Mutator (#5410)
  • [BYOC] Refine DNNL Codegen (#5288)
  • [RELAY] Non-recursive Graph Vistor and Rewriter (#4886)
  • [Blocksparse] Pipeline for lowering dense model to sparse-dense (#5377)
  • [BYOC] Prevent duplicate outputs in subgraph Tuple (#5320)
  • [RELAY] Re-wrote the Graph Partitioner to support multiple outputs (#5143)

Framework Support

  • [Frontend] Asymmetric padding of convolution support (#4803)

ONNX

  • [ONNX]Pool3d & upsample3d op support (#5135)
  • Add TopK to ONNX Frontend (#5441)
  • Add RoiAlign to Onnx frontend (#5454)

Torch

  • [PYTORCH]AvgPool3d, MaxPool3d and Squeeze op support (#5220)
  • [PYTORCH]celu, gelu, selu activations (#5263)
  • [Pytorch]layernorm bug fix and testcase updated (#5257)
  • [PYTORCH]LayerNorm support added (#5249)
  • [RELAY-OP][PYTORCH]GroupNorm op support added (#5358)
  • [TOPI][PYTORCH]Logical & Bitwise operator support (#5341)
  • [PYTORCH]Tensor creation ops support (#5347)
  • [RELAY][PYTORCH]cosh,sinh,log2,log10,log1p op support (#5395)
  • [PYTORCH]Rsub, Embedded, OneHot ops support (#5434)
  • [PYTORCH]Abs, Arange, Softplus ops (#5295)
  • [RELAY][PYTORCH]isNan, isinf, isfinite, ceil, clamp, round ops (#5316)
  • [PYTORCH]Activations for pytorch (#5194)
  • [PYTORCH]Repeat, Reciprocal & Reshape Op support (#5280)
  • [PYTORCH]Reduce_ops support added (#5308)
  • [PYTORCH]Take, Topk op support (#5332)
  • [PYTORCH]Dropouts And InstanceNorm support added (#5203)
  • [PYTORCH]Unary Ops frontend support. (#5378)
  • [PYTORCH]where, addcdiv, addcmul op support (#5383)
  • [Torch] Support Python list, more realistic recurrent networks (#5306)
  • [Torch] Add support for split (#5174)
  • [Frontend][Torch] Fix up graph input handling (#5204)

Tflite

  • [FRONTEND][TFLITE]Logical not op support (#5475)
  • [TFLITE]Hard Swish & MobilnetV3 model testing (#5239)
  • [FRONTEND][TFLITE]Gather, StridedSlice op support added (#4788)
  • [TFLITE] Match TFLite shape for SSD custom op (#5473)
  • Factor out import of common tflite.Operator in tflite frontend. (#5355)
  • [Frontend][TFLite] support for FILL and SPLIT_V operators (#5330)
  • [Frontend][TFLite] L2_POOL_2D operator (#5452)
  • [TFLite] Add config option to specify FlatBuffers location (#5425)

Tensorflow

  • [TENSORFLOW]reduce ops updated (#5180)
  • [FRONTEND][TENSORFLOW] Fix gather_nd indices (#5279)
  • [Frontend][TensorFlow]Improve TensorFlow Static Shape Tensor Array (#5243)

Keras

  • [KERAS]Minimum & AlphaDropout op support (#5380)
  • [KERAS]Upsample3d & ZeroPadding3d op (#5125)
  • [KERAS]Embedding layer (#5444)
  • [FRONTEND][KERAS]Max_pool3d and Averagepool3d operator support (#5085)

Caffe2

  • [RELAY][FRONTEND][CAFFE2] add Mul and ConvTranspose operator (#5302)

MXNet

  • [MXNET]DepthToSpace & SpaceToDepth Operator (#5408)
  • [MXNET]broadcast and logical op support (#5461)
  • [FRONTEND][MXNET] Use leaky by default for LeakyReLU (#5192)
  • [FRONTEND][MXNET] support elemwise logic ops (#5361)
  • [Frontend|MXNet] SwapAxis operator support (#5246)

Object and Python Frontend

  • [PYTHON] Enhance with_attr API, cleanup MakeAPILegacy in testcases (#5335)
  • [Runtime][Object] expose runtime::String to Python (#5212)
  • [PYTHON] Make IntImm more like an integer (#5232)
  • [PY][FFI] Refactor runtime.String to subclass str (#5426)
  • [RUNTIME][IR] Allow non-nullable ObjectRef, introduce Optional. (#5314)
  • [RUNTIME][OBJECT] Introduce static slots for common objects. (#5423)
  • [RUNTIME] Introduce RValue reference(move) support to TypedPackedFunc (#5271)
  • [RUNTIME] Auto conversion from str to runtime::String in PackedFUnc (#5251)
  • [RUNTIME] Improved Packed FFI for optional. (#5478)

Operator support

  • [TOPI] Using x86 schedules for ARM conv2d (#5334)
  • [TOPI-ARM] Do not alter layout if layout is NHWC (#5350)
  • [TOPI] Setting workload correctly for Depthwise Spatial conv ARM. (#5182)
  • [Relay][OP] Add fast_erf implementation (#5241)
  • [Topi] Tensorcore support for Conv3D (#5284)
  • [intrin] a few more math functions (#5468)
  • [Intrinsic] Add log1p, ldexp, atan2, hypot, nextafter, copysign (#5312)
  • [relay][topi] Add operation relay.nn.dilate() which calls topi.nn.dilate() (#5331)

Performance and AutoTVM

  • [Topi x86] Missing vectorize for depthwise conv2d. (#5196)
  • [TOPI x86] Adding unroll_kw config option for depthwise conv2d. (#5197)
  • [Runtime][Contrib] Support cudnn softmax (#5214)
  • [cuDNN] Add cuDNN grouped convolution support (#5319)
  • [Relay][Topi][AutoTVM] Winograd support for Conv3D (#5186)
  • [TOPI] Improve get_valid_count and nms performance for CUDA (#5339)
  • [Topi][Cuda]Optimizations of global_ave_pool for NHWC layout (#5450)

Backend

  • [LLVM] Do not use x86_vcvtph2ps_256 intrinsic with LLVM 11+ (#5267)
  • [LLVM] Use llvm::ElementCount with LLVM 11+ when creating vectors (#5265)
  • [LLVM] Use llvm::FunctionCallee in IRBuilder::CreateCall with LLVM 11+ (#5338)
  • [LLVM] Include Support/Host.h for declaration of getDefaultTargetTriple (#5268)
  • [LLVM] Replace calls to Type::getVectorNumElements (#5398)
  • [LLVM] Use ArrayRef in calls to CreateShuffleVector (#5399)
  • [LLVM] Use llvm::Align with LLVM 11+ to avoid warnings (#5264)

Runtime

  • [uTVM][Runtime] Introduce Virtual Memory Allocator to CRT (#5124)
  • [RUNTIME] Initial implementation of Hexagon runtime support (#5252)
  • [Hexagon] Add hexagon_posix.cc to TVM/RT sources in the right place (#5346)
  • [RUNTIME] FastRPC interface for Hexagon runtime (#5353)
  • [RUNTIME][CONTRIB] CoreML Runtime (#5283)
  • [Runtime][Relay][Cleanup] Clean up for memory pass to enable heterogenous execution support. (#5324)
  • [RUNTIME][uTVM] AutoTVM + uTVM for Cortex-M7 (#5417)
  • Windows Support for cpp_rpc (#4857)
  • [RUNTIME] Implement TVMDSOOp(TensorFlow custom op) for TVM runtime (#4459)

QNN and quantization

  • [Requantize] Cleanup and Optimize Lowering (#5286)
  • [Topi, ARM] Disbale Winograd for quantized tensors. (#5363)
  • Adding support for TFLite QnnSubtract operator. (#5230)
  • Remove developer facing api from frontend exports. (#5375)

Infra and Refactor

  • [REFACTOR][TIR] Migrate LowerTVMBuiltin, InferFragment, LowerThreadAllreduce, ThreadSync to Pass Manager (#5213)
  • [TIR][REFACTOR] Remove te::Tensor dependencies from TIR passes. (#5372)
  • [REFACTOR][TE] Inline -> te/schedule/operation_inline.h (#5386)
  • [TIR] Refactor MakePackedAPI to target dependent stage. (#5326)
  • [REFACTOR] tvm.hybrid -> te.hybrid (#5223)
  • [REFACTOR][TIR] Migrate most of low-level build to use the Pass Manager. (#5225)
  • [TIR][REFACTOR] Migrate low-level passes in tvm.lower to the Pass Manager (#5364)
  • [TIR] Migrate VTA TIR passes to the new pass manager. (#5397)
  • [REFACTOR][TIR] Migrate all low-level passes to the Pass Manager. (#5233)
  • [PY][FFI] Refactor runtime.String to subclass str (#5426)
  • [REFACTOR][TIR] Introduce ExprDeepEqual, Remove IRDeepCompare (#5206)
  • [RELAY] Remove re-exports of tvm.transform (#5337)
  • [ARITH] Remove legacy const pattern functions (#5387)
  • [REFACTOR][ARITH] Remove the legacy Simplify, migrate to Analyzer. (#5385)
  • [TIR][REFACTOR] RewriteForTensorCore -> te/schedule (#5379)
  • [RELAY] Move frontend utils (#5345)
  • [REFACTOR][IR] Move to runtime::String (#5276)
  • [REFACTOR][IR] kExternalSymbol -> kGlobalSymbol (#5211)
  • [REFACTOR][IR] Remove PrimExpr from String (#5311)
  • [Topi] Breakdown topi.cc into smaller files (#5253)
  • [Refactor] Add memoized expr translator for use by backend codegen (#5325)
  • [CodeGen] Cleanup generated code (#5424)
  • [IR][Debug] Add dump and print for debugging (NFC) (#5207)
  • Customize SI prefix in logging (#5411)
  • [TIR][REFACTOR] Remove ir_pass in favor of analysis/transform. (#5415)
  • Legalize - Use Non-recursive Rewriter. (#5296)

CI and tests

  • [CI] Fix the hexagon string (#5304)
  • [CI] Temporary disable CRT test (#5297)
  • [CI][DOCKER] Update ci-gpu to the lastest (#5469)
  • Removing older Object detection TFlite test (#5477)
  • [CI] Enable tsim and fsim for GPU build to avoid pack_lib error (#5352)
  • [CI] Update MxNet to 1.6.0 with MKL (#5240)
  • [LINT] Remove scalalint from lint deps (#5269)
  • [Rust][CI] Restore Rust CI (#5137)
  • [RFC] Pytest environment improvements (#5421)
  • [CI] Migrate Tensorflow and Tensorflow lite in CI to 2.1.0 (#5392)
  • [CI] Fix build.sh to propagate --network=host to the docker build command (#5336)
  • [CI] Include local Docker images as source for layers (#5466)
  • [TFLite Runtime] Add TFLite Runtime dependencies to CI CPU docker build (#5437)

Docs

  • [DOCS] Migrate some markdowns to rst, fix sphinx3 warnings (#5416)
  • [DOCS] Misc docs improvements (#5222)
  • [DOCS] Bring relay docs to the top-level flat view (#5343)
  • [DOCS] Reduce artifcats generated by sphinx gallery (#5208)
  • [DOCS] Use https link (#5183)
  • [DOCSTRING]missing function parameters updated (#5228)
  • [DOCS] Migrate HLS documents from md to rst (#5419)
  • [BYOC] Add example of Composite + Annotate for DNNL fused op (#5272)
  • [Tutorial, QNN] Add tutorial for loading quantized PyTorch model (#5321)
  • [Docs] VTA install doc migration from md to rst (#5442)
  • [TVM][docs] compiler version in docs (#5281)

Fixes

  • [BUGFIX] Fix CRT static test bug (#5293)
  • [RUNTIME] Quick fix PackedFunc String passing (#5266)
  • [TIR] Fix perf regression of tir refactor (#5258)
  • [BUGFIX]bugfix in tensorflow space_to_batch_nd (#5175)
  • [RUNTIME][CRT]Compilation warnings fixed for 32bit and 64bit compilation (#5349)
  • [BYOC][FIX] Fix typo in “default” (#5348)
  • [RELAY][FIX] Fix hang in MergeCompilerRegions (#5227)
  • [RELAY] Fixes to MergeCompilerRegions (#5195)
  • [LLVM] Fix generation of LLVM intrinsics (#5282)
  • Fix setting up hints for getaddrinfo (#2872)
  • [Fix] Add ConstantNode to IsAtomic (#5457)
  • [BUGFIX][IR] Fix String SEqual (#5275)
  • [FIX][VM] fix fuse over functions that are handled by external codegen (#5365)
  • [Relay] Fix memory leak when accessing NDArray (#5413)
  • [Fix] Remove the duplicate PrintIR pass in Relay (#5403)
  • [TIR] Fix lower_warp_memory (#5247)
  • [TIR] Fix lower_warp_memory when there are >1 warp buffers (#5368)
  • [External codegen] Add test cases for fused ops with manual annotation (#4741)
  • Fix intel conv2d auto tune (#5200)
  • [CodeGen][CUDA] Fix bugs (#5209)
  • Don’t remove() TemporaryFile in del. (#5414)
  • Fix test_ir_type. (#5390)
  • [Relay][Frontend][Onnx] Fix multiple identical inputs bug (#5389)
  • [Relay][Strategy] Add cuda target check to dense tensorcore schedule. (#5376)
  • Tf2 test fixups (#5391)
  • [REALY][OP] fix typo (#5315)
  • [Node] Provide guide to user who has difficulty register SEqualReduce (#5300)
  • [NDArray] Set NDArray::Container.shape_ in NDArray::FromDLPack (#5301)
  • fix miopen padding (#5433)
  • misc fixes for ROCm (#5431)
  • Create loops according to storage scope and thread hierarchies (#5190)
  • [RELAY] Partition graph codestyle fixes (#5202)
  • [TE][BuildModule] Fix import in dump pass ir (#5327)
  • docker: Drop caffe2 download progess bars (#5359)
  • [Fix][VM] Fix copy constructor (#5237)
  • [Relay][Tutorial][Fix] Fixed typo and type mismatch in relay infrastructure tutorial (#5259)
  • Corrected TVM autotuning on GPU (#5432)
  • [CODEGEN][CUDA] Fix vector load (#5226)
  • [Fontend][Pytorch] Fix translation of transpose when axis argument is as a list (#5451)
  • [TE] Minor bugfix in message_passing.cc (#5254)
  • [CODEGEN][CUDA] Fix a bug when vectorized load&store was involved for… (#5428)
  • fix to skip node not in graph. (#5238)
  • fix #5388 [RUNTIME][VULKAN] vkBuffer released before memory copy command se… (#5418)
  • [BUGFIX][RELAY]fix a minor error in device_annotation (#5291)
  • [VTA] Fix VTA compile issue (#5481)
  • [RUNTIME][CRT] scalar’s ndim is 0 (#5344)

Submodule

  • [SUBMODULE] Update dmlc-core to latest (#5401)

Contributors Who Reviewed Pull Requests

Note: The format is name (number of activities). Disclaimer: the number of activities does not directly correspond to the community's view of the significance of contributions.

tqchen (125), zhiics (52), masahi (46), tmoreau89 (29), FrozenGene (27), icemelon9 (19), yzhliu (18), ZihengJiang (17), anijain2305 (17), comaniac (17), Hzfengsy (12), MarisaKirisame (10), siju-samuel (9), u99127 (9), vegaluisjose (8), mbaret (8), jroesch (7), jwfromm (7), wpan11nv (7), vinx13 (6), kevinthesun (6), junrushao1994 (6), soiferj (5), kazum (4), liangfu (4), manupa-arm (4), Laurawly (3), wyc-ruiker (3), trevor-m (3), hzfan (3), merrymercy (2), inadob (2), shoubhik (2), cbalint13 (2), maheshambule (2), mbrookhart (2), roastduck (2), kevinyuan (2), wweic (1), apivovarov (1), hlu1 (1), Huyuwei (1), yongwww (1), cchung100m (1), weberlo (1), huajsj (1), sxjscience (1), antinucleon (1), jmorrill (1), yongfeng-nv (1), lhutton1 (1), spectrometerHBH (1), Dayananda-V (1), windclarion (1), alexwong (1), llehtahw (1), binarybana (1), JishinMaster (1), mehrdadhe (1), Shawn-Inspur (1)

Contributors Whose Pull Requests were Updated

Note: The format is name (number of activities)

tqchen (49), siju-samuel (39), liangfu (23), mbaret (13), kparzysz-quic (13), anijain2305 (12), zhiics (9), icemelon9 (9), kazum (6), roastduck (6), masahi (5), jroesch (5), kevinthesun (5), wpan11nv (5), areusch (5), comaniac (4), jwfromm (4), u99127 (4), maheshambule (4), mbrookhart (4), shoubhik (3), windclarion (3), michalpiszczek (3), yzhliu (2), tmoreau89 (2), hlu1 (2), antinucleon (2), t-vi (2), leandron (2), ANSHUMAN87 (2), jmorrill (2), yongfeng-nv (2), trevor-m (2), manupa-arm (2), hzfan (2), vinx13 (1), Laurawly (1), mshawcroft (1), vegaluisjose (1), FrozenGene (1), yongwww (1), lixiaoquan (1), inadob (1), junrushao1994 (1), wyc-ruiker (1), eric-haibin-lin (1), lhutton1 (1), dpankratz (1), LiangHao151941 (1), jjohnson-arm (1), notoraptor (1), yhcvb (1), gaurav1086 (1), adi-muresan (1), JishinMaster (1), huochaitiantang (1), n-nez (1), pratikfegade (1), SXM-inspur (1), boh-inspur (1), chinakook (1), samwyi (1), tobegit3hub (1), weireweire (1)
