TVM Monthly - September 2019

As discussed with the TVM PMC, we would like to give a monthly summary of the project so people can get a better sense of what is going on in the community.

Feedback and suggestions are welcome so that we can further improve the report.

Community

The community welcomes four new reviewers: Luis Vega (@vegaluisjose), Balint Cristian (@cbalint13), Yong Wu (@yongwww), and Animesh Jain (@anijain2305).

The forum grew healthily, with 65.5k pageviews and 2.2k user visits in the last month.

Features and Improvements

Over the past month, the community has been working on improving the compiler infrastructure, expanding quantization support, optimizing operators, supporting more data types and backends, and improving runtime and accelerator performance. A few highlights:

  • With the introduction of FloorDiv/Mod, TruncDiv/Mod, and IndexDiv/Mod, TVM can perform more simplification on index calculations.
  • AVX512VNNI intrinsics are now supported in TOPI.
  • A new minimal runtime implementation (~12kb .text on ARMv7/x86) was added for TVM.
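To see why distinguishing floor and truncated division matters for index simplification, here is a minimal sketch in plain Python (not the TVM API): the two conventions disagree on negative operands, and only once the semantics are pinned down can a simplifier safely cancel tiling arithmetic.

```python
# Floor vs. truncated division semantics (plain Python, not the TVM API).
# TVM's FloorDiv/Mod and TruncDiv/Mod correspond to these two conventions.

def floordiv(a, b):
    # Rounds toward negative infinity (Python's // operator).
    return a // b

def truncdiv(a, b):
    # Rounds toward zero (C semantics).
    q = abs(a) // abs(b)
    return q if (a >= 0) == (b >= 0) else -q

# The two agree on non-negative operands...
assert floordiv(7, 4) == truncdiv(7, 4) == 1
# ...but disagree when the dividend is negative:
assert floordiv(-7, 4) == -2
assert truncdiv(-7, 4) == -1

# With floor semantics, identities such as (x*b + r) // b == x hold for
# every integer x whenever 0 <= r < b, which lets an index simplifier
# cancel tiling arithmetic like (i*4 + j) // 4 -> i.
for x in range(-8, 8):
    for r in range(4):
        assert floordiv(x * 4 + r, 4) == x
```

With truncated division the last identity fails for negative `x`, which is why a simplifier that only knows a generic "div" must be conservative.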

More improvements along with details are listed below.

Compiler Improvement

  • Add Tuple pattern (#3596)
  • Text format support for ADTs and prelude (#3863, #3939)
  • Make Type Relation catch more errors (#3899, #3699)
  • Refactor the way we interface between different modules of Relay (#3906)
  • Add new IR pass CombineParallelDense (#3862)
  • Add support for the EQ op in deduce bound and loop partitioning (#3775)
  • Introduce base-class IRMutatorWithAnalyzer (#3969)
  • Introduce FloorDiv/Mod, TruncDiv/Mod, and IndexDiv/Mod for better arithmetic simplification (#3976, #3986, #4000, #4014, #4008, #4028)

Operator Support and Improvement

  • Add gradient operators (#3857, #3894, #3901, #3915)
  • New TOPI operators: erf, logical_and, logical_or, logical_not, isnan (#3702, #3929, #3979)
  • Improve ceil_divide in tile/split (#3842)
  • Parallelize batch axis for ARM (#3931)
  • Support cuBLAS BatchMatMul (#3936)
  • Refactoring x86 conv2d_NCHWc (#3944)
  • Add AVX512VNNI support for TVM (#3388)
  • Enhance tuning space of split (#3949)
  • Enable miopen transpose convolution and fp16 support (#3952)
  • Improve conv2d_transpose schedule on X86 and CUDA (#3948)
  • Add AutoTVM template for conv2d Intel int8 (#3955)
  • Add AutoTVM template for dense on CUDA (#3923)
  • Add AutoTVM template for conv2d on Intel graphics (#3839)
  • Enable miopen Group Convolution (#3987)
  • Introduce schedule_injective_from_existing and unify external schedules for all targets (#3983)
  • Expose llvm.nearbyint intrinsic (#4001)

Quantization

  • Requantize: Optimize lowering for some corner cases. (#3864)
  • New quantized operator support: conv2d, add, dense (#3580, #3736, #3896, #3910)
  • Do type checking for the input and kernel in the qnn conv2d (#3904)
  • Legalize and AlterOpLayout for Intel int8. (#3961)
  • Renaming tests to follow the Relay nomenclature. (#3975)
  • Fix padding changes due to #3739 (#3989)

User Interface and Frontend

  • ONNX new operator support: And, Tile, Erf (#3878, #3941, #3988)
  • MXNet new operator support: pad, conv1d, deconv1d (#3739)
  • TFLite new operator support: batch_to_space_nd, space_to_batch_nd, tanh, greater, relu (#3850, #3996, #3963, #4022)
  • TFLite: Support depthwise convolution multiplier greater than 1 (#3922)
  • Keras: Fix a missed case in the ReLU converter (#3917)
  • Keras: Frontend upsample and 1-channel conv2d fixes (#3937)
  • Tensorflow: Convert scalar Const into tvm.relay.const (#3885)
  • TensorFlow: Add support for SquaredDifference (#3930)
  • Darknet: Fix a failure when parsing Darknet ResNeXt models (#3778)

Language, Runtime and Hardware Support

  • Add OpenOCD Low-Level Device (RISC-V Support) (#3756)
  • Minimal runtime (~12kb .text on ARMv7/x86) for subset of TVM models (#3567)
  • Vulkan runtime reimplementation (stream approach) (#3849)
  • Add wave32 bitcode for the AMD ROCm backend (#3984)

Accelerator and Microcontroller Support

  • Rename USE_TSIM macro to USE_VTA64 and clean up the runtime (#3872)
  • Fix TSIM compile error in Linux (add missing -fPIC flag) (#3876)
  • Add scalafmt and format existing scala codebase (#3880)
  • Add ISA BitPat generation (#3891)
  • Add de10-nano driver (#3394)
  • Extend vision model compilation coverage for VTA (#3740)
  • Conv2d transpose (deconvolution) operator support (#3777)
  • Support TLPP in function simulator. (#3555)
  • Hotfix for de10-nano (#3918)
  • RPC path update. (#3924)

Documents, Test, and Build

  • Fix doc rendering (#3897)
  • Use pytest instead of nosetest (#3524)
  • Enable NHWC of relay.testing.mobilenet (#3886)
  • Add .hsaco save/load for tensor_expr tutorial (#3852)
  • Support LLVM trunk (#3907)
  • Remove GTest cmake flag from install docs (#3953)
  • Allow USE_LLVM to take extra arguments (#3954)
  • Add docs for analysis namespace (#3985)
  • Add test script starter command to document (#3993)
  • Add type solver unit tests for unifying quantified funcs (#3947)
  • Change Vivado install instructions to version 2018.3 (#4003)
  • Add a link to the defining network description of auto-tuning tutorial (#4023)
  • Additional MXNet Convolution and Deconvolution tests (#4026)
  • Add support for checking whether an attribute is present without retrieving its value (#3957)

Fixes

  • Fix parser for cast. (#3873)
  • Fix operator fusion for multiple output (#3871)
  • Remove extern C wrapper for cuBLAS (#3877)
  • Fix int32 range overflow by using int64 (#3870)
  • Remove duplicate resize (#3902)
  • Fix BLAS cmake for macOS (#3898)
  • Add another MKL name alias for MKL installed through pypi (#3853)
  • Numpy compatible dtype inference for tvm.convert and tvm.const (#3861)
  • Remove incorrect check for LLVM in C codegen test (#3921)
  • Fix exponential blowup in interpreter (#3559)
  • Fix CUDA int8x4 vectorize (#3928)
  • Make buffer auto broadcast independent of the order of input args (#3956)
  • Fix benchmark layout in graph tuner (#3926)
  • Fix Android Demo LLVM version (#3962)
  • Cast filepath arguments to string (#3968)
  • Fix the “common” sub-crate using nightly and master (#3965)
  • Make tensorize work; these changes also fix a previously broken test (#3981)
  • Remove FLOP computation when calling 3rd party library (#4005)
  • Use a more intuitive way to limit the number of ops in a group (#4018)
  • Add more pad_mode support for onnx converter (#4029)
  • Impose a max op limit to the op fusion pass (#4002)
  • Fix issue with C++ enums (#4019)
  • Int64 shape handling for outputs. (#4031)

People Who Reviewed Pull Requests:

Note: the format is name (number of activities). Disclaimer: the number of activities does not directly correspond to the community’s view of the significance of contributions.

tqchen (72), zhiics (28), tmoreau89 (25), jroesch (24), yzhliu (22), vinx13 (21), kevinthesun (19), icemelon9 (17), anijain2305 (16), MarisaKirisame (14), FrozenGene (10), wweic (8), slyubomirsky (8), weberlo (8), jackwish (8), merrymercy (7), masahi (7), liangfu (7), junrushao1994 (7), cchung100m (7), vegaluisjose (6), yongwww (6), srkreddy1238 (5), SWu (5), ajtulloch (4), u99127 (4), yinghai (4), Huyuwei (3), apivovarov (3), were (3), soiferj (3), jwfromm (3), comaniac (3), nhynes (2), yidawang (2), kimishpatel (2), umangyadav (2), ZihengJiang (1), Laurawly (1), eqy (1), kazum (1), mshawcroft (1), hlu1 (1), lixiaoquan (1), sgrechanik-h (1), sxjscience (1), antinucleon (1), huajsj (1), alexeyr (1), TaoLv (1), kparzysz-quic (1)

People Whose Pull Requests are Updated:

Note: the format is name (number of activities, area list).

MarisaKirisame (25, relay), tqchen (12, arith), anijain2305 (12, quantization, frontend), FrozenGene (11, frontend), cchung100m (9, frontend), kevinthesun (7, topi), soiferj (7, frontend, topi), kimishpatel (6, relay), icemelon9 (5, relay, topi), tmoreau89 (5, vta), vegaluisjose (5, vta), shoubhik (5, quantization), ajtulloch (4, runtime), liangfu (4, vta), comaniac (4, topi), petrex (4, topi), inadob (4, frontend), yzhliu (3, build, topi), zhiics (3, relay), weberlo (3, relay), umangyadav (3, tvm), jackwish (3, arith), Laurawly (2), slyubomirsky (2), abergeron (2), yidawang (2), junrushao1994 (2), huajsj (2), jwfromm (2), Mutinifni (2), kice (2), alexgl-github (2), paddyhoran (2), SWu (2), llehtahw (2), yongsun (2), merrymercy (1), nhynes (1), jroesch (1), Huyuwei (1), lixiaoquan (1), yongwww (1), sxjscience (1), cbalint13 (1), eric-haibin-lin (1), cowanmeg (1), gemfield (1), gussmith23 (1), Lyken17 (1), flip1995 (1), kparzysz-quic (1), golunovas (1), hzfan (1), hgt312 (1), binarybana (1), jianyuh (1), KeDengMS (1), standbyme (1), bindog (1), brettkoonce (1), egolearner (1), vmiheer (1), ndl (1), ZQPei (1), youluexx (1)
