As discussed with the TVM PMC, we would like to publish a monthly summary of the project so that people can get a better sense of what is going on in the community.
Feedback and suggestions are welcome so that we can further improve the report.
Community
The community welcomes new committer Jian Weng (@were) and new reviewer Josh Pollock (@joshpoll).
The forum grew healthily, with 58.9k pageviews and 2.0k user visits in the last month.
Features and Improvements
In the previous month, the community worked on improving the infrastructure, including the pass manager, feature manager, integer-set analysis and VM runtime. We improved both the coverage and performance of operators for various frameworks. For accelerator support, VTA now supports Relay and AutoTVM. We also have a Chisel implementation for VTA that runs on top of TSIM.
The community is also working on Micro-TVM (bare-metal device support), custom datatypes and higher-order differentiation.
More improvements along with details are listed below.
Compiler Improvement
- Add module support in relay.build (#3424)
- Relay pass infrastructure improvement (#3319, #3336, #3430, #3353)
- Migrate Relay passes to pass manager (#3323, #3289, #3251, #3406)
- Integer set analysis/simplifier improvement (#3272, #3463, #3464)
- Enable decorating python class to be a Relay Pass (#3364)
- Improve heterogeneous annotation by using visitor (#3261)
- Make Partial Eval support interprocedural optimization and termination check. (#3033)
- Introduce feature manager to Relay. (#3236)
- Use Relay parser to define the Relay prelude (#3043)
- Autotvm: Support override in `register_topi_compute` and `register_topi_schedule` (#3292)
- Support export ADT value in Python (#3299)
- Mechanism to detect incomplete expression match in Relay (#3203)
- Memorizing quantize node mapping to avoid duplicated simulated quantization (#3233)
- EQ/NE operators support for StringImm expressions (#3283)
- Introduce CanonicalizeCast pass to formally reduce memory overhead introduced by fused cast operations (#3280)
- Extend TensorComputeOp to allow scalar inputs (#3300)
- Enables operators to be invisible in NNVM IndexedGraph (#3290)
- Support overloading comparison operations in Relay (#3168)
Performance
- Fast tanh implementation (#3255)
- Improve multi-batch conv2d on x86 (#3308)
- Improve `non_max_suppression` and `get_valid_counts` for CPU (#3305)
- Improve `roi_align` performance for CPU (#3296)
- Improve `nms` and `get_valid_count` performance (#3282)
Operator Support
- Add TopK operator (#3256, #3362)
- Implement fast mode in take op (#3325)
- Dilation conv2d for x86 (#3308)
- Register `abs` gradient (#3447)
- Add `sequence_mask` operator and port into MXNet frontend (#3437)
User Interface and Frontend
- TFLite frontend operator support: PAD, RESIZE, MUL, Reduce (min, max, mean, prod), LOGISTIC, elemwise operators (Sub, Divide, Power, Max, Min) (#3310, #3370, #3304, #3421, #3313, #3357)
- Tensorflow frontend operator support: Abs, FloorDiv, GatherND, LeftShift, LogSoftmax, Max, Min, Mod, RightShift, ZerosLike, TruncateMod, Neg, ClipByValue, ResizeNearestNeighbor (#3270, #3211, #3393)
- Update TFLite wheel version to 1.13.1 (#3435)
- TFLite: Add fused_activation_function for ADD, SUB, MUL, DIV (#3372)
- Support bidirectional RNN layer for MXNet (#3397)
- Bumped ONNX version from 1.1.0 to 1.4.1 (#3286)
- Simplify parameter handling in Tensorflow frontend (#2993)
Language, Runtime and Hardware Support
- Chisel implementation for VTA that runs on top of TSIM (#3258, #3347)
- Port VM, VM compiler, and Object into Python (#3391)
- VM: Add AllocTensor instruction and better instruction printer (#3306)
- GraphRuntime: Enable sharing parameters of a model among multiple threads (#3384)
- Rust: load syslib modules at compile time (#3274)
- Relay Compilation + AutoTVM compatible operator libraries for VTA (#3135)
Documents, Test, and Build
- Add all parameters to from_tensorflow docs (#3321)
- Add `test_forward_ssd_mobilenet_v1` to tflite/test_forward (#3350)
- Add Azure build pipeline (#3458, #3459)
- Update ci-gpu to v0.52 (#3374)
- Enable more visible symbols by default (#3365)
- Separate out legacy as a stage in CI (#3337)
- Update documents for TSim (#3409, #3318, #3302, #3343, #3206)
- Simplify build script, remove python 2 support (#3419)
- Document improvements (#3340, #3317, #3316, #3341)
- Ignore rust cargo lock files in rat (#3314)
- Improve CUDA Conda package build (#3281)
- Improve tvm4j document describing LLVM support (#3404)
- Update CMakeLists.txt to be more flexible in finding third-party libraries (#3354)
Fixes
- Fix Error messages in tflite.py (#3320)
- Fix typos in docs and comments (#3309, #3376)
- Bugfix min/max const canonicalize rule (#3386)
- Return module from frontend for autotvm (#3401)
- Fix constant and reshape in ONNX (#3387)
- Default verilator location fix (#3324)
- Fix autodiff for conditional expression (#3453)
- Grammatical improvements to tensor_expr_get_started (#3330)
- Fix AutoTVM data structure bug (#3462)
- Fix MXNet RNN without providing state initialization as input (#3326)
- Fix flaky test on topk and quantize pass (#3362)
- Add VTA PYNQ metal_test bitstream program logic and fix compilation issue. (#3400)
- Fix VTA function Vivado Compile Error. (#3375)
- Fix VTA DRAM functionality issue. (#3278)
- Fix reshape precompute and type error in ONNX frontend (#3230)
- Fix interpreter argument conversion for tuples. (#3349)
- Fix code generation for packed functions + tuples in VM (#3287)
- Fix memory leak in Relay interpreter (#3448)
- Fix x86 depthwise conv2d `alter_op_layout` (#3264)
- Create closure object for GlobalVar (#3411)
- Fix getting global var in prelude (#3405)
- Fix rfactor bugs related to predicate and loop partition (#3382, #3444)
- Fix the bug in AutoTVM where SimulatedAnnealingOptimizer sometimes finds useless candidates (#3413)
- Fix name conflict in PartialEval (#3402)
- Fix int bound analysis bug for modular (#3288)
- Check arg positiveness for modular rules (#3279)
- Fixes failure of `sum` and `all` on `axis=0` (#3422)
- Fix package path in tflite test (#3427)
- Fix Windows build (#3429)
- Fix `LSTMBlockCell` in Tensorflow frontend (#3410)
People Who Reviewed Pull Requests:
Note: The format is name (number of activities). Disclaimer: the number of activities does not directly correspond to the community’s view about the significance of contributions.
tqchen (81), jroesch (32), zhiics (20), kevinthesun (14), MarisaKirisame (13), yzhliu (13), wweic (12), FrozenGene (12), tmoreau89 (11), vinx13 (10), eqy (10), srkreddy1238 (8), icemelon9 (8), yongwww (8), merrymercy (7), ZihengJiang (5), slyubomirsky (5), apivovarov (5), Laurawly (4), antinucleon (4), masahi (3), hlu1 (3), ajtulloch (3), vegaluisjose (3), junrushao1994 (3), liangfu (3), huajsj (2), anijain2305 (2), PariksheetPinjari909 (1), mshawcroft (1), zhreshold (1), lixiaoquan (1), sgrechanik-h (1), joshpoll (1), xqdan (1), yidawang (1), derisavi (1), weberlo (1), cbalint13 (1), gussmith23 (1), Mutinifni (1), wangshangsam (1), u99127 (1), Hzfengsy (1)
People Whose Pull Requests are Updated:
Note: The format is name (number of activities, area list).
apivovarov (16, frontend, doc), tqchen (13, compiler, ci), zhiics (11, relay), vegaluisjose (11, vta), MarisaKirisame (10, relay, autodiff), mshawcroft (10, doc, ci), icemelon9 (9, vm, community, topi, frontend), huajsj (8, vta), jroesch (6, relay, frontend, vm), hlu1 (6, topi, runtime), kevinthesun (6, topi), wweic (5, relay), yongwww (4, frontend), abergeron (4, build), Laurawly (3, topi), eqy (3, quantize, android), slyubomirsky (3, relay), ajtulloch (3, runtime), anijain2305 (3, quantize), csarofeen (3, compiler), ZihengJiang (2, quantize, compiler), merrymercy (2, autotvm, community), vinx13 (2, relay), sgrechanik-h (2, compiler), liangfu (2, vta), cclauss (2, frontend), altanh (2, compiler), henrywu2019 (2, doc), vv1133 (2, frontend), yzhliu (1, community), nhynes (1, rust), tmoreau89 (1, vta), joshpoll (1, doc), FrozenGene (1, frontend), weberlo (1, uTVM), alexeyr (1, frontend), larroy (1, doc), gussmith23 (1, compiler), jdavies-huawei (1, compiler), szha (1, doc), ptrendx (1, nnvm), Howave (1, nnvm), wangshangsam (1, frontend), yinghai (1, runtime), kaitingwang (1, autodiff), ritsu1228 (1, relay), Lyken17 (1, build), marcelotrevisani (1), u99127 (1), sxjscience (1), ttyang1018 (1)