As discussed with the TVM PMC, we would like to publish a monthly summary of the project so that people can get a better sense of what is going on in the community.
Feedback and suggestions are welcome so that we can further improve the report.
Community
Two meetups were held, one in the Bay Area and one in Shanghai. The slides are available here.
The forum continued to grow, with 103k pageviews and 3.3k user visits in the last month. The community welcomes new reviewer Logan Weber (@weberlo).
Features and Improvements
Over the past month, the community made good progress in several areas. Here are a few highlights.
- TensorCore support, with competitive performance compared to native libraries (cuBLAS, cuDNN) (#4105, #4353)
- C++ RPC server for embedded devices without a Python runtime (#4281)
- Explicitly manifest memory and tensor allocations in Relay. This is to enable future optimizations and transformations. (#3560)
- Operator performance improvement (reduction ops #4158, batch matmul #4242)
We made the first Apache release, Apache TVM (incubating) v0.6.
Compiler and VM Improvement
- Auto TensorCore CodeGen (#4234)
- add rocm codegen unittest for cross thread reduction (#4423)
- [Relay][VM] Clean up the VM and VM profiler code (#4391)
- Update compile_engine.py (#4393)
- [Relay][VM][Interpreter] Enable first-class constructors in VM and in…
- AutoTVM: selecting tuning templates when extracting task (#4338)
- Add workgroup size attribute to AMDGPU functions in codegen (#4342)
- [CodeGen] Add build config option disable_assert to control whether t…
- [Relay][Pass] Add pass to remove unused functions in relay module (#4334)
- [Codegen] remove fp16 function override for cuda (#4331)
- [NODE][REFACTOR] Rename IRFunctor->NodeFunctor, use func pointer (#4247)
- [Relay][Prelude] Add more dtypes to tensor_t (#4233)
- Implement explicit IR representation of memory allocation (#3560)
- [Relay][Pass] Avoid FoldConstant folding some ops (#4245)
- [ARITH] Fix lowering of FloorMod (#4236)
Quantization
- [QNN] Lowering for Depthwise Convolution. (#4351)
- [Relay][Quantize] Integrate data-aware calibration into quantization (#…
- [QNN][Legalize] Specialize for Platforms without any fast Int8 arithm…
- [QNN] Use Int16 upcast in Fallback Conv2D. Fix test names. (#4329)
- [QNN] Quantize - Fixing the sequence of lowering. (#4316)
Performance
- [AutoTVM] select model with the most tuned schedules (#4404)
- [Perf] Enhance cudnn and cublas backend and enable TensorCore (#4353)
- [PERF] Parallelize reduction for CPU (#4158)
- [VTA] Performance optimize, remove unnecessary contiguous memory use. (…
Operator Support
- [AutoTVM] Add batch_matmul to tunable operations (#4242)
- [TFLite] Support PRelu (#4298)
- [Relay][Frontend][Tensorflow]Add conv2d_transpose (#4300)
- Bump up CUDA log version in tophub.py (#4347)
- Solve custom model of prelu (#4326)
- Add support for quant. mul operator in tflite frontend (#4283)
- [Relay][Op][TF] Complete tensor array unstack with all ranks support (#…
- [Frontend][MxNet] support mxnet cond op (#4311)
- Add More Shape Functions (#4179)
- [TOPI][OP] Support Faster-RCNN Proposal OP on CPU (#4297)
User Interface and Frontend
- Tweak debugger result (#4426)
- Added tflite frontend support for quantized mean. (#4339)
- [Relay][Frontend][TF] Fix slice when begin or size is not Const (#4372)
- reminding message for TVM_REGISTER_NODE_TYPE (#4365)
- [Debugger] Sorting op-time breakdown for quicker analysis. (#4352)
- [Relay][Frontend][Keras] batch_norm op params not handling well (#4310)
- [TF][Relay][Op] Pass module when infer shape (#4287)
- [Relay][Frontend][ONNX] Add support for broadcasting to Where and Mat…
- [Relay][Frontend][Tensorflow] Fix GatherV2, Add StopGradient (#4238)
- Support reshape for dynamic shape in tf converter (#4185)
- [Relay][Frontend][Tensorflow] add op add_n to relay/frontend/ten…
Language, Runtime and Hardware Support
- [RUNTIME] Support C++ RPC (#4281)
- [Codegen][cuda-fp16] fallback to fp32 simulation when cuda arch < sm53 (
- rpi4b target (#4445)
- [VTA] Enable streamlined GEMM execution (#4392)
- add DeviceName to ROCm api (#4437)
- [RUNTIME] rename allocator.make -> allocator.make_object for term con…
- [RUNTIME] Move module export to the function level. (#4405)
- add GPU checking before compilation for rocm (#4394)
- [nvcc] enable multiple arch in one fatbin (#4377)
- [Frontend]Add TensorFlow FloorMod (#4308)
- proper device query through rocm api (#4305)
- [RUNTIME] Add device query for AMD GcnArch (#4341)
- [Contrib] Add MKL DNN option (#4323)
- [RUNTIME][REFACTOR] Use object protocol to support runtime::Module (#…
Documents, Test and Build
- [tutorial] Relay pass infra tutorial (#4083)
- [Relay][Frontend][TFlite] Add test for qnn_mul operator (#4395)
- [Doc] Fix broken link (#4438)
- [DOCS] Update main website to tvm.apache.org (#4429)
- [SETUP] Add optional dependencies to extras_require (#4428)
- [LICENSE] clarify the blockingqueue license, update version to 0.6.0 (#…
- [License] move cma_api to 3rdparty. separate BSD 2-clause and 3-clause (
- [LINT] Remove unnecessary copyright message for files with ASF header (…
- [Release] resolve license issues (#4408)
- [LICENSE] add 3rdparty licenses (#4402)
- [DOCS] Mention incubating in readme (#4401)
- [Golang][Doc] improve the samples and doc (#4385)
- update_document_after_repository_renamed (#4398)
- Update Jenkinsfile for external runtime (#4396)
- [TOPI] Fix flaky testcase for floor div (#4382)
- [CI] Add more info, per exec ws isolation (#4388)
- Compare all outputs in TFLite test_forward_ssd_mobilenet_v1 (#4373)
- [CI] Avoid content-length request in test data download (#4375)
- [tutorial][benchmark] nnvm -> relay (#4368)
- [Relay tests] AlterOpLayout - Temporary attr update (#4357)
- add rule for clean (#4364)
- [Test][Relay][Pass] Add test case for lambda lift (#4317)
- Add topi.nn.fifo_buffer to TVM doc (#4343)
- [CI] Set workspace to be per executor (#4336)
- Add test for the qnn_add operator (#4282)
Bugfix
- [Relay][Pass] Fix lambda lift pass for recursive call (#4432)
- fix multiple transfer issue in loaduop (#4442)
- [VTA][HotFix] Relay->VTA quantization fix (#4433)
- Allow Array/Map store objects that are not NodeRef (#4430)
- [Fix][Relay] Remove schedule register for nonexisting log1p op (#4425)
- removing nnvm dep from VTA sources (#4419)
- Fix compilation of bfloat16 on Windows (#4415)
- [Relay][Legalize] Legalize conv2d_transpose for NHWC (#4399)
People Whose Pull Requests are Updated:
Note: The format is name (number of activities, area list).
Disclaimer: the number of activities does not directly correspond to the community’s view of the significance of contributions.
anijain2305 (36), tqchen (18), sgrechanik-h (18), yzhliu (15), icemelon9 (11), FrozenGene (11), kevinthesun (8), comaniac (8), vinx13 (7), wweic (7), tmoreau89 (7), yongwww (7), cchung100m (7), liangfu (6), shoubhik (6), zhiics (5), eqy (5), t-vi (5), hcho3 (5), merrymercy (4), jroesch (4), apivovarov (4), hlu1 (4), alexgl-github (4), MarisaKirisame (3), srkreddy1238 (3), Laurawly (3), soiferj (3), jwfromm (3), petrex (3), jdavies-huawei (3), ZihengJiang (2), were (2), huajsj (2), inadob (2), zxy844288792 (2), csarofeen (2), makihiro (2), vmiheer (2), hgt312 (2), KimBioInfoStudio (2), siju-samuel (1), masahi (1), nhynes (1), Huyuwei (1), vegaluisjose (1), lixiaoquan (1), weberlo (1), junrushao1994 (1), antinucleon (1), liangdzou (1), cbalint13 (1), imorinaga (1), yuruofeifei (1), u99127 (1), kimishpatel (1), gemfield (1), Rasterer (1), tristan-arm (1), Hzfengsy (1), kice (1), jackwish (1), liaha (1), paddyhoran (1), bindog (1), jmorrill (1), mbarrett97 (1), ariwaranosai (1), ic (1), minminsun (1), lsy643 (1), tweej (1), trevor-m (1), XFPlus (1), abuccts (1), autumnqin (1), ekalda (1), jason-song-dev (1), PeikeLi (1), gittripley (1), zhuochenKIDD (1), ziyu-guo (1)
People Who Reviewed Pull Requests:
Note: The format is name (number of activities).
tqchen (139), zhiics (62), yzhliu (55), vinx13 (28), kevinthesun (26), FrozenGene (23), anijain2305 (20), tmoreau89 (20), merrymercy (18), icemelon9 (18), masahi (18), yongwww (18), jackwish (17), ZihengJiang (16), wweic (16), jroesch (14), soiferj (13), junrushao1994 (13), MarisaKirisame (12), Laurawly (8), ajtulloch (8), comaniac (8), u99127 (8), srkreddy1238 (7), kazum (7), liangfu (7), shoubhik (6), vegaluisjose (5), weberlo (5), cchung100m (5), jwfromm (5), eqy (4), apivovarov (4), slyubomirsky (4), yidawang (4), cbalint13 (4), grwlf (4), broune (4), huajsj (3), petrex (3), Huyuwei (2), antinucleon (2), xqdan (2), derisavi (2), Hzfengsy (2), reminisce (2), KimBioInfoStudio (2), minminsun (2), siju-samuel (1), nhynes (1), PariksheetPinjari909 (1), mshawcroft (1), zhreshold (1), sgrechanik-h (1), t-vi (1), hcho3 (1), adityaatluri (1), denis0x0D (1), yinghai (1), altanh (1), umangyadav (1), lly-zero-one (1), kaitingwang (1), SWu (1), TaoLv (1), ZhennanQin (1), jmorrill (1), Leo-arm (1), zhuochenKIDD (1)