TVM Monthly - July 2021

As discussed by the TVM PPMC, our goal is to provide a monthly summary of the project so users and developers can get a better understanding of the goings on of the TVM community.

Feedback and suggestions are welcomed so that we can further improve these updates.

Community

During July of 2021 we welcomed many new contributors to the project. Importantly, we welcomed @comaniac and @junrushao1994 as new PMC members! Thanks to everyone for the hard work and contributions! A lot of RFC discussion now happens in the new RFC process, including Automatic Mixed Precision Pass and Static Memory Planning. You are welcome to check them out.

This forum got 129k pageviews and 2.9k user visits in the last month.

More improvements along with details are listed below.

Relay

  • Fix type relation for batch_matmul #8376

  • Resize 1D #8346

  • Fix index order in conv2d computation for Arm CPU. #8361

  • Add support of conv2d with NHWC for Mali #8422

  • Add support of conv2d with NHWC for Bifrost #8430

  • Modify create_executor to pass params #8418

  • Batch_matmul to dense optimization #8440

  • Add ConvInteger support. #8456

  • Add RandomUniform converter and tests to onnx frontend. #8426

  • Allow importing models with malformed Loop nodes. #8475

  • Switch from CompileEngine to TECompiler in Interpreter #8486

  • Support resize in the ONNX conversion #8455

  • Fix bug in test_op_level3 #8508

  • Extend FakeQuantizationToInteger to more ops #8241

  • Change Default "opt_level" of Sequential from 2 to 0 #8634

Refactor

  • Remove dead code from depthwise_conv2d for Intel graphics #8381

  • Enforce attaching storage scope to PointerType #8366

  • Remove scope attribute from Buffer class #8463

  • Remove AttrStmt with storage_scope key #8516

  • Unify the shared pass prefix between vm and graph #8526

  • Avoid Override Generic Op Strategy in "hls.py" #8614

Docs

  • Fix for broken link in apps for wasm-standalone dir #8045

  • Add docs for Pass Instrument #8220

  • Corrected typo in googletest build instructions. #8459

  • Fix scipy docs inv #8619

  • TVM install addenda for M1 Macs #8568

TIR

  • fix storage rewrite index remap #8338

  • Bugfix for zero number arguments tir functions. #8515

  • cast disparate floating point types for binary ops #8517

  • specialize #8354

microTVM

  • Add Nucleo stm32l4r5zi board to zephyr #8386

  • Add fixture to zephyr test #8393

  • Fix Stack Size Issue for Zephyr AOT Demo on Physical Hardware #8453

  • Fix clock skew on virtualbox #8395

  • Add zephyr cortex-r5 board to Zephyr #8519

  • Set the number of cores based on the VM sizing #8624

  • Fix platform name in base-box-tool #8612

ONNX

  • Wrap 'If' if it has multiple outputs #8385

  • Parametrize ONNX Unit tests #8621

CUDA

  • dense_tensorcore/batch_matmul_tensorcore support int8/int4 #8402

  • Improve injective schedule to enable half2 #8457

  • Initial support for dynamic shared memory #8466

  • Support multiple TIR-level dynamic shared memory allocations #8571

TOPI

  • Bugfix for topi.prod #8416

  • Add support for arbitrary dtypes to CSRMV and CSRMM #8437

  • Parameterize conv2d and depthwise_conv2d tests #8433

  • minor change on assert statement in conv2d_NCHWc_int8.cuda #8554

  • Fix nn.pool*d issue with 'vectorize' function and add unit tests #8541

  • Add transpose_a/b & dynamic shape support for batch matmul #8527

  • Improve the performance of scatter_nd #8479

Frontend

  • Stridedslice and concat_v2 fix #8483

  • Added support for TensorList ops #8454

  • Check LLVM enabled/installed #8414

  • Add support for unpack with dim 0 after tensorlist stack #8558

  • Vc/pytorch lstm #8447

Fixes

  • Minimal type checking on TIR schedule #8367

  • Update the tvmc tutorial with additional requirements #8334

  • Add pass for splitting kernel with huge number of args #8313

  • Minor bugfix to arm_compute_lib build scripts #8377

  • Add support for log_softmax #8369

  • Allow multiprocessing spawn to work (on macOS llvm at least) #8363

  • Allow tvmc to compile models with AOT executor in MLF #8331

  • Support QLinearAdd from onnx runtime com.microsoft contrib ops. #8305

  • Fix np.int and np.float usage in the tree. #8389

  • Add "operator" style to Model Library Format #8072

  • macOS is now supported by TVMC #8396

  • Remove unused conversion #8397

  • Support aten::flip #8398

  • Inverse affine map #8384

  • Add Compute Library tests to Jenkins for AArch64 CI #8394

  • add aten::masked_fill_ in pytorch frontend #8403

  • Cleanup more uses of np.bool and np.int. #8399

  • Revert "Actually add Compute Library tests to the Jenkins File (#8394)" #8400

  • TECompiler: Staged refactor and removal of compile engine #7518

  • fix keras install #8391

  • Add missing annotation for requires_gpu in test_topi_dense.py #8387

  • Minor updates to pass pylint locally. #8424

  • Fix x86 dense schedule extern ops #8420

  • Fix Relay pattern rewrite #8425

  • Simplify MatchFusePattern in InverseAffineMap #8427

  • Improve XGBTuner document #8428

  • TVMScript Parser support BufferSlice indices #8408

  • Replace RuntimeError in _lookup_task with deferred error. #8421

  • fix flaky TF crop_and_resize #8431

  • Fix address and port reported by android_rpc to tracker #8405

  • Fix undefined symbols by adding library #8446

  • Extend type checking and annotation for TIR #8429

  • Add qnn batch_matmul operator #8401

  • Use PAPI to collect hardware performance counters on CPU and CUDA #7983

  • Fix cpp_rpc connection to rpc_tracker #8388

  • Minor fixes to unit tests for cudnn/vulkan targets #8462

  • Add default op attribute registration to __init__.py #8460

  • Fix auto-scheduling after 9c6658721 #8478

  • FoldScaleAxis became non-recursive #8325

  • Remove compile_engine header #8471

  • DeviceType enums match dlpack #8407

  • fix typo #8484

  • Fix the shape function of conv & Add dynamic support for conv2d nhwc #8480

  • add multi functions support in partition pass #8464

  • Fix _get_yolo_detections #8477

  • apps: microtvm: Disable CONFIG_FPU for Zephyr runtime #8055

  • Support tir.abs node in tvm script #8488

  • Allow serialization of function attrs which are strings #8485

  • Re-enable ref_input #8113

  • Fix dynamic batching when use_implicit_batch=False #8461

  • fix zero iter bug in arith #8494

  • Add missing shape functions for relay.nn operations #8489

  • Better error message for src/runtime/module.cc if function cannot be loaded. #8496

  • Update Docker CI #8193

  • Re-enabled tests and updated module hashes #8498

  • Keep CODEOWNERS file up to date. #8500

  • Rename runtime-config to executor-config and add documentation for Model Library Format #8270

  • Enable ONNX tests that needed onnxruntime 1.7.0 #8502

  • Fix #8093, Enhance Buffer Index Simplify #8204

  • Organize CodeOwners File #8512

  • Fuse, Split #8467

  • Fix script printer StructuralEqual check failure #8499

  • Add json output to profiling reports #8503

  • Fix the repetitive cast in script printing #8531

  • Fix TypeKey2Index when for root Object #8547

  • Split out libinfo.cc into a separate target. #8520

  • Mimic the TFLite 2.4 reader's behaviour #8538

  • Remove unused variable in topi cpp test #8549

  • Add explicit type cast to print. #8524

  • Specifically check handle for recursion during shutdown #8548

  • Add a --context-path for build.sh #8557

  • Handling a corner case in TRT RemoveDropout pass #8506

  • Re-enable Compute library tests. #8573

  • Fix AutoScheduler test to cover Conv2D Winograd #8539

  • Fix Coreml Input Shape Handling #8562

  • Fix task extraction with TE compiler #8560

  • add support for softmax and log_softmax with MIOpen #8543

  • Added default non-verbose to download_testdata(), pass to download() #8533

  • Disable pip cache when creating Docker images #8575

  • wasm32-standalone app repaired #8563

  • Bug fix for numpy scalar input in vm #8553

  • Reduce testing time of LSTM tests #8583

  • Prioritize discrete GPUs as device_id=0. #8588

  • speed up reference resize kernel #8592

  • Delete pytest-results as part of CI workspace preparation #8594

  • Use SizeVar instead of Var when convert Any in the GetShape function #8555

  • Fix storage_access not visiting else branch #8525

  • Reduction Factoring (RFactor) #8544

  • Support for match_buffer from subregion #8585

  • Recover rpc server support #8604

  • Add caching to CMake #8373

  • Add support for AOT in external code generation tests #8591

  • Fix global pip cache disable change #8590

  • Fix Initial Memory Misalignment #8487

  • Remove QEMU Install #8518

  • Remove unused parameter. #8580

  • Docker env for Arm® Ethos™-U55 Port #8514

  • Instruction and Trace #8615

  • Introduce --interface-api={c,packed} parameter #8280

  • Fix test_external_codegen, broken by #8591 #8630

  • Rewrote PointerValueTypeRewrite transform #8528

  • Framework for device querying for all targets. #8602

  • Add graph_executor get_input_index API. #8633

  • Disallow fp16 conversion for arange op #8644

  • Allow spaces in target attributes #8587

  • Several minor corrections to the device property query #8651

  • Fix depthwise conv2d on non-cuda GPU platforms #8379

  • Fix wrong log of tir pass VerifyMemory #8445

  • Explicitly retain __hash__ of StringImm #8449

  • Update stale relay.Module API in docs/comments #8411

  • Remove unused variable in GraphExecutorCodegen #8465

  • Compiler supports input with a slash #8481

  • Minor misspelling #8476

  • Enhance robustness of DefuseOps #8564

  • Add USE_PAPI configuration to config.cmake #8567

  • Fix a typo in include/tvm/ir/function.h #8617

  • hotfix check_grad perf regression #8581

  • Fix broadcast type func with incomplete type #8438

  • Fix the integer overflow problem of the scatter_nd op. #8415

  • do not simplify 'Any() - Any()' to 0 #8266

  • Visit each input param of the function in ExprVisitor visit_function #8521

  • Correct class number in Golang frontend sample #8511

  • fix android rpc app undefined reference problem #8530

  • fix illegal memory access bug in reduce op schedule by constraining thread_y #8566

  • Preserve IRModule type definition and imports in NameMangleExtFuncs #8523

  • Fix #8536 Get Target When Heterogeneous Execution #8537

Contributors Who Reviewed Pull Requests

Note: The format is name (number of activities)

Disclaimer: the number of activities does not directly correspond to the community’s view of the significance of contributions.

tqchen (56), comaniac (40), junrushao1994 (32), leandron (26), masahi (25), areusch (23), mbrookhart (22), jcf94 (22), tkonolige (13), vinx13 (11), jroesch (9), jwfromm (9), trevor-m (6), u99127 (6), Lunderberg (5), elvin-n (5), Mousius (5), merrymercy (4), icemelon (4), mehrdadh (4), Hzfengsy (4), csullivan (4), spectrometerHBH (4), manupa-arm (4), YuchenJin (4), MasterJH5574 (4), kparzysz-quic (3), altanh (3), giuseros (3), gromero (3), AndrewZhaoLuo (3), hogepodge (3), zhiics (2), MarisaKirisame (2), anijain2305 (2), FrozenGene (2), mbaret (2), echuraev (2), xqdan (2), wyc-ruiker (2), ZihengJiang (1), yzhliu (1), tmoreau89 (1), srkreddy1238 (1), Laurawly (1), kazum (1), wweic (1), apivovarov (1), vegaluisjose (1), lixiaoquan (1), ANSHUMAN87 (1), yongwww (1), huajsj (1), cbalint13 (1), electriclilies (1), mdw-octoml (1), zxybazh (1), rohanmukh (1), zackcquic (1), leonwanghui (1), michalpiszczek (1), zhuwenxi (1), Leo-arm (1), mshr-h (1)

Contributors Whose Pull Requests were Updated

Note: The format is name (number of activities)

tqchen (8), tkonolige (8), masahi (7), mbrookhart (7), Lunderberg (7), mehrdadh (7), u99127 (6), echuraev (6), AndrewZhaoLuo (6), comaniac (5), areusch (4), jwfromm (4), jcf94 (4), zxy844288792 (4), Beya2019 (4), YuchenJin (4), mikepapadim (4), schilkunda-amba (4), d-smirnov (3), rohanmukh (3), AnastasiaStulova (3), ZihengJiang (2), vinx13 (2), jroesch (2), leandron (2), altanh (2), csullivan (2), wyc-ruiker (2), hogepodge (2), manupa-arm (2), leeexyz (2), ganler (2), lygztq (2), vvchernov (2), chiwwang (2), kueitang (2), JoeyChou-SiMa-ai (2), melsonlai (2), siju-samuel (1), zhiics (1), trevor-m (1), apivovarov (1), were (1), huajsj (1), Hzfengsy (1), electriclilies (1), spectrometerHBH (1), ymwangg (1), Johnson9009 (1), gussmith23 (1), mdw-octoml (1), elvin-n (1), hgt312 (1), zackcquic (1), Mousius (1), wrongtest (1), MasterJH5574 (1), mvermeulen (1), zhuwenxi (1), akmaru (1), CaptainDuke (1), ekalda (1), Leo-arm (1), syang-ng (1), sunjiweiswift (1), apeskov (1), ZQPei (1), senychen (1), srinidhigoud (1), mshr-h (1), Shpionus (1), hope51607 (1), juierror (1), ya0guang (1), jinhongyii (1), karljang (1), microbuilder (1), MarioPeric-SiMa-ai (1)
