Hello,
I’m using TVM to tune models through the Android RPC Java app. On some of my phones everything works and the tuning process completes, but other phones crash and reboot during tuning. I would like to debug this further, but I don’t know how to dig deeper into TVM. I grabbed some of the “reboot logs” from /sys/fs/pstore/console-ramoops-0, and I can see the following entries:
[20727.518618] [2022-03-15 22:00:52 GMT+1] QCOM-STEPCHG: handle_vbatt_limit: FCC=550000, vbat=4391010
[20731.324307] [2022-03-15 22:00:56 GMT+1] kgsl kgsl-3d0: kgsl: possible gpu syncpoint deadlock for context 2 timestamp 0
[20731.324355] [2022-03-15 22:00:56 GMT+1] kgsl kgsl-3d0: context[2]: queue=228798, submit=228792, start=228792, retire=228792
[20731.324380] [2022-03-15 22:00:56 GMT+1] kgsl kgsl-3d0: possible deadlock. Context 2 might be blocked for itself
[20731.324415] [2022-03-15 22:00:56 GMT+1] kgsl kgsl-3d0: context[2]: submit times: 4.274 1.646 134.329 110.266 136.517 19.123 16.209
[20731.324440] [2022-03-15 22:00:56 GMT+1] kgsl kgsl-3d0: pending events:
[20731.324464] [2022-03-15 22:00:56 GMT+1] kgsl kgsl-3d0: [0] FENCE kgsl-timeline kgsl-3d0_11-ndroid.systemui(230: 204186
[20731.324489] [2022-03-15 22:00:56 GMT+1] kgsl kgsl-3d0: [0] FENCE kgsl-timeline kgsl-3d0_16-ache.tvm.tvmrpc(658: 220422
[20731.324512] [2022-03-15 22:00:56 GMT+1] kgsl kgsl-3d0: --gpu syncpoint deadlock print end--
(...)
[22587.836130] [2022-03-15 22:31:52 GMT+1] kgsl kgsl-3d0: kgsl: possible gpu syncpoint deadlock for context 2 timestamp 0
[22587.836180] [2022-03-15 22:31:52 GMT+1] kgsl kgsl-3d0: context[2]: queue=229488, submit=229482, start=229482, retire=229482
[22587.836209] [2022-03-15 22:31:52 GMT+1] kgsl kgsl-3d0: possible deadlock. Context 2 might be blocked for itself
[22587.836246] [2022-03-15 22:31:52 GMT+1] kgsl kgsl-3d0: context[2]: submit times: 30.44 31.870 26.725 4.277 1.645 39.308 54.989
[22587.836270] [2022-03-15 22:31:52 GMT+1] kgsl kgsl-3d0: pending events:
[22587.836293] [2022-03-15 22:31:52 GMT+1] kgsl kgsl-3d0: [0] FENCE kgsl-timeline kgsl-3d0_11-ndroid.systemui(230: 205356
[22587.836315] [2022-03-15 22:31:52 GMT+1] kgsl kgsl-3d0: --gpu syncpoint deadlock print end--
(...)
[22639.548412] [2022-03-15 22:32:44 GMT+1] kgsl kgsl-3d0: |adreno_drawctxt_detach| Wait for global ctx=18 ts=14523 type=2 error=-110
[22639.548645] [2022-03-15 22:32:44 GMT+1] kgsl kgsl-3d0: mrpc:RPCProcess[7685]: gpu fault ctx 18 ctx_type CL ts 159 status 00E61015 rb 02c2/052b ib1 00000007FF6EF000/0000 ib2 00000007FFE51DEC/0000
[22639.548701] [2022-03-15 22:32:44 GMT+1] kgsl kgsl-3d0: mrpc:RPCProcess[7685]: gpu fault rb 2 rb sw r/w 02c2/052b
[22639.560822] [2022-03-15 22:32:44 GMT+1] kgsl kgsl-3d0: |kgsl_iommu_fault_handler| GPU PAGE FAULT: addr = 7FF943A00 pid= 0 name=unknown
[22639.560863] [2022-03-15 22:32:44 GMT+1] kgsl kgsl-3d0: |kgsl_iommu_fault_handler| context=gfx3d_user TTBR0=0x1d0001492e4000 CIDR=0x1e05 (write translation fault)
[22639.645852] [2022-03-15 22:32:44 GMT+1] kgsl: kgsl_snapshot_push_object: snapshot: Can't find entry for 0x00000007FF6EF000
[22639.647893] [2022-03-15 22:32:44 GMT+1] kgsl: kgsl_snapshot_push_object: snapshot: Can't find entry for 0x00000007FFE51DEC
[22639.647910] [2022-03-15 22:32:44 GMT+1] kgsl kgsl-3d0: |kgsl_device_snapshot| GPU snapshot created at pa 0x00000001ec400000++0xdfb20
[22639.648111] [2022-03-15 22:32:44 GMT+1] kgsl kgsl-3d0: |kgsl_snapshot_save_frozen_objs| snapshot: Active IB1:00000007ff6ef000 not dumped
[22644.652133] [2022-03-15 22:32:49 GMT+1] platform 506d000.qcom,rgmu: RGMU CX gdsc off timeout
[22646.652107] [2022-03-15 22:32:51 GMT+1] kgsl kgsl-3d0: CP initialization failed to idle
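
For anyone trying to narrow down a similar dump, here is a minimal sketch of how the GPU-related lines could be filtered out. It assumes the log has already been copied off the device to a local text file. The function names and the keyword list are hypothetical choices picked by hand from the messages above. They are not defined by TVM or by the kgsl driver.

```python
import re

# Matches lines of the form seen in the dump above:
# [22639.548645] [2022-03-15 22:32:44 GMT+1] kgsl kgsl-3d0: ... gpu fault ctx 18 ...
LINE_RE = re.compile(
    r"^\[\s*(?P<uptime>\d+\.\d+)\]\s+\[(?P<wall>[^\]]+)\]\s+(?P<msg>.*)$"
)

# Hand-picked substrings (an assumption, not an official list of kgsl errors).
KEYWORDS = (
    "gpu fault",
    "GPU PAGE FAULT",
    "syncpoint deadlock for context",
    "failed to idle",
)


def parse_line(line):
    """Split one log line into (uptime_seconds, wall_clock, message), or None."""
    m = LINE_RE.match(line.strip())
    if m is None:
        return None  # e.g. the "(...)" elision markers
    return float(m.group("uptime")), m.group("wall"), m.group("msg")


def gpu_events(lines):
    """Return (uptime_seconds, message) for kgsl lines that contain a keyword."""
    events = []
    for line in lines:
        parsed = parse_line(line)
        if parsed is None:
            continue
        uptime, _wall, msg = parsed
        if "kgsl" in msg and any(k in msg for k in KEYWORDS):
            events.append((uptime, msg))
    return events
```

To use it, open the copied file and pass the file object (or any iterable of lines) to `gpu_events`, then print the returned pairs. This gives a short timeline of the deadlock warnings, the fault attributed to the RPC process, and the final "failed to idle" message.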