TVM RPC will fail when allocating large arrays on an Android phone

Hi, I want to deploy the BERT-base model on an Android phone. One of its parameters has shape (30522, 768) with dtype float32, and the RPC connection is reset every time I try to allocate this array on the remote device:

for pk, pv in params.items():
    print(pv.shape, pv.dtype)
    weights[pk] = tvm.nd.array((np.random.uniform(size=pv.shape)).astype(pv.dtype), ctx=ctx)
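For scale, a quick back-of-the-envelope calculation (plain Python, nothing TVM-specific) of how large this single parameter is:

```python
# Size of the bert-base-uncased word-embedding table: shape (30522, 768), float32.
rows, cols = 30522, 768
itemsize = 4  # bytes per float32 element
nbytes = rows * cols * itemsize
print(f"{nbytes / 1e6:.1f} MB")  # -> 93.8 MB
```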

The error message:

Traceback (most recent call last):
  File "tune_network_x86.py", line 483, in <module>
    tune_network()
  File "tune_network_x86.py", line 423, in tune_network
    weights[pk] = tvm.nd.array((np.random.uniform(size=pv.shape)).astype(pv.dtype), ctx=ctx)
  File "/home/zyx/workspaces/python/tvm0.8_v2/python/tvm/runtime/ndarray.py", line 516, in array
    return empty(arr.shape, arr.dtype, ctx).copyfrom(arr)
  File "/home/zyx/workspaces/python/tvm0.8_v2/python/tvm/runtime/ndarray.py", line 154, in copyfrom
    check_call(_LIB.TVMArrayCopyFromBytes(self.handle, data, nbytes))
  File "/home/zyx/workspaces/python/tvm0.8_v2/python/tvm/_ffi/base.py", line 344, in check_call
    raise get_last_ffi_error()
tvm._ffi.base.TVMError: Traceback (most recent call last):
  [bt] (6) /home/zyx/workspaces/python/tvm0.8_v2/build/libtvm.so(TVMArrayCopyFromBytes+0xe) [0x7f097dcf53ae]
  [bt] (5) /home/zyx/workspaces/python/tvm0.8_v2/build/libtvm.so(tvm::runtime::ArrayCopyFromBytes(DLTensor*, void const*, unsigned long)+0x2c9) [0x7f097dcf52e9]
  [bt] (4) /home/zyx/workspaces/python/tvm0.8_v2/build/libtvm.so(tvm::runtime::RPCDeviceAPI::CopyDataFromTo(void const*, unsigned long, void*, unsigned long, unsigned long, DLContext, DLContext, DLDataType, void*)+0x346) [0x7f097dd265b6]
  [bt] (3) /home/zyx/workspaces/python/tvm0.8_v2/build/libtvm.so(tvm::runtime::RPCEndpoint::CopyToRemote(void*, unsigned long, void*, unsigned long, unsigned long, DLContext, DLDataType)+0x75d) [0x7f097dd2a4cd]
  [bt] (2) /home/zyx/workspaces/python/tvm0.8_v2/build/libtvm.so(tvm::runtime::RPCEndpoint::HandleUntilReturnEvent(bool, std::function<void (tvm::runtime::TVMArgs)>)+0x1a5) [0x7f097dd28955]
  [bt] (1) /home/zyx/workspaces/python/tvm0.8_v2/build/libtvm.so(tvm::runtime::SockChannel::Send(void const*, unsigned long)+0xb8) [0x7f097dd490b8]
  [bt] (0) /home/zyx/workspaces/python/tvm0.8_v2/build/libtvm.so(+0x1bc2838) [0x7f097dd44838]
  File "/home/zyx/workspaces/python/tvm0.8_v2/src/runtime/rpc/../../support/socket.h", line 360
TVMError: Socket SockChannel::Send Error: Connection reset by peer

The BERT model was imported from PyTorch:

            model_class = transformers.BertModel
            tokenizer_class = transformers.BertTokenizer

            # Better to download them manually:
            #   https://s3.amazonaws.com/models.huggingface.co/bert/bert-base-uncased-pytorch_model.bin
            #   https://s3.amazonaws.com/models.huggingface.co/bert/bert-base-uncased-vocab.txt
            #   https://s3.amazonaws.com/models.huggingface.co/bert/bert-base-uncased-config.json
            # Then rename to pytorch_model.bin, vocab.txt & config.json
            # weight = 'path to downloaded model dir'
            weight = '/home/zyx/.torch/hub/bert-base-uncased'
            model = model_class.from_pretrained(weight)
            model = ModelWrapper(model)
            model.eval()

            # tokenizer = tokenizer_class.from_pretrained(weight)
            # A = torch.tensor([tokenizer.encode("Here is some text to encode", add_special_tokens=True)])
            # There are 30,522 words in bert-base-uncased's vocabulary
            input_shape = [batch_size, 128]
            input_name = 'input_ids'
            input_dtype = 'int64'
            A = torch.randint(30000, input_shape)
            scripted_model = torch.jit.trace(model, [A])
            shape_list = [('input_ids', input_shape)]
            mod, params = relay.frontend.from_pytorch(scripted_model, shape_list)
            mod = optimize_bert(mod, params)

The `optimize_bert` function applies the following passes:

    new_mod = FastSoftmax(mod)
    new_mod = ShapeConstDedup(new_mod)
    new_mod = tvm.relay.transform.EliminateCommonSubexpr()(new_mod)
    BindPass = tvm.relay.transform.function_pass(lambda fn, new_mod, ctx:
            tvm.relay.build_module.bind_params_by_name(fn, params), opt_level=1)
    new_mod = BindPass(new_mod)
    new_mod = tvm.relay.transform.FoldConstant()(new_mod)
    new_mod = tvm.relay.transform.CombineParallelBatchMatmul()(new_mod)
    # new_mod = tvm.relay.transform._ffi_api.BatchMatmulWeightTranspose()(new_mod)
    new_mod = tvm.relay.transform.FoldConstant()(new_mod)
    ret_list.append(new_mod)

I also applied the fix from [RPC][BUGFIX][BACKPORT-0.6] Fix bug in rpc ring buffer shrink by tqchen · Pull Request #5516 · apache/tvm · GitHub to ring_buffer.h, but it didn't help.

It seems that the copy fails whenever the transferred array exceeds roughly 400 MB.
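The array used in the test loop below is well past that point (again just host-side arithmetic, nothing TVM-specific):

```python
# Size of the (30522, 7680) float32 test array used in the allocation loop.
rows, cols = 30522, 7680
nbytes = rows * cols * 4  # 4 bytes per float32 element
print(f"{nbytes / 1e6:.1f} MB")  # -> 937.6 MB, far above the ~400 MB failure point
```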

Traceback (most recent call last):
  File "tune_network_x86.py", line 492, in <module>
    tune_network()
  File "tune_network_x86.py", line 423, in tune_network
    tmp = tvm.nd.array(np.random.uniform(size=(30522, 7680)).astype(np.float32), ctx)
  File "/home/zyx/workspaces/python/tvm0.8_v3/python/tvm/runtime/ndarray.py", line 513, in array
    return empty(arr.shape, arr.dtype, device).copyfrom(arr)
  File "/home/zyx/workspaces/python/tvm0.8_v3/python/tvm/runtime/ndarray.py", line 152, in copyfrom
    check_call(_LIB.TVMArrayCopyFromBytes(self.handle, data, nbytes))
  File "/home/zyx/workspaces/python/tvm0.8_v3/python/tvm/_ffi/base.py", line 346, in check_call
    raise get_last_ffi_error()
tvm._ffi.base.TVMError: Traceback (most recent call last):
  31: 0xffffffffffffffff
  30: _start
  29: __libc_start_main
  28: Py_BytesMain
  27: Py_RunMain
  26: PyRun_SimpleFileExFlags
  25: PyRun_FileExFlags
  24: 0x000000000067d61e
  23: 0x000000000067d5a0
  22: PyEval_EvalCode
  21: _PyEval_EvalCodeWithName
  20: _PyEval_EvalFrameDefault
  19: _PyFunction_Vectorcall
  18: _PyEval_EvalCodeWithName
  17: _PyEval_EvalFrameDefault
  16: _PyFunction_Vectorcall
  15: _PyEval_EvalCodeWithName
  14: _PyEval_EvalFrameDefault
  13: _PyFunction_Vectorcall
  12: _PyEval_EvalFrameDefault
  11: _PyObject_MakeTpCall
  10: 0x00007f19e8e4d7df
  9: _ctypes_callproc
  8: 0x00007f19e9d82409
  7: 0x00007f19e9d82ff4
  6: TVMArrayCopyFromBytes
  5: tvm::runtime::ArrayCopyFromBytes(DLTensor*, void const*, unsigned long)
  4: tvm::runtime::RPCDeviceAPI::CopyDataFromTo(DLTensor*, DLTensor*, void*)
  3: tvm::runtime::RPCEndpoint::CopyToRemote(void*, DLTensor*, unsigned long)
  2: tvm::runtime::RPCEndpoint::HandleUntilReturnEvent(bool, std::function<void (tvm::runtime::TVMArgs)>)
  1: tvm::runtime::SockChannel::Send(void const*, unsigned long)
  0: tvm::support::Socket::Error(char const*)
  File "/home/zyx/workspaces/python/tvm0.8_v3/src/runtime/rpc/../../support/socket.h", line 360

The last error message was generated using the latest version of TVM and the following code:

    local_demo = TARGET != "android"
    if local_demo:
        remote = rpc.LocalSession()
    else:
        tracker_host = os.environ.get("TVM_TRACKER_HOST", "192.168.1.103")
        tracker_port = int(os.environ.get("TVM_TRACKER_PORT", 9196))
        key = "huawei"
        tracker = rpc.connect_tracker(tracker_host, tracker_port)
        # When running a heavy model, we should increase the `session_timeout`
        remote = tracker.request(key, priority=0, session_timeout=1000)
    weights = {}
    ctx = remote.cpu(0) if TARGET != "cuda" else remote.gpu(0)
    MB = 0
    from functools import reduce
    # returns the number of elements in millions; multiplied by 4 below
    # to convert to MB for float32 data
    byte = lambda x: reduce(lambda a, b: a * b, x) / 1e6
    for _ in range(100):
        tmp = tvm.nd.array(np.random.uniform(size=(30522, 7680)).astype(np.float32), ctx)
        MB += byte((30522, 7680)) * 4
        print(tmp.shape)
        print("MB:", MB)
    exit()

With the code above, I cannot allocate even a single array of this size on the remote device.
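One workaround I'm considering (only a sketch under my own assumptions, not a confirmed TVM recipe) is to split the weight into row chunks that each stay under the failure threshold and copy them over RPC one at a time. The chunking arithmetic alone, with no TVM dependency, would look like this:

```python
# Hypothetical workaround sketch: copy a large weight in row chunks, each
# staying under a conservative per-transfer cap, since transfers above
# roughly 400 MB reset the RPC socket in my setup.
shape = (30522, 7680)            # the failing array from the test loop
itemsize = 4                     # bytes per float32 element
limit_bytes = 256 * 1024 * 1024  # conservative cap, well under ~400 MB

row_bytes = shape[1] * itemsize
rows_per_chunk = max(1, limit_bytes // row_bytes)

chunks = []
start = 0
while start < shape[0]:
    stop = min(start + rows_per_chunk, shape[0])
    chunks.append((start, stop))
    start = stop

# Each (start, stop) row slice would then be uploaded separately, e.g. as
#   tvm.nd.array(host_array[start:stop], ctx)
# and reassembled into the full tensor on the remote side.
print(len(chunks), rows_per_chunk)  # -> 4 8738
```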