CUDA: misaligned address

I’m following this tutorial, but with my own network loaded from PyTorch. At some point it fails with the message below.

How would I begin to figure out what the offending code is? My tuning loop is:

from tvm import auto_scheduler

def run_tuning(tasks, task_weights, log_file):
    print("Begin tuning...")
    measure_ctx = auto_scheduler.LocalRPCMeasureContext(
        repeat=1, min_repeat_ms=300, timeout=10
    )
    tuner = auto_scheduler.TaskScheduler(tasks, task_weights)
    tune_option = auto_scheduler.TuningOptions(
        num_measure_trials=54000,  # ~ 900 * num tasks = 5400
        runner=measure_ctx.runner,
        measure_callbacks=[auto_scheduler.RecordToFile(log_file)],
    )
    tuner.tune(tune_option)
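
For reference, here is roughly how the tasks are extracted and the tuner is invoked. This is a simplified sketch rather than my exact script: the tiny traced model, input shape, log file name, and target string are placeholders standing in for my real PyTorch network and setup.

import torch
import tvm
from tvm import relay, auto_scheduler

# Placeholder for my real network: any traced PyTorch module works here.
model = torch.nn.Sequential(torch.nn.Conv2d(3, 16, 3), torch.nn.ReLU()).eval()
example = torch.randn(1, 3, 224, 224)
scripted = torch.jit.trace(model, example)

target = tvm.target.Target("cuda")
mod, params = relay.frontend.from_pytorch(scripted, [("input0", tuple(example.shape))])

# Extract tuning tasks from the Relay module and hand them to the tuning loop above.
tasks, task_weights = auto_scheduler.extract_tasks(mod["main"], params, target)
run_tuning(tasks, task_weights, "tuning.log")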

Seems related to this?

[18:34:51] /home/torch-mlir-user/tvm/src/runtime/cuda/cuda_device_api.cc:143: allocating 480 bytes on device, with 16601251840 bytes currently free out of 16928342016 bytes available
terminate called after throwing an instance of 'tvm::runtime::InternalError'
  what():  [18:34:51] /home/torch-mlir-user/tvm/src/runtime/cuda/cuda_device_api.cc:310: InternalError: Check failed: (e == cudaSuccess || e == cudaErrorCudartUnloading) is false: CUDA: misaligned address
Stack trace:
  0: _ZN3tvm7runtime6detail
  1: tvm::runtime::CUDATimerNode::~CUDATimerNode()
  2: _ZN3tvm7runtime18SimpleObjAlloca
  3: tvm::runtime::PackedFuncObj::Extractor<tvm::runtime::PackedFuncSubObj<tvm::runtime::profiling::WrapTimeEvaluator(tvm::runtime::PackedFunc, DLDevice, int, int, int, int, int, int, int, tvm::runtime::PackedFunc)::$_0> >::Call(tvm::runtime::PackedFuncObj const*, tvm::runtime::TVMArgs, tvm::runtime::TVMRetValue*)
  4: tvm::runtime::LocalSession::CallFunc(void*, TVMValue const*, int const*, int, std::function<void (tvm::runtime::TVMArgs)> const&)
  5: tvm::runtime::RPCSession::AsyncCallFunc(void*, TVMValue const*, int const*, int, std::function<void (tvm::runtime::RPCCode, tvm::runtime::TVMArgs)>)
  6: tvm::runtime::RPCEndpoint::EventHandler::HandleNormalCallFunc()
  7: tvm::runtime::RPCEndpoint::EventHandler::HandleProcessPacket(std::function<void (tvm::runtime::TVMArgs)>)
  8: tvm::runtime::RPCEndpoint::EventHandler::HandleNextEvent(bool, bool, std::function<void (tvm::runtime::TVMArgs)>)
  9: tvm::runtime::RPCEndpoint::HandleUntilReturnEvent(bool, std::function<void (tvm::runtime::TVMArgs)>)
  10: tvm::runtime::RPCEndpoint::ServerLoop()
  11: tvm::runtime::RPCServerLoop(int)
  12: tvm::runtime::PackedFuncObj::Extractor<tvm::runtime::PackedFuncSubObj<tvm::runtime::$_1> >::Call(tvm::runtime::PackedFuncObj const*, tvm::runtime::TVMArgs, tvm::runtime::TVMRetValue*)


Exception in thread Thread-1 (_listen_loop):
Traceback (most recent call last):
  File "/usr/lib/python3.10/threading.py", line 1016, in _bootstrap_inner
    self.run()
  File "/usr/lib/python3.10/threading.py", line 953, in run
    self._target(*self._args, **self._kwargs)
  File "/home/torch-mlir-user/tvm/python/tvm/rpc/server.py", line 279, in _listen_loop
    _serving(conn, addr, opts, load_library)
  File "/home/torch-mlir-user/tvm/python/tvm/rpc/server.py", line 168, in _serving
    raise RuntimeError(
RuntimeError: Child process 9713 exited unsuccessfully with error code -6

I’m experiencing the same issue. Is there a solution available?

I checked related blog posts, and I think the problem may be caused by measuring the hardware time of candidate schedules. Some of the generated schedules may be illegal for CUDA, which would explain why I can sometimes avoid the issue and run successfully. I tried changing the CUDA version, but it did not help. I suspect it is a code version issue: after Ansor was added to TVM, much of the surrounding code was updated as well, which may affect Ansor's runtime behavior.
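
If individual generated schedules really are the culprit, one way to narrow it down is to replay the records from the tuning log outside the RPC measurement loop and run each built kernel directly, so the offending schedule surfaces in a normal Python traceback instead of killing the measurement child process. This is only a rough sketch, not verified against this exact setup; the log file name and target are placeholders.

import numpy as np
import tvm
from tvm import auto_scheduler
from tvm.topi.utils import get_const_tuple

target = tvm.target.Target("cuda")  # placeholder, use the actual tuning target
dev = tvm.cuda(0)

for i, (inp, res) in enumerate(auto_scheduler.load_records("tuning.log")):
    task = inp.task
    # Rebuild the schedule that was measured for this record.
    sch, args = task.compute_dag.apply_steps_from_state(inp.state)
    print(f"record {i}: workload {task.workload_key}")
    func = tvm.build(sch, args, target)
    # Feed random buffers of the right shape/dtype and force execution;
    # the last record printed before a crash is the suspect schedule.
    bufs = [
        tvm.nd.array(
            np.random.uniform(size=get_const_tuple(a.shape)).astype(a.dtype), dev
        )
        for a in args
    ]
    func(*bufs)
    dev.sync()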