I’m following this tutorial, but with my own network loaded from PyTorch. At some point tuning fails with the message below.
How would I begin to figure out what the offending code is? My tuning loop is:
def run_tuning(tasks, task_weights, log_file):
    print("Begin tuning...")
    measure_ctx = auto_scheduler.LocalRPCMeasureContext(
        repeat=1, min_repeat_ms=300, timeout=10
    )
    tuner = auto_scheduler.TaskScheduler(tasks, task_weights)
    tune_option = auto_scheduler.TuningOptions(
        num_measure_trials=54000,  # ~ 900 * num tasks = 5400
        runner=measure_ctx.runner,
        measure_callbacks=[auto_scheduler.RecordToFile(log_file)],
    )
    tuner.tune(tune_option)
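I was thinking of narrowing it down by tuning each extracted task on its own with a small trial budget and recording which ones crash. A pure-Python sketch of that isolation loop (here `tune_single` is a placeholder for whatever wraps a per-task `auto_scheduler.TaskScheduler([task], [1.0]).tune(...)` call, ideally in a subprocess so a CUDA abort can't take down the parent process):

```python
def find_failing_tasks(tasks, tune_single):
    """Run each task's tuner in isolation and collect the ones that fail.

    tune_single(task) should attempt a short tuning run for a single task
    and raise (or re-raise the child process's failure) if it crashes.
    Returns a list of (task_index, exception) pairs.
    """
    failing = []
    for i, task in enumerate(tasks):
        try:
            tune_single(task)
        except Exception as exc:
            # Record the index so the offending task can be inspected
            # (e.g. task.compute_dag) after the sweep finishes.
            failing.append((i, exc))
    return failing
```

Is something like this a reasonable approach, or is there a standard way to map a crash during measurement back to the task being measured?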
It seems related to this stack trace:
[18:34:51] /home/torch-mlir-user/tvm/src/runtime/cuda/cuda_device_api.cc:143: allocating 480 bytes on device, with 16601251840 bytes currently free out of 16928342016 bytes available
terminate called after throwing an instance of 'tvm::runtime::InternalError'
what(): [18:34:51] /home/torch-mlir-user/tvm/src/runtime/cuda/cuda_device_api.cc:310: InternalError: Check failed: (e == cudaSuccess || e == cudaErrorCudartUnloading) is false: CUDA: misaligned address
Stack trace:
0: _ZN3tvm7runtime6detail
1: tvm::runtime::CUDATimerNode::~CUDATimerNode()
2: _ZN3tvm7runtime18SimpleObjAlloca
3: tvm::runtime::PackedFuncObj::Extractor<tvm::runtime::PackedFuncSubObj<tvm::runtime::profiling::WrapTimeEvaluator(tvm::runtime::PackedFunc, DLDevice, int, int, int, int, int, int, int, tvm::runtime::PackedFunc)::$_0> >::Call(tvm::runtime::PackedFuncObj const*, tvm::runtime::TVMArgs, tvm::runtime::TVMRetValue*)
4: tvm::runtime::LocalSession::CallFunc(void*, TVMValue const*, int const*, int, std::function<void (tvm::runtime::TVMArgs)> const&)
5: tvm::runtime::RPCSession::AsyncCallFunc(void*, TVMValue const*, int const*, int, std::function<void (tvm::runtime::RPCCode, tvm::runtime::TVMArgs)>)
6: tvm::runtime::RPCEndpoint::EventHandler::HandleNormalCallFunc()
7: tvm::runtime::RPCEndpoint::EventHandler::HandleProcessPacket(std::function<void (tvm::runtime::TVMArgs)>)
8: tvm::runtime::RPCEndpoint::EventHandler::HandleNextEvent(bool, bool, std::function<void (tvm::runtime::TVMArgs)>)
9: tvm::runtime::RPCEndpoint::HandleUntilReturnEvent(bool, std::function<void (tvm::runtime::TVMArgs)>)
10: tvm::runtime::RPCEndpoint::ServerLoop()
11: tvm::runtime::RPCServerLoop(int)
12: tvm::runtime::PackedFuncObj::Extractor<tvm::runtime::PackedFuncSubObj<tvm::runtime::$_1> >::Call(tvm::runtime::PackedFuncObj const*, tvm::runtime::TVMArgs, tvm::runtime::TVMRetValue*)
Exception in thread Thread-1 (_listen_loop):
Traceback (most recent call last):
File "/usr/lib/python3.10/threading.py", line 1016, in _bootstrap_inner
self.run()
File "/usr/lib/python3.10/threading.py", line 953, in run
self._target(*self._args, **self._kwargs)
File "/home/torch-mlir-user/tvm/python/tvm/rpc/server.py", line 279, in _listen_loop
_serving(conn, addr, opts, load_library)
File "/home/torch-mlir-user/tvm/python/tvm/rpc/server.py", line 168, in _serving
raise RuntimeError(
RuntimeError: Child process 9713 exited unsuccessfully with error code -6