Is it possible to autotune GPU locally?

Hi,

I’ve been following the tutorial here to get autotuning working for an NVIDIA GPU, but I consistently get a runtime exception when I run the tuning as described in the tutorial:

RuntimeError: Cannot get remote devices from the tracker. Please check the status of tracker by 'python -m tvm.exec.query_rpc_tracker --port [THE PORT YOU USE]' and make sure you have free devices on the queue status.

I’m also running the RPC tracker and server in separate terminals, and they seem to be working fine. I have one GPU on the same machine; the tracker (address localhost:9190) reports:

Server List
----------------------------
server-address  key
----------------------------
127.0.0.1:42334 server:qp4000
----------------------------

Queue Status
------------------------------
key      total  free  pending
------------------------------
qp4000   1      1     0      
------------------------------

Now, I tried to run it locally, bypassing RPC by replacing the RPCRunner with a LocalRunner, but I still get the same error. That is very confusing, since it isn’t supposed to go through the RPC runner, right?

Can autotuning on GPUs work locally?

Any idea what’s wrong with my setup that I get the runtime error in the first place?

Thanks!

It depends on how your autotuning is set up, but I reckon you can autotune the GPU locally. I wouldn’t expect to be seeing that error message if you are using the local setup.

RPC should work too, but the local-runner route has fewer potential headaches.

Yes, your tuning option should have the local builder and runner, i.e.

tuning_option['measure_option'] = autotvm.measure_option(
    builder=autotvm.LocalBuilder(),
    runner=autotvm.LocalRunner(number=1, repeat=1,
                               min_repeat_ms=1000))
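For context, here is a minimal sketch of how that measure_option plugs into a tuning loop. The task-extraction call, the XGBTuner choice, the trial count, and the log filename are my assumptions, not taken from the original post:

```python
from tvm import autotvm
from tvm.autotvm.tuner import XGBTuner

# Extract tunable tasks from the Relay module (mod, params, target as in
# the rest of the script).
tasks = autotvm.task.extract_from_program(
    mod["main"], target=target, params=params)

measure_option = autotvm.measure_option(
    builder=autotvm.LocalBuilder(),
    runner=autotvm.LocalRunner(number=1, repeat=1, min_repeat_ms=1000))

for i, task in enumerate(tasks):
    print("Tuning task %d/%d" % (i + 1, len(tasks)))
    tuner = XGBTuner(task, loss_type="rank")
    tuner.tune(n_trial=1000,
               measure_option=measure_option,
               callbacks=[autotvm.callback.log_to_file("tuning.log")])
```

The log file written by the callback is what apply_history_best reads back later when compiling with the best-found configurations.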

The only other branch in my main autotuning script is where I get the inference time of the best configuration; it looks like this:

# compile kernels with history best records
with autotvm.apply_history_best(tuning_option['log_filename']):
    print("Compile...")
    with relay.build_config(opt_level=3):
        graph, lib, params = relay.build_module.build(
            mod, target=target, params=params)

    # upload parameters to device
    if device_info['remote']:
        # export library
        tmp = tempdir()
        if use_android:
            from tvm.contrib import ndk
            filename = "net.so"
            lib.export_library(tmp.relpath(filename), ndk.create_shared)
        else:
            filename = "net.tar"
            lib.export_library(tmp.relpath(filename))

        # upload module to device
        print("Upload...")
        remote = autotvm.measure.request_remote(device_info['device_key'],
                                                device_info['rpc_address'],
                                                device_info['rpc_port'],
                                                timeout=10000)
        remote.upload(tmp.relpath(filename))
        rlib = remote.load_module(filename)

        ctx = remote.context(str(target), 0)
        module = runtime.create(graph, rlib, ctx)
    else:
        # create the device context matching the target (e.g. GPU for cuda),
        # rather than hard-coding tvm.cpu()
        ctx = tvm.context(str(target), 0)
        module = runtime.create(graph, lib, ctx)
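After the module is created (and, in the remote case, uploaded), the timing step itself would follow. A sketch of that step, assuming the module and ctx from above and a model whose input tensor is named "data" with a known input_shape (both assumptions on my part):

```python
import numpy as np

# Feed a random input matching the model's expected shape.
data = np.random.uniform(size=input_shape).astype("float32")
module.set_input("data", data)
module.set_input(**params)

# time_evaluator runs the whole graph repeatedly and returns per-run times.
ftimer = module.module.time_evaluator("run", ctx, number=1, repeat=30)
prof_res = np.array(ftimer().results) * 1000  # seconds -> milliseconds
print("Mean inference time: %.2f ms (std %.2f ms)"
      % (np.mean(prof_res), np.std(prof_res)))
```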

Actually, you cannot avoid RPC entirely: LocalRunner still uses the RPC infrastructure under the hood (it starts a local tracker and server for you), so RPC-related errors can still show up even without an explicit RPCRunner.


Managed to fix it with a clean reinstall of everything in a VM.