Segmentation fault when autotuning an MXNet model

guyzsarun · September 24, 2020, 2:50am

I get segmentation faults during autotuning, although the tuning process keeps running.

I have tried building MXNet from source and downgrading XGBoost to 0.90, but neither solves the issue.

OS: Ubuntu 18.04.4 LTS
Python: 3.6.9
MXNet: 1.6.0

Output from terminal:

[Task  1/22]  Current/Best:   54.80/ 104.39 GFLOPS | Progress: (8/2640) | 15.36 s
Segmentation fault: 11

Stack trace:
  [bt] (0) /usr/local/lib/python3.6/dist-packages/mxnet/libmxnet.so(+0x3c27360) [0x7f5873ad3360]
[Task  1/22]  Current/Best:   77.00/ 104.39 GFLOPS | Progress: (12/2640) | 23.99 s
Segmentation fault: 11

Stack trace:
  [bt] (0) /usr/local/lib/python3.6/dist-packages/mxnet/libmxnet.so(+0x3c27360) [0x7f5873ad3360]
[Task  1/22]  Current/Best:   43.63/ 104.39 GFLOPS | Progress: (16/2640) | 30.96 s
Segmentation fault: 11
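To confirm from the log that tuning really keeps going despite the crashes, one could parse the progress lines and count the segfault messages. This is a minimal sketch that assumes the progress-line format shown above; `summarize_log` and `tuning_kept_running` are hypothetical helper names, not part of AutoTVM.

```python
import re

# Matches AutoTVM progress lines such as:
# [Task  1/22]  Current/Best:   54.80/ 104.39 GFLOPS | Progress: (8/2640) | 15.36 s
PROGRESS_RE = re.compile(
    r"\[Task\s+(\d+)/(\d+)\]\s+Current/Best:\s+([\d.]+)/\s*([\d.]+) GFLOPS"
    r"\s+\|\s+Progress: \((\d+)/(\d+)\)"
)
# Matches the "Segmentation fault: 11" messages printed in the log.
SEGFAULT_RE = re.compile(r"Segmentation fault: (\d+)")


def summarize_log(text):
    """Return (list of progress points, number of segfault messages)."""
    progress = [
        {
            "task": int(m.group(1)),
            "current_gflops": float(m.group(3)),
            "best_gflops": float(m.group(4)),
            "trials_done": int(m.group(5)),
            "trials_total": int(m.group(6)),
        }
        for m in PROGRESS_RE.finditer(text)
    ]
    segfaults = len(SEGFAULT_RE.findall(text))
    return progress, segfaults


def tuning_kept_running(progress):
    """True if the trial counter strictly increases across progress lines."""
    done = [p["trials_done"] for p in progress]
    return all(a < b for a, b in zip(done, done[1:]))
```

Applied to the output above, the trial counter goes from 8 to 12 to 16 while a segfault message follows each progress line, so both helpers would show that tuning continues after every reported crash.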

juierror · September 24, 2020, 5:55am

I have this problem too. Do you have a solution?

comaniac · September 24, 2020, 5:46pm

Have you tried the random tuner? If it works, then the issue must be with XGBoost.

I’m also curious why there’s a stack trace from MXNet. In general you don’t need MXNet anymore after the tuning process has started.
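One way to follow the suggestion above and compare tuners is to run the same tuning script once per tuner in a separate process and count the segfault messages in its output. This is a sketch under assumptions: it assumes the messages reach the tuning script's own stdout/stderr (as they do in the log above), and `run_and_count_segfaults` is a hypothetical helper. It sticks to Python 3.6 features to match the environment listed above.

```python
import re
import signal
import subprocess
import sys

SEGFAULT_RE = re.compile(r"Segmentation fault: \d+")


def run_and_count_segfaults(args, python=sys.executable):
    """Run `python *args`, merging stderr into stdout, and report crash signs.

    Returns (segfault_messages, exit_signal): the number of
    "Segmentation fault: N" lines in the output, and the name of the signal
    that killed the process itself (None if it was not killed by a signal).
    """
    proc = subprocess.run(
        [python] + list(args),
        stdout=subprocess.PIPE,
        stderr=subprocess.STDOUT,
        universal_newlines=True,  # text mode; `text=` only exists from 3.7
    )
    messages = len(SEGFAULT_RE.findall(proc.stdout))
    # On POSIX, a return code of -N means the child was killed by signal N.
    exit_signal = signal.Signals(-proc.returncode).name if proc.returncode < 0 else None
    return messages, exit_signal
```

For example, `run_and_count_segfaults(["tune.py", "--tuner", "random"])`, where `tune.py` and its `--tuner` flag are hypothetical stand-ins for your own copy of the tuning script. If the random tuner produces the same messages as the XGBoost tuner, XGBoost is unlikely to be the cause.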

guyzsarun · September 25, 2020, 2:37am

The terminal output above was produced with the random tuner.

guyzsarun · September 30, 2020, 7:39am

To reproduce the error:

ResNet-50 model:
http://data.mxnet.io/models/imagenet/resnet/50-layers/resnet-50-0000.params
data.mxnet.io/models/imagenet/resnet/50-layers/resnet-50-symbol.json

Use tune_relay_x86.py from incubator-tvm/tutorials/autotvm.

Load ResNet-50 in get_network():

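 import mxnet as mx
 from tvm import relay

 # Reads resnet-50-symbol.json and resnet-50-0000.params (epoch 0)
 sym, arg_params, aux_params = mx.model.load_checkpoint('resnet-50', 0)
 # input_name and input_shape are the variables the tutorial script already defines
 shape_dict = {input_name: input_shape}
 mod, params = relay.frontend.from_mxnet(sym, shape_dict, arg_params=arg_params, aux_params=aux_params)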

Then change the tuner to random.
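Before starting a long tuning run, it may be worth checking that the downloaded files have the names `mx.model.load_checkpoint('resnet-50', 0)` reads: `<prefix>-symbol.json` and `<prefix>-<epoch as four digits>.params`. This is a minimal stdlib-only sketch; `checkpoint_files` and `missing_checkpoint_files` are hypothetical helper names.

```python
import os


def checkpoint_files(prefix, epoch):
    """File names that mx.model.load_checkpoint(prefix, epoch) reads."""
    return [prefix + "-symbol.json", "%s-%04d.params" % (prefix, epoch)]


def missing_checkpoint_files(prefix, epoch, directory="."):
    """Return the expected checkpoint files that are not present in directory."""
    return [
        name
        for name in checkpoint_files(prefix, epoch)
        if not os.path.isfile(os.path.join(directory, name))
    ]
```

Calling `missing_checkpoint_files("resnet-50", 0)` from the directory the tuning script runs in should return an empty list once both files are downloaded.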