Hi guys, I’m testing a C++ deployment of GluonCV’s YOLO MobileNet model (yolo3_mobilenet1.0_coco).
It’s a rather standard deployment, using this basic strategy:
```cpp
tvm::runtime::Module *mod = (tvm::runtime::Module *) detector_handle.get();
tvm::runtime::PackedFunc set_input = mod->GetFunction("set_input");
set_input("data", input);
tvm::runtime::PackedFunc run = mod->GetFunction("run");
run();
tvm::runtime::PackedFunc get_output = mod->GetFunction("get_output");
tvm::runtime::NDArray out = get_output(0);  // repeated for each of the model's outputs
```
This deployment works fine on a Jetson TX2 and a GTX 1080 Ti; however, on a Turing-based NVIDIA GTX 1660, all sorts of problems happen:
1. During the standard compile process (a plain Relay build with no tuning), the forward pass is insanely slow. (I get why this happens, so we move on to number 2.)
2. After a tuning process similar to this tutorial (https://docs.tvm.ai/tutorials/autotvm/tune_relay_x86.html#sphx-glr-tutorials-autotvm-tune-relay-x86-py), the GPU forward pass still takes ~10 minutes even after it’s been ‘tuned’, while the CPU takes fractions of a second. I’m not sure why, even after weeks of changing and augmenting almost every variable we’re allowed to.
3. To get around that problem on this specific GPU (the 1660), I installed Thrust based on suggestions from this community, but I continually run into:
```
terminate called after throwing an instance of 'dmlc::Error'
  what(): [11:13:29] /opt/src/tvm/src/runtime/library_module.cc:78: Check failed: ret == 0 (-1 vs. 0) : radix_sort: failed on 2nd step: cudaErrorInvalidValue: invalid argument
Stack trace:
  [bt] (0) /opt/catkin_ws/devel/lib/recognition/debug_pose_model(dmlc::LogMessageFatal::~LogMessageFatal()+0x4e) [0x55f62b298912]
  [bt] (1) /usr/local/lib/libtvm_runtime.so(+0x7675d) [0x7f60a068775d]
  [bt] (2) /usr/local/lib/libtvm_runtime.so(+0xec957) [0x7f60a06fd957]
  [bt] (3) /usr/local/lib/libtvm_runtime.so(tvm::runtime::GraphRuntime::Run()+0x37) [0x7f60a06fd9d7]
  [bt] (4) /opt/catkin_ws/devel/lib/recognition/debug_pose_model(std::function<void (tvm::runtime::TVMArgs, tvm::runtime::TVMRetValue*)>::operator()(tvm::runtime::TVMArgs, tvm::runtime::TVMRetValue*) const+0x5a) [0x55f62b2b81aa]
  [bt] (5) /opt/catkin_ws/devel/lib/recognition/debug_pose_model(tvm::runtime::TVMRetValue tvm::runtime::PackedFunc::operator()<>() const+0x96) [0x55f62b2c220c]
  [bt] (6) /opt/catkin_ws/devel/lib/recognition/debug_pose_model(PoseFromConfig::forward_full(cv::Mat, float)+0x94f) [0x55f62b2aa9f7]
  [bt] (7) /opt/catkin_ws/devel/lib/recognition/debug_pose_model(TVMPoseNode::callback(boost::shared_ptr<sensor_msgs::Image_<std::allocator<void> > const> const&, boost::shared_ptr<sensor_msgs::Image_<std::allocator<void> > const> const&, boost::shared_ptr<pcl::PointCloud<pcl::PointXYZRGB> const> const&, nlohmann::basic_json<std::map, std::vector, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >, bool, long, unsigned long, double, std::allocator, nlohmann::adl_serializer, std::vector<unsigned char, std::allocator<unsigned char> > >)+0x1344) [0x55f62b2b3e7e]
  [bt] (8) /opt/catkin_ws/devel/lib/recognition/debug_pose_model(boost::_mfi::mf4<void, TVMPoseNode, boost::shared_ptr<sensor_msgs::Image_<std::a
```
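For reference, this is how I understand Thrust gets enabled when building TVM (in `config.cmake`, copied from `cmake/config.cmake` into the build directory); if this isn't set, the argsort used by the baked-in NMS falls back to the non-Thrust path:

```cmake
# Build-time options in TVM's config.cmake (my understanding of the
# relevant flags; paths/values here are from my own setup)
set(USE_CUDA ON)
set(USE_THRUST ON)  # route GPU sort (argsort/NMS) through Thrust
```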
If anyone has experience tuning models with baked-in non-max-suppression, like the GluonCV YOLO models, I’d love any help you could provide in getting past these 1660 errors.
Thanks
-Matt