Custom datatypes in AutoScheduler?

I have created a custom datatype in a model that I can compile and run. So far so good.

Now I would like to optimize the execution of this model using the AutoScheduler. However, each of the child processes started by the AutoScheduler errors out with the following error:

No: 31	GFLOPS: 0.00 / 0.00	results: MeasureResult(error_type:CompileHostError, error_msg:TVMError('Traceback (most recent call last):
  File "/home/victor/hpc/tvm-build/tvmgpucopy/python/tvm/exec/popen_worker.py", line 87, in main
    result = fn(*args, **kwargs)
  File "/home/victor/hpc/tvm-build/tvmgpucopy/python/tvm/auto_scheduler/measure.py", line 660, in local_build_worker
    return _local_build_worker(inp, build_func, verbose)
  File "/home/victor/hpc/tvm-build/tvmgpucopy/python/tvm/auto_scheduler/measure.py", line 601, in _local_build_worker
    inp = MeasureInput.deserialize(inp_serialized)
  File "/home/victor/hpc/tvm-build/tvmgpucopy/python/tvm/auto_scheduler/measure.py", line 148, in deserialize
    deserialize_workload_registry_entry(data[1])
  File "/home/victor/hpc/tvm-build/tvmgpucopy/python/tvm/auto_scheduler/workload_registry.py", line 249, in deserialize_workload_registry_entry\n
    value = LoadJSON(value)
  File "/home/victor/hpc/tvm-build/tvmgpucopy/python/tvm/_ffi/_ctypes/packed_func.py", line 237, in __call__
    raise get_last_ffi_error()
tvm._ffi.base.TVMError: Traceback (most recent call last):
  10: TVMFuncCall
  9: std::_Function_handler<void (tvm::runtime::TVMArgs, tvm::runtime::TVMRetValue*), tvm::runtime::TypedPackedFunc<tvm::runtime::ObjectRef (std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >)>::AssignTypedLambda<tvm::runtime::ObjectRef (*)(std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >)>(tvm::runtime::ObjectRef (*)(std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >), std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >)::{lambda(tvm::runtime::TVMArgs const&, tvm::runtime::TVMRetValue*)#1}>::_M_invoke(std::_Any_data const&, tvm::runtime::TVMArgs&&, tvm::runtime::TVMRetValue*&&)
  8: tvm::LoadJSON(std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >)
  7: tvm::JSONAttrSetter::Set(tvm::runtime::ObjectPtr<tvm::runtime::Object>*, tvm::JSONNode*)
  6: tvm::ReflectionVTable::VisitAttrs(tvm::runtime::Object*, tvm::AttrVisitor*) const
  5: tvm::JSONAttrSetter::Visit(char const*, tvm::runtime::DataType*)
  4: tvm::runtime::String2DLDataType(std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >)
  3: tvm::runtime::ParseCustomDatatype(std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const&, char const**)
  2: tvm::runtime::GetCustomTypeCode(std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const&)
  1: std::_Function_handler<void (tvm::runtime::TVMArgs, tvm::runtime::TVMRetValue*), tvm::datatype::{lambda(tvm::runtime::TVMArgs, tvm::runtime::TVMRetValue*)#2}>::_M_invoke(std::_Any_data const&, tvm::runtime::TVMArgs&&, tvm::runtime::TVMRetValue*&&)
  0: tvm::datatype::Registry::GetTypeCode(std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const&)
  File "/home/victor/hpc/tvm-build/tvmgpucopy/src/target/datatype/registry.cc", line 59
TVMError: 
---------------------------------------------------------------
An error occurred during the execution of TVM.
For more information, please see: https://tvm.apache.org/docs/errors.html
---------------------------------------------------------------
  Check failed: (name_to_code_.find(type_name) != name_to_code_.end()) is false: Type name cmpl not registered',), all_cost:15.00, Tstamp:1647511127.09)

As the error shows, it cannot find my custom datatype named “cmpl”, even though I am very sure that I did register this data type. (This is also proven by the fact that I am able to compile and run my model without using the AutoScheduler.)

Could this have anything to do with this error and its fix? https://discuss.tvm.apache.org/t/autoscheduler-and-autotvm-measure-measure-methods-set-cuda-target-arch/10939/3 https://github.com/apache/tvm/pull/8913

It is possible that they are not propagated to the workers if they are registered on the Python side. Are you able to try registering them again in the workers by setting a custom PopenPool initializer like the one in the PR?
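
Roughly something like this, as an untested sketch (the initializer name is made up; the exact way it gets passed to the PopenPool follows the initializer interface added in that PR):

import tvm

def init_worker_custom_datatypes():
    """Hypothetical PopenPool initializer: runs once in every worker process,
    so the worker's own registry also knows about the custom type."""
    # "cmpl" and the type code 150 are placeholders for whatever you register
    # in the main process.
    tvm.target.datatype.register("cmpl", 150)
    # ... plus the same tvm.target.datatype.register_op(...) calls used there ...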

@vinx13 I am sorry, I am not a TVM expert. Your answer is too short for me to understand. Could you elaborate in more detail?

You can try modifying reset_global_scope (https://github.com/vinx13/tvm/blob/79a1aaf693feddf1d9835779d7eb587ac0eb21df/python/tvm/autotvm/env.py#L35) and registering your custom data types there, so the registration is also performed in the worker processes. If your custom data types are registered on the C++ side, they will already be available in the worker processes and no action is needed.

@vinx13

I modified reset_global_scope() in env.py as follows:

import tvm
def reset_global_scope(global_scope):
    """Reset global autotvm state. This is needed to initialize PopenPool workers."""
    global GLOBAL_SCOPE
    GLOBAL_SCOPE.deep_copy(global_scope)
    AutotvmGlobalScope.current = global_scope

    # Definition of the custom datatype and a few of its operators
    tvm.target.datatype.register("cmpl", 150)
    tvm.target.datatype.register_op(
        tvm.target.datatype.create_lower_func({(32, 64): "Float32ToComplex64"}),
        "Cast",
        "llvm",
        "float",
        "cmpl",
    )
    tvm.target.datatype.register_op(
        tvm.target.datatype.create_lower_func({64: "Complex64Add"}),
        "Add",
        "llvm",
        "cmpl",
    )
    tvm.target.datatype.register_op(
        tvm.target.datatype.create_lower_func({64: "Complex64Sub"}),
        "Sub",
        "llvm",
        "cmpl",
    )
    tvm.target.datatype.register_op(
        tvm.target.datatype.create_lower_func({64: "Complex64Mul"}),
        "Mul",
        "llvm",
        "cmpl",
    )
    tvm.target.datatype.register_op(
        tvm.target.datatype.create_lower_func({64: "Complex64Div"}),
        "Div",
        "llvm",
        "cmpl",
    )
    tvm.target.datatype.register_op(
        tvm.target.datatype.lower_call_pure_extern,
        "Call",
        "llvm",
        "cmpl",
        intrinsic_name="tir.call_pure_extern",
    )
    tvm.target.datatype.register_op(
        tvm.target.datatype.create_lower_func({(64, 32): "Complex64ToFloat32"}),
        "Cast",
        "llvm",
        "cmpl",
        "float",
    )

But I still hit the same “Type name cmpl not registered” error:

Traceback (most recent call last):
  File "/usr/lib/python3.6/runpy.py", line 193, in _run_module_as_main
    "__main__", mod_spec)
  File "/usr/lib/python3.6/runpy.py", line 85, in _run_code
    exec(code, run_globals)
  File "/home/victor/hpc/tvm-build/tvmgpucopy2/python/tvm/exec/popen_worker.py", line 105, in <module>
    main()
  File "/home/victor/hpc/tvm-build/tvmgpucopy2/python/tvm/exec/popen_worker.py", line 77, in main
    fn, args, kwargs, timeout = cloudpickle.loads(reader.read(bytes_size))
  File "/home/victor/hpc/tvm-build/tvmgpucopy2/python/tvm/runtime/object.py", line 93, in __setstate__
    self.__init_handle_by_constructor__(_ffi_node_api.LoadJSON, handle)
  File "/home/victor/hpc/tvm-build/tvmgpucopy2/python/tvm/_ffi/_ctypes/object.py", line 136, in __init_handle_by_constructor__
    handle = __init_by_constructor__(fconstructor, args)
  File "/home/victor/hpc/tvm-build/tvmgpucopy2/python/tvm/_ffi/_ctypes/packed_func.py", line 260, in __init_handle_by_constructor__
    raise get_last_ffi_error()
tvm._ffi.base.TVMError: Traceback (most recent call last):
  9: TVMFuncCall
  8: tvm::runtime::PackedFuncObj::Extractor<tvm::runtime::PackedFuncSubObj<tvm::runtime::TypedPackedFunc<tvm::runtime::ObjectRef (std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >)>::AssignTypedLambda<tvm::runtime::ObjectRef (*)(std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >)>(tvm::runtime::ObjectRef (*)(std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >), std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >)::{lambda(tvm::runtime::TVMArgs const&, tvm::runtime::TVMRetValue*)#1}> >::Call(tvm::runtime::PackedFuncObj const*, tvm::runtime::TVMArgs, tvm::runtime::TVMRetValue*)
  7: tvm::LoadJSON(std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >)
  6: tvm::JSONAttrSetter::Set(tvm::runtime::ObjectPtr<tvm::runtime::Object>*, tvm::JSONNode*)
  5: tvm::JSONAttrSetter::Visit(char const*, tvm::runtime::DataType*)
  4: tvm::runtime::String2DLDataType(std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >)
  3: tvm::runtime::ParseCustomDatatype(std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const&, char const**)
  2: tvm::runtime::GetCustomTypeCode(std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const&)
  1: tvm::runtime::PackedFuncObj::Extractor<tvm::runtime::PackedFuncSubObj<tvm::datatype::{lambda(tvm::runtime::TVMArgs, tvm::runtime::TVMRetValue*)#2}> >::Call(tvm::runtime::PackedFuncObj const*, tvm::runtime::TVMArgs, tvm::runtime::TVMRetValue*)
  0: tvm::datatype::Registry::GetTypeCode(std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const&)
  File "/home/victor/hpc/tvm-build/tvmgpucopy2/src/target/datatype/registry.cc", line 59
TVMError: 
---------------------------------------------------------------
An error occurred during the execution of TVM.
For more information, please see: https://tvm.apache.org/docs/errors.html
---------------------------------------------------------------
  Check failed: (name_to_code_.find(type_name) != name_to_code_.end()) is false: Type name cmpl not registered. Elements in type registration list: 0

Note that I extended the error message a bit by also printing name_to_code_.size() in Registry::GetTypeCode in registry.cc. The length of that std::unordered_map is indeed 0, so it seems the type registrations never reach the worker process.

Do you have any suggestions?

The modified code looks good to me. Maybe you can double-check what’s going on inside those registrations, e.g. whether the mapping on the C++ side is actually updated when register is called.
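
For example, a check along the following lines, added right after the register() call in reset_global_scope, would show whether the C++-side registry of the current process is actually updated (just a sketch; it assumes tvm.target.datatype.get_type_code is available, which queries the C++ registry):

import os
import tvm

def check_cmpl_registered():
    """Hypothetical helper: call this right after tvm.target.datatype.register
    in reset_global_scope to verify the registration in the current process."""
    # get_type_code goes through the C++ registry, so it should return 150
    # inside the worker process if the registration actually took effect.
    code = tvm.target.datatype.get_type_code("cmpl")
    print("pid %d: cmpl registered with type code %d" % (os.getpid(), code))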