Undefined behavior happens when casting string to bool

Hi community, I'm studying TVM and its low-level operators. While constructing some simple loops, I found that undefined behavior may occur when a string is cast to a boolean value.

As shown below, I cast a string to bool and use it as the condition of a While loop. The process crashes if I build the module several times.

import tvm
from tvm import tir

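# Cast a string (dtype handle) to bool and use it as the loop condition.
v = tir.Cast('bool', tvm.runtime.convert("a"))
body = tir.stmt.While(v, body=tir.Evaluate(tir.const(0)))
func = tir.PrimFunc(params={}, body=body)
mod = tvm.lower(func)
# Build the same module more than once; the crash appears on repeated builds.
nopt_mod = tvm.build(mod)
nopt_mod = tvm.build(mod)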

I built the latest TVM on my server with Ubuntu 18.04 and llvm-12. After executing my code, I got this crash message:

(tvm-build) ➜  ~ python3 cast_str.py
[1]    33220 segmentation fault (core dumped)  python3 cast_str.py

To find the root cause, I read the source and found that the function CodeGenLLVM::CreateCast doesn't handle casts from handle to an integer type. When the target is bool (to.is_uint() && to.bits() == 1) and from is not a float, the source value is assumed to be an integer and is compared against an integer zero.

Since we pass a StringImm, whose dtype is handle, as the source, it is treated as an integer, which eventually causes undefined behavior such as this crash.

llvm::Value* CodeGenLLVM::CreateCast(DataType from, DataType to, llvm::Value* value) {
  llvm::Type* target = DTypeToLLVMType(to);
  if (value->getType() == target) return value;
  if (to.is_handle()) {
    return builder_->CreateBitCast(value, target);
  } else if (to.is_uint() && to.bits() == 1) {
    if (from.is_float()) {
      llvm::Constant* zero = llvm::ConstantFP::get(DTypeToLLVMType(from), 0.);
      return builder_->CreateFCmpONE(value, zero);
    } else {
      llvm::Constant* zero = llvm::ConstantInt::get(DTypeToLLVMType(from), 0);
      return builder_->CreateICmpNE(value, zero);
    }
  } else if (!from.is_float() && !to.is_float()) {
    return builder_->CreateIntCast(value, target, from.is_int());
  } else if (from.is_float() && to.is_int()) {
    return builder_->CreateFPToSI(value, target);
  } else if (from.is_float() && to.is_uint()) {
    if (to.bits() < 8) {
      value = builder_->CreateFPToUI(value, DTypeToLLVMType(to.with_bits(8)));
      return builder_->CreateIntCast(value, target, false);
    } else {
      return builder_->CreateFPToUI(value, target);
    }
  } else if (from.is_int() && to.is_float()) {
    return builder_->CreateSIToFP(value, target);
  } else if (from.is_uint() && to.is_float()) {
    return builder_->CreateUIToFP(value, target);
  } else {
    ICHECK(from.is_float() && to.is_float());
    return builder_->CreateFPCast(value, target);
  }
}
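To make the branch selection easier to follow, here is a minimal Python sketch that mirrors only the if/else chain of the function quoted above. It is not TVM code: `DType` and `select_cast_branch` are hypothetical names invented for this illustration, the early return on identical LLVM types is left out, and the two-step float-to-small-uint path is collapsed into a single label.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class DType:
    """Hypothetical stand-in for tvm::DataType: a type code plus a bit width."""
    code: str  # one of "int", "uint", "float", "handle"
    bits: int

    def is_int(self):
        return self.code == "int"

    def is_uint(self):
        return self.code == "uint"

    def is_float(self):
        return self.code == "float"

    def is_handle(self):
        return self.code == "handle"


def select_cast_branch(src, dst):
    """Return the name of the LLVM builder call the quoted chain would reach."""
    if dst.is_handle():
        return "CreateBitCast"
    if dst.is_uint() and dst.bits == 1:
        # Only float is special-cased; every other source, including
        # handle, takes the integer comparison.
        return "CreateFCmpONE" if src.is_float() else "CreateICmpNE"
    if not src.is_float() and not dst.is_float():
        return "CreateIntCast"
    if src.is_float() and dst.is_int():
        return "CreateFPToSI"
    if src.is_float() and dst.is_uint():
        return "CreateFPToUI"
    if src.is_int() and dst.is_float():
        return "CreateSIToFP"
    if src.is_uint() and dst.is_float():
        return "CreateUIToFP"
    if src.is_float() and dst.is_float():
        return "CreateFPCast"
    return "ICHECK failure"


handle = DType("handle", 64)
boolean = DType("uint", 1)  # bool is a 1-bit uint in the quoted check
print(select_cast_branch(handle, boolean))
```

In this model a handle source with a bool target lands in the integer comparison branch, which matches the path described in the post.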

I'm curious whether casting a string to a boolean is legal in TVM. If it is, how can we fix this bug? If not, it would be better to give developers an error or a hint that prevents it. I would be grateful if anyone can help me.

Yeah sounds like it is a bug that we could potentially fix. Just to confirm, if we change tir.Cast to tir.cast, does tvm throw meaningful error messages?

It seems that tir.cast is not an attribute of tvm.tir. I know that we can use relay.cast to cast data types in Relay, but for TIR, do we have to use tir.Cast? @junrushao

It is defined in python/tvm/tir/generic.py, but weirdly it is not imported in __init__.py

Thanks! @junrushao

I changed tir.Cast to tir.cast, and the crash still occurs without any error message.

It seems that casting from handle to int is undefined behavior in the LLVM codegen, and triggering it several times may crash the process.

Ah, thanks for confirming this bug!

Yeah for sure it is a bug. Shall we add some checks in tir.cast, implemented in src/tir/op/op.cc?

I think it may not be enough to add checks only in tir.cast, since we can also construct the expression with tir.Cast directly, which is implemented in src/tir/ir/expr.cc. Shall we check the data type of value in Cast::Cast, to restrict casting from handle to other data types?

Yeah, for sure we can fix Cast::Cast too, but in general, in the tvm.tir namespace we have a convention: lowercase functions do proper checks (and possibly type escalation), while uppercase ones construct a node directly.

Thanks for your reply! Since we have this convention, I agree that fixing this bug in tir.cast is more appropriate. I will send a PR to strengthen the cast checks.
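The convention can be illustrated with a small, self-contained Python sketch. This is only a model of the idea, not the TVM implementation: `Expr`, `Cast`, and `cast` here are hypothetical stand-ins, and the choice of `TypeError` and its message are assumptions made for the example.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Expr:
    """Hypothetical expression node that carries only a dtype string."""
    dtype: str


@dataclass(frozen=True)
class Cast:
    """Uppercase name: constructs the node directly, with no validation."""
    dtype: str
    value: Expr


def cast(value, dtype):
    """Lowercase name: checks the operand first, then constructs the node."""
    if value.dtype == "handle" and dtype != "handle":
        # Assumed behavior for this sketch: reject handle -> non-handle casts.
        raise TypeError(f"cannot cast from handle to {dtype}")
    return Cast(dtype, value)


# Direct construction accepts anything, including a handle operand.
unchecked = Cast("bool", Expr("handle"))

# The checked helper rejects the same request with a readable error.
try:
    cast(Expr("handle"), "bool")
except TypeError as err:
    print("rejected:", err)
```

Under this split, a user-facing check belongs in the lowercase helper, while the uppercase constructor stays a thin node builder.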