@ganler Thanks for inviting me to this discussion. @slyubomirsky Your RFC is very nice, and your efforts in developing a Relay fuzzer are appreciated. As @ganler previously introduced, I have implemented GenCoG, a computation graph generator for Relay. Since you are familiar with NNSmith, I will mainly discuss the similarities and differences between GenCoG and NNSmith here. @ganler Please point out if I get anything wrong about your work.
Similarities:
- Both GenCoG and NNSmith generate pure dataflow graphs without control flow.
- Both methods require developers to manually specify the constraints of each operator and leverage a constraint solver to find concrete solutions (a minimal solver sketch follows this list).
- Both methods incrementally construct a computation graph.
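To make the shared solver-aided idea concrete, here is a minimal, hypothetical sketch (not code from either tool) showing how a constraint of the kind `reshape` imposes, that the input and output shapes must contain the same number of elements, can be handed to z3. All names, ranks, and bounds below are my own illustrative choices:

```python
# A minimal sketch of solver-aided shape generation, assuming the z3-solver
# package. The constraint mirrors reshape's requirement that input and output
# shapes have equal element counts. Names here are illustrative only.
import z3

in_shape = [z3.Int(f"in_{i}") for i in range(3)]    # symbolic input shape (rank 3)
out_shape = [z3.Int(f"out_{i}") for i in range(2)]  # symbolic output shape (rank 2)

s = z3.Solver()
for d in in_shape + out_shape:
    s.add(d >= 1, d <= 64)  # keep every dimension in a bounded range

def prod(dims):
    p = 1
    for d in dims:
        p = p * d
    return p

s.add(prod(in_shape) == prod(out_shape))  # reshape preserves element count

if s.check() == z3.sat:
    m = s.model()
    print([m[d] for d in in_shape], "->", [m[d] for d in out_shape])
```

Both generators build on this basic pattern: operator constraints become solver formulas, and a model returned by the solver yields concrete shapes for the graph under construction.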
Differences (there are actually many, but I will only discuss the major ones):
- The most prominent difference is how developers write specifications of operators. GenCoG provides a simple DSL for specifying the constraints in a more organized, concise, and comprehensible way. Let's take the `reshape` operator as an example. In GenCoG, we may specify its constraints as follows:
```python
TypeSpec(
    attrs={
        Attr('newshape', List(Var(ran=dl_rank_ran),
                              lambda _: Var(int, ran=dim_ran, tmpl=True)))
    },
    in_num=1,
    in_ranks=[Var()],
    in_shapes=[List(IN[0].rank, lambda _: Var(tmpl=True))],
    in_dtypes=[Var()],
    extra=[
        Reduce(IN[0].shape, ArithOp.MUL, 1) == Reduce(a('newshape'), ArithOp.MUL, 1)
    ],
    out_num=1,
    out_ranks=[Len(a('newshape'))],
    out_shapes=[a('newshape')],
    out_dtypes=[IN[0].dtype]
)
```
The solver in GenCoG automatically processes this specification and solves the constraints involved. For NNSmith, the specification of `reshape` is much longer:
```python
@mark_materialize("core")
class Reshape(UnaryOpBase):
    num_var_param = int_range(1, 4)
    in_dtypes = [(i,) for i in DTYPE_ALL]
    out_dtypes = [(i,) for i in DTYPE_ALL]

    def __init__(self, *target_shape):
        super().__init__()
        self.inp_ranks = [int_range(1, 4)]
        self.out_ranks = [(len(target_shape),)]
        self.target_shape: List[Union[int, z3.ExprRef]] = list(target_shape)

    def type_transfer(self, input_shapes: List[AbsTensor]) -> List[AbsTensor]:
        __MAX_SOLVE_SYMBOL__ = 8
        # otherwise OOM.
        ConstraintCheck.le(
            input_shapes[0].ndims + len(self.target_shape), __MAX_SOLVE_SYMBOL__
        )
        if -1 not in self.target_shape:
            return [AbsTensor(self.target_shape, dtype=input_shapes[0].dtype)]
        # else
        abs_tensor = AbsTensor(self.target_shape, dtype=input_shapes[0].dtype)
        auto_dim = -1
        accum = 1
        for i, v in enumerate(self.target_shape):
            # TODO: What to do about bitvectors here?
            if v == -1:
                SanityCheck.eq(auto_dim, -1)
                auto_dim = i
            else:
                accum = nnsmith_mul(accum, v)

        abs_tensor.shape[auto_dim] = nnsmith_div(
            reduce(lambda x, y: nnsmith_mul(x, y), input_shapes[0].shape, 1), accum
        )
        return [abs_tensor]

    def requires(self, input_shapes):
        ret = []
        inp = input_shapes[0]
        src_len, dst_len = inp.ndims, len(self.target_shape)
        if src_len == 0:
            src_len = 1  # special handling for scalar
        if dst_len == 0:
            dst_len = 1  # special handling for scalar
        gres_config = os.getenv("NNSMITH_GRES", "4")
        if gres_config == "5":
            ng = 1
        elif gres_config == "3":
            ng = min(src_len, dst_len)
        elif gres_config == "4":
            ub = min(src_len, dst_len)
            ng = random.choices(
                range(1, ub + 1), k=1, weights=[2**i for i in range(ub)]
            )[0]
        else:
            raise ValueError(f"NNSMITH_GRES={gres_config} is not recognized")
        src_group = random_group(src_len, ng)
        dst_group = random_group(dst_len, ng)
        self.ng = ng
        self.src_group = src_group
        self.dst_group = dst_group
        assert len(src_group) == len(dst_group) == ng, (src_group, dst_group)

        # group constraints
        src_vars = inp.shape
        dst_vars = self.target_shape
        if len(src_vars) == 0:
            src_vars = [1]  # special handling for scalar
        if len(dst_vars) == 0:
            dst_vars = [1]  # special handling for scalar
        cons_group = []
        for gid in range(ng):
            src_idx = src_group[gid]
            dst_idx = dst_group[gid]
            src_prod = reduce(nnsmith_mul, [src_vars[i] for i in src_idx], 1)
            dst_prod = reduce(nnsmith_mul, [dst_vars[i] for i in dst_idx], 1)
            cons_group.append(nnsmith_eq(src_prod, dst_prod))
        ret.extend(cons_group)

        if os.getenv("NNSMITH_CONS_RESHAPE", "off") != "off":
            # should not be too extreme!
            __DIM_LIMIT__ = 4096
            lim = __DIM_LIMIT__
            for s in self.target_shape[::-1]:
                ret.append(nnsmith_le(s, lim))
                lim //= 2
                lim = max(lim, 1)
        assert -1 not in self.target_shape
        return ret

    def deduct_inp_ranks_and_dtype(
        self, out_abs_tensor: List[AbsTensor]
    ) -> List[Tuple[int, DType]]:
        return [(-1, out_abs_tensor[0].dtype)]
```
Broadly speaking, the specification in GenCoG is more declarative, while the one in NNSmith is more imperative (e.g., some manual sampling code is involved). I am aware that the NNSmith specification contains more fine-grained handling of corner cases, so the two may not be directly comparable. Still, GenCoG shows the potential to make such specifications simpler.
- GenCoG does not include the value-searching technique in NNSmith, which improves numerical validity. It is still possible to extend GenCoG with this technique (a simplified sketch follows this list).
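For readers unfamiliar with that technique, here is a simplified sketch of the underlying idea, assuming PyTorch. This is a random-restart stand-in, not NNSmith's actual implementation (which uses a gradient-guided search); the names below are my own:

```python
# A minimal sketch of value search for numerical validity, assuming PyTorch.
# The idea: reject candidate inputs that drive the model into NaN/Inf, so the
# generated test case exercises well-defined numerics.
import torch

def search_valid_input(model, shape, tries=100):
    """Resample random inputs until the model output is free of NaN/Inf."""
    for _ in range(tries):
        x = torch.randn(shape)
        if torch.isfinite(model(x)).all():  # numerically valid run
            return x
    return None  # budget exhausted; the caller may reject this graph

# Usage: exp(100 * x) overflows float32 for most random inputs, so the
# search keeps sampling until every element of x is small enough.
model = lambda x: torch.exp(100.0 * x)
print(search_valid_input(model, (2, 3)))
```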
That is all I have to share for now. I may add more details and discuss other related issues later.