Pipeline Executor with VTA Sim

Hi, I am trying to recreate the VTA deploy_detection tutorial (https://github.com/apache/tvm/blob/main/vta/tutorials/frontend/deploy_detection.py), but with the pipeline_executor instead of the GraphExecutor. I made the necessary modifications, i.e. I use the following function as the build function:

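# Imports assumed from the deploy_detection tutorial; env and pack_dict are defined elsewhere.
import tvm
import vta
from tvm import relay
from vta.top import graph_pack

def vta_build(mod, target, params=None, target_host=None, mod_name="default"):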
    with tvm.transform.PassContext(opt_level=3):
        with relay.quantize.qconfig(
            global_scale=23.0,
            skip_conv_layers=[0],
            store_lowbit_output=True,
            round_for_shift=True,
        ):
            mod = relay.quantize.quantize(mod, params=params)
            # print(f"Darknet module: {mod}")

        # Perform graph packing and constant folding for VTA target
        mod = graph_pack(
            mod["main"],
            env.BATCH,
            env.BLOCK_OUT,
            env.WGT_WIDTH,
            start_name=pack_dict[0],
            stop_name=pack_dict[1],
            start_name_idx=pack_dict[2],
            stop_name_idx=pack_dict[3],
        )

    with vta.build_config(disabled_pass={"AlterOpLayout", "tir.CommonSubexprElimTIR"}):
        lib = relay.build(
            mod, target=target, target_host=target_host, params=params
        )
    return lib

My pipeline_config looks as follows:

pipe_config[mod].target = tvm.target.Target(env.target, "llvm")
pipe_config[mod].target_host = env.target_host
pipe_config[mod].dev = tvm.ext_dev(0)
pipe_config[mod].build_func = vta_build
pipe_config[mod].params = params
pipe_config["input"]["data"].connect(pipe_config[mod]["input"]["data"])
pipe_config[mod]["output"][0].connect(pipe_config["output"][0])

However, the output values always differ. I suspected that the pipeline_executor might not support ext_dev, but the simulator statistics I gather look plausible.
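
To make "differ" more concrete, the mismatch could be quantified with something like the sketch below. It assumes the outputs of both executors have already been flattened into plain sequences of floats (for example via `.numpy().flatten()` on the returned arrays, which is my assumption about how they are obtained). `compare_outputs` and `DiffReport` are hypothetical helper names, not part of TVM, and the tolerances are arbitrary defaults.

```python
from typing import Iterable, NamedTuple


class DiffReport(NamedTuple):
    max_abs_diff: float   # largest element-wise absolute difference
    num_mismatched: int   # elements outside the tolerance
    total: int            # number of elements compared


def compare_outputs(
    reference: Iterable[float],
    candidate: Iterable[float],
    rtol: float = 1e-5,
    atol: float = 1e-8,
) -> DiffReport:
    """Compare two flat sequences element-wise.

    An element counts as mismatched when
    |ref - cand| > atol + rtol * |ref|.
    """
    ref = [float(x) for x in reference]
    cand = [float(x) for x in candidate]
    if len(ref) != len(cand):
        raise ValueError(f"length mismatch: {len(ref)} vs {len(cand)}")

    max_abs = 0.0
    mismatched = 0
    for r, c in zip(ref, cand):
        diff = abs(r - c)
        max_abs = max(max_abs, diff)
        if diff > atol + rtol * abs(r):
            mismatched += 1
    return DiffReport(max_abs, mismatched, len(ref))


if __name__ == "__main__":
    # Toy values standing in for GraphExecutor vs. pipeline_executor outputs.
    report = compare_outputs([1.0, 2.0, 3.0], [1.0, 2.0, 3.5])
    print(report)
```

A small number of mismatches with a tiny maximum difference would point to rounding, while mismatches across nearly all elements would suggest the two runs are not executing the same quantized graph or are not receiving the same input.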

If anybody could help me out I would be very grateful!