After @tqchen’s comment, I think I’m in favor of keeping a distinction between the two, with tensors being able to indicate where they are located. The gaps in expressibility (e.g. scaling by a PrimValue) and user-friendliness (e.g. passing a Python float instead of tvm.nd.array(value, dtype='float32')) should be solvable with automatic wrappers.
For performance differences, I did a bit more digging, and I think we’re okay. The tir.noalias attribute, set by default for most kernels, is used to emit the __restrict__ qualifier on kernel pointer arguments. My main worry was that optimization would be worse for void scale(float16* buf, size_t n, float16* pointer_to_scalar) than for void scale(float16* buf, size_t n, float16 scalar), since without an aliasing guarantee the compiler must assume buf <= pointer_to_scalar < buf + n and reload the scalar on every iteration. With the __restrict__ keyword we should be good there, but I’d want to performance-test it before saying that conclusively.