Ops become slow when using te.var

Could you just inline one stage into the other?
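In TVM's legacy tensor expression (TE) schedule API, inlining is done by calling `compute_inline()` on the producer stage, e.g. `s[B].compute_inline()`. The stage names `B` and `C` here are hypothetical. The plain-Python sketch below is not TVM code. It only illustrates what inlining does: the producer's expression is substituted into the consumer, so the intermediate buffer is never materialized.

```python
def two_stage(a):
    # Hypothetical stage B: materializes an intermediate buffer.
    b = [x * 2.0 for x in a]
    # Hypothetical stage C: reads every element back from that buffer.
    return [y + 1.0 for y in b]


def inlined(a):
    # B's expression (x * 2.0) is substituted directly into C,
    # so no intermediate buffer is allocated.
    return [x * 2.0 + 1.0 for x in a]
```

Both functions produce the same values. The inlined version skips one full write and read of the intermediate, which is where the speedup usually comes from.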

EDIT: Wait, your if statements use variables that are not defined (blockIdx.x and threadIdx.x).
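`blockIdx.x` and `threadIdx.x` only exist inside a GPU kernel launch. In TVM's legacy TE API they come from thread axes created with `te.thread_axis("blockIdx.x")` / `te.thread_axis("threadIdx.x")` and attached to loop axes with `s[C].bind(...)`. The plain-Python sketch below (not TVM code) simulates a 1-D launch to show where those indices come from and why a bounds guard like `if i < n` is needed when `n` is not a multiple of the block size. The kernel, `launch_1d`, and `run` are hypothetical names for illustration.

```python
def launch_1d(kernel, grid_dim, block_dim, *args):
    """Simulate a 1-D CUDA-style launch: every (block, thread) pair runs the kernel.

    On a real GPU, blockIdx.x and threadIdx.x are supplied by the hardware;
    here they are ordinary loop variables passed in explicitly.
    """
    for block_idx in range(grid_dim):
        for thread_idx in range(block_dim):
            kernel(block_idx, thread_idx, block_dim, *args)


def add_one_kernel(block_idx, thread_idx, block_dim, a, out, n):
    # Global element index, as blockIdx.x * blockDim.x + threadIdx.x in CUDA.
    i = block_idx * block_dim + thread_idx
    # Bounds guard: the last block may have threads past the end of the data.
    if i < n:
        out[i] = a[i] + 1.0


def run(a, block_dim=4):
    n = len(a)
    out = [0.0] * n
    grid_dim = (n + block_dim - 1) // block_dim  # ceil(n / block_dim) blocks
    launch_1d(add_one_kernel, grid_dim, block_dim, a, out, n)
    return out
```

For example, `run([1.0, 2.0, 3.0, 4.0, 5.0])` launches 2 blocks of 4 threads; the guard stops the 3 surplus threads in the second block from writing out of bounds.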