Preloading params or weights into device global memory, Relax BYOC JSON Runtime

Hi There!

I have implemented BYOC in TVM Relax and I am using the JSON runtime to execute my kernels. Currently I access the weight pointers inside the Run function, at kernel execution time. I cannot access the weight or param pointers in the constructor of my runtime object while the model is being built.

Any idea how to offload the weights to my device's global memory while building the model, and then use those preloaded weights during kernel execution instead of loading them at runtime on every execution?

If I understand you correctly, you want to decouple weight handling (where the weights are stored) from the code.

Usually in such a case you want to pass the weights in from outside as arguments instead of packaging them with the module. We have a pass, transform_params, that does this (it lifts the weights into a tuple argument), so you can decide how they are loaded. This is actually being used in our LLM flow. I am not sure how well BYOC works with this flow, though.
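To make the idea concrete, here is a rough plain-Python sketch of what "weights as a tuple argument" buys you. It does not use any TVM API: `FakeDevice`, `upload`, `read`, and `dense` are hypothetical names invented for illustration, and the "device memory" is just a Python list. The point is only that once the weights are an argument, the caller can upload them once and reuse the handles on every run.

```python
from typing import List, Sequence, Tuple


class FakeDevice:
    """Hypothetical stand-in for an accelerator's global memory (not a TVM class)."""

    def __init__(self) -> None:
        self.buffers: List[Tuple[float, ...]] = []
        self.upload_count = 0  # number of host-to-device copies performed

    def upload(self, host_data: Sequence[float]) -> int:
        """Copy host data into 'device memory' and return a handle to it."""
        self.buffers.append(tuple(host_data))
        self.upload_count += 1
        return len(self.buffers) - 1

    def read(self, handle: int) -> Tuple[float, ...]:
        """Look up a previously uploaded buffer by its handle."""
        return self.buffers[handle]


def dense(device: FakeDevice, x: Sequence[float], params: Tuple[int, int]) -> float:
    """Model entry point: the weights arrive as a tuple of device handles,
    so this function never copies them itself."""
    w_handle, b_handle = params
    w = device.read(w_handle)
    b = device.read(b_handle)
    return sum(xi * wi for xi, wi in zip(x, w)) + b[0]


device = FakeDevice()
host_weights = ([0.5, -1.0, 2.0], [0.25])

# Upload once, outside the model; the caller decides when and where this happens.
params = tuple(device.upload(w) for w in host_weights)

# Every run reuses the same handles; no further uploads are needed.
inputs = ([1.0, 2.0, 3.0], [0.0, 1.0, 0.0])
outputs = [dense(device, x, params) for x in inputs]
print(outputs, device.upload_count)
```

In a real flow the upload step would be whatever your device API provides, and the tuple would hold device arrays rather than integer handles. Whether the BYOC JSON runtime sees those arguments in the form you need is something you would have to check.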
