[RUNTIME][OPENCL] Memory leak of cl_kernel when threads are repeatedly created and joined

Hi, my application runs on Android and uses the OpenCL runtime. It creates the module only once, but each run creates a new thread. In this scenario I found that the TVM OpenCL runtime leaks memory.

So I debugged the OpenCL runtime. I found that OpenCLThreadEntry is thread-local storage, and whenever a new thread is created, OpenCLModuleNode::InstallKernel() is called to create a new cl_kernel object for it. When the thread finishes, the cl_kernel object it created stays in memory. These cl_kernel objects are only destroyed when the OpenCLModuleNode object is released.
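
To make the lifetime problem concrete, here is a minimal, self-contained C++ sketch. It is not the TVM code. `FakeKernel`, `ThreadEntry`, `KernelsLeftAfter` and the two global counters are hypothetical stand-ins, and this `InstallKernel` is only a simplified model of the real one. A real runtime would hold a `cl_kernel` from `clCreateKernel` and free it with `clReleaseKernel`. The sketch assumes each thread lazily creates its own kernel in a thread-local table, and it compares what happens at thread exit with and without a release in the thread-local object's destructor.

```cpp
#include <atomic>
#include <cassert>
#include <cstddef>
#include <thread>
#include <vector>

// Stand-in for the OpenCL kernel handle (assumption: a plain int, no real OpenCL).
using FakeKernel = int;
constexpr FakeKernel kNoKernel = -1;

std::atomic<int> g_live_kernels{0};                // created but not yet released
std::atomic<bool> g_release_on_thread_exit{true};  // toggles the behaviour under test

FakeKernel CreateFakeKernel(int slot) { ++g_live_kernels; return slot; }
void ReleaseFakeKernel(FakeKernel) { --g_live_kernels; }

// Hypothetical per-thread state, loosely modeled on OpenCLThreadEntry.
struct ThreadEntry {
  std::vector<FakeKernel> kernel_table;

  // Runs when the owning thread exits.
  ~ThreadEntry() {
    if (!g_release_on_thread_exit) return;  // models the leaking behaviour
    for (FakeKernel k : kernel_table) {
      if (k != kNoKernel) ReleaseFakeKernel(k);
    }
  }

  static ThreadEntry* ThreadLocal() {
    static thread_local ThreadEntry entry;  // one instance per thread
    return &entry;
  }
};

// Simplified model of a lazy, per-thread kernel install.
FakeKernel InstallKernel(int slot) {
  assert(slot >= 0);
  ThreadEntry* t = ThreadEntry::ThreadLocal();
  const std::size_t idx = static_cast<std::size_t>(slot);
  if (t->kernel_table.size() <= idx) t->kernel_table.resize(idx + 1, kNoKernel);
  if (t->kernel_table[idx] == kNoKernel) t->kernel_table[idx] = CreateFakeKernel(slot);
  return t->kernel_table[idx];
}

// Creates and joins num_threads short-lived threads, one after another, and
// returns how many kernels created by them are still alive afterwards.
int KernelsLeftAfter(int num_threads, bool release_on_thread_exit) {
  g_release_on_thread_exit = release_on_thread_exit;
  const int before = g_live_kernels.load();
  for (int i = 0; i < num_threads; ++i) {
    std::thread worker([] { InstallKernel(0); });
    worker.join();
  }
  return g_live_kernels.load() - before;
}
```

Calling `KernelsLeftAfter(10, false)` leaves one live kernel per finished thread, while `KernelsLeftAfter(10, true)` leaves none, because the thread-local destructor runs before `join()` returns. In a real runtime the module may also keep its own references to the same kernels, so the per-thread release would have to be coordinated with the module's cleanup to avoid releasing a kernel twice.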

So I moved the kernel release into the OpenCLThreadEntry destructor, and this fixed my memory leak.
My commit with this change is here:

I am not familiar enough with the TVM runtime, so I would like to ask: is there a better way to fix this memory leak?

Thanks very much for any reply.