[RFC][VTA] Support for Cloud Devices (OpenCL-compatible)

A typical OpenCL kernel looks like

__kernel void helloworld(__global char* in, __global char* out)
{
	int num = get_global_id(0);
	out[num] = in[num] + 1;
}

where get_global_id(0) returns the calling work-item's index along the first global dimension, and the kernel uses the available hardware threads to compute along that dimension.
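To make the execution model concrete, here is a rough host-side sketch in plain C of what a 1-D launch of that kernel amounts to. It is only a sequential model: the names helloworld_item and run_1d are made up for illustration and are not part of OpenCL or VTA, and a real OpenCL runtime would spread the work-items over hardware threads instead of looping.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for the kernel body. The index that
   get_global_id(0) would return is passed in explicitly as gid. */
static void helloworld_item(size_t gid, const char *in, char *out)
{
    out[gid] = in[gid] + 1;
}

/* Hypothetical sequential model of a 1-D launch: one call per
   work-item, for gid = 0 .. global_size - 1. A real device would
   run these work-items in parallel rather than in a loop. */
static void run_1d(size_t global_size, const char *in, char *out)
{
    assert(in != NULL && out != NULL);
    for (size_t gid = 0; gid < global_size; ++gid)
        helloworld_item(gid, in, out);
}
```

For example, calling run_1d(10, "GdkknVnqkc", out) on a zero-initialized 11-byte buffer leaves "HelloWorld" in out, since every character is incremented by one.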

In addition, OpenCL was originally designed for general-purpose computing, whereas the design of VTA is domain-specific. I think bridging the OpenCL software stack onto the VTA hardware design would raise a lot of issues and would degrade the actual performance.