[DISCUSS] Module based Model Runtime Interface

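I used GCC's -ftime-report option to investigate this issue. The bottleneck is the parsing phase, not the optimization phase: parsing takes 125.91 s of the 137.87 s total wall time (91%), while "opt and generate" takes only 7.06 s. So if we could write the tvm_dev_mblob array into the object file directly, without going through the C parser, that should resolve it.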

Execution times (seconds)
 phase setup             :   0.00 ( 0%) usr   0.00 ( 0%) sys   0.00 ( 0%) wall    1408 kB ( 0%) ggc
 phase parsing           :  78.28 (89%) usr  47.62 (99%) sys 125.91 (91%) wall 14727874 kB (100%) ggc
 phase lang. deferred    :   3.02 ( 3%) usr   0.00 ( 0%) sys   3.01 ( 2%) wall       0 kB ( 0%) ggc
 phase opt and generate  :   6.81 ( 8%) usr   0.24 ( 1%) sys   7.06 ( 5%) wall       5 kB ( 0%) ggc
 phase finalize          :   0.00 ( 0%) usr   0.04 ( 0%) sys   1.80 ( 1%) wall       0 kB ( 0%) ggc
 garbage collection      :   3.02 ( 3%) usr   0.00 ( 0%) sys   3.01 ( 2%) wall       0 kB ( 0%) ggc
 callgraph construction  :   6.81 ( 8%) usr   0.24 ( 1%) sys   7.06 ( 5%) wall       5 kB ( 0%) ggc
 preprocessing           :  14.37 (16%) usr  24.75 (52%) sys  39.85 (29%) wall      23 kB ( 0%) ggc
 parser (global)         :  63.91 (73%) usr  22.87 (48%) sys  86.06 (62%) wall 14727850 kB (100%) ggc
 TOTAL                 :  88.11            47.98           137.87           14729306 kB

I also tested the tcc compiler, whose parsing is very fast: the same compilation takes only 5.04 s.
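
To make the "write the array into the object file directly" idea more concrete, here is a rough sketch of one possible approach, not a claim about how TVM does or should implement it. It contrasts the C source form, where every byte of the blob becomes a token the parser has to process, with a small assembler stub that uses the GNU assembler's `.incbin` directive to pull the raw bytes in without any C parsing. The helper names `emit_c_array` and `emit_incbin_asm` and the file name `params.bin` are made up for illustration, and the stub assumes an ELF target with the GNU toolchain (section and symbol naming differ on other platforms).

```python
def emit_c_array(blob: bytes, symbol: str = "tvm_dev_mblob") -> str:
    """C source form: one hex literal per byte, so the source is
    roughly five characters (and two tokens) per byte of the blob."""
    body = ",".join(f"0x{b:02x}" for b in blob)
    return f"const unsigned char {symbol}[{len(blob)}] = {{{body}}};\n"


def emit_incbin_asm(blob_path: str, symbol: str = "tvm_dev_mblob") -> str:
    """Assembler stub (hypothetical helper): the assembler copies the
    raw file into .rodata, so its size does not grow with the blob.
    Assumes GNU as and an ELF target."""
    return (
        "    .section .rodata\n"
        f"    .globl {symbol}\n"
        f"{symbol}:\n"
        f'    .incbin "{blob_path}"\n'
    )


if __name__ == "__main__":
    blob = bytes(1024 * 1024)  # stand-in for a 1 MiB serialized module
    c_src = emit_c_array(blob)
    asm_src = emit_incbin_asm("params.bin")
    print(f"C source size:   {len(c_src)} characters")
    print(f"asm stub size:   {len(asm_src)} characters")
```

Under these assumptions, the stub could be saved as a `.s` file and assembled with something like `gcc -c blob.s -o blob.o`. The point is only that the work the compiler front end does no longer scales with the blob size; emitting the object file from a code generator would be another way to reach the same goal.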