First of all, I’m not sure what path this discussion should take, i.e. whether an RFC needs to follow or not. This post is to present the idea and get some initial feedback.
Problem
LLVM maintains global state, and that global state can have an impact on the behavior of LLVM functions.
A specific example is the various flags that a clang user can pass to LLVM via the -mllvm option. For example, -mllvm -unroll-threshold=100 sets the threshold for loop unrolling to 100. Once set, however, it remains in place, even when generating code for a different target. Since TVM can generate code for multiple targets in the same compilation, this can become an issue. [Note: -mllvm is the clang option, and what follows it is the LLVM option. Naturally, there are ways to apply these LLVM options without clang.]
People working on individual targets may use -mllvm flags to fine-tune the LLVM codegen to their needs, or as workarounds for LLVM bugs, but these flags are typically applicable only to that target. Once applied, however, the options remain in effect “forever”; moreover, some of them can only be specified once, leading to an error (abort) in LLVM when they are applied a second time.
Solution
To solve this, we need a mechanism to “reset” the global state of LLVM back to its original values. The only mechanism that allows that (that I am aware of) is loading/unloading shared libraries. I propose to isolate the LLVM code generation into its own shared library. This library would be loaded (dlopen) when object code needs to be generated, and unloaded (dlclose) afterwards.
The JIT functionality would be accomplished by separating the codegen step from the execution step: the codegen library would generate an object file, which would then be loaded via a dynamic loader mechanism. This is what already happens anyway, except that it happens inside the ExecutionEngine; in this proposal the two steps would be separated.
Thanks for proposing this @kparzysz ! I think that could be a workable solution, though it comes with the drawback of adding an additional e.g. libtvm_llvm.so. I’m wondering if, since I believe we enumerate all of the llvm options in src/target/target_kind.cc, it would be enough to ensure we apply all of those and reset them to defaults when they’re not applied. Or, do they cause downstream global state to be modified which is hard to un-modify?
Also, even in the latest release of clang/LLVM, some options can only be specified once:
```
$ clang++ -O2 -mllvm -unroll-threshold=100 -mllvm -unroll-threshold=200 hello.cc
clang (LLVM option parsing): for the --unroll-threshold option: may only occur zero or one times!
```
This check has only recently been removed; the removal will take effect in clang/LLVM 15.
Yeah that makes sense. In that case it seems like we’re forced to load/unload.
Another thought I had was that at some point constraints like this may force us to split apart the core compiler. For example, an importer might become a subprocess which could live in a separate virtualenv.
Likewise, a codegen could follow the same path. The advantage of that is that then you can be certain that nobody else loaded libtvm_llvm.so (although I think that’s pretty unlikely and shouldn’t necessarily gate progress here). I think we’d need Artifact to land in order to pursue this with codegen, so that folks weren’t attaching non-serializable data structures to runtime::Module.
So I guess for next steps, it’d be great to sketch out a proposal/RFC of how we should do this. Maybe like a brief RFC capturing this thread plus a small PoC which could just grow in the PR would be sufficient…how does that sound?
Sounds like a positive goal that we can certainly pursue (making compilation state independent), although I am not too sure about dlopen/dlclose of LLVM.
Note that on certain platforms like Windows, or to avoid conflicts, there is actually a need to statically link LLVM with hidden symbols, so that it does not conflict with PyTorch.
Perhaps we should invest in a “reset LLVM state” function that makes a best effort at resetting the state.
Could you elaborate on the need for static linking? Is this something that Windows requires in every situation, or is it specific to PyTorch? Do you know how to reproduce this problem?
I think it will be easier to find solutions to such issues than it would be to reset LLVM state. LLVM uses a lot of global variables (often static), and relies on static constructors to do some work…
Edit:
If I understand correctly, PyTorch has some LLVM library code linked into it as well. The plan is to not expose LLVM functions outside of the shared library: all LLVM symbols can be local to it, and not visible outside. The shared library would have functions like runtime::Module compile(IRModule). That should avoid conflicts with other definitions of LLVM symbols coming from elsewhere.
I cannot disclose more, but I would say the use case is similar to the case where we want to distribute on Windows (@tqchen might have something sharable).
The main reason is an LLVM version conflict with other projects that also link to LLVM (when both are imported into the same process).
Say the official PyTorch build links against LLVM-10, while TVM builds and links against LLVM-11. If the symbols are exposed in the global symbol table, there will be symbol conflicts that lead to segfaults.
The safest approach is to link LLVM in while hiding LLVM’s symbols (so they do not appear to others and cause a conflict).
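For reference, the symbol hiding described above is typically achieved with build flags rather than code. The fragment below is a configuration sketch for a GNU toolchain (file names are illustrative): compile with hidden default visibility, then keep the statically linked LLVM archives out of the resulting library’s dynamic symbol table.

```
# Compile TVM's LLVM-facing code with hidden visibility by default.
clang++ -fvisibility=hidden -fvisibility-inlines-hidden -c llvm_codegen.cc

# Link the static LLVM libraries in, but do not re-export their symbols.
clang++ -shared -o libtvm.so llvm_codegen.o \
    $(llvm-config --libfiles) \
    -Wl,--exclude-libs,ALL
```

With this setup, only symbols explicitly marked for export (e.g. the compile entry points) appear in libtvm.so, so another copy of LLVM in the same process cannot collide with it.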
Got it. Yes, it is possible that this might resolve the problem.
This is just a personal opinion. Using library loading/unloading to erase the state is a bit like working around the problem in a non-traditional way of handling library dependencies. Additionally, there is the cost of loading/unloading each time.
Of course, when it comes to the need for isolation, we could choose solutions along this vein. A simpler one could be to just hide the build under a PopenWorker (which moves it to another process with a similar state). I would try to use a new process instead of loading/unloading if possible (as loading/unloading also comes with complications: searching the additional DLL path under env, Windows/Linux-specific dlopen behavior, etc.).
Ideally, we should be able to configure a PassManager pipeline that is somewhat invariant to the LLVM static configuration. I have not read this part deeply enough to say concretely that it is possible, but reading the LLVM docs, gatherUnrollingPreferences does come with function parameters that specify unrolling preferences. Of course it depends on how intertwined the LLVM codepaths are with the static cl options.
Another way is to invest in utility tools that reset the cl options to the desired state when entering an RAII scope, and recover them when exiting the scope. I am not deep enough into llvm::cl::opt to see whether that is possible, but it might be worth thinking about. Since cl::opt does come with operator=, perhaps we just need a way to get to the registered cl::opt and do the reset (instead of calling ProcessLLVMOption).
The dummy code below shows what I mean by that (although I am not sure how hard it would be to get this to work, depending on how LLVM structures these options and their registration).
A1 can also be done through other mechanisms, like process forking (PopenWorker); note that each comes with a cost (of creating process state). Also, I am not 100% sure how A2/A3 can be done; it will depend on LLVM’s mechanism for handling cl::opt, but the implementation of llvm::ParseCommandLineOptions might give us some insights about how to achieve A3.
```
[20:07:59] original value=0
[20:07:59] set opt=unroll-max-count value=1
[20:07:59] original value=1
[20:07:59] set opt=unroll-max-count value=2
```
We should be able to use llvm::cl::getRegisteredOptions() to get the option map, do an unsafe cast to the correct cl::opt data structure, obtain the old value, set the new value, and on RAII exit restore the old value.