Modularizing LLVM codegen/JIT

First of all, I’m not sure what path this discussion should take, i.e. whether an RFC needs to follow or not. This post is to present the idea and get some initial feedback.

Problem

LLVM maintains global state, and that global state can have an impact on the behavior of LLVM functions.

A specific example of that is the various flags that a clang user can pass to LLVM via the -mllvm option. For example, -mllvm -unroll-threshold=100 sets the threshold for loop unrolling to 100. Once set, however, it remains in place, even when generating code for a different target. Since TVM can generate code for multiple targets all in the same compilation, this can become an issue. [Note: the -mllvm part is a clang option; what follows it is the LLVM option. Naturally, there are ways to apply these LLVM options without clang.]
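For reference, a minimal sketch of how such an option can be applied programmatically, without clang (the option value here is just an example):

// Sketch: applying an LLVM command-line option from C++ code.
#include <llvm/Support/CommandLine.h>

void ApplyLLVMOptions() {
  // argv[0] is a dummy program name; the remaining entries are the LLVM options.
  const char* argv[] = {"tvm", "-unroll-threshold=100"};
  llvm::cl::ParseCommandLineOptions(2, argv);
  // The parsed values land in LLVM's global cl::opt registry and stay in
  // effect for every subsequent codegen invocation in this process.
}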

People working on individual targets may use -mllvm flags to fine-tune the LLVM codegen to their needs, or as workarounds for LLVM bugs, but these flags are only applicable to that target. However, the options remain effective “forever”; moreover, some of them can only be specified once, leading to an error (abort) in LLVM when they are applied a second time.

Solution

To solve this, we need a mechanism to “reset” the global variables in LLVM back to their original state. The only mechanism that allows that (that I am aware of) is loading/unloading shared libraries. I propose to isolate the LLVM code generation into its own shared library. This library would be loaded (dlopen) when object code needs to be generated, and unloaded (dlclose) afterwards.
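For illustration, a minimal sketch of what that could look like, assuming a hypothetical library name (libtvm_llvm.so) and entry point (TVMCompileModule):

// Sketch: load the codegen library, compile, then unload it so all of
// LLVM's global state is dropped.  Names are hypothetical.
#include <dlfcn.h>
#include <stdexcept>

void CompileWithFreshLLVM() {
  void* handle = dlopen("libtvm_llvm.so", RTLD_NOW | RTLD_LOCAL);
  if (handle == nullptr) throw std::runtime_error(dlerror());
  using CompileFn = void (*)();
  auto compile = reinterpret_cast<CompileFn>(dlsym(handle, "TVMCompileModule"));
  if (compile != nullptr) compile();
  // dlclose drops the library together with its global state (cl::opt values,
  // registries, results of static constructors), so the next dlopen starts
  // from a clean slate.
  dlclose(handle);
}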

The JIT functionality would be accomplished by separating the codegen step from the execution step: the codegen library would generate an object file, which would then be loaded via a dynamic loader mechanism. This is actually what already happens anyway, except that it happens inside the ExecutionEngine; in this proposal the two steps would be separated.
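As a rough sketch of that split (the file names and the use of the system linker are illustrative only, not a proposed design):

// Sketch: "compile to an object file, then load it" instead of in-process JIT.
#include <dlfcn.h>
#include <cstdlib>
#include <stdexcept>

void* LoadGeneratedCode() {
  // Step 1 (inside the codegen library): emit /tmp/gen.o, e.g. via
  // llvm::TargetMachine::addPassesToEmitFile.
  // Step 2 (outside the codegen library): make the object loadable and load
  // it with the ordinary dynamic loader.
  if (std::system("cc -shared -o /tmp/gen.so /tmp/gen.o") != 0)
    throw std::runtime_error("linking the generated object failed");
  void* lib = dlopen("/tmp/gen.so", RTLD_NOW | RTLD_LOCAL);
  if (lib == nullptr) throw std::runtime_error(dlerror());
  return lib;  // generated functions are then looked up with dlsym
}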


What are everyone’s thoughts about this?

2 Likes

Thanks for proposing this @kparzysz! I think that could be a workable solution, though it comes with the drawback of adding an additional library, e.g. libtvm_llvm.so. I’m wondering if, since I believe we enumerate all of the LLVM options in src/target/target_kind.cc, it would be enough to ensure we apply all of those and reset them to defaults when they’re not applied. Or do they cause downstream global state to be modified that is hard to un-modify?

These options can be anywhere, not just in target_kind. Check out this for example:

1 Like

Also, even in the latest release of clang/LLVM, some options can only be specified once:

$ clang++ -O2 -mllvm -unroll-threshold=100 -mllvm -unroll-threshold=200 hello.cc
clang (LLVM option parsing): for the --unroll-threshold option: may only occur zero or one times!

This check has only recently been removed; the removal will take effect in clang/LLVM 15.

Yeah that makes sense. In that case it seems like we’re forced to load/unload.

Another thought I had was that at some point constraints like this may force us to split apart the core compiler. For example, an importer might become a subprocess which could live in a separate virtualenv.

Likewise, a codegen could follow the same path. The advantage of that is that you can then be certain that nobody else has loaded libtvm_llvm.so (although I think that’s pretty unlikely and shouldn’t necessarily gate progress here). I think we’d need Artifact to land in order to pursue this with codegen, so that folks weren’t attaching non-serializable data structures to runtime::Module.

1 Like

So I guess for next steps, it’d be great to sketch out a proposal/RFC of how we should do this. Maybe a brief RFC capturing this thread plus a small PoC that could just grow in the PR would be sufficient… how does that sound?

Sounds good to me.


Making compilation state independent sounds like a positive goal that we can certainly pursue, although I am not too sure about dlopen/dlclose of LLVM.

Note that for certain targets like Windows, or to avoid conflicts, there is actually a need to statically link LLVM with hidden symbols so that it does not conflict with PyTorch.

Perhaps we should invest in a reset-LLVM-state function that does a best-effort reset of the state.

Could you elaborate on the need for static linking? Is this something that Windows requires in every situation, or is it specific to PyTorch? Do you know how to reproduce this problem?

I think it will be easier to find solutions to such issues than it would be to reset LLVM state. LLVM uses a lot of global variables (often static), and relies on static constructors to do some work…

Edit:

If I understand correctly, PyTorch has some LLVM library code linked into it as well. The plan would be to not expose LLVM functions outside of the shared library: all LLVM symbols can be local to it and not visible outside. The shared library would only export functions like runtime::Module compile(IRModule). That should avoid conflicts with other definitions of LLVM symbols coming from elsewhere.
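A minimal sketch of such an interface, assuming the library is built with -fvisibility=hidden and using a hypothetical function name:

// Sketch: the only symbol the codegen library would export; all LLVM symbols
// stay local to the library.
#include <tvm/ir/module.h>
#include <tvm/runtime/module.h>

__attribute__((visibility("default")))
tvm::runtime::Module TVMLLVMCompile(tvm::IRModule mod) {
  // ...call into the statically linked, hidden LLVM to generate code...
  return tvm::runtime::Module();  // placeholder
}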

Yeah, I like the idea of being able to reset global states :+1:

There were indeed use cases for statically linking with LLVM when I was working on a TVM-based training framework, to minimize dependencies.

Is there an alternative other than loading/unloading?

Do you remember any details? Was LLVM used for anything other than code generation for existing targets?

1 Like

I cannot disclose more, but I would say the use case is similar to the case where we want to distribute on Windows (@tqchen might have something sharable).

The main reason is an LLVM version conflict with other projects that also link to LLVM (when both are imported into the same process).

Say the official PyTorch build links against LLVM-10, and TVM is built and linked against LLVM-11. If the symbols are exposed in the global symbol table, there will be symbol conflicts that lead to segfaults.

The safest approach is to link LLVM in while hiding LLVM’s symbols (so they do not appear to others and cause a conflict).

See https://github.com/apache/tvm/issues/9362

I see. I think the approach from my reply here (see the edit) would take care of this.

1 Like

Got it. Yes, it is possible that this would resolve the problem.

This is just a personal opinion. Using library loading/unloading to erase the state is a bit like working around the problem in a non-traditional way of handling library dependencies. Additionally, there is the cost of loading/unloading each time.

Of course, when there is a need for isolation, we could choose solutions in this vein. A simpler one could be to just hide the build under a PopenWorker (which brings it into another process with a similar state). I would try to use a new process instead of loading/unloading if possible (as loading/unloading also comes with complications such as searching for the additional DLL path in the environment, Windows/Linux-specific dlopen behavior, etc.).

Ideally, we should be able to configure a PassManager pipeline that is somewhat invariant to the LLVM static configuration. I have not read this part deeply enough to say concretely that it is possible, but reading the LLVM docs, gatherUnrollingPreferences does come with function parameters that specify unrolling preferences. Of course, it depends on how intertwined the LLVM code path is with the static cl options.

Another way is to invest in utility tools that reset the cl options to the desired state when entering an RAII scope, and restore them when exiting the scope. I am not deep enough into llvm::cl::opt to know whether that is possible, but it might be worth thinking about. Since cl::opt does come with operator=, perhaps we just need a way to get to the registered cl::opt and do the reset (instead of calling ProcessLLVMOption).

The dummy code below shows what I mean by that (although I am not sure how hard it would be to get this to work; it depends on how LLVM structures these options and their registration).

// hypothetical code
void CodegenFunc() {
  With<LLVMOptionScope<int>> outer("unroll-threshold", 10);
  {
    With<LLVMOptionScope<int>> inner("unroll-threshold", 100);
    CHECK_EQ(GetLLVMOption<int>("unroll-threshold"), 100);
  }
  CHECK_EQ(GetLLVMOption<int>("unroll-threshold"), 10);
}

1 Like

Both of your opinions make sense to me, and each of them tackles a problem worth solving. Let me organize our options into A1/A2/A3:

  • A1. Use shared library loading/unloading to clear global states
  • A2. Configure a PassManager pipeline that is somewhat invariant to the LLVM static configuration
  • A3. Use TVM’s existing With RAII scope mechanism to turn global states on/off

Did I get it right?

A1 can also be done with other mechanisms, like process forking (PopenWorker); note that each one comes with a cost (of creating process state). Also, I am not 100% sure how A2/A3 can be done; it will depend on LLVM’s mechanism for handling cl::opt, but the implementation of llvm::ParseCommandLineOptions might give us some insight into how to achieve A3.

OK, did some fun explorations and confirmed that A3 can be done through the LLVM API. Here is some example code that demonstrates how to set a static cl option:

// C++ code
void PlayLLVMOption(std::string name, int value) {
  // Hack to get to the registered option list
  llvm::StringMap<llvm::cl::Option*>& opt_map = llvm::cl::getRegisteredOptions();
  auto it = opt_map.find(name);
  if (it != opt_map.end()) {
    auto ptr = static_cast<llvm::cl::opt<int>*>(it->second);
    LOG(INFO) << "original value=" << *ptr;
    *ptr = value;
    LOG(INFO) << "set opt=" << name << " value=" << value;
  }
}

TVM_REGISTER_GLOBAL("testing.play_llvm_opt").set_body_typed(PlayLLVMOption);

Python code

import tvm.testing._ffi_api

tvm.testing._ffi_api.play_llvm_opt("unroll-max-count", 1)
tvm.testing._ffi_api.play_llvm_opt("unroll-max-count", 2)

Output

[20:07:59]  original value=0
[20:07:59]  set opt=unroll-max-count value=1
[20:07:59]  original value=1
[20:07:59]  set opt=unroll-max-count value=2

We should be able to use llvm::cl::getRegisteredOptions() to get the option map, do an unsafe cast to the correct cl::opt data structure to obtain the old value, set the new value, and restore the old value on RAII exit.
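A minimal sketch of such an RAII helper, building on the snippet above (the class name is hypothetical, and the caller must know the option’s real value type):

// Sketch: save the old cl::opt value on entry, restore it on exit.
#include <string>
#include <llvm/Support/CommandLine.h>

template <typename T>
class LLVMOptionScope {
 public:
  LLVMOptionScope(const std::string& name, T value) {
    llvm::StringMap<llvm::cl::Option*>& opt_map = llvm::cl::getRegisteredOptions();
    auto it = opt_map.find(name);
    if (it != opt_map.end()) {
      // Unsafe downcast: correct only if the option really stores a T.
      opt_ = static_cast<llvm::cl::opt<T>*>(it->second);
      old_value_ = *opt_;
      *opt_ = value;
    }
  }
  ~LLVMOptionScope() {
    if (opt_ != nullptr) *opt_ = old_value_;
  }

 private:
  llvm::cl::opt<T>* opt_ = nullptr;
  T old_value_{};
};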

1 Like

Just for discussion reference, here is a PR that implements A3: https://github.com/apache/tvm/pull/11320

This clean implementation of A3 makes a lot of sense to me in terms of functionality and simplicity.