Modularizing LLVM codegen/JIT

Also, even in the latest release of clang/LLVM, some options can only be specified once:

$ clang++ -O2 -mllvm -unroll-threshold=100 -mllvm -unroll-threshold=200 hello.cc
clang (LLVM option parsing): for the --unroll-threshold option: may only occur zero or one times!

This check was removed only recently, and the removal will take effect in clang/LLVM 15.

Yeah that makes sense. In that case it seems like we’re forced to load/unload.

Another thought I had was that at some point constraints like this may force us to split apart the core compiler. For example, an importer might become a subprocess which could live in a separate virtualenv.

Likewise, a codegen could follow the same path. The advantage of that is that then you can be certain that nobody else loaded libtvm_llvm.so (although I think that’s pretty unlikely and shouldn’t necessarily gate progress here). I think we’d need Artifact to land in order to pursue this with codegen, so that folks weren’t attaching non-serializable data structures to runtime::Module.

So I guess for next steps, it’d be great to sketch out a proposal/RFC of how we should do this. Maybe like a brief RFC capturing this thread plus a small PoC which could just grow in the PR would be sufficient…how does that sound?

Sounds good to me.

Making compilation state independent sounds like a positive goal that we can certainly pursue, although I am not too sure about dlopen/dlclose of LLVM.

Note that for certain targets like Windows, or to avoid conflicts, there is actually a need to statically link LLVM with hidden symbols, so that it does not conflict with PyTorch.

Perhaps we should invest in a reset-LLVM-state function that makes a best effort at resetting the state.

Could you elaborate on the need for static linking? Is this something that Windows requires in every situation, or is it specific to PyTorch? Do you know how to reproduce this problem?

I think it will be easier to find solutions to such issues than it would be to reset LLVM state. LLVM uses a lot of global variables (often static), and relies on static constructors to do some work…

Edit:

If I understand correctly, PyTorch has some LLVM library code linked into it as well. The plan would not expose LLVM functions outside of the shared library. All LLVM symbols can be local to it, and not visible outside. The shared library would have functions like runtime::Module compile(IRModule). That should avoid conflicts with other definitions of LLVM symbols coming from elsewhere.

Yeah, I like the idea of being able to reset global state :+1:

There are indeed use cases for statically linking LLVM. I ran into one while working on a TVM-based training framework, where the goal was to minimize dependencies.

Is there an alternative to loading/unloading?

Do you remember any details? Was LLVM used for anything other than code generation for existing targets?

I cannot disclose more, but I would say the use case is similar to the case where we want to distribute on Windows (@tqchen might have something shareable).

The main reason is LLVM version conflicts with other projects that also link against LLVM (when both are imported into the same process).

Say the official PyTorch build links against LLVM-10, and TVM is built and linked against LLVM-11. If the symbols are exposed in the global symbol table, there will be a symbol conflict that leads to a segfault.

The safest approach is to link LLVM in while hiding LLVM’s symbols (so that they are not visible to others and cannot cause a conflict).

See https://github.com/apache/tvm/issues/9362

I see. I think the approach from my reply here (see the edit) would take care of this.

Got it. Yes, it is possible that this resolves the problem.

This is just a personal opinion: using loading/unloading to erase state is a bit like working around the problem, and is a non-traditional way to tackle library dependencies. Additionally, loading/unloading each time comes at a cost.

Of course, when isolation is needed, we could choose solutions in this vein. A simpler one could be to just hide the build under a PopenWorker (which moves it to another process with a similar state). I would try to use a new process instead of loading/unloading if possible (as loading/unloading also comes with complications, such as searching for the additional DLL path in the environment and Windows/Linux-specific dlopen handling).
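To make the process-isolation idea concrete, here is a minimal Python sketch. It is not TVM's PopenWorker; it uses only the standard library's subprocess module. The `run_isolated` helper, the `CHILD_SCRIPT` string, and the `OPTIONS` dict standing in for LLVM's process-global option state are all hypothetical names chosen for illustration.

```python
import json
import subprocess
import sys

# Hypothetical stand-in for process-global state such as LLVM's cl::opt values.
OPTIONS = {"unroll-threshold": 0}

# Script run by the child interpreter: it mutates its own copy of the
# global state and prints the state it ended up with.
CHILD_SCRIPT = """
import json, sys
OPTIONS = {"unroll-threshold": 0}
OPTIONS.update(json.loads(sys.argv[1]))
print(json.dumps(OPTIONS))
"""


def run_isolated(overrides):
    """Run the stand-in build in a fresh process and return its final state."""
    result = subprocess.run(
        [sys.executable, "-c", CHILD_SCRIPT, json.dumps(overrides)],
        capture_output=True,
        text=True,
        check=True,
    )
    return json.loads(result.stdout)


# Two "builds" with different settings; neither sees the other's changes,
# and the parent's state is never touched.
first = run_isolated({"unroll-threshold": 100})
second = run_isolated({"unroll-threshold": 200})
```

Each call pays the cost of starting a new process, which is the trade-off mentioned above.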

Ideally, we should be able to configure a PassManager pipeline that is somewhat invariant to LLVM's static configuration. I have not read this part deeply enough to say concretely whether it is possible, but according to the LLVM docs, gatherUnrollingPreferences does come with function parameters that specify unrolling preferences. Of course, it depends on how intertwined the LLVM code path is with the static cl options.

Another way is to invest in utilities that set the cl options to the desired state when entering an RAII scope, and restore them when exiting the scope. I am not deep enough into llvm::cl::opt to know whether that is possible, but it might be worth thinking about. As cl::opt does come with operator=, perhaps we just need a way to get to the registered cl::opt and do the reset (instead of calling ProcessLLVMOption).

The dummy code below shows what I mean by that (although I am not sure how hard it is to get this to work, depending on how LLVM structures these options and their registration):

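// hypothetical code
void CodegenFunc() {
    // A named With<> object keeps the scope alive until the end of the enclosing block.
    With<LLVMOptionScope<int>> outer("unroll-threshold", 10);
    {
        With<LLVMOptionScope<int>> inner("unroll-threshold", 100);
        CHECK_EQ(GetLLVMOption<int>("unroll-threshold"), 100);
    }
    // Leaving the inner block restores the outer value.
    CHECK_EQ(GetLLVMOption<int>("unroll-threshold"), 10);
}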

Both of your opinions make sense to me, and both address a problem worth tackling. Let me organize our options into A1/A2/A3:

  • A1. Use shared library loading/unloading to clear global states
  • A2. Configure a PassManager pipeline that is somewhat invariant to the LLVM static configuration
  • A3. Use TVM’s existing With RAII scope mechanism to set and restore global state

Did I get it right?

A1 can also be done through other mechanisms, like process forking (PopenWorker); note that each of them comes with a cost (of creating process state). Also, I am not 100% sure how A2/A3 can be done. It will depend on LLVM’s mechanism for handling cl::opt, but the implementation of llvm::cl::ParseCommandLineOptions might give us some insight into how to achieve A3.

OK, I did some fun exploration and confirmed that A3 can be done through the LLVM API. Here is example code that demonstrates how to set a static option:

// C++ code
#include <llvm/ADT/StringMap.h>
#include <llvm/Support/CommandLine.h>
#include <tvm/runtime/logging.h>
#include <tvm/runtime/registry.h>

#include <string>

void PlayLLVMOption(std::string name, int value) {
  // Hack to get the map of registered options
  llvm::StringMap<llvm::cl::Option*>& opt_map = llvm::cl::getRegisteredOptions();
  auto it = opt_map.find(name);
  if (it != opt_map.end()) {
    // Unsafe cast: only valid if the option was declared as cl::opt<int>
    auto ptr = static_cast<llvm::cl::opt<int>*>(it->second);
    LOG(INFO) << "original value=" << *ptr;
    *ptr = value;
    LOG(INFO) << "set opt=" << name << " value=" << value;
  }
}

TVM_REGISTER_GLOBAL("testing.play_llvm_opt").set_body_typed(PlayLLVMOption);

Python code

import tvm.testing._ffi_api

tvm.testing._ffi_api.play_llvm_opt("unroll-max-count", 1)
tvm.testing._ffi_api.play_llvm_opt("unroll-max-count", 2)

Output

[20:07:59]  original value=0
[20:07:59]  set opt=unroll-max-count value=1
[20:07:59]  original value=1
[20:07:59]  set opt=unroll-max-count value=2

We should be able to use llvm::cl::getRegisteredOptions() to get the option map, do an unsafe cast to the correct cl::opt type, read the old value, set the new value, and restore the old value when the RAII scope exits.
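The save/set/restore sequence described here can be sketched in Python as a context manager, which plays the role of the C++ RAII scope. This is only an illustration of the control flow, not the LLVM or TVM implementation: `REGISTERED_OPTIONS` is a hypothetical dict standing in for the map returned by llvm::cl::getRegisteredOptions(), `option_scope` is a hypothetical helper, and the starting values are arbitrary.

```python
from contextlib import contextmanager

# Hypothetical stand-in for LLVM's registered option map; values are arbitrary.
REGISTERED_OPTIONS = {"unroll-threshold": 150, "unroll-max-count": 0}


@contextmanager
def option_scope(name, value):
    """Set an option for the duration of a with-block, then restore it."""
    if name not in REGISTERED_OPTIONS:
        raise KeyError(f"unknown option: {name}")
    old_value = REGISTERED_OPTIONS[name]  # read the old value
    REGISTERED_OPTIONS[name] = value      # set the new value
    try:
        yield
    finally:
        # Runs on normal exit and on exceptions, like a C++ destructor.
        REGISTERED_OPTIONS[name] = old_value


observed = []
with option_scope("unroll-threshold", 10):
    with option_scope("unroll-threshold", 100):
        observed.append(REGISTERED_OPTIONS["unroll-threshold"])  # inner scope
    observed.append(REGISTERED_OPTIONS["unroll-threshold"])      # outer scope
observed.append(REGISTERED_OPTIONS["unroll-threshold"])          # no scope
```

Nested scopes unwind in reverse order, so each exit restores exactly the value that was in effect when that scope was entered.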

Just for discussion reference, here is a PR that implements A3 https://github.com/apache/tvm/pull/11320

This clear implementation of A3 makes a lot of sense to me in terms of functionality and simplicity

Interesting. Thanks for checking!

Edit: If this approach works, I’ll change the plan to use that instead of dlopen/dlclose.

Can we wait a bit with this PR, until it’s clearer what mechanism we will need?

Of course, the PR is mainly to demonstrate the mechanism. We can wait until we agree on the right mechanism
