Introducing and modernizing the FFI system

As we capture several years of lessons, it is important for us to summarize them and build solid modules for future development. We are happy to share that we have reached a milestone in modernizing the FFI foundation of the project. Specifically, we introduce tvm ffi, a new minimal and lightweight module built on our lessons from the past few years. It implements a modern version of the Unified Packed and Object RFC, which unifies the packed function call and object systems.

Summary of the change:

  • Dedicated, clean Any/AnyView types that can store both strong (owning) and weak (non-owning) references to items
  • A Function (previously PackedFunc) system built on top of Any/AnyView
  • A minimal C API that backs all calls. We are stabilizing the API with the goal of clean, stable FFI conventions for both compiled and registered code
  • A rewrite of the core Python bindings and generated code on top of the new module
  • Updates to existing code and test cases to use the new module
  • Support for the latest DLPack

The new module brings many benefits thanks to the cleaner design, to name a few:

  • Any can hold both POD types (e.g. int) and object types
  • Containers (e.g. Array) can now contain Any values, so Array<int> is supported directly with no need for boxed types
  • Error handling is now object-based, allowing cleaner tracebacks across languages
  • Map now preserves insertion order
  • A path toward an isolated, stable, minimal core ABI/API foundation module
  • A type-traits-based design that cleanly defines how values interact with the Any system
  • Automatic conversion between types based on traits when needed
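To make the trait-based design above more concrete, here is a toy Python model. It is a sketch under our own assumptions, not the tvm ffi implementation: the names ToyAny, Trait, to_any and cast, and the numeric type indices, are all hypothetical. It only illustrates three ideas from the list: a tagged Any that stores POD values inline and objects by reference, containers of Any values that need no boxed int type, and per-type traits that decide which conversions are allowed (here the float trait also accepts ints).

```python
from dataclasses import dataclass
from typing import Any as PyAny, Callable, Dict

# Hypothetical type indices, for illustration only (not tvm ffi's real values).
TYPE_NONE, TYPE_INT, TYPE_FLOAT, TYPE_OBJECT = 0, 1, 2, 64
_FAIL = object()  # sentinel: "this trait cannot convert the value"


@dataclass(frozen=True)
class ToyAny:
    """A tagged value: PODs are stored inline, objects by reference."""
    type_index: int
    value: PyAny


@dataclass(frozen=True)
class Trait:
    """Per-type rules describing how a Python type enters and leaves ToyAny."""
    type_index: int
    try_from: Callable[[ToyAny], PyAny]


TRAITS: Dict[type, Trait] = {
    int: Trait(TYPE_INT, lambda a: a.value if a.type_index == TYPE_INT else _FAIL),
    # The float trait also accepts ints and converts them automatically.
    float: Trait(
        TYPE_FLOAT,
        lambda a: float(a.value) if a.type_index in (TYPE_INT, TYPE_FLOAT) else _FAIL,
    ),
}


def to_any(v: PyAny) -> ToyAny:
    """Wrap a Python value, using its trait if one is registered."""
    if v is None:
        return ToyAny(TYPE_NONE, None)
    trait = TRAITS.get(type(v))
    if trait is not None:
        return ToyAny(trait.type_index, v)  # POD: stored directly, no box object
    return ToyAny(TYPE_OBJECT, v)  # object: keep a reference to the same instance


def cast(a: ToyAny, target: type) -> PyAny:
    """Convert out of ToyAny, raising TypeError if no rule allows it."""
    trait = TRAITS.get(target)
    if trait is not None:
        result = trait.try_from(a)
        if result is not _FAIL:
            return result
    elif a.type_index == TYPE_OBJECT and isinstance(a.value, target):
        return a.value
    raise TypeError(f"cannot convert type index {a.type_index} to {target.__name__}")


# A "container of Any": ints are stored directly, like Array<int> without boxing.
arr = [to_any(i) for i in (1, 2, 3)]
print([cast(x, int) for x in arr])  # [1, 2, 3]
print(cast(arr[0], float))          # 1.0 (int converted by the float trait)
```

Keeping the conversion rules in one table per type is what lets the container and the conversion logic stay generic in this toy; adding a new type means adding one trait entry rather than touching every call site.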

Because the FFI upgrade is at the heart of the project, the change touches every component of the system. Importantly, this is an ABI upgrade, so the change is not backward compatible: code compiled under the old FFI won't work under the new one. We do provide example ABI translation functions (e.g. LegacyTVMArgValueToFFIAny) for compatibility. The PR tries to leave files in their old places while creating redirections. The goal is to land the first milestone and put the infrastructure in place, so we can complete features and clean up legacy code through further trackable PRs. As of now, the Python binding and compiled code follow the new convention, while RPC and some other bindings still rely on legacy ABI translation. We will work on upgrades in the coming PRs, including areas such as reflection and phasing out legacy redirections.

Please check out the PR here, and feel free to bring up questions. We will also update this post with the latest updates on the FFI.

The changes started from an initial implementation of the module, developed in collaboration with @junrushao one year ago.

It then evolved independently, driven by the needs of a full in-tree upgrade, while staying minimal and lightweight.

Upgrade Note

Some upgrade notes for dependent code:

  • For general containers like Map<ObjectRef, ObjectRef>, consider using Map<Any, Any> instead
  • Use ffi::Function in place of PackedFunc
  • Any now requires an explicit cast via cast<T>() or as<T>() (which returns an optional) for better type safety
    • In most cases, you can insert args[i].cast<T>() to explicitly cast to T, or use the typed version
  • Check out the test cases for example usages
  • Where some form of boxing was needed before, attributes now mostly become POD types, e.g. we now use int64_t and bool for int and bool attributes
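The difference between the strict cast and the optional-returning as can be sketched in Python as follows. ArgValue, cast, as_ and scale are hypothetical names used only for illustration (as is a reserved word in Python, hence as_). The real C++ API is Any::cast<T>() and as<T>(); unlike this toy, which only does an isinstance check, it may also apply trait-based conversions.

```python
from typing import Optional, Type, TypeVar

T = TypeVar("T")


class ArgValue:
    """Hypothetical stand-in for a type-erased FFI argument (not a tvm class)."""

    def __init__(self, value: object) -> None:
        self._value = value

    def cast(self, ty: Type[T]) -> T:
        # Strict conversion: a type mismatch is an error.
        if isinstance(self._value, ty):
            return self._value
        raise TypeError(f"expected {ty.__name__}, got {type(self._value).__name__}")

    def as_(self, ty: Type[T]) -> Optional[T]:
        # Checked conversion: a type mismatch yields None instead of raising.
        return self._value if isinstance(self._value, ty) else None


def scale(*args: ArgValue) -> float:
    """Hypothetical packed-style function: positional args arrive type-erased."""
    x = args[0].cast(float)  # required argument: fail loudly on a wrong type
    factor = args[1].as_(float) if len(args) > 1 else None  # optional argument
    return x * (factor if factor is not None else 1.0)


print(scale(ArgValue(2.0), ArgValue(3.0)))  # 6.0
print(scale(ArgValue(2.0)))                 # 2.0
```

The usual rule of thumb in this sketch: use cast where a wrong type is a caller bug, and as where you want to branch on the runtime type yourself.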

There will also be follow-up PRs to remove some of the legacy files and redirections. The first commit contains some of the redirections (e.g. using PackedFunc = ffi::Function), so it might be useful to first rebase onto the first PR commit, then move on to the follow-up PRs.

Follow-up on DLPack-based enhancements: we can now take torch tensors as arguments, see https://github.com/apache/tvm/pull/17927
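For readers unfamiliar with DLPack, the consumer side of the exchange can be sketched with NumPy. This is not the tvm ffi code path from the PR; it only shows the standard Python protocol (`__dlpack__` / `__dlpack_device__`) that lets one library accept another library's tensor without copying. `import_cpu_tensor` is a hypothetical helper, and the sketch assumes a NumPy recent enough to provide `np.from_dlpack`.

```python
import numpy as np

KDL_CPU = 1  # DLDeviceType value for CPU in the DLPack spec (dlpack.h)


def import_cpu_tensor(obj):
    """Zero-copy import of any DLPack producer (numpy, torch, ...) living on CPU."""
    if not hasattr(obj, "__dlpack__"):
        raise TypeError(f"{type(obj).__name__} does not implement __dlpack__")
    # The producer reports where its memory lives as (device_type, device_id).
    device_type, _device_id = obj.__dlpack_device__()
    if device_type != KDL_CPU:
        raise ValueError("this sketch only handles CPU tensors")
    # np.from_dlpack consumes the producer's DLPack capsule; no data is copied.
    return np.from_dlpack(obj)


src = np.arange(4, dtype=np.float32)
view = import_cpu_tensor(src)
src[0] = 42.0
print(view[0])  # the write is visible through the view: both share one buffer
```

A CPU torch tensor should work the same way, since torch tensors implement `__dlpack__` and `__dlpack_device__` as well.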

Recently we upgraded our work to the newest code. Honestly, it was hard work, but it was worth it: everybody can gain a lot from the new FFI's design and implementation.

The new FFI drops the ctypes backend and only provides a Cython one, which makes debugging a little harder. If you want to debug the argument-processing logic, remember the C function names below and set breakpoints on them in gdb:

  • __pyx_f_3tvm_3ffi_4core_make_args
  • __pyx_f_3tvm_3ffi_4core_FuncCall
  • TVMFFIFunctionCall
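If you do this often, a small helper that builds the gdb command line can save some typing. This is a sketch: `gdb_command` is a hypothetical helper, and `debug.py` is a placeholder for your own script. `set breakpoint pending on` matters here because the Cython extension is loaded as a shared library only after the Python interpreter starts, so its symbols are not yet known when gdb processes the `break` commands.

```python
import shlex
import sys
from typing import List, Sequence

# C symbols named above: Cython-generated functions in tvm.ffi.core, plus the C API entry.
FFI_BREAKPOINTS = (
    "__pyx_f_3tvm_3ffi_4core_make_args",
    "__pyx_f_3tvm_3ffi_4core_FuncCall",
    "TVMFFIFunctionCall",
)


def gdb_command(
    script: str,
    symbols: Sequence[str] = FFI_BREAKPOINTS,
    python: str = sys.executable,
) -> List[str]:
    """Build a gdb command line that breaks in the FFI argument-processing path."""
    # Let gdb accept breakpoints on symbols from libraries that are not loaded yet.
    cmd = ["gdb", "-ex", "set breakpoint pending on"]
    for sym in symbols:
        cmd += ["-ex", f"break {sym}"]
    # Start the program right away; everything after --args is the debuggee's argv.
    cmd += ["-ex", "run", "--args", python, script]
    return cmd


if __name__ == "__main__":
    # Print a shell-ready command line you can paste into a terminal.
    print(shlex.join(gdb_command("debug.py")))
```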


Note that the new FFI has better backtrace support through libbacktrace. If an exception is thrown from the C++ side, its trace is now incorporated into the Python traceback, and the frames are clickable in editors such as VS Code. For example:

from tvm import ffi as tvm_ffi

def error_trace():
    # call a test function that raises error from c++ end
    cxx_test_raise_error = tvm_ffi.get_global_func("testing.test_raise_error")
    cxx_test_raise_error("ValueError", "error XYZ")

error_trace()

This prints something like:

Traceback (most recent call last):
  File "/tvm/apps/debug.py", line 25, in <module>
    error_trace()
  File "/tvm/apps/debug.py", line 23, in error_trace
    cxx_test_raise_error("ValueError", "error XYZ")
  File "tvm/ffi/cython/function.pxi", line 228, in tvm.ffi.core.Function.__call__
    raise move_from_last_error().py_error()
  File "/tvm/ffi/src/ffi/testing.cc", line 30, in tvm::ffi::TestRaiseError(tvm::ffi::String, tvm::ffi::String)
    throw ffi::Error(kind, msg, TVM_FFI_TRACEBACK_HERE);
ValueError: error XYZ
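As a rough illustration of how a C++ frame can appear inside a Python traceback, one well-known trick (Jinja2 uses a similar one to surface template lines) is to raise the exception from a tiny synthetic code object whose filename and line number point at the foreign source location. We have not verified that tvm ffi uses exactly this mechanism; `raise_with_foreign_frame` is a hypothetical helper, and the file path and line number below are simply copied from the sample output.

```python
import traceback


def raise_with_foreign_frame(exc: BaseException, filename: str, lineno: int, funcname: str) -> None:
    """Raise `exc` from a synthetic frame that reports `filename:lineno`."""
    # Pad with blank lines so the `raise` statement sits on line `lineno`.
    source = "\n" * (lineno - 1) + "raise __exc__"
    code = compile(source, filename, "exec")
    # Rename the synthetic frame so tracebacks show the foreign function name.
    fields = {"co_name": funcname}
    if hasattr(code, "co_qualname"):  # Python 3.11+
        fields["co_qualname"] = funcname
    code = code.replace(**fields)
    exec(code, {"__exc__": exc})


if __name__ == "__main__":
    try:
        raise_with_foreign_frame(
            ValueError("error XYZ"), "/tvm/ffi/src/ffi/testing.cc", 30, "TestRaiseError"
        )
    except ValueError:
        # The last frame printed points at testing.cc, line 30, in TestRaiseError.
        traceback.print_exc()
```

Because the frame carries a real file path and line number, editors that linkify tracebacks can jump straight to the C++ source, which is what makes the frames clickable.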

Yes, I love it! For a Python debugger I prefer pudb, and this feature lets me move up and down the call stack across both Python and C++ code very smoothly, just like walking through a pure C++ program's stack in the gdb TUI. So nice.

But I found that the RPC exception error message and call stack can be displayed on the client side.