[RFC] Setting up Mypy type checking for TVM codebase

kazimuth · February 24, 2021, 6:39pm

Mypy is a static type checker for Python. It is run as a separate tool (like a linter) that looks over your codebase and finds typing errors.

Here’s a made-up example. Say you’ve got the code:

def count_layers(expr: Union[tvm.relay.Expr, tvm.relay.Function], valid_ops: List[str]) -> int:
    count_pass = LayerCounter(valid_ops)
    count_pass.visit(expr)
    # ...
    return "whoops, this isn't an int!!"

when you run mypy, you’ll get an error:

$ mypy python/tvm
tvm/relay/analysis/count_layers.py:2: error: Incompatible return value type (got "str", expected "int")

(Mypy can also catch lots of other subtler errors, this is just a simple example.)

Note that Mypy doesn’t actually run any of your code! It just parses it and looks at type annotations.

I have been working with TVM some, and I like having Mypy type checking when working in a large codebase. I was considering making a PR to add it to TVM, but before I dove into actually working on that I wanted to see if there’s interest. I think that setting up Mypy could have many benefits. The main reason that this change would not be desirable is that it would require workflow changes for all TVM contributors, which may be too much of a hassle.

Imo the real benefit of having Mypy is that it encourages developers to use type signatures throughout the codebase. This makes navigating and understanding unfamiliar code much easier. It also makes refactoring easier, since if you change the type of a function, you will immediately get a bunch of typing errors and know what else you need to change, without running tests. It also helps IDEs give much better hints about code as you write it, which may be especially helpful for TVM users (that use IDEs).

TVM actually has types currently, but they mostly live in doc comments, rather than as something that can be statically checked. I’m proposing to change types in doc comments to actual type annotations, and then set up Mypy to check those annotations. The checking can be viewed as an additional testing step, kind of like a linter. It adds no overhead at runtime, and would probably want to be run in CI.

Challenges

Workflow changes

This change may imply workflow changes for all TVM devs. If type-checking is run in CI then everybody needs to learn how to deal with and fix Mypy type errors. If type-checking is not run in CI, type annotations may go out of date, and Mypy errors will accumulate.

Dependencies

Many of TVM’s dependencies do not have types. This may be changing though; for instance, numpy recently added type annotations. Also, mypy is fine mixing typed and untyped code, so we can just tell it to ignore those dependencies for now (treat them as type Any) in the mypy config file. See, for example, the mypy config file for the XArray project, which is typed despite having many untyped dependencies.

I think we would actually want to do this for Numpy as well since we can’t guarantee that numpy 1.20 will be installed on users’ machines for the forseeable future; previous versions didn’t have types.

FFI

TVM has APIs defined at runtime through FFI, which are difficult to annotate types for, and will need to be kept in sync with the TVM C++ codebase. I see 3 possible solutions for this:

Annotate all FFI calls with #type: ignore comments, which will instruct Mypy not to check them. This is a pain and requires pervasive changes to the codebase.
Create .pyi stub files for all FFI modules, either by hand or through an automated script.
Create a Mypy plugin to override type checking for all TVM FFI modules.

Of these, I think the best option would be manual maintenance of ffi stub files, possibly assisted by scripts. For example we could have a script “generate_ffi_stubs.py” that you run whenever you make a change to the global registry on the C++ side. The stubs could be fairly lightweight, we could just type all PackedFuncs and types exposed to python as “Any”. This is a workflow change that everybody would have to get used to though.

Alternatively we could try and create a mypy plugin to do this automatically whenever mypy is invoked. But I’m not sure it’s actually possible to inject names into a module using a mypy plugin; see the plugin API. Basically there’s no way to inject names into a module when you see a particular function call (i.e. tvm._ffi._init_api).

False positives

In using Mypy I’ve noticed that it complains about some things that you might not typically think of as “errors” in python code. For instance, mypy tracks whether values can be None, and throws an error when you pass a possibly-None value to a function that is not annotated as accepting None. It will also complain if you change the API of a method in a subclass compared to a superclass.

These errors can be annoying in the short-term but I think in the long term they may be beneficial, in encouraging more rigor in the TVM codebase. Of course you can always suppress type errors by commenting # type: ignore on the relevant line of code, so they shouldn’t end up blocking anything in the short term.

Mypy Setup Documentation

Users will immediately get the benefits from having type annotations in TVM in terms of IDE support. If they also want to run the Mypy checker on their own code, they’ll probably need to set up some custom configuration for Mypy based on TVM. This could be added as a new documentation page / tutorial.

Conclusion

So yeah, that’s the idea. I think this change could improve the reliability and accessibility of the TVM codebase, at the cost of everybody learning a new workflow. I think this could be done in a couple of PRs; and maybe the type-checking could be kept as a non-blocking error in CI for a while as people get used to it. I’d be happy to make the PRs, I’m familiar with refactoring codebases to use Mypy.

Key questions:

Do people think this change would be worthwhile to make? Are there reservations?
How should we manage typing for FFI modules?
How in-depth do we want to set up types? We could do fairly lightweight types for most of TVM and leave invariants to be checked at runtime, or try and set up smarter typings that allow more things to be checked during type-checking. This can get essentially as fancy as we want using a Mypy plugin.

junrushao · February 24, 2021, 6:54pm

CC @Hzfengsy. I think we need set up a rule that new code is properly type annotated, like our TensorIR upstreaming work

kazimuth · February 24, 2021, 7:05pm

Oh, one other question just occurred to me: Should we keep types in doc comments, or remove them and only have them in the function signature? If we have both, they can get out of sync. But we’d need the documentation generator to understand the type signatures, and idk how TVM’s documentation is set up right now.

junrushao · February 24, 2021, 7:12pm

I don’t have a clear plan yet for existing codebase. @tqchen @jroesch may have some ideas

tqchen · February 24, 2021, 7:42pm

Thanks @kazimuth for proposing this! I agree this is a worthwhile direction to take.

The main question is how to enable incremental adoption. Given most FFI have wrapper functions in python, it might be fine to ignore typing of FFI and just go ahead and type the wrapper functions. We might be able to start by introducing Mypy, and ignore most of the codebase, then gradually increasing the range that requires type coverage.

On the documentation end. I experimented with the sphinx napoleon plugin for type annotations, and the rendering is not as perfect as having type annotations on the docstring. The other alternative plugin GitHub - agronholm/sphinx-autodoc-typehints: Type hints support for the Sphinx autodoc extension is also not perfect in resolving the type names and provide the right type (when a type is exposed in multiple scopes).

So to get good rendering(without manually add type to the docstring) we might need to hack up our own solution sphinx plugin, or improve existing ones.

Given that we are typing the parameter name twice in the code(once in the argument list and another time in the docstring), one could argue that we can do the same for type types. So it might be helpful to start with add type annotation to both docstring and the arguments. We could incorporate a sphinx plugin that checks the consistency between the two later.

kazimuth · February 24, 2021, 7:57pm

The main question is how to enable incremental adoption. Given most FFI have wrapper functions in python, it might be fine to ignore typing of FFI and just go ahead and type the wrapper functions. We might be able to start by introducing Mypy, and ignore most of the codebase, then gradually increasing the range that requires type coverage.

I agree that this is the right route to take. Mypy has a flag check_untyped_defs, which if set to False will disable Mypy errors for all functions without type signatures. We can then introduce types over time. Some modules do have types currently though, so any errors in those modules will need to be fixed in the initial PR.

However, this means that if we add types to a wrapper function, it will immediately throw an error because you’re calling an FFI function that, as far as Mypy is concerned, doesn’t exist! So then you have to add #type: ignore all over the place.

I think a script that walks the registry looking at names and sticking them into .pyi files would be quite easy to write. So you’ll just get an _ffi_api.pyi file in each directory (note: .pyi, not .py) that looks like:

AnnotatedRegionSet: Any
CallGraph: Any
CollectDeviceAnnotationOps: Any
check_basic_block_normal_form: Any
detect_feature: Any
# ...

These files will be ignored at runtime but MyPy will read them while checking. Then you don’t have to put # type: ignore everywhere, but you’re still doing most of the typing work in the wrappers.

Given that we are typing the parameter name twice in the code(once in the argument list and another time in the docstring), one could argue that we can do the same for type types. So it might be helpful to start with add type annotation to both docstring and the arguments.

Yeah, this seems reasonable. And if something gets out of sync it’s not the end of the world.

mdw-octoml · February 24, 2021, 7:55pm

Just wanted to add a big +1 to this. Python type annotations and checking have been invaluable in many projects I have worked on, and it is absolutely worth the effort - in my experience - to invest in this.

jroesch · February 24, 2021, 8:12pm

This is probably ancient history now but the very first version of Relay was completely typed using MyPy and we ended up removing most of the annotations due to Python versioning issues at the time. The approach we had taken before was to write a .pyi for every raw FFI export so we would have ffi.py expose the functions from C++ and then a corresponding .pyi file which provided stubs for all method signatures.

We then could easily type all the wrapper methods and classes using standard MyPy, later when we auto-generate bindings we could automate the generation of the .pyi files.

It would be nice to move away from writing types twice in the long term, but I think we could first do a pass bringing them into the types and later going back and cleaning up docs if we find an acceptable solution.

I think many of the internal methods don’t have docstrings anyways so it would still be a minority of functions which require repeating them twice.

I think the right approach for incremental adoption is to just run the checker in the less strict mode and slowly turn on types, one day in the future we can turn on more strict enforcement.

tkonolige · February 24, 2021, 8:59pm

If we only run Mypy on the CI, I think we can ensure that we have up to date packages from numpy.

kazimuth · February 24, 2021, 9:57pm

Oh interesting, I didn’t know that. TVM as a whole is still on Python 3.6 right?

Also it occurs to me that a problem with the stub generation approach would be that, if not all parts of TVM are always built, some functions will be missing from the global registry. So the results of stub generation will be non-deterministic, which is unfortunate…

Okay, another solution would be to change the _ffi_api modules into objects that live in a module. Then you could have (e.g. in python/tvm/ir/__init__.py)

_ffi_api = tvm._ffi.init_api("tvm.ir") # type: Any

Then in python/tvm/ir/expr.py, you can still do from . import _ffi_api, but you’d be importing that object rather than a module. Then you can call _ffi_api.Stuff() and Mypy will not complain.

Alternatively, we can maintain the FFI stubs by hand, or put # type: ignore on every FFI call.

That’s true. However if we rely on the numpy types being present, if you’re on a version less than 1.20 during development you won’t be able to run mypy locally. (In my current conda install numpy 1.18 was selected by default, not sure what affects that.)

jroesch · March 2, 2021, 11:16pm

At the time we still had to support 2.7 which in retrospect we should of pushed harder for everyone to move to newer version of Python. The Python community is glacially slow at upgrading.

@kazimuth I don’t think we should have to make the stub generation non-deterministic.

In the previous work we just defined the FFI modules as files which lived in each sub-module and then put the .pyi files directly next to the implementation file which used the TVM built-in to initialize the FFI.

junrushao · March 2, 2021, 11:18pm

@jroesch Is there anything we can do in the CI to enforce type annotation for new code?

mikepapadim · June 22, 2021, 7:29pm

Regarding MyPy , I just opened a PR (Initial support for enabling MyPy in CI by mikepapadim · Pull Request #8302 · apache/tvm · GitHub) which adds initial support for MyPy as part of the initial linting stage in Jenkins.

Currently, errors are ignored in order to pass Jenkins pipeline.

IMO this PR can be merged as initial support.

The question that raises is how to proceed after that. Enabling MyPy on the whole project produces thousands of types errors. So, I think is feasible fixing type errors in some modules and enabling it after.

What do you think about that? Is there any preference for modules that can benefit immediately?

Wheest · July 31, 2021, 10:09am

Much of the Python codebase has type hinting in the comments (for example in auto_scheduler code). In theory adding type hinting in function arguments could be automated in this case.

For Python code that does not have that, what kind of transition plan would make sense? Notifying authors of functions from git blame, and giving them some months to make a new PR?

Or perhaps making it a community effort to transition? It could be a good way to onboard new and early stage contributors, since a) the required code changes are not that complex b) could increase familiarity with the codebase c) CI should do most of the code review. I know as an early stage contributor myself, I would be keen to scour the codebase, adding type hinting, and learning more about the structure of the project.

junrushao · August 1, 2021, 9:56pm

@Wheest I am not super certain if there is a fully automated process given our current docstring-based type annotation may be casual or inaccurate, but it is definitely doable if we could call for community contribution