Mypy is a static type checker for Python. It is run as a separate tool (like a linter) that looks over your codebase and finds typing errors.
Here’s a made-up example. Say you’ve got the code:
def count_layers(expr: Union[tvm.relay.Expr, tvm.relay.Function], valid_ops: List[str]) -> int:
    count_pass = LayerCounter(valid_ops)
    count_pass.visit(expr)
    # ...
    return "whoops, this isn't an int!!"
When you run Mypy, you’ll get an error:
$ mypy python/tvm
tvm/relay/analysis/count_layers.py:2: error: Incompatible return value type (got "str", expected "int")
(Mypy can also catch many subtler errors; this is just a simple example.)
Note that Mypy doesn’t actually run any of your code! It just parses it and looks at type annotations.
I have been working with TVM some, and I like having Mypy type checking when working in a large codebase. I was considering making a PR to add it to TVM, but before I dove into actually working on that I wanted to see if there’s interest. I think that setting up Mypy could have many benefits. The main reason that this change would not be desirable is that it would require workflow changes for all TVM contributors, which may be too much of a hassle.
Imo the real benefit of having Mypy is that it encourages developers to use type signatures throughout the codebase. This makes navigating and understanding unfamiliar code much easier. It also makes refactoring easier: if you change the type of a function, you immediately get typing errors at every affected call site and know what else you need to change, without running tests. It also helps IDEs give much better hints about code as you write it, which may be especially helpful for TVM users who use IDEs.
TVM actually has types currently, but they mostly live in doc comments, rather than as something that can be statically checked. I’m proposing to change types in doc comments to actual type annotations, and then set up Mypy to check those annotations. The checking can be viewed as an additional testing step, kind of like a linter. It adds no overhead at runtime, and would probably want to be run in CI.
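As a rough illustration of that conversion, here is a sketch of the same function written both ways. The function itself is hypothetical and not taken from TVM; only the shape of the change matters.

```python
from typing import List


def scale_all_docstring(values, factor):
    """Multiply every value by a factor.

    Parameters
    ----------
    values : List[float]
        The numbers to scale.
    factor : float
        The multiplier.

    Returns
    -------
    scaled : List[float]
        The scaled numbers.
    """
    # The types above are only documentation; no tool checks them.
    return [v * factor for v in values]


def scale_all_annotated(values: List[float], factor: float) -> List[float]:
    """Multiply every value by a factor."""
    # The same information, now visible to Mypy and to IDEs.
    return [v * factor for v in values]
```

Both versions behave identically at runtime; the annotations are only read by tools.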
Challenges
Workflow changes
This change may imply workflow changes for all TVM devs. If type-checking is run in CI then everybody needs to learn how to deal with and fix Mypy type errors. If type-checking is not run in CI, type annotations may go out of date, and Mypy errors will accumulate.
Dependencies
Many of TVM’s dependencies do not have types. This may be changing though; for instance, numpy recently added type annotations. Also, mypy is fine mixing typed and untyped code, so we can just tell it to ignore those dependencies for now (treat them as type Any) in the mypy config file. See, for example, the mypy config file for the XArray project, which is typed despite having many untyped dependencies.
I think we would actually want to do this for numpy as well, since we can’t guarantee that numpy 1.20 will be installed on users’ machines for the foreseeable future; previous versions didn’t have types.
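A minimal sketch of what such a config could look like, using Mypy's per-module sections and its ignore_missing_imports option. The module names are placeholders, not an audited list of TVM's untyped dependencies.

```ini
# mypy.ini (sketch): module names below are placeholders
[mypy-some_untyped_dep.*]
ignore_missing_imports = True

[mypy-another_untyped_dep.*]
ignore_missing_imports = True
```

With this setting, imports from those packages are treated as Any instead of raising an error. Forcing a package that already ships types (such as numpy 1.20) to be treated as Any may need a different option, such as follow_imports = skip; that is untested here.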
FFI
TVM has APIs defined at runtime through FFI, which are difficult to annotate types for, and will need to be kept in sync with the TVM C++ codebase. I see 3 possible solutions for this:
- Annotate all FFI calls with # type: ignore comments, which will instruct Mypy not to check them. This is a pain and requires pervasive changes to the codebase.
- Create .pyi stub files for all FFI modules, either by hand or through an automated script.
- Create a Mypy plugin to override type checking for all TVM FFI modules.
Of these, I think the best option would be manual maintenance of FFI stub files, possibly assisted by scripts. For example, we could have a script “generate_ffi_stubs.py” that you run whenever you make a change to the global registry on the C++ side. The stubs could be fairly lightweight: we could just type all PackedFuncs and types exposed to Python as “Any”. This is a workflow change that everybody would have to get used to, though.
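To make the idea concrete, here is a minimal sketch of the core of such a script. The function render_stubs and the registry names are hypothetical. A real script would need to query the global registry and decide where each .pyi file lives, and both of those are left out here.

```python
from collections import defaultdict
from typing import Dict, List


def render_stubs(global_names: List[str]) -> Dict[str, str]:
    """Group dotted registry names by prefix and render one stub body per prefix."""
    by_prefix: Dict[str, List[str]] = defaultdict(list)
    for full_name in global_names:
        prefix, _, func = full_name.rpartition(".")
        # Skip names with no prefix or that are not valid Python identifiers.
        if prefix and func.isidentifier():
            by_prefix[prefix].append(func)

    stubs: Dict[str, str] = {}
    for prefix, funcs in by_prefix.items():
        lines = ["from typing import Any", ""]
        for func in sorted(funcs):
            # Lightweight typing: everything in, everything out is Any.
            lines.append(f"def {func}(*args: Any, **kwargs: Any) -> Any: ...")
        stubs[prefix] = "\n".join(lines) + "\n"
    return stubs


# Placeholder names; a real script would read these from the registry.
example = render_stubs(["relay.analysis.example_a", "relay.analysis.example_b"])
print(example["relay.analysis"])
```

Each returned string is the text of one stub file. Typing everything as Any keeps the generator trivial, at the cost of Mypy checking nothing about how the FFI functions are called.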
Alternatively, we could try to create a Mypy plugin to do this automatically whenever Mypy is invoked. But I’m not sure it’s actually possible to inject names into a module using a Mypy plugin; see the plugin API. As far as I can tell, there’s no hook for injecting names into a module when a particular function call (i.e. tvm._ffi._init_api) is seen.
False positives
In using Mypy I’ve noticed that it complains about some things that you might not typically think of as “errors” in python code. For instance, mypy tracks whether values can be None, and throws an error when you pass a possibly-None value to a function that is not annotated as accepting None. It will also complain if you change the API of a method in a subclass compared to a superclass.
These errors can be annoying in the short term, but I think in the long term they may be beneficial, in encouraging more rigor in the TVM codebase. Of course you can always suppress type errors by commenting # type: ignore on the relevant line of code, so they shouldn’t end up blocking anything in the short term.
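The two kinds of complaint described above, and the suppression comment, can be sketched as follows. All names are hypothetical and only serve to illustrate the behavior.

```python
from typing import Dict, Optional


def lookup(table: Dict[str, str], key: str) -> Optional[str]:
    # dict.get returns None when the key is missing, hence Optional[str].
    return table.get(key)


def shout(text: str) -> str:
    return text.upper()


def shout_lookup(table: Dict[str, str], key: str) -> str:
    value = lookup(table, key)
    # Calling shout(value) right here would be rejected by Mypy:
    # value may be None, but shout is annotated as taking str.
    if value is None:
        return ""
    return shout(value)  # accepted: value has been narrowed to str


class Visitor:
    def visit(self, node: int) -> int:
        return node


class StringVisitor(Visitor):
    # Changing the parameter type in an override is something Mypy reports;
    # the trailing comment suppresses the error on this line only.
    def visit(self, node: str) -> int:  # type: ignore
        return len(node)
```

The code runs fine either way. The None check is the kind of extra rigor Mypy pushes you toward, and the # type: ignore comment is the escape hatch when a fix isn't worth it yet.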
Mypy Setup Documentation
Users will immediately get the benefits from having type annotations in TVM in terms of IDE support. If they also want to run the Mypy checker on their own code, they’ll probably need to set up some custom configuration for Mypy based on TVM. This could be added as a new documentation page / tutorial.
Conclusion
So yeah, that’s the idea. I think this change could improve the reliability and accessibility of the TVM codebase, at the cost of everybody learning a new workflow. I think this could be done in a couple of PRs, and maybe the type-checking could be kept as a non-blocking error in CI for a while as people get used to it. I’d be happy to make the PRs; I’m familiar with refactoring codebases to use Mypy.
Key questions:
- Do people think this change would be worthwhile to make? Are there reservations?
- How should we manage typing for FFI modules?
- How in-depth do we want to set up types? We could do fairly lightweight types for most of TVM and leave invariants to be checked at runtime, or try and set up smarter typings that allow more things to be checked during type-checking. This can get essentially as fancy as we want using a Mypy plugin.