[discussion] TVM Docker Image Refactoring

As I’ve been working on TVM, I’ve found it useful to have a configurable Docker image to help with tutorial development and demos. The TVM project also includes a number of configurations used in CI and demos that are written to test on a variety of platforms. The TLC Project also has a collection of Docker images that are being used to build out packaging. I would imagine that others in the community have written their own solutions to meet their particular aims, and I wanted to start a discussion about how we might be able to reconcile these efforts and come up with a more general solution that could meet a variety of use cases.

Some of these use cases might include:

  • An easy-to-use development environment that works with both shell access and IDEs like VSCode.
  • CI/CD and testing.
  • A model optimization platform.
  • A minimized runtime for training and inference.
  • Different features enabled in TVM to meet specific needs.

These use cases also include a variety of platforms:

  • Multiple CPU architectures, including x86 and Arm
  • Multiple GPU architectures, including NVIDIA and AMD

The approach I’ve taken in my personal project, hogepodge/tvm-docker on GitHub (a basic Docker-based installation of TVM), uses a pipeline method that addresses some of these use cases. Currently in TVM, custom configurations are defined by scripts that are copied into the base image and then executed at build time. To make this method more generic, the Docker build system can be broken into two major phases:

  • Configuration: in this phase, system and Python packages can be installed for a custom configuration, cmake settings can be declared, and additional custom scripts can be run to prepare the environment for building. Custom scripts are located within subdirectories of the config directory.
  • Build: in this phase, custom build scripts for different parts of the project can be run. Custom scripts are located within subdirectories of the build directory.

In between the configuration and build phases, TVM is downloaded from GitHub.
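
Concretely, the Dockerfile is structured roughly like this (a simplified sketch; the helper script names and paths below are illustrative, not the literal contents of the repository):

```
# Simplified sketch of the two-phase layout; names and paths are illustrative.
FROM ubuntu:20.04

ARG CONFIG_PIPELINE=base:devel
ARG BUILD_PIPELINE=build:python:docs

# Configuration phase: run each stage named in CONFIG_PIPELINE
COPY config/ /docker/config/
RUN /docker/config/configure.sh "${CONFIG_PIPELINE}"

# TVM is fetched between the two phases
RUN git clone --recursive https://github.com/apache/tvm /opt/tvm

# Build phase: run each stage named in BUILD_PIPELINE
COPY build/ /docker/build/
RUN /docker/build/run.sh "${BUILD_PIPELINE}"
```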

The configuration subdirectories can have four standard files:

  • apt.txt: This file contains a list of system packages to be installed.
  • pip.txt: This file contains a list of pip packages to be installed.
  • cmake.txt: This file contains cmake configuration directives.
  • custom.sh: This file contains a custom bash script.

Additional files may be included in the subdirectory and used by custom.sh.
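
As a hypothetical example, a config/cuda subdirectory might look like this (assuming cmake.txt fragments are plain config.cmake-style directives):

```
config/cuda/            # hypothetical configuration
    apt.txt             # system packages, one per line
    pip.txt             # Python packages, one per line
    cmake.txt           # e.g. set(USE_CUDA ON)
    custom.sh           # any setup the three files above can't express
```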

The build subdirectories expect one standard file:

  • build.sh: The script to execute for the build phase.

Additional files may also be included in this subdirectory.
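
For illustration, a build.sh for the core C++ build step might look roughly like the following (a sketch, not the exact script; it assumes the cmake directives gathered during configuration have been written to /docker/config.cmake):

```
#!/usr/bin/env bash
# Sketch of a build-phase script: compile TVM with the assembled config.cmake
set -euo pipefail

cd /opt/tvm
mkdir -p build
cp /docker/config.cmake build/config.cmake
cd build
cmake ..
make -j"$(nproc)"
```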

A build is customized by specifying configuration and build pipelines. A pipeline is a colon-separated list of names that correspond to config or build subdirectories. Order matters, as cmake directives that appear later in the pipeline will override earlier directives.
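
In rough shell terms, processing a pipeline amounts to something like this (a sketch of the idea, not the exact script; paths are illustrative):

```
#!/usr/bin/env bash
# Sketch of pipeline processing: each stage in the colon-separated list
# maps to a subdirectory under /docker/config.
set -euo pipefail

PIPELINE="$1"                      # e.g. "base:devel"
IFS=':' read -ra STAGES <<< "${PIPELINE}"

: > /docker/config.cmake           # cmake directives are accumulated here
for stage in "${STAGES[@]}"; do
    dir="/docker/config/${stage}"
    if [ -f "${dir}/apt.txt" ]; then
        xargs -a "${dir}/apt.txt" apt-get install -y
    fi
    if [ -f "${dir}/pip.txt" ]; then
        pip3 install -r "${dir}/pip.txt"
    fi
    if [ -f "${dir}/cmake.txt" ]; then
        # Later stages append last, so their directives override earlier ones.
        cat "${dir}/cmake.txt" >> /docker/config.cmake
    fi
    if [ -f "${dir}/custom.sh" ]; then
        bash "${dir}/custom.sh"
    fi
done
```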

The pipelines are defined by the Docker build arguments:

  • CONFIG_PIPELINE
  • BUILD_PIPELINE

For example, the current default build will run:

  • ARG CONFIG_PIPELINE=base:devel
  • ARG BUILD_PIPELINE=build:python:docs

Here, base installs the minimum set of packages required to build TVM, and devel adds support for working with GitHub inside the container. build compiles TVM, python builds and installs the Python bindings, and docs builds the documentation.
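
Selecting a different image is then just a matter of overriding the build arguments, for example (the cuda configuration name here is hypothetical):

```
# Default development image
docker build -t tvm-dev .

# Hypothetical variant: add a cuda configuration stage and skip the docs build
docker build \
    --build-arg CONFIG_PIPELINE=base:devel:cuda \
    --build-arg BUILD_PIPELINE=build:python \
    -t tvm-cuda-dev .
```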

Some benefits of this approach are that fewer Dockerfiles need to be maintained and code duplication is reduced. The default build is suitable for development work and is easily modified to support other configurations. Dependencies for specific configurations are clearly identified. It also provides a basic structure for standardizing how configurations are handled, and it can be expanded to support new needs (for example, it may become necessary to add pre-run scripts to the config phase).

Currently this approach does not take into account base images other than Ubuntu, which limits builds meant to work with CUDA or aimed at more generic packaging. However, this structure could be incorporated into different base images, possibly taking advantage of templates.
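
One possible direction, sketched below and untested, would be to parameterize the base image itself so that CUDA-oriented builds can start from an NVIDIA image:

```
# Hypothetical: choose the base image at build time
ARG BASE_IMAGE=ubuntu:20.04
FROM ${BASE_IMAGE}
# ... configuration and build phases as before ...

# e.g. docker build --build-arg BASE_IMAGE=nvidia/cuda:11.4.2-devel-ubuntu20.04 .
#      (tag shown for illustration)
```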

This is one example of how to handle the combinatorial complexity of building and distributing TVM Docker images. I’m curious to know how others are handling Docker builds and if there are more appropriate solutions available.
