At work I only had limited access to capable Linux boxes with a GPU, but I was really impressed reading about this project and the benchmarks. I wanted to kick the TVM tires on my 16-core box with an NVIDIA 1080 (pretty slow compared to what all these folks use), but it runs Windows and autotvm isn’t supported there.
I put in the time and have enough implemented to auto-tune the models we have on my Windows box. I hope it will be of use to someone else in a similar position, and possibly serve in the future as a proof of concept for someone more knowledgeable to build official support on. I do question even the importance of this, as I don’t know how many Windows users are interested, and it could all be moot if Windows eventually adds GPU support in WSL (it won’t be in WSL 2.0, but they have signaled they are interested).
I have uploaded a hodge-podge guide on Google Docs here. It is in no way a bulletproof, step-by-step guide, but if you are familiar with installing TVM from source on Linux, it should hopefully be enough to get started on Windows. At the very least it’s a reference for myself that I am sharing.
Great job! Could you describe USE_OPENMP in more detail? What problem did you hit when using the TVM thread pool?
I am thinking about whether we could have a section named “Resources” that could contain community members’ docs like this one, such as @mli’s Dive into Deep Learning Compiler. @tqchen
It’s been a few weeks, but IIRC a Python threading.Thread.start() call in check_remote(...) (measure_methods.py) was deadlocking. The thread target would never run, and the call would never return from thread.start().
I remember using Process Explorer to inspect the thread stacks and found it was a Python thread blocked deep inside the CPython runtime. Setting TVM_NUM_THREADS=1 fixed it; when building with OpenMP, I didn’t need to set it at all.
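For anyone hitting the same deadlock, a minimal sketch of the workaround described above: restrict TVM's runtime thread pool to one thread via the TVM_NUM_THREADS environment variable before TVM is imported. (This is the env-var workaround from the post, not an officially documented fix; building with USE_OPENMP reportedly makes it unnecessary.)

```python
import os

# Workaround sketch: force TVM's native thread pool down to a single thread.
# The variable must be set before the TVM runtime initializes, so do it
# before `import tvm` anywhere in the process.
os.environ["TVM_NUM_THREADS"] = "1"

# import tvm          # safe to import and start autotvm tuning after this
```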
Got it, I didn’t realize you had already fixed this. Is there any concern about committing this back to the main repo, or fixing the root cause (the Oct 2019 commit)?
That code is part of autotvm, which of course isn’t officially supported on Windows, so I didn’t want to bug the reviewers with a PR to fix something that generally doesn’t work there anyway.
Would you mind sending out a PR with your changes to make it work on Windows?
I’m happy to review it; I make an effort to mention Windows support in all of the changes I review. I think we should formalize the effort to make AutoTVM work on Windows.
I think it would be great to formalize Windows support, and I would be happy to work on that, but my gut feeling is not optimistic about buy-in from the project owners…and I’m probably too shy to ask.
One reason is that the maintenance/testing burden of supporting AutoTVM on Windows increases quite a bit. In my branch, I’ve been careful to preserve behavior on POSIX platforms, but I had to add a lot of `if os.name == 'nt'` checks. Future changes the project owners may want to make could be encumbered by having to support Windows.
I had to do a lot of little hacks to make Windows run close to Linux speed. Most of it is because there is no fork(), threading in Python is poor, and multiprocessing.Process is very slow to spawn…so I had to cache processes and process pools.
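A minimal sketch of the caching idea described above: since Windows lacks fork(), multiprocessing spawns a fresh interpreter per worker, so creating a new pool for every measurement round is expensive. Keeping one long-lived pool amortizes that spawn cost. The module-level `_POOL` cache and `get_pool()` helper here are illustrative names, not part of TVM's API.

```python
import multiprocessing

# Cache a single process pool for the lifetime of the process instead of
# re-creating one per batch of jobs. On Windows (spawn start method) each
# worker pays full interpreter startup, so reuse matters much more than
# on Linux (fork).
_POOL = None

def get_pool(num_workers=None):
    """Return the cached process pool, creating it on first use."""
    global _POOL
    if _POOL is None:
        _POOL = multiprocessing.Pool(num_workers)
    return _POOL

def square(x):
    # Toy task standing in for a measurement/build job.
    return x * x

if __name__ == "__main__":
    pool = get_pool(2)
    print(pool.map(square, [1, 2, 3]))
    # Later calls reuse the same workers rather than spawning new ones.
    assert get_pool() is pool
```

The same shape works for caching individual worker processes: create once behind a platform check (`os.name == 'nt'`), hand out the cached handle thereafter.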
I recently added Windows support to the C++ RPC server, which was a big perf win and less hacky compared to what I had to do with the Python RPC server.
I think the more code that can be pushed out of Python and into C++, the better the chance of a good Windows implementation…especially in local_executor and xgboost_cost_model.
How about this: would you mind sending a PR off of your fork? On the PR we can have a more detailed discussion of the specific code design. I really think it would be worthwhile to start committing these changes back, and I think the other reviewers/committers would be happy to see improved Windows support.
Too many errors, and since the code does not print the error properly, I formatted it a little:
```
Traceback (most recent call last):
  File "C:\Program Files\Python37\lib\site-packages\tvm-0.6.0-py3.7-win-amd64.egg\tvm\contrib\cc.py", line 185, in _windows_shared
    link_cmd, stdout=subprocess.PIPE, stderr=subprocess.STDOUT)
  File "C:\Program Files\Python37\lib\subprocess.py", line 800, in __init__
    restore_signals, start_new_session)
  File "C:\Program Files\Python37\lib\subprocess.py", line 1207, in _execute_child
    startupinfo)
  File "C:\Program Files\Python37\lib\site-packages\tvm-0.6.0-py3.7-win-amd64.egg\tvm\_ffi\_ctypes\function.py", line 72, in cfun
    rv = local_pyfunc(*pyargs)
  File "C:\Program Files\Python37\lib\site-packages\tvm-0.6.0-py3.7-win-amd64.egg\tvm\rpc\server.py", line 84, in load_module
    m = _load_module(path)
  File "C:\Program Files\Python37\lib\site-packages\tvm-0.6.0-py3.7-win-amd64.egg\tvm\module.py", line 266, in load
    _cc.create_shared(path + ".so", files)
  File "C:\Program Files\Python37\lib\site-packages\tvm-0.6.0-py3.7-win-amd64.egg\tvm\contrib\cc.py", lin
    raise InstantiationError("Skipped because of invalid gpu kernel")\ntvm.autotvm.task.space.InstantiationError: Skipped because of invalid gpu kernel'))

Traceback (most recent call last):
  File "C:\Program Files\Python37\lib\site-packages\tvm-0.6.0-py3.7-win-amd64.egg\tvm\_ffi\_ctypes\function.py", line 72, in cfun
    rv = local_pyfunc(*pyargs)
  File "C:\Program Files\Python37\lib\site-packages\tvm-0.6.0-py3.7-win-amd64.egg\tvm\autotvm\measure\measure_methods.py", line 621, in verify_pass
    raise InstantiationError("Skipped because of invalid gpu kernel")
tvm.autotvm.task.space.InstantiationError: Skipped because of invalid gpu kernel
```