Add a docker image for caffe development. #3518

elezar · 2016-01-05T15:30:28Z

The idea is to add docker files for Caffe. Both for development purposes (my original motivation) and for providing pre-built Caffe images. The following steps can be considered:

Create an "official" Caffe image that allows for caffe to be run without first building it (NVIDIA has their own version at https://github.com/NVIDIA/nvidia-docker/blob/master/ubuntu-14.04/caffe/Dockerfile, for example). This would be used as in Add a docker image for caffe development. #3518 (comment).
Automatically build the Docker image and push it to Docker Hub (or some other repository).
Clean up the development Docker file (and the script around it) to be more concise and suited to starting a Caffe build environment.

elezar · 2016-01-06T08:46:49Z

Thanks for the pointers, I will have a look at the best practices and adjust the docker file accordingly. The code to change the user started as a more general script, and I can try to simplify it to be better suited to the task of building caffe.

In terms of usefulness, I use the image locally and thought I would submit it as a pull request so that it is part of caffe as it was also mentioned in #2313 (comment).

If it is not found to be useful, we can close the pull request without merging, and I will move the utilities to its own repository.

flx42 · 2016-01-06T17:45:02Z

I just think there is a lot of code to set the user, it might scare Docker beginners.

But don't get me wrong, I think this PR is very useful, the Caffe users list is full of topics where people are struggling to compile Caffe on their machine. You don't need to convince me of the benefits offered by Docker, but I'm not a maintainer of Caffe.
I think it would be great to have an additional Dockerfile where Caffe and all its dependencies would be built, this would be for users that are not interested in doing low-level Caffe development. Using Caffe won't even require cloning the github repo anymore, it would be done in the Dockerfile, using Caffe would be as simple as this:

docker build -t caffe github.com/BVLC/caffe#:docker
nvidia-docker run -ti caffe [args...]

AwokeKnowing · 2016-01-06T18:28:05Z

This is excellent. This will allow a reliable way to guide students through tutorials, as well as let programmers cleanly keep up to date with caffe releases as well as keep their scripts running on previous versions if breaking changes occur. Hopefully there will be a repository on dockerhub of images of past versions (from here on out). I hope the dockerfile can get some attention and get "generalized" or whatever it needs to be a standard way of using caffe.

elezar · 2016-01-07T12:57:21Z

Given that there seems to be demand, what needs to be changed/added so that this is as useful as possible? Some thoughts:

Create an "official" Caffe image that allows for caffe to be run without first building it (NVIDIA has their own version at https://github.com/NVIDIA/nvidia-docker/blob/master/ubuntu-14.04/caffe/Dockerfile, for example). This would be used as in Add a docker image for caffe development. #3518 (comment).
Automatically build the Docker image and push it to Docker Hub (or some other repository).
Clean up the development Docker file (and the script around it) to be more concise and suited to starting a Caffe build environment.

Something that would be nice to have would be that the docker images be tagged using the Caffe version. What is the status of #3311?

flx42 · 2016-01-11T17:36:49Z

I think those are good ideas, I would keep things simple as a start, it will be easier for the project maintainers to review, especially if they don't have any prior experience with Docker.
I think we should start with a single Docker image where Caffe if entirely pre-built, new users will be able to start a shell with Caffe installed with just a few commands.
Then, if this first approach is successful, we could add another image for development, for advanced users.

elezar · 2016-01-25T13:41:04Z

I have made some changes to the Dockerfiles. There are now two (CPU and GPU) Dockerfiles which provide runtime containers. These can be used as replacements for the caffe executable?

I have also simplified the script for running the development container somewhat.

I would appreciate input wrt finishing this pull request off? Does BVLC have a Docker Hub orgainisation, for example, so that we can set up triggered builds there? Are there any comments related to where the Dockerfiles are specified, and whether other flavors (e.g. OpenBLAS, CUDA 7.0, non-CUDNN) should be made available?

flx42 · 2016-01-26T22:06:14Z

I think this is a good start, I don't think other flavors are required right now. The benefit of Docker is to bundle all the dependencies, so for instance cuDNN is already included in the image. It will be transparent for users.

I would use the cmake build system actually instead of having to copy Makefile.config.example and then using sed to modify the options.

elezar · 2016-01-29T16:55:52Z

Ok. I will switch the docker files to cmake instead and only have the two flavours (CPU and GPU with the latest tested CUDA and CUDNN).

Any comments on the location of the docker files? Or are docker/runtime/cpu and docker/runtime/gpu ok for this purpose?

flx42 · 2016-01-29T17:34:27Z

Regarding the location: ask the maintainers, they are all labeled to this PR so they will probably answer when they have the time :)

flx42 · 2016-02-10T19:47:09Z

@elezar Are you done working on this PR? If you're done let me know so I can review it again.

@shelhamer: you tagged this PR, is Docker going to be on your roadmap in the future?

elezar · 2016-02-10T22:29:12Z

From my side, the Docker images are done. If someone could check out the PR
and merge, or give me feedback, that would be greatly appreciated.

flx42 · 2016-02-11T22:11:23Z

Please squash all your patches into one for easier review (see git rebase --interactive).

elezar · 2016-02-12T14:09:17Z

@flx42 I have squashed the changes, and tried to make the docker files more in keeping with the steps given on the installation website.

I have also removed the additions (such as jupyter ports) which are not core to the requirements of the images. I have added Dockerfile generation options to the makefile (although they are themselves checked in) so that these are generated automatically from templates. This ensures that they are uniform, and adds some options in terms of extending these for future versions.

jeffdonahue · 2016-02-12T22:02:36Z

docker/templates/devel.template

+    wget
+
+# Install the python requirements.
+RUN wget https://raw.githubusercontent.com/BVLC/caffe/master/python/requirements.txt -O /tmp/requirements.txt && \


I've never used docker so I'm definitely not an expert, but is it necessary to wget requirements.txt like this rather than getting it from the (presumably already downloaded) local copy of caffe?

As I mentioned in response to one of @flx42 's comments, the development
image is intended to be used to build caffe from source locally (with
folders mounted as volumes. See start_caffe_docker.sh for example). This
means that the caffe source (and thus requirements.txt) is not available to
the image at build time. You will note that the runtime Dockerfiles (and
their corresponding templates) do use the locally available requirements
files.

Of course, if someone has a suggestion as to how this could be improved,
I'm all ears.

This means that the caffe source (and thus requirements.txt) is not available to the image at build time.

Why not? The source is available locally because it was checked out (in order to get the Dockerfile in the first place). Yes, you may need to run docker build at a higher level (run it at the root level of the repo, and ignore some things in .dockerignore to make it build faster), but everything is visible to the docker build script.

Why not just ADD the single file to /tmp/ (ADD python/requirements.txt /tmp/).

If you use ADD and change requirements.txt, re-building the docker image will see the new version of the file, and use that. That will not happen with RUN since the command does not change (and thus you will use a stale copy).

If you use RUN, the build will always use the remote master repository version (via wget), leading to some unintuitive errors if one tries to edit requirements.txt in a local copy and re-build the image.

seanbell · 2016-02-13T15:52:07Z

docker/runtime/gpu/Dockerfile

+
+CMD bash
+# Clone Caffe repo and move into it
+RUN cd /opt && git clone https://github.com/BVLC/caffe.git && cd caffe && \


Why not instead require that you check out the repository, and then run docker build inside it? The way you've written it, you can only create a runtime docker image for the remote master branch. What if I want to make a runtime image for my own build? People modify caffe and have local versions all the time.

A more flexible approach would be to use ADD to add the local source files into /opt/. You check out the code first, then run docker build -t <some-tag> docker/runtime/gpu/Dockerfile.

seanbell · 2016-02-13T16:05:26Z

@elezar How do you see these images as being used? I understand that you want to create a canonical version that uses the official master-branch code, but I don't the proposed code achieves that.

The way it is written, the RUN command always checks out from master which doesn't have any version information. That isn't reliable since (a) the docker cache won't re-run the RUN commands by default, and (b) two people who build the same container will get different results since the git HEAD of the master branch changes over time.

I agree that for development you want to mount in the source code, and that is indeed helpful. But for everything else that gets copied into the image itself, I think ADD is the right way to go, not git clone or wget. If you do it with ADD, two people building the same container (from the same commit or git tag) will get the same result.

Alternatively, if you really want to use wget or git clone (which I think is a bad idea), you need to change the URL to refer to a specific commit or version number.

flx42 · 2016-02-13T19:12:25Z

Thank you for joining the debate, @seanbell! Disclaimer: if you don't know, I'm a maintainer of https://github.com/NVIDIA/nvidia-docker

Let's merge the answers into the main thread.

Why not? The source is available locally because it was checked out (in order to get the Dockerfile in the first place).

Not with his approach, he wants users to compile Caffe after shelling inside the container.
The docker build step will simply setup an environment with all the dependencies installed, the current code is mounted when executing:
https://github.com/zalando/caffe/blob/314152192e69b1fd83d7db962cb88add2fb64c0f/docker/start_caffe_docker.sh#L36
I think the idea is that you do all your development (coding, compiling, debugging) while staying inside the same container shell and the file edits will be persisted to your host filesystem (because of the volume).
But this looks like an advanced use case and I'm not sure we should encourage this method, at least not now.
Also, @seanbell is right, the wget approach looks error prone.

Why not instead require that you check out the repository, and then run docker build inside it? The way you've written it, you can only create a runtime docker image for the remote master branch. What if I want to make a runtime image for my own build? People modify caffe and have local versions all the time.

I think this use case is more suited to the "devel" image. The runtime image is precisely for people that don't want to do that. They want to start using Caffe, or they are already advanced users but they don't have custom layers. People do use deb packages sometimes, right? :)
Ideally, the runtime image could simply install a Caffe deb package and all the dependencies, this is what we did for NVIDIA/caffe:
https://github.com/NVIDIA/nvidia-docker/blob/master/ubuntu-14.04/caffe/Dockerfile

An advantage of this self-contained approach is that you don't need to clone the code, you can build a Dockerfile from GitHub directly:

docker build -t caffe github.com/BVLC/caffe#:docker

I like the convenience of avoiding to copy the whole repo on my local file system. And the point of this runtime image is to rely on a pristine, well-tested version, you don't want to ADD your local changes by mistake. Since there are no deb packages currently, we can git clone to a specific hash or tag instead. Always checking out the latest master is not good for reproducibility, indeed.

@seanbell, what you suggest sounds fine for the devel images and maybe that's what should be done, instead of having to mount the source at container launch time. Developers would then still be able to use their own branches.

Retrospectively, I think it might be better to start simple by only including the runtime images in this PR. This would be sufficient for caffe newcomers during hands-on tutorials.
The runtime images should be easier for the maintainers to review, it's pretty straightforward: it just looks like a sequence of build steps.
The devel image could be in a latter PR once people are more familiar with Docker.

elezar · 2016-02-14T03:49:06Z

Thanks for the comments @seanbell, and also the nice summary of the use cases @flx42. I think you may be right in suggesting that this pull request only include the runtime images. This is the use case that probably caters most to the target audience. That is to say people who want to quickly try caffe, or for a sort of archiving for particular versions. I will remove the other files for the time being and add them as a separate pull request (I should still be able to address my own use-case with some shell scripts, although the images may be larger than needed).

With regards to:

The way it is written, the RUN command always checks out from master which doesn't have any version information. That isn't reliable since (a) the docker cache won't re-run the RUN commands by default, and (b) two people who build the same container will get different results since the git HEAD of the master branch changes over time.

Yes, that is true. I do have the following to add:
Even if the command were to be run again, anyone building the images would have a bleeding edge (or possibly broken) version. One option would be to use the release tags (once they are available), or even just specify a particular hash. Ideally the update of the tag/hash in the docker file should be automated so that the overhead of releasing a new tag is reduced. I'm not sure how to get past the different version problem -- other than providing tagged images on Docker Hub or the like.

shelhamer · 2016-02-27T03:52:08Z

I was able to build the caffe:{cpu,gpu} images and run with docker, nvidia-docker respectively to invoke caffe --version and open a shell.

However, the LeNet docker example fails because nvidia-docker neither recognizes -v nor -w and instead requires the full --volume= and --workdir= arguments. @elezar could you update the example script? @flx42 is this normal with nvidia-docker or did I mess up?

Here is an example call

nvidia-docker run --rm -ti -u 1000:1000 --volume=/home/shelhamer/caffe-dev:/workspace --workdir=/workspace caffe:gpu caffe train --solver=examples/mnist/lenet_solver.prototxt

that ran fine.

flx42 · 2016-02-27T04:41:40Z

@shelhamer Yeah that's a bug in nvidia-docker caused by the Docker 1.10 update, their options are now ambiguous: NVIDIA/nvidia-docker#46
I have a patch for this, I will push it Monday if it passes my tests.
The current workaround is to use long options as you discovered. So, it's not related to this PR; but it doesn't hurt to use long options since it's more explicit anyway.

shelhamer · 2016-02-27T05:09:08Z

Alright, well I'm happy to merge once the example works whether by long options in this PR or a fix to nvidia-docker. @elezar feel free to force push an amended commit.

(If you do amend, consider formatting the commit message so it has a "title" line <80 chars and then further description after a newline as it formats better on the command line and on github, but no worries.)

These can be used as direct replacements for the Caffe executable.

elezar · 2016-02-27T09:53:14Z

I have switched to the --volume and --workdir options instead of -v and -w, respectively. These changes have also been reflected in the README. I was only able to run a quick sanity check on OSX. I don't currently have access to a Linux box. @shelhamer if you could just try to run the example again, and merge if you are happy.

[build] Add docker images for running caffe out-of-the-box (caffe:cpu, caffe:gpu)

shelhamer · 2016-02-27T20:08:40Z

Thanks for the Caffe containers @elezar ! Next we'll figure out how to include these on the docker hub.

nvidia-docker requires long args with equal sign as of docker 1.10: see BVLC#3518 (comment)

Fix flags for nvidia-docker from #3518

flx42 · 2016-02-27T20:20:20Z

Thank you for the additional fix @shelhamer, this bug is embarrassing for us, we will fix it quickly.

elezar · 2016-02-28T12:59:52Z

Thanks @shelhamer.

As for getting the images onto Docker Hub, I see two options. Either one sets up an automated build on Docker Hub itself (I have already done this for https://hub.docker.com/r/elezar/caffe), or one builds the images as part of some CI/CD steps and pushes them. I think the latter is the route that NVIDIA follows for https://hub.docker.com/r/nvidia/. Maybe @flx42 could comment on this?

I should add that for the automated builds on Docker Hub seem to be failing for the GPU images (with no log output). @flx42 do you have an idea what the reason for this may be?

flx42 · 2016-02-29T05:42:46Z

@elezar: we tried using the automated build on the DockerHub but we gave up on this idea because it was not suitable for our complex model with multiple images (see the discussion here).
In your case, I think both options should work. But be aware that doing automated testing for the GPU images could be challenging.

I don't know why it would fail on the Docker Hub, it used to work when I tested.

nvidia-docker requires long args with equal sign as of docker 1.10: see BVLC#3518 (comment)

[build] Add docker images for running caffe out-of-the-box (caffe:cpu, caffe:gpu)

Fix flags for nvidia-docker from BVLC#3518

nvidia-docker requires long args with equal sign as of docker 1.10: see BVLC#3518 (comment)

nvidia-docker requires long args with equal sign as of docker 1.10: see BVLC/caffe#3518 (comment)

shelhamer added enhancement community JD JL ES labels Jan 11, 2016

elezar mentioned this pull request Jan 15, 2016

Add library versioning and mark version 1.0.0-rc3 #3311

Merged

elezar force-pushed the feature/docker_images branch from 6b63146 to 6546ca7 Compare January 25, 2016 13:37

elezar force-pushed the feature/docker_images branch from 6a3ba7e to a890083 Compare February 10, 2016 09:13

elezar force-pushed the feature/docker_images branch from 462d18a to 3141521 Compare February 12, 2016 14:06

jeffdonahue reviewed Feb 12, 2016
View reviewed changes

elezar force-pushed the feature/docker_images branch 2 times, most recently from 462d18a to 3141521 Compare February 13, 2016 11:05

seanbell reviewed Feb 13, 2016
View reviewed changes

elezar force-pushed the feature/docker_images branch from e3059ec to 7d62930 Compare February 27, 2016 09:47

Add Dockerfiles for creating Caffe executable images.

6cba462

These can be used as direct replacements for the Caffe executable.

elezar force-pushed the feature/docker_images branch from 7d62930 to 6cba462 Compare February 27, 2016 09:50

shelhamer added a commit that referenced this pull request Feb 27, 2016

Merge pull request #3518 from zalando/feature/docker_images

59d099c

[build] Add docker images for running caffe out-of-the-box (caffe:cpu, caffe:gpu)

shelhamer merged commit 59d099c into BVLC:master Feb 27, 2016

shelhamer added a commit to shelhamer/caffe that referenced this pull request Feb 27, 2016

fix flags in BVLC#3518 for nvidia-docker

cfa2c0c

nvidia-docker requires long args with equal sign as of docker 1.10: see BVLC#3518 (comment)

shelhamer added a commit that referenced this pull request Feb 27, 2016

Merge pull request #3740 from shelhamer/fix-docker-flags

04aa36e

Fix flags for nvidia-docker from #3518

elezar deleted the feature/docker_images branch February 28, 2016 10:38

BlGene pushed a commit to BlGene/caffe that referenced this pull request Feb 29, 2016

fix flags in BVLC#3518 for nvidia-docker

5bc9aaa

nvidia-docker requires long args with equal sign as of docker 1.10: see BVLC#3518 (comment)

BlGene pushed a commit to BlGene/caffe that referenced this pull request Mar 4, 2016

fix flags in BVLC#3518 for nvidia-docker

06690ff

nvidia-docker requires long args with equal sign as of docker 1.10: see BVLC#3518 (comment)

zouxiaochuan pushed a commit to zouxiaochuan/caffe that referenced this pull request Mar 17, 2016

fix flags in BVLC#3518 for nvidia-docker

4ca67f2

nvidia-docker requires long args with equal sign as of docker 1.10: see BVLC#3518 (comment)

SvenTwo pushed a commit to SvenTwo/caffe that referenced this pull request Apr 6, 2016

fix flags in BVLC#3518 for nvidia-docker

f6c4879

nvidia-docker requires long args with equal sign as of docker 1.10: see BVLC#3518 (comment)

fxbit pushed a commit to Yodigram/caffe that referenced this pull request Sep 1, 2016

fix flags in BVLC#3518 for nvidia-docker

94c4db6

nvidia-docker requires long args with equal sign as of docker 1.10: see BVLC#3518 (comment)

fxbit pushed a commit to Yodigram/caffe that referenced this pull request Sep 1, 2016

Merge pull request BVLC#3518 from zalando/feature/docker_images

4a75fa9

[build] Add docker images for running caffe out-of-the-box (caffe:cpu, caffe:gpu)

fxbit pushed a commit to Yodigram/caffe that referenced this pull request Sep 1, 2016

Merge pull request BVLC#3740 from shelhamer/fix-docker-flags

8ebb8bf

Fix flags for nvidia-docker from BVLC#3518

zouxiaochuan pushed a commit to zouxiaochuan/caffe that referenced this pull request Oct 24, 2016

fix flags in BVLC#3518 for nvidia-docker

a175281

nvidia-docker requires long args with equal sign as of docker 1.10: see BVLC#3518 (comment)

zouxiaochuan pushed a commit to zouxiaochuan/caffe that referenced this pull request Oct 24, 2016

fix flags in BVLC#3518 for nvidia-docker

0770031

nvidia-docker requires long args with equal sign as of docker 1.10: see BVLC#3518 (comment)

zouxiaochuan pushed a commit to zouxiaochuan/caffe that referenced this pull request Feb 15, 2017

fix flags in BVLC#3518 for nvidia-docker

f9658a6

nvidia-docker requires long args with equal sign as of docker 1.10: see BVLC#3518 (comment)

zouxiaochuan pushed a commit to zouxiaochuan/caffe that referenced this pull request Feb 15, 2017

fix flags in BVLC#3518 for nvidia-docker

844f732

nvidia-docker requires long args with equal sign as of docker 1.10: see BVLC#3518 (comment)

zouxiaochuan pushed a commit to zouxiaochuan/caffe that referenced this pull request Feb 15, 2017

fix flags in BVLC#3518 for nvidia-docker

7f1dc98

nvidia-docker requires long args with equal sign as of docker 1.10: see BVLC#3518 (comment)

zouxiaochuan pushed a commit to zouxiaochuan/caffe that referenced this pull request Feb 15, 2017

fix flags in BVLC#3518 for nvidia-docker

029f431

nvidia-docker requires long args with equal sign as of docker 1.10: see BVLC#3518 (comment)

acmiyaguchi pushed a commit to acmiyaguchi/caffe-fast-rcnn that referenced this pull request Nov 13, 2017

fix flags in #3518 for nvidia-docker

cf8e653

nvidia-docker requires long args with equal sign as of docker 1.10: see BVLC/caffe#3518 (comment)

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Add a docker image for caffe development. #3518

Add a docker image for caffe development. #3518

Add a docker image for caffe development. #3518

Add a docker image for caffe development. #3518

Conversation

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment