Hi,
thanks for the tutorial. I am happy things have worked out for you, and this Docker solution seems very elegant. I know this is not a debugging forum, but I am sort of running out of options and would like to ask for your help!
I am trying to apply your solution myself, and it seems that my TensorFlow container is not using the CUDA driver. I get all the way to the Python console inside the container, but when I run a TensorFlow session I get an error saying that no GPU device is available. (My test script multiplies two constants a and b in a tf.matmul(a,b) node.) The output looks something like this:
tensorflow/stream_executor/platform/default/dso_loader.cc:53] Could not dlopen library 'libcuda.so.1'; dlerror: libcuda.so.1: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: /usr/local/cuda/extras/CUPTI/lib64:/usr/local/nvidia/lib:/usr/local/nvidia/lib64
…
…
tensorflow.python.framework.errors_impl.InvalidArgumentError: Cannot assign a device for operation MatMul: {{node MatMul}} was explicitly assigned to /device:GPU:0 but available devices are [ /job:localhost/replica:0/task:0/device:CPU:0, /job:localhost/replica:0/task:0/device:XLA_CPU:0 ]. Make sure the device specification refers to a valid device.
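The first log line says the CUDA driver library libcuda.so.1 could not be loaded, which would explain why only CPU devices are listed afterwards. A minimal sketch (not part of the original script) for reproducing that dlopen check outside TensorFlow from the container's Python console is shown below. It assumes only the standard library: ctypes.CDLL performs a dlopen on Linux, and the helper name check_libcuda is hypothetical.

```python
import ctypes
import os


def check_libcuda(name="libcuda.so.1"):
    """Try to load the CUDA driver library and report where it was found.

    Returns (loaded, error, found_in):
      loaded   -- True if ctypes could dlopen the library
      error    -- the loader's error message, or None on success
      found_in -- LD_LIBRARY_PATH directories that contain a file with that name
    """
    try:
        ctypes.CDLL(name)  # raises OSError if the dynamic loader cannot open it
        loaded, error = True, None
    except OSError as exc:
        loaded, error = False, str(exc)

    # Same search path that appears in the TensorFlow log line.
    search_dirs = [d for d in os.environ.get("LD_LIBRARY_PATH", "").split(":") if d]
    found_in = [d for d in search_dirs if os.path.isfile(os.path.join(d, name))]
    return loaded, error, found_in


if __name__ == "__main__":
    loaded, error, found_in = check_libcuda()
    print("dlopen succeeded:", loaded)
    if error:
        print("loader error:", error)
    print("found in LD_LIBRARY_PATH dirs:", found_in or "none")
```

If this fails inside the container but succeeds on the host, the driver library is typically not being exposed to the container (for example, when it is started without NVIDIA's container runtime). That is an assumption to verify, not a confirmed diagnosis.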
Everyone keeps saying that all I need is the NVIDIA driver, but nvidia-smi shows that my driver is installed correctly:
NVIDIA-SMI 430.40 Driver Version: 430.40 CUDA Version: 10.1
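The nvidia-smi output above comes from the host. A small sketch (an assumption-laden helper, not part of the original post) runs the same check from wherever Python is running, for example inside the container, and extracts the driver and CUDA versions from the header line. The function names driver_info and parse_header are hypothetical. A return value of None means nvidia-smi is not on PATH in that environment.

```python
import re
import shutil
import subprocess


def parse_header(text):
    """Extract (driver_version, cuda_version) from nvidia-smi header text."""
    driver = re.search(r"Driver Version:\s*([\d.]+)", text)
    cuda = re.search(r"CUDA Version:\s*([\d.]+)", text)
    return (driver.group(1) if driver else None,
            cuda.group(1) if cuda else None)


def driver_info():
    """Run nvidia-smi if it is on PATH and parse its header; else return None."""
    exe = shutil.which("nvidia-smi")
    if exe is None:
        return None  # utility not visible in this environment
    out = subprocess.run([exe], capture_output=True, text=True).stdout
    return parse_header(out)


if __name__ == "__main__":
    print(parse_header("NVIDIA-SMI 430.40 Driver Version: 430.40 CUDA Version: 10.1"))
    print(driver_info())
```

Comparing the result on the host with the result inside the container shows whether the driver visible on the host is also visible to the container.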
I have a GeForce GTX 970 that has run TensorFlow in the past, but since migrating to Ubuntu 18.04, this is the first time I have tried running TensorFlow on the GPU.
Where would you start?
Thanks a lot again for the tutorial!