Installing CUDA with FAI

Thu Oct 24 14:59:19 CEST 2024

>>>>> On Thu, 24 Oct 2024 14:50:20 +0200 (CEST), Stephan Frank <stephan.frank at ini.rub.de> said:

    > Amongst other approaches I have tried the runfile installation like so:

    >> chroot /target apt install -y make linux-headers-$(uname -r)
    >> chroot /target wget -nc https://developer.download.nvidia.com/compute/cuda/12.6.2/local_installers/cuda_12.6.2_560.35.03_linux.run
    >> chroot /target sh cuda_12.6.2_560.35.03_linux.run --driver --toolkit
I have never used the run files. I always use the .deb packages.

    > This usually hangs because it wants to uninstall nouveau drivers and asks for permission via a graphical interface.
Why not remove the nouveau package via FAI before calling the
customization script?
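
A minimal sketch of such a removal inside a customization script. It assumes the package in question is xserver-xorg-video-nouveau (an assumption, check what is actually installed on the target) and that FAI has set $ROOTCMD as usual. The function name remove_nouveau is made up for illustration:

```shell
# Hypothetical helper for a FAI customization script.
# Assumption: the nouveau package is xserver-xorg-video-nouveau.
remove_nouveau() {
    # ROOTCMD is set by FAI during installation (normally "chroot $target").
    # If it is unset, the command runs against the running system instead.
    ${ROOTCMD:-} apt-get -y purge xserver-xorg-video-nouveau
}
```

Calling remove_nouveau before the driver installation step should mean the package is already gone by the time the installer looks for it.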

    > Bonus question: Is there a good way to automatically figure out whether the machine can even use CUDA/nvidia drivers? So I don't have to sort machines by hardware in the class file.
There's the package nvidia-detect.

Here's some code I use:

NV_DEVICES=$(lspci -mn | awk '{ gsub("\"",""); if (($2 == "0300" || $2 == "0302") && ($3 == "10de" || $3 == "12d2")) { print $1 } }')
if [ -n "$NV_DEVICES" ]; then
   echo NVIDIA
fi

or

if nvidia-smi -L >/dev/null 2>&1; then
  echo "nvidia GPU detected"
fi
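
To avoid sorting machines by hand, the lspci check could be wrapped so that a FAI class script prints the class name itself. This is only a sketch: the function names nvidia_slots and nvidia_class are made up, and it assumes the class should be called NVIDIA. The functions read `lspci -mn` output from stdin, so they can be tried with sample lines on a machine without the hardware:

```shell
# Print the PCI slot of every VGA (0300) or 3D (0302) controller
# whose vendor ID is 10de or 12d2. Expects `lspci -mn` output on stdin.
nvidia_slots() {
    awk '{
        gsub("\"", "")   # strip the quotes that lspci -m puts around fields
        if (($2 == "0300" || $2 == "0302") && ($3 == "10de" || $3 == "12d2")) {
            print $1
        }
    }'
}

# Print the class name (assumed here to be NVIDIA) if any slot matched.
nvidia_class() {
    if [ -n "$(nvidia_slots)" ]; then
        echo NVIDIA
    fi
}
```

In a class script this would be used as `lspci -mn | nvidia_class`. Unlike the nvidia-smi check, it does not need the driver to be installed already.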

--
regards Thomas