GPU and vGPU Support

Introduction

Two use cases have to be differentiated: the GPU nodes are used inside the Kubernetes cluster, or they are used outside of it, i.e. in an OpenStack cluster deployed on top of Kubernetes.

Internal Usage

Internal usage means that you want to run workloads, e.g. AI workloads, inside your Kubernetes cluster.

If you want to make use of GPUs inside of Kubernetes, set the following:

[kubernetes]
# Set this variable if this cluster contains workers with GPU access
# and you want to make use of these inside of the cluster,
# so that the driver and surrounding framework are deployed.
is_gpu_cluster = true

This will trigger the setup automation. GPU support inside Kubernetes is achieved by three main components:

  • The presence of the NVIDIA driver on the workers

  • The presence of the NVIDIA container toolkit for the container runtime

  • The presence of the NVIDIA device plugin for Kubernetes
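
Once these components are deployed, the device plugin advertises the GPUs as an allocatable nvidia.com/gpu resource on the respective nodes. As an illustrative check (the node name is a placeholder), you can inspect a GPU worker:

$ kubectl describe node <gpu-worker-node> | grep -i 'nvidia.com/gpu'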

Implementation-wise, the role gpu-support-detection uses a PCI scan script to detect the presence of an NVIDIA GPU via the vendor/product ID (10de:XXXX). If the detection is successful, the node_has_gpu fact is defined for the host and the further logic is triggered.
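
Conceptually, such a detection boils down to scanning the PCI bus for devices with the NVIDIA vendor ID 10de. The following is only a minimal sketch of that idea, not the actual script used by the role:

$ lspci -d 10de: -nn
$ # boolean variant such a script might use:
$ if lspci -d 10de: | grep -q .; then echo "NVIDIA GPU present"; fi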

External Usage

If the GPU cards are to be used outside of the Kubernetes cluster, on an abstraction layer above it (an OpenStack cluster), two different use cases have to be differentiated: PCIe passthrough and vGPU (slicing).

For direct PCIe passthrough, no NVIDIA drivers must be loaded (or, better, installed at all) on the host.
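
To quickly verify that neither the proprietary NVIDIA module nor nouveau is loaded on such a host, a check like the following can be used (illustrative only; no output means no driver is loaded):

$ lsmod | grep -Ei 'nvidia|nouveau'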

vGPU Support

NVIDIA vGPUs are licensed products and therefore need extra configuration.

Nova and the vGPU manager do not get along with the parallel creation of vGPU VMs. To avoid crashes, use the option -parallelism=1 with terraform apply.
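
For example, when invoking Terraform directly, the call looks like this (if the apply is wrapped by other tooling, the flag has to be passed through accordingly):

$ terraform apply -parallelism=1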

vGPU support requires, among other things, the installation of vGPU management software that slices the actual GPU into virtual ones. The responsible role is vgpu-support (see here). The procedure is described in the following section.

vGPU support is only available for NVIDIA GPUs which support GPU virtualization. To check whether your GPU supports virtualization, consult the official NVIDIA Guide.

Both AMD and Intel CPUs are supported in Yaook for GPU virtualization. To virtualize the GPU, the BIOS setting VT-d/IOMMU has to be enabled. In addition, an enable_iommu.cfg snippet is automatically added to /etc/default/grub.d. This approach is useful because the main GRUB file is not changed: its presets are kept, the *.cfg files in grub.d are loaded after the main GRUB file, and additional modifications remain possible in the future.
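
As an illustration (the exact content written by the role may differ), such a snippet appends the IOMMU parameters to the kernel command line without modifying /etc/default/grub itself:

# /etc/default/grub.d/enable_iommu.cfg -- illustrative sketch only
# intel_iommu=on for Intel CPUs, amd_iommu=on for AMD CPUs; iommu=pt selects passthrough mode
GRUB_CMDLINE_LINUX_DEFAULT="$GRUB_CMDLINE_LINUX_DEFAULT intel_iommu=on iommu=pt"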

To enable vGPU support in Yaook/k8s, the following variables must be set in the config.toml. The config.template.toml can be found here. The vGPU manager software can be downloaded from the NVIDIA Licensing Portal.

# vGPU Support
[nvidia.vgpu]
driver_blob_url = "foo"   # storage location (URL) of the vGPU manager package
manager_filename = "bar"  # file name of the vGPU manager package

After Yaook/k8s has been rolled out, the directory for the chosen vGPU configuration still has to be determined. The following steps only have to be done once and are needed for the Yaook Operator and OpenStack.

Note

It is recommended to note down the directory name together with the chosen vGPU configuration and GPU model, so that this process only needs to be performed once.

A distinction must be made between two cases.

  1. NVIDIA GPU that does not support SR-IOV. (All GPUs before the Ampere architecture)

    Physical GPUs that support virtual GPUs expose mediated device types (mdev). To see the required properties, the mdev_supported_types directory of the GPU has to be inspected. Note: You first need to determine the correct PCI address the GPU is plugged into.

    $ lspci | grep NVIDIA
    82:00.0 3D controller: NVIDIA Corporation TU104GL [Tesla T4] (rev a1)
    

    Change to the mdev_supported_types directory of the physical GPU (analogous to the cd command shown in case 2 below, but using the PCI address of the physical GPU itself) and find the subdirectory with your desired vGPU configuration. Replace vgpu-type with your chosen vGPU configuration.

    $ grep -l "vgpu-type" nvidia-*/name
    
  2. NVIDIA GPU that supports SR-IOV. (All GPUs of the Ampere architecture or newer)

    Obtain the bus, domain, slot and function of the available virtual functions on the GPU.

    $ ls -l /sys/bus/pci/devices/domain\:bus\:slot.function/ | grep virtfn
    

    This example shows the output of this command for a physical GPU with the slot 00, bus 82, domain 0000 and function 0.

    $ ls -l /sys/bus/pci/devices/0000:82:00.0/ | grep virtfn
    lrwxrwxrwx 1 root root           0 Jul 25 07:57 virtfn0 -> ../0000:82:00.4
    lrwxrwxrwx 1 root root           0 Jul 25 07:57 virtfn1 -> ../0000:82:00.5
    lrwxrwxrwx 1 root root           0 Jul 25 07:57 virtfn10 -> ../0000:82:01.6
    lrwxrwxrwx 1 root root           0 Jul 25 07:57 virtfn11 -> ../0000:82:01.7
    lrwxrwxrwx 1 root root           0 Jul 25 07:57 virtfn12 -> ../0000:82:02.0
    lrwxrwxrwx 1 root root           0 Jul 25 07:57 virtfn13 -> ../0000:82:02.1
    lrwxrwxrwx 1 root root           0 Jul 25 07:57 virtfn14 -> ../0000:82:02.2
    lrwxrwxrwx 1 root root           0 Jul 25 07:57 virtfn15 -> ../0000:82:02.3
    lrwxrwxrwx 1 root root           0 Jul 25 07:57 virtfn2 -> ../0000:82:00.6
    lrwxrwxrwx 1 root root           0 Jul 25 07:57 virtfn3 -> ../0000:82:00.7
    lrwxrwxrwx 1 root root           0 Jul 25 07:57 virtfn4 -> ../0000:82:01.0
    lrwxrwxrwx 1 root root           0 Jul 25 07:57 virtfn5 -> ../0000:82:01.1
    lrwxrwxrwx 1 root root           0 Jul 25 07:57 virtfn6 -> ../0000:82:01.2
    lrwxrwxrwx 1 root root           0 Jul 25 07:57 virtfn7 -> ../0000:82:01.3
    lrwxrwxrwx 1 root root           0 Jul 25 07:57 virtfn8 -> ../0000:82:01.4
    lrwxrwxrwx 1 root root           0 Jul 25 07:57 virtfn9 -> ../0000:82:01.5
    

    Choose the virtual function on which you want to create the vGPU. Change to the mdev_supported_types directory of that virtual function and find the subdirectory that contains your chosen vGPU configuration. Replace vgpu-type with your chosen vGPU configuration; the resulting subdirectory name can be cross-checked as shown in the sketch after this list.

    $ cd /sys/class/mdev_bus/0000\:82\:00.4/mdev_supported_types/
    $ grep -l "vgpu-type" nvidia-*/name
    
  3. With the subdirectory name information, you can proceed with the Yaook Operator. There you can set enabled_vgpu_types in the nova.yaml. The file is located under operator/docs/examples/nova.yaml.

    compute:
      configTemplates:
      - nodeSelectors:
        - matchLabels: {}
        novaComputeConfig:
          DEFAULT:
            debug: True
          devices:
            enabled_vgpu_types:
            - nvidia-233
    

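The subdirectory found by the grep above (for example nvidia-233, the type used in the Nova snippet) can be cross-checked via the standard mdev sysfs attributes before configuring it in Nova. This is only a verification sketch; nvidia-233 is just an example type:

$ cat nvidia-233/name                 # human-readable name of the vGPU type
$ cat nvidia-233/available_instances  # number of vGPUs of this type that can still be created
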
Physical host considerations

Customers may have different scheduling preferences for how vGPU-backed instances are placed on the physical hosts.

Depending on the vGPU model, some vGPU VMs might fail to start if ECC is enabled on the physical GPU.
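
The current ECC state of a physical GPU can be inspected, and if necessary changed, with nvidia-smi (a change only takes effect after a reboot); shown here only as a hint, consult the NVIDIA documentation for your setup:

$ nvidia-smi -q | grep -A 2 -i 'ECC Mode'
$ nvidia-smi -e 0   # disable ECC; verify against your vGPU documentation first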