GPU vulnerability: 8 security risks and how to address them

GPUs have revolutionized everything from AI training to high-performance computing, but their rise in server environments has also introduced a new frontier of security risks. While you’re likely well-versed in CUDA cores, memory bandwidth, and parallel processing, have you considered how these same architectural strengths can become attack surfaces?

Unlike traditional CPU servers, GPU-powered systems face unique threats—from side-channel exploits to virtualization vulnerabilities—that can compromise performance, data integrity, and even entire infrastructures. If you think your GPU workloads are safe just because they’re running in a secure data center, it’s time to take a closer look.

1. Side-channel attacks

Side-channel attacks exploit unintended information leakage from hardware, such as timing variations, power consumption, or electromagnetic emissions. While side-channel attacks are common in CPUs, GPUs are particularly vulnerable due to their parallel processing nature. Since multiple processes may share the same GPU hardware, an attacker running code on the same GPU can infer secret data by observing execution patterns, memory access times, or cache behavior.

One well-known GPU side-channel attack is keystroke inference, where an attacker monitors GPU-based rendering workloads to extract keystrokes or other user input patterns. Similarly, researchers have shown that cryptographic operations processed on a GPU can be susceptible to timing attacks, revealing secret keys.

Mitigation strategies:

Strong workload isolation: Avoid running sensitive and non-sensitive workloads on the same GPU in multi-tenant environments.
Cache partitioning: Use software techniques to reduce shared cache exposure in multi-tenant environments.
Randomized scheduling: Introduce randomization in GPU task execution to make timing-based attacks more difficult. To mitigate keystroke inference specifically, randomized keyboard layouts can be used.
Limit GPU access for untrusted code: Restrict GPU resource allocation to verified applications and prevent unauthorized execution.
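
As a minimal sketch of the randomized scheduling idea, the Python snippet below shuffles the order of queued tasks and adds a small random delay before each one. The names run_randomized and max_jitter_s, and the task callables themselves, are hypothetical. This only illustrates the concept and is not a complete defense against timing attacks.

```python
import random
import time

_rng = random.SystemRandom()  # OS-backed randomness, harder to predict than the default PRNG


def run_randomized(tasks, max_jitter_s=0.005):
    """Run callables in a random order with a random delay before each one.

    Returns (original_index, result) pairs in the order the tasks actually ran.
    """
    order = list(range(len(tasks)))
    _rng.shuffle(order)  # execution order no longer mirrors submission order
    results = []
    for idx in order:
        time.sleep(_rng.uniform(0.0, max_jitter_s))  # blur start-time patterns
        results.append((idx, tasks[idx]()))
    return results
```

The trade-off is latency: every task pays up to max_jitter_s of extra delay, so the jitter bound would need tuning against the workload's performance budget.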

2. GPU memory leakage

Unlike CPUs, GPUs do not always have robust memory isolation, especially in multi-tenant environments. If memory is not properly cleared when a process ends, an attacker could retrieve leftover data from another user’s workload. This issue is particularly concerning in cloud GPU environments where different customers use the same physical hardware.

For example, in some NVIDIA CUDA and AMD ROCm implementations, improperly deallocated memory can retain portions of deep learning model weights, sensitive images, or cryptographic data. This could allow a malicious user to extract data from previous workloads, potentially revealing proprietary or confidential information.

Mitigation strategies:

Zero GPU memory upon deallocation: Ensure that GPU memory is explicitly wiped when a process terminates.
GPU memory encryption: Use hardware memory-encryption features where available, such as AMD Secure Memory Encryption (SME) for host memory or GPU confidential-computing modes on hardware that supports them, to protect in-use data.
GPU namespace isolation: Implement strict GPU memory access policies for multi-tenant environments.
Periodic system resets: In shared GPU environments, restarting GPU instances periodically can help clear lingering cache data.
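
The "zero on deallocation" idea can be illustrated with a small Python sketch. It uses a host-side bytearray as a stand-in for a device buffer (an assumption made so the example runs anywhere). In a CUDA application, the equivalent step would be clearing the allocation, for example with cudaMemset, before calling cudaFree. The helper name scrubbed_buffer is hypothetical.

```python
from contextlib import contextmanager


@contextmanager
def scrubbed_buffer(size):
    """Yield a working buffer and overwrite it with zeros when the block exits."""
    buf = bytearray(size)
    try:
        yield buf
    finally:
        # Runs even if the workload raises, so an error path leaves no residue.
        buf[:] = bytes(len(buf))
```

Putting the wipe in a finally clause is the key design choice: the buffer is cleared whether the workload finishes normally or fails partway through.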

3. Driver and firmware exploits

GPU drivers and firmware are frequently updated to patch security vulnerabilities, but outdated or improperly configured drivers present an easy attack vector. Exploits in GPU drivers can enable attackers to execute arbitrary code at the kernel level, escalate privileges, or crash entire systems.

For instance, past vulnerabilities in NVIDIA and AMD drivers have enabled remote code execution (RCE) by sending malicious shader code or malformed API calls. Attackers can also use privilege escalation exploits to gain admin control over GPU-accelerated servers, compromising the entire system.

Mitigation strategies:

Regular driver updates: Always install the latest security patches for GPU drivers.
Disable unnecessary GPU features: Reduce the attack surface by disabling APIs (e.g., CUDA, OpenCL) unless explicitly required. Access to any API that remains enabled should be restricted by firewall rules, rate limits, or both.
Enforce strict user permissions: Limit driver-level access to only authorized users.
Monitor driver vulnerability disclosures: Subscribe to security advisories from NVIDIA, AMD, and Intel to stay informed about known issues.
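
A patch policy like the one above is easier to enforce when the version check is automated. The sketch below compares an installed driver version against a minimum patched version. The function names are hypothetical and the version strings in the usage note are placeholders, not real advisory data.

```python
def parse_version(text):
    """Turn a dotted version string such as '535.104.05' into a tuple of ints."""
    return tuple(int(part) for part in text.strip().split("."))


def is_outdated(installed, minimum_patched):
    """Return True if the installed driver is older than the minimum patched version."""
    return parse_version(installed) < parse_version(minimum_patched)
```

For example, is_outdated("535.54.03", "535.104.05") returns True. Comparing integer tuples rather than raw strings matters here, because a plain string comparison would rank "535.54" above "535.104". On NVIDIA systems, the installed version can be read with nvidia-smi --query-gpu=driver_version --format=csv,noheader.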

4. GPU-specific malware

Traditional antivirus and endpoint protection tools focus on detecting CPU-based threats, leaving GPUs relatively unprotected.

GPU-specific malware can execute malicious code directly within the GPU’s memory, making it harder to detect with standard security tools. Some GPU rootkits, such as Jellyfish (a proof-of-concept GPU malware), have demonstrated the ability to persist in GPU memory and avoid detection by conventional security software.

Attackers can also use GPUs for malware obfuscation, offloading key parts of an exploit to the GPU to evade detection. Since GPU execution operates separately from the CPU, traditional forensic tools may not even detect the presence of malicious code.

Mitigation strategies:

Use GPU-aware security tools: Employ security solutions that monitor GPU memory and execution (e.g., kernel-mode GPU monitoring tools).
Restrict GPU execution permissions: Block untrusted users from executing GPU workloads.
Regular firmware integrity checks: Validate firmware integrity to detect unauthorized modifications.
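
One simple way to approach a firmware integrity check is to hash a firmware image and compare it with a known-good digest. The sketch below assumes the image has already been dumped to a file and that a trusted SHA-256 digest is available from the vendor. Both are assumptions, and the function names are hypothetical.

```python
import hashlib
import hmac


def sha256_of_file(path, chunk_size=1 << 20):
    """Hash a file in chunks so large images do not need to fit in memory."""
    digest = hashlib.sha256()
    with open(path, "rb") as fh:
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def firmware_matches(path, expected_hex):
    """Return True if the file's SHA-256 digest equals the expected hex digest."""
    return hmac.compare_digest(sha256_of_file(path), expected_hex.lower())
```

A mismatch does not prove compromise on its own (the image may simply be a different release), but it is a useful trigger for closer inspection.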

5. Virtualization security risks

GPU virtualization allows multiple virtual machines (VMs) to share a single GPU, which introduces risks such as GPU escape attacks—where an attacker in one VM gains access to another VM’s memory or workloads. Vulnerabilities in NVIDIA vGPU, AMD MxGPU, or Intel GVT-g have led to cases where guest VMs could crash the host system or escalate privileges.

One major concern is GPU passthrough attacks, where an attacker gains direct access to a GPU in a VM and exploits vulnerabilities to bypass hypervisor security. For example, attackers could modify a VM’s GPU memory mappings to access another tenant’s data or even the host system.

Mitigation strategies:

Use dedicated bare metal GPUs for sensitive workloads: Avoid sharing GPUs between tenants handling confidential data.
Apply strict VM isolation policies: Use SR-IOV (Single Root I/O Virtualization) or similar security-enhanced virtualization technologies.
Monitor for GPU escape exploits: Regularly check for known vulnerabilities in virtualization platforms.
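
The first mitigation (keeping sensitive workloads on unshared GPUs) can be audited with a few lines of code. The sketch below assumes a hypothetical inventory of (gpu_id, tenant, is_sensitive) records. The record format and function name are illustrative and not tied to any particular virtualization platform.

```python
from collections import defaultdict


def find_shared_sensitive_gpus(assignments):
    """Return {gpu_id: sorted tenants} for GPUs that host a sensitive workload
    and are also used by more than one tenant.

    assignments: iterable of (gpu_id, tenant, is_sensitive) tuples.
    """
    tenants = defaultdict(set)
    sensitive = set()
    for gpu_id, tenant, is_sensitive in assignments:
        tenants[gpu_id].add(tenant)
        if is_sensitive:
            sensitive.add(gpu_id)
    return {
        gpu: sorted(names)
        for gpu, names in tenants.items()
        if gpu in sensitive and len(names) > 1
    }
```

An empty result means no sensitive workload currently shares a physical GPU with another tenant.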

6. AI and machine learning poisoning attacks

Since many GPU servers are used for AI training, they are susceptible to data poisoning attacks, where an attacker injects manipulated data into the training pipeline to alter the model’s behavior. This can lead to biased predictions, backdoor vulnerabilities, or security loopholes in deployed AI applications.

For instance, an attacker could introduce poisoned images into a facial recognition dataset to cause misclassification, or insert adversarial examples into a fraud detection model to reduce its effectiveness.

Mitigation strategies:

Validate training data sources: Ensure datasets are sourced from trusted and verified repositories.
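Use differential privacy techniques: Apply calibrated noise during training so that no single record, poisoned or not, has an outsized effect on the model, and so attackers cannot learn specifics about individual training examples.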
Employ anomaly detection: Use AI-based detection models to identify and filter out poisoned data.
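
To make the anomaly detection step concrete, the sketch below swaps the AI-based detector for a much simpler statistical stand-in: a modified z-score built from the median and the median absolute deviation (MAD). It assumes each sample has been reduced to a single numeric feature. The function name is hypothetical, and the 3.5 threshold is a commonly used cutoff rather than a tuned value.

```python
import statistics


def flag_outliers(values, threshold=3.5):
    """Return indices whose modified z-score (median/MAD based) exceeds the threshold."""
    med = statistics.median(values)
    mad = statistics.median(abs(v - med) for v in values)
    if mad == 0:
        return []  # no spread to measure against, so nothing is flagged
    return [
        i for i, v in enumerate(values)
        if abs(0.6745 * (v - med) / mad) > threshold
    ]
```

The median and MAD are used instead of the mean and standard deviation because a handful of poisoned samples can drag the mean toward themselves and hide their own deviation.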

7. Cryptojacking and GPU resource abuse

Cryptojacking is when an attacker hijacks GPU resources to mine cryptocurrency without the owner’s consent. Cloud-hosted GPU servers are particularly vulnerable, because attackers can scan for instances with weak security and deploy cryptojacking scripts.

One common method involves compromising exposed SSH services or exploiting vulnerabilities in containerized environments. Since cryptojacking malware often runs at full GPU capacity, it can degrade server performance and increase energy costs.

Mitigation strategies:

Monitor GPU usage metrics: Set up alerts for unexpected GPU utilization spikes.
Use workload scheduling restrictions: Prevent unauthorized execution of mining-related workloads.
Harden server access: Disable password-based SSH login, allow only SSH keys, and enforce multi-factor authentication (MFA). Set expiration dates on keys so they are rotated frequently, and audit them on a regular schedule to remove any that no longer need access. Run SSH on a non-standard port to reduce exposure to automated scans.
Deploy cryptojacking detection tools: Use tools like Akamai Guardicore Segmentation or cloud-native security monitoring to detect abnormal GPU activity.
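
The usage-monitoring mitigation can be sketched as a check for sustained high utilization, since mining tends to hold the GPU near full load rather than spike briefly. The function name and default thresholds below are hypothetical and would need tuning for real workloads, especially on servers that legitimately run long training jobs.

```python
def sustained_high_utilization(samples, threshold=90, min_consecutive=5):
    """Return True if utilization stays at or above threshold
    for at least min_consecutive samples in a row."""
    run = 0
    for pct in samples:
        run = run + 1 if pct >= threshold else 0
        if run >= min_consecutive:
            return True
    return False
```

On NVIDIA systems, the samples could come from polling nvidia-smi --query-gpu=utilization.gpu --format=csv,noheader,nounits at a fixed interval. Pairing the alert with a list of expected job windows helps separate scheduled training runs from unexplained load.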

8. Buffer overflows

Buffer overflow vulnerabilities occur when a program writes more data to a buffer (temporary memory storage) than it can hold, leading to memory corruption. This can allow attackers to overwrite adjacent memory regions, execute arbitrary code, or crash a system.
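
The core defense is a bounds check before every write. The Python sketch below illustrates it on a fixed-size bytearray. Python itself is memory-safe, so this only models the check that lower-level GPU code would have to perform (for instance, an index guard inside a kernel). The function name checked_write is hypothetical.

```python
def checked_write(buf, offset, data):
    """Copy data into buf at offset, refusing any write that would leave the buffer."""
    if offset < 0 or offset + len(data) > len(buf):
        raise ValueError(
            f"write of {len(data)} bytes at offset {offset} "
            f"exceeds buffer of {len(buf)} bytes"
        )
    buf[offset:offset + len(data)] = data
```

Without the explicit check, a slice assignment past the end of a bytearray would silently grow the buffer. In C or CUDA, the same unchecked write would instead spill into adjacent memory.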

While buffer overflows are a well-known issue in CPU-based applications, they are particularly dangerous in GPUs due to the complexity of GPU drivers, shader compilers, and parallel processing frameworks.

GPU buffer overflows can arise from several sources:

Vulnerabilities in GPU drivers and APIs: Since GPU compute stacks such as NVIDIA CUDA, AMD ROCm, and OpenCL rely on complex memory management mechanisms, flaws in these APIs can lead to exploitable buffer overflows.
Improperly handled shader programs: Many GPUs process custom shaders for graphics rendering, AI workloads, and computational tasks. If a shader program doesn’t properly validate memory access, it can trigger an overflow.
GPU memory shared between processes: In multi-tenant environments, a buffer overflow in one process could leak sensitive data or corrupt memory used by another process.

In 2022, for example, security researchers identified a buffer overflow vulnerability in NVIDIA’s CUDA driver, allowing local attackers to escalate privileges and execute arbitrary code. Similar vulnerabilities have been found in GPU shader compilers, where specially crafted shader programs could trigger overflows to gain unauthorized access.

Mitigation strategies:

Regularly update GPU drivers and runtime libraries: NVIDIA, AMD, and Intel frequently release security patches to fix memory management flaws.
Use compiler security features: When developing CUDA, OpenCL, or Vulkan applications, enable bounds checking and compiler protections against buffer overflows.
Enforce strict memory access controls: Limit GPU memory allocation to trusted applications and prevent unprivileged users from running arbitrary GPU code.
Harden virtualized environments: In cloud GPU hosting, use SR-IOV or dedicated GPU instances to isolate workloads and prevent overflow-related memory leaks between tenants.
Implement GPU memory sanitization: Ensure that memory buffers are zeroed out after use to prevent residual data exposure.

Additional GPU security strategies

To pre-emptively secure GPU servers, consider:

Frequent driver and firmware updates to patch known vulnerabilities
Proper memory isolation in multi-tenant environments
Monitoring GPU activity with security tools designed for GPU workloads
Restricting access to GPU virtualization features and auditing permissions
Using hardware security features (e.g., NVIDIA’s security modules, AMD Secure Processor)

While GPU servers share some security risks with traditional dedicated servers, their specialized processing and shared multi-user environments introduce additional attack surfaces that need to be addressed.

Getting started with GPU server hosting

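The most secure GPU server is a dedicated server in a secure data center. Because its hardware is not shared with other tenants, a dedicated GPU server avoids many of the multi-tenant risks that affect cloud GPU or GPU as a Service arrangements, such as cross-tenant memory leakage and shared-hardware side channels.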

If you need a powerful GPU server with the tightest possible security, Liquid Web can help. We give leading AI brands peace of mind with our security-first GPU server hosting. You can choose from a variety of server configurations—all compliance-audited, DDoS protected, and housed in top-tier data centers.

Amy Moruzzi is a Systems Engineer at Liquid Web with years of experience maintaining large fleets of servers in a wide variety of areas—including system management, deployment, maintenance, clustering, virtualization, and application level support. She specializes in Linux, but has experience working across the entire stack. Amy also enjoys creating software and tools to automate processes and make customers’ lives easier.