# GPU Management

gflow detects NVIDIA GPUs (via NVML) and allocates them to jobs by setting `CUDA_VISIBLE_DEVICES`.
## Quick Start

```bash
# Start the daemon (if not already running)
gflowd up

# See availability + current allocations
ginfo

# Submit a GPU job
gbatch --gpus 1 python train.py

# Track jobs and allocations (quote the format string: parentheses are special to the shell)
gqueue -s Running,Queued -f "JOBID,NAME,ST,NODES,NODELIST(REASON)"
```

## Inspect GPUs
```bash
ginfo
```

Example output:

```
PARTITION  GPUS  NODES  STATE      JOB(REASON)
gpu        1     1      idle
gpu        1     0      allocated  5 (train-resnet)
```

- `NODES` shows the physical GPU indices.
- If a GPU is busy but not allocated by gflow, it may appear with a reason (when available).
Non-gflow GPU usage:

- If NVML reports running compute processes on a GPU, gflow treats it as unavailable (often shown as `Unmanaged`) and will not allocate it.
- gflow does not preempt or kill non-gflow processes; jobs wait until the GPU becomes idle.
If you need per-GPU restriction status (allowed vs restricted):

```bash
gctl show-gpus
```

## Requirements
- NVIDIA GPU(s) + driver
- NVML library available (`libnvidia-ml.so`)
Quick check:

```bash
nvidia-smi
gflowd up
ginfo
```

On systems without GPUs, gflow still works; only GPU allocation is unavailable.
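If you want to test the NVML requirement from code, a minimal probe is to try loading the library. This is an illustrative sketch, not part of gflow; the SONAMEs tried are the common ones shipped with the NVIDIA driver:

```python
import ctypes

def nvml_available() -> bool:
    """Return True if the NVML shared library can be loaded."""
    # Try common SONAMEs; loading succeeds only when the NVIDIA
    # driver stack (which ships libnvidia-ml) is installed.
    for name in ("libnvidia-ml.so.1", "libnvidia-ml.so"):
        try:
            ctypes.CDLL(name)
            return True
        except OSError:
            continue
    return False
```

On a machine without the driver this returns `False`, matching the behavior described above: gflow still runs, it just cannot allocate GPUs.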
## Request GPUs

```bash
gbatch --gpus 1 python train.py
gbatch --gpus 2 python multi_gpu_train.py
```

When a job starts, gflow assigns physical GPU indices and exports them via `CUDA_VISIBLE_DEVICES` (which frameworks typically renumber starting from 0).
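To make the renumbering concrete, here is a small plain-Python illustration, assuming gflow allocated physical GPUs 1 and 3 to the job:

```python
import os

# Pretend gflow exported this for the job:
os.environ["CUDA_VISIBLE_DEVICES"] = "1,3"

physical = [int(i) for i in os.environ["CUDA_VISIBLE_DEVICES"].split(",")]

# CUDA frameworks inside the job see these as local devices 0..N-1:
local_to_physical = dict(enumerate(physical))
print(local_to_physical)  # {0: 1, 1: 3}
```

So `cuda:0` inside the job maps to physical GPU 1 on the host.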
To see allocated GPU IDs:

```bash
gqueue -s Running -f "JOBID,NAME,ST,NODES,NODELIST(REASON)"
gjob show <job_id>
```

## Shared GPU Mode
Use shared mode when you want multiple jobs to co-locate on one physical GPU.

```bash
gbatch --gpus 1 --shared --gpu-memory 20G python train.py
```

- `--shared` jobs only share with other `--shared` jobs.
- `--shared` requires a per-GPU VRAM limit via `--gpu-memory` (alias: `--max-gpu-mem`).
- `--memory` (`--max-mem`) is still host RAM, not GPU VRAM.
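When sizing `--gpu-memory`, it can help to see how a value like `20G` translates to bytes. The helper below is hypothetical (gflow's actual parser and accepted suffixes may differ):

```python
def parse_size(spec: str) -> int:
    """Parse '20G', '512M', or a plain byte count into bytes.

    Hypothetical illustration; gflow's real parser may differ.
    """
    units = {"K": 1024, "M": 1024**2, "G": 1024**3, "T": 1024**4}
    spec = spec.strip().upper().rstrip("B")  # accept '20G' and '20GB'
    if spec and spec[-1] in units:
        return int(float(spec[:-1]) * units[spec[-1]])
    return int(spec)

print(parse_size("20G"))  # 21474836480 (20 GiB)
```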
## GPU Visibility

```bash
#!/bin/bash
# GFLOW --gpus 2
echo "CUDA_VISIBLE_DEVICES=$CUDA_VISIBLE_DEVICES"
python train.py
```

## Restrict Which GPUs gflow Uses
Limit which physical GPUs the scheduler is allowed to allocate (affects new allocations only):

```bash
gctl set-gpus 0,2
gctl show-gpus

# Or via daemon CLI flag (overrides config)
gflowd restart --gpus 0-3
```

See also: Configuration -> GPU Selection.
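The two spellings above (`0,2` and `0-3`) are comma lists and inclusive ranges of physical indices. As a sketch of how such a spec expands (a hypothetical helper, not gflow's code; gflow's accepted syntax may be broader):

```python
def expand_gpu_spec(spec: str) -> list[int]:
    """Expand a spec like '0,2' or '0-3' into explicit GPU indices."""
    indices: list[int] = []
    for part in spec.split(","):
        if "-" in part:
            lo, hi = part.split("-")
            indices.extend(range(int(lo), int(hi) + 1))  # inclusive range
        else:
            indices.append(int(part))
    return indices

print(expand_gpu_spec("0-3"))  # [0, 1, 2, 3]
```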
## Choose GPU Allocation Strategy

When multiple GPUs are available for a job, you can choose how gflow selects them:

- `sequential` (default): picks lower indices first.
- `random`: randomizes GPU selection order.
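The difference between the two strategies can be sketched as follows (illustrative only; the real selection happens inside the daemon):

```python
import random

def pick_gpus(free: list[int], count: int, strategy: str = "sequential") -> list[int]:
    """Pick `count` GPUs from the free list -- illustrative sketch only."""
    if strategy == "random":
        candidates = random.sample(free, len(free))  # randomized order
    else:
        candidates = sorted(free)                    # lower indices first
    return candidates[:count]

print(pick_gpus([3, 0, 2], 2))  # sequential -> [0, 2]
```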
```toml
[daemon]
gpu_allocation_strategy = "sequential"
# gpu_allocation_strategy = "random"
```

Or override on daemon startup:
```bash
gflowd up --gpu-allocation-strategy random
```

## Troubleshooting
### Job not getting GPU

```bash
ginfo
gqueue -j <job_id> -f "JOBID,ST,NODES,NODELIST(REASON)"
gctl show-gpus
```

### Job sees wrong GPUs
```bash
echo "CUDA_VISIBLE_DEVICES=$CUDA_VISIBLE_DEVICES"
gqueue -f "JOBID,NODELIST(REASON)"
```

### Out of memory
```bash
nvidia-smi --query-gpu=memory.free,memory.used --format=csv
```

If shared jobs fail with OOM, verify `--gpu-memory` is set and sized appropriately for each job.
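When scripting this check, the CSV output is straightforward to parse. A sketch with a hard-coded sample (in practice the text would come from running the `nvidia-smi` query above):

```python
import csv
import io

# Sample of: nvidia-smi --query-gpu=memory.free,memory.used --format=csv
sample = """memory.free [MiB], memory.used [MiB]
21576 MiB, 2994 MiB
102 MiB, 24468 MiB
"""

def free_mib(csv_text: str) -> list[int]:
    """Return free memory (MiB) per GPU from nvidia-smi CSV output."""
    rows = list(csv.reader(io.StringIO(csv_text)))
    # Skip the header row; each data cell looks like '21576 MiB'.
    return [int(row[0].strip().split()[0]) for row in rows[1:] if row]

print(free_mib(sample))  # [21576, 102]
```

Here GPU 1 has only 102 MiB free, which would explain OOM for any job scheduled onto it.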
## See Also
- Job Submission - Complete job submission guide
- Job Dependencies - Workflow management
- Time Limits - Job timeout management
- Quick Reference - Command cheat sheet