Job Dependencies
Job dependencies let you build workflows where a job waits for other job(s) to finish before it can start.
Quick Start
bash
gbatch --time 10 python preprocess.py
gbatch --depends-on @ --gpus 1 --time 4:00:00 python train.py
gbatch --depends-on @ --time 10 python evaluate.pyDependency Options
Single dependency
bash
gbatch --depends-on <job_id|@|@~N> python next.pyShorthands:
@: the most recently submitted job@~N: the Nth most recent submission (e.g.@~1is the previous job)
Multiple dependencies (AND / OR)
bash
# AND: all parents must finish successfully
gbatch --depends-on-all 101,102,103 python merge.py
# OR: any one parent finishing successfully is enough
gbatch --depends-on-any 201,202,203 python process_first_success.py@ shorthands also work in lists (e.g. --depends-on-all @,@~1,@~2).
In scripts (directive)
Script directives support only --depends-on (single dependency):
bash
#!/bin/bash
# GFLOW --depends-on=123
python next.pyAuto-cancellation
By default, if a dependency fails/cancels/times out, dependent jobs are auto-cancelled. Disable this if you want them to stay queued:
bash
gbatch --depends-on <job_id> --no-auto-cancel python next.pyWhen auto-cancel is disabled, the dependent job will not start automatically even after the parent fails; you must cancel or resubmit it.
Monitor Dependencies
bash
# Tree view
gqueue -tExample output:
JOBID NAME ST TIME NODES NODELIST(REASON)
1 prep CD 00:02:15 0 -
├─2 train R 00:10:03 1 0
└─3 eval PD - 0 (WaitingForDependency)bash
# Focus on a subset
gqueue -j <job_id>,<job_id> -t
# Why a queued job is waiting
gqueue -s Queued -f JOBID,NAME,ST,NODELIST(REASON)Troubleshooting
Dependent job not starting
bash
gqueue -t
gqueue -j <parent_job_id> -f JOBID,ST
ginfoRedo a whole chain after fixing a failure
bash
gjob redo <job_id> --cascadeSee Also
- Job Lifecycle - Reasons and state transitions
- Time Limits - Prevent runaway jobs