Job Dependencies

Job dependencies let you build workflows where a job waits for other job(s) to finish before it can start.

Quick Start

bash

gbatch --time 10 python preprocess.py
gbatch --depends-on @ --gpus 1 --time 4:00:00 python train.py
gbatch --depends-on @ --time 10 python evaluate.py

Dependency Options

Single dependency

bash

gbatch --depends-on <job_id|@|@~N> python next.py

Shorthands:

@: the most recently submitted job
@~N: the Nth most recent submission (e.g. @~1 is the previous job)

Multiple dependencies (AND / OR)

bash

# AND: all parents must finish successfully
gbatch --depends-on-all 101,102,103 python merge.py

# OR: any one parent finishing successfully is enough
gbatch --depends-on-any 201,202,203 python process_first_success.py

@ shorthands also work in lists (e.g. --depends-on-all @,@~1,@~2).

In scripts (directive)

Script directives support only --depends-on (single dependency):

bash

#!/bin/bash
# GFLOW --depends-on=123

python next.py

Auto-cancellation

By default, if a dependency fails/cancels/times out, dependent jobs are auto-cancelled. Disable this if you want them to stay queued:

bash

gbatch --depends-on <job_id> --no-auto-cancel python next.py

When auto-cancel is disabled, the dependent job will not start automatically even after the parent fails; you must cancel or resubmit it.

Monitor Dependencies

bash

# Tree view
gqueue -t

Example output:

JOBID  NAME   ST  TIME      NODES  NODELIST(REASON)
1      prep   CD  00:02:15  0      -
├─2    train  R   00:10:03  1      0
└─3    eval   PD  -         0      (WaitingForDependency)

bash

# Focus on a subset
gqueue -j <job_id>,<job_id> -t

# Why a queued job is waiting
gqueue -s Queued -f JOBID,NAME,ST,NODELIST(REASON)

Troubleshooting

Dependent job not starting

bash

gqueue -t
gqueue -j <parent_job_id> -f JOBID,ST
ginfo

Redo a whole chain after fixing a failure

bash

gjob redo <job_id> --cascade

Job Dependencies ​

Quick Start ​

Dependency Options ​

Single dependency ​

Multiple dependencies (AND / OR) ​

In scripts (directive) ​

Auto-cancellation ​

Monitor Dependencies ​

Troubleshooting ​

Dependent job not starting ​

Redo a whole chain after fixing a failure ​

See Also ​

Job Dependencies

Quick Start

Dependency Options

Single dependency

Multiple dependencies (AND / OR)

In scripts (directive)

Auto-cancellation

Monitor Dependencies

Troubleshooting

Dependent job not starting

Redo a whole chain after fixing a failure

See Also