Job Lifecycle

This guide explains the complete lifecycle of jobs in gflow, including state transitions, status checking, and recovery operations.

Job States

gflow jobs can be in one of seven states:

State	Short code	Description
Queued	PD	Job is waiting to run (pending dependencies or resources)
Hold	H	Job is on hold by user request
Running	R	Job is currently executing
Finished	CD	Job completed successfully
Failed	F	Job terminated with an error
Cancelled	CA	Job was cancelled by user or system
Timeout	TO	Job exceeded its time limit

Active States (job is not yet complete): Queued, Hold, Running.

Completed States (job has finished): Finished, Failed, Cancelled, Timeout.
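If you script around gflow (for example, to post-process the ST column of gqueue), the state table and its active/completed split can be modeled as a small enum. This is a hypothetical Python sketch, not gflow's own code. The names `JobState`, `ACTIVE_STATES`, and `from_short_code` are illustrative. The only assumption beyond the table is that every state outside Queued, Hold, and Running counts as completed.

```python
from enum import Enum


class JobState(Enum):
    """The seven gflow job states, keyed by their short codes (hypothetical model)."""

    QUEUED = "PD"
    HOLD = "H"
    RUNNING = "R"
    FINISHED = "CD"
    FAILED = "F"
    CANCELLED = "CA"
    TIMEOUT = "TO"

    @property
    def is_active(self) -> bool:
        # Active states: the job is not yet complete.
        return self in ACTIVE_STATES

    @property
    def is_completed(self) -> bool:
        # Completed states: the job has finished, successfully or not.
        return not self.is_active


# Defined after the class; the properties look it up at call time.
ACTIVE_STATES = frozenset({JobState.QUEUED, JobState.HOLD, JobState.RUNNING})


def from_short_code(code: str) -> JobState:
    """Look up a state by its short code, e.g. 'PD' -> JobState.QUEUED."""
    return JobState(code.strip().upper())
```

For instance, `from_short_code("CD").is_completed` is `True`, and `from_short_code("H").is_active` is `True`. An unknown code raises `ValueError`, which makes unexpected values visible instead of silently treating them as complete.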

The following diagram shows all possible state transitions in gflow:

From Queued:

→ Running: When dependencies are met AND resources are available
→ Hold: User runs gjob hold <job_id>
→ Cancelled: User runs gcancel <job_id> OR a dependency fails (with auto-cancel enabled)
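The three exits from Queued can be expressed as a single decision function. This is a hypothetical model, not gflow's scheduler. The guide does not say which rule wins when several apply at once, so the precedence order here (cancel, then hold, then run) is an assumption. The model also assumes that a failed dependency without auto-cancel simply leaves the job queued.

```python
from enum import Enum


class QueuedOutcome(Enum):
    STAY_QUEUED = "Queued"
    RUNNING = "Running"
    HOLD = "Hold"
    CANCELLED = "Cancelled"


def next_from_queued(
    *,
    deps_met: bool,
    resources_available: bool,
    hold_requested: bool = False,
    cancel_requested: bool = False,
    dependency_failed: bool = False,
    auto_cancel: bool = False,
) -> QueuedOutcome:
    """Apply the documented Queued transitions (precedence order is assumed)."""
    # gcancel <job_id>, or a failed dependency when auto-cancel is enabled.
    if cancel_requested or (dependency_failed and auto_cancel):
        return QueuedOutcome.CANCELLED
    # gjob hold <job_id> moves the job to Hold.
    if hold_requested:
        return QueuedOutcome.HOLD
    # Dependencies met AND resources available: the job can start.
    if deps_met and resources_available:
        return QueuedOutcome.RUNNING
    # Otherwise the job keeps waiting.
    return QueuedOutcome.STAY_QUEUED
```

The keyword-only parameters make each condition explicit at the call site, for example `next_from_queued(deps_met=True, resources_available=False)` stays queued because both conditions are required.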

From Hold:

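From Running:

→ Finished: The job completes successfully
→ Failed: The job terminates with an error
→ Cancelled: The job is cancelled by the user or system
→ Timeout: The job exceeds its time limit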

From Completed States:

Jobs in certain states have an associated reason that provides more context:

State	Reason	Description
Queued	`WaitingForDependency`	Job is waiting for parent jobs to finish
Queued	`WaitingForResources`	Job is waiting for available GPUs/memory
Hold	`JobHeldUser`	Job was put on hold by user request
Cancelled	`CancelledByUser`	User explicitly cancelled the job
Cancelled	`DependencyFailed:<job_id>`	Job was auto-cancelled because job `<job_id>` failed
Cancelled	`SystemError:<msg>`	Job was cancelled due to a system error
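Some reasons carry a detail after a colon (`DependencyFailed:<job_id>`, `SystemError:<msg>`). A minimal Python sketch for splitting them is shown below. `Reason`, `parse_reason`, and `failed_dependency` are hypothetical helpers, and treating job IDs as integers is an assumption. Splitting only on the first colon keeps any colons inside a system error message intact.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Reason:
    kind: str              # e.g. "WaitingForDependency", "DependencyFailed"
    detail: Optional[str]  # text after the first ':' (job ID or message), if any


def parse_reason(raw: str) -> Reason:
    """Split a reason such as 'DependencyFailed:42' into kind and detail."""
    kind, sep, detail = raw.strip().partition(":")
    return Reason(kind=kind, detail=detail if sep else None)


def failed_dependency(reason: Reason) -> Optional[int]:
    """Return the failed parent job ID for a DependencyFailed reason, else None."""
    if reason.kind == "DependencyFailed" and reason.detail:
        return int(reason.detail)  # assumes numeric job IDs
    return None
```

For example, `failed_dependency(parse_reason("DependencyFailed:42"))` returns `42`, which tells you which parent job to inspect before resubmitting the cancelled child.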

View the reason with gjob show <job_id> or gqueue -f JOBID,ST,REASON.
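To act on reasons in bulk, you can parse the gqueue output in a script. The sketch below assumes the output is a header row followed by whitespace-separated JOBID, ST, and REASON columns, with the reason column empty when there is none. That layout is an assumption, so check it against what your gflow version actually prints. `parse_gqueue` is a hypothetical helper, and the sample text in the usage note is made up.

```python
def parse_gqueue(output: str) -> dict[int, tuple[str, str]]:
    """Map job ID -> (short state code, reason) from gqueue -f JOBID,ST,REASON.

    Assumes a header row followed by whitespace-separated columns;
    jobs with no reason get an empty string.
    """
    jobs: dict[int, tuple[str, str]] = {}
    lines = [line for line in output.splitlines() if line.strip()]
    for line in lines[1:]:  # skip the header row
        # maxsplit=2 keeps spaces inside the reason (e.g. SystemError messages).
        fields = line.split(maxsplit=2)
        job_id, state = int(fields[0]), fields[1]
        reason = fields[2].strip() if len(fields) > 2 else ""
        jobs[job_id] = (state, reason)
    return jobs
```

Given the made-up text `"JOBID ST REASON\n2 PD WaitingForDependency\n"`, this returns `{2: ("PD", "WaitingForDependency")}`.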

The following diagram shows how to check job status and take appropriate actions: