Job Lifecycle

This guide explains the complete lifecycle of jobs in gflow, including state transitions, status checking, and recovery operations.

Job States

gflow jobs can be in one of seven states:

State      Short  Description
Queued     PD     Job is waiting to run (pending dependencies or resources)
Hold       H      Job is on hold by user request
Running    R      Job is currently executing
Finished   CD     Job completed successfully
Failed     F      Job terminated with an error
Cancelled  CA     Job was cancelled by user or system
Timeout    TO     Job exceeded its time limit

State Categories

Active States (job is not yet complete):

  • Queued, Hold, Running

Completed States (job has finished):

  • Finished, Failed, Cancelled, Timeout
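
As an illustration, the states and categories above can be modelled in a few lines of Python. This is only a sketch for reasoning about the table: the names JobState, ACTIVE_STATES, COMPLETED_STATES, and is_completed are hypothetical and are not part of gflow.

```python
from enum import Enum


class JobState(Enum):
    """The seven job states, keyed by the short codes from the table above."""
    QUEUED = "PD"
    HOLD = "H"
    RUNNING = "R"
    FINISHED = "CD"
    FAILED = "F"
    CANCELLED = "CA"
    TIMEOUT = "TO"


# Grouping taken from the "State Categories" lists.
ACTIVE_STATES = frozenset({JobState.QUEUED, JobState.HOLD, JobState.RUNNING})
COMPLETED_STATES = frozenset(JobState) - ACTIVE_STATES


def is_completed(short_code: str) -> bool:
    """Return True if a short code (for example "CD") names a completed state."""
    return JobState(short_code) in COMPLETED_STATES
```

A script that reads short codes from a status listing could call is_completed("TO") to decide whether a job still needs to be watched.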

State Transition Diagram

[Diagram: all possible state transitions in gflow]

State Transition Rules

From Queued:

  • Running: When dependencies are met AND resources are available
  • Hold: User runs gjob hold <job_id>
  • Cancelled: User runs gcancel <job_id> OR a dependency fails (with auto-cancel enabled)

From Hold:

  • Queued: User runs gjob release <job_id>
  • Cancelled: User runs gcancel <job_id>

From Running:

  • Finished: Job script/command exits with code 0
  • Failed: Job script/command exits with non-zero code
  • Cancelled: User runs gcancel <job_id>
  • Timeout: Job exceeds its time limit (set with --time)
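
The rules for leaving Running can be sketched as a small decision function. The function name and its parameters are hypothetical, and the order of the checks (cancellation and timeout before the exit code) is an assumption made for this example; the rules above do not state a precedence.

```python
def state_after_running(exit_code: int, *, cancelled: bool = False,
                        timed_out: bool = False) -> str:
    """Map the outcome of a running job to one of the completed states."""
    # Assumption: an explicit cancel or a timeout takes priority over
    # whatever exit code the terminated process reports.
    if cancelled:
        return "Cancelled"
    if timed_out:
        return "Timeout"
    # Exit code 0 means success; any other code means failure.
    return "Finished" if exit_code == 0 else "Failed"
```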

From Completed States:

  • No transitions (final states)
  • Use gjob redo <job_id> to create a new job with the same parameters
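
Taken together, the rules above form a small state machine. The following sketch encodes them as a lookup table; TRANSITIONS and can_transition are hypothetical names for this illustration and do not reflect how gflow stores or enforces transitions internally.

```python
# Allowed target states for each state, copied from the rules above.
TRANSITIONS = {
    "Queued": {"Running", "Hold", "Cancelled"},
    "Hold": {"Queued", "Cancelled"},
    "Running": {"Finished", "Failed", "Cancelled", "Timeout"},
    # Completed states are final, so they have no outgoing transitions.
    "Finished": set(),
    "Failed": set(),
    "Cancelled": set(),
    "Timeout": set(),
}


def can_transition(current: str, target: str) -> bool:
    """Return True if the rules allow moving from current to target."""
    return target in TRANSITIONS[current]
```

For example, can_transition("Hold", "Running") is False: a held job must be released back to Queued before it can run.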

Job State Reasons

Jobs in certain states have an associated reason that provides more context:

State      Reason                     Description
Queued     WaitingForDependency       Job is waiting for parent jobs to finish
Queued     WaitingForResources        Job is waiting for available GPUs/memory
Hold       JobHeldUser                Job was put on hold by user request
Cancelled  CancelledByUser            User explicitly cancelled the job
Cancelled  DependencyFailed:<job_id>  Job was auto-cancelled because job <job_id> failed
Cancelled  SystemError:<msg>          Job was cancelled due to a system error

View the reason with gjob show <job_id> or gqueue -f JOBID,ST,REASON.
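
Two of the reasons carry a payload after a colon (a job ID or an error message). A script that post-processes reasons might split them as sketched below. This assumes the reason text appears exactly in the form shown in the table; Reason and parse_reason are hypothetical names, not part of gflow.

```python
from typing import NamedTuple, Optional


class Reason(NamedTuple):
    kind: str               # for example "DependencyFailed"
    detail: Optional[str]   # job ID or message after the colon, if any


def parse_reason(text: str) -> Reason:
    """Split a reason such as "DependencyFailed:42" into kind and detail."""
    # Split on the first colon only, so a message that itself
    # contains colons is kept intact in the detail part.
    kind, sep, detail = text.partition(":")
    return Reason(kind, detail if sep else None)
```

With this helper, parse_reason("DependencyFailed:42").detail yields the ID of the failed parent job as a string, while reasons without a payload have a detail of None.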

Status Checking Workflow

[Diagram: how to check job status and take appropriate actions]

Released under the MIT License.