# Configuration

Most users can run gflow without configuration. Use a config file (TOML) and/or environment variables when you need to change where the daemon listens or restrict GPU usage.
## Config File

Default location: `~/.config/gflow/gflow.toml`

Generate one interactively:

```shell
gflowd init
```

Minimal example:
```toml
[daemon]
host = "localhost"
port = 59000
# gpus = [0, 2]
# gpu_allocation_strategy = "sequential" # or "random"
```

All CLIs accept `--config <path>` to use a different file:
```shell
gflowd --config <path> up
ginfo --config <path>
gbatch --config <path> --gpus 1 python train.py
```

## Daemon Settings
### Host and Port

```toml
[daemon]
host = "localhost"
port = 59000
```

- Default: `localhost:59000`
- Use `0.0.0.0` only if you understand the security implications.
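To check whether something is already listening on the daemon's address (handy for the "Port already in use" case in Troubleshooting), a quick stdlib probe can be used — `port_open` is an illustrative helper, not a gflow command:

```python
import socket

def port_open(host: str, port: int, timeout: float = 1.0) -> bool:
    """Return True if something accepts TCP connections on host:port."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False
```

For example, `port_open("localhost", 59000)` returns `True` only while a daemon (or any other process) is bound to that port.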
### GPU Selection

Restrict which physical GPUs the scheduler is allowed to allocate.

Config file:

```toml
[daemon]
gpus = [0, 2]
```

Daemon CLI flag (overrides config):

```shell
gflowd up --gpus 0,2
gflowd restart --gpus 0-3
```

Runtime control (affects new allocations only):

```shell
gctl set-gpus 0,2
gctl set-gpus all
gctl show-gpus
```

Supported specs: `0`, `0,2,4`, `0-3`, `0-1,3,5-6`.

Precedence (highest → lowest):

- CLI flag (`gflowd up --gpus ...`)
- Env var (`GFLOW_DAEMON_GPUS=...`)
- Config file (`daemon.gpus = [...]`)
- Default: all detected GPUs

### GPU Allocation Strategy

Control how gflow picks GPU indices when multiple GPUs are available.

Config file:

```toml
[daemon]
gpu_allocation_strategy = "sequential" # default
# gpu_allocation_strategy = "random"
```

- `sequential`: deterministic; prefers lower GPU indices first.
- `random`: randomizes GPU selection order each scheduling cycle.

Daemon CLI flag (overrides config):

```shell
gflowd up --gpu-allocation-strategy random
gflowd restart --gpu-allocation-strategy sequential
```

Precedence (highest → lowest):

- CLI flag (`gflowd up --gpu-allocation-strategy ...`)
- Env var (`GFLOW_DAEMON_GPU_ALLOCATION_STRATEGY=...`)
- Config file (`daemon.gpu_allocation_strategy = "..."`)
- Default: `sequential`
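The spec grammar (single indices, comma lists, ranges) and the two strategies can be sketched as follows — an illustrative Python sketch, not gflow's actual implementation:

```python
import random

def parse_gpu_spec(spec: str) -> list[int]:
    """Parse specs like '0', '0,2,4', '0-3', or '0-1,3,5-6'
    into a sorted list of GPU indices."""
    indices: set[int] = set()
    for part in spec.split(","):
        part = part.strip()
        if "-" in part:
            lo, hi = part.split("-", 1)
            indices.update(range(int(lo), int(hi) + 1))
        else:
            indices.add(int(part))
    return sorted(indices)

def pick_gpus(free: list[int], n: int, strategy: str = "sequential") -> list[int]:
    """Allocation order per strategy: 'sequential' prefers lower
    indices; 'random' shuffles the order each call."""
    order = sorted(free)
    if strategy == "random":
        random.shuffle(order)
    return order[:n]
```

For example, `parse_gpu_spec("0-1,3,5-6")` yields `[0, 1, 3, 5, 6]`, and `pick_gpus([3, 0, 2], 2)` yields `[0, 2]` under the default sequential strategy.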
## Timezone

Configure the timezone used for displaying and parsing reservation times.

Config file:

```toml
timezone = "Asia/Shanghai"
```

Per-command override:

```shell
gctl reserve create --user alice --gpus 2 --start "2026-02-01 14:00" --duration "2h" --timezone "UTC"
```

Supported formats:

- IANA timezone names: `"Asia/Shanghai"`, `"America/Los_Angeles"`, `"UTC"`
- Time input: ISO 8601 (`"2026-02-01T14:00:00Z"`) or simple format (`"2026-02-01 14:00"`)
Precedence (highest → lowest):

- CLI flag (`--timezone`)
- Config file (`timezone = "..."`)
- Default: local system timezone
## Project Tracking

Use project settings to standardize job ownership metadata across teams.

```toml
[projects]
known_projects = ["ml-research", "cv-team"]
require_project = false
```

- `known_projects`: allowed project codes. An empty list means any non-empty code is allowed.
- `require_project`: when `true`, every submitted job must include a non-empty project.
- Project values are normalized (trimmed). Whitespace-only values are treated as unset.
- Project code length limit: 64 characters.
- If both settings are used, the project must be present and listed in `known_projects`.
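The validation rules above can be sketched as a single function — the config key names are real, but the function itself is illustrative, not gflow code:

```python
def validate_project(value, known_projects, require_project):
    """Apply the documented project rules.
    Returns the normalized project code, or None if unset."""
    code = (value or "").strip()  # trim; whitespace-only counts as unset
    if not code:
        if require_project:
            raise ValueError("a non-empty project is required")
        return None
    if len(code) > 64:
        raise ValueError("project code exceeds 64 characters")
    # empty known_projects means any non-empty code is allowed
    if known_projects and code not in known_projects:
        raise ValueError(f"unknown project {code!r}")
    return code
```

For example, `validate_project("  ml-research ", ["ml-research", "cv-team"], True)` returns `"ml-research"`, while a whitespace-only value with `require_project = true` is rejected.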
Related CLI usage:

```shell
gbatch --project ml-research python train.py
gqueue --project ml-research
gqueue --format JOBID,NAME,PROJECT,ST,TIME
```

## Notifications (Webhooks)
gflowd can send HTTP POST webhooks for job and system events (delivery is best-effort).
Enable and configure:

```toml
[notifications]
enabled = true
max_concurrent_deliveries = 16

[[notifications.webhooks]]
url = "https://api.example.com/gflow/events"
events = ["job_completed", "job_failed", "job_timeout"] # or ["*"]
filter_users = ["alice", "bob"] # optional
headers = { Authorization = "Bearer token123" } # optional
timeout_secs = 10
max_retries = 3
```

Supported event names:
- `job_submitted`, `job_started`, `job_completed`, `job_failed`, `job_cancelled`, `job_timeout`
- `job_held`, `job_released`
- `gpu_available` (only when a GPU becomes available)
- `reservation_created`, `reservation_cancelled`
Payload shape (fields may be omitted depending on the event):

```json
{
  "event": "job_completed",
  "timestamp": "2026-02-04T12:30:45Z",
  "job": { "id": 42, "user": "alice", "state": "Finished" },
  "scheduler": { "host": "gpu-server-01", "version": "0.4.11" }
}
```

Notes:
events = ["*"]subscribes to all supported events.- Use
filter_usersto restrict notifications by job submitter / reservation owner. max_retriesuses exponential backoff (best-effort); deliveries may be skipped if the daemon is overloaded.- Be careful with sensitive data: webhooks can include job metadata and usernames.
## Logging

- `gflowd`: use `-v`/`--verbose` (see `gflowd --help`).
- Client commands (`gbatch`, `gqueue`, `ginfo`, `gjob`, `gctl`): use `RUST_LOG` (e.g. `RUST_LOG=info`).
## Environment Variables

```shell
export GFLOW_DAEMON_HOST=localhost
export GFLOW_DAEMON_PORT=59000
export GFLOW_DAEMON_GPUS=0,2
export GFLOW_DAEMON_GPU_ALLOCATION_STRATEGY=random
```

## Files and State
gflow follows the XDG Base Directory spec:

```
~/.config/gflow/gflow.toml
~/.local/share/gflow/state.msgpack    (or state.json for legacy installs)
~/.local/share/gflow/logs/<job_id>.log
```

### State Persistence Format
Starting with version 0.4.11, gflowd uses the MessagePack binary format for state persistence:

- New installations: state is saved to `state.msgpack` (binary format).
- Automatic migration: existing `state.json` files are automatically migrated to `state.msgpack` on first load.
- Backward compatibility: gflowd can still read old `state.json` files.
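The load order above can be sketched as follows — the file names are the real ones, but the helper itself is illustrative (the actual encoding/decoding is gflowd's job):

```python
from pathlib import Path

def pick_state_file(data_dir: Path) -> tuple[Path, str]:
    """Which state file gflowd would load, per the documented order:
    prefer state.msgpack, fall back to legacy state.json."""
    msgpack_path = data_dir / "state.msgpack"
    json_path = data_dir / "state.json"
    if msgpack_path.exists():
        return msgpack_path, "msgpack"
    if json_path.exists():
        return json_path, "legacy json (migrated on first load)"
    return msgpack_path, "fresh install"
```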
### Recovery mode (state file issues)

If the state file cannot be deserialized or migrated (e.g. after upgrading/downgrading versions), gflowd enters recovery mode:

- `gflowd` continues running, but does not overwrite the state file.
- State changes are persisted to a single-snapshot journal file: `~/.local/share/gflow/state.journal.jsonl` (it is overwritten on each save).
- `/health` returns `200` with `status: "recovery"` and `mode: "journal"`.
- A backup copy is created next to the state file (e.g. `state.msgpack.backup.<timestamp>` or `state.msgpack.corrupt.<timestamp>`).
When the state file becomes readable again, gflowd loads the latest journal snapshot, rewrites the state file, and truncates the journal.
If the journal file is not writable, gflowd falls back to read-only mode and mutating APIs return 503.
To recover, upgrade/downgrade to a version that can read/migrate your state, or restore from the backup file.
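The single-snapshot journal behavior can be sketched as below — an illustrative model of "one snapshot, overwritten on each save", not gflowd's actual code:

```python
import json
from pathlib import Path

def save_snapshot(journal: Path, state: dict) -> None:
    """Overwrite the journal with a single JSON-line snapshot,
    mirroring the 'overwritten on each save' behavior above."""
    journal.write_text(json.dumps(state) + "\n")

def load_latest_snapshot(journal: Path):
    """Return the latest (only) snapshot, or None if the journal
    is missing or empty."""
    if not journal.exists():
        return None
    line = journal.read_text().strip()
    return json.loads(line) if line else None
```

Because each save replaces the previous snapshot, the journal never grows: only the most recent state survives, which is exactly what gflowd loads back once the state file becomes readable again.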
## Troubleshooting

### Config file not found

```shell
ls -la ~/.config/gflow/gflow.toml
```

### Port already in use

Change the port:

```toml
[daemon]
port = 59001
```

## See Also
- Installation - Initial setup
- Quick Start - Basic usage
- GPU Management - GPU allocation