Configuration

Most users can run gflow without configuration. Use a config file (TOML) and/or environment variables when you need to change where the daemon listens, or restrict GPU usage.

Config File

Default location:

~/.config/gflow/gflow.toml

Generate one interactively:

bash

gflowd init

Minimal example:

toml

[daemon]
host = "localhost"
port = 59000
# gpus = [0, 2]
# gpu_allocation_strategy = "sequential" # or "random"

All CLIs accept --config <path> to use a different file:

bash

gflowd --config <path> up
ginfo --config <path>
gbatch --config <path> --gpus 1 python train.py

Daemon Settings

Host and Port

toml

[daemon]
host = "localhost"
port = 59000

Default: localhost:59000
Use 0.0.0.0 only if you understand the security implications.

GPU Selection

Restrict which physical GPUs the scheduler is allowed to allocate.

Config file:

toml

[daemon]
gpus = [0, 2]

GPU Allocation Strategy

Control how gflow picks GPU indices when multiple GPUs are available.

Config file:

toml

[daemon]
gpu_allocation_strategy = "sequential" # default
# gpu_allocation_strategy = "random"

sequential: deterministic, prefer lower GPU indices first.
random: randomize GPU selection order each scheduling cycle.

Daemon CLI flag (overrides config):

bash

gflowd up --gpu-allocation-strategy random
gflowd restart --gpu-allocation-strategy sequential

Daemon CLI flag (overrides config):

bash

gflowd up --gpus 0,2
gflowd restart --gpus 0-3

Runtime control (affects new allocations only):

bash

gctl set-gpus 0,2
gctl set-gpus all
gctl show-gpus

Supported specs: 0, 0,2,4, 0-3, 0-1,3,5-6.

Precedence (highest → lowest):

CLI flag (gflowd up --gpus ...)
Env var (GFLOW_DAEMON_GPUS=...)
Config file (daemon.gpus = [...])
Default: all detected GPUs

For allocation strategy:

CLI flag (gflowd up --gpu-allocation-strategy ...)
Env var (GFLOW_DAEMON_GPU_ALLOCATION_STRATEGY=...)
Config file (daemon.gpu_allocation_strategy = "...")
Default: sequential

Timezone

Configure timezone for displaying and parsing reservation times.

Config file:

toml

timezone = "Asia/Shanghai"

Per-command override:

bash

gctl reserve create --user alice --gpus 2 --start "2026-02-01 14:00" --duration "2h" --timezone "UTC"

Supported formats:

IANA timezone names: "Asia/Shanghai", "America/Los_Angeles", "UTC"
Time input: ISO8601 ("2026-02-01T14:00:00Z") or simple format ("2026-02-01 14:00")

Precedence (highest → lowest):

CLI flag (--timezone)
Config file (timezone = "...")
Default: local system timezone

Project Tracking

Use project settings to standardize job ownership metadata across teams.

toml

[projects]
known_projects = ["ml-research", "cv-team"]
require_project = false

known_projects: allowed project codes. Empty means any non-empty code is allowed.
require_project: when true, every submitted job must include a non-empty project.
Project values are normalized (trimmed). Whitespace-only values are treated as unset.
Project code length limit: 64 characters.
If both settings are used, project must be present and in known_projects.

Related CLI usage:

bash

gbatch --project ml-research python train.py
gqueue --project ml-research
gqueue --format JOBID,NAME,PROJECT,ST,TIME

Notifications (Webhooks)

gflowd can send HTTP POST webhooks for job and system events (best-effort).

Enable and configure:

toml

[notifications]
enabled = true
max_concurrent_deliveries = 16

[[notifications.webhooks]]
url = "https://api.example.com/gflow/events"
events = ["job_completed", "job_failed", "job_timeout"] # or ["*"]
filter_users = ["alice", "bob"] # optional
headers = { Authorization = "Bearer token123" } # optional
timeout_secs = 10
max_retries = 3

Supported event names:

job_submitted
job_started
job_completed
job_failed
job_cancelled
job_timeout
job_held
job_released
gpu_available (only when a GPU becomes available)
reservation_created
reservation_cancelled

Payload shape (fields may be omitted depending on event):

json

{
  "event": "job_completed",
  "timestamp": "2026-02-04T12:30:45Z",
  "job": { "id": 42, "user": "alice", "state": "Finished" },
  "scheduler": { "host": "gpu-server-01", "version": "0.4.11" }
}

Notes:

events = ["*"] subscribes to all supported events.
Use filter_users to restrict notifications by job submitter / reservation owner.
max_retries uses exponential backoff (best-effort); deliveries may be skipped if the daemon is overloaded.
Be careful with sensitive data: webhooks can include job metadata and usernames.

Logging

gflowd: use -v/--verbose (see gflowd --help).
Client commands (gbatch, gqueue, ginfo, gjob, gctl): use RUST_LOG (e.g. RUST_LOG=info).

Environment Variables

bash

export GFLOW_DAEMON_HOST=localhost
export GFLOW_DAEMON_PORT=59000
export GFLOW_DAEMON_GPUS=0,2
export GFLOW_DAEMON_GPU_ALLOCATION_STRATEGY=random

Files and State

gflow follows the XDG Base Directory spec:

text

~/.config/gflow/gflow.toml
~/.local/share/gflow/state.msgpack  (or state.json for legacy)
~/.local/share/gflow/logs/<job_id>.log

State Persistence Format

Starting from version 0.4.11, gflowd uses MessagePack binary format for state persistence:

New installations: State is saved to state.msgpack (binary format)
Automatic migration: Existing state.json files are automatically migrated to state.msgpack on first load
Backward compatibility: gflowd can still read old state.json files

Recovery mode (state file issues)

If the state file cannot be deserialized or migrated (e.g. after upgrading/downgrading versions), gflowd enters recovery mode:

gflowd continues running, but does not overwrite the state file.
State changes are persisted to a single-snapshot journal file: ~/.local/share/gflow/state.journal.jsonl (it is overwritten on each save).
/health returns 200 with status: "recovery" and mode: "journal".
A backup copy is created next to the state file (e.g. state.msgpack.backup.<timestamp> or state.msgpack.corrupt.<timestamp>).

When the state file becomes readable again, gflowd loads the latest journal snapshot, rewrites the state file, and truncates the journal.

If the journal file is not writable, gflowd falls back to read-only mode and mutating APIs return 503.

To recover, upgrade/downgrade to a version that can read/migrate your state, or restore from the backup file.

Troubleshooting

Config file not found

bash

ls -la ~/.config/gflow/gflow.toml

Port already in use

Change the port:

toml

[daemon]
port = 59001

Configuration ​

Config File ​

Daemon Settings ​

Host and Port ​

GPU Selection ​

GPU Allocation Strategy ​

Timezone ​

Project Tracking ​

Notifications (Webhooks) ​

Logging ​

Environment Variables ​

Files and State ​

State Persistence Format ​

Recovery mode (state file issues) ​

Troubleshooting ​

Config file not found ​

Port already in use ​

See Also ​

Configuration

Config File

Daemon Settings

Host and Port

GPU Selection

GPU Allocation Strategy

Timezone

Project Tracking

Notifications (Webhooks)

Logging

Environment Variables

Files and State

State Persistence Format

Recovery mode (state file issues)

Troubleshooting

Config file not found

Port already in use

See Also