
Loading and dumping

Validation works on Python values, but data arrives as text: a JSON request body, a YAML config file, a TOML manifest. Probatio reads and writes all three through one set of functions. Reading and writing JSON and reading TOML need only the standard library; reading or writing YAML and writing TOML need an optional extra, covered under Backends below. Every function here is exported from the top level.

The loaders parse text into Python values. There is one per format, plus a unified entry point:

  • load_json(source), load_yaml(source), load_toml(source): parse a known format.
  • load(source, format=None): dispatch on format, or auto-detect it from a path extension when format is omitted.

A source is the content itself (a string or bytes), a pathlib.Path read from disk, or a file-like object.

from probatio import load_json
load_json('{"port": 8080}') # {'port': 8080}
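
The three source kinds described above can be pictured as a small normalization step that runs before parsing. The following is only a sketch using the standard library: read_source is a hypothetical helper, not part of probatio, and the UTF-8 decoding is an assumption.

```python
import io
from pathlib import Path


def read_source(source):
    """Return the text behind a source (hypothetical helper, not probatio's API)."""
    if isinstance(source, Path):
        # A Path is read from disk.
        return source.read_text(encoding="utf-8")
    if isinstance(source, bytes):
        # Bytes are decoded; UTF-8 is an assumption of this sketch.
        return source.decode("utf-8")
    if isinstance(source, str):
        # A string is the content itself, not a file name.
        return source
    # Anything else is treated as a file-like object with a read() method.
    data = source.read()
    return data.decode("utf-8") if isinstance(data, bytes) else data


print(read_source('{"port": 8080}'))
print(read_source(b'{"port": 8080}'))
print(read_source(io.StringIO('{"port": 8080}')))
```

Note that under this reading a plain string is always content. To read a file, wrap the name in pathlib.Path.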

Parsing alone does not validate. A parsed value is whatever the text said, so run it through a schema:

from probatio import Schema, Required, load_json
schema = Schema({Required("port"): int})
schema(load_json('{"port": 8080}')) # {'port': 8080}

That two-step is common enough that Schema has convenience methods to parse and validate in one call: schema.load_json(source), schema.load_yaml(source), schema.load_toml(source), and schema.load(source, format=None). Same result, one step:

from probatio import Schema, Required
schema = Schema({Required("port"): int})
schema.load_json('{"port": 8080}') # {'port': 8080}

load infers the format from a path extension. Write a file, then read it back without naming the format:

from pathlib import Path
from probatio import load
Path("config.json").write_text('{"port": 8080}')
load(Path("config.json")) # {'port': 8080}
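
Extension-based detection can be sketched as a lookup on the path suffix. This is an illustration only: detect_format and the EXTENSIONS table are hypothetical, and the set of suffixes probatio actually recognizes (for example whether .yml counts) is an assumption.

```python
from pathlib import Path

# Hypothetical suffix table; the suffixes probatio recognizes may differ.
EXTENSIONS = {".json": "json", ".yaml": "yaml", ".yml": "yaml", ".toml": "toml"}


def detect_format(source, format=None):
    """Pick a format name: an explicit format wins, else the path suffix decides."""
    if format is not None:
        return format
    if isinstance(source, Path):
        try:
            return EXTENSIONS[source.suffix.lower()]
        except KeyError:
            raise ValueError(f"cannot infer a format from {source.name!r}") from None
    # Raw content carries no extension, so there is nothing to detect from.
    raise ValueError("format is required when the source is not a path")


print(detect_format(Path("config.json")))    # json
print(detect_format('{"port": 8080}', "json"))  # json
```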

The dumpers go the other way, serializing a value to text. They have the same shape: one per format, plus a unified entry point.

  • dump_json(value), dump_yaml(value), dump_toml(value): serialize to a known format.
  • dump(value, format): dispatch on format ("json", "yaml", or "toml").

from probatio import dump_json, load_json
text = dump_json({"port": 8080})
load_json(text) # {'port': 8080}

Before handing a value to the backend, the dumpers normalize the few non-native types a validated value commonly carries: Decimal becomes a float, and set, frozenset, and tuple become a list. The temporal types are format-aware. TOML has native datetime, date, and time, so those pass through and round-trip as the same type; JSON and YAML have no temporal types, so they become ISO 8601 strings. JSON also has no nan or inf, so a non-finite float is refused with a clear error rather than silently corrupted (the fast backend would turn it into null, the standard library into an invalid token). YAML and TOML keep non-finite floats, since both can represent them. The normalization is one-way: a set, frozenset, or tuple dumps as a list and loads back as a list, not as the original type. This is a convenience for round-tripping validated data, not a general serialization framework. Reach for a default hook or a dedicated serializer when you need more.
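
For the JSON case, the rules above amount to a recursive walk over the value. The sketch below restates them with the standard library only. normalize_for_json is a hypothetical name, and probatio's own implementation may differ in detail.

```python
import json
import math
from datetime import date, datetime, time
from decimal import Decimal


def normalize_for_json(value):
    """Hypothetical sketch of the normalization described above (JSON case)."""
    if isinstance(value, dict):
        return {key: normalize_for_json(item) for key, item in value.items()}
    if isinstance(value, (list, tuple, set, frozenset)):
        # Sets and tuples become lists; the original type is not recoverable.
        return [normalize_for_json(item) for item in value]
    if isinstance(value, Decimal):
        value = float(value)
    if isinstance(value, float) and not math.isfinite(value):
        # JSON has no nan or inf, so refuse instead of emitting a bad token.
        raise ValueError(f"JSON cannot represent {value!r}")
    if isinstance(value, (datetime, date, time)):
        # JSON has no temporal types, so fall back to ISO 8601 text.
        return value.isoformat()
    return value


value = {"ports": (80, 443), "ratio": Decimal("0.5"), "day": date(2024, 1, 2)}
text = json.dumps(normalize_for_json(value))
print(text)              # {"ports": [80, 443], "ratio": 0.5, "day": "2024-01-02"}
print(json.loads(text))  # the tuple comes back as a list, the date as a string
```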

Probatio uses a fast backend when one is installed and falls back to the standard library otherwise. The backends are detected once at import time:

  • JSON: orjson when present, otherwise the standard library’s json.
  • YAML: YAMLRocks when present, then PyYAML’s safe loader and dumper. YAML is not a hard dependency. Install the probatio[yaml] or probatio[fast] extra to get a parser.
  • TOML: reading uses the standard library’s tomllib, always available on the supported Python versions. Writing needs tomli-w (the probatio[toml] extra), since the standard library does not write TOML.
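
Detection at import time usually follows the optional-import pattern. The sketch below shows that pattern for JSON only. It is not probatio's source, and parse_json and backend_name are hypothetical names.

```python
import json

try:
    import orjson  # optional fast backend, used only if it is installed
except ImportError:
    orjson = None


def parse_json(text):
    """Parse with orjson when available, else with the standard library."""
    if orjson is not None:
        return orjson.loads(text)
    return json.loads(text)


def backend_name():
    """Report which backend the import-time check selected."""
    return "orjson" if orjson is not None else "json"


print(backend_name())
print(parse_json('{"port": 8080}'))  # {'port': 8080}
```

Because the check runs once when the module is imported, installing a backend afterwards has no effect until the process restarts.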

On a parse error, each loader raises the backend’s own exception, not a single probatio type: orjson raises orjson.JSONDecodeError (a subclass of the standard library’s json.JSONDecodeError), the standard library raises json.JSONDecodeError, YAMLRocks and PyYAML raise their own parse errors, and load_toml raises tomllib.TOMLDecodeError. Catch ValueError to cover the JSON and TOML cases across backends; for YAML, catch the parser’s error type.

You do not select a backend. The fast one is used automatically when installed, and the result is the same value either way:

from probatio import dump, load_json
text = dump({"port": 8080}, "json")
load_json(text) # {'port': 8080}

Every loader and dumper takes an optional options mapping that is forwarded to the active backend. Without it, the backend stays invisible (consistent output either way). With it, you tune the backend directly, so the call becomes specific to whichever backend is installed.

The clearest case is the YAML spec version. YAMLRocks parses YAML 1.2 by default, where yes is a plain string; switch it to 1.1 and yes becomes a boolean:

import yamlrocks
from probatio import load_yaml
load_yaml("flag: yes")["flag"] # 'yes'
load_yaml("flag: yes", options={"option": yamlrocks.OPT_YAML_1_1})["flag"] # True

The same options mapping reaches dump_* (for example orjson.OPT_INDENT_2 to pretty-print JSON) and the other formats (parse_float for TOML, sort_keys for PyYAML). Since options are backend-specific, passing them couples the call to the backend you have, which is the trade for the extra control.

Passing the same options on every call gets old. Two layers sit beneath a call’s own options. A process-wide default, set once (at your application’s entry point), applies to every later call for that format:

import yamlrocks
from probatio import load_yaml, set_default_options, clear_default_options
set_default_options("yaml", load={"option": yamlrocks.OPT_YAML_1_1})
load_yaml("flag: yes")["flag"] # True
clear_default_options() # reset (so the rest of this page is unaffected)

A scoped override applies only inside a with block and never leaks to other code (it is async- and thread-safe), so reusable libraries should prefer it over mutating the global:

import yamlrocks
from probatio import load_yaml, default_options
with default_options("yaml", load={"option": yamlrocks.OPT_YAML_1_1}):
    inside = load_yaml("flag: yes")["flag"]
inside # True

A call’s own options win over a scoped default, which wins over the process-wide one. Set the global only where you own the whole process (an application, not a library that others import).
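
The three layers can be modelled with a module-level dict for the process-wide default and a contextvars.ContextVar for the scoped one, which is what keeps a scoped override from leaking across threads or async tasks. This is a sketch under assumptions: every name below is hypothetical, and it treats a higher layer as replacing a lower one outright, whereas the library may merge them.

```python
from contextlib import contextmanager
from contextvars import ContextVar

_process_defaults = {}  # process-wide layer, shared by everything
_scoped = ContextVar("scoped_defaults", default=None)  # per-context layer


def set_process_default(fmt, options):
    _process_defaults[fmt] = dict(options)


@contextmanager
def scoped_default(fmt, options):
    """Apply options for fmt inside the with block only."""
    current = _scoped.get() or {}
    token = _scoped.set({**current, fmt: dict(options)})
    try:
        yield
    finally:
        _scoped.reset(token)  # restore whatever was active before


def resolve(fmt, call_options=None):
    """First layer that is set wins: call, then scoped, then process-wide."""
    if call_options is not None:
        return call_options
    scoped = (_scoped.get() or {}).get(fmt)
    if scoped is not None:
        return scoped
    return _process_defaults.get(fmt, {})


set_process_default("yaml", {"version": "1.2"})
with scoped_default("yaml", {"version": "1.1"}):
    print(resolve("yaml"))                      # {'version': '1.1'}
    print(resolve("yaml", {"version": "1.2"}))  # {'version': '1.2'}
print(resolve("yaml"))                          # {'version': '1.2'}
```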

When a config fails validation, the useful question is where in the file. load_yaml_with_locations answers it: it returns (data, locator), where the locator maps a validation error’s path back to the source position. Hand the locator to humanize_error and each failure gains the place it points at.

from probatio import Schema, Required, Range, MultipleInvalid, load_yaml_with_locations
from probatio.humanize import humanize_error
data, locator = load_yaml_with_locations("server:\n  port: 70000\n")
schema = Schema({Required("server"): {Required("port"): Range(min=1, max=65535)}})
try:
    schema(data)
except MultipleInvalid as err:
    print(humanize_error(data, err, locator=locator))
    # value must be at most 65535 for dictionary value @ data['server']['port']. Got 70000 (at 2:9)

The locator returns a Location (with line, column, and file) that programs can read directly, or that renders as file:line:column. It points at the exact value, scalar leaves included. A Path source fills in the file, following nested !include layers to the source that holds the value. A path that is not in the document yields None.
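
To make the rendering concrete, here is a stand-in for such a location object. SourceLocation is a hypothetical class written for illustration, not probatio's Location. It assumes the file prefix is dropped when no file is known, which matches the "(at 2:9)" form in the output above.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class SourceLocation:
    """Illustrative stand-in for a location with line, column, and file."""

    line: int
    column: int
    file: Optional[str] = None

    def __str__(self):
        position = f"{self.line}:{self.column}"
        # Prefix the file only when the source was a path.
        return f"{self.file}:{position}" if self.file else position


print(SourceLocation(2, 9))                 # 2:9
print(SourceLocation(2, 9, "config.yaml"))  # config.yaml:2:9
```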