rdm-parser 0.1.0
Reusable parsing library for time series data standardization
Functions
| str|None | _detect_by_extension (Path path) |
| str|None | _detect_by_content (str text) |
| ParseResult | _error_result (str code, str message) |
| ParseResult | parse_file (str|Path data_path, str|Path|None metadata_path=None, str|None encoding=None) |
Top-level dispatcher.
``parse_file()`` detects the testbench format from the input file and
delegates to the matching parser, returning the standard ``ParseResult``
shape:
    {
        "metadata": {...} | None,
        "records": [{"time_stamp", "cell_voltage", "current_density"}, ...],
        "errors": [{"code", "message", "line"}, ...],
    }
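The shape above can be written down as typed dictionaries. This is a minimal sketch, not the library's actual definition: the value types, the ``ErrorEntry`` name, and the use of ``None`` for ``line`` when an error is not tied to a line are assumptions. ``error_result`` is a hypothetical stand-in that mirrors the ``_error_result (str code, str message)`` signature listed under Functions.

```python
from typing import Any, Dict, List, Optional, TypedDict


class ErrorEntry(TypedDict):
    code: str
    message: str
    line: Optional[int]  # assumption: None when no line number applies


class ParseResult(TypedDict):
    metadata: Optional[Dict[str, Any]]
    # Each record carries time_stamp, cell_voltage and current_density;
    # the value types are not documented, so Any is used here.
    records: List[Dict[str, Any]]
    errors: List[ErrorEntry]


def error_result(code: str, message: str) -> ParseResult:
    """Build a result that carries a single error and no data."""
    return {
        "metadata": None,
        "records": [],
        "errors": [{"code": code, "message": message, "line": None}],
    }
```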
Format detection runs **two** checks:
1. **Extension** — ``.dat`` → BZ011, ``.csv`` → Greenlight.
2. **Content** — first non-blank line of the file:
- starts with ``Datum\t`` (``Datum`` followed by a tab character) → BZ011
- starts with ``Format,`` → Greenlight
If only one check yields a result, that result is used. If both yield a
result and they disagree, an ``ERR_FORMAT_MISMATCH`` is returned: the file
extension misrepresents the contents, and failing loudly is safer than
silently mis-parsing.
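The two checks and their reconciliation can be sketched as follows. This is an illustration under stated assumptions, not the module's source: the function names drop the leading underscore to mark them as stand-ins, the case-insensitive extension match is an assumption, and the error codes are spelled as in the ``parse_file`` Returns list.

```python
from pathlib import Path
from typing import Optional, Tuple

# Extension and first-line markers as described in the text.
EXTENSION_FORMATS = {".dat": "BZ011", ".csv": "Greenlight"}
CONTENT_PREFIXES = {"Datum\t": "BZ011", "Format,": "Greenlight"}


def detect_by_extension(path: Path) -> Optional[str]:
    # Assumption: the extension is compared case-insensitively.
    return EXTENSION_FORMATS.get(path.suffix.lower())


def detect_by_content(text: str) -> Optional[str]:
    # Only the first non-blank line is inspected.
    for line in text.splitlines():
        if line.strip():
            for prefix, fmt in CONTENT_PREFIXES.items():
                if line.startswith(prefix):
                    return fmt
            return None
    return None


def reconcile(
    by_ext: Optional[str], by_content: Optional[str]
) -> Tuple[Optional[str], Optional[str]]:
    """Return (format, error_code); exactly one of the two is not None."""
    if by_ext is None and by_content is None:
        return None, "FORMAT_UNKNOWN"
    if by_ext and by_content and by_ext != by_content:
        return None, "FORMAT_MISMATCH"
    # Either both agree or only one check produced a result.
    return by_ext or by_content, None
```

For example, a file named ``run.csv`` whose first non-blank line starts with ``Datum`` and a tab would yield ``(None, "FORMAT_MISMATCH")`` from ``reconcile``.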
ParseResult rdm_parser.standardizer.parse_file (str | Path data_path, str | Path | None metadata_path = None, str | None encoding = None)
Detect the testbench format of ``data_path`` and parse it.
Args:
    data_path: Path to the data file (.dat for BZ011, .csv for Greenlight).
    metadata_path: Required for BZ011 (companion JSON). Ignored for Greenlight.
    encoding: Forwarded to the underlying parser. ``None`` means UTF-8 with a Latin-1 fallback.
Returns:
    A ``ParseResult`` dict. On detection failure, ``records`` is empty
    and ``errors`` carries one of:
    - ``FILE_READ_ERROR``: the data file could not be read.
    - ``FORMAT_UNKNOWN``: neither extension nor content matches a known format.
    - ``FORMAT_MISMATCH``: extension and content disagree.
    - ``METADATA_REQUIRED``: BZ011 was detected but no ``metadata_path`` was given.
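A caller will usually want to tell detection failures apart from a successful parse. The sketch below works on any dict of the ``ParseResult`` shape. ``detection_failures`` is a hypothetical helper, and the sample result is hand-built for illustration rather than produced by the library.

```python
from typing import Any, Dict, List

# Codes documented above for detection failures.
DETECTION_ERRORS = {
    "FILE_READ_ERROR",
    "FORMAT_UNKNOWN",
    "FORMAT_MISMATCH",
    "METADATA_REQUIRED",
}


def detection_failures(result: Dict[str, Any]) -> List[str]:
    """Return the detection-level error codes found in a ParseResult-shaped dict."""
    return [e["code"] for e in result["errors"] if e["code"] in DETECTION_ERRORS]


# Hand-built stand-in for what parse_file() might return for a BZ011 file
# given without metadata_path; the message text and line value are invented.
result = {
    "metadata": None,
    "records": [],
    "errors": [
        {"code": "METADATA_REQUIRED", "message": "metadata_path missing", "line": None}
    ],
}

failures = detection_failures(result)
if failures:
    print("detection failed:", ", ".join(failures))
else:
    print(f"parsed {len(result['records'])} records")
```

In real use the hand-built dict would be replaced by the return value of ``parse_file(data_path, metadata_path=...)``.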