rdm-parser 0.1.0
Reusable parsing library for time series data standardization
rdm_parser.standardizer Namespace Reference

Functions

str|None _detect_by_extension (Path path)
 
str|None _detect_by_content (str text)
 
ParseResult _error_result (str code, str message)
 
ParseResult parse_file (str|Path data_path, str|Path|None metadata_path=None, str|None encoding=None)
 

Variables

 log = get_logger(__name__)
 
str ERR_FILE_READ = "FILE_READ_ERROR"
 
str ERR_FORMAT_UNKNOWN = "FORMAT_UNKNOWN"
 
str ERR_FORMAT_MISMATCH = "FORMAT_MISMATCH"
 
str ERR_METADATA_REQUIRED = "METADATA_REQUIRED"
 
str FORMAT_BZ011 = "bz011"
 
str FORMAT_GREENLIGHT = "greenlight"
 

Detailed Description

Top-level dispatcher.

``parse_file()`` detects the testbench format from the input file and
delegates to the matching parser, returning the standard ``ParseResult``
shape:

    {
        "metadata": {...} | None,
        "records": [{"time_stamp", "cell_voltage", "current_density"}, ...],
        "errors": [{"code", "message", "line"}, ...],
    }
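
As a rough illustration of that shape, the sketch below checks that a dict has the three top-level keys and that each record and error row carries the listed fields. The helper name ``has_parse_result_shape`` is hypothetical and not part of ``rdm_parser``; the value types of the individual fields are not stated here, so only key names are checked.

```python
from typing import Any, Dict

# Field names taken from the shape shown above; value types are not specified
# in the documentation, so this sketch only inspects keys.
RECORD_KEYS = {"time_stamp", "cell_voltage", "current_density"}
ERROR_KEYS = {"code", "message", "line"}


def has_parse_result_shape(result: Dict[str, Any]) -> bool:
    """Hypothetical helper: True if `result` looks like a ParseResult dict."""
    if set(result) != {"metadata", "records", "errors"}:
        return False
    # "metadata" is either a dict or None.
    if result["metadata"] is not None and not isinstance(result["metadata"], dict):
        return False
    # Assumption: extra keys on a row are tolerated, missing keys are not.
    records_ok = all(RECORD_KEYS <= set(row) for row in result["records"])
    errors_ok = all(ERROR_KEYS <= set(row) for row in result["errors"])
    return records_ok and errors_ok
```

An empty but well-formed result (``metadata`` of ``None``, empty ``records`` and ``errors``) passes this check; a dict missing any of the three top-level keys does not.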

Format detection runs **two** checks:

1. **Extension** — ``.dat`` → BZ011, ``.csv`` → Greenlight.
2. **Content** — first non-blank line of the file:
   - starts with ``Datum\t``    → BZ011
   - starts with ``Format,``    → Greenlight

If only one check yields a result, that result is used. If neither does,
``ERR_FORMAT_UNKNOWN`` is returned. If both yield a result and they
disagree, ``ERR_FORMAT_MISMATCH`` is returned: the file extension lies
about the contents, and it is better to fail loudly than to silently
mis-parse.
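
The decision rules above can be sketched as follows. This is a minimal illustration under stated assumptions, not the module's actual implementation: the function names ``detect_by_extension``, ``detect_by_content`` and ``resolve_format`` are hypothetical stand-ins, the extension comparison is assumed to be case-insensitive, and leading whitespace on the first non-blank line is assumed to be significant.

```python
from pathlib import Path
from typing import Optional, Tuple

# Constant values as listed under "Variables".
FORMAT_BZ011 = "bz011"
FORMAT_GREENLIGHT = "greenlight"
ERR_FORMAT_UNKNOWN = "FORMAT_UNKNOWN"
ERR_FORMAT_MISMATCH = "FORMAT_MISMATCH"


def detect_by_extension(path: Path) -> Optional[str]:
    """Check 1: map the file extension to a format (assumed case-insensitive)."""
    known = {".dat": FORMAT_BZ011, ".csv": FORMAT_GREENLIGHT}
    return known.get(path.suffix.lower())


def detect_by_content(text: str) -> Optional[str]:
    """Check 2: inspect the first non-blank line of the file."""
    first = next((line for line in text.splitlines() if line.strip()), "")
    if first.startswith("Datum\t"):
        return FORMAT_BZ011
    if first.startswith("Format,"):
        return FORMAT_GREENLIGHT
    return None


def resolve_format(path: Path, text: str) -> Tuple[Optional[str], Optional[str]]:
    """Combine both checks; returns (format, error_code), one of which is None."""
    by_ext = detect_by_extension(path)
    by_content = detect_by_content(text)
    if by_ext and by_content and by_ext != by_content:
        return None, ERR_FORMAT_MISMATCH  # both answered, but they disagree
    detected = by_ext or by_content       # whichever check answered wins
    if detected is None:
        return None, ERR_FORMAT_UNKNOWN   # neither check recognised the file
    return detected, None
```

Returning a pair keeps the sketch free of exceptions, which mirrors the way ``parse_file()`` reports problems through ``errors`` rather than by raising.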

Function Documentation

◆ parse_file()

ParseResult rdm_parser.standardizer.parse_file (str | Path data_path, str | Path | None metadata_path = None, str | None encoding = None)

Detect the testbench format of ``data_path`` and parse it.

Args:
    data_path: Path to the data file (.dat for BZ011, .csv for Greenlight).
    metadata_path: Required for BZ011 (companion JSON). Ignored for Greenlight.
    encoding: Forwarded to the underlying parser. ``None`` means UTF-8 with a Latin-1 fallback.

Returns:
    A ``ParseResult`` dict. If the file cannot be handed to a parser,
    ``records`` is empty and ``errors`` carries one of:
        - ``FILE_READ_ERROR`` — could not read the data file.
        - ``FORMAT_UNKNOWN`` — neither extension nor content match a known format.
        - ``FORMAT_MISMATCH`` — extension and content disagree.
        - ``METADATA_REQUIRED`` — detected BZ011 but no metadata_path was given.
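
A caller might separate these dispatcher-level failures from whatever errors the format-specific parsers report. The sketch below shows one way to do that on a hand-built result; the helper ``dispatch_error`` is hypothetical, and the example assumes that ``metadata`` and ``line`` are ``None`` when dispatch fails, which the documentation does not state.

```python
from typing import Any, Dict, Optional

# The four codes listed above for failures raised by the dispatcher itself.
DISPATCH_ERRORS = {
    "FILE_READ_ERROR",
    "FORMAT_UNKNOWN",
    "FORMAT_MISMATCH",
    "METADATA_REQUIRED",
}


def dispatch_error(result: Dict[str, Any]) -> Optional[str]:
    """Hypothetical helper: return the first dispatcher-level code, if any."""
    for err in result["errors"]:
        if err["code"] in DISPATCH_ERRORS:
            return err["code"]
    return None


# Hand-built stand-in for what parse_file() might return for a BZ011 file
# given without metadata_path (field values here are assumptions).
failed = {
    "metadata": None,
    "records": [],
    "errors": [
        {"code": "METADATA_REQUIRED", "message": "metadata_path missing", "line": None}
    ],
}

if dispatch_error(failed) == "METADATA_REQUIRED":
    print("Re-run with metadata_path pointing at the companion JSON file.")
```

With the package installed, the same helper could be applied to a real result, for example ``dispatch_error(parse_file("run.dat", metadata_path="run.json"))``, where both file names are placeholders.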