imf_reader.cache.manager
========================

.. py:module:: imf_reader.cache.manager

.. autoapi-nested-parse::

   File-based binary-payload cache with atomic NFS-safe writes.

   Each entry consists of:

   - ````: the raw payload (e.g. a zip file)
   - ``.manifest.json``: ``{"created_at", "size_bytes", "schema_version"}``

   Writes are atomic: the payload is staged to a ``.tmp..`` file, fsynced,
   then renamed into place, so a crashed writer never leaves a half-written
   entry visible to readers.

   Concurrency safety uses ``SoftFileLock`` (mkdir-based) rather than
   ``FileLock`` (fcntl/LockFileEx) because OS-level locks are unreliable on
   NFS/SMB.

   Double-checked locking: the cache-hit path reads ``final_path`` and the
   manifest without acquiring the lock (the read is idempotent). Only the
   miss path takes the lock, and it re-checks the entry inside the lock: if a
   concurrent writer populated the entry between the unlocked check and the
   lock acquisition, the redundant download is skipped.


Attributes
----------

.. autoapisummary::

   imf_reader.cache.manager.logger
   imf_reader.cache.manager._MANIFEST_SUFFIX
   imf_reader.cache.manager._TMP_PATTERN
   imf_reader.cache.manager._STALE_TMP_SECONDS


Classes
-------

.. autoapisummary::

   imf_reader.cache.manager.CacheManager


Functions
---------

.. autoapisummary::

   imf_reader.cache.manager._now_iso
   imf_reader.cache.manager._parse_iso


Module Contents
---------------

.. py:data:: logger

.. py:data:: _MANIFEST_SUFFIX
   :value: '.manifest.json'


.. py:data:: _TMP_PATTERN
   :value: '.tmp.'


.. py:data:: _STALE_TMP_SECONDS
   :value: 86400


.. py:function:: _now_iso() -> str

.. py:function:: _parse_iso(s: str) -> datetime.datetime

.. py:class:: CacheManager(*, sublayer: str, ttl: datetime.timedelta, keep_n: int = 4)

   File-based cache for binary payloads (e.g. WEO SDMX zips).

   Each call to ``get_or_fetch`` returns the on-disk ``Path`` to the cached
   payload. On a cache miss the ``fetch_fn`` callable is invoked, its result
   is validated, and the payload is then written atomically.
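
   The stage, fsync, rename sequence behind the atomic write can be
   illustrated with a minimal sketch. It is not this module's implementation:
   ``atomic_write_sketch`` is a hypothetical helper, and the host-plus-PID
   suffix layout is an assumption based on the description of
   ``_atomic_write``.

   .. code-block:: python

      import os
      import socket
      import tempfile
      from pathlib import Path


      def atomic_write_sketch(final: Path, content: bytes) -> None:
          """Hypothetical helper: stage to a tmp file, fsync, rename into place."""
          # Assumed suffix layout: host name plus process ID keeps writers apart.
          suffix = f".tmp.{socket.gethostname()}.{os.getpid()}"
          tmp = final.with_name(final.name + suffix)
          with open(tmp, "wb") as fh:
              fh.write(content)
              fh.flush()
              os.fsync(fh.fileno())  # push the bytes to disk before the rename
          # os.replace is atomic when both paths are on the same filesystem,
          # so readers see either the old entry or the complete new one.
          os.replace(tmp, final)


      with tempfile.TemporaryDirectory() as root:
          target = Path(root) / "payload.zip"
          atomic_write_sketch(target, b"example bytes")
          written = target.read_bytes()
          leftovers = [p.name for p in Path(root).iterdir() if ".tmp." in p.name]

   Staging the file next to its final location keeps both paths on the same
   filesystem, which the atomic rename depends on.
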

   :param sublayer: Subdirectory name under the cache root (e.g. ``"weo_sdmx"``).
   :param ttl: How long a cached entry is considered fresh.
   :param keep_n: Maximum number of entries to retain (LRU eviction removes
                  the oldest entries beyond this limit). Defaults to ``4``.


   .. py:attribute:: _sublayer


   .. py:attribute:: _ttl


   .. py:attribute:: _keep_n
      :value: 4


   .. py:attribute:: _sublayer_dir
      :type: pathlib.Path | None
      :value: None


   .. py:method:: _get_sublayer_dir() -> pathlib.Path


   .. py:method:: _final_path(key: str) -> pathlib.Path


   .. py:method:: _manifest_path(key: str) -> pathlib.Path


   .. py:method:: _tmp_path(final: pathlib.Path) -> pathlib.Path


   .. py:method:: _lock_path(key: str) -> pathlib.Path


   .. py:method:: _on_cache_dir_changed(new_root: pathlib.Path) -> None


   .. py:method:: _read_manifest(key: str) -> dict | None


   .. py:method:: _not_expired(manifest: dict) -> bool


   .. py:method:: _atomic_write(final: pathlib.Path, content: bytes) -> None

      Write *content* to *final* atomically via a host+pid-suffixed tmp file.


   .. py:method:: _write_manifest(key: str, size_bytes: int) -> None


   .. py:method:: _evict_lru() -> None

      Remove the oldest entries beyond ``keep_n``.


   .. py:method:: _sweep_orphan_tmp() -> None

      Remove ``*.tmp.*`` files left by crashed processes once they are older
      than ``_STALE_TMP_SECONDS`` (86400 seconds, i.e. 24 hours).


   .. py:method:: get_or_fetch(key: str, fetch_fn: collections.abc.Callable[[], bytes], *, validator: collections.abc.Callable[[bytes], None] | None = None) -> pathlib.Path

      Return the on-disk path to the cached payload for *key*.

      On a cache hit (entry exists and TTL has not expired) the path is
      returned immediately, with no re-validation and no lock acquisition.

      On a cache miss the lock is acquired and the entry is re-checked inside
      the lock (double-checked locking: another writer may have populated the
      entry between the unlocked check and the lock acquisition). If the entry
      is still absent, ``fetch_fn`` is called to download the content.
      The ``validator`` is then invoked on the raw bytes; on failure the entry
      is removed and :exc:`~imf_reader.config.BulkPayloadCorruptError` is
      raised so the next call can retry cleanly.

      :param key: Cache key (used as the filename, e.g. ``"weo_april_2024.zip"``).
      :param fetch_fn: Zero-arg callable that downloads and returns the raw bytes.
      :param validator: Optional callable that raises on corrupt content.
                        Called only on a cache miss / refetch (never on a hit).
      :returns: Absolute ``Path`` to the cached payload on disk.
      :raises BulkPayloadCorruptError: When the downloaded payload fails validation.


   .. py:method:: clear() -> None

      Remove all cached entries (payloads and manifests) in this sublayer.
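
The double-checked locking flow described for ``get_or_fetch`` can be
sketched as follows. This is a simplified illustration under stated
assumptions, not the module's code: ``get_or_fetch_sketch`` and ``_fresh``
are hypothetical names, a ``threading.Lock`` stands in for the file lock, the
manifest omits ``schema_version``, and the payload is written directly
instead of being staged and renamed.

.. code-block:: python

   import json
   import tempfile
   import threading
   from datetime import datetime, timedelta, timezone
   from pathlib import Path

   _lock = threading.Lock()  # assumption: stand-in for the real file lock


   def _fresh(final: Path, ttl: timedelta) -> bool:
       """Return True if payload and manifest exist and the TTL has not expired."""
       manifest = final.with_name(final.name + ".manifest.json")
       if not (final.exists() and manifest.exists()):
           return False
       created = datetime.fromisoformat(json.loads(manifest.read_text())["created_at"])
       return datetime.now(timezone.utc) - created < ttl


   def get_or_fetch_sketch(root, key, fetch_fn, ttl, validator=None) -> Path:
       final = root / key
       if _fresh(final, ttl):  # unlocked fast path: cache hit
           return final
       with _lock:
           if _fresh(final, ttl):  # re-check: another writer may have won the race
               return final
           content = fetch_fn()
           if validator is not None:
               validator(content)  # expected to raise on corrupt content
           final.write_bytes(content)  # simplification: no tmp file or rename
           manifest = {
               "created_at": datetime.now(timezone.utc).isoformat(),
               "size_bytes": len(content),
           }
           final.with_name(final.name + ".manifest.json").write_text(json.dumps(manifest))
           return final


   calls = []


   def fake_fetch() -> bytes:
       calls.append(1)  # count how often the "download" runs
       return b"payload"


   with tempfile.TemporaryDirectory() as tmp:
       root = Path(tmp)
       first = get_or_fetch_sketch(root, "demo.bin", fake_fetch, timedelta(hours=1))
       second = get_or_fetch_sketch(root, "demo.bin", fake_fetch, timedelta(hours=1))
       same_path = first == second
       payload = first.read_bytes()

In this sketch the second call returns from the unlocked fast path, so
``fake_fetch`` runs only once.
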