imf_reader.cache.manager

File-based binary-payload cache with atomic NFS-safe writes.

Each entry consists of:
  • <key>: the raw payload (e.g. a zip file)

  • <key>.manifest.json: a JSON object with the keys "created_at", "size_bytes", and "schema_version"

Writes are atomic: the payload is staged to a <key>.tmp.<host>.<pid> file, fsynced, then renamed into place so a crashed writer never leaves a half-written entry visible to readers.
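The stage, fsync, rename sequence described above can be sketched as follows. This is a minimal illustration, not the module's actual _atomic_write: the function name atomic_write and the exact tmp-name construction are assumptions, chosen only to match the <key>.tmp.<host>.<pid> pattern.

```python
import os
import socket
from pathlib import Path


def atomic_write(final: Path, content: bytes) -> None:
    """Stage content next to final, fsync it, then rename it into place."""
    # Hypothetical tmp name following the <key>.tmp.<host>.<pid> pattern.
    tmp = final.with_name(f"{final.name}.tmp.{socket.gethostname()}.{os.getpid()}")
    try:
        with open(tmp, "wb") as fh:
            fh.write(content)
            fh.flush()
            os.fsync(fh.fileno())  # push the bytes to disk before the rename
        # os.replace is atomic when tmp and final live on the same filesystem.
        os.replace(tmp, final)
    finally:
        # After a successful replace the tmp file is already gone; on failure
        # this removes the partial file.
        tmp.unlink(missing_ok=True)
```

Because the tmp file is created in the same directory as the final path, the rename never crosses a filesystem boundary, which is what keeps it atomic.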

Concurrency is guarded by SoftFileLock (which relies on exclusive creation of a lock file) rather than FileLock (fcntl/LockFileEx), because OS-level locks are unreliable on NFS/SMB.

Double-checked locking: the cache-hit path reads final_path + manifest without acquiring the lock (these reads are idempotent). Only the miss path takes the lock and re-checks inside it, so if a concurrent writer populated the entry between the unlocked check and the lock acquisition, the caller reuses that entry instead of downloading it again.
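The pattern can be illustrated with a simplified sketch. It assumes a threading.Lock as a stand-in for the file lock and checks only for the payload's existence, omitting the manifest and TTL; get_or_fetch_sketch is a hypothetical name, not part of the module.

```python
import threading
from pathlib import Path
from typing import Callable

_lock = threading.Lock()  # stand-in for the module's file-based lock


def get_or_fetch_sketch(final: Path, fetch_fn: Callable[[], bytes]) -> Path:
    # First check, without the lock: the common cache-hit path.
    if final.exists():
        return final
    with _lock:
        # Second check, inside the lock: another writer may have populated
        # the entry while this caller was waiting.
        if not final.exists():
            final.write_bytes(fetch_fn())
    return final
```

The unlocked first check keeps hits cheap, while the locked second check ensures that only one caller pays for the download.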

Attributes

logger

_MANIFEST_SUFFIX

_TMP_PATTERN

_STALE_TMP_SECONDS

Classes

CacheManager

File-based cache for binary payloads (e.g. WEO SDMX zips).

Functions

_now_iso(→ str)

_parse_iso(→ datetime.datetime)

Module Contents

imf_reader.cache.manager.logger
imf_reader.cache.manager._MANIFEST_SUFFIX = '.manifest.json'
imf_reader.cache.manager._TMP_PATTERN = '.tmp.'
imf_reader.cache.manager._STALE_TMP_SECONDS = 86400
imf_reader.cache.manager._now_iso() → str
imf_reader.cache.manager._parse_iso(s: str) → datetime.datetime
class imf_reader.cache.manager.CacheManager(*, sublayer: str, ttl: datetime.timedelta, keep_n: int = 4)

File-based cache for binary payloads (e.g. WEO SDMX zips).

Each call to get_or_fetch returns the on-disk Path to the cached payload. On a cache miss the fetch_fn callable is invoked, its result validated, and then written atomically.

Parameters:
  • sublayer – Subdirectory name under the cache root (e.g. "weo_sdmx").

  • ttl – How long a cached entry is considered fresh.

  • keep_n – Maximum number of entries to retain (LRU eviction removes oldest beyond this limit). Defaults to 4.
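As an illustration of how ttl could be applied, the sketch below compares a manifest's "created_at" value with the current time. It assumes the timestamp is a timezone-aware ISO 8601 string; not_expired and its now parameter are hypothetical and not the module's _not_expired.

```python
from datetime import datetime, timedelta, timezone
from typing import Optional


def not_expired(manifest: dict, ttl: timedelta, now: Optional[datetime] = None) -> bool:
    """Return True while the entry's age is still below ttl."""
    # Assumes created_at carries a UTC offset, e.g. "2024-04-01T00:00:00+00:00".
    created = datetime.fromisoformat(manifest["created_at"])
    now = now or datetime.now(timezone.utc)
    return now - created < ttl
```

Passing now explicitly makes the check easy to test without patching the clock.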

_sublayer
_ttl
_keep_n = 4
_sublayer_dir: pathlib.Path | None = None
_get_sublayer_dir() → pathlib.Path
_final_path(key: str) → pathlib.Path
_manifest_path(key: str) → pathlib.Path
_tmp_path(final: pathlib.Path) → pathlib.Path
_lock_path(key: str) → pathlib.Path
_on_cache_dir_changed(new_root: pathlib.Path) → None
_read_manifest(key: str) → dict | None
_not_expired(manifest: dict) → bool
_atomic_write(final: pathlib.Path, content: bytes) → None

Write content to final atomically via a host+pid-suffixed tmp file.

_write_manifest(key: str, size_bytes: int) → None
_evict_lru() → None

Remove the oldest entries beyond keep_n.
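One way such an eviction could work is sketched below. Ordering entries by file modification time is an assumption of this sketch (the module may order by the manifest's "created_at" or by access time instead), and evict_oldest is a hypothetical helper; lock files are not considered here.

```python
import os
from pathlib import Path

MANIFEST_SUFFIX = ".manifest.json"  # mirrors _MANIFEST_SUFFIX


def evict_oldest(cache_dir: Path, keep_n: int) -> list:
    """Delete the oldest payloads (and their manifests) beyond keep_n."""
    payloads = [
        p for p in cache_dir.iterdir()
        if p.is_file()
        and not p.name.endswith(MANIFEST_SUFFIX)
        and ".tmp." not in p.name  # skip in-flight or orphaned tmp files
    ]
    # Newest first, so everything past index keep_n is eligible for removal.
    payloads.sort(key=lambda p: p.stat().st_mtime, reverse=True)
    removed = []
    for stale in payloads[keep_n:]:
        stale.unlink(missing_ok=True)
        stale.with_name(stale.name + MANIFEST_SUFFIX).unlink(missing_ok=True)
        removed.append(stale.name)
    return removed
```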

_sweep_orphan_tmp() → None

Remove *.tmp.* files left by crashed processes once they are older than _STALE_TMP_SECONDS (86400 seconds, i.e. 24 hours).
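A sweep of this kind might look like the sketch below. It uses the 86400-second value listed for _STALE_TMP_SECONDS in the module attributes as the age threshold and treats file modification time as the age signal, which is an assumption; sweep_orphan_tmp and its now parameter are hypothetical.

```python
import os
import time
from pathlib import Path
from typing import Optional

STALE_TMP_SECONDS = 86400  # mirrors _STALE_TMP_SECONDS


def sweep_orphan_tmp(cache_dir: Path, now: Optional[float] = None) -> int:
    """Delete *.tmp.* files whose mtime is older than STALE_TMP_SECONDS."""
    now = time.time() if now is None else now
    removed = 0
    for tmp in cache_dir.glob("*.tmp.*"):
        try:
            if now - tmp.stat().st_mtime > STALE_TMP_SECONDS:
                tmp.unlink()
                removed += 1
        except FileNotFoundError:
            # Another process renamed or swept this file first.
            continue
    return removed
```

The age threshold keeps the sweep from deleting a tmp file that a live writer on another host is still filling.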

get_or_fetch(key: str, fetch_fn: collections.abc.Callable[[], bytes], *, validator: collections.abc.Callable[[bytes], None] | None = None) → pathlib.Path

Return the on-disk path to the cached payload for key.

On a cache hit (entry exists and TTL has not expired) the path is returned immediately — no re-validation and no lock acquisition.

On a cache miss the lock is acquired, the entry is re-checked inside the lock (double-checked locking — another writer may have populated the entry between the unlocked check and the lock acquisition), and if still absent fetch_fn is called to download the content. The validator is then invoked on the raw bytes; on failure the entry is removed and BulkPayloadCorruptError is raised so the next call can retry cleanly.

Parameters:
  • key – Cache key (used as the filename, e.g. "weo_april_2024.zip").

  • fetch_fn – Zero-arg callable that downloads and returns the raw bytes.

  • validator – Optional callable that raises on corrupt content. Called only on a cache miss / refetch (never on a hit).

Returns:

Absolute Path to the cached payload on disk.

Raises:

BulkPayloadCorruptError – When the downloaded payload fails validation.
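A possible way to call get_or_fetch with a validator is sketched below. The 7-day ttl, the validate_zip helper, and the cached_weo_zip wrapper are illustrative assumptions. Which exception type the manager expects from a validator before it raises BulkPayloadCorruptError is not stated here, so ValueError is only a placeholder.

```python
import io
import zipfile
from datetime import timedelta


def validate_zip(content: bytes) -> None:
    """Raise ValueError if content is not a readable zip archive."""
    try:
        with zipfile.ZipFile(io.BytesIO(content)) as zf:
            bad = zf.testzip()  # name of the first corrupt member, or None
    except zipfile.BadZipFile as exc:
        raise ValueError("payload is not a zip archive") from exc
    if bad is not None:
        raise ValueError(f"corrupt member in zip: {bad}")


def cached_weo_zip(download):
    """Fetch through the cache; download is a zero-arg callable returning bytes."""
    # Imported lazily so validate_zip can be used without imf_reader installed.
    from imf_reader.cache.manager import CacheManager

    cache = CacheManager(sublayer="weo_sdmx", ttl=timedelta(days=7))
    return cache.get_or_fetch("weo_april_2024.zip", download, validator=validate_zip)
```

Since the validator runs only on a miss or refetch, an expensive integrity check such as testzip does not slow down cache hits.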

clear() → None

Remove all cached entries (payloads and manifests) in this sublayer.