imf_reader.cache.manager
File-based binary-payload cache with atomic NFS-safe writes.
Each entry consists of:
- <key>: the raw payload (e.g. a zip file)
- <key>.manifest.json: {"created_at", "size_bytes", "schema_version"}
Writes are atomic: the payload is staged to a <key>.tmp.<host>.<pid> file,
fsynced, then renamed into place so a crashed writer never leaves a half-written
entry visible to readers.
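The stage, fsync, rename sequence described above can be sketched as follows. This is a minimal illustration under stated assumptions, not the module's actual `_atomic_write`: the function name `atomic_write` and the cleanup in `finally` are choices made for this example only.

```python
import os
import socket
from pathlib import Path


def atomic_write(final: Path, content: bytes) -> None:
    """Stage content next to final, fsync it, then rename it into place."""
    # Host and pid in the name keep writers on different machines from colliding.
    tmp = final.with_name(f"{final.name}.tmp.{socket.gethostname()}.{os.getpid()}")
    try:
        with open(tmp, "wb") as fh:
            fh.write(content)
            fh.flush()
            os.fsync(fh.fileno())  # push the bytes to disk before the rename
        # os.replace overwrites an existing target; readers see old or new, never a mix.
        os.replace(tmp, final)
    finally:
        # After a successful rename the tmp name no longer exists, so this is a no-op.
        tmp.unlink(missing_ok=True)
```

A writer that is killed outright never reaches the `finally` clause, which is why a separate sweep for stale `.tmp.` files is still needed.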
Concurrency safety uses SoftFileLock (mkdir-based) rather than FileLock
(fcntl/LockFileEx) because OS-level locks are unreliable on NFS/SMB.
Double-checked locking: the cache-hit path reads final_path + manifest
without acquiring the lock (idempotent); only the miss path takes the lock and
re-checks inside it, so that if a concurrent writer populated the entry between
the unlocked check and the lock acquisition, the redundant download is skipped.
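The double-checked locking pattern can be sketched as below. The `dir_lock` helper is a hypothetical stand-in that illustrates a mkdir-based lock; it is not the SoftFileLock implementation. For brevity the sketch only checks that the payload exists, omitting the manifest and TTL check, and writes the payload directly rather than atomically.

```python
import os
import time
from contextlib import contextmanager
from pathlib import Path


@contextmanager
def dir_lock(lock_dir: Path, timeout: float = 10.0, poll: float = 0.05):
    """Hypothetical lock: os.mkdir either creates the directory or fails."""
    deadline = time.monotonic() + timeout
    while True:
        try:
            os.mkdir(lock_dir)
            break
        except FileExistsError:
            if time.monotonic() >= deadline:
                raise TimeoutError(f"could not acquire {lock_dir}")
            time.sleep(poll)
    try:
        yield
    finally:
        os.rmdir(lock_dir)


def get_or_fetch(path: Path, fetch_fn) -> Path:
    # First check, without the lock: the common cache-hit path.
    if path.exists():
        return path
    with dir_lock(path.with_name(path.name + ".lock")):
        # Second check, under the lock: another writer may have just finished.
        if not path.exists():
            path.write_bytes(fetch_fn())
    return path
```

The second check is what keeps two processes that miss at the same moment from both downloading: the loser of the lock race finds the entry already present once it gets in.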
Module Contents
- imf_reader.cache.manager.logger
- imf_reader.cache.manager._MANIFEST_SUFFIX = '.manifest.json'
- imf_reader.cache.manager._TMP_PATTERN = '.tmp.'
- imf_reader.cache.manager._STALE_TMP_SECONDS = 86400
- imf_reader.cache.manager._now_iso() str
- imf_reader.cache.manager._parse_iso(s: str) datetime.datetime
- class imf_reader.cache.manager.CacheManager(*, sublayer: str, ttl: datetime.timedelta, keep_n: int = 4)
File-based cache for binary payloads (e.g. WEO SDMX zips).
Each call to get_or_fetch returns the on-disk Path to the cached payload. On a cache miss the fetch_fn callable is invoked, its result validated, and then written atomically.
- Parameters:
sublayer – Subdirectory name under the cache root (e.g. "weo_sdmx").
ttl – How long a cached entry is considered fresh.
keep_n – Maximum number of entries to retain (LRU eviction removes oldest beyond this limit). Defaults to 4.
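To make the ttl parameter concrete, here is a hedged sketch of a freshness check against a manifest. It assumes created_at holds an ISO 8601 timestamp with a UTC offset; the stored format is not documented here. The helper name `not_expired` and the example manifest values are illustrative only.

```python
from datetime import datetime, timedelta, timezone
from typing import Optional


def not_expired(manifest: dict, ttl: timedelta, now: Optional[datetime] = None) -> bool:
    """Return True while the entry is younger than ttl."""
    # Assumption: created_at is timezone-aware ISO 8601, e.g. "2024-04-16T00:00:00+00:00".
    created = datetime.fromisoformat(manifest["created_at"])
    now = now or datetime.now(timezone.utc)
    return now - created < ttl


# Illustrative manifest; the field values are made up for this example.
example = {"created_at": "2024-04-16T00:00:00+00:00", "size_bytes": 1, "schema_version": 1}
one_day_later = datetime(2024, 4, 17, tzinfo=timezone.utc)
fresh = not_expired(example, timedelta(days=7), now=one_day_later)
stale = not not_expired(example, timedelta(hours=12), now=one_day_later)
```

Passing `now` explicitly keeps the check deterministic in tests; in normal use it defaults to the current UTC time.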
- _sublayer
- _ttl
- _keep_n = 4
- _sublayer_dir: pathlib.Path | None = None
- _get_sublayer_dir() pathlib.Path
- _final_path(key: str) pathlib.Path
- _manifest_path(key: str) pathlib.Path
- _tmp_path(final: pathlib.Path) pathlib.Path
- _lock_path(key: str) pathlib.Path
- _on_cache_dir_changed(new_root: pathlib.Path) None
- _read_manifest(key: str) dict | None
- _not_expired(manifest: dict) bool
- _atomic_write(final: pathlib.Path, content: bytes) None
Write content to final atomically via a host+pid-suffixed tmp file.
- _write_manifest(key: str, size_bytes: int) None
- _evict_lru() None
Remove the oldest entries beyond keep_n.
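One way such eviction could work is sketched below. It assumes that "oldest" means the earliest created_at in the manifest and that all timestamps share one ISO 8601 format and timezone, so they sort chronologically as strings. The actual ordering criterion of `_evict_lru` is not stated here, and `evict_oldest` is a hypothetical name.

```python
import json
from pathlib import Path
from typing import List

MANIFEST_SUFFIX = ".manifest.json"  # mirrors _MANIFEST_SUFFIX


def evict_oldest(cache_dir: Path, keep_n: int) -> List[str]:
    """Delete all but the keep_n newest entries; return the evicted keys."""
    entries = []
    for manifest in cache_dir.glob(f"*{MANIFEST_SUFFIX}"):
        key = manifest.name[: -len(MANIFEST_SUFFIX)]
        created_at = json.loads(manifest.read_text())["created_at"]
        entries.append((created_at, key))
    # Newest first; everything past index keep_n is evicted.
    entries.sort(reverse=True)
    evicted = [key for _, key in entries[keep_n:]]
    for key in evicted:
        (cache_dir / key).unlink(missing_ok=True)
        (cache_dir / f"{key}{MANIFEST_SUFFIX}").unlink(missing_ok=True)
    return evicted
```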
- _sweep_orphan_tmp() None
Remove *.tmp.* files older than _STALE_TMP_SECONDS (86400 seconds, i.e. 24 hours) left by crashed processes.
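A sweep of this kind might look like the following sketch. It assumes staleness is judged by file modification time and uses the module's 86400-second constant as the default threshold. The function name and the returned list are additions for illustration.

```python
import os
import time
from pathlib import Path
from typing import List

STALE_TMP_SECONDS = 86400  # mirrors _STALE_TMP_SECONDS


def sweep_orphan_tmp(cache_dir: Path, max_age: float = STALE_TMP_SECONDS) -> List[Path]:
    """Delete *.tmp.* files whose mtime is older than max_age seconds."""
    cutoff = time.time() - max_age
    removed = []
    for tmp in cache_dir.glob("*.tmp.*"):
        try:
            if tmp.is_file() and tmp.stat().st_mtime < cutoff:
                tmp.unlink()
                removed.append(tmp)
        except FileNotFoundError:
            # Another process renamed or removed the file first; nothing to do.
            continue
    return removed
```

The age threshold matters because a tmp file that is only seconds old most likely belongs to a writer that is still running, so it must not be deleted.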
- get_or_fetch(key: str, fetch_fn: collections.abc.Callable[[], bytes], *, validator: collections.abc.Callable[[bytes], None] | None = None) pathlib.Path
Return the on-disk path to the cached payload for key.
On a cache hit (entry exists and TTL has not expired) the path is returned immediately — no re-validation and no lock acquisition.
On a cache miss the lock is acquired and the entry is re-checked inside the lock (double-checked locking: another writer may have populated the entry between the unlocked check and the lock acquisition). If the entry is still absent, fetch_fn is called to download the content. The validator is then invoked on the raw bytes; on failure the entry is removed and BulkPayloadCorruptError is raised so the next call can retry cleanly.
- Parameters:
key – Cache key (used as the filename, e.g. "weo_april_2024.zip").
fetch_fn – Zero-arg callable that downloads and returns the raw bytes.
validator – Optional callable that raises on corrupt content. Called only on a cache miss / refetch (never on a hit).
- Returns:
Absolute Path to the cached payload on disk.
- Raises:
BulkPayloadCorruptError – When the downloaded payload fails validation.
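As an illustration of the validator contract (a callable that takes the raw bytes and raises on corrupt content), the sketch below checks that a payload is a readable zip archive. `validate_zip` and `fake_fetch` are hypothetical helpers. Which exception types the cache translates into BulkPayloadCorruptError is not documented here, so the exceptions raised below are an assumption.

```python
import io
import zipfile


def fake_fetch() -> bytes:
    """Stand-in for a real download: builds a small zip archive in memory."""
    buf = io.BytesIO()
    with zipfile.ZipFile(buf, "w") as zf:
        zf.writestr("data.xml", b"<x/>")
    return buf.getvalue()


def validate_zip(raw: bytes) -> None:
    """Raise if raw is not a readable zip archive with intact members."""
    # ZipFile raises zipfile.BadZipFile when the bytes are not a zip archive.
    with zipfile.ZipFile(io.BytesIO(raw)) as zf:
        bad = zf.testzip()  # name of the first member failing its CRC check, or None
        if bad is not None:
            raise ValueError(f"corrupt member: {bad}")


# Hypothetical wiring, following the signature documented above:
# cache = CacheManager(sublayer="weo_sdmx", ttl=timedelta(days=7))
# path = cache.get_or_fetch("weo_april_2024.zip", fake_fetch, validator=validate_zip)
```

Because the validator runs only on a miss or refetch, a check like this adds no cost to cache hits.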
- clear() None
Remove all cached entries (payloads and manifests) in this sublayer.