imf_reader.weo.scraper

Functions to scrape the IMF WEO website.

Attributes

BASE_URL

_zip_cache

Classes

SDMXScraper

Class to scrape the IMF WEO website for SDMX files.

Functions

_get_zip_cache()

Return the module-level CacheManager, creating it on first access.

get_soup(→ bs4.BeautifulSoup)

Get the BeautifulSoup object of the IMF WEO website.

Module Contents

imf_reader.weo.scraper.BASE_URL = 'https://www.imf.org/'
imf_reader.weo.scraper._zip_cache = None
imf_reader.weo.scraper._get_zip_cache()

Return the module-level CacheManager, creating it on first access.

imf_reader.weo.scraper.get_soup(month: str, year: str | int) → bs4.BeautifulSoup

Get the BeautifulSoup object of the IMF WEO website.

Parameters:
  • month – The month of the data to download. Can be April or October.

  • year – The year of the data to download.

Returns:

BeautifulSoup object of the IMF WEO website.

class imf_reader.weo.scraper.SDMXScraper

Class to scrape the IMF WEO website for SDMX files. To use this class, call the scrape method with the month and year of the data to download.

static get_sdmx_url(soup: bs4.BeautifulSoup) → str

Get the URL to download the WEO data in SDMX format.

Parameters:

soup – BeautifulSoup object of the IMF WEO website.

Returns:

The URL to download the SDMX data.

static get_sdmx_folder(sdmx_url: str) → zipfile.ZipFile

Download the SDMX data files as a zip file object.

Parameters:

sdmx_url – The URL to download the SDMX data files.

Returns:

The zip file object containing the SDMX data files.
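
As a rough illustration of what a "zip file object" means here, the sketch below wraps raw zip bytes in an in-memory `zipfile.ZipFile`. The helper name `bytes_to_zip` and the sample payload are hypothetical; the actual download step and how this method builds its return value are not documented here.

```python
import io
import zipfile


def bytes_to_zip(payload: bytes) -> zipfile.ZipFile:
    """Wrap raw zip bytes in an in-memory ZipFile (hypothetical helper)."""
    return zipfile.ZipFile(io.BytesIO(payload))


# Build a tiny zip in memory to stand in for a downloaded payload.
buffer = io.BytesIO()
with zipfile.ZipFile(buffer, "w") as archive:
    archive.writestr("data.xml", "<root/>")

folder = bytes_to_zip(buffer.getvalue())
names = folder.namelist()  # member names inside the archive
```

The returned object supports the usual `namelist()` and `read()` calls, so callers can inspect the SDMX members without writing anything to disk.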

static scrape(month: str, year: str | int) → zipfile.ZipFile

Pipeline to scrape SDMX files, with disk-backed caching.

The first call for a given (month, year) downloads the ~30 MB SDMX zip from the IMF website, validates it, and stores it atomically on disk. Subsequent calls within the TTL window (7 days) return the cached copy without any HTTP requests.

Parameters:
  • month – The month of the data to download. Can be April or October.

  • year – The year of the data to download.

Returns:

The zip file object containing the SDMX data files.

Raises:

BulkPayloadCorruptError – If the downloaded (or cached) zip fails integrity validation.
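
The caching behaviour described for scrape (download, validate, atomic store, 7-day TTL) can be sketched with the standard library alone. This is not the library's implementation: `cached_fetch`, `validate_zip`, and `CorruptPayloadError` are hypothetical names, the freshness check uses file modification time as an assumption, and the `fetch` callable stands in for the real HTTP download.

```python
import io
import os
import tempfile
import time
import zipfile
from pathlib import Path
from typing import Callable

TTL_SECONDS = 7 * 24 * 60 * 60  # 7-day window, as stated in the description


class CorruptPayloadError(Exception):
    """Hypothetical stand-in for BulkPayloadCorruptError."""


def validate_zip(payload: bytes) -> None:
    """Raise CorruptPayloadError unless payload is a readable zip archive."""
    try:
        with zipfile.ZipFile(io.BytesIO(payload)) as archive:
            # testzip() returns the first member failing its CRC check, or None.
            bad_member = archive.testzip()
    except zipfile.BadZipFile as exc:
        raise CorruptPayloadError("not a zip archive") from exc
    if bad_member is not None:
        raise CorruptPayloadError(f"corrupt member: {bad_member}")


def cached_fetch(
    path: Path, fetch: Callable[[], bytes], ttl: float = TTL_SECONDS
) -> bytes:
    """Return cached bytes if fresh; otherwise fetch, validate, store atomically."""
    if path.exists() and time.time() - path.stat().st_mtime < ttl:
        payload = path.read_bytes()
        validate_zip(payload)  # cached copies are validated too
        return payload

    payload = fetch()
    validate_zip(payload)
    # Write to a temp file in the same directory, then rename over the target,
    # so a reader never observes a partially written file.
    fd, tmp_name = tempfile.mkstemp(dir=path.parent)
    try:
        with os.fdopen(fd, "wb") as tmp:
            tmp.write(payload)
        os.replace(tmp_name, path)
    except BaseException:
        if os.path.exists(tmp_name):
            os.unlink(tmp_name)
        raise
    return payload
```

Validating before the rename means a corrupt download never reaches the cache path, and `os.replace` swaps the file in a single step when source and target share a filesystem.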