imf_reader.weo.scraper
Functions to scrape the IMF WEO website
Attributes
BASE_URL
_zip_cache
Classes
SDMXScraper: Class to scrape the IMF WEO website for SDMX files.
Functions
_get_zip_cache(): Return the module-level CacheManager, creating it on first access.
get_soup(month, year): Get the BeautifulSoup object of the IMF WEO website.
Module Contents
- imf_reader.weo.scraper.BASE_URL = 'https://www.imf.org/'
- imf_reader.weo.scraper._zip_cache = None
- imf_reader.weo.scraper._get_zip_cache()
Return the module-level CacheManager, creating it on first access.
- imf_reader.weo.scraper.get_soup(month: str, year: str | int) → bs4.BeautifulSoup
Get the BeautifulSoup object of the IMF WEO website.
- Parameters:
month – The month of the data to download. Can be April or October.
year – The year of the data to download.
- Returns:
BeautifulSoup object of the IMF WEO website.
- class imf_reader.weo.scraper.SDMXScraper
Class to scrape the IMF WEO website for SDMX files. To use this class, call the scrape method with the month and year of the data to download.
- static get_sdmx_url(soup: bs4.BeautifulSoup) → str
Get the URL to download the WEO data in SDMX format.
- Parameters:
soup – BeautifulSoup object of the IMF WEO website.
- Returns:
The URL to download the SDMX data.
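The link lookup can be illustrated with a small sketch. This is not the library's implementation: it swaps BeautifulSoup for the standard-library `html.parser` so that it runs without third-party packages, and the sample HTML, the `find_sdmx_url` helper, and the rule "the anchor text mentions SDMX" are all assumptions made for illustration. The real page markup may differ.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin

BASE_URL = "https://www.imf.org/"


class _LinkCollector(HTMLParser):
    """Collect (href, text) pairs for every <a> element that has an href."""

    def __init__(self):
        super().__init__()
        self.links = []
        self._href = None
        self._text = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            self._href = dict(attrs).get("href")
            self._text = []

    def handle_data(self, data):
        if self._href is not None:
            self._text.append(data)

    def handle_endtag(self, tag):
        if tag == "a" and self._href is not None:
            self.links.append((self._href, "".join(self._text).strip()))
            self._href = None


def find_sdmx_url(html: str) -> str:
    """Hypothetical helper: return the absolute URL of the first SDMX link."""
    parser = _LinkCollector()
    parser.feed(html)
    for href, text in parser.links:
        # Assumed rule: the link text mentions "SDMX".
        if "sdmx" in text.lower():
            return urljoin(BASE_URL, href)
    raise ValueError("no SDMX link found")


# Made-up page fragment, not copied from the IMF website.
sample_page = (
    '<ul><li><a href="/files/weo-excel.xlsx">Excel</a></li>'
    '<li><a href="/files/weo-sdmx.zip">SDMX Data</a></li></ul>'
)
sdmx_url = find_sdmx_url(sample_page)
```

Resolving the relative `href` against `BASE_URL` keeps the result usable as a download address regardless of how the page writes its links.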
- static get_sdmx_folder(sdmx_url: str) → zipfile.ZipFile
Download the SDMX data files as a zip file object.
- Parameters:
sdmx_url – The URL to download the SDMX data files.
- Returns:
The zip file object containing the SDMX data files.
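As a rough illustration of how downloaded bytes can become a zip file object without an intermediate file, the sketch below wraps a byte payload in `zipfile.ZipFile` through `io.BytesIO`. The `bytes_to_zip` helper is hypothetical, and the in-memory archive stands in for an HTTP response body; the library's actual download code is not shown in this reference.

```python
import io
import zipfile


def bytes_to_zip(payload: bytes) -> zipfile.ZipFile:
    """Hypothetical helper: wrap raw zip bytes in a ZipFile, entirely in memory."""
    buffer = io.BytesIO(payload)
    if not zipfile.is_zipfile(buffer):
        raise ValueError("payload is not a zip archive")
    buffer.seek(0)
    return zipfile.ZipFile(buffer)


# Build a tiny archive in memory to stand in for a downloaded response body.
_raw = io.BytesIO()
with zipfile.ZipFile(_raw, "w") as zf:
    zf.writestr("weo_data.xml", "<data/>")

folder = bytes_to_zip(_raw.getvalue())
names = folder.namelist()
```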
- static scrape(month: str, year: str | int) → zipfile.ZipFile
Pipeline to scrape SDMX files, with disk-backed caching.
The first call for a given (month, year) downloads the ~30 MB SDMX zip from the IMF website, validates it, and stores it atomically on disk. Subsequent calls within the TTL window (7 days) return the cached copy without any HTTP requests.
- Parameters:
month – The month of the data to download. Can be April or October.
year – The year of the data to download.
- Returns:
The zip file object containing the SDMX data files.
- Raises:
BulkPayloadCorruptError – If the downloaded (or cached) zip fails integrity validation.