imf_reader.weo.scraper
======================

.. py:module:: imf_reader.weo.scraper

.. autoapi-nested-parse::

   Functions to scrape the IMF WEO website.



Attributes
----------

.. autoapisummary::

   imf_reader.weo.scraper.BASE_URL
   imf_reader.weo.scraper._zip_cache


Classes
-------

.. autoapisummary::

   imf_reader.weo.scraper.SDMXScraper


Functions
---------

.. autoapisummary::

   imf_reader.weo.scraper._get_zip_cache
   imf_reader.weo.scraper.get_soup


Module Contents
---------------

.. py:data:: BASE_URL
   :value: 'https://www.imf.org/'


.. py:data:: _zip_cache
   :value: None


.. py:function:: _get_zip_cache()

   Return the module-level CacheManager, creating it on first access.


.. py:function:: get_soup(month: str, year: str | int) -> bs4.BeautifulSoup

   Get the BeautifulSoup object of the IMF WEO website.

   :param month: The month of the data to download. Either April or October.
   :param year: The year of the data to download.

   :returns: BeautifulSoup object of the IMF WEO website.


.. py:class:: SDMXScraper

   Class to scrape the IMF WEO website for SDMX files.

   To use this class, call the ``scrape`` method with the month and year of the data to download.


   .. py:method:: get_sdmx_url(soup: bs4.BeautifulSoup) -> str
      :staticmethod:

      Get the URL to download the WEO data in SDMX format.

      :param soup: BeautifulSoup object of the IMF WEO website.

      :returns: The URL to download the SDMX data.


   .. py:method:: get_sdmx_folder(sdmx_url: str) -> zipfile.ZipFile
      :staticmethod:

      Download the SDMX data files as a zip file object.

      :param sdmx_url: The URL to download the SDMX data files from.

      :returns: The zip file object containing the SDMX data files.


   .. py:method:: scrape(month: str, year: str | int) -> zipfile.ZipFile
      :staticmethod:

      Pipeline to scrape SDMX files, with disk-backed caching.

      The first call for a given ``(month, year)`` downloads the ~30 MB SDMX zip
      from the IMF website, validates it, and stores it atomically on disk.
      Subsequent calls within the TTL window (7 days) return the cached copy
      without any HTTP requests.

      :param month: The month of the data to download. Either April or October.
      :param year: The year of the data to download.

      :returns: The zip file object containing the SDMX data files.

      :raises BulkPayloadCorruptError: If the downloaded (or cached) zip fails integrity validation.
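

The disk-backed caching described for ``SDMXScraper.scrape`` can be sketched with the standard library alone. This is a minimal illustration under stated assumptions, not the ``imf_reader`` implementation: ``cached_zip``, ``validate_zip``, ``CorruptZipError`` and the ``fetch`` callable are hypothetical names; the cache entry is assumed to be one file per ``(month, year)``; freshness is assumed to be judged from the file's modification time; and "atomic" storage is assumed to mean writing a temporary file in the cache directory and renaming it over the target with ``os.replace``.

.. code-block:: python

   import io
   import os
   import tempfile
   import time
   import zipfile
   from pathlib import Path
   from typing import Callable

   TTL_SECONDS = 7 * 24 * 60 * 60  # assumed 7-day TTL, as described for scrape()


   class CorruptZipError(Exception):
       """Hypothetical stand-in for an integrity-validation failure."""


   def validate_zip(data: bytes) -> None:
       """Raise CorruptZipError unless *data* is a readable zip archive."""
       try:
           with zipfile.ZipFile(io.BytesIO(data)) as zf:
               bad_member = zf.testzip()  # None when every member's CRC checks out
       except zipfile.BadZipFile as exc:
           raise CorruptZipError("payload is not a valid zip file") from exc
       if bad_member is not None:
           raise CorruptZipError(f"corrupt member: {bad_member}")


   def cached_zip(
       month: str,
       year: str | int,
       fetch: Callable[[], bytes],
       cache_dir: Path,
       ttl: float = TTL_SECONDS,
   ) -> zipfile.ZipFile:
       """Return the zip for (month, year), calling fetch() only on a cache miss."""
       path = cache_dir / f"weo_{month.lower()}_{year}.zip"  # hypothetical naming
       if path.exists() and time.time() - path.stat().st_mtime < ttl:
           data = path.read_bytes()
           validate_zip(data)  # cached copies are re-validated too
           return zipfile.ZipFile(io.BytesIO(data))

       data = fetch()  # e.g. an HTTP download of the SDMX URL
       validate_zip(data)  # never cache a corrupt payload

       # Atomic store: write a temp file in the same directory, then rename it.
       cache_dir.mkdir(parents=True, exist_ok=True)
       fd, tmp_name = tempfile.mkstemp(dir=cache_dir, suffix=".tmp")
       try:
           with os.fdopen(fd, "wb") as tmp:
               tmp.write(data)
           os.replace(tmp_name, path)  # readers see the old file or the new one
       except BaseException:
           os.unlink(tmp_name)
           raise
       return zipfile.ZipFile(io.BytesIO(data))

Passing ``fetch`` as a callable keeps the sketch free of network code; a caller could, for example, wrap a download of the URL returned by ``get_sdmx_url``. Because validation runs before the rename, a failed download never replaces a good cached file.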