Data Ingestion and Restitution Pipeline

Architecture of the weather data pipeline from acquisition through database storage to route-ready interpolated grids

Technical Article
This article describes the data engineering architecture that underpins WindMar's weather-aware route optimization. It covers how meteorological and oceanographic data are acquired from external providers, compressed and stored in PostgreSQL, retrieved on demand, and delivered to the A* optimizer as temporally interpolated grid fields. For the definition of individual weather parameters, see Weather Fields. For details on the upstream data sources themselves, see Weather Data Acquisition.

Abstract

Weather-aware maritime route optimization requires the ability to serve spatially and temporally varying meteorological fields — wind, waves, swell, wind-wave decomposition, and ocean currents — to a graph-search algorithm that evaluates thousands of candidate route segments per optimization pass. Each evaluation demands weather conditions at a specific geographic coordinate and future time, placing stringent requirements on retrieval latency: the data must be available in sub-millisecond time per query, yet the underlying forecast grids are large, multi-parameter, and arrive from remote APIs with response times measured in seconds to minutes. This article presents the four-stage pipeline that WindMar implements to bridge this gap: (1) acquisition from GFS and CMEMS forecast services, (2) zlib compression and storage in a PostgreSQL schema optimised for bulk retrieval, (3) decompression with spatial cropping via boolean masking, and (4) trilinear interpolation across latitude, longitude, and forecast hour to produce a continuous weather field for the voyage calculator. The pipeline ingests 14 distinct meteorological parameters across up to 41 forecast timesteps, achieving typical retrieval latencies below 100 milliseconds and per-point interpolation costs below 1 millisecond.

1. Introduction

The central engineering challenge in weather-aware route optimization is the mismatch between the time scale at which weather data can be fetched from external services and the time scale at which the optimizer needs to query it. A single CMEMS wave forecast download for the global domain can take 30 seconds to several minutes depending on network conditions and server load. A single GFS GRIB2 file download requires approximately 2 seconds, with NOAA imposing rate limits on consecutive requests. The A* pathfinding algorithm, by contrast, expands hundreds to thousands of nodes per second, and each node expansion requires weather conditions at the corresponding position and time. Querying an external API for each node expansion would make the optimization infeasible.

WindMar resolves this mismatch through a database-backed ingestion pipeline that decouples the slow, IO-bound acquisition process from the fast, CPU-bound optimization process. Weather data is fetched once, compressed, and stored in PostgreSQL. The optimizer then reads from the database, benefiting from local disk latency rather than network latency. An additional temporal interpolation layer sits between the database and the optimizer, providing a continuous weather field from discrete forecast timesteps. This architecture also provides offline capability: once data has been ingested, routes can be optimized without any network connectivity.

The pipeline consists of four stages arranged in a linear data flow:

  1. Acquisition — Fetching raw forecast grids from GFS (wind), CMEMS (waves, currents), and ERA5 (fallback) via their respective APIs and client libraries.
  2. Compression and Storage — Converting numpy arrays to zlib-compressed binary blobs and writing them to PostgreSQL alongside forecast run metadata.
  3. Retrieval and Decompression — Selecting the latest complete forecast run, loading the relevant parameter grids, decompressing them, and cropping to the voyage bounding box.
  4. Temporal Interpolation — Interpolating between adjacent forecast timesteps in three dimensions (latitude, longitude, time) to produce a weather value at any arbitrary position and time within the forecast horizon.

Each stage is implemented as a distinct Python class with a well-defined interface, enabling independent testing and replacement. The following sections describe each stage in detail, with references to the source modules: copernicus.py (acquisition), weather_ingestion.py (compression and storage), db_weather_provider.py (retrieval), and temporal_weather_provider.py (interpolation).

2. Data Acquisition Layer

The acquisition layer is implemented in CopernicusDataProvider (module copernicus.py) and is responsible for fetching raw forecast grids from three external data sources. Each source serves a distinct set of meteorological parameters and uses a different access protocol, but all produce the same output format: a dictionary mapping parameter names to numpy arrays on a regular latitude-longitude grid, packaged in a WeatherData dataclass.

2.1 Three Ingestion Paths

The acquisition layer implements three distinct ingestion paths, each tailored to the upstream data source:

  • Wind (GFS) — The Global Forecast System provides 10-metre U and V wind components via the NOAA NOMADS GRIB filter. Files are downloaded as GRIB2, parsed with cfgrib and xarray, and converted to numpy arrays. Each file is approximately 80 KB for a Mediterranean-scale subregion. NOMADS imposes a rate limit that WindMar respects with a 2-second delay between consecutive requests, yielding ~82 seconds for a full 41-file forecast sequence (a minimal download sketch follows this list).
  • Wave forecast (CMEMS) — The Copernicus Marine Environment Monitoring Service provides significant wave height, peak period, direction, and swell/wind-wave decomposition via the copernicusmarine Python library. The wave ingestion path supports multi-timestep downloads, fetching forecast hours 0 through 120 in 3-hour increments (41 timesteps) in a single API call that returns an xarray.Dataset.
  • Current forecast (CMEMS) — The same Copernicus infrastructure provides ocean surface current U and V components. Like wave data, current forecasts support multi-timestep ingestion across the 0–120h forecast horizon.
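
To make the GFS path concrete, the sketch below downloads a single forecast hour of 10-metre wind through the NOMADS GRIB filter and parses it with cfgrib/xarray. It is a minimal illustration rather than the copernicus.py implementation: the filter parameter names follow the public filter_gfs_0p25.pl form and the u10/v10 variable names are cfgrib defaults, both of which may need adjusting for a given model cycle.

import time
import requests
import xarray as xr

NOMADS_FILTER = "https://nomads.ncep.noaa.gov/cgi-bin/filter_gfs_0p25.pl"

def fetch_gfs_wind(cycle_date: str, cycle_hour: str, fhour: int, bbox: dict) -> dict:
    """Download one forecast hour of 10 m U/V wind for a bounding box and return numpy grids."""
    params = {
        "file": f"gfs.t{cycle_hour}z.pgrb2.0p25.f{fhour:03d}",
        "dir": f"/gfs.{cycle_date}/{cycle_hour}/atmos",
        "lev_10_m_above_ground": "on",
        "var_UGRD": "on",
        "var_VGRD": "on",
        "subregion": "",
        "leftlon": bbox["lon_min"],
        "rightlon": bbox["lon_max"],
        "toplat": bbox["lat_max"],
        "bottomlat": bbox["lat_min"],
    }
    resp = requests.get(NOMADS_FILTER, params=params, timeout=60)
    resp.raise_for_status()
    path = f"/tmp/gfs_f{fhour:03d}.grib2"
    with open(path, "wb") as fh:
        fh.write(resp.content)
    ds = xr.open_dataset(path, engine="cfgrib")   # requires the cfgrib backend
    time.sleep(2)                                 # respect the NOMADS rate limit
    return {"wind_u": ds["u10"].values, "wind_v": ds["v10"].values}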

2.2 Bounding Box and Temporal Windowing

All three paths apply spatial and temporal windowing at the API level to minimise download volume. The bounding box is specified in the acquisition request and restricts the returned grid to the geographic extent of the planned voyage, with a configurable margin. Temporal windowing limits the forecast hours to those relevant for the voyage duration, although the default behaviour is to download the full 0–120h range to support both the current optimization and any subsequent re-optimizations within the same forecast cycle.
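
As an illustration of API-level windowing, the sketch below requests a spatially and temporally cropped wave dataset through the copernicusmarine SDK. The dataset identifier and variable names (VHM0, VTPK, VMDR) are placeholders for whichever CMEMS wave product is configured, and the one-degree margin stands in for the configurable margin mentioned above; keyword names follow the SDK documentation at the time of writing.

from datetime import datetime, timedelta
import copernicusmarine
import xarray as xr

def fetch_wave_window(bbox: dict, horizon_hours: int = 120) -> xr.Dataset:
    """Open a CMEMS wave dataset cropped to the voyage bounding box and forecast window."""
    now = datetime.utcnow()
    return copernicusmarine.open_dataset(
        dataset_id="cmems_mod_glo_wav_anfc_0.083deg_PT3H-i",   # placeholder product ID
        variables=["VHM0", "VTPK", "VMDR"],                    # hs, tp, mean direction
        minimum_longitude=bbox["lon_min"] - 1.0,               # illustrative 1-degree margin
        maximum_longitude=bbox["lon_max"] + 1.0,
        minimum_latitude=bbox["lat_min"] - 1.0,
        maximum_latitude=bbox["lat_max"] + 1.0,
        start_datetime=now.strftime("%Y-%m-%dT%H:%M:%S"),
        end_datetime=(now + timedelta(hours=horizon_hours)).strftime("%Y-%m-%dT%H:%M:%S"),
    )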

2.3 Fallback Hierarchy

When the primary data source is unavailable — due to API outages, credential configuration errors, or network interruptions — the acquisition layer follows a fallback hierarchy to ensure the optimizer always has weather data available:

Wind:     DB cache (previous run) --> GFS live (NOMADS) --> ERA5 live (CDS)
Waves:    DB cache (previous run) --> CMEMS live
Currents: DB cache (previous run) --> CMEMS live

The database cache is always tried first, as it provides the lowest latency and does not depend on external network connectivity. If the latest cached run is older than the supersession threshold (24 hours), the system falls back to a live API call. For wind data, ERA5 reanalysis serves as a tertiary fallback, providing gap-free historical coverage at the cost of a ~5-day data lag.
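
A cache-first lookup of this kind reduces to a short fallback function. The sketch below is illustrative: get_wind_from_db() matches the retrieval method described in Section 4, but its call signature is assumed here, and the gfs_client and era5_client objects with their fetch() methods are hypothetical stand-ins for the live acquisition paths.

from datetime import datetime, timedelta

SUPERSESSION_THRESHOLD = timedelta(hours=24)

def get_wind_with_fallback(db_provider, gfs_client, era5_client, bbox, target_time):
    """Serve wind from the DB cache when fresh; otherwise fall through to live APIs."""
    cached = db_provider.get_wind_from_db(bbox, target_time)
    if cached is not None and datetime.utcnow() - cached.run_time < SUPERSESSION_THRESHOLD:
        return cached                                # primary: local cache, no network needed
    try:
        return gfs_client.fetch(bbox, target_time)   # secondary: GFS live via NOMADS
    except Exception:
        return era5_client.fetch(bbox, target_time)  # tertiary: ERA5 reanalysis via CDS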

2.4 Data Source Comparison

Source            Protocol               Spatial Resolution   Temporal Resolution   Parameters                                        Typical Download
GFS (NOMADS)      HTTP GRIB filter       0.25° (~28 km)       3h steps, 0–120h      wind_u, wind_v                                    ~80 KB per file, ~3.3 MB total
CMEMS Waves       copernicusmarine SDK   0.083° (~9 km)       3h steps, 0–120h      wave_hs, wave_tp, wave_dir, swell_*, windwave_*   ~50–200 MB (global)
CMEMS Currents    copernicusmarine SDK   0.083° (~9 km)       3h steps, 0–120h      current_u, current_v                              ~20–80 MB (global)
ERA5 (fallback)   CDS API                0.25° (~31 km)       Hourly (reanalysis)   wind_u, wind_v                                    Variable (~5–50 MB)

3. Ingestion and Compression

The ingestion layer is implemented in WeatherIngestionService (module weather_ingestion.py) and is responsible for transforming raw forecast grids from the acquisition layer into compressed binary records in PostgreSQL. The service operates on a global grid spanning latitudes −85° to 85° and longitudes −179.75° to 179.75° at a uniform resolution of 0.5°, although subsets of this grid may be ingested when bounding-box windowing is applied at the acquisition stage.

3.1 Orchestration

The top-level ingest_all() method orchestrates the ingestion of wave and current data. Wind data is not ingested through this path because GFS downloads are already fast enough to be cached on the filesystem (via the GRIB cache described in Weather Data Acquisition) and served directly to the frontend forecast timeline. Wave and current data, by contrast, arrive as large xarray datasets that benefit from database-backed compression and indexed retrieval.

For single-snapshot ingestion, ingest_waves() and ingest_currents() fetch the current analysis (forecast hour 0) and store a single grid per parameter. For multi-timestep forecast ingestion, ingest_wave_forecast_frames() and ingest_current_forecast_frames() accept a dictionary mapping forecast hours to WeatherData objects and store all 41 timesteps (0, 3, 6, …, 120h) under a single forecast run identifier.

3.2 Database Schema

The ingestion service writes to two PostgreSQL tables that together form a two-level hierarchy: forecast runs and grid data. The weather_forecast_runs table records metadata about each ingestion event, while weather_grid_data stores the actual compressed arrays with a foreign key to the parent run.

-- Forecast run metadata
CREATE TABLE weather_forecast_runs (
    id              SERIAL PRIMARY KEY,
    source          VARCHAR(50) NOT NULL,    -- 'CMEMS_wave', 'CMEMS_current', 'GFS', 'ERA5'
    run_time        TIMESTAMP NOT NULL,      -- Model initialisation time
    status          VARCHAR(20) NOT NULL,    -- 'ingesting', 'complete', 'superseded'
    grid_resolution FLOAT,                   -- 0.5 (degrees)
    lat_min         FLOAT,
    lat_max         FLOAT,
    lon_min         FLOAT,
    lon_max         FLOAT,
    forecast_hours  INTEGER,                 -- Number of timesteps ingested
    ingested_at     TIMESTAMP DEFAULT NOW()
);

-- Compressed grid arrays
CREATE TABLE weather_grid_data (
    id              SERIAL PRIMARY KEY,
    run_id          INTEGER REFERENCES weather_forecast_runs(id),
    forecast_hour   INTEGER NOT NULL,        -- 0, 3, 6, ..., 120
    parameter       VARCHAR(50) NOT NULL,    -- 'wave_hs', 'wind_u', etc.
    lats            BYTEA NOT NULL,          -- zlib-compressed float32 1D array
    lons            BYTEA NOT NULL,          -- zlib-compressed float32 1D array
    data            BYTEA NOT NULL,          -- zlib-compressed float32 2D array (flattened)
    shape_rows      INTEGER NOT NULL,        -- Number of latitude points
    shape_cols      INTEGER NOT NULL,        -- Number of longitude points
    UNIQUE(run_id, forecast_hour, parameter)
);

The UNIQUE(run_id, forecast_hour, parameter) constraint ensures that each combination of forecast run, timestep, and meteorological parameter is stored exactly once. This enables idempotent re-ingestion: if an ingestion is interrupted and restarted, the service can safely re-insert rows without creating duplicates.
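
In PostgreSQL this idempotency is naturally expressed with an ON CONFLICT clause keyed on the same three columns. Whether weather_ingestion.py uses ON CONFLICT or a pre-insert check is not specified here; the sketch below simply shows the standard idiom against the schema above.

import psycopg2

INSERT_GRID_SQL = """
    INSERT INTO weather_grid_data
        (run_id, forecast_hour, parameter, lats, lons, data, shape_rows, shape_cols)
    VALUES (%s, %s, %s, %s, %s, %s, %s, %s)
    ON CONFLICT (run_id, forecast_hour, parameter) DO NOTHING
"""

def store_grid(conn, run_id, fhour, param, lat_blob, lon_blob, data_blob, rows, cols):
    """Insert one compressed grid; re-running an interrupted ingestion is a no-op."""
    with conn.cursor() as cur:
        cur.execute(INSERT_GRID_SQL, (
            run_id, fhour, param,
            psycopg2.Binary(lat_blob),
            psycopg2.Binary(lon_blob),
            psycopg2.Binary(data_blob),
            rows, cols,
        ))
    conn.commit()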

3.3 Compression

Before storage, each numpy array is converted to float32 and compressed using zlib (deflate). The compression is applied independently to the latitude vector, longitude vector, and data matrix for each (run, forecast_hour, parameter) tuple. This approach allows selective decompression: for operations that only need the grid geometry (e.g., determining spatial extent), the coordinate vectors can be decompressed without touching the data blob.

Compression Pipeline
numpy.ndarray (float64) → .astype(float32) → .tobytes() → zlib.compress() → BYTEA column

Typical compression ratios for meteorological grids range from 3:1 to 8:1, depending on the spatial smoothness of the field. Wave height grids, which exhibit large-scale spatial coherence, compress more effectively than current grids, which contain fine-scale eddies and sharp coastal gradients.
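
The round-trip can be sketched in a few lines. The helper below follows the pipeline shown above and uses a synthetic smooth field to illustrate why spatially coherent grids compress well; the exact ratios depend on the real data.

import zlib
import numpy as np

def compress_grid(arr: np.ndarray) -> bytes:
    """float64 grid -> float32 bytes -> zlib blob, as stored in the BYTEA column."""
    return zlib.compress(np.asarray(arr, dtype=np.float32).tobytes())

# A smooth synthetic field compresses far better than noisy, eddy-rich data.
lats = np.linspace(30, 50, 41)
lons = np.linspace(-5, 35, 81)
field = np.sin(np.radians(lats))[:, None] * np.cos(np.radians(lons))[None, :]
blob = compress_grid(field)
raw_bytes = field.astype(np.float32).nbytes
print(f"raw {raw_bytes} B -> compressed {len(blob)} B ({raw_bytes / len(blob):.1f}:1)")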

3.4 Stored Parameters

The ingestion service stores 14 distinct meteorological parameters across three data sources. Each parameter is stored as a separate row in weather_grid_data, enabling the retrieval layer to load only the parameters required by the current computation.

Parameter       Source    Description                             Unit
wind_u          GFS       Eastward wind component at 10 m         m/s
wind_v          GFS       Northward wind component at 10 m        m/s
wave_hs         CMEMS     Combined significant wave height        m
wave_tp         CMEMS     Peak wave period                        s
wave_dir        CMEMS     Mean wave direction                     deg
swell_hs        CMEMS     Swell significant wave height           m
swell_tp        CMEMS     Swell peak period                       s
swell_dir       CMEMS     Swell mean direction                    deg
windwave_hs     CMEMS     Wind-wave significant wave height       m
windwave_tp     CMEMS     Wind-wave peak period                   s
windwave_dir    CMEMS     Wind-wave mean direction                deg
current_u       CMEMS     Eastward ocean surface current          m/s
current_v       CMEMS     Northward ocean surface current         m/s
current_speed   Derived   Current magnitude (derived from U/V)    m/s

3.5 Run Supersession

To prevent unbounded growth of the database, the ingestion service implements a supersession mechanism. After each successful ingestion, the _supersede_old_runs() method marks all forecast runs older than 24 hours as 'superseded'. Superseded runs are excluded from the retrieval layer's run selection query, but their data remains in the database for potential historical analysis until explicitly purged by an administrative process.
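
A minimal version of this update, assuming the 24-hour threshold from the text (the exact SQL in _supersede_old_runs() may differ):

SUPERSEDE_SQL = """
    UPDATE weather_forecast_runs
       SET status = 'superseded'
     WHERE status = 'complete'
       AND run_time < NOW() - INTERVAL '24 hours'
"""

def supersede_old_runs(conn):
    """Mark stale forecast runs so the retrieval layer no longer selects them."""
    with conn.cursor() as cur:
        cur.execute(SUPERSEDE_SQL)
    conn.commit()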

4. Data Retrieval and Decompression

The retrieval layer is implemented in DbWeatherProvider (module db_weather_provider.py) and provides the interface between the compressed PostgreSQL storage and the temporal interpolation layer. Its primary responsibilities are: selecting the appropriate forecast run, matching forecast hours to target times, decompressing grid data, and cropping to the voyage bounding box. The class exposes three high-level methods — get_wind_from_db(), get_wave_from_db(), and get_current_from_db() — as well as a bulk retrieval method for multi-timestep pre-fetching.

4.1 Run Selection

The _find_latest_run(source) method selects the most recent forecast run for a given data source (e.g., 'CMEMS_wave' or 'GFS'). The selection criteria are:

  1. The run must have status = 'complete' (excluding in-progress and superseded runs).
  2. Among complete runs, the one with the most recent run_time is preferred.
  3. If multiple runs share the same run_time, the one with the most forecast_hours is selected, as it provides the greatest temporal coverage for multi-day voyages.

4.2 Forecast Hour Matching

Given a target time (e.g., the time at which a vessel will reach a particular waypoint), the _best_forecast_hour(run_id, target_time) method identifies the closest available forecast hour within the selected run. The method queries the distinct forecast hours stored for the given run and returns the one that minimises the absolute difference between the target time and the run initialisation time plus the forecast hour offset. This snapping behaviour is adequate for the retrieval layer; finer temporal resolution is handled by the interpolation layer described in Section 5.
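
The snapping logic reduces to a nearest-offset search. A minimal sketch, with names that mirror the description above rather than the actual implementation:

from datetime import datetime, timedelta

def best_forecast_hour(run_time: datetime, available_hours: list, target_time: datetime) -> int:
    """Return the stored forecast hour whose valid time (run_time + offset) is nearest the target."""
    return min(available_hours,
               key=lambda h: abs(run_time + timedelta(hours=h) - target_time))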

4.3 Decompression

The _load_grid() method reads a single compressed grid from the database and reconstructs the original numpy arrays. The decompression path mirrors the compression pipeline in reverse:

Decompression Pipeline
BYTEA column → zlib.decompress() → numpy.frombuffer(dtype=float32) → reshape(rows, cols)

Two helper methods handle the dimensionality: _decompress_1d(blob) produces a flat vector (used for latitude and longitude coordinate arrays), while _decompress_2d(blob, rows, cols) produces a 2D matrix by reshaping the flat buffer according to the stored shape_rows and shape_cols metadata. The use of explicit shape metadata rather than inferring dimensions from the blob size provides a consistency check: if the decompressed buffer length does not match rows × cols, an error is raised.
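
A sketch of the two helpers with the consistency check described above (an assumed form, not the verbatim db_weather_provider.py code):

import zlib
import numpy as np

def decompress_1d(blob: bytes) -> np.ndarray:
    """BYTEA blob -> flat float32 vector (latitude or longitude coordinates)."""
    return np.frombuffer(zlib.decompress(blob), dtype=np.float32)

def decompress_2d(blob: bytes, rows: int, cols: int) -> np.ndarray:
    """BYTEA blob -> 2D float32 grid, validated against the stored shape metadata."""
    flat = np.frombuffer(zlib.decompress(blob), dtype=np.float32)
    if flat.size != rows * cols:
        raise ValueError(f"decompressed {flat.size} values, expected {rows} x {cols}")
    return flat.reshape(rows, cols)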

4.4 Bounding Box Cropping

After decompression, the grid is cropped to the voyage bounding box using boolean masking. The _crop_grid() method constructs boolean index arrays for the latitude and longitude dimensions and applies them to extract the rectangular sub-grid corresponding to the voyage area. This approach handles both ascending and descending latitude orderings, which differ between data sources (GFS stores latitudes south-to-north; CMEMS stores them north-to-south).

Spatial Cropping via Boolean Masking
lat_mask = (lats >= lat_min) & (lats <= lat_max)
lon_mask = (lons >= lon_min) & (lons <= lon_max)
lats_cropped = lats[lat_mask]
lons_cropped = lons[lon_mask]
data_cropped = data[np.ix_(lat_mask, lon_mask)]

The np.ix_ construct generates an open mesh from the two boolean vectors, enabling extraction of the sub-matrix without flattening the 2D structure. For a Mediterranean voyage (30–50°N, 5°W–35°E) on a 0.5° global grid, cropping reduces the grid from 340 × 720 points to approximately 40 × 80 points — a 77-fold reduction in memory footprint and proportional speedup in subsequent interpolation operations.

4.5 Bulk Retrieval

For multi-day voyage optimization, the temporal interpolation layer requires grids at multiple forecast hours simultaneously. The get_grids_for_timeline() method provides a single-pass bulk retrieval interface that loads all requested parameters across all relevant forecast hours in one database round-trip. It returns a nested dictionary structured as parameter → {forecast_hour → (lats, lons, data)}, which the temporal provider consumes directly. A specialized variant, get_wave_grids_for_timeline(), pre-fetches all wave-related parameters (wave, swell, wind-wave decomposition) in a single coordinated call.

5. Temporal Weather Provisioning

The TemporalGridWeatherProvider (module temporal_weather_provider.py) is the final stage of the pipeline and the direct interface consumed by the voyage calculator and route optimizer. It receives the pre-fetched multi-timestep grids from DbWeatherProvider and exposes a single method — get_weather(lat, lon, time) — that returns a LegWeather dataclass containing all meteorological fields needed by the vessel performance model. The method performs trilinear interpolation across three axes: latitude, longitude, and forecast hour.

5.1 Internal Data Structure

The provider is constructed with a run_time (the forecast model initialisation time) and a grids dictionary with the structure parameter → {forecast_hour → (lats, lons, data)}. Parameters are grouped by type:

  • Vector parameters — Wind (wind_u, wind_v) and current (current_u, current_v) are stored as U/V component pairs. Interpolation is performed independently on each component, and the magnitude and direction are computed from the interpolated components.
  • Wave parameters — Combined wave (wave_hs, wave_tp, wave_dir), swell (swell_hs, swell_tp, swell_dir), and wind-wave (windwave_hs, windwave_tp, windwave_dir) are interpolated as scalar fields.

5.2 Trilinear Interpolation Algorithm

The core interpolation method, _interp_temporal(), brackets the requested forecast hour between the two nearest available timesteps (h0 and h1), performs bilinear spatial interpolation at each timestep independently, and then blends the two results with a linear temporal weight:

Temporal Linear Interpolation
\[ v_{\text{final}} = v(h_0) \cdot (1 - \alpha) + v(h_1) \cdot \alpha \] \[ \alpha = \frac{fh - h_0}{h_1 - h_0} \]
where \( fh \) = fractional forecast hour corresponding to the query time
\( h_0 \) = greatest available forecast hour \( \le fh \)
\( h_1 \) = smallest available forecast hour \( \ge fh \)

When the query time falls exactly on an available forecast hour (i.e., h0 = h1), the denominator of α vanishes, so the provider skips the temporal blend and returns the single spatial interpolation at that timestep. At each timestep, the spatial interpolation is bilinear on the regular latitude-longitude grid: the four surrounding grid points are located and weighted by the fractional position within the cell.
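
The scheme can be sketched compactly as below. This is a simplified illustration rather than the _interp_temporal() implementation: it assumes ascending latitude/longitude vectors and a query point inside the grid, whereas the real provider also handles descending latitudes and edge cases.

import numpy as np

def bilinear(lats, lons, grid, lat, lon):
    """Bilinear interpolation on a regular ascending lat/lon grid."""
    i = int(np.clip(np.searchsorted(lats, lat) - 1, 0, lats.size - 2))
    j = int(np.clip(np.searchsorted(lons, lon) - 1, 0, lons.size - 2))
    ty = (lat - lats[i]) / (lats[i + 1] - lats[i])      # fractional position inside the cell
    tx = (lon - lons[j]) / (lons[j + 1] - lons[j])
    return ((1 - ty) * (1 - tx) * grid[i, j] + (1 - ty) * tx * grid[i, j + 1]
            + ty * (1 - tx) * grid[i + 1, j] + ty * tx * grid[i + 1, j + 1])

def interp_temporal(grids_by_hour, lats, lons, lat, lon, fh):
    """Trilinear value at (lat, lon, fractional forecast hour fh)."""
    hours = sorted(grids_by_hour)
    h0 = max(h for h in hours if h <= fh)
    h1 = min(h for h in hours if h >= fh)
    v0 = bilinear(lats, lons, grids_by_hour[h0], lat, lon)
    if h0 == h1:                                        # query falls exactly on a timestep
        return v0
    v1 = bilinear(lats, lons, grids_by_hour[h1], lat, lon)
    alpha = (fh - h0) / (h1 - h0)
    return v0 * (1 - alpha) + v1 * alpha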

5.3 Vector Parameter Handling

For vector quantities (wind and current), interpolation is performed on the Cartesian components (U and V) rather than on magnitude and direction. This avoids the discontinuity and non-linearity that would arise from interpolating angular quantities directly. After interpolation, the provider computes derived scalars:

U/V to Speed and Direction Conversion
\[ \text{speed} = \sqrt{u^2 + v^2} \] \[ \text{direction} = \bigl(270^\circ - \operatorname{atan2}(v, u)\bigr) \bmod 360^\circ \]

This convention produces meteorological direction (the direction from which the wind or current is coming), consistent with standard maritime practice where a 270° wind blows from the west.
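
As a worked check of the convention, the sketch below converts interpolated components to speed and meteorological direction; a pure westerly (u > 0, v = 0) maps to 270°.

import math

def uv_to_speed_dir(u: float, v: float) -> tuple:
    """Vector components -> (speed, direction the flow is coming FROM, in degrees)."""
    speed = math.hypot(u, v)
    direction = (270.0 - math.degrees(math.atan2(v, u))) % 360.0
    return speed, direction

print(uv_to_speed_dir(10.0, 0.0))   # (10.0, 270.0) — a 10 m/s wind from the west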

5.4 NaN and Coastal Cell Handling

Coastal grid cells in ocean models frequently contain NaN values where the model grid intersects land. Rather than propagating NaN through the interpolation (which would cause the vessel model to fail), the provider implements a fallback strategy: any interpolated value that is NaN or infinite is replaced with 0.0. This is physically reasonable for coastal cells where the query point is near land: wind and current speeds approach zero in sheltered waters, and wave heights are attenuated by shallow-water effects. The fallback ensures that the optimizer never encounters undefined values while traversing coastal waypoints.
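
The guard itself is a one-line check; a minimal sketch of the substitution described above:

import math

def sanitize(value: float) -> float:
    """Replace NaN or infinite interpolation results with 0.0 (calm / no current / no waves)."""
    return value if math.isfinite(value) else 0.0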

5.5 End-to-End Data Flow

The complete data flow from external API to route optimizer is summarized below:

+------------------+     +-------------------------+     +---------------------+
|   CMEMS / GFS    |     |      PostgreSQL DB      |     |  DbWeatherProvider  |
|  (External API)  | --> |  weather_forecast_runs  | --> |    (Decompress +    |
|                  |     |    weather_grid_data    |     |    Crop to BBox)    |
+------------------+     +-------------------------+     +---------------------+
                                                                    |
                                                                    v
                          +----------------------------------------------------+
                          |  TemporalGridWeatherProvider                       |
                          |  - Brackets forecast hours (h0, h1)                |
                          |  - Bilinear spatial interpolation at each timestep |
                          |  - Linear temporal blend: v = v0*(1-a) + v1*a      |
                          |  - U/V -> speed/direction conversion               |
                          +----------------------------------------------------+
                                                    |
                                                    v
                          +----------------------------------------------------+
                          |  VoyageCalculator / RouteOptimizer (A*)            |
                          |  - get_weather(lat, lon, time) -> LegWeather       |
                          |  - Sub-millisecond per query                       |
                          +----------------------------------------------------+

5.6 Data Provenance and Confidence

Each weather query carries a WeatherProvenance dataclass that records the lineage of the returned data. This metadata enables downstream consumers — particularly the Monte Carlo uncertainty engine (see Parametric Monte Carlo Simulation) — to weight their confidence in the weather values and adjust perturbation magnitudes accordingly.

The provenance dataclass contains four fields:

Field                 Type    Description                                Example Values
source_type           str     Category of the data source                "forecast", "hindcast", "climatology", "blended"
model_name            str     Name of the originating model              "GFS", "CMEMS_wave", "CMEMS_current", "ERA5"
forecast_lead_hours   float   Hours between model init and query time    0.0 – 240.0
confidence            str     Qualitative confidence level               "high", "medium", "low"

Confidence Levels

The confidence field is assigned based on the forecast lead time (a minimal assignment sketch follows the list):

  • High (< 72 hours) — The forecast is within the range where NWP models exhibit skill significantly above climatology. Wind speed RMS errors are typically 1.5–2.5 m/s; significant wave height errors are 0.2–0.5 m.
  • Medium (72–120 hours) — The forecast remains useful for synoptic-scale features (major storm systems, prevailing wind patterns) but mesoscale details are increasingly unreliable. The Monte Carlo engine increases perturbation magnitudes in this range.
  • Low (> 120 hours or climatological data) — The forecast approaches climatological skill. For voyages extending beyond the 10-day forecast horizon (FORECAST_HORIZON_DAYS = 10), the system transitions to a blended mode that interpolates between the latest available forecast and climatological means over a 2-day blending window (BLEND_WINDOW_DAYS = 2).
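
A minimal sketch of the threshold rule, using the lead-time bands above; the production rule may weigh additional inputs such as the blending state.

def confidence_from_lead(lead_hours: float, source_type: str = "forecast") -> str:
    """Map forecast lead time (and climatological sources) to a qualitative confidence level."""
    if source_type == "climatology" or lead_hours > 120:
        return "low"
    return "high" if lead_hours < 72 else "medium"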

The route optimizer uses the confidence level to weight uncertainty in its cost function. Routes that rely heavily on low-confidence weather data incur a penalty that biases the optimizer toward paths with more reliable forecasts — typically shorter routes or routes that avoid weather-sensitive ocean regions in the extended forecast period.

6. Caching and Performance

Performance is critical at every stage of the pipeline, as the end-to-end latency directly affects the user experience: a route optimization request should complete within a few seconds, not minutes. WindMar employs a multi-level caching strategy that minimises redundant computation and data transfer at each stage.

6.1 Redis Caching for API Responses

The frontend API layer uses Redis to cache serialized weather responses. When the frontend requests wave forecast frames, wind velocity data, or current visualization grids, the API first checks the Redis cache for a recent result matching the requested parameters and geographic bounds. Cache entries are keyed by data source, forecast hour, and bounding box coordinates, with a time-to-live that aligns with the forecast model update cycle (typically 6 hours for GFS, 12 hours for CMEMS). This caching layer eliminates redundant database queries when multiple users view the same geographic region or when the frontend re-requests data after a page reload.
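
The lookup/populate pattern can be sketched as follows; the key layout, TTL, and JSON serialization shown here are illustrative assumptions rather than WindMar's exact scheme.

import json
import redis

r = redis.Redis(host="localhost", port=6379, db=0)

def cached_response(source: str, fhour: int, bbox: tuple, loader, ttl_s: int = 6 * 3600):
    """Return a cached API payload if present; otherwise build it via loader() and cache it."""
    key = f"wx:{source}:{fhour}:" + ":".join(f"{c:.2f}" for c in bbox)
    hit = r.get(key)
    if hit is not None:
        return json.loads(hit)
    payload = loader()                       # e.g. query PostgreSQL and serialize the grids
    r.setex(key, ttl_s, json.dumps(payload))
    return payload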

6.2 Pre-fetch Strategy for Multi-Day Voyages

When a route optimization is initiated, the TemporalGridWeatherProvider is constructed with all forecast hours that the voyage will span. The get_grids_for_timeline() bulk retrieval method loads all parameters across all relevant forecast hours in a single database transaction, rather than issuing individual queries as each waypoint is evaluated. For a 5-day voyage with 14 parameters and 41 forecast hours, this reduces the number of database round-trips from potentially thousands (one per node expansion) to a single bulk query.

6.3 Performance Characteristics

The following table summarizes the typical latency at each pipeline stage, measured on a standard development machine with PostgreSQL running locally:

Stage           Operation                                      Typical Latency
Acquisition     Full GFS 41-frame download                     ~82 seconds (rate-limited)
Acquisition     CMEMS wave forecast download                   30–120 seconds (varies)
Ingestion       Compress and store 41 timesteps × 14 params    5–15 seconds
Retrieval       Bulk load all grids for a voyage               < 100 ms
Interpolation   Single get_weather() call                      < 1 ms
Redis cache     Cached API response lookup                     < 5 ms

The key insight is the three-orders-of-magnitude gap between the acquisition stage (minutes) and the retrieval/interpolation stages (milliseconds). The database acts as an impedance-matching layer that absorbs the slow, bursty acquisition process and presents a fast, uniform interface to the optimizer. Once data has been ingested, route optimizations can run repeatedly against the same forecast data without incurring any network latency, and each A* node expansion pays only the sub-millisecond cost of trilinear interpolation against in-memory grids.

6.4 Pipeline Summary

From the user's perspective, the pipeline operates transparently. A background ingestion task fetches and stores the latest forecasts on a scheduled cadence. When the user requests a route optimization, the system selects the most recent complete forecast run, loads the relevant grids into memory, constructs a TemporalGridWeatherProvider, and passes it to the A* optimizer. The optimizer calls get_weather(lat, lon, time) at each candidate waypoint, receiving interpolated conditions in sub-millisecond time. The total wall-clock time for a route optimization is dominated by the A* graph search itself, not by weather data retrieval.

References

  1. CMEMS, “Global Ocean Waves Analysis and Forecast — Product User Manual,” Copernicus Marine Environment Monitoring Service, 2024.
  2. CMEMS, “Global Ocean Physics Analysis and Forecast — Product User Manual,” Copernicus Marine Environment Monitoring Service, 2024.
  3. NCEP/NOAA, “GFS Model Documentation and GRIB2 Format Specification,” National Centers for Environmental Prediction, 2023.
  4. Hersbach, H. et al., “The ERA5 Global Reanalysis,” Quarterly Journal of the Royal Meteorological Society, vol. 146, no. 730, pp. 1999–2049, 2020.
  5. PostgreSQL Global Development Group, “PostgreSQL Documentation — Large Objects and Binary Data,” 2024.
  6. Hoyer, S. and Hamman, J., “xarray: N-D labeled arrays and datasets in Python,” Journal of Open Research Software, vol. 5, no. 1, 2017.