Data Ingestion and Restitution Pipeline
Architecture of the weather data pipeline from acquisition through database storage to route-ready interpolated grids
This article describes the data engineering architecture that underpins WindMar's weather-aware route optimization. It covers how meteorological and oceanographic data are acquired from external providers, compressed and stored in PostgreSQL, retrieved on demand, and delivered to the A* optimizer as temporally interpolated grid fields. For the definition of individual weather parameters, see Weather Fields. For details on the upstream data sources themselves, see Weather Data Acquisition.
Abstract
Weather-aware maritime route optimization requires the ability to serve spatially and temporally varying meteorological fields — wind, waves, swell, wind-wave decomposition, and ocean currents — to a graph-search algorithm that evaluates thousands of candidate route segments per optimization pass. Each evaluation demands weather conditions at a specific geographic coordinate and future time, placing stringent requirements on retrieval latency: the data must be available in sub-millisecond time per query, yet the underlying forecast grids are large, multi-parameter, and arrive from remote APIs with response times measured in seconds to minutes. This article presents the four-stage pipeline that WindMar implements to bridge this gap: (1) acquisition from GFS and CMEMS forecast services, (2) zlib compression and storage in a PostgreSQL schema optimised for bulk retrieval, (3) decompression with spatial cropping via boolean masking, and (4) trilinear interpolation across latitude, longitude, and forecast hour to produce a continuous weather field for the voyage calculator. The pipeline ingests 14 distinct meteorological parameters across up to 41 forecast timesteps, achieving typical retrieval latencies below 100 milliseconds and per-point interpolation costs below 1 millisecond.
1. Introduction
The central engineering challenge in weather-aware route optimization is the mismatch between the time scale at which weather data can be fetched from external services and the time scale at which the optimizer needs to query it. A single CMEMS wave forecast download for the global domain can take 30 seconds to several minutes depending on network conditions and server load. A single GFS GRIB2 file download requires approximately 2 seconds, with NOAA imposing rate limits on consecutive requests. The A* pathfinding algorithm, by contrast, expands hundreds to thousands of nodes per second, and each node expansion requires weather conditions at the corresponding position and time. Querying an external API for each node expansion would make the optimization infeasible.
WindMar resolves this mismatch through a database-backed ingestion pipeline that decouples the slow, IO-bound acquisition process from the fast, CPU-bound optimization process. Weather data is fetched once, compressed, and stored in PostgreSQL. The optimizer then reads from the database, benefiting from local disk latency rather than network latency. An additional temporal interpolation layer sits between the database and the optimizer, providing a continuous weather field from discrete forecast timesteps. This architecture also provides offline capability: once data has been ingested, routes can be optimized without any network connectivity.
The pipeline consists of four stages arranged in a linear data flow:
- Acquisition — Fetching raw forecast grids from GFS (wind), CMEMS (waves, currents), and ERA5 (fallback) via their respective APIs and client libraries.
- Compression and Storage — Converting numpy arrays to zlib-compressed binary blobs and writing them to PostgreSQL alongside forecast run metadata.
- Retrieval and Decompression — Selecting the latest complete forecast run, loading the relevant parameter grids, decompressing them, and cropping to the voyage bounding box.
- Temporal Interpolation — Interpolating between adjacent forecast timesteps in three dimensions (latitude, longitude, time) to produce a weather value at any arbitrary position and time within the forecast horizon.
Each stage is implemented as a distinct Python class with a well-defined interface, enabling
independent testing and replacement. The following sections describe each stage in detail, with
references to the source modules: copernicus.py (acquisition),
weather_ingestion.py (compression and storage), db_weather_provider.py
(retrieval), and temporal_weather_provider.py (interpolation).
2. Data Acquisition Layer
The acquisition layer is implemented in CopernicusDataProvider (module
copernicus.py) and is responsible for fetching raw forecast grids from three
external data sources. Each source serves a distinct set of meteorological parameters and
uses a different access protocol, but all produce the same output format: a dictionary mapping
parameter names to numpy arrays on a regular latitude-longitude grid, packaged
in a WeatherData dataclass.
2.1 Three Ingestion Paths
The acquisition layer implements three distinct ingestion paths, each tailored to the upstream data source:
- Wind (GFS) — The Global Forecast System provides 10-metre U and V wind components via the NOAA NOMADS GRIB filter. Files are downloaded as GRIB2, parsed with cfgrib and xarray, and converted to numpy arrays. Each file is approximately 80 KB for a Mediterranean-scale subregion. NOMADS imposes a rate limit that WindMar respects with a 2-second delay between consecutive requests, yielding ~82 seconds for a full 41-file forecast sequence (see the sketch after this list).
- Wave forecast (CMEMS) — The Copernicus Marine Environment Monitoring Service provides significant wave height, peak period, direction, and swell/wind-wave decomposition via the copernicusmarine Python library. The wave ingestion path supports multi-timestep downloads, fetching forecast hours 0 through 120 in 3-hour increments (41 timesteps) in a single API call that returns an xarray.Dataset.
- Current forecast (CMEMS) — The same Copernicus infrastructure provides ocean surface current U and V components. Like wave data, current forecasts support multi-timestep ingestion across the 0–120h forecast horizon.
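A minimal sketch of the wind ingestion path is shown below, assuming a hypothetical helper fetch_gfs_wind(); the NOMADS filter parameters and the cfgrib engine are documented interfaces, but the actual request construction in copernicus.py may differ.

```python
import requests
import xarray as xr

NOMADS_FILTER = "https://nomads.ncep.noaa.gov/cgi-bin/filter_gfs_0p25.pl"

def fetch_gfs_wind(run_date: str, cycle: str, fhour: int, bbox: dict) -> dict:
    """Hypothetical helper: download one GFS GRIB2 file via the NOMADS GRIB
    filter and return 10 m wind components as numpy arrays."""
    params = {
        "file": f"gfs.t{cycle}z.pgrb2.0p25.f{fhour:03d}",
        "lev_10_m_above_ground": "on",              # 10 m level only
        "var_UGRD": "on", "var_VGRD": "on",         # U/V wind components
        "subregion": "",
        "leftlon": bbox["lon_min"], "rightlon": bbox["lon_max"],
        "toplat": bbox["lat_max"], "bottomlat": bbox["lat_min"],
        "dir": f"/gfs.{run_date}/{cycle}/atmos",
    }
    resp = requests.get(NOMADS_FILTER, params=params, timeout=60)
    resp.raise_for_status()
    path = f"/tmp/gfs_f{fhour:03d}.grib2"           # ~80 KB for a Mediterranean subregion
    with open(path, "wb") as fh:
        fh.write(resp.content)
    ds = xr.open_dataset(path, engine="cfgrib")     # parse GRIB2 with cfgrib + xarray
    return {
        "wind_u": ds["u10"].values,                 # variable names as exposed by cfgrib
        "wind_v": ds["v10"].values,
        "lats": ds["latitude"].values,
        "lons": ds["longitude"].values,
    }
```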
2.2 Bounding Box and Temporal Windowing
All three paths apply spatial and temporal windowing at the API level to minimise download volume. The bounding box is specified in the acquisition request and restricts the returned grid to the geographic extent of the planned voyage, with a configurable margin. Temporal windowing limits the forecast hours to those relevant for the voyage duration, although the default behaviour is to download the full 0–120h range to support both the current optimization and any subsequent re-optimizations within the same forecast cycle.
2.3 Fallback Hierarchy
When the primary data source is unavailable — due to API outages, credential configuration errors, or network interruptions — the acquisition layer follows a fallback hierarchy to ensure the optimizer always has weather data available:
Wind:     DB cache (previous run) --> GFS live (NOMADS) --> ERA5 live (CDS)
Waves:    DB cache (previous run) --> CMEMS live
Currents: DB cache (previous run) --> CMEMS live
The database cache is always tried first, as it provides the lowest latency and does not depend on external network connectivity. If the latest cached run is older than the supersession threshold (24 hours), the system falls back to a live API call. For wind data, ERA5 reanalysis serves as a tertiary fallback, providing gap-free historical coverage at the cost of a ~5-day data lag.
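A hedged sketch of the wind fallback chain follows; get_wind_from_db() and the 24-hour threshold come from the surrounding sections, while the acquisition method names and the run_time attribute are illustrative.

```python
from datetime import datetime, timedelta

MAX_CACHE_AGE = timedelta(hours=24)   # supersession threshold (Section 3.5)

def get_wind(acquirer, db_provider, bbox, target_time):
    """Illustrative fallback chain for wind: DB cache -> GFS live -> ERA5 live."""
    cached = db_provider.get_wind_from_db(bbox, target_time)
    if cached is not None and datetime.utcnow() - cached.run_time < MAX_CACHE_AGE:
        return cached                                            # fresh enough: no network needed
    try:
        return acquirer.fetch_gfs_wind_live(bbox, target_time)   # hypothetical method name
    except Exception:
        return acquirer.fetch_era5_wind(bbox, target_time)       # hypothetical tertiary fallback
```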
2.4 Data Source Comparison
| Source | Protocol | Spatial Resolution | Temporal Resolution | Parameters | Typical Download |
|---|---|---|---|---|---|
| GFS (NOMADS) | HTTP GRIB filter | 0.25° (~28 km) | 3h steps, 0–120h | wind_u, wind_v | ~80 KB per file, ~3.3 MB total |
| CMEMS Waves | copernicusmarine SDK | 0.083° (~9 km) | 3h steps, 0–120h | wave_hs, wave_tp, wave_dir, swell_*, windwave_* | ~50–200 MB (global) |
| CMEMS Currents | copernicusmarine SDK | 0.083° (~9 km) | 3h steps, 0–120h | current_u, current_v | ~20–80 MB (global) |
| ERA5 (fallback) | CDS API | 0.25° (~31 km) | Hourly (reanalysis) | wind_u, wind_v | Variable (~5–50 MB) |
3. Ingestion and Compression
The ingestion layer is implemented in WeatherIngestionService (module
weather_ingestion.py) and is responsible for transforming raw forecast grids
from the acquisition layer into compressed binary records in PostgreSQL. The service
operates on a global grid spanning latitudes −85° to 85° and longitudes
−179.75° to 179.75° at a uniform resolution of 0.5°, although subsets
of this grid may be ingested when bounding-box windowing is applied at the acquisition
stage.
3.1 Orchestration
The top-level ingest_all() method orchestrates the ingestion of wave and
current data. Wind data is not ingested through this path because GFS downloads are
already fast enough to be cached on the filesystem (via the GRIB cache described in
Weather Data Acquisition) and served directly to
the frontend forecast timeline. Wave and current data, by contrast, arrive as large
xarray datasets that benefit from database-backed compression and indexed retrieval.
For single-snapshot ingestion, ingest_waves() and ingest_currents()
fetch the current analysis (forecast hour 0) and store a single grid per parameter. For
multi-timestep forecast ingestion, ingest_wave_forecast_frames() and
ingest_current_forecast_frames() accept a dictionary mapping forecast hours
to WeatherData objects and store all 41 timesteps (0, 3, 6, …, 120h)
under a single forecast run identifier.
3.2 Database Schema
The ingestion service writes to two PostgreSQL tables that together form a two-level
hierarchy: forecast runs and grid data. The weather_forecast_runs table
records metadata about each ingestion event, while weather_grid_data stores
the actual compressed arrays with a foreign key to the parent run.
-- Forecast run metadata
CREATE TABLE weather_forecast_runs (
id SERIAL PRIMARY KEY,
source VARCHAR(50) NOT NULL, -- 'CMEMS_wave', 'CMEMS_current', 'GFS', 'ERA5'
run_time TIMESTAMP NOT NULL, -- Model initialisation time
status VARCHAR(20) NOT NULL, -- 'ingesting', 'complete', 'superseded'
grid_resolution FLOAT, -- 0.5 (degrees)
lat_min FLOAT,
lat_max FLOAT,
lon_min FLOAT,
lon_max FLOAT,
forecast_hours INTEGER, -- Number of timesteps ingested
ingested_at TIMESTAMP DEFAULT NOW()
);
-- Compressed grid arrays
CREATE TABLE weather_grid_data (
id SERIAL PRIMARY KEY,
run_id INTEGER REFERENCES weather_forecast_runs(id),
forecast_hour INTEGER NOT NULL, -- 0, 3, 6, ..., 120
parameter VARCHAR(50) NOT NULL, -- 'wave_hs', 'wind_u', etc.
lats BYTEA NOT NULL, -- zlib-compressed float32 1D array
lons BYTEA NOT NULL, -- zlib-compressed float32 1D array
data BYTEA NOT NULL, -- zlib-compressed float32 2D array (flattened)
shape_rows INTEGER NOT NULL, -- Number of latitude points
shape_cols INTEGER NOT NULL, -- Number of longitude points
UNIQUE(run_id, forecast_hour, parameter)
);
The UNIQUE(run_id, forecast_hour, parameter) constraint ensures that each
combination of forecast run, timestep, and meteorological parameter is stored exactly
once. This enables idempotent re-ingestion: if an ingestion is interrupted and restarted,
the service can safely re-insert rows without creating duplicates.
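One way to achieve this idempotency is an insert that defers to the unique constraint on conflict. The sketch below uses psycopg2 and ON CONFLICT DO NOTHING; the actual service may use a different driver or conflict policy.

```python
import psycopg2

INSERT_GRID = """
    INSERT INTO weather_grid_data
        (run_id, forecast_hour, parameter, lats, lons, data, shape_rows, shape_cols)
    VALUES (%s, %s, %s, %s, %s, %s, %s, %s)
    ON CONFLICT (run_id, forecast_hour, parameter) DO NOTHING
"""

def store_grid(conn, run_id, fhour, parameter, lats_blob, lons_blob, data_blob, rows, cols):
    """Insert one compressed grid row. Re-running an interrupted ingestion is a
    no-op for rows that already exist, because the unique constraint absorbs the conflict."""
    with conn.cursor() as cur:
        cur.execute(INSERT_GRID, (
            run_id, fhour, parameter,
            psycopg2.Binary(lats_blob), psycopg2.Binary(lons_blob), psycopg2.Binary(data_blob),
            rows, cols,
        ))
    conn.commit()
```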
3.3 Compression
Before storage, each numpy array is converted to float32 and
compressed using zlib (deflate). The compression is applied independently to the latitude
vector, longitude vector, and data matrix for each (run, forecast_hour, parameter) tuple.
This approach allows selective decompression: for operations that only need the grid
geometry (e.g., determining spatial extent), the coordinate vectors can be decompressed
without touching the data blob.
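A sketch of the compression step on the 0.5° global grid; the helper name is illustrative.

```python
import zlib
import numpy as np

def _compress(arr: np.ndarray) -> bytes:
    """Cast to float32 and deflate. Applied independently to lats, lons and data,
    so the coordinate vectors can later be decompressed without the data blob."""
    return zlib.compress(np.ascontiguousarray(arr, dtype=np.float32).tobytes())

# Example for one (run, forecast_hour, parameter) tuple on the 0.5-degree global grid
lats = np.arange(-85.0, 85.01, 0.5)       # 341 latitude points
lons = np.arange(-179.75, 179.76, 0.5)    # 720 longitude points
field = np.sin(np.radians(lats))[:, None] * np.cos(np.radians(lons))[None, :]

blobs = {"lats": _compress(lats), "lons": _compress(lons), "data": _compress(field)}
ratio = field.astype(np.float32).nbytes / len(blobs["data"])   # smooth fields compress well
```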
Typical compression ratios for meteorological grids range from 3:1 to 8:1, depending on the spatial smoothness of the field. Wave height grids, which exhibit large-scale spatial coherence, compress more effectively than current grids, which contain fine-scale eddies and sharp coastal gradients.
3.4 Stored Parameters
The ingestion service stores 14 distinct meteorological parameters across three data
sources. Each parameter is stored as a separate row in weather_grid_data,
enabling the retrieval layer to load only the parameters required by the current
computation.
| Parameter | Source | Description | Unit |
|---|---|---|---|
| wind_u | GFS | Eastward wind component at 10 m | m/s |
| wind_v | GFS | Northward wind component at 10 m | m/s |
| wave_hs | CMEMS | Combined significant wave height | m |
| wave_tp | CMEMS | Peak wave period | s |
| wave_dir | CMEMS | Mean wave direction | deg |
| swell_hs | CMEMS | Swell significant wave height | m |
| swell_tp | CMEMS | Swell peak period | s |
| swell_dir | CMEMS | Swell mean direction | deg |
| windwave_hs | CMEMS | Wind-wave significant wave height | m |
| windwave_tp | CMEMS | Wind-wave peak period | s |
| windwave_dir | CMEMS | Wind-wave mean direction | deg |
| current_u | CMEMS | Eastward ocean surface current | m/s |
| current_v | CMEMS | Northward ocean surface current | m/s |
| current_speed | Derived | Current magnitude (derived from U/V) | m/s |
3.5 Run Supersession
To prevent unbounded growth of the database, the ingestion service implements a
supersession mechanism. After each successful ingestion, the
_supersede_old_runs() method marks all forecast runs older than 24 hours
as 'superseded'. Superseded runs are excluded from the retrieval layer's
run selection query, but their data remains in the database for potential historical
analysis until explicitly purged by an administrative process.
4. Data Retrieval and Decompression
The retrieval layer is implemented in DbWeatherProvider (module
db_weather_provider.py) and provides the interface between the compressed
PostgreSQL storage and the temporal interpolation layer. Its primary responsibilities
are: selecting the appropriate forecast run, matching forecast hours to target times,
decompressing grid data, and cropping to the voyage bounding box. The class exposes
three high-level methods — get_wind_from_db(),
get_wave_from_db(), and get_current_from_db() — as well
as a bulk retrieval method for multi-timestep pre-fetching.
4.1 Run Selection
The _find_latest_run(source) method selects the most recent forecast run
for a given data source (e.g., 'CMEMS_wave' or 'GFS'). The
selection criteria are:
- The run must have status = 'complete' (excluding in-progress and superseded runs).
- Among complete runs, the one with the most recent run_time is preferred.
- If multiple runs share the same run_time, the one with the most forecast_hours is selected, as it provides the greatest temporal coverage for multi-day voyages (see the query sketch below).
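These criteria map onto a single ordered query. The SQL below is a sketch consistent with the schema in Section 3.2; the actual statement in db_weather_provider.py may differ.

```python
FIND_LATEST_RUN = """
    SELECT id, run_time, forecast_hours
      FROM weather_forecast_runs
     WHERE source = %s
       AND status = 'complete'
     ORDER BY run_time DESC, forecast_hours DESC
     LIMIT 1
"""

def _find_latest_run(conn, source: str):
    """Return (run_id, run_time, forecast_hours) for the newest complete run, or None."""
    with conn.cursor() as cur:
        cur.execute(FIND_LATEST_RUN, (source,))
        return cur.fetchone()
```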
4.2 Forecast Hour Matching
Given a target time (e.g., the time at which a vessel will reach a particular waypoint),
the _best_forecast_hour(run_id, target_time) method identifies the closest
available forecast hour within the selected run. The method queries the distinct forecast
hours stored for the given run and returns the one that minimises the absolute difference
between the target time and the run initialisation time plus the forecast hour offset.
This snapping behaviour is adequate for the retrieval layer; finer temporal resolution
is handled by the interpolation layer described in Section 5.
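A minimal sketch of the snapping logic, assuming the distinct forecast hours for the run have already been fetched from the database.

```python
from datetime import datetime, timedelta

def _best_forecast_hour(run_time: datetime, available_hours: list[int],
                        target_time: datetime) -> int:
    """Return the stored forecast hour whose valid time is closest to target_time."""
    return min(
        available_hours,
        key=lambda h: abs((run_time + timedelta(hours=h)) - target_time),
    )

# Example: a 06z run queried for 14:30 the same day snaps to forecast hour 9
# _best_forecast_hour(datetime(2024, 5, 1, 6), list(range(0, 121, 3)),
#                     datetime(2024, 5, 1, 14, 30))  -> 9
```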
4.3 Decompression
The _load_grid() method reads a single compressed grid from the database
and reconstructs the original numpy arrays. The decompression path mirrors the compression
pipeline in reverse.
Two helper methods handle the dimensionality: _decompress_1d(blob) produces
a flat vector (used for latitude and longitude coordinate arrays), while
_decompress_2d(blob, rows, cols) produces a 2D matrix by reshaping the
flat buffer according to the stored shape_rows and shape_cols
metadata. The use of explicit shape metadata rather than inferring dimensions from the
blob size provides a consistency check: if the decompressed buffer length does not match
rows × cols, an error is raised.
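A sketch of the two decompression helpers, with the consistency check described above; the real implementations may differ in detail.

```python
import zlib
import numpy as np

def _decompress_1d(blob: bytes) -> np.ndarray:
    """Inflate a compressed float32 coordinate vector (latitudes or longitudes)."""
    return np.frombuffer(zlib.decompress(blob), dtype=np.float32)

def _decompress_2d(blob: bytes, rows: int, cols: int) -> np.ndarray:
    """Inflate a compressed float32 data matrix and restore its stored shape."""
    flat = np.frombuffer(zlib.decompress(blob), dtype=np.float32)
    if flat.size != rows * cols:   # consistency check against stored shape metadata
        raise ValueError(
            f"Decompressed buffer has {flat.size} values, expected {rows * cols}"
        )
    return flat.reshape(rows, cols)
```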
4.4 Bounding Box Cropping
After decompression, the grid is cropped to the voyage bounding box using boolean
masking. The _crop_grid() method constructs boolean index arrays for
the latitude and longitude dimensions and applies them to extract the rectangular
sub-grid corresponding to the voyage area. This approach handles both ascending and
descending latitude orderings, which differ between data sources (GFS stores latitudes
south-to-north; CMEMS stores them north-to-south).
The np.ix_ construct generates an open mesh from the two boolean vectors,
enabling extraction of the sub-matrix without flattening the 2D structure. For a
Mediterranean voyage (30–50°N, 5°W–35°E) on a 0.5° global
grid, cropping reduces the grid from 340 × 720 points to approximately 40 ×
80 points — a 77-fold reduction in memory footprint and proportional speedup in
subsequent interpolation operations.
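A simplified sketch of the cropping step described above; boolean masks work for either latitude ordering, and np.ix_ preserves the 2D structure of the extracted sub-grid.

```python
import numpy as np

def _crop_grid(lats, lons, data, lat_min, lat_max, lon_min, lon_max):
    """Extract the rectangular sub-grid covering the voyage bounding box."""
    lat_mask = (lats >= lat_min) & (lats <= lat_max)   # ascending or descending lats
    lon_mask = (lons >= lon_min) & (lons <= lon_max)
    cropped = data[np.ix_(lat_mask, lon_mask)]          # open mesh, keeps 2D structure
    return lats[lat_mask], lons[lon_mask], cropped

# Mediterranean example on the 0.5-degree global grid: ~341x720 -> ~41x81 points
```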
4.5 Bulk Retrieval
For multi-day voyage optimization, the temporal interpolation layer requires grids at
multiple forecast hours simultaneously. The get_grids_for_timeline() method
provides a single-pass bulk retrieval interface that loads all requested parameters across
all relevant forecast hours in one database round-trip. It returns a nested dictionary
structured as parameter → {forecast_hour → (lats, lons, data)},
which the temporal provider consumes directly. A specialized variant,
get_wave_grids_for_timeline(), pre-fetches all wave-related parameters
(wave, swell, wind-wave decomposition) in a single coordinated call.
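The nested result can be handed straight to the temporal provider. A hedged usage sketch follows; the keyword argument names (parameters, forecast_hours, bbox) are illustrative, as the article does not specify the method signature.

```python
# Shape of the bulk-retrieval result consumed by TemporalGridWeatherProvider:
#   grids[parameter][forecast_hour] = (lats, lons, data)
grids = db_provider.get_grids_for_timeline(
    parameters=["wave_hs", "wave_tp", "wave_dir", "current_u", "current_v"],
    forecast_hours=list(range(0, 121, 3)),   # 41 timesteps, 0-120 h
    bbox=(30.0, 50.0, -5.0, 35.0),           # argument names are illustrative
)

lats, lons, hs_grid = grids["wave_hs"][6]    # wave height field at forecast hour 6
```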
5. Temporal Weather Provisioning
The TemporalGridWeatherProvider (module temporal_weather_provider.py)
is the final stage of the pipeline and the direct interface consumed by the voyage
calculator and route optimizer. It receives the pre-fetched multi-timestep grids from
DbWeatherProvider and exposes a single method —
get_weather(lat, lon, time) — that returns a LegWeather
dataclass containing all meteorological fields needed by the vessel performance model.
The method performs trilinear interpolation across three axes: latitude, longitude, and
forecast hour.
5.1 Internal Data Structure
The provider is constructed with a run_time (the forecast model
initialisation time) and a grids dictionary with the structure
parameter → {forecast_hour → (lats, lons, data)}. Parameters
are grouped by type:
- Vector parameters — Wind (wind_u, wind_v) and current (current_u, current_v) are stored as U/V component pairs. Interpolation is performed independently on each component, and the magnitude and direction are computed from the interpolated components.
- Wave parameters — Combined wave (wave_hs, wave_tp, wave_dir), swell (swell_hs, swell_tp, swell_dir), and wind-wave (windwave_hs, windwave_tp, windwave_dir) are interpolated as scalar fields.
5.2 Trilinear Interpolation Algorithm
The core interpolation method, _interp_temporal(), brackets the requested
forecast hour between the two nearest available timesteps (h0 and
h1), performs bilinear spatial interpolation at each timestep independently,
and then blends the two results with a linear temporal weight:
\( h_0 \) = greatest available forecast hour \( \le fh \)
\( h_1 \) = smallest available forecast hour \( \ge fh \)
\( \alpha = \dfrac{fh - h_0}{h_1 - h_0}, \qquad v = (1 - \alpha)\, v_0 + \alpha\, v_1 \)
where \( v_0 \) and \( v_1 \) are the bilinearly interpolated values at forecast hours \( h_0 \) and \( h_1 \). When the query time falls exactly on an available forecast hour (i.e., \( h_0 = h_1 \)), the temporal blending reduces to a single spatial interpolation with \( \alpha = 0 \). At each timestep, the spatial interpolation uses bilinear interpolation on the regular latitude-longitude grid, locating the four surrounding grid cells and weighting by the fractional position within the cell.
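A condensed sketch of the bracketing and blending logic, assuming ascending coordinate axes and omitting the edge-case handling of the real _interp_temporal(); the bilinear helper is illustrative.

```python
import numpy as np

def _bilinear(lats, lons, grid, lat, lon):
    """Bilinear interpolation on a regular lat/lon grid (assumes ascending axes)."""
    i = np.clip(np.searchsorted(lats, lat) - 1, 0, lats.size - 2)
    j = np.clip(np.searchsorted(lons, lon) - 1, 0, lons.size - 2)
    ty = (lat - lats[i]) / (lats[i + 1] - lats[i])
    tx = (lon - lons[j]) / (lons[j + 1] - lons[j])
    return ((1 - ty) * (1 - tx) * grid[i, j] + (1 - ty) * tx * grid[i, j + 1]
            + ty * (1 - tx) * grid[i + 1, j] + ty * tx * grid[i + 1, j + 1])

def _interp_temporal(frames, lat, lon, fh):
    """frames: {forecast_hour: (lats, lons, data)}; returns the value at (lat, lon, fh)."""
    hours = sorted(frames)
    h0 = max((h for h in hours if h <= fh), default=hours[0])    # bracket below
    h1 = min((h for h in hours if h >= fh), default=hours[-1])   # bracket above
    v0 = _bilinear(*frames[h0], lat, lon)
    if h0 == h1:
        return v0                                    # query falls on a stored timestep
    v1 = _bilinear(*frames[h1], lat, lon)
    alpha = (fh - h0) / (h1 - h0)
    return (1 - alpha) * v0 + alpha * v1             # linear temporal blend
```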
5.3 Vector Parameter Handling
For vector quantities (wind and current), interpolation is performed on the Cartesian components (U and V) rather than on magnitude and direction. This avoids the discontinuity and non-linearity that would arise from interpolating angular quantities directly. After interpolation, the provider computes the derived scalars: the speed as the vector magnitude \( \sqrt{u^2 + v^2} \), and the direction from the components via atan2.
This convention produces meteorological direction (the direction from which the wind or current is coming), consistent with standard maritime practice where a 270° wind blows from the west.
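A sketch of this conversion, with the coastal NaN guard from Section 5.4 folded in; the function name is illustrative and the real provider may structure this differently.

```python
import math

def uv_to_speed_direction(u: float, v: float) -> tuple[float, float]:
    """Convert interpolated U/V components to speed (m/s) and meteorological
    direction (degrees, the direction the flow is coming FROM)."""
    if not (math.isfinite(u) and math.isfinite(v)):
        u, v = 0.0, 0.0                              # coastal NaN fallback (Section 5.4)
    speed = math.hypot(u, v)
    direction = (270.0 - math.degrees(math.atan2(v, u))) % 360.0
    return speed, direction

# A pure westerly flow (u > 0, v = 0) yields direction 270 deg: "wind from the west"
```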
5.4 NaN and Coastal Cell Handling
Coastal grid cells in ocean models frequently contain NaN values where the model grid intersects land. Rather than propagating NaN through the interpolation (which would cause the vessel model to fail), the provider implements a fallback strategy: any interpolated value that is NaN or infinite is replaced with 0.0. This is physically reasonable for coastal cells where the query point is near land: wind and current speeds approach zero in sheltered waters, and wave heights are attenuated by shallow-water effects. The fallback ensures that the optimizer never encounters undefined values while traversing coastal waypoints.
5.5 End-to-End Data Flow
The complete data flow from external API to route optimizer is summarized below:
+------------------+ +------------------+ +--------------------+
| CMEMS / GFS | | PostgreSQL DB | | DbWeatherProvider |
| (External API) | --> | weather_forecast | -> | (Decompress + |
| | | _runs / _data | | Crop to BBox) |
+------------------+ +------------------+ +--------------------+
|
v
+----------------------------+---------------------+
| TemporalGridWeatherProvider |
| - Brackets forecast hours (h0, h1) |
| - Bilinear spatial interpolation at each |
| - Linear temporal blend: v = v0*(1-a) + v1*a |
| - U/V -> speed/direction conversion |
+----------------------------+---------------------+
|
v
+----------------------------+---------------------+
| VoyageCalculator / RouteOptimizer (A*) |
| - get_weather(lat, lon, time) -> LegWeather |
| - Sub-millisecond per query |
+--------------------------------------------------+
5.6 Data Provenance and Confidence
Each weather query carries a WeatherProvenance dataclass that records
the lineage of the returned data. This metadata enables downstream consumers —
particularly the Monte Carlo uncertainty engine (see
Parametric Monte Carlo Simulation) — to weight
their confidence in the weather values and adjust perturbation magnitudes accordingly.
The provenance dataclass contains four fields:
| Field | Type | Description | Example Values |
|---|---|---|---|
| source_type | str | Category of the data source | "forecast", "hindcast", "climatology", "blended" |
| model_name | str | Name of the originating model | "GFS", "CMEMS_wave", "CMEMS_current", "ERA5" |
| forecast_lead_hours | float | Hours between model init and query time | 0.0 – 240.0 |
| confidence | str | Qualitative confidence level | "high", "medium", "low" |
Confidence Levels
The confidence field is assigned based on the forecast lead time:
- High (< 72 hours) — The forecast is within the range where NWP models exhibit skill significantly above climatology. Wind speed RMS errors are typically 1.5–2.5 m/s; significant wave height errors are 0.2–0.5 m.
- Medium (72–120 hours) — The forecast remains useful for synoptic-scale features (major storm systems, prevailing wind patterns) but mesoscale details are increasingly unreliable. The Monte Carlo engine increases perturbation magnitudes in this range.
- Low (> 120 hours or climatological data) — The forecast approaches climatological skill. For voyages extending beyond the 10-day forecast horizon (FORECAST_HORIZON_DAYS = 10), the system transitions to a blended mode that interpolates between the latest available forecast and climatological means over a 2-day blending window (BLEND_WINDOW_DAYS = 2). A sketch of this mapping appears after the list.
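A hedged sketch of how the confidence label could be derived from these thresholds; the constants match the article, but the exact rules in the codebase may differ.

```python
FORECAST_HORIZON_DAYS = 10
BLEND_WINDOW_DAYS = 2

def assign_confidence(forecast_lead_hours: float, source_type: str = "forecast") -> str:
    """Map forecast lead time to the qualitative confidence label used in WeatherProvenance."""
    if source_type == "climatology":
        return "low"
    if forecast_lead_hours < 72:
        return "high"
    if forecast_lead_hours <= 120:
        return "medium"
    return "low"    # beyond 120 h, or blended toward climatology past the 10-day horizon
```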
The route optimizer uses the confidence level to weight uncertainty in its cost function. Routes that rely heavily on low-confidence weather data incur a penalty that biases the optimizer toward paths with more reliable forecasts — typically shorter routes or routes that avoid weather-sensitive ocean regions in the extended forecast period.
6. Caching and Performance
Performance is critical at every stage of the pipeline, as the end-to-end latency directly affects the user experience: a route optimization request should complete within a few seconds, not minutes. WindMar employs a multi-level caching strategy that minimises redundant computation and data transfer at each stage.
6.1 Redis Caching for API Responses
The frontend API layer uses Redis to cache serialized weather responses. When the frontend requests wave forecast frames, wind velocity data, or current visualization grids, the API first checks the Redis cache for a recent result matching the requested parameters and geographic bounds. Cache entries are keyed by data source, forecast hour, and bounding box coordinates, with a time-to-live that aligns with the forecast model update cycle (typically 6 hours for GFS, 12 hours for CMEMS). This caching layer eliminates redundant database queries when multiple users view the same geographic region or when the frontend re-requests data after a page reload.
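A sketch of the cache-aside pattern described above, using redis-py; the key layout and TTL handling are illustrative assumptions.

```python
import json
import redis

r = redis.Redis(host="localhost", port=6379, db=0)

def cached_weather_response(source, forecast_hour, bbox, ttl_seconds, compute):
    """Return a cached serialized response, or compute it, cache it, and return it."""
    key = f"wx:{source}:{forecast_hour}:{bbox[0]}:{bbox[1]}:{bbox[2]}:{bbox[3]}"
    hit = r.get(key)
    if hit is not None:
        return json.loads(hit)                      # cache hit: skip the database entirely
    result = compute()                              # e.g. build forecast frames from the DB
    r.setex(key, ttl_seconds, json.dumps(result))   # TTL aligned with model update cycle
    return result
```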
6.2 Pre-fetch Strategy for Multi-Day Voyages
When a route optimization is initiated, the TemporalGridWeatherProvider
is constructed with all forecast hours that the voyage will span. The
get_grids_for_timeline() bulk retrieval method loads all parameters
across all relevant forecast hours in a single database transaction, rather than
issuing individual queries as each waypoint is evaluated. For a 5-day voyage with
14 parameters and 41 forecast hours, this reduces the number of database round-trips
from potentially thousands (one per node expansion) to a single bulk query.
6.3 Performance Characteristics
The following table summarizes the typical latency at each pipeline stage, measured on a standard development machine with PostgreSQL running locally:
| Stage | Operation | Typical Latency |
|---|---|---|
| Acquisition | Full GFS 41-frame download | ~82 seconds (rate-limited) |
| Acquisition | CMEMS wave forecast download | 30–120 seconds (varies) |
| Ingestion | Compress and store 41 timesteps × 14 params | 5–15 seconds |
| Retrieval | Bulk load all grids for a voyage | < 100 ms |
| Interpolation | Single get_weather() call | < 1 ms |
| Redis cache | Cached API response lookup | < 5 ms |
The key insight is the three-orders-of-magnitude gap between the acquisition stage (minutes) and the retrieval/interpolation stages (milliseconds). The database acts as an impedance-matching layer that absorbs the slow, bursty acquisition process and presents a fast, uniform interface to the optimizer. Once data has been ingested, route optimizations can run repeatedly against the same forecast data without incurring any network latency, and each A* node expansion pays only the sub-millisecond cost of trilinear interpolation against in-memory grids.
6.4 Pipeline Summary
From the user's perspective, the pipeline operates transparently. A background
ingestion task fetches and stores the latest forecasts on a scheduled cadence.
When the user requests a route optimization, the system selects the most recent
complete forecast run, loads the relevant grids into memory, constructs a
TemporalGridWeatherProvider, and passes it to the A* optimizer.
The optimizer calls get_weather(lat, lon, time) at each candidate
waypoint, receiving interpolated conditions in sub-millisecond time. The total
wall-clock time for a route optimization is dominated by the A* graph search
itself, not by weather data retrieval.
References
- CMEMS, “Global Ocean Waves Analysis and Forecast — Product User Manual,” Copernicus Marine Environment Monitoring Service, 2024.
- CMEMS, “Global Ocean Physics Analysis and Forecast — Product User Manual,” Copernicus Marine Environment Monitoring Service, 2024.
- NCEP/NOAA, “GFS Model Documentation and GRIB2 Format Specification,” National Centers for Environmental Prediction, 2023.
- Hersbach, H. et al., “The ERA5 Global Reanalysis,” Quarterly Journal of the Royal Meteorological Society, vol. 146, no. 730, pp. 1999–2049, 2020.
- PostgreSQL Global Development Group, “PostgreSQL Documentation — Large Objects and Binary Data,” 2024.
- Hoyer, S. and Hamman, J., “xarray: N-D labeled arrays and datasets in Python,” Journal of Open Research Software, vol. 5, no. 1, 2017.