
Icechunk Adopted by the National Weather Service: Earthmover Joins Booz Allen on NWS CIRRUS

Joe Hamman

CTO & Co-founder

We’re excited to share that Earthmover has teamed up with Booz Allen Hamilton on NWS CIRRUS — NOAA’s new cloud-based operational data platform for the National Weather Service. Icechunk, Earthmover’s open-source tensor storage engine, will serve as a core data format for the operational data lake.

The US National Weather Service (NWS) plays a critical role in our nation’s economy, emergency management, and resilience, providing forecasts with operational reach far beyond the weather apps on our phones and the TV forecast on our local news. NWS forecasts are a core element of our national infrastructure. They are incorporated into the operations of federal agencies as diverse as the FAA, DoD, USCG, FEMA, USDA, EPA, NASA, and the Department of the Interior, alongside every US state and territorial government, for use cases that range from real-time emergency warnings and disaster response to multi-week energy and agricultural planning. In addition, the NWS has estimated that private sector entities generate tens of billions in commercial value annually on top of their forecasts.

At the core of the current NWS system is a decades-old software platform called the Advanced Weather Interactive Processing System (AWIPS), which brings together all of the data that hundreds of NWS forecasters need to do this work. Earlier this year, NOAA announced that it had awarded contracts for two complementary cloud systems that together will replace the legacy AWIPS infrastructure NWS forecasters have relied on since the 1990s:

  • NWS HIVE (Hydrometeorological Interactive Virtual Environment) — the new application environment where forecasters analyze data and issue forecast products.
  • NWS CIRRUS (Centralized Integrated Real-Time Repository for Unified Services) — the centralized, cloud-based repository for NWS-owned and partner data.

Booz Allen was recently selected to lead the development of the NWS CIRRUS platform, and Earthmover is proud to be a key part of the team that will transform how weather data is processed, analyzed, and delivered using cloud technology, advanced data engineering, and AI. Phase 1 is targeted for early 2027, with full decommissioning of legacy AWIPS in early 2028.

The Data Underneath the Forecast

Many NWS operations run on data represented as multidimensional arrays (a.k.a. tensors). Numerical model output, satellite imagery, radar volumes, and data from observation networks are all multidimensional, time-evolving arrays. Volumes are growing fast, and access patterns span everything from a single forecaster pulling a regional subset to AI models training on the entire global historical archive, all while the data has to be available continuously to operations, partners, and the broader weather enterprise.

That specific combination of high-volume tensor data, diverse access patterns, continuous updates, and high-availability requirements calls for a purpose-built infrastructure layer. The proposed architecture for NWS CIRRUS follows best practices for cloud-native weather data: a single system, with Icechunk at its core, that ingests, versions, and serves operational weather data to forecasters, partners, and downstream applications.

Solutions Landscape

The data layer for operational weather isn’t a new problem. NWS and the broader community have been solving it for decades, and modern cloud architectures offer two reasonable starting points, one focused on traditional HPC data systems and the other focused on tabular data applications. Both fall short of what CIRRUS needs because they were built to solve entirely different problems.

Today’s operational weather data ecosystem is built on file formats like NetCDF, GRIB, and HDF5. These formats are the lingua franca of the field, with mature tooling, deep community expertise, and decades of operational track record. They’re what AWIPS runs on today, and they will remain a part of how weather data is exchanged between agencies and downstream users for the foreseeable future. But those formats were designed for POSIX file systems, before cloud object storage was widely used. As a result, archives built from these files perform poorly when read directly from cloud object storage, have no archive-wide notion of atomic updates or versioning, and encourage users to download files locally before working with them.

The data engineering world, meanwhile, has built genuinely impressive cloud-native infrastructure around tabular data lakes. Open table formats like Apache Iceberg and Delta Lake, layered on Parquet and on cloud object storage from providers like AWS, Azure, and Google, have transformed analytics for log data, business records, and other structured datasets. The trouble is that weather data isn’t tabular. Flattening multidimensional arrays into rows breaks the spatial and temporal locality that makes weather queries efficient, and turns common operations like visualizing data on a map, regional subsetting, and full-archive AI training into expensive joins or full table scans. What modern weather services need is the cloud-native, transactional foundation of a modern data lake, applied to tensors instead of tables.
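To make the locality argument concrete, here is a minimal sketch in plain Python. Everything in it is an illustrative assumption: the grid size, the chunk shape, the helper names, and the very simplified model of a "flattened" table (one row per grid cell in time, lat, lon order, ignoring row groups, sorting, and partitioning that real table formats provide). It only compares how many chunks a regional subset touches with how many separate row ranges the same subset occupies once the array is flattened.

```python
from itertools import product

# Hypothetical grid: 365 daily steps on a 1-degree global lat/lon grid.
SHAPE = (365, 180, 360)   # (time, lat, lon)
CHUNKS = (30, 45, 90)     # illustrative chunk shape, not a recommendation


def chunks_touched(selection, chunks=CHUNKS):
    """Return the chunk indices overlapped by a half-open box selection."""
    per_axis = [
        range(start // size, (stop - 1) // size + 1)
        for (start, stop), size in zip(selection, chunks)
    ]
    return set(product(*per_axis))


def total_chunks(shape=SHAPE, chunks=CHUNKS):
    """Number of chunks in the whole array (ceiling division per axis)."""
    n = 1
    for dim, size in zip(shape, chunks):
        n *= -(-dim // size)
    return n


def row_runs(selection):
    """Count contiguous row ranges the box occupies in the flattened table.

    Each (time, lat) pair contributes one run of consecutive lon rows,
    assuming the box does not span the full lon axis.
    """
    (t0, t1), (y0, y1), _lon = selection
    return (t1 - t0) * (y1 - y0)


# One month over a 20 x 40 degree region: (time, lat, lon) index ranges.
box = ((0, 30), (40, 60), (100, 140))

print(len(chunks_touched(box)), "of", total_chunks(), "chunks touched")
print(row_runs(box), "separate row ranges in the flattened table")
```

With these made-up numbers the regional subset falls inside 2 of 208 chunks, while the same cells are scattered across 600 disjoint row ranges in the flattened layout. Real table formats soften this with sorting and partitioning, but the underlying mismatch between a multidimensional box and a one-dimensional row order remains.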

Why NOAA Chose Icechunk

Earthmover built Icechunk to address the core challenges of operational weather data: overcoming the performance limits of legacy file formats and providing a cloud-native, transactional foundation specifically for high-volume tensor data.

Icechunk is an open-source transactional storage engine for Zarr. It adds the capabilities operational systems need and that file-based archives can’t provide:

Icechunk unlocks the cloud in an operational setting for NWS, moving beyond the limitations of file formats designed for file systems, and supporting a wide range of applications from HIVE to open data programs downstream of NWS. The same Zarr/Icechunk reads that work inside CIRRUS will work for a researcher in a Jupyter notebook, a private forecaster training an AI model, or a warning service in production.

A Foundation for AI-Era Weather Operations

NWS has been explicit about the operating model it is building toward: AI and ML throughout the forecast workflow (see AIGFS for a recent example), federated access across field offices, an untethered forecaster workforce, and faster, richer data delivery to the public and private weather enterprise. Underneath all of that, the data layer has to support concurrent reads and writes on petabyte-scale tensor data, with strong consistency, on cloud object storage. That’s what Icechunk does, and it’s what we’ve spent the last several years building and hardening for production.
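As a rough illustration of what "concurrent reads and writes with strong consistency" means in practice, the sketch below is a toy in-memory model of snapshot isolation with an atomically updated branch pointer. It is not Icechunk's API or implementation: the class name, method names, and chunk keys are all hypothetical, and a real engine stores snapshots on object storage rather than in a Python dict.

```python
import threading
from types import MappingProxyType


class ToySnapshotStore:
    """Toy versioned chunk store: immutable snapshots plus one head pointer."""

    def __init__(self):
        self._snapshots = {0: {}}      # snapshot id -> {chunk key: bytes}
        self._head = 0                 # id of the latest committed snapshot
        self._lock = threading.Lock()

    def read_session(self):
        """Pin the current head; later commits do not change this view."""
        with self._lock:
            return MappingProxyType(self._snapshots[self._head])

    def commit(self, parent, changes):
        """Publish all changes at once, or fail if the head has moved."""
        with self._lock:
            if parent != self._head:
                raise RuntimeError("conflict: head moved since this session started")
            new_id = self._head + 1
            # Build a new snapshot; earlier snapshots are never modified.
            self._snapshots[new_id] = {**self._snapshots[parent], **changes}
            self._head = new_id
            return new_id


store = ToySnapshotStore()
v1 = store.commit(0, {"t2m/c/0/0/0": b"00z"})

reader = store.read_session()          # pinned to v1

# A later forecast cycle rewrites one chunk and adds another in one commit.
v2 = store.commit(v1, {"t2m/c/0/0/0": b"06z", "t2m/c/1/0/0": b"06z"})

print(reader["t2m/c/0/0/0"])                 # still the 00z data
print(store.read_session()["t2m/c/0/0/0"])   # new sessions see 06z
```

The point of the design is that a reader never observes a half-written update: it sees either the snapshot it pinned or, in a new session, the complete next one. A writer whose parent snapshot is stale gets an explicit conflict instead of silently overwriting someone else's commit.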

Looking Ahead

We’ll share more about Earthmover’s role in CIRRUS as Phase 1 progresses. In the meantime, you can learn more about Icechunk, explore the Earthmover platform, or reach out to talk about what cloud-native tensor storage could mean for your team or agency.


Cover image credit: “NOAA National Severe Storms Laboratory”, licensed under CC BY 2.0. No changes were made.
