Blog

Articles, announcements, and case studies from the Earthmover team.

Cursed venvs, Confident Releases: Testing Icechunk Across Major Versions

tl;dr: Technical details of how we do surgery on Python wheels in order to do cross-version compatibility testing. We're getting ready to release Icechunk V2, the next evolution of our tensor storage engine. People run real workloads on Icechunk V1. They're not all going to upgrade on the same day.

Ian Hunt-Isaak

Xarray Community Developer

Ditch the Data Pipeline: A Snow Alert Bot in an Afternoon

As a distributed team (of weather and climate geeks), we love finding ways to connect. So when a nationwide winter storm rolled through, we thought it would be fun to set up a Slack channel that tells us who it's snowing for right now. Here's how we built that in a single afternoon — without the pai

Ian Hunt-Isaak

Xarray Community Developer

Matt Iannucci

Engineering

Announcing the Earthmover Data Marketplace

At Earthmover, we’re on a mission to empower people to use scientific data to solve humanity’s greatest challenges. With today’s launch of the Earthmover Data Marketplace, we’re taking a huge step towards that goal. We’re launching the marketplace with a fantastic group of data providers, including

Ryan Abernathey

CEO & Co-founder

Evolving our Tensor Storage Engine: A Preview of Icechunk 2

Earthmover is building the cloud platform for scientific data, focusing on weather, climate and geospatial use cases. In these domains, tensors, not tables, are the ideal data model. We have devoted major engineering effort for the past year to Icechunk, our open-source transactional tensor storage

Sebastian Galkin

Staff Engineer

I/O-Maxing Tensors in the Cloud

The critical role of I/O in data science and AI/ML: For both analytics and AI workloads, fast I/O is the foundation of good performance. Most of these workloads involve fluxing a large amount of data from storage into RAM, and then to the CPU or GPU. In the cloud, where data reside on object storage,

Ryan Abernathey

CEO & Co-founder

Scientific Data Visualization with Xarray and Napari

This blog was also published on the xarray blog and the napari blog. TL;DR: Making Napari and Xarray work better together will enhance data visualization for both biology and geosciences. This has been long desired by the community but has not yet been implemented. At the SciPy 2025 sprints we formed

Ian Hunt-Isaak

Xarray Community Developer

Building the Future of Scientific Data at the Zarr Summit

Next week we are convening the Zarr community in Rome, Italy for a week of fast-paced collaboration and conversation. Given the recent acceleration of Zarr adoption across major data providers in Weather Forecasting, Earth Observation and Bioimaging, this in-person event is critical for aligning sta

Ryan Abernathey

CEO & Co-founder

Plotting NYC heatwaves during NYC Climate Week

Calculating climate risk metrics from ERA5 using Arraylake, Icechunk, and Xarray. Just show me the notebook! It’s here. Intro: This year at NYC Climate Week it’s been pretty hot and humid outside. This felt ironically appropriate while we ran a participatory workshop “Open Data in Applied Risk Analys

Tom Nicholas

Software Engineer

Earthmover’s $7.2M Seed Round led by Lowercarbon Capital

Today we announced Earthmover’s seed round fundraise, led by the amazing folks at Lowercarbon Capital, with participation from Costanoa Ventures (our pre-seed investor) and Preston-Werner Ventures (GitHub co-founder Tom Preston-Werner). Between Lowercarbon’s deep understanding of our target customer

Ryan Abernathey

CEO & Co-founder

The 3 Key Optimizations That Cut the Cost of AI Weather Forecasts by 90%

Note: this is the blog post version of a webinar we gave last month, summarized for brevity. At Earthmover, we’re closely following the AI revolution in weather forecasting. Until recently, the only way to make an accurate global-scale weather forecast was to run an expensive, physics-based numerica

Ryan Abernathey

CEO & Co-founder

Multi-Player Mode: Why Teams That Use Zarr Need Icechunk

Bring reliability, scalability, and version control to your Zarr datasets, without giving up performance. Zarr is a powerful protocol for storing large-scale, multi-dimensional arrays. It's fast, scalable, and cloud-native, which is why it's used across a variety of domains like climate science and

Lindsey Nield

Software Engineer

Radar DataTree: Transforming thousands of scans into a single cohesive model

From structure to scale, radar needs a model that organizes complete collections as time-aware, cloud-native datasets. In our second post, we looked at how new standards and open-source tools are transforming weather radar from raw binary blobs into structured, metadata-rich datasets. FM-301—an offi

Alfonso Ladino-Rincon

Data Scientist

Earthmover Sponsors Ocean Hack Week: Empowering the Open Science Community

Earthmover is proud to announce its sponsorship of Ocean Hack Week 2025, continuing a tradition of supporting hackweeks that have long served as vibrant spaces for the Pangeo Community to meet, share ideas, and innovate on solving real-world scientific challenges. Earthmover founders Ryan and Joe ha

Ryan Abernathey

CEO & Co-founder

From Files to Datasets: FM-301 and the Future of Radar Interoperability

At Earthmover, we’re interested in weather radar data for two reasons: - First, radar data are uniquely valuable for our customers thanks to their ability to characterize precipitation, atmospheric turbulence, and phenomena like tornadoes and hurricanes in real time with fine spatial and temporal re

Alfonso Ladino-Rincon

Data Scientist

Icechunk 1.0: Production-Grade Cloud-Native Array Storage Is Here

A year ago, we made an important internal decision which set Earthmover on a new course—we decided to refactor and open source our core technology for storing array-based data in the cloud. This took the form of the Icechunk project, an open source package and specification enabling database-style t

Ryan Abernathey

CEO & Co-founder

Meet the Earthmover Team at SciPy 2025 in Tacoma!

The Earthmover team is heading to the SciPy Conference on July 7-11 in Tacoma, Washington. We'll be there to talk about our recent open-source work on Xarray, Zarr, and Icechunk – including leading a tutorial, giving talks and presenting posters – and to spread the word about the Earthmover Platform

Joe Hamman

CTO & Co-founder

The Untapped Promise of Weather Radar Data

Weather radar is one of the most powerful observational tools in atmospheric science. Every few minutes, it captures reflectivity, among other variables, in a sample volume that is rotated forming a high-resolution, four-dimensional dataset (x,y,z,t) that tracks storms in real time, revealing fine-s

Alfonso Ladino-Rincon

Data Scientist

Ergonomic seasonal grouping and resampling in Xarray

At Earthmover, we contribute to maintaining and driving forward a range of community open-source projects including Xarray and Zarr. The following post, cross-posted from the Xarray developer blog, describes a new API for seasonal aggregation in Xarray. TL;DR: Two new Grouper objects - SeasonGrouper an

Deepak Cherian

Forward Deployed Engineer

Announcing Fine-Grained Access Controls

Today, we're thrilled to announce a significant enhancement to Arraylake: fine-grained, repository-level permissions! This highly anticipated feature empowers you with unprecedented control over who can access your valuable array data, and how. By popular request: As teams and organizations grow in s

Brian Davis

Software Engineer

Xarray for Biology

This was originally published on the Xarray blog: https://xarray.dev/blog/xarray-biology Hi! I'm Ian, a multimodal microscopist, and the new "Xarray Community Developer." I am funded by the Chan Zuckerberg Initiative to support the use of Xarray in biological and biomedical applications. I believe Xa

Ian Hunt-Isaak

Xarray Community Developer

Everything you need to know about Icechunk garbage collection

We will talk about two powerful Icechunk operations: expiration and garbage collection. They are related, so we usually refer to both under the name of garbage collection or simply GC. We will explain what each of them does, why you may want to use them, and how to do it safely and effectively. The

Sebastian Galkin

Staff Engineer

Fundamentals: What Is Zarr? A Cloud-Native Format for Tensor Data

Why scientists, data engineers, and developers are turning to Zarr: Often the biggest bottleneck in your workflow isn’t your code or your hardware, but the way your data is stored. Data formats can limit–or unlock–what you’re able to do with your data. In modern science and data-intensive computing,

Lindsey Nield

Software Engineer

Icechunk: Efficient storage of versioned array data

We recently got an interesting question in Icechunk's community Slack channel (thank you Iury Simoes-Sousa for motivating this post): I'm new to Icechunk. How is the storage managed for redundant information between different versions of a data repository? Icechunk keeps your data versioned, allowin

Sebastian Galkin

Staff Engineer

TensorOps: Scientific Data Doesn't Have to Hurt

Okay friends, it's time to take the Data Pain Survey: - Does it take your team three weeks or more to repurpose an existing dataset for a new data product? - Do you have team-A and team-B versions of the same dataset? - Do you miss delivery dates because you cannot estimate the work necessary to ser

Brian Davis

Software Engineer

Zarr takes Cloud-Native Geospatial by storm

Our takeaways from the Cloud-Native Geospatial conference on Zarr's surging adoption and its impact on the future of Earth Observation data. Our team just returned from an action-packed week at the Cloud-Native Geospatial conference in beautiful Snowbird, Utah, and the key takeaway was unmistakable:

Joe Hamman

CTO & Co-founder

Meet the Earthmover Team at the Cloud Native Geospatial Conference 2025!

The Earthmover team is heading to Snowbird, Utah for the Cloud Native Geospatial (CNG) Conference 2025 this week. For our part, we’ll be there to talk about our recent open-source work on Xarray, Zarr, and Icechunk – including leading a 3-hour workshop – and to spread the word about the Earthmover

Ryan Abernathey

CEO & Co-founder

Learning about Icechunk consistency with a clichéd but instructive example

In this post we'll show what can happen when more than one process writes to the same Icechunk repository concurrently, and how Icechunk uses transactions and conflict resolution to guarantee consistency. For this, we'll use a commonplace example: bank account transfers. This is not a problem you wou

Sebastian Galkin

Staff Engineer

Fundamentals: What is Cloud-Optimized Scientific Data?

Why naively lifting scientific data to the cloud falls flat. Scientific formats predate the cloud: There are exabytes of scientific data out in the wild, with more being generated every year. At Earthmover we believe the best place for it to reside is in the cloud, in object storage. Cloud platforms

Tom Nicholas

Software Engineer

Announcing Flux: The API Layer for Geospatial Data Delivery

TLDR: Earthmover’s new product, Flux, adds a whole new layer to our platform. Flux allows you to serve geospatial data from Arraylake via standard API protocols, including WMS (web map service), EDR (environmental data retrieval), and DAP, enabling frictionless integration with GIS applications, web appl

Ryan Abernathey

CEO & Co-founder

Exploring Icechunk scalability: untangling S3's prefix story

We at Earthmover recently released the Icechunk tensor storage engine, a novel cloud-optimized storage format and library for large-scale array data. Built on Rust’s tokio async runtime, Icechunk delivers impressive gains in performance over today’s array storage engines (e.g. Zarr V2, netCDF). The

Sebastian Galkin

Staff Engineer

Fundamentals: Tensors vs. Tables

At Earthmover, we are building the cloud platform for multidimensional array data (a.k.a. tensors). While the need for such a platform is obvious to practitioners in weather, climate, and geospatial data science (not to mention AI, where tensors reign supreme), folks from the mainstream data world o

Ryan Abernathey

CEO & Co-founder

How Our Customers Use NOAA Data

NOAA is under threat: The US National Oceanic and Atmospheric Administration (NOAA) is one of the world’s leading scientific agencies. NOAA has a three part mission: 1. To understand and predict changes in climate, weather, ocean and coasts. 2. To share that knowledge and information with others. 3.

Ryan Abernathey

CEO & Co-founder

Accelerating Xarray with Zarr-Python 3

zarr-python’s performance paradox: Last month, we released Zarr-Python 3.0 - a ground-up rewrite of the library (read more about it in this post). Beyond the exciting new features in Zarr V3, we put a lot of work into addressing some long-standing performance issues with Zarr-Python 2. With the improvements described in this blog post, we’ve achieved a 14x speedup in loading the ARCO ERA5 dataset! Zarr-Python 2 had a paradoxical performance quirk; although the library could generate massive petabyte-scale datasets, it struggled to perform well when managing large or highly nested hierarchies. For example, listing the contents of a large Zarr group could be painfully slow, particularly if that Zarr group was stored on a high-latency storage backend. Zarr users would experience this as long

Davis Bennett

Software Engineer

Zarr-Python 3 is here!

Note: This post was originally published on the Zarr developer blog. After more than a year of development, we’re thrilled to announce the release of Zarr-Python 3! This major release brings full support for the Zarr v3 specification, including the new chunk-sharding extension, major performance enh

Ryan Abernathey

CEO & Co-founder

Announcing Icechunk!

TLDR: We are excited to announce the release of the Icechunk storage engine, a new open-source library and specification for the storage of multidimensional array (a.k.a. tensor) data in cloud object storage. Icechunk works together with Zarr, augmenting the Zarr core data model with features that en

Ryan Abernathey

CEO & Co-founder

Vector data cubes in Xarray

This is a blog version of a webinar that took place on August 27, 2024. Geospatial datasets representing information about real-world features such as points, lines, and polygons are increasingly large, complex, and multidimensional. They are naturally represented as

Emma Marshall

Software Engineer

Case Study: ALIVE at The University of Wisconsin-Madison

Background: The University of Wisconsin-Madison is home to a research team called Advanced Baseline Imager Live Imaging of Vegetated Ecosystems (ALIVE). The team, working remotely and led by Prof. Paul Stoy, PhD, is building a gradient-boosting regression model using geostationary satellites to estim

Ryan Abernathey

CEO & Co-founder

A Serverless Approach to Building Planetary-Scale EO Datacubes in Zarr

This is a blog version of a webinar that took place on April 16, 2024. Earth Observation satellites generate massive volumes of data about our planet, and these data are vital for confronting global challenges. Satellite imagery is commonly distributed as individual “

Ryan Abernathey

CEO & Co-founder

Toward Zarr-Python 3.0

Note: This post was originally published on the Zarr developer blog. We released Zarr-Python 2.18.0 this week. Although this release was quite light in terms of user-facing changes, it represents the beginning of a new phase for the project. In this post, we’ll walk through our plan for Zarr-Python

Ryan Abernathey

CEO & Co-founder

Case Study: Sylvera

Situation Overview: Sylvera rates projects in the voluntary carbon market with the goal of enabling their customers to invest in the most meaningful initiatives. In order to produce these ratings, Sylvera relies on satellite imagery from providers such as Copernicus, USGS, and NASA. Prior to adopting

Ryan Abernathey

CEO & Co-founder

Cloud native data loaders for machine learning using Zarr and Xarray

We set up a high-performance PyTorch dataloader using data stored as Zarr in the cloud. Machine learning has become essential in the utilization of weather, climate, and geospatial data. Sophisticated models such as GraphCast, ClimaX, and Clay are emerging within these domains. The advancement of the

Ryan Abernathey

CEO & Co-founder

Earthmover and Pangeo at AGU 2023

December is here, and that means that thousands of Earth System Scientists are getting ready for the annual pilgrimage to AGU! At my first AGU in 2011, I was a fresh-faced grad student, excited to present my latest research on modeling Southern Ocean circulation. Since then, my relationship with thi

Ryan Abernathey

CEO & Co-founder

Arraylake Now Available in Private Beta

At Earthmover, we believe that scientific data are key to solving humanity’s greatest challenges. And we know that scientists today are struggling with tools that don’t understand scientific data formats and data models. For the past year, we’ve been hard at work building a platform to transform how

Ryan Abernathey

CEO & Co-founder

Earthmover is hiring

Note: the two founding engineer positions have now been filled. Check out our jobs page for current opportunities. How do we best utilize software and data to tackle our planet’s most urgent challenges? Being part of the answer to this question is why I am so excited about what we’re building at Eart

Ryan Abernathey

CEO & Co-founder

Why we started Earthmover

Earthmover is an early-stage startup building a platform for scientific data analytics in the cloud. Our mission is to empower our customers to use scientific data to address our planet’s most urge

Ryan Abernathey

CEO & Co-founder