Announcing the Earthmover Data Marketplace: Subscribe to ARCO datasets from ECMWF, NOAA, and more. Explore the marketplace .
Tag

#zarr

13 posts

I/O-Maxing Tensors in the Cloud

I/O-Maxing Tensors in the Cloud

The critical role of I/O in data science and AI/ML For both analytics and AI workloads, fast I/O is the foundation of good performance. Most of these workloads involve fluxing a large amount of data from storage into RAM, and then to the CPU or GPU. In the cloud, where data reside on object storage,

Ryan Abernathey
Ryan Abernathey

CEO & Co-founder

Building the Future of Scientific Data at the Zarr Summit

Building the Future of Scientific Data at the Zarr Summit

Next week we are convening the Zarr community in Rome, Italy for a week of fast-paced collaboration and conversation. Given the recent acceleration on Zarr adoption across major data providers in Weather Forecasting, Earth Observation and Bioimaging, this in-person event is critical for aligning sta

Ryan Abernathey
Ryan Abernathey

CEO & Co-founder

Multi-Player Mode: Why Teams That Use Zarr Need Icechunk

Multi-Player Mode: Why Teams That Use Zarr Need Icechunk

Bring reliability, scalability, and version control to your Zarr datasets, without giving up performance. Zarr is a powerful protocol for storing large-scale, multi-dimensional arrays. It's fast, scalable, and cloud-native, which is why it's used across a variety of domains like climate science and

Lindsey Nield
Lindsey Nield

Software Engineer

Radar DataTree: Transforming thousands of scans into a single cohesive model

Radar DataTree: Transforming thousands of scans into a single cohesive model

From structure to scale, radar needs a model that organizes complete collections as time-aware, cloud-native datasets. In our second post, we looked at how new standards and open-source tools are transforming weather radar from raw binary blobs into structured, metadata-rich datasets. FM-301—an offi

Alfonso Ladino-Rincon
Alfonso Ladino-Rincon

Data Scientist

Icechunk 1.0: Production-Grade Cloud-Native Array Storage Is Here

Icechunk 1.0: Production-Grade Cloud-Native Array Storage Is Here

A year ago, we made an important internal decision which set Earthmover on a new course—we decided to refactor and open source our core technology for storing array-based data in the cloud. This took the form of the Icechunk project, an open source package and specification enabling database-style t

Ryan Abernathey
Ryan Abernathey

CEO & Co-founder

Fundamentals: What Is Zarr? A Cloud-Native Format for Tensor Data

Fundamentals: What Is Zarr? A Cloud-Native Format for Tensor Data

Why scientists, data engineers, and developers are turning to Zarr Often the biggest bottleneck in your workflow isn’t your code or your hardware, but the way your data is stored. Data formats can limit–or unlock–what you’re able to do with your data. In modern science and data-intensive computing,

Lindsey Nield
Lindsey Nield

Software Engineer

Zarr takes Cloud-Native Geospatial by storm

Zarr takes Cloud-Native Geospatial by storm

Our takeaways from the Cloud-Native Geospatial conference on Zarr's surging adoption and its impact on the future of Earth Observation data. Our team just returned from an action-packed week at the Cloud-Native Geospatial conference in beautiful Snowbird, Utah, and the key takeaway was unmistakable:

Joe Hamman
Joe Hamman

CTO & Co-founder

Learning about Icechunk consistency with a clichéd but instructive example

Learning about Icechunk consistency with a clichéd but instructive example

In this post we'll show what can happen when more than one process write to the same Icechunk repository concurrently, and how Icechunk uses transactions and conflict resolution to guarantee consistency. For this, we'll use a commonplace example: bank account transfers. This is not a problem you wou

Sebastian Galkin
Sebastian Galkin

Staff Engineer

Fundamentals: What is Cloud-Optimized Scientific Data?

Fundamentals: What is Cloud-Optimized Scientific Data?

Why naively lifting scientific data to the cloud falls flat. Scientific formats predate the cloud There are exabytes of scientific data out in the wild, with more being generated every year. At Earthmover we believe the best place for it to reside is in the cloud, in object storage. Cloud platforms

Tom Nicholas
Tom Nicholas

Software Engineer

Accelerating Xarray with Zarr-Python 3

Accelerating Xarray with Zarr-Python 3

zarr-python’s performance paradox Last month, we released Zarr-Python 3.0 - a ground-up rewrite of the library (read more about it in this post). Beyond the exciting new features in Zarr V3, we put a lot of work into addressing some long standing performance issues with Zarr-Python 2. With the improvements described in this blog post, we’ve achieved a 14x speedup in loading the ARCO ERA5 dataset! Zarr-Python 2 had a paradoxical performance quirk; although the library could generate massive petabyte-scale datasets, it struggled to perform well when managing large or highly nested hierarchies. For example, listing the contents of a large Zarr group could be painfully slow, particularly if that Zarr group was stored on a high latency storage backend. Zarr users would experience this as long

Davis Bennet
Davis Bennet

Software Engineer

Zarr-Python 3 is here!

Zarr-Python 3 is here!

Note: This post was originally published on the Zarr developer blog. After more than a year of development, we’re thrilled to announce the release of Zarr-Python 3! This major release brings full support for the Zarr v3 specification, including the new chunk-sharding extension, major performance enh

Ryan Abernathey
Ryan Abernathey

CEO & Co-founder

Announcing Icechunk!

Announcing Icechunk!

TLDR We are excited to announce the release of the Icechunk storage engine, a new open-source library and specification for the storage of multidimensional array (a.k.a. tensor) data in cloud object storage. Icechunk works together with Zarr, augmenting the Zarr core data model with features that en

Ryan Abernathey
Ryan Abernathey

CEO & Co-founder

Toward Zarr-Python 3.0

Toward Zarr-Python 3.0

Note: This post was originally published on the Zarr developer blog. We released Zarr-Python 2.18.0 this week. Although this release was quite light in terms of user-facing changes, it represents the beginning of a new phase for the project. In this post, we’ll walk through our plan for Zarr-Python

Ryan Abernathey
Ryan Abernathey

CEO & Co-founder