Sebastian Galkin

Staff Engineer

5 posts

Evolving our Tensor Storage Engine: A Preview of Icechunk 2

Earthmover is building the cloud platform for scientific data, focusing on weather, climate and geospatial use cases. In these domains, tensors, not tables, are the ideal data model. We have devoted major engineering effort for the past year to Icechunk, our open-source transactional tensor storage

Sebastian Galkin

Staff Engineer

Everything you need to know about Icechunk garbage collection

We will talk about two powerful Icechunk operations: expiration and garbage collection. They are related, so we usually refer to both under the name of garbage collection or simply GC. We will explain what each of them does, why you may want to use them, and how to do it safely and effectively. The

Sebastian Galkin

Staff Engineer

Icechunk: Efficient storage of versioned array data

We recently got an interesting question in Icechunk's community Slack channel (thank you Iury Simoes-Sousa for motivating this post): I'm new to Icechunk. How is the storage managed for redundant information between different versions of a data repository? Icechunk keeps your data versioned, allowin

Sebastian Galkin

Staff Engineer

Learning about Icechunk consistency with a clichéd but instructive example

In this post we'll show what can happen when more than one process writes to the same Icechunk repository concurrently, and how Icechunk uses transactions and conflict resolution to guarantee consistency. For this, we'll use a commonplace example: bank account transfers. This is not a problem you wou

Sebastian Galkin

Staff Engineer

Exploring Icechunk scalability: untangling S3's prefix story

We at Earthmover recently released the Icechunk tensor storage engine, a novel cloud-optimized storage format and library for large-scale array data. Built on Rust’s tokio async runtime, Icechunk delivers impressive gains in performance over today’s array storage engines (e.g. Zarr V2, netCDF). The

Sebastian Galkin

Staff Engineer