chunk grid object should track array shape #3736

@d-v-b

summary

With the rectilinear chunk grid incoming, we need array shape information on the internal chunk grid object. So I propose adding it.

background

Our ChunkGrid class pretty narrowly models the chunk_grid field in zarr v3 array metadata. This means the ChunkGrid does not have any information about the shape of the array. In array metadata that information is stored in the top-level shape attribute.

For the RegularChunkGrid this wasn't a big deal because we can imagine the regular chunks extending to infinity. But for the RectilinearChunkGrid, the shape of the array is actually coupled to the chunk grid definition -- you shouldn't be able to create a chunk grid that fails to span the shape of the array (I need to check if we made this illegal in the spec...).
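To make the difference concrete, here is a minimal sketch. Both function names are hypothetical (nothing here is zarr-python API), and it assumes a rectilinear grid is described by explicit per-axis chunk edge lengths and that "spans" means the edges sum to at least the array extent on each axis; whether the spec demands exact equality is the open question mentioned above.

```python
import math


def regular_chunks_per_axis(array_shape, chunk_shape):
    """A regular grid fits any array shape: round up the chunk count per axis."""
    return tuple(math.ceil(s / c) for s, c in zip(array_shape, chunk_shape))


def rectilinear_spans(array_shape, chunk_edges):
    """Return True if the explicit per-axis edge lengths cover the array extent.

    Assumption: covering means sum(edges) >= extent on every axis.
    """
    if len(array_shape) != len(chunk_edges):
        return False
    return all(sum(edges) >= extent for extent, edges in zip(array_shape, chunk_edges))


print(regular_chunks_per_axis((10,), (3,)))     # (4,)
print(rectilinear_spans((10,), ((4, 4),)))      # False: edges only cover 8 of 10
print(rectilinear_spans((10,), ((4, 4, 2),)))   # True
```

The regular case never fails, whatever shape it is given, which is why the missing shape was easy to ignore. The rectilinear case can fail, so the grid definition and the array shape have to be checked against each other somewhere.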

proposed changes

In #3534, there are a lot of methods on the chunk grid that take an array_shape parameter, and each time this parameter has to be validated against the attributes of the RectilinearChunkGrid. It's far simpler to define the array_shape once up front and bind it to an attribute on RectilinearChunkGrid instances. So that's what I propose: we add an array_shape parameter to chunk grid construction and use it internally.
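A rough sketch of what binding the shape at construction could look like. The class `BoundRectilinearGrid`, its `chunk_edges` field, and the `grid_shape` and `chunk_slices` helpers are all hypothetical stand-ins, not the classes or methods from #3534; the validation rule (edges must sum to at least the extent) is likewise an assumption.

```python
from dataclasses import dataclass
from itertools import accumulate


@dataclass(frozen=True)
class BoundRectilinearGrid:
    """Hypothetical stand-in for a rectilinear grid that knows its array shape."""

    array_shape: tuple[int, ...]
    chunk_edges: tuple[tuple[int, ...], ...]  # per-axis chunk edge lengths

    def __post_init__(self):
        # Validate once, at construction, instead of in every method.
        if len(self.array_shape) != len(self.chunk_edges):
            raise ValueError("array_shape and chunk_edges must have the same number of axes")
        for axis, (extent, edges) in enumerate(zip(self.array_shape, self.chunk_edges)):
            if sum(edges) < extent:
                raise ValueError(
                    f"chunks on axis {axis} cover {sum(edges)} elements, array has {extent}"
                )

    @property
    def grid_shape(self):
        """Number of chunks along each axis."""
        return tuple(len(edges) for edges in self.chunk_edges)

    def chunk_slices(self, chunk_coords):
        """Array region covered by one chunk; no array_shape argument needed."""
        out = []
        for extent, edges, i in zip(self.array_shape, self.chunk_edges, chunk_coords):
            stops = list(accumulate(edges))
            start = stops[i] - edges[i]
            out.append(slice(start, min(stops[i], extent)))
        return tuple(out)


grid = BoundRectilinearGrid(array_shape=(10, 6), chunk_edges=((4, 4, 2), (3, 3)))
print(grid.grid_shape)            # (3, 2)
print(grid.chunk_slices((2, 1)))  # (slice(8, 10, None), slice(3, 6, None))
```

With the shape bound like this, an inconsistent grid fails at construction time, and the per-method array_shape arguments (and their repeated validation) go away.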

mild complication

With the simple change I propose, we will have an awkward situation with serialization: RegularChunkGrid(array_shape=(10,), chunk_shape=(1,)).to_dict() will return {"name": "regular", "configuration": {"chunk_shape": (1,)}}, i.e. no array_shape information. That sucks. So my proposal is to widen the type of the return value of the to_dict method to something like this:

{
    "shape": (10,),
    "chunk_grid": {"name": "regular", "configuration": {"chunk_shape": (1,)}}
}

i.e., a fragment of a v3 array metadata document that includes the array shape. Happy to amend this, but the basic idea is that IMO we need to put the array shape somewhere in the output.
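For illustration, a minimal sketch of the widened return value. `ShapeAwareRegularGrid` is a hypothetical class, not the existing RegularChunkGrid, and the `from_dict` counterpart is an assumption added only to show that the fragment round-trips.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ShapeAwareRegularGrid:
    """Hypothetical regular grid that carries the array shape."""

    array_shape: tuple[int, ...]
    chunk_shape: tuple[int, ...]

    def to_dict(self):
        # Widened output: a fragment of a v3 array metadata document,
        # not just the contents of the chunk_grid field.
        return {
            "shape": self.array_shape,
            "chunk_grid": {
                "name": "regular",
                "configuration": {"chunk_shape": self.chunk_shape},
            },
        }

    @classmethod
    def from_dict(cls, data):
        # Assumed inverse of to_dict, reading both top-level keys.
        config = data["chunk_grid"]["configuration"]
        return cls(
            array_shape=tuple(data["shape"]),
            chunk_shape=tuple(config["chunk_shape"]),
        )


grid = ShapeAwareRegularGrid(array_shape=(10,), chunk_shape=(1,))
fragment = grid.to_dict()
print(fragment)
print(ShapeAwareRegularGrid.from_dict(fragment) == grid)  # True
```

The point of the wider fragment is that serialization stops being lossy: everything passed to the constructor can be recovered from the output.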
