Description
summary
With the rectilinear chunk grid incoming, we need array shape information on the internal chunk grid object. So I propose adding it.
background
Our ChunkGrid class pretty narrowly models the chunk_grid field in zarr v3 array metadata. This means the ChunkGrid does not have any information about the shape of the array. In array metadata that information is stored in the top-level shape attribute.
For the RegularChunkGrid this wasn't a big deal because we can imagine the regular chunks extending to infinity. But for the RectilinearChunkGrid, the shape of the array is actually coupled to the chunk grid definition -- you shouldn't be able to create a chunk grid that fails to span the shape of the array (I need to check if we made this illegal in the spec...).
proposed changes
In #3534, there are a lot of methods on the chunk grid that take an array_shape parameter, and each time this parameter has to be validated against the attributes of the RectilinearChunkGrid. It's far simpler to define the array_shape once up front and bind it to an attribute on RectilinearChunkGrid instances. So that's what I propose: we add an array_shape parameter to chunk grid construction and use it internally.
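To make the "validate once, bind to an attribute" idea concrete, here is a minimal sketch. The class name `RectilinearGridSketch`, the `chunk_sizes` field, and the `grid_shape` method are hypothetical stand-ins, not the actual classes or signatures from #3534. The spanning rule used here (chunk lengths along each dimension must sum to at least the array extent) is also an assumption, since the exact constraint still needs checking against the spec.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass(frozen=True)
class RectilinearGridSketch:
    """Hypothetical illustration only; not the class from the PR."""

    array_shape: tuple[int, ...]
    # Assumed layout: one tuple of chunk lengths per array dimension.
    chunk_sizes: tuple[tuple[int, ...], ...]

    def __post_init__(self) -> None:
        # Validate the array shape against the grid once, at construction.
        if len(self.array_shape) != len(self.chunk_sizes):
            raise ValueError("array_shape and chunk_sizes differ in dimensionality")
        for dim, (extent, sizes) in enumerate(zip(self.array_shape, self.chunk_sizes)):
            # Assumed rule: the chunks must span the array along every dimension.
            if sum(sizes) < extent:
                raise ValueError(
                    f"chunks along dimension {dim} cover {sum(sizes)} elements, "
                    f"but the array extent is {extent}"
                )

    def grid_shape(self) -> tuple[int, ...]:
        # No array_shape argument needed: the shape was bound and checked up front.
        return tuple(len(sizes) for sizes in self.chunk_sizes)
```

With this shape of API, methods such as `grid_shape` no longer take or re-validate an `array_shape` argument, and an inconsistent grid fails at construction rather than at first use.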
mild complication
With the simple change I propose, we will have an awkward situation with serialization: RegularChunkGrid(array_shape=(10,), chunk_shape=(1,)).to_dict() will return {"name": "regular", "configuration": {"chunk_shape": (1,)}}, i.e. no array_shape information. That sucks. So my proposal is to widen the type of the return value of the to_dict method to something like this:
{
"shape": (10,),
"chunk_grid": {"name": "regular", "configuration": {"chunk_shape": (1,)}}
}
i.e., a fragment of a v3 array metadata document that includes the array shape. I'm happy to amend this, but the basic idea is that, IMO, we need to put the array shape somewhere in the output.
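As a rough sketch of the widened return value, assuming a hypothetical `RegularGridSketch` class (not the real `RegularChunkGrid`), `to_dict` could emit the metadata fragment like this:

```python
from __future__ import annotations

from dataclasses import dataclass
from typing import Any


@dataclass(frozen=True)
class RegularGridSketch:
    """Hypothetical illustration only; not zarr's RegularChunkGrid."""

    array_shape: tuple[int, ...]
    chunk_shape: tuple[int, ...]

    def to_dict(self) -> dict[str, Any]:
        # Return a fragment of a v3 array metadata document so that the
        # array shape is preserved alongside the chunk grid definition.
        return {
            "shape": self.array_shape,
            "chunk_grid": {
                "name": "regular",
                "configuration": {"chunk_shape": self.chunk_shape},
            },
        }


fragment = RegularGridSketch(array_shape=(10,), chunk_shape=(1,)).to_dict()
```

One consequence of this design is that callers who only want the `chunk_grid` field would need to index into the fragment, so the widened return type is a visible change for any existing consumer of `to_dict`.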