xarray/backends/zarr.py
Outdated
| # while dask chunks can be variable sized | ||
| # https://dask.pydata.org/en/latest/array-design.html#chunks | ||
| if var_chunks and not enc_chunks: | ||
| if zarr_format == 3: |
this check is probably not sufficient
I just pushed tags to my fork! |
|
thanks, I've changed the example back to using your fork |
xarray/backends/zarr.py
Outdated
| if any(len(set(chunks[:-1])) > 1 for chunks in var_chunks): | ||
| raise ValueError( | ||
| "Zarr requires uniform chunk sizes except for final chunk. " | ||
| "Zarr v2 requires uniform chunk sizes except for final chunk. " | ||
| f"Variable named {name!r} has incompatible dask chunks: {var_chunks!r}. " | ||
| "Consider rechunking using `chunk()`." | ||
| ) | ||
| if any((chunks[0] < chunks[-1]) for chunks in var_chunks): | ||
| raise ValueError( | ||
| "Final chunk of Zarr array must be the same size or smaller " | ||
| "Final chunk of a Zarr v2 array must be the same size or smaller " |
This is not correct - it's unfortunately not as simple as "Zarr V3 supports variable-length chunking but Zarr V2 doesn't".
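For reference, the two checks in this hunk can be exercised in isolation. This is a minimal sketch, assuming dask-style chunks (one tuple of chunk lengths per dimension); `check_uniform_chunks` is a hypothetical helper name and the second error message is illustrative, neither is taken from xarray. It deliberately says nothing about which Zarr format or zarr-python version lifts the restriction.

```python
def check_uniform_chunks(name, var_chunks):
    """Raise ValueError unless every dimension has uniform chunks.

    Hypothetical helper for illustration. `var_chunks` holds one tuple of
    chunk lengths per dimension, e.g. ((2, 2, 1), (3, 3)).
    """
    # All chunks except the last must share a single size.
    if any(len(set(chunks[:-1])) > 1 for chunks in var_chunks):
        raise ValueError(
            "Zarr requires uniform chunk sizes except for final chunk. "
            f"Variable named {name!r} has incompatible dask chunks: {var_chunks!r}. "
            "Consider rechunking using `chunk()`."
        )
    # The last chunk may be smaller than the first one, but never larger.
    if any(chunks[0] < chunks[-1] for chunks in var_chunks):
        raise ValueError(
            f"Final chunk of variable {name!r} is larger than the first chunk: "
            f"{var_chunks!r}."
        )
```

With this sketch, `((2, 2, 1), (3, 3))` passes, while `((2, 3, 1),)` fails the first check and `((2, 2, 3),)` fails the second.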
|
@keewis - zarr-developers/zarr-python#3534 is approaching a mergeable state. Curious if you want to take another pass through this PR before we merge it and provide any feedback. |
|
sure. Do you know if there's a specific tell on whether rectilinear chunks are available? So far I've been using I've posted a comment to the zarr PR (which doesn't seem to really affect the code here). Finally, I still need to figure out how to change |
| f"Variable named {name!r} has incompatible dask chunks: {var_chunks!r}. " | ||
| "Consider rechunking using `chunk()`." | ||
| "Consider rechunking using `chunk()`, or switching to the " | ||
| "zarr v3 format with zarr-python>=3.2." |
I still struggle with accurately expressing the prerequisites for rectilinear chunk support. Maybe this is fine, but we could also ask for "rectilinear chunk support"?
| "zarr v3 format with zarr-python>=3.2." | |
| "zarr v3 format with enabled rectilinear chunk support." |
| dask = { git = "https://github.com/dask/dask" } | ||
| distributed = { git = "https://github.com/dask/distributed" } | ||
| zarr = { git = "https://github.com/zarr-developers/zarr-python" } | ||
| zarr = { git = "https://github.com/jhamman/zarr-python", branch = "feature/rectilinear-chunk-grid" } |
revert before merging:
| zarr = { git = "https://github.com/jhamman/zarr-python", branch = "feature/rectilinear-chunk-grid" } | |
| zarr = { git = "https://github.com/zarr-developers/zarr-python" } |
|
@keewis would you be willing to push xarray tags to your fork (https://github.com/keewis/xarray/tags)? This would enable trying out your PR's integration in VirtualiZarr |
done |
maxrjones
left a comment
Some rough suggestions from trying this out in virtual-tiff/VirtualiZarr regarding using chunk_grid.chunk_shape rather than .chunks
| chunks = tuple(zarr_array.chunks) | ||
| preferred_chunks = dict(zip(dimensions, chunks, strict=True)) |
| chunks = tuple(zarr_array.chunks) | |
| preferred_chunks = dict(zip(dimensions, chunks, strict=True)) | |
| chunk_grid = zarr_array.metadata.chunk_grid | |
| if has_chunk_grid_support and isinstance(chunk_grid, RegularChunkGrid): | |
| chunks = chunk_grid.chunk_shape | |
| preferred_chunks = dict(zip(dimensions, chunks, strict=True)) | |
| elif has_chunk_grid_support: | |
| # RectilinearChunkGrid or other non-regular grids — store the | |
| # full chunk_grid and skip preferred_chunks since there's no | |
| # single chunk size per dimension | |
| chunks = chunk_grid | |
| preferred_chunks = {} | |
| else: | |
| # Fallback for older zarr-python without chunk_grid support | |
| chunks = tuple(zarr_array.chunks) | |
| preferred_chunks = dict(zip(dimensions, chunks, strict=True)) |
This suggestion adds support for RectilinearChunkGrids, which do not implement .chunks. It relies on private API, so probably worth advocating for Regular/RectilinearChunkGrid to be made public after the main PR lands
| try: | ||
| from zarr import RectilinearChunks, RegularChunks # noqa: F401 | ||
| | ||
| has_variable_chunk_support = True | ||
| except ImportError: | ||
| has_variable_chunk_support = False |
| try: | |
| from zarr import RectilinearChunks, RegularChunks # noqa: F401 | |
| has_variable_chunk_support = True | |
| except ImportError: | |
| has_variable_chunk_support = False | |
| try: | |
| from zarr.core.chunk_grids import RegularChunkGrid | |
| has_chunk_grid_support = True | |
| except ImportError: | |
| has_chunk_grid_support = False |
Used for the variable chunk grid support later on, see note there about making it public API
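An alternative way to write this kind of feature flag, in case it helps the discussion: a minimal sketch that checks for a name without binding it at import time. `has_attribute` is a hypothetical helper, and the module path passed to it is simply the one from the suggestion above, which may change if the classes become public.

```python
import importlib


def has_attribute(module_name, attribute):
    """Return True if `module_name` imports and exposes `attribute`."""
    try:
        module = importlib.import_module(module_name)
    except ImportError:
        # Covers a missing package as well as a missing submodule.
        return False
    return hasattr(module, attribute)


# Assumption: same private module path as in the suggestion above.
has_chunk_grid_support = has_attribute("zarr.core.chunk_grids", "RegularChunkGrid")
```

Unlike the `try`/`from ... import` form, this leaves nothing unused in the namespace, so no `noqa: F401` is needed; the trade-off is that the class still has to be imported separately where it is used.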
| # while dask chunks can be variable sized | ||
| # https://dask.pydata.org/en/latest/array-design.html#chunks | ||
| if var_chunks and not enc_chunks: | ||
| if zarr_format == 3 and has_variable_chunk_support: |
| if zarr_format == 3 and has_variable_chunk_support: | |
| if zarr_format == 3 and has_chunk_grid_support: |
whats-new.rst
Building on top of zarr-developers/zarr-python#3534, this is a draft PR that allows writing variable-sized chunks to zarr.
To see this in action, try:
At the moment, this requires safe_chunks=False because I didn't change the chunk alignment machinery yet.
cc @d-v-b, @jhamman, @dcherian