Implement updates tracking OWID grapher #6
Open
xrendan wants to merge 569 commits into BuildCanada:master from
Conversation
Looks like dead code we haven't needed for a while.
Co-authored-by: Claude Opus 4.6 <noreply@anthropic.com>
Bumps [axios](https://github.com/axios/axios) from 1.13.0 to 1.13.5.
- [Release notes](https://github.com/axios/axios/releases)
- [Changelog](https://github.com/axios/axios/blob/v1.x/CHANGELOG.md)
- [Commits](axios/axios@v1.13.0...v1.13.5)

---
updated-dependencies:
- dependency-name: axios
  dependency-version: 1.13.5
  dependency-type: indirect
...

Signed-off-by: dependabot[bot] <support@github.com>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
## Add ETL dataset dimensions to Algolia search index

This PR adds ETL dataset dimensions to the Algolia search index to enable faceting by dataset namespace, version, name, and producer. These dimensions are extracted from variable catalog paths and added to chart, explorer, and multi-dimensional view records. Producers come from the DB.

The changes:
- Extract dataset namespace, version, and name from catalog paths
- Extract producers from the DB
- Add these dimensions to all chart types in the search index
- Configure Algolia to include these new fields in attributes for faceting
- Add unit tests for the catalog path parsing function

## Testing guidance

Verify that chart records in the Algolia index now include the new fields:
- `datasetNamespaces`
- `datasetVersions`
- `datasetProducts`
- `dataProducers`

The easiest way is to use the preview feature added in the next PR up the stack, #5940.

- [x] Does the staging experience have sign-off from product stakeholders?
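For context, a catalog-path parser along the lines described could look roughly like this. This is a hypothetical sketch assuming paths of the shape `channel/namespace/version/dataset/table#column`; the actual parsing function in the PR may differ:

```typescript
// Hypothetical sketch only — not the PR's actual parser.
// Assumes catalog paths like "grapher/who/2024-01-01/flu/flu_tests#cases".
interface DatasetDimensions {
    datasetNamespace?: string
    datasetVersion?: string
    datasetName?: string
}

function parseCatalogPath(path: string): DatasetDimensions {
    // Drop the indicator fragment after "#", then split the remaining path.
    const [beforeHash] = path.split("#")
    const parts = beforeHash.split("/")
    // Need at least channel/namespace/version/dataset to extract dimensions.
    if (parts.length < 4) return {}
    const [, namespace, version, name] = parts
    return {
        datasetNamespace: namespace,
        datasetVersion: version,
        datasetName: name,
    }
}
```

A malformed or empty path simply yields an empty object, which matches the "catalog path never provides data that is missing in other fields" observation below: missing paths degrade gracefully rather than erroring.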
## Context
This PR indexes the full dataset product name.
It refactors how dataset dimensions (namespace, version, product, producers) are extracted and stored for charts and variables. It introduces database views to centralize and standardize this logic, replacing the previous approach of parsing catalog paths and making multiple queries.
```sql
SELECT
SUM(datasetNamespace IS NULL AND catalogPath IS NOT NULL) AS namespace_missing_catalog_present,
SUM(datasetVersion IS NULL AND catalogPath IS NOT NULL) AS version_missing_catalog_present,
SUM(datasetProduct IS NULL AND catalogPath IS NOT NULL) AS product_missing_catalog_present
FROM dataset_dimensions_by_variable;
```
| namespace_missing_catalog_present | version_missing_catalog_present | product_missing_catalog_present |
| --- | --- | --- |
| 0 | 0 | 0 |
→ The catalog path never provides data that is missing from the other fields.
The reverse is not true:
```sql
SELECT
SUM(datasetNamespace IS NOT NULL AND catalogPath IS NULL) AS namespace_present_catalog_missing,
SUM(datasetVersion IS NOT NULL AND catalogPath IS NULL) AS version_present_catalog_missing,
SUM(datasetProduct IS NOT NULL AND catalogPath IS NULL) AS product_present_catalog_missing
FROM dataset_dimensions_by_variable;
```
| namespace_present_catalog_missing | version_present_catalog_missing | product_present_catalog_missing |
| --- | --- | --- |
| 154106 | 25643 | 154106 |
_So, with this change, we are getting more coverage for these dataset dimensions across records. Note that I didn't check to what extent these vars with missing catalog paths are actually making it into records (currently or in the future). It might well be that the vast majority are not used in charts at this point, so this side exploration is to be taken with a grain of salt._
📝 datasetNamespace still points to short names: namespace descriptions are not consistently present, only their short names are.
### About indexing time
I tried a few different things to optimize but always ended up with more complexity (which is not without a cost).
On the other hand, +2min out of what is effectively a week-long job is irrelevant. The only real consequence would be for "emergency" re-indexing, which is not a thing.
So I favored less complexity over an optimization that has no real business value at this point.
## Testing guidance
1. Deploy the migration to create the new database views
2. Verify that Algolia indexing still works correctly for:
- Charts
- Explorer views
- Multi-dimensional views
3. Check that dataset dimensions are correctly displayed in search results
4. Check that the full product name is indexed
- [x] Does the staging experience have sign-off from product stakeholders?
## Context
This PR adds API endpoints to preview Algolia search index records for charts, explorers, and multi-dim visualizations. These endpoints allow developers to see exactly what data will be indexed for search without having to run the full indexing process.
## Technical Details
The PR refactors the Algolia indexing utilities to:
1. Create shared context objects for bulk indexing operations
2. Support filtering by specific IDs for preview purposes (not optimized, but performance is acceptable for debugging)
3. Expose record generation functions through new API endpoints
Note: I started with bigger ambitions to optimize both the single preview and bulk generation paths, but bailed at the extent of the changes needed for a strict ETL/ELT/EtLT pipeline. I left the shared context in more as a code structure improvement than a performance optimization (which is negligible compared to other parts of the indexing process).
Also, the branch name is misleading - this is only doing preview, and not indexing.
New API endpoints added:
- `/charts/:chartId/records` - Preview search records for a specific chart
- `/explorers/:slug/records` - Preview search records for a specific explorer
- `/multi-dims/:id/records` - Preview search records for a specific multi-dimensional visualization
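For illustration, the three endpoint paths above can be captured in a small URL builder. The helper name, host parameter, and target type are hypothetical; only the paths come from this PR:

```typescript
// Illustrative only: building preview-record URLs for the new admin endpoints.
type PreviewTarget =
    | { kind: "chart"; chartId: number }
    | { kind: "explorer"; slug: string }
    | { kind: "multi-dim"; id: number }

function previewRecordsUrl(base: string, target: PreviewTarget): string {
    switch (target.kind) {
        case "chart":
            return `${base}/admin/api/charts/${target.chartId}/records`
        case "explorer":
            return `${base}/admin/api/explorers/${target.slug}/records`
        case "multi-dim":
            return `${base}/admin/api/multi-dims/${target.id}/records`
    }
}
```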
## Testing guidance
Examples:
- For charts: http://staging-site-index-single-chart/admin/api/charts/930/records
- For explorers:
- grapher-based: http://staging-site-index-single-chart/admin/api/explorers/co2/records (grapher views excluded, like in the bulk indexing code, so returns empty)
- indicator-based: http://staging-site-index-single-chart/admin/api/explorers/influenza/records
- csv-based: http://staging-site-index-single-chart/admin/api/explorers/inequality/records
- For multi-dims: http://staging-site-index-single-chart/admin/api/multi-dims/5238/records
- [ ] Does the staging experience have sign-off from product stakeholders?
Add a new CF Pages Function at /api/detect-country that uses the existing regions data as the source of truth. Update the client to use a relative URL and remove the old redirect. Closes #6101
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
## Context

This PR refactors the stacked chart components (StackedAreaChart and StackedBarChart) to improve code organization and consistency. The changes consolidate hover state management, simplify prop passing between components, and extract placement logic into utility functions.

Key technical changes:
- Moved chart placement logic from individual components to utility functions (`toPlacedStackedAreaSeries` and `toPlacedStackedBarSeries`)
- Consolidated hover state management using the existing `getHoverStateForSeries` utility
- Simplified component interfaces by pre-computing placed series data and hover states
- Removed redundant hover tracking properties and methods
- Updated type definitions to better represent placed chart elements

## Testing guidance

Test stacked area and bar charts to ensure hover interactions and visual rendering work correctly:
- [ ] Verify hover effects work on chart areas/bars, legends, and labels
- [ ] Check that focus states display properly when hovering over different chart elements
- [ ] Confirm tooltips appear correctly when hovering over chart elements
- [ ] Test both entity-based and column-based series strategies
- [ ] Verify charts render correctly in both full-size and thumbnail views
- [ ] Does the change work in the archive?
- [ ] Does the staging experience have sign-off from product stakeholders?

## Checklist

### Before merging

- [ ] Changes to CSS/HTML were checked on Desktop and Mobile Safari at all three breakpoints
## Context

This PR introduces a new `Emphasis` enum to replace the previous focus/hover state logic for visual styling across chart components. The change consolidates opacity and styling decisions into a centralized system that maps interaction states to visual emphasis levels (Default, Highlighted, Muted).

## Testing guidance

Test various chart types to ensure proper visual feedback during interactions:
- [ ] Hover over chart elements (bars, lines, areas) and verify opacity changes work correctly
- [ ] Test legend interactions - hovering should highlight/mute appropriate chart elements
- [ ] Verify focus states still work properly when selecting entities
- [ ] Check that connector lines in vertical labels update color appropriately
- [ ] Test discrete bar charts, line charts, stacked charts, slope charts, and scatter plots
- [ ] Does the change work in the archive?
- [ ] Does the staging experience have sign-off from product stakeholders?

## Checklist

### Before merging

- [ ] Changes to CSS/HTML were checked on Desktop and Mobile Safari at all three breakpoints
- [ ] Changes to HTML were checked for accessibility concerns
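A minimal sketch of how such an emphasis mapping might work. The three enum members come from this PR, but the resolver function and its inputs are assumptions, not the actual implementation:

```typescript
// Hypothetical sketch: mapping interaction state to a visual emphasis level.
enum Emphasis {
    Default,
    Highlighted,
    Muted,
}

// A series is highlighted if it is itself hovered/focused; muted if some
// other series is active; default when nothing is active.
function resolveEmphasis(state: {
    isActive: boolean
    anySeriesActive: boolean
}): Emphasis {
    if (state.isActive) return Emphasis.Highlighted
    if (state.anySeriesActive) return Emphasis.Muted
    return Emphasis.Default
}
```

The benefit of centralizing this is that each chart component only needs to translate an `Emphasis` value into opacity/color, instead of re-deriving it from raw hover and focus flags.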
## Context

This PR refactors the stacked discrete bar chart implementation to improve focus functionality and code organization. The main changes include:
- Enabling focus support for stacked discrete bar charts by removing the restriction in `EditorFeatures.tsx`
- Introducing a new `focusableSeriesNames` getter that includes both chart series and selected entity names for stacked discrete bar charts
- Consolidating the chart rendering logic by removing the separate `StackedDiscreteBars` component and integrating it directly into `StackedDiscreteBarChart`
- Adding proper focus and hover state management with emphasis resolution
- Introducing new type definitions for better code organization (`StackedDiscreteBarChartConstants.ts`)

## Testing guidance

Step-by-step instructions on how to test this change:
1. Navigate to a stacked discrete bar chart in the admin interface
2. Verify that the focus functionality is now available in the editor data tab
3. Test focusing on individual series (both columns and entities) and verify visual emphasis changes
4. Test hover interactions on legend items and bar segments
5. Verify tooltips display correctly when hovering over bars
6. Test sorting functionality to ensure bars animate smoothly when reordered
7. Check that total value labels display correctly when enabled

- [ ] Does the change work in the archive?
- [ ] Does the staging experience have sign-off from product stakeholders?

## Checklist

### Before merging

- [ ] Changes to CSS/HTML were checked on Desktop and Mobile Safari at all three breakpoints
- [ ] Changes to HTML were checked for accessibility concerns
## Context

This PR removes unused CSS class names and cleans up redundant code across chart components. The changes include removing className attributes from SVG elements in DiscreteBarChart, ScatterPlotChart, MarimekkoChart, and StackedBarChart components, as well as simplifying conditional rendering logic in StackedBars and StackedDiscreteBarRow components.

## Testing guidance

Step-by-step instructions on how to test this change:
1. Verify that all chart types (discrete bar, scatter plot, marimekko, stacked bar) render correctly
2. Check that chart interactions (hover, tooltips, mouse events) still function properly
3. Ensure that missing data points are now properly filtered out in stacked bar charts
4. Confirm that Figma IDs are still generated correctly for chart elements

- [ ] Does the change work in the archive?
- [ ] Does the staging experience have sign-off from product stakeholders?

## Checklist

### Before merging

- [ ] Changes to CSS/HTML were checked on Desktop and Mobile Safari at all three breakpoints
…6133)

## Context

Follow-up to #6132. PR #6132 (already merged) attempted to mitigate the issue by disabling HTTP caching on the R2 URL fetch path. This PR is a separate follow-up branch created from `master` and switches grapher config reads in CF functions from HTTP requests to R2 bindings.

## Description

This PR updates Cloudflare Functions grapher config loading to use R2 bucket bindings instead of fetching from `https://grapher-configs*.owid.io/...`.

What changed:
- `functions/_common/grapherTools.ts`
  - Replace URL-based `fetch()` reads with `R2Bucket.get()` reads.
  - Preserve conditional fetch behavior using `If-None-Match` (`onlyIf`) so `304` can still be returned.
  - Keep primary + fallback lookup behavior, now via primary/fallback R2 bindings.
  - Preserve key construction with configured path prefixes (`GRAPHER_CONFIG_R2_BUCKET_PATH`, `GRAPHER_CONFIG_R2_BUCKET_FALLBACK_PATH`).
- `functions/_common/env.ts`
  - Add R2 binding env types for grapher configs.
  - Keep URL env vars present for possible rollback/experimentation.
- `wrangler.jsonc`
  - Add `GRAPHER_CONFIG_R2_BUCKET` and `GRAPHER_CONFIG_R2_BUCKET_FALLBACK` bindings for default/preview/production.
  - Set grapher-config bindings to `"remote": true` for local dev so Wrangler reads from real buckets instead of empty local emulation.
  - Keep existing URL vars in config.

Expected effect:
- Avoid Cloudflare internal HTTP cache behavior on the R2 URL fetch path.
- Read config objects directly through R2 bindings while preserving status semantics (`200`/`304`/`404`).

## Screenshots / Videos / Diagrams

None (no UI changes).

## Testing guidance

1. Start local CF functions (`yarn startLocalCloudflareFunctions`).
2. Request a known config endpoint, e.g. `/grapher/life-expectancy.config.json`.
3. Verify config is served from binding-backed lookup and returns expected status codes.
4. Optionally test with `If-None-Match` to verify `304` behavior.

Automated checks run:
- `yarn fixPrettierChanged > /dev/null 2>&1 && yarn typecheck`
- `yarn testLintChanged`
- `yarn testPrettierChanged`

- [ ] Does the change work in the archive?
- [ ] Does the staging experience have sign-off from product stakeholders?

## Checklist

### Before merging

- [ ] Google Analytics events were adapted to fit the changes in this PR
- [ ] Changes to CSS/HTML were checked on Desktop and Mobile Safari at all three breakpoints
- [ ] Changes to HTML were checked for accessibility concerns
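As a rough illustration of the `200`/`304`/`404` semantics: with R2 bindings, a conditional `get` (e.g. `bucket.get(key, { onlyIf: { etagDoesNotMatch: etag } })`) returns `null` for a missing key, a metadata-only object (no `body`) when the precondition fails, and a full object with a `body` otherwise. A minimal sketch of that mapping follows; the helper name and the structural type are assumptions, not the PR's code:

```typescript
// Hypothetical sketch: mapping an R2 conditional-get result to an HTTP status.
// Structurally mimics R2Object (no body) vs R2ObjectBody (has body) vs null.
type R2GetResult = { body?: unknown } | null

function statusForR2Result(result: R2GetResult): 200 | 304 | 404 {
    if (result === null) return 404 // key missing in the bucket
    if (!("body" in result) || result.body === undefined) return 304 // etag matched
    return 200 // fresh object with a readable body
}
```

The fallback-bucket behavior described above would wrap this: try the primary binding first, and only consult the fallback binding on a `404`-style `null` result.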
- add a worker-runtime e2e test suite for /grapher/:slug.config.json - seed local Miniflare R2 buckets with a real life-expectancy config fixture - cover primary bucket fetch, ETag 304, fallback bucket, 404, and nocache behavior - use a dedicated wrangler e2e config with local-only R2 bindings (no remote: true)
- simplify test worker context by always injecting a test ASSETS stub - add a test-only r2-has-key endpoint for assertions - strengthen fallback test to assert key is absent in primary and present in fallback before fetch
## Context

This PR adds end-to-end testing infrastructure for the grapher config R2 functionality using Wrangler's local development capabilities.

## Testing guidance

Run the new e2e tests to verify R2 bucket functionality:

```bash
yarn test:functions:e2e-r2
```

The tests verify:
- Grapher config retrieval from primary R2 bucket
- ETag-based caching with 304 responses for unchanged configs
- Fallback to secondary R2 bucket when primary bucket misses
- 404 responses when configs are missing from both buckets
- Cache control headers when nocache parameter is present
## Context

This PR refactors the chart legend and labeling system by splitting the overloaded `hideLegend` property and renaming `isDisplayedAlongsideComplementaryTable` to `useMinimalLabeling` for better semantic clarity.

The main issues addressed:
- `hideLegend` was controlling both entity name labels (line endpoints) and categorical legends (color swatches), causing bugs where line chart settings would affect map legends
- `isDisplayedAlongsideComplementaryTable` had an unclear name that described context rather than functionality

## Screenshots / Videos / Diagrams

No UI changes - this is a refactoring that maintains existing visual behavior while improving the underlying architecture.

## Testing guidance

- [ ] Verify that line charts with `hideLegend: true` still hide endpoint labels correctly
- [ ] Verify that stacked bar charts with `hideLegend: true` still hide categorical legends
- [ ] Verify that map charts are no longer affected by line chart legend settings
- [ ] Test thumbnail generation with `imMinimal=1` parameter to ensure minimal labeling works
- [ ] Check that faceted charts properly suppress legends in child components
- [ ] Verify scatter plot size legends appear correctly in minimal thumbnail mode
- [ ] Does the change work in the archive?
- [ ] Does the staging experience have sign-off from product stakeholders?

## Checklist

### Before merging

- [ ] The DB types in the ETL have been updated (if applicable for the renamed property)
## Context

Closes #6101 — the detect-country request previously hit `detect-country.owid.io`, a separate Cloudflare Worker. This looked suspicious to reviewers and the code wasn't inspectable alongside the rest of the site. This PR moves country detection into a CF Pages Function at `/api/detect-country`, using the existing `regions.data.ts` as the direct source of truth (no more separate `country_codes.json` that needs periodic syncing from the ETL).

**Deployment sequence:** This PR should be deployed **first**, then [owid/cloudflare-workers#25](owid/cloudflare-workers#25) (redirect old worker), then [owid/ops#405](owid/ops#405) (update docs).

### Changes

1. **New endpoint** `functions/api/detect-country/index.ts` — reads `request.cf?.country` (alpha-2), looks it up in the regions data via `getParentRegions()`, returns `{ country: { code, name, short_code, slug, regions } | null }`
2. **Client URL** `packages/@ourworldindata/utils/src/Util.ts` — changed from `https://detect-country.owid.io` to `/api/detect-country` (relative URL works across all environments)
3. **Removed redirect** `baker/redirects.ts` — `/detect-country → detect-country.owid.io` no longer needed
4. **Updated workflow** `.github/workflows/update-regions.yml` — removed TODO checkbox about updating the old worker separately

### Data verification

The new implementation produces **identical output** to the old `country_codes.json` (251 entries, 0 differences). The only non-semantic difference is the ordering of the parent regions array. Verified with this comparison script:

<details>
<summary>compareJson.ts</summary>

```typescript
/**
 * Compare old and new country codes JSON files using remeda's isDeepEqual.
 *
 * Usage: yarn tsx --tsconfig tsconfig.tsx.json functions/api/detect-country/compareJson.ts
 */
import { isDeepEqual } from "remeda"
import fs from "fs"

interface CountryResponse {
    code: string
    name: string
    short_code: string
    slug: string
    regions: string[] | null
}

type CountryMap = Record<string, CountryResponse>

const normalize = (data: CountryMap): CountryMap =>
    Object.fromEntries(
        Object.entries(data).map(([k, v]) => [
            k,
            { ...v, regions: v.regions?.slice().sort() ?? null },
        ])
    )

const oldData: CountryMap = normalize(
    JSON.parse(fs.readFileSync("/tmp/old_country_codes.json", "utf-8"))
)
const newData: CountryMap = normalize(
    JSON.parse(fs.readFileSync("/tmp/new_country_codes.json", "utf-8"))
)

if (isDeepEqual(oldData, newData)) {
    console.log(
        `Identical (${Object.keys(oldData).length} entries, ignoring region array order)`
    )
} else {
    console.log("Differences found:")
    for (const key of new Set([
        ...Object.keys(oldData),
        ...Object.keys(newData),
    ])) {
        if (!isDeepEqual(oldData[key], newData[key])) {
            console.log(`  ${key}:`, oldData[key], "→", newData[key])
        }
    }
}
```

Output: `Identical (251 entries, ignoring region array order)`

</details>

## Testing guidance

- Deploy to a staging/preview environment and hit `/api/detect-country` — should return JSON with your country info
  - https://move-detect-country-to-graph.owid.pages.dev/api/detect-country
  - Locally, `request.cf?.country` is simulated by Wrangler
- Verify that grapher components using `getUserCountryInformation()` still work (entity selector, map zoom, donate form, autocomplete) — although they won't be promoting the user's country until the function is deployed to production

🤖 Generated with [Claude Code](https://claude.com/claude-code)
Post-process the main archive to produce a Wikipedia-specific copy with GTM scripts removed and archive URLs rewritten, for embedding in Wikipedia without third-party tracking.
…es (#6195)

## Context

Supersedes #6177. Simplified rewrite on a fresh branch.

Creates a secondary publishing target of the archive for Wikipedia ([#6099](#6099)). Wikipedia requires that embedded content does not include third-party tracking. A new post-processing script (`createWikipediaArchive.ts`) takes the main archive output and produces a Wikipedia-specific copy:

- **HTML files**: regex-based stripping of GTM `<script>` tags (matching `googletagmanager` or `Google Tag Manager`), then rewrites `ARCHIVE_BASE_URL` → `WIKIPEDIA_ARCHIVE_BASE_URL`.
- **Non-HTML files**: hard-linked from the main archive (no processing needed).

### Simplifications vs #6177

The previous PR (#6177) selectively filtered files: it skipped posts, images, and videos, and only processed data pages (grapher/explorer HTML). This PR deliberately keeps full parity with the main archive — every file is included. This is simpler (no `isDataPageHtml()`, `findPostFiles()`, `SKIP_DIRS` logic) and avoids the Wikipedia archive silently diverging from the main archive over time. The only difference between the two archives is the absence of GTM scripts and the rewritten base URL.

### Changes

| File | Change |
| --- | --- |
| `baker/archival/createWikipediaArchive.ts` | **New** — CLI script: strips GTM from HTML, rewrites archive URLs, hard-links non-HTML |
| `baker/archival/createWikipediaArchive.test.ts` | **New** — 13 unit tests for `stripGtmScripts` and `rewriteArchiveUrls` |
| `settings/clientSettings.ts` | Exports `ARCHIVE_BASE_URL` (client-side, used in `<link rel="archives">`) |
| `settings/serverSettings.ts` | Added `WIKIPEDIA_ARCHIVE_BASE_URL` (server-only) |
| `.env.archive` | Added `WIKIPEDIA_ARCHIVE_BASE_URL` |
| `Makefile` | Added `wikipedia-archive` target |
| `baker/archival/README.md` | Added Wikipedia archive documentation section |
| `devTools/backpopulateWikipediaArchive.sh` | **New** — One-off script to back-populate Wikipedia archive R2 bucket from main archive |
| `features/wikipedia-archive.feature` | **New** — BDD feature: asserts production archive makes GTM requests while Wikipedia archive does not |
| `features/wikipedia-archive.steps.ts` | **New** — Playwright step definitions |
| `playwright.config.ts` | Added webServer entries for archive/wikipedia-archive on ports 8764/8765 |
| `.gitignore` | Added `wikipedia-archive/*` |

> **Ops companion PR:** owid/ops#408 (merge grapher PR first)

## Testing guidance

- [x] `yarn test run baker/archival/createWikipediaArchive.test.ts` — 13 tests pass
- [x] `yarn typecheck` — passes
- [x] No hardcoded `archive.ourworldindata.org` URLs in source code (only in db docs descriptions)
- [x] `make wikipedia-archive` — spot-check output: `grep -r "googletagmanager" wikipedia-archive/` returns zero matches
- [x] Backpopulate script tested against R2: downloaded 643 HTML files, processed (0 GTM refs, 0 bare archive URLs), uploaded successfully
- [x] BDD tests (`yarn testBdd`): 36/36 passed — Wikipedia archive scenarios verify GTM presence in production archive and absence in Wikipedia archive across 3 browsers

### Post-deploy task

- [ ] Deploy owid/ops#408
- [ ] Pause/cancel deploy-content builds
- [ ] Run `devTools/backpopulateWikipediaArchive.sh` on the production server once after first deploy to populate the Wikipedia archive R2 bucket from the existing main archive
- [ ] Resume deploy-content builds

🤖 Generated with [Claude Code](https://claude.com/claude-code)
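The two HTML transforms described above (`stripGtmScripts` and `rewriteArchiveUrls`) could look roughly like this. This is a simplified sketch, not the actual `createWikipediaArchive.ts` implementation:

```typescript
// Hypothetical sketch of the Wikipedia-archive post-processing transforms.
// Removes any <script> block mentioning Google Tag Manager; keeps the rest.
const stripGtmScripts = (html: string): string =>
    html.replace(/<script\b[^>]*>[\s\S]*?<\/script>/gi, (tag) =>
        /googletagmanager|Google Tag Manager/i.test(tag) ? "" : tag
    )

// Replaces every occurrence of the main archive base URL with the
// Wikipedia archive base URL (split/join avoids regex-escaping the URL).
const rewriteArchiveUrls = (
    html: string,
    archiveBaseUrl: string,
    wikipediaArchiveBaseUrl: string
): string => html.split(archiveBaseUrl).join(wikipediaArchiveBaseUrl)
```

A regex-based approach like this is fragile in general, but acceptable here because the input is HTML the site itself baked, not arbitrary documents.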
## Context

Links to issues, Figma, Slack, and a technical introduction to the work.

## Screenshots / Videos / Diagrams

Add if relevant, e.g. it might not be necessary when there are no UI changes.

## Testing guidance

Step-by-step instructions on how to test this change.

Reminder to annotate the PR diff with design notes, alternatives you considered, and any other helpful context.

## Checklist

(delete all that do not apply)

### Before merging

If DB migrations exist:

### After merging