
Feature/SOF-7843 TE NB#271

Open
VsevolodX wants to merge 15 commits into main from feature/SOF-7843

Conversation


@VsevolodX commented Feb 26, 2026

Summary by CodeRabbit

  • New Features
    • Added utilities to deduplicate materials and workflows and log reuse or creation.
    • Added a flexible job-creation flow that supports collection-based or per-material job creation with optional compute settings.
  • Examples
    • Added a comprehensive notebook demonstrating end-to-end setup: auth, material/workflow handling, compute selection, job submission, polling, and results visualization.
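The submission-then-polling flow the notebook demonstrates can be sketched as a simple status loop. This is a hedged illustration, not the confirmed Mat3ra client API: the `jobs_endpoint.get` accessor and the `"finished"`/`"error"` terminal statuses are assumptions.

```python
import time


def poll_job(jobs_endpoint, job_id: str, interval_s: float = 10.0, timeout_s: float = 3600.0) -> dict:
    """Poll a job until it reaches a terminal status or the timeout expires.

    Assumes the endpoint exposes get(job_id) -> dict with a "status" key;
    the real Mat3ra client may differ.
    """
    deadline = time.monotonic() + timeout_s
    while time.monotonic() < deadline:
        job = jobs_endpoint.get(job_id)
        status = job.get("status")
        if status in ("finished", "error"):
            return job
        time.sleep(interval_s)
    raise TimeoutError(f"Job {job_id} did not reach a terminal status within {timeout_s}s")
```

In a notebook, a short `interval_s` keeps feedback responsive while the explicit timeout prevents a cell from hanging forever on a stuck job.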

@review-notebook-app

Check out this pull request on ReviewNB

See visual diffs & provide feedback on Jupyter Notebooks.


Powered by ReviewNB

@coderabbitai

coderabbitai bot commented Feb 26, 2026

Note

Reviews paused

It looks like this branch is under active development. To avoid overwhelming you with review comments due to an influx of new commits, CodeRabbit has automatically paused this review. You can configure this behavior by changing the reviews.auto_review.auto_pause_after_reviewed_commits setting.

Use the following commands to manage reviews:

  • @coderabbitai resume to resume automatic reviews.
  • @coderabbitai review to trigger a single review.

Use the checkboxes below for quick actions:

  • ▶️ Resume reviews
  • 🔍 Trigger review
📝 Walkthrough

Adds client-side helpers for material and workflow deduplication and a dual-mode job creation helper in utils/api.py; adds a new Jupyter notebook demonstrating end-to-end workflow creation, submission, polling, and results visualization using these utilities and the Mat3ra API.

Changes

  • API Utilities Enhancement — utils/api.py: Adds get_or_create_material() and get_or_create_workflow() for client-side deduplication (hash checks, possible deletion of newly created duplicates) and create_job() supporting collection-based and per-material job creation; updates imports to include MaterialEndpoints, WorkflowEndpoints, and Optional typing.
  • Example Jupyter Notebook — other/materials_designer/workflows/total_energy.ipynb: New notebook demonstrating environment setup, auth, client init, material load/save, workflow composition (optional relaxation), compute selection, job creation/submission (collection or embedded workflow), polling, and result retrieval/visualization.

Sequence Diagram(s)

sequenceDiagram
    participant Client
    participant MaterialAPI as Material Endpoint
    participant WorkflowAPI as Workflow Endpoint
    participant JobAPI as Job Endpoint
    participant Server as Server

    rect rgba(100,150,200,0.5)
    Note over Client,MaterialAPI: Material deduplication
    Client->>MaterialAPI: get_or_create_material(material, owner_id)
    MaterialAPI->>Server: query by material.hash + owner
    alt exists
        Server-->>MaterialAPI: existing material
        MaterialAPI-->>Client: return existing (log reuse)
    else not exists
        MaterialAPI->>Server: create material
        Server-->>MaterialAPI: created material
        MaterialAPI-->>Client: return created (log create)
    end
    end

    rect rgba(150,100,200,0.5)
    Note over Client,WorkflowAPI: Workflow create + dedupe
    Client->>WorkflowAPI: get_or_create_workflow(workflow, owner_id)
    WorkflowAPI->>Server: create workflow
    Server-->>WorkflowAPI: created workflow (with hash)
    WorkflowAPI->>Server: query by hash for duplicates
    alt duplicate found
        Server-->>WorkflowAPI: original workflow
        WorkflowAPI->>Server: delete newly created workflow
        WorkflowAPI-->>Client: return original (log dedupe)
    else no duplicate
        WorkflowAPI-->>Client: return created (log create)
    end
    end

    rect rgba(200,150,100,0.5)
    Note over Client,JobAPI: Dual-mode job creation
    Client->>JobAPI: create_job(materials, workflow_or_id, project_id, save_to_collection,...)
    alt save_to_collection = true
        JobAPI->>Server: create_by_ids(material_ids, workflow_id, project_id)
        Server-->>JobAPI: jobs list
    else save_to_collection = false
        loop per material
            JobAPI->>JobAPI: build job payload (embed workflow, material ref, compute)
            JobAPI->>Server: create job
            Server-->>JobAPI: job created
        end
    end
    JobAPI-->>Client: return created jobs (log outcomes)
    end
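The dual-mode job creation shown in the last diagram block can be sketched like this. The create_by_ids name comes from the diagram above; the payload shape, the create method, and the stub endpoint are illustrative assumptions, not the confirmed Mat3ra client API.

```python
from typing import List, Union


def create_jobs(jobs_endpoint, materials: List[dict], workflow_or_id: Union[str, dict],
                project_id: str, save_to_collection: bool) -> List[dict]:
    """Create jobs in one of two modes, mirroring the diagram above."""
    if save_to_collection:
        # Collection mode: one bulk call referencing stored material/workflow IDs.
        material_ids = [m["_id"] for m in materials]
        return jobs_endpoint.create_by_ids(material_ids, workflow_or_id, project_id)
    # Per-material mode: embed the workflow config into each job payload.
    created = []
    for material in materials:
        payload = {
            "workflow": workflow_or_id,            # embedded workflow dict
            "_material": {"_id": material["_id"]},
            "_project": {"_id": project_id},
        }
        created.append(jobs_endpoint.create(payload))
    return created
```

The Union type on workflow_or_id reflects the two modes: a stored workflow ID (str) for collection mode, or a full workflow config (dict) to embed per job.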

Estimated code review effort

🎯 4 (Complex) | ⏱️ ~45 minutes

Suggested reviewers

  • timurbazhirov

"🐰 I hopped through code with eager feet,
Materials matched, workflows tidy and neat,
Jobs bundled or single, whichever you choose,
I nibbled duplicates, left nothing to lose,
Now pipelines hum — a carrot for you to eat!"

🚥 Pre-merge checks | ✅ 1 | ❌ 2

❌ Failed checks (1 warning, 1 inconclusive)

  • Docstring Coverage — ⚠️ Warning: Docstring coverage is 75.00%, below the required threshold of 80.00%. Resolution: write docstrings for the functions missing them.
  • Title check — ❓ Inconclusive: The title 'Feature/SOF-7843 TE NB' is vague and uses abbreviated acronyms (TE NB) without clear context about the changes. Resolution: replace it with a descriptive title such as 'Add material and workflow deduplication utilities and total energy notebook example'.

✅ Passed checks (1 passed)

  • Description Check — ✅ Passed: check skipped because CodeRabbit's high-level summary is enabled.

✏️ Tip: You can configure your own custom pre-merge checks in the settings.


Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out.

Comment @coderabbitai help to get the list of available commands and usage tips.


@coderabbitai bot left a comment


Actionable comments posted: 2

🧹 Nitpick comments (2)
other/materials_designer/workflows/band_gap.ipynb (1)

459-470: Consider moving the conditional import to the top of the cell.

The get_or_create_workflow import is conditionally done inside the if block. While functional, this can be confusing. Consider importing at the top of the cell unconditionally for consistency.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@other/materials_designer/workflows/band_gap.ipynb` around lines 459 - 470,
Move the conditional import of get_or_create_workflow out of the if block and
place it at the top of the cell alongside the existing import of
dict_to_namespace to make imports consistent; update the cell so
get_or_create_workflow is imported unconditionally (while leaving the runtime
conditional on SAVE_WF_TO_COLLECTION that sets workflow_id_or_dict and uses
get_or_create_workflow, dict_to_namespace, and workflow.to_dict() unchanged).
utils/api.py (1)

179-188: Consider adding a Union type hint for workflow_id_or_dict.

The parameter accepts either a str (workflow ID) or a dict (workflow config), but the type hint is implicit Any. Adding an explicit type improves readability and IDE support.

💡 Suggested type hint improvement
+from typing import List, Optional, Union
+
 def create_job(
     jobs_endpoint: JobEndpoints,
     materials: List[dict],
-    workflow_id_or_dict,
+    workflow_id_or_dict: Union[str, dict],
     project_id: str,
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@utils/api.py` around lines 179 - 188, The parameter workflow_id_or_dict in
create_job should be explicitly typed as accepting either a string or a dict to
improve readability and IDE support; update the signature to use a Union (e.g.,
Union[str, dict] or Union[str, Mapping[str, Any]]) and add any required typing
imports (Union, Mapping/Dict, Any) at the top of the module so the function
annotation reflects both allowed types.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@other/materials_designer/workflows/band_gap.ipynb`:
- Around line 508-516: The code assumes clusters is non-empty when doing cluster
= clusters[0]; add an explicit check (e.g., if not clusters:) before that line
and handle the empty case by raising a clear error (ValueError or RuntimeError)
or logging and aborting, supplying a helpful message like "no clusters
available" so downstream code that constructs Compute(cluster=cluster,
queue=QueueName.D, ppn=2) doesn't get an IndexError; update any surrounding
setup/exception handling to propagate this meaningful error.

In `@utils/api.py`:
- Around line 153-176: The function get_or_create_workflow can race when two
processes create identical workflows; wrap the endpoint.delete(created["_id"])
call in a try/except that catches deletion errors (e.g., not-found or conflict)
so a second process that already removed the duplicate won't raise, and after
catching an exception re-query duplicates (using endpoint.list with {"hash":
created["hash"], "owner._id": owner_id}) to return the surviving original if
present; ensure the exception path logs or silently continues and ultimately
returns the existing workflow rather than propagating the delete error.

---

Nitpick comments:
In `@other/materials_designer/workflows/band_gap.ipynb`:
- Around line 459-470: Move the conditional import of get_or_create_workflow out
of the if block and place it at the top of the cell alongside the existing
import of dict_to_namespace to make imports consistent; update the cell so
get_or_create_workflow is imported unconditionally (while leaving the runtime
conditional on SAVE_WF_TO_COLLECTION that sets workflow_id_or_dict and uses
get_or_create_workflow, dict_to_namespace, and workflow.to_dict() unchanged).

In `@utils/api.py`:
- Around line 179-188: The parameter workflow_id_or_dict in create_job should be
explicitly typed as accepting either a string or a dict to improve readability
and IDE support; update the signature to use a Union (e.g., Union[str, dict] or
Union[str, Mapping[str, Any]]) and add any required typing imports (Union,
Mapping/Dict, Any) at the top of the module so the function annotation reflects
both allowed types.

ℹ️ Review info

Configuration used: defaults

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between e444cf8 and 06d71d0.

📒 Files selected for processing (3)
  • other/materials_designer/workflows/band_gap.ipynb
  • pyproject.toml
  • utils/api.py

Comment on lines +508 to 516
"from mat3ra.ide.compute import Compute, QueueName\n",
"\n",
"CLUSTER_NAME = get_cluster_name()\n",
"cluster = clusters[0]\n",
"\n",
"compute = client.jobs.build_compute_config(\n",
" cluster=CLUSTER_NAME\n",
"compute = Compute(\n",
" cluster=cluster,\n",
" queue=QueueName.D,\n",
" ppn=2\n",
")"

⚠️ Potential issue | 🟡 Minor

🧩 Analysis chain

🌐 Web query:

mat3ra.ide.compute QueueName enum D value

💡 Result:

In mat3ra.ide.compute.QueueName, the enum member D corresponds to the Debug queue, i.e. its value is the string "D". [1], [2]


🏁 Script executed:

# Search for any references to mat3ra or QueueName in the repository
rg -i "queuename" --type py --type ipynb -A 2 -B 2

Repository: Exabyte-io/api-examples

Length of output: 96


🏁 Script executed:

# Check if there are any imports or documentation about mat3ra in the repo
rg "mat3ra" --type py --type ipynb -A 1 -B 1 | head -50

Repository: Exabyte-io/api-examples

Length of output: 96


🏁 Script executed:

# Read the notebook file to understand the actual context
cat "other/materials_designer/workflows/band_gap.ipynb" | head -600

Repository: Exabyte-io/api-examples

Length of output: 16598


Add validation for empty clusters list.

Line 510 accesses clusters[0] without checking if the list is non-empty. If no clusters are available, this will raise an IndexError with an unhelpful message.

🛡️ Suggested validation
+if not clusters:
+    raise ValueError("No clusters available. Please check your account configuration.")
 cluster = clusters[0]
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@other/materials_designer/workflows/band_gap.ipynb` around lines 508 - 516,
The code assumes clusters is non-empty when doing cluster = clusters[0]; add an
explicit check (e.g., if not clusters:) before that line and handle the empty
case by raising a clear error (ValueError or RuntimeError) or logging and
aborting, supplying a helpful message like "no clusters available" so downstream
code that constructs Compute(cluster=cluster, queue=QueueName.D, ppn=2) doesn't
get an IndexError; update any surrounding setup/exception handling to propagate
this meaningful error.

Comment on lines +153 to +176
def get_or_create_workflow(endpoint: WorkflowEndpoints, workflow, owner_id: str) -> dict:
"""
Creates the workflow on the server, then uses the server-assigned hash to check for
pre-existing duplicates. If a duplicate exists, deletes the new entry and returns the
original. The server is the authoritative source for structural deduplication.

Args:
endpoint (WorkflowEndpoints): Workflow endpoint from the API client.
workflow: mat3ra-wode Workflow object with a .to_dict() method.
owner_id (str): Account ID under which to search and create.

Returns:
dict: The workflow dict (existing or newly created).
"""
# TODO: calculate the hash in wode, client side
created = endpoint.create(workflow.to_dict(), owner_id=owner_id)
duplicates = endpoint.list({"hash": created["hash"], "owner._id": owner_id})
if len(duplicates) > 1:
original = next(w for w in duplicates if w["_id"] != created["_id"])
endpoint.delete(created["_id"])
print(f"♻️ Workflow already exists: {original['_id']}")
return original
print(f"✅ Workflow created: {created['_id']}")
return created

⚠️ Potential issue | 🟡 Minor

Potential race condition with concurrent workflow creation.

If two processes call get_or_create_workflow simultaneously with identical workflows, both will create entries, both will detect duplicates, and both may attempt to delete the other's entry. This could result in neither workflow surviving or unexpected errors.

For typical notebook usage this is low-risk, but consider adding a try/except around the delete call to handle cases where the "duplicate" was already deleted by another process.

🛡️ Suggested defensive handling
     if len(duplicates) > 1:
         original = next(w for w in duplicates if w["_id"] != created["_id"])
-        endpoint.delete(created["_id"])
+        try:
+            endpoint.delete(created["_id"])
+        except Exception:
+            pass  # Already deleted by concurrent process
         print(f"♻️  Workflow already exists: {original['_id']}")
         return original
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@utils/api.py` around lines 153 - 176, The function get_or_create_workflow can
race when two processes create identical workflows; wrap the
endpoint.delete(created["_id"]) call in a try/except that catches deletion
errors (e.g., not-found or conflict) so a second process that already removed
the duplicate won't raise, and after catching an exception re-query duplicates
(using endpoint.list with {"hash": created["hash"], "owner._id": owner_id}) to
return the surviving original if present; ensure the exception path logs or
silently continues and ultimately returns the existing workflow rather than
propagating the delete error.


@coderabbitai bot left a comment


Actionable comments posted: 4

🧹 Nitpick comments (1)
pyproject.toml (1)

17-19: Migrate from git-pinned dependencies to published releases.

Both mat3ra-ide and mat3ra-api-client have stable releases available (latest: 2025.11.19-0 and 2026.2.23-0 respectively). Replace the git-pinned commits with versioned PyPI releases to improve dependency management, enable better tooling support, and simplify maintenance.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@pyproject.toml` around lines 17 - 19, Replace the git-URL pinned dependencies
in pyproject.toml for mat3ra-ide and mat3ra-api-client with their published PyPI
release versions: change the entries that currently read "mat3ra-ide @
git+https://github.com/Exabyte-io/ide.git@15dbf8c8..." and "mat3ra-api-client @
git+https://github.com/Exabyte-io/api-client.git@b309650..." to standard
versioned requirements using the package names mat3ra-ide==2025.11.19-0 and
mat3ra-api-client==2026.2.23-0 (or the canonical version specifier supported by
your build backend), then run your dependency resolver / poetry lock to verify
and update lockfile.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@other/materials_designer/workflows/total_energy.ipynb`:
- Around line 52-58: Update the duplicate section heading: locate the markdown
cell whose source contains "### 1.1. Set parameters and configurations for the
workflow and job" (cell id "3") and change the heading token from "1.1" to "1.2"
so it reads "### 1.2. Set parameters and configurations for the workflow and
job".
- Around line 449-461: The code assigns cluster = next((c for c in clusters if
c.fqdn == CLUSTER_NAME), clusters[0] if clusters else None) and then
instantiates Compute(...) and prints cluster.fqdn which will raise if clusters
is empty; modify the logic to detect an empty clusters list before creating the
Compute object (e.g., if not clusters: raise a meaningful exception or exit with
a clear error), or provide a fallback/validation that ensures cluster is not
None, then pass that validated cluster into Compute and only call cluster.fqdn
in the print after the check; update references in this block (clusters,
CLUSTER_NAME, cluster, Compute) accordingly.
- Around line 225-234: The code assumes a default project exists and directly
indexes projects[0], which can raise an IndexError; update the block that calls
client.projects.list(...) and assigns project_id to first element (projects[0])
to first check if the returned projects list is non-empty (e.g., if not
projects: log/raise a clear error mentioning ACCOUNT_ID and that no default
project was found, or handle creation/selection flow), and only then set
project_id = projects[0]["_id"] and print the message; reference the variables
client.projects.list, projects, project_id, and ACCOUNT_ID when making the
change.
- Around line 119-133: The notebook hardcodes OIDC_ACCESS_TOKEN (and sets it
into os.environ), which must be removed; replace the hardcoded value with a
non-secret placeholder and load the token from the environment
(os.environ.get("OIDC_ACCESS_TOKEN")) or prompt the user to input it at runtime,
and stop writing a real token into source—update the lines that define
OIDC_ACCESS_TOKEN and the subsequent os.environ assignment so they read from the
environment or an interactive prompt and include a clear placeholder/comment
instructing users to set OIDC_ACCESS_TOKEN before running.

---

Nitpick comments:
In `@pyproject.toml`:
- Around line 17-19: Replace the git-URL pinned dependencies in pyproject.toml
for mat3ra-ide and mat3ra-api-client with their published PyPI release versions:
change the entries that currently read "mat3ra-ide @
git+https://github.com/Exabyte-io/ide.git@15dbf8c8..." and "mat3ra-api-client @
git+https://github.com/Exabyte-io/api-client.git@b309650..." to standard
versioned requirements using the package names mat3ra-ide==2025.11.19-0 and
mat3ra-api-client==2026.2.23-0 (or the canonical version specifier supported by
your build backend), then run your dependency resolver / poetry lock to verify
and update lockfile.

ℹ️ Review info

Configuration used: defaults

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 06d71d0 and d1abb3c.

📒 Files selected for processing (4)
  • config.yml
  • other/materials_designer/workflows/band_gap.ipynb
  • other/materials_designer/workflows/total_energy.ipynb
  • pyproject.toml

Comment on lines +225 to +234
"cell_type": "code",
"execution_count": null,
"id": "15",
"metadata": {},
"outputs": [],
"source": [
"projects = client.projects.list({\"isDefault\": True, \"owner._id\": ACCOUNT_ID})\n",
"project_id = projects[0][\"_id\"]\n",
"print(f\"✅ Using project: {projects[0]['name']} ({project_id})\")"
]

⚠️ Potential issue | 🟡 Minor

Add validation for empty projects list.

Line 232 accesses projects[0] without checking if a default project exists. If no default project is configured, this will raise an IndexError.

🛡️ Suggested fix
 projects = client.projects.list({"isDefault": True, "owner._id": ACCOUNT_ID})
+if not projects:
+    raise ValueError("No default project found. Please create a default project in your account.")
 project_id = projects[0]["_id"]
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@other/materials_designer/workflows/total_energy.ipynb` around lines 225 -
234, The code assumes a default project exists and directly indexes projects[0],
which can raise an IndexError; update the block that calls
client.projects.list(...) and assigns project_id to first element (projects[0])
to first check if the returned projects list is non-empty (e.g., if not
projects: log/raise a clear error mentioning ACCOUNT_ID and that no default
project was found, or handle creation/selection flow), and only then set
project_id = projects[0]["_id"] and print the message; reference the variables
client.projects.list, projects, project_id, and ACCOUNT_ID when making the
change.

Comment on lines +449 to +461
"source": [
"from mat3ra.ide.compute import Compute\n",
"\n",
"# Select first available cluster or use specified name\n",
"cluster = next((c for c in clusters if c.fqdn == CLUSTER_NAME), clusters[0] if clusters else None)\n",
"\n",
"compute = Compute(\n",
" cluster=cluster,\n",
" queue=QUEUE_NAME,\n",
" ppn=PPN\n",
")\n",
"\n",
"print(f\"Using cluster: {cluster.fqdn}, queue: {QUEUE_NAME}, ppn: {PPN}\")"

⚠️ Potential issue | 🟡 Minor

Handle empty clusters list gracefully.

Line 453 sets cluster to None when no clusters are available, but the code continues to create a Compute object and access cluster.fqdn on line 461, which would raise an AttributeError.

🛡️ Suggested fix
 # Select first available cluster or use specified name
 cluster = next((c for c in clusters if c.fqdn == CLUSTER_NAME), clusters[0] if clusters else None)

+if not cluster:
+    raise ValueError("No clusters available. Please check your account configuration.")
+
 compute = Compute(
     cluster=cluster,
     queue=QUEUE_NAME,
📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

Suggested change
"source": [
"from mat3ra.ide.compute import Compute\n",
"\n",
"# Select first available cluster or use specified name\n",
"cluster = next((c for c in clusters if c.fqdn == CLUSTER_NAME), clusters[0] if clusters else None)\n",
"\n",
"compute = Compute(\n",
" cluster=cluster,\n",
" queue=QUEUE_NAME,\n",
" ppn=PPN\n",
")\n",
"\n",
"print(f\"Using cluster: {cluster.fqdn}, queue: {QUEUE_NAME}, ppn: {PPN}\")"
"source": [
"from mat3ra.ide.compute import Compute\n",
"\n",
"# Select first available cluster or use specified name\n",
"cluster = next((c for c in clusters if c.fqdn == CLUSTER_NAME), clusters[0] if clusters else None)\n",
"\n",
"if not cluster:\n",
" raise ValueError(\"No clusters available. Please check your account configuration.\")\n",
"\n",
"compute = Compute(\n",
" cluster=cluster,\n",
" queue=QUEUE_NAME,\n",
" ppn=PPN\n",
")\n",
"\n",
"print(f\"Using cluster: {cluster.fqdn}, queue: {QUEUE_NAME}, ppn: {PPN}\")"
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@other/materials_designer/workflows/total_energy.ipynb` around lines 449 -
461, The code assigns cluster = next((c for c in clusters if c.fqdn ==
CLUSTER_NAME), clusters[0] if clusters else None) and then instantiates
Compute(...) and prints cluster.fqdn which will raise if clusters is empty;
modify the logic to detect an empty clusters list before creating the Compute
object (e.g., if not clusters: raise a meaningful exception or exit with a clear
error), or provide a fallback/validation that ensures cluster is not None, then
pass that validated cluster into Compute and only call cluster.fqdn in the print
after the check; update references in this block (clusters, CLUSTER_NAME,
cluster, Compute) accordingly.

@VsevolodX VsevolodX changed the base branch from main to feature/SOF-7836 February 27, 2026 19:25
Base automatically changed from feature/SOF-7836 to main February 28, 2026 02:52

@coderabbitai bot left a comment


Actionable comments posted: 1

♻️ Duplicate comments (3)
other/materials_designer/workflows/total_energy.ipynb (3)

418-425: ⚠️ Potential issue | 🟠 Major

Validate cluster selection explicitly before creating Compute.

Line 419 can yield None when no clusters are available, and it also silently falls back to clusters[0] when CLUSTER_NAME is set but not found. That can fail later or run on the wrong cluster.

Suggested fix
 # Select first available cluster or use specified name
-cluster = next((c for c in clusters if c["hostname"] == CLUSTER_NAME), clusters[0] if clusters else None)
+if not clusters:
+    raise ValueError("No clusters available for this account.")
+
+if CLUSTER_NAME:
+    cluster = next((c for c in clusters if c["hostname"] == CLUSTER_NAME), None)
+    if cluster is None:
+        raise ValueError(f"Cluster '{CLUSTER_NAME}' was not found in available clusters.")
+else:
+    cluster = clusters[0]

 compute = Compute(
     cluster=cluster,
     queue=QUEUE_NAME,
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@other/materials_designer/workflows/total_energy.ipynb` around lines 418 -
425, The current cluster selection expression can return None or silently pick
the wrong cluster; update the logic around variable cluster (used to construct
Compute) to explicitly validate presence: if CLUSTER_NAME is set, search
clusters for a matching c["hostname"] and raise a clear error if not found;
otherwise if clusters is non-empty pick clusters[0]; if clusters is empty raise
an error before instantiating Compute so you never pass None. Ensure the updated
logic refers to the same symbols (cluster, clusters, CLUSTER_NAME) and performs
explicit checks and raises informative exceptions prior to calling
Compute(cluster=cluster, queue=QUEUE_NAME, ppn=PPN).

197-199: ⚠️ Potential issue | 🟠 Major

Guard against missing default project before indexing.

Lines 197-199 assume projects is non-empty. If no default project exists, this raises IndexError and stops the notebook.

Suggested fix
 projects = client.projects.list({"isDefault": True, "owner._id": ACCOUNT_ID})
+if not projects:
+    raise ValueError(f"No default project found for account {ACCOUNT_ID}.")
 project_id = projects[0]["_id"]
 print(f"✅ Using project: {projects[0]['name']} ({project_id})")
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@other/materials_designer/workflows/total_energy.ipynb` around lines 197 -
199, The code assumes projects = client.projects.list({"isDefault": True,
"owner._id": ACCOUNT_ID}) returns at least one entry; guard against an empty
list by checking projects before accessing projects[0], and handle the
missing-default case (e.g., raise a clear error, log a descriptive message and
exit, or prompt/create a default project) so project_id and the print statement
do not raise IndexError; update the logic around the projects variable,
project_id and the print(f"✅ Using project: ...") to perform this existence
check and branch accordingly.

57-57: ⚠️ Potential issue | 🟡 Minor

Fix subsection numbering sequence in Section 1.

Line 57 repeats 1.1, while Line 102 is already 1.2. The headings should be sequential.

Suggested fix
-    "### 1.1. Set parameters and configurations for the workflow and job"
+    "### 1.2. Set parameters and configurations for the workflow and job"

-    "### 1.2. API Configuration (optional, for local development)"
+    "### 1.3. API Configuration (optional, for local development)"

Also applies to: 102-102

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@other/materials_designer/workflows/total_energy.ipynb` at line 57, There is a
duplicate/misaligned subsection number in Section 1: find the markdown cell
containing the header text "### 1.1. Set parameters and configurations for the
workflow and job" and rename its numeric prefix so the Section 1 headings are
strictly sequential (so that the following header that currently reads "### 1.2"
becomes the correct next subsection); update the header text only (preserve the
rest of the line) to match the proper sequence (e.g., adjust "1.1" to the
correct preceding or following number to avoid duplication).
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@other/materials_designer/workflows/total_energy.ipynb`:
- Around line 463-465: The code assumes job_response[0] exists after calling
create_job; first check that job_response is not empty (and is the expected
type) before accessing job_response[0]; if empty or unexpected, raise or log a
clear error including the full job_response and any create_job diagnostics,
otherwise proceed to set job_dict = job_response[0], job =
dict_to_namespace(job_dict) and job_id = job._id so that failures surface the
real cause instead of an IndexError.

---

Duplicate comments:
In `@other/materials_designer/workflows/total_energy.ipynb`:
- Around line 418-425: The current cluster selection expression can return None
or silently pick the wrong cluster; update the logic around variable cluster
(used to construct Compute) to explicitly validate presence: if CLUSTER_NAME is
set, search clusters for a matching c["hostname"] and raise a clear error if not
found; otherwise if clusters is non-empty pick clusters[0]; if clusters is empty
raise an error before instantiating Compute so you never pass None. Ensure the
updated logic refers to the same symbols (cluster, clusters, CLUSTER_NAME) and
performs explicit checks and raises informative exceptions prior to calling
Compute(cluster=cluster, queue=QUEUE_NAME, ppn=PPN).
- Around line 197-199: The code assumes projects =
client.projects.list({"isDefault": True, "owner._id": ACCOUNT_ID}) returns at
least one entry; guard against an empty list by checking projects before
accessing projects[0], and handle the missing-default case (e.g., raise a clear
error, log a descriptive message and exit, or prompt/create a default project)
so project_id and the print statement do not raise IndexError; update the logic
around the projects variable, project_id and the print(f"✅ Using project: ...")
to perform this existence check and branch accordingly.
- Line 57: There is a duplicate/misaligned subsection number in Section 1: find
the markdown cell containing the header text "### 1.1. Set parameters and
configurations for the workflow and job" and rename its numeric prefix so the
Section 1 headings are strictly sequential (so that the following header that
currently reads "### 1.2" becomes the correct next subsection); update the
header text only (preserve the rest of the line) to match the proper sequence
(e.g., adjust "1.1" to the correct preceding or following number to avoid
duplication).

ℹ️ Review info

Configuration used: defaults

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between d1abb3c and f1ec0c7.

📒 Files selected for processing (2)
  • other/materials_designer/workflows/total_energy.ipynb
  • utils/api.py
🚧 Files skipped from review as they are similar to previous changes (1)
  • utils/api.py

Comment on lines +463 to +465
"job_dict = job_response[0]\n",
"job = dict_to_namespace(job_dict)\n",
"job_id = job._id\n",

⚠️ Potential issue | 🟡 Minor

Check create_job response before reading the first item.

Line 463 assumes job_response[0] always exists. If creation returns an empty list, the notebook fails with IndexError and hides the real failure cause.

Suggested fix
-job_dict = job_response[0]
+if not job_response:
+    raise RuntimeError("Job creation returned no jobs. Check request payload and API response.")
+job_dict = job_response[0]
 job = dict_to_namespace(job_dict)
 job_id = job._id
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@other/materials_designer/workflows/total_energy.ipynb` around lines 463 -
465, The code assumes job_response[0] exists after calling create_job; first
check that job_response is not empty (and is the expected type) before accessing
job_response[0]; if empty or unexpected, raise or log a clear error including
the full job_response and any create_job diagnostics, otherwise proceed to set
job_dict = job_response[0], job = dict_to_namespace(job_dict) and job_id =
job._id so that failures surface the real cause instead of an IndexError.
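Since the same empty-response guard is suggested in several cells, it could be factored into one reusable helper. A sketch under the same assumptions (the `first_or_raise` name is illustrative, not an existing utility):

```python
from typing import Sequence, TypeVar

T = TypeVar("T")

def first_or_raise(items: Sequence[T], context: str) -> T:
    """Return the first element, or raise with the full response so the
    real failure cause is surfaced instead of an IndexError."""
    if not isinstance(items, (list, tuple)) or not items:
        raise RuntimeError(f"{context} returned no results: {items!r}")
    return items[0]

# The notebook cell would then read:
# job_dict = first_or_raise(job_response, "create_job")
```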

@@ -0,0 +1,560 @@
{
Member

@timurbazhirov timurbazhirov Mar 2, 2026

The numbers don't match the headers below; there are 8 headers.


Reply via ReviewNB

Member

We need to explain how to get a material into this notebook.

@coderabbitai coderabbitai bot left a comment

Actionable comments posted: 1

♻️ Duplicate comments (3)
other/materials_designer/workflows/total_energy.ipynb (3)

186-188: ⚠️ Potential issue | 🟡 Minor

Guard projects[0] with an empty-list check.

Line 187 assumes a default project always exists; this can raise IndexError and hide the real setup problem.

🛡️ Proposed fix
 projects = client.projects.list({"isDefault": True, "owner._id": ACCOUNT_ID})
+if not projects:
+    raise ValueError(f"No default project found for ACCOUNT_ID={ACCOUNT_ID}.")
 project_id = projects[0]["_id"]
 print(f"✅ Using project: {projects[0]['name']} ({project_id})")
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@other/materials_designer/workflows/total_energy.ipynb` around lines 186 -
188, Guard the access to projects[0] returned by client.projects.list by
checking if the projects list is empty before assigning project_id or printing;
update the code around the projects variable and project_id assignment (the
client.projects.list call and subsequent use of projects[0]) to raise or log a
clear error (or provide fallback) when no default project is found so an
IndexError cannot occur and the setup issue is surfaced.

407-414: ⚠️ Potential issue | 🟠 Major

Handle empty clusters and avoid silent fallback when CLUSTER_NAME is invalid.

Line 407 can yield None when no clusters exist, and Compute(...) is still constructed. Also, when CLUSTER_NAME is set but unmatched, it silently falls back to the first cluster, which is risky.

🛠️ Proposed fix
 # Select first available cluster or use specified name
-cluster = next((c for c in clusters if c["hostname"] == CLUSTER_NAME), clusters[0] if clusters else None)
+if not clusters:
+    raise ValueError("No clusters available for the selected account.")
+
+if CLUSTER_NAME:
+    cluster = next((c for c in clusters if c["hostname"] == CLUSTER_NAME), None)
+    if cluster is None:
+        raise ValueError(f"Cluster '{CLUSTER_NAME}' not found.")
+else:
+    cluster = clusters[0]

 compute = Compute(
     cluster=cluster,
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@other/materials_designer/workflows/total_energy.ipynb` around lines 407 -
414, The code silently constructs Compute even when no clusters exist or when
CLUSTER_NAME is set but not found; change the logic around the cluster selection
so you first validate clusters and CLUSTER_NAME: if clusters is empty raise an
error (do not construct Compute), then find the matching cluster name using the
existing generator expression, and if CLUSTER_NAME was provided but no match is
found raise an informative error instead of falling back to clusters[0]; only
when CLUSTER_NAME is empty/None may you default to clusters[0]; finally pass the
validated cluster into Compute(...) (refer to the cluster variable,
CLUSTER_NAME, clusters list, and Compute constructor).

451-453: ⚠️ Potential issue | 🟡 Minor

Validate create_job output before indexing first item.

Line 451 assumes job_response[0] always exists. Empty/unexpected responses should raise a clear error with context.

🛡️ Proposed fix
-job_dict = job_response[0]
+if not isinstance(job_response, list) or not job_response:
+    raise RuntimeError(f"Job creation returned invalid response: {job_response}")
+job_dict = job_response[0]
 job = dict_to_namespace(job_dict)
 job_id = job._id
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@other/materials_designer/workflows/total_energy.ipynb` around lines 451 -
453, Validate the output of create_job (job_response) before indexing; check
that job_response is a non-empty sequence and that job_response[0] is a
mapping/object, and if not raise a clear error (including the create_job context
and actual job_response value) instead of blindly doing job_dict =
job_response[0]; update the block around job_response, dict_to_namespace and
job._id to perform this validation and throw a descriptive exception mentioning
create_job/job_response so callers can debug.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@other/materials_designer/workflows/total_energy.ipynb`:
- Around line 164-168: The code unconditionally dereferences selected_account
after calling client.get_account(name=ORGANIZATION_NAME), which will raise
AttributeError if the lookup fails; update the block that calls
client.get_account (and the subsequent ACCOUNT_ID = selected_account.id / print)
to validate the returned selected_account (check for None or missing id/name),
handle the failure by logging or raising a clear error and exiting early, and
only set ACCOUNT_ID and print the selected_account.name when selected_account is
valid.

---

Duplicate comments:
In `@other/materials_designer/workflows/total_energy.ipynb`:
- Around line 186-188: Guard the access to projects[0] returned by
client.projects.list by checking if the projects list is empty before assigning
project_id or printing; update the code around the projects variable and
project_id assignment (the client.projects.list call and subsequent use of
projects[0]) to raise or log a clear error (or provide fallback) when no default
project is found so an IndexError cannot occur and the setup issue is surfaced.
- Around line 407-414: The code silently constructs Compute even when no
clusters exist or when CLUSTER_NAME is set but not found; change the logic
around the cluster selection so you first validate clusters and CLUSTER_NAME: if
clusters is empty raise an error (do not construct Compute), then find the
matching cluster name using the existing generator expression, and if
CLUSTER_NAME was provided but no match is found raise an informative error
instead of falling back to clusters[0]; only when CLUSTER_NAME is empty/None may
you default to clusters[0]; finally pass the validated cluster into Compute(...)
(refer to the cluster variable, CLUSTER_NAME, clusters list, and Compute
constructor).
- Around line 451-453: Validate the output of create_job (job_response) before
indexing; check that job_response is a non-empty sequence and that
job_response[0] is a mapping/object, and if not raise a clear error (including
the create_job context and actual job_response value) instead of blindly doing
job_dict = job_response[0]; update the block around job_response,
dict_to_namespace and job._id to perform this validation and throw a descriptive
exception mentioning create_job/job_response so callers can debug.

ℹ️ Review info

Configuration used: defaults

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 190b36e and cc17865.

📒 Files selected for processing (1)
  • other/materials_designer/workflows/total_energy.ipynb

Comment on lines +164 to +168
"if ORGANIZATION_NAME:\n",
" selected_account = client.get_account(name=ORGANIZATION_NAME)\n",
"\n",
"ACCOUNT_ID = selected_account.id\n",
"print(f\"✅ Selected account ID: {ACCOUNT_ID}, name: {selected_account.name}\")"

⚠️ Potential issue | 🟡 Minor

Validate organization lookup result before using selected_account.id.

Line 167 dereferences selected_account unconditionally. If ORGANIZATION_NAME is invalid/inaccessible, this will fail with AttributeError.

🛠️ Proposed fix
 selected_account = client.my_account

 if ORGANIZATION_NAME:
     selected_account = client.get_account(name=ORGANIZATION_NAME)
+    if not selected_account:
+        raise ValueError(f"Organization '{ORGANIZATION_NAME}' not found or not accessible.")

 ACCOUNT_ID = selected_account.id
 print(f"✅ Selected account ID: {ACCOUNT_ID}, name: {selected_account.name}")
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@other/materials_designer/workflows/total_energy.ipynb` around lines 164 -
168, The code unconditionally dereferences selected_account after calling
client.get_account(name=ORGANIZATION_NAME), which will raise AttributeError if
the lookup fails; update the block that calls client.get_account (and the
subsequent ACCOUNT_ID = selected_account.id / print) to validate the returned
selected_account (check for None or missing id/name), handle the failure by
logging or raising a clear error and exiting early, and only set ACCOUNT_ID and
print the selected_account.name when selected_account is valid.
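The account check can likewise be wrapped once and reused. A minimal sketch; the `require_account` name and the `id` attribute check are illustrative, since the actual account object shape comes from the Mat3ra client:

```python
def require_account(account, organization_name=None):
    """Fail fast with a clear message when an account lookup returns nothing usable."""
    if account is None or not getattr(account, "id", None):
        label = organization_name or "default account"
        raise ValueError(f"Account lookup for '{label}' failed or returned no id.")
    return account

# selected_account = require_account(
#     client.get_account(name=ORGANIZATION_NAME), ORGANIZATION_NAME
# )
# ACCOUNT_ID = selected_account.id
```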

@@ -0,0 +1,540 @@
{
Member

@timurbazhirov timurbazhirov Mar 4, 2026

Line #6.    # # Example: Change model parameters

Add more explanation regarding what this is and how to use it.



Member

Kgrid modification is not done

Member

Split into 4.3 Modify Model Parameters and 4.4 Tune/Adjust Method Parameters.

@@ -0,0 +1,609 @@
{
Member

@timurbazhirov timurbazhirov Mar 9, 2026

Why empty?



        list[dict]: List of created job dicts.
    """
    if save_to_collection:
        return jobs_endpoint.create_by_ids(
Member

This doesn't make sense: we can just use jobs_endpoint directly.

@@ -0,0 +1,609 @@
{
Member

@timurbazhirov timurbazhirov Mar 9, 2026

We should always save the workflow when hash-related changes are introduced



Member

Remove SAVE_WF_TO_COLLECTION
