diff --git a/TSG/EnvironmentValidator/Networking/Troubleshoot-Network-Test-StorageConnections-ConnectivityCheck.md b/TSG/EnvironmentValidator/Networking/Troubleshoot-Network-Test-StorageConnections-ConnectivityCheck.md
index 40511c7a..bc56e69a 100644
--- a/TSG/EnvironmentValidator/Networking/Troubleshoot-Network-Test-StorageConnections-ConnectivityCheck.md
+++ b/TSG/EnvironmentValidator/Networking/Troubleshoot-Network-Test-StorageConnections-ConnectivityCheck.md
@@ -252,7 +252,7 @@ In converged deployments, the Storage Connections validator will create a tempor
 
 4. If any ping fails, check the following:
 
-   - That the VLANs are correctly configured on the TOR switches. In a converged deployment, both storage VLANs should be configured on the interface.
+   - That the VLANs are correctly configured on the TOR switches. In a converged deployment, each storage VLAN should be configured on its respective ToR switch (for example, Storage VLAN A on ToR-A and Storage VLAN B on ToR-B); configuring both storage VLANs on both ToR switches is also supported.
    - That physical NICs are connected to the correct ports on the TOR switches.
    - That no VLANs are configured on the physical NICs.
    - That no firewall rules or other configuration are blocking APIPA traffic.
diff --git a/TSG/Networking/Top-Of-Rack-Switch/Overview-Azure-Local-Deployment-Pattern.md b/TSG/Networking/Top-Of-Rack-Switch/Overview-Azure-Local-Deployment-Pattern.md
index 59ac33bc..2b0fed79 100644
--- a/TSG/Networking/Top-Of-Rack-Switch/Overview-Azure-Local-Deployment-Pattern.md
+++ b/TSG/Networking/Top-Of-Rack-Switch/Overview-Azure-Local-Deployment-Pattern.md
@@ -72,7 +72,7 @@ A high-performance design utilizing dedicated NICs for management/compute and st
 ![Switched with 2 ToRs](images/AzureLocalPhysicalNetworkDiagram_Switched.png)
 
 **Fully Converged Deployment**
-A balanced design where all traffic types (management, compute, storage) share the same physical NICs through VLAN segmentation. This pattern minimizes hardware footprint while maintaining high scalability. **Both storage VLANs must be configured on both ToR switches** because SET (Switch Embedded Teaming) may route either storage VLAN through either physical NIC.
+A balanced design where all traffic types (management, compute, storage) share the same physical NICs through VLAN segmentation. This pattern minimizes hardware footprint while maintaining high scalability. The **recommended** configuration uses **one storage VLAN per ToR switch**: Storage VLAN A on ToR-A (mapped to one physical NIC) and Storage VLAN B on ToR-B (mapped to the other physical NIC). If a NIC or ToR switch fails, SMB/RDMA traffic automatically fails over to the remaining path. Configuring both storage VLANs on both ToR switches is also supported but optional.
 
 ![Fully-Converged with 2 ToRs](images/AzureLocalPhysicalNetworkDiagram_FullyConverged.png)
 
@@ -82,11 +82,11 @@ A balanced design where all traffic types (management, compute, storage) share t
 | Deployment Pattern | Host NIC Configuration | ToR Switch VLAN Configuration | Primary Use Cases |
 |---------------------|------------------------|-------------------------------|-------------------|
 | **Switchless** | 2 NICs to switches (M+C traffic) + (N−1) direct inter-node NICs (S traffic) | Trunk ports with M, C VLANs only; no storage VLANs on ToRs | Edge deployments, remote sites, cost-sensitive environments |
-| **Switched** | 4 NICs per host: 2 for M+C traffic, 2 dedicated for storage | M and C VLANs on both ToRs; S1 VLAN on ToR1 only, S2 VLAN on ToR2 only (dedicated storage NICs) | Enterprise deployments requiring dedicated storage performance and traffic isolation |
-| **Fully Converged** | 2 NICs per host carrying all traffic types (M+C+S) via VLAN segmentation | Both storage VLANs (S1, S2) on both ToRs (required for SET) | General-purpose deployments balancing performance, simplicity, and hardware efficiency |
+| **Switched** | 4 NICs per host: 2 for M+C traffic, 2 dedicated for storage | M and C VLANs on both ToRs; S1 VLAN on ToR-A only, S2 VLAN on ToR-B only (dedicated storage NICs) | Enterprise deployments requiring dedicated storage performance and traffic isolation |
+| **Fully Converged** | 2 NICs per host carrying all traffic types (M+C+S) via VLAN segmentation | M and C VLANs on both ToRs; S1 VLAN on ToR-A only, S2 VLAN on ToR-B only (recommended) | General-purpose deployments balancing performance, simplicity, and hardware efficiency |
 
 > [!NOTE]
-> **Storage VLAN Configuration**: Storage VLANs can be configured as either **Layer 3 (L3) networks with IP subnets** or **Layer 2 (L2) networks without IP subnets**. **Layer 2 configuration is recommended** because it simplifies VLAN tagging, allowing Azure Local hosts to use any IP addresses without hardcoding subnet configurations on the switch or requiring predefined IP ranges. Since Azure Local nodes handle storage traffic tagging, ensure these VLANs are configured as **tagged VLANs on trunk ports** across all ToR switches.
+> **Storage VLAN Configuration**: Storage VLANs can be configured as either **Layer 3 (L3) networks with IP subnets** or **Layer 2 (L2) networks without IP subnets**. **Layer 2 configuration is recommended** because it simplifies VLAN tagging, allowing Azure Local hosts to use any IP addresses without hardcoding subnet configurations on the switch or requiring predefined IP ranges. Because Azure Local nodes tag storage traffic themselves, configure each storage VLAN as a **tagged VLAN on the trunk ports of its respective ToR switch only** (S1 on ToR-A, S2 on ToR-B). In Fully Converged deployments, tagging both storage VLANs on both ToR switches is also supported but optional. Switchless deployments carry no storage VLANs on the ToRs.
 
 ---
 
@@ -131,27 +131,20 @@ This tool is designed to automate the generation of Azure Local switch configura
 ### Q: How should Storage VLANs be configured across ToR switches?
 
 **A:**
-Storage VLAN configuration depends on the **deployment pattern**:
+The recommended baseline design uses **one storage VLAN per ToR switch** for both Switched and Fully Converged deployments:
 
 | Deployment Pattern | ToR VLAN Configuration | Why |
 |-------------------|------------------------|-----|
-| **Switched** | S1 on ToR1 only, S2 on ToR2 only | Dedicated storage NICs connect to specific ToRs |
-| **Fully Converged** | Both S1 & S2 on both ToRs | SET may route either storage VLAN through either physical NIC |
+| **Switched** | S1 on ToR-A only, S2 on ToR-B only | Dedicated storage NICs connect to specific ToRs |
+| **Fully Converged** | S1 on ToR-A only, S2 on ToR-B only | Each storage VLAN is mapped to one physical NIC; failover occurs automatically |
 
-**Switched Deployment (One Storage VLAN per ToR):**
-- Each host has **dedicated storage NICs** (4 NICs total)
-- Storage NIC1 connects to ToR1 → only needs VLAN 711
-- Storage NIC2 connects to ToR2 → only needs VLAN 712
-- This reduces MC-LAG utilization and optimizes RDMA performance
+**Storage VLAN Configuration:**
+- Storage VLAN A is configured only on ToR-A and mapped to one physical NIC
+- Storage VLAN B is configured only on ToR-B and mapped to the other physical NIC
+- In failure scenarios (NIC or ToR failure), SMB/RDMA traffic automatically fails over to the remaining path with reduced bandwidth but no functional impact
 
-**Fully Converged Deployment (Both Storage VLANs on Both ToRs):**
-- Each host has only **2 NICs** shared for all traffic
-- SET (Switch Embedded Teaming) handles vNIC-to-pNIC mapping
-- SET may route either storage VLAN through either physical NIC
-- **Both ToRs must carry both storage VLANs** to support SET's flexibility
-
-> [!IMPORTANT]
-> In Fully Converged deployments, configuring only one storage VLAN per ToR will cause connectivity issues when SET routes a storage vNIC to a physical NIC connected to a ToR that doesn't have that VLAN configured.
+> [!NOTE]
+> Configuring both storage VLANs on both ToR switches is also supported but optional. Testing has confirmed there is no meaningful resiliency or failover benefit from this configuration, and it increases complexity without improving availability.
 
 ### Q: Are **DCB (Data Center Bridging)** features like **PFC** and **ETS** required for RDMA in Azure Local deployments?
 
diff --git a/TSG/Networking/Top-Of-Rack-Switch/Reference-TOR-Fully-Converged-Storage.md b/TSG/Networking/Top-Of-Rack-Switch/Reference-TOR-Fully-Converged-Storage.md
index 66776d0f..d84e382f 100644
--- a/TSG/Networking/Top-Of-Rack-Switch/Reference-TOR-Fully-Converged-Storage.md
+++ b/TSG/Networking/Top-Of-Rack-Switch/Reference-TOR-Fully-Converged-Storage.md
@@ -27,7 +27,7 @@ This document provides a comprehensive reference for implementing a fully conver
   - [Quality of Service (QoS)](#quality-of-service-qos)
   - [BGP Routing](#bgp-routing)
 - [Frequently Asked Questions](#frequently-asked-questions)
-  - [Q: Why must both Storage VLANs be on both ToR switches in Fully Converged?](#q-why-must-both-storage-vlans-be-on-both-tor-switches-in-fully-converged)
+  - [Q: How should Storage VLANs be configured in Fully Converged deployments?](#q-how-should-storage-vlans-be-configured-in-fully-converged-deployments)
 - [Additional Resources](#additional-resources)
   - [Official Documentation](#official-documentation)
   - [Technical Deep Dives](#technical-deep-dives)
@@ -44,7 +44,7 @@ Azure Local's fully converged network design provides a unified approach to hand
 
 The fully converged physical network architecture integrates **management**, **compute**, and **storage** traffic over the same physical Ethernet interfaces. This design minimizes hardware footprint while maximizing scalability and deployment simplicity.
 
-**Key Design Principle**: In Fully Converged deployments, **both storage VLANs must be configured on both ToR switches**. This is because each host has only 2 NICs (shared for all traffic), and SET (Switch Embedded Teaming) may route either storage VLAN through either physical NIC based on its load balancing algorithm.
+**Key Design Principle**: In Fully Converged deployments, the **recommended** baseline design uses **one storage VLAN per ToR switch**: Storage VLAN A is configured only on ToR-A and mapped to one physical NIC, while Storage VLAN B is configured only on ToR-B and mapped to the other physical NIC. In failure scenarios (NIC or ToR), SMB/RDMA traffic automatically fails over to the remaining path with reduced bandwidth but no functional impact. Configuring both storage VLANs on both ToR switches is also supported but optional.
 
 ## Architecture Components
 
@@ -82,7 +82,7 @@ This section demonstrates a **fully converged Azure Local deployment** where man
 ### Design Characteristics
 
 - **Fully Converged**: All traffic types (Management, Compute, Storage) utilize the same physical links
-- **Redundant Infrastructure**: Each node connects to both ToR1 and ToR2 for high availability
+- **Redundant Infrastructure**: Each node connects to both ToR-A and ToR-B for high availability
 - **Switch Embedded Teaming**: Host-level NIC bonding provides fault tolerance and load balancing
 - **VLAN Segmentation**: Traffic isolation using IEEE 802.1Q VLAN tagging
 
@@ -103,22 +103,22 @@ The following tables demonstrate physical connectivity between Azure Local nodes
 
 | Azure Local Node | Interface | ToR Switch | Interface |
 |------------------|-----------|------------|-------------|
-| **Host1** | NIC A | ToR1 | Ethernet1/1 |
-| **Host1** | NIC B | ToR2 | Ethernet1/1 |
+| **Host1** | NIC A | ToR-A | Ethernet1/1 |
+| **Host1** | NIC B | ToR-B | Ethernet1/1 |
 
 #### Host 2
 
 | Azure Local Node | Interface | ToR Switch | Interface |
 |------------------|-----------|------------|-------------|
-| **Host2** | NIC A | ToR1 | Ethernet1/2 |
-| **Host2** | NIC B | ToR2 | Ethernet1/2 |
+| **Host2** | NIC A | ToR-A | Ethernet1/2 |
+| **Host2** | NIC B | ToR-B | Ethernet1/2 |
 
 #### Host 3
 
 | Azure Local Node | Interface | ToR Switch | Interface |
 |------------------|-----------|------------|-------------|
-| **Host3** | NIC A | ToR1 | Ethernet1/3 |
-| **Host3** | NIC B | ToR2 | Ethernet1/3 |
+| **Host3** | NIC A | ToR-A | Ethernet1/3 |
+| **Host3** | NIC B | ToR-B | Ethernet1/3 |
 
 ### VLAN Architecture
 
@@ -132,14 +132,14 @@ The fully converged design uses VLAN segmentation to isolate different traffic t
 | Storage 1 | SMB storage over RDMA (first path) | 711 | Tagged VLAN, L2 only (no SVI) |
 | Storage 2 | SMB storage over RDMA (second path) | 712 | Tagged VLAN, L2 only (no SVI) |
 
-> [!IMPORTANT]
-> **Storage VLAN Design Pattern for Fully Converged**: In Fully Converged deployments, **both storage VLANs (711 and 712) must be configured on both ToR switches**. This is because:
->
-> - Each host has only **2 NICs** connecting to both ToRs (no dedicated storage NICs)
-> - **SET (Switch Embedded Teaming)** handles vNIC-to-pNIC mapping at the host level
-> - SET may route either storage VLAN through either physical NIC based on its load balancing algorithm
->
-> This differs from **Switched** deployments where dedicated storage NICs connect to specific ToRs, allowing one storage VLAN per ToR.
+> [!NOTE]
+> **Storage VLAN Design Pattern for Fully Converged**: The **recommended** baseline design uses **one storage VLAN per ToR switch**:
+>
+> - Storage VLAN 711 is configured only on ToR-A and mapped to one physical NIC
+> - Storage VLAN 712 is configured only on ToR-B and mapped to the other physical NIC
+> - In failure scenarios (NIC or ToR), SMB/RDMA traffic automatically fails over to the remaining path
+>
+> Configuring both storage VLANs on both ToR switches is also supported but optional. Testing has confirmed no meaningful resiliency benefit from this configuration.
 
 ### Top-of-Rack Switch Configuration
 
@@ -168,7 +168,7 @@ This section provides configuration guidance using **Cisco Nexus 93180YC-FX3 (NX
 - **VLAN 712 (Storage 2)**: Layer 2 only VLAN (no SVI), tagged on trunk ports for RDMA traffic
 
 > [!NOTE]
-> In Fully Converged deployments, **both storage VLANs must be configured on both ToR switches** because SET handles vNIC-to-pNIC mapping at the host level and may route either storage VLAN through either physical NIC.
+> In Fully Converged deployments, the recommended design uses **one storage VLAN per ToR switch**: Storage VLAN 711 on ToR-A only, Storage VLAN 712 on ToR-B only. This simplifies configuration, and automatic failover handles NIC or ToR failures.
 
 > [!IMPORTANT]
 > Storage VLANs 711 and 712 should **NOT** be permitted on the ToR-to-ToR peer-link (vPC peer-link, MLAG inter-switch trunk, or any L2 interconnect between ToR switches). Storage traffic must flow directly from host to ToR to destination host to maintain optimal RDMA performance. Allowing storage VLANs on peer links can cause performance degradation.
@@ -181,7 +181,7 @@ This section provides configuration guidance using **Cisco Nexus 93180YC-FX3 (NX
 
 ##### Sample NX-OS Configuration
 
-**ToR1 Configuration:**
+**ToR-A Configuration:**
 ```console
 vlan 7
   name Management_7
@@ -189,8 +189,6 @@ vlan 201
   name Compute_201
 vlan 711
   name Storage_711
-vlan 712
-  name Storage_712
 
 interface Vlan7
   description Management
@@ -213,7 +211,7 @@ interface Ethernet1/1-3
   switchport
   switchport mode trunk
   switchport trunk native vlan 7
-  switchport trunk allowed vlan 7,201,711,712
+  switchport trunk allowed vlan 7,201,711
   priority-flow-control mode on send-tlv
   spanning-tree port type edge trunk
   mtu 9216
@@ -221,14 +219,12 @@ interface Ethernet1/1-3
   no shutdown
 ```
 
-**ToR2 Configuration:**
+**ToR-B Configuration:**
 ```console
 vlan 7
   name Management_7
 vlan 201
   name Compute_201
-vlan 711
-  name Storage_711
 vlan 712
   name Storage_712
 
@@ -253,7 +249,7 @@ interface Ethernet1/1-3
   switchport
   switchport mode trunk
   switchport trunk native vlan 7
-  switchport trunk allowed vlan 7,201,711,712
+  switchport trunk allowed vlan 7,201,712
   priority-flow-control mode on send-tlv
   spanning-tree port type edge trunk
   mtu 9216
@@ -262,8 +258,8 @@ interface Ethernet1/1-3
 ```
 
 > [!NOTE]
-> - Both ToR switches have **identical VLAN configurations** (7, 201, 711, 712) in Fully Converged deployments
-> - SET at the host level handles vNIC-to-pNIC mapping to optimize storage traffic paths
+> - ToR-A has Storage VLAN 711 only, and ToR-B has Storage VLAN 712 only (one storage VLAN per ToR)
+> - In failure scenarios, SMB/RDMA traffic automatically fails over to the remaining path
 > - QoS policies and routing design (e.g., uplinks, BGP/OSPF, default gateway) will be introduced in a separate document
 
 
@@ -326,7 +322,7 @@ Host4 c$ Administrator Administrator 3.1.1 2
 
 > [!NOTE]
 > **SMB Multichannel Validation Key Points:**
-> - Both storage VLANs (711 and 712) are operational with RDMA enabled
+> - Storage VLANs 711 and 712 are operational with RDMA enabled (each mapped to its respective ToR)
 > - `RdmaConnectionCount = 2` confirms RDMA is being used for storage traffic
 > - `TcpConnectionCount = 0` shows no fallback to regular TCP
 > - SMB 3.1.1 dialect is being used for optimal performance
@@ -345,7 +341,7 @@ Confirm that storage VLANs 711 and 712 are allowed on the trunk to the host:
 
 ```console
 # Verify VLANs are allowed on the interface trunk
-ToR1# show interface ethernet 1/3 trunk
+ToR-A# show interface ethernet 1/3 trunk
 
 Port          Native  Status        Port
               Vlan                  Channel
@@ -364,7 +360,7 @@ Check MAC address table entries for storage VLANs. The example below shows one p
 
 ```console
 # Check per-VLAN MAC table entries across the ToR
-ToR1# show mac address-table vlan 711
+ToR-A# show mac address-table vlan 711
 Legend:
         * - primary entry, G - Gateway MAC, (R) - Routed MAC, O - Overlay MAC
         age - seconds since last seen,+ - primary entry using vPC Peer-Link,
@@ -373,7 +369,7 @@ Legend:
 ---------+-----------------+--------+---------+------+----+------------------
 *  711     0015.5dc8.2006   dynamic  0         F      F    Eth1/3
 
-ToR1# show mac address-table vlan 712
+ToR-B# show mac address-table vlan 712
 Legend:
         * - primary entry, G - Gateway MAC, (R) - Routed MAC, O - Overlay MAC
         age - seconds since last seen,+ - primary entry using vPC Peer-Link,
@@ -405,26 +401,24 @@ For BGP routing configuration and best practices in Azure Local deployments:
 
 ## Frequently Asked Questions
 
-### Q: Why must both Storage VLANs be on both ToR switches in Fully Converged?
+### Q: How should Storage VLANs be configured in Fully Converged deployments?
 
 **A:**
-In Fully Converged deployments, **both storage VLANs (711 and 712) must be configured on both ToR switches**. This is required because:
+The recommended baseline design uses **one storage VLAN per ToR switch** for Fully Converged deployments:
 
-1. **Only 2 NICs per host**: Each host connects one NIC to ToR1 and one to ToR2
-2. **SET handles traffic routing**: Switch Embedded Teaming maps storage vNICs to physical NICs at the host level
-3. **Either VLAN through either NIC**: SET's load balancing may route Storage VLAN 711 or 712 through either physical NIC
+- Storage VLAN A (711) is configured only on ToR-A and mapped to one physical NIC
+- Storage VLAN B (712) is configured only on ToR-B and mapped to the other physical NIC
+- In failure scenarios (NIC or ToR failure), SMB/RDMA traffic automatically fails over to the remaining path with reduced bandwidth but no functional impact
 
-**How it differs from Switched deployment:**
+**Comparison with Switched deployments:**
 
 | Deployment Pattern | Storage NICs | ToR VLAN Config | Why |
 |-------------------|--------------|-----------------|-----|
-| **Fully Converged** | Shared (2 NICs total) | Both VLANs on both ToRs | SET may route either VLAN through either NIC |
-| **Switched** | Dedicated (4 NICs total) | One VLAN per ToR | Each storage NIC connects to a specific ToR |
-
-**Key Point:** The "one storage VLAN per ToR" optimization applies to **Switched** deployments where dedicated storage NICs connect to specific ToRs. In Fully Converged, SET's flexibility requires both VLANs on both switches.
+| **Fully Converged** | Shared (2 NICs total) | S1 on ToR-A only, S2 on ToR-B only | One storage VLAN per NIC; failover occurs automatically |
+| **Switched** | Dedicated (4 NICs total) | S1 on ToR-A only, S2 on ToR-B only | Each storage NIC connects to a specific ToR |
 
 > [!NOTE]
-> SET uses vNIC-to-pNIC affinity mapping to optimize traffic paths, but the switches must still be configured to carry both storage VLANs to handle any mapping SET chooses.
+> Configuring both storage VLANs on both ToR switches is also supported but optional. Testing has confirmed there is no meaningful resiliency or failover benefit from this configuration, and it increases complexity without improving availability.
 
 ## Additional Resources
 