[upstream test] ASoC: SOF: Fix IPC reliability and post-resume SoundWire init #5671

Open
ujfalusi wants to merge 2 commits into thesofproject:topic/sof-dev from ujfalusi:peter/sof/pr/ipc_sdw_resume_upstream

@ujfalusi (Collaborator) commented Feb 16, 2026

Test of the upstream series posted at https://lore.kernel.org/linux-sound/20260214064054.19961-1-cole@unwrap.rs/T/#t

Cover letter:

Two fixes for SOF IPC4 reliability issues observed on Lenovo ThinkPad
P16 Gen 3 (Arrow Lake-S, CS42L43 + CS35L56 over SoundWire):

  1. Replace the broken delayed_ipc_tx_msg mechanism with a bounded retry
    loop. The old deferred dispatch silently drops messages during D0i3
    transitions, causing 500ms+ hangs per IPC chunk.

  2. Add a platform ops callback (dai_link_hw_ready) so Intel HDA
    platforms can wait for SoundWire slave initialization before ALH
    copier setup. Without this, the DSP enters an unrecoverable wedged
    state when userspace opens a PCM before slaves finish re-enumerating
    after resume.

Tested on ThinkPad P16 Gen 3 with repeated suspend/resume cycles
and concurrent audio playback.

Cole Leavitt (2):
  ASoC: SOF: Replace IPC TX busy deferral with bounded retry
  ASoC: SOF: Add platform ops callback for DAI link hardware readiness

 sound/soc/sof/intel/cnl.c            | 17 ++---------
 sound/soc/sof/intel/hda-common-ops.c |  1 +
 sound/soc/sof/intel/hda-ipc.c        | 17 ++---------
 sound/soc/sof/intel/hda.c            | 44 ++++++++++++++++++++++++++++
 sound/soc/sof/intel/hda.h            | 14 ++++-----
 sound/soc/sof/intel/mtl.c            | 17 ++---------
 sound/soc/sof/ipc4-topology.c        |  8 +++++
 sound/soc/sof/ipc4.c                 | 17 +++++++++--
 sound/soc/sof/sof-priv.h             |  3 ++
 9 files changed, 83 insertions(+), 55 deletions(-)

base-commit: 2687c84

2.52.0

ASoC: SOF: Replace IPC TX busy deferral with bounded retry

The SOF IPC4 platform send_msg functions (hda_dsp_ipc4_send_msg,
mtl_ipc_send_msg, cnl_ipc4_send_msg) previously stored the message in
delayed_ipc_tx_msg and returned 0 when the TX register was busy. The
deferred message was supposed to be dispatched from the IRQ handler
when the DSP acknowledged the previous message.

This mechanism silently drops messages during D0i3 power transitions
because the IRQ handler never fires while the DSP is in a low-power
state. The caller then hangs in wait_event_timeout() for up to 500ms
per IPC chunk, causing multi-second audio stalls under CPU load.

Fix this by making the platform send_msg functions return -EBUSY
immediately when the TX register is busy (safe since they execute
under spin_lock_irq in sof_ipc_send_msg), and by adding a bounded retry
loop with usleep_range() in ipc4_tx_msg_unlocked(), which holds only
the tx_mutex (a sleepable context). The retry loop makes up to 50
attempts with 100-200us sleeps between them, bounding the maximum wait
on a busy TX register to approximately 10ms instead of the previous
500ms timeout.
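
The retry pattern described above can be sketched in plain userspace C. This is only an illustration of the control flow, not the patch itself: `struct fake_dsp`, `fake_send_msg()` and `send_with_bounded_retry()` are hypothetical names, and `nanosleep()` stands in for the kernel's `usleep_range()`. Only the attempt count (50) and the 100us lower sleep bound come from the commit message.

```c
#define _POSIX_C_SOURCE 199309L /* for nanosleep() */
#include <errno.h>
#include <time.h>

#define TX_RETRY_MAX      50      /* attempts, as stated in the commit message */
#define TX_RETRY_SLEEP_NS 100000L /* 100us, lower bound of the sleep range */

/* Hypothetical stand-in for the hardware: TX stays busy for busy_polls polls. */
struct fake_dsp {
	int busy_polls;
	int sent;
};

/* Models a platform send_msg: never sleeps, reports -EBUSY when TX is busy. */
static int fake_send_msg(struct fake_dsp *dsp)
{
	if (dsp->busy_polls > 0) {
		dsp->busy_polls--;
		return -EBUSY;
	}
	dsp->sent++;
	return 0;
}

/* Models the sleepable caller: retry a bounded number of times, then give up. */
static int send_with_bounded_retry(struct fake_dsp *dsp)
{
	struct timespec delay = { .tv_sec = 0, .tv_nsec = TX_RETRY_SLEEP_NS };
	int ret = -EBUSY;
	int i;

	for (i = 0; i < TX_RETRY_MAX; i++) {
		ret = fake_send_msg(dsp);
		if (ret != -EBUSY)
			break;
		nanosleep(&delay, NULL); /* userspace stand-in for usleep_range() */
	}
	return ret;
}
```

The point of the split is that the function that touches the register never sleeps, while the caller that is allowed to sleep owns the retry policy and always terminates with either success or -EBUSY.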

Also remove the now-dead delayed_ipc_tx_msg field from
sof_intel_hda_dev, the dispatch code, and the ack_received tracking
variable from all three IRQ thread handlers (hda_dsp_ipc4_irq_thread,
mtl_ipc_irq_thread, cnl_ipc4_irq_thread).

Signed-off-by: Cole Leavitt <cole@unwrap.rs>
Signed-off-by: Peter Ujfalusi <peter.ujfalusi@linux.intel.com>

ASoC: SOF: Add platform ops callback for DAI link hardware readiness

After suspend/resume (D3->D0), the SOF firmware is reloaded fresh and
pipelines are recreated lazily when userspace opens a PCM. However,
SoundWire slave re-enumeration runs asynchronously via a 100ms delayed
work item (SDW_INTEL_DELAYED_ENUMERATION_MS). If userspace attempts to
play audio before SoundWire slaves finish re-enumerating, the firmware
returns error 9 (resource not found) when creating ALH copier modules,
leaving the DSP in an unrecoverable wedged state requiring reboot.

Add a new optional dai_link_hw_ready callback to struct snd_sof_dsp_ops
that allows platform-specific code to wait for DAI link hardware to
become ready before pipeline setup. The generic ipc4-topology.c calls
this callback (when set) in sof_ipc4_prepare_copier_module() before
configuring DAI copiers, maintaining SOF's platform abstraction.
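
The optional-callback arrangement can be illustrated with a small userspace C sketch. All types and functions here (`struct fake_dev`, `struct fake_dsp_ops`, `fake_prepare_dai_copier()`, `fake_sdw_hw_ready()`) are hypothetical stand-ins rather than the real SOF structures; only the callback name `dai_link_hw_ready` is taken from the commit message.

```c
#include <errno.h>
#include <stddef.h>

/* Hypothetical device state: whether the DAI link hardware is ready. */
struct fake_dev {
	int link_ready;
};

/* Hypothetical ops table with one optional callback. */
struct fake_dsp_ops {
	/* May be NULL on platforms that have nothing to wait for. */
	int (*dai_link_hw_ready)(struct fake_dev *dev);
};

/* Generic code: call the hook only when the platform provides it. */
static int fake_prepare_dai_copier(const struct fake_dsp_ops *ops,
				   struct fake_dev *dev)
{
	int ret;

	if (ops->dai_link_hw_ready) {
		ret = ops->dai_link_hw_ready(dev);
		if (ret < 0)
			return ret; /* do not configure the copier */
	}

	/* ... DAI copier configuration would follow here ... */
	return 0;
}

/* Platform code: report an error instead of continuing when not ready. */
static int fake_sdw_hw_ready(struct fake_dev *dev)
{
	return dev->link_ready ? 0 : -ETIMEDOUT;
}
```

Platforms that leave the pointer unset keep their existing behaviour, which is why the generic code checks for NULL before calling.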

The Intel HDA implementation (hda_sdw_dai_hw_ready) waits for all
attached SoundWire slaves to complete initialization using
wait_for_completion_interruptible_timeout() with a 2-second timeout.
This is safe for multiple waiters since the SoundWire subsystem uses
complete_all() for initialization_complete. Unattached slaves (declared
in ACPI but not physically present) are skipped to avoid false timeouts.

The function returns -ETIMEDOUT on timeout (instead of warn-and-continue)
to prevent the DSP from entering a wedged state. On non-resume paths the
completions are already done, so the wait returns immediately.
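
The wait logic (skip unattached slaves, wait for each attached one, fail with -ETIMEDOUT) can be modelled as below. This is a simplified userspace sketch under stated assumptions: a poll counter replaces the kernel completion and the 2-second timeout, and `struct fake_slave`, `fake_wait_slave()` and `fake_wait_all_slaves()` are hypothetical names, not SoundWire or SOF APIs.

```c
#include <errno.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical slave model; a poll counter replaces the real completion. */
struct fake_slave {
	bool attached;        /* false: declared but not physically present */
	int polls_until_init; /* polls remaining before initialization is done */
};

/* Wait for one slave, giving up after max_polls polls. */
static int fake_wait_slave(struct fake_slave *s, int max_polls)
{
	for (;;) {
		if (s->polls_until_init == 0)
			return 0; /* already initialized: returns immediately */
		if (max_polls-- <= 0)
			return -ETIMEDOUT;
		s->polls_until_init--;
	}
}

/* Wait for every attached slave; unattached slaves are skipped. */
static int fake_wait_all_slaves(struct fake_slave *slaves, size_t n,
				int max_polls)
{
	for (size_t i = 0; i < n; i++) {
		if (!slaves[i].attached)
			continue; /* would otherwise cause a false timeout */

		int ret = fake_wait_slave(&slaves[i], max_polls);
		if (ret < 0)
			return ret; /* fail the PCM open, do not continue */
	}
	return 0;
}
```

A slave whose initialization has already finished costs nothing in this model, which mirrors the statement that the wait returns immediately on non-resume paths.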

Link: thesofproject/sof#8662
Link: thesofproject/sof#9308
Signed-off-by: Cole Leavitt <cole@unwrap.rs>
Signed-off-by: Peter Ujfalusi <peter.ujfalusi@linux.intel.com>