fpga: d5005 import pci probe errata workaround#2
Open
trixirt wants to merge 54 commits intoOFS:fpga-ofs-devfrom
Open
fpga: d5005 import pci probe errata workaround#2trixirt wants to merge 54 commits intoOFS:fpga-ofs-devfrom
trixirt wants to merge 54 commits intoOFS:fpga-ofs-devfrom
Conversation
Each DFL functional block, e.g. AFU (Accelerated Function Unit) and FME (FPGA Management Engine), could implement more than one function within its region, but current driver only allows one user application to access it by exclusive open on device node. So this is not convenient and flexible for userspace applications, as they have to combine lots of different functions into one single application. This patch removes the limitation here to allow multiple opens to each feature device node for AFU and FME from userspace applications. If user still needs exclusive access to these device node, O_EXCL flag must be issued together with open. Signed-off-by: Wu Hao <hao.wu@intel.com> Signed-off-by: Xu Yilun <yilun.xu@intel.com> Signed-off-by: Moritz Fischer <mdf@kernel.org>
pci_driver.sriov_configure should return negative value on error and number of enabled VFs on success. But now the driver returns 0 on success. The sriov configure still works but will cause a warning message: XX VFs requested; only 0 enabled This patch changes the return value accordingly. Cc: stable@vger.kernel.org Signed-off-by: Xu Yilun <yilun.xu@intel.com> Signed-off-by: Wu Hao <hao.wu@intel.com> Signed-off-by: Moritz Fischer <mdf@kernel.org>
…upport This patch adds description for performance reporting support for Device Feature List (DFL) based FPGA. Signed-off-by: Xu Yilun <yilun.xu@intel.com> Signed-off-by: Wu Hao <hao.wu@intel.com>
This patch adds support for performance reporting private feature
for FPGA Management Engine (FME). Now it supports several different
performance counters, including 'basic', 'cache', 'fabric', 'vtd'
and 'vtd_sip'. It allows user to use standard linux tools to access
these performance counters.
e.g. List all events by "perf list"
perf list | grep fme
dfl_fme0/cache_read_hit/ [Kernel PMU event]
dfl_fme0/cache_read_miss/ [Kernel PMU event]
...
dfl_fme0/fab_mmio_read/ [Kernel PMU event]
dfl_fme0/fab_mmio_write/ [Kernel PMU event]
...
dfl_fme0/fab_port_mmio_read,portid=?/ [Kernel PMU event]
dfl_fme0/fab_port_mmio_write,portid=?/ [Kernel PMU event]
...
dfl_fme0/vtd_port_devtlb_1g_fill,portid=?/ [Kernel PMU event]
dfl_fme0/vtd_port_devtlb_2m_fill,portid=?/ [Kernel PMU event]
...
dfl_fme0/vtd_sip_iotlb_1g_hit/ [Kernel PMU event]
dfl_fme0/vtd_sip_iotlb_1g_miss/ [Kernel PMU event]
...
dfl_fme0/clock [Kernel PMU event]
...
e.g. check increased counter value after run one application using
"perf stat" command.
perf stat -e dfl_fme0/fab_mmio_read/,dfl_fme0/fab_mmio_write/ ./test
Performance counter stats for './test':
1 dfl_fme0/fab_mmio_read/
2 dfl_fme0/fab_mmio_write/
1.009496520 seconds time elapsed
Please note that fabric counters support both fab_* and fab_port_*, but
actually they are sharing one set of performance counters in hardware.
If user wants to monitor overall data events on fab_* then fab_port_*
can't be supported at the same time, see example below:
perf stat -e dfl_fme0/fab_mmio_read/,dfl_fme0/fab_port_mmio_write,portid=0/
Performance counter stats for 'system wide':
0 dfl_fme0/fab_mmio_read/
<not supported> dfl_fme0/fab_port_mmio_write,portid=0/
2.141064085 seconds time elapsed
Signed-off-by: Luwei Kang <luwei.kang@intel.com>
Signed-off-by: Xu Yilun <yilun.xu@intel.com>
Signed-off-by: Wu Hao <hao.wu@intel.com>
DFL based FPGA devices could support interrupts for different purposes,
but current DFL framework only supports feature device enumeration with
given MMIO resources information via common DFL headers. This patch
introduces one new API dfl_fpga_enum_info_add_irq for low level bus
drivers (e.g. PCIe device driver) to pass its interrupt resources
information to DFL framework for enumeration, and also adds interrupt
enumeration code in framework to parse and assign interrupt resources
for enumerated feature devices and their own sub features.
With this patch, DFL framework enumerates interrupt resources for core
features, including PORT Error Reporting, FME (FPGA Management Engine)
Error Reporting and also AFU User Interrupts.
Signed-off-by: Luwei Kang <luwei.kang@intel.com>
Signed-off-by: Wu Hao <hao.wu@intel.com>
Signed-off-by: Xu Yilun <yilun.xu@intel.com>
----
v2: early validating irq table for each feature in parse_feature_irq().
Some code improvement and minor fix for Hao's comments.
v3: put parse_feature_irqs() inside create_feature_instance()
some minor fixes and more comments
v4: no need to include asm/irq.h.
fail the dfl enumeration when irq parsing error happens.
Signed-off-by: Xu Yilun <yilun.xu@intel.com>
Some DFL FPGA PCIe cards (e.g. Intel FPGA Programmable Acceleration
Card) support MSI-X based interrupts. This patch allows PCIe driver
to prepare and pass interrupt resources to DFL via enumeration API.
These interrupt resources could then be assigned to actual features
which use them.
Signed-off-by: Luwei Kang <luwei.kang@intel.com>
Signed-off-by: Wu Hao <hao.wu@intel.com>
Signed-off-by: Xu Yilun <yilun.xu@intel.com>
----
v2: put irq resources init code inside cce_enumerate_feature_dev()
Some minor changes for Hao's comments.
v3: Some minor fix for Hao's comments for v2.
v4: Some minor fix for Hao's comments for v3.
FPGA user applications may be interested in interrupts generated by
DFL features. For example, users can implement their own FPGA
logics with interrupts enabled in AFU (Accelerated Function Unit,
dynamic region of DFL based FPGA). So user applications need to be
notified to handle these interrupts.
In order to allow userspace applications to monitor interrupts,
driver requires userspace to provide eventfds as interrupt
notification channels. Applications then poll/select on the eventfds
to get notified.
This patch introduces a generic helper functions to do eventfds binding
with given interrupts.
Sub feature drivers are expected to use XXX_GET_IRQ_NUM to query irq
info, and XXX_SET_IRQ to set eventfds for interrupts. This patch also
introduces helper functions for these 2 ioctls.
Signed-off-by: Luwei Kang <luwei.kang@intel.com>
Signed-off-by: Wu Hao <hao.wu@intel.com>
Signed-off-by: Xu Yilun <yilun.xu@intel.com>
----
v2: use unsigned int instead of int for irq array indexes in
dfl_fpga_set_irq_triggers()
Improves comments for NULL fds param in dfl_fpga_set_irq_triggers()
v3: Improve comments of dfl_fpga_set_irq_triggers()
refines code for dfl_fpga_set_irq_triggers, delete local variable j
v4: Introduce 2 helper functions to help handle the XXX_GET_IRQ_NUM &
XXX_SET_IRQ ioctls for sub feature drivers.
Error reporting interrupt is very useful to notify users that some
errors are detected by the hardware. Once users are notified, they
could query hardware logged error states, no need to continuously
poll on these states.
This patch adds interrupt support for port error reporting sub feature.
It follows the common DFL interrupt notification and handling mechanism,
implements two ioctl commands below for user to query number of irqs
supported, and set/unset interrupt triggers.
Ioctls:
* DFL_FPGA_PORT_ERR_GET_IRQ_NUM
get the number of irqs, which is used to determine whether/how many
interrupts error reporting feature supports.
* DFL_FPGA_PORT_ERR_SET_IRQ
set/unset given eventfds as error interrupt triggers.
Signed-off-by: Luwei Kang <luwei.kang@intel.com>
Signed-off-by: Wu Hao <hao.wu@intel.com>
Signed-off-by: Xu Yilun <yilun.xu@intel.com>
----
v2: use DFL_FPGA_PORT_ERR_GET_IRQ_NUM instead of
DFL_FPGA_PORT_ERR_GET_INFO
Delete flag field for DFL_FPGA_PORT_ERR_SET_IRQ param
v3: put_user() instead of copy_to_user()
improves comments
v4: use common functions to handle irq ioctls
Error reporting interrupt is very useful to notify users that some
errors are detected by the hardware. Once users are notified, they
could query hardware logged error states, no need to continuously
poll on these states.
This patch adds interrupt support for fme global error reporting sub
feature. It follows the common DFL interrupt notification and handling
mechanism. And it implements two ioctls below for user to query
number of irqs supported, and set/unset interrupt triggers.
Ioctls:
* DFL_FPGA_FME_ERR_GET_IRQ_NUM
get the number of irqs, which is used to determine whether/how many
interrupts fme error reporting feature supports.
* DFL_FPGA_FME_ERR_SET_IRQ
set/unset given eventfds as fme error reporting interrupt triggers.
Signed-off-by: Luwei Kang <luwei.kang@intel.com>
Signed-off-by: Wu Hao <hao.wu@intel.com>
Signed-off-by: Xu Yilun <yilun.xu@intel.com>
----
v2: use DFL_FPGA_FME_ERR_GET_IRQ_NUM instead of
DFL_FPGA_FME_ERR_GET_INFO
Delete flags field for DFL_FPGA_FME_ERR_SET_IRQ
v3: put_user() instead of copy_to_user()
improves comments
v4: use common functions to handle irq ioctls
AFU (Accelerated Function Unit) is dynamic region of the DFL based FPGA,
and always defined by users. Some DFL based FPGA cards allow users to
implement their own interrupts in AFU. In order to support this,
hardware implements a new UINT (AFU Interrupt) private feature with
related capability register which describes the number of supported
AFU interrupts as well as the local index of the interrupts for
software enumeration, and from software side, driver follows the common
DFL interrupt notification and handling mechanism, and it implements
two ioctls below for user to query number of irqs supported and set/unset
interrupt triggers.
Ioctls:
* DFL_FPGA_PORT_UINT_GET_IRQ_NUM
get the number of irqs, which is used to determine how many interrupts
UINT feature supports.
* DFL_FPGA_PORT_UINT_SET_IRQ
set/unset eventfds as AFU interrupt triggers.
Signed-off-by: Luwei Kang <luwei.kang@intel.com>
Signed-off-by: Wu Hao <hao.wu@intel.com>
Signed-off-by: Xu Yilun <yilun.xu@intel.com>
----
v2: use DFL_FPGA_PORT_UINT_GET_IRQ_NUM instead of
DFL_FPGA_PORT_UINT_GET_INFO
Delete flags field for DFL_FPGA_PORT_UINT_SET_IRQ
v3: put_user() instead of copy_to_user()
improves comments
v4: use common functions to handle irq ioctls
…rfaces. This patch adds introductions of interrupt related interfaces for FME error reporting, port error reporting and AFU user interrupts features. Signed-off-by: Luwei Kang <luwei.kang@intel.com> Signed-off-by: Wu Hao <hao.wu@intel.com> Signed-off-by: Xu Yilun <yilun.xu@intel.com> ---- v2: Update Documents cause change of irq ioctl interfaces. v3: No change v4: Update interrupt support part.
In early partial reconfiguration private feature, it only supports 32bit data width when writing data to hardware for PR. 512bit data width PR support is an important optimization for some specific solutions (e.g. XEON with FPGA integrated), it allows driver to use AVX512 instruction to improve the performance of partial reconfiguration. e.g. programming one 100MB bitstream image via this 512bit data width PR hardware only takes ~300ms, but 32bit revision requires ~3s per test result. Please note now this optimization is only done on revision 2 of this PR private feature which is only used in integrated solution that AVX512 is always supported. This revision 2 hardware doesn't support 32bit PR. Signed-off-by: Ananda Ravuri <ananda.ravuri@intel.com> Signed-off-by: Xu Yilun <yilun.xu@intel.com> Signed-off-by: Wu Hao <hao.wu@intel.com> Acked-by: Alan Tull <atull@kernel.org> Signed-off-by: Moritz Fischer <mdf@kernel.org>
This patch makes preparation for modularization of DFL sub feature
drivers.
Currently, if we need to support a new DFL sub feature, an entry should
be added to fme/port_feature_drvs[] in dfl-fme/port-main.c. And we need
to re-compile the whole DFL modules. That make the DFL drivers hard to be
extended.
Another consideration is that DFL may contain some IP blocks which are
already supported by kernel, most of them are supported by platform
device drivers. We could create platform devices for these IP blocks and
get them supported by these drivers.
An important issue is that platform device drivers usually requests mmio
resources on probe. But now dfl mmio is mapped in dfl bus driver (e.g.
dfl-pci) as a whole region. Then platform device drivers for sub features
can't request their own mmio resources again. This is what the patch
trying to resolve.
This patch changes the DFL enumeration. DFL bus driver will unmap mmio
resources after first step enumeration and pass enumeration info to DFL
framework. Then DFL framework will map the mmio resources again, do 2nd
step enumeration, and also unmap the mmio resources. In this way, sub
feature drivers could then request their own mmio resources as needed.
An exception is that mmio resource of FIU headers are still mapped in dfl
bus driver. The FIU headers have some fundamental functions (sriov set,
port enable/disable) needed for dfl bus devices and other sub features.
They should not be unmapped as long as dfl bus device is alive.
Signed-off-by: Xu Yilun <yilun.xu@intel.com>
Signed-off-by: Wu Hao <hao.wu@intel.com>
Signed-off-by: Matthew Gerlach <matthew.gerlach@linux.intel.com>
Signed-off-by: Russ Weight <russell.h.weight@intel.com>
----
v2: dfl-pci: move mapped bars out of cci_pci_ioremap_bar.
dfl: introduced a macro is_hdr_feature
delete feature->ioaddr = NULL in dfl_fpga_dev_feature_uinit()
rename ioremap_dfl_region -> dfl_map_iomem
refactor build_info_commit_dev & build_info_create_dev, make
binfo->start & ioaddr change out of these functions
some minor fixes
v3: merged Hao's code improvement patch.
v4: improves comments.
A new bus type "dfl" is introduced for private features which are not
initialized by DFL feature drivers (dfl-fme & dfl-afu drivers). So these
private features could be handled by separate driver modules.
DFL framework will create DFL devices on enumeration. DFL drivers could
be registered on this bus to match these DFL devices. They are matched by
dfl type & feature_id.
Signed-off-by: Xu Yilun <yilun.xu@intel.com>
Signed-off-by: Wu Hao <hao.wu@intel.com>
Signed-off-by: Matthew Gerlach <matthew.gerlach@linux.intel.com>
Signed-off-by: Russ Weight <russell.h.weight@intel.com>
----
v2: add struct dfl_fpga_cdev for struct dfl_sub_device
add driver_data for dflsub_device_id
v4: rename dflsub bus to dfl bus
dfl device naming change, delete .feature_id, dev_name(parent dev) +
feature_index is enough to make every name unique.
Add "type" & "feature_id" sysfs attrs for dfl devices.
Add documentation sys-bus-dfl for dfl device sysfs attrs
In order to support MODULE_DEVICE_TABLE() for dfl device driver, this patch moves struct dfl_device_id to mod_devicetable.h Signed-off-by: Xu Yilun <yilun.xu@intel.com> Signed-off-by: Wu Hao <hao.wu@intel.com> Signed-off-by: Matthew Gerlach <matthew.gerlach@linux.intel.com> Signed-off-by: Russ Weight <russell.h.weight@intel.com> ---- v4: rename dflsub bus to dfl bus
Device Feature List (DFL) is a linked list of feature headers within the device MMIO space. It is used by FPGA to enumerate multiple sub features within it. Each feature can be uniquely identified by DFL type and feature id, which can be read out from feature headers. A dfl bus helps DFL framework modularize DFL device drivers for different sub features. The dfl bus matches its devices and drivers by DFL type and feature id. This patch add dfl bus support to MODULE_DEVICE_TABLE() by adding info about struct dfl_device_id in devicetable-offsets.c and add a dfl entry point in file2alias.c. Signed-off-by: Xu Yilun <yilun.xu@intel.com> Signed-off-by: Wu Hao <hao.wu@intel.com> Signed-off-by: Matthew Gerlach <matthew.gerlach@linux.intel.com> Signed-off-by: Russ Weight <russell.h.weight@intel.com> ---- v4: rename dflsub to dfl
Add PCIe Device ID for Intel FPGA PAC N3000. Signed-off-by: Wu Hao <hao.wu@intel.com> Signed-off-by: Xu Yilun <yilun.xu@intel.com> Signed-off-by: Matthew Gerlach <matthew.gerlach@linux.intel.com> Signed-off-by: Russ Weight <russell.h.weight@intel.com> Reviewed-by: Wu Hao <hao.wu@intel.com>
Add support for 32bit width data register, then it supports 32bit data width spi slave device and spi transfers. Signed-off-by: Xu Yilun <yilun.xu@intel.com> Signed-off-by: Wu Hao <hao.wu@intel.com> Signed-off-by: Matthew Gerlach <matthew.gerlach@linux.intel.com> Signed-off-by: Russ Weight <russell.h.weight@intel.com> Reviewed-by: Wu Hao <hao.wu@intel.com>
This patch introduced SPI core parameters in platform data, it allows passing these SPI core parameters via platform data. Signed-off-by: Wu Hao <hao.wu@intel.com> Signed-off-by: Xu Yilun <yilun.xu@intel.com> Signed-off-by: Matthew Gerlach <matthew.gerlach@linux.intel.com> Signed-off-by: Russ Weight <russell.h.weight@intel.com> Reviewed-by: Wu Hao <hao.wu@intel.com>
This patch introduces platform data for slave information, it allows spi-altera to add new spi devices once master registration is done. Signed-off-by: Wu Hao <hao.wu@intel.com> Signed-off-by: Xu Yilun <yilun.xu@intel.com> Signed-off-by: Matthew Gerlach <matthew.gerlach@linux.intel.com> Signed-off-by: Russ Weight <russell.h.weight@intel.com> Reviewed-by: Wu Hao <hao.wu@intel.com>
This patch adds support for regmap. It allows this driver to be compatible if low layer register access method is changed in some cases. Signed-off-by: Matthew Gerlach <matthew.gerlach@linux.intel.com> Signed-off-by: Wu Hao <hao.wu@intel.com> Signed-off-by: Xu Yilun <yilun.xu@intel.com> Signed-off-by: Russ Weight <russell.h.weight@intel.com> Reviewed-by: Wu Hao <hao.wu@intel.com>
This allows other driver to reuse the name string for spi-altera platform device creation. Signed-off-by: Matthew Gerlach <matthew.gerlach@linux.intel.com> Signed-off-by: Wu Hao <hao.wu@intel.com> Signed-off-by: Xu Yilun <yilun.xu@intel.com> Signed-off-by: Russ Weight <russell.h.weight@intel.com> Reviewed-by: Wu Hao <hao.wu@intel.com>
The patch moves dfl-bus related APIs to include/linux/fpga/dfl-bus.h Now the DFL sub feature drivers could be made as independent modules and put in different folders according to their functionality. In order for scattered sub feature drivers to include dfl bus APIs, move the dfl bus APIs to a new header file in the public folder. Signed-off-by: Xu Yilun <yilun.xu@intel.com> Reviewed-by: Wu Hao <hao.wu@intel.com> Reviewed-by: Russ Weight <russell.h.weight@intel.com>
This patch adds support for the nios handshake private feature on Intel N3000 FPGA Card. This private feature provides a handshake interface to FPGA NIOS firmware, which receives retimer configuration command from host and executes via an internal SPI master. When nios finished the configuration, host takes over the ownership of the SPI master to control an Intel MAX10 BMC Chip on the SPI bus. For NIOS firmware handshake part, this driver requests the retimer configuration for NIOS with parameters from module param, and adds some sysfs nodes for user to query NIOS state. For SPI part, this driver adds a spi-altera platform device as well as the MAX10 BMC spi slave info. A spi-altera driver will be matched to handle following the SPI work. Signed-off-by: Xu Yilun <yilun.xu@intel.com> Signed-off-by: Wu Hao <hao.wu@intel.com> Signed-off-by: Matthew Gerlach <matthew.gerlach@linux.intel.com> Signed-off-by: Russ Weight <russell.h.weight@intel.com> Reviewed-by: Wu Hao <hao.wu@intel.com> Reviewed-by: Russ Weight <russell.h.weight@intel.com>
This patch implements the basic functions of the BMC chip for some Intel FPGA PCIe Acceleration Cards (PAC). The BMC is implemented using the intel max10 CPLD. This BMC chip is connected to FPGA by a SPI bus. To provide reliable register access from FPGA, an Avalon Memory-Mapped (Avmm) transaction protocol over the SPI bus is used between host and slave. This driver implements the basic register access with the regmap framework. The mfd cells array is empty now as a placeholder. Signed-off-by: Xu Yilun <yilun.xu@intel.com> Signed-off-by: Wu Hao <hao.wu@intel.com> Signed-off-by: Matthew Gerlach <matthew.gerlach@linux.intel.com> Signed-off-by: Russ Weight <russell.h.weight@intel.com> Reviewed-by: Wu Hao <hao.wu@intel.com> Reviewed-by: Russ Weight <russell.h.weight@intel.com>
The spi-altera driver was originally written with a 32 bit processor, where sizeof(unsigned long) is 4. On a 64 bit processor sizeof(unsigned long) is 8. Change the structure member to u32 to match the actual size of the control register. Signed-off-by: Matthew Gerlach <matthew.gerlach@linux.intel.com> Reviewed-by: Wu Hao <hao.wu@intel.com> Reviewed-by: Xu Yilun <yilun.xu@intel.com>
This is to fix lkp cppcheck warnings:
drivers/fpga/dfl-pci.c:230:6: warning: The scope of the variable 'ret' can be reduced. [variableScope]
int ret = 0;
^
drivers/fpga/dfl-pci.c:230:10: warning: Variable 'ret' is assigned a value that is never used. [unreadVariable]
int ret = 0;
^
Reported-by: kbuild test robot <lkp@intel.com>
Signed-off-by: Xu Yilun <yilun.xu@intel.com>
When putting the port in reset, driver must wait for the soft reset acknowledgment bit instead of the soft reset bit. Fixes: 47c1b19 (fpga: dfl: afu: add port ops support) Signed-off-by: Matthew Gerlach <matthew.gerlach@linux.intel.com> Signed-off-by: Xu Yilun <yilun.xu@intel.com>
This patch adds hwmon functionality for Intel MAX10 BMC chip. The MAX10 BMC chip connects to a set of sensor chips to monitor current, voltage, thermal and power of different components on board. BMC firmware is responsible for sensor data sampling and recording in shared registers. Host driver reads the sensor data from these shared registers and exposes them to users as hwmon interfaces. Signed-off-by: Xu Yilun <yilun.xu@intel.com> Signed-off-by: Wu Hao <hao.wu@intel.com> Signed-off-by: Matthew Gerlach <matthew.gerlach@linux.intel.com> Reviewed-by: Wu Hao <hao.wu@intel.com>
Create two sysfs entries for exposing the MAC address and count from the MAX10 BMC register space. Signed-off-by: Russ Weight <russell.h.weight@intel.com> Signed-off-by: Xu Yilun <yilun.xu@intel.com>
bda2dd8 to
6205d51
Compare
rweight
pushed a commit
that referenced
this pull request
Mar 31, 2021
…t context commit 5ae5fbd upstream. Running "perf mem record" in powerpc platforms with selinux enabled resulted in soft lockup's. Below call-trace was seen in the logs: CPU: 58 PID: 3751 Comm: sssd_nss Not tainted 5.11.0-rc7+ #2 NIP: c000000000dff3d4 LR: c000000000dff3d0 CTR: 0000000000000000 REGS: c000007fffab7d60 TRAP: 0100 Not tainted (5.11.0-rc7+) ... NIP _raw_spin_lock_irqsave+0x94/0x120 LR _raw_spin_lock_irqsave+0x90/0x120 Call Trace: 0xc00000000fd47260 (unreliable) skb_queue_tail+0x3c/0x90 audit_log_end+0x6c/0x180 common_lsm_audit+0xb0/0xe0 slow_avc_audit+0xa4/0x110 avc_has_perm+0x1c4/0x260 selinux_perf_event_open+0x74/0xd0 security_perf_event_open+0x68/0xc0 record_and_restart+0x6e8/0x7f0 perf_event_interrupt+0x22c/0x560 performance_monitor_exception0x4c/0x60 performance_monitor_common_virt+0x1c8/0x1d0 interrupt: f00 at _raw_spin_lock_irqsave+0x38/0x120 NIP: c000000000dff378 LR: c000000000b5fbbc CTR: c0000000007d47f0 REGS: c00000000fd47860 TRAP: 0f00 Not tainted (5.11.0-rc7+) ... NIP _raw_spin_lock_irqsave+0x38/0x120 LR skb_queue_tail+0x3c/0x90 interrupt: f00 0x38 (unreliable) 0xc00000000aae6200 audit_log_end+0x6c/0x180 audit_log_exit+0x344/0xf80 __audit_syscall_exit+0x2c0/0x320 do_syscall_trace_leave+0x148/0x200 syscall_exit_prepare+0x324/0x390 system_call_common+0xfc/0x27c The above trace shows that while the CPU was handling a performance monitor exception, there was a call to security_perf_event_open() function. In powerpc core-book3s, this function is called from perf_allow_kernel() check during recording of data address in the sample via perf_get_data_addr(). Commit da97e18 ("perf_event: Add support for LSM and SELinux checks") introduced security enhancements to perf. As part of this commit, the new security hook for perf_event_open() was added in all places where perf paranoid check was previously used. In powerpc core-book3s code, originally had paranoid checks in perf_get_data_addr() and power_pmu_bhrb_read(). So perf_paranoid_kernel() checks were replaced with perf_allow_kernel() in these PMU helper functions as well. The intention of paranoid checks in core-book3s was to verify privilege access before capturing some of the sample data. Along with paranoid checks, perf_allow_kernel() also does a security_perf_event_open(). Since these functions are accessed while recording a sample, we end up calling selinux_perf_event_open() in PMI context. Some of the security functions use spinlock like sidtab_sid2str_put(). If a perf interrupt hits under a spin lock and if we end up in calling selinux hook functions in PMI handler, this could cause a dead lock. Since the purpose of this security hook is to control access to perf_event_open(), it is not right to call this in interrupt context. The paranoid checks in powerpc core-book3s were done at interrupt time which is also not correct. Reference commits: Commit cd1231d ("powerpc/perf: Prevent kernel address leak via perf_get_data_addr()") Commit bb19af8 ("powerpc/perf: Prevent kernel address leak to userspace via BHRB buffer") We only allow creation of events that have already passed the privilege checks in perf_event_open(). So these paranoid checks are not needed at event time. As a fix, patch uses 'event->attr.exclude_kernel' check to prevent exposing kernel address for userspace only sampling. Fixes: cd1231d ("powerpc/perf: Prevent kernel address leak via perf_get_data_addr()") Cc: stable@vger.kernel.org # v4.17+ Suggested-by: Michael Ellerman <mpe@ellerman.id.au> Signed-off-by: Athira Rajeev <atrajeev@linux.vnet.ibm.com> Acked-by: Peter Zijlstra (Intel) <peterz@infradead.org> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://lore.kernel.org/r/1614247839-1428-1-git-send-email-atrajeev@linux.vnet.ibm.com Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
rweight
pushed a commit
that referenced
this pull request
Mar 31, 2021
commit a7889c6 upstream. afs_listxattr() lists all the available special afs xattrs (i.e. those in the "afs.*" space), no matter what type of server we're dealing with. But OpenAFS servers, for example, cannot deal with some of the extra-capable attributes that AuriStor (YFS) servers provide. Unfortunately, the presence of the afs.yfs.* attributes causes errors[1] for anything that tries to read them if the server is of the wrong type. Fix the problem by removing afs_listxattr() so that none of the special xattrs are listed (AFS doesn't support xattrs). It does mean, however, that getfattr won't list them, though they can still be accessed with getxattr() and setxattr(). This can be tested with something like: getfattr -d -m ".*" /afs/example.com/path/to/file With this change, none of the afs.* attributes should be visible. Changes: ver #2: - Hide all of the afs.* xattrs, not just the ACL ones. Fixes: ae46578 ("afs: Get YFS ACLs and information through xattrs") Reported-by: Gaja Sophie Peters <gaja.peters@math.uni-hamburg.de> Signed-off-by: David Howells <dhowells@redhat.com> Tested-by: Gaja Sophie Peters <gaja.peters@math.uni-hamburg.de> Reviewed-by: Jeffrey Altman <jaltman@auristor.com> Reviewed-by: Marc Dionne <marc.dionne@auristor.com> cc: linux-afs@lists.infradead.org Link: http://lists.infradead.org/pipermail/linux-afs/2021-March/003502.html [1] Link: http://lists.infradead.org/pipermail/linux-afs/2021-March/003567.html # v1 Link: http://lists.infradead.org/pipermail/linux-afs/2021-March/003573.html # v2 Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
rweight
pushed a commit
that referenced
this pull request
Mar 31, 2021
commit c7de87f upstream. [ This problem is in mainline, but only rt has the chops to be able to detect it. ] Lockdep reports a circular lock dependency between serv->sv_lock and softirq_ctl.lock on system shutdown, when using a kernel built with CONFIG_PREEMPT_RT=y, and a nfs mount exists. This is due to the definition of spin_lock_bh on rt: local_bh_disable(); rt_spin_lock(lock); which forces a softirq_ctl.lock -> serv->sv_lock dependency. This is not a problem as long as _every_ lock of serv->sv_lock is a: spin_lock_bh(&serv->sv_lock); but there is one of the form: spin_lock(&serv->sv_lock); This is what is causing the circular dependency splat. The spin_lock() grabs the lock without first grabbing softirq_ctl.lock via local_bh_disable. If later on in the critical region, someone does a local_bh_disable, we get a serv->sv_lock -> softirq_ctrl.lock dependency established. Deadlock. Fix is to make serv->sv_lock be locked with spin_lock_bh everywhere, no exceptions. [ OK ] Stopped target NFS client services. Stopping Logout off all iSCSI sessions on shutdown... Stopping NFS server and services... [ 109.442380] [ 109.442385] ====================================================== [ 109.442386] WARNING: possible circular locking dependency detected [ 109.442387] 5.10.16-rt30 #1 Not tainted [ 109.442389] ------------------------------------------------------ [ 109.442390] nfsd/1032 is trying to acquire lock: [ 109.442392] ffff994237617f60 ((softirq_ctrl.lock).lock){+.+.}-{2:2}, at: __local_bh_disable_ip+0xd9/0x270 [ 109.442405] [ 109.442405] but task is already holding lock: [ 109.442406] ffff994245cb00b0 (&serv->sv_lock){+.+.}-{0:0}, at: svc_close_list+0x1f/0x90 [ 109.442415] [ 109.442415] which lock already depends on the new lock. [ 109.442415] [ 109.442416] [ 109.442416] the existing dependency chain (in reverse order) is: [ 109.442417] [ 109.442417] -> #1 (&serv->sv_lock){+.+.}-{0:0}: [ 109.442421] rt_spin_lock+0x2b/0xc0 [ 109.442428] svc_add_new_perm_xprt+0x42/0xa0 [ 109.442430] svc_addsock+0x135/0x220 [ 109.442434] write_ports+0x4b3/0x620 [ 109.442438] nfsctl_transaction_write+0x45/0x80 [ 109.442440] vfs_write+0xff/0x420 [ 109.442444] ksys_write+0x4f/0xc0 [ 109.442446] do_syscall_64+0x33/0x40 [ 109.442450] entry_SYSCALL_64_after_hwframe+0x44/0xa9 [ 109.442454] [ 109.442454] -> #0 ((softirq_ctrl.lock).lock){+.+.}-{2:2}: [ 109.442457] __lock_acquire+0x1264/0x20b0 [ 109.442463] lock_acquire+0xc2/0x400 [ 109.442466] rt_spin_lock+0x2b/0xc0 [ 109.442469] __local_bh_disable_ip+0xd9/0x270 [ 109.442471] svc_xprt_do_enqueue+0xc0/0x4d0 [ 109.442474] svc_close_list+0x60/0x90 [ 109.442476] svc_close_net+0x49/0x1a0 [ 109.442478] svc_shutdown_net+0x12/0x40 [ 109.442480] nfsd_destroy+0xc5/0x180 [ 109.442482] nfsd+0x1bc/0x270 [ 109.442483] kthread+0x194/0x1b0 [ 109.442487] ret_from_fork+0x22/0x30 [ 109.442492] [ 109.442492] other info that might help us debug this: [ 109.442492] [ 109.442493] Possible unsafe locking scenario: [ 109.442493] [ 109.442493] CPU0 CPU1 [ 109.442494] ---- ---- [ 109.442495] lock(&serv->sv_lock); [ 109.442496] lock((softirq_ctrl.lock).lock); [ 109.442498] lock(&serv->sv_lock); [ 109.442499] lock((softirq_ctrl.lock).lock); [ 109.442501] [ 109.442501] *** DEADLOCK *** [ 109.442501] [ 109.442501] 3 locks held by nfsd/1032: [ 109.442503] #0: ffffffff93b49258 (nfsd_mutex){+.+.}-{3:3}, at: nfsd+0x19a/0x270 [ 109.442508] #1: ffff994245cb00b0 (&serv->sv_lock){+.+.}-{0:0}, at: svc_close_list+0x1f/0x90 [ 109.442512] #2: ffffffff93a81b20 (rcu_read_lock){....}-{1:2}, at: rt_spin_lock+0x5/0xc0 [ 109.442518] [ 109.442518] stack backtrace: [ 109.442519] CPU: 0 PID: 1032 Comm: nfsd Not tainted 5.10.16-rt30 #1 [ 109.442522] Hardware name: Supermicro X9DRL-3F/iF/X9DRL-3F/iF, BIOS 3.2 09/22/2015 [ 109.442524] Call Trace: [ 109.442527] dump_stack+0x77/0x97 [ 109.442533] check_noncircular+0xdc/0xf0 [ 109.442546] __lock_acquire+0x1264/0x20b0 [ 109.442553] lock_acquire+0xc2/0x400 [ 109.442564] rt_spin_lock+0x2b/0xc0 [ 109.442570] __local_bh_disable_ip+0xd9/0x270 [ 109.442573] svc_xprt_do_enqueue+0xc0/0x4d0 [ 109.442577] svc_close_list+0x60/0x90 [ 109.442581] svc_close_net+0x49/0x1a0 [ 109.442585] svc_shutdown_net+0x12/0x40 [ 109.442588] nfsd_destroy+0xc5/0x180 [ 109.442590] nfsd+0x1bc/0x270 [ 109.442595] kthread+0x194/0x1b0 [ 109.442600] ret_from_fork+0x22/0x30 [ 109.518225] nfsd: last server has exited, flushing export cache [ OK ] Stopped NFSv4 ID-name mapping service. [ OK ] Stopped GSSAPI Proxy Daemon. [ OK ] Stopped NFS Mount Daemon. [ OK ] Stopped NFS status monitor for NFSv2/3 locking.. Fixes: 719f8bc ("svcrpc: fix xpt_list traversal locking on shutdown") Signed-off-by: Joe Korty <joe.korty@concurrent-rt.com> Signed-off-by: Chuck Lever <chuck.lever@oracle.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
rweight
pushed a commit
that referenced
this pull request
Mar 31, 2021
[ Upstream commit 4deb550 ] label err_eni_release is reachable when eni_start() fail. In eni_start() it calls dev->phy->start() in the last step, if start() fail we don't need to call phy->stop(), if start() is never called, we neither need to call phy->stop(), otherwise null-ptr-deref will happen. In order to fix this issue, don't call phy->stop() in label err_eni_release [ 4.875714] ================================================================== [ 4.876091] BUG: KASAN: null-ptr-deref in suni_stop+0x47/0x100 [suni] [ 4.876433] Read of size 8 at addr 0000000000000030 by task modprobe/95 [ 4.876778] [ 4.876862] CPU: 0 PID: 95 Comm: modprobe Not tainted 5.11.0-rc7-00090-gdcc0b49040c7 #2 [ 4.877290] Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS rel-1.13.0-48-gd94 [ 4.877876] Call Trace: [ 4.878009] dump_stack+0x7d/0xa3 [ 4.878191] kasan_report.cold+0x10c/0x10e [ 4.878410] ? __slab_free+0x2f0/0x340 [ 4.878612] ? suni_stop+0x47/0x100 [suni] [ 4.878832] suni_stop+0x47/0x100 [suni] [ 4.879043] eni_do_release+0x3b/0x70 [eni] [ 4.879269] eni_init_one.cold+0x1152/0x1747 [eni] [ 4.879528] ? _raw_spin_lock_irqsave+0x7b/0xd0 [ 4.879768] ? eni_ioctl+0x270/0x270 [eni] [ 4.879990] ? __mutex_lock_slowpath+0x10/0x10 [ 4.880226] ? eni_ioctl+0x270/0x270 [eni] [ 4.880448] local_pci_probe+0x6f/0xb0 [ 4.880650] pci_device_probe+0x171/0x240 [ 4.880864] ? pci_device_remove+0xe0/0xe0 [ 4.881086] ? kernfs_create_link+0xb6/0x110 [ 4.881315] ? sysfs_do_create_link_sd.isra.0+0x76/0xe0 [ 4.881594] really_probe+0x161/0x420 [ 4.881791] driver_probe_device+0x6d/0xd0 [ 4.882010] device_driver_attach+0x82/0x90 [ 4.882233] ? device_driver_attach+0x90/0x90 [ 4.882465] __driver_attach+0x60/0x100 [ 4.882671] ? device_driver_attach+0x90/0x90 [ 4.882903] bus_for_each_dev+0xe1/0x140 [ 4.883114] ? subsys_dev_iter_exit+0x10/0x10 [ 4.883346] ? klist_node_init+0x61/0x80 [ 4.883557] bus_add_driver+0x254/0x2a0 [ 4.883764] driver_register+0xd3/0x150 [ 4.883971] ? 0xffffffffc0038000 [ 4.884149] do_one_initcall+0x84/0x250 [ 4.884355] ? trace_event_raw_event_initcall_finish+0x150/0x150 [ 4.884674] ? unpoison_range+0xf/0x30 [ 4.884875] ? ____kasan_kmalloc.constprop.0+0x84/0xa0 [ 4.885150] ? unpoison_range+0xf/0x30 [ 4.885352] ? unpoison_range+0xf/0x30 [ 4.885557] do_init_module+0xf8/0x350 [ 4.885760] load_module+0x3fe6/0x4340 [ 4.885960] ? vm_unmap_ram+0x1d0/0x1d0 [ 4.886166] ? ____kasan_kmalloc.constprop.0+0x84/0xa0 [ 4.886441] ? module_frob_arch_sections+0x20/0x20 [ 4.886697] ? __do_sys_finit_module+0x108/0x170 [ 4.886941] __do_sys_finit_module+0x108/0x170 [ 4.887178] ? __ia32_sys_init_module+0x40/0x40 [ 4.887419] ? file_open_root+0x200/0x200 [ 4.887634] ? do_sys_open+0x85/0xe0 [ 4.887826] ? filp_open+0x50/0x50 [ 4.888009] ? fpregs_assert_state_consistent+0x4d/0x60 [ 4.888287] ? exit_to_user_mode_prepare+0x2f/0x130 [ 4.888547] do_syscall_64+0x33/0x40 [ 4.888739] entry_SYSCALL_64_after_hwframe+0x44/0xa9 [ 4.889010] RIP: 0033:0x7ff62fcf1cf7 [ 4.889202] Code: 48 89 57 30 48 8b 04 24 48 89 47 38 e9 1d a0 02 00 48 89 f8 48 89 f71 [ 4.890172] RSP: 002b:00007ffe6644ade8 EFLAGS: 00000246 ORIG_RAX: 0000000000000139 [ 4.890570] RAX: ffffffffffffffda RBX: 0000000000f2ca70 RCX: 00007ff62fcf1cf7 [ 4.890944] RDX: 0000000000000000 RSI: 0000000000f2b9e0 RDI: 0000000000000003 [ 4.891318] RBP: 0000000000000003 R08: 0000000000000000 R09: 0000000000000001 [ 4.891691] R10: 00007ff62fd55300 R11: 0000000000000246 R12: 0000000000f2b9e0 [ 4.892064] R13: 0000000000000000 R14: 0000000000f2bdd0 R15: 0000000000000001 [ 4.892439] ================================================================== Signed-off-by: Tong Zhang <ztong0001@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Sasha Levin <sashal@kernel.org>
17994ae to
827c98d
Compare
73038db to
7279a7d
Compare
rweight
pushed a commit
that referenced
this pull request
Apr 28, 2021
commit 90bd070 upstream. The following deadlock is detected: truncate -> setattr path is waiting for pending direct IO to be done (inode->i_dio_count become zero) with inode->i_rwsem held (down_write). PID: 14827 TASK: ffff881686a9af80 CPU: 20 COMMAND: "ora_p005_hrltd9" #0 __schedule at ffffffff818667cc #1 schedule at ffffffff81866de6 #2 inode_dio_wait at ffffffff812a2d04 #3 ocfs2_setattr at ffffffffc05f322e [ocfs2] #4 notify_change at ffffffff812a5a09 #5 do_truncate at ffffffff812808f5 #6 do_sys_ftruncate.constprop.18 at ffffffff81280cf2 #7 sys_ftruncate at ffffffff81280d8e #8 do_syscall_64 at ffffffff81003949 #9 entry_SYSCALL_64_after_hwframe at ffffffff81a001ad dio completion path is going to complete one direct IO (decrement inode->i_dio_count), but before that it hung at locking inode->i_rwsem: #0 __schedule+700 at ffffffff818667cc #1 schedule+54 at ffffffff81866de6 #2 rwsem_down_write_failed+536 at ffffffff8186aa28 #3 call_rwsem_down_write_failed+23 at ffffffff8185a1b7 #4 down_write+45 at ffffffff81869c9d #5 ocfs2_dio_end_io_write+180 at ffffffffc05d5444 [ocfs2] #6 ocfs2_dio_end_io+85 at ffffffffc05d5a85 [ocfs2] #7 dio_complete+140 at ffffffff812c873c #8 dio_aio_complete_work+25 at ffffffff812c89f9 #9 process_one_work+361 at ffffffff810b1889 #10 worker_thread+77 at ffffffff810b233d #11 kthread+261 at ffffffff810b7fd5 #12 ret_from_fork+62 at ffffffff81a0035e Thus above forms ABBA deadlock. The same deadlock was mentioned in upstream commit 28f5a8a ("ocfs2: should wait dio before inode lock in ocfs2_setattr()"). It seems that that commit only removed the cluster lock (the victim of above dead lock) from the ABBA deadlock party. End-user visible effects: Process hang in truncate -> ocfs2_setattr path and other processes hang at ocfs2_dio_end_io_write path. This is to fix the deadlock itself. It removes inode_lock() call from dio completion path to remove the deadlock and add ip_alloc_sem lock in setattr path to synchronize the inode modifications. [wen.gang.wang@oracle.com: remove the "had_alloc_lock" as suggested] Link: https://lkml.kernel.org/r/20210402171344.1605-1-wen.gang.wang@oracle.com Link: https://lkml.kernel.org/r/20210331203654.3911-1-wen.gang.wang@oracle.com Signed-off-by: Wengang Wang <wen.gang.wang@oracle.com> Reviewed-by: Joseph Qi <joseph.qi@linux.alibaba.com> Cc: Mark Fasheh <mark@fasheh.com> Cc: Joel Becker <jlbec@evilplan.org> Cc: Junxiao Bi <junxiao.bi@oracle.com> Cc: Changwei Ge <gechangwei@live.cn> Cc: Gang He <ghe@suse.com> Cc: Jun Piao <piaojun@huawei.com> Cc: <stable@vger.kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
rweight
pushed a commit
that referenced
this pull request
May 20, 2021
An out of bounds write happens when setting the default power state. KASAN sees this as: [drm] radeon: 512M of GTT memory ready. [drm] GART: num cpu pages 131072, num gpu pages 131072 ================================================================== BUG: KASAN: slab-out-of-bounds in radeon_atombios_parse_power_table_1_3+0x1837/0x1998 [radeon] Write of size 4 at addr ffff88810178d858 by task systemd-udevd/157 CPU: 0 PID: 157 Comm: systemd-udevd Not tainted 5.12.0-E620 #50 Hardware name: eMachines eMachines E620 /Nile , BIOS V1.03 09/30/2008 Call Trace: dump_stack+0xa5/0xe6 print_address_description.constprop.0+0x18/0x239 kasan_report+0x170/0x1a8 radeon_atombios_parse_power_table_1_3+0x1837/0x1998 [radeon] radeon_atombios_get_power_modes+0x144/0x1888 [radeon] radeon_pm_init+0x1019/0x1904 [radeon] rs690_init+0x76e/0x84a [radeon] radeon_device_init+0x1c1a/0x21e5 [radeon] radeon_driver_load_kms+0xf5/0x30b [radeon] drm_dev_register+0x255/0x4a0 [drm] radeon_pci_probe+0x246/0x2f6 [radeon] pci_device_probe+0x1aa/0x294 really_probe+0x30e/0x850 driver_probe_device+0xe6/0x135 device_driver_attach+0xc1/0xf8 __driver_attach+0x13f/0x146 bus_for_each_dev+0xfa/0x146 bus_add_driver+0x2b3/0x447 driver_register+0x242/0x2c1 do_one_initcall+0x149/0x2fd do_init_module+0x1ae/0x573 load_module+0x4dee/0x5cca __do_sys_finit_module+0xf1/0x140 do_syscall_64+0x33/0x40 entry_SYSCALL_64_after_hwframe+0x44/0xae Without KASAN, this will manifest later when the kernel attempts to allocate memory that was stomped, since it collides with the inline slab freelist pointer: invalid opcode: 0000 [#1] SMP NOPTI CPU: 0 PID: 781 Comm: openrc-run.sh Tainted: G W 5.10.12-gentoo-E620 #2 Hardware name: eMachines eMachines E620 /Nile , BIOS V1.03 09/30/2008 RIP: 0010:kfree+0x115/0x230 Code: 89 c5 e8 75 ea ff ff 48 8b 00 0f ba e0 09 72 63 e8 1f f4 ff ff 41 89 c4 48 8b 45 00 0f ba e0 10 72 0a 48 8b 45 08 a8 01 75 02 <0f> 0b 44 89 e1 48 c7 c2 00 f0 ff ff be 06 00 00 00 48 d3 e2 48 c7 RSP: 0018:ffffb42f40267e10 EFLAGS: 00010246 RAX: ffffd61280ee8d88 RBX: 0000000000000004 RCX: 000000008010000d RDX: 4000000000000000 RSI: ffffffffba1360b0 RDI: ffffd61280ee8d80 RBP: ffffd61280ee8d80 R08: ffffffffb91bebdf R09: 0000000000000000 R10: ffff8fe2c1047ac8 R11: 0000000000000000 R12: 0000000000000000 R13: 0000000000000000 R14: 0000000000000000 R15: 0000000000000100 FS: 00007fe80eff6b68(0000) GS:ffff8fe339c00000(0000) knlGS:0000000000000000 CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 CR2: 00007fe80eec7bc0 CR3: 0000000038012000 CR4: 00000000000006f0 Call Trace: __free_fdtable+0x16/0x1f put_files_struct+0x81/0x9b do_exit+0x433/0x94d do_group_exit+0xa6/0xa6 __x64_sys_exit_group+0xf/0xf do_syscall_64+0x33/0x40 entry_SYSCALL_64_after_hwframe+0x44/0xa9 RIP: 0033:0x7fe80ef64bea Code: Unable to access opcode bytes at RIP 0x7fe80ef64bc0. RSP: 002b:00007ffdb1c47528 EFLAGS: 00000246 ORIG_RAX: 00000000000000e7 RAX: ffffffffffffffda RBX: 0000000000000003 RCX: 00007fe80ef64bea RDX: 00007fe80ef64f60 RSI: 0000000000000000 RDI: 0000000000000000 RBP: 0000000000000000 R08: 0000000000000001 R09: 0000000000000000 R10: 00007fe80ee2c620 R11: 0000000000000246 R12: 00007fe80eff41e0 R13: 00000000ffffffff R14: 0000000000000024 R15: 00007fe80edf9cd0 Modules linked in: radeon(+) ath5k(+) snd_hda_codec_realtek ... Use a valid power_state index when initializing the "flags" and "misc" and "misc2" fields. Bug: https://bugzilla.kernel.org/show_bug.cgi?id=211537 Reported-by: Erhard F. <erhard_f@mailbox.org> Fixes: a48b9b4 ("drm/radeon/kms/pm: add asic specific callbacks for getting power state (v2)") Fixes: 79daedc ("drm/radeon/kms: minor pm cleanups") Signed-off-by: Kees Cook <keescook@chromium.org> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
rweight
pushed a commit
that referenced
this pull request
May 20, 2021
We get a bug: BUG: KASAN: slab-out-of-bounds in iov_iter_revert+0x11c/0x404 lib/iov_iter.c:1139 Read of size 8 at addr ffff0000d3fb11f8 by task CPU: 0 PID: 12582 Comm: syz-executor.2 Not tainted 5.10.0-00843-g352c8610ccd2 #2 Hardware name: linux,dummy-virt (DT) Call trace: dump_backtrace+0x0/0x2d0 arch/arm64/kernel/stacktrace.c:132 show_stack+0x28/0x34 arch/arm64/kernel/stacktrace.c:196 __dump_stack lib/dump_stack.c:77 [inline] dump_stack+0x110/0x164 lib/dump_stack.c:118 print_address_description+0x78/0x5c8 mm/kasan/report.c:385 __kasan_report mm/kasan/report.c:545 [inline] kasan_report+0x148/0x1e4 mm/kasan/report.c:562 check_memory_region_inline mm/kasan/generic.c:183 [inline] __asan_load8+0xb4/0xbc mm/kasan/generic.c:252 iov_iter_revert+0x11c/0x404 lib/iov_iter.c:1139 io_read fs/io_uring.c:3421 [inline] io_issue_sqe+0x2344/0x2d64 fs/io_uring.c:5943 __io_queue_sqe+0x19c/0x520 fs/io_uring.c:6260 io_queue_sqe+0x2a4/0x590 fs/io_uring.c:6326 io_submit_sqe fs/io_uring.c:6395 [inline] io_submit_sqes+0x4c0/0xa04 fs/io_uring.c:6624 __do_sys_io_uring_enter fs/io_uring.c:9013 [inline] __se_sys_io_uring_enter fs/io_uring.c:8960 [inline] __arm64_sys_io_uring_enter+0x190/0x708 fs/io_uring.c:8960 __invoke_syscall arch/arm64/kernel/syscall.c:36 [inline] invoke_syscall arch/arm64/kernel/syscall.c:48 [inline] el0_svc_common arch/arm64/kernel/syscall.c:158 [inline] do_el0_svc+0x120/0x290 arch/arm64/kernel/syscall.c:227 el0_svc+0x1c/0x28 arch/arm64/kernel/entry-common.c:367 el0_sync_handler+0x98/0x170 arch/arm64/kernel/entry-common.c:383 el0_sync+0x140/0x180 arch/arm64/kernel/entry.S:670 Allocated by task 12570: stack_trace_save+0x80/0xb8 kernel/stacktrace.c:121 kasan_save_stack mm/kasan/common.c:48 [inline] kasan_set_track mm/kasan/common.c:56 [inline] __kasan_kmalloc+0xdc/0x120 mm/kasan/common.c:461 kasan_kmalloc+0xc/0x14 mm/kasan/common.c:475 __kmalloc+0x23c/0x334 mm/slub.c:3970 kmalloc include/linux/slab.h:557 [inline] __io_alloc_async_data+0x68/0x9c fs/io_uring.c:3210 io_setup_async_rw fs/io_uring.c:3229 [inline] io_read fs/io_uring.c:3436 [inline] io_issue_sqe+0x2954/0x2d64 fs/io_uring.c:5943 __io_queue_sqe+0x19c/0x520 fs/io_uring.c:6260 io_queue_sqe+0x2a4/0x590 fs/io_uring.c:6326 io_submit_sqe fs/io_uring.c:6395 [inline] io_submit_sqes+0x4c0/0xa04 fs/io_uring.c:6624 __do_sys_io_uring_enter fs/io_uring.c:9013 [inline] __se_sys_io_uring_enter fs/io_uring.c:8960 [inline] __arm64_sys_io_uring_enter+0x190/0x708 fs/io_uring.c:8960 __invoke_syscall arch/arm64/kernel/syscall.c:36 [inline] invoke_syscall arch/arm64/kernel/syscall.c:48 [inline] el0_svc_common arch/arm64/kernel/syscall.c:158 [inline] do_el0_svc+0x120/0x290 arch/arm64/kernel/syscall.c:227 el0_svc+0x1c/0x28 arch/arm64/kernel/entry-common.c:367 el0_sync_handler+0x98/0x170 arch/arm64/kernel/entry-common.c:383 el0_sync+0x140/0x180 arch/arm64/kernel/entry.S:670 Freed by task 12570: stack_trace_save+0x80/0xb8 kernel/stacktrace.c:121 kasan_save_stack mm/kasan/common.c:48 [inline] kasan_set_track+0x38/0x6c mm/kasan/common.c:56 kasan_set_free_info+0x20/0x40 mm/kasan/generic.c:355 __kasan_slab_free+0x124/0x150 mm/kasan/common.c:422 kasan_slab_free+0x10/0x1c mm/kasan/common.c:431 slab_free_hook mm/slub.c:1544 [inline] slab_free_freelist_hook mm/slub.c:1577 [inline] slab_free mm/slub.c:3142 [inline] kfree+0x104/0x38c mm/slub.c:4124 io_dismantle_req fs/io_uring.c:1855 [inline] __io_free_req+0x70/0x254 fs/io_uring.c:1867 io_put_req_find_next fs/io_uring.c:2173 [inline] __io_queue_sqe+0x1fc/0x520 fs/io_uring.c:6279 __io_req_task_submit+0x154/0x21c fs/io_uring.c:2051 io_req_task_submit+0x2c/0x44 fs/io_uring.c:2063 task_work_run+0xdc/0x128 kernel/task_work.c:151 get_signal+0x6f8/0x980 kernel/signal.c:2562 do_signal+0x108/0x3a4 arch/arm64/kernel/signal.c:658 do_notify_resume+0xbc/0x25c arch/arm64/kernel/signal.c:722 work_pending+0xc/0x180 blkdev_read_iter can truncate iov_iter's count since the count + pos may exceed the size of the blkdev. This will confuse io_read that we have consume the iovec. And once we do the iov_iter_revert in io_read, we will trigger the slab-out-of-bounds. Fix it by reexpand the count with size has been truncated. blkdev_write_iter can trigger the problem too. Signed-off-by: yangerkun <yangerkun@huawei.com> Acked-by: Pavel Begunkov <asml.silencec@gmail.com> Link: https://lore.kernel.org/r/20210401071807.3328235-1-yangerkun@huawei.com Signed-off-by: Jens Axboe <axboe@kernel.dk>
rweight
pushed a commit
that referenced
this pull request
Jun 17, 2021
[ Upstream commit b5332a9 ] We are not changing anything in the TCP connection state so we should not take a write_lock but rather a read lock. This caused a deadlock when running nvmet-tcp and nvme-tcp on the same system, where state_change callbacks on the host and on the controller side have causal relationship and made lockdep report on this with blktests: ================================ WARNING: inconsistent lock state 5.12.0-rc3 #1 Tainted: G I -------------------------------- inconsistent {IN-SOFTIRQ-W} -> {SOFTIRQ-ON-R} usage. nvme/1324 [HC0[0]:SC0[0]:HE1:SE1] takes: ffff888363151000 (clock-AF_INET){++-?}-{2:2}, at: nvme_tcp_state_change+0x21/0x150 [nvme_tcp] {IN-SOFTIRQ-W} state was registered at: __lock_acquire+0x79b/0x18d0 lock_acquire+0x1ca/0x480 _raw_write_lock_bh+0x39/0x80 nvmet_tcp_state_change+0x21/0x170 [nvmet_tcp] tcp_fin+0x2a8/0x780 tcp_data_queue+0xf94/0x1f20 tcp_rcv_established+0x6ba/0x1f00 tcp_v4_do_rcv+0x502/0x760 tcp_v4_rcv+0x257e/0x3430 ip_protocol_deliver_rcu+0x69/0x6a0 ip_local_deliver_finish+0x1e2/0x2f0 ip_local_deliver+0x1a2/0x420 ip_rcv+0x4fb/0x6b0 __netif_receive_skb_one_core+0x162/0x1b0 process_backlog+0x1ff/0x770 __napi_poll.constprop.0+0xa9/0x5c0 net_rx_action+0x7b3/0xb30 __do_softirq+0x1f0/0x940 do_softirq+0xa1/0xd0 __local_bh_enable_ip+0xd8/0x100 ip_finish_output2+0x6b7/0x18a0 __ip_queue_xmit+0x706/0x1aa0 __tcp_transmit_skb+0x2068/0x2e20 tcp_write_xmit+0xc9e/0x2bb0 __tcp_push_pending_frames+0x92/0x310 inet_shutdown+0x158/0x300 __nvme_tcp_stop_queue+0x36/0x270 [nvme_tcp] nvme_tcp_stop_queue+0x87/0xb0 [nvme_tcp] nvme_tcp_teardown_admin_queue+0x69/0xe0 [nvme_tcp] nvme_do_delete_ctrl+0x100/0x10c [nvme_core] nvme_sysfs_delete.cold+0x8/0xd [nvme_core] kernfs_fop_write_iter+0x2c7/0x460 new_sync_write+0x36c/0x610 vfs_write+0x5c0/0x870 ksys_write+0xf9/0x1d0 do_syscall_64+0x33/0x40 entry_SYSCALL_64_after_hwframe+0x44/0xae irq event stamp: 10687 hardirqs last enabled at (10687): [<ffffffff9ec376bd>] _raw_spin_unlock_irqrestore+0x2d/0x40 hardirqs last disabled at (10686): [<ffffffff9ec374d8>] _raw_spin_lock_irqsave+0x68/0x90 softirqs last enabled at (10684): [<ffffffff9f000608>] __do_softirq+0x608/0x940 softirqs last disabled at (10649): [<ffffffff9cdedd31>] do_softirq+0xa1/0xd0 other info that might help us debug this: Possible unsafe locking scenario: CPU0 ---- lock(clock-AF_INET); <Interrupt> lock(clock-AF_INET); *** DEADLOCK *** 5 locks held by nvme/1324: #0: ffff8884a01fe470 (sb_writers#4){.+.+}-{0:0}, at: ksys_write+0xf9/0x1d0 #1: ffff8886e435c090 (&of->mutex){+.+.}-{3:3}, at: kernfs_fop_write_iter+0x216/0x460 #2: ffff888104d90c38 (kn->active#255){++++}-{0:0}, at: kernfs_remove_self+0x22d/0x330 #3: ffff8884634538d0 (&queue->queue_lock){+.+.}-{3:3}, at: nvme_tcp_stop_queue+0x52/0xb0 [nvme_tcp] #4: ffff888363150d30 (sk_lock-AF_INET){+.+.}-{0:0}, at: inet_shutdown+0x59/0x300 stack backtrace: CPU: 26 PID: 1324 Comm: nvme Tainted: G I 5.12.0-rc3 #1 Hardware name: Dell Inc. PowerEdge R640/06NR82, BIOS 2.10.0 11/12/2020 Call Trace: dump_stack+0x93/0xc2 mark_lock_irq.cold+0x2c/0xb3 ? verify_lock_unused+0x390/0x390 ? stack_trace_consume_entry+0x160/0x160 ? lock_downgrade+0x100/0x100 ? save_trace+0x88/0x5e0 ? _raw_spin_unlock_irqrestore+0x2d/0x40 mark_lock+0x530/0x1470 ? mark_lock_irq+0x1d10/0x1d10 ? enqueue_timer+0x660/0x660 mark_usage+0x215/0x2a0 __lock_acquire+0x79b/0x18d0 ? tcp_schedule_loss_probe.part.0+0x38c/0x520 lock_acquire+0x1ca/0x480 ? nvme_tcp_state_change+0x21/0x150 [nvme_tcp] ? rcu_read_unlock+0x40/0x40 ? tcp_mtu_probe+0x1ae0/0x1ae0 ? kmalloc_reserve+0xa0/0xa0 ? sysfs_file_ops+0x170/0x170 _raw_read_lock+0x3d/0xa0 ? nvme_tcp_state_change+0x21/0x150 [nvme_tcp] nvme_tcp_state_change+0x21/0x150 [nvme_tcp] ? sysfs_file_ops+0x170/0x170 inet_shutdown+0x189/0x300 __nvme_tcp_stop_queue+0x36/0x270 [nvme_tcp] nvme_tcp_stop_queue+0x87/0xb0 [nvme_tcp] nvme_tcp_teardown_admin_queue+0x69/0xe0 [nvme_tcp] nvme_do_delete_ctrl+0x100/0x10c [nvme_core] nvme_sysfs_delete.cold+0x8/0xd [nvme_core] kernfs_fop_write_iter+0x2c7/0x460 new_sync_write+0x36c/0x610 ? new_sync_read+0x600/0x600 ? lock_acquire+0x1ca/0x480 ? rcu_read_unlock+0x40/0x40 ? lock_is_held_type+0x9a/0x110 vfs_write+0x5c0/0x870 ksys_write+0xf9/0x1d0 ? __ia32_sys_read+0xa0/0xa0 ? lockdep_hardirqs_on_prepare.part.0+0x198/0x340 ? syscall_enter_from_user_mode+0x27/0x70 do_syscall_64+0x33/0x40 entry_SYSCALL_64_after_hwframe+0x44/0xae Fixes: 872d26a ("nvmet-tcp: add NVMe over TCP target driver") Reported-by: Yi Zhang <yi.zhang@redhat.com> Signed-off-by: Sagi Grimberg <sagi@grimberg.me> Signed-off-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Sasha Levin <sashal@kernel.org>
rweight
pushed a commit
that referenced
this pull request
Jun 17, 2021
[ Upstream commit 1748696 ] Commit eab2404 ("Bluetooth: Add BT_PHY socket option") added a dependency between socket lock and hci_dev->lock that could lead to deadlock. It turns out that hci_conn_get_phy() is not in any way relying on hdev being immutable during the runtime of this function, neither does it even look at any of the members of hdev, and as such there is no need to hold that lock. This fixes the lockdep splat below: ====================================================== WARNING: possible circular locking dependency detected 5.12.0-rc1-00026-g73d464503354 #10 Not tainted ------------------------------------------------------ bluetoothd/1118 is trying to acquire lock: ffff8f078383c078 (&hdev->lock){+.+.}-{3:3}, at: hci_conn_get_phy+0x1c/0x150 [bluetooth] but task is already holding lock: ffff8f07e831d920 (sk_lock-AF_BLUETOOTH-BTPROTO_L2CAP){+.+.}-{0:0}, at: l2cap_sock_getsockopt+0x8b/0x610 which lock already depends on the new lock. the existing dependency chain (in reverse order) is: -> #3 (sk_lock-AF_BLUETOOTH-BTPROTO_L2CAP){+.+.}-{0:0}: lock_sock_nested+0x72/0xa0 l2cap_sock_ready_cb+0x18/0x70 [bluetooth] l2cap_config_rsp+0x27a/0x520 [bluetooth] l2cap_sig_channel+0x658/0x1330 [bluetooth] l2cap_recv_frame+0x1ba/0x310 [bluetooth] hci_rx_work+0x1cc/0x640 [bluetooth] process_one_work+0x244/0x5f0 worker_thread+0x3c/0x380 kthread+0x13e/0x160 ret_from_fork+0x22/0x30 -> #2 (&chan->lock#2/1){+.+.}-{3:3}: __mutex_lock+0xa3/0xa10 l2cap_chan_connect+0x33a/0x940 [bluetooth] l2cap_sock_connect+0x141/0x2a0 [bluetooth] __sys_connect+0x9b/0xc0 __x64_sys_connect+0x16/0x20 do_syscall_64+0x33/0x80 entry_SYSCALL_64_after_hwframe+0x44/0xae -> #1 (&conn->chan_lock){+.+.}-{3:3}: __mutex_lock+0xa3/0xa10 l2cap_chan_connect+0x322/0x940 [bluetooth] l2cap_sock_connect+0x141/0x2a0 [bluetooth] __sys_connect+0x9b/0xc0 __x64_sys_connect+0x16/0x20 do_syscall_64+0x33/0x80 entry_SYSCALL_64_after_hwframe+0x44/0xae -> #0 (&hdev->lock){+.+.}-{3:3}: __lock_acquire+0x147a/0x1a50 lock_acquire+0x277/0x3d0 __mutex_lock+0xa3/0xa10 hci_conn_get_phy+0x1c/0x150 [bluetooth] l2cap_sock_getsockopt+0x5a9/0x610 [bluetooth] __sys_getsockopt+0xcc/0x200 __x64_sys_getsockopt+0x20/0x30 do_syscall_64+0x33/0x80 entry_SYSCALL_64_after_hwframe+0x44/0xae other info that might help us debug this: Chain exists of: &hdev->lock --> &chan->lock#2/1 --> sk_lock-AF_BLUETOOTH-BTPROTO_L2CAP Possible unsafe locking scenario: CPU0 CPU1 ---- ---- lock(sk_lock-AF_BLUETOOTH-BTPROTO_L2CAP); lock(&chan->lock#2/1); lock(sk_lock-AF_BLUETOOTH-BTPROTO_L2CAP); lock(&hdev->lock); *** DEADLOCK *** 1 lock held by bluetoothd/1118: #0: ffff8f07e831d920 (sk_lock-AF_BLUETOOTH-BTPROTO_L2CAP){+.+.}-{0:0}, at: l2cap_sock_getsockopt+0x8b/0x610 [bluetooth] stack backtrace: CPU: 3 PID: 1118 Comm: bluetoothd Not tainted 5.12.0-rc1-00026-g73d464503354 #10 Hardware name: LENOVO 20K5S22R00/20K5S22R00, BIOS R0IET38W (1.16 ) 05/31/2017 Call Trace: dump_stack+0x7f/0xa1 check_noncircular+0x105/0x120 ? __lock_acquire+0x147a/0x1a50 __lock_acquire+0x147a/0x1a50 lock_acquire+0x277/0x3d0 ? hci_conn_get_phy+0x1c/0x150 [bluetooth] ? __lock_acquire+0x2e1/0x1a50 ? lock_is_held_type+0xb4/0x120 ? hci_conn_get_phy+0x1c/0x150 [bluetooth] __mutex_lock+0xa3/0xa10 ? hci_conn_get_phy+0x1c/0x150 [bluetooth] ? lock_acquire+0x277/0x3d0 ? mark_held_locks+0x49/0x70 ? mark_held_locks+0x49/0x70 ? hci_conn_get_phy+0x1c/0x150 [bluetooth] hci_conn_get_phy+0x1c/0x150 [bluetooth] l2cap_sock_getsockopt+0x5a9/0x610 [bluetooth] __sys_getsockopt+0xcc/0x200 __x64_sys_getsockopt+0x20/0x30 do_syscall_64+0x33/0x80 entry_SYSCALL_64_after_hwframe+0x44/0xae RIP: 0033:0x7fb73df33eee Code: 48 8b 0d 85 0f 0c 00 f7 d8 64 89 01 48 83 c8 ff c3 66 2e 0f 1f 84 00 00 00 00 00 90 f3 0f 1e fa 49 89 ca b8 37 00 00 00 0f 05 <48> 3d 01 f0 ff ff 73 01 c3 48 8b 0d 52 0f 0c 00 f7 d8 64 89 01 48 RSP: 002b:00007fffcfbbbf08 EFLAGS: 00000203 ORIG_RAX: 0000000000000037 RAX: ffffffffffffffda RBX: 0000000000000019 RCX: 00007fb73df33eee RDX: 000000000000000e RSI: 0000000000000112 RDI: 0000000000000018 RBP: 0000000000000000 R08: 00007fffcfbbbf44 R09: 0000000000000000 R10: 00007fffcfbbbf3c R11: 0000000000000203 R12: 0000000000000000 R13: 0000000000000018 R14: 0000000000000000 R15: 0000556fcefc70d0 Fixes: eab2404 ("Bluetooth: Add BT_PHY socket option") Signed-off-by: Jiri Kosina <jkosina@suse.cz> Signed-off-by: Marcel Holtmann <marcel@holtmann.org> Signed-off-by: Sasha Levin <sashal@kernel.org>
rweight
pushed a commit
that referenced
this pull request
Jun 17, 2021
[ Upstream commit bbd6f0a ] In bnxt_rx_pkt(), the RX buffers are expected to complete in order. If the RX consumer index indicates an out of order buffer completion, it means we are hitting a hardware bug and the driver will abort all remaining RX packets and reset the RX ring. The RX consumer index that we pass to bnxt_discard_rx() is not correct. We should be passing the current index (tmp_raw_cons) instead of the old index (raw_cons). This bug can cause us to be at the wrong index when trying to abort the next RX packet. It can crash like this: #0 [ffff9bbcdf5c39a8] machine_kexec at ffffffff9b05e007 #1 [ffff9bbcdf5c3a00] __crash_kexec at ffffffff9b111232 #2 [ffff9bbcdf5c3ad0] panic at ffffffff9b07d61e #3 [ffff9bbcdf5c3b50] oops_end at ffffffff9b030978 #4 [ffff9bbcdf5c3b78] no_context at ffffffff9b06aaf0 #5 [ffff9bbcdf5c3bd8] __bad_area_nosemaphore at ffffffff9b06ae2e #6 [ffff9bbcdf5c3c28] bad_area_nosemaphore at ffffffff9b06af24 #7 [ffff9bbcdf5c3c38] __do_page_fault at ffffffff9b06b67e #8 [ffff9bbcdf5c3cb0] do_page_fault at ffffffff9b06bb12 #9 [ffff9bbcdf5c3ce0] page_fault at ffffffff9bc015c5 [exception RIP: bnxt_rx_pkt+237] RIP: ffffffffc0259cdd RSP: ffff9bbcdf5c3d98 RFLAGS: 00010213 RAX: 000000005dd8097f RBX: ffff9ba4cb11b7e0 RCX: ffffa923cf6e9000 RDX: 0000000000000fff RSI: 0000000000000627 RDI: 0000000000001000 RBP: ffff9bbcdf5c3e60 R8: 0000000000420003 R9: 000000000000020d R10: ffffa923cf6ec138 R11: ffff9bbcdf5c3e83 R12: ffff9ba4d6f928c0 R13: ffff9ba4cac28080 R14: ffff9ba4cb11b7f0 R15: ffff9ba4d5a30000 ORIG_RAX: ffffffffffffffff CS: 0010 SS: 0018 Fixes: a1b0e4e ("bnxt_en: Improve RX consumer index validity check.") Reviewed-by: Pavan Chebbi <pavan.chebbi@broadcom.com> Reviewed-by: Andy Gospodarek <gospo@broadcom.com> Signed-off-by: Michael Chan <michael.chan@broadcom.com> Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Sasha Levin <sashal@kernel.org>
rweight
pushed a commit
that referenced
this pull request
Jun 17, 2021
[ Upstream commit 5bbf219 ] An out of bounds write happens when setting the default power state. KASAN sees this as: [drm] radeon: 512M of GTT memory ready. [drm] GART: num cpu pages 131072, num gpu pages 131072 ================================================================== BUG: KASAN: slab-out-of-bounds in radeon_atombios_parse_power_table_1_3+0x1837/0x1998 [radeon] Write of size 4 at addr ffff88810178d858 by task systemd-udevd/157 CPU: 0 PID: 157 Comm: systemd-udevd Not tainted 5.12.0-E620 #50 Hardware name: eMachines eMachines E620 /Nile , BIOS V1.03 09/30/2008 Call Trace: dump_stack+0xa5/0xe6 print_address_description.constprop.0+0x18/0x239 kasan_report+0x170/0x1a8 radeon_atombios_parse_power_table_1_3+0x1837/0x1998 [radeon] radeon_atombios_get_power_modes+0x144/0x1888 [radeon] radeon_pm_init+0x1019/0x1904 [radeon] rs690_init+0x76e/0x84a [radeon] radeon_device_init+0x1c1a/0x21e5 [radeon] radeon_driver_load_kms+0xf5/0x30b [radeon] drm_dev_register+0x255/0x4a0 [drm] radeon_pci_probe+0x246/0x2f6 [radeon] pci_device_probe+0x1aa/0x294 really_probe+0x30e/0x850 driver_probe_device+0xe6/0x135 device_driver_attach+0xc1/0xf8 __driver_attach+0x13f/0x146 bus_for_each_dev+0xfa/0x146 bus_add_driver+0x2b3/0x447 driver_register+0x242/0x2c1 do_one_initcall+0x149/0x2fd do_init_module+0x1ae/0x573 load_module+0x4dee/0x5cca __do_sys_finit_module+0xf1/0x140 do_syscall_64+0x33/0x40 entry_SYSCALL_64_after_hwframe+0x44/0xae Without KASAN, this will manifest later when the kernel attempts to allocate memory that was stomped, since it collides with the inline slab freelist pointer: invalid opcode: 0000 [#1] SMP NOPTI CPU: 0 PID: 781 Comm: openrc-run.sh Tainted: G W 5.10.12-gentoo-E620 #2 Hardware name: eMachines eMachines E620 /Nile , BIOS V1.03 09/30/2008 RIP: 0010:kfree+0x115/0x230 Code: 89 c5 e8 75 ea ff ff 48 8b 00 0f ba e0 09 72 63 e8 1f f4 ff ff 41 89 c4 48 8b 45 00 0f ba e0 10 72 0a 48 8b 45 08 a8 01 75 02 <0f> 0b 44 89 e1 48 c7 c2 00 f0 ff ff be 06 00 00 00 48 d3 e2 48 c7 RSP: 0018:ffffb42f40267e10 EFLAGS: 00010246 RAX: ffffd61280ee8d88 RBX: 0000000000000004 RCX: 000000008010000d RDX: 4000000000000000 RSI: ffffffffba1360b0 RDI: ffffd61280ee8d80 RBP: ffffd61280ee8d80 R08: ffffffffb91bebdf R09: 0000000000000000 R10: ffff8fe2c1047ac8 R11: 0000000000000000 R12: 0000000000000000 R13: 0000000000000000 R14: 0000000000000000 R15: 0000000000000100 FS: 00007fe80eff6b68(0000) GS:ffff8fe339c00000(0000) knlGS:0000000000000000 CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 CR2: 00007fe80eec7bc0 CR3: 0000000038012000 CR4: 00000000000006f0 Call Trace: __free_fdtable+0x16/0x1f put_files_struct+0x81/0x9b do_exit+0x433/0x94d do_group_exit+0xa6/0xa6 __x64_sys_exit_group+0xf/0xf do_syscall_64+0x33/0x40 entry_SYSCALL_64_after_hwframe+0x44/0xa9 RIP: 0033:0x7fe80ef64bea Code: Unable to access opcode bytes at RIP 0x7fe80ef64bc0. RSP: 002b:00007ffdb1c47528 EFLAGS: 00000246 ORIG_RAX: 00000000000000e7 RAX: ffffffffffffffda RBX: 0000000000000003 RCX: 00007fe80ef64bea RDX: 00007fe80ef64f60 RSI: 0000000000000000 RDI: 0000000000000000 RBP: 0000000000000000 R08: 0000000000000001 R09: 0000000000000000 R10: 00007fe80ee2c620 R11: 0000000000000246 R12: 00007fe80eff41e0 R13: 00000000ffffffff R14: 0000000000000024 R15: 00007fe80edf9cd0 Modules linked in: radeon(+) ath5k(+) snd_hda_codec_realtek ... Use a valid power_state index when initializing the "flags" and "misc" and "misc2" fields. Bug: https://bugzilla.kernel.org/show_bug.cgi?id=211537 Reported-by: Erhard F. <erhard_f@mailbox.org> Fixes: a48b9b4 ("drm/radeon/kms/pm: add asic specific callbacks for getting power state (v2)") Fixes: 79daedc ("drm/radeon/kms: minor pm cleanups") Signed-off-by: Kees Cook <keescook@chromium.org> Signed-off-by: Alex Deucher <alexander.deucher@amd.com> Signed-off-by: Sasha Levin <sashal@kernel.org>
rweight
pushed a commit
that referenced
this pull request
Jun 17, 2021
[ Upstream commit cf7b39a ] We get a bug: BUG: KASAN: slab-out-of-bounds in iov_iter_revert+0x11c/0x404 lib/iov_iter.c:1139 Read of size 8 at addr ffff0000d3fb11f8 by task CPU: 0 PID: 12582 Comm: syz-executor.2 Not tainted 5.10.0-00843-g352c8610ccd2 #2 Hardware name: linux,dummy-virt (DT) Call trace: dump_backtrace+0x0/0x2d0 arch/arm64/kernel/stacktrace.c:132 show_stack+0x28/0x34 arch/arm64/kernel/stacktrace.c:196 __dump_stack lib/dump_stack.c:77 [inline] dump_stack+0x110/0x164 lib/dump_stack.c:118 print_address_description+0x78/0x5c8 mm/kasan/report.c:385 __kasan_report mm/kasan/report.c:545 [inline] kasan_report+0x148/0x1e4 mm/kasan/report.c:562 check_memory_region_inline mm/kasan/generic.c:183 [inline] __asan_load8+0xb4/0xbc mm/kasan/generic.c:252 iov_iter_revert+0x11c/0x404 lib/iov_iter.c:1139 io_read fs/io_uring.c:3421 [inline] io_issue_sqe+0x2344/0x2d64 fs/io_uring.c:5943 __io_queue_sqe+0x19c/0x520 fs/io_uring.c:6260 io_queue_sqe+0x2a4/0x590 fs/io_uring.c:6326 io_submit_sqe fs/io_uring.c:6395 [inline] io_submit_sqes+0x4c0/0xa04 fs/io_uring.c:6624 __do_sys_io_uring_enter fs/io_uring.c:9013 [inline] __se_sys_io_uring_enter fs/io_uring.c:8960 [inline] __arm64_sys_io_uring_enter+0x190/0x708 fs/io_uring.c:8960 __invoke_syscall arch/arm64/kernel/syscall.c:36 [inline] invoke_syscall arch/arm64/kernel/syscall.c:48 [inline] el0_svc_common arch/arm64/kernel/syscall.c:158 [inline] do_el0_svc+0x120/0x290 arch/arm64/kernel/syscall.c:227 el0_svc+0x1c/0x28 arch/arm64/kernel/entry-common.c:367 el0_sync_handler+0x98/0x170 arch/arm64/kernel/entry-common.c:383 el0_sync+0x140/0x180 arch/arm64/kernel/entry.S:670 Allocated by task 12570: stack_trace_save+0x80/0xb8 kernel/stacktrace.c:121 kasan_save_stack mm/kasan/common.c:48 [inline] kasan_set_track mm/kasan/common.c:56 [inline] __kasan_kmalloc+0xdc/0x120 mm/kasan/common.c:461 kasan_kmalloc+0xc/0x14 mm/kasan/common.c:475 __kmalloc+0x23c/0x334 mm/slub.c:3970 kmalloc include/linux/slab.h:557 [inline] __io_alloc_async_data+0x68/0x9c fs/io_uring.c:3210 io_setup_async_rw fs/io_uring.c:3229 [inline] io_read fs/io_uring.c:3436 [inline] io_issue_sqe+0x2954/0x2d64 fs/io_uring.c:5943 __io_queue_sqe+0x19c/0x520 fs/io_uring.c:6260 io_queue_sqe+0x2a4/0x590 fs/io_uring.c:6326 io_submit_sqe fs/io_uring.c:6395 [inline] io_submit_sqes+0x4c0/0xa04 fs/io_uring.c:6624 __do_sys_io_uring_enter fs/io_uring.c:9013 [inline] __se_sys_io_uring_enter fs/io_uring.c:8960 [inline] __arm64_sys_io_uring_enter+0x190/0x708 fs/io_uring.c:8960 __invoke_syscall arch/arm64/kernel/syscall.c:36 [inline] invoke_syscall arch/arm64/kernel/syscall.c:48 [inline] el0_svc_common arch/arm64/kernel/syscall.c:158 [inline] do_el0_svc+0x120/0x290 arch/arm64/kernel/syscall.c:227 el0_svc+0x1c/0x28 arch/arm64/kernel/entry-common.c:367 el0_sync_handler+0x98/0x170 arch/arm64/kernel/entry-common.c:383 el0_sync+0x140/0x180 arch/arm64/kernel/entry.S:670 Freed by task 12570: stack_trace_save+0x80/0xb8 kernel/stacktrace.c:121 kasan_save_stack mm/kasan/common.c:48 [inline] kasan_set_track+0x38/0x6c mm/kasan/common.c:56 kasan_set_free_info+0x20/0x40 mm/kasan/generic.c:355 __kasan_slab_free+0x124/0x150 mm/kasan/common.c:422 kasan_slab_free+0x10/0x1c mm/kasan/common.c:431 slab_free_hook mm/slub.c:1544 [inline] slab_free_freelist_hook mm/slub.c:1577 [inline] slab_free mm/slub.c:3142 [inline] kfree+0x104/0x38c mm/slub.c:4124 io_dismantle_req fs/io_uring.c:1855 [inline] __io_free_req+0x70/0x254 fs/io_uring.c:1867 io_put_req_find_next fs/io_uring.c:2173 [inline] __io_queue_sqe+0x1fc/0x520 fs/io_uring.c:6279 __io_req_task_submit+0x154/0x21c fs/io_uring.c:2051 io_req_task_submit+0x2c/0x44 fs/io_uring.c:2063 task_work_run+0xdc/0x128 kernel/task_work.c:151 get_signal+0x6f8/0x980 kernel/signal.c:2562 do_signal+0x108/0x3a4 arch/arm64/kernel/signal.c:658 do_notify_resume+0xbc/0x25c arch/arm64/kernel/signal.c:722 work_pending+0xc/0x180 blkdev_read_iter can truncate iov_iter's count since the count + pos may exceed the size of the blkdev. This will confuse io_read that we have consume the iovec. And once we do the iov_iter_revert in io_read, we will trigger the slab-out-of-bounds. Fix it by reexpand the count with size has been truncated. blkdev_write_iter can trigger the problem too. Signed-off-by: yangerkun <yangerkun@huawei.com> Acked-by: Pavel Begunkov <asml.silencec@gmail.com> Link: https://lore.kernel.org/r/20210401071807.3328235-1-yangerkun@huawei.com Signed-off-by: Jens Axboe <axboe@kernel.dk> Signed-off-by: Sasha Levin <sashal@kernel.org>
rweight
pushed a commit
that referenced
this pull request
Jul 26, 2021
Pach series "mm: thp: use generic THP migration for NUMA hinting fault", v3. When the THP NUMA fault support was added THP migration was not supported yet. So the ad hoc THP migration was implemented in NUMA fault handling. Since v4.14 THP migration has been supported so it doesn't make too much sense to still keep another THP migration implementation rather than using the generic migration code. It is definitely a maintenance burden to keep two THP migration implementation for different code paths and it is more error prone. Using the generic THP migration implementation allows us remove the duplicate code and some hacks needed by the old ad hoc implementation. A quick grep shows x86_64, PowerPC (book3s), ARM64 ans S390 support both THP and NUMA balancing. The most of them support THP migration except for S390. Zi Yan tried to add THP migration support for S390 before but it was not accepted due to the design of S390 PMD. For the discussion, please see: https://lkml.org/lkml/2018/4/27/953. Per the discussion with Gerald Schaefer in v1 it is acceptible to skip huge PMD for S390 for now. I saw there were some hacks about gup from git history, but I didn't figure out if they have been removed or not since I just found FOLL_NUMA code in the current gup implementation and they seems useful. Patch #1 ~ #2 are preparation patches. Patch #3 is the real meat. Patch #4 ~ #6 keep consistent counters and behaviors with before. Patch #7 skips change huge PMD to prot_none if thp migration is not supported. Test ---- Did some tests to measure the latency of do_huge_pmd_numa_page. The test VM has 80 vcpus and 64G memory. The test would create 2 processes to consume 128G memory together which would incur memory pressure to cause THP splits. And it also creates 80 processes to hog cpu, and the memory consumer processes are bound to different nodes periodically in order to increase NUMA faults. The below test script is used: echo 3 > /proc/sys/vm/drop_caches # Run stress-ng for 24 hours ./stress-ng/stress-ng --vm 2 --vm-bytes 64G --timeout 24h & PID=$! ./stress-ng/stress-ng --cpu $NR_CPUS --timeout 24h & # Wait for vm stressors forked sleep 5 PID_1=`pgrep -P $PID | awk 'NR == 1'` PID_2=`pgrep -P $PID | awk 'NR == 2'` JOB1=`pgrep -P $PID_1` JOB2=`pgrep -P $PID_2` # Bind load jobs to different nodes periodically to force generate # cross node memory access while [ -d "/proc/$PID" ] do taskset -apc 8 $JOB1 taskset -apc 8 $JOB2 sleep 300 taskset -apc 58 $JOB1 taskset -apc 58 $JOB2 sleep 300 done With the above test the histogram of latency of do_huge_pmd_numa_page is as shown below. Since the number of do_huge_pmd_numa_page varies drastically for each run (should be due to scheduler), so I converted the raw number to percentage. patched base @us[stress-ng]: [0] 3.57% 0.16% [1] 55.68% 18.36% [2, 4) 10.46% 40.44% [4, 8) 7.26% 17.82% [8, 16) 21.12% 13.41% [16, 32) 1.06% 4.27% [32, 64) 0.56% 4.07% [64, 128) 0.16% 0.35% [128, 256) < 0.1% < 0.1% [256, 512) < 0.1% < 0.1% [512, 1K) < 0.1% < 0.1% [1K, 2K) < 0.1% < 0.1% [2K, 4K) < 0.1% < 0.1% [4K, 8K) < 0.1% < 0.1% [8K, 16K) < 0.1% < 0.1% [16K, 32K) < 0.1% < 0.1% [32K, 64K) < 0.1% < 0.1% Per the result, patched kernel is even slightly better than the base kernel. I think this is because the lock contention against THP split is less than base kernel due to the refactor. To exclude the affect from THP split, I also did test w/o memory pressure. No obvious regression is spotted. The below is the test result *w/o* memory pressure. patched base @us[stress-ng]: [0] 7.97% 18.4% [1] 69.63% 58.24% [2, 4) 4.18% 2.63% [4, 8) 0.22% 0.17% [8, 16) 1.03% 0.92% [16, 32) 0.14% < 0.1% [32, 64) < 0.1% < 0.1% [64, 128) < 0.1% < 0.1% [128, 256) < 0.1% < 0.1% [256, 512) 0.45% 1.19% [512, 1K) 15.45% 17.27% [1K, 2K) < 0.1% < 0.1% [2K, 4K) < 0.1% < 0.1% [4K, 8K) < 0.1% < 0.1% [8K, 16K) 0.86% 0.88% [16K, 32K) < 0.1% 0.15% [32K, 64K) < 0.1% < 0.1% [64K, 128K) < 0.1% < 0.1% [128K, 256K) < 0.1% < 0.1% The series also survived a series of tests that exercise NUMA balancing migrations by Mel. This patch (of 7): Add orig_pmd to struct vm_fault so the "orig_pmd" parameter used by huge page fault could be removed, just like its PTE counterpart does. Link: https://lkml.kernel.org/r/20210518200801.7413-1-shy828301@gmail.com Link: https://lkml.kernel.org/r/20210518200801.7413-2-shy828301@gmail.com Signed-off-by: Yang Shi <shy828301@gmail.com> Acked-by: Mel Gorman <mgorman@suse.de> Cc: Kirill A. Shutemov <kirill.shutemov@linux.intel.com> Cc: Zi Yan <ziy@nvidia.com> Cc: Huang Ying <ying.huang@intel.com> Cc: Michal Hocko <mhocko@suse.com> Cc: Hugh Dickins <hughd@google.com> Cc: Gerald Schaefer <gerald.schaefer@linux.ibm.com> Cc: Heiko Carstens <hca@linux.ibm.com> Cc: Vasily Gorbik <gor@linux.ibm.com> Cc: Christian Borntraeger <borntraeger@de.ibm.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
rweight
pushed a commit
that referenced
this pull request
Jul 26, 2021
Patch series "mm/madvise: introduce MADV_POPULATE_(READ|WRITE) to prefault page tables", v2. Excessive details on MADV_POPULATE_(READ|WRITE) can be found in patch #2. This patch (of 5): Let's make the variable names in the function declaration match the variable names used in the definition. Link: https://lkml.kernel.org/r/20210419135443.12822-1-david@redhat.com Link: https://lkml.kernel.org/r/20210419135443.12822-2-david@redhat.com Signed-off-by: David Hildenbrand <david@redhat.com> Reviewed-by: Oscar Salvador <osalvador@suse.de> Cc: Michal Hocko <mhocko@suse.com> Cc: Jason Gunthorpe <jgg@ziepe.ca> Cc: Peter Xu <peterx@redhat.com> Cc: Andrea Arcangeli <aarcange@redhat.com> Cc: Arnd Bergmann <arnd@arndb.de> Cc: Chris Zankel <chris@zankel.net> Cc: Dave Hansen <dave.hansen@intel.com> Cc: Helge Deller <deller@gmx.de> Cc: Hugh Dickins <hughd@google.com> Cc: Ivan Kokshaysky <ink@jurassic.park.msu.ru> Cc: "James E.J. Bottomley" <James.Bottomley@HansenPartnership.com> Cc: Jann Horn <jannh@google.com> Cc: Kirill A. Shutemov <kirill.shutemov@linux.intel.com> Cc: Matthew Wilcox (Oracle) <willy@infradead.org> Cc: Matt Turner <mattst88@gmail.com> Cc: Max Filippov <jcmvbkbc@gmail.com> Cc: Michael S. Tsirkin <mst@redhat.com> Cc: Mike Kravetz <mike.kravetz@oracle.com> Cc: Minchan Kim <minchan@kernel.org> Cc: Ram Pai <linuxram@us.ibm.com> Cc: Richard Henderson <rth@twiddle.net> Cc: Rik van Riel <riel@surriel.com> Cc: Rolf Eike Beer <eike-kernel@sf-tec.de> Cc: Shuah Khan <shuah@kernel.org> Cc: Thomas Bogendoerfer <tsbogend@alpha.franken.de> Cc: Vlastimil Babka <vbabka@suse.cz> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
rweight
pushed a commit
that referenced
this pull request
Jul 26, 2021
ASan reports a memory leak caused by evlist not being deleted on exit in perf-report, perf-script and perf-data. The problem is caused by evlist->session not being deleted, which is allocated in perf_session__read_header, called in perf_session__new if perf_data is in read mode. In case of write mode, the session->evlist is filled by the caller. This patch solves the problem by calling evlist__delete in perf_session__delete if perf_data is in read mode. Changes in v2: - call evlist__delete from within perf_session__delete v1: https://lore.kernel.org/lkml/20210621234317.235545-1-rickyman7@gmail.com/ ASan report follows: $ ./perf script report flamegraph ================================================================= ==227640==ERROR: LeakSanitizer: detected memory leaks <SNIP unrelated> Indirect leak of 2704 byte(s) in 1 object(s) allocated from: #0 0x4f4137 in calloc (/home/user/linux/tools/perf/perf+0x4f4137) #1 0xbe3d56 in zalloc /home/user/linux/tools/lib/perf/../../lib/zalloc.c:8:9 #2 0x7f999e in evlist__new /home/user/linux/tools/perf/util/evlist.c:77:26 #3 0x8ad938 in perf_session__read_header /home/user/linux/tools/perf/util/header.c:3797:20 #4 0x8ec714 in perf_session__open /home/user/linux/tools/perf/util/session.c:109:6 #5 0x8ebe83 in perf_session__new /home/user/linux/tools/perf/util/session.c:213:10 #6 0x60c6de in cmd_script /home/user/linux/tools/perf/builtin-script.c:3856:12 #7 0x7b2930 in run_builtin /home/user/linux/tools/perf/perf.c:313:11 #8 0x7b120f in handle_internal_command /home/user/linux/tools/perf/perf.c:365:8 #9 0x7b2493 in run_argv /home/user/linux/tools/perf/perf.c:409:2 #10 0x7b0c89 in main /home/user/linux/tools/perf/perf.c:539:3 #11 0x7f5260654b74 (/lib64/libc.so.6+0x27b74) Indirect leak of 568 byte(s) in 1 object(s) allocated from: #0 0x4f4137 in calloc (/home/user/linux/tools/perf/perf+0x4f4137) #1 0xbe3d56 in zalloc /home/user/linux/tools/lib/perf/../../lib/zalloc.c:8:9 #2 0x80ce88 in evsel__new_idx /home/user/linux/tools/perf/util/evsel.c:268:24 #3 0x8aed93 in evsel__new /home/user/linux/tools/perf/util/evsel.h:210:9 #4 0x8ae07e in perf_session__read_header /home/user/linux/tools/perf/util/header.c:3853:11 #5 0x8ec714 in perf_session__open /home/user/linux/tools/perf/util/session.c:109:6 #6 0x8ebe83 in perf_session__new /home/user/linux/tools/perf/util/session.c:213:10 #7 0x60c6de in cmd_script /home/user/linux/tools/perf/builtin-script.c:3856:12 #8 0x7b2930 in run_builtin /home/user/linux/tools/perf/perf.c:313:11 #9 0x7b120f in handle_internal_command /home/user/linux/tools/perf/perf.c:365:8 #10 0x7b2493 in run_argv /home/user/linux/tools/perf/perf.c:409:2 #11 0x7b0c89 in main /home/user/linux/tools/perf/perf.c:539:3 #12 0x7f5260654b74 (/lib64/libc.so.6+0x27b74) Indirect leak of 264 byte(s) in 1 object(s) allocated from: #0 0x4f4137 in calloc (/home/user/linux/tools/perf/perf+0x4f4137) #1 0xbe3d56 in zalloc /home/user/linux/tools/lib/perf/../../lib/zalloc.c:8:9 #2 0xbe3e70 in xyarray__new /home/user/linux/tools/lib/perf/xyarray.c:10:23 #3 0xbd7754 in perf_evsel__alloc_id /home/user/linux/tools/lib/perf/evsel.c:361:21 #4 0x8ae201 in perf_session__read_header /home/user/linux/tools/perf/util/header.c:3871:7 #5 0x8ec714 in perf_session__open /home/user/linux/tools/perf/util/session.c:109:6 #6 0x8ebe83 in perf_session__new /home/user/linux/tools/perf/util/session.c:213:10 #7 0x60c6de in cmd_script /home/user/linux/tools/perf/builtin-script.c:3856:12 #8 0x7b2930 in run_builtin /home/user/linux/tools/perf/perf.c:313:11 #9 0x7b120f in handle_internal_command /home/user/linux/tools/perf/perf.c:365:8 #10 0x7b2493 in run_argv /home/user/linux/tools/perf/perf.c:409:2 #11 0x7b0c89 in main /home/user/linux/tools/perf/perf.c:539:3 #12 0x7f5260654b74 (/lib64/libc.so.6+0x27b74) Indirect leak of 32 byte(s) in 1 object(s) allocated from: #0 0x4f4137 in calloc (/home/user/linux/tools/perf/perf+0x4f4137) #1 0xbe3d56 in zalloc /home/user/linux/tools/lib/perf/../../lib/zalloc.c:8:9 #2 0xbd77e0 in perf_evsel__alloc_id /home/user/linux/tools/lib/perf/evsel.c:365:14 #3 0x8ae201 in perf_session__read_header /home/user/linux/tools/perf/util/header.c:3871:7 #4 0x8ec714 in perf_session__open /home/user/linux/tools/perf/util/session.c:109:6 #5 0x8ebe83 in perf_session__new /home/user/linux/tools/perf/util/session.c:213:10 #6 0x60c6de in cmd_script /home/user/linux/tools/perf/builtin-script.c:3856:12 #7 0x7b2930 in run_builtin /home/user/linux/tools/perf/perf.c:313:11 #8 0x7b120f in handle_internal_command /home/user/linux/tools/perf/perf.c:365:8 #9 0x7b2493 in run_argv /home/user/linux/tools/perf/perf.c:409:2 #10 0x7b0c89 in main /home/user/linux/tools/perf/perf.c:539:3 #11 0x7f5260654b74 (/lib64/libc.so.6+0x27b74) Indirect leak of 7 byte(s) in 1 object(s) allocated from: #0 0x4b8207 in strdup (/home/user/linux/tools/perf/perf+0x4b8207) #1 0x8b4459 in evlist__set_event_name /home/user/linux/tools/perf/util/header.c:2292:16 #2 0x89d862 in process_event_desc /home/user/linux/tools/perf/util/header.c:2313:3 #3 0x8af319 in perf_file_section__process /home/user/linux/tools/perf/util/header.c:3651:9 #4 0x8aa6e9 in perf_header__process_sections /home/user/linux/tools/perf/util/header.c:3427:9 #5 0x8ae3e7 in perf_session__read_header /home/user/linux/tools/perf/util/header.c:3886:2 #6 0x8ec714 in perf_session__open /home/user/linux/tools/perf/util/session.c:109:6 #7 0x8ebe83 in perf_session__new /home/user/linux/tools/perf/util/session.c:213:10 #8 0x60c6de in cmd_script /home/user/linux/tools/perf/builtin-script.c:3856:12 #9 0x7b2930 in run_builtin /home/user/linux/tools/perf/perf.c:313:11 #10 0x7b120f in handle_internal_command /home/user/linux/tools/perf/perf.c:365:8 #11 0x7b2493 in run_argv /home/user/linux/tools/perf/perf.c:409:2 #12 0x7b0c89 in main /home/user/linux/tools/perf/perf.c:539:3 #13 0x7f5260654b74 (/lib64/libc.so.6+0x27b74) SUMMARY: AddressSanitizer: 3728 byte(s) leaked in 7 allocation(s). Signed-off-by: Riccardo Mancini <rickyman7@gmail.com> Acked-by: Ian Rogers <irogers@google.com> Acked-by: Jiri Olsa <jolsa@redhat.com> Cc: Adrian Hunter <adrian.hunter@intel.com> Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com> Cc: Ian Rogers <irogers@google.com> Cc: Kan Liang <kan.liang@linux.intel.com> Cc: Leo Yan <leo.yan@linaro.org> Cc: Mark Rutland <mark.rutland@arm.com> Cc: Namhyung Kim <namhyung@kernel.org> Cc: Peter Zijlstra <peterz@infradead.org> Link: http://lore.kernel.org/lkml/20210624231926.212208-1-rickyman7@gmail.com Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
rweight
pushed a commit
that referenced
this pull request
Jul 26, 2021
On trogdor devices I see the following lockdep splat when stopping
youtube with lockdep enabled in the kernel.
======================================================
WARNING: possible circular locking dependency detected
5.13.0-rc2 #71 Not tainted
------------------------------------------------------
ThreadPoolSingl/3969 is trying to acquire lock:
ffffff80d4d5c080 (&inst->lock#3){+.+.}-{3:3}, at: vdec_buf_cleanup+0x3c/0x17c [venus_dec]
but task is already holding lock:
ffffff80d3c3c4f8 (&q->mmap_lock){+.+.}-{3:3}, at: vb2_core_reqbufs+0xe4/0x390 [videobuf2_common]
which lock already depends on the new lock.
the existing dependency chain (in reverse order) is:
-> #5 (&q->mmap_lock){+.+.}-{3:3}:
__mutex_lock_common+0xcc/0xb88
mutex_lock_nested+0x5c/0x68
vb2_mmap+0xf4/0x290 [videobuf2_common]
v4l2_m2m_fop_mmap+0x44/0x50 [v4l2_mem2mem]
v4l2_mmap+0x5c/0xa4
mmap_region+0x310/0x5a4
do_mmap+0x348/0x43c
vm_mmap_pgoff+0xfc/0x178
ksys_mmap_pgoff+0x84/0xfc
__arm64_compat_sys_aarch32_mmap2+0x2c/0x38
invoke_syscall+0x54/0x110
el0_svc_common+0x88/0xf0
do_el0_svc_compat+0x28/0x34
el0_svc_compat+0x24/0x34
el0_sync_compat_handler+0xc0/0xf0
el0_sync_compat+0x19c/0x1c0
-> #4 (&mm->mmap_lock){++++}-{3:3}:
__might_fault+0x60/0x88
filldir64+0x124/0x3a0
dcache_readdir+0x7c/0x1ec
iterate_dir+0xc4/0x184
__arm64_sys_getdents64+0x78/0x170
invoke_syscall+0x54/0x110
el0_svc_common+0xa8/0xf0
do_el0_svc_compat+0x28/0x34
el0_svc_compat+0x24/0x34
el0_sync_compat_handler+0xc0/0xf0
el0_sync_compat+0x19c/0x1c0
-> #3 (&sb->s_type->i_mutex_key#3){++++}-{3:3}:
down_write+0x94/0x1f4
start_creating+0xb0/0x174
debugfs_create_dir+0x28/0x138
opp_debug_register+0x88/0xc0
_add_opp_dev+0x84/0x9c
_add_opp_table_indexed+0x16c/0x310
_of_add_table_indexed+0x70/0xb5c
dev_pm_opp_of_add_table_indexed+0x20/0x2c
of_genpd_add_provider_onecell+0xc4/0x1c8
rpmhpd_probe+0x21c/0x278
platform_probe+0xb4/0xd4
really_probe+0x140/0x35c
driver_probe_device+0x90/0xcc
__device_attach_driver+0xa4/0xc0
bus_for_each_drv+0x8c/0xd8
__device_attach+0xc4/0x150
device_initial_probe+0x20/0x2c
bus_probe_device+0x40/0xa4
device_add+0x22c/0x3fc
of_device_add+0x44/0x54
of_platform_device_create_pdata+0xb0/0xf4
of_platform_bus_create+0x1d0/0x350
of_platform_populate+0x80/0xd4
devm_of_platform_populate+0x64/0xb0
rpmh_rsc_probe+0x378/0x3dc
platform_probe+0xb4/0xd4
really_probe+0x140/0x35c
driver_probe_device+0x90/0xcc
__device_attach_driver+0xa4/0xc0
bus_for_each_drv+0x8c/0xd8
__device_attach+0xc4/0x150
device_initial_probe+0x20/0x2c
bus_probe_device+0x40/0xa4
device_add+0x22c/0x3fc
of_device_add+0x44/0x54
of_platform_device_create_pdata+0xb0/0xf4
of_platform_bus_create+0x1d0/0x350
of_platform_bus_create+0x21c/0x350
of_platform_populate+0x80/0xd4
of_platform_default_populate_init+0xb8/0xd4
do_one_initcall+0x1b4/0x400
do_initcall_level+0xa8/0xc8
do_initcalls+0x5c/0x9c
do_basic_setup+0x2c/0x38
kernel_init_freeable+0x1a4/0x1ec
kernel_init+0x20/0x118
ret_from_fork+0x10/0x30
-> #2 (gpd_list_lock){+.+.}-{3:3}:
__mutex_lock_common+0xcc/0xb88
mutex_lock_nested+0x5c/0x68
__genpd_dev_pm_attach+0x70/0x18c
genpd_dev_pm_attach_by_id+0xe4/0x158
genpd_dev_pm_attach_by_name+0x48/0x60
dev_pm_domain_attach_by_name+0x2c/0x38
dev_pm_opp_attach_genpd+0xac/0x160
vcodec_domains_get+0x94/0x14c [venus_core]
core_get_v4+0x150/0x188 [venus_core]
venus_probe+0x138/0x444 [venus_core]
platform_probe+0xb4/0xd4
really_probe+0x140/0x35c
driver_probe_device+0x90/0xcc
device_driver_attach+0x58/0x7c
__driver_attach+0xc8/0xe0
bus_for_each_dev+0x88/0xd4
driver_attach+0x30/0x3c
bus_add_driver+0x10c/0x1e0
driver_register+0x70/0x108
__platform_driver_register+0x30/0x3c
0xffffffde113e1044
do_one_initcall+0x1b4/0x400
do_init_module+0x64/0x1fc
load_module+0x17f4/0x1958
__arm64_sys_finit_module+0xb4/0xf0
invoke_syscall+0x54/0x110
el0_svc_common+0x88/0xf0
do_el0_svc_compat+0x28/0x34
el0_svc_compat+0x24/0x34
el0_sync_compat_handler+0xc0/0xf0
el0_sync_compat+0x19c/0x1c0
-> #1 (&opp_table->genpd_virt_dev_lock){+.+.}-{3:3}:
__mutex_lock_common+0xcc/0xb88
mutex_lock_nested+0x5c/0x68
_set_required_opps+0x74/0x120
_set_opp+0x94/0x37c
dev_pm_opp_set_rate+0xa0/0x194
core_clks_set_rate+0x28/0x58 [venus_core]
load_scale_v4+0x228/0x2b4 [venus_core]
session_process_buf+0x160/0x198 [venus_core]
venus_helper_vb2_buf_queue+0xcc/0x130 [venus_core]
vdec_vb2_buf_queue+0xc4/0x140 [venus_dec]
__enqueue_in_driver+0x164/0x188 [videobuf2_common]
vb2_core_qbuf+0x13c/0x47c [videobuf2_common]
vb2_qbuf+0x88/0xec [videobuf2_v4l2]
v4l2_m2m_qbuf+0x84/0x15c [v4l2_mem2mem]
v4l2_m2m_ioctl_qbuf+0x24/0x30 [v4l2_mem2mem]
v4l_qbuf+0x54/0x68
__video_do_ioctl+0x2bc/0x3bc
video_usercopy+0x558/0xb04
video_ioctl2+0x24/0x30
v4l2_ioctl+0x58/0x68
v4l2_compat_ioctl32+0x84/0xa0
__arm64_compat_sys_ioctl+0x12c/0x140
invoke_syscall+0x54/0x110
el0_svc_common+0x88/0xf0
do_el0_svc_compat+0x28/0x34
el0_svc_compat+0x24/0x34
el0_sync_compat_handler+0xc0/0xf0
el0_sync_compat+0x19c/0x1c0
-> #0 (&inst->lock#3){+.+.}-{3:3}:
__lock_acquire+0x248c/0x2d6c
lock_acquire+0x240/0x314
__mutex_lock_common+0xcc/0xb88
mutex_lock_nested+0x5c/0x68
vdec_buf_cleanup+0x3c/0x17c [venus_dec]
__vb2_queue_free+0x98/0x204 [videobuf2_common]
vb2_core_reqbufs+0x14c/0x390 [videobuf2_common]
vb2_reqbufs+0x58/0x74 [videobuf2_v4l2]
v4l2_m2m_reqbufs+0x58/0x90 [v4l2_mem2mem]
v4l2_m2m_ioctl_reqbufs+0x24/0x30 [v4l2_mem2mem]
v4l_reqbufs+0x58/0x6c
__video_do_ioctl+0x2bc/0x3bc
video_usercopy+0x558/0xb04
video_ioctl2+0x24/0x30
v4l2_ioctl+0x58/0x68
v4l2_compat_ioctl32+0x84/0xa0
__arm64_compat_sys_ioctl+0x12c/0x140
invoke_syscall+0x54/0x110
el0_svc_common+0x88/0xf0
do_el0_svc_compat+0x28/0x34
el0_svc_compat+0x24/0x34
el0_sync_compat_handler+0xc0/0xf0
el0_sync_compat+0x19c/0x1c0
other info that might help us debug this:
Chain exists of:
&inst->lock#3 --> &mm->mmap_lock --> &q->mmap_lock
Possible unsafe locking scenario:
CPU0 CPU1
---- ----
lock(&q->mmap_lock);
lock(&mm->mmap_lock);
lock(&q->mmap_lock);
lock(&inst->lock#3);
*** DEADLOCK ***
1 lock held by ThreadPoolSingl/3969:
#0: ffffff80d3c3c4f8 (&q->mmap_lock){+.+.}-{3:3}, at: vb2_core_reqbufs+0xe4/0x390 [videobuf2_common]
stack backtrace:
CPU: 2 PID: 3969 Comm: ThreadPoolSingl Not tainted 5.13.0-rc2 #71
Hardware name: Google Lazor (rev3+) with KB Backlight (DT)
Call trace:
dump_backtrace+0x0/0x1b4
show_stack+0x24/0x30
dump_stack+0xe0/0x15c
print_circular_bug+0x32c/0x388
check_noncircular+0x138/0x140
__lock_acquire+0x248c/0x2d6c
lock_acquire+0x240/0x314
__mutex_lock_common+0xcc/0xb88
mutex_lock_nested+0x5c/0x68
vdec_buf_cleanup+0x3c/0x17c [venus_dec]
__vb2_queue_free+0x98/0x204 [videobuf2_common]
vb2_core_reqbufs+0x14c/0x390 [videobuf2_common]
vb2_reqbufs+0x58/0x74 [videobuf2_v4l2]
v4l2_m2m_reqbufs+0x58/0x90 [v4l2_mem2mem]
v4l2_m2m_ioctl_reqbufs+0x24/0x30 [v4l2_mem2mem]
v4l_reqbufs+0x58/0x6c
__video_do_ioctl+0x2bc/0x3bc
video_usercopy+0x558/0xb04
video_ioctl2+0x24/0x30
v4l2_ioctl+0x58/0x68
v4l2_compat_ioctl32+0x84/0xa0
__arm64_compat_sys_ioctl+0x12c/0x140
invoke_syscall+0x54/0x110
el0_svc_common+0x88/0xf0
do_el0_svc_compat+0x28/0x34
el0_svc_compat+0x24/0x34
el0_sync_compat_handler+0xc0/0xf0
el0_sync_compat+0x19c/0x1c0
The 'gpd_list_lock' is nominally named as such to protect the 'gpd_list'
from concurrent access and mutation. Unfortunately, holding that mutex
around various OPP framework calls leads to lockdep splats because now
we're doing various operations in OPP core such as registering with
debugfs while holding the list lock. We don't need to hold any list
mutex while we're calling into OPP, so let's shrink the locking area of
the 'gpd_list_lock' so that lockdep isn't triggered. This also helps
reduce contention on this lock, which probably doesn't matter much but
at least is nice to have.
Cc: Len Brown <len.brown@intel.com>
Cc: Pavel Machek <pavel@ucw.cz>
Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Cc: <linux-pm@vger.kernel.org>
Cc: Viresh Kumar <vireshk@kernel.org>
Signed-off-by: Stephen Boyd <swboyd@chromium.org>
Reviewed-by: Ulf Hansson <ulf.hansson@linaro.org>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
rweight
pushed a commit
that referenced
this pull request
Jul 26, 2021
ASan reports a heap-buffer-overflow in elf_sec__is_text when using perf-top.
The bug is caused by the fact that secstrs is built from runtime_ss, while
shdr is built from syms_ss if shdr.sh_type != SHT_NOBITS. Therefore, they
point to two different ELF files.
This patch renames secstrs to secstrs_run and adds secstrs_sym, so that
the correct secstrs is chosen depending on shdr.sh_type.
$ ASAN_OPTIONS=abort_on_error=1:disable_coredump=0:unmap_shadow_on_exit=1 ./perf top
=================================================================
==363148==ERROR: AddressSanitizer: heap-buffer-overflow on address 0x61300009add6 at pc 0x00000049875c bp 0x7f4f56446440 sp 0x7f4f56445bf0
READ of size 1 at 0x61300009add6 thread T6
#0 0x49875b in StrstrCheck(void*, char*, char const*, char const*) (/home/user/linux/tools/perf/perf+0x49875b)
#1 0x4d13a2 in strstr (/home/user/linux/tools/perf/perf+0x4d13a2)
#2 0xacae36 in elf_sec__is_text /home/user/linux/tools/perf/util/symbol-elf.c:176:9
#3 0xac3ec9 in elf_sec__filter /home/user/linux/tools/perf/util/symbol-elf.c:187:9
#4 0xac2c3d in dso__load_sym /home/user/linux/tools/perf/util/symbol-elf.c:1254:20
#5 0x883981 in dso__load /home/user/linux/tools/perf/util/symbol.c:1897:9
#6 0x8e6248 in map__load /home/user/linux/tools/perf/util/map.c:332:7
#7 0x8e66e5 in map__find_symbol /home/user/linux/tools/perf/util/map.c:366:6
#8 0x7f8278 in machine__resolve /home/user/linux/tools/perf/util/event.c:707:13
#9 0x5f3d1a in perf_event__process_sample /home/user/linux/tools/perf/builtin-top.c:773:6
#10 0x5f30e4 in deliver_event /home/user/linux/tools/perf/builtin-top.c:1197:3
#11 0x908a72 in do_flush /home/user/linux/tools/perf/util/ordered-events.c:244:9
#12 0x905fae in __ordered_events__flush /home/user/linux/tools/perf/util/ordered-events.c:323:8
#13 0x9058db in ordered_events__flush /home/user/linux/tools/perf/util/ordered-events.c:341:9
#14 0x5f19b1 in process_thread /home/user/linux/tools/perf/builtin-top.c:1109:7
#15 0x7f4f6a21a298 in start_thread /usr/src/debug/glibc-2.33-16.fc34.x86_64/nptl/pthread_create.c:481:8
#16 0x7f4f697d0352 in clone ../sysdeps/unix/sysv/linux/x86_64/clone.S:95
0x61300009add6 is located 10 bytes to the right of 332-byte region [0x61300009ac80,0x61300009adcc)
allocated by thread T6 here:
#0 0x4f3f7f in malloc (/home/user/linux/tools/perf/perf+0x4f3f7f)
#1 0x7f4f6a0a88d9 (/lib64/libelf.so.1+0xa8d9)
Thread T6 created by T0 here:
#0 0x464856 in pthread_create (/home/user/linux/tools/perf/perf+0x464856)
#1 0x5f06e0 in __cmd_top /home/user/linux/tools/perf/builtin-top.c:1309:6
#2 0x5ef19f in cmd_top /home/user/linux/tools/perf/builtin-top.c:1762:11
#3 0x7b28c0 in run_builtin /home/user/linux/tools/perf/perf.c:313:11
#4 0x7b119f in handle_internal_command /home/user/linux/tools/perf/perf.c:365:8
#5 0x7b2423 in run_argv /home/user/linux/tools/perf/perf.c:409:2
#6 0x7b0c19 in main /home/user/linux/tools/perf/perf.c:539:3
#7 0x7f4f696f7b74 in __libc_start_main /usr/src/debug/glibc-2.33-16.fc34.x86_64/csu/../csu/libc-start.c:332:16
SUMMARY: AddressSanitizer: heap-buffer-overflow (/home/user/linux/tools/perf/perf+0x49875b) in StrstrCheck(void*, char*, char const*, char const*)
Shadow bytes around the buggy address:
0x0c268000b560: fa fa fa fa fa fa fa fa fa fa fa fa fa fa fa fa
0x0c268000b570: fa fa fa fa fa fa fa fa fa fa fa fa fa fa fa fa
0x0c268000b580: fa fa fa fa fa fa fa fa fa fa fa fa fa fa fa fa
0x0c268000b590: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
0x0c268000b5a0: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
=>0x0c268000b5b0: 00 00 00 00 00 00 00 00 00 04[fa]fa fa fa fa fa
0x0c268000b5c0: fa fa fa fa fa fa fa fa 00 00 00 00 00 00 00 00
0x0c268000b5d0: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
0x0c268000b5e0: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
0x0c268000b5f0: 07 fa fa fa fa fa fa fa fa fa fa fa fa fa fa fa
0x0c268000b600: fd fd fd fd fd fd fd fd fd fd fd fd fd fd fd fd
Shadow byte legend (one shadow byte represents 8 application bytes):
Addressable: 00
Partially addressable: 01 02 03 04 05 06 07
Heap left redzone: fa
Freed heap region: fd
Stack left redzone: f1
Stack mid redzone: f2
Stack right redzone: f3
Stack after return: f5
Stack use after scope: f8
Global redzone: f9
Global init order: f6
Poisoned by user: f7
Container overflow: fc
Array cookie: ac
Intra object redzone: bb
ASan internal: fe
Left alloca redzone: ca
Right alloca redzone: cb
Shadow gap: cc
==363148==ABORTING
Suggested-by: Jiri Slaby <jirislaby@kernel.org>
Signed-off-by: Riccardo Mancini <rickyman7@gmail.com>
Acked-by: Namhyung Kim <namhyung@kernel.org>
Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
Cc: Fabian Hemmer <copy@copy.sh>
Cc: Ian Rogers <irogers@google.com>
Cc: Jiri Olsa <jolsa@redhat.com>
Cc: Jiri Slaby <jirislaby@kernel.org>
Cc: Mark Rutland <mark.rutland@arm.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Remi Bernon <rbernon@codeweavers.com>
Link: http://lore.kernel.org/lkml/20210621222108.196219-1-rickyman7@gmail.com
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.This suggestion is invalid because no changes were made to the code.Suggestions cannot be applied while the pull request is closed.Suggestions cannot be applied while viewing a subset of changes.Only one suggestion per line can be applied in a batch.Add this suggestion to a batch that can be applied as a single commit.Applying suggestions on deleted lines is not supported.You must change the existing code in this line in order to create a valid suggestion.Outdated suggestions cannot be applied.This suggestion has been applied or marked resolved.Suggestions cannot be applied from pending reviews.Suggestions cannot be applied on multi-line comments.Suggestions cannot be applied while the pull request is queued to merge.Suggestion cannot be applied right now. Please check back later.
From the intel 2.0.4 driver
Signed-off-by: Tom Rix trix@redhat.com