From: Alexey Starikovskiy Date: Thu, 15 Feb 2007 21:13:51 +0000 (-0500) Subject: Execute AML Notify() requests on stack. X-Git-Tag: v2.6.21-rc1~88^2 X-Git-Url: https://err.no/cgi-bin/gitweb.cgi?a=commitdiff_plain;h=5f7748cf91558a5026ded5be93c5bf6c1ac34edf;p=linux-2.6 Execute AML Notify() requests on stack. HP nx6125/nx6325/... machines have a _GPE handler with an infinite loop sending Notify() events to different ACPI subsystems. The notify handler in the ACPI thermal driver is a C-routine, which may invoke the ACPI interpreter again to get access to some ACPI variables such as temperature. (acpi_evaluate_xxx) On these HP machines such an evaluation changes state of an ASL variable and lets the loop above break. In the current ACPI implementation, Notify requests are being deferred to the same kacpid workqueue on which the above GPE handler with infinite loop is executing. Thus we have a deadlock -- loop will continue to spin, sending notify events, and at the same time preventing these notify events from being run on a workqueue. All notify events are deferred, thus we see explosion in memory consumption. Also as GPE handling is blocked, machines overheat because ACPI-based fan control is stalled. Eventually by external poll of the same acpi_evaluate, kacpid is released and all the queued notify events are free to run, thus 100% CPU utilization by kacpid for several seconds or more. To prevent this failure, Linux must not send notify events to the kacpid workqueue -- either executing them immediately or putting them on some other thread. The first attempt to create a new thread was done by Peter Wainwright He created a bunch of threads, which were stealing work from a kacpid workqueue. This patch appeared in 2.6.15-based kernel shipped with Ubuntu 6.06 LTS. Second attempt was done by Alexey Starikovskiy, who created a new thread for each Notify event. This worked OK on HP nx machines, but broke Linus' Compaq n620c, by producing threads with a speed what they stopped the machine completely. Thus this patch was reverted from 2.6.18-rc2. Alexey re-made the patch to create second workqueue just for notify events, thus hopping it will not break Linus' machine. Patch was tested on the same HP nx machines in #5534 and #7122, but this broke Linus' machine also and was reverted from 2.6.19-rc with much fanfair. The 4th patch inserted schedule_timeout(1) into deferred execution of kacpid, if we had any notify requests pending, but Linus decided that it was too complex (involved either changes to workqueue to see if it's empty or atomic inc/dec). Then a 5th attempt did a yield() to every GPE execution. Finally, this 6th generation patch simply executes the notify handler on the stack. Previous attempts to do this simple solution failed because of issues in AML mutex re-entrancy which are now fixed by the previous patch in this series. http://bugzilla.kernel.org/show_bug.cgi?id=5534 Signed-off-by: Alexey Starikovskiy Signed-off-by: Len Brown --- diff --git a/drivers/acpi/events/evmisc.c b/drivers/acpi/events/evmisc.c index 1b784ffe54..d572700197 100644 --- a/drivers/acpi/events/evmisc.c +++ b/drivers/acpi/events/evmisc.c @@ -196,12 +196,11 @@ acpi_ev_queue_notify_request(struct acpi_namespace_node * node, notify_info->notify.value = (u16) notify_value; notify_info->notify.handler_obj = handler_obj; - status = - acpi_os_execute(OSL_NOTIFY_HANDLER, acpi_ev_notify_dispatch, - notify_info); - if (ACPI_FAILURE(status)) { - acpi_ut_delete_generic_state(notify_info); - } + acpi_ex_relinquish_interpreter(); + + acpi_ev_notify_dispatch(notify_info); + + acpi_ex_reacquire_interpreter(); } if (!handler_obj) {