Ingo Molnar [Wed, 14 May 2008 06:10:31 +0000 (08:10 +0200)]
ftrace: fix mcount export bug
David S. Miller noticed the following bug: the -pg instrumentation
function callback is named differently on each platform. On x86 it
is mcount, on sparc it is _mcount. So the export does not make sense
in kernel/trace/ftrace.c - move it to x86.
Signed-off-by: Ingo Molnar <mingo@elte.hu> Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
David Miller [Wed, 14 May 2008 05:06:56 +0000 (22:06 -0700)]
ftrace: remove packed attribute on ftrace_page.
It causes unaligned access traps on platforms like sparc
(ftrace_page may be marked packed, but once we return
a dyn_ftrace sub-object from this array to another piece
of code, the "packed" part of the typing information doesn't
propagate).
But also, it didn't serve any purpose either. Even if packed,
on 64-bit or 32-bit, it didn't give us any more dyn_ftrace
entries per-page.
Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Ingo Molnar <mingo@elte.hu> Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
To support the forthcoming "immediate values" marker optimization, we must have
a way to declare markers in few code paths that does not use instruction
modification based enable. This will be the case of printk(), some traps and
eventually lockdep instrumentation.
Changelog :
- Fix reversed boolean logic of "generic".
> Not in this patch, but I noticed:
>
> #define __trace_mark(name, call_private, format, args...) \
> do { \
> static const char __mstrtab_##name[] \
> __attribute__((section("__markers_strings"))) \
> = #name "\0" format; \
> static struct marker __mark_##name \
> __attribute__((section("__markers"), aligned(8))) = \
> { __mstrtab_##name, &__mstrtab_##name[sizeof(#name)], \
> 0, 0, marker_probe_cb, \
> { __mark_empty_function, NULL}, NULL }; \
> __mark_check_format(format, ## args); \
> if (unlikely(__mark_##name.state)) { \
> (*__mark_##name.call) \
> (&__mark_##name, call_private, \
> format, ## args); \
> } \
> } while (0)
>
> In this call:
>
> (*__mark_##name.call) \
> (&__mark_##name, call_private, \
> format, ## args); \
>
> you make gcc allocate duplicate format string. You can use
> &__mstrtab_##name[sizeof(#name)] instead since it holds the same string,
> or drop ", format," above and "const char *fmt" from here:
>
> void (*call)(const struct marker *mdata, /* Probe wrapper */
> void *call_private, const char *fmt, ...);
>
> since mdata->format is the same and all callees which need it can take it there.
Very good point. I actually thought about dropping it, since it would
remove an unnecessary argument from the stack. And actually, since I now
have the marker_probe_cb sitting between the marker site and the
callbacks, there is no API change required. Thanks :)
Steven Rostedt [Mon, 12 May 2008 19:21:04 +0000 (21:21 +0200)]
ftrace: limit trace entries
Currently there is no protection from the root user to use up all of
memory for trace buffers. If the root user allocates too many entries,
the OOM killer might start kill off all tasks.
This patch adds an algorith to check the following condition:
Pekka Paalanen [Mon, 12 May 2008 19:21:02 +0000 (21:21 +0200)]
ftrace: add readpos to struct trace_seq; add trace_seq_to_user()
Refactor code from tracing_read_pipe() and create trace_seq_to_user().
Moved trace_seq_reset() call before iter->trace->read() call so that
when all leftover data is returned, trace_seq is reset automatically.
Signed-off-by: Pekka Paalanen <pq@iki.fi> Signed-off-by: Ingo Molnar <mingo@elte.hu> Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Steven Rostedt [Mon, 12 May 2008 19:21:02 +0000 (21:21 +0200)]
ftrace: use raw_smp_processor_id for mcount functions
Due to debug hooks in the kernel that can change the way smp_processor_id
works, use raw_smp_processor_id in mcount called functions (namely
ftrace_record_ip). Currently we annotate most debug functions from calling
mcount, but we should not rely on that to prevent kernel lockups.
This patch uses the raw_smp_processor_id to prevent a recusive crash
that can happen if a debug hook in smp_processor_id calls mcount.
Signed-off-by: Steven Rostedt <srostedt@redhat.com> Signed-off-by: Ingo Molnar <mingo@elte.hu> Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Steven Rostedt [Mon, 12 May 2008 19:21:01 +0000 (21:21 +0200)]
ftrace: fix setting of pos in read_pipe
In resetting the iterator in read_pipe, the reset of pos was
postitioned in the wrong location with respect to the memset
operation. The current code sets pos, incorrectly, to zero.
Signed-off-by: Steven Rostedt <srostedt@redhat.com> Signed-off-by: Ingo Molnar <mingo@elte.hu> Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Pekka Paalanen [Mon, 12 May 2008 19:21:01 +0000 (21:21 +0200)]
x86_64: fix kernel rodata NX setting
Without CONFIG_DYNAMIC_FTRACE, mark_rodata_ro() would mark a wrong
number of pages as no-execute. The bug was introduced in the patch
"ftrace: dont write protect kernel text". The symptom was machine reboot
after a CPU hotplug.
Signed-off-by: Pekka Paalanen <pq@iki.fi> Acked-by: Steven Rostedt <srostedt@redhat.com> Signed-off-by: Ingo Molnar <mingo@elte.hu> Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Steven Rostedt [Mon, 12 May 2008 19:21:00 +0000 (21:21 +0200)]
ftrace: fix comm on function trace output
In cleaning up of the sched_switch code, the function trace recording
of task comms was removed. This patch adds back the recording of comms
for function trace. The output of ftrace now has the task comm instead
of <...>.
Signed-off-by: Steven Rostedt <srostedt@redhat.com> Signed-off-by: Ingo Molnar <mingo@elte.hu> Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Steven Rostedt [Mon, 12 May 2008 19:21:00 +0000 (21:21 +0200)]
lockdep: update lockdep_recursion on graph_lock
With the introduction of ftrace, it is possible to recurse into
the lockdep functions via the mcount call. To prevent possible
lockups, updating the lockdep_recursion counter on grabbing the internal
lockdep_lock should prevent deadlocks.
Signed-off-by: Steven Rostedt <srostedt@redhat.com> Signed-off-by: Ingo Molnar <mingo@elte.hu> Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Steven Rostedt [Mon, 12 May 2008 19:20:59 +0000 (21:20 +0200)]
ftrace: trace_entries to dynamically change trace buffer size
This patch adds /debug/tracing/trace_entries that allows users to
see as well as modify the number of trace entries the buffers hold.
The number of entries only increments in ENTRIES_PER_PAGE which is
calculated by the size of an entry with the number of entries that
can fit in a page. The user does not need to use an exact size, but
the entries will be rounded to one of the increments.
Trying to set the entries to 0 will return with -EINVAL.
To avoid race conditions, the modification of the buffer size can only
be done when tracing is completely disabled (current_tracer == none).
A info message will be printed if a user tries to modify the buffer size
when not set to none.
Signed-off-by: Steven Rostedt <srostedt@redhat.com> Signed-off-by: Ingo Molnar <mingo@elte.hu> Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Steven Rostedt [Mon, 12 May 2008 19:20:58 +0000 (21:20 +0200)]
ftrace: trace_pipe implement NONBLOCK
This patch implements "NONBLOCK" for trace_pipe. If the trace_pipe is opened
with O_NONBLOCK, then the trace_pipe read will not block when buffer is empty.
Signed-off-by: Steven Rostedt <srostedt@redhat.com> Signed-off-by: Ingo Molnar <mingo@elte.hu> Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Ankita Garg [Mon, 12 May 2008 19:20:58 +0000 (21:20 +0200)]
ftrace: fix conversion of task state to char in latency tracer
The conversion of task states to a character in the sched_switch tracer (part
of latency tracer infrastructure), seems to be incorrect. We currently do it
by indexing into the state_to_char array using the state value. The state
values do not map directly into the array index and are thus incorrect. The
following patch addresses this issue. This is also what is being done even
in the show_task() routine in kernel/sched.c
Pekka Paalanen [Mon, 12 May 2008 19:20:56 +0000 (21:20 +0200)]
x86: add a list for custom page fault handlers.
Provides kernel modules a way to register custom page fault handlers.
On every page fault this will call a list of registered functions. The
functions may handle the fault and force do_page_fault() to return
immediately.
This functionality is similar to the now removed page fault notifiers.
Custom page fault handlers are used by debugging and reverse engineering
tools. Mmiotrace is one such tool and a patch to add it into the tree
will follow.
The custom page fault handlers are called earlier in do_page_fault()
than the page fault notifiers were.
Signed-off-by: Pekka Paalanen <pq@iki.fi> Signed-off-by: Ingo Molnar <mingo@elte.hu> Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Steven Rostedt [Mon, 12 May 2008 19:20:56 +0000 (21:20 +0200)]
ftrace: dont write protect kernel text
Dynamic ftrace cant work when the kernel has its text write protected.
This patch keeps the kernel from being write protected when
dynamic ftrace is in place.
Signed-off-by: Steven Rostedt <srostedt@redhat.com> Signed-off-by: Ingo Molnar <mingo@elte.hu> Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Steven Rostedt [Mon, 12 May 2008 19:20:56 +0000 (21:20 +0200)]
ftrace: selftest protect againt max flip
There is a slight race condition in the selftest where the max update
of the wakeup and irqs/preemption off tests can be doing a max update as
the buffers are being tested. If this happens the system can crash with
a GPF.
This patch adds the max update spinlock around the checking of the
buffers to prevent such a race.
Signed-off-by: Steven Rostedt <srostedt@redhat.com> Signed-off-by: Ingo Molnar <mingo@elte.hu> Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Steven Rostedt [Mon, 12 May 2008 19:20:56 +0000 (21:20 +0200)]
ftrace: fix mutex unlock in trace output
If the trace output changes on reading the trace files, there is a chance
that the start function will return NULL. If the start function of a sequence
returns NULL the stop equivalent is not called. In this case, all locks
that are taken must be released even if they are released in the stop function.
This patch fixes a case that a mutex was not released on return of NULL
in the start sequence function.
Signed-off-by: Steven Rostedt <srostedt@redhat.com> Signed-off-by: Ingo Molnar <mingo@elte.hu> Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Steven Rostedt [Mon, 12 May 2008 19:20:55 +0000 (21:20 +0200)]
ftrace: add UNINTERRUPTIBLE state for kftraced on disable
When dynamic ftrace fails and sets itself disabled, the ftraced daemon
will go back to sleep everytime it wakes up. The setting of the
ftraced state to UNINTERRUPTIBLE is skipped in this process, and the
daemon takes up 100% of the CPU. This patch makes sure the ftraced daemon
sets itself to UNINTERRUPTIBLE in that loop.
Signed-off-by: Steven Rostedt <srostedt@redhat.com> Signed-off-by: Ingo Molnar <mingo@elte.hu> Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Steven Rostedt [Mon, 12 May 2008 19:20:55 +0000 (21:20 +0200)]
ftrace: use Makefile to remove tracing from lockdep
This patch removes the "notrace" annotation from lockdep and adds the debugging
files in the kernel director to those that should not be compiled with
"-pg" mcount tracing.
Signed-off-by: Steven Rostedt <srostedt@redhat.com> Signed-off-by: Ingo Molnar <mingo@elte.hu> Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Steven Rostedt [Mon, 12 May 2008 19:20:55 +0000 (21:20 +0200)]
ftrace: remove function tracing from spinlock debug
The debug functions in spin_lock debugging pollute the output of the
function tracer. This patch adds the debug files in the lib director
to those that should not be compiled with mcount tracing.
Signed-off-by: Steven Rostedt <srostedt@redhat.com> Signed-off-by: Ingo Molnar <mingo@elte.hu> Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Steven Rostedt [Mon, 12 May 2008 19:20:55 +0000 (21:20 +0200)]
ftrace: user raw_spin_lock in tracing
Lock debugging enabled cause huge performance problems for tracing. Having
the lock verification happening for every function that is called
because mcount calls spin_lock can cripple the system.
This patch converts the spin_locks used by ftrace into raw_spin_locks.
Signed-off-by: Steven Rostedt <srostedt@redhat.com> Signed-off-by: Ingo Molnar <mingo@elte.hu> Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Steven Rostedt [Mon, 12 May 2008 19:20:55 +0000 (21:20 +0200)]
ftrace: irqsoff use raw_smp_processor_id
This patch changes the use of __get_cpu_var to explicitly calling
raw_smp_processor_id and using the per_cpu() macro. On some debug
configurations, the use of __get_cpu_var may cause ftrace to trigger
and this can cause problems with the irqsoff tracing.
Signed-off-by: Steven Rostedt <srostedt@redhat.com> Signed-off-by: Ingo Molnar <mingo@elte.hu> Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Steven Rostedt [Mon, 12 May 2008 19:20:54 +0000 (21:20 +0200)]
ftrace: fix dynamic ftrace selftest
With the adding of the configuration changes in the Makefile to prevent
tracing of functions in the ftrace code, all tracing of all the ftrace
code has been removed. Unfortunately, one of the selftests, relied on
a function to be traced. With the new change, the function was no longer
traced and the test failed.
This patch separates out the test function into its own file so that
we can add the "-pg" flag to the compilation of that function and the
adding of the mcount call to that function.
Signed-off-by: Steven Rostedt <srostedt@redhat.com> Signed-off-by: Ingo Molnar <mingo@elte.hu> Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Steven Rostedt [Mon, 12 May 2008 19:20:54 +0000 (21:20 +0200)]
ftrace: add TRACE_STACK and TRACE_SPECIAL to selftest validation
The selftest validation code checks for valid entries in the trace buffer.
TRACE_STACK and TRACE_SPECIAL have been added to the code but not to
the validator. This patch adds the two to prevent them from flagging a
failure in the selftest.
Signed-off-by: Steven Rostedt <srostedt@redhat.com> Signed-off-by: Ingo Molnar <mingo@elte.hu> Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Steven Rostedt [Mon, 12 May 2008 19:20:54 +0000 (21:20 +0200)]
ftrace: printk and trace irqsoff and wakeups
printk called from wakeup critical timings and irqs off can
cause deadlocks since printk might do a wakeup itself. If the
call to printk happens with the runqueue lock held, it can
deadlock.
This patch protects the printk from being called in trace irqs off
with a test to see if the runqueue for the current CPU is locked.
If it is locked, the printk is skipped.
The wakeup always holds the runqueue lock, so the printk is
simply removed.
Signed-off-by: Steven Rostedt <srostedt@redhat.com> Signed-off-by: Ingo Molnar <mingo@elte.hu> Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Steven Rostedt [Mon, 12 May 2008 19:20:50 +0000 (21:20 +0200)]
ftrace: do not profile lib/string.o
Most archs define the string and memory compare functions in assembly.
Some do not. But these functions may be used in some archs at early
boot up.
Since most archs define this code in assembly and they are not usually
traced, there's no need to trace them when they are not defined in
assembly.
This patch removes the -pg from the CFLAGS for lib/string.o.
This prevents the string functions use in either vdso or early bootup
from crashing the system.
Signed-off-by: Steven Rostedt <srostedt@redhat.com> Signed-off-by: Ingo Molnar <mingo@elte.hu> Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Steven Rostedt [Mon, 12 May 2008 19:20:50 +0000 (21:20 +0200)]
ftrace: remove address of function names
PowerPC is very fragile when it comes to use of function names
and function addresses. ftrace needs to either use all function
addresses or function names (i.e. my_func as suppose to &my_func).
This patch chooses to use the names and not the addresses, and
makes ftrace consistent.
Signed-off-by: Steven Rostedt <srostedt@redhat.com> Signed-off-by: Ingo Molnar <mingo@elte.hu> Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Steven Rostedt [Mon, 12 May 2008 19:20:49 +0000 (21:20 +0200)]
ftrace: add trace_function api for other tracers to use
A new check was added in the ftrace function that wont trace if the CPU
trace buffer is disabled. Unfortunately, other tracers used ftrace() to
write to the buffer after they disabled it. The new disable check makes
these calls into a nop.
This patch changes the __ftrace that is called without the check into a
new api for the other tracers to use, called "trace_function". The other
tracers use this interface instead when the trace CPU buffer is already
disabled.
Signed-off-by: Steven Rostedt <srostedt@redhat.com> Signed-off-by: Ingo Molnar <mingo@elte.hu> Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Steven Rostedt [Mon, 12 May 2008 19:20:48 +0000 (21:20 +0200)]
ftrace: disable tracing on failure
Since ftrace touches practically every function. If we detect any
anomaly, we want to fully disable ftrace. This patch adds code
to try shutdown ftrace as much as possible without doing any more
harm is something is detected not quite correct.
This only kills ftrace, this patch does have checks for other parts of
the tracer (irqsoff, wakeup, etc.).
Signed-off-by: Steven Rostedt <srostedt@redhat.com> Signed-off-by: Ingo Molnar <mingo@elte.hu> Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Steven Rostedt [Mon, 12 May 2008 19:20:48 +0000 (21:20 +0200)]
ftrace - fix dynamic ftrace memory leak
The ftrace dynamic function update allocates a record to store the
instruction pointers that are being modified. If the modified
instruction pointer fails to update, then the record is marked as
failed and nothing more is done.
Worse, if the modification fails, but the record ip function is still
called, it will allocate a new record and try again. In just a matter
of time, will this cause a serious memory leak and crash the system.
This patch plugs this memory leak. When a record fails, it is
included back into the pool of records to be used. Now a record may
fail over and over again, but the number of allocated records will
not increase.
Signed-off-by: Steven Rostedt <srostedt@redhat.com> Signed-off-by: Ingo Molnar <mingo@elte.hu> Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Steven Rostedt [Mon, 12 May 2008 19:20:46 +0000 (21:20 +0200)]
ftrace: user run time file reading
This patch creates a file called trace_pipe in the tracing
debug directory. This file is a consumer of the trace buffers.
This means that reads of this file consumes the entries from
the trace buffers so that they will not be read a second time,
as contrast to the static buffers latency_trace and trace.
Reading from the trace_pipe will remove the entries from trace
and latency_trace too.
The advantage that trace_pipe has is that it can record live
traces. It will block when there is nothing in the buffer,
and read the entries as they are entered. An EOF happens when
tracing is disabled (tracing_enabled = 0).
Signed-off-by: Steven Rostedt <srostedt@redhat.com> Signed-off-by: Ingo Molnar <mingo@elte.hu> Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Steven Rostedt [Mon, 12 May 2008 19:20:46 +0000 (21:20 +0200)]
ftrace: add a buffer for output
Later patches will need to print the same things as the seq output
does. But those outputs will not use the seq utility. This patch
adds a buffer to the iterator, that can be used by either the
seq utility or other output.
Signed-off-by: Steven Rostedt <srostedt@redhat.com> Signed-off-by: Ingo Molnar <mingo@elte.hu> Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Steven Rostedt [Mon, 12 May 2008 19:20:45 +0000 (21:20 +0200)]
ftrace: change buffers to producer consumer
This patch changes the way the CPU trace buffers are handled.
Instead of always starting from the trace page head, the logic
is changed to a producer consumer logic. This allows for the
buffers to be drained while they are alive.
Signed-off-by: Steven Rostedt <srostedt@redhat.com> Signed-off-by: Ingo Molnar <mingo@elte.hu> Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Steven Rostedt [Mon, 12 May 2008 19:20:45 +0000 (21:20 +0200)]
ftrace: reset selftests
The tests may leave stuff in the buffers. This resets the buffers
after each test is run. If a test fails, it does not reset the
buffer to avoid touching a buffer that is corrupted.
Signed-off-by: Steven Rostedt <srostedt@redhat.com> Signed-off-by: Ingo Molnar <mingo@elte.hu> Signed-off-by: Thomas Gleixner <tglx@linutronix.de>