aboutsummaryrefslogtreecommitdiffstats
diff options
context:
space:
mode:
authorLinus Torvalds <torvalds@linux-foundation.org>2020-06-09 10:06:18 -0700
committerLinus Torvalds <torvalds@linux-foundation.org>2020-06-09 10:06:18 -0700
commitd1e521adad250ab8c979861c857fa2b1542c9741 (patch)
tree8c83de6b27c8a1aae0952cb86a25be7f27a51f9b
parent595a56ac1b0d5f0a16a89589ef55ffd35c1967a2 (diff)
parent388d8bdb87e01bcea6d0b2bf797b5f6d7b2401fb (diff)
downloadlinux-mmp-d1e521adad250ab8c979861c857fa2b1542c9741.tar.gz
Merge tag 'trace-v5.8' of git://git.kernel.org/pub/scm/linux/kernel/git/rostedt/linux-trace
Pull tracing updates from Steven Rostedt: "No new features this release. Mostly clean ups, restructuring and documentation. - Have ftrace_bug() show ftrace errors before the WARN, as the WARN will reboot the box before the error messages are printed if panic_on_warn is set. - Have traceoff_on_warn disable tracing sooner (before prints) - Write a message to the trace buffer that its being disabled when disable_trace_on_warning() is set. - Separate out synthetic events from histogram code to let it be used by other parts of the kernel. - More documentation on histogram design. - Other small fixes and clean ups" * tag 'trace-v5.8' of git://git.kernel.org/pub/scm/linux/kernel/git/rostedt/linux-trace: tracing: Remove obsolete PREEMPTIRQ_EVENTS kconfig option tracing/doc: Fix ascii-art in histogram-design.rst tracing: Add a trace print when traceoff_on_warning is triggered ftrace,bug: Improve traceoff_on_warn selftests/ftrace: Distinguish between hist and synthetic event checks tracing: Move synthetic events to a separate file tracing: Fix events.rst section numbering tracing/doc: Fix typos in histogram-design.rst tracing: Add hist_debug trace event files for histogram debugging tracing: Add histogram-design document tracing: Check state.disabled in synth event trace functions tracing/probe: reverse arguments to list_add tools/bootconfig: Add a summary of test cases and return error ftrace: show debugging information when panic_on_warn set
-rw-r--r--Documentation/trace/histogram-design.rst2115
-rw-r--r--kernel/trace/Kconfig52
-rw-r--r--kernel/trace/Makefile1
-rw-r--r--kernel/trace/ftrace.c8
-rw-r--r--kernel/trace/trace.c5
-rw-r--r--kernel/trace/trace.h1
-rw-r--r--kernel/trace/trace_events.c4
-rw-r--r--kernel/trace/trace_events_hist.c2073
-rw-r--r--kernel/trace/trace_events_synth.c1789
-rw-r--r--kernel/trace/trace_probe.c2
-rw-r--r--kernel/trace/trace_synth.h36
-rw-r--r--lib/bug.c3
-rwxr-xr-xtools/bootconfig/test-bootconfig.sh7
-rw-r--r--tools/testing/selftests/ftrace/test.d/trigger/inter-event/trigger-field-variable-support.tc5
-rw-r--r--tools/testing/selftests/ftrace/test.d/trigger/inter-event/trigger-inter-event-combined-hist.tc5
-rw-r--r--tools/testing/selftests/ftrace/test.d/trigger/inter-event/trigger-multi-actions-accept.tc5
-rw-r--r--tools/testing/selftests/ftrace/test.d/trigger/inter-event/trigger-onmatch-action-hist.tc5
-rw-r--r--tools/testing/selftests/ftrace/test.d/trigger/inter-event/trigger-onmatch-onmax-action-hist.tc5
-rw-r--r--tools/testing/selftests/ftrace/test.d/trigger/inter-event/trigger-onmax-action-hist.tc5
-rw-r--r--tools/testing/selftests/ftrace/test.d/trigger/inter-event/trigger-snapshot-action-hist.tc5
-rw-r--r--tools/testing/selftests/ftrace/test.d/trigger/inter-event/trigger-trace-action-hist.tc5
21 files changed, 4322 insertions, 1814 deletions
diff --git a/Documentation/trace/histogram-design.rst b/Documentation/trace/histogram-design.rst
new file mode 100644
index 00000000000000..eef840043da9c6
--- /dev/null
+++ b/Documentation/trace/histogram-design.rst
@@ -0,0 +1,2115 @@
+.. SPDX-License-Identifier: GPL-2.0
+
+======================
+Histogram Design Notes
+======================
+
+:Author: Tom Zanussi <zanussi@kernel.org>
+
+This document attempts to provide a description of how the ftrace
+histograms work and how the individual pieces map to the data
+structures used to implement them in trace_events_hist.c and
+tracing_map.c.
+
+Note: All the ftrace histogram command examples assume the working
+ directory is the ftrace /tracing directory. For example::
+
+ # cd /sys/kernel/debug/tracing
+
+Also, the histogram output displayed for those commands will be
+generally be truncated - only enough to make the point is displayed.
+
+'hist_debug' trace event files
+==============================
+
+If the kernel is compiled with CONFIG_HIST_TRIGGERS_DEBUG set, an
+event file named 'hist_debug' will appear in each event's
+subdirectory. This file can be read at any time and will display some
+of the hist trigger internals described in this document. Specific
+examples and output will be described in test cases below.
+
+Basic histograms
+================
+
+First, basic histograms. Below is pretty much the simplest thing you
+can do with histograms - create one with a single key on a single
+event and cat the output::
+
+ # echo 'hist:keys=pid' >> events/sched/sched_waking/trigger
+
+ # cat events/sched/sched_waking/hist
+
+ { pid: 18249 } hitcount: 1
+ { pid: 13399 } hitcount: 1
+ { pid: 17973 } hitcount: 1
+ { pid: 12572 } hitcount: 1
+ ...
+ { pid: 10 } hitcount: 921
+ { pid: 18255 } hitcount: 1444
+ { pid: 25526 } hitcount: 2055
+ { pid: 5257 } hitcount: 2055
+ { pid: 27367 } hitcount: 2055
+ { pid: 1728 } hitcount: 2161
+
+ Totals:
+ Hits: 21305
+ Entries: 183
+ Dropped: 0
+
+What this does is create a histogram on the sched_waking event using
+pid as a key and with a single value, hitcount, which even if not
+explicitly specified, exists for every histogram regardless.
+
+The hitcount value is a per-bucket value that's automatically
+incremented on every hit for the given key, which in this case is the
+pid.
+
+So in this histogram, there's a separate bucket for each pid, and each
+bucket contains a value for that bucket, counting the number of times
+sched_waking was called for that pid.
+
+Each histogram is represented by a hist_data struct.
+
+To keep track of each key and value field in the histogram, hist_data
+keeps an array of these fields named fields[]. The fields[] array is
+an array containing struct hist_field representations of each
+histogram val and key in the histogram (variables are also included
+here, but are discussed later). So for the above histogram we have one
+key and one value; in this case the one value is the hitcount value,
+which all histograms have, regardless of whether they define that
+value or not, which the above histogram does not.
+
+Each struct hist_field contains a pointer to the ftrace_event_field
+from the event's trace_event_file along with various bits related to
+that such as the size, offset, type, and a hist_field_fn_t function,
+which is used to grab the field's data from the ftrace event buffer
+(in most cases - some hist_fields such as hitcount don't directly map
+to an event field in the trace buffer - in these cases the function
+implementation gets its value from somewhere else). The flags field
+indicates which type of field it is - key, value, variable, variable
+reference, etc., with value being the default.
+
+The other important hist_data data structure in addition to the
+fields[] array is the tracing_map instance created for the histogram,
+which is held in the .map member. The tracing_map implements the
+lock-free hash table used to implement histograms (see
+kernel/trace/tracing_map.h for much more discussion about the
+low-level data structures implementing the tracing_map). For the
+purposes of this discussion, the tracing_map contains a number of
+buckets, each bucket corresponding to a particular tracing_map_elt
+object hashed by a given histogram key.
+
+Below is a diagram the first part of which describes the hist_data and
+associated key and value fields for the histogram described above. As
+you can see, there are two fields in the fields array, one val field
+for the hitcount and one key field for the pid key.
+
+Below that is a diagram of a run-time snapshot of what the tracing_map
+might look like for a given run. It attempts to show the
+relationships between the hist_data fields and the tracing_map
+elements for a couple hypothetical keys and values.::
+
+ +------------------+
+ | hist_data |
+ +------------------+ +----------------+
+ | .fields[] |---->| val = hitcount |----------------------------+
+ +----------------+ +----------------+ |
+ | .map | | .size | |
+ +----------------+ +--------------+ |
+ | .offset | |
+ +--------------+ |
+ | .fn() | |
+ +--------------+ |
+ . |
+ . |
+ . |
+ +----------------+ <--- n_vals |
+ | key = pid |----------------------------|--+
+ +----------------+ | |
+ | .size | | |
+ +--------------+ | |
+ | .offset | | |
+ +--------------+ | |
+ | .fn() | | |
+ +----------------+ <--- n_fields | |
+ | unused | | |
+ +----------------+ | |
+ | | | |
+ +--------------+ | |
+ | | | |
+ +--------------+ | |
+ | | | |
+ +--------------+ | |
+ n_keys = n_fields - n_vals | |
+
+The hist_data n_vals and n_fields delineate the extent of the fields[] | |
+array and separate keys from values for the rest of the code. | |
+
+Below is a run-time representation of the tracing_map part of the | |
+histogram, with pointers from various parts of the fields[] array | |
+to corresponding parts of the tracing_map. | |
+
+The tracing_map consists of an array of tracing_map_entrys and a set | |
+of preallocated tracing_map_elts (abbreviated below as map_entry and | |
+map_elt). The total number of map_entrys in the hist_data.map array = | |
+map->max_elts (actually map->map_size but only max_elts of those are | |
+used. This is a property required by the map_insert() algorithm). | |
+
+If a map_entry is unused, meaning no key has yet hashed into it, its | |
+.key value is 0 and its .val pointer is NULL. Once a map_entry has | |
+been claimed, the .key value contains the key's hash value and the | |
+.val member points to a map_elt containing the full key and an entry | |
+for each key or value in the map_elt.fields[] array. There is an | |
+entry in the map_elt.fields[] array corresponding to each hist_field | |
+in the histogram, and this is where the continually aggregated sums | |
+corresponding to each histogram value are kept. | |
+
+The diagram attempts to show the relationship between the | |
+hist_data.fields[] and the map_elt.fields[] with the links drawn | |
+between diagrams::
+
+ +-----------+ | |
+ | hist_data | | |
+ +-----------+ | |
+ | .fields | | |
+ +---------+ +-----------+ | |
+ | .map |---->| map_entry | | |
+ +---------+ +-----------+ | |
+ | .key |---> 0 | |
+ +---------+ | |
+ | .val |---> NULL | |
+ +-----------+ | |
+ | map_entry | | |
+ +-----------+ | |
+ | .key |---> pid = 999 | |
+ +---------+ +-----------+ | |
+ | .val |--->| map_elt | | |
+ +---------+ +-----------+ | |
+ . | .key |---> full key * | |
+ . +---------+ +---------------+ | |
+ . | .fields |--->| .sum (val) |<-+ |
+ +-----------+ +---------+ | 2345 | | |
+ | map_entry | +---------------+ | |
+ +-----------+ | .offset (key) |<----+
+ | .key |---> 0 | 0 | | |
+ +---------+ +---------------+ | |
+ | .val |---> NULL . | |
+ +-----------+ . | |
+ | map_entry | . | |
+ +-----------+ +---------------+ | |
+ | .key | | .sum (val) or | | |
+ +---------+ +---------+ | .offset (key) | | |
+ | .val |--->| map_elt | +---------------+ | |
+ +-----------+ +---------+ | .sum (val) or | | |
+ | map_entry | | .offset (key) | | |
+ +-----------+ +---------------+ | |
+ | .key |---> pid = 4444 | |
+ +---------+ +-----------+ | |
+ | .val | | map_elt | | |
+ +---------+ +-----------+ | |
+ | .key |---> full key * | |
+ +---------+ +---------------+ | |
+ | .fields |--->| .sum (val) |<-+ |
+ +---------+ | 65523 | |
+ +---------------+ |
+ | .offset (key) |<----+
+ | 0 |
+ +---------------+
+ .
+ .
+ .
+ +---------------+
+ | .sum (val) or |
+ | .offset (key) |
+ +---------------+
+ | .sum (val) or |
+ | .offset (key) |
+ +---------------+
+
+Abbreviations used in the diagrams::
+
+ hist_data = struct hist_trigger_data
+ hist_data.fields = struct hist_field
+ fn = hist_field_fn_t
+ map_entry = struct tracing_map_entry
+ map_elt = struct tracing_map_elt
+ map_elt.fields = struct tracing_map_field
+
+Whenever a new event occurs and it has a hist trigger associated with
+it, event_hist_trigger() is called. event_hist_trigger() first deals
+with the key: for each subkey in the key (in the above example, there
+is just one subkey corresponding to pid), the hist_field that
+represents that subkey is retrieved from hist_data.fields[] and the
+hist_field_fn_t fn() associated with that field, along with the
+field's size and offset, is used to grab that subkey's data from the
+current trace record.
+
+Once the complete key has been retrieved, it's used to look that key
+up in the tracing_map. If there's no tracing_map_elt associated with
+that key, an empty one is claimed and inserted in the map for the new
+key. In either case, the tracing_map_elt associated with that key is
+returned.
+
+Once a tracing_map_elt available, hist_trigger_elt_update() is called.
+As the name implies, this updates the element, which basically means
+updating the element's fields. There's a tracing_map_field associated
+with each key and value in the histogram, and each of these correspond
+to the key and value hist_fields created when the histogram was
+created. hist_trigger_elt_update() goes through each value hist_field
+and, as for the keys, uses the hist_field's fn() and size and offset
+to grab the field's value from the current trace record. Once it has
+that value, it simply adds that value to that field's
+continually-updated tracing_map_field.sum member. Some hist_field
+fn()s, such as for the hitcount, don't actually grab anything from the
+trace record (the hitcount fn() just increments the counter sum by 1),
+but the idea is the same.
+
+Once all the values have been updated, hist_trigger_elt_update() is
+done and returns. Note that there are also tracing_map_fields for
+each subkey in the key, but hist_trigger_elt_update() doesn't look at
+them or update anything - those exist only for sorting, which can
+happen later.
+
+Basic histogram test
+--------------------
+
+This is a good example to try. It produces 3 value fields and 2 key
+fields in the output::
+
+ # echo 'hist:keys=common_pid,call_site.sym:values=bytes_req,bytes_alloc,hitcount' >> events/kmem/kmalloc/trigger
+
+To see the debug data, cat the kmem/kmalloc's 'hist_debug' file. It
+will show the trigger info of the histogram it corresponds to, along
+with the address of the hist_data associated with the histogram, which
+will become useful in later examples. It then displays the number of
+total hist_fields associated with the histogram along with a count of
+how many of those correspond to keys and how many correspond to values.
+
+It then goes on to display details for each field, including the
+field's flags and the position of each field in the hist_data's
+fields[] array, which is useful information for verifying that things
+internally appear correct or not, and which again will become even
+more useful in further examples::
+
+ # cat events/kmem/kmalloc/hist_debug
+
+ # event histogram
+ #
+ # trigger info: hist:keys=common_pid,call_site.sym:vals=hitcount,bytes_req,bytes_alloc:sort=hitcount:size=2048 [active]
+ #
+
+ hist_data: 000000005e48c9a5
+
+ n_vals: 3
+ n_keys: 2
+ n_fields: 5
+
+ val fields:
+
+ hist_data->fields[0]:
+ flags:
+ VAL: HIST_FIELD_FL_HITCOUNT
+ type: u64
+ size: 8
+ is_signed: 0
+
+ hist_data->fields[1]:
+ flags:
+ VAL: normal u64 value
+ ftrace_event_field name: bytes_req
+ type: size_t
+ size: 8
+ is_signed: 0
+
+ hist_data->fields[2]:
+ flags:
+ VAL: normal u64 value
+ ftrace_event_field name: bytes_alloc
+ type: size_t
+ size: 8
+ is_signed: 0
+
+ key fields:
+
+ hist_data->fields[3]:
+ flags:
+ HIST_FIELD_FL_KEY
+ ftrace_event_field name: common_pid
+ type: int
+ size: 8
+ is_signed: 1
+
+ hist_data->fields[4]:
+ flags:
+ HIST_FIELD_FL_KEY
+ ftrace_event_field name: call_site
+ type: unsigned long
+ size: 8
+ is_signed: 0
+
+The commands below can be used to clean things up for the next test::
+
+ # echo '!hist:keys=common_pid,call_site.sym:values=bytes_req,bytes_alloc,hitcount' >> events/kmem/kmalloc/trigger
+
+Variables
+=========
+
+Variables allow data from one hist trigger to be saved by one hist
+trigger and retrieved by another hist trigger. For example, a trigger
+on the sched_waking event can capture a timestamp for a particular
+pid, and later a sched_switch event that switches to that pid event
+can grab the timestamp and use it to calculate a time delta between
+the two events::
+
+ # echo 'hist:keys=pid:ts0=common_timestamp.usecs' >>
+ events/sched/sched_waking/trigger
+
+ # echo 'hist:keys=next_pid:wakeup_lat=common_timestamp.usecs-$ts0' >>
+ events/sched/sched_switch/trigger
+
+In terms of the histogram data structures, variables are implemented
+as another type of hist_field and for a given hist trigger are added
+to the hist_data.fields[] array just after all the val fields. To
+distinguish them from the existing key and val fields, they're given a
+new flag type, HIST_FIELD_FL_VAR (abbreviated FL_VAR) and they also
+make use of a new .var.idx field member in struct hist_field, which
+maps them to an index in a new map_elt.vars[] array added to the
+map_elt specifically designed to store and retrieve variable values.
+The diagram below shows those new elements and adds a new variable
+entry, ts0, corresponding to the ts0 variable in the sched_waking
+trigger above.
+
+sched_waking histogram
+----------------------::
+
+ +------------------+
+ | hist_data |<-------------------------------------------------------+
+ +------------------+ +-------------------+ |
+ | .fields[] |-->| val = hitcount | |
+ +----------------+ +-------------------+ |
+ | .map | | .size | |
+ +----------------+ +-----------------+ |
+ | .offset | |
+ +-----------------+ |
+ | .fn() | |
+ +-----------------+ |
+ | .flags | |
+ +-----------------+ |
+ | .var.idx | |
+ +-------------------+ |
+ | var = ts0 | |
+ +-------------------+ |
+ | .size | |
+ +-----------------+ |
+ | .offset | |
+ +-----------------+ |
+ | .fn() | |
+ +-----------------+ |
+ | .flags & FL_VAR | |
+ +-----------------+ |
+ | .var.idx |----------------------------+-+ |
+ +-----------------+ | | |
+ . | | |
+ . | | |
+ . | | |
+ +-------------------+ <--- n_vals | | |
+ | key = pid | | | |
+ +-------------------+ | | |
+ | .size | | | |
+ +-----------------+ | | |
+ | .offset | | | |
+ +-----------------+ | | |
+ | .fn() | | | |
+ +-----------------+ | | |
+ | .flags & FL_KEY | | | |
+ +-----------------+ | | |
+ | .var.idx | | | |
+ +-------------------+ <--- n_fields | | |
+ | unused | | | |
+ +-------------------+ | | |
+ | | | | |
+ +-----------------+ | | |
+ | | | | |
+ +-----------------+ | | |
+ | | | | |
+ +-----------------+ | | |
+ | | | | |
+ +-----------------+ | | |
+ | | | | |
+ +-----------------+ | | |
+ n_keys = n_fields - n_vals | | |
+ | | |
+
+This is very similar to the basic case. In the above diagram, we can | | |
+see a new .flags member has been added to the struct hist_field | | |
+struct, and a new entry added to hist_data.fields representing the ts0 | | |
+variable. For a normal val hist_field, .flags is just 0 (modulo | | |
+modifier flags), but if the value is defined as a variable, the .flags | | |
+contains a set FL_VAR bit. | | |
+
+As you can see, the ts0 entry's .var.idx member contains the index | | |
+into the tracing_map_elts' .vars[] array containing variable values. | | |
+This idx is used whenever the value of the variable is set or read. | | |
+The map_elt.vars idx assigned to the given variable is assigned and | | |
+saved in .var.idx by create_tracing_map_fields() after it calls | | |
+tracing_map_add_var(). | | |
+
+Below is a representation of the histogram at run-time, which | | |
+populates the map, along with correspondence to the above hist_data and | | |
+hist_field data structures. | | |
+
+The diagram attempts to show the relationship between the | | |
+hist_data.fields[] and the map_elt.fields[] and map_elt.vars[] with | | |
+the links drawn between diagrams. For each of the map_elts, you can | | |
+see that the .fields[] members point to the .sum or .offset of a key | | |
+or val and the .vars[] members point to the value of a variable. The | | |
+arrows between the two diagrams show the linkages between those | | |
+tracing_map members and the field definitions in the corresponding | | |
+hist_data fields[] members.::
+
+ +-----------+ | | |
+ | hist_data | | | |
+ +-----------+ | | |
+ | .fields | | | |
+ +---------+ +-----------+ | | |
+ | .map |---->| map_entry | | | |
+ +---------+ +-----------+ | | |
+ | .key |---> 0 | | |
+ +---------+ | | |
+ | .val |---> NULL | | |
+ +-----------+ | | |
+ | map_entry | | | |
+ +-----------+ | | |
+ | .key |---> pid = 999 | | |
+ +---------+ +-----------+ | | |
+ | .val |--->| map_elt | | | |
+ +---------+ +-----------+ | | |
+ . | .key |---> full key * | | |
+ . +---------+ +---------------+ | | |
+ . | .fields |--->| .sum (val) | | | |
+ . +---------+ | 2345 | | | |
+ . +--| .vars | +---------------+ | | |
+ . | +---------+ | .offset (key) | | | |
+ . | | 0 | | | |
+ . | +---------------+ | | |
+ . | . | | |
+ . | . | | |
+ . | . | | |
+ . | +---------------+ | | |
+ . | | .sum (val) or | | | |
+ . | | .offset (key) | | | |
+ . | +---------------+ | | |
+ . | | .sum (val) or | | | |
+ . | | .offset (key) | | | |
+ . | +---------------+ | | |
+ . | | | |
+ . +---------------->+---------------+ | | |
+ . | ts0 |<--+ | |
+ . | 113345679876 | | | |
+ . +---------------+ | | |
+ . | unused | | | |
+ . | | | | |
+ . +---------------+ | | |
+ . . | | |
+ . . | | |
+ . . | | |
+ . +---------------+ | | |
+ . | unused | | | |
+ . | | | | |
+ . +---------------+ | | |
+ . | unused | | | |
+ . | | | | |
+ . +---------------+ | | |
+ . | | |
+ +-----------+ | | |
+ | map_entry | | | |
+ +-----------+ | | |
+ | .key |---> pid = 4444 | | |
+ +---------+ +-----------+ | | |
+ | .val |--->| map_elt | | | |
+ +---------+ +-----------+ | | |
+ . | .key |---> full key * | | |
+ . +---------+ +---------------+ | | |
+ . | .fields |--->| .sum (val) | | | |
+ +---------+ | 2345 | | | |
+ +--| .vars | +---------------+ | | |
+ | +---------+ | .offset (key) | | | |
+ | | 0 | | | |
+ | +---------------+ | | |
+ | . | | |
+ | . | | |
+ | . | | |
+ | +---------------+ | | |
+ | | .sum (val) or | | | |
+ | | .offset (key) | | | |
+ | +---------------+ | | |
+ | | .sum (val) or | | | |
+ | | .offset (key) | | | |
+ | +---------------+ | | |
+ | | | |
+ | +---------------+ | | |
+ +---------------->| ts0 |<--+ | |
+ | 213499240729 | | |
+ +---------------+ | |
+ | unused | | |
+ | | | |
+ +---------------+ | |
+ . | |
+ . | |
+ . | |
+ +---------------+ | |
+ | unused | | |
+ | | | |
+ +---------------+ | |
+ | unused | | |
+ | | | |
+ +---------------+ | |
+
+For each used map entry, there's a map_elt pointing to an array of | |
+.vars containing the current value of the variables associated with | |
+that histogram entry. So in the above, the timestamp associated with | |
+pid 999 is 113345679876, and the timestamp variable in the same | |
+.var.idx for pid 4444 is 213499240729. | |
+
+sched_switch histogram | |
+---------------------- | |
+
+The sched_switch histogram paired with the above sched_waking | |
+histogram is shown below. The most important aspect of the | |
+sched_switch histogram is that it references a variable on the | |
+sched_waking histogram above. | |
+
+The histogram diagram is very similar to the others so far displayed, | |
+but it adds variable references. You can see the normal hitcount and | |
+key fields along with a new wakeup_lat variable implemented in the | |
+same way as the sched_waking ts0 variable, but in addition there's an | |
+entry with the new FL_VAR_REF (short for HIST_FIELD_FL_VAR_REF) flag. | |
+
+Associated with the new var ref field are a couple of new hist_field | |
+members, var.hist_data and var_ref_idx. For a variable reference, the | |
+var.hist_data goes with the var.idx, which together uniquely identify | |
+a particular variable on a particular histogram. The var_ref_idx is | |
+just the index into the var_ref_vals[] array that caches the values of | |
+each variable whenever a hist trigger is updated. Those resulting | |
+values are then finally accessed by other code such as trace action | |
+code that uses the var_ref_idx values to assign param values. | |
+
+The diagram below describes the situation for the sched_switch | |
+histogram referred to before::
+
+ # echo 'hist:keys=next_pid:wakeup_lat=common_timestamp.usecs-$ts0' >> | |
+ events/sched/sched_switch/trigger | |
+ | |
+ +------------------+ | |
+ | hist_data | | |
+ +------------------+ +-----------------------+ | |
+ | .fields[] |-->| val = hitcount | | |
+ +----------------+ +-----------------------+ | |
+ | .map | | .size | | |
+ +----------------+ +---------------------+ | |
+ +--| .var_refs[] | | .offset | | |
+ | +----------------+ +---------------------+ | |
+ | | .fn() | | |
+ | var_ref_vals[] +---------------------+ | |
+ | +-------------+ | .flags | | |
+ | | $ts0 |<---+ +---------------------+ | |
+ | +-------------+ | | .var.idx | | |
+ | | | | +---------------------+ | |
+ | +-------------+ | | .var.hist_data | | |
+ | | | | +---------------------+ | |
+ | +-------------+ | | .var_ref_idx | | |
+ | | | | +-----------------------+ | |
+ | +-------------+ | | var = wakeup_lat | | |
+ | . | +-----------------------+ | |
+ | . | | .size | | |
+ | . | +---------------------+ | |
+ | +-------------+ | | .offset | | |
+ | | | | +---------------------+ | |
+ | +-------------+ | | .fn() | | |
+ | | | | +---------------------+ | |
+ | +-------------+ | | .flags & FL_VAR | | |
+ | | +---------------------+ | |
+ | | | .var.idx | | |
+ | | +---------------------+ | |
+ | | | .var.hist_data | | |
+ | | +---------------------+ | |
+ | | | .var_ref_idx | | |
+ | | +---------------------+ | |
+ | | . | |
+ | | . | |
+ | | . | |
+ | | +-----------------------+ <--- n_vals | |
+ | | | key = pid | | |
+ | | +-----------------------+ | |
+ | | | .size | | |
+ | | +---------------------+ | |
+ | | | .offset | | |
+ | | +---------------------+ | |
+ | | | .fn() | | |
+ | | +---------------------+ | |
+ | | | .flags | | |
+ | | +---------------------+ | |
+ | | | .var.idx | | |
+ | | +-----------------------+ <--- n_fields | |
+ | | | unused | | |
+ | | +-----------------------+ | |
+ | | | | | |
+ | | +---------------------+ | |
+ | | | | | |
+ | | +---------------------+ | |
+ | | | | | |
+ | | +---------------------+ | |
+ | | | | | |
+ | | +---------------------+ | |
+ | | | | | |
+ | | +---------------------+ | |
+ | | n_keys = n_fields - n_vals | |
+ | | | |
+ | | | |
+ | | +-----------------------+ | |
+ +---------------------->| var_ref = $ts0 | | |
+ | +-----------------------+ | |
+ | | .size | | |
+ | +---------------------+ | |
+ | | .offset | | |
+ | +---------------------+ | |
+ | | .fn() | | |
+ | +---------------------+ | |
+ | | .flags & FL_VAR_REF | | |
+ | +---------------------+ | |
+ | | .var.idx |--------------------------+ |
+ | +---------------------+ |
+ | | .var.hist_data |----------------------------+
+ | +---------------------+
+ +---| .var_ref_idx |
+ +---------------------+
+
+Abbreviations used in the diagrams::
+
+ hist_data = struct hist_trigger_data
+ hist_data.fields = struct hist_field
+ fn = hist_field_fn_t
+ FL_KEY = HIST_FIELD_FL_KEY
+ FL_VAR = HIST_FIELD_FL_VAR
+ FL_VAR_REF = HIST_FIELD_FL_VAR_REF
+
+When a hist trigger makes use of a variable, a new hist_field is
+created with flag HIST_FIELD_FL_VAR_REF. For a VAR_REF field, the
+var.idx and var.hist_data take the same values as the referenced
+variable, as well as the referenced variable's size, type, and
+is_signed values. The VAR_REF field's .name is set to the name of the
+variable it references. If a variable reference was created using the
+explicit system.event.$var_ref notation, the hist_field's system and
+event_name variables are also set.
+
+So, in order to handle an event for the sched_switch histogram,
+because we have a reference to a variable on another histogram, we
+need to resolve all variable references first. This is done via the
+resolve_var_refs() calls made from event_hist_trigger(). What this
+does is grabs the var_refs[] array from the hist_data representing the
+sched_switch histogram. For each one of those, the referenced
+variable's var.hist_data along with the current key is used to look up
+the corresponding tracing_map_elt in that histogram. Once found, the
+referenced variable's var.idx is used to look up the variable's value
+using tracing_map_read_var(elt, var.idx), which yields the value of
+the variable for that element, ts0 in the case above. Note that both
+the hist_fields representing both the variable and the variable
+reference have the same var.idx, so this is straightforward.
+
+Variable and variable reference test
+------------------------------------
+
+This example creates a variable on the sched_waking event, ts0, and
+uses it in the sched_switch trigger. The sched_switch trigger also
+creates its own variable, wakeup_lat, but nothing yet uses it::
+
+ # echo 'hist:keys=pid:ts0=common_timestamp.usecs' >> events/sched/sched_waking/trigger
+
+ # echo 'hist:keys=next_pid:wakeup_lat=common_timestamp.usecs-$ts0' >> events/sched/sched_switch/trigger
+
+Looking at the sched_waking 'hist_debug' output, in addition to the
+normal key and value hist_fields, in the val fields section we see a
+field with the HIST_FIELD_FL_VAR flag, which indicates that that field
+represents a variable. Note that in addition to the variable name,
+contained in the var.name field, it includes the var.idx, which is the
+index into the tracing_map_elt.vars[] array of the actual variable
+location. Note also that the output shows that variables live in the
+same part of the hist_data->fields[] array as normal values::
+
+ # cat events/sched/sched_waking/hist_debug
+
+ # event histogram
+ #
+ # trigger info: hist:keys=pid:vals=hitcount:ts0=common_timestamp.usecs:sort=hitcount:size=2048:clock=global [active]
+ #
+
+ hist_data: 000000009536f554
+
+ n_vals: 2
+ n_keys: 1
+ n_fields: 3
+
+ val fields:
+
+ hist_data->fields[0]:
+ flags:
+ VAL: HIST_FIELD_FL_HITCOUNT
+ type: u64
+ size: 8
+ is_signed: 0
+
+ hist_data->fields[1]:
+ flags:
+ HIST_FIELD_FL_VAR
+ var.name: ts0
+ var.idx (into tracing_map_elt.vars[]): 0
+ type: u64
+ size: 8
+ is_signed: 0
+
+ key fields:
+
+ hist_data->fields[2]:
+ flags:
+ HIST_FIELD_FL_KEY
+ ftrace_event_field name: pid
+ type: pid_t
+ size: 8
+ is_signed: 1
+
+Moving on to the sched_switch trigger hist_debug output, in addition
+to the unused wakeup_lat variable, we see a new section displaying
+variable references. Variable references are displayed in a separate
+section because in addition to to being logically separate from
+variables and values, they actually live in a separate hist_data
+array, var_refs[].
+
+In this example, the sched_switch trigger has a reference to a
+variable on the sched_waking trigger, $ts0. Looking at the details,
+we can see that the var.hist_data value of the referenced variable
+matches the previously displayed sched_waking trigger, and the var.idx
+value matches the previously displayed var.idx value for that
+variable. Also displayed is the var_ref_idx value for that variable
+reference, which is where the value for that variable is cached for
+use when the trigger is invoked::
+
+ # cat events/sched/sched_switch/hist_debug
+
+ # event histogram
+ #
+ # trigger info: hist:keys=next_pid:vals=hitcount:wakeup_lat=common_timestamp.usecs-$ts0:sort=hitcount:size=2048:clock=global [active]
+ #
+
+ hist_data: 00000000f4ee8006
+
+ n_vals: 2
+ n_keys: 1
+ n_fields: 3
+
+ val fields:
+
+ hist_data->fields[0]:
+ flags:
+ VAL: HIST_FIELD_FL_HITCOUNT
+ type: u64
+ size: 8
+ is_signed: 0
+
+ hist_data->fields[1]:
+ flags:
+ HIST_FIELD_FL_VAR
+ var.name: wakeup_lat
+ var.idx (into tracing_map_elt.vars[]): 0
+ type: u64
+ size: 0
+ is_signed: 0
+
+ key fields:
+
+ hist_data->fields[2]:
+ flags:
+ HIST_FIELD_FL_KEY
+ ftrace_event_field name: next_pid
+ type: pid_t
+ size: 8
+ is_signed: 1
+
+ variable reference fields:
+
+ hist_data->var_refs[0]:
+ flags:
+ HIST_FIELD_FL_VAR_REF
+ name: ts0
+ var.idx (into tracing_map_elt.vars[]): 0
+ var.hist_data: 000000009536f554
+ var_ref_idx (into hist_data->var_refs[]): 0
+ type: u64
+ size: 8
+ is_signed: 0
+
+The commands below can be used to clean things up for the next test::
+
+ # echo '!hist:keys=next_pid:wakeup_lat=common_timestamp.usecs-$ts0' >> events/sched/sched_switch/trigger
+
+ # echo '!hist:keys=pid:ts0=common_timestamp.usecs' >> events/sched/sched_waking/trigger
+
+Actions and Handlers
+====================
+
+Adding onto the previous example, we will now do something with that
+wakeup_lat variable, namely send it and another field as a synthetic
+event.
+
+The onmatch() action below basically says that whenever we have a
+sched_switch event, if we have a matching sched_waking event, in this
+case if we have a pid in the sched_waking histogram that matches the
+the next_pid field on this sched_switch event, we retrieve the
+variables specified in the wakeup_latency() trace action, and use
+them to generate a new wakeup_latency event into the trace stream.
+
+Note that the way the trace handlers such as wakeup_latency() (which
+could equivalently be written trace(wakeup_latency,$wakeup_lat,next_pid)
+are implemented, the parameters specified to the trace handler must be
+variables. In this case, $wakeup_lat is obviously a variable, but
+next_pid isn't, since it's just naming a field in the sched_switch
+trace event. Since this is something that almost every trace() and
+save() action does, a special shortcut is implemented to allow field
+names to be used directly in those cases. How it works is that under
+the covers, a temporary variable is created for the named field, and
+this variable is what is actually passed to the trace handler. In the
+code and documentation, this type of variable is called a 'field
+variable'.
+
+Fields on other trace event's histograms can be used as well. In that
+case we have to generate a new histogram and an unfortunately named
+'synthetic_field' (the use of synthetic here has nothing to do with
+synthetic events) and use that special histogram field as a variable.
+
+The diagram below illustrates the new elements described above in the
+context of the sched_switch histogram using the onmatch() handler and
+the trace() action.
+
+First, we define the wakeup_latency synthetic event::
+
+ # echo 'wakeup_latency u64 lat; pid_t pid' >> synthetic_events
+
+Next, the sched_waking hist trigger as before::
+
+ # echo 'hist:keys=pid:ts0=common_timestamp.usecs' >>
+ events/sched/sched_waking/trigger
+
+Finally, we create a hist trigger on the sched_switch event that
+generates a wakeup_latency() trace event. In this case we pass
+next_pid into the wakeup_latency synthetic event invocation, which
+means it will be automatically converted into a field variable::
+
+ # echo 'hist:keys=next_pid:wakeup_lat=common_timestamp.usecs-$ts0: \
+ onmatch(sched.sched_waking).wakeup_latency($wakeup_lat,next_pid)' >>
+ /sys/kernel/debug/tracing/events/sched/sched_switch/trigger
+
+The diagram for the sched_switch event is similar to previous examples
+but shows the additional field_vars[] array for hist_data and shows
+the linkages between the field_vars and the variables and references
+created to implement the field variables. The details are discussed
+below::
+
+ +------------------+
+ | hist_data |
+ +------------------+ +-----------------------+
+ | .fields[] |-->| val = hitcount |
+ +----------------+ +-----------------------+
+ | .map | | .size |
+ +----------------+ +---------------------+
+ +---| .field_vars[] | | .offset |
+ | +----------------+ +---------------------+
+ |+--| .var_refs[] | | .offset |
+ || +----------------+ +---------------------+
+ || | .fn() |
+ || var_ref_vals[] +---------------------+
+ || +-------------+ | .flags |
+ || | $ts0 |<---+ +---------------------+
+ || +-------------+ | | .var.idx |
+ || | $next_pid |<-+ | +---------------------+
+ || +-------------+ | | | .var.hist_data |
+ ||+>| $wakeup_lat | | | +---------------------+
+ ||| +-------------+ | | | .var_ref_idx |
+ ||| | | | | +-----------------------+
+ ||| +-------------+ | | | var = wakeup_lat |
+ ||| . | | +-----------------------+
+ ||| . | | | .size |
+ ||| . | | +---------------------+
+ ||| +-------------+ | | | .offset |
+ ||| | | | | +---------------------+
+ ||| +-------------+ | | | .fn() |
+ ||| | | | | +---------------------+
+ ||| +-------------+ | | | .flags & FL_VAR |
+ ||| | | +---------------------+
+ ||| | | | .var.idx |
+ ||| | | +---------------------+
+ ||| | | | .var.hist_data |
+ ||| | | +---------------------+
+ ||| | | | .var_ref_idx |
+ ||| | | +---------------------+
+ ||| | | .
+ ||| | | .
+ ||| | | .
+ ||| | | .
+ ||| +--------------+ | | .
+ +-->| field_var | | | .
+ || +--------------+ | | .
+ || | var | | | .
+ || +------------+ | | .
+ || | val | | | .
+ || +--------------+ | | .
+ || | field_var | | | .
+ || +--------------+ | | .
+ || | var | | | .
+ || +------------+ | | .
+ || | val | | | .
+ || +------------+ | | .
+ || . | | .
+ || . | | .
+ || . | | +-----------------------+ <--- n_vals
+ || +--------------+ | | | key = pid |
+ || | field_var | | | +-----------------------+
+ || +--------------+ | | | .size |
+ || | var |--+| +---------------------+
+ || +------------+ ||| | .offset |
+ || | val |-+|| +---------------------+
+ || +------------+ ||| | .fn() |
+ || ||| +---------------------+
+ || ||| | .flags |
+ || ||| +---------------------+
+ || ||| | .var.idx |
+ || ||| +---------------------+ <--- n_fields
+ || |||
+ || ||| n_keys = n_fields - n_vals
+ || ||| +-----------------------+
+ || |+->| var = next_pid |
+ || | | +-----------------------+
+ || | | | .size |
+ || | | +---------------------+
+ || | | | .offset |
+ || | | +---------------------+
+ || | | | .flags & FL_VAR |
+ || | | +---------------------+
+ || | | | .var.idx |
+ || | | +---------------------+
+ || | | | .var.hist_data |
+ || | | +-----------------------+
+ || +-->| val for next_pid |
+ || | | +-----------------------+
+ || | | | .size |
+ || | | +---------------------+
+ || | | | .offset |
+ || | | +---------------------+
+ || | | | .fn() |
+ || | | +---------------------+
+ || | | | .flags |
+ || | | +---------------------+
+ || | | | |
+ || | | +---------------------+
+ || | |
+ || | |
+ || | | +-----------------------+
+ +|------------------|-|>| var_ref = $ts0 |
+ | | | +-----------------------+
+ | | | | .size |
+ | | | +---------------------+
+ | | | | .offset |
+ | | | +---------------------+
+ | | | | .fn() |
+ | | | +---------------------+
+ | | | | .flags & FL_VAR_REF |
+ | | | +---------------------+
+ | | +---| .var_ref_idx |
+ | | +-----------------------+
+ | | | var_ref = $next_pid |
+ | | +-----------------------+
+ | | | .size |
+ | | +---------------------+
+ | | | .offset |
+ | | +---------------------+
+ | | | .fn() |
+ | | +---------------------+
+ | | | .flags & FL_VAR_REF |
+ | | +---------------------+
+ | +-----| .var_ref_idx |
+ | +-----------------------+
+ | | var_ref = $wakeup_lat |
+ | +-----------------------+
+ | | .size |
+ | +---------------------+
+ | | .offset |
+ | +---------------------+
+ | | .fn() |
+ | +---------------------+
+ | | .flags & FL_VAR_REF |
+ | +---------------------+
+ +------------------------| .var_ref_idx |
+ +---------------------+
+
+As you can see, for a field variable, two hist_fields are created: one
+representing the variable, in this case next_pid, and one to actually
+get the value of the field from the trace stream, like a normal val
+field does. These are created separately from normal variable
+creation and are saved in the hist_data->field_vars[] array. See
+below for how these are used. In addition, a reference hist_field is
+also created, which is needed to reference the field variables such as
+$next_pid variable in the trace() action.
+
+Note that $wakeup_lat is also a variable reference, referencing the
+value of the expression common_timestamp-$ts0, and so also needs to
+have a hist field entry representing that reference created.
+
+When hist_trigger_elt_update() is called to get the normal key and
+value fields, it also calls update_field_vars(), which goes through
+each field_var created for the histogram, and available from
+hist_data->field_vars and calls val->fn() to get the data from the
+current trace record, and then uses the var's var.idx to set the
+variable at the var.idx offset in the appropriate tracing_map_elt's
+variable at elt->vars[var.idx].
+
+Once all the variables have been updated, resolve_var_refs() can be
+called from event_hist_trigger(), and not only can our $ts0 and
+$next_pid references be resolved but the $wakeup_lat reference as
+well. At this point, the trace() action can simply access the values
+assembled in the var_ref_vals[] array and generate the trace event.
+
+The same process occurs for the field variables associated with the
+save() action.
+
+Abbreviations used in the diagram::
+
+ hist_data = struct hist_trigger_data
+ hist_data.fields = struct hist_field
+ field_var = struct field_var
+ fn = hist_field_fn_t
+ FL_KEY = HIST_FIELD_FL_KEY
+ FL_VAR = HIST_FIELD_FL_VAR
+ FL_VAR_REF = HIST_FIELD_FL_VAR_REF
+
+trace() action field variable test
+----------------------------------
+
+This example adds to the previous test example by finally making use
+of the wakeup_lat variable, but in addition also creates a couple of
+field variables that then are all passed to the wakeup_latency() trace
+action via the onmatch() handler.
+
+First, we create the wakeup_latency synthetic event::
+
+ # echo 'wakeup_latency u64 lat; pid_t pid; char comm[16]' >> synthetic_events
+
+Next, the sched_waking trigger from previous examples::
+
+ # echo 'hist:keys=pid:ts0=common_timestamp.usecs' >> events/sched/sched_waking/trigger
+
+Finally, as in the previous test example, we calculate and assign the
+wakeup latency using the $ts0 reference from the sched_waking trigger
+to the wakeup_lat variable, and finally use it along with a couple
+sched_switch event fields, next_pid and next_comm, to generate a
+wakeup_latency trace event. The next_pid and next_comm event fields
+are automatically converted into field variables for this purpose::
+
+ # echo 'hist:keys=next_pid:wakeup_lat=common_timestamp.usecs-$ts0:onmatch(sched.sched_waking).wakeup_latency($wakeup_lat,next_pid,next_comm)' >> /sys/kernel/debug/tracing/events/sched/sched_switch/trigger
+
+The sched_waking hist_debug output shows the same data as in the
+previous test example::
+
+ # cat events/sched/sched_waking/hist_debug
+
+ # event histogram
+ #
+ # trigger info: hist:keys=pid:vals=hitcount:ts0=common_timestamp.usecs:sort=hitcount:size=2048:clock=global [active]
+ #
+
+ hist_data: 00000000d60ff61f
+
+ n_vals: 2
+ n_keys: 1
+ n_fields: 3
+
+ val fields:
+
+ hist_data->fields[0]:
+ flags:
+ VAL: HIST_FIELD_FL_HITCOUNT
+ type: u64
+ size: 8
+ is_signed: 0
+
+ hist_data->fields[1]:
+ flags:
+ HIST_FIELD_FL_VAR
+ var.name: ts0
+ var.idx (into tracing_map_elt.vars[]): 0
+ type: u64
+ size: 8
+ is_signed: 0
+
+ key fields:
+
+ hist_data->fields[2]:
+ flags:
+ HIST_FIELD_FL_KEY
+ ftrace_event_field name: pid
+ type: pid_t
+ size: 8
+ is_signed: 1
+
+The sched_switch hist_debug output shows the same key and value fields
+as in the previous test example - note that wakeup_lat is still in the
+val fields section, but that the new field variables are not there -
+although the field variables are variables, they're held separately in
+the hist_data's field_vars[] array. Although the field variables and
+the normal variables are located in separate places, you can see that
+the actual variable locations for those variables in the
+tracing_map_elt.vars[] do have increasing indices as expected:
+wakeup_lat takes the var.idx = 0 slot, while the field variables for
+next_pid and next_comm have values var.idx = 1, and var.idx = 2. Note
+also that those are the same values displayed for the variable
+references corresponding to those variables in the variable reference
+fields section. Since there are two triggers and thus two hist_data
+addresses, those addresses also need to be accounted for when doing
+the matching - you can see that the first variable refers to the 0
+var.idx on the previous hist trigger (see the hist_data address
+associated with that trigger), while the second variable refers to the
+0 var.idx on the sched_switch hist trigger, as do all the remaining
+variable references.
+
+Finally, the action tracking variables section just shows the system
+and event name for the onmatch() handler::
+
+ # cat events/sched/sched_switch/hist_debug
+
+ # event histogram
+ #
+ # trigger info: hist:keys=next_pid:vals=hitcount:wakeup_lat=common_timestamp.usecs-$ts0:sort=hitcount:size=2048:clock=global:onmatch(sched.sched_waking).wakeup_latency($wakeup_lat,next_pid,next_comm) [active]
+ #
+
+ hist_data: 0000000008f551b7
+
+ n_vals: 2
+ n_keys: 1
+ n_fields: 3
+
+ val fields:
+
+ hist_data->fields[0]:
+ flags:
+ VAL: HIST_FIELD_FL_HITCOUNT
+ type: u64
+ size: 8
+ is_signed: 0
+
+ hist_data->fields[1]:
+ flags:
+ HIST_FIELD_FL_VAR
+ var.name: wakeup_lat
+ var.idx (into tracing_map_elt.vars[]): 0
+ type: u64
+ size: 0
+ is_signed: 0
+
+ key fields:
+
+ hist_data->fields[2]:
+ flags:
+ HIST_FIELD_FL_KEY
+ ftrace_event_field name: next_pid
+ type: pid_t
+ size: 8
+ is_signed: 1
+
+ variable reference fields:
+
+ hist_data->var_refs[0]:
+ flags:
+ HIST_FIELD_FL_VAR_REF
+ name: ts0
+ var.idx (into tracing_map_elt.vars[]): 0
+ var.hist_data: 00000000d60ff61f
+ var_ref_idx (into hist_data->var_refs[]): 0
+ type: u64
+ size: 8
+ is_signed: 0
+
+ hist_data->var_refs[1]:
+ flags:
+ HIST_FIELD_FL_VAR_REF
+ name: wakeup_lat
+ var.idx (into tracing_map_elt.vars[]): 0
+ var.hist_data: 0000000008f551b7
+ var_ref_idx (into hist_data->var_refs[]): 1
+ type: u64
+ size: 0
+ is_signed: 0
+
+ hist_data->var_refs[2]:
+ flags:
+ HIST_FIELD_FL_VAR_REF
+ name: next_pid
+ var.idx (into tracing_map_elt.vars[]): 1
+ var.hist_data: 0000000008f551b7
+ var_ref_idx (into hist_data->var_refs[]): 2
+ type: pid_t
+ size: 4
+ is_signed: 0
+
+ hist_data->var_refs[3]:
+ flags:
+ HIST_FIELD_FL_VAR_REF
+ name: next_comm
+ var.idx (into tracing_map_elt.vars[]): 2
+ var.hist_data: 0000000008f551b7
+ var_ref_idx (into hist_data->var_refs[]): 3
+ type: char[16]
+ size: 256
+ is_signed: 0
+
+ field variables:
+
+ hist_data->field_vars[0]:
+
+ field_vars[0].var:
+ flags:
+ HIST_FIELD_FL_VAR
+ var.name: next_pid
+ var.idx (into tracing_map_elt.vars[]): 1
+
+ field_vars[0].val:
+ ftrace_event_field name: next_pid
+ type: pid_t
+ size: 4
+ is_signed: 1
+
+ hist_data->field_vars[1]:
+
+ field_vars[1].var:
+ flags:
+ HIST_FIELD_FL_VAR
+ var.name: next_comm
+ var.idx (into tracing_map_elt.vars[]): 2
+
+ field_vars[1].val:
+ ftrace_event_field name: next_comm
+ type: char[16]
+ size: 256
+ is_signed: 0
+
+ action tracking variables (for onmax()/onchange()/onmatch()):
+
+ hist_data->actions[0].match_data.event_system: sched
+ hist_data->actions[0].match_data.event: sched_waking
+
+The commands below can be used to clean things up for the next test::
+
+ # echo '!hist:keys=next_pid:wakeup_lat=common_timestamp.usecs-$ts0:onmatch(sched.sched_waking).wakeup_latency($wakeup_lat,next_pid,next_comm)' >> /sys/kernel/debug/tracing/events/sched/sched_switch/trigger
+
+ # echo '!hist:keys=pid:ts0=common_timestamp.usecs' >> events/sched/sched_waking/trigger
+
+ # echo '!wakeup_latency u64 lat; pid_t pid; char comm[16]' >> synthetic_events
+
+action_data and the trace() action
+----------------------------------
+
+As mentioned above, when the trace() action generates a synthetic
+event, all the parameters to the synthetic event either already are
+variables or are converted into variables (via field variables), and
+finally all those variable values are collected via references to them
+into a var_ref_vals[] array.
+
+The values in the var_ref_vals[] array, however, don't necessarily
+follow the same ordering as the synthetic event params. To address
+that, struct action_data contains another array, var_ref_idx[] that
+maps the trace action params to the var_ref_vals[] values. Below is a
+diagram illustrating that for the wakeup_latency() synthetic event::
+
+ +------------------+ wakeup_latency()
+ | action_data | event params var_ref_vals[]
+ +------------------+ +-----------------+ +-----------------+
+ | .var_ref_idx[] |--->| $wakeup_lat idx |---+ | |
+ +----------------+ +-----------------+ | +-----------------+
+ | .synth_event | | $next_pid idx |---|-+ | $wakeup_lat val |
+ +----------------+ +-----------------+ | | +-----------------+
+ . | +->| $next_pid val |
+ . | +-----------------+
+ . | .
+ +-----------------+ | .
+ | | | .
+ +-----------------+ | +-----------------+
+ +--->| $wakeup_lat val |
+ +-----------------+
+
+Basically, how this ends up getting used in the synthetic event probe
+function, trace_event_raw_event_synth(), is as follows::
+
+ for each field i in .synth_event
+ val_idx = .var_ref_idx[i]
+ val = var_ref_vals[val_idx]
+
+action_data and the onXXX() handlers
+------------------------------------
+
+The hist trigger onXXX() actions other than onmatch(), such as onmax()
+and onchange(), also make use of and internally create hidden
+variables. This information is contained in the
+action_data.track_data struct, and is also visible in the hist_debug
+output as will be described in the example below.
+
+Typically, the onmax() or onchange() handlers are used in conjunction
+with the save() and snapshot() actions. For example::
+
+ # echo 'hist:keys=next_pid:wakeup_lat=common_timestamp.usecs-$ts0: \
+ onmax($wakeup_lat).save(next_comm,prev_pid,prev_prio,prev_comm)' >>
+ /sys/kernel/debug/tracing/events/sched/sched_switch/trigger
+
+or::
+
+ # echo 'hist:keys=next_pid:wakeup_lat=common_timestamp.usecs-$ts0: \
+ onmax($wakeup_lat).snapshot()' >>
+ /sys/kernel/debug/tracing/events/sched/sched_switch/trigger
+
+save() action field variable test
+---------------------------------
+
+For this example, instead of generating a synthetic event, the save()
+action is used to save field values whenever an onmax() handler
+detects that a new max latency has been hit. As in the previous
+example, the values being saved are also field values, but in this
+case, are kept in a separate hist_data array named save_vars[].
+
+As in previous test examples, we set up the sched_waking trigger::
+
+ # echo 'hist:keys=pid:ts0=common_timestamp.usecs' >> events/sched/sched_waking/trigger
+
+In this case, however, we set up the sched_switch trigger to save some
+sched_switch field values whenever we hit a new maximum latency. For
+both the onmax() handler and save() action, variables will be created,
+which we can use the hist_debug files to examine::
+
+ # echo 'hist:keys=next_pid:wakeup_lat=common_timestamp.usecs-$ts0:onmax($wakeup_lat).save(next_comm,prev_pid,prev_prio,prev_comm)' >> events/sched/sched_switch/trigger
+
+The sched_waking hist_debug output shows the same data as in the
+previous test examples::
+
+ # cat events/sched/sched_waking/hist_debug
+
+ #
+ # trigger info: hist:keys=pid:vals=hitcount:ts0=common_timestamp.usecs:sort=hitcount:size=2048:clock=global [active]
+ #
+
+ hist_data: 00000000e6290f48
+
+ n_vals: 2
+ n_keys: 1
+ n_fields: 3
+
+ val fields:
+
+ hist_data->fields[0]:
+ flags:
+ VAL: HIST_FIELD_FL_HITCOUNT
+ type: u64
+ size: 8
+ is_signed: 0
+
+ hist_data->fields[1]:
+ flags:
+ HIST_FIELD_FL_VAR
+ var.name: ts0
+ var.idx (into tracing_map_elt.vars[]): 0
+ type: u64
+ size: 8
+ is_signed: 0
+
+ key fields:
+
+ hist_data->fields[2]:
+ flags:
+ HIST_FIELD_FL_KEY
+ ftrace_event_field name: pid
+ type: pid_t
+ size: 8
+ is_signed: 1
+
+The output of the sched_switch trigger shows the same val and key
+values as before, but also shows a couple new sections.
+
+First, the action tracking variables section now shows the
+actions[].track_data information describing the special tracking
+variables and references used to track, in this case, the running
+maximum value. The actions[].track_data.var_ref member contains the
+reference to the variable being tracked, in this case the $wakeup_lat
+variable. In order to perform the onmax() handler function, there
+also needs to be a variable that tracks the current maximum by getting
+updated whenever a new maximum is hit. In this case, we can see that
+an auto-generated variable named ' __max' has been created and is
+visible in the actions[].track_data.track_var variable.
+
+Finally, in the new 'save action variables' section, we can see that
+the 4 params to the save() function have resulted in 4 field variables
+being created for the purposes of saving the values of the named
+fields when the max is hit. These variables are kept in a separate
+save_vars[] array off of hist_data, so are displayed in a separate
+section::
+
+ # cat events/sched/sched_switch/hist_debug
+
+ # event histogram
+ #
+ # trigger info: hist:keys=next_pid:vals=hitcount:wakeup_lat=common_timestamp.usecs-$ts0:sort=hitcount:size=2048:clock=global:onmax($wakeup_lat).save(next_comm,prev_pid,prev_prio,prev_comm) [active]
+ #
+
+ hist_data: 0000000057bcd28d
+
+ n_vals: 2
+ n_keys: 1
+ n_fields: 3
+
+ val fields:
+
+ hist_data->fields[0]:
+ flags:
+ VAL: HIST_FIELD_FL_HITCOUNT
+ type: u64
+ size: 8
+ is_signed: 0
+
+ hist_data->fields[1]:
+ flags:
+ HIST_FIELD_FL_VAR
+ var.name: wakeup_lat
+ var.idx (into tracing_map_elt.vars[]): 0
+ type: u64
+ size: 0
+ is_signed: 0
+
+ key fields:
+
+ hist_data->fields[2]:
+ flags:
+ HIST_FIELD_FL_KEY
+ ftrace_event_field name: next_pid
+ type: pid_t
+ size: 8
+ is_signed: 1
+
+ variable reference fields:
+
+ hist_data->var_refs[0]:
+ flags:
+ HIST_FIELD_FL_VAR_REF
+ name: ts0
+ var.idx (into tracing_map_elt.vars[]): 0
+ var.hist_data: 00000000e6290f48
+ var_ref_idx (into hist_data->var_refs[]): 0
+ type: u64
+ size: 8
+ is_signed: 0
+
+ hist_data->var_refs[1]:
+ flags:
+ HIST_FIELD_FL_VAR_REF
+ name: wakeup_lat
+ var.idx (into tracing_map_elt.vars[]): 0
+ var.hist_data: 0000000057bcd28d
+ var_ref_idx (into hist_data->var_refs[]): 1
+ type: u64
+ size: 0
+ is_signed: 0
+
+ action tracking variables (for onmax()/onchange()/onmatch()):
+
+ hist_data->actions[0].track_data.var_ref:
+ flags:
+ HIST_FIELD_FL_VAR_REF
+ name: wakeup_lat
+ var.idx (into tracing_map_elt.vars[]): 0
+ var.hist_data: 0000000057bcd28d
+ var_ref_idx (into hist_data->var_refs[]): 1
+ type: u64
+ size: 0
+ is_signed: 0
+
+ hist_data->actions[0].track_data.track_var:
+ flags:
+ HIST_FIELD_FL_VAR
+ var.name: __max
+ var.idx (into tracing_map_elt.vars[]): 1
+ type: u64
+ size: 8
+ is_signed: 0
+
+ save action variables (save() params):
+
+ hist_data->save_vars[0]:
+
+ save_vars[0].var:
+ flags:
+ HIST_FIELD_FL_VAR
+ var.name: next_comm
+ var.idx (into tracing_map_elt.vars[]): 2
+
+ save_vars[0].val:
+ ftrace_event_field name: next_comm
+ type: char[16]
+ size: 256
+ is_signed: 0
+
+ hist_data->save_vars[1]:
+
+ save_vars[1].var:
+ flags:
+ HIST_FIELD_FL_VAR
+ var.name: prev_pid
+ var.idx (into tracing_map_elt.vars[]): 3
+
+ save_vars[1].val:
+ ftrace_event_field name: prev_pid
+ type: pid_t
+ size: 4
+ is_signed: 1
+
+ hist_data->save_vars[2]:
+
+ save_vars[2].var:
+ flags:
+ HIST_FIELD_FL_VAR
+ var.name: prev_prio
+ var.idx (into tracing_map_elt.vars[]): 4
+
+ save_vars[2].val:
+ ftrace_event_field name: prev_prio
+ type: int
+ size: 4
+ is_signed: 1
+
+ hist_data->save_vars[3]:
+
+ save_vars[3].var:
+ flags:
+ HIST_FIELD_FL_VAR
+ var.name: prev_comm
+ var.idx (into tracing_map_elt.vars[]): 5
+
+ save_vars[3].val:
+ ftrace_event_field name: prev_comm
+ type: char[16]
+ size: 256
+ is_signed: 0
+
+The commands below can be used to clean things up for the next test::
+
+ # echo '!hist:keys=next_pid:wakeup_lat=common_timestamp.usecs-$ts0:onmax($wakeup_lat).save(next_comm,prev_pid,prev_prio,prev_comm)' >> events/sched/sched_switch/trigger
+
+ # echo '!hist:keys=pid:ts0=common_timestamp.usecs' >> events/sched/sched_waking/trigger
+
+A couple special cases
+======================
+
+While the above covers the basics of the histogram internals, there
+are a couple of special cases that should be discussed, since they
+tend to create even more confusion. Those are field variables on other
+histograms, and aliases, both described below through example tests
+using the hist_debug files.
+
+Test of field variables on other histograms
+-------------------------------------------
+
+This example is similar to the previous examples, but in this case,
+the sched_switch trigger references a hist trigger field on another
+event, namely the sched_waking event. In order to accomplish this, a
+field variable is created for the other event, but since an existing
+histogram can't be used, as existing histograms are immutable, a new
+histogram with a matching variable is created and used, and we'll see
+that reflected in the hist_debug output shown below.
+
+First, we create the wakeup_latency synthetic event. Note the
+addition of the prio field::
+
+ # echo 'wakeup_latency u64 lat; pid_t pid; int prio' >> synthetic_events
+
+As in previous test examples, we set up the sched_waking trigger::
+
+ # echo 'hist:keys=pid:ts0=common_timestamp.usecs' >> events/sched/sched_waking/trigger
+
+Here we set up a hist trigger on sched_switch to send a wakeup_latency
+event using an onmatch handler naming the sched_waking event. Note
+that the third param being passed to the wakeup_latency() is prio,
+which is a field name that needs to have a field variable created for
+it. There isn't however any prio field on the sched_switch event so
+it would seem that it wouldn't be possible to create a field variable
+for it. The matching sched_waking event does have a prio field, so it
+should be possible to make use of it for this purpose. The problem
+with that is that it's not currently possible to define a new variable
+on an existing histogram, so it's not possible to add a new prio field
+variable to the existing sched_waking histogram. It is however
+possible to create an additional new 'matching' sched_waking histogram
+for the same event, meaning that it uses the same key and filters, and
+define the new prio field variable on that.
+
+Here's the sched_switch trigger::
+
+ # echo 'hist:keys=next_pid:wakeup_lat=common_timestamp.usecs-$ts0:onmatch(sched.sched_waking).wakeup_latency($wakeup_lat,next_pid,prio)' >> events/sched/sched_switch/trigger
+
+And here's the output of the hist_debug information for the
+sched_waking hist trigger. Note that there are two histograms
+displayed in the output: the first is the normal sched_waking
+histogram we've seen in the previous examples, and the second is the
+special histogram we created to provide the prio field variable.
+
+Looking at the second histogram below, we see a variable with the name
+synthetic_prio. This is the field variable created for the prio field
+on that sched_waking histogram::
+
+ # cat events/sched/sched_waking/hist_debug
+
+ # event histogram
+ #
+ # trigger info: hist:keys=pid:vals=hitcount:ts0=common_timestamp.usecs:sort=hitcount:size=2048:clock=global [active]
+ #
+
+ hist_data: 00000000349570e4
+
+ n_vals: 2
+ n_keys: 1
+ n_fields: 3
+
+ val fields:
+
+ hist_data->fields[0]:
+ flags:
+ VAL: HIST_FIELD_FL_HITCOUNT
+ type: u64
+ size: 8
+ is_signed: 0
+
+ hist_data->fields[1]:
+ flags:
+ HIST_FIELD_FL_VAR
+ var.name: ts0
+ var.idx (into tracing_map_elt.vars[]): 0
+ type: u64
+ size: 8
+ is_signed: 0
+
+ key fields:
+
+ hist_data->fields[2]:
+ flags:
+ HIST_FIELD_FL_KEY
+ ftrace_event_field name: pid
+ type: pid_t
+ size: 8
+ is_signed: 1
+
+
+ # event histogram
+ #
+ # trigger info: hist:keys=pid:vals=hitcount:synthetic_prio=prio:sort=hitcount:size=2048 [active]
+ #
+
+ hist_data: 000000006920cf38
+
+ n_vals: 2
+ n_keys: 1
+ n_fields: 3
+
+ val fields:
+
+ hist_data->fields[0]:
+ flags:
+ VAL: HIST_FIELD_FL_HITCOUNT
+ type: u64
+ size: 8
+ is_signed: 0
+
+ hist_data->fields[1]:
+ flags:
+ HIST_FIELD_FL_VAR
+ ftrace_event_field name: prio
+ var.name: synthetic_prio
+ var.idx (into tracing_map_elt.vars[]): 0
+ type: int
+ size: 4
+ is_signed: 1
+
+ key fields:
+
+ hist_data->fields[2]:
+ flags:
+ HIST_FIELD_FL_KEY
+ ftrace_event_field name: pid
+ type: pid_t
+ size: 8
+ is_signed: 1
+
+Looking at the sched_switch histogram below, we can see a reference to
+the synthetic_prio variable on sched_waking, and looking at the
+associated hist_data address we see that it is indeed associated with
+the new histogram. Note also that the other references are to a
+normal variable, wakeup_lat, and to a normal field variable, next_pid,
+the details of which are in the field variables section::
+
+ # cat events/sched/sched_switch/hist_debug
+
+ # event histogram
+ #
+ # trigger info: hist:keys=next_pid:vals=hitcount:wakeup_lat=common_timestamp.usecs-$ts0:sort=hitcount:size=2048:clock=global:onmatch(sched.sched_waking).wakeup_latency($wakeup_lat,next_pid,prio) [active]
+ #
+
+ hist_data: 00000000a73b67df
+
+ n_vals: 2
+ n_keys: 1
+ n_fields: 3
+
+ val fields:
+
+ hist_data->fields[0]:
+ flags:
+ VAL: HIST_FIELD_FL_HITCOUNT
+ type: u64
+ size: 8
+ is_signed: 0
+
+ hist_data->fields[1]:
+ flags:
+ HIST_FIELD_FL_VAR
+ var.name: wakeup_lat
+ var.idx (into tracing_map_elt.vars[]): 0
+ type: u64
+ size: 0
+ is_signed: 0
+
+ key fields:
+
+ hist_data->fields[2]:
+ flags:
+ HIST_FIELD_FL_KEY
+ ftrace_event_field name: next_pid
+ type: pid_t
+ size: 8
+ is_signed: 1
+
+ variable reference fields:
+
+ hist_data->var_refs[0]:
+ flags:
+ HIST_FIELD_FL_VAR_REF
+ name: ts0
+ var.idx (into tracing_map_elt.vars[]): 0
+ var.hist_data: 00000000349570e4
+ var_ref_idx (into hist_data->var_refs[]): 0
+ type: u64
+ size: 8
+ is_signed: 0
+
+ hist_data->var_refs[1]:
+ flags:
+ HIST_FIELD_FL_VAR_REF
+ name: wakeup_lat
+ var.idx (into tracing_map_elt.vars[]): 0
+ var.hist_data: 00000000a73b67df
+ var_ref_idx (into hist_data->var_refs[]): 1
+ type: u64
+ size: 0
+ is_signed: 0
+
+ hist_data->var_refs[2]:
+ flags:
+ HIST_FIELD_FL_VAR_REF
+ name: next_pid
+ var.idx (into tracing_map_elt.vars[]): 1
+ var.hist_data: 00000000a73b67df
+ var_ref_idx (into hist_data->var_refs[]): 2
+ type: pid_t
+ size: 4
+ is_signed: 0
+
+ hist_data->var_refs[3]:
+ flags:
+ HIST_FIELD_FL_VAR_REF
+ name: synthetic_prio
+ var.idx (into tracing_map_elt.vars[]): 0
+ var.hist_data: 000000006920cf38
+ var_ref_idx (into hist_data->var_refs[]): 3
+ type: int
+ size: 4
+ is_signed: 1
+
+ field variables:
+
+ hist_data->field_vars[0]:
+
+ field_vars[0].var:
+ flags:
+ HIST_FIELD_FL_VAR
+ var.name: next_pid
+ var.idx (into tracing_map_elt.vars[]): 1
+
+ field_vars[0].val:
+ ftrace_event_field name: next_pid
+ type: pid_t
+ size: 4
+ is_signed: 1
+
+ action tracking variables (for onmax()/onchange()/onmatch()):
+
+ hist_data->actions[0].match_data.event_system: sched
+ hist_data->actions[0].match_data.event: sched_waking
+
+The commands below can be used to clean things up for the next test::
+
+ # echo '!hist:keys=next_pid:wakeup_lat=common_timestamp.usecs-$ts0:onmatch(sched.sched_waking).wakeup_latency($wakeup_lat,next_pid,prio)' >> events/sched/sched_switch/trigger
+
+ # echo '!hist:keys=pid:ts0=common_timestamp.usecs' >> events/sched/sched_waking/trigger
+
+ # echo '!wakeup_latency u64 lat; pid_t pid; int prio' >> synthetic_events
+
+Alias test
+----------
+
+This example is very similar to previous examples, but demonstrates
+the alias flag.
+
+First, we create the wakeup_latency synthetic event::
+
+ # echo 'wakeup_latency u64 lat; pid_t pid; char comm[16]' >> synthetic_events
+
+Next, we create a sched_waking trigger similar to previous examples,
+but in this case we save the pid in the waking_pid variable::
+
+ # echo 'hist:keys=pid:waking_pid=pid:ts0=common_timestamp.usecs' >> events/sched/sched_waking/trigger
+
+For the sched_switch trigger, instead of using $waking_pid directly in
+the wakeup_latency synthetic event invocation, we create an alias of
+$waking_pid named $woken_pid, and use that in the synthetic event
+invocation instead::
+
+ # echo 'hist:keys=next_pid:woken_pid=$waking_pid:wakeup_lat=common_timestamp.usecs-$ts0:onmatch(sched.sched_waking).wakeup_latency($wakeup_lat,$woken_pid,next_comm)' >> events/sched/sched_switch/trigger
+
+Looking at the sched_waking hist_debug output, in addition to the
+normal fields, we can see the waking_pid variable::
+
+ # cat events/sched/sched_waking/hist_debug
+
+ # event histogram
+ #
+ # trigger info: hist:keys=pid:vals=hitcount:waking_pid=pid,ts0=common_timestamp.usecs:sort=hitcount:size=2048:clock=global [active]
+ #
+
+ hist_data: 00000000a250528c
+
+ n_vals: 3
+ n_keys: 1
+ n_fields: 4
+
+ val fields:
+
+ hist_data->fields[0]:
+ flags:
+ VAL: HIST_FIELD_FL_HITCOUNT
+ type: u64
+ size: 8
+ is_signed: 0
+
+ hist_data->fields[1]:
+ flags:
+ HIST_FIELD_FL_VAR
+ ftrace_event_field name: pid
+ var.name: waking_pid
+ var.idx (into tracing_map_elt.vars[]): 0
+ type: pid_t
+ size: 4
+ is_signed: 1
+
+ hist_data->fields[2]:
+ flags:
+ HIST_FIELD_FL_VAR
+ var.name: ts0
+ var.idx (into tracing_map_elt.vars[]): 1
+ type: u64
+ size: 8
+ is_signed: 0
+
+ key fields:
+
+ hist_data->fields[3]:
+ flags:
+ HIST_FIELD_FL_KEY
+ ftrace_event_field name: pid
+ type: pid_t
+ size: 8
+ is_signed: 1
+
+The sched_switch hist_debug output shows that a variable named
+woken_pid has been created but that it also has the
+HIST_FIELD_FL_ALIAS flag set. It also has the HIST_FIELD_FL_VAR flag
+set, which is why it appears in the val field section.
+
+Despite that implementation detail, an alias variable is actually more
+like a variable reference; in fact it can be thought of as a reference
+to a reference. The implementation copies the var_ref->fn() from the
+variable reference being referenced, in this case, the waking_pid
+fn(), which is hist_field_var_ref() and makes that the fn() of the
+alias. The hist_field_var_ref() fn() requires the var_ref_idx of the
+variable reference it's using, so waking_pid's var_ref_idx is also
+copied to the alias. The end result is that when the value of alias
+is retrieved, in the end it just does the same thing the original
+reference would have done and retrieves the same value from the
+var_ref_vals[] array. You can verify this in the output by noting
+that the var_ref_idx of the alias, in this case woken_pid, is the same
+as the var_ref_idx of the reference, waking_pid, in the variable
+reference fields section.
+
+Additionally, once it gets that value, since it is also a variable, it
+then saves that value into its var.idx. So the var.idx of the
+woken_pid alias is 0, which it fills with the value from var_ref_idx 0
+when its fn() is called to update itself. You'll also notice that
+there's a woken_pid var_ref in the variable refs section. That is the
+reference to the woken_pid alias variable, and you can see that it
+retrieves the value from the same var.idx as the woken_pid alias, 0,
+and then in turn saves that value in its own var_ref_idx slot, 3, and
+the value at this position is finally what gets assigned to the
+$woken_pid slot in the trace event invocation::
+
+ # cat events/sched/sched_switch/hist_debug
+
+ # event histogram
+ #
+ # trigger info: hist:keys=next_pid:vals=hitcount:woken_pid=$waking_pid,wakeup_lat=common_timestamp.usecs-$ts0:sort=hitcount:size=2048:clock=global:onmatch(sched.sched_waking).wakeup_latency($wakeup_lat,$woken_pid,next_comm) [active]
+ #
+
+ hist_data: 0000000055d65ed0
+
+ n_vals: 3
+ n_keys: 1
+ n_fields: 4
+
+ val fields:
+
+ hist_data->fields[0]:
+ flags:
+ VAL: HIST_FIELD_FL_HITCOUNT
+ type: u64
+ size: 8
+ is_signed: 0
+
+ hist_data->fields[1]:
+ flags:
+ HIST_FIELD_FL_VAR
+ HIST_FIELD_FL_ALIAS
+ var.name: woken_pid
+ var.idx (into tracing_map_elt.vars[]): 0
+ var_ref_idx (into hist_data->var_refs[]): 0
+ type: pid_t
+ size: 4
+ is_signed: 1
+
+ hist_data->fields[2]:
+ flags:
+ HIST_FIELD_FL_VAR
+ var.name: wakeup_lat
+ var.idx (into tracing_map_elt.vars[]): 1
+ type: u64
+ size: 0
+ is_signed: 0
+
+ key fields:
+
+ hist_data->fields[3]:
+ flags:
+ HIST_FIELD_FL_KEY
+ ftrace_event_field name: next_pid
+ type: pid_t
+ size: 8
+ is_signed: 1
+
+ variable reference fields:
+
+ hist_data->var_refs[0]:
+ flags:
+ HIST_FIELD_FL_VAR_REF
+ name: waking_pid
+ var.idx (into tracing_map_elt.vars[]): 0
+ var.hist_data: 00000000a250528c
+ var_ref_idx (into hist_data->var_refs[]): 0
+ type: pid_t
+ size: 4
+ is_signed: 1
+
+ hist_data->var_refs[1]:
+ flags:
+ HIST_FIELD_FL_VAR_REF
+ name: ts0
+ var.idx (into tracing_map_elt.vars[]): 1
+ var.hist_data: 00000000a250528c
+ var_ref_idx (into hist_data->var_refs[]): 1
+ type: u64
+ size: 8
+ is_signed: 0
+
+ hist_data->var_refs[2]:
+ flags:
+ HIST_FIELD_FL_VAR_REF
+ name: wakeup_lat
+ var.idx (into tracing_map_elt.vars[]): 1
+ var.hist_data: 0000000055d65ed0
+ var_ref_idx (into hist_data->var_refs[]): 2
+ type: u64
+ size: 0
+ is_signed: 0
+
+ hist_data->var_refs[3]:
+ flags:
+ HIST_FIELD_FL_VAR_REF
+ name: woken_pid
+ var.idx (into tracing_map_elt.vars[]): 0
+ var.hist_data: 0000000055d65ed0
+ var_ref_idx (into hist_data->var_refs[]): 3
+ type: pid_t
+ size: 4
+ is_signed: 1
+
+ hist_data->var_refs[4]:
+ flags:
+ HIST_FIELD_FL_VAR_REF
+ name: next_comm
+ var.idx (into tracing_map_elt.vars[]): 2
+ var.hist_data: 0000000055d65ed0
+ var_ref_idx (into hist_data->var_refs[]): 4
+ type: char[16]
+ size: 256
+ is_signed: 0
+
+ field variables:
+
+ hist_data->field_vars[0]:
+
+ field_vars[0].var:
+ flags:
+ HIST_FIELD_FL_VAR
+ var.name: next_comm
+ var.idx (into tracing_map_elt.vars[]): 2
+
+ field_vars[0].val:
+ ftrace_event_field name: next_comm
+ type: char[16]
+ size: 256
+ is_signed: 0
+
+ action tracking variables (for onmax()/onchange()/onmatch()):
+
+ hist_data->actions[0].match_data.event_system: sched
+ hist_data->actions[0].match_data.event: sched_waking
+
+The commands below can be used to clean things up for the next test::
+
+ # echo '!hist:keys=next_pid:woken_pid=$waking_pid:wakeup_lat=common_timestamp.usecs-$ts0:onmatch(sched.sched_waking).wakeup_latency($wakeup_lat,$woken_pid,next_comm)' >> events/sched/sched_switch/trigger
+
+ # echo '!hist:keys=pid:ts0=common_timestamp.usecs' >> events/sched/sched_waking/trigger
+
+ # echo '!wakeup_latency u64 lat; pid_t pid; char comm[16]' >> synthetic_events
diff --git a/kernel/trace/Kconfig b/kernel/trace/Kconfig
index 24876faac75324..a4020c0b4508c9 100644
--- a/kernel/trace/Kconfig
+++ b/kernel/trace/Kconfig
@@ -249,15 +249,6 @@ config TRACE_PREEMPT_TOGGLE
Enables hooks which will be called when preemption is first disabled,
and last enabled.
-config PREEMPTIRQ_EVENTS
- bool "Enable trace events for preempt and irq disable/enable"
- select TRACE_IRQFLAGS
- select TRACE_PREEMPT_TOGGLE if PREEMPTION
- select GENERIC_TRACER
- default n
- help
- Enable tracing of disable and enable events for preemption and irqs.
-
config IRQSOFF_TRACER
bool "Interrupts-off Latency Tracer"
default n
@@ -614,12 +605,30 @@ config TRACING_MAP
generally used outside of that context, and is normally
selected by tracers that use it.
+config SYNTH_EVENTS
+ bool "Synthetic trace events"
+ select TRACING
+ select DYNAMIC_EVENTS
+ default n
+ help
+ Synthetic events are user-defined trace events that can be
+ used to combine data from other trace events or in fact any
+ data source. Synthetic events can be generated indirectly
+ via the trace() action of histogram triggers or directly
+ by way of an in-kernel API.
+
+ See Documentation/trace/events.rst or
+ Documentation/trace/histogram.rst for details and examples.
+
+ If in doubt, say N.
+
config HIST_TRIGGERS
bool "Histogram triggers"
depends on ARCH_HAVE_NMI_SAFE_CMPXCHG
select TRACING_MAP
select TRACING
select DYNAMIC_EVENTS
+ select SYNTH_EVENTS
default n
help
Hist triggers allow one or more arbitrary trace event fields
@@ -815,7 +824,7 @@ config PREEMPTIRQ_DELAY_TEST
config SYNTH_EVENT_GEN_TEST
tristate "Test module for in-kernel synthetic event generation"
- depends on HIST_TRIGGERS
+ depends on SYNTH_EVENTS
help
This option creates a test module to check the base
functionality of in-kernel synthetic event definition and
@@ -838,6 +847,29 @@ config KPROBE_EVENT_GEN_TEST
If unsure, say N.
+config HIST_TRIGGERS_DEBUG
+ bool "Hist trigger debug support"
+ depends on HIST_TRIGGERS
+ help
+ Add "hist_debug" file for each event, which when read will
+ dump out a bunch of internal details about the hist triggers
+ defined on that event.
+
+ The hist_debug file serves a couple of purposes:
+
+ - Helps developers verify that nothing is broken.
+
+ - Provides educational information to support the details
+ of the hist trigger internals as described by
+ Documentation/trace/histogram-design.rst.
+
+ The hist_debug output only covers the data structures
+ related to the histogram definitions themselves and doesn't
+ display the internals of map buckets or variable values of
+ running histograms.
+
+ If unsure, say N.
+
endif # FTRACE
endif # TRACING_SUPPORT
diff --git a/kernel/trace/Makefile b/kernel/trace/Makefile
index f9dcd19165fa2c..1d8aaa54641230 100644
--- a/kernel/trace/Makefile
+++ b/kernel/trace/Makefile
@@ -72,6 +72,7 @@ endif
obj-$(CONFIG_EVENT_TRACING) += trace_events_filter.o
obj-$(CONFIG_EVENT_TRACING) += trace_events_trigger.o
obj-$(CONFIG_TRACE_EVENT_INJECT) += trace_events_inject.o
+obj-$(CONFIG_SYNTH_EVENTS) += trace_events_synth.o
obj-$(CONFIG_HIST_TRIGGERS) += trace_events_hist.o
obj-$(CONFIG_BPF_EVENTS) += bpf_trace.o
obj-$(CONFIG_KPROBE_EVENTS) += trace_kprobe.o
diff --git a/kernel/trace/ftrace.c b/kernel/trace/ftrace.c
index 7d0ebd10470695..c163c3531fafc4 100644
--- a/kernel/trace/ftrace.c
+++ b/kernel/trace/ftrace.c
@@ -2016,14 +2016,14 @@ void ftrace_bug(int failed, struct dyn_ftrace *rec)
{
unsigned long ip = rec ? rec->ip : 0;
+ pr_info("------------[ ftrace bug ]------------\n");
+
switch (failed) {
case -EFAULT:
- FTRACE_WARN_ON_ONCE(1);
pr_info("ftrace faulted on modifying ");
print_ip_sym(KERN_INFO, ip);
break;
case -EINVAL:
- FTRACE_WARN_ON_ONCE(1);
pr_info("ftrace failed to modify ");
print_ip_sym(KERN_INFO, ip);
print_ip_ins(" actual: ", (unsigned char *)ip);
@@ -2034,12 +2034,10 @@ void ftrace_bug(int failed, struct dyn_ftrace *rec)
}
break;
case -EPERM:
- FTRACE_WARN_ON_ONCE(1);
pr_info("ftrace faulted on writing ");
print_ip_sym(KERN_INFO, ip);
break;
default:
- FTRACE_WARN_ON_ONCE(1);
pr_info("ftrace faulted on unknown error ");
print_ip_sym(KERN_INFO, ip);
}
@@ -2066,6 +2064,8 @@ void ftrace_bug(int failed, struct dyn_ftrace *rec)
ip = ftrace_get_addr_curr(rec);
pr_cont("\n expected tramp: %lx\n", ip);
}
+
+ FTRACE_WARN_ON_ONCE(1);
}
static int ftrace_check_record(struct dyn_ftrace *rec, bool enable, bool update)
diff --git a/kernel/trace/trace.c b/kernel/trace/trace.c
index 3ab27022c20f08..ec44b0e2a19c3b 100644
--- a/kernel/trace/trace.c
+++ b/kernel/trace/trace.c
@@ -1299,8 +1299,11 @@ EXPORT_SYMBOL_GPL(tracing_off);
void disable_trace_on_warning(void)
{
- if (__disable_trace_on_warning)
+ if (__disable_trace_on_warning) {
+ trace_array_printk_buf(global_trace.array_buffer.buffer, _THIS_IP_,
+ "Disabling tracing due to warning\n");
tracing_off();
+ }
}
/**
diff --git a/kernel/trace/trace.h b/kernel/trace/trace.h
index 4eb1d004d5f2b2..def769df5bf12e 100644
--- a/kernel/trace/trace.h
+++ b/kernel/trace/trace.h
@@ -1661,6 +1661,7 @@ extern struct list_head ftrace_events;
extern const struct file_operations event_trigger_fops;
extern const struct file_operations event_hist_fops;
+extern const struct file_operations event_hist_debug_fops;
extern const struct file_operations event_inject_fops;
#ifdef CONFIG_HIST_TRIGGERS
diff --git a/kernel/trace/trace_events.c b/kernel/trace/trace_events.c
index 242f59e7f17d5d..f6f55682d3e2d0 100644
--- a/kernel/trace/trace_events.c
+++ b/kernel/trace/trace_events.c
@@ -2209,6 +2209,10 @@ event_create_dir(struct dentry *parent, struct trace_event_file *file)
trace_create_file("hist", 0444, file->dir, file,
&event_hist_fops);
#endif
+#ifdef CONFIG_HIST_TRIGGERS_DEBUG
+ trace_create_file("hist_debug", 0444, file->dir, file,
+ &event_hist_debug_fops);
+#endif
trace_create_file("format", 0444, file->dir, call,
&ftrace_event_format_fops);
diff --git a/kernel/trace/trace_events_hist.c b/kernel/trace/trace_events_hist.c
index fcab11cc6833b2..0b933546142e8b 100644
--- a/kernel/trace/trace_events_hist.c
+++ b/kernel/trace/trace_events_hist.c
@@ -19,13 +19,7 @@
#include <trace/events/mmflags.h>
#include "tracing_map.h"
-#include "trace.h"
-#include "trace_dynevent.h"
-
-#define SYNTH_SYSTEM "synthetic"
-#define SYNTH_FIELDS_MAX 32
-
-#define STR_VAR_LEN_MAX 32 /* must be multiple of sizeof(u64) */
+#include "trace_synth.h"
#define ERRORS \
C(NONE, "No error"), \
@@ -380,69 +374,6 @@ struct hist_trigger_data {
unsigned int n_save_var_str;
};
-static int create_synth_event(int argc, const char **argv);
-static int synth_event_show(struct seq_file *m, struct dyn_event *ev);
-static int synth_event_release(struct dyn_event *ev);
-static bool synth_event_is_busy(struct dyn_event *ev);
-static bool synth_event_match(const char *system, const char *event,
- int argc, const char **argv, struct dyn_event *ev);
-
-static struct dyn_event_operations synth_event_ops = {
- .create = create_synth_event,
- .show = synth_event_show,
- .is_busy = synth_event_is_busy,
- .free = synth_event_release,
- .match = synth_event_match,
-};
-
-struct synth_field {
- char *type;
- char *name;
- size_t size;
- unsigned int offset;
- bool is_signed;
- bool is_string;
-};
-
-struct synth_event {
- struct dyn_event devent;
- int ref;
- char *name;
- struct synth_field **fields;
- unsigned int n_fields;
- unsigned int n_u64;
- struct trace_event_class class;
- struct trace_event_call call;
- struct tracepoint *tp;
- struct module *mod;
-};
-
-static bool is_synth_event(struct dyn_event *ev)
-{
- return ev->ops == &synth_event_ops;
-}
-
-static struct synth_event *to_synth_event(struct dyn_event *ev)
-{
- return container_of(ev, struct synth_event, devent);
-}
-
-static bool synth_event_is_busy(struct dyn_event *ev)
-{
- struct synth_event *event = to_synth_event(ev);
-
- return event->ref != 0;
-}
-
-static bool synth_event_match(const char *system, const char *event,
- int argc, const char **argv, struct dyn_event *ev)
-{
- struct synth_event *sev = to_synth_event(ev);
-
- return strcmp(sev->name, event) == 0 &&
- (!system || strcmp(system, SYNTH_SYSTEM) == 0);
-}
-
struct action_data;
typedef void (*action_fn_t) (struct hist_trigger_data *hist_data,
@@ -589,6 +520,7 @@ static struct track_data *track_data_alloc(unsigned int key_len,
track_data_free(data);
return ERR_PTR(-ENOMEM);
}
+
data->elt.private_data = elt_data;
elt_data->comm = kzalloc(TASK_COMM_LEN, GFP_KERNEL);
@@ -621,7 +553,6 @@ static void last_cmd_set(struct trace_event_file *file, char *str)
if (file) {
call = file->event_call;
-
system = call->class->system;
if (system) {
name = trace_event_name(call);
@@ -646,510 +577,6 @@ static void hist_err_clear(void)
last_cmd_loc[0] = '\0';
}
-struct synth_trace_event {
- struct trace_entry ent;
- u64 fields[];
-};
-
-static int synth_event_define_fields(struct trace_event_call *call)
-{
- struct synth_trace_event trace;
- int offset = offsetof(typeof(trace), fields);
- struct synth_event *event = call->data;
- unsigned int i, size, n_u64;
- char *name, *type;
- bool is_signed;
- int ret = 0;
-
- for (i = 0, n_u64 = 0; i < event->n_fields; i++) {
- size = event->fields[i]->size;
- is_signed = event->fields[i]->is_signed;
- type = event->fields[i]->type;
- name = event->fields[i]->name;
- ret = trace_define_field(call, type, name, offset, size,
- is_signed, FILTER_OTHER);
- if (ret)
- break;
-
- event->fields[i]->offset = n_u64;
-
- if (event->fields[i]->is_string) {
- offset += STR_VAR_LEN_MAX;
- n_u64 += STR_VAR_LEN_MAX / sizeof(u64);
- } else {
- offset += sizeof(u64);
- n_u64++;
- }
- }
-
- event->n_u64 = n_u64;
-
- return ret;
-}
-
-static bool synth_field_signed(char *type)
-{
- if (str_has_prefix(type, "u"))
- return false;
- if (strcmp(type, "gfp_t") == 0)
- return false;
-
- return true;
-}
-
-static int synth_field_is_string(char *type)
-{
- if (strstr(type, "char[") != NULL)
- return true;
-
- return false;
-}
-
-static int synth_field_string_size(char *type)
-{
- char buf[4], *end, *start;
- unsigned int len;
- int size, err;
-
- start = strstr(type, "char[");
- if (start == NULL)
- return -EINVAL;
- start += sizeof("char[") - 1;
-
- end = strchr(type, ']');
- if (!end || end < start)
- return -EINVAL;
-
- len = end - start;
- if (len > 3)
- return -EINVAL;
-
- strncpy(buf, start, len);
- buf[len] = '\0';
-
- err = kstrtouint(buf, 0, &size);
- if (err)
- return err;
-
- if (size > STR_VAR_LEN_MAX)
- return -EINVAL;
-
- return size;
-}
-
-static int synth_field_size(char *type)
-{
- int size = 0;
-
- if (strcmp(type, "s64") == 0)
- size = sizeof(s64);
- else if (strcmp(type, "u64") == 0)
- size = sizeof(u64);
- else if (strcmp(type, "s32") == 0)
- size = sizeof(s32);
- else if (strcmp(type, "u32") == 0)
- size = sizeof(u32);
- else if (strcmp(type, "s16") == 0)
- size = sizeof(s16);
- else if (strcmp(type, "u16") == 0)
- size = sizeof(u16);
- else if (strcmp(type, "s8") == 0)
- size = sizeof(s8);
- else if (strcmp(type, "u8") == 0)
- size = sizeof(u8);
- else if (strcmp(type, "char") == 0)
- size = sizeof(char);
- else if (strcmp(type, "unsigned char") == 0)
- size = sizeof(unsigned char);
- else if (strcmp(type, "int") == 0)
- size = sizeof(int);
- else if (strcmp(type, "unsigned int") == 0)
- size = sizeof(unsigned int);
- else if (strcmp(type, "long") == 0)
- size = sizeof(long);
- else if (strcmp(type, "unsigned long") == 0)
- size = sizeof(unsigned long);
- else if (strcmp(type, "pid_t") == 0)
- size = sizeof(pid_t);
- else if (strcmp(type, "gfp_t") == 0)
- size = sizeof(gfp_t);
- else if (synth_field_is_string(type))
- size = synth_field_string_size(type);
-
- return size;
-}
-
-static const char *synth_field_fmt(char *type)
-{
- const char *fmt = "%llu";
-
- if (strcmp(type, "s64") == 0)
- fmt = "%lld";
- else if (strcmp(type, "u64") == 0)
- fmt = "%llu";
- else if (strcmp(type, "s32") == 0)
- fmt = "%d";
- else if (strcmp(type, "u32") == 0)
- fmt = "%u";
- else if (strcmp(type, "s16") == 0)
- fmt = "%d";
- else if (strcmp(type, "u16") == 0)
- fmt = "%u";
- else if (strcmp(type, "s8") == 0)
- fmt = "%d";
- else if (strcmp(type, "u8") == 0)
- fmt = "%u";
- else if (strcmp(type, "char") == 0)
- fmt = "%d";
- else if (strcmp(type, "unsigned char") == 0)
- fmt = "%u";
- else if (strcmp(type, "int") == 0)
- fmt = "%d";
- else if (strcmp(type, "unsigned int") == 0)
- fmt = "%u";
- else if (strcmp(type, "long") == 0)
- fmt = "%ld";
- else if (strcmp(type, "unsigned long") == 0)
- fmt = "%lu";
- else if (strcmp(type, "pid_t") == 0)
- fmt = "%d";
- else if (strcmp(type, "gfp_t") == 0)
- fmt = "%x";
- else if (synth_field_is_string(type))
- fmt = "%s";
-
- return fmt;
-}
-
-static void print_synth_event_num_val(struct trace_seq *s,
- char *print_fmt, char *name,
- int size, u64 val, char *space)
-{
- switch (size) {
- case 1:
- trace_seq_printf(s, print_fmt, name, (u8)val, space);
- break;
-
- case 2:
- trace_seq_printf(s, print_fmt, name, (u16)val, space);
- break;
-
- case 4:
- trace_seq_printf(s, print_fmt, name, (u32)val, space);
- break;
-
- default:
- trace_seq_printf(s, print_fmt, name, val, space);
- break;
- }
-}
-
-static enum print_line_t print_synth_event(struct trace_iterator *iter,
- int flags,
- struct trace_event *event)
-{
- struct trace_array *tr = iter->tr;
- struct trace_seq *s = &iter->seq;
- struct synth_trace_event *entry;
- struct synth_event *se;
- unsigned int i, n_u64;
- char print_fmt[32];
- const char *fmt;
-
- entry = (struct synth_trace_event *)iter->ent;
- se = container_of(event, struct synth_event, call.event);
-
- trace_seq_printf(s, "%s: ", se->name);
-
- for (i = 0, n_u64 = 0; i < se->n_fields; i++) {
- if (trace_seq_has_overflowed(s))
- goto end;
-
- fmt = synth_field_fmt(se->fields[i]->type);
-
- /* parameter types */
- if (tr && tr->trace_flags & TRACE_ITER_VERBOSE)
- trace_seq_printf(s, "%s ", fmt);
-
- snprintf(print_fmt, sizeof(print_fmt), "%%s=%s%%s", fmt);
-
- /* parameter values */
- if (se->fields[i]->is_string) {
- trace_seq_printf(s, print_fmt, se->fields[i]->name,
- (char *)&entry->fields[n_u64],
- i == se->n_fields - 1 ? "" : " ");
- n_u64 += STR_VAR_LEN_MAX / sizeof(u64);
- } else {
- struct trace_print_flags __flags[] = {
- __def_gfpflag_names, {-1, NULL} };
- char *space = (i == se->n_fields - 1 ? "" : " ");
-
- print_synth_event_num_val(s, print_fmt,
- se->fields[i]->name,
- se->fields[i]->size,
- entry->fields[n_u64],
- space);
-
- if (strcmp(se->fields[i]->type, "gfp_t") == 0) {
- trace_seq_puts(s, " (");
- trace_print_flags_seq(s, "|",
- entry->fields[n_u64],
- __flags);
- trace_seq_putc(s, ')');
- }
- n_u64++;
- }
- }
-end:
- trace_seq_putc(s, '\n');
-
- return trace_handle_return(s);
-}
-
-static struct trace_event_functions synth_event_funcs = {
- .trace = print_synth_event
-};
-
-static notrace void trace_event_raw_event_synth(void *__data,
- u64 *var_ref_vals,
- unsigned int *var_ref_idx)
-{
- struct trace_event_file *trace_file = __data;
- struct synth_trace_event *entry;
- struct trace_event_buffer fbuffer;
- struct trace_buffer *buffer;
- struct synth_event *event;
- unsigned int i, n_u64, val_idx;
- int fields_size = 0;
-
- event = trace_file->event_call->data;
-
- if (trace_trigger_soft_disabled(trace_file))
- return;
-
- fields_size = event->n_u64 * sizeof(u64);
-
- /*
- * Avoid ring buffer recursion detection, as this event
- * is being performed within another event.
- */
- buffer = trace_file->tr->array_buffer.buffer;
- ring_buffer_nest_start(buffer);
-
- entry = trace_event_buffer_reserve(&fbuffer, trace_file,
- sizeof(*entry) + fields_size);
- if (!entry)
- goto out;
-
- for (i = 0, n_u64 = 0; i < event->n_fields; i++) {
- val_idx = var_ref_idx[i];
- if (event->fields[i]->is_string) {
- char *str_val = (char *)(long)var_ref_vals[val_idx];
- char *str_field = (char *)&entry->fields[n_u64];
-
- strscpy(str_field, str_val, STR_VAR_LEN_MAX);
- n_u64 += STR_VAR_LEN_MAX / sizeof(u64);
- } else {
- struct synth_field *field = event->fields[i];
- u64 val = var_ref_vals[val_idx];
-
- switch (field->size) {
- case 1:
- *(u8 *)&entry->fields[n_u64] = (u8)val;
- break;
-
- case 2:
- *(u16 *)&entry->fields[n_u64] = (u16)val;
- break;
-
- case 4:
- *(u32 *)&entry->fields[n_u64] = (u32)val;
- break;
-
- default:
- entry->fields[n_u64] = val;
- break;
- }
- n_u64++;
- }
- }
-
- trace_event_buffer_commit(&fbuffer);
-out:
- ring_buffer_nest_end(buffer);
-}
-
-static void free_synth_event_print_fmt(struct trace_event_call *call)
-{
- if (call) {
- kfree(call->print_fmt);
- call->print_fmt = NULL;
- }
-}
-
-static int __set_synth_event_print_fmt(struct synth_event *event,
- char *buf, int len)
-{
- const char *fmt;
- int pos = 0;
- int i;
-
- /* When len=0, we just calculate the needed length */
-#define LEN_OR_ZERO (len ? len - pos : 0)
-
- pos += snprintf(buf + pos, LEN_OR_ZERO, "\"");
- for (i = 0; i < event->n_fields; i++) {
- fmt = synth_field_fmt(event->fields[i]->type);
- pos += snprintf(buf + pos, LEN_OR_ZERO, "%s=%s%s",
- event->fields[i]->name, fmt,
- i == event->n_fields - 1 ? "" : ", ");
- }
- pos += snprintf(buf + pos, LEN_OR_ZERO, "\"");
-
- for (i = 0; i < event->n_fields; i++) {
- pos += snprintf(buf + pos, LEN_OR_ZERO,
- ", REC->%s", event->fields[i]->name);
- }
-
-#undef LEN_OR_ZERO
-
- /* return the length of print_fmt */
- return pos;
-}
-
-static int set_synth_event_print_fmt(struct trace_event_call *call)
-{
- struct synth_event *event = call->data;
- char *print_fmt;
- int len;
-
- /* First: called with 0 length to calculate the needed length */
- len = __set_synth_event_print_fmt(event, NULL, 0);
-
- print_fmt = kmalloc(len + 1, GFP_KERNEL);
- if (!print_fmt)
- return -ENOMEM;
-
- /* Second: actually write the @print_fmt */
- __set_synth_event_print_fmt(event, print_fmt, len + 1);
- call->print_fmt = print_fmt;
-
- return 0;
-}
-
-static void free_synth_field(struct synth_field *field)
-{
- kfree(field->type);
- kfree(field->name);
- kfree(field);
-}
-
-static struct synth_field *parse_synth_field(int argc, const char **argv,
- int *consumed)
-{
- struct synth_field *field;
- const char *prefix = NULL, *field_type = argv[0], *field_name, *array;
- int len, ret = 0;
-
- if (field_type[0] == ';')
- field_type++;
-
- if (!strcmp(field_type, "unsigned")) {
- if (argc < 3)
- return ERR_PTR(-EINVAL);
- prefix = "unsigned ";
- field_type = argv[1];
- field_name = argv[2];
- *consumed = 3;
- } else {
- field_name = argv[1];
- *consumed = 2;
- }
-
- field = kzalloc(sizeof(*field), GFP_KERNEL);
- if (!field)
- return ERR_PTR(-ENOMEM);
-
- len = strlen(field_name);
- array = strchr(field_name, '[');
- if (array)
- len -= strlen(array);
- else if (field_name[len - 1] == ';')
- len--;
-
- field->name = kmemdup_nul(field_name, len, GFP_KERNEL);
- if (!field->name) {
- ret = -ENOMEM;
- goto free;
- }
-
- if (field_type[0] == ';')
- field_type++;
- len = strlen(field_type) + 1;
- if (array)
- len += strlen(array);
- if (prefix)
- len += strlen(prefix);
-
- field->type = kzalloc(len, GFP_KERNEL);
- if (!field->type) {
- ret = -ENOMEM;
- goto free;
- }
- if (prefix)
- strcat(field->type, prefix);
- strcat(field->type, field_type);
- if (array) {
- strcat(field->type, array);
- if (field->type[len - 1] == ';')
- field->type[len - 1] = '\0';
- }
-
- field->size = synth_field_size(field->type);
- if (!field->size) {
- ret = -EINVAL;
- goto free;
- }
-
- if (synth_field_is_string(field->type))
- field->is_string = true;
-
- field->is_signed = synth_field_signed(field->type);
-
- out:
- return field;
- free:
- free_synth_field(field);
- field = ERR_PTR(ret);
- goto out;
-}
-
-static void free_synth_tracepoint(struct tracepoint *tp)
-{
- if (!tp)
- return;
-
- kfree(tp->name);
- kfree(tp);
-}
-
-static struct tracepoint *alloc_synth_tracepoint(char *name)
-{
- struct tracepoint *tp;
-
- tp = kzalloc(sizeof(*tp), GFP_KERNEL);
- if (!tp)
- return ERR_PTR(-ENOMEM);
-
- tp->name = kstrdup(name, GFP_KERNEL);
- if (!tp->name) {
- kfree(tp);
- return ERR_PTR(-ENOMEM);
- }
-
- return tp;
-}
-
typedef void (*synth_probe_func_t) (void *__data, u64 *var_ref_vals,
unsigned int *var_ref_idx);
@@ -1177,145 +604,6 @@ static inline void trace_synth(struct synth_event *event, u64 *var_ref_vals,
}
}
-static struct synth_event *find_synth_event(const char *name)
-{
- struct dyn_event *pos;
- struct synth_event *event;
-
- for_each_dyn_event(pos) {
- if (!is_synth_event(pos))
- continue;
- event = to_synth_event(pos);
- if (strcmp(event->name, name) == 0)
- return event;
- }
-
- return NULL;
-}
-
-static struct trace_event_fields synth_event_fields_array[] = {
- { .type = TRACE_FUNCTION_TYPE,
- .define_fields = synth_event_define_fields },
- {}
-};
-
-static int register_synth_event(struct synth_event *event)
-{
- struct trace_event_call *call = &event->call;
- int ret = 0;
-
- event->call.class = &event->class;
- event->class.system = kstrdup(SYNTH_SYSTEM, GFP_KERNEL);
- if (!event->class.system) {
- ret = -ENOMEM;
- goto out;
- }
-
- event->tp = alloc_synth_tracepoint(event->name);
- if (IS_ERR(event->tp)) {
- ret = PTR_ERR(event->tp);
- event->tp = NULL;
- goto out;
- }
-
- INIT_LIST_HEAD(&call->class->fields);
- call->event.funcs = &synth_event_funcs;
- call->class->fields_array = synth_event_fields_array;
-
- ret = register_trace_event(&call->event);
- if (!ret) {
- ret = -ENODEV;
- goto out;
- }
- call->flags = TRACE_EVENT_FL_TRACEPOINT;
- call->class->reg = trace_event_reg;
- call->class->probe = trace_event_raw_event_synth;
- call->data = event;
- call->tp = event->tp;
-
- ret = trace_add_event_call(call);
- if (ret) {
- pr_warn("Failed to register synthetic event: %s\n",
- trace_event_name(call));
- goto err;
- }
-
- ret = set_synth_event_print_fmt(call);
- if (ret < 0) {
- trace_remove_event_call(call);
- goto err;
- }
- out:
- return ret;
- err:
- unregister_trace_event(&call->event);
- goto out;
-}
-
-static int unregister_synth_event(struct synth_event *event)
-{
- struct trace_event_call *call = &event->call;
- int ret;
-
- ret = trace_remove_event_call(call);
-
- return ret;
-}
-
-static void free_synth_event(struct synth_event *event)
-{
- unsigned int i;
-
- if (!event)
- return;
-
- for (i = 0; i < event->n_fields; i++)
- free_synth_field(event->fields[i]);
-
- kfree(event->fields);
- kfree(event->name);
- kfree(event->class.system);
- free_synth_tracepoint(event->tp);
- free_synth_event_print_fmt(&event->call);
- kfree(event);
-}
-
-static struct synth_event *alloc_synth_event(const char *name, int n_fields,
- struct synth_field **fields)
-{
- struct synth_event *event;
- unsigned int i;
-
- event = kzalloc(sizeof(*event), GFP_KERNEL);
- if (!event) {
- event = ERR_PTR(-ENOMEM);
- goto out;
- }
-
- event->name = kstrdup(name, GFP_KERNEL);
- if (!event->name) {
- kfree(event);
- event = ERR_PTR(-ENOMEM);
- goto out;
- }
-
- event->fields = kcalloc(n_fields, sizeof(*event->fields), GFP_KERNEL);
- if (!event->fields) {
- free_synth_event(event);
- event = ERR_PTR(-ENOMEM);
- goto out;
- }
-
- dyn_event_init(&event->devent, &synth_event_ops);
-
- for (i = 0; i < n_fields; i++)
- event->fields[i] = fields[i];
-
- event->n_fields = n_fields;
- out:
- return event;
-}
-
static void action_trace(struct hist_trigger_data *hist_data,
struct tracing_map_elt *elt, void *rec,
struct ring_buffer_event *rbe, void *key,
@@ -1331,1056 +619,6 @@ struct hist_var_data {
struct hist_trigger_data *hist_data;
};
-static int synth_event_check_arg_fn(void *data)
-{
- struct dynevent_arg_pair *arg_pair = data;
- int size;
-
- size = synth_field_size((char *)arg_pair->lhs);
-
- return size ? 0 : -EINVAL;
-}
-
-/**
- * synth_event_add_field - Add a new field to a synthetic event cmd
- * @cmd: A pointer to the dynevent_cmd struct representing the new event
- * @type: The type of the new field to add
- * @name: The name of the new field to add
- *
- * Add a new field to a synthetic event cmd object. Field ordering is in
- * the same order the fields are added.
- *
- * See synth_field_size() for available types. If field_name contains
- * [n] the field is considered to be an array.
- *
- * Return: 0 if successful, error otherwise.
- */
-int synth_event_add_field(struct dynevent_cmd *cmd, const char *type,
- const char *name)
-{
- struct dynevent_arg_pair arg_pair;
- int ret;
-
- if (cmd->type != DYNEVENT_TYPE_SYNTH)
- return -EINVAL;
-
- if (!type || !name)
- return -EINVAL;
-
- dynevent_arg_pair_init(&arg_pair, 0, ';');
-
- arg_pair.lhs = type;
- arg_pair.rhs = name;
-
- ret = dynevent_arg_pair_add(cmd, &arg_pair, synth_event_check_arg_fn);
- if (ret)
- return ret;
-
- if (++cmd->n_fields > SYNTH_FIELDS_MAX)
- ret = -EINVAL;
-
- return ret;
-}
-EXPORT_SYMBOL_GPL(synth_event_add_field);
-
-/**
- * synth_event_add_field_str - Add a new field to a synthetic event cmd
- * @cmd: A pointer to the dynevent_cmd struct representing the new event
- * @type_name: The type and name of the new field to add, as a single string
- *
- * Add a new field to a synthetic event cmd object, as a single
- * string. The @type_name string is expected to be of the form 'type
- * name', which will be appended by ';'. No sanity checking is done -
- * what's passed in is assumed to already be well-formed. Field
- * ordering is in the same order the fields are added.
- *
- * See synth_field_size() for available types. If field_name contains
- * [n] the field is considered to be an array.
- *
- * Return: 0 if successful, error otherwise.
- */
-int synth_event_add_field_str(struct dynevent_cmd *cmd, const char *type_name)
-{
- struct dynevent_arg arg;
- int ret;
-
- if (cmd->type != DYNEVENT_TYPE_SYNTH)
- return -EINVAL;
-
- if (!type_name)
- return -EINVAL;
-
- dynevent_arg_init(&arg, ';');
-
- arg.str = type_name;
-
- ret = dynevent_arg_add(cmd, &arg, NULL);
- if (ret)
- return ret;
-
- if (++cmd->n_fields > SYNTH_FIELDS_MAX)
- ret = -EINVAL;
-
- return ret;
-}
-EXPORT_SYMBOL_GPL(synth_event_add_field_str);
-
-/**
- * synth_event_add_fields - Add multiple fields to a synthetic event cmd
- * @cmd: A pointer to the dynevent_cmd struct representing the new event
- * @fields: An array of type/name field descriptions
- * @n_fields: The number of field descriptions contained in the fields array
- *
- * Add a new set of fields to a synthetic event cmd object. The event
- * fields that will be defined for the event should be passed in as an
- * array of struct synth_field_desc, and the number of elements in the
- * array passed in as n_fields. Field ordering will retain the
- * ordering given in the fields array.
- *
- * See synth_field_size() for available types. If field_name contains
- * [n] the field is considered to be an array.
- *
- * Return: 0 if successful, error otherwise.
- */
-int synth_event_add_fields(struct dynevent_cmd *cmd,
- struct synth_field_desc *fields,
- unsigned int n_fields)
-{
- unsigned int i;
- int ret = 0;
-
- for (i = 0; i < n_fields; i++) {
- if (fields[i].type == NULL || fields[i].name == NULL) {
- ret = -EINVAL;
- break;
- }
-
- ret = synth_event_add_field(cmd, fields[i].type, fields[i].name);
- if (ret)
- break;
- }
-
- return ret;
-}
-EXPORT_SYMBOL_GPL(synth_event_add_fields);
-
-/**
- * __synth_event_gen_cmd_start - Start a synthetic event command from arg list
- * @cmd: A pointer to the dynevent_cmd struct representing the new event
- * @name: The name of the synthetic event
- * @mod: The module creating the event, NULL if not created from a module
- * @args: Variable number of arg (pairs), one pair for each field
- *
- * NOTE: Users normally won't want to call this function directly, but
- * rather use the synth_event_gen_cmd_start() wrapper, which
- * automatically adds a NULL to the end of the arg list. If this
- * function is used directly, make sure the last arg in the variable
- * arg list is NULL.
- *
- * Generate a synthetic event command to be executed by
- * synth_event_gen_cmd_end(). This function can be used to generate
- * the complete command or only the first part of it; in the latter
- * case, synth_event_add_field(), synth_event_add_field_str(), or
- * synth_event_add_fields() can be used to add more fields following
- * this.
- *
- * There should be an even number variable args, each pair consisting
- * of a type followed by a field name.
- *
- * See synth_field_size() for available types. If field_name contains
- * [n] the field is considered to be an array.
- *
- * Return: 0 if successful, error otherwise.
- */
-int __synth_event_gen_cmd_start(struct dynevent_cmd *cmd, const char *name,
- struct module *mod, ...)
-{
- struct dynevent_arg arg;
- va_list args;
- int ret;
-
- cmd->event_name = name;
- cmd->private_data = mod;
-
- if (cmd->type != DYNEVENT_TYPE_SYNTH)
- return -EINVAL;
-
- dynevent_arg_init(&arg, 0);
- arg.str = name;
- ret = dynevent_arg_add(cmd, &arg, NULL);
- if (ret)
- return ret;
-
- va_start(args, mod);
- for (;;) {
- const char *type, *name;
-
- type = va_arg(args, const char *);
- if (!type)
- break;
- name = va_arg(args, const char *);
- if (!name)
- break;
-
- if (++cmd->n_fields > SYNTH_FIELDS_MAX) {
- ret = -EINVAL;
- break;
- }
-
- ret = synth_event_add_field(cmd, type, name);
- if (ret)
- break;
- }
- va_end(args);
-
- return ret;
-}
-EXPORT_SYMBOL_GPL(__synth_event_gen_cmd_start);
-
-/**
- * synth_event_gen_cmd_array_start - Start synthetic event command from an array
- * @cmd: A pointer to the dynevent_cmd struct representing the new event
- * @name: The name of the synthetic event
- * @fields: An array of type/name field descriptions
- * @n_fields: The number of field descriptions contained in the fields array
- *
- * Generate a synthetic event command to be executed by
- * synth_event_gen_cmd_end(). This function can be used to generate
- * the complete command or only the first part of it; in the latter
- * case, synth_event_add_field(), synth_event_add_field_str(), or
- * synth_event_add_fields() can be used to add more fields following
- * this.
- *
- * The event fields that will be defined for the event should be
- * passed in as an array of struct synth_field_desc, and the number of
- * elements in the array passed in as n_fields. Field ordering will
- * retain the ordering given in the fields array.
- *
- * See synth_field_size() for available types. If field_name contains
- * [n] the field is considered to be an array.
- *
- * Return: 0 if successful, error otherwise.
- */
-int synth_event_gen_cmd_array_start(struct dynevent_cmd *cmd, const char *name,
- struct module *mod,
- struct synth_field_desc *fields,
- unsigned int n_fields)
-{
- struct dynevent_arg arg;
- unsigned int i;
- int ret = 0;
-
- cmd->event_name = name;
- cmd->private_data = mod;
-
- if (cmd->type != DYNEVENT_TYPE_SYNTH)
- return -EINVAL;
-
- if (n_fields > SYNTH_FIELDS_MAX)
- return -EINVAL;
-
- dynevent_arg_init(&arg, 0);
- arg.str = name;
- ret = dynevent_arg_add(cmd, &arg, NULL);
- if (ret)
- return ret;
-
- for (i = 0; i < n_fields; i++) {
- if (fields[i].type == NULL || fields[i].name == NULL)
- return -EINVAL;
-
- ret = synth_event_add_field(cmd, fields[i].type, fields[i].name);
- if (ret)
- break;
- }
-
- return ret;
-}
-EXPORT_SYMBOL_GPL(synth_event_gen_cmd_array_start);
-
-static int __create_synth_event(int argc, const char *name, const char **argv)
-{
- struct synth_field *field, *fields[SYNTH_FIELDS_MAX];
- struct synth_event *event = NULL;
- int i, consumed = 0, n_fields = 0, ret = 0;
-
- /*
- * Argument syntax:
- * - Add synthetic event: <event_name> field[;field] ...
- * - Remove synthetic event: !<event_name> field[;field] ...
- * where 'field' = type field_name
- */
-
- if (name[0] == '\0' || argc < 1)
- return -EINVAL;
-
- mutex_lock(&event_mutex);
-
- event = find_synth_event(name);
- if (event) {
- ret = -EEXIST;
- goto out;
- }
-
- for (i = 0; i < argc - 1; i++) {
- if (strcmp(argv[i], ";") == 0)
- continue;
- if (n_fields == SYNTH_FIELDS_MAX) {
- ret = -EINVAL;
- goto err;
- }
-
- field = parse_synth_field(argc - i, &argv[i], &consumed);
- if (IS_ERR(field)) {
- ret = PTR_ERR(field);
- goto err;
- }
- fields[n_fields++] = field;
- i += consumed - 1;
- }
-
- if (i < argc && strcmp(argv[i], ";") != 0) {
- ret = -EINVAL;
- goto err;
- }
-
- event = alloc_synth_event(name, n_fields, fields);
- if (IS_ERR(event)) {
- ret = PTR_ERR(event);
- event = NULL;
- goto err;
- }
- ret = register_synth_event(event);
- if (!ret)
- dyn_event_add(&event->devent);
- else
- free_synth_event(event);
- out:
- mutex_unlock(&event_mutex);
-
- return ret;
- err:
- for (i = 0; i < n_fields; i++)
- free_synth_field(fields[i]);
-
- goto out;
-}
-
-/**
- * synth_event_create - Create a new synthetic event
- * @name: The name of the new sythetic event
- * @fields: An array of type/name field descriptions
- * @n_fields: The number of field descriptions contained in the fields array
- * @mod: The module creating the event, NULL if not created from a module
- *
- * Create a new synthetic event with the given name under the
- * trace/events/synthetic/ directory. The event fields that will be
- * defined for the event should be passed in as an array of struct
- * synth_field_desc, and the number elements in the array passed in as
- * n_fields. Field ordering will retain the ordering given in the
- * fields array.
- *
- * If the new synthetic event is being created from a module, the mod
- * param must be non-NULL. This will ensure that the trace buffer
- * won't contain unreadable events.
- *
- * The new synth event should be deleted using synth_event_delete()
- * function. The new synthetic event can be generated from modules or
- * other kernel code using trace_synth_event() and related functions.
- *
- * Return: 0 if successful, error otherwise.
- */
-int synth_event_create(const char *name, struct synth_field_desc *fields,
- unsigned int n_fields, struct module *mod)
-{
- struct dynevent_cmd cmd;
- char *buf;
- int ret;
-
- buf = kzalloc(MAX_DYNEVENT_CMD_LEN, GFP_KERNEL);
- if (!buf)
- return -ENOMEM;
-
- synth_event_cmd_init(&cmd, buf, MAX_DYNEVENT_CMD_LEN);
-
- ret = synth_event_gen_cmd_array_start(&cmd, name, mod,
- fields, n_fields);
- if (ret)
- goto out;
-
- ret = synth_event_gen_cmd_end(&cmd);
- out:
- kfree(buf);
-
- return ret;
-}
-EXPORT_SYMBOL_GPL(synth_event_create);
-
-static int destroy_synth_event(struct synth_event *se)
-{
- int ret;
-
- if (se->ref)
- ret = -EBUSY;
- else {
- ret = unregister_synth_event(se);
- if (!ret) {
- dyn_event_remove(&se->devent);
- free_synth_event(se);
- }
- }
-
- return ret;
-}
-
-/**
- * synth_event_delete - Delete a synthetic event
- * @event_name: The name of the new sythetic event
- *
- * Delete a synthetic event that was created with synth_event_create().
- *
- * Return: 0 if successful, error otherwise.
- */
-int synth_event_delete(const char *event_name)
-{
- struct synth_event *se = NULL;
- struct module *mod = NULL;
- int ret = -ENOENT;
-
- mutex_lock(&event_mutex);
- se = find_synth_event(event_name);
- if (se) {
- mod = se->mod;
- ret = destroy_synth_event(se);
- }
- mutex_unlock(&event_mutex);
-
- if (mod) {
- mutex_lock(&trace_types_lock);
- /*
- * It is safest to reset the ring buffer if the module
- * being unloaded registered any events that were
- * used. The only worry is if a new module gets
- * loaded, and takes on the same id as the events of
- * this module. When printing out the buffer, traced
- * events left over from this module may be passed to
- * the new module events and unexpected results may
- * occur.
- */
- tracing_reset_all_online_cpus();
- mutex_unlock(&trace_types_lock);
- }
-
- return ret;
-}
-EXPORT_SYMBOL_GPL(synth_event_delete);
-
-static int create_or_delete_synth_event(int argc, char **argv)
-{
- const char *name = argv[0];
- int ret;
-
- /* trace_run_command() ensures argc != 0 */
- if (name[0] == '!') {
- ret = synth_event_delete(name + 1);
- return ret;
- }
-
- ret = __create_synth_event(argc - 1, name, (const char **)argv + 1);
- return ret == -ECANCELED ? -EINVAL : ret;
-}
-
-static int synth_event_run_command(struct dynevent_cmd *cmd)
-{
- struct synth_event *se;
- int ret;
-
- ret = trace_run_command(cmd->seq.buffer, create_or_delete_synth_event);
- if (ret)
- return ret;
-
- se = find_synth_event(cmd->event_name);
- if (WARN_ON(!se))
- return -ENOENT;
-
- se->mod = cmd->private_data;
-
- return ret;
-}
-
-/**
- * synth_event_cmd_init - Initialize a synthetic event command object
- * @cmd: A pointer to the dynevent_cmd struct representing the new event
- * @buf: A pointer to the buffer used to build the command
- * @maxlen: The length of the buffer passed in @buf
- *
- * Initialize a synthetic event command object. Use this before
- * calling any of the other dyenvent_cmd functions.
- */
-void synth_event_cmd_init(struct dynevent_cmd *cmd, char *buf, int maxlen)
-{
- dynevent_cmd_init(cmd, buf, maxlen, DYNEVENT_TYPE_SYNTH,
- synth_event_run_command);
-}
-EXPORT_SYMBOL_GPL(synth_event_cmd_init);
-
-static inline int
-__synth_event_trace_start(struct trace_event_file *file,
- struct synth_event_trace_state *trace_state)
-{
- int entry_size, fields_size = 0;
- int ret = 0;
-
- memset(trace_state, '\0', sizeof(*trace_state));
-
- /*
- * Normal event tracing doesn't get called at all unless the
- * ENABLED bit is set (which attaches the probe thus allowing
- * this code to be called, etc). Because this is called
- * directly by the user, we don't have that but we still need
- * to honor not logging when disabled. For the the iterated
- * trace case, we save the enabed state upon start and just
- * ignore the following data calls.
- */
- if (!(file->flags & EVENT_FILE_FL_ENABLED) ||
- trace_trigger_soft_disabled(file)) {
- trace_state->disabled = true;
- ret = -ENOENT;
- goto out;
- }
-
- trace_state->event = file->event_call->data;
-
- fields_size = trace_state->event->n_u64 * sizeof(u64);
-
- /*
- * Avoid ring buffer recursion detection, as this event
- * is being performed within another event.
- */
- trace_state->buffer = file->tr->array_buffer.buffer;
- ring_buffer_nest_start(trace_state->buffer);
-
- entry_size = sizeof(*trace_state->entry) + fields_size;
- trace_state->entry = trace_event_buffer_reserve(&trace_state->fbuffer,
- file,
- entry_size);
- if (!trace_state->entry) {
- ring_buffer_nest_end(trace_state->buffer);
- ret = -EINVAL;
- }
-out:
- return ret;
-}
-
-static inline void
-__synth_event_trace_end(struct synth_event_trace_state *trace_state)
-{
- trace_event_buffer_commit(&trace_state->fbuffer);
-
- ring_buffer_nest_end(trace_state->buffer);
-}
-
-/**
- * synth_event_trace - Trace a synthetic event
- * @file: The trace_event_file representing the synthetic event
- * @n_vals: The number of values in vals
- * @args: Variable number of args containing the event values
- *
- * Trace a synthetic event using the values passed in the variable
- * argument list.
- *
- * The argument list should be a list 'n_vals' u64 values. The number
- * of vals must match the number of field in the synthetic event, and
- * must be in the same order as the synthetic event fields.
- *
- * All vals should be cast to u64, and string vals are just pointers
- * to strings, cast to u64. Strings will be copied into space
- * reserved in the event for the string, using these pointers.
- *
- * Return: 0 on success, err otherwise.
- */
-int synth_event_trace(struct trace_event_file *file, unsigned int n_vals, ...)
-{
- struct synth_event_trace_state state;
- unsigned int i, n_u64;
- va_list args;
- int ret;
-
- ret = __synth_event_trace_start(file, &state);
- if (ret) {
- if (ret == -ENOENT)
- ret = 0; /* just disabled, not really an error */
- return ret;
- }
-
- if (n_vals != state.event->n_fields) {
- ret = -EINVAL;
- goto out;
- }
-
- va_start(args, n_vals);
- for (i = 0, n_u64 = 0; i < state.event->n_fields; i++) {
- u64 val;
-
- val = va_arg(args, u64);
-
- if (state.event->fields[i]->is_string) {
- char *str_val = (char *)(long)val;
- char *str_field = (char *)&state.entry->fields[n_u64];
-
- strscpy(str_field, str_val, STR_VAR_LEN_MAX);
- n_u64 += STR_VAR_LEN_MAX / sizeof(u64);
- } else {
- struct synth_field *field = state.event->fields[i];
-
- switch (field->size) {
- case 1:
- *(u8 *)&state.entry->fields[n_u64] = (u8)val;
- break;
-
- case 2:
- *(u16 *)&state.entry->fields[n_u64] = (u16)val;
- break;
-
- case 4:
- *(u32 *)&state.entry->fields[n_u64] = (u32)val;
- break;
-
- default:
- state.entry->fields[n_u64] = val;
- break;
- }
- n_u64++;
- }
- }
- va_end(args);
-out:
- __synth_event_trace_end(&state);
-
- return ret;
-}
-EXPORT_SYMBOL_GPL(synth_event_trace);
-
-/**
- * synth_event_trace_array - Trace a synthetic event from an array
- * @file: The trace_event_file representing the synthetic event
- * @vals: Array of values
- * @n_vals: The number of values in vals
- *
- * Trace a synthetic event using the values passed in as 'vals'.
- *
- * The 'vals' array is just an array of 'n_vals' u64. The number of
- * vals must match the number of field in the synthetic event, and
- * must be in the same order as the synthetic event fields.
- *
- * All vals should be cast to u64, and string vals are just pointers
- * to strings, cast to u64. Strings will be copied into space
- * reserved in the event for the string, using these pointers.
- *
- * Return: 0 on success, err otherwise.
- */
-int synth_event_trace_array(struct trace_event_file *file, u64 *vals,
- unsigned int n_vals)
-{
- struct synth_event_trace_state state;
- unsigned int i, n_u64;
- int ret;
-
- ret = __synth_event_trace_start(file, &state);
- if (ret) {
- if (ret == -ENOENT)
- ret = 0; /* just disabled, not really an error */
- return ret;
- }
-
- if (n_vals != state.event->n_fields) {
- ret = -EINVAL;
- goto out;
- }
-
- for (i = 0, n_u64 = 0; i < state.event->n_fields; i++) {
- if (state.event->fields[i]->is_string) {
- char *str_val = (char *)(long)vals[i];
- char *str_field = (char *)&state.entry->fields[n_u64];
-
- strscpy(str_field, str_val, STR_VAR_LEN_MAX);
- n_u64 += STR_VAR_LEN_MAX / sizeof(u64);
- } else {
- struct synth_field *field = state.event->fields[i];
- u64 val = vals[i];
-
- switch (field->size) {
- case 1:
- *(u8 *)&state.entry->fields[n_u64] = (u8)val;
- break;
-
- case 2:
- *(u16 *)&state.entry->fields[n_u64] = (u16)val;
- break;
-
- case 4:
- *(u32 *)&state.entry->fields[n_u64] = (u32)val;
- break;
-
- default:
- state.entry->fields[n_u64] = val;
- break;
- }
- n_u64++;
- }
- }
-out:
- __synth_event_trace_end(&state);
-
- return ret;
-}
-EXPORT_SYMBOL_GPL(synth_event_trace_array);
-
-/**
- * synth_event_trace_start - Start piecewise synthetic event trace
- * @file: The trace_event_file representing the synthetic event
- * @trace_state: A pointer to object tracking the piecewise trace state
- *
- * Start the trace of a synthetic event field-by-field rather than all
- * at once.
- *
- * This function 'opens' an event trace, which means space is reserved
- * for the event in the trace buffer, after which the event's
- * individual field values can be set through either
- * synth_event_add_next_val() or synth_event_add_val().
- *
- * A pointer to a trace_state object is passed in, which will keep
- * track of the current event trace state until the event trace is
- * closed (and the event finally traced) using
- * synth_event_trace_end().
- *
- * Note that synth_event_trace_end() must be called after all values
- * have been added for each event trace, regardless of whether adding
- * all field values succeeded or not.
- *
- * Note also that for a given event trace, all fields must be added
- * using either synth_event_add_next_val() or synth_event_add_val()
- * but not both together or interleaved.
- *
- * Return: 0 on success, err otherwise.
- */
-int synth_event_trace_start(struct trace_event_file *file,
- struct synth_event_trace_state *trace_state)
-{
- int ret;
-
- if (!trace_state)
- return -EINVAL;
-
- ret = __synth_event_trace_start(file, trace_state);
- if (ret == -ENOENT)
- ret = 0; /* just disabled, not really an error */
-
- return ret;
-}
-EXPORT_SYMBOL_GPL(synth_event_trace_start);
-
-static int __synth_event_add_val(const char *field_name, u64 val,
- struct synth_event_trace_state *trace_state)
-{
- struct synth_field *field = NULL;
- struct synth_trace_event *entry;
- struct synth_event *event;
- int i, ret = 0;
-
- if (!trace_state) {
- ret = -EINVAL;
- goto out;
- }
-
- /* can't mix add_next_synth_val() with add_synth_val() */
- if (field_name) {
- if (trace_state->add_next) {
- ret = -EINVAL;
- goto out;
- }
- trace_state->add_name = true;
- } else {
- if (trace_state->add_name) {
- ret = -EINVAL;
- goto out;
- }
- trace_state->add_next = true;
- }
-
- if (trace_state->disabled)
- goto out;
-
- event = trace_state->event;
- if (trace_state->add_name) {
- for (i = 0; i < event->n_fields; i++) {
- field = event->fields[i];
- if (strcmp(field->name, field_name) == 0)
- break;
- }
- if (!field) {
- ret = -EINVAL;
- goto out;
- }
- } else {
- if (trace_state->cur_field >= event->n_fields) {
- ret = -EINVAL;
- goto out;
- }
- field = event->fields[trace_state->cur_field++];
- }
-
- entry = trace_state->entry;
- if (field->is_string) {
- char *str_val = (char *)(long)val;
- char *str_field;
-
- if (!str_val) {
- ret = -EINVAL;
- goto out;
- }
-
- str_field = (char *)&entry->fields[field->offset];
- strscpy(str_field, str_val, STR_VAR_LEN_MAX);
- } else {
- switch (field->size) {
- case 1:
- *(u8 *)&trace_state->entry->fields[field->offset] = (u8)val;
- break;
-
- case 2:
- *(u16 *)&trace_state->entry->fields[field->offset] = (u16)val;
- break;
-
- case 4:
- *(u32 *)&trace_state->entry->fields[field->offset] = (u32)val;
- break;
-
- default:
- trace_state->entry->fields[field->offset] = val;
- break;
- }
- }
- out:
- return ret;
-}
-
-/**
- * synth_event_add_next_val - Add the next field's value to an open synth trace
- * @val: The value to set the next field to
- * @trace_state: A pointer to object tracking the piecewise trace state
- *
- * Set the value of the next field in an event that's been opened by
- * synth_event_trace_start().
- *
- * The val param should be the value cast to u64. If the value points
- * to a string, the val param should be a char * cast to u64.
- *
- * This function assumes all the fields in an event are to be set one
- * after another - successive calls to this function are made, one for
- * each field, in the order of the fields in the event, until all
- * fields have been set. If you'd rather set each field individually
- * without regard to ordering, synth_event_add_val() can be used
- * instead.
- *
- * Note however that synth_event_add_next_val() and
- * synth_event_add_val() can't be intermixed for a given event trace -
- * one or the other but not both can be used at the same time.
- *
- * Note also that synth_event_trace_end() must be called after all
- * values have been added for each event trace, regardless of whether
- * adding all field values succeeded or not.
- *
- * Return: 0 on success, err otherwise.
- */
-int synth_event_add_next_val(u64 val,
- struct synth_event_trace_state *trace_state)
-{
- return __synth_event_add_val(NULL, val, trace_state);
-}
-EXPORT_SYMBOL_GPL(synth_event_add_next_val);
-
-/**
- * synth_event_add_val - Add a named field's value to an open synth trace
- * @field_name: The name of the synthetic event field value to set
- * @val: The value to set the next field to
- * @trace_state: A pointer to object tracking the piecewise trace state
- *
- * Set the value of the named field in an event that's been opened by
- * synth_event_trace_start().
- *
- * The val param should be the value cast to u64. If the value points
- * to a string, the val param should be a char * cast to u64.
- *
- * This function looks up the field name, and if found, sets the field
- * to the specified value. This lookup makes this function more
- * expensive than synth_event_add_next_val(), so use that or the
- * none-piecewise synth_event_trace() instead if efficiency is more
- * important.
- *
- * Note however that synth_event_add_next_val() and
- * synth_event_add_val() can't be intermixed for a given event trace -
- * one or the other but not both can be used at the same time.
- *
- * Note also that synth_event_trace_end() must be called after all
- * values have been added for each event trace, regardless of whether
- * adding all field values succeeded or not.
- *
- * Return: 0 on success, err otherwise.
- */
-int synth_event_add_val(const char *field_name, u64 val,
- struct synth_event_trace_state *trace_state)
-{
- return __synth_event_add_val(field_name, val, trace_state);
-}
-EXPORT_SYMBOL_GPL(synth_event_add_val);
-
-/**
- * synth_event_trace_end - End piecewise synthetic event trace
- * @trace_state: A pointer to object tracking the piecewise trace state
- *
- * End the trace of a synthetic event opened by
- * synth_event_trace__start().
- *
- * This function 'closes' an event trace, which basically means that
- * it commits the reserved event and cleans up other loose ends.
- *
- * A pointer to a trace_state object is passed in, which will keep
- * track of the current event trace state opened with
- * synth_event_trace_start().
- *
- * Note that this function must be called after all values have been
- * added for each event trace, regardless of whether adding all field
- * values succeeded or not.
- *
- * Return: 0 on success, err otherwise.
- */
-int synth_event_trace_end(struct synth_event_trace_state *trace_state)
-{
- if (!trace_state)
- return -EINVAL;
-
- __synth_event_trace_end(trace_state);
-
- return 0;
-}
-EXPORT_SYMBOL_GPL(synth_event_trace_end);
-
-static int create_synth_event(int argc, const char **argv)
-{
- const char *name = argv[0];
- int len;
-
- if (name[0] != 's' || name[1] != ':')
- return -ECANCELED;
- name += 2;
-
- /* This interface accepts group name prefix */
- if (strchr(name, '/')) {
- len = str_has_prefix(name, SYNTH_SYSTEM "/");
- if (len == 0)
- return -EINVAL;
- name += len;
- }
- return __create_synth_event(argc - 1, name, argv + 1);
-}
-
-static int synth_event_release(struct dyn_event *ev)
-{
- struct synth_event *event = to_synth_event(ev);
- int ret;
-
- if (event->ref)
- return -EBUSY;
-
- ret = unregister_synth_event(event);
- if (ret)
- return ret;
-
- dyn_event_remove(ev);
- free_synth_event(event);
- return 0;
-}
-
-static int __synth_event_show(struct seq_file *m, struct synth_event *event)
-{
- struct synth_field *field;
- unsigned int i;
-
- seq_printf(m, "%s\t", event->name);
-
- for (i = 0; i < event->n_fields; i++) {
- field = event->fields[i];
-
- /* parameter values */
- seq_printf(m, "%s %s%s", field->type, field->name,
- i == event->n_fields - 1 ? "" : "; ");
- }
-
- seq_putc(m, '\n');
-
- return 0;
-}
-
-static int synth_event_show(struct seq_file *m, struct dyn_event *ev)
-{
- struct synth_event *event = to_synth_event(ev);
-
- seq_printf(m, "s:%s/", event->class.system);
-
- return __synth_event_show(m, event);
-}
-
-static int synth_events_seq_show(struct seq_file *m, void *v)
-{
- struct dyn_event *ev = v;
-
- if (!is_synth_event(ev))
- return 0;
-
- return __synth_event_show(m, to_synth_event(ev));
-}
-
-static const struct seq_operations synth_events_seq_op = {
- .start = dyn_event_seq_start,
- .next = dyn_event_seq_next,
- .stop = dyn_event_seq_stop,
- .show = synth_events_seq_show,
-};
-
-static int synth_events_open(struct inode *inode, struct file *file)
-{
- int ret;
-
- ret = security_locked_down(LOCKDOWN_TRACEFS);
- if (ret)
- return ret;
-
- if ((file->f_mode & FMODE_WRITE) && (file->f_flags & O_TRUNC)) {
- ret = dyn_events_release_all(&synth_event_ops);
- if (ret < 0)
- return ret;
- }
-
- return seq_open(file, &synth_events_seq_op);
-}
-
-static ssize_t synth_events_write(struct file *file,
- const char __user *buffer,
- size_t count, loff_t *ppos)
-{
- return trace_parse_run_command(file, buffer, count, ppos,
- create_or_delete_synth_event);
-}
-
-static const struct file_operations synth_events_fops = {
- .open = synth_events_open,
- .write = synth_events_write,
- .read = seq_read,
- .llseek = seq_lseek,
- .release = seq_release,
-};
-
static u64 hist_field_timestamp(struct hist_field *hist_field,
struct tracing_map_elt *elt,
struct ring_buffer_event *rbe,
@@ -6491,6 +4729,279 @@ const struct file_operations event_hist_fops = {
.release = single_release,
};
+#ifdef CONFIG_HIST_TRIGGERS_DEBUG
+static void hist_field_debug_show_flags(struct seq_file *m,
+ unsigned long flags)
+{
+ seq_puts(m, " flags:\n");
+
+ if (flags & HIST_FIELD_FL_KEY)
+ seq_puts(m, " HIST_FIELD_FL_KEY\n");
+ else if (flags & HIST_FIELD_FL_HITCOUNT)
+ seq_puts(m, " VAL: HIST_FIELD_FL_HITCOUNT\n");
+ else if (flags & HIST_FIELD_FL_VAR)
+ seq_puts(m, " HIST_FIELD_FL_VAR\n");
+ else if (flags & HIST_FIELD_FL_VAR_REF)
+ seq_puts(m, " HIST_FIELD_FL_VAR_REF\n");
+ else
+ seq_puts(m, " VAL: normal u64 value\n");
+
+ if (flags & HIST_FIELD_FL_ALIAS)
+ seq_puts(m, " HIST_FIELD_FL_ALIAS\n");
+}
+
+static int hist_field_debug_show(struct seq_file *m,
+ struct hist_field *field, unsigned long flags)
+{
+ if ((field->flags & flags) != flags) {
+ seq_printf(m, "ERROR: bad flags - %lx\n", flags);
+ return -EINVAL;
+ }
+
+ hist_field_debug_show_flags(m, field->flags);
+ if (field->field)
+ seq_printf(m, " ftrace_event_field name: %s\n",
+ field->field->name);
+
+ if (field->flags & HIST_FIELD_FL_VAR) {
+ seq_printf(m, " var.name: %s\n", field->var.name);
+ seq_printf(m, " var.idx (into tracing_map_elt.vars[]): %u\n",
+ field->var.idx);
+ }
+
+ if (field->flags & HIST_FIELD_FL_ALIAS)
+ seq_printf(m, " var_ref_idx (into hist_data->var_refs[]): %u\n",
+ field->var_ref_idx);
+
+ if (field->flags & HIST_FIELD_FL_VAR_REF) {
+ seq_printf(m, " name: %s\n", field->name);
+ seq_printf(m, " var.idx (into tracing_map_elt.vars[]): %u\n",
+ field->var.idx);
+ seq_printf(m, " var.hist_data: %p\n", field->var.hist_data);
+ seq_printf(m, " var_ref_idx (into hist_data->var_refs[]): %u\n",
+ field->var_ref_idx);
+ if (field->system)
+ seq_printf(m, " system: %s\n", field->system);
+ if (field->event_name)
+ seq_printf(m, " event_name: %s\n", field->event_name);
+ }
+
+ seq_printf(m, " type: %s\n", field->type);
+ seq_printf(m, " size: %u\n", field->size);
+ seq_printf(m, " is_signed: %u\n", field->is_signed);
+
+ return 0;
+}
+
+static int field_var_debug_show(struct seq_file *m,
+ struct field_var *field_var, unsigned int i,
+ bool save_vars)
+{
+ const char *vars_name = save_vars ? "save_vars" : "field_vars";
+ struct hist_field *field;
+ int ret = 0;
+
+ seq_printf(m, "\n hist_data->%s[%d]:\n", vars_name, i);
+
+ field = field_var->var;
+
+ seq_printf(m, "\n %s[%d].var:\n", vars_name, i);
+
+ hist_field_debug_show_flags(m, field->flags);
+ seq_printf(m, " var.name: %s\n", field->var.name);
+ seq_printf(m, " var.idx (into tracing_map_elt.vars[]): %u\n",
+ field->var.idx);
+
+ field = field_var->val;
+
+ seq_printf(m, "\n %s[%d].val:\n", vars_name, i);
+ if (field->field)
+ seq_printf(m, " ftrace_event_field name: %s\n",
+ field->field->name);
+ else {
+ ret = -EINVAL;
+ goto out;
+ }
+
+ seq_printf(m, " type: %s\n", field->type);
+ seq_printf(m, " size: %u\n", field->size);
+ seq_printf(m, " is_signed: %u\n", field->is_signed);
+out:
+ return ret;
+}
+
+static int hist_action_debug_show(struct seq_file *m,
+ struct action_data *data, int i)
+{
+ int ret = 0;
+
+ if (data->handler == HANDLER_ONMAX ||
+ data->handler == HANDLER_ONCHANGE) {
+ seq_printf(m, "\n hist_data->actions[%d].track_data.var_ref:\n", i);
+ ret = hist_field_debug_show(m, data->track_data.var_ref,
+ HIST_FIELD_FL_VAR_REF);
+ if (ret)
+ goto out;
+
+ seq_printf(m, "\n hist_data->actions[%d].track_data.track_var:\n", i);
+ ret = hist_field_debug_show(m, data->track_data.track_var,
+ HIST_FIELD_FL_VAR);
+ if (ret)
+ goto out;
+ }
+
+ if (data->handler == HANDLER_ONMATCH) {
+ seq_printf(m, "\n hist_data->actions[%d].match_data.event_system: %s\n",
+ i, data->match_data.event_system);
+ seq_printf(m, " hist_data->actions[%d].match_data.event: %s\n",
+ i, data->match_data.event);
+ }
+out:
+ return ret;
+}
+
+static int hist_actions_debug_show(struct seq_file *m,
+ struct hist_trigger_data *hist_data)
+{
+ int i, ret = 0;
+
+ if (hist_data->n_actions)
+ seq_puts(m, "\n action tracking variables (for onmax()/onchange()/onmatch()):\n");
+
+ for (i = 0; i < hist_data->n_actions; i++) {
+ struct action_data *action = hist_data->actions[i];
+
+ ret = hist_action_debug_show(m, action, i);
+ if (ret)
+ goto out;
+ }
+
+ if (hist_data->n_save_vars)
+ seq_puts(m, "\n save action variables (save() params):\n");
+
+ for (i = 0; i < hist_data->n_save_vars; i++) {
+ ret = field_var_debug_show(m, hist_data->save_vars[i], i, true);
+ if (ret)
+ goto out;
+ }
+out:
+ return ret;
+}
+
+static void hist_trigger_debug_show(struct seq_file *m,
+ struct event_trigger_data *data, int n)
+{
+ struct hist_trigger_data *hist_data;
+ int i, ret;
+
+ if (n > 0)
+ seq_puts(m, "\n\n");
+
+ seq_puts(m, "# event histogram\n#\n# trigger info: ");
+ data->ops->print(m, data->ops, data);
+ seq_puts(m, "#\n\n");
+
+ hist_data = data->private_data;
+
+ seq_printf(m, "hist_data: %p\n\n", hist_data);
+ seq_printf(m, " n_vals: %u\n", hist_data->n_vals);
+ seq_printf(m, " n_keys: %u\n", hist_data->n_keys);
+ seq_printf(m, " n_fields: %u\n", hist_data->n_fields);
+
+ seq_puts(m, "\n val fields:\n\n");
+
+ seq_puts(m, " hist_data->fields[0]:\n");
+ ret = hist_field_debug_show(m, hist_data->fields[0],
+ HIST_FIELD_FL_HITCOUNT);
+ if (ret)
+ return;
+
+ for (i = 1; i < hist_data->n_vals; i++) {
+ seq_printf(m, "\n hist_data->fields[%d]:\n", i);
+ ret = hist_field_debug_show(m, hist_data->fields[i], 0);
+ if (ret)
+ return;
+ }
+
+ seq_puts(m, "\n key fields:\n");
+
+ for (i = hist_data->n_vals; i < hist_data->n_fields; i++) {
+ seq_printf(m, "\n hist_data->fields[%d]:\n", i);
+ ret = hist_field_debug_show(m, hist_data->fields[i],
+ HIST_FIELD_FL_KEY);
+ if (ret)
+ return;
+ }
+
+ if (hist_data->n_var_refs)
+ seq_puts(m, "\n variable reference fields:\n");
+
+ for (i = 0; i < hist_data->n_var_refs; i++) {
+ seq_printf(m, "\n hist_data->var_refs[%d]:\n", i);
+ ret = hist_field_debug_show(m, hist_data->var_refs[i],
+ HIST_FIELD_FL_VAR_REF);
+ if (ret)
+ return;
+ }
+
+ if (hist_data->n_field_vars)
+ seq_puts(m, "\n field variables:\n");
+
+ for (i = 0; i < hist_data->n_field_vars; i++) {
+ ret = field_var_debug_show(m, hist_data->field_vars[i], i, false);
+ if (ret)
+ return;
+ }
+
+ ret = hist_actions_debug_show(m, hist_data);
+ if (ret)
+ return;
+}
+
+static int hist_debug_show(struct seq_file *m, void *v)
+{
+ struct event_trigger_data *data;
+ struct trace_event_file *event_file;
+ int n = 0, ret = 0;
+
+ mutex_lock(&event_mutex);
+
+ event_file = event_file_data(m->private);
+ if (unlikely(!event_file)) {
+ ret = -ENODEV;
+ goto out_unlock;
+ }
+
+ list_for_each_entry(data, &event_file->triggers, list) {
+ if (data->cmd_ops->trigger_type == ETT_EVENT_HIST)
+ hist_trigger_debug_show(m, data, n++);
+ }
+
+ out_unlock:
+ mutex_unlock(&event_mutex);
+
+ return ret;
+}
+
+static int event_hist_debug_open(struct inode *inode, struct file *file)
+{
+ int ret;
+
+ ret = security_locked_down(LOCKDOWN_TRACEFS);
+ if (ret)
+ return ret;
+
+ return single_open(file, hist_debug_show, file);
+}
+
+const struct file_operations event_hist_debug_fops = {
+ .open = event_hist_debug_open,
+ .read = seq_read,
+ .llseek = seq_lseek,
+ .release = single_release,
+};
+#endif
+
static void hist_field_print(struct seq_file *m, struct hist_field *hist_field)
{
const char *field_name = hist_field_name(hist_field, 0);
@@ -7393,37 +5904,3 @@ __init int register_trigger_hist_enable_disable_cmds(void)
return ret;
}
-
-static __init int trace_events_hist_init(void)
-{
- struct dentry *entry = NULL;
- struct dentry *d_tracer;
- int err = 0;
-
- err = dyn_event_register(&synth_event_ops);
- if (err) {
- pr_warn("Could not register synth_event_ops\n");
- return err;
- }
-
- d_tracer = tracing_init_dentry();
- if (IS_ERR(d_tracer)) {
- err = PTR_ERR(d_tracer);
- goto err;
- }
-
- entry = tracefs_create_file("synthetic_events", 0644, d_tracer,
- NULL, &synth_events_fops);
- if (!entry) {
- err = -ENODEV;
- goto err;
- }
-
- return err;
- err:
- pr_warn("Could not create tracefs 'synthetic_events' entry\n");
-
- return err;
-}
-
-fs_initcall(trace_events_hist_init);
diff --git a/kernel/trace/trace_events_synth.c b/kernel/trace/trace_events_synth.c
new file mode 100644
index 00000000000000..c6cca0d1d5840a
--- /dev/null
+++ b/kernel/trace/trace_events_synth.c
@@ -0,0 +1,1789 @@
+// SPDX-License-Identifier: GPL-2.0
+/*
+ * trace_events_synth - synthetic trace events
+ *
+ * Copyright (C) 2015, 2020 Tom Zanussi <tom.zanussi@linux.intel.com>
+ */
+
+#include <linux/module.h>
+#include <linux/kallsyms.h>
+#include <linux/security.h>
+#include <linux/mutex.h>
+#include <linux/slab.h>
+#include <linux/stacktrace.h>
+#include <linux/rculist.h>
+#include <linux/tracefs.h>
+
+/* for gfp flag names */
+#include <linux/trace_events.h>
+#include <trace/events/mmflags.h>
+
+#include "trace_synth.h"
+
+static int create_synth_event(int argc, const char **argv);
+static int synth_event_show(struct seq_file *m, struct dyn_event *ev);
+static int synth_event_release(struct dyn_event *ev);
+static bool synth_event_is_busy(struct dyn_event *ev);
+static bool synth_event_match(const char *system, const char *event,
+ int argc, const char **argv, struct dyn_event *ev);
+
+static struct dyn_event_operations synth_event_ops = {
+ .create = create_synth_event,
+ .show = synth_event_show,
+ .is_busy = synth_event_is_busy,
+ .free = synth_event_release,
+ .match = synth_event_match,
+};
+
+static bool is_synth_event(struct dyn_event *ev)
+{
+ return ev->ops == &synth_event_ops;
+}
+
+static struct synth_event *to_synth_event(struct dyn_event *ev)
+{
+ return container_of(ev, struct synth_event, devent);
+}
+
+static bool synth_event_is_busy(struct dyn_event *ev)
+{
+ struct synth_event *event = to_synth_event(ev);
+
+ return event->ref != 0;
+}
+
+static bool synth_event_match(const char *system, const char *event,
+ int argc, const char **argv, struct dyn_event *ev)
+{
+ struct synth_event *sev = to_synth_event(ev);
+
+ return strcmp(sev->name, event) == 0 &&
+ (!system || strcmp(system, SYNTH_SYSTEM) == 0);
+}
+
+struct synth_trace_event {
+ struct trace_entry ent;
+ u64 fields[];
+};
+
+static int synth_event_define_fields(struct trace_event_call *call)
+{
+ struct synth_trace_event trace;
+ int offset = offsetof(typeof(trace), fields);
+ struct synth_event *event = call->data;
+ unsigned int i, size, n_u64;
+ char *name, *type;
+ bool is_signed;
+ int ret = 0;
+
+ for (i = 0, n_u64 = 0; i < event->n_fields; i++) {
+ size = event->fields[i]->size;
+ is_signed = event->fields[i]->is_signed;
+ type = event->fields[i]->type;
+ name = event->fields[i]->name;
+ ret = trace_define_field(call, type, name, offset, size,
+ is_signed, FILTER_OTHER);
+ if (ret)
+ break;
+
+ event->fields[i]->offset = n_u64;
+
+ if (event->fields[i]->is_string) {
+ offset += STR_VAR_LEN_MAX;
+ n_u64 += STR_VAR_LEN_MAX / sizeof(u64);
+ } else {
+ offset += sizeof(u64);
+ n_u64++;
+ }
+ }
+
+ event->n_u64 = n_u64;
+
+ return ret;
+}
+
+static bool synth_field_signed(char *type)
+{
+ if (str_has_prefix(type, "u"))
+ return false;
+ if (strcmp(type, "gfp_t") == 0)
+ return false;
+
+ return true;
+}
+
+static int synth_field_is_string(char *type)
+{
+ if (strstr(type, "char[") != NULL)
+ return true;
+
+ return false;
+}
+
+static int synth_field_string_size(char *type)
+{
+ char buf[4], *end, *start;
+ unsigned int len;
+ int size, err;
+
+ start = strstr(type, "char[");
+ if (start == NULL)
+ return -EINVAL;
+ start += sizeof("char[") - 1;
+
+ end = strchr(type, ']');
+ if (!end || end < start)
+ return -EINVAL;
+
+ len = end - start;
+ if (len > 3)
+ return -EINVAL;
+
+ strncpy(buf, start, len);
+ buf[len] = '\0';
+
+ err = kstrtouint(buf, 0, &size);
+ if (err)
+ return err;
+
+ if (size > STR_VAR_LEN_MAX)
+ return -EINVAL;
+
+ return size;
+}
+
+static int synth_field_size(char *type)
+{
+ int size = 0;
+
+ if (strcmp(type, "s64") == 0)
+ size = sizeof(s64);
+ else if (strcmp(type, "u64") == 0)
+ size = sizeof(u64);
+ else if (strcmp(type, "s32") == 0)
+ size = sizeof(s32);
+ else if (strcmp(type, "u32") == 0)
+ size = sizeof(u32);
+ else if (strcmp(type, "s16") == 0)
+ size = sizeof(s16);
+ else if (strcmp(type, "u16") == 0)
+ size = sizeof(u16);
+ else if (strcmp(type, "s8") == 0)
+ size = sizeof(s8);
+ else if (strcmp(type, "u8") == 0)
+ size = sizeof(u8);
+ else if (strcmp(type, "char") == 0)
+ size = sizeof(char);
+ else if (strcmp(type, "unsigned char") == 0)
+ size = sizeof(unsigned char);
+ else if (strcmp(type, "int") == 0)
+ size = sizeof(int);
+ else if (strcmp(type, "unsigned int") == 0)
+ size = sizeof(unsigned int);
+ else if (strcmp(type, "long") == 0)
+ size = sizeof(long);
+ else if (strcmp(type, "unsigned long") == 0)
+ size = sizeof(unsigned long);
+ else if (strcmp(type, "pid_t") == 0)
+ size = sizeof(pid_t);
+ else if (strcmp(type, "gfp_t") == 0)
+ size = sizeof(gfp_t);
+ else if (synth_field_is_string(type))
+ size = synth_field_string_size(type);
+
+ return size;
+}
+
+static const char *synth_field_fmt(char *type)
+{
+ const char *fmt = "%llu";
+
+ if (strcmp(type, "s64") == 0)
+ fmt = "%lld";
+ else if (strcmp(type, "u64") == 0)
+ fmt = "%llu";
+ else if (strcmp(type, "s32") == 0)
+ fmt = "%d";
+ else if (strcmp(type, "u32") == 0)
+ fmt = "%u";
+ else if (strcmp(type, "s16") == 0)
+ fmt = "%d";
+ else if (strcmp(type, "u16") == 0)
+ fmt = "%u";
+ else if (strcmp(type, "s8") == 0)
+ fmt = "%d";
+ else if (strcmp(type, "u8") == 0)
+ fmt = "%u";
+ else if (strcmp(type, "char") == 0)
+ fmt = "%d";
+ else if (strcmp(type, "unsigned char") == 0)
+ fmt = "%u";
+ else if (strcmp(type, "int") == 0)
+ fmt = "%d";
+ else if (strcmp(type, "unsigned int") == 0)
+ fmt = "%u";
+ else if (strcmp(type, "long") == 0)
+ fmt = "%ld";
+ else if (strcmp(type, "unsigned long") == 0)
+ fmt = "%lu";
+ else if (strcmp(type, "pid_t") == 0)
+ fmt = "%d";
+ else if (strcmp(type, "gfp_t") == 0)
+ fmt = "%x";
+ else if (synth_field_is_string(type))
+ fmt = "%s";
+
+ return fmt;
+}
+
+static void print_synth_event_num_val(struct trace_seq *s,
+ char *print_fmt, char *name,
+ int size, u64 val, char *space)
+{
+ switch (size) {
+ case 1:
+ trace_seq_printf(s, print_fmt, name, (u8)val, space);
+ break;
+
+ case 2:
+ trace_seq_printf(s, print_fmt, name, (u16)val, space);
+ break;
+
+ case 4:
+ trace_seq_printf(s, print_fmt, name, (u32)val, space);
+ break;
+
+ default:
+ trace_seq_printf(s, print_fmt, name, val, space);
+ break;
+ }
+}
+
+static enum print_line_t print_synth_event(struct trace_iterator *iter,
+ int flags,
+ struct trace_event *event)
+{
+ struct trace_array *tr = iter->tr;
+ struct trace_seq *s = &iter->seq;
+ struct synth_trace_event *entry;
+ struct synth_event *se;
+ unsigned int i, n_u64;
+ char print_fmt[32];
+ const char *fmt;
+
+ entry = (struct synth_trace_event *)iter->ent;
+ se = container_of(event, struct synth_event, call.event);
+
+ trace_seq_printf(s, "%s: ", se->name);
+
+ for (i = 0, n_u64 = 0; i < se->n_fields; i++) {
+ if (trace_seq_has_overflowed(s))
+ goto end;
+
+ fmt = synth_field_fmt(se->fields[i]->type);
+
+ /* parameter types */
+ if (tr && tr->trace_flags & TRACE_ITER_VERBOSE)
+ trace_seq_printf(s, "%s ", fmt);
+
+ snprintf(print_fmt, sizeof(print_fmt), "%%s=%s%%s", fmt);
+
+ /* parameter values */
+ if (se->fields[i]->is_string) {
+ trace_seq_printf(s, print_fmt, se->fields[i]->name,
+ (char *)&entry->fields[n_u64],
+ i == se->n_fields - 1 ? "" : " ");
+ n_u64 += STR_VAR_LEN_MAX / sizeof(u64);
+ } else {
+ struct trace_print_flags __flags[] = {
+ __def_gfpflag_names, {-1, NULL} };
+ char *space = (i == se->n_fields - 1 ? "" : " ");
+
+ print_synth_event_num_val(s, print_fmt,
+ se->fields[i]->name,
+ se->fields[i]->size,
+ entry->fields[n_u64],
+ space);
+
+ if (strcmp(se->fields[i]->type, "gfp_t") == 0) {
+ trace_seq_puts(s, " (");
+ trace_print_flags_seq(s, "|",
+ entry->fields[n_u64],
+ __flags);
+ trace_seq_putc(s, ')');
+ }
+ n_u64++;
+ }
+ }
+end:
+ trace_seq_putc(s, '\n');
+
+ return trace_handle_return(s);
+}
+
+static struct trace_event_functions synth_event_funcs = {
+ .trace = print_synth_event
+};
+
+static notrace void trace_event_raw_event_synth(void *__data,
+ u64 *var_ref_vals,
+ unsigned int *var_ref_idx)
+{
+ struct trace_event_file *trace_file = __data;
+ struct synth_trace_event *entry;
+ struct trace_event_buffer fbuffer;
+ struct trace_buffer *buffer;
+ struct synth_event *event;
+ unsigned int i, n_u64, val_idx;
+ int fields_size = 0;
+
+ event = trace_file->event_call->data;
+
+ if (trace_trigger_soft_disabled(trace_file))
+ return;
+
+ fields_size = event->n_u64 * sizeof(u64);
+
+ /*
+ * Avoid ring buffer recursion detection, as this event
+ * is being performed within another event.
+ */
+ buffer = trace_file->tr->array_buffer.buffer;
+ ring_buffer_nest_start(buffer);
+
+ entry = trace_event_buffer_reserve(&fbuffer, trace_file,
+ sizeof(*entry) + fields_size);
+ if (!entry)
+ goto out;
+
+ for (i = 0, n_u64 = 0; i < event->n_fields; i++) {
+ val_idx = var_ref_idx[i];
+ if (event->fields[i]->is_string) {
+ char *str_val = (char *)(long)var_ref_vals[val_idx];
+ char *str_field = (char *)&entry->fields[n_u64];
+
+ strscpy(str_field, str_val, STR_VAR_LEN_MAX);
+ n_u64 += STR_VAR_LEN_MAX / sizeof(u64);
+ } else {
+ struct synth_field *field = event->fields[i];
+ u64 val = var_ref_vals[val_idx];
+
+ switch (field->size) {
+ case 1:
+ *(u8 *)&entry->fields[n_u64] = (u8)val;
+ break;
+
+ case 2:
+ *(u16 *)&entry->fields[n_u64] = (u16)val;
+ break;
+
+ case 4:
+ *(u32 *)&entry->fields[n_u64] = (u32)val;
+ break;
+
+ default:
+ entry->fields[n_u64] = val;
+ break;
+ }
+ n_u64++;
+ }
+ }
+
+ trace_event_buffer_commit(&fbuffer);
+out:
+ ring_buffer_nest_end(buffer);
+}
+
+static void free_synth_event_print_fmt(struct trace_event_call *call)
+{
+ if (call) {
+ kfree(call->print_fmt);
+ call->print_fmt = NULL;
+ }
+}
+
+static int __set_synth_event_print_fmt(struct synth_event *event,
+ char *buf, int len)
+{
+ const char *fmt;
+ int pos = 0;
+ int i;
+
+ /* When len=0, we just calculate the needed length */
+#define LEN_OR_ZERO (len ? len - pos : 0)
+
+ pos += snprintf(buf + pos, LEN_OR_ZERO, "\"");
+ for (i = 0; i < event->n_fields; i++) {
+ fmt = synth_field_fmt(event->fields[i]->type);
+ pos += snprintf(buf + pos, LEN_OR_ZERO, "%s=%s%s",
+ event->fields[i]->name, fmt,
+ i == event->n_fields - 1 ? "" : ", ");
+ }
+ pos += snprintf(buf + pos, LEN_OR_ZERO, "\"");
+
+ for (i = 0; i < event->n_fields; i++) {
+ pos += snprintf(buf + pos, LEN_OR_ZERO,
+ ", REC->%s", event->fields[i]->name);
+ }
+
+#undef LEN_OR_ZERO
+
+ /* return the length of print_fmt */
+ return pos;
+}
+
+static int set_synth_event_print_fmt(struct trace_event_call *call)
+{
+ struct synth_event *event = call->data;
+ char *print_fmt;
+ int len;
+
+ /* First: called with 0 length to calculate the needed length */
+ len = __set_synth_event_print_fmt(event, NULL, 0);
+
+ print_fmt = kmalloc(len + 1, GFP_KERNEL);
+ if (!print_fmt)
+ return -ENOMEM;
+
+ /* Second: actually write the @print_fmt */
+ __set_synth_event_print_fmt(event, print_fmt, len + 1);
+ call->print_fmt = print_fmt;
+
+ return 0;
+}
+
+static void free_synth_field(struct synth_field *field)
+{
+ kfree(field->type);
+ kfree(field->name);
+ kfree(field);
+}
+
+static struct synth_field *parse_synth_field(int argc, const char **argv,
+ int *consumed)
+{
+ struct synth_field *field;
+ const char *prefix = NULL, *field_type = argv[0], *field_name, *array;
+ int len, ret = 0;
+
+ if (field_type[0] == ';')
+ field_type++;
+
+ if (!strcmp(field_type, "unsigned")) {
+ if (argc < 3)
+ return ERR_PTR(-EINVAL);
+ prefix = "unsigned ";
+ field_type = argv[1];
+ field_name = argv[2];
+ *consumed = 3;
+ } else {
+ field_name = argv[1];
+ *consumed = 2;
+ }
+
+ field = kzalloc(sizeof(*field), GFP_KERNEL);
+ if (!field)
+ return ERR_PTR(-ENOMEM);
+
+ len = strlen(field_name);
+ array = strchr(field_name, '[');
+ if (array)
+ len -= strlen(array);
+ else if (field_name[len - 1] == ';')
+ len--;
+
+ field->name = kmemdup_nul(field_name, len, GFP_KERNEL);
+ if (!field->name) {
+ ret = -ENOMEM;
+ goto free;
+ }
+
+ if (field_type[0] == ';')
+ field_type++;
+ len = strlen(field_type) + 1;
+ if (array)
+ len += strlen(array);
+ if (prefix)
+ len += strlen(prefix);
+
+ field->type = kzalloc(len, GFP_KERNEL);
+ if (!field->type) {
+ ret = -ENOMEM;
+ goto free;
+ }
+ if (prefix)
+ strcat(field->type, prefix);
+ strcat(field->type, field_type);
+ if (array) {
+ strcat(field->type, array);
+ if (field->type[len - 1] == ';')
+ field->type[len - 1] = '\0';
+ }
+
+ field->size = synth_field_size(field->type);
+ if (!field->size) {
+ ret = -EINVAL;
+ goto free;
+ }
+
+ if (synth_field_is_string(field->type))
+ field->is_string = true;
+
+ field->is_signed = synth_field_signed(field->type);
+
+ out:
+ return field;
+ free:
+ free_synth_field(field);
+ field = ERR_PTR(ret);
+ goto out;
+}
+
+static void free_synth_tracepoint(struct tracepoint *tp)
+{
+ if (!tp)
+ return;
+
+ kfree(tp->name);
+ kfree(tp);
+}
+
+static struct tracepoint *alloc_synth_tracepoint(char *name)
+{
+ struct tracepoint *tp;
+
+ tp = kzalloc(sizeof(*tp), GFP_KERNEL);
+ if (!tp)
+ return ERR_PTR(-ENOMEM);
+
+ tp->name = kstrdup(name, GFP_KERNEL);
+ if (!tp->name) {
+ kfree(tp);
+ return ERR_PTR(-ENOMEM);
+ }
+
+ return tp;
+}
+
+struct synth_event *find_synth_event(const char *name)
+{
+ struct dyn_event *pos;
+ struct synth_event *event;
+
+ for_each_dyn_event(pos) {
+ if (!is_synth_event(pos))
+ continue;
+ event = to_synth_event(pos);
+ if (strcmp(event->name, name) == 0)
+ return event;
+ }
+
+ return NULL;
+}
+
+static struct trace_event_fields synth_event_fields_array[] = {
+ { .type = TRACE_FUNCTION_TYPE,
+ .define_fields = synth_event_define_fields },
+ {}
+};
+
+static int register_synth_event(struct synth_event *event)
+{
+ struct trace_event_call *call = &event->call;
+ int ret = 0;
+
+ event->call.class = &event->class;
+ event->class.system = kstrdup(SYNTH_SYSTEM, GFP_KERNEL);
+ if (!event->class.system) {
+ ret = -ENOMEM;
+ goto out;
+ }
+
+ event->tp = alloc_synth_tracepoint(event->name);
+ if (IS_ERR(event->tp)) {
+ ret = PTR_ERR(event->tp);
+ event->tp = NULL;
+ goto out;
+ }
+
+ INIT_LIST_HEAD(&call->class->fields);
+ call->event.funcs = &synth_event_funcs;
+ call->class->fields_array = synth_event_fields_array;
+
+ ret = register_trace_event(&call->event);
+ if (!ret) {
+ ret = -ENODEV;
+ goto out;
+ }
+ call->flags = TRACE_EVENT_FL_TRACEPOINT;
+ call->class->reg = trace_event_reg;
+ call->class->probe = trace_event_raw_event_synth;
+ call->data = event;
+ call->tp = event->tp;
+
+ ret = trace_add_event_call(call);
+ if (ret) {
+ pr_warn("Failed to register synthetic event: %s\n",
+ trace_event_name(call));
+ goto err;
+ }
+
+ ret = set_synth_event_print_fmt(call);
+ if (ret < 0) {
+ trace_remove_event_call(call);
+ goto err;
+ }
+ out:
+ return ret;
+ err:
+ unregister_trace_event(&call->event);
+ goto out;
+}
+
+static int unregister_synth_event(struct synth_event *event)
+{
+ struct trace_event_call *call = &event->call;
+ int ret;
+
+ ret = trace_remove_event_call(call);
+
+ return ret;
+}
+
+static void free_synth_event(struct synth_event *event)
+{
+ unsigned int i;
+
+ if (!event)
+ return;
+
+ for (i = 0; i < event->n_fields; i++)
+ free_synth_field(event->fields[i]);
+
+ kfree(event->fields);
+ kfree(event->name);
+ kfree(event->class.system);
+ free_synth_tracepoint(event->tp);
+ free_synth_event_print_fmt(&event->call);
+ kfree(event);
+}
+
+static struct synth_event *alloc_synth_event(const char *name, int n_fields,
+ struct synth_field **fields)
+{
+ struct synth_event *event;
+ unsigned int i;
+
+ event = kzalloc(sizeof(*event), GFP_KERNEL);
+ if (!event) {
+ event = ERR_PTR(-ENOMEM);
+ goto out;
+ }
+
+ event->name = kstrdup(name, GFP_KERNEL);
+ if (!event->name) {
+ kfree(event);
+ event = ERR_PTR(-ENOMEM);
+ goto out;
+ }
+
+ event->fields = kcalloc(n_fields, sizeof(*event->fields), GFP_KERNEL);
+ if (!event->fields) {
+ free_synth_event(event);
+ event = ERR_PTR(-ENOMEM);
+ goto out;
+ }
+
+ dyn_event_init(&event->devent, &synth_event_ops);
+
+ for (i = 0; i < n_fields; i++)
+ event->fields[i] = fields[i];
+
+ event->n_fields = n_fields;
+ out:
+ return event;
+}
+
+static int synth_event_check_arg_fn(void *data)
+{
+ struct dynevent_arg_pair *arg_pair = data;
+ int size;
+
+ size = synth_field_size((char *)arg_pair->lhs);
+
+ return size ? 0 : -EINVAL;
+}
+
+/**
+ * synth_event_add_field - Add a new field to a synthetic event cmd
+ * @cmd: A pointer to the dynevent_cmd struct representing the new event
+ * @type: The type of the new field to add
+ * @name: The name of the new field to add
+ *
+ * Add a new field to a synthetic event cmd object. Field ordering is in
+ * the same order the fields are added.
+ *
+ * See synth_field_size() for available types. If field_name contains
+ * [n] the field is considered to be an array.
+ *
+ * Return: 0 if successful, error otherwise.
+ */
+int synth_event_add_field(struct dynevent_cmd *cmd, const char *type,
+ const char *name)
+{
+ struct dynevent_arg_pair arg_pair;
+ int ret;
+
+ if (cmd->type != DYNEVENT_TYPE_SYNTH)
+ return -EINVAL;
+
+ if (!type || !name)
+ return -EINVAL;
+
+ dynevent_arg_pair_init(&arg_pair, 0, ';');
+
+ arg_pair.lhs = type;
+ arg_pair.rhs = name;
+
+ ret = dynevent_arg_pair_add(cmd, &arg_pair, synth_event_check_arg_fn);
+ if (ret)
+ return ret;
+
+ if (++cmd->n_fields > SYNTH_FIELDS_MAX)
+ ret = -EINVAL;
+
+ return ret;
+}
+EXPORT_SYMBOL_GPL(synth_event_add_field);
+
+/**
+ * synth_event_add_field_str - Add a new field to a synthetic event cmd
+ * @cmd: A pointer to the dynevent_cmd struct representing the new event
+ * @type_name: The type and name of the new field to add, as a single string
+ *
+ * Add a new field to a synthetic event cmd object, as a single
+ * string. The @type_name string is expected to be of the form 'type
+ * name', which will be appended by ';'. No sanity checking is done -
+ * what's passed in is assumed to already be well-formed. Field
+ * ordering is in the same order the fields are added.
+ *
+ * See synth_field_size() for available types. If field_name contains
+ * [n] the field is considered to be an array.
+ *
+ * Return: 0 if successful, error otherwise.
+ */
+int synth_event_add_field_str(struct dynevent_cmd *cmd, const char *type_name)
+{
+ struct dynevent_arg arg;
+ int ret;
+
+ if (cmd->type != DYNEVENT_TYPE_SYNTH)
+ return -EINVAL;
+
+ if (!type_name)
+ return -EINVAL;
+
+ dynevent_arg_init(&arg, ';');
+
+ arg.str = type_name;
+
+ ret = dynevent_arg_add(cmd, &arg, NULL);
+ if (ret)
+ return ret;
+
+ if (++cmd->n_fields > SYNTH_FIELDS_MAX)
+ ret = -EINVAL;
+
+ return ret;
+}
+EXPORT_SYMBOL_GPL(synth_event_add_field_str);
+
+/**
+ * synth_event_add_fields - Add multiple fields to a synthetic event cmd
+ * @cmd: A pointer to the dynevent_cmd struct representing the new event
+ * @fields: An array of type/name field descriptions
+ * @n_fields: The number of field descriptions contained in the fields array
+ *
+ * Add a new set of fields to a synthetic event cmd object. The event
+ * fields that will be defined for the event should be passed in as an
+ * array of struct synth_field_desc, and the number of elements in the
+ * array passed in as n_fields. Field ordering will retain the
+ * ordering given in the fields array.
+ *
+ * See synth_field_size() for available types. If field_name contains
+ * [n] the field is considered to be an array.
+ *
+ * Return: 0 if successful, error otherwise.
+ */
+int synth_event_add_fields(struct dynevent_cmd *cmd,
+ struct synth_field_desc *fields,
+ unsigned int n_fields)
+{
+ unsigned int i;
+ int ret = 0;
+
+ for (i = 0; i < n_fields; i++) {
+ if (fields[i].type == NULL || fields[i].name == NULL) {
+ ret = -EINVAL;
+ break;
+ }
+
+ ret = synth_event_add_field(cmd, fields[i].type, fields[i].name);
+ if (ret)
+ break;
+ }
+
+ return ret;
+}
+EXPORT_SYMBOL_GPL(synth_event_add_fields);
+
+/**
+ * __synth_event_gen_cmd_start - Start a synthetic event command from arg list
+ * @cmd: A pointer to the dynevent_cmd struct representing the new event
+ * @name: The name of the synthetic event
+ * @mod: The module creating the event, NULL if not created from a module
+ * @args: Variable number of arg (pairs), one pair for each field
+ *
+ * NOTE: Users normally won't want to call this function directly, but
+ * rather use the synth_event_gen_cmd_start() wrapper, which
+ * automatically adds a NULL to the end of the arg list. If this
+ * function is used directly, make sure the last arg in the variable
+ * arg list is NULL.
+ *
+ * Generate a synthetic event command to be executed by
+ * synth_event_gen_cmd_end(). This function can be used to generate
+ * the complete command or only the first part of it; in the latter
+ * case, synth_event_add_field(), synth_event_add_field_str(), or
+ * synth_event_add_fields() can be used to add more fields following
+ * this.
+ *
+ * There should be an even number variable args, each pair consisting
+ * of a type followed by a field name.
+ *
+ * See synth_field_size() for available types. If field_name contains
+ * [n] the field is considered to be an array.
+ *
+ * Return: 0 if successful, error otherwise.
+ */
+int __synth_event_gen_cmd_start(struct dynevent_cmd *cmd, const char *name,
+ struct module *mod, ...)
+{
+ struct dynevent_arg arg;
+ va_list args;
+ int ret;
+
+ cmd->event_name = name;
+ cmd->private_data = mod;
+
+ if (cmd->type != DYNEVENT_TYPE_SYNTH)
+ return -EINVAL;
+
+ dynevent_arg_init(&arg, 0);
+ arg.str = name;
+ ret = dynevent_arg_add(cmd, &arg, NULL);
+ if (ret)
+ return ret;
+
+ va_start(args, mod);
+ for (;;) {
+ const char *type, *name;
+
+ type = va_arg(args, const char *);
+ if (!type)
+ break;
+ name = va_arg(args, const char *);
+ if (!name)
+ break;
+
+ if (++cmd->n_fields > SYNTH_FIELDS_MAX) {
+ ret = -EINVAL;
+ break;
+ }
+
+ ret = synth_event_add_field(cmd, type, name);
+ if (ret)
+ break;
+ }
+ va_end(args);
+
+ return ret;
+}
+EXPORT_SYMBOL_GPL(__synth_event_gen_cmd_start);
+
+/**
+ * synth_event_gen_cmd_array_start - Start synthetic event command from an array
+ * @cmd: A pointer to the dynevent_cmd struct representing the new event
+ * @name: The name of the synthetic event
+ * @fields: An array of type/name field descriptions
+ * @n_fields: The number of field descriptions contained in the fields array
+ *
+ * Generate a synthetic event command to be executed by
+ * synth_event_gen_cmd_end(). This function can be used to generate
+ * the complete command or only the first part of it; in the latter
+ * case, synth_event_add_field(), synth_event_add_field_str(), or
+ * synth_event_add_fields() can be used to add more fields following
+ * this.
+ *
+ * The event fields that will be defined for the event should be
+ * passed in as an array of struct synth_field_desc, and the number of
+ * elements in the array passed in as n_fields. Field ordering will
+ * retain the ordering given in the fields array.
+ *
+ * See synth_field_size() for available types. If field_name contains
+ * [n] the field is considered to be an array.
+ *
+ * Return: 0 if successful, error otherwise.
+ */
+int synth_event_gen_cmd_array_start(struct dynevent_cmd *cmd, const char *name,
+ struct module *mod,
+ struct synth_field_desc *fields,
+ unsigned int n_fields)
+{
+ struct dynevent_arg arg;
+ unsigned int i;
+ int ret = 0;
+
+ cmd->event_name = name;
+ cmd->private_data = mod;
+
+ if (cmd->type != DYNEVENT_TYPE_SYNTH)
+ return -EINVAL;
+
+ if (n_fields > SYNTH_FIELDS_MAX)
+ return -EINVAL;
+
+ dynevent_arg_init(&arg, 0);
+ arg.str = name;
+ ret = dynevent_arg_add(cmd, &arg, NULL);
+ if (ret)
+ return ret;
+
+ for (i = 0; i < n_fields; i++) {
+ if (fields[i].type == NULL || fields[i].name == NULL)
+ return -EINVAL;
+
+ ret = synth_event_add_field(cmd, fields[i].type, fields[i].name);
+ if (ret)
+ break;
+ }
+
+ return ret;
+}
+EXPORT_SYMBOL_GPL(synth_event_gen_cmd_array_start);
+
+static int __create_synth_event(int argc, const char *name, const char **argv)
+{
+ struct synth_field *field, *fields[SYNTH_FIELDS_MAX];
+ struct synth_event *event = NULL;
+ int i, consumed = 0, n_fields = 0, ret = 0;
+
+ /*
+ * Argument syntax:
+ * - Add synthetic event: <event_name> field[;field] ...
+ * - Remove synthetic event: !<event_name> field[;field] ...
+ * where 'field' = type field_name
+ */
+
+ if (name[0] == '\0' || argc < 1)
+ return -EINVAL;
+
+ mutex_lock(&event_mutex);
+
+ event = find_synth_event(name);
+ if (event) {
+ ret = -EEXIST;
+ goto out;
+ }
+
+ for (i = 0; i < argc - 1; i++) {
+ if (strcmp(argv[i], ";") == 0)
+ continue;
+ if (n_fields == SYNTH_FIELDS_MAX) {
+ ret = -EINVAL;
+ goto err;
+ }
+
+ field = parse_synth_field(argc - i, &argv[i], &consumed);
+ if (IS_ERR(field)) {
+ ret = PTR_ERR(field);
+ goto err;
+ }
+ fields[n_fields++] = field;
+ i += consumed - 1;
+ }
+
+ if (i < argc && strcmp(argv[i], ";") != 0) {
+ ret = -EINVAL;
+ goto err;
+ }
+
+ event = alloc_synth_event(name, n_fields, fields);
+ if (IS_ERR(event)) {
+ ret = PTR_ERR(event);
+ event = NULL;
+ goto err;
+ }
+ ret = register_synth_event(event);
+ if (!ret)
+ dyn_event_add(&event->devent);
+ else
+ free_synth_event(event);
+ out:
+ mutex_unlock(&event_mutex);
+
+ return ret;
+ err:
+ for (i = 0; i < n_fields; i++)
+ free_synth_field(fields[i]);
+
+ goto out;
+}
+
+/**
+ * synth_event_create - Create a new synthetic event
+ * @name: The name of the new sythetic event
+ * @fields: An array of type/name field descriptions
+ * @n_fields: The number of field descriptions contained in the fields array
+ * @mod: The module creating the event, NULL if not created from a module
+ *
+ * Create a new synthetic event with the given name under the
+ * trace/events/synthetic/ directory. The event fields that will be
+ * defined for the event should be passed in as an array of struct
+ * synth_field_desc, and the number elements in the array passed in as
+ * n_fields. Field ordering will retain the ordering given in the
+ * fields array.
+ *
+ * If the new synthetic event is being created from a module, the mod
+ * param must be non-NULL. This will ensure that the trace buffer
+ * won't contain unreadable events.
+ *
+ * The new synth event should be deleted using synth_event_delete()
+ * function. The new synthetic event can be generated from modules or
+ * other kernel code using trace_synth_event() and related functions.
+ *
+ * Return: 0 if successful, error otherwise.
+ */
+int synth_event_create(const char *name, struct synth_field_desc *fields,
+ unsigned int n_fields, struct module *mod)
+{
+ struct dynevent_cmd cmd;
+ char *buf;
+ int ret;
+
+ buf = kzalloc(MAX_DYNEVENT_CMD_LEN, GFP_KERNEL);
+ if (!buf)
+ return -ENOMEM;
+
+ synth_event_cmd_init(&cmd, buf, MAX_DYNEVENT_CMD_LEN);
+
+ ret = synth_event_gen_cmd_array_start(&cmd, name, mod,
+ fields, n_fields);
+ if (ret)
+ goto out;
+
+ ret = synth_event_gen_cmd_end(&cmd);
+ out:
+ kfree(buf);
+
+ return ret;
+}
+EXPORT_SYMBOL_GPL(synth_event_create);
+
+static int destroy_synth_event(struct synth_event *se)
+{
+ int ret;
+
+ if (se->ref)
+ ret = -EBUSY;
+ else {
+ ret = unregister_synth_event(se);
+ if (!ret) {
+ dyn_event_remove(&se->devent);
+ free_synth_event(se);
+ }
+ }
+
+ return ret;
+}
+
+/**
+ * synth_event_delete - Delete a synthetic event
+ * @event_name: The name of the new sythetic event
+ *
+ * Delete a synthetic event that was created with synth_event_create().
+ *
+ * Return: 0 if successful, error otherwise.
+ */
+int synth_event_delete(const char *event_name)
+{
+ struct synth_event *se = NULL;
+ struct module *mod = NULL;
+ int ret = -ENOENT;
+
+ mutex_lock(&event_mutex);
+ se = find_synth_event(event_name);
+ if (se) {
+ mod = se->mod;
+ ret = destroy_synth_event(se);
+ }
+ mutex_unlock(&event_mutex);
+
+ if (mod) {
+ mutex_lock(&trace_types_lock);
+ /*
+ * It is safest to reset the ring buffer if the module
+ * being unloaded registered any events that were
+ * used. The only worry is if a new module gets
+ * loaded, and takes on the same id as the events of
+ * this module. When printing out the buffer, traced
+ * events left over from this module may be passed to
+ * the new module events and unexpected results may
+ * occur.
+ */
+ tracing_reset_all_online_cpus();
+ mutex_unlock(&trace_types_lock);
+ }
+
+ return ret;
+}
+EXPORT_SYMBOL_GPL(synth_event_delete);
+
+static int create_or_delete_synth_event(int argc, char **argv)
+{
+ const char *name = argv[0];
+ int ret;
+
+ /* trace_run_command() ensures argc != 0 */
+ if (name[0] == '!') {
+ ret = synth_event_delete(name + 1);
+ return ret;
+ }
+
+ ret = __create_synth_event(argc - 1, name, (const char **)argv + 1);
+ return ret == -ECANCELED ? -EINVAL : ret;
+}
+
+static int synth_event_run_command(struct dynevent_cmd *cmd)
+{
+ struct synth_event *se;
+ int ret;
+
+ ret = trace_run_command(cmd->seq.buffer, create_or_delete_synth_event);
+ if (ret)
+ return ret;
+
+ se = find_synth_event(cmd->event_name);
+ if (WARN_ON(!se))
+ return -ENOENT;
+
+ se->mod = cmd->private_data;
+
+ return ret;
+}
+
+/**
+ * synth_event_cmd_init - Initialize a synthetic event command object
+ * @cmd: A pointer to the dynevent_cmd struct representing the new event
+ * @buf: A pointer to the buffer used to build the command
+ * @maxlen: The length of the buffer passed in @buf
+ *
+ * Initialize a synthetic event command object. Use this before
+ * calling any of the other dyenvent_cmd functions.
+ */
+void synth_event_cmd_init(struct dynevent_cmd *cmd, char *buf, int maxlen)
+{
+ dynevent_cmd_init(cmd, buf, maxlen, DYNEVENT_TYPE_SYNTH,
+ synth_event_run_command);
+}
+EXPORT_SYMBOL_GPL(synth_event_cmd_init);
+
+static inline int
+__synth_event_trace_start(struct trace_event_file *file,
+ struct synth_event_trace_state *trace_state)
+{
+ int entry_size, fields_size = 0;
+ int ret = 0;
+
+ memset(trace_state, '\0', sizeof(*trace_state));
+
+ /*
+ * Normal event tracing doesn't get called at all unless the
+ * ENABLED bit is set (which attaches the probe thus allowing
+ * this code to be called, etc). Because this is called
+ * directly by the user, we don't have that but we still need
+ * to honor not logging when disabled. For the the iterated
+ * trace case, we save the enabed state upon start and just
+ * ignore the following data calls.
+ */
+ if (!(file->flags & EVENT_FILE_FL_ENABLED) ||
+ trace_trigger_soft_disabled(file)) {
+ trace_state->disabled = true;
+ ret = -ENOENT;
+ goto out;
+ }
+
+ trace_state->event = file->event_call->data;
+
+ fields_size = trace_state->event->n_u64 * sizeof(u64);
+
+ /*
+ * Avoid ring buffer recursion detection, as this event
+ * is being performed within another event.
+ */
+ trace_state->buffer = file->tr->array_buffer.buffer;
+ ring_buffer_nest_start(trace_state->buffer);
+
+ entry_size = sizeof(*trace_state->entry) + fields_size;
+ trace_state->entry = trace_event_buffer_reserve(&trace_state->fbuffer,
+ file,
+ entry_size);
+ if (!trace_state->entry) {
+ ring_buffer_nest_end(trace_state->buffer);
+ ret = -EINVAL;
+ }
+out:
+ return ret;
+}
+
+static inline void
+__synth_event_trace_end(struct synth_event_trace_state *trace_state)
+{
+ trace_event_buffer_commit(&trace_state->fbuffer);
+
+ ring_buffer_nest_end(trace_state->buffer);
+}
+
+/**
+ * synth_event_trace - Trace a synthetic event
+ * @file: The trace_event_file representing the synthetic event
+ * @n_vals: The number of values in vals
+ * @args: Variable number of args containing the event values
+ *
+ * Trace a synthetic event using the values passed in the variable
+ * argument list.
+ *
+ * The argument list should be a list 'n_vals' u64 values. The number
+ * of vals must match the number of field in the synthetic event, and
+ * must be in the same order as the synthetic event fields.
+ *
+ * All vals should be cast to u64, and string vals are just pointers
+ * to strings, cast to u64. Strings will be copied into space
+ * reserved in the event for the string, using these pointers.
+ *
+ * Return: 0 on success, err otherwise.
+ */
+int synth_event_trace(struct trace_event_file *file, unsigned int n_vals, ...)
+{
+ struct synth_event_trace_state state;
+ unsigned int i, n_u64;
+ va_list args;
+ int ret;
+
+ ret = __synth_event_trace_start(file, &state);
+ if (ret) {
+ if (ret == -ENOENT)
+ ret = 0; /* just disabled, not really an error */
+ return ret;
+ }
+
+ if (n_vals != state.event->n_fields) {
+ ret = -EINVAL;
+ goto out;
+ }
+
+ va_start(args, n_vals);
+ for (i = 0, n_u64 = 0; i < state.event->n_fields; i++) {
+ u64 val;
+
+ val = va_arg(args, u64);
+
+ if (state.event->fields[i]->is_string) {
+ char *str_val = (char *)(long)val;
+ char *str_field = (char *)&state.entry->fields[n_u64];
+
+ strscpy(str_field, str_val, STR_VAR_LEN_MAX);
+ n_u64 += STR_VAR_LEN_MAX / sizeof(u64);
+ } else {
+ struct synth_field *field = state.event->fields[i];
+
+ switch (field->size) {
+ case 1:
+ *(u8 *)&state.entry->fields[n_u64] = (u8)val;
+ break;
+
+ case 2:
+ *(u16 *)&state.entry->fields[n_u64] = (u16)val;
+ break;
+
+ case 4:
+ *(u32 *)&state.entry->fields[n_u64] = (u32)val;
+ break;
+
+ default:
+ state.entry->fields[n_u64] = val;
+ break;
+ }
+ n_u64++;
+ }
+ }
+ va_end(args);
+out:
+ __synth_event_trace_end(&state);
+
+ return ret;
+}
+EXPORT_SYMBOL_GPL(synth_event_trace);
+
+/**
+ * synth_event_trace_array - Trace a synthetic event from an array
+ * @file: The trace_event_file representing the synthetic event
+ * @vals: Array of values
+ * @n_vals: The number of values in vals
+ *
+ * Trace a synthetic event using the values passed in as 'vals'.
+ *
+ * The 'vals' array is just an array of 'n_vals' u64. The number of
+ * vals must match the number of field in the synthetic event, and
+ * must be in the same order as the synthetic event fields.
+ *
+ * All vals should be cast to u64, and string vals are just pointers
+ * to strings, cast to u64. Strings will be copied into space
+ * reserved in the event for the string, using these pointers.
+ *
+ * Return: 0 on success, err otherwise.
+ */
+int synth_event_trace_array(struct trace_event_file *file, u64 *vals,
+ unsigned int n_vals)
+{
+ struct synth_event_trace_state state;
+ unsigned int i, n_u64;
+ int ret;
+
+ ret = __synth_event_trace_start(file, &state);
+ if (ret) {
+ if (ret == -ENOENT)
+ ret = 0; /* just disabled, not really an error */
+ return ret;
+ }
+
+ if (n_vals != state.event->n_fields) {
+ ret = -EINVAL;
+ goto out;
+ }
+
+ for (i = 0, n_u64 = 0; i < state.event->n_fields; i++) {
+ if (state.event->fields[i]->is_string) {
+ char *str_val = (char *)(long)vals[i];
+ char *str_field = (char *)&state.entry->fields[n_u64];
+
+ strscpy(str_field, str_val, STR_VAR_LEN_MAX);
+ n_u64 += STR_VAR_LEN_MAX / sizeof(u64);
+ } else {
+ struct synth_field *field = state.event->fields[i];
+ u64 val = vals[i];
+
+ switch (field->size) {
+ case 1:
+ *(u8 *)&state.entry->fields[n_u64] = (u8)val;
+ break;
+
+ case 2:
+ *(u16 *)&state.entry->fields[n_u64] = (u16)val;
+ break;
+
+ case 4:
+ *(u32 *)&state.entry->fields[n_u64] = (u32)val;
+ break;
+
+ default:
+ state.entry->fields[n_u64] = val;
+ break;
+ }
+ n_u64++;
+ }
+ }
+out:
+ __synth_event_trace_end(&state);
+
+ return ret;
+}
+EXPORT_SYMBOL_GPL(synth_event_trace_array);
+
+/**
+ * synth_event_trace_start - Start piecewise synthetic event trace
+ * @file: The trace_event_file representing the synthetic event
+ * @trace_state: A pointer to object tracking the piecewise trace state
+ *
+ * Start the trace of a synthetic event field-by-field rather than all
+ * at once.
+ *
+ * This function 'opens' an event trace, which means space is reserved
+ * for the event in the trace buffer, after which the event's
+ * individual field values can be set through either
+ * synth_event_add_next_val() or synth_event_add_val().
+ *
+ * A pointer to a trace_state object is passed in, which will keep
+ * track of the current event trace state until the event trace is
+ * closed (and the event finally traced) using
+ * synth_event_trace_end().
+ *
+ * Note that synth_event_trace_end() must be called after all values
+ * have been added for each event trace, regardless of whether adding
+ * all field values succeeded or not.
+ *
+ * Note also that for a given event trace, all fields must be added
+ * using either synth_event_add_next_val() or synth_event_add_val()
+ * but not both together or interleaved.
+ *
+ * Return: 0 on success, err otherwise.
+ */
+int synth_event_trace_start(struct trace_event_file *file,
+ struct synth_event_trace_state *trace_state)
+{
+ int ret;
+
+ if (!trace_state)
+ return -EINVAL;
+
+ ret = __synth_event_trace_start(file, trace_state);
+ if (ret == -ENOENT)
+ ret = 0; /* just disabled, not really an error */
+
+ return ret;
+}
+EXPORT_SYMBOL_GPL(synth_event_trace_start);
+
+static int __synth_event_add_val(const char *field_name, u64 val,
+ struct synth_event_trace_state *trace_state)
+{
+ struct synth_field *field = NULL;
+ struct synth_trace_event *entry;
+ struct synth_event *event;
+ int i, ret = 0;
+
+ if (!trace_state) {
+ ret = -EINVAL;
+ goto out;
+ }
+
+ /* can't mix add_next_synth_val() with add_synth_val() */
+ if (field_name) {
+ if (trace_state->add_next) {
+ ret = -EINVAL;
+ goto out;
+ }
+ trace_state->add_name = true;
+ } else {
+ if (trace_state->add_name) {
+ ret = -EINVAL;
+ goto out;
+ }
+ trace_state->add_next = true;
+ }
+
+ if (trace_state->disabled)
+ goto out;
+
+ event = trace_state->event;
+ if (trace_state->add_name) {
+ for (i = 0; i < event->n_fields; i++) {
+ field = event->fields[i];
+ if (strcmp(field->name, field_name) == 0)
+ break;
+ }
+ if (!field) {
+ ret = -EINVAL;
+ goto out;
+ }
+ } else {
+ if (trace_state->cur_field >= event->n_fields) {
+ ret = -EINVAL;
+ goto out;
+ }
+ field = event->fields[trace_state->cur_field++];
+ }
+
+ entry = trace_state->entry;
+ if (field->is_string) {
+ char *str_val = (char *)(long)val;
+ char *str_field;
+
+ if (!str_val) {
+ ret = -EINVAL;
+ goto out;
+ }
+
+ str_field = (char *)&entry->fields[field->offset];
+ strscpy(str_field, str_val, STR_VAR_LEN_MAX);
+ } else {
+ switch (field->size) {
+ case 1:
+ *(u8 *)&trace_state->entry->fields[field->offset] = (u8)val;
+ break;
+
+ case 2:
+ *(u16 *)&trace_state->entry->fields[field->offset] = (u16)val;
+ break;
+
+ case 4:
+ *(u32 *)&trace_state->entry->fields[field->offset] = (u32)val;
+ break;
+
+ default:
+ trace_state->entry->fields[field->offset] = val;
+ break;
+ }
+ }
+ out:
+ return ret;
+}
+
+/**
+ * synth_event_add_next_val - Add the next field's value to an open synth trace
+ * @val: The value to set the next field to
+ * @trace_state: A pointer to object tracking the piecewise trace state
+ *
+ * Set the value of the next field in an event that's been opened by
+ * synth_event_trace_start().
+ *
+ * The val param should be the value cast to u64. If the value points
+ * to a string, the val param should be a char * cast to u64.
+ *
+ * This function assumes all the fields in an event are to be set one
+ * after another - successive calls to this function are made, one for
+ * each field, in the order of the fields in the event, until all
+ * fields have been set. If you'd rather set each field individually
+ * without regard to ordering, synth_event_add_val() can be used
+ * instead.
+ *
+ * Note however that synth_event_add_next_val() and
+ * synth_event_add_val() can't be intermixed for a given event trace -
+ * one or the other but not both can be used at the same time.
+ *
+ * Note also that synth_event_trace_end() must be called after all
+ * values have been added for each event trace, regardless of whether
+ * adding all field values succeeded or not.
+ *
+ * Return: 0 on success, err otherwise.
+ */
+int synth_event_add_next_val(u64 val,
+ struct synth_event_trace_state *trace_state)
+{
+ return __synth_event_add_val(NULL, val, trace_state);
+}
+EXPORT_SYMBOL_GPL(synth_event_add_next_val);
+
+/**
+ * synth_event_add_val - Add a named field's value to an open synth trace
+ * @field_name: The name of the synthetic event field value to set
+ * @val: The value to set the next field to
+ * @trace_state: A pointer to object tracking the piecewise trace state
+ *
+ * Set the value of the named field in an event that's been opened by
+ * synth_event_trace_start().
+ *
+ * The val param should be the value cast to u64. If the value points
+ * to a string, the val param should be a char * cast to u64.
+ *
+ * This function looks up the field name, and if found, sets the field
+ * to the specified value. This lookup makes this function more
+ * expensive than synth_event_add_next_val(), so use that or the
+ * none-piecewise synth_event_trace() instead if efficiency is more
+ * important.
+ *
+ * Note however that synth_event_add_next_val() and
+ * synth_event_add_val() can't be intermixed for a given event trace -
+ * one or the other but not both can be used at the same time.
+ *
+ * Note also that synth_event_trace_end() must be called after all
+ * values have been added for each event trace, regardless of whether
+ * adding all field values succeeded or not.
+ *
+ * Return: 0 on success, err otherwise.
+ */
+int synth_event_add_val(const char *field_name, u64 val,
+ struct synth_event_trace_state *trace_state)
+{
+ return __synth_event_add_val(field_name, val, trace_state);
+}
+EXPORT_SYMBOL_GPL(synth_event_add_val);
+
+/**
+ * synth_event_trace_end - End piecewise synthetic event trace
+ * @trace_state: A pointer to object tracking the piecewise trace state
+ *
+ * End the trace of a synthetic event opened by
+ * synth_event_trace__start().
+ *
+ * This function 'closes' an event trace, which basically means that
+ * it commits the reserved event and cleans up other loose ends.
+ *
+ * A pointer to a trace_state object is passed in, which will keep
+ * track of the current event trace state opened with
+ * synth_event_trace_start().
+ *
+ * Note that this function must be called after all values have been
+ * added for each event trace, regardless of whether adding all field
+ * values succeeded or not.
+ *
+ * Return: 0 on success, err otherwise.
+ */
+int synth_event_trace_end(struct synth_event_trace_state *trace_state)
+{
+ if (!trace_state)
+ return -EINVAL;
+
+ __synth_event_trace_end(trace_state);
+
+ return 0;
+}
+EXPORT_SYMBOL_GPL(synth_event_trace_end);
+
+static int create_synth_event(int argc, const char **argv)
+{
+ const char *name = argv[0];
+ int len;
+
+ if (name[0] != 's' || name[1] != ':')
+ return -ECANCELED;
+ name += 2;
+
+ /* This interface accepts group name prefix */
+ if (strchr(name, '/')) {
+ len = str_has_prefix(name, SYNTH_SYSTEM "/");
+ if (len == 0)
+ return -EINVAL;
+ name += len;
+ }
+ return __create_synth_event(argc - 1, name, argv + 1);
+}
+
+static int synth_event_release(struct dyn_event *ev)
+{
+ struct synth_event *event = to_synth_event(ev);
+ int ret;
+
+ if (event->ref)
+ return -EBUSY;
+
+ ret = unregister_synth_event(event);
+ if (ret)
+ return ret;
+
+ dyn_event_remove(ev);
+ free_synth_event(event);
+ return 0;
+}
+
+static int __synth_event_show(struct seq_file *m, struct synth_event *event)
+{
+ struct synth_field *field;
+ unsigned int i;
+
+ seq_printf(m, "%s\t", event->name);
+
+ for (i = 0; i < event->n_fields; i++) {
+ field = event->fields[i];
+
+ /* parameter values */
+ seq_printf(m, "%s %s%s", field->type, field->name,
+ i == event->n_fields - 1 ? "" : "; ");
+ }
+
+ seq_putc(m, '\n');
+
+ return 0;
+}
+
+static int synth_event_show(struct seq_file *m, struct dyn_event *ev)
+{
+ struct synth_event *event = to_synth_event(ev);
+
+ seq_printf(m, "s:%s/", event->class.system);
+
+ return __synth_event_show(m, event);
+}
+
+static int synth_events_seq_show(struct seq_file *m, void *v)
+{
+ struct dyn_event *ev = v;
+
+ if (!is_synth_event(ev))
+ return 0;
+
+ return __synth_event_show(m, to_synth_event(ev));
+}
+
+static const struct seq_operations synth_events_seq_op = {
+ .start = dyn_event_seq_start,
+ .next = dyn_event_seq_next,
+ .stop = dyn_event_seq_stop,
+ .show = synth_events_seq_show,
+};
+
+static int synth_events_open(struct inode *inode, struct file *file)
+{
+ int ret;
+
+ ret = security_locked_down(LOCKDOWN_TRACEFS);
+ if (ret)
+ return ret;
+
+ if ((file->f_mode & FMODE_WRITE) && (file->f_flags & O_TRUNC)) {
+ ret = dyn_events_release_all(&synth_event_ops);
+ if (ret < 0)
+ return ret;
+ }
+
+ return seq_open(file, &synth_events_seq_op);
+}
+
+static ssize_t synth_events_write(struct file *file,
+ const char __user *buffer,
+ size_t count, loff_t *ppos)
+{
+ return trace_parse_run_command(file, buffer, count, ppos,
+ create_or_delete_synth_event);
+}
+
+static const struct file_operations synth_events_fops = {
+ .open = synth_events_open,
+ .write = synth_events_write,
+ .read = seq_read,
+ .llseek = seq_lseek,
+ .release = seq_release,
+};
+
+static __init int trace_events_synth_init(void)
+{
+ struct dentry *entry = NULL;
+ struct dentry *d_tracer;
+ int err = 0;
+
+ err = dyn_event_register(&synth_event_ops);
+ if (err) {
+ pr_warn("Could not register synth_event_ops\n");
+ return err;
+ }
+
+ d_tracer = tracing_init_dentry();
+ if (IS_ERR(d_tracer)) {
+ err = PTR_ERR(d_tracer);
+ goto err;
+ }
+
+ entry = tracefs_create_file("synthetic_events", 0644, d_tracer,
+ NULL, &synth_events_fops);
+ if (!entry) {
+ err = -ENODEV;
+ goto err;
+ }
+
+ return err;
+ err:
+ pr_warn("Could not create tracefs 'synthetic_events' entry\n");
+
+ return err;
+}
+
+fs_initcall(trace_events_synth_init);
diff --git a/kernel/trace/trace_probe.c b/kernel/trace/trace_probe.c
index ab8b6436d53fd1..b8a928e925c70b 100644
--- a/kernel/trace/trace_probe.c
+++ b/kernel/trace/trace_probe.c
@@ -1006,7 +1006,7 @@ int trace_probe_init(struct trace_probe *tp, const char *event,
INIT_LIST_HEAD(&tp->event->class.fields);
INIT_LIST_HEAD(&tp->event->probes);
INIT_LIST_HEAD(&tp->list);
- list_add(&tp->event->probes, &tp->list);
+ list_add(&tp->list, &tp->event->probes);
call = trace_probe_event_call(tp);
call->class = &tp->event->class;
diff --git a/kernel/trace/trace_synth.h b/kernel/trace/trace_synth.h
new file mode 100644
index 00000000000000..ac35c45207c4f7
--- /dev/null
+++ b/kernel/trace/trace_synth.h
@@ -0,0 +1,36 @@
+// SPDX-License-Identifier: GPL-2.0
+#ifndef __TRACE_SYNTH_H
+#define __TRACE_SYNTH_H
+
+#include "trace_dynevent.h"
+
+#define SYNTH_SYSTEM "synthetic"
+#define SYNTH_FIELDS_MAX 32
+
+#define STR_VAR_LEN_MAX 32 /* must be multiple of sizeof(u64) */
+
+struct synth_field {
+ char *type;
+ char *name;
+ size_t size;
+ unsigned int offset;
+ bool is_signed;
+ bool is_string;
+};
+
+struct synth_event {
+ struct dyn_event devent;
+ int ref;
+ char *name;
+ struct synth_field **fields;
+ unsigned int n_fields;
+ unsigned int n_u64;
+ struct trace_event_class class;
+ struct trace_event_call call;
+ struct tracepoint *tp;
+ struct module *mod;
+};
+
+extern struct synth_event *find_synth_event(const char *name);
+
+#endif /* __TRACE_SYNTH_H */
diff --git a/lib/bug.c b/lib/bug.c
index 8c98af0bf58570..7103440c0ee1af 100644
--- a/lib/bug.c
+++ b/lib/bug.c
@@ -47,6 +47,7 @@
#include <linux/bug.h>
#include <linux/sched.h>
#include <linux/rculist.h>
+#include <linux/ftrace.h>
extern struct bug_entry __start___bug_table[], __stop___bug_table[];
@@ -153,6 +154,8 @@ enum bug_trap_type report_bug(unsigned long bugaddr, struct pt_regs *regs)
if (!bug)
return BUG_TRAP_TYPE_NONE;
+ disable_trace_on_warning();
+
file = NULL;
line = 0;
warning = 0;
diff --git a/tools/bootconfig/test-bootconfig.sh b/tools/bootconfig/test-bootconfig.sh
index 81b350ffd03fb1..eff16b77d5ebe6 100755
--- a/tools/bootconfig/test-bootconfig.sh
+++ b/tools/bootconfig/test-bootconfig.sh
@@ -124,9 +124,16 @@ for i in samples/good-* ; do
xpass $BOOTCONF -a $i $INITRD
done
+
+echo
+echo "=== Summary ==="
+echo "# of Passed: $(expr $NO - $NG - 1)"
+echo "# of Failed: $NG"
+
echo
if [ $NG -eq 0 ]; then
echo "All tests passed"
else
echo "$NG tests failed"
+ exit 1
fi
diff --git a/tools/testing/selftests/ftrace/test.d/trigger/inter-event/trigger-field-variable-support.tc b/tools/testing/selftests/ftrace/test.d/trigger/inter-event/trigger-field-variable-support.tc
index 77be6e1f6e7b34..e232059a8ab200 100644
--- a/tools/testing/selftests/ftrace/test.d/trigger/inter-event/trigger-field-variable-support.tc
+++ b/tools/testing/selftests/ftrace/test.d/trigger/inter-event/trigger-field-variable-support.tc
@@ -17,6 +17,11 @@ if [ ! -f synthetic_events ]; then
exit_unsupported
fi
+if [ ! -f events/sched/sched_process_fork/hist ]; then
+ echo "hist trigger is not supported"
+ exit_unsupported
+fi
+
echo "Test field variable support"
echo 'wakeup_latency u64 lat; pid_t pid; int prio; char comm[16]' > synthetic_events
diff --git a/tools/testing/selftests/ftrace/test.d/trigger/inter-event/trigger-inter-event-combined-hist.tc b/tools/testing/selftests/ftrace/test.d/trigger/inter-event/trigger-inter-event-combined-hist.tc
index f3eb8aacec0e75..07cfcb8157b68a 100644
--- a/tools/testing/selftests/ftrace/test.d/trigger/inter-event/trigger-inter-event-combined-hist.tc
+++ b/tools/testing/selftests/ftrace/test.d/trigger/inter-event/trigger-inter-event-combined-hist.tc
@@ -17,6 +17,11 @@ if [ ! -f synthetic_events ]; then
exit_unsupported
fi
+if [ ! -f events/sched/sched_process_fork/hist ]; then
+ echo "hist trigger is not supported"
+ exit_unsupported
+fi
+
echo "Test create synthetic event"
echo 'waking_latency u64 lat pid_t pid' > synthetic_events
diff --git a/tools/testing/selftests/ftrace/test.d/trigger/inter-event/trigger-multi-actions-accept.tc b/tools/testing/selftests/ftrace/test.d/trigger/inter-event/trigger-multi-actions-accept.tc
index d281f056f98099..73e413c2ca268e 100644
--- a/tools/testing/selftests/ftrace/test.d/trigger/inter-event/trigger-multi-actions-accept.tc
+++ b/tools/testing/selftests/ftrace/test.d/trigger/inter-event/trigger-multi-actions-accept.tc
@@ -17,6 +17,11 @@ if [ ! -f synthetic_events ]; then
exit_unsupported
fi
+if [ ! -f events/sched/sched_process_fork/hist ]; then
+ echo "hist trigger is not supported"
+ exit_unsupported
+fi
+
echo "Test multiple actions on hist trigger"
echo 'wakeup_latency u64 lat; pid_t pid' >> synthetic_events
TRIGGER1=events/sched/sched_wakeup/trigger
diff --git a/tools/testing/selftests/ftrace/test.d/trigger/inter-event/trigger-onmatch-action-hist.tc b/tools/testing/selftests/ftrace/test.d/trigger/inter-event/trigger-onmatch-action-hist.tc
index a708f0e7858a02..ebe0ad827f9f29 100644
--- a/tools/testing/selftests/ftrace/test.d/trigger/inter-event/trigger-onmatch-action-hist.tc
+++ b/tools/testing/selftests/ftrace/test.d/trigger/inter-event/trigger-onmatch-action-hist.tc
@@ -17,6 +17,11 @@ if [ ! -f synthetic_events ]; then
exit_unsupported
fi
+if [ ! -f events/sched/sched_process_fork/hist ]; then
+ echo "hist trigger is not supported"
+ exit_unsupported
+fi
+
echo "Test create synthetic event"
echo 'wakeup_latency u64 lat pid_t pid char comm[16]' > synthetic_events
diff --git a/tools/testing/selftests/ftrace/test.d/trigger/inter-event/trigger-onmatch-onmax-action-hist.tc b/tools/testing/selftests/ftrace/test.d/trigger/inter-event/trigger-onmatch-onmax-action-hist.tc
index dfce6932d8be49..2a2ef767249ede 100644
--- a/tools/testing/selftests/ftrace/test.d/trigger/inter-event/trigger-onmatch-onmax-action-hist.tc
+++ b/tools/testing/selftests/ftrace/test.d/trigger/inter-event/trigger-onmatch-onmax-action-hist.tc
@@ -17,6 +17,11 @@ if [ ! -f synthetic_events ]; then
exit_unsupported
fi
+if [ ! -f events/sched/sched_process_fork/hist ]; then
+ echo "hist trigger is not supported"
+ exit_unsupported
+fi
+
echo "Test create synthetic event"
echo 'wakeup_latency u64 lat pid_t pid char comm[16]' > synthetic_events
diff --git a/tools/testing/selftests/ftrace/test.d/trigger/inter-event/trigger-onmax-action-hist.tc b/tools/testing/selftests/ftrace/test.d/trigger/inter-event/trigger-onmax-action-hist.tc
index 0035995c21949b..98d73bfb0296ae 100644
--- a/tools/testing/selftests/ftrace/test.d/trigger/inter-event/trigger-onmax-action-hist.tc
+++ b/tools/testing/selftests/ftrace/test.d/trigger/inter-event/trigger-onmax-action-hist.tc
@@ -17,6 +17,11 @@ if [ ! -f synthetic_events ]; then
exit_unsupported
fi
+if [ ! -f events/sched/sched_process_fork/hist ]; then
+ echo "hist trigger is not supported"
+ exit_unsupported
+fi
+
echo "Test create synthetic event"
echo 'wakeup_latency u64 lat pid_t pid char comm[16]' > synthetic_events
diff --git a/tools/testing/selftests/ftrace/test.d/trigger/inter-event/trigger-snapshot-action-hist.tc b/tools/testing/selftests/ftrace/test.d/trigger/inter-event/trigger-snapshot-action-hist.tc
index f546c1b66a9b1d..01b01b9c4e07b7 100644
--- a/tools/testing/selftests/ftrace/test.d/trigger/inter-event/trigger-snapshot-action-hist.tc
+++ b/tools/testing/selftests/ftrace/test.d/trigger/inter-event/trigger-snapshot-action-hist.tc
@@ -12,6 +12,11 @@ if [ ! -f set_event ]; then
exit_unsupported
fi
+if [ ! -f events/sched/sched_process_fork/hist ]; then
+ echo "hist trigger is not supported"
+ exit_unsupported
+fi
+
if [ ! -f snapshot ]; then
echo "snapshot is not supported"
exit_unsupported
diff --git a/tools/testing/selftests/ftrace/test.d/trigger/inter-event/trigger-trace-action-hist.tc b/tools/testing/selftests/ftrace/test.d/trigger/inter-event/trigger-trace-action-hist.tc
index 8021d60aafec58..c3baa486aeb47d 100644
--- a/tools/testing/selftests/ftrace/test.d/trigger/inter-event/trigger-trace-action-hist.tc
+++ b/tools/testing/selftests/ftrace/test.d/trigger/inter-event/trigger-trace-action-hist.tc
@@ -17,6 +17,11 @@ if [ ! -f synthetic_events ]; then
exit_unsupported
fi
+if [ ! -f events/sched/sched_process_fork/hist ]; then
+ echo "hist trigger is not supported"
+ exit_unsupported
+fi
+
grep -q "trace(<synthetic_event>" README || exit_unsupported # version issue
echo "Test create synthetic event"