aboutsummaryrefslogtreecommitdiffstats
path: root/kernel/time/clocksource.c
AgeCommit message (Collapse)AuthorFilesLines
2024-04-09clocksource: Convert s[n]printf() to sysfs_emit()Li Zhijian1-1/+1
Per filesystems/sysfs.rst, show() should only use sysfs_emit() or sysfs_emit_at() when formatting the value to be returned to user space. coccinelle complains that there are still a couple of functions that use snprintf(). Convert them to sysfs_emit(). Signed-off-by: Li Zhijian <lizhijian@fujitsu.com> Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Link: https://lore.kernel.org/r/20240314100402.1326582-1-lizhijian@fujitsu.com
2024-04-08clocksource: Make watchdog and suspend-timing multiplication overflow safeAdrian Hunter1-22/+20
Kernel timekeeping is designed to keep the change in cycles (since the last timer interrupt) below max_cycles, which prevents multiplication overflow when converting cycles to nanoseconds. However, if timer interrupts stop, the clocksource_cyc2ns() calculation will eventually overflow. Add protection against that. Simplify by folding together clocksource_delta() and clocksource_cyc2ns() into cycles_to_nsec_safe(). Check against max_cycles, falling back to a slower higher precision calculation. Suggested-by: Thomas Gleixner <tglx@linutronix.de> Signed-off-by: Adrian Hunter <adrian.hunter@intel.com> Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Link: https://lore.kernel.org/r/20240325064023.2997-20-adrian.hunter@intel.com
2024-02-21clocksource: Scale the watchdog read retries automaticallyFeng Tang1-6/+4
On a 8-socket server the TSC is wrongly marked as 'unstable' and disabled during boot time on about one out of 120 boot attempts: clocksource: timekeeping watchdog on CPU227: wd-tsc-wd excessive read-back delay of 153560ns vs. limit of 125000ns, wd-wd read-back delay only 11440ns, attempt 3, marking tsc unstable tsc: Marking TSC unstable due to clocksource watchdog TSC found unstable after boot, most likely due to broken BIOS. Use 'tsc=unstable'. sched_clock: Marking unstable (119294969739, 159204297)<-(125446229205, -5992055152) clocksource: Checking clocksource tsc synchronization from CPU 319 to CPUs 0,99,136,180,210,542,601,896. clocksource: Switched to clocksource hpet The reason is that for platform with a large number of CPUs, there are sporadic big or huge read latencies while reading the watchog/clocksource during boot or when system is under stress work load, and the frequency and maximum value of the latency goes up with the number of online CPUs. The cCurrent code already has logic to detect and filter such high latency case by reading the watchdog twice and checking the two deltas. Due to the randomness of the latency, there is a low probabilty that the first delta (latency) is big, but the second delta is small and looks valid. The watchdog code retries the readouts by default twice, which is not necessarily sufficient for systems with a large number of CPUs. There is a command line parameter 'max_cswd_read_retries' which allows to increase the number of retries, but that's not user friendly as it needs to be tweaked per system. As the number of required retries is proportional to the number of online CPUs, this parameter can be calculated at runtime. Scale and enlarge the number of retries according to the number of online CPUs and remove the command line parameter completely. [ tglx: Massaged change log and comments ] Signed-off-by: Feng Tang <feng.tang@intel.com> Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Tested-by: Jin Wang <jin1.wang@intel.com> Tested-by: Paul E. McKenney <paulmck@kernel.org> Reviewed-by: Waiman Long <longman@redhat.com> Reviewed-by: Paul E. McKenney <paulmck@kernel.org> Link: https://lore.kernel.org/r/20240221060859.1027450-1-feng.tang@intel.com
2024-02-07clocksource: Make clocksource_subsys constRicardo B. Marliere1-1/+1
Now that the driver core can properly handle constant struct bus_type, move the clocksource_subsys variable to be a constant structure as well, placing it into read-only memory which can not be modified at runtime. Suggested-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org> Signed-off-by: Ricardo B. Marliere <ricardo@marliere.net> Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Reviewed-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org> Acked-by: John Stultz <jstultz@google.com> Link: https://lore.kernel.org/r/20240204-bus_cleanup-time-v1-1-207ec18e24b8@marliere.net
2024-01-25clocksource: Skip watchdog check for large watchdog intervalsJiri Wiesner1-1/+24
There have been reports of the watchdog marking clocksources unstable on machines with 8 NUMA nodes: clocksource: timekeeping watchdog on CPU373: Marking clocksource 'tsc' as unstable because the skew is too large: clocksource: 'hpet' wd_nsec: 14523447520 clocksource: 'tsc' cs_nsec: 14524115132 The measured clocksource skew - the absolute difference between cs_nsec and wd_nsec - was 668 microseconds: cs_nsec - wd_nsec = 14524115132 - 14523447520 = 667612 The kernel used 200 microseconds for the uncertainty_margin of both the clocksource and watchdog, resulting in a threshold of 400 microseconds (the md variable). Both the cs_nsec and the wd_nsec value indicate that the readout interval was circa 14.5 seconds. The observed behaviour is that watchdog checks failed for large readout intervals on 8 NUMA node machines. This indicates that the size of the skew was directly proportinal to the length of the readout interval on those machines. The measured clocksource skew, 668 microseconds, was evaluated against a threshold (the md variable) that is suited for readout intervals of roughly WATCHDOG_INTERVAL, i.e. HZ >> 1, which is 0.5 second. The intention of 2e27e793e280 ("clocksource: Reduce clocksource-skew threshold") was to tighten the threshold for evaluating skew and set the lower bound for the uncertainty_margin of clocksources to twice WATCHDOG_MAX_SKEW. Later in c37e85c135ce ("clocksource: Loosen clocksource watchdog constraints"), the WATCHDOG_MAX_SKEW constant was increased to 125 microseconds to fit the limit of NTP, which is able to use a clocksource that suffers from up to 500 microseconds of skew per second. Both the TSC and the HPET use default uncertainty_margin. When the readout interval gets stretched the default uncertainty_margin is no longer a suitable lower bound for evaluating skew - it imposes a limit that is far stricter than the skew with which NTP can deal. The root causes of the skew being directly proportinal to the length of the readout interval are: * the inaccuracy of the shift/mult pairs of clocksources and the watchdog * the conversion to nanoseconds is imprecise for large readout intervals Prevent this by skipping the current watchdog check if the readout interval exceeds 2 * WATCHDOG_INTERVAL. Considering the maximum readout interval of 2 * WATCHDOG_INTERVAL, the current default uncertainty margin (of the TSC and HPET) corresponds to a limit on clocksource skew of 250 ppm (microseconds of skew per second). To keep the limit imposed by NTP (500 microseconds of skew per second) for all possible readout intervals, the margins would have to be scaled so that the threshold value is proportional to the length of the actual readout interval. As for why the readout interval may get stretched: Since the watchdog is executed in softirq context the expiration of the watchdog timer can get severely delayed on account of a ksoftirqd thread not getting to run in a timely manner. Surely, a system with such belated softirq execution is not working well and the scheduling issue should be looked into but the clocksource watchdog should be able to deal with it accordingly. Fixes: 2e27e793e280 ("clocksource: Reduce clocksource-skew threshold") Suggested-by: Feng Tang <feng.tang@intel.com> Signed-off-by: Jiri Wiesner <jwiesner@suse.de> Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Tested-by: Paul E. McKenney <paulmck@kernel.org> Reviewed-by: Feng Tang <feng.tang@intel.com> Cc: stable@vger.kernel.org Link: https://lore.kernel.org/r/20240122172350.GA740@incl
2023-07-14clocksource: Handle negative skews in "skew is too large" messagesPaul E. McKenney1-4/+4
The nanosecond-to-millisecond skew computation uses unsigned arithmetic, which produces user-unfriendly large positive numbers for negative skews. Therefore, use signed arithmetic for this computation in order to preserve the negativity. Reported-by: Chris Bainbridge <chris.bainbridge@gmail.com> Reported-by: Feng Tang <feng.tang@intel.com> Fixes: dd029269947a ("clocksource: Improve "skew is too large" messages") Reviewed-by: Feng Tang <feng.tang@intel.com> Tested-by: Chris Bainbridge <chris.bainbridge@gmail.com> Signed-off-by: Paul E. McKenney <paulmck@kernel.org>
2023-06-01clocksource: Replace all non-returning strlcpy with strscpyAzeem Shaikh1-1/+1
strlcpy() reads the entire source buffer first. This read may exceed the destination size limit. This is both inefficient and can lead to linear read overflows if a source string is not NUL-terminated [1]. In an effort to remove strlcpy() completely [2], replace strlcpy() here with strscpy(). No return values were used, so direct replacement is safe. [1] https://www.kernel.org/doc/html/latest/process/deprecated.html#strlcpy [2] https://github.com/KSPP/linux/issues/89 Signed-off-by: Azeem Shaikh <azeemshaikh38@gmail.com> Acked-by: John Stultz <jstultz@google.com> Signed-off-by: Kees Cook <keescook@chromium.org> Link: https://lore.kernel.org/r/20230530163546.986188-1-azeemshaikh38@gmail.com
2023-01-24clocksource: Suspend the watchdog temporarily when high read latency detectedFeng Tang1-13/+32
Bugs have been reported on 8 sockets x86 machines in which the TSC was wrongly disabled when the system is under heavy workload. [ 818.380354] clocksource: timekeeping watchdog on CPU336: hpet wd-wd read-back delay of 1203520ns [ 818.436160] clocksource: wd-tsc-wd read-back delay of 181880ns, clock-skew test skipped! [ 819.402962] clocksource: timekeeping watchdog on CPU338: hpet wd-wd read-back delay of 324000ns [ 819.448036] clocksource: wd-tsc-wd read-back delay of 337240ns, clock-skew test skipped! [ 819.880863] clocksource: timekeeping watchdog on CPU339: hpet read-back delay of 150280ns, attempt 3, marking unstable [ 819.936243] tsc: Marking TSC unstable due to clocksource watchdog [ 820.068173] TSC found unstable after boot, most likely due to broken BIOS. Use 'tsc=unstable'. [ 820.092382] sched_clock: Marking unstable (818769414384, 1195404998) [ 820.643627] clocksource: Checking clocksource tsc synchronization from CPU 267 to CPUs 0,4,25,70,126,430,557,564. [ 821.067990] clocksource: Switched to clocksource hpet This can be reproduced by running memory intensive 'stream' tests, or some of the stress-ng subcases such as 'ioport'. The reason for these issues is the when system is under heavy load, the read latency of the clocksources can be very high. Even lightweight TSC reads can show high latencies, and latencies are much worse for external clocksources such as HPET or the APIC PM timer. These latencies can result in false-positive clocksource-unstable determinations. These issues were initially reported by a customer running on a production system, and this problem was reproduced on several generations of Xeon servers, especially when running the stress-ng test. These Xeon servers were not production systems, but they did have the latest steppings and firmware. Given that the clocksource watchdog is a continual diagnostic check with frequency of twice a second, there is no need to rush it when the system is under heavy load. Therefore, when high clocksource read latencies are detected, suspend the watchdog timer for 5 minutes. Signed-off-by: Feng Tang <feng.tang@intel.com> Acked-by: Waiman Long <longman@redhat.com> Cc: John Stultz <jstultz@google.com> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Stephen Boyd <sboyd@kernel.org> Cc: Feng Tang <feng.tang@intel.com> Signed-off-by: Paul E. McKenney <paulmck@kernel.org>
2023-01-05clocksource: Improve "skew is too large" messagesPaul E. McKenney1-0/+8
When clocksource_watchdog() detects excessive clocksource skew compared to the watchdog clocksource, it marks the clocksource under test as unstable and prints several lines worth of message. But that message is unclear to anyone unfamiliar with the code: clocksource: timekeeping watchdog on CPU2: Marking clocksource 'wdtest-ktime' as unstable because the skew is too large: clocksource: 'kvm-clock' wd_nsec: 400744390 wd_now: 612625c2c wd_last: 5fa7f7c66 mask: ffffffffffffffff clocksource: 'wdtest-ktime' cs_nsec: 600744034 cs_now: 173081397a292d4f cs_last: 17308139565a8ced mask: ffffffffffffffff clocksource: 'kvm-clock' (not 'wdtest-ktime') is current clocksource. Therefore, add the following line near the end of that message: Clocksource 'wdtest-ktime' skewed 199999644 ns (199 ms) over watchdog 'kvm-clock' interval of 400744390 ns (400 ms) This new line clearly indicates the amount of skew between the two clocksources, along with the duration of the time interval over which the skew occurred, both in nanoseconds and milliseconds. Signed-off-by: Paul E. McKenney <paulmck@kernel.org> Cc: John Stultz <jstultz@google.com> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Stephen Boyd <sboyd@kernel.org> Cc: Feng Tang <feng.tang@intel.com>
2023-01-03clocksource: Improve read-back-delay messagePaul E. McKenney1-2/+2
When cs_watchdog_read() is unable to get a qualifying clocksource read within the limit set by max_cswd_read_retries, it prints a message and marks the clocksource under test as unstable. But that message is unclear to anyone unfamiliar with the code: clocksource: timekeeping watchdog on CPU13: wd-tsc-wd read-back delay 1000614ns, attempt 3, marking unstable Therefore, add some context so that the message appears as follows: clocksource: timekeeping watchdog on CPU13: wd-tsc-wd excessive read-back delay of 1000614ns vs. limit of 125000ns, wd-wd read-back delay only 27ns, attempt 3, marking tsc unstable Signed-off-by: Paul E. McKenney <paulmck@kernel.org> Cc: John Stultz <jstultz@google.com> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Stephen Boyd <sboyd@kernel.org> Cc: Feng Tang <feng.tang@intel.com>
2023-01-03clocksource: Loosen clocksource watchdog constraintsPaul E. McKenney1-6/+9
Currently, MAX_SKEW_USEC is set to 100 microseconds, which has worked reasonably well. However, NTP is willing to tolerate 500 microseconds of skew per second, and a clocksource that is good enough for NTP should be good enough for the clocksource watchdog. The watchdog's skew is controlled by MAX_SKEW_USEC and the CLOCKSOURCE_WATCHDOG_MAX_SKEW_US Kconfig option. However, these values are doubled before being associated with a clocksource's ->uncertainty_margin, and the ->uncertainty_margin values of the pair of clocksource's being compared are summed before checking against the skew. Therefore, set both MAX_SKEW_USEC and the default for the CLOCKSOURCE_WATCHDOG_MAX_SKEW_US Kconfig option to 125 microseconds of skew per second, resulting in 500 microseconds of skew per second in the clocksource watchdog's skew comparison. Suggested-by Rik van Riel <riel@surriel.com> Signed-off-by: Paul E. McKenney <paulmck@kernel.org>
2023-01-03clocksource: Print clocksource name when clocksource is tested unstableYunying Sun1-2/+2
Some "TSC fall back to HPET" messages appear on systems having more than 2 NUMA nodes: clocksource: timekeeping watchdog on CPU168: hpet read-back delay of 4296200ns, attempt 4, marking unstable The "hpet" here is misleading the clocksource watchdog is really doing repeated reads of "hpet" in order to check for unrelated delays. Therefore, print the name of the clocksource under test, prefixed by "wd-" and suffixed by "-wd", for example, "wd-tsc-wd". Signed-off-by: Yunying Sun <yunying.sun@intel.com> Signed-off-by: Paul E. McKenney <paulmck@kernel.org>
2022-11-18treewide: use get_random_u32_below() instead of deprecated functionJason A. Donenfeld1-1/+1
This is a simple mechanical transformation done by: @@ expression E; @@ - prandom_u32_max + get_random_u32_below (E) Reviewed-by: Kees Cook <keescook@chromium.org> Reviewed-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org> Acked-by: Darrick J. Wong <djwong@kernel.org> # for xfs Reviewed-by: SeongJae Park <sj@kernel.org> # for damon Reviewed-by: Jason Gunthorpe <jgg@nvidia.com> # for infiniband Reviewed-by: Russell King (Oracle) <rmk+kernel@armlinux.org.uk> # for arm Acked-by: Ulf Hansson <ulf.hansson@linaro.org> # for mmc Signed-off-by: Jason A. Donenfeld <Jason@zx2c4.com>
2022-10-11treewide: use prandom_u32_max() when possible, part 1Jason A. Donenfeld1-1/+1
Rather than incurring a division or requesting too many random bytes for the given range, use the prandom_u32_max() function, which only takes the minimum required bytes from the RNG and avoids divisions. This was done mechanically with this coccinelle script: @basic@ expression E; type T; identifier get_random_u32 =~ "get_random_int|prandom_u32|get_random_u32"; typedef u64; @@ ( - ((T)get_random_u32() % (E)) + prandom_u32_max(E) | - ((T)get_random_u32() & ((E) - 1)) + prandom_u32_max(E * XXX_MAKE_SURE_E_IS_POW2) | - ((u64)(E) * get_random_u32() >> 32) + prandom_u32_max(E) | - ((T)get_random_u32() & ~PAGE_MASK) + prandom_u32_max(PAGE_SIZE) ) @multi_line@ identifier get_random_u32 =~ "get_random_int|prandom_u32|get_random_u32"; identifier RAND; expression E; @@ - RAND = get_random_u32(); ... when != RAND - RAND %= (E); + RAND = prandom_u32_max(E); // Find a potential literal @literal_mask@ expression LITERAL; type T; identifier get_random_u32 =~ "get_random_int|prandom_u32|get_random_u32"; position p; @@ ((T)get_random_u32()@p & (LITERAL)) // Add one to the literal. @script:python add_one@ literal << literal_mask.LITERAL; RESULT; @@ value = None if literal.startswith('0x'): value = int(literal, 16) elif literal[0] in '123456789': value = int(literal, 10) if value is None: print("I don't know how to handle %s" % (literal)) cocci.include_match(False) elif value == 2**32 - 1 or value == 2**31 - 1 or value == 2**24 - 1 or value == 2**16 - 1 or value == 2**8 - 1: print("Skipping 0x%x for cleanup elsewhere" % (value)) cocci.include_match(False) elif value & (value + 1) != 0: print("Skipping 0x%x because it's not a power of two minus one" % (value)) cocci.include_match(False) elif literal.startswith('0x'): coccinelle.RESULT = cocci.make_expr("0x%x" % (value + 1)) else: coccinelle.RESULT = cocci.make_expr("%d" % (value + 1)) // Replace the literal mask with the calculated result. @plus_one@ expression literal_mask.LITERAL; position literal_mask.p; expression add_one.RESULT; identifier FUNC; @@ - (FUNC()@p & (LITERAL)) + prandom_u32_max(RESULT) @collapse_ret@ type T; identifier VAR; expression E; @@ { - T VAR; - VAR = (E); - return VAR; + return E; } @drop_var@ type T; identifier VAR; @@ { - T VAR; ... when != VAR } Reviewed-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org> Reviewed-by: Kees Cook <keescook@chromium.org> Reviewed-by: Yury Norov <yury.norov@gmail.com> Reviewed-by: KP Singh <kpsingh@kernel.org> Reviewed-by: Jan Kara <jack@suse.cz> # for ext4 and sbitmap Reviewed-by: Christoph Böhmwalder <christoph.boehmwalder@linbit.com> # for drbd Acked-by: Jakub Kicinski <kuba@kernel.org> Acked-by: Heiko Carstens <hca@linux.ibm.com> # for s390 Acked-by: Ulf Hansson <ulf.hansson@linaro.org> # for mmc Acked-by: Darrick J. Wong <djwong@kernel.org> # for xfs Signed-off-by: Jason A. Donenfeld <Jason@zx2c4.com>
2022-04-10clocksource: Replace cpumask_weight() with cpumask_empty()Yury Norov1-1/+1
clocksource_verify_percpu() calls cpumask_weight() to check if any bit of a given cpumask is set. This can be done more efficiently with cpumask_empty() because cpumask_empty() stops traversing the cpumask as soon as it finds first set bit, while cpumask_weight() counts all bits unconditionally. Signed-off-by: Yury Norov <yury.norov@gmail.com> Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Link: https://lore.kernel.org/r/20220210224933.379149-24-yury.norov@gmail.com
2022-02-01clocksource: Add a Kconfig option for WATCHDOG_MAX_SKEWWaiman Long1-1/+7
A watchdog maximum skew of 100us may still be too small for some systems or archs. It may also be too small when some kernel debug config options are enabled. So add a new Kconfig option CLOCKSOURCE_WATCHDOG_MAX_SKEW_US to allow kernel builders to have more control on the threshold for marking clocksource as unstable. Signed-off-by: Waiman Long <longman@redhat.com> Signed-off-by: Paul E. McKenney <paulmck@kernel.org>
2022-01-23Merge tag 'bitmap-5.17-rc1' of git://github.com/norov/linuxLinus Torvalds1-2/+2
Pull bitmap updates from Yury Norov: - introduce for_each_set_bitrange() - use find_first_*_bit() instead of find_next_*_bit() where possible - unify for_each_bit() macros * tag 'bitmap-5.17-rc1' of git://github.com/norov/linux: vsprintf: rework bitmap_list_string lib: bitmap: add performance test for bitmap_print_to_pagebuf bitmap: unify find_bit operations mm/percpu: micro-optimize pcpu_is_populated() Replace for_each_*_bit_from() with for_each_*_bit() where appropriate find: micro-optimize for_each_{set,clear}_bit() include/linux: move for_each_bit() macros from bitops.h to find.h cpumask: replace cpumask_next_* with cpumask_first_* where appropriate tools: sync tools/bitmap with mother linux all: replace find_next{,_zero}_bit with find_first{,_zero}_bit where appropriate cpumask: use find_first_and_bit() lib: add find_first_and_bit() arch: remove GENERIC_FIND_FIRST_BIT entirely include: move find.h from asm_generic to linux bitops: move find_bit_*_le functions from le.h to find.h bitops: protect find_first_{,zero}_bit properly
2022-01-15cpumask: replace cpumask_next_* with cpumask_first_* where appropriateYury Norov1-2/+2
cpumask_first() is a more effective analogue of 'next' version if n == -1 (which means start == 0). This patch replaces 'next' with 'first' where things look trivial. There's no cpumask_first_zero() function, so create it. Signed-off-by: Yury Norov <yury.norov@gmail.com> Tested-by: Wolfram Sang <wsa+renesas@sang-engineering.com>
2021-11-30clocksource: Reduce the default clocksource_watchdog() retries to 2Waiman Long1-1/+1
With the previous patch, there is an extra watchdog read in each retry. Now the total number of clocksource reads is increased to 4 per iteration. In order to avoid increasing the clock skew check overhead, the default maximum number of retries is reduced from 3 to 2 to maintain the same 12 clocksource reads in the worst case. Suggested-by: Paul E. McKenney <paulmck@kernel.org> Signed-off-by: Waiman Long <longman@redhat.com> Signed-off-by: Paul E. McKenney <paulmck@kernel.org>
2021-11-30clocksource: Avoid accidental unstable marking of clocksourcesWaiman Long1-9/+41
Since commit db3a34e17433 ("clocksource: Retry clock read if long delays detected") and commit 2e27e793e280 ("clocksource: Reduce clocksource-skew threshold"), it is found that tsc clocksource fallback to hpet can sometimes happen on both Intel and AMD systems especially when they are running stressful benchmarking workloads. Of the 23 systems tested with a v5.14 kernel, 10 of them have switched to hpet clock source during the test run. The result of falling back to hpet is a drastic reduction of performance when running benchmarks. For example, the fio performance tests can drop up to 70% whereas the iperf3 performance can drop up to 80%. 4 hpet fallbacks happened during bootup. They were: [ 8.749399] clocksource: timekeeping watchdog on CPU13: hpet read-back delay of 263750ns, attempt 4, marking unstable [ 12.044610] clocksource: timekeeping watchdog on CPU19: hpet read-back delay of 186166ns, attempt 4, marking unstable [ 17.336941] clocksource: timekeeping watchdog on CPU28: hpet read-back delay of 182291ns, attempt 4, marking unstable [ 17.518565] clocksource: timekeeping watchdog on CPU34: hpet read-back delay of 252196ns, attempt 4, marking unstable Other fallbacks happen when the systems were running stressful benchmarks. For example: [ 2685.867873] clocksource: timekeeping watchdog on CPU117: hpet read-back delay of 57269ns, attempt 4, marking unstable [46215.471228] clocksource: timekeeping watchdog on CPU8: hpet read-back delay of 61460ns, attempt 4, marking unstable Commit 2e27e793e280 ("clocksource: Reduce clocksource-skew threshold"), changed the skew margin from 100us to 50us. I think this is too small and can easily be exceeded when running some stressful workloads on a thermally stressed system. So it is switched back to 100us. Even a maximum skew margin of 100us may be too small in for some systems when booting up especially if those systems are under thermal stress. To eliminate the case that the large skew is due to the system being too busy slowing down the reading of both the watchdog and the clocksource, an extra consecutive read of watchdog clock is being done to check this. The consecutive watchdog read delay is compared against WATCHDOG_MAX_SKEW/2. If the delay exceeds the limit, we assume that the system is just too busy. A warning will be printed to the console and the clock skew check is skipped for this round. Fixes: db3a34e17433 ("clocksource: Retry clock read if long delays detected") Fixes: 2e27e793e280 ("clocksource: Reduce clocksource-skew threshold") Signed-off-by: Waiman Long <longman@redhat.com> Signed-off-by: Paul E. McKenney <paulmck@kernel.org>
2021-08-10clocksource: Replace deprecated CPU-hotplug functions.Sebastian Andrzej Siewior1-3/+3
The functions get_online_cpus() and put_online_cpus() have been deprecated during the CPU hotplug rework. They map directly to cpus_read_lock() and cpus_read_unlock(). Replace deprecated CPU-hotplug functions with the official version. The behavior remains unchanged. Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de> Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Link: https://lore.kernel.org/r/20210803141621.780504-35-bigeasy@linutronix.de
2021-06-22clocksource: Print deviation in nanoseconds when a clocksource becomes unstableFeng Tang1-4/+4
Currently when an unstable clocksource is detected, the raw counters of that clocksource and watchdog will be printed, which can only be understood after some math calculation. So print the delta in nanoseconds as well to make it easier for humans to check the results. [ paulmck: Fix typo. ] Signed-off-by: Feng Tang <feng.tang@intel.com> Signed-off-by: Paul E. McKenney <paulmck@kernel.org> Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Link: https://lore.kernel.org/r/20210527190124.440372-6-paulmck@kernel.org
2021-06-22clocksource: Provide kernel module to test clocksource watchdogPaul E. McKenney1-2/+4
When the clocksource watchdog marks a clock as unstable, this might be due to that clock being unstable or it might be due to delays that happen to occur between the reads of the two clocks. It would be good to have a way of testing the clocksource watchdog's ability to distinguish between these two causes of clock skew and instability. Therefore, provide a new clocksource-wdtest module selected by a new TEST_CLOCKSOURCE_WATCHDOG Kconfig option. This module has a single module parameter named "holdoff" that provides the number of seconds of delay before testing should start, which defaults to zero when built as a module and to 10 seconds when built directly into the kernel. Very large systems that boot slowly may need to increase the value of this module parameter. This module uses hand-crafted clocksource structures to do its testing, thus avoiding messing up timing for the rest of the kernel and for user applications. This module first verifies that the ->uncertainty_margin field of the clocksource structures are set sanely. It then tests the delay-detection capability of the clocksource watchdog, increasing the number of consecutive delays injected, first provoking console messages complaining about the delays and finally forcing a clock-skew event. Unexpected test results cause at least one WARN_ON_ONCE() console splat. If there are no splats, the test has passed. Finally, it fuzzes the value returned from a clocksource to test the clocksource watchdog's ability to detect time skew. This module checks the state of its clocksource after each test, and uses WARN_ON_ONCE() to emit a console splat if there are any failures. This should enable all types of test frameworks to detect any such failures. This facility is intended for diagnostic use only, and should be avoided on production systems. Reported-by: Chris Mason <clm@fb.com> Suggested-by: Thomas Gleixner <tglx@linutronix.de> Signed-off-by: Paul E. McKenney <paulmck@kernel.org> Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Tested-by: Feng Tang <feng.tang@intel.com> Link: https://lore.kernel.org/r/20210527190124.440372-5-paulmck@kernel.org
2021-06-22clocksource: Reduce clocksource-skew thresholdPaul E. McKenney1-10/+38
Currently, WATCHDOG_THRESHOLD is set to detect a 62.5-millisecond skew in a 500-millisecond WATCHDOG_INTERVAL. This requires that clocks be skewed by more than 12.5% in order to be marked unstable. Except that a clock that is skewed by that much is probably destroying unsuspecting software right and left. And given that there are now checks for false-positive skews due to delays between reading the two clocks, it should be possible to greatly decrease WATCHDOG_THRESHOLD, at least for fine-grained clocks such as TSC. Therefore, add a new uncertainty_margin field to the clocksource structure that contains the maximum uncertainty in nanoseconds for the corresponding clock. This field may be initialized manually, as it is for clocksource_tsc_early and clocksource_jiffies, which is copied to refined_jiffies. If the field is not initialized manually, it will be computed at clock-registry time as the period of the clock in question based on the scale and freq parameters to __clocksource_update_freq_scale() function. If either of those two parameters are zero, the tens-of-milliseconds WATCHDOG_THRESHOLD is used as a cowardly alternative to dividing by zero. No matter how the uncertainty_margin field is calculated, it is bounded below by twice WATCHDOG_MAX_SKEW, that is, by 100 microseconds. Note that manually initialized uncertainty_margin fields are not adjusted, but there is a WARN_ON_ONCE() that triggers if any such field is less than twice WATCHDOG_MAX_SKEW. This WARN_ON_ONCE() is intended to discourage production use of the one-nanosecond uncertainty_margin values that are used to test the clock-skew code itself. The actual clock-skew check uses the sum of the uncertainty_margin fields of the two clocksource structures being compared. Integer overflow is avoided because the largest computed value of the uncertainty_margin fields is one billion (10^9), and double that value fits into an unsigned int. However, if someone manually specifies (say) UINT_MAX, they will get what they deserve. Note that the refined_jiffies uncertainty_margin field is initialized to TICK_NSEC, which means that skew checks involving this clocksource will be sufficently forgiving. In a similar vein, the clocksource_tsc_early uncertainty_margin field is initialized to 32*NSEC_PER_MSEC, which replicates the current behavior and allows custom setting if needed in order to address the rare skews detected for this clocksource in current mainline. Suggested-by: Thomas Gleixner <tglx@linutronix.de> Signed-off-by: Paul E. McKenney <paulmck@kernel.org> Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Acked-by: Feng Tang <feng.tang@intel.com> Link: https://lore.kernel.org/r/20210527190124.440372-4-paulmck@kernel.org
2021-06-22clocksource: Limit number of CPUs checked for clock synchronizationPaul E. McKenney1-2/+72
Currently, if skew is detected on a clock marked CLOCK_SOURCE_VERIFY_PERCPU, that clock is checked on all CPUs. This is thorough, but might not be what you want on a system with a few tens of CPUs, let alone a few hundred of them. Therefore, by default check only up to eight randomly chosen CPUs. Also provide a new clocksource.verify_n_cpus kernel boot parameter. A value of -1 says to check all of the CPUs, and a non-negative value says to randomly select that number of CPUs, without concern about selecting the same CPU multiple times. However, make use of a cpumask so that a given CPU will be checked at most once. Suggested-by: Thomas Gleixner <tglx@linutronix.de> # For verify_n_cpus=1. Signed-off-by: Paul E. McKenney <paulmck@kernel.org> Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Acked-by: Feng Tang <feng.tang@intel.com> Link: https://lore.kernel.org/r/20210527190124.440372-3-paulmck@kernel.org
2021-06-22clocksource: Check per-CPU clock synchronization when marked unstablePaul E. McKenney1-0/+60
Some sorts of per-CPU clock sources have a history of going out of synchronization with each other. However, this problem has purportedy been solved in the past ten years. Except that it is all too possible that the problem has instead simply been made less likely, which might mean that some of the occasional "Marking clocksource 'tsc' as unstable" messages might be due to desynchronization. How would anyone know? Therefore apply CPU-to-CPU synchronization checking to newly unstable clocksource that are marked with the new CLOCK_SOURCE_VERIFY_PERCPU flag. Lists of desynchronized CPUs are printed, with the caveat that if it is the reporting CPU that is itself desynchronized, it will appear that all the other clocks are wrong. Just like in real life. Reported-by: Chris Mason <clm@fb.com> Signed-off-by: Paul E. McKenney <paulmck@kernel.org> Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Acked-by: Feng Tang <feng.tang@intel.com> Link: https://lore.kernel.org/r/20210527190124.440372-2-paulmck@kernel.org
2021-06-22clocksource: Retry clock read if long delays detectedPaul E. McKenney1-6/+47
When the clocksource watchdog marks a clock as unstable, this might be due to that clock being unstable or it might be due to delays that happen to occur between the reads of the two clocks. Yes, interrupts are disabled across those two reads, but there are no shortage of things that can delay interrupts-disabled regions of code ranging from SMI handlers to vCPU preemption. It would be good to have some indication as to why the clock was marked unstable. Therefore, re-read the watchdog clock on either side of the read from the clock under test. If the watchdog clock shows an excessive time delta between its pair of reads, the reads are retried. The maximum number of retries is specified by a new kernel boot parameter clocksource.max_cswd_read_retries, which defaults to three, that is, up to four reads, one initial and up to three retries. If more than one retry was required, a message is printed on the console (the occasional single retry is expected behavior, especially in guest OSes). If the maximum number of retries is exceeded, the clock under test will be marked unstable. However, the probability of this happening due to various sorts of delays is quite small. In addition, the reason (clock-read delays) for the unstable marking will be apparent. Reported-by: Chris Mason <clm@fb.com> Signed-off-by: Paul E. McKenney <paulmck@kernel.org> Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Acked-by: Feng Tang <feng.tang@intel.com> Link: https://lore.kernel.org/r/20210527190124.440372-1-paulmck@kernel.org
2021-05-01Merge tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvmLinus Torvalds1-0/+2
Pull kvm updates from Paolo Bonzini: "This is a large update by KVM standards, including AMD PSP (Platform Security Processor, aka "AMD Secure Technology") and ARM CoreSight (debug and trace) changes. ARM: - CoreSight: Add support for ETE and TRBE - Stage-2 isolation for the host kernel when running in protected mode - Guest SVE support when running in nVHE mode - Force W^X hypervisor mappings in nVHE mode - ITS save/restore for guests using direct injection with GICv4.1 - nVHE panics now produce readable backtraces - Guest support for PTP using the ptp_kvm driver - Performance improvements in the S2 fault handler x86: - AMD PSP driver changes - Optimizations and cleanup of nested SVM code - AMD: Support for virtual SPEC_CTRL - Optimizations of the new MMU code: fast invalidation, zap under read lock, enable/disably dirty page logging under read lock - /dev/kvm API for AMD SEV live migration (guest API coming soon) - support SEV virtual machines sharing the same encryption context - support SGX in virtual machines - add a few more statistics - improved directed yield heuristics - Lots and lots of cleanups Generic: - Rework of MMU notifier interface, simplifying and optimizing the architecture-specific code - a handful of "Get rid of oprofile leftovers" patches - Some selftests improvements" * tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvm: (379 commits) KVM: selftests: Speed up set_memory_region_test selftests: kvm: Fix the check of return value KVM: x86: Take advantage of kvm_arch_dy_has_pending_interrupt() KVM: SVM: Skip SEV cache flush if no ASIDs have been used KVM: SVM: Remove an unnecessary prototype declaration of sev_flush_asids() KVM: SVM: Drop redundant svm_sev_enabled() helper KVM: SVM: Move SEV VMCB tracking allocation to sev.c KVM: SVM: Explicitly check max SEV ASID during sev_hardware_setup() KVM: SVM: Unconditionally invoke sev_hardware_teardown() KVM: SVM: Enable SEV/SEV-ES functionality by default (when supported) KVM: SVM: Condition sev_enabled and sev_es_enabled on CONFIG_KVM_AMD_SEV=y KVM: SVM: Append "_enabled" to module-scoped SEV/SEV-ES control variables KVM: SEV: Mask CPUID[0x8000001F].eax according to supported features KVM: SVM: Move SEV module params/variables to sev.c KVM: SVM: Disable SEV/SEV-ES if NPT is disabled KVM: SVM: Free sev_asid_bitmap during init if SEV setup fails KVM: SVM: Zero out the VMCB array used to track SEV ASID association x86/sev: Drop redundant and potentially misleading 'sev_enabled' KVM: x86: Move reverse CPUID helpers to separate header file KVM: x86: Rename GPR accessors to make mode-aware variants the defaults ...
2021-04-07time: Add mechanism to recognize clocksource in time_get_snapshotThomas Gleixner1-0/+2
System time snapshots are not conveying information about the current clocksource which was used, but callers like the PTP KVM guest implementation have the requirement to evaluate the clocksource type to select the appropriate mechanism. Introduce a clocksource id field in struct clocksource which is by default set to CSID_GENERIC (0). Clocksource implementations can set that field to a value which allows to identify the clocksource. Store the clocksource id of the current clocksource in the system_time_snapshot so callers can evaluate which clocksource was used to take the snapshot and act accordingly. Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Signed-off-by: Jianyong Wu <jianyong.wu@arm.com> Signed-off-by: Marc Zyngier <maz@kernel.org> Link: https://lore.kernel.org/r/20201209060932.212364-5-jianyong.wu@arm.com
2021-03-22timekeeping, clocksource: Fix various typos in commentsIngo Molnar1-2/+2
Fix ~56 single-word typos in timekeeping & clocksource code comments. Signed-off-by: Ingo Molnar <mingo@kernel.org> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: John Stultz <john.stultz@linaro.org> Cc: Stephen Boyd <sboyd@kernel.org> Cc: Daniel Lezcano <daniel.lezcano@linaro.org> Cc: linux-kernel@vger.kernel.org
2020-10-30timekeeping: remove arch_gettimeoffsetArnd Bergmann1-8/+0
With Arm EBSA110 gone, nothing uses it any more, so the corresponding code and the Kconfig option can be removed. Acked-by: Thomas Gleixner <tglx@linutronix.de> Reviewed-by: Linus Walleij <linus.walleij@linaro.org> Signed-off-by: Arnd Bergmann <arnd@arndb.de>
2020-06-09clocksource: Remove obsolete ifdefThomas Gleixner1-2/+0
CONFIG_GENERIC_VDSO_CLOCK_MODE was a transitional config switch which got removed after all architectures got converted to the new storage model. But the removal forgot to remove the #ifdef which guards the vdso_clock_mode sanity check, which effectively disables the sanity check. Remove it now. Fixes: f86fd32db706 ("lib/vdso: Cleanup clock mode storage leftovers") Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Tested-by: Miklos Szeredi <mszeredi@redhat.com> Cc: stable@vger.kernel.org Link: https://lkml.kernel.org/r/20200606221531.845475036@linutronix.de
2020-02-17clocksource: Add common vdso clock mode storageThomas Gleixner1-0/+9
All architectures which use the generic VDSO code have their own storage for the VDSO clock mode. That's pointless and just requires duplicate code. Provide generic storage for it. The new Kconfig symbol is intermediate and will be removed once all architectures are converted over. Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Tested-by: Vincenzo Frascino <vincenzo.frascino@arm.com> Reviewed-by: Vincenzo Frascino <vincenzo.frascino@arm.com> Link: https://lkml.kernel.org/r/20200207124403.028046322@linutronix.de
2020-02-01clocksource: Prevent double add_timer_on() for watchdog_timerKonstantin Khlebnikov1-2/+9
Kernel crashes inside QEMU/KVM are observed: kernel BUG at kernel/time/timer.c:1154! BUG_ON(timer_pending(timer) || !timer->function) in add_timer_on(). At the same time another cpu got: general protection fault: 0000 [#1] SMP PTI of poinson pointer 0xdead000000000200 in: __hlist_del at include/linux/list.h:681 (inlined by) detach_timer at kernel/time/timer.c:818 (inlined by) expire_timers at kernel/time/timer.c:1355 (inlined by) __run_timers at kernel/time/timer.c:1686 (inlined by) run_timer_softirq at kernel/time/timer.c:1699 Unfortunately kernel logs are badly scrambled, stacktraces are lost. Printing the timer->function before the BUG_ON() pointed to clocksource_watchdog(). The execution of clocksource_watchdog() can race with a sequence of clocksource_stop_watchdog() .. clocksource_start_watchdog(): expire_timers() detach_timer(timer, true); timer->entry.pprev = NULL; raw_spin_unlock_irq(&base->lock); call_timer_fn clocksource_watchdog() clocksource_watchdog_kthread() or clocksource_unbind() spin_lock_irqsave(&watchdog_lock, flags); clocksource_stop_watchdog(); del_timer(&watchdog_timer); watchdog_running = 0; spin_unlock_irqrestore(&watchdog_lock, flags); spin_lock_irqsave(&watchdog_lock, flags); clocksource_start_watchdog(); add_timer_on(&watchdog_timer, ...); watchdog_running = 1; spin_unlock_irqrestore(&watchdog_lock, flags); spin_lock(&watchdog_lock); add_timer_on(&watchdog_timer, ...); BUG_ON(timer_pending(timer) || !timer->function); timer_pending() -> true BUG() I.e. inside clocksource_watchdog() watchdog_timer could be already armed. Check timer_pending() before calling add_timer_on(). This is sufficient as all operations are synchronized by watchdog_lock. Fixes: 75c5158f70c0 ("timekeeping: Update clocksource with stop_machine") Signed-off-by: Konstantin Khlebnikov <khlebnikov@yandex-team.ru> Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Cc: stable@vger.kernel.org Link: https://lore.kernel.org/r/158048693917.4378.13823603769948933793.stgit@buzz
2019-06-14clocksource: Move inline keyword to the beginning of function declarationsMathieu Malaterre1-2/+2
The inline keyword was not at the beginning of the function declarations. Fix the following warnings triggered when using W=1: kernel/time/clocksource.c:108:1: warning: 'inline' is not at beginning of declaration [-Wold-style-declaration] kernel/time/clocksource.c:113:1: warning: 'inline' is not at beginning of declaration [-Wold-style-declaration] Signed-off-by: Mathieu Malaterre <malat@debian.org> Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Cc: trivial@kernel.org Cc: kernel-janitors@vger.kernel.org Cc: John Stultz <john.stultz@linaro.org> Cc: Stephen Boyd <sboyd@kernel.org> Link: https://lkml.kernel.org/r/20190524103339.28787-1-malat@debian.org
2018-11-23time: Remove license boilerplateThomas Gleixner1-14/+0
The SPDX identifier defines the license of the files already. No need for the boilerplates. Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Acked-by: Kees Cook <keescook@chromium.org> Acked-by: Ingo Molnar <mingo@kernel.org> Acked-by: John Stultz <john.stultz@linaro.org> Acked-by: Corey Minyard <cminyard@mvista.com> Acked-by: Paul E. McKenney <paulmck@linux.ibm.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Kate Stewart <kstewart@linuxfoundation.org> Cc: Philippe Ombredanne <pombredanne@nexb.com> Cc: Peter Anvin <hpa@zytor.com> Cc: Russell King <rmk+kernel@armlinux.org.uk> Cc: Richard Cochran <richardcochran@gmail.com> Cc: Nicolas Pitre <nicolas.pitre@linaro.org> Cc: David Riley <davidriley@chromium.org> Cc: Colin Cross <ccross@android.com> Cc: Mark Brown <broonie@kernel.org> Cc: Paul E. McKenney <paulmck@linux.vnet.ibm.com> Link: https://lkml.kernel.org/r/20181031182253.132458951@linutronix.de
2018-11-23time: Add SPDX license identifiersThomas Gleixner1-0/+1
Update the time(r) core files files with the correct SPDX license identifier based on the license text in the file itself. The SPDX identifier is a legally binding shorthand, which can be used instead of the full boiler plate text. This work is based on a script and data from Philippe Ombredanne, Kate Stewart and myself. The data has been created with two independent license scanners and manual inspection. The following files do not contain any direct license information and have been omitted from the big initial SPDX changes: timeconst.bc: The .bc files were not touched time.c, timer.c, timekeeping.c: Licence was deduced from EXPORT_SYMBOL_GPL As those files do not contain direct license references they fall under the project license, i.e. GPL V2 only. Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Acked-by: Kees Cook <keescook@chromium.org> Acked-by: Ingo Molnar <mingo@kernel.org> Acked-by: John Stultz <john.stultz@linaro.org> Acked-by: Corey Minyard <cminyard@mvista.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Kate Stewart <kstewart@linuxfoundation.org> Cc: Philippe Ombredanne <pombredanne@nexb.com> Cc: Russell King <rmk+kernel@armlinux.org.uk> Cc: Richard Cochran <richardcochran@gmail.com> Cc: Nicolas Pitre <nicolas.pitre@linaro.org> Cc: David Riley <davidriley@chromium.org> Cc: Colin Cross <ccross@android.com> Cc: Mark Brown <broonie@kernel.org> Cc: H. Peter Anvin <hpa@zytor.com> Cc: Paul E. McKenney <paulmck@linux.vnet.ibm.com> Link: https://lkml.kernel.org/r/20181031182252.879109557@linutronix.de
2018-11-23time: Remove useless filenames in top level commentsThomas Gleixner1-5/+0
Remove the pointless filenames in the top level comments. They have no value at all and just occupy space. While at it tidy up some of the comments and remove a stale one. Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Acked-by: Nicolas Pitre <nico@linaro.org> Acked-by: Kees Cook <keescook@chromium.org> Acked-by: Ingo Molnar <mingo@kernel.org> Acked-by: John Stultz <john.stultz@linaro.org> Acked-by: Corey Minyard <cminyard@mvista.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Kate Stewart <kstewart@linuxfoundation.org> Cc: Philippe Ombredanne <pombredanne@nexb.com> Cc: Peter Anvin <hpa@zytor.com> Cc: Russell King <rmk+kernel@armlinux.org.uk> Cc: Richard Cochran <richardcochran@gmail.com> Cc: "Paul E. McKenney" <paulmck@linux.vnet.ibm.com> Cc: David Riley <davidriley@chromium.org> Cc: Colin Cross <ccross@android.com> Cc: Mark Brown <broonie@kernel.org> Link: https://lkml.kernel.org/r/20181031182252.794898238@linutronix.de
2018-10-04clocksource: Provide clocksource_arch_init()Thomas Gleixner1-0/+2
Architectures have extra archdata in the clocksource, e.g. for VDSO support. There are no sanity checks or general initializations for this available. Add support for that. Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Acked-by: Andy Lutomirski <luto@kernel.org> Acked-by: John Stultz <john.stultz@linaro.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Matt Rickard <matt@softrans.com.au> Cc: Stephen Boyd <sboyd@kernel.org> Cc: Florian Weimer <fweimer@redhat.com> Cc: "K. Y. Srinivasan" <kys@microsoft.com> Cc: Vitaly Kuznetsov <vkuznets@redhat.com> Cc: devel@linuxdriverproject.org Cc: virtualization@lists.linux-foundation.org Cc: Paolo Bonzini <pbonzini@redhat.com> Cc: Arnd Bergmann <arnd@arndb.de> Cc: Juergen Gross <jgross@suse.com> Link: https://lkml.kernel.org/r/20180917130706.973042587@linutronix.de
2018-09-06clocksource: Revert "Remove kthread"Peter Zijlstra1-10/+30
I turns out that the silly spawn kthread from worker was actually needed. clocksource_watchdog_kthread() cannot be called directly from clocksource_watchdog_work(), because clocksource_select() calls timekeeping_notify() which uses stop_machine(). One cannot use stop_machine() from a workqueue() due lock inversions wrt CPU hotplug. Revert the patch but add a comment that explain why we jump through such apparently silly hoops. Fixes: 7197e77abcb6 ("clocksource: Remove kthread") Reported-by: Siegfried Metz <frame@mailbox.org> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Tested-by: Niklas Cassel <niklas.cassel@linaro.org> Tested-by: Kevin Shanahan <kevin@shanahan.id.au> Tested-by: viktor_jaegerskuepper@freenet.de Tested-by: Siegfried Metz <frame@mailbox.org> Cc: rafael.j.wysocki@intel.com Cc: len.brown@intel.com Cc: diego.viola@gmail.com Cc: rui.zhang@intel.com Cc: bjorn.andersson@linaro.org Link: https://lkml.kernel.org/r/20180905084158.GR24124@hirez.programming.kicks-ass.net
2018-07-19time: Introduce one suspend clocksource to compensate the suspend timeBaolin Wang1-0/+149
On some hardware with multiple clocksources, we have coarse grained clocksources that support the CLOCK_SOURCE_SUSPEND_NONSTOP flag, but which are less than ideal for timekeeping whereas other clocksources can be better candidates but halt on suspend. Currently, the timekeeping core only supports timing suspend using CLOCK_SOURCE_SUSPEND_NONSTOP clocksources if that clocksource is the current clocksource for timekeeping. As a result, some architectures try to implement read_persistent_clock64() using those non-stop clocksources, but isn't really ideal, which will introduce more duplicate code. To fix this, provide logic to allow a registered SUSPEND_NONSTOP clocksource, which isn't the current clocksource, to be used to calculate the suspend time. Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Ingo Molnar <mingo@kernel.org> Cc: Miroslav Lichvar <mlichvar@redhat.com> Cc: Richard Cochran <richardcochran@gmail.com> Cc: Prarit Bhargava <prarit@redhat.com> Cc: Stephen Boyd <sboyd@kernel.org> Cc: Daniel Lezcano <daniel.lezcano@linaro.org> Reviewed-by: Thomas Gleixner <tglx@linutronix.de> Reviewed-by: Daniel Lezcano <daniel.lezcano@linaro.org> Suggested-by: Thomas Gleixner <tglx@linutronix.de> Signed-off-by: Baolin Wang <baolin.wang@linaro.org> [jstultz: minor tweaks to merge with previous resume changes] Signed-off-by: John Stultz <john.stultz@linaro.org>
2018-05-16clocksource: Move inline keyword to the beginning of function declarationsMathieu Malaterre1-2/+2
The inline keyword was not at the beginning of the function declarations. Fix the following warnings triggered when using W=1: kernel/time/clocksource.c:456:1: warning: ‘inline’ is not at beginning of declaration [-Wold-style-declaration] kernel/time/clocksource.c:457:1: warning: ‘inline’ is not at beginning of declaration [-Wold-style-declaration] Signed-off-by: Mathieu Malaterre <malat@debian.org> Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Cc: Stephen Boyd <sboyd@kernel.org> Cc: John Stultz <john.stultz@linaro.org> Link: https://lkml.kernel.org/r/20180516195943.31924-1-malat@debian.org
2018-05-02clocksource: Remove kthreadPeter Zijlstra1-21/+10
The clocksource watchdog uses a work to spawn a kthread to run the watchdog. That is about as silly as it sounds, run the watchdog directly from the work. Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Reviewed-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com> Cc: len.brown@intel.com Cc: rjw@rjwysocki.net Cc: diego.viola@gmail.com Cc: rui.zhang@intel.com Link: https://lkml.kernel.org/r/20180430100344.713862818@infradead.org
2018-05-02clocksource: Rework stale commentPeter Zijlstra1-4/+2
AFAICS the hotplug code no longer uses this function. Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Reviewed-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com> Cc: len.brown@intel.com Cc: rjw@rjwysocki.net Cc: diego.viola@gmail.com Cc: rui.zhang@intel.com Link: https://lkml.kernel.org/r/20180430100344.656525644@infradead.org
2018-05-02clocksource: Consistent de-rate when marking unstablePeter Zijlstra1-1/+8
When a registered clocksource gets marked unstable the watchdog_kthread will de-rate and re-select the clocksource. Ensure it also de-rates when getting called on an unregistered clocksource. Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Reviewed-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com> Cc: len.brown@intel.com Cc: rjw@rjwysocki.net Cc: diego.viola@gmail.com Cc: rui.zhang@intel.com Cc: stable@vger.kernel.org Link: https://lkml.kernel.org/r/20180430100344.594904898@infradead.org
2018-05-02clocksource: Initialize cs->wd_listPeter Zijlstra1-0/+2
A number of places relies on list_empty(&cs->wd_list), however the list_head does not get initialized. Do so upon registration, such that thereafter it is possible to rely on list_empty() correctly reflecting the list membership status. Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Tested-by: Diego Viola <diego.viola@gmail.com> Reviewed-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com> Cc: stable@vger.kernel.org Cc: len.brown@intel.com Cc: rjw@rjwysocki.net Cc: rui.zhang@intel.com Link: https://lkml.kernel.org/r/20180430100344.472662715@infradead.org
2018-05-02clocksource: Allow clocksource_mark_unstable() on unregistered clocksourcesPeter Zijlstra1-16/+34
Because of how the code flips between tsc-early and tsc clocksources it might need to mark one or both unstable. The current code in mark_tsc_unstable() only worked because previously it registered the tsc clocksource once and then never touched it. Since it now unregisters the tsc-early clocksource, it needs to know if a clocksource got unregistered and the current cs->mult test doesn't work for that. Instead use list_empty(&cs->list) to test for registration. Furthermore, since clocksource_mark_unstable() needs to place the cs on the wd_list, it links the cs->list and cs->wd_list serialization. It must not see a clocsource registered (!empty cs->list) but already past dequeue_watchdog(). So place {en,de}queue{,_watchdog}() under the same lock. Provided cs->list is initialized to empty, this then allows us to unconditionally use clocksource_mark_unstable(), regardless of the registration state. Fixes: aa83c45762a2 ("x86/tsc: Introduce early tsc clocksource") Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Reviewed-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com> Tested-by: Diego Viola <diego.viola@gmail.com> Cc: len.brown@intel.com Cc: rjw@rjwysocki.net Cc: diego.viola@gmail.com Cc: rui.zhang@intel.com Cc: stable@vger.kernel.org Link: https://lkml.kernel.org/r/20180502135312.GS12217@hirez.programming.kicks-ass.net
2018-02-28clocksource: Use ATTRIBUTE_GROUPSBaolin Wang1-11/+10
Use ATTRIBUTE_GROUPS instead of manually creating the individual device files. Signed-off-by: Baolin Wang <baolin.wang@linaro.org> Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Cc: arnd@arndb.de Cc: sboyd@codeaurora.org Cc: broonie@kernel.org Cc: john.stultz@linaro.org Link: https://lkml.kernel.org/r/d80dccb981dc2461781ebb8d71a32ccdc1b0e6f9.1516167691.git.baolin.wang@linaro.org
2018-02-28clocksource: Use DEVICE_ATTR_RW/RO/WO to define device attributesBaolin Wang1-26/+17
Convert DEVICE_ATTR to DEVICE_ATTR_RW/RO/WO which is the preferred and simpler way of implementation. Signed-off-by: Baolin Wang <baolin.wang@linaro.org> Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Cc: arnd@arndb.de Cc: sboyd@codeaurora.org Cc: broonie@kernel.org Cc: john.stultz@linaro.org Link: https://lkml.kernel.org/r/8f35c77e753e957b61187e8e7b2e4a3d61e4a72b.1516167691.git.baolin.wang@linaro.org
2018-02-28clocksource: Don't walk the clocksource list for empty overrideBaolin Wang1-0/+4
If the override clocksource name is empty there is no point in walking the clocksource list for a match. Signed-off-by: Baolin Wang <baolin.wang@linaro.org> Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Cc: arnd@arndb.de Cc: sboyd@codeaurora.org Cc: broonie@kernel.org Cc: john.stultz@linaro.org Link: https://lkml.kernel.org/r/069ce2a605546bcad6552968cff755f0a03f9f10.1516167691.git.baolin.wang@linaro.org
2017-11-21treewide: setup_timer() -> timer_setup()Kees Cook1-2/+2
This converts all remaining cases of the old setup_timer() API into using timer_setup(), where the callback argument is the structure already holding the struct timer_list. These should have no behavioral changes, since they just change which pointer is passed into the callback with the same available pointers after conversion. It handles the following examples, in addition to some other variations. Casting from unsigned long: void my_callback(unsigned long data) { struct something *ptr = (struct something *)data; ... } ... setup_timer(&ptr->my_timer, my_callback, ptr); and forced object casts: void my_callback(struct something *ptr) { ... } ... setup_timer(&ptr->my_timer, my_callback, (unsigned long)ptr); become: void my_callback(struct timer_list *t) { struct something *ptr = from_timer(ptr, t, my_timer); ... } ... timer_setup(&ptr->my_timer, my_callback, 0); Direct function assignments: void my_callback(unsigned long data) { struct something *ptr = (struct something *)data; ... } ... ptr->my_timer.function = my_callback; have a temporary cast added, along with converting the args: void my_callback(struct timer_list *t) { struct something *ptr = from_timer(ptr, t, my_timer); ... } ... ptr->my_timer.function = (TIMER_FUNC_TYPE)my_callback; And finally, callbacks without a data assignment: void my_callback(unsigned long data) { ... } ... setup_timer(&ptr->my_timer, my_callback, 0); have their argument renamed to verify they're unused during conversion: void my_callback(struct timer_list *unused) { ... } ... timer_setup(&ptr->my_timer, my_callback, 0); The conversion is done with the following Coccinelle script: spatch --very-quiet --all-includes --include-headers \ -I ./arch/x86/include -I ./arch/x86/include/generated \ -I ./include -I ./arch/x86/include/uapi \ -I ./arch/x86/include/generated/uapi -I ./include/uapi \ -I ./include/generated/uapi --include ./include/linux/kconfig.h \ --dir . \ --cocci-file ~/src/data/timer_setup.cocci @fix_address_of@ expression e; @@ setup_timer( -&(e) +&e , ...) // Update any raw setup_timer() usages that have a NULL callback, but // would otherwise match change_timer_function_usage, since the latter // will update all function assignments done in the face of a NULL // function initialization in setup_timer(). @change_timer_function_usage_NULL@ expression _E; identifier _timer; type _cast_data; @@ ( -setup_timer(&_E->_timer, NULL, _E); +timer_setup(&_E->_timer, NULL, 0); | -setup_timer(&_E->_timer, NULL, (_cast_data)_E); +timer_setup(&_E->_timer, NULL, 0); | -setup_timer(&_E._timer, NULL, &_E); +timer_setup(&_E._timer, NULL, 0); | -setup_timer(&_E._timer, NULL, (_cast_data)&_E); +timer_setup(&_E._timer, NULL, 0); ) @change_timer_function_usage@ expression _E; identifier _timer; struct timer_list _stl; identifier _callback; type _cast_func, _cast_data; @@ ( -setup_timer(&_E->_timer, _callback, _E); +timer_setup(&_E->_timer, _callback, 0); | -setup_timer(&_E->_timer, &_callback, _E); +timer_setup(&_E->_timer, _callback, 0); | -setup_timer(&_E->_timer, _callback, (_cast_data)_E); +timer_setup(&_E->_timer, _callback, 0); | -setup_timer(&_E->_timer, &_callback, (_cast_data)_E); +timer_setup(&_E->_timer, _callback, 0); | -setup_timer(&_E->_timer, (_cast_func)_callback, _E); +timer_setup(&_E->_timer, _callback, 0); | -setup_timer(&_E->_timer, (_cast_func)&_callback, _E); +timer_setup(&_E->_timer, _callback, 0); | -setup_timer(&_E->_timer, (_cast_func)_callback, (_cast_data)_E); +timer_setup(&_E->_timer, _callback, 0); | -setup_timer(&_E->_timer, (_cast_func)&_callback, (_cast_data)_E); +timer_setup(&_E->_timer, _callback, 0); | -setup_timer(&_E._timer, _callback, (_cast_data)_E); +timer_setup(&_E._timer, _callback, 0); | -setup_timer(&_E._timer, _callback, (_cast_data)&_E); +timer_setup(&_E._timer, _callback, 0); | -setup_timer(&_E._timer, &_callback, (_cast_data)_E); +timer_setup(&_E._timer, _callback, 0); | -setup_timer(&_E._timer, &_callback, (_cast_data)&_E); +timer_setup(&_E._timer, _callback, 0); | -setup_timer(&_E._timer, (_cast_func)_callback, (_cast_data)_E); +timer_setup(&_E._timer, _callback, 0); | -setup_timer(&_E._timer, (_cast_func)_callback, (_cast_data)&_E); +timer_setup(&_E._timer, _callback, 0); | -setup_timer(&_E._timer, (_cast_func)&_callback, (_cast_data)_E); +timer_setup(&_E._timer, _callback, 0); | -setup_timer(&_E._timer, (_cast_func)&_callback, (_cast_data)&_E); +timer_setup(&_E._timer, _callback, 0); | _E->_timer@_stl.function = _callback; | _E->_timer@_stl.function = &_callback; | _E->_timer@_stl.function = (_cast_func)_callback; | _E->_timer@_stl.function = (_cast_func)&_callback; | _E._timer@_stl.function = _callback; | _E._timer@_stl.function = &_callback; | _E._timer@_stl.function = (_cast_func)_callback; | _E._timer@_stl.function = (_cast_func)&_callback; ) // callback(unsigned long arg) @change_callback_handle_cast depends on change_timer_function_usage@ identifier change_timer_function_usage._callback; identifier change_timer_function_usage._timer; type _origtype; identifier _origarg; type _handletype; identifier _handle; @@ void _callback( -_origtype _origarg +struct timer_list *t ) { ( ... when != _origarg _handletype *_handle = -(_handletype *)_origarg; +from_timer(_handle, t, _timer); ... when != _origarg | ... when != _origarg _handletype *_handle = -(void *)_origarg; +from_timer(_handle, t, _timer); ... when != _origarg | ... when != _origarg _handletype *_handle; ... when != _handle _handle = -(_handletype *)_origarg; +from_timer(_handle, t, _timer); ... when != _origarg | ... when != _origarg _handletype *_handle; ... when != _handle _handle = -(void *)_origarg; +from_timer(_handle, t, _timer); ... when != _origarg ) } // callback(unsigned long arg) without existing variable @change_callback_handle_cast_no_arg depends on change_timer_function_usage && !change_callback_handle_cast@ identifier change_timer_function_usage._callback; identifier change_timer_function_usage._timer; type _origtype; identifier _origarg; type _handletype; @@ void _callback( -_origtype _origarg +struct timer_list *t ) { + _handletype *_origarg = from_timer(_origarg, t, _timer); + ... when != _origarg - (_handletype *)_origarg + _origarg ... when != _origarg } // Avoid already converted callbacks. @match_callback_converted depends on change_timer_function_usage && !change_callback_handle_cast && !change_callback_handle_cast_no_arg@ identifier change_timer_function_usage._callback; identifier t; @@ void _callback(struct timer_list *t) { ... } // callback(struct something *handle) @change_callback_handle_arg depends on change_timer_function_usage && !match_callback_converted && !change_callback_handle_cast && !change_callback_handle_cast_no_arg@ identifier change_timer_function_usage._callback; identifier change_timer_function_usage._timer; type _handletype; identifier _handle; @@ void _callback( -_handletype *_handle +struct timer_list *t ) { + _handletype *_handle = from_timer(_handle, t, _timer); ... } // If change_callback_handle_arg ran on an empty function, remove // the added handler. @unchange_callback_handle_arg depends on change_timer_function_usage && change_callback_handle_arg@ identifier change_timer_function_usage._callback; identifier change_timer_function_usage._timer; type _handletype; identifier _handle; identifier t; @@ void _callback(struct timer_list *t) { - _handletype *_handle = from_timer(_handle, t, _timer); } // We only want to refactor the setup_timer() data argument if we've found // the matching callback. This undoes changes in change_timer_function_usage. @unchange_timer_function_usage depends on change_timer_function_usage && !change_callback_handle_cast && !change_callback_handle_cast_no_arg && !change_callback_handle_arg@ expression change_timer_function_usage._E; identifier change_timer_function_usage._timer; identifier change_timer_function_usage._callback; type change_timer_function_usage._cast_data; @@ ( -timer_setup(&_E->_timer, _callback, 0); +setup_timer(&_E->_timer, _callback, (_cast_data)_E); | -timer_setup(&_E._timer, _callback, 0); +setup_timer(&_E._timer, _callback, (_cast_data)&_E); ) // If we fixed a callback from a .function assignment, fix the // assignment cast now. @change_timer_function_assignment depends on change_timer_function_usage && (change_callback_handle_cast || change_callback_handle_cast_no_arg || change_callback_handle_arg)@ expression change_timer_function_usage._E; identifier change_timer_function_usage._timer; identifier change_timer_function_usage._callback; type _cast_func; typedef TIMER_FUNC_TYPE; @@ ( _E->_timer.function = -_callback +(TIMER_FUNC_TYPE)_callback ; | _E->_timer.function = -&_callback +(TIMER_FUNC_TYPE)_callback ; | _E->_timer.function = -(_cast_func)_callback; +(TIMER_FUNC_TYPE)_callback ; | _E->_timer.function = -(_cast_func)&_callback +(TIMER_FUNC_TYPE)_callback ; | _E._timer.function = -_callback +(TIMER_FUNC_TYPE)_callback ; | _E._timer.function = -&_callback; +(TIMER_FUNC_TYPE)_callback ; | _E._timer.function = -(_cast_func)_callback +(TIMER_FUNC_TYPE)_callback ; | _E._timer.function = -(_cast_func)&_callback +(TIMER_FUNC_TYPE)_callback ; ) // Sometimes timer functions are called directly. Replace matched args. @change_timer_function_calls depends on change_timer_function_usage && (change_callback_handle_cast || change_callback_handle_cast_no_arg || change_callback_handle_arg)@ expression _E; identifier change_timer_function_usage._timer; identifier change_timer_function_usage._callback; type _cast_data; @@ _callback( ( -(_cast_data)_E +&_E->_timer | -(_cast_data)&_E +&_E._timer | -_E +&_E->_timer ) ) // If a timer has been configured without a data argument, it can be // converted without regard to the callback argument, since it is unused. @match_timer_function_unused_data@ expression _E; identifier _timer; identifier _callback; @@ ( -setup_timer(&_E->_timer, _callback, 0); +timer_setup(&_E->_timer, _callback, 0); | -setup_timer(&_E->_timer, _callback, 0L); +timer_setup(&_E->_timer, _callback, 0); | -setup_timer(&_E->_timer, _callback, 0UL); +timer_setup(&_E->_timer, _callback, 0); | -setup_timer(&_E._timer, _callback, 0); +timer_setup(&_E._timer, _callback, 0); | -setup_timer(&_E._timer, _callback, 0L); +timer_setup(&_E._timer, _callback, 0); | -setup_timer(&_E._timer, _callback, 0UL); +timer_setup(&_E._timer, _callback, 0); | -setup_timer(&_timer, _callback, 0); +timer_setup(&_timer, _callback, 0); | -setup_timer(&_timer, _callback, 0L); +timer_setup(&_timer, _callback, 0); | -setup_timer(&_timer, _callback, 0UL); +timer_setup(&_timer, _callback, 0); | -setup_timer(_timer, _callback, 0); +timer_setup(_timer, _callback, 0); | -setup_timer(_timer, _callback, 0L); +timer_setup(_timer, _callback, 0); | -setup_timer(_timer, _callback, 0UL); +timer_setup(_timer, _callback, 0); ) @change_callback_unused_data depends on match_timer_function_unused_data@ identifier match_timer_function_unused_data._callback; type _origtype; identifier _origarg; @@ void _callback( -_origtype _origarg +struct timer_list *unused ) { ... when != _origarg } Signed-off-by: Kees Cook <keescook@chromium.org>
2017-11-21treewide: init_timer() -> setup_timer()Kees Cook1-2/+1
This mechanically converts all remaining cases of ancient open-coded timer setup with the old setup_timer() API, which is the first step in timer conversions. This has no behavioral changes, since it ultimately just changes the order of assignment to fields of struct timer_list when finding variations of: init_timer(&t); f.function = timer_callback; t.data = timer_callback_arg; to be converted into: setup_timer(&t, timer_callback, timer_callback_arg); The conversion is done with the following Coccinelle script, which is an improved version of scripts/cocci/api/setup_timer.cocci, in the following ways: - assignments-before-init_timer() cases - limit the .data case removal to the specific struct timer_list instance - handling calls by dereference (timer->field vs timer.field) spatch --very-quiet --all-includes --include-headers \ -I ./arch/x86/include -I ./arch/x86/include/generated \ -I ./include -I ./arch/x86/include/uapi \ -I ./arch/x86/include/generated/uapi -I ./include/uapi \ -I ./include/generated/uapi --include ./include/linux/kconfig.h \ --dir . \ --cocci-file ~/src/data/setup_timer.cocci @fix_address_of@ expression e; @@ init_timer( -&(e) +&e , ...) // Match the common cases first to avoid Coccinelle parsing loops with // "... when" clauses. @match_immediate_function_data_after_init_timer@ expression e, func, da; @@ -init_timer +setup_timer ( \(&e\|e\) +, func, da ); ( -\(e.function\|e->function\) = func; -\(e.data\|e->data\) = da; | -\(e.data\|e->data\) = da; -\(e.function\|e->function\) = func; ) @match_immediate_function_data_before_init_timer@ expression e, func, da; @@ ( -\(e.function\|e->function\) = func; -\(e.data\|e->data\) = da; | -\(e.data\|e->data\) = da; -\(e.function\|e->function\) = func; ) -init_timer +setup_timer ( \(&e\|e\) +, func, da ); @match_function_and_data_after_init_timer@ expression e, e2, e3, e4, e5, func, da; @@ -init_timer +setup_timer ( \(&e\|e\) +, func, da ); ... when != func = e2 when != da = e3 ( -e.function = func; ... when != da = e4 -e.data = da; | -e->function = func; ... when != da = e4 -e->data = da; | -e.data = da; ... when != func = e5 -e.function = func; | -e->data = da; ... when != func = e5 -e->function = func; ) @match_function_and_data_before_init_timer@ expression e, e2, e3, e4, e5, func, da; @@ ( -e.function = func; ... when != da = e4 -e.data = da; | -e->function = func; ... when != da = e4 -e->data = da; | -e.data = da; ... when != func = e5 -e.function = func; | -e->data = da; ... when != func = e5 -e->function = func; ) ... when != func = e2 when != da = e3 -init_timer +setup_timer ( \(&e\|e\) +, func, da ); @r1 exists@ expression t; identifier f; position p; @@ f(...) { ... when any init_timer@p(\(&t\|t\)) ... when any } @r2 exists@ expression r1.t; identifier g != r1.f; expression e8; @@ g(...) { ... when any \(t.data\|t->data\) = e8 ... when any } // It is dangerous to use setup_timer if data field is initialized // in another function. @script:python depends on r2@ p << r1.p; @@ cocci.include_match(False) @r3@ expression r1.t, func, e7; position r1.p; @@ ( -init_timer@p(&t); +setup_timer(&t, func, 0UL); ... when != func = e7 -t.function = func; | -t.function = func; ... when != func = e7 -init_timer@p(&t); +setup_timer(&t, func, 0UL); | -init_timer@p(t); +setup_timer(t, func, 0UL); ... when != func = e7 -t->function = func; | -t->function = func; ... when != func = e7 -init_timer@p(t); +setup_timer(t, func, 0UL); ) Signed-off-by: Kees Cook <keescook@chromium.org>
2017-05-15x86/tsc, sched/clock, clocksource: Use clocksource watchdog to provide ↵Peter Zijlstra1-0/+3
stable sync points Currently we keep sched_clock_tick() active for stable TSC in order to keep the per-CPU state semi up-to-date. The (obvious) problem is that by the time we detect TSC is borked, our per-CPU state is also borked. So hook into the clocksource watchdog and call a method after we've found it to still be stable. There's the obvious race where the TSC goes wonky between finding it stable and us running the callback, but closing that is too much work and not really worth it, since we're already detecting TSC wobbles after the fact, so we cannot, per definition, fully avoid funny clock values. And since the watchdog runs less often than the tick, this is also an optimization. Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Mike Galbraith <efault@gmx.de> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: linux-kernel@vger.kernel.org Signed-off-by: Ingo Molnar <mingo@kernel.org>
2017-01-14sched/clock, clocksource: Add optional cs::mark_unstable() methodThomas Gleixner1-0/+4
PeterZ reported that we'd fail to mark the TSC unstable when the clocksource watchdog finds it unsuitable. Allow a clocksource to run a custom action when its being marked unstable and hook up the TSC unstable code. Reported-by: Peter Zijlstra (Intel) <peterz@infradead.org> Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Mike Galbraith <efault@gmx.de> Cc: Peter Zijlstra <peterz@infradead.org> Cc: linux-kernel@vger.kernel.org Signed-off-by: Ingo Molnar <mingo@kernel.org>
2016-12-25clocksource: Use a plain u64 instead of cycle_tThomas Gleixner1-1/+1
There is no point in having an extra type for extra confusion. u64 is unambiguous. Conversion was done with the following coccinelle script: @rem@ @@ -typedef u64 cycle_t; @fix@ typedef cycle_t; @@ -cycle_t +u64 Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Cc: Peter Zijlstra <peterz@infradead.org> Cc: John Stultz <john.stultz@linaro.org>
2016-12-07clocksource: export the clocks_calc_mult_shift to use by timestamp codeMurali Karicheri1-0/+1
The CPSW CPTS driver is capable of doing timestamping on tx/rx packets and requires to know mult and shift factors for timestamp conversion from raw value to nanoseconds (ptp clock). Now these mult and shift factors are calculated manually and provided through DT, which makes very hard to support of a lot number of platforms, especially if CPTS refclk is not the same for some kind of boards and depends on efuse settings (Keystone 2 platforms). Hence, export clocks_calc_mult_shift() to allow drivers like CPSW CPTS (and other ptp drivesr) to benefit from automaitc calculation of mult and shift factors. Cc: John Stultz <john.stultz@linaro.org> Signed-off-by: Murali Karicheri <m-karicheri2@ti.com> Signed-off-by: Grygorii Strashko <grygorii.strashko@ti.com> Acked-by: Thomas Gleixner <tglx@linutronix.de> Signed-off-by: David S. Miller <davem@davemloft.net>
2016-08-31clocksource: Defer override invalidation unless clock is unstableKyle Walker1-3/+12
Clocksources don't get the VALID_FOR_HRES flag until they have been checked by a watchdog. However, when using an override, the clocksource_select logic will clear the override value if the clocksource is not marked VALID_FOR_HRES during that inititial check. When using the boot arguments clocksource=<foo>, this selection can run before the watchdog, and can cause the override to be incorrectly cleared. To address this condition, the override_name is only invalidated for unstable clocksources. Otherwise, the override is left intact until after the watchdog has validated the clocksource as stable/unstable. Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Ingo Molnar <mingo@kernel.org> Cc: Richard Cochran <richardcochran@gmail.com> Cc: Prarit Bhargava <prarit@redhat.com> Cc: Martin Schwidefsky <schwidefsky@de.ibm.com> Signed-off-by: Kyle Walker <kwalker@redhat.com> Signed-off-by: John Stultz <john.stultz@linaro.org>
2016-06-20clocksource: Make clocksource insert entry more efficientMinfei Huang1-3/+5
In clocksource_enqueue(), it is unnecessary to continue looping the list, if we find there is an entry that the value of rating is smaller than the new one. It is safe to be out the loop, because all of entry are inserted in descending order. Cc: Prarit Bhargava <prarit@redhat.com> Cc: Richard Cochran <richardcochran@gmail.com> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Ingo Molnar <mingo@kernel.org> Signed-off-by: Minfei Huang <mnghuan@gmail.com> Signed-off-by: John Stultz <john.stultz@linaro.org>
2016-01-27clocksource: Allow unregistering the watchdogVitaly Kuznetsov1-10/+42
Hyper-V vmbus module registers TSC page clocksource when loaded. This is the clocksource with the highest rating and thus it becomes the watchdog making unloading of the vmbus module impossible. Separate clocksource_select_watchdog() from clocksource_enqueue_watchdog() and use it on clocksource register/rating change/unregister. After all, lobotomized monkeys may need some love too. Signed-off-by: Vitaly Kuznetsov <vkuznets@redhat.com> Cc: John Stultz <john.stultz@linaro.org> Cc: Dexuan Cui <decui@microsoft.com> Cc: K. Y. Srinivasan <kys@microsoft.com> Link: http://lkml.kernel.org/r/1453483913-25672-1-git-send-email-vkuznets@redhat.com Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
2015-12-07clocksource: Add CPU info to clocksource watchdog reportingSeiichi Ikarashi1-2/+2
The clocksource watchdog reporting was improved by 0b046b217ad4c6. I want to add the info of CPU where the watchdog detects a deviation because it is necessary to identify the trouble spot if the clocksource is TSC. Signed-off-by: Seiichi Ikarashi <s.ikarashi@jp.fujitsu.com> [jstultz: Tweaked commit message] Signed-off-by: John Stultz <john.stultz@linaro.org>
2015-11-09remove abs64()Andrew Morton1-1/+1
Switch everything to the new and more capable implementation of abs(). Mainly to give the new abs() a bit of a workout. Cc: Michal Nazarewicz <mina86@mina86.com> Cc: John Stultz <john.stultz@linaro.org> Cc: Ingo Molnar <mingo@kernel.org> Cc: Steven Rostedt <rostedt@goodmis.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Masami Hiramatsu <masami.hiramatsu.pt@hitachi.com> Cc: Peter Zijlstra <a.p.zijlstra@chello.nl> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2015-10-12Merge tag 'v4.3-rc5' into timers/core, to pick up fixes before applying new ↵Ingo Molnar1-1/+1
changes Signed-off-by: Ingo Molnar <mingo@kernel.org>
2015-10-11clocksource: Remove return statement from void functionsGuillaume Gomez1-3/+2
Signed-off-by: Guillaume Gomez <guillaume1.gomez@gmail.com> Cc: John Stultz <john.stultz@linaro.org> Link: http://lkml.kernel.org/r/CAAOQCfSDgmqSWDBsetau%2ByF8x0%2BDagCF_pfFw0p5xH_BKkKEog@mail.gmail.com Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
2015-10-02clocksource: Fix abs() usage w/ 64bit valuesJohn Stultz1-1/+1
This patch fixes one cases where abs() was being used with 64-bit nanosecond values, where the result may be capped at 32-bits. This potentially could cause watchdog false negatives on 32-bit systems, so this patch addresses the issue by using abs64(). Signed-off-by: John Stultz <john.stultz@linaro.org> Cc: Prarit Bhargava <prarit@redhat.com> Cc: Richard Cochran <richardcochran@gmail.com> Cc: Ingo Molnar <mingo@kernel.org> Cc: stable@vger.kernel.org Link: http://lkml.kernel.org/r/1442279124-7309-2-git-send-email-john.stultz@linaro.org Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
2015-09-22time: Fix spelling in commentsZhen Lei1-1/+1
Signed-off-by: Zhen Lei <thunder.leizhen@huawei.com> Cc: Hanjun Guo <guohanjun@huawei.com> Cc: John Stultz <john.stultz@linaro.org> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Rafael J. Wysocki <rafael.j.wysocki@intel.com> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Tianhong Ding <dingtianhong@huawei.com> Cc: Viresh Kumar <viresh.kumar@linaro.org> Cc: Xinwei Hu <huxinwei@huawei.com> Cc: Xunlei Pang <pang.xunlei@linaro.org> Cc: Zefan Li <lizefan@huawei.com> Link: http://lkml.kernel.org/r/1440484973-13892-1-git-send-email-thunder.leizhen@huawei.com [ Fixed yet another typo in one of the sentences fixed. ] Signed-off-by: Ingo Molnar <mingo@kernel.org>
2015-06-10clocksource: Use current logging styleJoe Perches1-12/+12
clocksource messages aren't prefixed in dmesg so it's a bit unclear what subsystem emits the messages. Use pr_fmt and pr_<level> to auto-prefix the messages appropriately. Miscellanea: o Remove "Warning" from KERN_WARNING level messages o Align "timekeeping watchdog: " messages o Coalesce formats o Align multiline arguments Signed-off-by: Joe Perches <joe@perches.com> Cc: John Stultz <john.stultz@linaro.org> Link: http://lkml.kernel.org/r/1432579795.2846.75.camel@perches.com Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
2015-04-03clocksource: Improve comment explaining clocks_calc_max_nsecs()'s 50% safety ↵John Stultz1-2/+5
margin Ingo noted that the description of clocks_calc_max_nsecs()'s 50% safety margin was somewhat circular. So this patch tries to improve the comment to better explain what we mean by the 50% safety margin and why we need it. Signed-off-by: John Stultz <john.stultz@linaro.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Prarit Bhargava <prarit@redhat.com> Cc: Richard Cochran <richardcochran@gmail.com> Cc: Thomas Gleixner <tglx@linutronix.de> Link: http://lkml.kernel.org/r/1427945681-29972-20-git-send-email-john.stultz@linaro.org Signed-off-by: Ingo Molnar <mingo@kernel.org>
2015-04-01tick: Move core only declarations and functions to coreThomas Gleixner1-1/+1
No point to expose everything to the world. People just believe such functions can be abused for whatever purposes. Sigh. Signed-off-by: Thomas Gleixner <tglx@linutronix.de> [ Rebased on top of 4.0-rc5 ] Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com> Cc: Nicolas Pitre <nico@linaro.org> Cc: Peter Zijlstra <peterz@infradead.org> Link: http://lkml.kernel.org/r/28017337.VbCUc39Gme@vostro.rjw.lan [ Merged to latest timers/core ] Signed-off-by: Ingo Molnar <mingo@kernel.org>
2015-04-01tick: Move clocksource related stuff to timekeeping.hThomas Gleixner1-1/+1
Move clocksource related stuff to timekeeping.h and remove the pointless include from ntp.c Signed-off-by: Thomas Gleixner <tglx@linutronix.de> [ rjw: Subject ] Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com> Cc: Peter Zijlstra <peterz@infradead.org> Link: http://lkml.kernel.org/r/2714218.nM5AEfAHj0@vostro.rjw.lan Signed-off-by: Ingo Molnar <mingo@kernel.org>
2015-03-13clocksource: Rename __clocksource_updatefreq_*() to ↵John Stultz1-5/+6
__clocksource_update_freq_*() Ingo requested this function be renamed to improve readability, so I've renamed __clocksource_updatefreq_scale() as well as the __clocksource_updatefreq_hz/khz() functions to avoid squishedtogethernames. This touches some of the sh clocksources, which I've not tested. The arch/arm/plat-omap change is just a comment change for consistency. Signed-off-by: John Stultz <john.stultz@linaro.org> Cc: Daniel Lezcano <daniel.lezcano@linaro.org> Cc: Dave Jones <davej@codemonkey.org.uk> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Prarit Bhargava <prarit@redhat.com> Cc: Richard Cochran <richardcochran@gmail.com> Cc: Stephen Boyd <sboyd@codeaurora.org> Cc: Thomas Gleixner <tglx@linutronix.de> Link: http://lkml.kernel.org/r/1426133800-29329-13-git-send-email-john.stultz@linaro.org Signed-off-by: Ingo Molnar <mingo@kernel.org>
2015-03-13clocksource: Add some debug info about clocksources being registeredJohn Stultz1-0/+3
Print the mask, max_cycles, and max_idle_ns values for clocksources being registered. Signed-off-by: John Stultz <john.stultz@linaro.org> Cc: Dave Jones <davej@codemonkey.org.uk> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Prarit Bhargava <prarit@redhat.com> Cc: Richard Cochran <richardcochran@gmail.com> Cc: Stephen Boyd <sboyd@codeaurora.org> Cc: Thomas Gleixner <tglx@linutronix.de> Link: http://lkml.kernel.org/r/1426133800-29329-12-git-send-email-john.stultz@linaro.org Signed-off-by: Ingo Molnar <mingo@kernel.org>
2015-03-13clocksource: Mostly kill clocksource_register()John Stultz1-47/+34
A long running project has been to clean up remaining uses of clocksource_register(), replacing it with the simpler clocksource_register_khz/hz() functions. However, there are a few cases where we need to self-define our mult/shift values, so switch the function to a more obviously internal __clocksource_register() name, and consolidate much of the internal logic so we don't have duplication. Signed-off-by: John Stultz <john.stultz@linaro.org> Cc: Dave Jones <davej@codemonkey.org.uk> Cc: David S. Miller <davem@davemloft.net> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Martin Schwidefsky <schwidefsky@de.ibm.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Prarit Bhargava <prarit@redhat.com> Cc: Richard Cochran <richardcochran@gmail.com> Cc: Stephen Boyd <sboyd@codeaurora.org> Cc: Thomas Gleixner <tglx@linutronix.de> Link: http://lkml.kernel.org/r/1426133800-29329-10-git-send-email-john.stultz@linaro.org [ Minor cleanups. ] Signed-off-by: Ingo Molnar <mingo@kernel.org>
2015-03-13clocksource: Improve clocksource watchdog reportingJohn Stultz1-9/+9
The clocksource watchdog reporting has been less helpful then desired, as it just printed the delta between the two clocksources. This prevents any useful analysis of why the skew occurred. Thus this patch tries to improve the output when we mark a clocksource as unstable, printing out the cycle last and now values for both the current clocksource and the watchdog clocksource. This will allow us to see if the result was due to a false positive caused by a problematic watchdog. Signed-off-by: John Stultz <john.stultz@linaro.org> Cc: Dave Jones <davej@codemonkey.org.uk> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Prarit Bhargava <prarit@redhat.com> Cc: Richard Cochran <richardcochran@gmail.com> Cc: Stephen Boyd <sboyd@codeaurora.org> Cc: Thomas Gleixner <tglx@linutronix.de> Link: http://lkml.kernel.org/r/1426133800-29329-9-git-send-email-john.stultz@linaro.org [ Minor cleanups of kernel messages. ] Signed-off-by: Ingo Molnar <mingo@kernel.org>
2015-03-12clocksource: Add 'max_cycles' to 'struct clocksource'John Stultz1-12/+16
In order to facilitate clocksource validation, add a 'max_cycles' field to the clocksource structure which will hold the maximum cycle value that can safely be multiplied without potentially causing an overflow. Signed-off-by: John Stultz <john.stultz@linaro.org> Cc: Dave Jones <davej@codemonkey.org.uk> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Prarit Bhargava <prarit@redhat.com> Cc: Richard Cochran <richardcochran@gmail.com> Cc: Stephen Boyd <sboyd@codeaurora.org> Cc: Thomas Gleixner <tglx@linutronix.de> Link: http://lkml.kernel.org/r/1426133800-29329-4-git-send-email-john.stultz@linaro.org Signed-off-by: Ingo Molnar <mingo@kernel.org>
2015-03-12clocksource: Simplify the logic around clocksource wrapping safety marginsJohn Stultz1-14/+12
The clocksource logic has a number of places where we try to include a safety margin. Most of these are 12% safety margins, but they are inconsistently applied and sometimes are applied on top of each other. Additionally, in the previous patch, we corrected an issue where we unintentionally in effect created a 50% safety margin, which these 12.5% margins where then added to. So to simplify the logic here, this patch removes the various 12.5% margins, and consolidates adding the margin in one place: clocks_calc_max_nsecs(). Additionally, Linus prefers a 50% safety margin, as it allows bad clock values to be more easily caught. This should really have no net effect, due to the corrected issue earlier which caused greater then 50% margins to be used w/o issue. Signed-off-by: John Stultz <john.stultz@linaro.org> Acked-by: Stephen Boyd <sboyd@codeaurora.org> (for the sched_clock.c bit) Cc: Dave Jones <davej@codemonkey.org.uk> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Prarit Bhargava <prarit@redhat.com> Cc: Richard Cochran <richardcochran@gmail.com> Cc: Thomas Gleixner <tglx@linutronix.de> Link: http://lkml.kernel.org/r/1426133800-29329-3-git-send-email-john.stultz@linaro.org Signed-off-by: Ingo Molnar <mingo@kernel.org>
2015-03-12clocksource: Simplify the clocks_calc_max_nsecs() logicJohn Stultz1-12/+3
The previous clocks_calc_max_nsecs() code had some unecessarily complex bit logic to find the max interval that could cause multiplication overflows. Since this is not in the hot path, just do the divide to make it easier to read. The previous implementation also had a subtle issue that it avoided overflows with signed 64-bit values, where as the intervals are always unsigned. This resulted in overly conservative intervals, which other safety margins were then added to, reducing the intended interval length. Signed-off-by: John Stultz <john.stultz@linaro.org> Cc: Dave Jones <davej@codemonkey.org.uk> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Prarit Bhargava <prarit@redhat.com> Cc: Richard Cochran <richardcochran@gmail.com> Cc: Stephen Boyd <sboyd@codeaurora.org> Cc: Thomas Gleixner <tglx@linutronix.de> Link: http://lkml.kernel.org/r/1426133800-29329-2-git-send-email-john.stultz@linaro.org Signed-off-by: Ingo Molnar <mingo@kernel.org>
2014-12-30time: move the timecounter/cyclecounter code into its own file.Richard Cochran1-76/+0
The timecounter code has almost nothing to do with the clocksource code. Let it live in its own file. This will help isolate the timecounter users from the clocksource users in the source tree. Signed-off-by: Richard Cochran <richardcochran@gmail.com> Acked-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2014-10-29clocksource: Fix 'clcoksource' typo in commentJames Hartley1-1/+1
Simple typo in a comment, so fix it. Signed-off-by: James Hartley <james.hartley@imgtec.com> Signed-off-by: Jiri Kosina <jkosina@suse.cz>
2014-07-23clocksource: Make delta calculation a functionThomas Gleixner1-5/+7
We want to move the TSC sanity check into core code to make NMI safe accessors to clock monotonic[_raw] possible. For this we need to sanity check the delta calculation. Create a helper function and convert all sites to use it. [ Build fix from jstultz ] Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Signed-off-by: John Stultz <john.stultz@linaro.org>
2013-10-18time: Fix signedness bug in sysfs_get_uname() and its callersPatrick Palka1-1/+1
sysfs_get_uname() is erroneously declared as returning size_t even though it may return a negative value, specifically -EINVAL. Its callers then check whether its return value is less than zero and indeed that is never the case for size_t. This patch changes sysfs_get_uname() to return ssize_t and makes sure its callers use ssize_t accordingly. Signed-off-by: Patrick Palka <patrick@parcs.ath.cx> [jstultz: Didn't apply cleanly, as a similar partial fix was also applied so had to resolve the collisions] Signed-off-by: John Stultz <john.stultz@linaro.org>
2013-09-17clocksource: Fix 'ret' data type of sysfs_override_clocksource() and ↵Elad Wexler1-2/+2
sysfs_unbind_clocksource() sysfs_override_clocksource(): The expression 'if (ret >= 0)' is always true. This will cause clocksource_select() to always run. Thus modified ret to be of type ssize_t. sysfs_unbind_clocksource(): The expression 'if (ret < 0)' is always false. So in case sysfs_get_uname() failed, the expression won't take an effect. Thus modified ret to be of type ssize_t. Signed-off-by: Elad Wexler <elad.wexler@gmail.com> Signed-off-by: John Stultz <john.stultz@linaro.org>
2013-09-16Merge branch 'fortglx/3.12/time' into fortglx/3.13/timeJohn Stultz1-0/+1
Merge in the timekeeping changes that missed 3.12 Signed-off-by: John Stultz <john.stultz@linaro.org>
2013-07-30clocksource: Extract max nsec calculation into separate functionStephen Boyd1-15/+30
We need to calculate the same number in the clocksource code and the sched_clock code, so extract this code into its own function. We also drop the min_t and just use min() because the two types are the same. Signed-off-by: Stephen Boyd <sboyd@codeaurora.org> Signed-off-by: John Stultz <john.stultz@linaro.org>
2013-07-22clocksource: Fix !CONFIG_CLOCKSOURCE_WATCHDOG compilePrarit Bhargava1-0/+1
If I explicitly disable the clocksource watchdog in the x86 Kconfig, the x86 kernel will not compile unless this is properly defined. Cc: John Stultz <john.stultz@linaro.org> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: x86@kernel.org Signed-off-by: Prarit Bhargava <prarit@redhat.com> Signed-off-by: John Stultz <john.stultz@linaro.org>
2013-07-05clocksource: Reselect clocksource when watchdog validated high-res capabilityThomas Gleixner1-15/+42
Up to commit 5d33b883a (clocksource: Always verify highres capability) we had no sanity check when selecting a clocksource, which prevented that a non highres capable clocksource is used when the system already switched to highres/nohz mode. The new sanity check works as Alex and Tim found out. It prevents the TSC from being used. This happens because on x86 the boot process looks like this: tsc_start_freqency_validation(TSC); clocksource_register(HPET); clocksource_done_booting(); clocksource_select() Selects HPET which is valid for high-res switch_to_highres(); clocksource_register(TSC); TSC is not selected, because it is not yet flagged as VALID_HIGH_RES clocksource_watchdog() Validates TSC for highres, but that does not make TSC the current clocksource. Before the sanity check was added, we installed TSC unvalidated which worked most of the time. If the TSC was really detected as unstable, then the unstable logic removed it and installed HPET again. The sanity check is correct and needed. So the watchdog needs to kick a reselection of the clocksource, when it qualifies TSC as a valid high res clocksource. To solve this, we mark the clocksource which got the flag CLOCK_SOURCE_VALID_FOR_HRES set by the watchdog with an new flag CLOCK_SOURCE_RESELECT and trigger the watchdog thread. The watchdog thread evaluates the flag and invokes clocksource_select() when set. To avoid that the clocksource_done_booting() code, which is about to install the first real clocksource anyway, needs to go through clocksource_select and tick_oneshot_notify() pointlessly, split out the clocksource_watchdog_kthread() list walk code and invoke the select/notify only when called from clocksource_watchdog_kthread(). So clocksource_done_booting() can utilize the same splitout code without the select/notify invocation and the clocksource_mutex unlock/relock dance. Reported-and-tested-by: Alex Shi <alex.shi@intel.com> Cc: Hans Peter Anvin <hpa@linux.intel.com> Cc: Tim Chen <tim.c.chen@linux.intel.com> Cc: Andi Kleen <andi.kleen@intel.com> Tested-by: Peter Zijlstra <a.p.zijlstra@chello.nl> Cc: Ingo Molnar <mingo@kernel.org> Cc: Davidlohr Bueso <davidlohr.bueso@hp.com> Cc: John Stultz <john.stultz@linaro.org> Link: http://lkml.kernel.org/r/alpine.DEB.2.02.1307042239150.11637@ionos.tec.linutronix.de Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
2013-05-28clocksource: Implement clocksource_select_fallback() for ↵Thomas Gleixner1-0/+1
CONFIG_ARCH_USES_GETTIMEOFFSET=y commit 7eaeb34305 (clocksource: Provide unbind interface in sysfs) implemented clocksource_select_fallback() which is not defined for CONFIG_ARCH_USES_GETTIMEOFFSET=y. Add an empty inline function for that. Reported-by: Ingo Molnar <mingo@kernel.org> Reported-by: fengguang.wu@intel.com Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
2013-05-16clockevents: Implement unbind functionalityThomas Gleixner1-4/+5
Provide a sysfs interface to allow unbinding of clockevent devices. The device is unbound if it is unused or if there is a replacement device available. Unbinding of broadcast devices is not supported as we don't want to foster that nonsense. If no replacement device is available the unbind returns -EBUSY. Unbind is available from the kernel and through sysfs, which is necessary to drop the module refcount. Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Cc: John Stultz <john.stultz@linaro.org> Cc: Magnus Damm <magnus.damm@gmail.com> Link: http://lkml.kernel.org/r/20130425143436.499216659@linutronix.de Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
2013-05-16clocksource: Let clocksource_unregister() return success/errorThomas Gleixner1-21/+12
The unregister call can fail, if the clocksource is the current one and there is no replacement clocksource available. It can also fail, if the clocksource is the watchdog clocksource and I'm not going to provide support for this. Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Cc: John Stultz <john.stultz@linaro.org> Cc: Magnus Damm <magnus.damm@gmail.com> Link: http://lkml.kernel.org/r/20130425143436.029915527@linutronix.de Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
2013-05-16clocksource: Provide unbind interface in sysfsThomas Gleixner1-0/+73
With the module refcount held for the current clocksource there is no way to unload the module. Provide a sysfs interface which allows to unbind the clocksource. One could argue that the clocksource override could be (ab)used to do so, but the clocksource override cannot be used from the kernel itself, while an unbind function can be used to programmatically check whether a clocksource can be shutdown or not. The unbind functionality uses the new skip current feature of clocksource_select and verifies that a fallback clocksource has been installed. If the clocksource which should be unbound is the current clocksource and no fallback can be found, unbind returns -EBUSY. This does not support the unbinding of a clocksource which is used as the watchdog clocksource. No point in fostering crappy hardware. Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Cc: John Stultz <john.stultz@linaro.org> Cc: Magnus Damm <magnus.damm@gmail.com> Link: http://lkml.kernel.org/r/20130425143435.964218245@linutronix.de Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
2013-05-16clocksource: Split out user string inputThomas Gleixner1-14/+23
Split out the user string input for clocksource override. Preparatory patch for unbind. [ jstultz: Fix an off by one error ] Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Cc: John Stultz <john.stultz@linaro.org> Cc: Magnus Damm <magnus.damm@gmail.com> Link: http://lkml.kernel.org/r/20130425143435.895851338@linutronix.de Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
2013-05-16clocksource: Allow clocksource select to skip current clocksourceThomas Gleixner1-11/+20
Preparatory patch for clocksource unbind support. Split out code from clocksource_select and modify it, so it skips the current clocksource on request and tries to find a fallback clocksource. Convert all existing users. No functional change. Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Cc: John Stultz <john.stultz@linaro.org> Cc: Magnus Damm <magnus.damm@gmail.com> Link: http://lkml.kernel.org/r/20130425143435.834965397@linutronix.de Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
2013-05-16clocksource: Let timekeeping_notify return success/errorThomas Gleixner1-3/+3
timekeeping_notify() can fail due cs->enable() failure. Though the caller does not notice and happily keeps the wrong clocksource as the current one. Let the caller know about failure, so the current clocksource will be shown correctly in sysfs. Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Acked-by: John Stultz <john.stultz@linaro.org> Cc: Magnus Damm <magnus.damm@gmail.com> Link: http://lkml.kernel.org/r/20130425143435.696321912@linutronix.de Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
2013-05-16clocksource: Always verify highres capabilityThomas Gleixner1-5/+26
If a clocksource has a (wrong) high rating, but can't be used as a timebase for oneshot tick mode, it is unconditionally selected even when the system is already in oneshot tick mode. This causes full system failure. Verify the clocksource selection against the oneshot mode. Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Acked-by: John Stultz <john.stultz@linaro.org> Cc: Magnus Damm <magnus.damm@gmail.com> Link: http://lkml.kernel.org/r/20130425143435.635040849@linutronix.de Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
2012-03-23kernel-time: fix s/then/than/ spelling errorsJim Cromie1-1/+1
Use than for comparisons, like more than. CC: John Stultz <john.stultz@linaro.org> Signed-off-by: Jim Cromie <jim.cromie@gmail.com> Signed-off-by: John Stultz <john.stultz@linaro.org>
2012-01-06Merge branch 'driver-core-next' into Linux 3.2Greg Kroah-Hartman1-18/+19
This resolves the conflict in the arch/arm/mach-s3c64xx/s3c6400.c file, and it fixes the build error in the arch/x86/kernel/microcode_core.c file, that the merge did not catch. The microcode_core.c patch was provided by Stephen Rothwell <sfr@canb.auug.org.au> who was invaluable in the merge issues involved with the large sysdev removal process in the driver-core tree. Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
2011-12-19time/clocksource: Fix kernel-doc warningsKusanagi Kouichi1-3/+9
Fix various KernelDoc build warnings. Signed-off-by: Kusanagi Kouichi <slash@ac.auone-net.jp> Cc: John Stultz <johnstul@us.ibm.com> Link: http://lkml.kernel.org/r/20111219091320.0D5AF6FC03D@msa105.auone-net.jp Signed-off-by: Ingo Molnar <mingo@elte.hu>
2011-12-14clocksource: convert sysdev_class to a regular subsystemKay Sievers1-18/+19
After all sysdev classes are ported to regular driver core entities, the sysdev implementation will be entirely removed from the kernel. Cc: John Stultz <johnstul@us.ibm.com> Cc: Thomas Gleixner <tglx@linutronix.de> Signed-off-by: Kay Sievers <kay.sievers@vrfy.org> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
2011-12-01clocksource: Fix bug with max_deferment margin calculationYang Honggang (Joseph)1-2/+2
In order to leave a margin of 12.5% we should >> 3 not >> 5. CC: stable@kernel.org Signed-off-by: Yang Honggang (Joseph) <eagle.rtlinux@gmail.com> [jstultz: Modified commit subject] Signed-off-by: John Stultz <john.stultz@linaro.org>
2011-11-10clocksource: Avoid selecting mult values that might overflow when adjustedJohn Stultz1-10/+48
For some frequencies, the clocks_calc_mult_shift() function will unfortunately select mult values very close to 0xffffffff. This has the potential to overflow when NTP adjusts the clock, adding to the mult value. This patch adds a clocksource.maxadj value, which provides an approximation of an 11% adjustment(NTP limits adjustments to 500ppm and the tick adjustment is limited to 10%), which could be made to the clocksource.mult value. This is then used to both check that the current mult value won't overflow/underflow, as well as warning us if the timekeeping_adjust() code pushes over that 11% boundary. v2: Fix max_adjustment calculation, and improve WARN_ONCE messages. v3: Don't warn before maxadj has actually been set CC: Yong Zhang <yong.zhang0@gmail.com> CC: David Daney <ddaney.cavm@gmail.com> CC: Thomas Gleixner <tglx@linutronix.de> CC: Chen Jie <chenj@lemote.com> CC: zhangfx <zhangfx@lemote.com> CC: stable@kernel.org Reported-by: Chen Jie <chenj@lemote.com> Reported-by: zhangfx <zhangfx@lemote.com> Tested-by: Yong Zhang <yong.zhang0@gmail.com> Signed-off-by: John Stultz <john.stultz@linaro.org>
2011-09-13clocksource: Make watchdog reset locklessThomas Gleixner1-20/+18
KGDB needs to trylock watchdog_lock when trying to reset the clocksource watchdog after the system has been stopped to avoid a potential deadlock. When the trylock fails TSC usually becomes unstable. We can be more clever by using an atomic counter and checking it in the clocksource_watchdog callback. We restart the watchdog whenever the counter is > 0 and only decrement the counter when we ran through a full update cycle. Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Cc: John Stultz <johnstul@us.ibm.com> Acked-by: Jason Wessel <jason.wessel@windriver.com> Cc: Peter Zijlstra <peterz@infradead.org> Link: http://lkml.kernel.org/r/alpine.LFD.2.02.1109121326280.2723@ionos Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
2011-06-16clocksource: Make watchdog robust vs. interruptionThomas Gleixner1-11/+13
The clocksource watchdog code is interruptible and it has been observed that this can trigger false positives which disable the TSC. The reason is that an interrupt storm or a long running interrupt handler between the read of the watchdog source and the read of the TSC brings the two far enough apart that the delta is larger than the unstable treshold. Move both reads into a short interrupt disabled region to avoid that. Reported-and-tested-by: Vernon Mauery <vernux@us.ibm.com> Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Cc: stable@kernel.org
2011-05-20clockevents/source: Use u64 to make 32bit happyThomas Gleixner1-1/+1
unsigned long is not 64bit on 32bit machine. Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
2011-05-19Merge branch 'timers-clockevents-for-linus' of ↵Linus Torvalds1-19/+19
git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip * 'timers-clockevents-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip: x86: hpet: Cleanup the clockevents init and register code x86: Convert PIT to clockevents_config_and_register() clockevents: Provide interface to reconfigure an active clock event device clockevents: Provide combined configure and register function clockevents: Restructure clock_event_device members clocksource: Get rid of the hardcoded 5 seconds sleep time limit clocksource: Restructure clocksource struct members
2011-05-19clocksource: Get rid of the hardcoded 5 seconds sleep time limitThomas Gleixner1-19/+19
Slow clocksources can have a way longer sleep time than 5 seconds and even fast ones can easily cope with 600 seconds and still maintain proper accuracy. Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Cc: John Stultz <john.stultz@linaro.org> Reviewed-by: Ingo Molnar <mingo@elte.hu> Link: http://lkml.kernel.org/r/%3C20110518210136.109811585%40linutronix.de%3E
2011-05-05clocksource: Install completely before selectingjohn stultz1-2/+2
Christian Hoffmann reported that the command line clocksource override with acpi_pm timer fails: Kernel command line: <SNIP> clocksource=acpi_pm hpet clockevent registered Switching to clocksource hpet Override clocksource acpi_pm is not HRT compatible. Cannot switch while in HRT/NOHZ mode. The watchdog code is what enables CLOCK_SOURCE_VALID_FOR_HRES, but we actually end up selecting the clocksource before we enqueue it into the watchdog list, so that's why we see the warning and fail to switch to acpi_pm timer as requested. That's particularly bad when we want to debug timekeeping related problems in early boot. Put the selection call last. Reported-by: Christian Hoffmann <email@christianhoffmann.info> Signed-off-by: John Stultz <johnstul@us.ibm.com> Cc: stable@kernel.org # 32... Link: http://lkml.kernel.org/r/%3C1304558210.2943.24.camel%40work-vm%3E Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
2011-01-15Merge branches 'core-fixes-for-linus', 'x86-fixes-for-linus', ↵Linus Torvalds1-4/+4
'timers-fixes-for-linus' and 'perf-fixes-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip * 'core-fixes-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip: rcu: avoid pointless blocked-task warnings rcu: demote SRCU_SYNCHRONIZE_DELAY from kernel-parameter status rtmutex: Fix comment about why new_owner can be NULL in wake_futex_pi() * 'x86-fixes-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip: x86, olpc: Add missing Kconfig dependencies x86, mrst: Set correct APB timer IRQ affinity for secondary cpu x86: tsc: Fix calibration refinement conditionals to avoid divide by zero x86, ia64, acpi: Clean up x86-ism in drivers/acpi/numa.c * 'timers-fixes-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip: timekeeping: Make local variables static time: Rename misnamed minsec argument of clocks_calc_mult_shift() * 'perf-fixes-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip: tracing: Remove syscall_exit_fields tracing: Only process module tracepoints once perf record: Add "nodelay" mode, disabled by default perf sched: Fix list of events, dropping unsupported ':r' modifier Revert "perf tools: Emit clearer message for sys_perf_event_open ENOENT return" perf top: Fix annotate segv perf evsel: Fix order of event list deletion
2011-01-13Merge branch 'for-next' of ↵Linus Torvalds1-1/+1
git://git.kernel.org/pub/scm/linux/kernel/git/jikos/trivial * 'for-next' of git://git.kernel.org/pub/scm/linux/kernel/git/jikos/trivial: (43 commits) Documentation/trace/events.txt: Remove obsolete sched_signal_send. writeback: fix global_dirty_limits comment runtime -> real-time ppc: fix comment typo singal -> signal drivers: fix comment typo diable -> disable. m68k: fix comment typo diable -> disable. wireless: comment typo fix diable -> disable. media: comment typo fix diable -> disable. remove doc for obsolete dynamic-printk kernel-parameter remove extraneous 'is' from Documentation/iostats.txt Fix spelling milisec -> ms in snd_ps3 module parameter description Fix spelling mistakes in comments Revert conflicting V4L changes i7core_edac: fix typos in comments mm/rmap.c: fix comment sound, ca0106: Fix assignment to 'channel'. hrtimer: fix a typo in comment init/Kconfig: fix typo anon_inodes: fix wrong function name in comment fix comment typos concerning "consistent" poll: fix a typo in comment ... Fix up trivial conflicts in: - drivers/net/wireless/iwlwifi/iwl-core.c (moved to iwl-legacy.c) - fs/ext4/ext4.h Also fix missed 'diabled' typo in drivers/net/bnx2x/bnx2x.h while at it.
2011-01-12time: Rename misnamed minsec argument of clocks_calc_mult_shift()Nicolas Pitre1-4/+4
The minsec argument to clocks_calc_mult_shift() is misnamed. It is used to clamp the magnitude of the mult factor so that a multiplication with any value in the given range won't overflow a 64 bit result. Let's rename it to match the actual usage. Signed-off-by: Nicolas Pitre <nicolas.pitre@linaro.org> Acked-by: John Stultz <johnstul@us.ibm.com> LKML-Reference: <alpine.LFD.2.00.1101111207140.17086@xanadu.home> Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
2010-12-22Fix rounding in clocks_calc_mult_shift()john stultz1-0/+1
Russell King reports: | On the ARM dev boards, we have a 32-bit counter running at 24MHz. Calling | clocks_calc_mult_shift(&mult, &shift, 24MHz, NSEC_PER_SEC, 60) gives | us a multiplier of 2796202666 and a shift of 26. | | Over a large counter delta, this produces an error - lets take a count | from 362976315 to 4280663372: | | (4280663372-362976315) * 2796202666 / 2^26 - (4280663372-362976315) * (1000/24) | => -38.91872422891230269990 | | Can we do better? | | (4280663372-362976315) * 2796202667 / 2^26 - (4280663372-362976315) * (1000/24) | 19.45936211449532822051 | | which is about twice as good as the 2796202666 multiplier. | | Looking at the equivalent divisions obtained, 2796202666 / 2^26 gives | 41.66666665673255920410ns per tick, whereas 2796202667 / 2^26 gives | 41.66666667163372039794ns. The actual value wanted is 1000/24 = | 41.66666666666666666666ns. Fix this by ensuring we round to nearest when calculating the multiplier. Signed-off-by: John Stultz <john.stultz@linaro.org> Tested-by: Santosh Shilimkar <santosh.shilimkar@ti.com> Tested-by: Will Deacon <will.deacon@arm.com> Tested-by: Mikael Pettersson <mikpe@it.uu.se> Tested-by: Eric Miao <eric.y.miao@gmail.com> Tested-by: Olof Johansson <olof@lixom.net> Tested-by: Jamie Iles <jamie@jamieiles.com> Signed-off-by: Russell King <rmk+kernel@arm.linux.org.uk>
2010-11-01tree-wide: fix comment/printk typosUwe Kleine-König1-1/+1
"gadget", "through", "command", "maintain", "maintain", "controller", "address", "between", "initiali[zs]e", "instead", "function", "select", "already", "equal", "access", "management", "hierarchy", "registration", "interest", "relative", "memory", "offset", "already", Signed-off-by: Uwe Kleine-König <u.kleine-koenig@pengutronix.de> Signed-off-by: Jiri Kosina <jkosina@suse.cz>
2010-07-27clocksource: Add __clocksource_updatefreq_hz/khz methodsJohn Stultz1-5/+24
To properly handle clocksources that change frequencies at the clocksource->enable() point, this patch adds a method that will update the clocksource's mult/shift and max_idle_ns values. Signed-off-by: John Stultz <johnstul@us.ibm.com> LKML-Reference: <1279068988-21864-12-git-send-email-johnstul@us.ibm.com> Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
2010-07-27time: Kill off CONFIG_GENERIC_TIMEJohn Stultz1-2/+2
Now that all arches have been converted over to use generic time via clocksources or arch_gettimeoffset(), we can remove the GENERIC_TIME config option and simplify the generic code. Signed-off-by: John Stultz <johnstul@us.ibm.com> LKML-Reference: <1279068988-21864-4-git-send-email-johnstul@us.ibm.com> Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
2010-05-10clocksource: Add clocksource_register_hz/khz interfaceJohn Stultz1-0/+48
How to pick good mult/shift pairs has always been difficult to describe to folks writing clocksource drivers, since it requires careful tradeoffs in adjustment accuracy vs overflow limits. Now, with the clocks_calc_mult_shift function, its much easier. However, not many clocksources have converted to using that function, and there is still the issue of the max interval length assumption being made by each clocksource driver independently. So this patch simplifies the registration process by having clocksources be registered with a hz/khz value and the registration function taking care of setting mult/shift. This should take most of the confusion out of writing a clocksource driver. Additionally it also keeps the shift size tradeoff (more accuracy vs longer possible nohz times) centralized so the timekeeping core can keep track of the assumptions being made. [ tglx: Coding style and comments fixed ] Signed-off-by: John Stultz <johnstul@us.ibm.com> LKML-Reference: <1273280858-30143-1-git-send-email-johnstul@us.ibm.com> Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
2010-03-02timekeeping: Prevent oops when GENERIC_TIME=njohn stultz1-0/+4
Aaro Koskinen reported an issue in kernel.org bugzilla #15366, where on non-GENERIC_TIME systems, accessing /sys/devices/system/clocksource/clocksource0/current_clocksource results in an oops. It seems the timekeeper/clocksource rework missed initializing the curr_clocksource value in the !GENERIC_TIME case. Thanks to Aaro for reporting and diagnosing the issue as well as testing the fix! Reported-by: Aaro Koskinen <aaro.koskinen@iki.fi> Signed-off-by: John Stultz <johnstul@us.ibm.com> Cc: Martin Schwidefsky <schwidefsky@de.ibm.com> Cc: stable@kernel.org LKML-Reference: <1267475683.4216.61.camel@localhost.localdomain> Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
2010-03-01Merge branch 'timers-for-linus' of ↵Linus Torvalds1-1/+13
git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip * 'timers-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip: posix-timers.c: Don't export local functions clocksource: start CMT at clocksource resume clocksource: add suspend callback clocksource: add argument to resume callback ntp: Cleanup xtime references in ntp.c ntp: Make time_esterror and time_maxerror static
2010-02-05clocksource: add suspend callbackMagnus Damm1-0/+12
Add a clocksource suspend callback. This callback can be used by the clocksource driver to shutdown and perform any kind of late suspend activities even though the clocksource driver itself is a non-sysdev driver. One example where this is useful is to fix the sh_cmt.c platform driver that today suspends using the platform bus and shuts down the clocksource too early. With this callback in place the sh_cmt driver will suspend using the clocksource and clockevent hooks and leave the platform device pm callbacks unused. Signed-off-by: Magnus Damm <damm@opensource.se> Cc: Paul Mundt <lethal@linux-sh.org> Cc: john stultz <johnstul@us.ibm.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
2010-02-05clocksource: add argument to resume callbackMagnus Damm1-1/+1
Pass the clocksource as an argument to the clocksource resume callback. Needed so we can point out which CMT channel the sh_cmt.c driver shall resume. Signed-off-by: Magnus Damm <damm@opensource.se> Cc: john stultz <johnstul@us.ibm.com> Cc: Paul Mundt <lethal@linux-sh.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
2010-01-26clocksource: Prevent potential kgdb dead lockThomas Gleixner1-3/+15
commit 0f8e8ef7 (clocksource: Simplify clocksource watchdog resume logic) introduced a potential kgdb dead lock. When the kernel is stopped by kgdb inside code which holds watchdog_lock then kgdb dead locks in clocksource_resume_watchdog(). clocksource_resume_watchdog() is called from kbdg via clocksource_touch_watchdog() to avoid that the clock source watchdog marks TSC unstable after the kernel has been stopped. Solve this by replacing spin_lock with a spin_trylock and just return in case the lock is held. Not resetting the watchdog might result in TSC becoming marked unstable, but that's an acceptable penalty for using kgdb. The timekeeping is anyway easily screwed up by kgdb when the system uses either jiffies or a clock source which wraps in short intervals (e.g. pm_timer wraps about every 4.6s), so we really do not have to worry about that occasional TSC marked unstable side effect. The second caller of clocksource_resume_watchdog() is clocksource_resume(). The trylock is safe here as well because the system is UP at this point, interrupts are disabled and nothing else can hold watchdog_lock(). Reported-by: Jason Wessel <jason.wessel@windriver.com> LKML-Reference: <1264480000-6997-4-git-send-email-jason.wessel@windriver.com> Cc: kgdb-bugreport@lists.sourceforge.net Cc: Martin Schwidefsky <schwidefsky@de.ibm.com> Cc: John Stultz <johnstul@us.ibm.com> Cc: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
2009-12-09Merge branch 'for-linus' of ↵Linus Torvalds1-1/+1
git://git.kernel.org/pub/scm/linux/kernel/git/jikos/trivial * 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/jikos/trivial: (42 commits) tree-wide: fix misspelling of "definition" in comments reiserfs: fix misspelling of "journaled" doc: Fix a typo in slub.txt. inotify: remove superfluous return code check hdlc: spelling fix in find_pvc() comment doc: fix regulator docs cut-and-pasteism mtd: Fix comment in Kconfig doc: Fix IRQ chip docs tree-wide: fix assorted typos all over the place drivers/ata/libata-sff.c: comment spelling fixes fix typos/grammos in Documentation/edac.txt sysctl: add missing comments fs/debugfs/inode.c: fix comment typos sgivwfb: Make use of ARRAY_SIZE. sky2: fix sky2_link_down copy/paste comment error tree-wide: fix typos "couter" -> "counter" tree-wide: fix typos "offest" -> "offset" fix kerneldoc for set_irq_msi() spidev: fix double "of of" in comment comment typo fix: sybsystem -> subsystem ...
2009-12-08Merge branch 'timers-for-linus' of ↵Linus Torvalds1-0/+97
git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip * 'timers-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip: timers, init: Limit the number of per cpu calibration bootup messages posix-cpu-timers: optimize and document timer_create callback clockevents: Add missing include to pacify sparse x86: vmiclock: Fix printk format x86: Fix printk format due to variable type change sparc: fix printk for change of variable type clocksource/events: Fix fallout of generic code changes nohz: Allow 32-bit machines to sleep for more than 2.15 seconds nohz: Track last do_timer() cpu nohz: Prevent clocksource wrapping during idle nohz: Type cast printk argument mips: Use generic mult/shift factor calculation for clocks clocksource: Provide a generic mult/shift factor calculation clockevents: Use u32 for mult and shift factors nohz: Introduce arch_needs_cpu nohz: Reuse ktime in sub-functions of tick_check_idle. time: Remove xtime_cache time: Implement logarithmic time accumulation
2009-11-14clocksource/events: Fix fallout of generic code changesThomas Gleixner1-2/+2
powerpc grew a new warning due to the type change of clockevent->mult. The architectures which use parts of the generic time keeping infrastructure tripped over my wrong assumption that clocksource_register is only used when GENERIC_TIME=y. I should have looked and also I should have known better. These renitent Gaul villages are racking my nerves. Some serious deprecating is due. Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
2009-11-13nohz: Prevent clocksource wrapping during idleJon Hunter1-0/+44
The dynamic tick allows the kernel to sleep for periods longer than a single tick, but it does not limit the sleep time currently. In the worst case the kernel could sleep longer than the wrap around time of the time keeping clock source which would result in losing track of time. Prevent this by limiting it to the safe maximum sleep time of the current time keeping clock source. The value is calculated when the clock source is registered. [ tglx: simplified the code a bit and massaged the commit msg ] Signed-off-by: Jon Hunter <jon-hunter@ti.com> Cc: John Stultz <johnstul@us.ibm.com> LKML-Reference: <1250617512-23567-2-git-send-email-jon-hunter@ti.com> Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
2009-11-13clocksource: Provide a generic mult/shift factor calculationThomas Gleixner1-0/+53
MIPS has two functions to calculcate the mult/shift factors for clock sources and clock events at run time. ARM needs such functions as well. Implement a function which calculates the mult/shift factors based on the frequencies to which and from which is converted. The function also has a parameter to specify the minimum conversion range in seconds. This range is guaranteed not to produce a 64bit overflow when a value is multiplied with the calculated mult factor. The larger the conversion range the less becomes the conversion accuracy. Provide two inline wrappers which handle clock events and clock sources. For clock events the "from" frequency is nano seconds per second which corresponds to 1GHz and "to" is the device frequency. For clock sources "from" is the device frequency and "to" is nano seconds per second. Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Tested-by: Mikael Pettersson <mikpe@it.uu.se> Acked-by: Ralf Baechle <ralf@linux-mips.org> Acked-by: Linus Walleij <linus.walleij@stericsson.com> Cc: John Stultz <johnstul@us.ibm.com> LKML-Reference: <20091111134229.766673305@linutronix.de>
2009-11-11clocksource/timecompare: Fix symbol exports to be GPL'd.David S. Miller1-3/+3
Noticed by Thomas GLeixner. Signed-off-by: David S. Miller <davem@davemloft.net>
2009-11-09tree-wide: fix typos "selct" + "slect" -> "select"Uwe Kleine-König1-1/+1
This patch was generated by git grep -E -i -l 's(le|el)ct' | xargs -r perl -p -i -e 's/([Ss])(le|el)ct/$1elect/ with only skipping net/netfilter/xt_SECMARK.c and include/linux/netfilter/xt_SECMARK.h which have a struct member called selctx. Signed-off-by: Uwe Kleine-König <u.kleine-koenig@pengutronix.de> Signed-off-by: Jiri Kosina <jkosina@suse.cz>
2009-09-24clocksource: Resume clocksource without taking the clocksource mutexMartin Schwidefsky1-4/+0
git commit 75c5158f70c065b9 converted the clocksource spinlock to a mutex. This causes the following BUG: BUG: sleeping function called from invalid context at kernel/mutex.c:280 in_atomic(): 0, irqs_disabled(): 1, pid: 2473, name: pm-suspend 2 locks held by pm-suspend/2473: #0: (&buffer->mutex){......}, at: [<ffffffff8115ab13>] sysfs_write_file+0x3c/0x137 #1: (pm_mutex){......}, at: [<ffffffff810865b5>] enter_state+0x39/0x130 Pid: 2473, comm: pm-suspend Not tainted 2.6.31 #1 Call Trace: [<ffffffff810792f0>] ? __debug_show_held_locks+0x22/0x24 [<ffffffff8104a2ef>] __might_sleep+0x107/0x10b [<ffffffff8141fca9>] mutex_lock_nested+0x25/0x43 [<ffffffff81073537>] clocksource_resume+0x1c/0x60 [<ffffffff81072902>] timekeeping_resume+0x1e/0x1c8 [<ffffffff812aee62>] __sysdev_resume+0x25/0xcf [<ffffffff812aef79>] sysdev_resume+0x6d/0xae [<ffffffff810864f8>] suspend_devices_and_enter+0x12b/0x1af [<ffffffff8108665b>] enter_state+0xdf/0x130 [<ffffffff81085dc3>] state_store+0xb6/0xd3 [<ffffffff81204c73>] kobj_attr_store+0x17/0x19 [<ffffffff8115abd2>] sysfs_write_file+0xfb/0x137 [<ffffffff811057d2>] vfs_write+0xae/0x10b [<ffffffff81208392>] ? __up_read+0x1a/0x7f [<ffffffff811058ef>] sys_write+0x4a/0x6e [<ffffffff81011b82>] system_call_fastpath+0x16/0x1b clocksource_resume is called early in the resume process, there is only one cpu, no processes are running and the interrupts are disabled. It is therefore possible to resume the clocksources without taking the clocksource mutex. Reported-by: Xiaotian Feng <xtfeng@gmail.com> Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com> Tested-by: Michal Schmidt <mschmidt@redhat.com> Cc: Xiaotian Feng <xtfeng@gmail.com> Cc: John Stultz <johnstul@us.ibm.com> LKML-Reference: <20090924172952.49697825@mschwide.boeblingen.de.ibm.com> Signed-off-by: Ingo Molnar <mingo@elte.hu>
2009-09-14clocksource: Delay clocksource down rating to late bootThomas Gleixner1-10/+18
The down rating of clock sources in the early boot process via the clock source watchdog mechanism can happen way before the per cpu event queues are initialized. This leads to a boot crash on x86 when the TSC is marked unstable in the SMP bring up. The selection of a clock source for time keeping happens in the late boot process so we can safely delay the list manipulation until clocksource_done_booting() is called. Signed-off-by: Thomas Gleixner <tglx@linutronix.de> LKML-Reference: <new-submission> Cc: Martin Schwidefsky <schwidefsky@de.ibm.com>
2009-09-14clocksource: clocksource_select must be called with mutex lockedThomas Gleixner1-0/+2
The callers of clocksource_select must hold clocksource_mutex to protect the clocksource_list. Signed-off-by: Thomas Gleixner <tglx@linutronix.de> LKML-Reference: <new-submission> Cc: Martin Schwidefsky <schwidefsky@de.ibm.com>
2009-09-11clocksource: Resolve cpu hotplug dead lock with TSC unstable, fix crashMartin Schwidefsky1-2/+3
The watchdog timer is started after the watchdog clocksource and at least one watched clocksource have been registered. The clocksource work element watchdog_work is initialized just before the clocksource timer is started. This is too late for the clocksource_mark_unstable call from native_cpu_up. To fix this use a static initializer for watchdog_work. This resolves a boot crash reported by multiple people. Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com> Cc: Jens Axboe <jens.axboe@oracle.com> Cc: John Stultz <johnstul@us.ibm.com> LKML-Reference: <20090911153305.3fe9a361@skybase> Signed-off-by: Ingo Molnar <mingo@elte.hu>
2009-08-28clocksource: Resolve cpu hotplug dead lock with TSC unstableThomas Gleixner1-3/+30
Martin Schwidefsky analyzed it: To register a clocksource the clocksource_mutex is acquired and if necessary timekeeping_notify is called to install the clocksource as the timekeeper clock. timekeeping_notify uses stop_machine which needs to take cpu_add_remove_lock mutex. Starting a new cpu is done with the cpu_add_remove_lock mutex held. native_cpu_up checks the tsc of the new cpu and if the tsc is no good clocksource_change_rating is called. Which needs the clocksource_mutex and the deadlock is complete. The solution is to replace the TSC via the clocksource watchdog mechanism. Mark the TSC as unstable and schedule the watchdog work so it gets removed in the watchdog thread context. Signed-off-by: Thomas Gleixner <tglx@linutronix.de> LKML-Reference: <new-submission> Cc: Martin Schwidefsky <schwidefsky@de.ibm.com> Cc: John Stultz <johnstul@us.ibm.com>
2009-08-19clocksource: Avoid clocksource watchdog circular locking dependencyMartin Schwidefsky1-3/+16
stop_machine from a multithreaded workqueue is not allowed because of a circular locking dependency between cpu_down and the workqueue execution. Use a kernel thread to do the clocksource downgrade. Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com> Cc: Peter Zijlstra <a.p.zijlstra@chello.nl> Cc: john stultz <johnstul@us.ibm.com> LKML-Reference: <20090818170942.3ab80c91@skybase> Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
2009-08-19clocksource: Protect the watchdog rating changes with clocksource_mutexThomas Gleixner1-5/+13
Martin pointed out that commit 6ea41d2529 (clocksource: Call clocksource_change_rating() outside of watchdog_lock) has a theoretical reference count problem. The calls to clocksource_change_rating() are now done outside of the clocksource mutex and outside of the watchdog lock. A concurrent clocksource_unregister() could remove the clock. Split out the code which changes the rating from clocksource_change_rating() into __clocksource_change_rating(). Protect the clocksource_watchdog_work() code sequence with the clocksource_mutex() and call __clocksource_change_rating(). LKML-Reference: <alpine.LFD.2.00.0908171038420.2782@localhost.localdomain> Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Cc: Martin Schwidefsky <schwidefsky@de.ibm.com>
2009-08-15clocksource: Call clocksource_change_rating() outside of watchdog_lockThomas Gleixner1-2/+9
The changes to the watchdog logic introduced a lock inversion between watchdog_lock and clocksource_mutex. Change the rating outside of watchdog_lock to avoid it. Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
2009-08-15timekeeping: Update clocksource with stop_machineMartin Schwidefsky1-68/+44
update_wall_time calls change_clocksource HZ times per second to check if a new clock source is available. In close to 100% of all calls there is no new clock. Replace the tick based check by an update done with stop_machine. Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com> Cc: Ingo Molnar <mingo@elte.hu> Acked-by: John Stultz <johnstul@us.ibm.com> Cc: Daniel Walker <dwalker@fifo99.com> LKML-Reference: <20090814134810.711836357@de.ibm.com> Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
2009-08-15timekeeping: Introduce struct timekeeperMartin Schwidefsky1-2/+4
Add struct timekeeper to keep the internal values timekeeping.c needs in regard to the currently selected clock source. This moves the timekeeping intervals, xtime_nsec and the ntp error value from struct clocksource to struct timekeeper. The raw_time is removed from the clocksource as well. It gets treated like xtime as a global variable. Eventually xtime raw_time should be moved to struct timekeeper. [ tglx: minor cleanup ] Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com> Cc: Ingo Molnar <mingo@elte.hu> Acked-by: John Stultz <johnstul@us.ibm.com> Cc: Daniel Walker <dwalker@fifo99.com> LKML-Reference: <20090814134809.613209842@de.ibm.com> Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
2009-08-15clocksource: Move watchdog downgrade to a work queue threadMartin Schwidefsky1-17/+39
Move the downgrade of an unstable clocksource from the timer interrupt context into the process context of a work queue thread. This is needed to be able to do the clocksource switch with stop_machine. Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com> Cc: Ingo Molnar <mingo@elte.hu> Acked-by: John Stultz <johnstul@us.ibm.com> Cc: Daniel Walker <dwalker@fifo99.com> LKML-Reference: <20090814134809.354926067@de.ibm.com> Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
2009-08-15clocksource: Refactor clocksource watchdogMartin Schwidefsky1-28/+69
Refactor clocksource watchdog code to make it more readable. Add clocksource_dequeue_watchdog to remove a clocksource from the watchdog list when it is unregistered. Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com> Cc: Ingo Molnar <mingo@elte.hu> Acked-by: John Stultz <johnstul@us.ibm.com> Cc: Daniel Walker <dwalker@fifo99.com> LKML-Reference: <20090814134809.110881699@de.ibm.com> Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
2009-08-15clocksource: Simplify clocksource watchdog resume logicMartin Schwidefsky1-8/+15
To resume the clocksource watchdog just remove the CLOCK_SOURCE_WATCHDOG bit from the watched clocksource. Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com> Cc: Ingo Molnar <mingo@elte.hu> Acked-by: John Stultz <johnstul@us.ibm.com> Cc: Daniel Walker <dwalker@fifo99.com> LKML-Reference: <20090814134808.880925790@de.ibm.com> Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
2009-08-15clocksource: Delay clocksource watchdog highres enablementMartin Schwidefsky1-25/+22
The clocksource watchdog marks a clock as highres capable before it checked the deviation from the watchdog clocksource even for a single time. Make sure that the deviation is at least checked once before doing the switch to highres mode. Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com> Cc: Ingo Molnar <mingo@elte.hu> Acked-by: John Stultz <johnstul@us.ibm.com> Cc: Daniel Walker <dwalker@fifo99.com> LKML-Reference: <20090814134808.627795883@de.ibm.com> Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
2009-08-15clocksource: Cleanup clocksource selectionMartin Schwidefsky1-83/+51
If a non high-resolution clocksource is first set as override clock and then registered it becomes active even if the system is in one-shot mode. Move the override check from sysfs_override_clocksource to the clocksource selection. That fixes the bug and simplifies the code. The check in clocksource_register for double registration of the same clocksource is removed without replacement. To find the initial clocksource a new weak function in jiffies.c is defined that returns the jiffies clocksource. The architecture code can then override the weak function with a more suitable clocksource, e.g. the TOD clock on s390. [ tglx: Folded in a fix from John Stultz ] Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com> Acked-by: John Stultz <johnstul@us.ibm.com> Cc: Daniel Walker <dwalker@fifo99.com> LKML-Reference: <20090814134808.388024160@de.ibm.com> Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
2009-07-19clocksource: Prevent NULL pointer dereferenceThomas Gleixner1-1/+1
Writing a zero length string to sys/.../current_clocksource will cause a NULL pointer dereference if the clock events system is in one shot (highres or nohz) mode. Pointed-out-by: Dan Carpenter <error27@gmail.com> LKML-Reference: <alpine.DEB.2.00.0907191545580.12306@bicker> Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
2009-06-13clocksource: prevent selection of low resolution clocksourse also for nohz=onThomas Gleixner1-8/+10
commit 3f68535adad (clocksource: sanity check sysfs clocksource changes) prevents selection of non high resolution capable clocksources when high resolution mode is active, but did not take into account that the same rules apply for highres=off nohz=on. Check the tick device mode instead of hrtimer_hres_active() to verify whether the system needs to be protected from a switch to jiffies or other non highres capable clock sources. Reported-by: Luming Yu <luming.yu@gmail.com> Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
2009-06-11clocksource: sanity check sysfs clocksource changesjohn stultz1-1/+17
Thomas, Andrew and Ingo pointed out that we don't have any safety checks in the clocksource sysfs entries to make sure sysadmins don't try to change the clocksource to a non high-res timer capable clocksource (such as jiffies) when high-res timers (HRT) is enabled. Doing so will likely hang a system. Correct this by filtering non HRT clocksources from available_clocksources and not accepting non HRT clocksources with HRT enabled. Signed-off-by: John Stultz <johnstul@us.ibm.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
2009-05-02clocksource: setup mult_orig in clocksource_enable()Magnus Damm1-3/+0
Setup clocksource mult_orig in clocksource_enable(). Clocksource drivers can save power by using keeping the device clock disabled while the clocksource is unused. In practice this means that the enable() and disable() callbacks perform clk_enable() and clk_disable(). The enable() callback may also use clk_get_rate() to get the clock rate from the clock framework. This information can then be used to calculate the shift and mult variables. Currently the mult_orig variable is setup from mult at registration time only. This is conflicting with the above case since the clock is disabled and the mult variable is not yet calculated at the time of registration. Moving the mult_orig setup code to clocksource_enable() allows us to both handle the common case with no enable() callback and the mult-changed-after-enable() case. [ Impact: allow dynamic clock source usage ] Signed-off-by: Magnus Damm <damm@igel.co.jp> LKML-Reference: <20090501054546.8193.10688.sendpatchset@rx1.opensource.se> Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
2009-04-21clocksource: pass clocksource to read() callbackMagnus Damm1-4/+4
Pass clocksource pointer to the read() callback for clocksources. This allows us to share the callback between multiple instances. [hugh@veritas.com: fix powerpc build of clocksource pass clocksource mods] [akpm@linux-foundation.org: cleanup] Signed-off-by: Magnus Damm <damm@igel.co.jp> Acked-by: John Stultz <johnstul@us.ibm.com> Cc: Thomas Gleixner <tglx@linutronix.de> Signed-off-by: Hugh Dickins <hugh@veritas.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2009-02-15clocksource: allow usage independent of timekeeping.cPatrick Ohly1-0/+76
So far struct clocksource acted as the interface between time/timekeeping.c and hardware. This patch generalizes the concept so that a similar interface can also be used in other contexts. For that it introduces new structures and related functions *without* touching the existing struct clocksource. The reasons for adding these new structures to clocksource.[ch] are * the APIs are clearly related * struct clocksource could be cleaned up to use the new structs * avoids proliferation of files with similar names (timesource.h? timecounter.h?) As outlined in the discussion with John Stultz, this patch adds * struct cyclecounter: stateless API to hardware which counts clock cycles * struct timecounter: stateful utility code built on a cyclecounter which provides a nanosecond counter * only the function to read the nanosecond counter; deltas are used internally and not exposed to users of timecounter The code does no locking of the shared state. It must be called at least as often as the cycle counter wraps around to detect these wrap arounds. Both is the responsibility of the timecounter user. Acked-by: John Stultz <johnstul@us.ibm.com> Signed-off-by: Patrick Ohly <patrick.ohly@intel.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2009-01-01cpumask: replace for_each_cpu_mask_nr with for_each_cpu in kernel/time/Rusty Russell1-1/+2
Impact: cleanup Simple replacement, now the _nr is redundant. Signed-off-by: Rusty Russell <rusty@rustcorp.com.au> Signed-off-by: Mike Travis <travis@sgi.com> Cc: Ingo Molnar <mingo@redhat.com>
2009-01-01cpumask: convert kernel time functionsRusty Russell1-3/+3
Impact: Use new APIs Convert kernel/time functions to use struct cpumask *. Note the ugly bitmap declarations in tick-broadcast.c. These should be cpumask_var_t, but there was no obvious initialization function to put the alloc_cpumask_var() calls in. This was safe. (Eventually 'struct cpumask' will be undefined for CONFIG_CPUMASK_OFFSTACK, so we use a bitmap here to show we really mean it). Signed-off-by: Rusty Russell <rusty@rustcorp.com.au> Signed-off-by: Mike Travis <travis@sgi.com>
2008-08-21clocksource: keep track of original clocksource frequencyJohn Stultz1-0/+3
The clocksource frequency is represented by clocksource->mult/2^(clocksource->shift). Currently, when NTP makes adjustments to the clock frequency, they are made directly to the mult value. This has the drawback that once changed, we cannot know what the orignal mult value was, or how much adjustment has been applied. This property causes problems in calculating proper ntp intervals when switching back and forth between clocksources. This patch separates the current mult value into a mult and mult_orig pair. The mult_orig value stays constant, while the ntp clocksource adjustments are done only to the mult value. This allows for correct ntp interval calculation and additionally lays the groundwork for a new notion of time, what I'm calling the monotonic-raw time, which is introduced in a following patch. Signed-off-by: John Stultz <johnstul@us.ibm.com> Signed-off-by: Roman Zippel <zippel@linux-m68k.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Ingo Molnar <mingo@elte.hu>
2008-07-23Merge branch 'cpus4096-for-linus' of ↵Linus Torvalds1-2/+2
git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip * 'cpus4096-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip: (31 commits) NR_CPUS: Replace NR_CPUS in speedstep-centrino.c cpumask: Provide a generic set of CPUMASK_ALLOC macros, FIXUP NR_CPUS: Replace NR_CPUS in cpufreq userspace routines NR_CPUS: Replace per_cpu(..., smp_processor_id()) with __get_cpu_var NR_CPUS: Replace NR_CPUS in arch/x86/kernel/genapic_flat_64.c NR_CPUS: Replace NR_CPUS in arch/x86/kernel/genx2apic_uv_x.c NR_CPUS: Replace NR_CPUS in arch/x86/kernel/cpu/proc.c NR_CPUS: Replace NR_CPUS in arch/x86/kernel/cpu/mcheck/mce_64.c cpumask: Optimize cpumask_of_cpu in lib/smp_processor_id.c, fix cpumask: Use optimized CPUMASK_ALLOC macros in the centrino_target cpumask: Provide a generic set of CPUMASK_ALLOC macros cpumask: Optimize cpumask_of_cpu in lib/smp_processor_id.c cpumask: Optimize cpumask_of_cpu in kernel/time/tick-common.c cpumask: Optimize cpumask_of_cpu in drivers/misc/sgi-xp/xpc_main.c cpumask: Optimize cpumask_of_cpu in arch/x86/kernel/ldt.c cpumask: Optimize cpumask_of_cpu in arch/x86/kernel/io_apic_64.c cpumask: Replace cpumask_of_cpu with cpumask_of_cpu_ptr Revert "cpumask: introduce new APIs" cpumask: make for_each_cpu_mask a bit smaller net: Pass reference to cpumask variable in net/sunrpc/svc.c ... Fix up trivial conflicts in drivers/cpufreq/cpufreq.c manually
2008-07-21sysdev: Pass the attribute to the low level sysdev show/store functionAndi Kleen1-2/+6
This allow to dynamically generate attributes and share show/store functions between attributes. Right now most attributes are generated by special macros and lots of duplicated code. With the attribute passed it's instead possible to attach some data to the attribute and then use that in shared low level functions to do different things. I need this for the dynamically generated bank attributes in the x86 machine check code, but it'll allow some further cleanups. I converted all users in tree to the new show/store prototype. It's a single huge patch to avoid unbisectable sections. Runtime tested: x86-32, x86-64 Compiled only: ia64, powerpc Not compile tested/only grep converted: sh, arm, avr32 Signed-off-by: Andi Kleen <ak@linux.intel.com> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
2008-05-23clocksource/events: use performance variant for_each_cpu_mask_nrMike Travis1-2/+2
Change references from for_each_cpu_mask to for_each_cpu_mask_nr where appropriate Reviewed-by: Paul Jackson <pj@sgi.com> Reviewed-by: Christoph Lameter <clameter@sgi.com> Signed-off-by: Mike Travis <travis@sgi.com> Signed-off-by: Ingo Molnar <mingo@elte.hu> Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
2008-05-03clocksource: allow read access to available/current_clocksourceHeiko Carstens1-2/+2
There is no harm, when users can read the info and we ask often enough during debugging for this kind of information. Signed-off-by: Heiko Carstens <heiko.carstens@de.ibm.com> Cc: Andrew Morton <akpm@linux-foundation.org> Cc: John Stultz <johnstul@us.ibm.com> Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
2008-05-03clocksource: Fix permissions for available_clocksourceHeiko Carstens1-1/+1
File permissions for /sys/devices/system/clocksource/clocksource0/available_clocksource are 600 which allows write access. But this is in fact a read only file. So change permissions to 400. Signed-off-by: Heiko Carstens <heiko.carstens@de.ibm.com> Cc: John Stultz <johnstul@us.ibm.com> Cc: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
2008-04-18Merge git://git.kernel.org/pub/scm/linux/kernel/git/tglx/linux-2.6-hrtLinus Torvalds1-4/+14
* git://git.kernel.org/pub/scm/linux/kernel/git/tglx/linux-2.6-hrt: clocksource: make clocksource watchdog cycle through online CPUs Documentation: move timer related documentation to a single place clockevents: optimise tick_nohz_stop_sched_tick() a bit locking: remove unused double_spin_lock() hrtimers: simplify lockdep handling timers: simplify lockdep handling posix-timers: fix shadowed variables timer_list: add annotations to workqueue.c hrtimer: use nanosleep specific restart_block fields hrtimer: add nanosleep specific restart_block member
2008-04-17kgdb: clocksource watchdogJason Wessel1-0/+12
In order to not trip the clocksource watchdog, kgdb must touch the clocksource watchdog on the return to normal system run state. Signed-off-by: Jason Wessel <jason.wessel@windriver.com> Signed-off-by: Ingo Molnar <mingo@elte.hu>
2008-04-17clocksource: make clocksource watchdog cycle through online CPUsAndi Kleen1-4/+14
This way it checks if the clocks are synchronized between CPUs too. This might be able to detect slowly drifting TSCs which only go wrong over longer time. Signed-off-by: Andi Kleen <ak@suse.de> Signed-off-by: Ingo Molnar <mingo@elte.hu> Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
2008-03-25clocksource: revert: use init_timer_deferrable for clocksource_watchdogThomas Gleixner1-1/+1
Revert commit 1077f5a917b7c630231037826b344b2f7f5b903f Author: Parag Warudkar <parag.warudkar@gmail.com> Date: Wed Jan 30 13:30:01 2008 +0100 clocksource.c: use init_timer_deferrable for clocksource_watchdog clocksource_watchdog can use a deferrable timer - reduces wakeups from idle per second. The watchdog timer needs to run with the specified interval. Otherwise it will miss the possible wrap of the watchdog clocksource. Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Cc: stable@kernel.org
2008-03-19revert "clocksource: make clocksource watchdog cycle through online CPUs"Andrew Morton1-10/+4
Revert commit 1ada5cba6a0318f90e45b38557e7b5206a9cba38 ("clocksource: make clocksource watchdog cycle through online CPUs") due to the regression reported by Gabriel C at http://lkml.org/lkml/2008/2/24/281 (short vesion: it makes TSC be marked as always unstable on his machine). Cc: Andi Kleen <ak@suse.de> Acked-by: Ingo Molnar <mingo@elte.hu> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Robert Hancock <hancockr@shaw.ca> Acked-by: Linus Torvalds <torvalds@linux-foundation.org> Cc: "Rafael J. Wysocki" <rjw@sisk.pl> Cc: Gabriel C <nix.or.die@googlemail.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2008-02-08clocksource: remove redundant codeLi Zefan1-1/+0
Flag CLOCK_SOURCE_WATCHDOG is cleared twice. Note clocksource_change_rating() won't do anyting with the cs flag. Signed-off-by: Li Zefan <lizf@cn.fujitsu.com> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Ingo Molnar <mingo@elte.hu> Cc: john stultz <johnstul@us.ibm.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2008-02-06time: fix sysfs_show_{available,current}_clocksources() buffer overflow problemMiao Xie1-9/+10
I found that there is a buffer overflow problem in the following code. Version: 2.6.24-rc2, File: kernel/time/clocksource.c:417-432 -------------------------------------------------------------------- static ssize_t sysfs_show_available_clocksources(struct sys_device *dev, char *buf) { struct clocksource *src; char *curr = buf; spin_lock_irq(&clocksource_lock); list_for_each_entry(src, &clocksource_list, list) { curr += sprintf(curr, "%s ", src->name); } spin_unlock_irq(&clocksource_lock); curr += sprintf(curr, "\n"); return curr - buf; } ----------------------------------------------------------------------- sysfs_show_current_clocksources() also has the same problem though in practice the size of current clocksource's name won't exceed PAGE_SIZE. I fix the bug by using snprintf according to the specification of the kernel (Version:2.6.24-rc2,File:Documentation/filesystems/sysfs.txt) Fix sysfs_show_available_clocksources() and sysfs_show_current_clocksources() buffer overflow problem with snprintf(). Signed-off-by: Miao Xie <miaox@cn.fujitsu.com> Cc: WANG Cong <xiyou.wangcong@gmail.com> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: john stultz <johnstul@us.ibm.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2008-01-30clocksource: add unregister function to disable unusable clocksourcesThomas Gleixner1-0/+15
On x86 the PIT might become an unusable clocksource. Add an unregister function to provide a possibilty to remove the PIT from the list of available clock sources. Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Signed-off-by: Ingo Molnar <mingo@elte.hu>
2008-01-30clocksource: make clocksource watchdog cycle through online CPUsAndi Kleen1-4/+10
This way it checks if the clocks are synchronized between CPUs too. This might be able to detect slowly drifting TSCs which only go wrong over longer time. Signed-off-by: Andi Kleen <ak@suse.de> Signed-off-by: Ingo Molnar <mingo@elte.hu> Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
2008-01-30clocksource.c: use init_timer_deferrable for clocksource_watchdogParag Warudkar1-1/+1
clocksource_watchdog can use a deferrable timer - reduces wakeups from idle per second. Signed-off-by: Parag Warudkar <parag.warudkar@gmail.com> Cc: john stultz <johnstul@us.ibm.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Ingo Molnar <mingo@elte.hu> Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
2008-01-24Driver core: change sysdev classes to use dynamic kobject namesKay Sievers1-1/+1
All kobjects require a dynamically allocated name now. We no longer need to keep track if the name is statically assigned, we can just unconditionally free() all kobject names on cleanup. Signed-off-by: Kay Sievers <kay.sievers@vrfy.org> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
2007-10-19kernel/time/clocksource.c: Use list_for_each_entry instead of list_for_eachMatthias Kaehlcke1-15/+7
kernel/time/clocksource.c: Convert list_for_each to list_for_each_entry in clocksource_resume(), sysfs_override_clocksource() and show_available_clocksources() Signed-off-by: Matthias Kaehlcke <matthias.kaehlcke@gmail.com> Acked-by: Thomas Gleixner <tglx@linutronix.de> Cc: john stultz <johnstul@us.ibm.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2007-05-15clocksource: fix lock order in the resume pathThomas Gleixner1-7/+3
lockdep complains about the lock nesting of clocksource and watchdog lock in the resume path. Change the resume marker to a bit operation and remove the lock from this path. Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Cc: john stultz <johnstul@us.ibm.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2007-05-09clocksource: fix resume logicThomas Gleixner1-0/+45
We need to make sure that the clocksources are resumed, when timekeeping is resumed. The current resume logic does not guarantee this. Add a resume function pointer to the clocksource struct, so clocksource drivers which need to reinitialize the clocksource can provide a resume function. Add a resume function, which calls the maybe available clocksource resume functions and resets the watchdog function, so a stable TSC can be used accross suspend/resume. Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Cc: john stultz <johnstul@us.ibm.com> Cc: Andi Kleen <ak@suse.de> Cc: Ingo Molnar <mingo@elte.hu> Cc: <stable@kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2007-05-09clocksource: spelling error in watchdog codeDaniel Walker1-3/+3
There's more that need fixing, and fix my own subject spelling error too. Signed-off-by: Daniel Walker <dwalker@mvista.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2007-03-25[PATCH] clocksource: Fix thinko in watchdog selectionThomas Gleixner1-1/+2
The watchdog implementation excludes low res / non continuous clocksources from being selected as a watchdog reference unintentionally. Allow using jiffies/PIT as a watchdog reference as long as no better clocksource is available. This is necessary to detect TSC breakage on systems, which have no pmtimer/hpet. The main goal of the initial patch (preventing to switch to highres/nohz when no reliable fallback clocksource is available) is still guaranteed by the checks in clocksource_watchdog(). Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2007-03-05[PATCH] clocksource init adjustments (fix bug #7426)john stultz1-3/+5
This patch resolves the issue found here: http://bugme.osdl.org/show_bug.cgi?id=7426 The basic summary is: Currently we register most of i386/x86_64 clocksources at module_init time. Then we enable clocksource selection at late_initcall time. This causes some problems for drivers that use gettimeofday for init calibration routines (specifically the es1968 driver in this case), where durring module_init, the only clocksource available is the low-res jiffies clocksource. This may cause slight calibration errors, due to the small sampling time used. It should be noted that drivers that require fine grained time may not function on architectures that do not have better then jiffies resolution timekeeping (there are a few). However, this does not discount the reasonable need for such fine-grained timekeeping at init time. Thus the solution here is to register clocksources earlier (ideally when the hardware is being initialized), and then we enable clocksource selection at fs_initcall (before device_initcall). This patch should probably get some testing time in -mm, since clocksource selection is one of the most important issues for correct timekeeping, and I've only been able to test this on a few of my own boxes. Signed-off-by: John Stultz <johnstul@us.ibm.com> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Ingo Molnar <mingo@elte.hu> Cc: "David S. Miller" <davem@davemloft.net> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2007-02-16[PATCH] tick-management: dyntick / highres functionalityThomas Gleixner1-0/+8
With Ingo Molnar <mingo@elte.hu> Add functions to provide dynamic ticks and high resolution timers. The code which keeps track of jiffies and handles the long idle periods is shared between tick based and high resolution timer based dynticks. The dyntick functionality can be disabled on the kernel commandline. Provide also the infrastructure to support high resolution timers. Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Signed-off-by: Ingo Molnar <mingo@elte.hu> Cc: john stultz <johnstul@us.ibm.com> Cc: Roman Zippel <zippel@linux-m68k.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2007-02-16[PATCH] clocksource: Add verification (watchdog) helperThomas Gleixner1-4/+122
The TSC needs to be verified against another clocksource. Instead of using hardwired assumptions of available hardware, provide a generic verification mechanism. The verification uses the best available clocksource and handles the usability for high resolution timers / dynticks of the clocksource which needs to be verified. Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Signed-off-by: Ingo Molnar <mingo@elte.hu> Cc: john stultz <johnstul@us.ibm.com> Cc: Roman Zippel <zippel@linux-m68k.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2007-02-16[PATCH] Simplify the registration of clocksourcesThomas Gleixner1-62/+58
Enqueue clocksources in rating order to make selection of the clocksource easier. Also check the match with an user override at enqueue time. Preparatory patch for the generic clocksource verification. Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Signed-off-by: Ingo Molnar <mingo@elte.hu> Cc: john stultz <johnstul@us.ibm.com> Cc: Roman Zippel <zippel@linux-m68k.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2007-02-11[PATCH] kernel/time/clocksource.c needs struct task_struct on m68kMathieu Desnoyers1-0/+1
kernel/time/clocksource.c needs struct task_struct on m68k. Because it uses spin_unlock_irq(), which, on m68k, uses hardirq_count(), which uses preempt_count(), which needs to dereference struct task_struct, we have to include sched.h. Because it would cause a loop inclusion, we cannot include sched.h in any other of asm-m68k/system.h, linux/thread_info.h, linux/hardirq.h, which leaves this ugly include in a C file as the only simple solution. Signed-off-by: Mathieu Desnoyers <mathieu.desnoyers@polymtl.ca> Cc: Ingo Molnar <mingo@elte.hu> Cc: Roman Zippel <zippel@linux-m68k.org> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: <stable@kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2006-12-10[PATCH] clocksource: small cleanupDaniel Walker1-3/+3
Mostly changing alignment. Just some general cleanup. [akpm@osdl.org: build fix] Signed-off-by: Daniel Walker <dwalker@mvista.com> Acked-by: John Stultz <johnstul@us.ibm.com> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Ingo Molnar <mingo@elte.hu> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-12-10[PATCH] clocksource: add usage of CONFIG_SYSFSDaniel Walker1-0/+2
Simply adds some ifdefs to remove clocksoure sysfs code when CONFIG_SYSFS isn't turn on. Signed-off-by: Daniel Walker <dwalker@mvista.com> Acked-by: John Stultz <johnstul@us.ibm.com> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Ingo Molnar <mingo@elte.hu> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-06-26[PATCH] time: rename clocksource functionsjohn stultz1-8/+8
As suggested by Roman Zippel, change clocksource functions to use clocksource_xyz rather then xyz_clocksource to avoid polluting the namespace. Signed-off-by: John Stultz <johnstul@us.ibm.com> Cc: Roman Zippel <zippel@linux-m68k.org> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-06-26[PATCH] Time: i386 Clocksource Driversjohn stultz1-3/+8
Implement the time sources for i386 (acpi_pm, cyclone, hpet, pit, and tsc). With this patch, the conversion of the i386 arch to the generic timekeeping code should be complete. The patch should be fairly straight forward, only adding the new clocksources. [hirofumi@mail.parknet.co.jp: acpi_pm cleanup] Signed-off-by: John Stultz <johnstul@us.ibm.com> Signed-off-by: Adrian Bunk <bunk@stusta.de> Signed-off-by: Paul Mundt <lethal@linux-sh.org> Signed-off-by: John Stultz <johnstul@us.ibm.com> Signed-off-by: OGAWA Hirofumi <hirofumi@mail.parknet.co.jp> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-06-26[PATCH] Time: Use clocksource infrastructure for update_wall_timejohn stultz1-2/+2
Modify the update_wall_time function so it increments time using the clocksource abstraction instead of jiffies. Since the only clocksource driver currently provided is the jiffies clocksource, this should result in no functional change. Additionally, a timekeeping_init and timekeeping_resume function has been added to initialize and maintain some of the new timekeping state. [hirofumi@mail.parknet.co.jp: fixlet] Signed-off-by: John Stultz <johnstul@us.ibm.com> Signed-off-by: OGAWA Hirofumi <hirofumi@mail.parknet.co.jp> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-06-26[PATCH] Time: Clocksource Infrastructurejohn stultz1-0/+344
This introduces the clocksource management infrastructure. A clocksource is a driver-like architecture generic abstraction of a free-running counter. This code defines the clocksource structure, and provides management code for registering, selecting, accessing and scaling clocksources. Additionally, this includes the trivial jiffies clocksource, a lowest common denominator clocksource, provided mainly for use as an example. [hirofumi@mail.parknet.co.jp: Don't enable IRQ too early] Signed-off-by: John Stultz <johnstul@us.ibm.com> Signed-off-by: Ingo Molnar <mingo@elte.hu> Signed-off-by: Paul Mundt <lethal@linux-sh.org> Signed-off-by: John Stultz <johnstul@us.ibm.com> Signed-off-by: OGAWA Hirofumi <hirofumi@mail.parknet.co.jp> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>