linux-stable/Documentation
Samuel Holland e19ca3fe6c clocksource/drivers/arch_timer: Workaround for Allwinner A64 timer instability
commit c950ca8c35 upstream.

The Allwinner A64 SoC is known[1] to have an unstable architectural
timer, which manifests itself most obviously in the time jumping forward
a multiple of 95 years[2][3]. This coincides with 2^56 cycles at a
timer frequency of 24 MHz, implying that the time went slightly backward
(and this was interpreted by the kernel as it jumping forward and
wrapping around past the epoch).

Investigation revealed instability in the low bits of CNTVCT at the
point a high bit rolls over. This leads to power-of-two cycle forward
and backward jumps. (Testing shows that forward jumps are about twice as
likely as backward jumps.) Since the counter value returns to normal
after an indeterminate read, each "jump" really consists of both a
forward and backward jump from the software perspective.

Unless the kernel is trapping CNTVCT reads, a userspace program is able
to read the register in a loop faster than it changes. A test program
running on all 4 CPU cores that reported jumps larger than 100 ms was
run for 13.6 hours and reported the following:

 Count | Event
-------+---------------------------
  9940 | jumped backward      699ms
   268 | jumped backward     1398ms
     1 | jumped backward     2097ms
 16020 | jumped forward       175ms
  6443 | jumped forward       699ms
  2976 | jumped forward      1398ms
     9 | jumped forward    356516ms
     9 | jumped forward    357215ms
     4 | jumped forward    714430ms
     1 | jumped forward   3578440ms

This works out to a jump larger than 100 ms about every 5.5 seconds on
each CPU core.

The largest jump (almost an hour!) was the following sequence of reads:
    0x0000007fffffffff → 0x00000093feffffff → 0x0000008000000000

Note that the middle bits don't necessarily all read as all zeroes or
all ones during the anomalous behavior; however the low 10 bits checked
by the function in this patch have never been observed with any other
value.

Also note that smaller jumps are much more common, with backward jumps
of 2048 (2^11) cycles observed over 400 times per second on each core.
(Of course, this is partially explained by lower bits rolling over more
frequently.) Any one of these could have caused the 95 year time skip.

Similar anomalies were observed while reading CNTPCT (after patching the
kernel to allow reads from userspace). However, the CNTPCT jumps are
much less frequent, and only small jumps were observed. The same program
as before (except now reading CNTPCT) observed after 72 hours:

 Count | Event
-------+---------------------------
    17 | jumped backward      699ms
    52 | jumped forward       175ms
  2831 | jumped forward       699ms
     5 | jumped forward      1398ms

Further investigation showed that the instability in CNTPCT/CNTVCT also
affected the respective timer's TVAL register. The following values were
observed immediately after writing CNVT_TVAL to 0x10000000:

 CNTVCT             | CNTV_TVAL  | CNTV_CVAL          | CNTV_TVAL Error
--------------------+------------+--------------------+-----------------
 0x000000d4a2d8bfff | 0x10003fff | 0x000000d4b2d8bfff | +0x00004000
 0x000000d4a2d94000 | 0x0fffffff | 0x000000d4b2d97fff | -0x00004000
 0x000000d4a2d97fff | 0x10003fff | 0x000000d4b2d97fff | +0x00004000
 0x000000d4a2d9c000 | 0x0fffffff | 0x000000d4b2d9ffff | -0x00004000

The pattern of errors in CNTV_TVAL seemed to depend on exactly which
value was written to it. For example, after writing 0x10101010:

 CNTVCT             | CNTV_TVAL  | CNTV_CVAL          | CNTV_TVAL Error
--------------------+------------+--------------------+-----------------
 0x000001ac3effffff | 0x1110100f | 0x000001ac4f10100f | +0x1000000
 0x000001ac40000000 | 0x1010100f | 0x000001ac5110100f | -0x1000000
 0x000001ac58ffffff | 0x1110100f | 0x000001ac6910100f | +0x1000000
 0x000001ac66000000 | 0x1010100f | 0x000001ac7710100f | -0x1000000
 0x000001ac6affffff | 0x1110100f | 0x000001ac7b10100f | +0x1000000
 0x000001ac6e000000 | 0x1010100f | 0x000001ac7f10100f | -0x1000000

I was also twice able to reproduce the issue covered by Allwinner's
workaround[4], that writing to TVAL sometimes fails, and both CVAL and
TVAL are left with entirely bogus values. One was the following values:

 CNTVCT             | CNTV_TVAL  | CNTV_CVAL
--------------------+------------+--------------------------------------
 0x000000d4a2d6014c | 0x8fbd5721 | 0x000000d132935fff (615s in the past)
Reviewed-by: Marc Zyngier <marc.zyngier@arm.com>

========================================================================

Because the CPU can read the CNTPCT/CNTVCT registers faster than they
change, performing two reads of the register and comparing the high bits
(like other workarounds) is not a workable solution. And because the
timer can jump both forward and backward, no pair of reads can
distinguish a good value from a bad one. The only way to guarantee a
good value from consecutive reads would be to read _three_ times, and
take the middle value only if the three values are 1) each unique and
2) increasing. This takes at minimum 3 counter cycles (125 ns), or more
if an anomaly is detected.

However, since there is a distinct pattern to the bad values, we can
optimize the common case (1022/1024 of the time) to a single read by
simply ignoring values that match the error pattern. This still takes no
more than 3 cycles in the worst case, and requires much less code. As an
additional safety check, we still limit the loop iteration to the number
of max-frequency (1.2 GHz) CPU cycles in three 24 MHz counter periods.

For the TVAL registers, the simple solution is to not use them. Instead,
read or write the CVAL and calculate the TVAL value in software.

Although the manufacturer is aware of at least part of the erratum[4],
there is no official name for it. For now, use the kernel-internal name
"UNKNOWN1".

[1]: https://github.com/armbian/build/commit/a08cd6fe7ae9
[2]: https://forum.armbian.com/topic/3458-a64-datetime-clock-issue/
[3]: https://irclog.whitequark.org/linux-sunxi/2018-01-26
[4]: https://github.com/Allwinner-Homlet/H6-BSP4.9-linux/blob/master/drivers/clocksource/arm_arch_timer.c#L272

Acked-by: Maxime Ripard <maxime.ripard@bootlin.com>
Tested-by: Andre Przywara <andre.przywara@arm.com>
Signed-off-by: Samuel Holland <samuel@sholland.org>
Cc: stable@vger.kernel.org
Signed-off-by: Daniel Lezcano <daniel.lezcano@linaro.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2019-03-23 20:09:58 +01:00
..
ABI xen/balloon: add runtime control for scrubbing ballooned out pages 2018-09-14 08:51:10 -04:00
accelerators ocxl: Document new OCXL IOCTLs 2018-06-03 20:40:33 +10:00
accounting
acpi ACPI: property: graph: Update graph documentation to use generic references 2018-07-23 12:44:52 +02:00
admin-guide x86/speculation/l1tf: Drop the swap storage limit restriction when l1tf=off 2019-01-09 17:38:41 +01:00
aoe
arm ARM: Device-tree updates 2018-06-11 17:57:38 -07:00
arm64 clocksource/drivers/arch_timer: Workaround for Allwinner A64 timer instability 2019-03-23 20:09:58 +01:00
auxdisplay Doc: misc-devices: move lcd-panel-cgram.txt to auxdisplay/ 2018-04-12 16:08:02 +02:00
backlight
block block: Track DISCARD statistics and output them in stat and diskstat 2018-07-18 08:44:22 -06:00
blockdev zram: introduce zram memory tracking 2018-06-07 17:34:34 -07:00
bpf Merge git://git.kernel.org/pub/scm/linux/kernel/git/bpf/bpf-next 2018-08-07 11:02:05 -07:00
bus-devices
cdrom Documentation/cdrom: fix German sharp s in LaTex 2018-03-08 19:35:29 -07:00
cgroup-v1 page cache: use xa_lock 2018-04-11 10:28:39 -07:00
cma
connector
console Documentation: corrections to console/console.txt 2018-08-10 16:09:40 -06:00
core-api idr: Change documentation license 2018-10-15 16:31:29 -04:00
cpu-freq cpufreq: Drop cpufreq_table_validate_and_show() 2018-04-10 08:40:45 +02:00
cpuidle cpuidle: Add definition of residency to sysfs documentation 2018-04-09 13:44:37 +02:00
crypto crypto: remove redundant type flags from tfm allocation 2018-07-09 00:30:29 +08:00
dev-tools doc: dev-tools: kselftest.rst: update contributing new tests 2018-06-29 09:01:50 -06:00
device-mapper dm raid: bump target version, update comments and documentation 2018-09-06 17:07:58 -04:00
devicetree dt-bindings: eeprom: at24: add "atmel,24c2048" compatible string 2019-02-20 10:25:35 +01:00
doc-guide Documentation/sphinx: allow "functions" with no parameters 2018-06-30 07:52:42 -06:00
driver-api docs: fpga: document fpga manager flags 2018-09-30 08:49:55 -07:00
driver-model dmaengine: add a new helper dmaenginem_async_device_register 2018-07-30 10:50:22 +05:30
early-userspace initramfs: move gen_initramfs_list.sh from scripts/ to usr/ 2018-08-22 23:21:44 +09:00
EDID
extcon
fault-injection Documentation: nvme: Documentation for nvme fault injection 2018-03-26 08:53:43 -06:00
fb uvesafb: Fix URLs in the documentation 2018-09-26 18:11:23 +02:00
features ARM: 8777/1: Hook up SYNC_CORE functionality for sys_membarrier() 2018-07-11 11:02:08 +01:00
filesystems mm, proc: be more verbose about unstable VMA flags in /proc/<pid>/smaps 2019-01-26 09:32:44 +01:00
firmware_class
fmc
fpga docs: fpga: add a document for FPGA Device Feature List (DFL) Framework Overview 2018-07-15 13:55:44 +02:00
gpio Documentation: gpio: Move drivers-on-gpio.txt to driver-api 2018-03-23 04:22:29 +01:00
gpu drm/msm/gpu: Add the buffer objects from the submit to the crash dump 2018-07-30 08:50:10 -04:00
hid
hwmon hwmon: (ina2xx) fix sysfs shunt resistor read access 2018-08-26 17:45:25 -07:00
i2c i2c: refactor function to release a DMA safe buffer 2018-08-30 23:13:15 +02:00
ia64 ia64: doc: tweak whitespace for 'console=' parameter 2018-03-05 14:41:38 -08:00
ide
iio
infiniband Documentation/ABI: update infiniband sysfs interfaces 2018-02-23 08:18:33 -07:00
input input: add MT_TOOL_DIAL 2018-07-17 15:33:47 +02:00
ioctl Char/Misc driver patches for 4.19-rc1 2018-08-18 11:04:51 -07:00
isdn Documentation/isdn: check and fix dead links ... 2018-03-26 12:31:13 -04:00
kbuild kbuild: Fix LOADLIBES rename in Documentation/kbuild/makefiles.txt 2018-08-22 23:21:42 +09:00
kdump
kernel-hacking doc:it_IT: translation for kernel-hacking 2018-07-26 16:21:09 -06:00
laptops platform/x86: thinkpad_acpi: silence HKEY 0x6032, 0x60f0, 0x6030 2018-05-07 15:10:31 +03:00
leds
lightnvm
livepatch livepatch: Remove not longer valid limitations from the documentation 2018-05-24 15:37:57 +02:00
locking locking: Implement an algorithm choice for Wound-Wait mutexes 2018-07-03 09:44:36 +02:00
m68k
maintainer docs: Fix more broken references 2018-06-15 18:11:26 -03:00
md
media media: replace ADOBERGB by OPRGB 2018-11-13 11:08:54 -08:00
memory-devices
mic
mips
misc-devices pci_endpoint_test: Add 2 ioctl commands 2018-07-19 11:46:57 +01:00
mmc
mtd
namespaces
netlabel
networking net-tcp: /proc/sys/net/ipv4/tcp_probe_interval is a u32 not int 2018-09-26 20:33:21 -07:00
nfc
nios2
nvdimm
nvmem
openrisc
parisc
PCI Merge branch 'remotes/lorenzo/pci/dwc' 2018-08-15 14:59:11 -05:00
pcmcia pcmcia: remove long deprecated pcmcia_request_exclusive_irq() function 2018-08-18 12:30:42 -07:00
perf drivers/bus: Move Arm CCN PMU driver 2018-03-06 17:26:15 +01:00
phy
platform
power PM / reboot: Eliminate race between reboot and suspend 2018-08-06 12:35:20 +02:00
powerpc powerpc: Document issues with TM on POWER9 2018-07-02 23:54:29 +10:00
pps
process Code of Conduct: Change the contact email address 2018-10-22 07:33:36 +01:00
pti
ptp ptp: Fix documentation to match code. 2018-03-26 12:13:21 -04:00
rapidio Documentation: rapidio: move sysfs interface to ABI 2018-02-23 08:25:45 -07:00
RCU rculist: Improve documentation for list_for_each_entry_from_rcu() 2018-07-12 15:39:25 -07:00
riscv perf: riscv: Add Document for Future Porting Guide 2018-06-04 14:02:11 -07:00
s390 vfio-ccw: update documentation 2018-03-01 17:32:14 +01:00
scheduler sched/deadline/Documentation: Add overrun signal and GRUB-PA documentation 2018-05-14 09:12:27 +02:00
scsi scsi: documentation: add scsi_mod.use_blk_mq to scsi-parameters 2018-08-27 12:26:10 -04:00
security Kbuild: rename CC_STACKPROTECTOR[_STRONG] config variables 2018-06-14 12:21:18 +09:00
serial
sh
sound ASoC: Updates for v4.19 2018-08-13 12:12:31 +02:00
sparc sparc64: Add support for ADI (Application Data Integrity) 2018-03-18 07:38:48 -07:00
sphinx Documentation/sphinx: allow "functions" with no parameters 2018-06-30 07:52:42 -06:00
sphinx-static
spi
sysctl namei: allow restricted O_CREAT of FIFOs and regular files 2018-08-23 18:48:43 -07:00
target
thermal thermal: Add cooling device's statistics in sysfs 2018-04-02 21:49:01 +08:00
timers timekeeping.txt: Correct maxCount of n-bit binary counter 2018-07-23 09:33:06 -06:00
trace This was a moderately busy cycle for docs, with the usual collection of 2018-08-14 14:29:31 -07:00
translations This was a moderately busy cycle for docs, with the usual collection of 2018-08-14 14:29:31 -07:00
usb USB-serial updates for v4.19-rc1 2018-07-20 21:47:15 +02:00
userspace-api x86/speculation: Add prctl() control for indirect branch speculation 2018-12-05 19:32:04 +01:00
virtual KVM: x86: Control guest reads of MSR_PLATFORM_INFO 2018-09-20 00:51:46 +02:00
vm docs/vm: move ksm and transhuge from "user" to "internals" section. 2018-05-29 06:45:55 -06:00
w1 w1: fix w1_ds2438 documentation 2018-07-07 17:27:13 +02:00
watchdog watchdog: remove bfin_wdt driver 2018-03-26 15:57:04 +02:00
wimax
x86 x86/mm: Move LDT remap out of KASLR region on 5-level paging 2018-11-27 16:13:08 +01:00
xtensa
.gitignore
00-INDEX docs: admin-guide: add cgroup-v2 documentation 2018-05-10 15:42:41 -06:00
atomic_bitops.txt
atomic_t.txt
bt8xxgpio.txt
btmrvl.txt
bus-virt-phys-mapping.txt
Changes
clearing-warn-once.txt
CodingStyle
conf.py ext4: import inode data fork chapter from wiki page 2018-07-29 15:45:00 -04:00
cpu-load.txt
cputopology.txt
crc32.txt
dcdbas.txt
debugging-modules.txt
debugging-via-ohci1394.txt
dell_rbu.txt Documentation: remove stale firmware API reference 2018-05-14 16:44:41 +02:00
digsig.txt
DMA-API-HOWTO.txt
DMA-API.txt
DMA-attributes.txt
DMA-ISA-LPC.txt
docutils.conf
dontdiff
efi-stub.txt
eisa.txt
flexible-arrays.txt
futex-requeue-pi.txt
gcc-plugins.txt
highuid.txt
hw_random.txt
hwspinlock.txt
index.rst Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net-next 2018-08-15 15:04:25 -07:00
Intel-IOMMU.txt
intel_txt.txt
io-mapping.txt
io_ordering.txt
iostats.txt block: Track DISCARD statistics and output them in stat and diskstat 2018-07-18 08:44:22 -06:00
IPMI.txt
IRQ-affinity.txt
IRQ-domain.txt
IRQ.txt
irqflags-tracing.txt
isa.txt
isapnp.txt
kernel-per-CPU-kthreads.txt
kobject.txt
kprobes.txt kprobes/Documentation: Fix various typos 2018-06-22 11:10:55 +02:00
kref.txt
ldm.txt
lockup-watchdogs.txt
logo.gif
logo.txt
lsm.txt
lzo.txt
mailbox.txt
Makefile
memory-barriers.txt sched/Documentation: Update wake_up() & co. memory-barrier guarantees 2018-07-17 09:30:34 +02:00
memory-hotplug.txt
men-chameleon-bus.txt
nommu-mmap.txt Documentation: nommu-map: Fix duplicate word typo 2018-06-26 09:01:27 -06:00
ntb.txt
numastat.txt
padata.txt
parport-lowlevel.txt
percpu-rw-semaphore.txt
phy.txt
pi-futex.txt
pnp.txt
preempt-locking.txt
pwm.txt
rbtree.txt
remoteproc.txt
rfkill.txt rfkill: Fix several typos in documentation 2018-06-15 13:36:08 +02:00
robust-futex-ABI.txt
robust-futexes.txt
rpmsg.txt
rtc.txt
SAK.txt
sgi-ioc4.txt
siphash.txt
SM501.txt
smsc_ece1099.txt
speculation.txt
static-keys.txt
SubmittingPatches
svga.txt
switchtec.txt
sync_file.txt
tee.txt
this_cpu_ops.txt
unaligned-memory-access.txt
vfio-mediated-device.txt vfio/mdev: Check globally for duplicate devices 2018-06-08 10:24:27 -06:00
vfio.txt vfio: fix documentation 2018-05-08 09:16:41 -06:00
video-output.txt
xillybus.txt
xz.txt
zorro.txt