linux-stable/arch/x86
Ingo Molnar 52648e83c9 x86: Pack loops tightly as well
Packing loops tightly (-falign-loops=1) is beneficial to code size:

     text        data    bss     dec              filename
 12566391        1617840 1089536 15273767         vmlinux.align.16-byte
 12224951        1617840 1089536 14932327         vmlinux.align.1-byte
 11976567        1617840 1089536 14683943         vmlinux.align.1-byte.funcs-1-byte
 11903735        1617840 1089536 14611111         vmlinux.align.1-byte.funcs-1-byte.loops-1-byte

Which reduces the size of the kernel by another 0.6%, so the
the total combined size reduction of the alignment-packing
patches is ~5.5%.

The x86 decoder bandwidth and caching arguments laid out in:

  be6cb02779 ("x86: Align jump targets to 1-byte boundaries")

apply to loop alignment as well.

Furtermore, modern CPU uarchs have a loop cache/buffer that
is a L0 cache before even any uop cache, covering a few
dozen most recently executed instructions.

This loop cache generally does not have the 16-byte alignment
restrictions of the uop cache.

Now loop alignment can still be beneficial if:

 - a loop is cache-hot and its surroundings are not.

 - if the loop is so cache hot that the instruction
   flow becomes x86 decoder bandwidth limited

But loop alignment is harmful if:

 - a loop is cache-cold

 - a loop's surroundings are cache-hot as well

 - two cache-hot loops are close to each other

 - if the loop fits into the loop cache

 - if the code flow is not decoder bandwidth limited

and I'd argue that the latter five scenarios are much
more common in the kernel, as our hottest loops are
typically:

 - pointer chasing: this should fit into the loop cache
   in most cases and is typically data cache and address
   generation limited

 - generic memory ops (memset, memcpy, etc.): these generally
   fit into the loop cache as well, and are likewise data
   cache limited.

So this patch packs loop addresses tightly as well.

Acked-by: Denys Vlasenko <dvlasenk@redhat.com>
Cc: Andy Lutomirski <luto@amacapital.net>
Cc: Aswin Chandramouleeswaran <aswin@hp.com>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Brian Gerst <brgerst@gmail.com>
Cc: Davidlohr Bueso <dave@stgolabs.net>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Jason Low <jason.low2@hp.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Tim Chen <tim.c.chen@linux.intel.com>
Link: http://lkml.kernel.org/r/20150410123017.GB19918@gmail.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2015-05-17 07:56:54 +02:00
..
boot * Avoid garbage names in efivarfs due to buggy firmware by zero'ing 2015-05-06 08:30:24 +02:00
configs x86/build/defconfig: Enable USB_EHCI_TT_NEWSCHED=y 2015-02-19 02:21:14 +01:00
crypto crypto: x86/sha512_ssse3 - fixup for asm function prototype change 2015-04-24 20:09:01 +08:00
ia32 x86/entry: Define 'cpu_current_top_of_stack' for 64-bit code 2015-05-08 13:50:02 +02:00
include x86/asm/uaccess: Unify the ALIGN_DESTINATION macro 2015-05-14 07:25:34 +02:00
kernel x86/alternatives: Switch AMD F15h and later to the P6 NOPs 2015-05-11 10:26:05 +02:00
kvm kvm: x86: fix kvmclock update protocol 2015-04-27 15:48:59 +02:00
lguest x86/asm/entry: Remove SYSCALL_VECTOR 2015-05-10 12:34:28 +02:00
lib x86/asm/uaccess: Get rid of copy_user_nocache_64.S 2015-05-14 07:25:35 +02:00
math-emu
mm Merge branch 'x86-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip 2015-05-06 10:57:37 -07:00
net Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net 2014-12-10 15:48:20 -05:00
oprofile x86/asm/entry: Change all 'user_mode_vm()' calls to 'user_mode()' 2015-03-23 11:14:17 +01:00
pci x86/PCI/ACPI: Make all resources except [io 0xcf8-0xcff] available on PCI bus 2015-04-30 22:17:34 +02:00
platform TTY/Serial patches for 4.1-rc1 2015-04-21 09:33:10 -07:00
power x86/asm, x86/power/hibernate: Use local labels in asm 2015-04-15 11:37:51 +02:00
purgatory Merge branches 'x86-build-for-linus', 'x86-cleanups-for-linus' and 'x86-debug-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip 2014-12-10 12:35:46 -08:00
realmode Merge branch 'perf-core-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip 2015-02-16 14:58:12 -08:00
syscalls xen: features and fixes for 4.1-rc0 2015-04-16 14:01:03 -05:00
tools x86, build: replace Perl script with Shell script 2015-01-26 13:37:18 -08:00
um Merge branch 'exec_domain_rip_v2' of git://git.kernel.org/pub/scm/linux/kernel/git/rw/misc 2015-04-15 13:53:55 -07:00
vdso x86: pvclock: Really remove the sched notifier for cross-cpu migrations 2015-04-27 15:49:30 +02:00
video
xen x86/entry: Define 'cpu_current_top_of_stack' for 64-bit code 2015-05-08 13:50:02 +02:00
.gitignore x86/build: Add arch/x86/purgatory/ make generated files to gitignore 2014-10-09 09:29:46 +02:00
Kbuild
Kconfig Initial ACPI support for arm64: 2015-04-24 08:23:45 -07:00
Kconfig.cpu
Kconfig.debug x86, intel-mid: remove Intel MID specific serial support 2015-03-07 03:25:18 +01:00
Makefile x86: Pack loops tightly as well 2015-05-17 07:56:54 +02:00
Makefile.um kbuild: use relative path more to include Makefile 2015-04-02 16:42:08 +02:00
Makefile_32.cpu