These are the documentation changes for 4.10.

It's another busy cycle for the docs tree, as the Sphinx conversion
 continues.  Highlights include:
 
  - Further work on PDF output, which remains a bit of a pain but should be
    more solid now.
 
  - Five more DocBook template files converted to Sphinx.  Only 27 to go...
    Lots of plain-text files have also been converted and integrated.
 
  - Images in binary formats have been replaced with more source-friendly
    versions.
 
  - Various bits of organizational work, including the renaming of various
    files discussed at the kernel summit.
 
  - New documentation for the device_link mechanism.
 
 ...and, of course, lots of typo fixes and small updates.

Merge tag 'docs-4.10' of git://git.lwn.net/linux into drm-misc-next

Backmerge the docs-next branch from Jon into drm-misc so that we can
apply the dma-buf documentation cleanup patches. Git found a conflict
where there was none because both drm-misc and docs had identical
patches to clean up file rename issues in the rst include directives.

Signed-off-by: Daniel Vetter <daniel.vetter@intel.com>
Daniel Vetter 2016-12-13 10:36:39 +01:00
commit c248891675
344 changed files with 39950 additions and 21197 deletions


@ -14,13 +14,8 @@ Following translations are available on the WWW:
- this file.
ABI/
- info on kernel <-> userspace ABI and relative interface stability.
BUG-HUNTING
- brute force method of doing binary search of patches to find bug.
Changes
- list of changes that break older software packages.
CodingStyle
- how the maintainers expect the C code in the kernel to look.
- nothing here, just a pointer to process/coding-style.rst.
DMA-API.txt
- DMA API, pci_ API & extensions for non-consistent memory machines.
DMA-API-HOWTO.txt
@ -33,8 +28,6 @@ DocBook/
- directory with DocBook templates etc. for kernel documentation.
EDID/
- directory with info on customizing EDID for broken gfx/displays.
HOWTO
- the process and procedures of how to do Linux kernel development.
IPMI.txt
- info on Linux Intelligent Platform Management Interface (IPMI) Driver.
IRQ-affinity.txt
@ -46,62 +39,43 @@ IRQ.txt
Intel-IOMMU.txt
- basic info on the Intel IOMMU virtualization support.
Makefile
- This file does nothing. Removing it breaks make htmldocs and
make distclean.
ManagementStyle
- how to (attempt to) manage kernel hackers.
- It's not of interest for those who aren't touching the build system.
Makefile.sphinx
- It's not of interest for those who aren't touching the build system.
PCI/
- info related to PCI drivers.
RCU/
- directory with info on RCU (read-copy update).
SAK.txt
- info on Secure Attention Keys.
SM501.txt
- Silicon Motion SM501 multimedia companion chip
SecurityBugs
- procedure for reporting security bugs found in the kernel.
SubmitChecklist
- Linux kernel patch submission checklist.
SubmittingDrivers
- procedure to get a new driver source included into the kernel tree.
SubmittingPatches
- procedure to get a source patch included into the kernel tree.
VGA-softcursor.txt
- how to change your VGA cursor from a blinking underscore.
- nothing here, just a pointer to process/coding-style.rst.
accounting/
- documentation on accounting and taskstats.
acpi/
- info on ACPI-specific hooks in the kernel.
admin-guide/
- info related to Linux users and system admins.
aoe/
- description of AoE (ATA over Ethernet) along with config examples.
applying-patches.txt
- description of various trees and how to apply their patches.
arm/
- directory with info about Linux on the ARM architecture.
arm64/
- directory with info about Linux on the 64 bit ARM architecture.
assoc_array.txt
- generic associative array intro.
atomic_ops.txt
- semantics and behavior of atomic and bitmask operations.
auxdisplay/
- misc. LCD driver documentation (cfag12864b, ks0108).
backlight/
- directory with info on controlling backlights in flat panel displays
bad_memory.txt
- how to use kernel parameters to exclude bad RAM regions.
basic_profiling.txt
- basic instructions for those who want to profile the Linux kernel.
bcache.txt
- Block-layer cache on fast SSDs to improve slow (raid) I/O performance.
binfmt_misc.txt
- info on the kernel support for extra binary formats.
blackfin/
- directory with documentation for the Blackfin arch.
block/
- info on the Block I/O (BIO) layer.
blockdev/
- info on block devices & drivers
braille-console.txt
- info on how to use serial devices for Braille support.
bt8xxgpio.txt
- info on how to modify a bt8xx video card for GPIO usage.
btmrvl.txt
@ -114,18 +88,24 @@ cachetlb.txt
- describes the cache/TLB flushing interfaces Linux uses.
cdrom/
- directory with information on the CD-ROM drivers that Linux has.
cgroups/
- cgroups features, including cpusets and memory controller.
cgroup-v1/
- cgroups v1 features, including cpusets and memory controller.
cgroup-v2.txt
- cgroups v2 features, including cpusets and memory controller.
circular-buffers.txt
- how to make use of the existing circular buffer infrastructure
clk.txt
- info on the common clock framework
coccinelle.txt
- info on how to get and use the Coccinelle code checking tool.
cma/
- Continuous Memory Area (CMA) debugfs interface.
conf.py
- It's not of interest for those who aren't touching the build system.
connector/
- docs on the netlink based userspace<->kernel space communication mod.
console/
- documentation on Linux console drivers.
core-api/
- documentation on kernel core components.
cpu-freq/
- info on CPU frequency and voltage scaling.
cpu-hotplug.txt
@ -150,26 +130,26 @@ debugging-via-ohci1394.txt
- how to use firewire like a hardware debugger memory reader.
dell_rbu.txt
- document demonstrating the use of the Dell Remote BIOS Update driver.
development-process/
- how to work with the mainline kernel development process.
dev-tools/
- directory with info on development tools for the kernel.
device-mapper/
- directory with info on Device Mapper.
devices.txt
- plain ASCII listing of all the nodes in /dev/ with major minor #'s.
dmaengine/
- the DMA engine and controller API guides.
devicetree/
- directory with info on device tree files used by OF/PowerPC/ARM
digsig.txt
-info on the Digital Signature Verification API
dma-buf-sharing.txt
- the DMA Buffer Sharing API Guide
docutils.conf
- nothing here. Just a configuration file for docutils.
dontdiff
- file containing a list of files that should never be diff'ed.
driver-api/
- the Linux driver implementer's API guide.
driver-model/
- directory with info about Linux driver model.
dvb/
- info on Linux Digital Video Broadcast (DVB) subsystem.
dynamic-debug-howto.txt
- how to use the dynamic debug (dyndbg) feature.
early-userspace/
- info about initramfs, klibc, and userspace early during boot.
edac.txt
@ -178,14 +158,16 @@ efi-stub.txt
- How to use the EFI boot stub to bypass GRUB or elilo on EFI systems.
eisa.txt
- info on EISA bus support.
email-clients.txt
- info on how to use e-mail to send un-mangled (git) patches.
extcon/
- directory with porting guide for Android kernel switch driver.
isa.txt
- info on EISA bus support.
fault-injection/
- dir with docs about the fault injection capabilities infrastructure.
fb/
- directory with info on the frame buffer graphics abstraction layer.
features/
- status of feature implementation on different architectures.
filesystems/
- info on the vfs and the various filesystems that Linux supports.
firmware_class/
@ -194,20 +176,22 @@ flexible-arrays.txt
- how to make use of flexible sized arrays in linux
fmc/
- information about the FMC bus abstraction
fpga/
- FPGA Manager Core.
frv/
- Fujitsu FR-V Linux documentation.
futex-requeue-pi.txt
- info on requeueing of tasks from a non-PI futex to a PI futex
gcov.txt
- use of GCC's coverage testing tool "gcov" with the Linux kernel
gcc-plugins.txt
- GCC plugin infrastructure.
gpio/
- gpio related documentation
gpu/
- directory with information on GPU driver developer's guide.
hid/
- directory with information on human interface devices
highuid.txt
- notes on the change from 16 bit to 32 bit user/group IDs.
hsi.txt
- HSI subsystem overview.
hwspinlock.txt
- hardware spinlock provides hardware assistance for synchronization
timers/
@ -218,18 +202,18 @@ hwmon/
- directory with docs on various hardware monitoring drivers.
i2c/
- directory with info about the I2C bus/protocol (2 wire, kHz speed).
i2o/
- directory with info about the Linux I2O subsystem.
x86/i386/
- directory with info about Linux on Intel 32 bit architecture.
ia64/
- directory with info about Linux on Intel 64 bit architecture.
ide/
- Information regarding the Enhanced IDE drive.
iio/
- info on industrial IIO configfs support.
index.rst
- main index for the documentation at ReST format.
infiniband/
- directory with documents concerning Linux InfiniBand support.
init.txt
- what to do when the kernel can't find the 1st process to run.
initrd.txt
- how to use the RAM disk as an initial/temporary root filesystem.
input/
- info on Linux input device support.
intel_txt.txt
@ -248,28 +232,16 @@ isapnp.txt
- info on Linux ISA Plug & Play support.
isdn/
- directory with info on the Linux ISDN support, and supported cards.
java.txt
- info on the in-kernel binary support for Java(tm).
ja_JP/
- directory with Japanese translations of various documents
kbuild/
- directory with info about the kernel build process.
kernel-doc-nano-HOWTO.txt
- outdated info about kernel-doc documentation.
kdump/
- directory with mini HowTo on getting the crash dump code to work.
kernel-docs.txt
- listing of various WWW + books that document kernel internals.
kernel-documentation.rst
doc-guide/
- how to write and format reStructuredText kernel documentation
kernel-parameters.txt
- summary listing of command line / boot prompt args for the kernel.
kernel-per-CPU-kthreads.txt
- List of all per-CPU kthreads and how they introduce jitter.
kmemcheck.txt
- info on dynamic checker that detects uses of uninitialized memory.
kmemleak.txt
- info on how to make use of the kernel memory leak detection system
ko_KR/
- directory with Korean translations of various documents
kobject.txt
- info of the kobject infrastructure of the Linux kernel.
kprobes.txt
@ -284,8 +256,8 @@ ldm.txt
- a brief description of LDM (Windows Dynamic Disks).
leds/
- directory with info about LED handling under Linux.
local_ops.txt
- semantics and behavior of local atomic operations.
livepatch/
- info on kernel live patching.
locking/
- directory with info about kernel locking primitives
lockup-watchdogs.txt
@ -298,22 +270,24 @@ lzo.txt
- kernel LZO decompressor input formats
m68k/
- directory with info about Linux on Motorola 68k architecture.
magic-number.txt
- list of magic numbers used to mark/protect kernel data structures.
mailbox.txt
- How to write drivers for the common mailbox framework (IPC).
md.txt
- info on boot arguments for the multiple devices driver.
media-framework.txt
- info on media framework, its data structures, functions and usage.
md-cluster.txt
- info on shared-device RAID MD cluster.
media/
- info on media drivers: uAPI, kAPI and driver documentation.
memory-barriers.txt
- info on Linux kernel memory barriers.
memory-devices/
- directory with info on parts like the Texas Instruments EMIF driver
memory-hotplug.txt
- Hotpluggable memory support, how to use and current status.
men-chameleon-bus.txt
- info on MEN chameleon bus.
metag/
- directory with info about Linux on Meta architecture.
mic/
- Intel Many Integrated Core (MIC) architecture device driver.
mips/
- directory with info about Linux on MIPS architecture.
misc-devices/
@ -322,12 +296,8 @@ mmc/
- directory with info about the MMC subsystem
mn10300/
- directory with info about the mn10300 architecture port
module-signing.txt
- Kernel module signing for increased security when loading modules.
mtd/
- directory with info about memory technology devices (flash)
mono.txt
- how to execute Mono-based .NET binaries with the help of BINFMT_MISC.
namespaces/
- directory with various information about namespaces
netlabel/
@ -336,30 +306,42 @@ networking/
- directory with info on various aspects of networking with Linux.
nfc/
- directory relating info about Near Field Communications support.
nios2/
- Linux on the Nios II architecture.
nommu-mmap.txt
- documentation about no-mmu memory mapping support.
numastat.txt
- info on how to read Numa policy hit/miss statistics in sysfs.
oops-tracing.txt
- how to decode those nasty internal kernel error dump messages.
ntb.txt
- info on Non-Transparent Bridge (NTB) drivers.
nvdimm/
- info on non-volatile devices.
nvmem/
- info on non volatile memory framework.
output/
- default directory where html/LaTeX/pdf files will be written.
padata.txt
- An introduction to the "padata" parallel execution API
parisc/
- directory with info on using Linux on PA-RISC architecture.
parport.txt
- how to use the parallel-port driver.
parport-lowlevel.txt
- description and usage of the low level parallel port functions.
pcmcia/
- info on the Linux PCMCIA driver.
percpu-rw-semaphore.txt
- RCU based read-write semaphore optimized for locking for reading
perf/
- info about the APM X-Gene SoC Performance Monitoring Unit (PMU).
phy/
- info on Samsung USB 2.0 PHY adaptation layer.
phy.txt
- Description of the generic PHY framework.
pi-futex.txt
- documentation on lightweight priority inheritance futexes.
pinctrl.txt
- info on pinctrl subsystem and the PINMUX/PINCONF and drivers
platform/
- List of supported hardware by compal and Dell laptop.
pnp.txt
- Linux Plug and Play documentation.
power/
@ -372,14 +354,16 @@ preempt-locking.txt
- info on locking under a preemptive kernel.
printk-formats.txt
- how to get printk format specifiers right
process/
- how to work with the mainline kernel development process.
pps/
- directory with information on the pulse-per-second support
pti/
- directory with info on Intel MID PTI.
ptp/
- directory with info on support for IEEE 1588 PTP clocks in Linux.
pwm.txt
- info on the pulse width modulation driver subsystem
ramoops.txt
- documentation of the ramoops oops/panic logging module.
rapidio/
- directory with info on RapidIO packet-based fabric interconnect
rbtree.txt
@ -406,8 +390,6 @@ security/
- directory that contains security-related info
serial/
- directory with info on the low level serial API.
serial-console.txt
- how to set up Linux with a serial line console as the default.
sgi-ioc4.txt
- description of the SGI IOC4 PCI (multi function) device.
sh/
@ -416,24 +398,20 @@ smsc_ece1099.txt
-info on the smsc Keyboard Scan Expansion/GPIO Expansion device.
sound/
- directory with info on sound card support.
sparse.txt
- info on how to obtain and use the sparse tool for typechecking.
spi/
- overview of Linux kernel Serial Peripheral Interface (SPI) support.
stable_api_nonsense.txt
- info on why the kernel does not have a stable in-kernel api or abi.
stable_kernel_rules.txt
- rules and procedures for the -stable kernel releases.
sphinx/
- no documentation here, just files required by Sphinx toolchain.
sphinx-static/
- no documentation here, just files required by Sphinx toolchain.
static-keys.txt
- info on how static keys allow debug code in hotpaths via patching
svga.txt
- short guide on selecting video modes at boot via VGA BIOS.
sysfs-rules.txt
- How not to use sysfs.
sync_file.txt
- Sync file API guide.
sysctl/
- directory with info on the /proc/sys/* files.
sysrq.txt
- info on the magic SysRq key.
target/
- directory with info on generating TCM v4 fabric .ko modules
this_cpu_ops.txt
@ -442,39 +420,29 @@ thermal/
- directory with information on managing thermal issues (CPU/temp)
trace/
- directory with info on tracing technologies within linux
translations/
- translations of this document from English to another language
unaligned-memory-access.txt
- info on how to avoid arch breaking unaligned memory access in code.
unicode.txt
- info on the Unicode character/font mapping used in Linux.
unshare.txt
- description of the Linux unshare system call.
usb/
- directory with info regarding the Universal Serial Bus.
vDSO/
- directory with info regarding virtual dynamic shared objects
vfio.txt
- info on Virtual Function I/O used in guest/hypervisor instances.
vgaarbiter.txt
- info on enable/disable the legacy decoding on different VGA devices
video-output.txt
- sysfs class driver interface to enable/disable a video output device.
video4linux/
- directory with info regarding video/TV/radio cards and linux.
virtual/
- directory with information on the various linux virtualizations.
vm/
- directory with info on the Linux vm code.
vme_api.txt
- file relating info on the VME bus API in linux
volatile-considered-harmful.txt
- Why the "volatile" type class should not be used
w1/
- directory with documents regarding the 1-wire (w1) subsystem.
watchdog/
- how to auto-reboot Linux if it has "fallen and can't get up". ;-)
wimax/
- directory with info about Intel Wireless Wimax Connections
workqueue.txt
core-api/workqueue.rst
- information on the Concurrency Managed Workqueue implementation
x86/x86_64/
- directory with info on Linux support for AMD x86-64 (Hammer) machines.
@ -484,7 +452,5 @@ xtensa/
- directory with documents relating to arch/xtensa port/implementation
xz.txt
- how to make use of the XZ data compression within linux kernel
zh_CN/
- directory with Chinese translations of various documents
zorro.txt
- info on writing drivers for Zorro bus devices found on Amigas.


@ -84,4 +84,4 @@ stable:
- Kernel-internal symbols. Do not rely on the presence, absence, location, or
type of any kernel symbol, either in System.map files or the kernel binary
itself. See Documentation/stable_api_nonsense.txt.
itself. See Documentation/process/stable-api-nonsense.rst.


@ -347,7 +347,7 @@ Description:
because of fragmentation, SLUB will retry with the minimum order
possible depending on its characteristics.
When debug_guardpage_minorder=N (N > 0) parameter is specified
(see Documentation/kernel-parameters.txt), the minimum possible
(see Documentation/admin-guide/kernel-parameters.rst), the minimum possible
order is used and this sysfs entry can not be used to change
the order at run time.


@ -1,246 +0,0 @@
Table of contents
=================
Last updated: 20 December 2005
Contents
========
- Introduction
- Devices not appearing
- Finding patch that caused a bug
-- Finding using git-bisect
-- Finding it the old way
- Fixing the bug
Introduction
============
Always try the latest kernel from kernel.org and build from source. If you are
not confident in doing that please report the bug to your distribution vendor
instead of to a kernel developer.
Finding bugs is not always easy. Have a go though. If you can't find it don't
give up. Report as much as you have found to the relevant maintainer. See
MAINTAINERS for who that is for the subsystem you have worked on.
Before you submit a bug report read REPORTING-BUGS.
Devices not appearing
=====================
Often this is caused by udev. Check that first before blaming it on the
kernel.
Finding patch that caused a bug
===============================
Finding using git-bisect
------------------------
Using the provided tools with git makes finding bugs easy provided the bug is
reproducible.
Steps to do it:
- start using git for the kernel source
- read the man page for git-bisect
- have fun
Finding it the old way
----------------------
[Sat Mar 2 10:32:33 PST 1996 KERNEL_BUG-HOWTO lm@sgi.com (Larry McVoy)]
This is how to track down a bug if you know nothing about kernel hacking.
It's a brute force approach but it works pretty well.
You need:
. A reproducible bug - it has to happen predictably (sorry)
. All the kernel tar files from a revision that worked to the
revision that doesn't
You will then do:
. Rebuild a revision that you believe works, install, and verify that.
. Do a binary search over the kernels to figure out which one
introduced the bug. I.e., suppose 1.3.28 didn't have the bug, but
you know that 1.3.69 does. Pick a kernel in the middle and build
that, like 1.3.50. Build & test; if it works, pick the mid point
between .50 and .69, else the mid point between .28 and .50.
. You'll narrow it down to the kernel that introduced the bug. You
can probably do better than this but it gets tricky.
. Narrow it down to a subdirectory
- Copy kernel that works into "test". Let's say that 3.62 works,
but 3.63 doesn't. So you diff -r those two kernels and come
up with a list of directories that changed. For each of those
directories:
Copy the non-working directory next to the working directory
as "dir.63".
One directory at a time, try moving the working directory to
"dir.62" and the non-working "dir.63" into its place:
mv dir dir.62
mv dir.63 dir
find dir -name '*.[oa]' -print | xargs rm -f
And then rebuild and retest. Assuming that all related
changes were contained in the sub directory, this should
isolate the change to a directory.
Problems: changes in header files may have occurred; I've
found in my case that they were self explanatory - you may
or may not want to give up when that happens.
. Narrow it down to a file
- You can apply the same technique to each file in the directory,
hoping that the changes in that file are self contained.
. Narrow it down to a routine
- You can take the old file and the new file and manually create
a merged file that has
#ifdef VER62
routine()
{
...
}
#else
routine()
{
...
}
#endif
And then walk through that file, one routine at a time and
prefix it with
#define VER62
/* both routines here */
#undef VER62
Then recompile, retest, move the ifdefs until you find the one
that makes the difference.
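The binary search over kernel revisions described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: `first_bad` and the `is_good` callback are hypothetical names, the callback stands in for "build, install and test this revision", and the bug is assumed to persist in every revision after the one that introduced it.

```python
from typing import Callable, Sequence


def first_bad(versions: Sequence[str], is_good: Callable[[str], bool]) -> str:
    """Return the earliest version for which is_good() is False.

    Assumes versions[0] is known good, versions[-1] is known bad, and
    that once the bug appears it stays in every later version.
    """
    lo, hi = 0, len(versions) - 1  # lo: last known good, hi: first known bad
    while hi - lo > 1:
        mid = (lo + hi) // 2
        if is_good(versions[mid]):
            lo = mid  # bug was introduced after mid
        else:
            hi = mid  # bug was introduced at or before mid
    return versions[hi]


if __name__ == "__main__":
    # Hypothetical data: pretend the bug first appeared in 1.3.57.
    kernels = [f"1.3.{n}" for n in range(28, 70)]
    print(first_bad(kernels, lambda v: int(v.rsplit(".", 1)[1]) < 57))
```

Each call to `is_good` corresponds to one rebuild and retest, so the number of kernels to build grows only with the logarithm of the number of revisions in the range.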
Finally, you take all the info that you have, kernel revisions, bug
description, the extent to which you have narrowed it down, and pass
that off to whomever you believe is the maintainer of that section.
A post to linux.dev.kernel isn't such a bad idea if you've done some
work to narrow it down.
If you get it down to a routine, you'll probably get a fix in 24 hours.
My apologies to Linus and the other kernel hackers for describing this
brute force approach, it's hardly what a kernel hacker would do. However,
it does work and it lets non-hackers help fix bugs. And it is cool
because Linux snapshots will let you do this - something that you can't
do with vendor supplied releases.
Fixing the bug
==============
Nobody is going to tell you how to fix bugs. Seriously. You need to work it
out. But below are some hints on how to use the tools.
To debug a kernel, use objdump and look for the hex offset from the crash
output to find the valid line of code/assembler. Without debug symbols, you
will see the assembler code for the routine shown, but if your kernel has
debug symbols the C code will also be available. (Debug symbols can be enabled
in the kernel hacking menu of the menu configuration.) For example:
objdump -r -S -l --disassemble net/dccp/ipv4.o
NB.: you need to be at the top level of the kernel tree for this to pick up
your C files.
If you don't have access to the code you can also debug on some crash dumps
e.g. crash dump output as shown by Dave Miller.
> EIP is at ip_queue_xmit+0x14/0x4c0
> ...
> Code: 44 24 04 e8 6f 05 00 00 e9 e8 fe ff ff 8d 76 00 8d bc 27 00 00
> 00 00 55 57 56 53 81 ec bc 00 00 00 8b ac 24 d0 00 00 00 8b 5d 08
> <8b> 83 3c 01 00 00 89 44 24 14 8b 45 28 85 c0 89 44 24 18 0f 85
>
> Put the bytes into a "foo.s" file like this:
>
> .text
> .globl foo
> foo:
> .byte .... /* bytes from Code: part of OOPS dump */
>
> Compile it with "gcc -c -o foo.o foo.s" then look at the output of
> "objdump --disassemble foo.o".
>
> Output:
>
> ip_queue_xmit:
> push %ebp
> push %edi
> push %esi
> push %ebx
> sub $0xbc, %esp
> mov 0xd0(%esp), %ebp ! %ebp = arg0 (skb)
> mov 0x8(%ebp), %ebx ! %ebx = skb->sk
> mov 0x13c(%ebx), %eax ! %eax = inet_sk(sk)->opt
In addition, you can use GDB to figure out the exact file and line
number of the OOPS from the vmlinux file. If you have
CONFIG_DEBUG_INFO enabled, you can simply copy the EIP value from the
OOPS:
EIP: 0060:[<c021e50e>] Not tainted VLI
And use GDB to translate that to human-readable form:
gdb vmlinux
(gdb) l *0xc021e50e
If you don't have CONFIG_DEBUG_INFO enabled, you use the function
offset from the OOPS:
EIP is at vt_ioctl+0xda8/0x1482
And recompile the kernel with CONFIG_DEBUG_INFO enabled:
make vmlinux
gdb vmlinux
(gdb) p vt_ioctl
(gdb) l *(0x<address of vt_ioctl> + 0xda8)
or, as one command
(gdb) l *(vt_ioctl + 0xda8)
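As a small illustration of the symbol+offset notation used above, the following Python sketch splits an OOPS location such as `vt_ioctl+0xda8/0x1482` into its parts and builds the matching GDB `l` command. The helper names (`OopsLocation`, `parse_location`, `gdb_list_command`) are hypothetical and the regular expression only assumes the `symbol+0xOFFSET/0xSIZE` shape shown in the examples here.

```python
import re
from typing import NamedTuple


class OopsLocation(NamedTuple):
    symbol: str  # function name, e.g. "vt_ioctl"
    offset: int  # offset of the faulting instruction within the function
    size: int    # total size of the function


# Matches "symbol+0xOFFSET/0xSIZE" anywhere in a line of OOPS output.
_LOC_RE = re.compile(
    r"(?P<sym>[A-Za-z_][\w.]*)\+0x(?P<off>[0-9a-fA-F]+)/0x(?P<size>[0-9a-fA-F]+)"
)


def parse_location(text: str) -> OopsLocation:
    """Extract the first symbol+offset/size triple from a line of text."""
    m = _LOC_RE.search(text)
    if m is None:
        raise ValueError(f"no symbol+offset/size found in {text!r}")
    return OopsLocation(m["sym"], int(m["off"], 16), int(m["size"], 16))


def gdb_list_command(loc: OopsLocation) -> str:
    """Build the GDB command that lists the source around the location."""
    return f"l *({loc.symbol} + {loc.offset:#x})"


if __name__ == "__main__":
    loc = parse_location("EIP is at vt_ioctl+0xda8/0x1482")
    print(gdb_list_command(loc))
```

The second number is the function's size, so an offset larger than it would suggest the line was parsed or copied incorrectly.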
If you have a call trace, such as :-
>Call Trace:
> [<ffffffff8802c8e9>] :jbd:log_wait_commit+0xa3/0xf5
> [<ffffffff810482d9>] autoremove_wake_function+0x0/0x2e
> [<ffffffff8802770b>] :jbd:journal_stop+0x1be/0x1ee
> ...
this shows the problem in the :jbd: module. You can load that module in gdb
and list the relevant code.
gdb fs/jbd/jbd.ko
(gdb) p log_wait_commit
(gdb) l *(0x<address> + 0xa3)
or
(gdb) l *(log_wait_commit + 0xa3)
Another very useful option of the Kernel Hacking section in menuconfig is
Debug memory allocations. This will help you see whether data has been
initialised and not set before use etc. To see the values that get assigned
with this look at mm/slab.c and search for POISON_INUSE. When using this an
Oops will often show the poisoned data instead of zero which is the default.
Once you have worked out a fix please submit it upstream. After all open
source is about sharing what you do and don't you want to be recognised for
your genius?
Please do read Documentation/SubmittingPatches though to help your code get
accepted.

File diff suppressed because it is too large.


@ -9,12 +9,10 @@
DOCBOOKS := z8530book.xml \
kernel-hacking.xml kernel-locking.xml deviceiobook.xml \
writing_usb_driver.xml networking.xml \
kernel-api.xml filesystems.xml lsm.xml usb.xml kgdb.xml \
kernel-api.xml filesystems.xml lsm.xml kgdb.xml \
gadget.xml libata.xml mtdnand.xml librs.xml rapidio.xml \
genericirq.xml s390-drivers.xml uio-howto.xml scsi.xml \
debugobjects.xml sh.xml regulator.xml \
alsa-driver-api.xml writing-an-alsa-driver.xml \
tracepoint.xml w1.xml \
80211.xml sh.xml regulator.xml w1.xml \
writing_musb_glue_layer.xml crypto-API.xml iio.xml
ifeq ($(DOCBOOKS),)
@ -264,6 +262,7 @@ clean-files := $(DOCBOOKS) \
$(patsubst %.xml, %.aux.xml, $(DOCBOOKS)) \
$(patsubst %.xml, %.xml.db, $(DOCBOOKS)) \
$(patsubst %.xml, %.xml, $(DOCBOOKS)) \
$(patsubst %.xml, .%.xml.cmd, $(DOCBOOKS)) \
$(index)
clean-dirs := $(patsubst %.xml,%,$(DOCBOOKS)) man


@ -1,142 +0,0 @@
<?xml version="1.0" encoding="UTF-8"?>
<!DOCTYPE book PUBLIC "-//OASIS//DTD DocBook XML V4.1.2//EN"
"http://www.oasis-open.org/docbook/xml/4.1.2/docbookx.dtd" []>
<!-- ****************************************************** -->
<!-- Header -->
<!-- ****************************************************** -->
<book id="ALSA-Driver-API">
<bookinfo>
<title>The ALSA Driver API</title>
<legalnotice>
<para>
This document is free; you can redistribute it and/or modify it
under the terms of the GNU General Public License as published by
the Free Software Foundation; either version 2 of the License, or
(at your option) any later version.
</para>
<para>
This document is distributed in the hope that it will be useful,
but <emphasis>WITHOUT ANY WARRANTY</emphasis>; without even the
implied warranty of <emphasis>MERCHANTABILITY or FITNESS FOR A
PARTICULAR PURPOSE</emphasis>. See the GNU General Public License
for more details.
</para>
<para>
You should have received a copy of the GNU General Public
License along with this program; if not, write to the Free
Software Foundation, Inc., 59 Temple Place, Suite 330, Boston,
MA 02111-1307 USA
</para>
</legalnotice>
</bookinfo>
<toc></toc>
<chapter><title>Management of Cards and Devices</title>
<sect1><title>Card Management</title>
!Esound/core/init.c
</sect1>
<sect1><title>Device Components</title>
!Esound/core/device.c
</sect1>
<sect1><title>Module requests and Device File Entries</title>
!Esound/core/sound.c
</sect1>
<sect1><title>Memory Management Helpers</title>
!Esound/core/memory.c
!Esound/core/memalloc.c
</sect1>
</chapter>
<chapter><title>PCM API</title>
<sect1><title>PCM Core</title>
!Esound/core/pcm.c
!Esound/core/pcm_lib.c
!Esound/core/pcm_native.c
!Iinclude/sound/pcm.h
</sect1>
<sect1><title>PCM Format Helpers</title>
!Esound/core/pcm_misc.c
</sect1>
<sect1><title>PCM Memory Management</title>
!Esound/core/pcm_memory.c
</sect1>
<sect1><title>PCM DMA Engine API</title>
!Esound/core/pcm_dmaengine.c
!Iinclude/sound/dmaengine_pcm.h
</sect1>
</chapter>
<chapter><title>Control/Mixer API</title>
<sect1><title>General Control Interface</title>
!Esound/core/control.c
</sect1>
<sect1><title>AC97 Codec API</title>
!Esound/pci/ac97/ac97_codec.c
!Esound/pci/ac97/ac97_pcm.c
</sect1>
<sect1><title>Virtual Master Control API</title>
!Esound/core/vmaster.c
!Iinclude/sound/control.h
</sect1>
</chapter>
<chapter><title>MIDI API</title>
<sect1><title>Raw MIDI API</title>
!Esound/core/rawmidi.c
</sect1>
<sect1><title>MPU401-UART API</title>
!Esound/drivers/mpu401/mpu401_uart.c
</sect1>
</chapter>
<chapter><title>Proc Info API</title>
<sect1><title>Proc Info Interface</title>
!Esound/core/info.c
</sect1>
</chapter>
<chapter><title>Compress Offload</title>
<sect1><title>Compress Offload API</title>
!Esound/core/compress_offload.c
!Iinclude/uapi/sound/compress_offload.h
!Iinclude/uapi/sound/compress_params.h
!Iinclude/sound/compress_driver.h
</sect1>
</chapter>
<chapter><title>ASoC</title>
<sect1><title>ASoC Core API</title>
!Iinclude/sound/soc.h
!Esound/soc/soc-core.c
<!-- !Esound/soc/soc-cache.c no docbook comments here -->
!Esound/soc/soc-devres.c
!Esound/soc/soc-io.c
!Esound/soc/soc-pcm.c
!Esound/soc/soc-ops.c
!Esound/soc/soc-compress.c
</sect1>
<sect1><title>ASoC DAPM API</title>
!Esound/soc/soc-dapm.c
</sect1>
<sect1><title>ASoC DMA Engine API</title>
!Esound/soc/soc-generic-dmaengine-pcm.c
</sect1>
</chapter>
<chapter><title>Miscellaneous Functions</title>
<sect1><title>Hardware-Dependent Devices API</title>
!Esound/core/hwdep.c
</sect1>
<sect1><title>Jack Abstraction Layer API</title>
!Iinclude/sound/jack.h
!Esound/core/jack.c
!Esound/soc/soc-jack.c
</sect1>
<sect1><title>ISA DMA Helpers</title>
!Esound/core/isadma.c
</sect1>
<sect1><title>Other Helper Macros</title>
!Iinclude/sound/core.h
</sect1>
</chapter>
</book>


@ -1,443 +0,0 @@
<?xml version="1.0" encoding="UTF-8"?>
<!DOCTYPE book PUBLIC "-//OASIS//DTD DocBook XML V4.1.2//EN"
"http://www.oasis-open.org/docbook/xml/4.1.2/docbookx.dtd" []>
<book id="debug-objects-guide">
<bookinfo>
<title>Debug objects life time</title>
<authorgroup>
<author>
<firstname>Thomas</firstname>
<surname>Gleixner</surname>
<affiliation>
<address>
<email>tglx@linutronix.de</email>
</address>
</affiliation>
</author>
</authorgroup>
<copyright>
<year>2008</year>
<holder>Thomas Gleixner</holder>
</copyright>
<legalnotice>
<para>
This documentation is free software; you can redistribute
it and/or modify it under the terms of the GNU General Public
License version 2 as published by the Free Software Foundation.
</para>
<para>
This program is distributed in the hope that it will be
useful, but WITHOUT ANY WARRANTY; without even the implied
warranty of MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.
See the GNU General Public License for more details.
</para>
<para>
You should have received a copy of the GNU General Public
License along with this program; if not, write to the Free
Software Foundation, Inc., 59 Temple Place, Suite 330, Boston,
MA 02111-1307 USA
</para>
<para>
For more details see the file COPYING in the source
distribution of Linux.
</para>
</legalnotice>
</bookinfo>
<toc></toc>
<chapter id="intro">
<title>Introduction</title>
<para>
debugobjects is a generic infrastructure to track the life time
of kernel objects and validate the operations on those.
</para>
<para>
debugobjects is useful to check for the following error patterns:
<itemizedlist>
<listitem><para>Activation of uninitialized objects</para></listitem>
<listitem><para>Initialization of active objects</para></listitem>
<listitem><para>Usage of freed/destroyed objects</para></listitem>
</itemizedlist>
</para>
<para>
debugobjects is not changing the data structure of the real
object so it can be compiled in with a minimal runtime impact
and enabled on demand with a kernel command line option.
</para>
</chapter>
<chapter id="howto">
<title>How to use debugobjects</title>
<para>
A kernel subsystem needs to provide a data structure which
describes the object type and add calls into the debug code at
appropriate places. The data structure to describe the object
type needs at minimum the name of the object type. Optional
functions can and should be provided to fixup detected problems
so the kernel can continue to work and the debug information can
be retrieved from a live system instead of hard core debugging
with serial consoles and stack trace transcripts from the
monitor.
</para>
<para>
The debug calls provided by debugobjects are:
<itemizedlist>
<listitem><para>debug_object_init</para></listitem>
<listitem><para>debug_object_init_on_stack</para></listitem>
<listitem><para>debug_object_activate</para></listitem>
<listitem><para>debug_object_deactivate</para></listitem>
<listitem><para>debug_object_destroy</para></listitem>
<listitem><para>debug_object_free</para></listitem>
<listitem><para>debug_object_assert_init</para></listitem>
</itemizedlist>
Each of these functions takes the address of the real object and
a pointer to the object type specific debug description
structure.
</para>
<para>
Each detected error is reported in the statistics and a limited
number of errors are printk'ed including a full stack trace.
</para>
<para>
The statistics are available via /sys/kernel/debug/debug_objects/stats.
They provide information about the number of warnings and the
number of successful fixups along with information about the
usage of the internal tracking objects and the state of the
internal tracking objects pool.
</para>
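<para>
As a rough illustration of the life-time rules that these calls
enforce, the following user-space sketch models one tracker object
as a small state machine. It is not kernel code and does not use the
real debugobjects API: the model_* names are hypothetical, only the
state names mirror the ODEBUG_STATE_* values, and the special
handling of statically initialized objects is left out.
</para>
<programlisting><![CDATA[
#include <stdbool.h>

/* Tracker states; the names mirror the kernel's ODEBUG_STATE_* values. */
enum model_state {
	MODEL_NONE,		/* object is not tracked */
	MODEL_INIT,
	MODEL_INACTIVE,
	MODEL_ACTIVE,
	MODEL_DESTROYED,
};

/* Hypothetical stand-in for one tracker object plus a warning counter. */
struct model_obj {
	enum model_state state;
	unsigned int warnings;
};

/* Record a detected error, as the statistics would. */
static bool model_warn(struct model_obj *obj)
{
	obj->warnings++;
	return false;
}

/* Initializing is not allowed for active and destroyed objects. */
static bool model_init(struct model_obj *obj)
{
	if (obj->state == MODEL_ACTIVE || obj->state == MODEL_DESTROYED)
		return model_warn(obj);
	obj->state = MODEL_INIT;
	return true;
}

/* Activating an untracked, active or destroyed object is an error. */
static bool model_activate(struct model_obj *obj)
{
	if (obj->state == MODEL_NONE || obj->state == MODEL_ACTIVE ||
	    obj->state == MODEL_DESTROYED)
		return model_warn(obj);
	obj->state = MODEL_ACTIVE;
	return true;
}

/* Deactivating is not allowed for untracked or destroyed objects. */
static bool model_deactivate(struct model_obj *obj)
{
	if (obj->state == MODEL_NONE || obj->state == MODEL_DESTROYED)
		return model_warn(obj);
	obj->state = MODEL_INACTIVE;
	return true;
}

/* Destruction is not allowed for active and destroyed objects. */
static bool model_destroy(struct model_obj *obj)
{
	if (obj->state == MODEL_ACTIVE || obj->state == MODEL_DESTROYED)
		return model_warn(obj);
	if (obj->state != MODEL_NONE)	/* untracked objects are left alone */
		obj->state = MODEL_DESTROYED;
	return true;
}

/* Freeing is not allowed for active objects; it ends the tracking. */
static bool model_free(struct model_obj *obj)
{
	if (obj->state == MODEL_ACTIVE)
		return model_warn(obj);
	obj->state = MODEL_NONE;
	return true;
}
]]></programlisting>
<para>
In this model, activating an object before model_init() or freeing
it while it is still active only bumps the warning counter. The
real code additionally gives the object type's fixup functions a
chance to repair the problem.
</para>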
</chapter>
<chapter id="debugfunctions">
<title>Debug functions</title>
<sect1 id="prototypes">
<title>Debug object function reference</title>
!Elib/debugobjects.c
</sect1>
<sect1 id="debug_object_init">
<title>debug_object_init</title>
<para>
This function is called whenever the initialization function
of a real object is called.
</para>
<para>
When the real object is already tracked, debugobjects checks
whether the object can be initialized. Initializing
is not allowed for active and destroyed objects. When
debugobjects detects an error, then it calls the fixup_init
function of the object type description structure if provided
by the caller. The fixup function can correct the problem
before the real initialization of the object happens. E.g. it
can deactivate an active object in order to prevent damage to
the subsystem.
</para>
<para>
When the real object is not yet tracked by debugobjects,
debugobjects allocates a tracker object for the real object
and sets the tracker object state to ODEBUG_STATE_INIT. It
verifies that the object is not on the caller's stack. If it is
on the caller's stack, then a limited number of warnings
including a full stack trace are printk'ed. The calling code
must use debug_object_init_on_stack() and remove the object
before leaving the function which allocated it. See next
section.
</para>
</sect1>
<sect1 id="debug_object_init_on_stack">
<title>debug_object_init_on_stack</title>
<para>
This function is called whenever the initialization function
of a real object which resides on the stack is called.
</para>
<para>
When the real object is already tracked, debugobjects checks
whether the object can be initialized. Initializing
is not allowed for active and destroyed objects. When
debugobjects detects an error, then it calls the fixup_init
function of the object type description structure if provided
by the caller. The fixup function can correct the problem
before the real initialization of the object happens. E.g. it
can deactivate an active object in order to prevent damage to
the subsystem.
</para>
<para>
When the real object is not yet tracked by debugobjects,
debugobjects allocates a tracker object for the real object
and sets the tracker object state to ODEBUG_STATE_INIT. It
verifies that the object is on the caller's stack.
</para>
<para>
An object which is on the stack must be removed from the
tracker by calling debug_object_free() before the function
which allocates the object returns. Otherwise we keep track of
stale objects.
</para>
</sect1>
<sect1 id="debug_object_activate">
<title>debug_object_activate</title>
<para>
This function is called whenever the activation function of a
real object is called.
</para>
<para>
When the real object is already tracked, debugobjects checks
whether the object can be activated. Activating is
not allowed for active and destroyed objects. When
debugobjects detects an error, then it calls the
fixup_activate function of the object type description
structure if provided by the caller. The fixup function can
correct the problem before the real activation of the object
happens. E.g. it can deactivate an active object in order to
prevent damage to the subsystem.
</para>
<para>
When the real object is not yet tracked by debugobjects then
the fixup_activate function is called if available. This is
necessary to allow the legitimate activation of statically
allocated and initialized objects. The fixup function checks
whether the object is valid and calls the debug_object_init()
function to initialize the tracking of this object.
</para>
<para>
When the activation is legitimate, then the state of the
associated tracker object is set to ODEBUG_STATE_ACTIVE.
</para>
</sect1>
<sect1 id="debug_object_deactivate">
<title>debug_object_deactivate</title>
<para>
This function is called whenever the deactivation function of
a real object is called.
</para>
<para>
When the real object is tracked, debugobjects checks
whether the object can be deactivated. Deactivating is not
allowed for untracked or destroyed objects.
</para>
<para>
When the deactivation is legitimate, then the state of the
associated tracker object is set to ODEBUG_STATE_INACTIVE.
</para>
</sect1>
<sect1 id="debug_object_destroy">
<title>debug_object_destroy</title>
<para>
This function is called to mark an object destroyed. This is
useful to prevent the usage of invalid objects, which are
still available in memory: either statically allocated objects
or objects which are freed later.
</para>
<para>
When the real object is tracked, debugobjects checks
whether the object can be destroyed. Destruction is not
allowed for active and destroyed objects. When debugobjects
detects an error, then it calls the fixup_destroy function of
the object type description structure if provided by the
caller. The fixup function can correct the problem before the
real destruction of the object happens. E.g. it can deactivate
an active object in order to prevent damage to the subsystem.
</para>
<para>
When the destruction is legitimate, then the state of the
associated tracker object is set to ODEBUG_STATE_DESTROYED.
</para>
</sect1>
<sect1 id="debug_object_free">
<title>debug_object_free</title>
<para>
This function is called before an object is freed.
</para>
<para>
When the real object is tracked, debugobjects checks
whether the object can be freed. Freeing is not allowed for
active objects. When debugobjects detects an error, then it
calls the fixup_free function of the object type description
structure if provided by the caller. The fixup function can
correct the problem before the real free of the object
happens. E.g. it can deactivate an active object in order to
prevent damage to the subsystem.
</para>
<para>
Note that debug_object_free removes the object from the
tracker. Later usage of the object is detected by the other
debug checks.
</para>
</sect1>
<sect1 id="debug_object_assert_init">
<title>debug_object_assert_init</title>
<para>
This function is called to assert that an object has been
initialized.
</para>
<para>
When the real object is not tracked by debugobjects, it calls
fixup_assert_init of the object type description structure
provided by the caller, with the hardcoded object state
ODEBUG_STATE_NOTAVAILABLE. The fixup function can correct the problem
by calling debug_object_init and other specific initializing
functions.
</para>
<para>
When the real object is already tracked by debugobjects it is
ignored.
</para>
</sect1>
</chapter>
<chapter id="fixupfunctions">
<title>Fixup functions</title>
<sect1 id="debug_obj_descr">
<title>Debug object type description structure</title>
!Iinclude/linux/debugobjects.h
</sect1>
<sect1 id="fixup_init">
<title>fixup_init</title>
<para>
This function is called from the debug code whenever a problem
in debug_object_init is detected. The function takes the
address of the object and the state which is currently
recorded in the tracker.
</para>
<para>
Called from debug_object_init when the object state is:
<itemizedlist>
<listitem><para>ODEBUG_STATE_ACTIVE</para></listitem>
</itemizedlist>
</para>
<para>
The function returns true when the fixup was successful,
otherwise false. The return value is used to update the
statistics.
</para>
<para>
Note that the function needs to call the debug_object_init()
function again after the damage has been repaired in order to
keep the state consistent.
</para>
</sect1>
<sect1 id="fixup_activate">
<title>fixup_activate</title>
<para>
This function is called from the debug code whenever a problem
in debug_object_activate is detected.
</para>
<para>
Called from debug_object_activate when the object state is:
<itemizedlist>
<listitem><para>ODEBUG_STATE_NOTAVAILABLE</para></listitem>
<listitem><para>ODEBUG_STATE_ACTIVE</para></listitem>
</itemizedlist>
</para>
<para>
The function returns true when the fixup was successful,
otherwise false. The return value is used to update the
statistics.
</para>
<para>
Note that the function needs to call the debug_object_activate()
function again after the damage has been repaired in order to
keep the state consistent.
</para>
<para>
The activation of statically initialized objects is a special
case. When debug_object_activate() has no tracked object for
this object address then fixup_activate() is called with
object state ODEBUG_STATE_NOTAVAILABLE. The fixup function
needs to check whether this is a legitimate case of a
statically initialized object or not. If it is, it calls
debug_object_init() and debug_object_activate() to make the
object known to the tracker and marked active. In this case
the function should return false because this is not a real
fixup.
</para>
</sect1>
<sect1 id="fixup_destroy">
<title>fixup_destroy</title>
<para>
This function is called from the debug code whenever a problem
in debug_object_destroy is detected.
</para>
<para>
Called from debug_object_destroy when the object state is:
<itemizedlist>
<listitem><para>ODEBUG_STATE_ACTIVE</para></listitem>
</itemizedlist>
</para>
<para>
The function returns true when the fixup was successful,
otherwise false. The return value is used to update the
statistics.
</para>
</sect1>
<sect1 id="fixup_free">
<title>fixup_free</title>
<para>
This function is called from the debug code whenever a problem
in debug_object_free is detected. Further it can be called
from the debug checks in kfree/vfree, when an active object is
detected from the debug_check_no_obj_freed() sanity checks.
</para>
<para>
Called from debug_object_free() or debug_check_no_obj_freed()
when the object state is:
<itemizedlist>
<listitem><para>ODEBUG_STATE_ACTIVE</para></listitem>
</itemizedlist>
</para>
<para>
The function returns true when the fixup was successful,
otherwise false. The return value is used to update the
statistics.
</para>
</sect1>
<sect1 id="fixup_assert_init">
<title>fixup_assert_init</title>
<para>
This function is called from the debug code whenever a problem
in debug_object_assert_init is detected.
</para>
<para>
Called from debug_object_assert_init() with a hardcoded state
ODEBUG_STATE_NOTAVAILABLE when the object is not found in the
debug bucket.
</para>
<para>
The function returns true when the fixup was successful,
otherwise false. The return value is used to update the
statistics.
</para>
<para>
Note that this function should make sure debug_object_init() is
called before returning.
</para>
<para>
The handling of statically initialized objects is a special
case. The fixup function should check if this is a legitimate
case of a statically initialized object or not. If it is, only
debug_object_init() should be called to make the object known to
the tracker. Then the function should return false because this
is not a real fixup.
</para>
</sect1>
</chapter>
<chapter id="bugs">
<title>Known Bugs And Assumptions</title>
<para>
None (knock on wood).
</para>
</chapter>
</book>

@ -1208,8 +1208,8 @@ static struct block_device_operations opt_fops = {
<listitem>
<para>
Finally, don't forget to read <filename>Documentation/SubmittingPatches</filename>
and possibly <filename>Documentation/SubmittingDrivers</filename>.
Finally, don't forget to read <filename>Documentation/process/submitting-patches.rst</filename>
and possibly <filename>Documentation/process/submitting-drivers.rst</filename>.
</para>
</listitem>
</itemizedlist>

@ -1,112 +0,0 @@
<?xml version="1.0" encoding="UTF-8"?>
<!DOCTYPE book PUBLIC "-//OASIS//DTD DocBook XML V4.1.2//EN"
"http://www.oasis-open.org/docbook/xml/4.1.2/docbookx.dtd" []>
<book id="Tracepoints">
<bookinfo>
<title>The Linux Kernel Tracepoint API</title>
<authorgroup>
<author>
<firstname>Jason</firstname>
<surname>Baron</surname>
<affiliation>
<address>
<email>jbaron@redhat.com</email>
</address>
</affiliation>
</author>
<author>
<firstname>William</firstname>
<surname>Cohen</surname>
<affiliation>
<address>
<email>wcohen@redhat.com</email>
</address>
</affiliation>
</author>
</authorgroup>
<legalnotice>
<para>
This documentation is free software; you can redistribute
it and/or modify it under the terms of the GNU General Public
License as published by the Free Software Foundation; either
version 2 of the License, or (at your option) any later
version.
</para>
<para>
This program is distributed in the hope that it will be
useful, but WITHOUT ANY WARRANTY; without even the implied
warranty of MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.
See the GNU General Public License for more details.
</para>
<para>
You should have received a copy of the GNU General Public
License along with this program; if not, write to the Free
Software Foundation, Inc., 59 Temple Place, Suite 330, Boston,
MA 02111-1307 USA
</para>
<para>
For more details see the file COPYING in the source
distribution of Linux.
</para>
</legalnotice>
</bookinfo>
<toc></toc>
<chapter id="intro">
<title>Introduction</title>
<para>
Tracepoints are static probe points that are located in strategic points
throughout the kernel. 'Probes' register/unregister with tracepoints
via a callback mechanism. The 'probes' are strictly typed functions that
are passed a unique set of parameters defined by each tracepoint.
</para>
<para>
From this simple callback mechanism, 'probes' can be used to profile, debug,
and understand kernel behavior. There are a number of tools that provide a
framework for using 'probes'. These tools include Systemtap, ftrace, and
LTTng.
</para>
<para>
Tracepoints are defined in a number of header files via various macros. Thus,
the purpose of this document is to provide a clear accounting of the available
tracepoints. The intention is to understand not only what tracepoints are
available but also to understand where future tracepoints might be added.
</para>
<para>
The API presented has functions of the form:
<function>trace_tracepointname(function parameters)</function>. These are the
tracepoint callbacks that are found throughout the code. Registering and
unregistering probes with these callback sites is covered in the
<filename>Documentation/trace/*</filename> directory.
</para>
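<para>
To make the callback mechanism described above more concrete, here is
a minimal user-space model of a single tracepoint with strictly typed
probes. It is only a sketch of the idea and not the kernel
implementation: every demo_* name, the fixed-size probe table and the
event parameter are assumptions made for illustration.
</para>
<programlisting><![CDATA[
#include <stddef.h>

#define DEMO_MAX_PROBES 4

/* Every probe of this tracepoint must have exactly this signature. */
typedef void (*demo_probe_t)(void *data, int irq);

struct demo_tracepoint {
	demo_probe_t probe[DEMO_MAX_PROBES];
	void *data[DEMO_MAX_PROBES];
};

/* Attach a probe; returns 0 on success, -1 when all slots are taken. */
static int demo_register(struct demo_tracepoint *tp, demo_probe_t fn,
			 void *data)
{
	for (size_t i = 0; i < DEMO_MAX_PROBES; i++) {
		if (!tp->probe[i]) {
			tp->probe[i] = fn;
			tp->data[i] = data;
			return 0;
		}
	}
	return -1;
}

/* Detach a probe; returns 0 on success, -1 when it was not registered. */
static int demo_unregister(struct demo_tracepoint *tp, demo_probe_t fn,
			   void *data)
{
	for (size_t i = 0; i < DEMO_MAX_PROBES; i++) {
		if (fn && tp->probe[i] == fn && tp->data[i] == data) {
			tp->probe[i] = NULL;
			tp->data[i] = NULL;
			return 0;
		}
	}
	return -1;
}

/* The callback site: hand the event parameters to every attached probe. */
static void trace_demo_event(struct demo_tracepoint *tp, int irq)
{
	for (size_t i = 0; i < DEMO_MAX_PROBES; i++) {
		if (tp->probe[i])
			tp->probe[i](tp->data[i], irq);
	}
}

/* Example probe: counts how often the event fired. */
static void demo_count_probe(void *data, int irq)
{
	(void)irq;
	++*(unsigned int *)data;
}
]]></programlisting>
<para>
A caller would register demo_count_probe() with a pointer to a
counter, after which each trace_demo_event() call increments that
counter until the probe is unregistered again.
</para>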
</chapter>
<chapter id="irq">
<title>IRQ</title>
!Iinclude/trace/events/irq.h
</chapter>
<chapter id="signal">
<title>SIGNAL</title>
!Iinclude/trace/events/signal.h
</chapter>
<chapter id="block">
<title>Block IO</title>
!Iinclude/trace/events/block.h
</chapter>
<chapter id="workqueue">
<title>Workqueue</title>
!Iinclude/trace/events/workqueue.h
</chapter>
</book>

@ -45,6 +45,13 @@ GPL version 2.
</abstract>
<revhistory>
<revision>
<revnumber>0.10</revnumber>
<date>2016-10-17</date>
<authorinitials>sch</authorinitials>
<revremark>Added generic hyperv driver
</revremark>
</revision>
<revision>
<revnumber>0.9</revnumber>
<date>2009-07-16</date>
@ -1033,6 +1040,61 @@ int main()
</chapter>
<chapter id="uio_hv_generic" xreflabel="Using Generic driver for Hyper-V VMBUS">
<?dbhtml filename="uio_hv_generic.html"?>
<title>Generic Hyper-V UIO driver</title>
<para>
The generic driver is a kernel module named uio_hv_generic.
It supports devices on the Hyper-V VMBus similar to uio_pci_generic
on PCI bus.
</para>
<sect1 id="uio_hv_generic_binding">
<title>Making the driver recognize the device</title>
<para>
Since the driver does not declare any device GUIDs, it will not get loaded
automatically and will not automatically bind to any devices. You must load it
and allocate an ID to the driver yourself. For example, to use the network device
GUID:
<programlisting>
modprobe uio_hv_generic
echo &quot;f8615163-df3e-46c5-913f-f2d2f965ed0e&quot; &gt; /sys/bus/vmbus/drivers/uio_hv_generic/new_id
</programlisting>
</para>
<para>
If there already is a hardware specific kernel driver for the device, the
generic driver still won't bind to it. In this case, if you want to use the
generic driver (why would you?), you'll have to manually unbind the hardware
specific driver and bind the generic driver, like this:
<programlisting>
echo -n vmbus-ed963694-e847-4b2a-85af-bc9cfc11d6f3 &gt; /sys/bus/vmbus/drivers/hv_netvsc/unbind
echo -n vmbus-ed963694-e847-4b2a-85af-bc9cfc11d6f3 &gt; /sys/bus/vmbus/drivers/uio_hv_generic/bind
</programlisting>
</para>
<para>
You can verify that the device has been bound to the driver
by looking for it in sysfs, for example like the following:
<programlisting>
ls -l /sys/bus/vmbus/devices/vmbus-ed963694-e847-4b2a-85af-bc9cfc11d6f3/driver
</programlisting>
If successful, this should print:
<programlisting>
.../vmbus-ed963694-e847-4b2a-85af-bc9cfc11d6f3/driver -&gt; ../../../bus/vmbus/drivers/uio_hv_generic
</programlisting>
</para>
</sect1>
<sect1 id="uio_hv_generic_internals">
<title>Things to know about uio_hv_generic</title>
<para>
On each interrupt, uio_hv_generic sets the Interrupt Disable bit.
This prevents the device from generating further interrupts
until the bit is cleared. The userspace driver should clear this
bit before blocking and waiting for more interrupts.
</para>
</sect1>
</chapter>
<appendix id="app1">
<title>Further information</title>
<itemizedlist>

@ -1,992 +0,0 @@
<?xml version="1.0" encoding="UTF-8"?>
<!DOCTYPE book PUBLIC "-//OASIS//DTD DocBook XML V4.1.2//EN"
"http://www.oasis-open.org/docbook/xml/4.1.2/docbookx.dtd" []>
<book id="Linux-USB-API">
<bookinfo>
<title>The Linux-USB Host Side API</title>
<legalnotice>
<para>
This documentation is free software; you can redistribute
it and/or modify it under the terms of the GNU General Public
License as published by the Free Software Foundation; either
version 2 of the License, or (at your option) any later
version.
</para>
<para>
This program is distributed in the hope that it will be
useful, but WITHOUT ANY WARRANTY; without even the implied
warranty of MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.
See the GNU General Public License for more details.
</para>
<para>
You should have received a copy of the GNU General Public
License along with this program; if not, write to the Free
Software Foundation, Inc., 59 Temple Place, Suite 330, Boston,
MA 02111-1307 USA
</para>
<para>
For more details see the file COPYING in the source
distribution of Linux.
</para>
</legalnotice>
</bookinfo>
<toc></toc>
<chapter id="intro">
<title>Introduction to USB on Linux</title>
<para>A Universal Serial Bus (USB) is used to connect a host,
such as a PC or workstation, to a number of peripheral
devices. USB uses a tree structure, with the host as the
root (the system's master), hubs as interior nodes, and
peripherals as leaves (and slaves).
Modern PCs support several such trees of USB devices, usually
one USB 2.0 tree (480 Mbit/sec each) with
a few USB 1.1 trees (12 Mbit/sec each) that are used when you
connect a USB 1.1 device directly to the machine's "root hub".
</para>
<para>That master/slave asymmetry was designed-in for a number of
reasons, one being ease of use. It is not physically possible to
assemble (legal) USB cables incorrectly: all upstream "to the host"
connectors are the rectangular type (matching the sockets on
root hubs), and all downstream connectors are the squarish type
(or they are built into the peripheral).
Also, the host software doesn't need to deal with distributed
auto-configuration since the pre-designated master node manages all that.
And finally, at the electrical level, bus protocol overhead is reduced by
eliminating arbitration and moving scheduling into the host software.
</para>
<para>USB 1.0 was announced in January 1996 and was revised
as USB 1.1 (with improvements in hub specification and
support for interrupt-out transfers) in September 1998.
USB 2.0 was released in April 2000, adding high-speed
transfers and transaction-translating hubs (used for USB 1.1
and 1.0 backward compatibility).
</para>
<para>Kernel developers added USB support to Linux early in the 2.2 kernel
series, shortly before 2.3 development forked. Updates from 2.3 were
regularly folded back into 2.2 releases, which improved reliability and
brought <filename>/sbin/hotplug</filename> support as well as more drivers.
Such improvements were continued in the 2.5 kernel series, where they added
USB 2.0 support, improved performance, and made the host controller drivers
(HCDs) more consistent. They also simplified the API (to make bugs less
likely) and added internal "kerneldoc" documentation.
</para>
<para>Linux can run inside USB devices as well as on
the hosts that control the devices.
But USB device drivers running inside those peripherals
don't do the same things as the ones running inside hosts,
so they've been given a different name:
<emphasis>gadget drivers</emphasis>.
This document does not cover gadget drivers.
</para>
</chapter>
<chapter id="host">
<title>USB Host-Side API Model</title>
<para>Host-side drivers for USB devices talk to the "usbcore" APIs.
There are two. One is intended for
<emphasis>general-purpose</emphasis> drivers (exposed through
driver frameworks), and the other is for drivers that are
<emphasis>part of the core</emphasis>.
Such core drivers include the <emphasis>hub</emphasis> driver
(which manages trees of USB devices) and several different kinds
of <emphasis>host controller drivers</emphasis>,
which control individual busses.
</para>
<para>The device model seen by USB drivers is relatively complex.
</para>
<itemizedlist>
<listitem><para>USB supports four kinds of data transfers
(control, bulk, interrupt, and isochronous). Two of them (control
and bulk) use bandwidth as it's available,
while the other two (interrupt and isochronous)
are scheduled to provide guaranteed bandwidth.
</para></listitem>
<listitem><para>The device description model includes one or more
"configurations" per device, only one of which is active at a time.
Devices that are capable of high-speed operation must also support
full-speed configurations, along with a way to ask about the
"other speed" configurations which might be used.
</para></listitem>
<listitem><para>Configurations have one or more "interfaces", each
of which may have "alternate settings". Interfaces may be
standardized by USB "Class" specifications, or may be specific to
a vendor or device.</para>
<para>USB device drivers actually bind to interfaces, not devices.
Think of them as "interface drivers", though you
may not see many devices where the distinction is important.
<emphasis>Most USB devices are simple, with only one configuration,
one interface, and one alternate setting.</emphasis>
</para></listitem>
<listitem><para>Interfaces have one or more "endpoints", each of
which supports one type and direction of data transfer such as
"bulk out" or "interrupt in". The entire configuration may have
up to sixteen endpoints in each direction, allocated as needed
among all the interfaces.
</para></listitem>
<listitem><para>Data transfer on USB is packetized; each endpoint
has a maximum packet size.
Drivers must often be aware of conventions such as flagging the end
of bulk transfers using "short" (including zero length) packets.
</para></listitem>
<listitem><para>The Linux USB API supports synchronous calls for
control and bulk messages.
It also supports asynchronous calls for all kinds of data transfer,
using request structures called "URBs" (USB Request Blocks).
</para></listitem>
</itemizedlist>
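<para>
The limit of sixteen endpoints in each direction follows from how an
endpoint is addressed: in the endpoint descriptor's bEndpointAddress
field, the low four bits hold the endpoint number and bit 7 holds the
direction. The user-space sketch below decodes such an address. The
EP_* constants are local copies that are assumed to match
USB_ENDPOINT_NUMBER_MASK and USB_DIR_IN from
<filename>&lt;linux/usb/ch9.h&gt;</filename>, and the ep_* helper
names are hypothetical.
</para>
<programlisting><![CDATA[
#include <stdbool.h>
#include <stdint.h>

#define EP_NUMBER_MASK	0x0f	/* assumed equal to USB_ENDPOINT_NUMBER_MASK */
#define EP_DIR_IN	0x80	/* assumed equal to USB_DIR_IN */

/* Endpoint number (0..15) encoded in a bEndpointAddress value. */
static unsigned int ep_number(uint8_t addr)
{
	return addr & EP_NUMBER_MASK;
}

/* True for "in" endpoints (device to host), false for "out". */
static bool ep_is_in(uint8_t addr)
{
	return (addr & EP_DIR_IN) != 0;
}
]]></programlisting>
<para>
For example, the address 0x81 decodes to endpoint 1 in the "in"
direction, while 0x02 is endpoint 2 in the "out" direction.
</para>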
<para>Accordingly, the USB Core API exposed to device drivers
covers quite a lot of territory. You'll probably need to consult
the USB 2.0 specification, available online from www.usb.org at
no cost, as well as class or device specifications.
</para>
<para>The only host-side drivers that actually touch hardware
(reading/writing registers, handling IRQs, and so on) are the HCDs.
In theory, all HCDs provide the same functionality through the same
API. In practice, that's becoming more true on the 2.5 kernels,
but there are still differences that crop up especially with
fault handling. Different controllers don't necessarily report
the same aspects of failures, and recovery from faults (including
software-induced ones like unlinking an URB) isn't yet fully
consistent.
Device driver authors should make a point of doing disconnect
testing (while the device is active) with each different host
controller driver, to make sure drivers don't have bugs of
their own as well as to make sure they aren't relying on some
HCD-specific behavior.
(You will need external USB 1.1 and/or
USB 2.0 hubs to perform all those tests.)
</para>
</chapter>
<chapter id="types"><title>USB-Standard Types</title>
<para>In <filename>&lt;linux/usb/ch9.h&gt;</filename> you will find
the USB data types defined in chapter 9 of the USB specification.
These data types are used throughout USB, and in APIs including
this host side API, gadget APIs, and usbfs.
</para>
!Iinclude/linux/usb/ch9.h
</chapter>
<chapter id="hostside"><title>Host-Side Data Types and Macros</title>
<para>The host side API exposes several layers to drivers, some of
which are more necessary than others.
These support lifecycle models for host side drivers
and devices, and support passing buffers through usbcore to
some HCD that performs the I/O for the device driver.
</para>
!Iinclude/linux/usb.h
</chapter>
<chapter id="usbcore"><title>USB Core APIs</title>
<para>There are two basic I/O models in the USB API.
The most elemental one is asynchronous: drivers submit requests
in the form of an URB, and the URB's completion callback
handles the next step.
All USB transfer types support that model, although there
are special cases for control URBs (which always have setup
and status stages, but may not have a data stage) and
isochronous URBs (which allow large packets and include
per-packet fault reports).
Built on top of that is synchronous API support, where a
driver calls a routine that allocates one or more URBs,
submits them, and waits until they complete.
There are synchronous wrappers for single-buffer control
and bulk transfers (which are awkward to use in some
driver disconnect scenarios), and for scatterlist based
streaming i/o (bulk or interrupt).
</para>
<para>USB drivers need to provide buffers that can be
used for DMA, although they don't necessarily need to
provide the DMA mapping themselves.
There are APIs to use when allocating DMA buffers,
which can prevent use of bounce buffers on some systems.
In some cases, drivers may be able to rely on 64bit DMA
to eliminate another kind of bounce buffer.
</para>
!Edrivers/usb/core/urb.c
!Edrivers/usb/core/message.c
!Edrivers/usb/core/file.c
!Edrivers/usb/core/driver.c
!Edrivers/usb/core/usb.c
!Edrivers/usb/core/hub.c
</chapter>
<chapter id="hcd"><title>Host Controller APIs</title>
<para>These APIs are only for use by host controller drivers,
most of which implement standard register interfaces such as
EHCI, OHCI, or UHCI.
UHCI was one of the first interfaces, designed by Intel and
also used by VIA; it doesn't do much in hardware.
OHCI was designed later, to have the hardware do more work
(bigger transfers, tracking protocol state, and so on).
EHCI was designed with USB 2.0; its design has features that
resemble OHCI (hardware does much more work) as well as
UHCI (some parts of ISO support, TD list processing).
</para>
<para>There are host controllers other than the "big three",
although most PCI based controllers (and a few non-PCI based
ones) use one of those interfaces.
Not all host controllers use DMA; some use PIO, and there
is also a simulator.
</para>
<para>The same basic APIs are available to drivers for all
those controllers.
For historical reasons they are in two layers:
<structname>struct usb_bus</structname> is a rather thin
layer that became available in the 2.2 kernels, while
<structname>struct usb_hcd</structname> is a more featureful
layer (available in later 2.4 kernels and in 2.5) that
lets HCDs share common code, to shrink driver size
and significantly reduce hcd-specific behaviors.
</para>
!Edrivers/usb/core/hcd.c
!Edrivers/usb/core/hcd-pci.c
!Idrivers/usb/core/buffer.c
</chapter>
<chapter id="usbfs">
<title>The USB Filesystem (usbfs)</title>
<para>This chapter presents the Linux <emphasis>usbfs</emphasis>.
You may prefer to avoid writing new kernel code for your
USB driver; that's the problem that usbfs set out to solve.
User mode device drivers are usually packaged as applications
or libraries, and may use usbfs through some programming library
that wraps it. Such libraries include
<ulink url="http://libusb.sourceforge.net">libusb</ulink>
for C/C++, and
<ulink url="http://jUSB.sourceforge.net">jUSB</ulink> for Java.
</para>
<note><title>Unfinished</title>
<para>This particular documentation is incomplete,
especially with respect to the asynchronous mode.
As of kernel 2.5.66 the code and this (new) documentation
need to be cross-reviewed.
</para>
</note>
<para>Configure usbfs into Linux kernels by enabling the
<emphasis>USB filesystem</emphasis> option (CONFIG_USB_DEVICEFS),
and you get basic support for user mode USB device drivers.
Until relatively recently it was often (confusingly) called
<emphasis>usbdevfs</emphasis> although it wasn't solving what
<emphasis>devfs</emphasis> was.
Every USB device will appear in usbfs, regardless of whether or
not it has a kernel driver.
</para>
<sect1 id="usbfs-files">
<title>What files are in "usbfs"?</title>
<para>Conventionally mounted at
<filename>/proc/bus/usb</filename>, usbfs
features include:
<itemizedlist>
<listitem><para><filename>/proc/bus/usb/devices</filename>
... a text file
showing each of the USB devices known to the kernel,
and their configuration descriptors.
You can also poll() this to learn about new devices.
</para></listitem>
<listitem><para><filename>/proc/bus/usb/BBB/DDD</filename>
... magic files
exposing each device's configuration descriptors, and
supporting a series of ioctls for making device requests,
including I/O to devices. (Purely for access by programs.)
</para></listitem>
</itemizedlist>
</para>
<para> Each bus is given a number (BBB) based on when it was
enumerated; within each bus, each device is given a similar
number (DDD).
Those BBB/DDD paths are not "stable" identifiers;
expect them to change even if you always leave the devices
plugged in to the same hub port.
<emphasis>Don't even think of saving these in application
configuration files.</emphasis>
Stable identifiers are available, for user mode applications
that want to use them. HID and networking devices expose
these stable IDs, so that for example you can be sure that
you told the right UPS to power down its second server.
"usbfs" doesn't (yet) expose those IDs.
</para>
</sect1>
<sect1 id="usbfs-fstab">
<title>Mounting and Access Control</title>
<para>There are a number of mount options for usbfs, which will
be of most interest to you if you need to override the default
access control policy.
That policy is that only root may read or write device files
(<filename>/proc/bus/usb/BBB/DDD</filename>) although anyone may read
the <filename>devices</filename>
or <filename>drivers</filename> files.
I/O requests to the device also need the CAP_SYS_RAWIO capability.
</para>
<para>The significance of that is that by default, all user mode
device drivers need super-user privileges.
You can change modes or ownership in a driver setup
when the device hotplugs, or maybe just start the
driver right then, as a privileged server (or some activity
within one).
That's the most secure approach for multi-user systems,
but for single user systems ("trusted" by that user)
it's more convenient just to grant everyone all access
(using the <emphasis>devmode=0666</emphasis> option)
so the driver can start whenever it's needed.
</para>
<para>The mount options for usbfs, usable in /etc/fstab or
in command line invocations of <emphasis>mount</emphasis>, are:
<variablelist>
<varlistentry>
<term><emphasis>busgid</emphasis>=NNNNN</term>
<listitem><para>Controls the GID used for the
/proc/bus/usb/BBB
directories. (Default: 0)</para></listitem></varlistentry>
<varlistentry><term><emphasis>busmode</emphasis>=MMM</term>
<listitem><para>Controls the file mode used for the
/proc/bus/usb/BBB
directories. (Default: 0555)
</para></listitem></varlistentry>
<varlistentry><term><emphasis>busuid</emphasis>=NNNNN</term>
<listitem><para>Controls the UID used for the
/proc/bus/usb/BBB
directories. (Default: 0)</para></listitem></varlistentry>
<varlistentry><term><emphasis>devgid</emphasis>=NNNNN</term>
<listitem><para>Controls the GID used for the
/proc/bus/usb/BBB/DDD
files. (Default: 0)</para></listitem></varlistentry>
<varlistentry><term><emphasis>devmode</emphasis>=MMM</term>
<listitem><para>Controls the file mode used for the
/proc/bus/usb/BBB/DDD
files. (Default: 0644)</para></listitem></varlistentry>
<varlistentry><term><emphasis>devuid</emphasis>=NNNNN</term>
<listitem><para>Controls the UID used for the
/proc/bus/usb/BBB/DDD
files. (Default: 0)</para></listitem></varlistentry>
<varlistentry><term><emphasis>listgid</emphasis>=NNNNN</term>
<listitem><para>Controls the GID used for the
/proc/bus/usb/devices and drivers files.
(Default: 0)</para></listitem></varlistentry>
<varlistentry><term><emphasis>listmode</emphasis>=MMM</term>
<listitem><para>Controls the file mode used for the
/proc/bus/usb/devices and drivers files.
(Default: 0444)</para></listitem></varlistentry>
<varlistentry><term><emphasis>listuid</emphasis>=NNNNN</term>
<listitem><para>Controls the UID used for the
/proc/bus/usb/devices and drivers files.
(Default: 0)</para></listitem></varlistentry>
</variablelist>
</para>
<para>Note that many Linux distributions hard-wire the mount options
for usbfs in their init scripts, such as
<filename>/etc/rc.d/rc.sysinit</filename>,
rather than making it easy to set this per-system
policy in <filename>/etc/fstab</filename>.
</para>
</sect1>
<sect1 id="usbfs-devices">
<title>/proc/bus/usb/devices</title>
<para>This file is handy for status viewing tools in user
mode, which can scan the text format and ignore most of it.
More detailed device status (including class and vendor
status) is available from device-specific files.
For information about the current format of this file,
see the
<filename>Documentation/usb/proc_usb_info.txt</filename>
file in your Linux kernel sources.
</para>
<para>This file, in combination with the poll() system call, can
also be used to detect when devices are added or removed:
<programlisting>/* Needs &lt;fcntl.h&gt; and &lt;poll.h&gt;. */
int fd = open("/proc/bus/usb/devices", O_RDONLY);
struct pollfd pfd = { fd, POLLIN, 0 };

for (;;) {
	/* The first time through, this call will return immediately. */
	poll(&amp;pfd, 1, -1);
	/* To see what's changed, compare the file's previous and current
	   contents or scan the filesystem. (Scanning is more precise.) */
}</programlisting>
Note that this behavior is intended to be used for informational
and debug purposes. It would be more appropriate to use programs
such as udev or HAL to initialize a device or start a user-mode
helper program, for instance.
</para>
</sect1>
<sect1 id="usbfs-bbbddd">
<title>/proc/bus/usb/BBB/DDD</title>
<para>Use these files in one of these basic ways:
</para>
<para><emphasis>They can be read,</emphasis>
producing first the device descriptor
(18 bytes) and then the descriptors for the current configuration.
See the USB 2.0 spec for details about those binary data formats.
You'll need to convert most multibyte values from little endian
format to your native host byte order, although a few of the
fields in the device descriptor (both of the BCD-encoded fields,
and the vendor and product IDs) will be byteswapped for you.
Note that configuration descriptors include descriptors for
interfaces, altsettings, endpoints, and maybe additional
class descriptors.
</para>
<para><emphasis>Perform USB operations</emphasis> using
<emphasis>ioctl()</emphasis> requests to make endpoint I/O
requests (synchronously or asynchronously) or manage
the device.
These requests need the CAP_SYS_RAWIO capability,
as well as filesystem access permissions.
Only one ioctl request can be made on one of these
device files at a time.
This means that if you are synchronously reading an endpoint
from one thread, you won't be able to write to a different
endpoint from another thread until the read completes.
This works for <emphasis>half duplex</emphasis> protocols,
but otherwise you'd use asynchronous i/o requests.
</para>
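<para>As a rough illustration of the first style, the sketch below
decodes descriptor data after it has been read() from a device file.
It relies only on the USB 2.0 layout in which every descriptor starts
with a bLength byte followed by a bDescriptorType byte.
The helper names (le16_at, count_descriptors) are hypothetical and
not part of any kernel or library API.
</para>

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Decode a 16-bit little-endian field, e.g. wTotalLength or wMaxPacketSize. */
static uint16_t le16_at(const uint8_t *p)
{
	return (uint16_t)(p[0] | (p[1] << 8));
}

/*
 * Walk a buffer of back-to-back descriptors and count those whose
 * bDescriptorType equals 'type' (4 = interface, 5 = endpoint, ...).
 * Stops quietly at the first malformed or truncated descriptor.
 */
static int count_descriptors(const uint8_t *buf, size_t len, uint8_t type)
{
	size_t off = 0;
	int n = 0;

	assert(buf != NULL);
	while (off + 2 <= len) {
		uint8_t blen = buf[off];	/* bLength */

		if (blen < 2 || off + blen > len)
			break;
		if (buf[off + 1] == type)	/* bDescriptorType */
			n++;
		off += blen;
	}
	return n;
}
```

<para>A driver would typically skip the first 18 bytes (the device
descriptor) and pass the remainder to a walker like this one.
</para>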
</sect1>
<sect1 id="usbfs-lifecycle">
<title>Life Cycle of User Mode Drivers</title>
<para>Such a driver first needs to find a device file
for a device it knows how to handle.
Maybe it was told about it because a
<filename>/sbin/hotplug</filename> event handling agent
chose that driver to handle the new device.
Or maybe it's an application that scans all the
/proc/bus/usb device files, and ignores most devices.
In either case, it should <function>read()</function> all
the descriptors from the device file,
and check them against what it knows how to handle.
It might just reject everything except a particular
vendor and product ID, or need a more complex policy.
</para>
<para>Never assume there will only be one such device
on the system at a time!
If your code can't handle more than one device at
a time, at least detect when there's more than one, and
have your users choose which device to use.
</para>
<para>Once your user mode driver knows what device to use,
it interacts with it in either of two styles.
The simple style is to make only control requests; some
devices don't need more complex interactions than those.
(An example might be software using vendor-specific control
requests for some initialization or configuration tasks,
with a kernel driver for the rest.)
</para>
<para>More likely, you need a more complex style driver:
one using non-control endpoints, reading or writing data
and claiming exclusive use of an interface.
<emphasis>Bulk</emphasis> transfers are easiest to use,
but only their sibling <emphasis>interrupt</emphasis> transfers
work with low speed devices.
Both interrupt and <emphasis>isochronous</emphasis> transfers
offer service guarantees because their bandwidth is reserved.
Such "periodic" transfers are awkward to use through usbfs,
unless you're using the asynchronous calls. However, interrupt
transfers can also be used in a synchronous "one shot" style.
</para>
<para>Your user-mode driver should never need to worry
about cleaning up request state when the device is
disconnected, although it should close its open file
descriptors as soon as it starts seeing the ENODEV
errors.
</para>
</sect1>
<sect1 id="usbfs-ioctl"><title>The ioctl() Requests</title>
<para>To use these ioctls, you need to include the following
headers in your userspace program:
<programlisting>#include &lt;linux/usb.h&gt;
#include &lt;linux/usbdevice_fs.h&gt;
#include &lt;asm/byteorder.h&gt;</programlisting>
The standard USB device model requests, from "Chapter 9" of
the USB 2.0 specification, are automatically included from
the <filename>&lt;linux/usb/ch9.h&gt;</filename> header.
</para>
<para>Unless noted otherwise, the ioctl requests
described here will
update the modification time on the usbfs file to which
they are applied (unless they fail).
A return of zero indicates success; otherwise, a
standard USB error code is returned. (These are
documented in
<filename>Documentation/usb/error-codes.txt</filename>
in your kernel sources.)
</para>
<para>Each of these files multiplexes access to several
I/O streams, one per endpoint.
Each device has one control endpoint (endpoint zero)
which supports limited RPC-style access.
Devices are configured
by hub_wq (in the kernel) setting a device-wide
<emphasis>configuration</emphasis> that affects things
like power consumption and basic functionality.
The endpoints are part of USB <emphasis>interfaces</emphasis>,
which may have <emphasis>altsettings</emphasis>
affecting things like which endpoints are available.
Many devices only have a single configuration and interface,
so drivers for them will ignore configurations and altsettings.
</para>
<sect2 id="usbfs-mgmt">
<title>Management/Status Requests</title>
<para>A number of usbfs requests don't deal very directly
with device I/O.
They mostly relate to device management and status.
These are all synchronous requests.
</para>
<variablelist>
<varlistentry><term>USBDEVFS_CLAIMINTERFACE</term>
<listitem><para>This is used to force usbfs to
claim a specific interface,
which has not previously been claimed by usbfs or any other
kernel driver.
The ioctl parameter is an integer holding the number of
the interface (bInterfaceNumber from descriptor).
</para><para>
Note that if your driver doesn't claim an interface
before trying to use one of its endpoints, and no
other driver has bound to it, then the interface is
automatically claimed by usbfs.
</para><para>
This claim will be released by a RELEASEINTERFACE ioctl,
or by closing the file descriptor.
File modification time is not updated by this request.
</para></listitem></varlistentry>
<varlistentry><term>USBDEVFS_CONNECTINFO</term>
<listitem><para>Says whether the device is lowspeed.
The ioctl parameter points to a structure like this:
<programlisting>struct usbdevfs_connectinfo {
unsigned int devnum;
unsigned char slow;
}; </programlisting>
File modification time is not updated by this request.
</para><para>
<emphasis>You can't tell whether a "not slow"
device is connected at high speed (480 MBit/sec)
or just full speed (12 MBit/sec).</emphasis>
You should know the devnum value already,
it's the DDD value of the device file name.
</para></listitem></varlistentry>
<varlistentry><term>USBDEVFS_GETDRIVER</term>
<listitem><para>Returns the name of the kernel driver
bound to a given interface (a string). Parameter
is a pointer to this structure, which is modified:
<programlisting>struct usbdevfs_getdriver {
unsigned int interface;
char driver[USBDEVFS_MAXDRIVERNAME + 1];
};</programlisting>
File modification time is not updated by this request.
</para></listitem></varlistentry>
<varlistentry><term>USBDEVFS_IOCTL</term>
<listitem><para>Passes a request from userspace through
to a kernel driver that has an ioctl entry in the
<emphasis>struct usb_driver</emphasis> it registered.
<programlisting>struct usbdevfs_ioctl {
int ifno;
int ioctl_code;
void *data;
};
/* user mode call looks like this.
* 'request' becomes the driver->ioctl() 'code' parameter.
* the size of 'param' is encoded in 'request', and that data
* is copied to or from the driver->ioctl() 'buf' parameter.
*/
static int
usbdev_ioctl (int fd, int ifno, unsigned request, void *param)
{
struct usbdevfs_ioctl wrapper;
wrapper.ifno = ifno;
wrapper.ioctl_code = request;
wrapper.data = param;
return ioctl (fd, USBDEVFS_IOCTL, &amp;wrapper);
} </programlisting>
File modification time is not updated by this request.
</para><para>
This request lets kernel drivers talk to user mode code
through filesystem operations even when they don't create
a character or block special device.
It's also been used to do things like ask devices what
device special file should be used.
Two pre-defined ioctls are used
to disconnect and reconnect kernel drivers, so
that user mode code can completely manage binding
and configuration of devices.
</para></listitem></varlistentry>
<varlistentry><term>USBDEVFS_RELEASEINTERFACE</term>
<listitem><para>This is used to release the claim usbfs
made on an interface, either implicitly or because of a
USBDEVFS_CLAIMINTERFACE call, before the file
descriptor is closed.
The ioctl parameter is an integer holding the number of
the interface (bInterfaceNumber from descriptor).
File modification time is not updated by this request.
</para><warning><para>
<emphasis>No security check is made to ensure
that the task which made the claim is the one
which is releasing it.
This means that a user mode driver may interfere with
other ones. </emphasis>
</para></warning></listitem></varlistentry>
<varlistentry><term>USBDEVFS_RESETEP</term>
<listitem><para>Resets the data toggle value for an endpoint
(bulk or interrupt) to DATA0.
The ioctl parameter is an integer endpoint number
(1 to 15, as identified in the endpoint descriptor),
with USB_DIR_IN added if the device's endpoint sends
data to the host.
</para><warning><para>
<emphasis>Avoid using this request.
It should probably be removed.</emphasis>
Using it typically means the device and driver will lose
toggle synchronization. If you really lost synchronization,
you likely need to completely handshake with the device,
using a request like CLEAR_HALT
or SET_INTERFACE.
</para></warning></listitem></varlistentry>
<varlistentry><term>USBDEVFS_DROP_PRIVILEGES</term>
<listitem><para>This is used to relinquish the ability
to do certain operations which are considered to be
privileged on a usbfs file descriptor.
This includes claiming arbitrary interfaces, resetting
a device on which there are currently claimed interfaces
from other users, and issuing USBDEVFS_IOCTL calls.
The ioctl parameter is a 32 bit mask of interfaces
the user is allowed to claim on this file descriptor.
You may issue this ioctl more than one time to narrow
said mask.
</para></listitem></varlistentry>
</variablelist>
</sect2>
<sect2 id="usbfs-sync">
<title>Synchronous I/O Support</title>
<para>Synchronous requests involve the kernel blocking
until the user mode request completes, either by
finishing successfully or by reporting an error.
In most cases this is the simplest way to use usbfs,
although as noted above it does prevent performing I/O
to more than one endpoint at a time.
</para>
<variablelist>
<varlistentry><term>USBDEVFS_BULK</term>
<listitem><para>Issues a bulk read or write request to the
device.
The ioctl parameter is a pointer to this structure:
<programlisting>struct usbdevfs_bulktransfer {
unsigned int ep;
unsigned int len;
unsigned int timeout; /* in milliseconds */
void *data;
};</programlisting>
</para><para>The "ep" value identifies a
bulk endpoint number (1 to 15, as identified in an endpoint
descriptor),
ORed with USB_DIR_IN when referring to an endpoint which
sends data to the host from the device.
The length of the data buffer is identified by "len".
Recent kernels support requests up to about 128KBytes.
<emphasis>FIXME say how read length is returned,
and how short reads are handled.</emphasis>
</para></listitem></varlistentry>
<varlistentry><term>USBDEVFS_CLEAR_HALT</term>
<listitem><para>Clears endpoint halt (stall) and
resets the endpoint toggle. This is only
meaningful for bulk or interrupt endpoints.
The ioctl parameter is an integer endpoint number
(1 to 15, as identified in an endpoint descriptor),
ORed with USB_DIR_IN when referring to an endpoint which
sends data to the host from the device.
</para><para>
Use this on bulk or interrupt endpoints which have
stalled, returning <emphasis>-EPIPE</emphasis> status
to a data transfer request.
Do not issue the control request directly, since
that could invalidate the host's record of the
data toggle.
</para></listitem></varlistentry>
<varlistentry><term>USBDEVFS_CONTROL</term>
<listitem><para>Issues a control request to the device.
The ioctl parameter points to a structure like this:
<programlisting>struct usbdevfs_ctrltransfer {
__u8 bRequestType;
__u8 bRequest;
__u16 wValue;
__u16 wIndex;
__u16 wLength;
__u32 timeout; /* in milliseconds */
void *data;
};</programlisting>
</para><para>
The first eight bytes of this structure are the contents
of the SETUP packet to be sent to the device; see the
USB 2.0 specification for details.
The bRequestType value is composed by combining a
USB_TYPE_* value, a USB_DIR_* value, and a
USB_RECIP_* value (from
<emphasis>&lt;linux/usb.h&gt;</emphasis>).
If wLength is nonzero, it describes the length of the data
buffer, which is either written to the device
(USB_DIR_OUT) or read from the device (USB_DIR_IN).
</para><para>
At this writing, you can't transfer more than 4 KBytes
of data to or from a device; usbfs has a limit, and
some host controller drivers have a limit.
(That's not usually a problem.)
<emphasis>Also</emphasis> there's no way to say it's
not OK to get a short read back from the device.
</para></listitem></varlistentry>
<varlistentry><term>USBDEVFS_RESET</term>
<listitem><para>Does a USB level device reset.
The ioctl parameter is ignored.
After the reset, this rebinds all device interfaces.
File modification time is not updated by this request.
</para><warning><para>
<emphasis>Avoid using this call</emphasis>
until some usbcore bugs get fixed,
since it does not fully synchronize device, interface,
and driver (not just usbfs) state.
</para></warning></listitem></varlistentry>
<varlistentry><term>USBDEVFS_SETINTERFACE</term>
<listitem><para>Sets the alternate setting for an
interface. The ioctl parameter is a pointer to a
structure like this:
<programlisting>struct usbdevfs_setinterface {
unsigned int interface;
unsigned int altsetting;
}; </programlisting>
File modification time is not updated by this request.
</para><para>
Those struct members are from some interface descriptor
applying to the current configuration.
The interface number is the bInterfaceNumber value, and
the altsetting number is the bAlternateSetting value.
(This resets each endpoint in the interface.)
</para></listitem></varlistentry>
<varlistentry><term>USBDEVFS_SETCONFIGURATION</term>
<listitem><para>Issues the
<function>usb_set_configuration</function> call
for the device.
The parameter is an integer holding the number of
a configuration (bConfigurationValue from descriptor).
File modification time is not updated by this request.
</para><warning><para>
<emphasis>Avoid using this call</emphasis>
until some usbcore bugs get fixed,
since it does not fully synchronize device, interface,
and driver (not just usbfs) state.
</para></warning></listitem></varlistentry>
</variablelist>
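<para>To make the USBDEVFS_CONTROL description concrete, here is a
minimal sketch that builds a standard GET_DESCRIPTOR request for the
device descriptor. It assumes the USB_* constants come from
&lt;linux/usb/ch9.h&gt; and the structure from
&lt;linux/usbdevice_fs.h&gt;; the function names and the one second
timeout are illustrative choices, not requirements.
</para>

```c
#include <assert.h>
#include <string.h>
#include <sys/ioctl.h>
#include <linux/usb/ch9.h>
#include <linux/usbdevice_fs.h>

/* Fill in a GET_DESCRIPTOR(DEVICE) request; 'buf' receives the data. */
static void fill_get_device_descriptor(struct usbdevfs_ctrltransfer *ct,
				       void *buf, unsigned short len)
{
	assert(ct != NULL && buf != NULL);
	memset(ct, 0, sizeof(*ct));
	/* direction | type | recipient, as described above */
	ct->bRequestType = USB_DIR_IN | USB_TYPE_STANDARD | USB_RECIP_DEVICE;
	ct->bRequest = USB_REQ_GET_DESCRIPTOR;
	ct->wValue = USB_DT_DEVICE << 8;	/* type in high byte, index 0 */
	ct->wIndex = 0;
	ct->wLength = len;
	ct->timeout = 1000;			/* milliseconds, arbitrary */
	ct->data = buf;
}

/* Issue the request on an open usbfs device file; -1 means failure. */
static int get_device_descriptor(int fd, void *buf, unsigned short len)
{
	struct usbdevfs_ctrltransfer ct;

	fill_get_device_descriptor(&ct, buf, len);
	return ioctl(fd, USBDEVFS_CONTROL, &ct);
}
```

<para>Keeping the fill step separate from the ioctl() call makes the
SETUP fields easy to inspect without a device attached.
</para>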
</sect2>
<sect2 id="usbfs-async">
<title>Asynchronous I/O Support</title>
<para>As mentioned above, there are situations where it may be
important to initiate concurrent operations from user mode code.
This is particularly important for periodic transfers
(interrupt and isochronous), but it can be used for other
kinds of USB requests too.
In such cases, the asynchronous requests described here
are essential. Rather than submitting one request and having
the kernel block until it completes, the blocking is separate.
</para>
<para>These requests are packaged into a structure that
resembles the URB used by kernel device drivers.
(No POSIX Async I/O support here, sorry.)
It identifies the endpoint type (USBDEVFS_URB_TYPE_*),
endpoint (number, ORed with USB_DIR_IN as appropriate),
buffer and length, and a user "context" value serving to
uniquely identify each request.
(It's usually a pointer to per-request data.)
Flags can modify requests (not as many as supported for
kernel drivers).
</para>
<para>Each request can specify a realtime signal number
(between SIGRTMIN and SIGRTMAX, inclusive) to request a
signal be sent when the request completes.
</para>
<para>When usbfs returns these urbs, the status value
is updated, and the buffer may have been modified.
Except for isochronous transfers, the actual_length is
updated to say how many bytes were transferred; if the
USBDEVFS_URB_DISABLE_SPD flag is set
("short packets are not OK") and fewer bytes were read
than were requested, you get an error report.
</para>
<programlisting>struct usbdevfs_iso_packet_desc {
unsigned int length;
unsigned int actual_length;
unsigned int status;
};
struct usbdevfs_urb {
unsigned char type;
unsigned char endpoint;
int status;
unsigned int flags;
void *buffer;
int buffer_length;
int actual_length;
int start_frame;
int number_of_packets;
int error_count;
unsigned int signr;
void *usercontext;
struct usbdevfs_iso_packet_desc iso_frame_desc[];
};</programlisting>
<para> For these asynchronous requests, the file modification
time reflects when the request was initiated.
This contrasts with their use with the synchronous requests,
where it reflects when requests complete.
</para>
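<para>Since the individual asynchronous requests are not yet
described here, the following is only a sketch of how the structure
above might be filled for a bulk IN transfer and then submitted and
reaped. It assumes, based on the &lt;linux/usbdevice_fs.h&gt; header
rather than on this text, that USBDEVFS_SUBMITURB takes a pointer to
the urb and that USBDEVFS_REAPURB stores a pointer to a completed urb.
All function names are hypothetical.
</para>

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>
#include <sys/ioctl.h>
#include <linux/usb/ch9.h>
#include <linux/usbdevice_fs.h>

/* Prepare a bulk IN request on endpoint 'epnum' (1 to 15). */
static void fill_bulk_in_urb(struct usbdevfs_urb *urb, unsigned char epnum,
			     void *buf, int len, void *context)
{
	assert(urb != NULL && buf != NULL);
	assert(epnum >= 1 && epnum <= 15);
	memset(urb, 0, sizeof(*urb));
	urb->type = USBDEVFS_URB_TYPE_BULK;
	urb->endpoint = epnum | USB_DIR_IN;	/* device-to-host */
	urb->buffer = buf;
	urb->buffer_length = len;
	urb->usercontext = context;		/* identifies this request */
}

/* Queue the request; returns -1 with errno set on failure. */
static int submit_urb(int fd, struct usbdevfs_urb *urb)
{
	return ioctl(fd, USBDEVFS_SUBMITURB, urb);
}

/* Block until some request completes; NULL on failure. */
static struct usbdevfs_urb *reap_urb(int fd)
{
	struct usbdevfs_urb *done = NULL;

	if (ioctl(fd, USBDEVFS_REAPURB, &done) < 0)
		return NULL;
	return done;
}
```

<para>After a successful reap, the status and actual_length members of
the returned urb say how the transfer went.
</para>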
<variablelist>
<varlistentry><term>USBDEVFS_DISCARDURB</term>
<listitem><para>
<emphasis>TBS</emphasis>
File modification time is not updated by this request.
</para><para>
</para></listitem></varlistentry>
<varlistentry><term>USBDEVFS_DISCSIGNAL</term>
<listitem><para>
<emphasis>TBS</emphasis>
File modification time is not updated by this request.
</para><para>
</para></listitem></varlistentry>
<varlistentry><term>USBDEVFS_REAPURB</term>
<listitem><para>
<emphasis>TBS</emphasis>
File modification time is not updated by this request.
</para><para>
</para></listitem></varlistentry>
<varlistentry><term>USBDEVFS_REAPURBNDELAY</term>
<listitem><para>
<emphasis>TBS</emphasis>
File modification time is not updated by this request.
</para><para>
</para></listitem></varlistentry>
<varlistentry><term>USBDEVFS_SUBMITURB</term>
<listitem><para>
<emphasis>TBS</emphasis>
</para><para>
</para></listitem></varlistentry>
</variablelist>
</sect2>
</sect1>
</chapter>
</book>
<!-- vim:syntax=sgml:sw=4
-->


@ -10,6 +10,8 @@ _SPHINXDIRS = $(patsubst $(srctree)/Documentation/%/conf.py,%,$(wildcard $(src
SPHINX_CONF = conf.py
PAPER =
BUILDDIR = $(obj)/output
PDFLATEX = xelatex
LATEXOPTS = -interaction=batchmode
# User-friendly check for sphinx-build
HAVE_SPHINX := $(shell if which $(SPHINXBUILD) >/dev/null 2>&1; then echo 1; else echo 0; fi)
@ -29,7 +31,7 @@ else ifneq ($(DOCBOOKS),)
else # HAVE_SPHINX
# User-friendly check for pdflatex
HAVE_PDFLATEX := $(shell if which xelatex >/dev/null 2>&1; then echo 1; else echo 0; fi)
HAVE_PDFLATEX := $(shell if which $(PDFLATEX) >/dev/null 2>&1; then echo 1; else echo 0; fi)
# Internal variables.
PAPEROPT_a4 = -D latex_paper_size=a4
@ -51,8 +53,8 @@ loop_cmd = $(echo-cmd) $(cmd_$(1))
# $5 reST source folder relative to $(srctree)/$(src),
# e.g. "media" for the linux-tv book-set at ./Documentation/media
quiet_cmd_sphinx = SPHINX $@ --> file://$(abspath $(BUILDDIR)/$3/$4);
cmd_sphinx = $(MAKE) BUILDDIR=$(abspath $(BUILDDIR)) $(build)=Documentation/media all;\
quiet_cmd_sphinx = SPHINX $@ --> file://$(abspath $(BUILDDIR)/$3/$4)
cmd_sphinx = $(MAKE) BUILDDIR=$(abspath $(BUILDDIR)) $(build)=Documentation/media $2;\
BUILDDIR=$(abspath $(BUILDDIR)) SPHINX_CONF=$(abspath $(srctree)/$(src)/$5/$(SPHINX_CONF)) \
$(SPHINXBUILD) \
-b $2 \
@ -67,16 +69,19 @@ htmldocs:
@$(foreach var,$(SPHINXDIRS),$(call loop_cmd,sphinx,html,$(var),,$(var)))
latexdocs:
ifeq ($(HAVE_PDFLATEX),0)
$(warning The 'xelatex' command was not found. Make sure you have it installed and in PATH to produce PDF output.)
@echo " SKIP Sphinx $@ target."
else # HAVE_PDFLATEX
@$(foreach var,$(SPHINXDIRS),$(call loop_cmd,sphinx,latex,$(var),latex,$(var)))
endif # HAVE_PDFLATEX
ifeq ($(HAVE_PDFLATEX),0)
pdfdocs:
$(warning The '$(PDFLATEX)' command was not found. Make sure you have it installed and in PATH to produce PDF output.)
@echo " SKIP Sphinx $@ target."
else # HAVE_PDFLATEX
pdfdocs: latexdocs
ifneq ($(HAVE_PDFLATEX),0)
$(foreach var,$(SPHINXDIRS), $(MAKE) PDFLATEX=xelatex LATEXOPTS="-interaction=nonstopmode" -C $(BUILDDIR)/$(var)/latex)
$(foreach var,$(SPHINXDIRS), $(MAKE) PDFLATEX=$(PDFLATEX) LATEXOPTS="$(LATEXOPTS)" -C $(BUILDDIR)/$(var)/latex;)
endif # HAVE_PDFLATEX
epubdocs:
@ -93,6 +98,7 @@ installmandocs:
cleandocs:
$(Q)rm -rf $(BUILDDIR)
$(Q)$(MAKE) BUILDDIR=$(abspath $(BUILDDIR)) -C Documentation/media clean
endif # HAVE_SPHINX

View File

@ -1,841 +1 @@
.. _submittingpatches:
How to Get Your Change Into the Linux Kernel or Care And Operation Of Your Linus Torvalds
=========================================================================================
For a person or company who wishes to submit a change to the Linux
kernel, the process can sometimes be daunting if you're not familiar
with "the system." This text is a collection of suggestions which
can greatly increase the chances of your change being accepted.
This document contains a large number of suggestions in a relatively terse
format. For detailed information on how the kernel development process
works, see :ref:`Documentation/development-process <development_process_main>`.
Also, read :ref:`Documentation/SubmitChecklist <submitchecklist>`
for a list of items to check before
submitting code. If you are submitting a driver, also read
:ref:`Documentation/SubmittingDrivers <submittingdrivers>`;
for device tree binding patches, read
Documentation/devicetree/bindings/submitting-patches.txt.
Many of these steps describe the default behavior of the ``git`` version
control system; if you use ``git`` to prepare your patches, you'll find much
of the mechanical work done for you, though you'll still need to prepare
and document a sensible set of patches. In general, use of ``git`` will make
your life as a kernel developer easier.
Creating and Sending your Change
********************************
0) Obtain a current source tree
-------------------------------
If you do not have a repository with the current kernel source handy, use
``git`` to obtain one. You'll want to start with the mainline repository,
which can be grabbed with::
git clone git://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git
Note, however, that you may not want to develop against the mainline tree
directly. Most subsystem maintainers run their own trees and want to see
patches prepared against those trees. See the **T:** entry for the subsystem
in the MAINTAINERS file to find that tree, or simply ask the maintainer if
the tree is not listed there.
It is still possible to download kernel releases via tarballs (as described
in the next section), but that is the hard way to do kernel development.
1) ``diff -up``
---------------
If you must generate your patches by hand, use ``diff -up`` or ``diff -uprN``
to create patches. Git generates patches in this form by default; if
you're using ``git``, you can skip this section entirely.
All changes to the Linux kernel occur in the form of patches, as
generated by :manpage:`diff(1)`. When creating your patch, make sure to
create it in "unified diff" format, as supplied by the ``-u`` argument
to :manpage:`diff(1)`.
Also, please use the ``-p`` argument which shows which C function each
change is in - that makes the resultant ``diff`` a lot easier to read.
Patches should be based in the root kernel source directory,
not in any lower subdirectory.
To create a patch for a single file, it is often sufficient to do::
SRCTREE= linux
MYFILE= drivers/net/mydriver.c
cd $SRCTREE
cp $MYFILE $MYFILE.orig
vi $MYFILE # make your change
cd ..
diff -up $SRCTREE/$MYFILE{.orig,} > /tmp/patch
To create a patch for multiple files, you should unpack a "vanilla",
or unmodified kernel source tree, and generate a ``diff`` against your
own source tree. For example::
MYSRC= /devel/linux
tar xvfz linux-3.19.tar.gz
mv linux-3.19 linux-3.19-vanilla
diff -uprN -X linux-3.19-vanilla/Documentation/dontdiff \
linux-3.19-vanilla $MYSRC > /tmp/patch
``dontdiff`` is a list of files which are generated by the kernel during
the build process, and should be ignored in any :manpage:`diff(1)`-generated
patch.
Make sure your patch does not include any extra files which do not
belong in a patch submission. Make sure to review your patch -after-
generating it with :manpage:`diff(1)`, to ensure accuracy.
If your changes produce a lot of deltas, you need to split them into
individual patches which modify things in logical stages; see
:ref:`split_changes`. This will facilitate review by other kernel developers,
very important if you want your patch accepted.
If you're using ``git``, ``git rebase -i`` can help you with this process. If
you're not using ``git``, ``quilt`` <http://savannah.nongnu.org/projects/quilt>
is another popular alternative.
.. _describe_changes:
2) Describe your changes
------------------------
Describe your problem. Whether your patch is a one-line bug fix or
5000 lines of a new feature, there must be an underlying problem that
motivated you to do this work. Convince the reviewer that there is a
problem worth fixing and that it makes sense for them to read past the
first paragraph.
Describe user-visible impact. Straight up crashes and lockups are
pretty convincing, but not all bugs are that blatant. Even if the
problem was spotted during code review, describe the impact you think
it can have on users. Keep in mind that the majority of Linux
installations run kernels from secondary stable trees or
vendor/product-specific trees that cherry-pick only specific patches
from upstream, so include anything that could help route your change
downstream: provoking circumstances, excerpts from dmesg, crash
descriptions, performance regressions, latency spikes, lockups, etc.
Quantify optimizations and trade-offs. If you claim improvements in
performance, memory consumption, stack footprint, or binary size,
include numbers that back them up. But also describe non-obvious
costs. Optimizations usually aren't free but trade-offs between CPU,
memory, and readability; or, when it comes to heuristics, between
different workloads. Describe the expected downsides of your
optimization so that the reviewer can weigh costs against benefits.
Once the problem is established, describe what you are actually doing
about it in technical detail. It's important to describe the change
in plain English for the reviewer to verify that the code is behaving
as you intend it to.
The maintainer will thank you if you write your patch description in a
form which can be easily pulled into Linux's source code management
system, ``git``, as a "commit log". See section 14, "The canonical patch format".
Solve only one problem per patch. If your description starts to get
long, that's a sign that you probably need to split up your patch.
See :ref:`split_changes`.
When you submit or resubmit a patch or patch series, include the
complete patch description and justification for it. Don't just
say that this is version N of the patch (series). Don't expect the
subsystem maintainer to refer back to earlier patch versions or referenced
URLs to find the patch description and put that into the patch.
I.e., the patch (series) and its description should be self-contained.
This benefits both the maintainers and reviewers. Some reviewers
probably didn't even receive earlier versions of the patch.
Describe your changes in imperative mood, e.g. "make xyzzy do frotz"
instead of "[This patch] makes xyzzy do frotz" or "[I] changed xyzzy
to do frotz", as if you are giving orders to the codebase to change
its behaviour.
If the patch fixes a logged bug entry, refer to that bug entry by
number and URL. If the patch follows from a mailing list discussion,
give a URL to the mailing list archive; use the https://lkml.kernel.org/
redirector with a ``Message-Id``, to ensure that the links cannot become
stale.
However, try to make your explanation understandable without external
resources. In addition to giving a URL to a mailing list archive or
bug, summarize the relevant points of the discussion that led to the
patch as submitted.
If you want to refer to a specific commit, don't just refer to the
SHA-1 ID of the commit. Please also include the one-line summary of
the commit, to make it easier for reviewers to know what it is about.
Example::
Commit e21d2170f36602ae2708 ("video: remove unnecessary
platform_set_drvdata()") removed the unnecessary
platform_set_drvdata(), but left the variable "dev" unused,
delete it.
You should also be sure to use at least the first twelve characters of the
SHA-1 ID. The kernel repository holds a *lot* of objects, making
collisions with shorter IDs a real possibility. Bear in mind that, even if
there is no collision with your six-character ID now, that condition may
change five years from now.
If your patch fixes a bug in a specific commit, e.g. you found an issue using
``git bisect``, please use the 'Fixes:' tag with the first 12 characters of
the SHA-1 ID, and the one-line summary. For example::
Fixes: e21d2170f366 ("video: remove unnecessary platform_set_drvdata()")
The following ``git config`` settings can be used to add a pretty format for
outputting the above style in the ``git log`` or ``git show`` commands::
[core]
abbrev = 12
[pretty]
fixes = Fixes: %h (\"%s\")
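The tag format described above can also be produced and checked mechanically. The following is a minimal sketch, not an official tool: the helper names ``format_fixes`` and ``is_valid_fixes`` are hypothetical, and the regular expression encodes only the shape shown in the example (at least 12 hexadecimal characters followed by the quoted one-line summary in parentheses).

```python
import re

# Assumed shape of the tag: 12 to 40 hex digits, then the one-line
# summary wrapped in parentheses and double quotes.
FIXES_RE = re.compile(r'^Fixes: ([0-9a-f]{12,40}) \("(.+)"\)$')


def format_fixes(sha: str, summary: str) -> str:
    """Build a Fixes: line from a full or abbreviated SHA-1 ID."""
    sha = sha.strip().lower()
    if len(sha) < 12 or re.fullmatch(r"[0-9a-f]+", sha) is None:
        raise ValueError("need at least 12 hexadecimal characters")
    # Keep only the first 12 characters, as the text recommends.
    return f'Fixes: {sha[:12]} ("{summary}")'


def is_valid_fixes(line: str) -> bool:
    """Return True if the line has the assumed Fixes: shape."""
    return FIXES_RE.match(line) is not None


line = format_fixes("e21d2170f36602ae2708",
                    "video: remove unnecessary platform_set_drvdata()")
print(line)
```

A check like this only verifies the form of the tag. It cannot tell whether the commit exists or whether the summary matches it.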
.. _split_changes:
3) Separate your changes
------------------------
Separate each **logical change** into a separate patch.
For example, if your changes include both bug fixes and performance
enhancements for a single driver, separate those changes into two
or more patches. If your changes include an API update, and a new
driver which uses that new API, separate those into two patches.
On the other hand, if you make a single change to numerous files,
group those changes into a single patch. Thus a single logical change
is contained within a single patch.
The point to remember is that each patch should make an easily understood
change that can be verified by reviewers. Each patch should be justifiable
on its own merits.
If one patch depends on another patch in order for a change to be
complete, that is OK. Simply note **"this patch depends on patch X"**
in your patch description.
When dividing your change into a series of patches, take special care to
ensure that the kernel builds and runs properly after each patch in the
series. Developers using ``git bisect`` to track down a problem can end up
splitting your patch series at any point; they will not thank you if you
introduce bugs in the middle.
If you cannot condense your patch set into a smaller set of patches,
then post only 15 or so at a time and wait for review and integration.
4) Style-check your changes
---------------------------
Check your patch for basic style violations, details of which can be
found in
:ref:`Documentation/CodingStyle <codingstyle>`.
Failure to do so simply wastes
the reviewers' time and will get your patch rejected, probably
without even being read.
One significant exception is when moving code from one file to
another -- in this case you should not modify the moved code at all in
the same patch which moves it. This clearly delineates the act of
moving the code and your changes. This greatly aids review of the
actual differences and allows tools to better track the history of
the code itself.
Check your patches with the patch style checker prior to submission
(scripts/checkpatch.pl). Note, though, that the style checker should be
viewed as a guide, not as a replacement for human judgment. If your code
looks better with a violation then it's probably best left alone.
The checker reports at three levels:
- ERROR: things that are very likely to be wrong
- WARNING: things requiring careful review
- CHECK: things requiring thought
You should be able to justify all violations that remain in your
patch.
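If you save the checker's report to a file, a short script can summarize how many findings remain at each level. This is a sketch under stated assumptions: it assumes each finding starts a line with its level name followed by a colon, and the sample report below is made up for illustration rather than captured from a real run.

```python
from collections import Counter

LEVELS = ("ERROR", "WARNING", "CHECK")


def tally_levels(report: str) -> Counter:
    """Count report lines that begin with one of the three level names."""
    counts = Counter()
    for line in report.splitlines():
        for level in LEVELS:
            if line.startswith(level + ":"):
                counts[level] += 1
    return counts


# Illustrative sample only; real reports carry more context per finding.
sample = """\
ERROR: trailing whitespace
WARNING: line over 80 characters
CHECK: Alignment should match open parenthesis
WARNING: line over 80 characters
"""

counts = tally_levels(sample)
print(dict(counts))
```

The totals are a starting point for review, not a pass/fail gate; each remaining finding still needs a justification.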
5) Select the recipients for your patch
---------------------------------------
You should always copy the appropriate subsystem maintainer(s) on any patch
to code that they maintain; look through the MAINTAINERS file and the
source code revision history to see who those maintainers are. The
script scripts/get_maintainer.pl can be very useful at this step. If you
cannot find a maintainer for the subsystem you are working on, Andrew
Morton (akpm@linux-foundation.org) serves as a maintainer of last resort.
You should also normally choose at least one mailing list to receive a copy
of your patch set. linux-kernel@vger.kernel.org functions as a list of
last resort, but the volume on that list has caused a number of developers
to tune it out. Look in the MAINTAINERS file for a subsystem-specific
list; your patch will probably get more attention there. Please do not
spam unrelated lists, though.
Many kernel-related lists are hosted on vger.kernel.org; you can find a
list of them at http://vger.kernel.org/vger-lists.html. There are
kernel-related lists hosted elsewhere as well, though.
Do not send more than 15 patches at once to the vger mailing lists!!!
Linus Torvalds is the final arbiter of all changes accepted into the
Linux kernel. His e-mail address is <torvalds@linux-foundation.org>.
He gets a lot of e-mail, and, at this point, very few patches go through
Linus directly, so typically you should do your best to -avoid-
sending him e-mail.
If you have a patch that fixes an exploitable security bug, send that patch
to security@kernel.org. For severe bugs, a short embargo may be considered
to allow distributors to get the patch out to users; in such cases,
obviously, the patch should not be sent to any public lists.
Patches that fix a severe bug in a released kernel should be directed
toward the stable maintainers by putting a line like this::
Cc: stable@vger.kernel.org
into the sign-off area of your patch (note, NOT an email recipient). You
should also read
:ref:`Documentation/stable_kernel_rules.txt <stable_kernel_rules>`
in addition to this file.
Note, however, that some subsystem maintainers want to come to their own
conclusions on which patches should go to the stable trees. The networking
maintainer, in particular, would rather not see individual developers
adding lines like the above to their patches.
If changes affect userland-kernel interfaces, please send the MAN-PAGES
maintainer (as listed in the MAINTAINERS file) a man-pages patch, or at
least a notification of the change, so that some information makes its way
into the manual pages. User-space API changes should also be copied to
linux-api@vger.kernel.org.
For small patches you may want to CC the Trivial Patch Monkey
trivial@kernel.org which collects "trivial" patches. Have a look
into the MAINTAINERS file for its current manager.
Trivial patches must qualify for one of the following rules:
- Spelling fixes in documentation
- Spelling fixes for errors which could break :manpage:`grep(1)`
- Warning fixes (cluttering with useless warnings is bad)
- Compilation fixes (only if they are actually correct)
- Runtime fixes (only if they actually fix things)
- Removing use of deprecated functions/macros
- Contact detail and documentation fixes
- Non-portable code replaced by portable code (even in arch-specific,
since people copy, as long as it's trivial)
- Any fix by the author/maintainer of the file (i.e., patch monkey
in re-transmission mode)
6) No MIME, no links, no compression, no attachments. Just plain text
----------------------------------------------------------------------
Linus and other kernel developers need to be able to read and comment
on the changes you are submitting. It is important for a kernel
developer to be able to "quote" your changes, using standard e-mail
tools, so that they may comment on specific portions of your code.
For this reason, all patches should be submitted by e-mail "inline".
.. warning::
Be wary of your editor's word-wrap corrupting your patch,
if you choose to cut-n-paste your patch.
Do not attach the patch as a MIME attachment, compressed or not.
Many popular e-mail applications will not always transmit a MIME
attachment as plain text, making it impossible to comment on your
code. A MIME attachment also takes Linus a bit more time to process,
decreasing the likelihood of your MIME-attached change being accepted.
Exception: If your mailer is mangling patches then someone may ask
you to re-send them using MIME.
See :ref:`Documentation/email-clients.txt <email_clients>`
for hints about configuring your e-mail client so that it sends your patches
untouched.
7) E-mail size
--------------
Large changes are not appropriate for mailing lists or for some
maintainers. If your patch, uncompressed, exceeds 300 kB in size,
it is preferred that you store your patch on an Internet-accessible
server, and provide instead a URL (link) pointing to your patch. But note
that if your patch exceeds 300 kB, it almost certainly needs to be broken up
anyway.
8) Respond to review comments
-----------------------------
Your patch will almost certainly get comments from reviewers on ways in
which the patch can be improved. You must respond to those comments;
ignoring reviewers is a good way to get ignored in return. Review comments
or questions that do not lead to a code change should almost certainly
bring about a comment or changelog entry so that the next reviewer better
understands what is going on.
Be sure to tell the reviewers what changes you are making and to thank them
for their time. Code review is a tiring and time-consuming process, and
reviewers sometimes get grumpy. Even in that case, though, respond
politely and address the problems they have pointed out.
9) Don't get discouraged - or impatient
---------------------------------------
After you have submitted your change, be patient and wait. Reviewers are
busy people and may not get to your patch right away.
Once upon a time, patches used to disappear into the void without comment,
but the development process works more smoothly than that now. You should
receive comments within a week or so; if that does not happen, make sure
that you have sent your patches to the right place. Wait for a minimum of
one week before resubmitting or pinging reviewers - possibly longer during
busy times like merge windows.
10) Include PATCH in the subject
--------------------------------
Due to high e-mail traffic to Linus, and to linux-kernel, it is common
convention to prefix your subject line with [PATCH]. This lets Linus
and other kernel developers more easily distinguish patches from other
e-mail discussions.
11) Sign your work
------------------
To improve tracking of who did what, especially with patches that can
percolate to their final resting place in the kernel through several
layers of maintainers, we've introduced a "sign-off" procedure on
patches that are being emailed around.
The sign-off is a simple line at the end of the explanation for the
patch, which certifies that you wrote it or otherwise have the right to
pass it on as an open-source patch. The rules are pretty simple: if you
can certify the below:
Developer's Certificate of Origin 1.1
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
By making a contribution to this project, I certify that:
(a) The contribution was created in whole or in part by me and I
have the right to submit it under the open source license
indicated in the file; or
(b) The contribution is based upon previous work that, to the best
of my knowledge, is covered under an appropriate open source
license and I have the right under that license to submit that
work with modifications, whether created in whole or in part
by me, under the same open source license (unless I am
permitted to submit under a different license), as indicated
in the file; or
(c) The contribution was provided directly to me by some other
person who certified (a), (b) or (c) and I have not modified
it.
(d) I understand and agree that this project and the contribution
are public and that a record of the contribution (including all
personal information I submit with it, including my sign-off) is
maintained indefinitely and may be redistributed consistent with
this project or the open source license(s) involved.
then you just add a line saying::
Signed-off-by: Random J Developer <random@developer.example.org>
using your real name (sorry, no pseudonyms or anonymous contributions.)
Some people also put extra tags at the end. They'll just be ignored for
now, but you can do this to mark internal company procedures or just
point out some special detail about the sign-off.
If you are a subsystem or branch maintainer, sometimes you need to slightly
modify patches you receive in order to merge them, because the code is not
exactly the same in your tree and the submitters'. If you stick strictly to
rule (c), you should ask the submitter to rediff, but this is a totally
counter-productive waste of time and energy. Rule (b) allows you to adjust
the code, but then it is very impolite to change one submitter's code and
make him endorse your bugs. To solve this problem, it is recommended that
you add a line between the last Signed-off-by header and yours, indicating
the nature of your changes. While there is nothing mandatory about this, it
seems like prepending the description with your mail and/or name, all
enclosed in square brackets, is noticeable enough to make it obvious that
you are responsible for last-minute changes. Example::
Signed-off-by: Random J Developer <random@developer.example.org>
[lucky@maintainer.example.org: struct foo moved from foo.c to foo.h]
Signed-off-by: Lucky K Maintainer <lucky@maintainer.example.org>
This practice is particularly helpful if you maintain a stable branch and
want at the same time to credit the author, track changes, merge the fix,
and protect the submitter from complaints. Note that under no circumstances
can you change the author's identity (the From header), as it is the one
which appears in the changelog.
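The sign-off chain can be read back out of a changelog with a few lines of code. The sketch below is illustrative only: the helper ``sign_offs`` is hypothetical, and the pattern assumes each sign-off has the form ``Signed-off-by: Full Name <address>``. Bracketed maintainer notes are skipped because they do not match that form.

```python
import re

# Assumed shape: "Signed-off-by: Full Name <user@host>".
SOB_RE = re.compile(
    r"^Signed-off-by: (?P<name>[^<>]+?) <(?P<email>[^<>@\s]+@[^<>@\s]+)>$"
)


def sign_offs(message: str) -> list[tuple[str, str]]:
    """Return (name, email) pairs in the order they appear."""
    found = []
    for line in message.splitlines():
        match = SOB_RE.match(line.strip())
        if match:
            found.append((match["name"], match["email"]))
    return found


message = """\
Signed-off-by: Random J Developer <random@developer.example.org>
[lucky@maintainer.example.org: struct foo moved from foo.c to foo.h]
Signed-off-by: Lucky K Maintainer <lucky@maintainer.example.org>
"""

chain = sign_offs(message)
print(chain)
```

The order of the result mirrors the delivery path: the author first, then each person who handled the patch.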
Special note to back-porters: It seems to be a common and useful practice
to insert an indication of the origin of a patch at the top of the commit
message (just after the subject line) to facilitate tracking. For instance,
here's what we see in a 3.x-stable release::
Date: Tue Oct 7 07:26:38 2014 -0400
libata: Un-break ATA blacklist
commit 1c40279960bcd7d52dbdf1d466b20d24b99176c8 upstream.
And here's what might appear in an older kernel once a patch is backported::
Date: Tue May 13 22:12:27 2008 +0200
wireless, airo: waitbusy() won't delay
[backport of 2.6 commit b7acbdfbd1f277c1eb23f344f899cfa4cd0bf36a]
Whatever the format, this information provides valuable help to people
tracking your trees, and to people trying to troubleshoot bugs in your
tree.
12) When to use Acked-by: and Cc:
---------------------------------
The Signed-off-by: tag indicates that the signer was involved in the
development of the patch, or that he/she was in the patch's delivery path.
If a person was not directly involved in the preparation or handling of a
patch but wishes to signify and record their approval of it then they can
ask to have an Acked-by: line added to the patch's changelog.
Acked-by: is often used by the maintainer of the affected code when that
maintainer neither contributed to nor forwarded the patch.
Acked-by: is not as formal as Signed-off-by:. It is a record that the acker
has at least reviewed the patch and has indicated acceptance. Hence patch
mergers will sometimes manually convert an acker's "yep, looks good to me"
into an Acked-by: (but note that it is usually better to ask for an
explicit ack).
Acked-by: does not necessarily indicate acknowledgement of the entire patch.
For example, if a patch affects multiple subsystems and has an Acked-by: from
one subsystem maintainer then this usually indicates acknowledgement of just
the part which affects that maintainer's code. Judgement should be used here.
When in doubt people should refer to the original discussion in the mailing
list archives.
If a person has had the opportunity to comment on a patch, but has not
provided such comments, you may optionally add a ``Cc:`` tag to the patch.
This is the only tag which might be added without an explicit action by the
person it names - but it should indicate that this person was copied on the
patch. This tag documents that potentially interested parties
have been included in the discussion.
13) Using Reported-by:, Tested-by:, Reviewed-by:, Suggested-by: and Fixes:
--------------------------------------------------------------------------
The Reported-by tag gives credit to people who find bugs and report them and it
hopefully inspires them to help us again in the future. Please note that if
the bug was reported in private, then ask for permission first before using the
Reported-by tag.
A Tested-by: tag indicates that the patch has been successfully tested (in
some environment) by the person named. This tag informs maintainers that
some testing has been performed, provides a means to locate testers for
future patches, and ensures credit for the testers.
Reviewed-by:, instead, indicates that the patch has been reviewed and found
acceptable according to the Reviewer's Statement:
Reviewer's statement of oversight
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
By offering my Reviewed-by: tag, I state that:
(a) I have carried out a technical review of this patch to
evaluate its appropriateness and readiness for inclusion into
the mainline kernel.
(b) Any problems, concerns, or questions relating to the patch
have been communicated back to the submitter. I am satisfied
with the submitter's response to my comments.
(c) While there may be things that could be improved with this
submission, I believe that it is, at this time, (1) a
worthwhile modification to the kernel, and (2) free of known
issues which would argue against its inclusion.
(d) While I have reviewed the patch and believe it to be sound, I
do not (unless explicitly stated elsewhere) make any
warranties or guarantees that it will achieve its stated
purpose or function properly in any given situation.
A Reviewed-by tag is a statement of opinion that the patch is an
appropriate modification of the kernel without any remaining serious
technical issues. Any interested reviewer (who has done the work) can
offer a Reviewed-by tag for a patch. This tag serves to give credit to
reviewers and to inform maintainers of the degree of review which has been
done on the patch. Reviewed-by: tags, when supplied by reviewers known to
understand the subject area and to perform thorough reviews, will normally
increase the likelihood of your patch getting into the kernel.
A Suggested-by: tag indicates that the patch idea is suggested by the person
named and ensures credit to the person for the idea. Please note that this
tag should not be added without the reporter's permission, especially if the
idea was not posted in a public forum. That said, if we diligently credit our
idea reporters, they will, hopefully, be inspired to help us again in the
future.
A Fixes: tag indicates that the patch fixes an issue in a previous commit. It
is used to make it easy to determine where a bug originated, which can help
review a bug fix. This tag also assists the stable kernel team in determining
which stable kernel versions should receive your fix. This is the preferred
method for indicating a bug fixed by the patch. See :ref:`describe_changes`
for more details.
14) The canonical patch format
------------------------------
This section describes how the patch itself should be formatted. Note
that, if you have your patches stored in a ``git`` repository, proper patch
formatting can be had with ``git format-patch``. The tools cannot create
the necessary text, though, so read the instructions below anyway.
The canonical patch subject line is::
Subject: [PATCH 001/123] subsystem: summary phrase
The canonical patch message body contains the following:
- A ``from`` line specifying the patch author (only needed if the person
sending the patch is not the author).
- An empty line.
- The body of the explanation, line wrapped at 75 columns, which will
be copied to the permanent changelog to describe this patch.
- The ``Signed-off-by:`` lines, described above, which will
also go in the changelog.
- A marker line containing simply ``---``.
- Any additional comments not suitable for the changelog.
- The actual patch (``diff`` output).
The Subject line format makes it very easy to sort the emails
alphabetically by subject line - pretty much any email reader will
support that - because the sequence number is zero-padded, so
the numerical and alphabetic sorts are the same.
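The effect of zero-padding on sort order is easy to demonstrate. In this small sketch the helper ``subject`` is hypothetical and the subjects are placeholders; it compares an alphabetic sort of padded and unpadded sequence numbers for a 123-patch series.

```python
def subject(n: int, total: int) -> str:
    """Build a subject with the sequence number padded to the width of total."""
    width = len(str(total))
    return f"[PATCH {n:0{width}d}/{total}] subsystem: summary phrase"


numbers = (10, 2, 100, 1)
padded = [subject(n, 123) for n in numbers]
unpadded = [f"[PATCH {n}/123] subsystem: summary phrase" for n in numbers]

# A plain string sort keeps padded subjects in numerical order...
padded_sorted = sorted(padded)
# ...but puts unpadded "10/123" and "100/123" ahead of "2/123".
unpadded_sorted = sorted(unpadded)

print(padded_sorted)
print(unpadded_sorted)
```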
The ``subsystem`` in the email's Subject should identify which
area or subsystem of the kernel is being patched.
The ``summary phrase`` in the email's Subject should concisely
describe the patch which that email contains. The ``summary
phrase`` should not be a filename. Do not use the same ``summary
phrase`` for every patch in a whole patch series (where a ``patch
series`` is an ordered sequence of multiple, related patches).
Bear in mind that the ``summary phrase`` of your email becomes a
globally-unique identifier for that patch. It propagates all the way
into the ``git`` changelog. The ``summary phrase`` may later be used in
developer discussions which refer to the patch. People will want to
google for the ``summary phrase`` to read discussion regarding that
patch. It will also be the only thing that people may quickly see
when, two or three months later, they are going through perhaps
thousands of patches using tools such as ``gitk`` or ``git log
--oneline``.
For these reasons, the ``summary`` must be no more than 70-75
characters, and it must describe both what the patch changes and
why the patch might be necessary. It is challenging to be both
succinct and descriptive, but that is what a well-written summary
should do.
The ``summary phrase`` may be prefixed by tags enclosed in square
brackets: "Subject: [PATCH <tag>...] <summary phrase>". The tags are
not considered part of the summary phrase, but describe how the patch
should be treated. Common tags might include a version descriptor if
multiple versions of the patch have been sent out in response to
comments (e.g., "v1, v2, v3"), or "RFC" to indicate a request for
comments. If there are four patches in a patch series the individual
patches may be numbered like this: 1/4, 2/4, 3/4, 4/4. This assures
that developers understand the order in which the patches should be
applied and that they have reviewed or applied all of the patches in
the patch series.
A couple of example Subjects::
Subject: [PATCH 2/5] ext2: improve scalability of bitmap searching
Subject: [PATCH v2 01/27] x86: fix eflags tracking
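To see how the pieces of a subject line fit together, the following sketch splits one into its tags, subsystem and summary phrase. It is only an illustration: ``parse_subject`` is a hypothetical helper, and the pattern assumes the exact layout of the two examples above (a single bracketed group beginning with ``PATCH``, then ``subsystem: summary``).

```python
import re

# Assumed layout: "Subject: [PATCH <tags>] subsystem: summary phrase".
SUBJECT_RE = re.compile(
    r"^Subject: \[PATCH(?P<tags>[^\]]*)\] (?P<subsystem>[^:]+): (?P<summary>.+)$"
)


def parse_subject(line: str):
    """Return a dict describing the subject, or None if it does not match."""
    match = SUBJECT_RE.match(line)
    if match is None:
        return None
    info = {
        "subsystem": match["subsystem"],
        "summary": match["summary"],
        "version": None,
        "number": None,
        "total": None,
        "other_tags": [],
    }
    for tag in match["tags"].split():
        seq = re.fullmatch(r"(\d+)/(\d+)", tag)
        if re.fullmatch(r"v\d+", tag):
            info["version"] = int(tag[1:])       # e.g. "v2" -> 2
        elif seq:
            info["number"], info["total"] = int(seq[1]), int(seq[2])
        else:
            info["other_tags"].append(tag)       # e.g. "RFC"
    return info


parsed = parse_subject("Subject: [PATCH v2 01/27] x86: fix eflags tracking")
print(parsed)
```

Real subjects vary more than this pattern allows (for example, nested subsystem prefixes), so treat a ``None`` result as "unrecognized", not "wrong".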
The ``from`` line must be the very first line in the message body,
and has the form::
From: Original Author <author@example.com>
The ``from`` line specifies who will be credited as the author of the
patch in the permanent changelog. If the ``from`` line is missing,
then the ``From:`` line from the email header will be used to determine
the patch author in the changelog.
The explanation body will be committed to the permanent source
changelog, so should make sense to a competent reader who has long
since forgotten the immediate details of the discussion that might
have led to this patch. Including symptoms of the failure which the
patch addresses (kernel log messages, oops messages, etc.) is
especially useful for people who might be searching the commit logs
looking for the applicable patch. If a patch fixes a compile failure,
it may not be necessary to include _all_ of the compile failures; just
enough that it is likely that someone searching for the patch can find
it. As in the ``summary phrase``, it is important to be both succinct and
descriptive.
The ``---`` marker line serves the essential purpose of marking for patch
handling tools where the changelog message ends.
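As a rough illustration of what the marker is for, this sketch separates a message body into the part destined for the changelog and the part that follows the marker. It is a simplification, not how any particular tool is implemented: ``split_at_marker`` is a hypothetical helper, and it assumes the marker is the first line consisting solely of ``---``.

```python
def split_at_marker(body: str) -> tuple[str, str]:
    """Split a message body at the first line that is exactly '---'."""
    lines = body.splitlines()
    for index, line in enumerate(lines):
        if line.rstrip() == "---":
            changelog = "\n".join(lines[:index])
            remainder = "\n".join(lines[index + 1:])
            return changelog, remainder
    # No marker found: treat the whole body as changelog text.
    return body, ""


body = """\
Explain the problem and the fix here.

Signed-off-by: Random J Developer <random@developer.example.org>
---
v2: rebased on top of the latest tree

 fs/ext2/balloc.c | 4 ++--
"""

changelog, remainder = split_at_marker(body)
print(changelog)
```

Lines such as ``--- a/fs/ext2/balloc.c`` inside the diff do not match, because only a line containing the three dashes alone is treated as the marker here.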
One good use for the additional comments after the ``---`` marker is for
a ``diffstat``, to show what files have changed, and the number of
inserted and deleted lines per file. A ``diffstat`` is especially useful
on bigger patches. Other comments relevant only to the moment or the
maintainer, not suitable for the permanent changelog, should also go
here. A good example of such comments might be ``patch changelogs``
which describe what has changed between the v1 and v2 version of the
patch.
If you are going to include a ``diffstat`` after the ``---`` marker, please
use ``diffstat`` options ``-p 1 -w 70`` so that filenames are listed from
the top of the kernel source tree and don't use too much horizontal
space (easily fit in 80 columns, maybe with some indentation). (``git``
generates appropriate diffstats by default.)
See more details on the proper patch format in the following
references.
.. _explicit_in_reply_to:
15) Explicit In-Reply-To headers
--------------------------------
It can be helpful to manually add In-Reply-To: headers to a patch
(e.g., when using ``git send-email``) to associate the patch with
previous relevant discussion, e.g. to link a bug fix to the email with
the bug report. However, for a multi-patch series, it is generally
best to avoid using In-Reply-To: to link to older versions of the
series. This way multiple versions of the patch don't become an
unmanageable forest of references in email clients. If a link is
helpful, you can use the https://lkml.kernel.org/ redirector (e.g., in
the cover email text) to link to an earlier version of the patch series.
16) Sending ``git pull`` requests
---------------------------------
If you have a series of patches, it may be most convenient to have the
maintainer pull them directly into the subsystem repository with a
``git pull`` operation. Note, however, that pulling patches from a developer
requires a higher degree of trust than taking patches from a mailing list.
As a result, many subsystem maintainers are reluctant to take pull
requests, especially from new, unknown developers. If in doubt you can use
the pull request as the cover letter for a normal posting of the patch
series, giving the maintainer the option of using either.
A pull request should have [GIT] or [PULL] in the subject line. The
request itself should include the repository name and the branch of
interest on a single line; it should look something like::
Please pull from
git://jdelvare.pck.nerim.net/jdelvare-2.6 i2c-for-linus
to get these changes:
A pull request should also include an overall message saying what will be
included in the request, a ``git shortlog`` listing of the patches
themselves, and a ``diffstat`` showing the overall effect of the patch series.
The easiest way to get all this information together is, of course, to let
``git`` do it for you with the ``git request-pull`` command.
Some maintainers (including Linus) want to see pull requests from signed
commits; that increases their confidence that the request actually came
from you. Linus, in particular, will not pull from public hosting sites
like GitHub in the absence of a signed tag.
The first step toward creating such tags is to make a GNUPG key and get it
signed by one or more core kernel developers. This step can be hard for
new developers, but there is no way around it. Attending conferences can
be a good way to find developers who can sign your key.
Once you have prepared a patch series in ``git`` that you wish to have somebody
pull, create a signed tag with ``git tag -s``. This will create a new tag
identifying the last commit in the series and containing a signature
created with your private key. You will also have the opportunity to add a
changelog-style message to the tag; this is an ideal place to describe the
effects of the pull request as a whole.
If the tree the maintainer will be pulling from is not the repository you
are working from, don't forget to push the signed tag explicitly to the
public tree.
When generating your pull request, use the signed tag as the target. A
command like this will do the trick::
git request-pull master git://my.public.tree/linux.git my-signed-tag
REFERENCES
**********
Andrew Morton, "The perfect patch" (tpp).
<http://www.ozlabs.org/~akpm/stuff/tpp.txt>
Jeff Garzik, "Linux kernel patch submission format".
<http://linux.yyz.us/patch-format.html>
Greg Kroah-Hartman, "How to piss off a kernel subsystem maintainer".
<http://www.kroah.com/log/linux/maintainer.html>
<http://www.kroah.com/log/linux/maintainer-02.html>
<http://www.kroah.com/log/linux/maintainer-03.html>
<http://www.kroah.com/log/linux/maintainer-04.html>
<http://www.kroah.com/log/linux/maintainer-05.html>
<http://www.kroah.com/log/linux/maintainer-06.html>
NO!!!! No more huge patch bombs to linux-kernel@vger.kernel.org people!
<https://lkml.org/lkml/2005/7/11/336>
Kernel Documentation/CodingStyle:
:ref:`Documentation/CodingStyle <codingstyle>`
Linus Torvalds's mail on the canonical patch format:
<http://lkml.org/lkml/2005/4/7/183>
Andi Kleen, "On submitting kernel patches"
Some strategies to get difficult or controversial changes in.
http://halobates.de/on-submitting-patches.pdf
This file has moved to process/submitting-patches.rst

@ -1,39 +0,0 @@
Software cursor for VGA by Pavel Machek <pavel@atrey.karlin.mff.cuni.cz>
======================= and Martin Mares <mj@atrey.karlin.mff.cuni.cz>
Linux now has some ability to manipulate cursor appearance. Normally, you
can set the size of hardware cursor (and also work around some ugly bugs in
those miserable Trident cards--see #define TRIDENT_GLITCH in drivers/video/
vgacon.c). You can now play a few new tricks: you can make your cursor look
like a non-blinking red block, make it inverse background of the character it's
over or to highlight that character and still choose whether the original
hardware cursor should remain visible or not. There may be other things I have
never thought of.
The cursor appearance is controlled by a "<ESC>[?1;2;3c" escape sequence
where 1, 2 and 3 are parameters described below. If you omit any of them,
they will default to zeroes.
Parameter 1 specifies cursor size (0=default, 1=invisible, 2=underline, ...,
8=full block) + 16 if you want the software cursor to be applied + 32 if you
want to always change the background color + 64 if you dislike having the
background the same as the foreground. Highlights are ignored for the last two
flags.
The second parameter selects character attribute bits you want to change
(by simply XORing them with the value of this parameter). On standard VGA,
the high four bits specify background and the low four the foreground. In both
groups, low three bits set color (as in normal color codes used by the console)
and the most significant one turns on highlight (or sometimes blinking--it
depends on the configuration of your VGA).
The third parameter consists of character attribute bits you want to set.
Bit setting takes place before bit toggling, so you can simply clear a bit by
including it in both the set mask and the toggle mask.
Examples:
=========
To get normal blinking underline, use: echo -e '\033[?2c'
To get blinking block, use: echo -e '\033[?6c'
To get red non-blinking block, use: echo -e '\033[?17;0;64c'
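
As a rough illustration of how the three parameters combine, the shell
function below builds the escape sequence from its parts. The function name
``vga_cursor_seq`` and its argument order are made up for this sketch; only the
arithmetic (cursor size plus 16 for the software cursor) and the
``<ESC>[?1;2;3c`` layout come from the description above.

```shell
#!/bin/sh
# vga_cursor_seq SIZE SOFTWARE TOGGLE_MASK SET_MASK
#   SIZE         cursor size (0=default, 1=invisible, 2=underline, ..., 8=full block)
#   SOFTWARE     1 to request the software cursor (adds 16 to parameter 1), else 0
#   TOGGLE_MASK  attribute bits to XOR (second parameter)
#   SET_MASK     attribute bits to set (third parameter)
# Hypothetical helper: it only prints the sequence, it does not check the values.
vga_cursor_seq() {
    size=$1
    soft=$2
    toggle=$3
    setmask=$4

    param1=$size
    if [ "$soft" -eq 1 ]; then
        param1=$((param1 + 16))
    fi

    # \033 is the ESC character.
    printf '\033[?%d;%d;%dc' "$param1" "$toggle" "$setmask"
}
```

Under these assumptions, ``vga_cursor_seq 1 1 0 64`` emits the same bytes as the
red non-blinking block example: size 1 plus 16 gives 17, nothing is toggled,
and the set mask 64 is bit 6, which lies in the background group of the
attribute byte.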


@ -101,6 +101,6 @@ received a notification, it will set the backlight level accordingly. This does
not affect the sending of event to user space, they are always sent to user
space regardless of whether or not the video module controls the backlight level
directly. This behaviour can be controlled through the brightness_switch_enabled
module parameter as documented in kernel-parameters.txt. It is recommended to
module parameter as documented in admin-guide/kernel-parameters.rst. It is recommended to
disable this behaviour once a GUI environment starts up and wants to have full
control of the backlight level.


@ -0,0 +1,411 @@
Linux kernel release 4.x <http://kernel.org/>
=============================================
These are the release notes for Linux version 4. Read them carefully,
as they tell you what this is all about, explain how to install the
kernel, and what to do if something goes wrong.
What is Linux?
--------------
Linux is a clone of the operating system Unix, written from scratch by
Linus Torvalds with assistance from a loosely-knit team of hackers across
the Net. It aims towards POSIX and Single UNIX Specification compliance.
It has all the features you would expect in a modern fully-fledged Unix,
including true multitasking, virtual memory, shared libraries, demand
loading, shared copy-on-write executables, proper memory management,
and multistack networking including IPv4 and IPv6.
It is distributed under the GNU General Public License - see the
accompanying COPYING file for more details.
On what hardware does it run?
-----------------------------
Although originally developed for 32-bit x86-based PCs (386 or higher),
today Linux also runs on (at least) the Compaq Alpha AXP, Sun SPARC and
UltraSPARC, Motorola 68000, PowerPC, PowerPC64, ARM, Hitachi SuperH, Cell,
IBM S/390, MIPS, HP PA-RISC, Intel IA-64, DEC VAX, AMD x86-64, AXIS CRIS,
Xtensa, Tilera TILE, AVR32, ARC and Renesas M32R architectures.
Linux is easily portable to most general-purpose 32- or 64-bit architectures
as long as they have a paged memory management unit (PMMU) and a port of the
GNU C compiler (gcc) (part of The GNU Compiler Collection, GCC). Linux has
also been ported to a number of architectures without a PMMU, although
functionality is then obviously somewhat limited.
Linux has also been ported to itself. You can now run the kernel as a
userspace application - this is called UserMode Linux (UML).
Documentation
-------------
- There is a lot of documentation available both in electronic form on
the Internet and in books, both Linux-specific and pertaining to
general UNIX questions. I'd recommend looking into the documentation
subdirectories on any Linux FTP site for the LDP (Linux Documentation
Project) books. This README is not meant to be documentation on the
system: there are much better sources available.
- There are various README files in the Documentation/ subdirectory:
these typically contain kernel-specific installation notes for some
drivers for example. See Documentation/00-INDEX for a list of what
is contained in each file. Please read the
:ref:`Documentation/process/changes.rst <changes>` file, as it
contains information about the problems that may result from upgrading
your kernel.
- The Documentation/DocBook/ subdirectory contains several guides for
kernel developers and users. These guides can be rendered in a
number of formats: PostScript (.ps), PDF, HTML, & man-pages, among others.
After installation, ``make psdocs``, ``make pdfdocs``, ``make htmldocs``,
or ``make mandocs`` will render the documentation in the requested format.
Installing the kernel source
----------------------------
- If you install the full sources, put the kernel tarball in a
directory where you have permissions (e.g. your home directory) and
unpack it::
xz -cd linux-4.X.tar.xz | tar xvf -
Replace "X" with the version number of the latest kernel.
Do NOT use the /usr/src/linux area! This area has a (usually
incomplete) set of kernel headers that are used by the library header
files. They should match the library, and not get messed up by
whatever the kernel-du-jour happens to be.
- You can also upgrade between 4.x releases by patching. Patches are
distributed in the xz format. To install by patching, get all the
newer patch files, enter the top level directory of the kernel source
(linux-4.X) and execute::
xz -cd ../patch-4.x.xz | patch -p1
Apply the patch for each version "x" newer than the version "X" of your current
source tree, **in order**, and you should be ok. You may want to remove
the backup files (some-file-name~ or some-file-name.orig), and make sure
that there are no failed patches (some-file-name# or some-file-name.rej).
If there are, either you or I have made a mistake.
Unlike patches for the 4.x kernels, patches for the 4.x.y kernels
(also known as the -stable kernels) are not incremental but instead apply
directly to the base 4.x kernel. For example, if your base kernel is 4.0
and you want to apply the 4.0.3 patch, you must not first apply the 4.0.1
and 4.0.2 patches. Similarly, if you are running kernel version 4.0.2 and
want to jump to 4.0.3, you must first reverse the 4.0.2 patch (that is,
patch -R) **before** applying the 4.0.3 patch. You can read more on this in
:ref:`Documentation/process/applying-patches.rst <applying_patches>`.
Alternatively, the script patch-kernel can be used to automate this
process. It determines the current kernel version and applies any
patches found::
linux/scripts/patch-kernel linux
The first argument in the command above is the location of the
kernel source. Patches are applied from the current directory, but
an alternative directory can be specified as the second argument.
- Make sure you have no stale .o files and dependencies lying around::
cd linux
make mrproper
You should now have the sources correctly installed.
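
The rule for -stable patches described above (reverse the current 4.x.y
patch, then apply the target one directly on top of the base) can be sketched as
a small helper that prints the commands instead of running them. The function
name ``stable_patch_plan`` is hypothetical, and the patch file names are an
assumption that simply follows the ``../patch-4.x.xz`` pattern used earlier.

```shell
#!/bin/sh
# stable_patch_plan BASE CURRENT TARGET
#   BASE     base kernel version, e.g. 4.0
#   CURRENT  stable sublevel the tree is at now (0 means the plain base kernel)
#   TARGET   stable sublevel to move to
# Prints the commands to run from the top of the source tree; runs nothing.
stable_patch_plan() {
    base=$1
    current=$2
    target=$3

    # Stable patches are not incremental: get back to the base kernel first.
    if [ "$current" -gt 0 ]; then
        echo "xz -cd ../patch-$base.$current.xz | patch -R -p1"
    fi

    # The target patch applies directly to the base kernel.
    echo "xz -cd ../patch-$base.$target.xz | patch -p1"
}
```

For instance, ``stable_patch_plan 4.0 2 3`` prints a ``patch -R`` line for 4.0.2
followed by a ``patch -p1`` line for 4.0.3, while ``stable_patch_plan 4.0 0 3``
prints only the latter.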
Software requirements
---------------------
Compiling and running the 4.x kernels requires up-to-date
versions of various software packages. Consult
:ref:`Documentation/process/changes.rst <changes>` for the minimum version numbers
required and how to get updates for these packages. Beware that using
excessively old versions of these packages can cause indirect
errors that are very difficult to track down, so don't assume that
you can just update packages when obvious problems arise during
build or operation.
Build directory for the kernel
------------------------------
When compiling the kernel, all output files will by default be
stored together with the kernel source code.
Using the option ``make O=output/dir`` allows you to specify an alternate
place for the output files (including .config).
Example::
kernel source code: /usr/src/linux-4.X
build directory: /home/name/build/kernel
To configure and build the kernel, use::
cd /usr/src/linux-4.X
make O=/home/name/build/kernel menuconfig
make O=/home/name/build/kernel
sudo make O=/home/name/build/kernel modules_install install
Please note: If the ``O=output/dir`` option is used, then it must be
used for all invocations of make.
Configuring the kernel
----------------------
Do not skip this step even if you are only upgrading one minor
version. New configuration options are added in each release, and
odd problems will turn up if the configuration files are not set up
as expected. If you want to carry your existing configuration to a
new version with minimal work, use ``make oldconfig``, which will
only ask you for the answers to new questions.
- Alternative configuration commands are::
"make config" Plain text interface.
"make menuconfig" Text based color menus, radiolists & dialogs.
"make nconfig" Enhanced text based color menus.
"make xconfig" Qt based configuration tool.
"make gconfig" GTK+ based configuration tool.
"make oldconfig" Default all questions based on the contents of
your existing ./.config file and asking about
new config symbols.
"make silentoldconfig"
Like above, but avoids cluttering the screen
with questions already answered.
Additionally updates the dependencies.
"make olddefconfig"
Like above, but sets new symbols to their default
values without prompting.
"make defconfig" Create a ./.config file by using the default
symbol values from either arch/$ARCH/defconfig
or arch/$ARCH/configs/${PLATFORM}_defconfig,
depending on the architecture.
"make ${PLATFORM}_defconfig"
Create a ./.config file by using the default
symbol values from
arch/$ARCH/configs/${PLATFORM}_defconfig.
Use "make help" to get a list of all available
platforms of your architecture.
"make allyesconfig"
Create a ./.config file by setting symbol
values to 'y' as much as possible.
"make allmodconfig"
Create a ./.config file by setting symbol
values to 'm' as much as possible.
"make allnoconfig" Create a ./.config file by setting symbol
values to 'n' as much as possible.
"make randconfig" Create a ./.config file by setting symbol
values to random values.
"make localmodconfig" Create a config based on current config and
loaded modules (lsmod). Disables any module
option that is not needed for the loaded modules.
To create a localmodconfig for another machine,
store the lsmod of that machine into a file
and pass it in as a LSMOD parameter.
target$ lsmod > /tmp/mylsmod
target$ scp /tmp/mylsmod host:/tmp
host$ make LSMOD=/tmp/mylsmod localmodconfig
The above also works when cross compiling.
"make localyesconfig" Similar to localmodconfig, except it will convert
all module options to built in (=y) options.
You can find more information on using the Linux kernel config tools
in Documentation/kbuild/kconfig.txt.
- NOTES on ``make config``:
- Having unnecessary drivers will make the kernel bigger, and can
under some circumstances lead to problems: probing for a
nonexistent controller card may confuse your other controllers.
- A kernel with math-emulation compiled in will still use the
coprocessor if one is present: the math emulation will just
never get used in that case. The kernel will be slightly larger,
but will work on different machines regardless of whether they
have a math coprocessor or not.
- The "kernel hacking" configuration details usually result in a
bigger or slower kernel (or both), and can even make the kernel
less stable by configuring some routines to actively try to
break bad code to find kernel problems (kmalloc()). Thus you
should probably answer 'n' to the questions for "development",
"experimental", or "debugging" features.
Compiling the kernel
--------------------
- Make sure you have at least gcc 3.2 available.
For more information, refer to :ref:`Documentation/process/changes.rst <changes>`.
Please note that you can still run a.out user programs with this kernel.
- Do a ``make`` to create a compressed kernel image. It is also
possible to do ``make install`` if you have lilo installed to suit the
kernel makefiles, but you may want to check your particular lilo setup first.
To do the actual install, you have to be root, but none of the normal
build should require that. Don't take the name of root in vain.
- If you configured any of the parts of the kernel as ``modules``, you
will also have to do ``make modules_install``.
- Verbose kernel compile/build output:
Normally, the kernel build system runs in a fairly quiet mode (but not
totally silent). However, sometimes you or other kernel developers need
to see compile, link, or other commands exactly as they are executed.
For this, use "verbose" build mode. This is done by passing
``V=1`` to the ``make`` command, e.g.::
make V=1 all
To have the build system also tell the reason for the rebuild of each
target, use ``V=2``. The default is ``V=0``.
- Keep a backup kernel handy in case something goes wrong. This is
especially true for the development releases, since each new release
contains new code which has not been debugged. Make sure you keep a
backup of the modules corresponding to that kernel, as well. If you
are installing a new kernel with the same version number as your
working kernel, make a backup of your modules directory before you
do a ``make modules_install``.
Alternatively, before compiling, use the kernel config option
"LOCALVERSION" to append a unique suffix to the regular kernel version.
LOCALVERSION can be set in the "General Setup" menu.
- In order to boot your new kernel, you'll need to copy the kernel
image (e.g. .../linux/arch/x86/boot/bzImage after compilation)
to the place where your regular bootable kernel is found.
- Booting a kernel directly from a floppy without the assistance of a
bootloader such as LILO, is no longer supported.
If you boot Linux from the hard drive, chances are you use LILO, which
uses the kernel image as specified in the file /etc/lilo.conf. The
kernel image file is usually /vmlinuz, /boot/vmlinuz, /bzImage or
/boot/bzImage. To use the new kernel, save a copy of the old image
and copy the new image over the old one. Then, you MUST RERUN LILO
to update the loading map! If you don't, you won't be able to boot
the new kernel image.
Reinstalling LILO is usually a matter of running /sbin/lilo.
You may wish to edit /etc/lilo.conf to specify an entry for your
old kernel image (say, /vmlinux.old) in case the new one does not
work. See the LILO docs for more information.
After reinstalling LILO, you should be all set. Shut down the system,
reboot, and enjoy!
If you ever need to change the default root device, video mode,
ramdisk size, etc. in the kernel image, use the ``rdev`` program (or
alternatively the LILO boot options when appropriate). No need to
recompile the kernel to change these parameters.
- Reboot with the new kernel and enjoy.
If something goes wrong
-----------------------
- If you have problems that seem to be due to kernel bugs, please check
the file MAINTAINERS to see if there is a particular person associated
with the part of the kernel that you are having trouble with. If there
isn't anyone listed there, then the second best thing is to mail
them to me (torvalds@linux-foundation.org), and possibly to any other
relevant mailing-list or to the newsgroup.
- In all bug-reports, *please* tell what kernel you are talking about,
how to duplicate the problem, and what your setup is (use your common
sense). If the problem is new, tell me so, and if the problem is
old, please try to tell me when you first noticed it.
- If the bug results in a message like::
unable to handle kernel paging request at address C0000010
Oops: 0002
EIP: 0010:XXXXXXXX
eax: xxxxxxxx ebx: xxxxxxxx ecx: xxxxxxxx edx: xxxxxxxx
esi: xxxxxxxx edi: xxxxxxxx ebp: xxxxxxxx
ds: xxxx es: xxxx fs: xxxx gs: xxxx
Pid: xx, process nr: xx
xx xx xx xx xx xx xx xx xx xx
or similar kernel debugging information on your screen or in your
system log, please duplicate it *exactly*. The dump may look
incomprehensible to you, but it does contain information that may
help debugging the problem. The text above the dump is also
important: it tells something about why the kernel dumped code (in
the above example, it's due to a bad kernel pointer). More information
on making sense of the dump is in Documentation/admin-guide/oops-tracing.rst.
- If you compiled the kernel with CONFIG_KALLSYMS you can send the dump
as is, otherwise you will have to use the ``ksymoops`` program to make
sense of the dump (but compiling with CONFIG_KALLSYMS is usually preferred).
This utility can be downloaded from
ftp://ftp.<country>.kernel.org/pub/linux/utils/kernel/ksymoops/ .
Alternatively, you can do the dump lookup by hand:
- In debugging dumps like the above, it helps enormously if you can
look up what the EIP value means. The hex value as such doesn't help
me or anybody else very much: it will depend on your particular
kernel setup. What you should do is take the hex value from the EIP
line (ignore the ``0010:``), and look it up in the kernel namelist to
see which kernel function contains the offending address.
To find out the kernel function name, you'll need to find the system
binary associated with the kernel that exhibited the symptom. This is
the file 'linux/vmlinux'. To extract the namelist and match it against
the EIP from the kernel crash, do::
nm vmlinux | sort | less
This will give you a list of kernel addresses sorted in ascending
order, from which it is simple to find the function that contains the
offending address. Note that the address given by the kernel
debugging messages will not necessarily match exactly with the
function addresses (in fact, that is very unlikely), so you can't
just 'grep' the list: the list will, however, give you the starting
point of each kernel function, so by looking for the function that
has a starting address lower than the one you are searching for but
is followed by a function with a higher address you will find the one
you want. In fact, it may be a good idea to include a bit of
"context" in your problem report, giving a few lines around the
interesting one.
If you for some reason cannot do the above (you have a pre-compiled
kernel image or similar), telling me as much about your setup as
possible will help. Please read the :ref:`admin-guide/reporting-bugs.rst <reportingbugs>`
document for details.
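
The manual namelist lookup described above can be sketched as a small filter
over ``nm`` output. The helper name ``lookup_symbol`` is hypothetical; the
sketch assumes the addresses printed by ``nm`` are zero-padded lowercase hex of
a single width (so that plain string comparison orders them correctly) and that
the address you pass is not wider than those.

```shell
#!/bin/sh
# lookup_symbol ADDR
# Reads nm-style lines ("address type name") on stdin and prints the name of
# the last symbol whose start address is lower than or equal to ADDR.
lookup_symbol() {
    # Strip an optional 0x prefix and lowercase the hex digits.
    addr=$(printf '%s' "$1" | sed 's/^0[xX]//' | tr 'A-F' 'a-f')

    LC_ALL=C sort | LC_ALL=C awk -v target="$addr" '
        {
            # Pad the target to the width of the listed addresses.
            t = target
            while (length(t) < length($1)) t = "0" t
        }
        # Appending "" forces a string comparison rather than a numeric one.
        NF == 3 && ($1 "") <= (t "") { best = $3 }
        END { if (best != "") print best }
    '
}
```

A typical invocation would be ``nm vmlinux | lookup_symbol 0xXXXXXXXX``, with the
placeholder replaced by the EIP value. Looking at the neighbouring entries in
the sorted list by hand is still worthwhile for the "context" mentioned above.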
- Alternatively, you can use gdb on a running kernel. (read-only; i.e. you
cannot change values or set break points.) To do this, first compile the
kernel with -g; edit arch/x86/Makefile appropriately, then do a ``make
clean``. You'll also need to enable CONFIG_PROC_FS (via ``make config``).
After you've rebooted with the new kernel, do ``gdb vmlinux /proc/kcore``.
You can now use all the usual gdb commands. The command to look up the
point where your system crashed is ``l *0xXXXXXXXX``. (Replace the XXXes
with the EIP value.)
gdb'ing a non-running kernel currently fails because ``gdb`` (wrongly)
disregards the starting offset for which the kernel is compiled.


@ -0,0 +1,151 @@
Kernel Support for miscellaneous (your favourite) Binary Formats v1.1
=====================================================================
This Kernel feature allows you to invoke almost (for restrictions see below)
every program by simply typing its name in the shell.
This includes for example compiled Java(TM), Python or Emacs programs.
To achieve this you must tell binfmt_misc which interpreter has to be invoked
with which binary. Binfmt_misc recognises the binary-type by matching some bytes
at the beginning of the file with a magic byte sequence (masking out specified
bits) you have supplied. Binfmt_misc can also recognise a filename extension
such as ``.com`` or ``.exe``.
First you must mount binfmt_misc::
mount binfmt_misc -t binfmt_misc /proc/sys/fs/binfmt_misc
To actually register a new binary type, you have to set up a string looking like
``:name:type:offset:magic:mask:interpreter:flags`` (where you can choose the
``:`` upon your needs) and echo it to ``/proc/sys/fs/binfmt_misc/register``.
Here is what the fields mean:
- ``name``
is an identifier string. A new /proc file will be created with this
name below ``/proc/sys/fs/binfmt_misc``; it cannot contain slashes ``/`` for
obvious reasons.
- ``type``
is the type of recognition. Give ``M`` for magic and ``E`` for extension.
- ``offset``
is the offset of the magic/mask in the file, counted in bytes. This
defaults to 0 if you omit it (i.e. you write ``:name:type::magic...``).
Ignored when using filename extension matching.
- ``magic``
is the byte sequence binfmt_misc is matching for. The magic string
may contain hex-encoded characters like ``\x0a`` or ``\xA4``. Note that you
must escape any NUL bytes; parsing halts at the first one. In a shell
environment you might have to write ``\\x0a`` to prevent the shell from
eating your ``\``.
If you chose filename extension matching, this is the extension to be
recognised (without the ``.``, the ``\x0a`` specials are not allowed).
Extension matching is case sensitive, and slashes ``/`` are not allowed!
- ``mask``
is an (optional, defaults to all 0xff) mask. You can mask out some
bits from matching by supplying a string like magic and as long as magic.
The mask is anded with the byte sequence of the file. Note that you must
escape any NUL bytes; parsing halts at the first one. Ignored when using
filename extension matching.
- ``interpreter``
is the program that should be invoked with the binary as first
argument (specify the full path).
- ``flags``
is an optional field that controls several aspects of the invocation
of the interpreter. It is a string of capital letters, each of which controls
a certain aspect. The following flags are supported:
``P`` - preserve-argv[0]
Legacy behavior of binfmt_misc is to overwrite
the original argv[0] with the full path to the binary. When this
flag is included, binfmt_misc will add an argument to the argument
vector for this purpose, thus preserving the original ``argv[0]``.
e.g. If your interp is set to ``/bin/foo`` and you run ``blah``
(which is in ``/usr/local/bin``), then the kernel will execute
``/bin/foo`` with ``argv[]`` set to
``["/bin/foo", "/usr/local/bin/blah", "blah"]``. The interp has to be
aware of this so it can execute ``/usr/local/bin/blah``
with ``argv[]`` set to ``["blah"]``.
``O`` - open-binary
Legacy behavior of binfmt_misc is to pass the full path
of the binary to the interpreter as an argument. When this flag is
included, binfmt_misc will open the file for reading and pass its
descriptor as an argument, instead of the full path, thus allowing
the interpreter to execute non-readable binaries. This feature
should be used with care - the interpreter has to be trusted not to
emit the contents of the non-readable binary.
``C`` - credentials
Currently, the behavior of binfmt_misc is to calculate
the credentials and security token of the new process according to
the interpreter. When this flag is included, these attributes are
calculated according to the binary. It also implies the ``O`` flag.
This feature should be used with care as the interpreter
will run with root permissions when a setuid binary owned by root
is run with binfmt_misc.
``F`` - fix binary
The usual behaviour of binfmt_misc is to spawn the
binary lazily when the misc format file is invoked. However,
this doesn't work very well in the face of mount namespaces and
changeroots, so the ``F`` mode opens the binary as soon as the
emulation is installed and uses the opened image to spawn the
emulator, meaning it is always available once installed,
regardless of how the environment changes.
There are some restrictions:
- the whole register string may not exceed 1920 characters
- the magic must reside in the first 128 bytes of the file, i.e.
offset+size(magic) has to be less than 128
- the interpreter string may not exceed 127 characters
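
The field layout and some of the restrictions above can be checked before
anything is written to ``register``. The sketch below assembles the string
and rejects a few obviously invalid inputs; the function name ``binfmt_entry`` is
hypothetical, ``:`` is assumed as the separator, and the check on the magic
offset is left out because it would require decoding the ``\x`` escapes.

```shell
#!/bin/sh
# binfmt_entry NAME TYPE OFFSET MAGIC MASK INTERPRETER FLAGS
# Prints ":name:type:offset:magic:mask:interpreter:flags" on success.
# Pass an empty string ('') for fields you want to omit.
binfmt_entry() {
    if [ "$#" -ne 7 ]; then
        echo "binfmt_entry: expected 7 arguments" >&2
        return 2
    fi
    name=$1
    kind=$2
    offset=$3
    magic=$4
    mask=$5
    interp=$6
    flags=$7

    case $name in
        */*) echo "binfmt_entry: name must not contain '/'" >&2; return 1 ;;
    esac
    case $kind in
        M|E) ;;
        *) echo "binfmt_entry: type must be M or E" >&2; return 1 ;;
    esac
    if [ "${#interp}" -gt 127 ]; then
        echo "binfmt_entry: interpreter string exceeds 127 characters" >&2
        return 1
    fi

    entry=":$name:$kind:$offset:$magic:$mask:$interp:$flags"
    if [ "${#entry}" -gt 1920 ]; then
        echo "binfmt_entry: register string exceeds 1920 characters" >&2
        return 1
    fi
    printf '%s\n' "$entry"
}
```

With these assumptions, ``binfmt_entry DOSWin M '' MZ '' /usr/local/bin/wine ''``
prints ``:DOSWin:M::MZ::/usr/local/bin/wine:``, which could then be redirected
to ``/proc/sys/fs/binfmt_misc/register`` by a privileged user.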
To use binfmt_misc you have to mount it first. You can mount it with
``mount -t binfmt_misc none /proc/sys/fs/binfmt_misc`` command, or you can add
a line ``none /proc/sys/fs/binfmt_misc binfmt_misc defaults 0 0`` to your
``/etc/fstab`` so it auto mounts on boot.
You may want to add the binary formats in one of your ``/etc/rc`` scripts during
boot-up. Read the manual of your init program to figure out how to do this
right.
Think about the order of adding entries! Later added entries are matched first!
A few examples (assuming you are in ``/proc/sys/fs/binfmt_misc``):
- enable support for em86 (like binfmt_em86, for Alpha AXP only)::
echo ':i386:M::\x7fELF\x01\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x02\x00\x03:\xff\xff\xff\xff\xff\xfe\xfe\xff\xff\xff\xff\xff\xff\xff\xff\xff\xfb\xff\xff:/bin/em86:' > register
echo ':i486:M::\x7fELF\x01\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x02\x00\x06:\xff\xff\xff\xff\xff\xfe\xfe\xff\xff\xff\xff\xff\xff\xff\xff\xff\xfb\xff\xff:/bin/em86:' > register
- enable support for packed DOS applications (pre-configured dosemu hdimages)::
echo ':DEXE:M::\x0eDEX::/usr/bin/dosexec:' > register
- enable support for Windows executables using wine::
echo ':DOSWin:M::MZ::/usr/local/bin/wine:' > register
For Java support, see Documentation/admin-guide/java.rst.
You can enable/disable binfmt_misc or one binary type by echoing 0 (to disable)
or 1 (to enable) to ``/proc/sys/fs/binfmt_misc/status`` or
``/proc/.../the_name``.
Catting the file tells you the current status of ``binfmt_misc/the_entry``.
You can remove one entry or all entries by echoing -1 to ``/proc/.../the_name``
or ``/proc/sys/fs/binfmt_misc/status``.
Hints
-----
If you want to pass special arguments to your interpreter, you can
write a wrapper script for it. See Documentation/admin-guide/java.rst for an
example.
Your interpreter should NOT look in the PATH for the filename; the kernel
passes it the full filename (or the file descriptor) to use. Using ``$PATH`` can
cause unexpected behaviour and can be a security hazard.
Richard Günther <rguenth@tat.physik.uni-tuebingen.de>


@ -0,0 +1,38 @@
Linux Braille Console
=====================
To get early boot messages on a braille device (before userspace screen
readers can start), you first need to compile the support for the usual serial
console (see :ref:`Documentation/admin-guide/serial-console.rst <serial_console>`), and
for braille device
(in :menuselection:`Device Drivers --> Accessibility support --> Console on braille device`).
Then you need to specify a ``console=brl,...`` option on the kernel command
line. The format is::
console=brl,serial_options...
where ``serial_options...`` are the same as described in
:ref:`Documentation/admin-guide/serial-console.rst <serial_console>`.
So for instance you can use ``console=brl,ttyS0`` if the braille device is
connected to the first serial port, and ``console=brl,ttyS0,115200`` to
override the baud rate to 115200, etc.
By default, the braille device will just show the last kernel message (console
mode). To review previous messages, press the Insert key to switch to the VT
review mode. In review mode, the arrow keys let you browse the VT content, the
:kbd:`PAGE-UP`/:kbd:`PAGE-DOWN` keys go to the top/bottom of the screen, and
the :kbd:`HOME` key goes back
to the cursor, providing a very basic screen reviewing facility.
Sound feedback can be obtained by adding the ``braille_console.sound=1`` kernel
parameter.
For simplicity, only one braille console can be enabled; other uses of
``console=brl,...`` will be discarded. Also note that it does not interfere with
the console selection mechanism described in
:ref:`Documentation/admin-guide/serial-console.rst <serial_console>`.
For now, only the VisioBraille device is supported.
Samuel Thibault <samuel.thibault@ens-lyon.org>


@ -0,0 +1,76 @@
Bisecting a bug
+++++++++++++++
Last updated: 28 October 2016
Introduction
============
Always try the latest kernel from kernel.org and build from source. If you are
not confident in doing that please report the bug to your distribution vendor
instead of to a kernel developer.
Finding bugs is not always easy. Have a go though. If you can't find it don't
give up. Report as much as you have found to the relevant maintainer. See
MAINTAINERS for who that is for the subsystem you have worked on.
Before you submit a bug report read
:ref:`Documentation/admin-guide/reporting-bugs.rst <reportingbugs>`.
Devices not appearing
=====================
Often this is caused by udev/systemd. Check that first before blaming it
on the kernel.
Finding the patch that caused a bug
===================================
Using the provided tools with ``git`` makes finding bugs easy provided the bug
is reproducible.
Steps to do it:
- build the Kernel from its git source
- start bisect with [#f1]_::
$ git bisect start
- mark the broken changeset with::
$ git bisect bad [commit]
- mark a changeset where the code is known to work with::
$ git bisect good [commit]
- rebuild the Kernel and test
- interact with git bisect by using either::
$ git bisect good
or::
$ git bisect bad
depending on whether the bug happened on the changeset you're testing
- After some interactions, git bisect will give you the changeset that
likely caused the bug.
- For example, if you know that the current version is bad, and version
4.8 is good, you could do::
$ git bisect start
$ git bisect bad # Current version is bad
$ git bisect good v4.8
.. [#f1] You can, optionally, provide both good and bad arguments at bisect
start with ``git bisect start [BAD] [GOOD]``.
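
The rebuild-and-test loop in the steps above can also be handed to
``git bisect run``, which marks each changeset from the exit status of a
command: 0 means good, 125 means "cannot be tested, skip", and other codes from
1 to 127 mean bad. The sketch below wraps that convention; the helper name
``bisect_step`` and the ``./reproduce.sh`` script mentioned afterwards are
hypothetical.

```shell
#!/bin/sh
# bisect_step BUILD_CMD TEST_CMD
# Runs BUILD_CMD, then TEST_CMD, and returns a status that git bisect run
# understands:
#   125  the build failed, so this changeset cannot be tested (skip it)
#   1    the build worked but the test failed (bad changeset)
#   0    build and test both succeeded (good changeset)
bisect_step() {
    sh -c "$1" || return 125
    sh -c "$2" || return 1
    return 0
}
```

A wrapper script could end with ``bisect_step 'make' './reproduce.sh'`` followed by
``exit $?``, and be started with ``git bisect run ./wrapper.sh`` once the good and
bad changesets have been marked. This only helps when the bug can be detected
without rebooting into the freshly built kernel, or when the test command takes
care of that itself.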
For further references, please read:
- The man page for ``git-bisect``
- `Fighting regressions with git bisect <https://www.kernel.org/pub/software/scm/git/docs/git-bisect-lk2009.html>`_
- `Fully automated bisecting with "git bisect run" <https://lwn.net/Articles/317154>`_
- `Using Git bisect to figure out when brokenness was introduced <http://webchick.net/node/99>`_


@ -0,0 +1,369 @@
Bug hunting
===========
Kernel bug reports often come with a stack dump like the one below::
------------[ cut here ]------------
WARNING: CPU: 1 PID: 28102 at kernel/module.c:1108 module_put+0x57/0x70
Modules linked in: dvb_usb_gp8psk(-) dvb_usb dvb_core nvidia_drm(PO) nvidia_modeset(PO) snd_hda_codec_hdmi snd_hda_intel snd_hda_codec snd_hwdep snd_hda_core snd_pcm snd_timer snd soundcore nvidia(PO) [last unloaded: rc_core]
CPU: 1 PID: 28102 Comm: rmmod Tainted: P WC O 4.8.4-build.1 #1
Hardware name: MSI MS-7309/MS-7309, BIOS V1.12 02/23/2009
00000000 c12ba080 00000000 00000000 c103ed6a c1616014 00000001 00006dc6
c1615862 00000454 c109e8a7 c109e8a7 00000009 ffffffff 00000000 f13f6a10
f5f5a600 c103ee33 00000009 00000000 00000000 c109e8a7 f80ca4d0 c109f617
Call Trace:
[<c12ba080>] ? dump_stack+0x44/0x64
[<c103ed6a>] ? __warn+0xfa/0x120
[<c109e8a7>] ? module_put+0x57/0x70
[<c109e8a7>] ? module_put+0x57/0x70
[<c103ee33>] ? warn_slowpath_null+0x23/0x30
[<c109e8a7>] ? module_put+0x57/0x70
[<f80ca4d0>] ? gp8psk_fe_set_frontend+0x460/0x460 [dvb_usb_gp8psk]
[<c109f617>] ? symbol_put_addr+0x27/0x50
[<f80bc9ca>] ? dvb_usb_adapter_frontend_exit+0x3a/0x70 [dvb_usb]
[<f80bb3bf>] ? dvb_usb_exit+0x2f/0xd0 [dvb_usb]
[<c13d03bc>] ? usb_disable_endpoint+0x7c/0xb0
[<f80bb48a>] ? dvb_usb_device_exit+0x2a/0x50 [dvb_usb]
[<c13d2882>] ? usb_unbind_interface+0x62/0x250
[<c136b514>] ? __pm_runtime_idle+0x44/0x70
[<c13620d8>] ? __device_release_driver+0x78/0x120
[<c1362907>] ? driver_detach+0x87/0x90
[<c1361c48>] ? bus_remove_driver+0x38/0x90
[<c13d1c18>] ? usb_deregister+0x58/0xb0
[<c109fbb0>] ? SyS_delete_module+0x130/0x1f0
[<c1055654>] ? task_work_run+0x64/0x80
[<c1000fa5>] ? exit_to_usermode_loop+0x85/0x90
[<c10013f0>] ? do_fast_syscall_32+0x80/0x130
[<c1549f43>] ? sysenter_past_esp+0x40/0x6a
---[ end trace 6ebc60ef3981792f ]---
Such stack traces provide enough information to identify the line inside the
Kernel's source code where the bug happened. Depending on the severity of
the issue, it may also contain the word **Oops**, as on this one::
BUG: unable to handle kernel NULL pointer dereference at (null)
IP: [<c06969d4>] iret_exc+0x7d0/0xa59
*pdpt = 000000002258a001 *pde = 0000000000000000
Oops: 0002 [#1] PREEMPT SMP
...
Whether it is an **Oops** or some other sort of stack trace, the offending
line is usually required to identify and handle the bug. Throughout this chapter,
we'll use "Oops" to refer to all kinds of stack traces that need to be analyzed.
.. note::
``ksymoops`` is useless on 2.6 or later. Please use the Oops in its original
format (from ``dmesg``, etc). Ignore any references in this or other docs to
"decoding the Oops" or "running it through ksymoops".
If you post an Oops from 2.6+ that has been run through ``ksymoops``,
people will just tell you to repost it.
Where is the Oops message located?
----------------------------------
Normally the Oops text is read from the kernel buffers by klogd and
handed to ``syslogd`` which writes it to a syslog file, typically
``/var/log/messages`` (depends on ``/etc/syslog.conf``). On systems with
systemd, it may also be stored by the ``journald`` daemon, and accessed
by running the ``journalctl`` command.
Sometimes ``klogd`` dies, in which case you can run ``dmesg > file`` to
read the data from the kernel buffers and save it. Alternatively, you can run
``cat /proc/kmsg > file``; however, you have to interrupt it to stop the
transfer, because ``kmsg`` is a "never ending file".
If the machine has crashed so badly that you cannot enter commands or
the disk is not available then you have three options:
(1) Hand copy the text from the screen and type it in after the machine
has restarted. Messy but it is the only option if you have not
planned for a crash. Alternatively, you can take a picture of
the screen with a digital camera - not nice, but better than
nothing. If the messages scroll off the top of the console, you
may find that booting with a higher resolution (e.g., ``vga=791``)
will allow you to read more of the text. (Caveat: This needs ``vesafb``,
so won't help for 'early' oopses)
(2) Boot with a serial console (see
:ref:`Documentation/admin-guide/serial-console.rst <serial_console>`),
run a null modem to a second machine and capture the output there
using your favourite communication program. Minicom works well.
(3) Use Kdump (see Documentation/kdump/kdump.txt) and
extract the kernel ring buffer from old memory using the dmesg
gdbmacro in Documentation/kdump/gdbmacros.txt.
Finding the bug's location
--------------------------
Reporting a bug works best if you point to the location of the bug in the
Kernel source file. There are two methods for doing that. Usually, using
``gdb`` is easier, but the Kernel should be pre-compiled with debug info.
gdb
^^^
The GNU debugger (``gdb``) is the best way to figure out the exact file and line
number of the OOPS from the ``vmlinux`` file.
The usage of gdb works best on a kernel compiled with ``CONFIG_DEBUG_INFO``.
This can be set by running::
$ ./scripts/config -d COMPILE_TEST -e DEBUG_KERNEL -e DEBUG_INFO
On a kernel compiled with ``CONFIG_DEBUG_INFO``, you can simply copy the
EIP value from the OOPS::
EIP: 0060:[<c021e50e>] Not tainted VLI
And use GDB to translate that to human-readable form::
$ gdb vmlinux
(gdb) l *0xc021e50e
If you don't have ``CONFIG_DEBUG_INFO`` enabled, use the function
offset from the OOPS::
EIP is at vt_ioctl+0xda8/0x1482
And recompile the kernel with ``CONFIG_DEBUG_INFO`` enabled::
$ ./scripts/config -d COMPILE_TEST -e DEBUG_KERNEL -e DEBUG_INFO
$ make vmlinux
$ gdb vmlinux
(gdb) l *vt_ioctl+0xda8
0x1888 is in vt_ioctl (drivers/tty/vt/vt_ioctl.c:293).
288 {
289 struct vc_data *vc = NULL;
290 int ret = 0;
291
292 console_lock();
293 if (VT_BUSY(vc_num))
294 ret = -EBUSY;
295 else if (vc_num)
296 vc = vc_deallocate(vc_num);
297 console_unlock();
or, if you want to be more verbose::
(gdb) p vt_ioctl
$1 = {int (struct tty_struct *, unsigned int, unsigned long)} 0xae0 <vt_ioctl>
(gdb) l *0xae0+0xda8
You could, instead, use the object file::
$ make drivers/tty/
$ gdb drivers/tty/vt/vt_ioctl.o
(gdb) l *vt_ioctl+0xda8
If you have a call trace, such as::
Call Trace:
[<ffffffff8802c8e9>] :jbd:log_wait_commit+0xa3/0xf5
[<ffffffff810482d9>] autoremove_wake_function+0x0/0x2e
[<ffffffff8802770b>] :jbd:journal_stop+0x1be/0x1ee
...
this shows that the problem is likely in the :jbd: module. You can load that
module in gdb and list the relevant code::
$ gdb fs/jbd/jbd.ko
(gdb) l *log_wait_commit+0xa3
.. note::
You can also do the same for any function call at the stack trace,
like this one::
[<f80bc9ca>] ? dvb_usb_adapter_frontend_exit+0x3a/0x70 [dvb_usb]
The position where the above call happened can be seen with::
$ gdb drivers/media/usb/dvb-usb/dvb-usb.o
(gdb) l *dvb_usb_adapter_frontend_exit+0x3a
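Turning many call-trace entries into ``gdb`` commands by hand gets tedious.
The following is a minimal sketch, not part of the kernel tree, that extracts
the function name and offset from a trace line and builds the matching
``l *function+offset`` command. The helper name ``gdb_list_command`` and the
regular expression are assumptions made for this illustration; they only
cover the two trace formats shown above.

```python
import re

# Matches call-trace entries such as:
#   [<f80bc9ca>] ? dvb_usb_adapter_frontend_exit+0x3a/0x70 [dvb_usb]
#   [<ffffffff8802c8e9>] :jbd:log_wait_commit+0xa3/0xf5
TRACE_RE = re.compile(
    r"\[<(?P<addr>[0-9a-f]+)>\]\s+(?:\?\s+)?"
    r"(?::(?P<oldmod>\w+):)?"
    r"(?P<func>[\w.]+)\+(?P<offset>0x[0-9a-f]+)/(?P<size>0x[0-9a-f]+)"
    r"(?:\s+\[(?P<mod>\w+)\])?"
)


def gdb_list_command(trace_line):
    """Return (module, gdb command) for one call-trace entry, or None."""
    m = TRACE_RE.search(trace_line)
    if m is None:
        return None
    # module is None when the entry names no module (built-in code)
    module = m.group("mod") or m.group("oldmod")
    return module, "l *{}+{}".format(m.group("func"), m.group("offset"))


if __name__ == "__main__":
    line = "[<f80bc9ca>] ? dvb_usb_adapter_frontend_exit+0x3a/0x70 [dvb_usb]"
    print(gdb_list_command(line))
```

The module name only tells you which object file to load; the path to that
file still has to be looked up in the source tree.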
objdump
^^^^^^^
To debug a kernel, use objdump and look for the hex offset from the crash
output to find the valid line of code/assembler. Without debug symbols, you
will see the assembler code for the routine shown, but if your kernel has
debug symbols the C code will also be available. (Debug symbols can be enabled
in the kernel hacking menu of the menu configuration.) For example::
$ objdump -r -S -l --disassemble net/dccp/ipv4.o
.. note::
You need to be at the top level of the kernel tree for this to pick up
your C files.
If you don't have access to the source code, you can still debug some crash
dumps using the following method (example crash dump output as shown by
Dave Miller)::
EIP is at +0x14/0x4c0
...
Code: 44 24 04 e8 6f 05 00 00 e9 e8 fe ff ff 8d 76 00 8d bc 27 00 00
00 00 55 57 56 53 81 ec bc 00 00 00 8b ac 24 d0 00 00 00 8b 5d 08
<8b> 83 3c 01 00 00 89 44 24 14 8b 45 28 85 c0 89 44 24 18 0f 85
Put the bytes into a "foo.s" file like this:
.text
.globl foo
foo:
.byte .... /* bytes from Code: part of OOPS dump */
Compile it with "gcc -c -o foo.o foo.s" then look at the output of
"objdump --disassemble foo.o".
Output:
ip_queue_xmit:
push %ebp
push %edi
push %esi
push %ebx
sub $0xbc, %esp
mov 0xd0(%esp), %ebp ! %ebp = arg0 (skb)
mov 0x8(%ebp), %ebx ! %ebx = skb->sk
mov 0x13c(%ebx), %eax ! %eax = inet_sk(sk)->opt
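Typing the ``Code:`` bytes into ``foo.s`` by hand is error-prone. As a rough
sketch (the function name ``code_bytes_to_asm`` and the eight-bytes-per-line
layout are choices made for this example, not an existing kernel script), the
conversion could be automated like this:

```python
def code_bytes_to_asm(code_text, symbol="foo"):
    """Turn the hex bytes of an Oops "Code:" line into assembler source."""
    tokens = code_text.replace("Code:", "").split()
    # The byte at the faulting instruction pointer is wrapped in <..>;
    # strip the markers but keep the byte itself.
    values = [int(tok.strip("<>"), 16) for tok in tokens]
    lines = ["\t.text", "\t.globl {}".format(symbol), "{}:".format(symbol)]
    for i in range(0, len(values), 8):
        chunk = ", ".join("0x{:02x}".format(b) for b in values[i:i + 8])
        lines.append("\t.byte {}".format(chunk))
    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    oops_code = "Code: 8b 5d 08 <8b> 83 3c 01 00 00 89"
    print(code_bytes_to_asm(oops_code), end="")
```

The resulting text can be saved as ``foo.s`` and then assembled and
disassembled with the ``gcc`` and ``objdump`` commands given above.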
Reporting the bug
-----------------
Once you find where the bug happened, by inspecting its location,
you could either try to fix it yourself or report it upstream.
In order to report it upstream, you should identify the mailing list
used for the development of the affected code. This can be done by using
the ``get_maintainer.pl`` script.
For example, if you find a bug in the gspca's sonixj.c file, you can get
its maintainers with::
$ ./scripts/get_maintainer.pl -f drivers/media/usb/gspca/sonixj.c
Hans Verkuil <hverkuil@xs4all.nl> (odd fixer:GSPCA USB WEBCAM DRIVER,commit_signer:1/1=100%)
Mauro Carvalho Chehab <mchehab@kernel.org> (maintainer:MEDIA INPUT INFRASTRUCTURE (V4L/DVB),commit_signer:1/1=100%)
Tejun Heo <tj@kernel.org> (commit_signer:1/1=100%)
Bhaktipriya Shridhar <bhaktipriya96@gmail.com> (commit_signer:1/1=100%,authored:1/1=100%,added_lines:4/4=100%,removed_lines:9/9=100%)
linux-media@vger.kernel.org (open list:GSPCA USB WEBCAM DRIVER)
linux-kernel@vger.kernel.org (open list)
Please notice that it will point to:

- The last developers that touched the source code. In the above example,
  Tejun and Bhaktipriya (in this specific case, neither was really involved
  in the development of this file);
- The driver maintainer (Hans Verkuil);
- The subsystem maintainer (Mauro Carvalho Chehab);
- The driver and/or subsystem mailing list (linux-media@vger.kernel.org);
- The Linux Kernel mailing list (linux-kernel@vger.kernel.org).
Usually, the fastest way to have your bug fixed is to report it to the mailing
list used for the development of the code (linux-media ML), copying the
driver maintainer (Hans).
If you are totally stumped as to whom to send the report, and
``get_maintainer.pl`` didn't provide you anything useful, send it to
linux-kernel@vger.kernel.org.
Thanks for your help in making Linux as stable as humanly possible.
Fixing the bug
--------------
If you know programming, you could help us by not only reporting the bug,
but also providing us with a solution. After all, open source is about
sharing what you do, and don't you want to be recognised for your genius?
If you decide to go this way, once you have worked out a fix please submit
it upstream.
Please do read
:ref:`Documentation/process/submitting-patches.rst <submittingpatches>` though
to help your code get accepted.
---------------------------------------------------------------------------
Notes on Oops tracing with ``klogd``
------------------------------------
In order to help Linus and the other kernel developers there has been
substantial support incorporated into ``klogd`` for processing protection
faults. In order to have full support for address resolution at least
version 1.3-pl3 of the ``sysklogd`` package should be used.
When a protection fault occurs the ``klogd`` daemon automatically
translates important addresses in the kernel log messages to their
symbolic equivalents. This translated kernel message is then
forwarded through whatever reporting mechanism ``klogd`` is using. The
protection fault message can be simply cut out of the message files
and forwarded to the kernel developers.
Two types of address resolution are performed by ``klogd``. The first is
static translation and the second is dynamic translation. Static
translation uses the System.map file in much the same manner that
ksymoops does. In order to do static translation the ``klogd`` daemon
must be able to find a system map file at daemon initialization time.
See the klogd man page for information on how ``klogd`` searches for map
files.
Dynamic address translation is important when kernel loadable modules
are being used. Since memory for kernel modules is allocated from the
kernel's dynamic memory pools there are no fixed locations for either
the start of the module or for functions and symbols in the module.
The kernel supports system calls which allow a program to determine
which modules are loaded and their location in memory. Using these
system calls the klogd daemon builds a symbol table which can be used
to debug a protection fault which occurs in a loadable kernel module.
At the very minimum klogd will provide the name of the module which
generated the protection fault. There may be additional symbolic
information available if the developer of the loadable module chose to
export symbol information from the module.
Since the kernel module environment can be dynamic there must be a
mechanism for notifying the ``klogd`` daemon when a change in module
environment occurs. There are command line options available which
allow klogd to signal the currently executing daemon that symbol
information should be refreshed. See the ``klogd`` manual page for more
information.
A patch is included with the sysklogd distribution which modifies the
``modules-2.0.0`` package to automatically signal klogd whenever a module
is loaded or unloaded. Applying this patch provides essentially
seamless support for debugging protection faults which occur with
kernel loadable modules.
The following is an example of a protection fault in a loadable module
processed by ``klogd``::
Aug 29 09:51:01 blizard kernel: Unable to handle kernel paging request at virtual address f15e97cc
Aug 29 09:51:01 blizard kernel: current->tss.cr3 = 0062d000, %cr3 = 0062d000
Aug 29 09:51:01 blizard kernel: *pde = 00000000
Aug 29 09:51:01 blizard kernel: Oops: 0002
Aug 29 09:51:01 blizard kernel: CPU: 0
Aug 29 09:51:01 blizard kernel: EIP: 0010:[oops:_oops+16/3868]
Aug 29 09:51:01 blizard kernel: EFLAGS: 00010212
Aug 29 09:51:01 blizard kernel: eax: 315e97cc ebx: 003a6f80 ecx: 001be77b edx: 00237c0c
Aug 29 09:51:01 blizard kernel: esi: 00000000 edi: bffffdb3 ebp: 00589f90 esp: 00589f8c
Aug 29 09:51:01 blizard kernel: ds: 0018 es: 0018 fs: 002b gs: 002b ss: 0018
Aug 29 09:51:01 blizard kernel: Process oops_test (pid: 3374, process nr: 21, stackpage=00589000)
Aug 29 09:51:01 blizard kernel: Stack: 315e97cc 00589f98 0100b0b4 bffffed4 0012e38e 00240c64 003a6f80 00000001
Aug 29 09:51:01 blizard kernel: 00000000 00237810 bfffff00 0010a7fa 00000003 00000001 00000000 bfffff00
Aug 29 09:51:01 blizard kernel: bffffdb3 bffffed4 ffffffda 0000002b 0007002b 0000002b 0000002b 00000036
Aug 29 09:51:01 blizard kernel: Call Trace: [oops:_oops_ioctl+48/80] [_sys_ioctl+254/272] [_system_call+82/128]
Aug 29 09:51:01 blizard kernel: Code: c7 00 05 00 00 00 eb 08 90 90 90 90 90 90 90 90 89 ec 5d c3
---------------------------------------------------------------------------
::
Dr. G.W. Wettstein Oncology Research Div. Computing Facility
Roger Maris Cancer Center INTERNET: greg@wind.rmcc.com
820 4th St. N.
Fargo, ND 58122
Phone: 701-234-7556

@ -0,0 +1,10 @@
# -*- coding: utf-8; mode: python -*-
project = 'Linux Kernel User Documentation'
tags.add("subproject")
latex_documents = [
('index', 'linux-user.tex', 'Linux Kernel User Documentation',
'The kernel development community', 'manual'),
]

@ -0,0 +1,268 @@
Linux allocated devices (4.x+ version)
======================================
This list is the Linux Device List, the official registry of allocated
device numbers and ``/dev`` directory nodes for the Linux operating
system.
The LaTeX version of this document is no longer maintained, nor is
the document that used to reside at lanana.org. This version in the
mainline Linux kernel is the master document. Updates shall be sent
as patches to the kernel maintainers (see the
:ref:`Documentation/process/submitting-patches.rst <submittingpatches>` document).
Specifically explore the sections titled "CHAR and MISC DRIVERS", and
"BLOCK LAYER" in the MAINTAINERS file to find the right maintainers
to involve for character and block devices.
This document is included by reference into the Filesystem Hierarchy
Standard (FHS). The FHS is available from http://www.pathname.com/fhs/.
Allocations marked (68k/Amiga) apply to Linux/68k on the Amiga
platform only. Allocations marked (68k/Atari) apply to Linux/68k on
the Atari platform only.
This document is in the public domain. The authors request, however,
that semantically altered versions are not distributed without
permission of the authors, assuming the authors can be contacted without
an unreasonable effort.
.. attention::
DEVICE DRIVERS AUTHORS PLEASE READ THIS
Linux now has extensive support for dynamic allocation of device numbering
and can use ``sysfs`` and ``udev`` (``systemd``) to handle the naming needs.
There are still some exceptions in the serial and boot device area. Before
asking for a device number make sure you actually need one.
To have a major number allocated, or a minor number in situations
where that applies (e.g. busmice), please submit a patch and send to
the authors as indicated above.
Keep the description of the device *in the same format
as this list*. The reason for this is that it is the only way we have
found to ensure we have all the requisite information to publish your
device and avoid conflicts.
Finally, sometimes we have to play "namespace police." Please don't be
offended. We often get submissions for ``/dev`` names that would be bound
to cause conflicts down the road. We are trying to avoid getting in a
situation where we would have to suffer an incompatible forward
change. Therefore, please consult with us **before** you make your
device names and numbers in any way public, at least to the point
where it would be at all difficult to get them changed.
Your cooperation is appreciated.
.. include:: devices.txt
:literal:
Additional ``/dev/`` directory entries
--------------------------------------
This section details additional entries that should or may exist in
the /dev directory. It is preferred that symbolic links use the same
form (absolute or relative) as is indicated here. Links are
classified as "hard" or "symbolic" depending on the preferred type of
link; if possible, the indicated type of link should be used.
Compulsory links
++++++++++++++++
These links should exist on all systems:
=============== =============== =============== ===============================
/dev/fd         /proc/self/fd   symbolic        File descriptors
/dev/stdin      fd/0            symbolic        stdin file descriptor
/dev/stdout     fd/1            symbolic        stdout file descriptor
/dev/stderr     fd/2            symbolic        stderr file descriptor
/dev/nfsd       socksys         symbolic        Required by iBCS-2
/dev/X0R        null            symbolic        Required by iBCS-2
=============== =============== =============== ===============================
Note: ``/dev/X0R`` is <letter X>-<digit 0>-<letter R>.
Recommended links
+++++++++++++++++
It is recommended that these links exist on all systems:
=============== =============== =============== ===============================
/dev/core       /proc/kcore     symbolic        Backward compatibility
/dev/ramdisk    ram0            symbolic        Backward compatibility
/dev/ftape      qft0            symbolic        Backward compatibility
/dev/bttv0      video0          symbolic        Backward compatibility
/dev/radio      radio0          symbolic        Backward compatibility
/dev/i2o*       /dev/i2o/*      symbolic        Backward compatibility
/dev/scd?       sr?             hard            Alternate SCSI CD-ROM name
=============== =============== =============== ===============================
Locally defined links
+++++++++++++++++++++
The following links may be established locally to conform to the
configuration of the system. This is merely a tabulation of existing
practice, and does not constitute a recommendation. However, if they
exist, they should have the following uses.
=============== =============== =============== ===============================
/dev/mouse      mouse port      symbolic        Current mouse device
/dev/tape       tape device     symbolic        Current tape device
/dev/cdrom      CD-ROM device   symbolic        Current CD-ROM device
/dev/cdwriter   CD-writer       symbolic        Current CD-writer device
/dev/scanner    scanner         symbolic        Current scanner device
/dev/modem      modem port      symbolic        Current dialout device
/dev/root       root device     symbolic        Current root filesystem
/dev/swap       swap device     symbolic        Current swap device
=============== =============== =============== ===============================
``/dev/modem`` should not be used for a modem which supports dialin as
well as dialout, as it tends to cause lock file problems. If it
exists, ``/dev/modem`` should point to the appropriate primary TTY device
(the use of the alternate callout devices is deprecated).
For SCSI devices, ``/dev/tape`` and ``/dev/cdrom`` should point to the
*cooked* devices (``/dev/st*`` and ``/dev/sr*``, respectively), whereas
``/dev/cdwriter`` and ``/dev/scanner`` should point to the appropriate generic
SCSI devices (``/dev/sg*``).
``/dev/mouse`` may point to a primary serial TTY device, a hardware mouse
device, or a socket for a mouse driver program (e.g. ``/dev/gpmdata``).
Sockets and pipes
+++++++++++++++++
Non-transient sockets and named pipes may exist in /dev. Common entries are:
=============== =============== ===============================================
/dev/printer    socket          lpd local socket
/dev/log        socket          syslog local socket
/dev/gpmdata    socket          gpm mouse multiplexer
=============== =============== ===============================================
Mount points
++++++++++++
The following names are reserved for mounting special filesystems
under /dev. These special filesystems provide kernel interfaces that
cannot be provided with standard device nodes.
=============== =============== ===============================================
/dev/pts        devpts          PTY slave filesystem
/dev/shm        tmpfs           POSIX shared memory maintenance access
=============== =============== ===============================================
Terminal devices
----------------
Terminal, or TTY devices are a special class of character devices. A
terminal device is any device that could act as a controlling terminal
for a session; this includes virtual consoles, serial ports, and
pseudoterminals (PTYs).
All terminal devices share a common set of capabilities known as line
disciplines; these include the common terminal line discipline as well
as SLIP and PPP modes.
All terminal devices are named similarly; this section explains the
naming and use of the various types of TTYs. Note that the naming
conventions include several historical warts; some of these are
Linux-specific, some were inherited from other systems, and some
reflect Linux outgrowing a borrowed convention.
A hash mark (``#``) in a device name is used here to indicate a decimal
number without leading zeroes.
Virtual consoles and the console device
+++++++++++++++++++++++++++++++++++++++
Virtual consoles are full-screen terminal displays on the system video
monitor. Virtual consoles are named ``/dev/tty#``, with numbering
starting at ``/dev/tty1``; ``/dev/tty0`` is the current virtual console.
``/dev/tty0`` is the device that should be used to access the system video
card on those architectures for which the frame buffer devices
(``/dev/fb*``) are not applicable. Do not use ``/dev/console``
for this purpose.
The console device, ``/dev/console``, is the device to which system
messages should be sent, and on which logins should be permitted in
single-user mode. Starting with Linux 2.1.71, ``/dev/console`` is managed
by the kernel; for previous versions it should be a symbolic link to
either ``/dev/tty0``, a specific virtual console such as ``/dev/tty1``, or to
a serial port primary (``tty*``, not ``cu*``) device, depending on the
configuration of the system.
Serial ports
++++++++++++
Serial ports are RS-232 serial ports and any device which simulates
one, either in hardware (such as internal modems) or in software (such
as the ISDN driver). Under Linux, each serial port has two device
names, the primary or callin device and the alternate or callout one.
Each kind of device is indicated by a different letter. For any
letter X, the names of the devices are ``/dev/ttyX#`` and ``/dev/cux#``,
respectively; for historical reasons, ``/dev/ttyS#`` and ``/dev/ttyC#``
correspond to ``/dev/cua#`` and ``/dev/cub#``. In the future, it should be
expected that multiple letters will be used; all letters will be upper
case for the "tty" device (e.g. ``/dev/ttyDP#``) and lower case for the
"cu" device (e.g. ``/dev/cudp#``).
The names ``/dev/ttyQ#`` and ``/dev/cuq#`` are reserved for local use.
The alternate devices provide for kernel-based exclusion and somewhat
different defaults than the primary devices. Their main purpose is to
allow the use of serial ports with programs with no inherent or broken
support for serial ports. Their use is deprecated, and they may be
removed from a future version of Linux.
Arbitration of serial ports is provided by the use of lock files with
the names ``/var/lock/LCK..ttyX#``. The contents of the lock file should
be the PID of the locking process as an ASCII number.
It is common practice to install links such as ``/dev/modem``
which point to serial ports. In order to ensure proper locking in the
presence of these links, it is recommended that software chase
symlinks and lock all possible names; additionally, it is recommended
that a lock file be installed with the corresponding alternate
device. In order to avoid deadlocks, it is recommended that the locks
are acquired in the following order, and released in the reverse:
1. The symbolic link name, if any (``/var/lock/LCK..modem``)
2. The "tty" name (``/var/lock/LCK..ttyS2``)
3. The alternate device name (``/var/lock/LCK..cua2``)
In the case of nested symbolic links, the lock files should be
installed in the order the symlinks are resolved.
Under no circumstances should an application hold a lock while waiting
for another to be released. In addition, applications which attempt
to create lock files for the corresponding alternate device names
should take into account the possibility of being used on a non-serial
port TTY, for which no alternate device would exist.
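The locking rules above can be sketched in a few lines of code. This is only
an illustration under stated assumptions: the function names
``acquire_locks`` and ``release_locks`` and the ``lock_dir`` parameter are
made up for this example, the caller is expected to pass the names in the
recommended order (symlink name, "tty" name, alternate name), and stale lock
detection is left out entirely.

```python
import os


def release_locks(held):
    """Remove lock files in the reverse of the order they were acquired."""
    for path in reversed(held):
        os.unlink(path)


def acquire_locks(names, lock_dir="/var/lock"):
    """Create LCK..<name> files in the given order; return their paths.

    If any lock already exists, the locks taken so far are released and
    FileExistsError is raised, so no lock is held while waiting.
    """
    held = []
    try:
        for name in names:
            path = os.path.join(lock_dir, "LCK.." + name)
            # O_EXCL makes creation fail if the lock file already exists.
            fd = os.open(path, os.O_CREAT | os.O_EXCL | os.O_WRONLY, 0o644)
            with os.fdopen(fd, "w") as lock:
                lock.write("{}\n".format(os.getpid()))  # PID as ASCII number
            held.append(path)
    except FileExistsError:
        release_locks(held)
        raise
    return held
```

A caller would typically wrap the device access in ``try``/``finally`` and
call ``release_locks()`` on the returned list when done; on
``FileExistsError`` it can retry later, since nothing is held at that point.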
Pseudoterminals (PTYs)
++++++++++++++++++++++
Pseudoterminals, or PTYs, are used to create login sessions or provide
other capabilities requiring a TTY line discipline (including SLIP or
PPP capability) to arbitrary data-generation processes. Each PTY has
a master side, named ``/dev/pty[p-za-e][0-9a-f]``, and a slave side, named
``/dev/tty[p-za-e][0-9a-f]``. The kernel arbitrates the use of PTYs by
allowing each master side to be opened only once.
Once the master side has been opened, the corresponding slave device
can be used in the same manner as any TTY device. The master and
slave devices are connected by the kernel, generating the equivalent
of a bidirectional pipe with TTY capabilities.
Recent versions of the Linux kernels and GNU libc contain support for
the System V/Unix98 naming scheme for PTYs, which assigns a common
device, ``/dev/ptmx``, to all the masters (opening it will automatically
give you a previously unassigned PTY) and a subdirectory, ``/dev/pts``,
for the slaves; the slaves are named with decimal integers (``/dev/pts/#``
in our notation). This removes the problem of exhausting the
namespace and enables the kernel to automatically create the device
nodes for the slaves on demand using the "devpts" filesystem.
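To see the master/slave connection from user space, here is a small sketch
using Python's standard ``pty`` module. The function name ``pty_round_trip``
is an assumption made for this example; the exact slave name depends on the
system, and the read relies on the slave being in its default (canonical)
line discipline mode, so the message must end with a newline.

```python
import os
import pty


def pty_round_trip(message=b"hello\n"):
    """Open a PTY pair, send one line master -> slave, return what arrived."""
    master_fd, slave_fd = pty.openpty()
    try:
        # On Linux with the Unix98 scheme this is typically /dev/pts/<n>.
        slave_name = os.ttyname(slave_fd)
        # Data written to the master looks like typed input on the slave.
        os.write(master_fd, message)
        received = os.read(slave_fd, len(message))
        return slave_name, received
    finally:
        os.close(slave_fd)
        os.close(master_fd)


if __name__ == "__main__":
    name, data = pty_round_trip()
    print(name, data)
```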

@ -0,0 +1,353 @@
Dynamic debug
+++++++++++++
Introduction
============
This document describes how to use the dynamic debug (dyndbg) feature.
Dynamic debug is designed to allow you to dynamically enable/disable
kernel code to obtain additional kernel information. Currently, if
``CONFIG_DYNAMIC_DEBUG`` is set, then all ``pr_debug()``/``dev_dbg()`` and
``print_hex_dump_debug()``/``print_hex_dump_bytes()`` calls can be dynamically
enabled per-callsite.
If ``CONFIG_DYNAMIC_DEBUG`` is not set, ``print_hex_dump_debug()`` is just
a shortcut for ``print_hex_dump(KERN_DEBUG)``.
For ``print_hex_dump_debug()``/``print_hex_dump_bytes()``, the format string is
its ``prefix_str`` argument if it is a constant string, or ``hexdump``
if ``prefix_str`` is built dynamically.
Dynamic debug has even more useful features:
* Simple query language allows turning on and off debugging
statements by matching any combination of 0 or 1 of:
- source filename
- function name
- line number (including ranges of line numbers)
- module name
- format string
* Provides a debugfs control file: ``<debugfs>/dynamic_debug/control``
which can be read to display the complete list of known debug
statements, to help guide you
Controlling dynamic debug Behaviour
===================================
The behaviour of ``pr_debug()``/``dev_dbg()`` is controlled by writing to a
control file in the 'debugfs' filesystem. Thus, you must first mount
the debugfs filesystem in order to make use of this feature.
Subsequently, we refer to the control file as:
``<debugfs>/dynamic_debug/control``. For example, if you want to enable
printing from source file ``svcsock.c``, line 1603 you simply do::
nullarbor:~ # echo 'file svcsock.c line 1603 +p' \
                    > <debugfs>/dynamic_debug/control
If you make a mistake with the syntax, the write will fail thus::
nullarbor:~ # echo 'file svcsock.c wtf 1 +p' \
                    > <debugfs>/dynamic_debug/control
-bash: echo: write error: Invalid argument
Viewing Dynamic Debug Behaviour
===============================
You can view the currently configured behaviour of all the debug
statements via::
nullarbor:~ # cat <debugfs>/dynamic_debug/control
# filename:lineno [module]function flags format
/usr/src/packages/BUILD/sgi-enhancednfs-1.4/default/net/sunrpc/svc_rdma.c:323 [svcxprt_rdma]svc_rdma_cleanup =_ "SVCRDMA Module Removed, deregister RPC RDMA transport\012"
/usr/src/packages/BUILD/sgi-enhancednfs-1.4/default/net/sunrpc/svc_rdma.c:341 [svcxprt_rdma]svc_rdma_init =_ "\011max_inline : %d\012"
/usr/src/packages/BUILD/sgi-enhancednfs-1.4/default/net/sunrpc/svc_rdma.c:340 [svcxprt_rdma]svc_rdma_init =_ "\011sq_depth : %d\012"
/usr/src/packages/BUILD/sgi-enhancednfs-1.4/default/net/sunrpc/svc_rdma.c:338 [svcxprt_rdma]svc_rdma_init =_ "\011max_requests : %d\012"
...
You can also apply standard Unix text manipulation filters to this
data, e.g.::
nullarbor:~ # grep -i rdma <debugfs>/dynamic_debug/control | wc -l
62
nullarbor:~ # grep -i tcp <debugfs>/dynamic_debug/control | wc -l
42
The third column shows the currently enabled flags for each debug
statement callsite (see below for definitions of the flags). The
default value, with no flags enabled, is ``=_``. So you can view all
the debug statement callsites with any non-default flags::
nullarbor:~ # awk '$3 != "=_"' <debugfs>/dynamic_debug/control
# filename:lineno [module]function flags format
/usr/src/packages/BUILD/sgi-enhancednfs-1.4/default/net/sunrpc/svcsock.c:1603 [sunrpc]svc_send p "svc_process: st_sendto returned %d\012"
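For anything beyond a quick ``grep`` or ``awk`` filter, it can help to parse
the control file into fields. The sketch below follows the
``filename:lineno [module]function flags format`` header shown above; the
function name ``parse_control_line`` and the regular expression are
assumptions made for this example, not an interface provided by the kernel.

```python
import re

# filename:lineno [module]function flags "format"
CONTROL_RE = re.compile(
    r'^(?P<filename>\S+):(?P<lineno>\d+)\s+'
    r'\[(?P<module>[^\]]+)\](?P<function>\S+)\s+'
    r'(?P<flags>\S+)\s+"(?P<format>.*)"$'
)


def parse_control_line(line):
    """Parse one callsite line; return None for headers or unknown lines."""
    m = CONTROL_RE.match(line.strip())
    if m is None:
        return None
    entry = m.groupdict()
    entry["lineno"] = int(entry["lineno"])
    # A callsite prints only when the "p" flag is set.
    entry["enabled"] = "p" in entry["flags"]
    return entry


if __name__ == "__main__":
    sample = r'net/sunrpc/svcsock.c:1603 [sunrpc]svc_send =p "returned %d\012"'
    print(parse_control_line(sample))
```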
Command Language Reference
==========================
At the lexical level, a command comprises a sequence of words separated
by spaces or tabs. So these are all equivalent::
nullarbor:~ # echo 'file svcsock.c line 1603 +p' \
                    > <debugfs>/dynamic_debug/control
nullarbor:~ # echo '  file   svcsock.c     line  1603 +p  ' \
                    > <debugfs>/dynamic_debug/control
nullarbor:~ # echo -n 'file svcsock.c line 1603 +p' \
                    > <debugfs>/dynamic_debug/control
Command submissions are bounded by a write() system call.
Multiple commands can be written together, separated by ``;`` or ``\n``::
~# echo "func pnpacpi_get_resources +p; func pnp_assign_mem +p" \
> <debugfs>/dynamic_debug/control
If your query set is big, you can batch them too::
~# cat query-batch-file > <debugfs>/dynamic_debug/control
Another way is to use wildcards. The match rule supports ``*`` (matches
zero or more characters) and ``?`` (matches exactly one character). For
example, you can match all usb drivers::
~# echo "file drivers/usb/* +p" > <debugfs>/dynamic_debug/control
At the syntactical level, a command comprises a sequence of match
specifications, followed by a flags change specification::
command ::= match-spec* flags-spec
The match-specs are used to choose a subset of the known pr_debug()
callsites to which to apply the flags-spec. Think of them as a query
with implicit ANDs between each pair. Note that an empty list of
match-specs will select all debug statement callsites.
A match specification comprises a keyword, which controls the
attribute of the callsite to be compared, and a value to compare
against. Possible keywords are::
match-spec ::= 'func' string |
'file' string |
'module' string |
'format' string |
'line' line-range
line-range ::= lineno |
'-'lineno |
lineno'-' |
lineno'-'lineno
lineno ::= unsigned-int
.. note::

  ``line-range`` cannot contain spaces, e.g.
  "1-30" is a valid range but "1 - 30" is not.
The meanings of each keyword are:
func
The given string is compared against the function name
of each callsite. Example::
func svc_tcp_accept
file
The given string is compared against either the full pathname, the
src-root relative pathname, or the basename of the source file of
each callsite. Examples::
file svcsock.c
file kernel/freezer.c
file /usr/src/packages/BUILD/sgi-enhancednfs-1.4/default/net/sunrpc/svcsock.c
module
The given string is compared against the module name
of each callsite. The module name is the string as
seen in ``lsmod``, i.e. without the directory or the ``.ko``
suffix and with ``-`` changed to ``_``. Examples::
module sunrpc
module nfsd
format
The given string is searched for in the dynamic debug format
string. Note that the string does not need to match the
entire format, only some part. Whitespace and other
special characters can be escaped using C octal character
escape ``\ooo`` notation, e.g. the space character is ``\040``.
Alternatively, the string can be enclosed in double quote
characters (``"``) or single quote characters (``'``).
Examples::
format svcrdma: // many of the NFS/RDMA server pr_debugs
format readahead // some pr_debugs in the readahead cache
format nfsd:\040SETATTR // one way to match a format with whitespace
format "nfsd: SETATTR" // a neater way to match a format with whitespace
format 'nfsd: SETATTR' // yet another way to match a format with whitespace
line
The given line number or range of line numbers is compared
against the line number of each ``pr_debug()`` callsite. A single
line number matches the callsite line number exactly. A
range of line numbers matches any callsite between the first
and last line number inclusive. An empty first number means
the first line in the file, an empty last number means the
last line in the file. Examples::
line 1603 // exactly line 1603
line 1600-1605 // the six lines from line 1600 to line 1605
line -1605 // the 1605 lines from line 1 to line 1605
line 1600- // all lines from line 1600 to the end of the file
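The ``line-range`` rules are easy to get wrong at the edges, so here is a
minimal sketch of how such a range could be interpreted. It is an
illustration of the grammar above, not the kernel's parser; the names
``parse_line_range`` and ``line_matches`` are made up for this example, and
``None`` is used to mark an open end.

```python
def parse_line_range(spec):
    """Return (first, last) for a line-range; None marks an open end."""
    if " " in spec or "\t" in spec:
        raise ValueError("line-range cannot contain whitespace")
    first, sep, last = spec.partition("-")
    if not sep:
        # A single line number, e.g. "1603", matches exactly that line.
        return int(first), int(first)
    return (int(first) if first else None,
            int(last) if last else None)


def line_matches(spec, lineno):
    """True if lineno falls inside the inclusive range described by spec."""
    first, last = parse_line_range(spec)
    return ((first is None or lineno >= first) and
            (last is None or lineno <= last))


if __name__ == "__main__":
    for spec in ("1603", "1600-1605", "-1605", "1600-"):
        print(spec, parse_line_range(spec))
```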
The flags specification comprises a change operation followed
by one or more flag characters. The change operation is one
of the characters::
- remove the given flags
+ add the given flags
= set the flags to the given flags
The flags are::
p enables the pr_debug() callsite.
f Include the function name in the printed message
l Include line number in the printed message
m Include module name in the printed message
t Include thread ID in messages not generated from interrupt context
_ No flags are set. (Or'd with others on input)
For ``print_hex_dump_debug()`` and ``print_hex_dump_bytes()``, only the ``p`` flag
has meaning; other flags are ignored.
For display, the flags are preceded by ``=``
(mnemonic: what the flags are currently equal to).
Note the regexp ``^[-+=][flmpt_]+$`` matches a flags specification.
To clear all flags at once, use ``=_`` or ``-flmpt``.
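The regexp quoted above can be used to sanity-check a flags specification before writing it to the control file. A minimal sketch, with ``is_flags_spec`` as a hypothetical helper name:

```shell
# is_flags_spec SPEC
# Returns 0 if SPEC matches the regexp given above: ^[-+=][flmpt_]+$
# Hypothetical helper; it checks syntax only, not whether the change
# makes sense for a particular callsite.
is_flags_spec() {
    printf '%s\n' "$1" | grep -Eq '^[-+=][flmpt_]+$'
}
```

With this, ``+p``, ``=_`` and ``-flmpt`` are accepted, while a bare ``p`` (no change operation) or ``+x`` (unknown flag) is rejected.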
Debug messages during Boot Process
==================================
To activate debug messages for core code and built-in modules during
the boot process, even before userspace and debugfs exist, use
``dyndbg="QUERY"``, ``module.dyndbg="QUERY"``, or ``ddebug_query="QUERY"``
(``ddebug_query`` is obsoleted by ``dyndbg``, and deprecated). QUERY follows
the syntax described above, but must not exceed 1023 characters. Your
bootloader may impose lower limits.
These ``dyndbg`` params are processed just after the ddebug tables are
processed, as part of the arch_initcall. Thus you can enable debug
messages in all code run after this arch_initcall via this boot
parameter.
On an x86 system for example ACPI enablement is a subsys_initcall and::
dyndbg="file ec.c +p"
will show early Embedded Controller transactions during ACPI setup if
your machine (typically a laptop) has an Embedded Controller.
PCI (or other devices) initialization also is a hot candidate for using
this boot parameter for debugging purposes.
If the ``foo`` module is not built-in, ``foo.dyndbg`` will still be processed at
boot time, without effect, but will be reprocessed when the module is
loaded later. ``ddebug_query=`` and bare ``dyndbg=`` are only processed at
boot.
Debug Messages at Module Initialization Time
============================================
When ``modprobe foo`` is called, modprobe scans ``/proc/cmdline`` for
``foo.params``, strips ``foo.``, and passes them to the kernel along with
params given in modprobe args or ``/etc/modprobe.d/*.conf`` files,
in the following order:
1. parameters given via ``/etc/modprobe.d/*.conf``::
options foo dyndbg=+pt
options foo dyndbg # defaults to +p
2. ``foo.dyndbg`` as given in boot args, ``foo.`` is stripped and passed::
foo.dyndbg=" func bar +p; func buz +mp"
3. args to modprobe::
modprobe foo dyndbg==pmf # override previous settings
These ``dyndbg`` queries are applied in order, with last having final say.
This allows boot args to override or modify those from ``/etc/modprobe.d``
(sensible, since 1 is system wide, 2 is kernel or boot specific), and
modprobe args to override both.
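The "scan the command line, strip the ``foo.`` prefix" step can be illustrated roughly as follows. This is only a sketch of the idea, not modprobe's implementation: ``module_params_from_cmdline`` is a hypothetical helper, and it splits on whitespace only, so quoted values containing spaces (such as ``foo.dyndbg="func bar +p"``) are not handled.

```shell
# module_params_from_cmdline MODULE [CMDLINE]
# Prints every "MODULE.param" word found in CMDLINE with the "MODULE."
# prefix removed, one per line. CMDLINE defaults to /proc/cmdline.
# Hypothetical helper: no support for quoted values with spaces.
module_params_from_cmdline() {
    prefix="$1."
    if [ $# -ge 2 ]; then
        cmdline=$2
    else
        cmdline=$(cat /proc/cmdline)
    fi
    # Put one word per line, then keep only the words for this module.
    printf '%s\n' "$cmdline" | tr -s ' \t' '\n\n' | while IFS= read -r word; do
        case $word in
            "$prefix"*) printf '%s\n' "${word#"$prefix"}" ;;
        esac
    done
}
```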
In the ``foo.dyndbg="QUERY"`` form, the query must exclude ``module foo``.
``foo`` is extracted from the param-name, and applied to each query in
``QUERY``, and only 1 match-spec of each type is allowed.
The ``dyndbg`` option is a "fake" module parameter, which means:
- modules do not need to define it explicitly
- every module gets it tacitly, whether they use pr_debug or not
- it doesn't appear in ``/sys/module/$module/parameters/``
To see it, grep the control file, or inspect ``/proc/cmdline``.
For ``CONFIG_DYNAMIC_DEBUG`` kernels, any settings given at boot-time (or
enabled by ``-DDEBUG`` flag during compilation) can be disabled later via
the debugfs interface if the debug messages are no longer needed::
echo "module module_name -p" > <debugfs>/dynamic_debug/control
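The ``<debugfs>`` placeholder stands for wherever debugfs is mounted on the running system. One way to look it up is sketched below; ``debugfs_mountpoint`` is a hypothetical helper name, and the sketch assumes the usual ``/proc/mounts`` layout of device, mount point and filesystem type in the first three columns.

```shell
# debugfs_mountpoint [MOUNTS_FILE]
# Prints the first debugfs mount point listed in MOUNTS_FILE
# (default: /proc/mounts); returns non-zero if none is found.
# Hypothetical helper for illustration only.
debugfs_mountpoint() {
    awk '$3 == "debugfs" { print $2; found = 1; exit }
         END { exit !found }' "${1:-/proc/mounts}"
}
```

The result can then be substituted into the command above, for example by writing to ``"$(debugfs_mountpoint)/dynamic_debug/control"`` as root.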
Examples
========
::
// enable the message at line 1603 of file svcsock.c
nullarbor:~ # echo -n 'file svcsock.c line 1603 +p' >
<debugfs>/dynamic_debug/control
// enable all the messages in file svcsock.c
nullarbor:~ # echo -n 'file svcsock.c +p' >
<debugfs>/dynamic_debug/control
// enable all the messages in the NFS server module
nullarbor:~ # echo -n 'module nfsd +p' >
<debugfs>/dynamic_debug/control
// enable all 12 messages in the function svc_process()
nullarbor:~ # echo -n 'func svc_process +p' >
<debugfs>/dynamic_debug/control
// disable all 12 messages in the function svc_process()
nullarbor:~ # echo -n 'func svc_process -p' >
<debugfs>/dynamic_debug/control
// enable messages for NFS calls READ, READLINK, READDIR and READDIR+.
nullarbor:~ # echo -n 'format "nfsd: READ" +p' >
<debugfs>/dynamic_debug/control
// enable messages in files of which the paths include string "usb"
nullarbor:~ # echo -n 'file *usb* +p' > <debugfs>/dynamic_debug/control
// enable all messages
nullarbor:~ # echo -n '+p' > <debugfs>/dynamic_debug/control
// add module, function to all enabled messages
nullarbor:~ # echo -n '+mf' > <debugfs>/dynamic_debug/control
// boot-args example, with newlines and comments for readability
Kernel command line: ...
// see what's going on in dyndbg=value processing
dynamic_debug.verbose=1
// enable pr_debugs in 2 builtins, #cmt is stripped
dyndbg="module params +p #cmt ; module sys +p"
// enable pr_debugs in 2 functions in a module loaded later
pc87360.dyndbg="func pc87360_init_device +p; func pc87360_find +p"
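To check which callsites the commands above actually enabled, the control file can be read back and filtered on its flags column. The sketch below assumes each entry has the form ``filename:lineno [module]function flags format``, with ``=_`` shown when no flags are set; ``enabled_callsites`` is a hypothetical helper name.

```shell
# enabled_callsites CONTROL_FILE
# Prints the entries of CONTROL_FILE whose flags field (third column)
# is something other than "=_", skipping the "#" header line.
# Hypothetical helper; assumes the column layout described above.
enabled_callsites() {
    awk '$1 !~ /^#/ && $3 != "=_"' "$1"
}
```

It would typically be pointed at ``<debugfs>/dynamic_debug/control``.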

@ -0,0 +1,68 @@
The Linux kernel user's and administrator's guide
=================================================
The following is a collection of user-oriented documents that have been
added to the kernel over time. There is, as yet, little overall order or
organization here — this material was not written to be a single, coherent
document! With luck things will improve quickly over time.
This initial section contains overall information, including the README
file describing the kernel as a whole, documentation on kernel parameters,
etc.
.. toctree::
:maxdepth: 1
README
kernel-parameters
devices
Here is a set of documents aimed at users who are trying to track down
problems and bugs in particular.
.. toctree::
:maxdepth: 1
reporting-bugs
security-bugs
bug-hunting
bug-bisect
tainted-kernels
ramoops
dynamic-debug-howto
init
This is the beginning of a section with information of interest to
application developers. Documents covering various aspects of the kernel
ABI will be found here.
.. toctree::
:maxdepth: 1
sysfs-rules
The rest of this manual consists of various unordered guides on how to
configure specific aspects of kernel behavior to your liking.
.. toctree::
:maxdepth: 1
initrd
serial-console
braille-console
parport
md
module-signing
sysrq
unicode
vga-softcursor
binfmt-misc
mono
java
.. only:: subproject and html
Indices
=======
* :ref:`genindex`

@ -5,6 +5,7 @@ OK, so you've got this pretty unintuitive message (currently located
in init/main.c) and are wondering what the H*** went wrong.
Some high-level reasons for failure (listed roughly in order of execution)
to load the init binary are:
A) Unable to mount root FS
B) init binary doesn't exist on rootfs
C) broken console device
@ -12,37 +13,39 @@ D) binary exists but dependencies not available
E) binary cannot be loaded
Detailed explanations:
0) Set "debug" kernel parameter (in bootloader config file or CONFIG_CMDLINE)
A) Set "debug" kernel parameter (in bootloader config file or CONFIG_CMDLINE)
to get more detailed kernel messages.
A) make sure you have the correct root FS type
(and root= kernel parameter points to the correct partition),
B) make sure you have the correct root FS type
(and ``root=`` kernel parameter points to the correct partition),
required drivers such as storage hardware (such as SCSI or USB!)
and filesystem (ext3, jffs2 etc.) are builtin (alternatively as modules,
to be pre-loaded by an initrd)
C) Possibly a conflict in console= setup --> initial console unavailable.
C) Possibly a conflict in ``console=`` setup --> initial console unavailable.
E.g. some serial consoles are unreliable due to serial IRQ issues (e.g.
missing interrupt-based configuration).
Try using a different console= device or e.g. netconsole= .
Try using a different ``console=`` device or e.g. ``netconsole=``.
D) e.g. required library dependencies of the init binary such as
/lib/ld-linux.so.2 missing or broken. Use readelf -d <INIT>|grep NEEDED
to find out which libraries are required.
``/lib/ld-linux.so.2`` missing or broken. Use
``readelf -d <INIT>|grep NEEDED`` to find out which libraries are required.
E) make sure the binary's architecture matches your hardware.
E.g. i386 vs. x86_64 mismatch, or trying to load x86 on ARM hardware.
In case you tried loading a non-binary file here (shell script?),
you should make sure that the script specifies an interpreter in its shebang
header line (#!/...) that is fully working (including its library
header line (``#!/...``) that is fully working (including its library
dependencies). And before tackling scripts, better first test a simple
non-script binary such as /bin/sh and confirm its successful execution.
To find out more, add code to init/main.c to display kernel_execve()s
non-script binary such as ``/bin/sh`` and confirm its successful execution.
To find out more, add code to ``init/main.c`` to display kernel_execve()s
return values.
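When init is a script, the interpreter named on its shebang line is what has to work. A minimal sketch for extracting it is shown below; ``shebang_interpreter`` is a hypothetical helper name and is not part of any existing tool.

```shell
# shebang_interpreter FILE
# Prints the interpreter path named on FILE's "#!" line; returns
# non-zero if the first line does not start with "#!" or names nothing.
# Hypothetical helper for illustration only.
shebang_interpreter() {
    awk 'NR == 1 {
             # Strip "#!" plus optional blanks; the first remaining
             # word is the interpreter.
             if (sub(/^#![ \t]*/, "") && NF > 0) { print $1; ok = 1 }
             exit
         }
         END { exit !ok }' "$1"
}
```

The printed path can then be inspected with ``readelf -d`` in the same way as the init binary itself, since the interpreter has library dependencies of its own.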
Please extend this explanation whenever you find new failure causes
(after all loading the init binary is a CRITICAL and hard transition step
which needs to be made as painless as possible), then submit a patch to LKML.
Further TODOs:
- Implement the various run_init_process() invocations via a struct array
which can then store the kernel_execve() result value and on failure
log it all by iterating over _all_ results (very important usability fix).
- Implement the various ``run_init_process()`` invocations via a struct array
which can then store the ``kernel_execve()`` result value and on failure
log it all by iterating over **all** results (very important usability fix).
- try to make the implementation itself more helpful in general,
e.g. by providing additional error messages at affected places.

@ -2,7 +2,7 @@ Using the initial RAM disk (initrd)
===================================
Written 1996,2000 by Werner Almesberger <werner.almesberger@epfl.ch> and
Hans Lermen <lermen@fgan.de>
Hans Lermen <lermen@fgan.de>
initrd provides the capability to load a RAM disk by the boot loader.
@ -16,7 +16,7 @@ where the kernel comes up with a minimum set of compiled-in drivers, and
where additional modules are loaded from initrd.
This document gives a brief overview of the use of initrd. A more detailed
discussion of the boot process can be found in [1].
discussion of the boot process can be found in [#f1]_.
Operation
@ -27,10 +27,10 @@ When using initrd, the system typically boots as follows:
1) the boot loader loads the kernel and the initial RAM disk
2) the kernel converts initrd into a "normal" RAM disk and
frees the memory used by initrd
3) if the root device is not /dev/ram0, the old (deprecated)
3) if the root device is not ``/dev/ram0``, the old (deprecated)
change_root procedure is followed. see the "Obsolete root change
mechanism" section below.
4) root device is mounted. if it is /dev/ram0, the initrd image is
4) root device is mounted. if it is ``/dev/ram0``, the initrd image is
then mounted as root
5) /sbin/init is executed (this can be any valid executable, including
shell scripts; it is run with uid 0 and can do basically everything
@ -38,7 +38,7 @@ When using initrd, the system typically boots as follows:
6) init mounts the "real" root file system
7) init places the root file system at the root directory using the
pivot_root system call
8) init execs the /sbin/init on the new root filesystem, performing
8) init execs the ``/sbin/init`` on the new root filesystem, performing
the usual boot sequence
9) the initrd file system is removed
@ -51,7 +51,7 @@ be accessible.
Boot command-line options
-------------------------
initrd adds the following new options:
initrd adds the following new options::
initrd=<path> (e.g. LOADLIN)
@ -83,36 +83,36 @@ Recent kernels have support for populating a ramdisk from a compressed cpio
archive. On such systems, the creation of a ramdisk image doesn't need to
involve special block devices or loopbacks; you merely create a directory on
disk with the desired initrd content, cd to that directory, and run (as an
example):
example)::
find . | cpio --quiet -H newc -o | gzip -9 -n > /boot/imagefile.img
find . | cpio --quiet -H newc -o | gzip -9 -n > /boot/imagefile.img
Examining the contents of an existing image file is just as simple:
Examining the contents of an existing image file is just as simple::
mkdir /tmp/imagefile
cd /tmp/imagefile
gzip -cd /boot/imagefile.img | cpio -imd --quiet
mkdir /tmp/imagefile
cd /tmp/imagefile
gzip -cd /boot/imagefile.img | cpio -imd --quiet
Installation
------------
First, a directory for the initrd file system has to be created on the
"normal" root file system, e.g.
"normal" root file system, e.g.::
# mkdir /initrd
# mkdir /initrd
The name is not relevant. More details can be found on the pivot_root(2)
man page.
The name is not relevant. More details can be found on the
:manpage:`pivot_root(2)` man page.
If the root file system is created during the boot procedure (i.e. if
you're building an install floppy), the root file system creation
procedure should create the /initrd directory.
procedure should create the ``/initrd`` directory.
If initrd will not be mounted in some cases, its content is still
accessible if the following device has been created:
accessible if the following device has been created::
# mknod /dev/initrd b 1 250
# chmod 400 /dev/initrd
# mknod /dev/initrd b 1 250
# chmod 400 /dev/initrd
Second, the kernel has to be compiled with RAM disk support and with
support for the initial RAM disk enabled. Also, at least all components
@ -131,60 +131,76 @@ kernels, at least three types of devices are suitable for that:
We'll describe the loopback device method:
1) make sure loopback block devices are configured into the kernel
2) create an empty file system of the appropriate size, e.g.
# dd if=/dev/zero of=initrd bs=300k count=1
# mke2fs -F -m0 initrd
2) create an empty file system of the appropriate size, e.g.::
# dd if=/dev/zero of=initrd bs=300k count=1
# mke2fs -F -m0 initrd
(if space is critical, you may want to use the Minix FS instead of Ext2)
3) mount the file system, e.g.
# mount -t ext2 -o loop initrd /mnt
4) create the console device:
3) mount the file system, e.g.::
# mount -t ext2 -o loop initrd /mnt
4) create the console device::
# mkdir /mnt/dev
# mknod /mnt/dev/console c 5 1
5) copy all the files that are needed to properly use the initrd
environment. Don't forget the most important file, /sbin/init
Note that /sbin/init's permissions must include "x" (execute).
environment. Don't forget the most important file, ``/sbin/init``
.. note:: ``/sbin/init`` permissions must include "x" (execute).
6) correct operation of the initrd environment can frequently be tested
even without rebooting with the command
# chroot /mnt /sbin/init
even without rebooting with the command::
# chroot /mnt /sbin/init
This is of course limited to initrds that do not interfere with the
general system state (e.g. by reconfiguring network interfaces,
overwriting mounted devices, trying to start already running daemons,
etc. Note however that it is usually possible to use pivot_root in
such a chroot'ed initrd environment.)
7) unmount the file system
# umount /mnt
7) unmount the file system::
# umount /mnt
8) the initrd is now in the file "initrd". Optionally, it can now be
compressed
# gzip -9 initrd
compressed::
# gzip -9 initrd
For experimenting with initrd, you may want to take a rescue floppy and
only add a symbolic link from /sbin/init to /bin/sh. Alternatively, you
can try the experimental newlib environment [2] to create a small
only add a symbolic link from ``/sbin/init`` to ``/bin/sh``. Alternatively, you
can try the experimental newlib environment [#f2]_ to create a small
initrd.
Finally, you have to boot the kernel and load initrd. Almost all Linux
boot loaders support initrd. Since the boot process is still compatible
with an older mechanism, the following boot command line parameters
have to be given:
have to be given::
root=/dev/ram0 rw
(rw is only necessary if writing to the initrd file system.)
With LOADLIN, you simply execute
With LOADLIN, you simply execute::
LOADLIN <kernel> initrd=<disk_image>
e.g. LOADLIN C:\LINUX\BZIMAGE initrd=C:\LINUX\INITRD.GZ root=/dev/ram0 rw
With LILO, you add the option INITRD=<path> to either the global section
or to the section of the respective kernel in /etc/lilo.conf, and pass
the options using APPEND, e.g.
e.g.::
LOADLIN C:\LINUX\BZIMAGE initrd=C:\LINUX\INITRD.GZ root=/dev/ram0 rw
With LILO, you add the option ``INITRD=<path>`` to either the global section
or to the section of the respective kernel in ``/etc/lilo.conf``, and pass
the options using APPEND, e.g.::
image = /bzImage
initrd = /boot/initrd.gz
append = "root=/dev/ram0 rw"
and run /sbin/lilo
and run ``/sbin/lilo``
For other boot loaders, please refer to the respective documentation.
@ -204,33 +220,33 @@ The procedure involves the following steps:
- unmounting the initrd file system and de-allocating the RAM disk
Mounting the new root file system is easy: it just needs to be mounted on
a directory under the current root. Example:
a directory under the current root. Example::
# mkdir /new-root
# mount -o ro /dev/hda1 /new-root
# mkdir /new-root
# mount -o ro /dev/hda1 /new-root
The root change is accomplished with the pivot_root system call, which
is also available via the pivot_root utility (see pivot_root(8) man
page; pivot_root is distributed with util-linux version 2.10h or higher
[3]). pivot_root moves the current root to a directory under the new
is also available via the ``pivot_root`` utility (see :manpage:`pivot_root(8)`
man page; ``pivot_root`` is distributed with util-linux version 2.10h or higher
[#f3]_). ``pivot_root`` moves the current root to a directory under the new
root, and puts the new root at its place. The directory for the old root
must exist before calling pivot_root. Example:
must exist before calling ``pivot_root``. Example::
# cd /new-root
# mkdir initrd
# pivot_root . initrd
# cd /new-root
# mkdir initrd
# pivot_root . initrd
Now, the init process may still access the old root via its
executable, shared libraries, standard input/output/error, and its
current root directory. All these references are dropped by the
following command:
following command::
# exec chroot . what-follows <dev/console >dev/console 2>&1
# exec chroot . what-follows <dev/console >dev/console 2>&1
Where what-follows is a program under the new root, e.g. /sbin/init
Where what-follows is a program under the new root, e.g. ``/sbin/init``
If the new root file system will be used with udev and has no valid
/dev directory, udev must be initialized before invoking chroot in order
to provide /dev/console.
``/dev`` directory, udev must be initialized before invoking chroot in order
to provide ``/dev/console``.
Note: implementation details of pivot_root may change with time. In order
to ensure compatibility, the following points should be observed:
@ -244,13 +260,13 @@ to ensure compatibility, the following points should be observed:
- use relative paths for dev/console in the exec command
Now, the initrd can be unmounted and the memory allocated by the RAM
disk can be freed:
disk can be freed::
# umount /initrd
# blockdev --flushbufs /dev/ram0
# umount /initrd
# blockdev --flushbufs /dev/ram0
It is also possible to use initrd with an NFS-mounted root, see the
pivot_root(8) man page for details.
:manpage:`pivot_root(8)` man page for details.
Usage scenarios
@ -263,21 +279,21 @@ as follows:
1) system boots from floppy or other media with a minimal kernel
(e.g. support for RAM disks, initrd, a.out, and the Ext2 FS) and
loads initrd
2) /sbin/init determines what is needed to (1) mount the "real" root FS
2) ``/sbin/init`` determines what is needed to (1) mount the "real" root FS
(i.e. device type, device drivers, file system) and (2) the
distribution media (e.g. CD-ROM, network, tape, ...). This can be
done by asking the user, by auto-probing, or by using a hybrid
approach.
3) /sbin/init loads the necessary kernel modules
4) /sbin/init creates and populates the root file system (this doesn't
3) ``/sbin/init`` loads the necessary kernel modules
4) ``/sbin/init`` creates and populates the root file system (this doesn't
have to be a very usable system yet)
5) /sbin/init invokes pivot_root to change the root file system and
5) ``/sbin/init`` invokes ``pivot_root`` to change the root file system and
execs - via chroot - a program that continues the installation
6) the boot loader is installed
7) the boot loader is configured to load an initrd with the set of
modules that was used to bring up the system (e.g. /initrd can be
modules that was used to bring up the system (e.g. ``/initrd`` can be
modified, then unmounted, and finally, the image is written from
/dev/ram0 or /dev/rd/0 to a file)
``/dev/ram0`` or ``/dev/rd/0`` to a file)
8) now the system is bootable and additional installation tasks can be
performed
@ -290,7 +306,7 @@ different hardware configurations in a single administrative domain. In
such cases, it is desirable to generate only a small set of kernels
(ideally only one) and to keep the system-specific part of configuration
information as small as possible. In this case, a common initrd could be
generated with all the necessary modules. Then, only /sbin/init or a file
generated with all the necessary modules. Then, only ``/sbin/init`` or a file
read by it would have to be different.
A third scenario is more convenient recovery disks, because information
@ -301,9 +317,9 @@ auto-detection).
Last but not least, CD-ROM distributors may use it for better installation
from CD, e.g. by using a boot floppy and bootstrapping a bigger RAM disk
via initrd from CD; or by booting via a loader like LOADLIN or directly
via initrd from CD; or by booting via a loader like ``LOADLIN`` or directly
from the CD-ROM, and loading the RAM disk from CD without need of
floppies.
floppies.
Obsolete root change mechanism
@ -316,51 +332,52 @@ continued availability.
It works by mounting the "real" root device (i.e. the one set with rdev
in the kernel image or with root=... at the boot command line) as the
root file system when linuxrc exits. The initrd file system is then
unmounted, or, if it is still busy, moved to a directory /initrd, if
unmounted, or, if it is still busy, moved to a directory ``/initrd``, if
such a directory exists on the new root file system.
In order to use this mechanism, you do not have to specify the boot
command options root, init, or rw. (If specified, they will affect
the real root file system, not the initrd environment.)
If /proc is mounted, the "real" root device can be changed from within
linuxrc by writing the number of the new root FS device to the special
file /proc/sys/kernel/real-root-dev, e.g.
file /proc/sys/kernel/real-root-dev, e.g.::
# echo 0x301 >/proc/sys/kernel/real-root-dev
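The value written here is a device number. For small major and minor numbers it is the classic encoding of major in the high byte and minor in the low byte, so ``0x301`` is major 3, minor 1 (traditionally ``/dev/hda1``). A minimal sketch of that arithmetic, with ``legacy_devnum`` as a hypothetical helper name:

```shell
# legacy_devnum MAJOR MINOR
# Prints the old-style 16-bit device number, (MAJOR << 8) | MINOR,
# in hexadecimal. Only meaningful for majors and minors below 256.
# Hypothetical helper for illustration only.
legacy_devnum() {
    printf '0x%04x\n' $(( ($1 << 8) | $2 ))
}
```

By the same rule, major 1 and minor 0 (``/dev/ram0``) give ``0x0100``.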
Note that the mechanism is incompatible with NFS and similar file
systems.
This old, deprecated mechanism is commonly called "change_root", while
the new, supported mechanism is called "pivot_root".
This old, deprecated mechanism is commonly called ``change_root``, while
the new, supported mechanism is called ``pivot_root``.
Mixed change_root and pivot_root mechanism
------------------------------------------
In case you did not want to use root=/dev/ram0 to trigger the pivot_root
mechanism, you may create both /linuxrc and /sbin/init in your initrd image.
In case you did not want to use ``root=/dev/ram0`` to trigger the pivot_root
mechanism, you may create both ``/linuxrc`` and ``/sbin/init`` in your initrd
image.
/linuxrc would contain only the following:
``/linuxrc`` would contain only the following::
#! /bin/sh
mount -n -t proc proc /proc
echo 0x0100 >/proc/sys/kernel/real-root-dev
umount -n /proc
#! /bin/sh
mount -n -t proc proc /proc
echo 0x0100 >/proc/sys/kernel/real-root-dev
umount -n /proc
Once linuxrc exited, the kernel would mount again your initrd as root,
this time executing /sbin/init. Again, it would be the duty of this init
to build the right environment (maybe using the root= device passed on
the cmdline) before the final execution of the real /sbin/init.
this time executing ``/sbin/init``. Again, it would be the duty of this init
to build the right environment (maybe using the ``root=`` device passed on
the cmdline) before the final execution of the real ``/sbin/init``.
Resources
---------
[1] Almesberger, Werner; "Booting Linux: The History and the Future"
.. [#f1] Almesberger, Werner; "Booting Linux: The History and the Future"
http://www.almesberger.net/cv/papers/ols2k-9.ps.gz
[2] newlib package (experimental), with initrd example
http://sources.redhat.com/newlib/
[3] util-linux: Miscellaneous utilities for Linux
http://www.kernel.org/pub/linux/utils/util-linux/
.. [#f2] newlib package (experimental), with initrd example
https://www.sourceware.org/newlib/
.. [#f3] util-linux: Miscellaneous utilities for Linux
https://www.kernel.org/pub/linux/utils/util-linux/

@ -1,5 +1,5 @@
Java(tm) Binary Kernel Support for Linux v1.03
----------------------------------------------
Java(tm) Binary Kernel Support for Linux v1.03
----------------------------------------------
Linux beats them ALL! While all other OS's are TALKING about direct
support of Java Binaries in the OS, Linux is doing it!
@ -19,70 +19,82 @@ other program after you have done the following:
as the application itself).
2) You have to compile BINFMT_MISC either as a module or into
the kernel (CONFIG_BINFMT_MISC) and set it up properly.
the kernel (``CONFIG_BINFMT_MISC``) and set it up properly.
If you choose to compile it as a module, you will have
to insert it manually with modprobe/insmod, as kmod
cannot easily be supported with binfmt_misc.
cannot easily be supported with binfmt_misc.
Read the file 'binfmt_misc.txt' in this directory to know
more about the configuration process.
3) Add the following configuration items to binfmt_misc
(you should really have read binfmt_misc.txt now):
support for Java applications:
(you should really have read ``binfmt_misc.txt`` now):
support for Java applications::
':Java:M::\xca\xfe\xba\xbe::/usr/local/bin/javawrapper:'
support for executable Jar files:
support for executable Jar files::
':ExecutableJAR:E::jar::/usr/local/bin/jarwrapper:'
support for Java Applets:
support for Java Applets::
':Applet:E::html::/usr/bin/appletviewer:'
or the following, if you want to be more selective:
or the following, if you want to be more selective::
':Applet:M::<!--applet::/usr/bin/appletviewer:'
Of course you have to fix the path names. The path/file names given in this
document match the Debian 2.1 system. (i.e. jdk installed in /usr,
custom wrappers from this document in /usr/local)
document match the Debian 2.1 system. (i.e. jdk installed in ``/usr``,
custom wrappers from this document in ``/usr/local``)
Note, that for the more selective applet support you have to modify
existing html-files to contain <!--applet--> in the first line
('<' has to be the first character!) to let this work!
existing html-files to contain ``<!--applet-->`` in the first line
(``<`` has to be the first character!) to let this work!
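The ``M`` (magic) entries above match on the first bytes of the file; for Java class files these are the four bytes ``CA FE BA BE``. A quick way to check whether a given file would match is sketched below; ``has_java_magic`` is a hypothetical helper name and is independent of binfmt_misc itself.

```shell
# has_java_magic FILE
# Returns 0 if FILE starts with the bytes CA FE BA BE, i.e. the magic
# used in the ':Java:M::\xca\xfe\xba\xbe::...' entry above.
# Hypothetical helper for illustration only.
has_java_magic() {
    magic=$(od -An -tx1 -N4 "$1" | tr -d ' \n')
    [ "$magic" = "cafebabe" ]
}
```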
For the compiled Java programs you need a wrapper script like the
following (this is because Java is broken in case of the filename
handling), again fix the path names, both in the script and in the
above given configuration string.
You, too, need the little program after the script. Compile like
gcc -O2 -o javaclassname javaclassname.c
and stick it to /usr/local/bin.
You, too, need the little program after the script. Compile like::
gcc -O2 -o javaclassname javaclassname.c
and stick it to ``/usr/local/bin``.
Both the javawrapper shellscript and the javaclassname program
were supplied by Colin J. Watson <cjw44@cam.ac.uk>.
====================== Cut here ===================
#!/bin/bash
# /usr/local/bin/javawrapper - the wrapper for binfmt_misc/java
Javawrapper shell script:
if [ -z "$1" ]; then
.. code-block:: sh
#!/bin/bash
# /usr/local/bin/javawrapper - the wrapper for binfmt_misc/java
if [ -z "$1" ]; then
exec 1>&2
echo Usage: $0 class-file
exit 1
fi
fi
CLASS=$1
FQCLASS=`/usr/local/bin/javaclassname $1`
FQCLASSN=`echo $FQCLASS | sed -e 's/^.*\.\([^.]*\)$/\1/'`
FQCLASSP=`echo $FQCLASS | sed -e 's-\.-/-g' -e 's-^[^/]*$--' -e 's-/[^/]*$--'`
CLASS=$1
FQCLASS=`/usr/local/bin/javaclassname $1`
FQCLASSN=`echo $FQCLASS | sed -e 's/^.*\.\([^.]*\)$/\1/'`
FQCLASSP=`echo $FQCLASS | sed -e 's-\.-/-g' -e 's-^[^/]*$--' -e 's-/[^/]*$--'`
# for example:
# CLASS=Test.class
# FQCLASS=foo.bar.Test
# FQCLASSN=Test
# FQCLASSP=foo/bar
# for example:
# CLASS=Test.class
# FQCLASS=foo.bar.Test
# FQCLASSN=Test
# FQCLASSP=foo/bar
unset CLASSBASE
unset CLASSBASE
declare -i LINKLEVEL=0
declare -i LINKLEVEL=0
while :; do
while :; do
if [ "`basename $CLASS .class`" == "$FQCLASSN" ]; then
# See if this directory works straight off
cd -L `dirname $CLASS`
@ -119,9 +131,9 @@ while :; do
exit 1
fi
CLASS=`ls --color=no -l $CLASS | sed -e 's/^.* \([^ ]*\)$/\1/'`
done
done
if [ -z "$CLASSBASE" ]; then
if [ -z "$CLASSBASE" ]; then
if [ -z "$FQCLASSP" ]; then
GOODNAME=$FQCLASSN.class
else
@ -131,96 +143,97 @@ if [ -z "$CLASSBASE" ]; then
echo $0:
echo " $FQCLASS should be in a file called $GOODNAME"
exit 1
fi
fi
if ! echo $CLASSPATH | grep -q "^\(.*:\)*$CLASSBASE\(:.*\)*"; then
if ! echo $CLASSPATH | grep -q "^\(.*:\)*$CLASSBASE\(:.*\)*"; then
# class is not in CLASSPATH, so prepend dir of class to CLASSPATH
if [ -z "${CLASSPATH}" ] ; then
export CLASSPATH=$CLASSBASE
else
export CLASSPATH=$CLASSBASE:$CLASSPATH
fi
fi
fi
shift
/usr/bin/java $FQCLASS "$@"
====================== Cut here ===================
shift
/usr/bin/java $FQCLASS "$@"
javaclassname.c:
====================== Cut here ===================
/* javaclassname.c
*
* Extracts the class name from a Java class file; intended for use in a Java
* wrapper of the type supported by the binfmt_misc option in the Linux kernel.
*
* Copyright (C) 1999 Colin J. Watson <cjw44@cam.ac.uk>.
*
* This program is free software; you can redistribute it and/or modify
* it under the terms of the GNU General Public License as published by
* the Free Software Foundation; either version 2 of the License, or
* (at your option) any later version.
*
* This program is distributed in the hope that it will be useful,
* but WITHOUT ANY WARRANTY; without even the implied warranty of
* MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
* GNU General Public License for more details.
*
* You should have received a copy of the GNU General Public License
* along with this program; if not, write to the Free Software
* Foundation, Inc., 59 Temple Place, Suite 330, Boston, MA 02111-1307 USA
*/
.. code-block:: c
#include <stdlib.h>
#include <stdio.h>
#include <stdarg.h>
#include <sys/types.h>
/* javaclassname.c
*
* Extracts the class name from a Java class file; intended for use in a Java
* wrapper of the type supported by the binfmt_misc option in the Linux kernel.
*
* Copyright (C) 1999 Colin J. Watson <cjw44@cam.ac.uk>.
*
* This program is free software; you can redistribute it and/or modify
* it under the terms of the GNU General Public License as published by
* the Free Software Foundation; either version 2 of the License, or
* (at your option) any later version.
*
* This program is distributed in the hope that it will be useful,
* but WITHOUT ANY WARRANTY; without even the implied warranty of
* MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
* GNU General Public License for more details.
*
* You should have received a copy of the GNU General Public License
* along with this program; if not, write to the Free Software
* Foundation, Inc., 59 Temple Place, Suite 330, Boston, MA 02111-1307 USA
*/
/* From Sun's Java VM Specification, as tag entries in the constant pool. */
#include <stdlib.h>
#include <stdio.h>
#include <stdarg.h>
#include <sys/types.h>
#define CP_UTF8 1
#define CP_INTEGER 3
#define CP_FLOAT 4
#define CP_LONG 5
#define CP_DOUBLE 6
#define CP_CLASS 7
#define CP_STRING 8
#define CP_FIELDREF 9
#define CP_METHODREF 10
#define CP_INTERFACEMETHODREF 11
#define CP_NAMEANDTYPE 12
#define CP_METHODHANDLE 15
#define CP_METHODTYPE 16
#define CP_INVOKEDYNAMIC 18
/* From Sun's Java VM Specification, as tag entries in the constant pool. */
/* Define some commonly used error messages */
#define CP_UTF8 1
#define CP_INTEGER 3
#define CP_FLOAT 4
#define CP_LONG 5
#define CP_DOUBLE 6
#define CP_CLASS 7
#define CP_STRING 8
#define CP_FIELDREF 9
#define CP_METHODREF 10
#define CP_INTERFACEMETHODREF 11
#define CP_NAMEANDTYPE 12
#define CP_METHODHANDLE 15
#define CP_METHODTYPE 16
#define CP_INVOKEDYNAMIC 18
#define seek_error() error("%s: Cannot seek\n", program)
#define corrupt_error() error("%s: Class file corrupt\n", program)
#define eof_error() error("%s: Unexpected end of file\n", program)
#define utf8_error() error("%s: Only ASCII 1-255 supported\n", program);
/* Define some commonly used error messages */
char *program;
#define seek_error() error("%s: Cannot seek\n", program)
#define corrupt_error() error("%s: Class file corrupt\n", program)
#define eof_error() error("%s: Unexpected end of file\n", program)
#define utf8_error() error("%s: Only ASCII 1-255 supported\n", program);
long *pool;
char *program;
u_int8_t read_8(FILE *classfile);
u_int16_t read_16(FILE *classfile);
void skip_constant(FILE *classfile, u_int16_t *cur);
void error(const char *format, ...);
int main(int argc, char **argv);
long *pool;
/* Reads in an unsigned 8-bit integer. */
u_int8_t read_8(FILE *classfile)
{
u_int8_t read_8(FILE *classfile);
u_int16_t read_16(FILE *classfile);
void skip_constant(FILE *classfile, u_int16_t *cur);
void error(const char *format, ...);
int main(int argc, char **argv);
/* Reads in an unsigned 8-bit integer. */
u_int8_t read_8(FILE *classfile)
{
int b = fgetc(classfile);
if(b == EOF)
eof_error();
return (u_int8_t)b;
}
}
/* Reads in an unsigned 16-bit integer. */
u_int16_t read_16(FILE *classfile)
{
/* Reads in an unsigned 16-bit integer. */
u_int16_t read_16(FILE *classfile)
{
int b1, b2;
b1 = fgetc(classfile);
if(b1 == EOF)
@ -229,11 +242,11 @@ u_int16_t read_16(FILE *classfile)
if(b2 == EOF)
eof_error();
return (u_int16_t)((b1 << 8) | b2);
}
}
/* Reads in a value from the constant pool. */
void skip_constant(FILE *classfile, u_int16_t *cur)
{
/* Reads in a value from the constant pool. */
void skip_constant(FILE *classfile, u_int16_t *cur)
{
u_int16_t len;
int seekerr = 1;
pool[*cur] = ftell(classfile);
@ -270,19 +283,19 @@ void skip_constant(FILE *classfile, u_int16_t *cur)
}
if(seekerr)
seek_error();
}
}
void error(const char *format, ...)
{
void error(const char *format, ...)
{
va_list ap;
va_start(ap, format);
vfprintf(stderr, format, ap);
va_end(ap);
exit(1);
}
}
int main(int argc, char **argv)
{
int main(int argc, char **argv)
{
FILE *classfile;
u_int16_t cp_count, i, this_class, classinfo_ptr;
u_int8_t length;
@ -349,19 +362,19 @@ int main(int argc, char **argv)
free(pool);
fclose(classfile);
return 0;
}
====================== Cut here ===================
}
jarwrapper::
#!/bin/bash
# /usr/local/java/bin/jarwrapper - the wrapper for binfmt_misc/jar
java -jar $1
====================== Cut here ===================
#!/bin/bash
# /usr/local/java/bin/jarwrapper - the wrapper for binfmt_misc/jar
Now simply ``chmod +x`` the ``.class``, ``.jar`` and/or ``.html`` files you
want to execute.
java -jar $1
====================== Cut here ===================
Now simply chmod +x the .class, .jar and/or .html files you want to execute.
To add a Java program to your path best put a symbolic link to the main
.class file into /usr/bin (or another place you like) omitting the .class
extension. The directory containing the original .class file will be
@ -371,29 +384,36 @@ added to your CLASSPATH during execution.
To test your new setup, enter in the following simple Java app, and name
it "HelloWorld.java":
.. code-block:: java
class HelloWorld {
public static void main(String args[]) {
System.out.println("Hello World!");
}
}
Now compile the application with:
Now compile the application with::
javac HelloWorld.java
Set the executable permissions of the binary file, with:
Set the executable permissions of the binary file, with::
chmod 755 HelloWorld.class
And then execute it:
And then execute it::
./HelloWorld.class
To execute Java Jar files, simply chmod the *.jar files to include
the execution bit, then just do
To execute Java Jar files, simply chmod the ``*.jar`` files to include
the execution bit, then just do::
./Application.jar
To execute Java Applets, simply chmod the *.html files to include
the execution bit, then just do
To execute Java Applets, simply chmod the ``*.html`` files to include
the execution bit, then just do::
./Applet.html
@ -401,4 +421,3 @@ originally by Brian A. Lantz, brian@lantz.com
heavily edited for binfmt_misc by Richard Günther
new scripts by Colin J. Watson <cjw44@cam.ac.uk>
added executable Jar file support by Kurt Huwig <kurt@iku-netz.de>


@ -0,0 +1,209 @@
The kernel's command-line parameters
====================================
The following is a consolidated list of the kernel parameters as
implemented by the __setup(), core_param() and module_param() macros
and sorted into English Dictionary order (defined as ignoring all
punctuation and sorting digits before letters in a case insensitive
manner), and with descriptions where known.
The kernel parses parameters from the kernel command line up to "--";
if it doesn't recognize a parameter and it doesn't contain a '.', the
parameter gets passed to init: parameters with '=' go into init's
environment, others are passed as command line arguments to init.
Everything after "--" is passed as an argument to init.
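The routing rules above can be sketched in a few lines of Python. This is only an illustration, not the kernel's parser: the function name ``split_cmdline`` and the ``known`` set of recognized parameter names are hypothetical, words are split on whitespace (double-quoted values are ignored), and the '.' test is applied to the part before '='.

```python
def split_cmdline(cmdline, known):
    """Sort command-line words into kernel, module-style and init buckets.

    `known` is a hypothetical set of parameter names the kernel recognizes.
    """
    kernel, module_style, init_env, init_args = [], [], [], []
    past_dashes = False
    for word in cmdline.split():      # simplification: no quote handling
        if past_dashes:
            init_args.append(word)    # everything after "--" goes to init
            continue
        if word == "--":
            past_dashes = True
            continue
        name = word.split("=", 1)[0]
        if name in known:
            kernel.append(word)       # recognized kernel parameter
        elif "." in name:
            module_style.append(word) # module-prefixed, not passed to init
        elif "=" in word:
            init_env.append(word)     # goes into init's environment
        else:
            init_args.append(word)    # passed to init as an argument
    return kernel, module_style, init_env, init_args
```

For instance, with ``known = {"root"}``, a word such as ``TERM=linux`` ends up in init's environment while ``single`` becomes an argument to init.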
Module parameters can be specified in two ways: via the kernel command
line with a module name prefix, or via modprobe, e.g.::
(kernel command line) usbcore.blinkenlights=1
(modprobe command line) modprobe usbcore blinkenlights=1
Parameters for modules which are built into the kernel need to be
specified on the kernel command line. modprobe looks through the
kernel command line (/proc/cmdline) and collects module parameters
when it loads a module, so the kernel command line can be used for
loadable modules too.
Hyphens (dashes) and underscores are equivalent in parameter names, so::
log_buf_len=1M print-fatal-signals=1
can also be entered as::
log-buf-len=1M print_fatal_signals=1
Double-quotes can be used to protect spaces in values, e.g.::
param="spaces in here"
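As a rough sketch of the two rules above, the following Python treats hyphens and underscores in the parameter name as equivalent and keeps double-quoted spaces inside one value. The helper names are made up for this example, and ``shlex`` follows shell quoting rules, which only approximates what the kernel does.

```python
import shlex


def normalize_name(word):
    """Map hyphens to underscores in the name only; the value is untouched."""
    name, sep, value = word.partition("=")
    return name.replace("-", "_") + sep + value


def tokenize(cmdline):
    """Split on whitespace while keeping double-quoted spaces in one word."""
    return shlex.split(cmdline)
```

Only the name is normalized, so a value such as a path containing a hyphen is left as written.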
cpu lists:
----------
Some kernel parameters take a list of CPUs as a value, e.g. isolcpus,
nohz_full, irqaffinity, rcu_nocbs. The format of this list is:
<cpu number>,...,<cpu number>
or
<cpu number>-<cpu number>
(must be a positive range in ascending order)
or a mixture
<cpu number>,...,<cpu number>-<cpu number>
Note that for the special case of a range one can split the range into equal
sized groups and for each group use some amount from the beginning of that
group:
<cpu number>-<cpu number>:<used size>/<group size>
For example, one can add the following parameter to the command line:
isolcpus=1,2,10-20,100-2000:2/25
where the final item represents CPUs 100,101,125,126,150,151,...
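The list format described above, including the grouped-range form, can be expanded with a short Python function. This is an illustrative sketch, not the kernel's own parser; the name ``parse_cpu_list`` is an assumption, and input is expected to be well formed.

```python
def parse_cpu_list(text):
    """Expand a CPU list such as "1,2,10-20,100-2000:2/25" into sorted CPUs."""
    cpus = set()
    for item in text.split(","):
        rng, _, group = item.partition(":")
        first, dash, last = rng.partition("-")
        start = int(first)
        end = int(last) if dash else start
        if end < start:
            raise ValueError("range must be in ascending order: " + item)
        if group:
            used_text, _, size_text = group.partition("/")
            used, size = int(used_text), int(size_text)
        else:
            used, size = 1, 1         # plain range: every CPU is used
        # Walk the range one group at a time and take the first `used`
        # CPUs of each group, never going past the end of the range.
        for base in range(start, end + 1, size):
            cpus.update(range(base, min(base + used, end + 1)))
    return sorted(cpus)
```

Applied to ``100-2000:2/25``, the result starts with the CPUs named in the text: 100, 101, 125, 126, 150, 151.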
This document may not be entirely up to date and comprehensive. The command
"modinfo -p ${modulename}" shows a current list of all parameters of a loadable
module. Loadable modules, after being loaded into the running kernel, also
reveal their parameters in /sys/module/${modulename}/parameters/. Some of these
parameters may be changed at runtime by the command
``echo -n ${value} > /sys/module/${modulename}/parameters/${parm}``.
The parameters listed below are only valid if certain kernel build options were
enabled and if respective hardware is present. The text in square brackets at
the beginning of each description states the restrictions within which a
parameter is applicable::
ACPI ACPI support is enabled.
AGP AGP (Accelerated Graphics Port) is enabled.
ALSA ALSA sound support is enabled.
APIC APIC support is enabled.
APM Advanced Power Management support is enabled.
ARM ARM architecture is enabled.
AVR32 AVR32 architecture is enabled.
AX25 Appropriate AX.25 support is enabled.
BLACKFIN Blackfin architecture is enabled.
CLK Common clock infrastructure is enabled.
CMA Contiguous Memory Area support is enabled.
DRM Direct Rendering Management support is enabled.
DYNAMIC_DEBUG Build in debug messages and enable them at runtime
EDD BIOS Enhanced Disk Drive Services (EDD) is enabled
EFI EFI Partitioning (GPT) is enabled
EIDE EIDE/ATAPI support is enabled.
EVM Extended Verification Module
FB The frame buffer device is enabled.
FTRACE Function tracing enabled.
GCOV GCOV profiling is enabled.
HW Appropriate hardware is enabled.
IA-64 IA-64 architecture is enabled.
IMA Integrity measurement architecture is enabled.
IOSCHED More than one I/O scheduler is enabled.
IP_PNP IP DHCP, BOOTP, or RARP is enabled.
IPV6 IPv6 support is enabled.
ISAPNP ISA PnP code is enabled.
ISDN Appropriate ISDN support is enabled.
JOY Appropriate joystick support is enabled.
KGDB Kernel debugger support is enabled.
KVM Kernel Virtual Machine support is enabled.
LIBATA Libata driver is enabled
LP Printer support is enabled.
LOOP Loopback device support is enabled.
M68k M68k architecture is enabled.
These options have more detailed description inside of
Documentation/m68k/kernel-options.txt.
MDA MDA console support is enabled.
MIPS MIPS architecture is enabled.
MOUSE Appropriate mouse support is enabled.
MSI Message Signaled Interrupts (PCI).
MTD MTD (Memory Technology Device) support is enabled.
NET Appropriate network support is enabled.
NUMA NUMA support is enabled.
NFS Appropriate NFS support is enabled.
OSS OSS sound support is enabled.
PV_OPS A paravirtualized kernel is enabled.
PARIDE The ParIDE (parallel port IDE) subsystem is enabled.
PARISC The PA-RISC architecture is enabled.
PCI PCI bus support is enabled.
PCIE PCI Express support is enabled.
PCMCIA The PCMCIA subsystem is enabled.
PNP Plug & Play support is enabled.
PPC PowerPC architecture is enabled.
PPT Parallel port support is enabled.
PS2 Appropriate PS/2 support is enabled.
RAM RAM disk support is enabled.
S390 S390 architecture is enabled.
SCSI Appropriate SCSI support is enabled.
A lot of drivers have their options described inside
the Documentation/scsi/ sub-directory.
SECURITY Different security models are enabled.
SELINUX SELinux support is enabled.
APPARMOR AppArmor support is enabled.
SERIAL Serial support is enabled.
SH SuperH architecture is enabled.
SMP The kernel is an SMP kernel.
SPARC Sparc architecture is enabled.
SWSUSP Software suspend (hibernation) is enabled.
SUSPEND System suspend states are enabled.
TPM TPM drivers are enabled.
TS Appropriate touchscreen support is enabled.
UMS USB Mass Storage support is enabled.
USB USB support is enabled.
USBHID USB Human Interface Device support is enabled.
V4L Video For Linux support is enabled.
VMMIO Driver for memory mapped virtio devices is enabled.
VGA The VGA console has been enabled.
VT Virtual terminal support is enabled.
WDT Watchdog support is enabled.
XT IBM PC/XT MFM hard disk support is enabled.
X86-32 X86-32, aka i386 architecture is enabled.
X86-64 X86-64 architecture is enabled.
More X86-64 boot options can be found in
Documentation/x86/x86_64/boot-options.txt .
X86 Either 32-bit or 64-bit x86 (same as X86-32+X86-64)
X86_UV SGI UV support is enabled.
XEN Xen support is enabled
In addition, the following text indicates that the option::
BUGS= Relates to possible processor bugs on the said processor.
KNL Is a kernel start-up parameter.
BOOT Is a boot loader parameter.
Parameters denoted with BOOT are actually interpreted by the boot
loader, and have no meaning to the kernel directly.
Do not modify the syntax of boot loader parameters without extreme
need or coordination with <Documentation/x86/boot.txt>.
There are also arch-specific kernel-parameters not documented here.
See for example <Documentation/x86/x86_64/boot-options.txt>.
Note that ALL kernel parameters listed below are CASE SENSITIVE, and that
a trailing = on the name of any parameter states that that parameter will
be entered as an environment variable, whereas its absence indicates that
it will appear as a kernel argument readable via /proc/cmdline by programs
running once the system is up.
The number of kernel parameters is not limited, but the length of the
complete command line (parameters including spaces etc.) is limited to
a fixed number of characters. This limit depends on the architecture
and is between 256 and 4096 characters. It is defined in the file
./include/asm/setup.h as COMMAND_LINE_SIZE.
Finally, the [KMG] suffix is commonly described after a number of kernel
parameter values. These 'K', 'M', and 'G' letters represent the _binary_
multipliers 'Kilo', 'Mega', and 'Giga', equalling 2^10, 2^20, and 2^30
bytes respectively. Such letter suffixes can also be entirely omitted:
.. include:: kernel-parameters.txt
:literal:
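To make the [KMG] suffix arithmetic described above concrete, here is a minimal Python sketch. It assumes decimal numbers and upper-case suffixes only, as written in the text; the function name ``parse_size`` is hypothetical.

```python
# Binary multipliers for the K, M and G suffixes.
MULTIPLIERS = {"K": 2**10, "M": 2**20, "G": 2**30}


def parse_size(text):
    """Convert a value like "1M" or "512" to a plain number of bytes."""
    suffix = text[-1:]
    if suffix in MULTIPLIERS:
        return int(text[:-1]) * MULTIPLIERS[suffix]
    return int(text)                  # suffix omitted: value is used as-is
```

So ``log_buf_len=1M`` asks for 1048576 bytes, not 1000000.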
Todo
----
Add more DRM drivers.


@ -1,202 +1,3 @@
Kernel Parameters
~~~~~~~~~~~~~~~~~
The following is a consolidated list of the kernel parameters as
implemented by the __setup(), core_param() and module_param() macros
and sorted into English Dictionary order (defined as ignoring all
punctuation and sorting digits before letters in a case insensitive
manner), and with descriptions where known.
The kernel parses parameters from the kernel command line up to "--";
if it doesn't recognize a parameter and it doesn't contain a '.', the
parameter gets passed to init: parameters with '=' go into init's
environment, others are passed as command line arguments to init.
Everything after "--" is passed as an argument to init.
Module parameters can be specified in two ways: via the kernel command
line with a module name prefix, or via modprobe, e.g.:
(kernel command line) usbcore.blinkenlights=1
(modprobe command line) modprobe usbcore blinkenlights=1
Parameters for modules which are built into the kernel need to be
specified on the kernel command line. modprobe looks through the
kernel command line (/proc/cmdline) and collects module parameters
when it loads a module, so the kernel command line can be used for
loadable modules too.
Hyphens (dashes) and underscores are equivalent in parameter names, so
log_buf_len=1M print-fatal-signals=1
can also be entered as
log-buf-len=1M print_fatal_signals=1
Double-quotes can be used to protect spaces in values, e.g.:
param="spaces in here"
cpu lists:
----------
Some kernel parameters take a list of CPUs as a value, e.g. isolcpus,
nohz_full, irqaffinity, rcu_nocbs. The format of this list is:
<cpu number>,...,<cpu number>
or
<cpu number>-<cpu number>
(must be a positive range in ascending order)
or a mixture
<cpu number>,...,<cpu number>-<cpu number>
Note that for the special case of a range one can split the range into equal
sized groups and for each group use some amount from the beginning of that
group:
<cpu number>-<cpu number>:<used size>/<group size>
For example, one can add the following parameter to the command line:
isolcpus=1,2,10-20,100-2000:2/25
where the final item represents CPUs 100,101,125,126,150,151,...
This document may not be entirely up to date and comprehensive. The command
"modinfo -p ${modulename}" shows a current list of all parameters of a loadable
module. Loadable modules, after being loaded into the running kernel, also
reveal their parameters in /sys/module/${modulename}/parameters/. Some of these
parameters may be changed at runtime by the command
"echo -n ${value} > /sys/module/${modulename}/parameters/${parm}".
The parameters listed below are only valid if certain kernel build options were
enabled and if respective hardware is present. The text in square brackets at
the beginning of each description states the restrictions within which a
parameter is applicable:
ACPI ACPI support is enabled.
AGP AGP (Accelerated Graphics Port) is enabled.
ALSA ALSA sound support is enabled.
APIC APIC support is enabled.
APM Advanced Power Management support is enabled.
ARM ARM architecture is enabled.
AVR32 AVR32 architecture is enabled.
AX25 Appropriate AX.25 support is enabled.
BLACKFIN Blackfin architecture is enabled.
CLK Common clock infrastructure is enabled.
CMA Contiguous Memory Area support is enabled.
DRM Direct Rendering Management support is enabled.
DYNAMIC_DEBUG Build in debug messages and enable them at runtime
EDD BIOS Enhanced Disk Drive Services (EDD) is enabled
EFI EFI Partitioning (GPT) is enabled
EIDE EIDE/ATAPI support is enabled.
EVM Extended Verification Module
FB The frame buffer device is enabled.
FTRACE Function tracing enabled.
GCOV GCOV profiling is enabled.
HW Appropriate hardware is enabled.
IA-64 IA-64 architecture is enabled.
IMA Integrity measurement architecture is enabled.
IOSCHED More than one I/O scheduler is enabled.
IP_PNP IP DHCP, BOOTP, or RARP is enabled.
IPV6 IPv6 support is enabled.
ISAPNP ISA PnP code is enabled.
ISDN Appropriate ISDN support is enabled.
JOY Appropriate joystick support is enabled.
KGDB Kernel debugger support is enabled.
KVM Kernel Virtual Machine support is enabled.
LIBATA Libata driver is enabled
LP Printer support is enabled.
LOOP Loopback device support is enabled.
M68k M68k architecture is enabled.
These options have more detailed description inside of
Documentation/m68k/kernel-options.txt.
MDA MDA console support is enabled.
MIPS MIPS architecture is enabled.
MOUSE Appropriate mouse support is enabled.
MSI Message Signaled Interrupts (PCI).
MTD MTD (Memory Technology Device) support is enabled.
NET Appropriate network support is enabled.
NUMA NUMA support is enabled.
NFS Appropriate NFS support is enabled.
OSS OSS sound support is enabled.
PV_OPS A paravirtualized kernel is enabled.
PARIDE The ParIDE (parallel port IDE) subsystem is enabled.
PARISC The PA-RISC architecture is enabled.
PCI PCI bus support is enabled.
PCIE PCI Express support is enabled.
PCMCIA The PCMCIA subsystem is enabled.
PNP Plug & Play support is enabled.
PPC PowerPC architecture is enabled.
PPT Parallel port support is enabled.
PS2 Appropriate PS/2 support is enabled.
RAM RAM disk support is enabled.
S390 S390 architecture is enabled.
SCSI Appropriate SCSI support is enabled.
A lot of drivers have their options described inside
the Documentation/scsi/ sub-directory.
SECURITY Different security models are enabled.
SELINUX SELinux support is enabled.
APPARMOR AppArmor support is enabled.
SERIAL Serial support is enabled.
SH SuperH architecture is enabled.
SMP The kernel is an SMP kernel.
SPARC Sparc architecture is enabled.
SWSUSP Software suspend (hibernation) is enabled.
SUSPEND System suspend states are enabled.
TPM TPM drivers are enabled.
TS Appropriate touchscreen support is enabled.
UMS USB Mass Storage support is enabled.
USB USB support is enabled.
USBHID USB Human Interface Device support is enabled.
V4L Video For Linux support is enabled.
VMMIO Driver for memory mapped virtio devices is enabled.
VGA The VGA console has been enabled.
VT Virtual terminal support is enabled.
WDT Watchdog support is enabled.
XT IBM PC/XT MFM hard disk support is enabled.
X86-32 X86-32, aka i386 architecture is enabled.
X86-64 X86-64 architecture is enabled.
More X86-64 boot options can be found in
Documentation/x86/x86_64/boot-options.txt .
X86 Either 32-bit or 64-bit x86 (same as X86-32+X86-64)
X86_UV SGI UV support is enabled.
XEN Xen support is enabled
In addition, the following text indicates that the option:
BUGS= Relates to possible processor bugs on the said processor.
KNL Is a kernel start-up parameter.
BOOT Is a boot loader parameter.
Parameters denoted with BOOT are actually interpreted by the boot
loader, and have no meaning to the kernel directly.
Do not modify the syntax of boot loader parameters without extreme
need or coordination with <Documentation/x86/boot.txt>.
There are also arch-specific kernel-parameters not documented here.
See for example <Documentation/x86/x86_64/boot-options.txt>.
Note that ALL kernel parameters listed below are CASE SENSITIVE, and that
a trailing = on the name of any parameter states that that parameter will
be entered as an environment variable, whereas its absence indicates that
it will appear as a kernel argument readable via /proc/cmdline by programs
running once the system is up.
The number of kernel parameters is not limited, but the length of the
complete command line (parameters including spaces etc.) is limited to
a fixed number of characters. This limit depends on the architecture
and is between 256 and 4096 characters. It is defined in the file
./include/asm/setup.h as COMMAND_LINE_SIZE.
Finally, the [KMG] suffix is commonly described after a number of kernel
parameter values. These 'K', 'M', and 'G' letters represent the _binary_
multipliers 'Kilo', 'Mega', and 'Giga', equalling 2^10, 2^20, and 2^30
bytes respectively. Such letter suffixes can also be entirely omitted.
acpi= [HW,ACPI,X86,ARM64]
Advanced Configuration and Power Interface
Format: { force | on | off | strict | noirq | rsdt |
@ -811,7 +612,7 @@ bytes respectively. Such letter suffixes can also be entirely omitted.
bits, and "f" is flow control ("r" for RTS or
omit it). Default is "9600n8".
See Documentation/serial-console.txt for more
See Documentation/admin-guide/serial-console.rst for more
information. See
Documentation/networking/netconsole.txt for an
alternative.
@ -2235,7 +2036,7 @@ bytes respectively. Such letter suffixes can also be entirely omitted.
mce=option [X86-64] See Documentation/x86/x86_64/boot-options.txt
md= [HW] RAID subsystems devices and level
See Documentation/md.txt.
See Documentation/admin-guide/md.rst.
mdacon= [MDA]
Format: <first>,<last>
@ -2545,7 +2346,7 @@ bytes respectively. Such letter suffixes can also be entirely omitted.
will be sent.
The default is to send the implementation identification
information.
nfs.recover_lost_locks =
[NFSv4] Attempt to recover locks that were lost due
to a lease timeout on the server. Please note that
@ -3235,6 +3036,12 @@ bytes respectively. Such letter suffixes can also be entirely omitted.
may be specified.
Format: <port>,<port>....
powersave=off [PPC] This option disables power saving features.
It specifically disables cpuidle and sets the
platform machine description specific power_save
function to NULL. On Idle the CPU just reduces
execution priority.
ppc_strict_facility_enable
[PPC] This option catches any kernel floating point,
Altivec, VSX and SPE outside of regions specifically
@ -3318,7 +3125,7 @@ bytes respectively. Such letter suffixes can also be entirely omitted.
r128= [HW,DRM]
raid= [HW,RAID]
See Documentation/md.txt.
See Documentation/admin-guide/md.rst.
ramdisk_size= [RAM] Sizes of RAM disks in kilobytes
See Documentation/blockdev/ramdisk.txt.
@ -4197,7 +4004,7 @@ bytes respectively. Such letter suffixes can also be entirely omitted.
See also Documentation/input/joystick-parport.txt
udbg-immortal [PPC] When debugging early kernel crashes that
happen after console_init() and before a proper
happen after console_init() and before a proper
console driver takes over, this boot option might
help "seeing" what's going on.
@ -4564,9 +4371,3 @@ bytes respectively. Such letter suffixes can also be entirely omitted.
xirc2ps_cs= [NET,PCMCIA]
Format:
<irq>,<irq_mask>,<io>,<full_duplex>,<do_sound>,<lockup_hack>[,<irq2>[,<irq3>[,<irq4>]]]
______________________________________________________________________
TODO:
Add more DRM drivers.


@ -1,42 +1,77 @@
Tools that manage md devices can be found at
http://www.kernel.org/pub/linux/utils/raid/
RAID arrays
===========
Boot time assembly of RAID arrays
---------------------------------
Tools that manage md devices can be found at
http://www.kernel.org/pub/linux/utils/raid/
You can boot with your md device with the following kernel command
lines:
for old raid arrays without persistent superblocks:
for old raid arrays without persistent superblocks::
md=<md device no.>,<raid level>,<chunk size factor>,<fault level>,dev0,dev1,...,devn
for raid arrays with persistent superblocks
for raid arrays with persistent superblocks::
md=<md device no.>,dev0,dev1,...,devn
or, to assemble a partitionable array:
or, to assemble a partitionable array::
md=d<md device no.>,dev0,dev1,...,devn
md device no. = the number of the md device ...
0 means md0,
1 md1,
2 md2,
3 md3,
4 md4
raid level = -1 linear mode
0 striped mode
other modes are only supported with persistent super blocks
``md device no.``
+++++++++++++++++
chunk size factor = (raid-0 and raid-1 only)
Set the chunk size as 4k << n.
fault level = totally ignored
dev0-devn: e.g. /dev/hda1,/dev/hdc1,/dev/sda1,/dev/sdb1
A possible loadlin line (Harald Hoyer <HarryH@Royal.Net>) looks like this:
The number of the md device
e:\loadlin\loadlin e:\zimage root=/dev/md0 md=0,0,4,0,/dev/hdb2,/dev/hdc3 ro
================= =========
``md device no.`` device
================= =========
0 md0
1 md1
2 md2
3 md3
4 md4
================= =========
``raid level``
++++++++++++++
level of the RAID array
=============== =============
``raid level`` level
=============== =============
-1 linear mode
0 striped mode
=============== =============
other modes are only supported with persistent super blocks
``chunk size factor``
+++++++++++++++++++++
(raid-0 and raid-1 only)
Set the chunk size as 4k << n.
``fault level``
+++++++++++++++
Totally ignored
``dev0`` to ``devn``
++++++++++++++++++++
e.g. ``/dev/hda1``, ``/dev/hdc1``, ``/dev/sda1``, ``/dev/sdb1``
A possible loadlin line (Harald Hoyer <HarryH@Royal.Net>) looks like this::
e:\loadlin\loadlin e:\zimage root=/dev/md0 md=0,0,4,0,/dev/hdb2,/dev/hdc3 ro
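The three ``md=`` forms above can be told apart with a small amount of parsing. The Python below is a sketch under stated assumptions: ``parse_md`` is a made-up name, the old form is recognized by its second field being an integer (a heuristic chosen for this example), and "4k" is taken to mean 4096 bytes.

```python
def parse_md(value):
    """Parse the part after "md=" into a dictionary of its fields."""
    fields = value.split(",")
    first = fields[0]
    partitionable = first.startswith("d")   # md=d<n>,... form
    result = {
        "minor": int(first[1:] if partitionable else first),
        "partitionable": partitionable,
    }
    rest = fields[1:]
    try:
        level = int(rest[0])                # old form has a numeric raid level
    except (ValueError, IndexError):
        result["devices"] = rest            # persistent-superblock forms
        return result
    factor = int(rest[1])
    result["level"] = level                 # -1 linear, 0 striped
    result["chunk_bytes"] = 4096 << factor  # chunk size is 4k << n
    result["devices"] = rest[3:]            # rest[2] is the ignored fault level
    return result
```

For the loadlin line above, ``md=0,0,4,0,/dev/hdb2,/dev/hdc3`` describes md0 in striped mode with a chunk size of 4096 << 4 = 65536 bytes.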
Boot time autodetection of RAID arrays
@ -45,10 +80,10 @@ Boot time autodetection of RAID arrays
When md is compiled into the kernel (not as module), partitions of
type 0xfd are scanned and automatically assembled into RAID arrays.
This autodetection may be suppressed with the kernel parameter
"raid=noautodetect". As of kernel 2.6.9, only drives with a type 0
``raid=noautodetect``. As of kernel 2.6.9, only drives with a type 0
superblock can be autodetected and run at boot time.
The kernel parameter "raid=partitionable" (or "raid=part") means
The kernel parameter ``raid=partitionable`` (or ``raid=part``) means
that all auto-detected arrays are assembled as partitionable.
Boot time assembly of degraded/dirty arrays
@ -56,22 +91,23 @@ Boot time assembly of degraded/dirty arrays
If a raid5 or raid6 array is both dirty and degraded, it could have
undetectable data corruption. This is because the fact that it is
'dirty' means that the parity cannot be trusted, and the fact that it
``dirty`` means that the parity cannot be trusted, and the fact that it
is degraded means that some datablocks are missing and cannot reliably
be reconstructed (due to no parity).
For this reason, md will normally refuse to start such an array. This
requires the sysadmin to take action to explicitly start the array
despite possible corruption. This is normally done with
despite possible corruption. This is normally done with::
mdadm --assemble --force ....
This option is not really available if the array has the root
filesystem on it. In order to support booting from such an
array, md supports a module parameter "start_dirty_degraded" which,
array, md supports a module parameter ``start_dirty_degraded`` which,
when set to 1, bypasses the checks and allows dirty degraded
arrays to be started.
So, to boot with a root filesystem of a dirty degraded raid[56], use
So, to boot with a root filesystem of a dirty degraded raid 5 or 6, use::
md-mod.start_dirty_degraded=1
@ -80,30 +116,30 @@ Superblock formats
------------------
The md driver can support a variety of different superblock formats.
Currently, it supports superblock formats "0.90.0" and the "md-1" format
Currently, it supports superblock formats ``0.90.0`` and the ``md-1`` format
introduced in the 2.5 development series.
The kernel will autodetect which format superblock is being used.
Superblock format '0' is treated differently to others for legacy
Superblock format ``0`` is treated differently to others for legacy
reasons - it is the original superblock format.
General Rules - apply for all superblock formats
------------------------------------------------
An array is 'created' by writing appropriate superblocks to all
An array is ``created`` by writing appropriate superblocks to all
devices.
It is 'assembled' by associating each of these devices with a
It is ``assembled`` by associating each of these devices with a
particular md virtual device. Once it is completely assembled, it can
be accessed.
An array should be created by a user-space tool. This will write
superblocks to all devices. It will usually mark the array as
'unclean', or with some devices missing so that the kernel md driver
can create appropriate redundancy (copying in raid1, parity
calculation in raid4/5).
``unclean``, or with some devices missing so that the kernel md driver
can create appropriate redundancy (copying in raid 1, parity
calculation in raid 4/5).
When an array is assembled, it is first initialized with the
SET_ARRAY_INFO ioctl. This contains, in particular, a major and minor
@ -126,13 +162,12 @@ Devices that have failed or are not yet active can be detached from an
array using HOT_REMOVE_DISK.
Specific Rules that apply to format-0 super block arrays, and
arrays with no superblock (non-persistent).
-------------------------------------------------------------
Specific Rules that apply to format-0 super block arrays, and arrays with no superblock (non-persistent)
--------------------------------------------------------------------------------------------------------
An array can be 'created' by describing the array (level, chunksize
etc) in a SET_ARRAY_INFO ioctl. This must have major_version==0 and
raid_disks != 0.
An array can be ``created`` by describing the array (level, chunksize
etc) in a SET_ARRAY_INFO ioctl. This must have ``major_version==0`` and
``raid_disks != 0``.
Then uninitialized devices can be added with ADD_NEW_DISK. The
structure passed to ADD_NEW_DISK must specify the state of the device
@ -142,24 +177,26 @@ Once started with RUN_ARRAY, uninitialized spares can be added with
HOT_ADD_DISK.
MD devices in sysfs
-------------------
md devices appear in sysfs (/sys) as regular block devices,
e.g.
md devices appear in sysfs (``/sys``) as regular block devices,
e.g.::
/sys/block/md0
Each 'md' device will contain a subdirectory called 'md' which
Each ``md`` device will contain a subdirectory called ``md`` which
contains further md-specific information about the device.
All md devices contain:
level
a text file indicating the 'raid level'. e.g. raid0, raid1,
a text file indicating the ``raid level``. e.g. raid0, raid1,
raid5, linear, multipath, faulty.
If no raid level has been set yet (array is still being
assembled), the value will reflect whatever has been written
to it, which may be a name like the above, or may be a number
such as '0', '5', etc.
such as ``0``, ``5``, etc.
raid_disks
a text file with a simple number indicating the number of devices
@ -172,10 +209,10 @@ All md devices contain:
A change to this attribute will not be permitted if it would
reduce the size of the array. To reduce the number of drives
in an e.g. raid5, the array size must first be reduced by
setting the 'array_size' attribute.
setting the ``array_size`` attribute.
chunk_size
This is the size in bytes for 'chunks' and is only relevant to
This is the size in bytes for ``chunks`` and is only relevant to
raid levels that involve striping (0,4,5,6,10). The address space
of the array is conceptually divided into chunks and consecutive
chunks are striped onto neighbouring devices.
@ -183,7 +220,7 @@ All md devices contain:
of 2. This can only be set while assembling an array
layout
The "layout" for the array for the particular level. This is
The ``layout`` for the array for the particular level. This is
simply a number that is interpreted differently by different
levels. It can be written while assembling an array.
@ -193,22 +230,24 @@ All md devices contain:
devices. Writing a number (in Kilobytes) which is less than
the available size will set the size. Any reconfiguration of the
array (e.g. adding devices) will not cause the size to change.
Writing the word 'default' will cause the effective size of the
Writing the word ``default`` will cause the effective size of the
array to be whatever size is actually available based on
'level', 'chunk_size' and 'component_size'.
``level``, ``chunk_size`` and ``component_size``.
This can be used to reduce the size of the array before reducing
the number of devices in a raid4/5/6, or to support external
metadata formats which mandate such clipping.
reshape_position
This is either "none" or a sector number within the devices of
the array where "reshape" is up to. If this is set, the three
This is either ``none`` or a sector number within the devices of
the array where ``reshape`` is up to. If this is set, the three
attributes mentioned above (raid_disks, chunk_size, layout) can
potentially have 2 values, an old and a new value. If these
values differ, reading the attribute returns
values differ, reading the attribute returns::
new (old)
and writing will effect the 'new' value, leaving the 'old'
and writing will effect the ``new`` value, leaving the ``old``
unchanged.
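As a rough illustration of the ``new (old)`` form described above, the following Python sketch splits such a value into its two parts. The function name ``parse_reshape_attr`` is hypothetical, and the sketch assumes the attribute text has already been read from sysfs; when only a single value is present, it is returned as both new and old.

```python
import re


def parse_reshape_attr(text):
    """Split an md attribute value into (new, old) strings.

    Accepts either "value" or "new (old)", as described for
    raid_disks, chunk_size and layout during a reshape.
    """
    m = re.fullmatch(r"\s*(\S+)(?:\s+\((\S+)\))?\s*", text)
    if m is None:
        raise ValueError("unrecognised attribute value: %r" % text)
    new = m.group(1)
    # With no "(old)" part, the old and new values are the same.
    old = m.group(2) if m.group(2) is not None else new
    return new, old
```

For example, ``parse_reshape_attr("4 (3)")`` yields ``("4", "3")``.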
component_size
@ -223,9 +262,9 @@ All md devices contain:
metadata_version
This indicates the format that is being used to record metadata
about the array. It can be 0.90 (traditional format), 1.0, 1.1,
1.2 (newer format in varying locations) or "none" indicating that
1.2 (newer format in varying locations) or ``none`` indicating that
the kernel isn't managing metadata at all.
Alternately it can be "external:" followed by a string which
Alternately it can be ``external:`` followed by a string which
is set by user-space. This indicates that metadata is managed
by a user-space program. Any device failure or other event that
requires a metadata update will cause array activity to be
@ -233,9 +272,9 @@ All md devices contain:
resync_start
The point at which resync should start. If no resync is needed,
this will be a very large number (or 'none' since 2.6.30-rc1). At
this will be a very large number (or ``none`` since 2.6.30-rc1). At
array creation it will default to 0, though starting the array as
'clean' will set it much larger.
``clean`` will set it much larger.
new_dev
This file can be written but not read. The value written should
@ -246,10 +285,10 @@ All md devices contain:
safe_mode_delay
When an md array has seen no write requests for a certain period
of time, it will be marked as 'clean'. When another write
request arrives, the array is marked as 'dirty' before the write
commences. This is known as 'safe_mode'.
The 'certain period' is controlled by this file which stores the
of time, it will be marked as ``clean``. When another write
request arrives, the array is marked as ``dirty`` before the write
commences. This is known as ``safe_mode``.
The ``certain period`` is controlled by this file which stores the
period as a number of seconds. The default is 200msec (0.200).
Writing a value of 0 disables safemode.
@ -260,38 +299,50 @@ All md devices contain:
cannot be explicitly set, and some transitions are not allowed.
Select/poll works on this file. All changes except between
active_idle and active (which can be frequent and are not
very interesting) are notified. active->active_idle is
reported if the metadata is externally managed.
active_idle and active (which can be frequent and are not
very interesting) are notified. active->active_idle is
reported if the metadata is externally managed.
clear
No devices, no size, no level
Writing is equivalent to STOP_ARRAY ioctl
inactive
May have some settings, but array is not active
all IO results in error
all IO results in error
When written, doesn't tear down array, but just stops it
suspended (not supported yet)
All IO requests will block. The array can be reconfigured.
Writing this, if accepted, will block until array is quiescent
readonly
no resync can happen. no superblocks get written.
write requests fail
read-auto
like readonly, but behaves like 'clean' on a write request.
clean - no pending writes, but otherwise active.
Write requests fail
read-auto
like readonly, but behaves like ``clean`` on a write request.
clean
no pending writes, but otherwise active.
When written to inactive array, starts without resync
If a write request arrives then
if metadata is known, mark 'dirty' and switch to 'active'.
if not known, block and switch to write-pending
if metadata is known, mark ``dirty`` and switch to ``active``.
if not known, block and switch to write-pending
If written to an active array that has pending writes, then fails.
active
fully active: IO and resync can be happening.
When written to inactive array, starts with resync
write-pending
clean, but writes are blocked waiting for 'active' to be written.
clean, but writes are blocked waiting for ``active`` to be written.
active-idle
like active, but no writes have been seen for a while (safe_mode_delay).
@ -299,57 +350,71 @@ All md devices contain:
bitmap/location
This indicates where the write-intent bitmap for the array is
stored.
It can be one of "none", "file" or "[+-]N".
"file" may later be extended to "file:/file/name"
"[+-]N" means that many sectors from the start of the metadata.
This is replicated on all devices. For arrays with externally
managed metadata, the offset is from the beginning of the
device.
It can be one of ``none``, ``file`` or ``[+-]N``.
``file`` may later be extended to ``file:/file/name``
``[+-]N`` means that many sectors from the start of the metadata.
This is replicated on all devices. For arrays with externally
managed metadata, the offset is from the beginning of the
device.
bitmap/chunksize
The size, in bytes, of the chunk which will be represented by a
single bit. For RAID456, it is a portion of an individual
device. For RAID10, it is a portion of the array. For RAID1, it
is both (they come to the same thing).
bitmap/time_base
The time, in seconds, between looking for bits in the bitmap to
be cleared. In the current implementation, a bit will be cleared
between 2 and 3 times "time_base" after all the covered blocks
between 2 and 3 times ``time_base`` after all the covered blocks
are known to be in-sync.
bitmap/backlog
When write-mostly devices are active in a RAID1, write requests
to those devices proceed in the background - the filesystem (or
other user of the device) does not have to wait for them.
'backlog' sets a limit on the number of concurrent background
``backlog`` sets a limit on the number of concurrent background
writes. If there are more than this, new writes will be
synchronous.
bitmap/metadata
This can be either 'internal' or 'external'.
'internal' is the default and means the metadata for the bitmap
is stored in the first 256 bytes of the allocated space and is
managed by the md module.
'external' means that bitmap metadata is managed externally to
the kernel (i.e. by some userspace program)
This can be either ``internal`` or ``external``.
``internal``
is the default and means the metadata for the bitmap
is stored in the first 256 bytes of the allocated space and is
managed by the md module.
``external``
means that bitmap metadata is managed externally to
the kernel (i.e. by some userspace program)
bitmap/can_clear
This is either 'true' or 'false'. If 'true', then bits in the
This is either ``true`` or ``false``. If ``true``, then bits in the
bitmap will be cleared when the corresponding blocks are thought
to be in-sync. If 'false', bits will never be cleared.
This is automatically set to 'false' if a write happens on a
to be in-sync. If ``false``, bits will never be cleared.
This is automatically set to ``false`` if a write happens on a
degraded array, or if the array becomes degraded during a write.
When metadata is managed externally, it should be set to true
once the array becomes non-degraded, and this fact has been
recorded in the metadata.
As component devices are added to an md array, they appear in the 'md'
directory as new directories named
As component devices are added to an md array, they appear in the ``md``
directory as new directories named::
dev-XXX
where XXX is a name that the kernel knows for the device, e.g. hdb1.
where ``XXX`` is a name that the kernel knows for the device, e.g. hdb1.
Each directory contains:
block
a symlink to the block device in /sys/block, e.g.
a symlink to the block device in /sys/block, e.g.::
/sys/block/md0/md/dev-hdb1/block -> ../../../../block/hdb/hdb1
super
@ -358,51 +423,83 @@ Each directory contains:
state
A file recording the current state of the device in the array
which can be a comma separated list of
faulty - device has been kicked from active use due to
a detected fault, or it has unacknowledged bad
blocks
in_sync - device is a fully in-sync member of the array
writemostly - device will only be subject to read
requests if there are no other options.
This applies only to raid1 arrays.
blocked - device has failed, and the failure hasn't been
acknowledged yet by the metadata handler.
Writes that would write to this device if
it were not faulty are blocked.
spare - device is working, but not a full member.
This includes spares that are in the process
of being recovered to
write_error - device has ever seen a write error.
want_replacement - device is (mostly) working but probably
should be replaced, either due to errors or
due to user request.
replacement - device is a replacement for another active
device with same raid_disk.
which can be a comma separated list of:
faulty
device has been kicked from active use due to
a detected fault, or it has unacknowledged bad
blocks
in_sync
device is a fully in-sync member of the array
writemostly
device will only be subject to read
requests if there are no other options.
This applies only to raid1 arrays.
blocked
device has failed, and the failure hasn't been
acknowledged yet by the metadata handler.
Writes that would write to this device if
it were not faulty are blocked.
spare
device is working, but not a full member.
This includes spares that are in the process
of being recovered to
write_error
device has ever seen a write error.
want_replacement
device is (mostly) working but probably
should be replaced, either due to errors or
due to user request.
replacement
device is a replacement for another active
device with same raid_disk.
This list may grow in future.
This can be written to.
Writing "faulty" simulates a failure on the device.
Writing "remove" removes the device from the array.
Writing "writemostly" sets the writemostly flag.
Writing "-writemostly" clears the writemostly flag.
Writing "blocked" sets the "blocked" flag.
Writing "-blocked" clears the "blocked" flags and allows writes
to complete and possibly simulates an error.
Writing "in_sync" sets the in_sync flag.
Writing "write_error" sets writeerrorseen flag.
Writing "-write_error" clears writeerrorseen flag.
Writing "want_replacement" is allowed at any time except to a
replacement device or a spare. It sets the flag.
Writing "-want_replacement" is allowed at any time. It clears
the flag.
Writing "replacement" or "-replacement" is only allowed before
starting the array. It sets or clears the flag.
Writing ``faulty`` simulates a failure on the device.
Writing ``remove`` removes the device from the array.
Writing ``writemostly`` sets the writemostly flag.
Writing ``-writemostly`` clears the writemostly flag.
Writing ``blocked`` sets the ``blocked`` flag.
Writing ``-blocked`` clears the ``blocked`` flags and allows writes
to complete and possibly simulates an error.
Writing ``in_sync`` sets the in_sync flag.
Writing ``write_error`` sets writeerrorseen flag.
Writing ``-write_error`` clears writeerrorseen flag.
Writing ``want_replacement`` is allowed at any time except to a
replacement device or a spare. It sets the flag.
Writing ``-want_replacement`` is allowed at any time. It clears
the flag.
Writing ``replacement`` or ``-replacement`` is only allowed before
starting the array. It sets or clears the flag.
This file responds to select/poll. Any change to 'faulty'
or 'blocked' causes an event.
This file responds to select/poll. Any change to ``faulty``
or ``blocked`` causes an event.
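The comma separated ``state`` value described above could be handled along these lines. This is a minimal sketch: the helper names ``parse_state`` and ``state_command`` are hypothetical, and the text is assumed to have been read from a ``dev-XXX/state`` file already.

```python
def parse_state(text):
    """Return the set of flags found in a comma separated state string."""
    return {flag.strip() for flag in text.split(",") if flag.strip()}


def state_command(flag, enable):
    """Build the word to write to the state file.

    Per the text above, a flag such as "writemostly" is set by writing
    its name and cleared by writing the name with a leading "-".
    """
    return flag if enable else "-" + flag
```

Because the list of flags may grow in future, the parser keeps unknown words rather than rejecting them.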
errors
An approximate count of read errors that have been detected on
@ -417,9 +514,9 @@ Each directory contains:
slot
This gives the role that the device has in the array. It will
either be 'none' if the device is not active in the array
either be ``none`` if the device is not active in the array
(i.e. is a spare or has failed) or an integer less than the
'raid_disks' number for the array indicating which position
``raid_disks`` number for the array indicating which position
it currently fills. This can only be set while assembling an
array. A device for which this is set is assumed to be working.
@ -437,7 +534,7 @@ Each directory contains:
written, it will be rejected.
recovery_start
When the device is not 'in_sync', this records the number of
When the device is not ``in_sync``, this records the number of
sectors from the start of the device which are known to be
correct. This is normally zero, but during a recovery
operation it will steadily increase, and if the recovery is
@ -447,21 +544,21 @@ Each directory contains:
This can be set whenever the device is not an active member of
the array, either before the array is activated, or before
the 'slot' is set.
the ``slot`` is set.
Setting this to ``none`` is equivalent to setting ``in_sync``.
Setting to any other value also clears the ``in_sync`` flag.
Setting this to 'none' is equivalent to setting 'in_sync'.
Setting to any other value also clears the 'in_sync' flag.
bad_blocks
This gives the list of all known bad blocks in the form of
start address and length (in sectors respectively). If output
is too big to fit in a page, it will be truncated. Writing
"sector length" to this file adds new acknowledged (i.e.
``sector length`` to this file adds new acknowledged (i.e.
recorded to disk safely) bad blocks.
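A small Python sketch of reading the ``bad_blocks`` list follows. It assumes one ``start length`` pair per line, which the text above does not state explicitly, and the name ``parse_bad_blocks`` is hypothetical. Lines that do not hold exactly two fields are skipped, since the output may be truncated at a page boundary.

```python
def parse_bad_blocks(text):
    """Parse "start length" pairs (in sectors) into a list of tuples."""
    ranges = []
    for line in text.splitlines():
        fields = line.split()
        if len(fields) != 2:
            # Possibly a truncated final line; ignore it.
            continue
        ranges.append((int(fields[0]), int(fields[1])))
    return ranges
```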
unacknowledged_bad_blocks
This gives the list of known-but-not-yet-saved-to-disk bad
blocks in the same form of 'bad_blocks'. If output is too big
blocks in the same form of ``bad_blocks``. If output is too big
to fit in a page, it will be truncated. Writing to this file
adds bad blocks without acknowledging them. This is largely
for testing.
@ -469,16 +566,18 @@ Each directory contains:
An active md device will also contain an entry for each active device
in the array. These are named
in the array. These are named::
rdNN
where 'NN' is the position in the array, starting from 0.
where ``NN`` is the position in the array, starting from 0.
So for a 3 drive array there will be rd0, rd1, rd2.
These are symbolic links to the appropriate 'dev-XXX' entry.
Thus, for example,
These are symbolic links to the appropriate ``dev-XXX`` entry.
Thus, for example::
cat /sys/block/md*/md/rd*/state
will show 'in_sync' on every line.
will show ``in_sync`` on every line.
@ -488,50 +587,62 @@ also have
sync_action
a text file that can be used to monitor and control the rebuild
process. It contains one word which can be one of:
resync - redundancy is being recalculated after unclean
shutdown or creation
recover - a hot spare is being built to replace a
failed/missing device
idle - nothing is happening
check - A full check of redundancy was requested and is
happening. This reads all blocks and checks
them. A repair may also happen for some raid
levels.
repair - A full check and repair is happening. This is
similar to 'resync', but was requested by the
user, and the write-intent bitmap is NOT used to
optimise the process.
resync
redundancy is being recalculated after unclean
shutdown or creation
recover
a hot spare is being built to replace a
failed/missing device
idle
nothing is happening
check
A full check of redundancy was requested and is
happening. This reads all blocks and checks
them. A repair may also happen for some raid
levels.
repair
A full check and repair is happening. This is
similar to ``resync``, but was requested by the
user, and the write-intent bitmap is NOT used to
optimise the process.
This file is writable, and each of the strings that could be
read are meaningful for writing.
'idle' will stop an active resync/recovery etc. There is no
guarantee that another resync/recovery may not be automatically
started again, though some event will be needed to trigger
this.
'resync' or 'recovery' can be used to restart the
corresponding operation if it was stopped with 'idle'.
'check' and 'repair' will start the appropriate process
providing the current state is 'idle'.
``idle`` will stop an active resync/recovery etc. There is no
guarantee that another resync/recovery may not be automatically
started again, though some event will be needed to trigger
this.
``resync`` or ``recovery`` can be used to restart the
corresponding operation if it was stopped with ``idle``.
``check`` and ``repair`` will start the appropriate process
providing the current state is ``idle``.
This file responds to select/poll. Any important change in the value
triggers a poll event. Sometimes the value will briefly be
"recover" if a recovery seems to be needed, but cannot be
achieved. In that case, the transition to "recover" isn't
``recover`` if a recovery seems to be needed, but cannot be
achieved. In that case, the transition to ``recover`` isn't
notified, but the transition away is.
degraded
This contains a count of the number of devices by which the
array is degraded. So an optimal array will show '0'. A
single failed/missing drive will show '1', etc.
array is degraded. So an optimal array will show ``0``. A
single failed/missing drive will show ``1``, etc.
This file responds to select/poll, any increase or decrease
in the count of missing devices will trigger an event.
mismatch_count
When performing 'check' and 'repair', and possibly when
performing 'resync', md will count the number of errors that are
found. The count in 'mismatch_cnt' is the number of sectors
that were re-written, or (for 'check') would have been
When performing ``check`` and ``repair``, and possibly when
performing ``resync``, md will count the number of errors that are
found. The count in ``mismatch_cnt`` is the number of sectors
that were re-written, or (for ``check``) would have been
re-written. As most raid levels work in units of pages rather
than sectors, this may be larger than the number of actual errors
by a factor of the number of sectors in a page.
@ -542,27 +653,30 @@ also have
would need to check the corresponding blocks. Either individual
numbers or start-end pairs can be written. Multiple numbers
can be separated by a space.
Note that the numbers are 'bit' numbers, not 'block' numbers.
Note that the numbers are ``bit`` numbers, not ``block`` numbers.
They should be scaled by the bitmap_chunksize.
sync_speed_min
sync_speed_max
These are similar to /proc/sys/dev/raid/speed_limit_{min,max}
sync_speed_min, sync_speed_max
These are similar to ``/proc/sys/dev/raid/speed_limit_{min,max}``
however they only apply to the particular array.
If no value has been written to these, or if the word 'system'
If no value has been written to these, or if the word ``system``
is written, then the system-wide value is used. If a value,
in kibibytes-per-second is written, then it is used.
When the files are read, they show the currently active value
followed by "(local)" or "(system)" depending on whether it is
followed by ``(local)`` or ``(system)`` depending on whether it is
a locally set or system-wide value.
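The read format described above (a value followed by ``(local)`` or ``(system)``) can be split as in this sketch. The function name ``parse_sync_speed`` is hypothetical, and the exact spacing between the number and the parenthesised word is an assumption.

```python
import re


def parse_sync_speed(text):
    """Return (speed, origin) from e.g. "1000 (system)".

    origin is "local" for a per-array setting and "system" when the
    system-wide value is in effect.
    """
    m = re.fullmatch(r"\s*(\d+)\s+\((local|system)\)\s*", text)
    if m is None:
        raise ValueError("unrecognised sync_speed value: %r" % text)
    return int(m.group(1)), m.group(2)
```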
sync_completed
This shows the number of sectors that have been completed of
whatever the current sync_action is, followed by the number of
sectors in total that could need to be processed. The two
numbers are separated by a '/' thus effectively showing one
numbers are separated by a ``/`` thus effectively showing one
value, a fraction of the process that is complete.
A 'select' on this attribute will return when resync completes,
A ``select`` on this attribute will return when resync completes,
when it reaches the current sync_max (below) and possibly at
other times.
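To turn the two numbers in ``sync_completed`` into the fraction mentioned above, something like the following could be used. This is only a sketch: ``sync_fraction`` is a hypothetical name, and returning ``None`` for any value without a ``/`` is an assumption about how to treat input that does not match the described format.

```python
def sync_fraction(text):
    """Return completed/total as a float, or None if not applicable."""
    done, sep, total = text.strip().partition("/")
    if not sep:
        # No "/" separator: nothing to compute a fraction from.
        return None
    done, total = int(done), int(total)
    if total == 0:
        return None
    return done / total
```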
@ -570,26 +684,24 @@ also have
This shows the current actual speed, in K/sec, of the current
sync_action. It is averaged over the last 30 seconds.
suspend_lo
suspend_hi
suspend_lo, suspend_hi
The two values, given as numbers of sectors, indicate a range
within the array where IO will be blocked. This is currently
only supported for raid4/5/6.
sync_min
sync_max
sync_min, sync_max
The two values, given as numbers of sectors, indicate a range
within the array where 'check'/'repair' will operate. Must be
a multiple of chunk_size. When it reaches "sync_max" it will
within the array where ``check``/``repair`` will operate. Must be
a multiple of chunk_size. When it reaches ``sync_max`` it will
pause, rather than complete.
You can use 'select' or 'poll' on "sync_completed" to wait for
You can use ``select`` or ``poll`` on ``sync_completed`` to wait for
that number to reach sync_max. Then you can either increase
"sync_max", or can write 'idle' to "sync_action".
``sync_max``, or can write ``idle`` to ``sync_action``.
The value of 'max' for "sync_max" effectively disables the limit.
The value of ``max`` for ``sync_max`` effectively disables the limit.
When a resync is active, the value can only ever be increased,
never decreased.
The value of '0' is the minimum for "sync_min".
The value of ``0`` is the minimum for ``sync_min``.
@ -598,13 +710,15 @@ personality module that manages it.
These are specific to the implementation of the module and could
change substantially if the implementation changes.
These currently include
These currently include:
stripe_cache_size (currently raid5 only)
number of entries in the stripe cache. This is writable, but
there are upper and lower limits (32768, 17). Default is 256.
stripe_cache_active (currently raid5 only)
number of active entries in the stripe cache
preread_bypass_threshold (currently raid5 only)
number of times a stripe requiring preread will be bypassed by
a stripe that does not require preread. For fairness defaults


@ -1,22 +1,21 @@
==============================
KERNEL MODULE SIGNING FACILITY
==============================
Kernel module signing facility
------------------------------
CONTENTS
- Overview.
- Configuring module signing.
- Generating signing keys.
- Public keys in the kernel.
- Manually signing modules.
- Signed modules and stripping.
- Loading signed modules.
- Non-valid signatures and unsigned modules.
- Administering/protecting the private key.
.. CONTENTS
..
.. - Overview.
.. - Configuring module signing.
.. - Generating signing keys.
.. - Public keys in the kernel.
.. - Manually signing modules.
.. - Signed modules and stripping.
.. - Loading signed modules.
.. - Non-valid signatures and unsigned modules.
.. - Administering/protecting the private key.
========
OVERVIEW
Overview
========
The kernel module signing facility cryptographically signs modules during
@ -36,17 +35,19 @@ SHA-512 (the algorithm is selected by data in the signature).
==========================
CONFIGURING MODULE SIGNING
Configuring module signing
==========================
The module signing facility is enabled by going to the "Enable Loadable Module
Support" section of the kernel configuration and turning on
The module signing facility is enabled by going to the
:menuselection:`Enable Loadable Module Support` section of
the kernel configuration and turning on::
CONFIG_MODULE_SIG "Module signature verification"
This has a number of options available:
(1) "Require modules to be validly signed" (CONFIG_MODULE_SIG_FORCE)
(1) :menuselection:`Require modules to be validly signed`
(``CONFIG_MODULE_SIG_FORCE``)
This specifies how the kernel should deal with a module that has a
signature for which the key is not known or a module that is unsigned.
@ -64,35 +65,39 @@ This has a number of options available:
cannot be parsed, it will be rejected out of hand.
(2) "Automatically sign all modules" (CONFIG_MODULE_SIG_ALL)
(2) :menuselection:`Automatically sign all modules`
(``CONFIG_MODULE_SIG_ALL``)
If this is on then modules will be automatically signed during the
modules_install phase of a build. If this is off, then the modules must
be signed manually using:
be signed manually using::
scripts/sign-file
(3) "Which hash algorithm should modules be signed with?"
(3) :menuselection:`Which hash algorithm should modules be signed with?`
This presents a choice of which hash algorithm the installation phase will
sign the modules with:
CONFIG_MODULE_SIG_SHA1 "Sign modules with SHA-1"
CONFIG_MODULE_SIG_SHA224 "Sign modules with SHA-224"
CONFIG_MODULE_SIG_SHA256 "Sign modules with SHA-256"
CONFIG_MODULE_SIG_SHA384 "Sign modules with SHA-384"
CONFIG_MODULE_SIG_SHA512 "Sign modules with SHA-512"
=============================== ==========================================
``CONFIG_MODULE_SIG_SHA1`` :menuselection:`Sign modules with SHA-1`
``CONFIG_MODULE_SIG_SHA224`` :menuselection:`Sign modules with SHA-224`
``CONFIG_MODULE_SIG_SHA256`` :menuselection:`Sign modules with SHA-256`
``CONFIG_MODULE_SIG_SHA384`` :menuselection:`Sign modules with SHA-384`
``CONFIG_MODULE_SIG_SHA512`` :menuselection:`Sign modules with SHA-512`
=============================== ==========================================
The algorithm selected here will also be built into the kernel (rather
than being a module) so that modules signed with that algorithm can have
their signatures checked without causing a dependency loop.
(4) "File name or PKCS#11 URI of module signing key" (CONFIG_MODULE_SIG_KEY)
(4) :menuselection:`File name or PKCS#11 URI of module signing key`
(``CONFIG_MODULE_SIG_KEY``)
Setting this option to something other than its default of
"certs/signing_key.pem" will disable the autogeneration of signing keys
``certs/signing_key.pem`` will disable the autogeneration of signing keys
and allow the kernel modules to be signed with a key of your choosing.
The string provided should identify a file containing both a private key
and its corresponding X.509 certificate in PEM form, or — on systems where
@ -102,10 +107,11 @@ This has a number of options available:
If the PEM file containing the private key is encrypted, or if the
PKCS#11 token requires a PIN, this can be provided at build time by
means of the KBUILD_SIGN_PIN variable.
means of the ``KBUILD_SIGN_PIN`` variable.
(5) "Additional X.509 keys for default system keyring" (CONFIG_SYSTEM_TRUSTED_KEYS)
(5) :menuselection:`Additional X.509 keys for default system keyring`
(``CONFIG_SYSTEM_TRUSTED_KEYS``)
This option can be set to the filename of a PEM-encoded file containing
additional certificates which will be included in the system keyring by
@ -116,7 +122,7 @@ packages to the kernel build processes for the tool that does the signing.
=======================
GENERATING SIGNING KEYS
Generating signing keys
=======================
Cryptographic keypairs are required to generate and check signatures. A
@ -126,14 +132,14 @@ it can be deleted or stored securely. The public key gets built into the
kernel so that it can be used to check the signatures as the modules are
loaded.
Under normal conditions, when CONFIG_MODULE_SIG_KEY is unchanged from its
Under normal conditions, when ``CONFIG_MODULE_SIG_KEY`` is unchanged from its
default, the kernel build will automatically generate a new keypair using
openssl if one does not exist in the file:
openssl if one does not exist in the file::
certs/signing_key.pem
during the building of vmlinux (the public part of the key needs to be built
into vmlinux) using parameters in the:
into vmlinux) using parameters in the::
certs/x509.genkey
@ -142,14 +148,14 @@ file (which is also generated if it does not already exist).
It is strongly recommended that you provide your own x509.genkey file.
Most notably, in the x509.genkey file, the req_distinguished_name section
should be altered from the default:
should be altered from the default::
[ req_distinguished_name ]
#O = Unspecified company
CN = Build time autogenerated kernel key
#emailAddress = unspecified.user@unspecified.company
The generated RSA key size can also be set with:
The generated RSA key size can also be set with::
[ req ]
default_bits = 4096
@ -158,23 +164,23 @@ The generated RSA key size can also be set with:
It is also possible to manually generate the private/public key files using the
x509.genkey key generation configuration file in the root node of the Linux
kernel sources tree and the openssl command. The following is an example to
generate the public/private key files:
generate the public/private key files::
openssl req -new -nodes -utf8 -sha256 -days 36500 -batch -x509 \
-config x509.genkey -outform PEM -out kernel_key.pem \
-keyout kernel_key.pem
The full pathname for the resulting kernel_key.pem file can then be specified
in the CONFIG_MODULE_SIG_KEY option, and the certificate and key therein will
in the ``CONFIG_MODULE_SIG_KEY`` option, and the certificate and key therein will
be used instead of an autogenerated keypair.
=========================
PUBLIC KEYS IN THE KERNEL
Public keys in the kernel
=========================
The kernel contains a ring of public keys that can be viewed by root. They're
in a keyring called ".system_keyring" that can be seen by:
in a keyring called ".system_keyring" that can be seen by::
[root@deneb ~]# cat /proc/keys
...
@ -184,27 +190,27 @@ in a keyring called ".system_keyring" that can be seen by:
Beyond the public key generated specifically for module signing, additional
trusted certificates can be provided in a PEM-encoded file referenced by the
CONFIG_SYSTEM_TRUSTED_KEYS configuration option.
``CONFIG_SYSTEM_TRUSTED_KEYS`` configuration option.
Further, the architecture code may take public keys from a hardware store and
add those in also (e.g. from the UEFI key database).
Finally, it is possible to add additional public keys by doing:
Finally, it is possible to add additional public keys by doing::
keyctl padd asymmetric "" [.system_keyring-ID] <[key-file]
e.g.:
e.g.::
keyctl padd asymmetric "" 0x223c7853 <my_public_key.x509
Note, however, that the kernel will only permit keys to be added to
.system_keyring _if_ the new key's X.509 wrapper is validly signed by a key
``.system_keyring`` *if* the new key's X.509 wrapper is validly signed by a key
that is already resident in the .system_keyring at the time the key was added.
=========================
MANUALLY SIGNING MODULES
=========================
========================
Manually signing modules
========================
To manually sign a module, use the scripts/sign-file tool available in
the Linux kernel source tree. The script requires 4 arguments:
@ -214,7 +220,7 @@ the Linux kernel source tree. The script requires 4 arguments:
3. The public key filename
4. The kernel module to be signed
The following is an example to sign a kernel module:
The following is an example to sign a kernel module::
scripts/sign-file sha512 kernel-signkey.priv \
kernel-signkey.x509 module.ko
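When signing many modules from a script, the command above can be assembled programmatically. The sketch below only builds the argument list in the order shown in the example (hash algorithm, private key, public key, module); the function name ``sign_file_argv`` is hypothetical, and actually running the command is left to the caller.

```python
import shlex


def sign_file_argv(hash_algo, private_key, public_key, module):
    """Build the argument vector for scripts/sign-file."""
    return ["scripts/sign-file", hash_algo, private_key, public_key, module]


# Render the same command line as the example above.
command_line = shlex.join(
    sign_file_argv("sha512", "kernel-signkey.priv",
                   "kernel-signkey.x509", "module.ko")
)
```

Passing the list to ``subprocess.run()`` from the top of the kernel source tree would avoid any shell quoting issues with unusual file names.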
@ -228,11 +234,11 @@ $KBUILD_SIGN_PIN environment variable.
============================
SIGNED MODULES AND STRIPPING
Signed modules and stripping
============================
A signed module has a digital signature simply appended at the end. The string
"~Module signature appended~." at the end of the module's file confirms that a
``~Module signature appended~.`` at the end of the module's file confirms that a
signature is present but it does not confirm that the signature is valid!
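A quick presence check for the trailing marker might look like the sketch below. As stated above, this says nothing about whether the signature is valid. The name ``has_signature_marker`` is hypothetical, and because the exact final byte after the marker text is not relied upon here, the check tolerates one trailing byte.

```python
MARKER = b"~Module signature appended~"


def has_signature_marker(data):
    """Return True if the module image ends with the signature marker.

    Only presence is tested; the signature itself is not verified.
    """
    # Look at the end of the file, allowing a single trailing byte.
    tail = data[-(len(MARKER) + 1):]
    return MARKER in tail
```

In practice ``data`` would be the last few dozen bytes of a ``.ko`` file opened in binary mode.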
Signed modules are BRITTLE as the signature is outside of the defined ELF
@ -242,19 +248,19 @@ debug information present at the time of signing.
======================
LOADING SIGNED MODULES
Loading signed modules
======================
Modules are loaded with insmod, modprobe, init_module() or finit_module(),
exactly as for unsigned modules as no processing is done in userspace. The
signature checking is all done within the kernel.
Modules are loaded with insmod, modprobe, ``init_module()`` or
``finit_module()``, exactly as for unsigned modules as no processing is
done in userspace. The signature checking is all done within the kernel.
=========================================
NON-VALID SIGNATURES AND UNSIGNED MODULES
Non-valid signatures and unsigned modules
=========================================
If CONFIG_MODULE_SIG_FORCE is enabled or module.sig_enforce=1 is supplied on
If ``CONFIG_MODULE_SIG_FORCE`` is enabled or module.sig_enforce=1 is supplied on
the kernel command line, the kernel will only load validly signed modules
for which it has a public key. Otherwise, it will also load modules that are
unsigned. Any module for which the kernel has a key, but which proves to have
@ -264,7 +270,7 @@ Any module that has an unparseable signature will be rejected.
=========================================
ADMINISTERING/PROTECTING THE PRIVATE KEY
Administering/protecting the private key
=========================================
Since the private key is used to sign modules, viruses and malware could use
@ -275,5 +281,5 @@ in the root node of the kernel source tree.
If you use the same private key to sign modules for multiple kernel
configurations, you must ensure that the module version information is
sufficient to prevent loading a module into a different kernel. Either
set CONFIG_MODVERSIONS=y or ensure that each configuration has a different
kernel release string by changing EXTRAVERSION or CONFIG_LOCALVERSION.
set ``CONFIG_MODVERSIONS=y`` or ensure that each configuration has a different
kernel release string by changing ``EXTRAVERSION`` or ``CONFIG_LOCALVERSION``.


@ -1,5 +1,5 @@
Mono(tm) Binary Kernel Support for Linux
-----------------------------------------
Mono(tm) Binary Kernel Support for Linux
-----------------------------------------
To configure Linux to automatically execute Mono-based .NET binaries
(in the form of .exe files) without the need to use the mono CLR
@ -19,22 +19,24 @@ other program after you have done the following:
http://www.go-mono.com/compiling.html
Once the Mono CLR support has been installed, just check that
/usr/bin/mono (which could be located elsewhere, for example
/usr/local/bin/mono) is working.
``/usr/bin/mono`` (which could be located elsewhere, for example
``/usr/local/bin/mono``) is working.
2) You have to compile BINFMT_MISC either as a module or into
the kernel (CONFIG_BINFMT_MISC) and set it up properly.
the kernel (``CONFIG_BINFMT_MISC``) and set it up properly.
If you choose to compile it as a module, you will have
to insert it manually with modprobe/insmod, as kmod
cannot be easily supported with binfmt_misc.
Read the file 'binfmt_misc.txt' in this directory to know
cannot be easily supported with binfmt_misc.
Read the file ``binfmt_misc.txt`` in this directory to know
more about the configuration process.
3) Add the following entries to /etc/rc.local or similar script
3) Add the following entries to ``/etc/rc.local`` or similar script
to be run at system startup:
# Insert BINFMT_MISC module into the kernel
if [ ! -e /proc/sys/fs/binfmt_misc/register ]; then
.. code-block:: sh
# Insert BINFMT_MISC module into the kernel
if [ ! -e /proc/sys/fs/binfmt_misc/register ]; then
/sbin/modprobe binfmt_misc
# Some distributions, like Fedora Core, perform
# the following command automatically when the
@ -43,24 +45,26 @@ if [ ! -e /proc/sys/fs/binfmt_misc/register ]; then
# Thus, it is possible that the following line
# is not needed at all.
mount -t binfmt_misc none /proc/sys/fs/binfmt_misc
fi
fi
# Register support for .NET CLR binaries
if [ -e /proc/sys/fs/binfmt_misc/register ]; then
# Register support for .NET CLR binaries
if [ -e /proc/sys/fs/binfmt_misc/register ]; then
# Replace /usr/bin/mono with the correct pathname to
# the Mono CLR runtime (usually /usr/local/bin/mono
# when compiling from sources or CVS).
echo ':CLR:M::MZ::/usr/bin/mono:' > /proc/sys/fs/binfmt_misc/register
else
else
echo "No binfmt_misc support"
exit 1
fi
fi
4) Check that .exe binaries can be run without the need of a
wrapper script, simply by launching the .exe file directly
from a command prompt, for example:
4) Check that ``.exe`` binaries can be run without the need of a
wrapper script, simply by launching the ``.exe`` file directly
from a command prompt, for example::
/usr/bin/xsd.exe
NOTE: If this fails with a permission denied error, check
that the .exe file has execute permissions.
.. note::
If this fails with a permission denied error, check
that the ``.exe`` file has execute permissions.
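The registration line written to ``/proc/sys/fs/binfmt_misc/register`` in step 3 is a colon-separated record. The sketch below rebuilds that exact string from its parts. The field names (name, type, offset, magic, mask, interpreter, flags) follow the layout described in ``binfmt_misc.txt`` as understood here, and the helper name ``binfmt_entry`` is hypothetical.

```python
def binfmt_entry(name, magic, interpreter,
                 type_="M", offset="", mask="", flags=""):
    """Build a binfmt_misc registration string.

    Assumed field order: :name:type:offset:magic:mask:interpreter:flags
    Empty fields are left blank between the colons.
    """
    fields = [name, type_, offset, magic, mask, interpreter, flags]
    return ":" + ":".join(fields)


# Reproduces the entry used for Mono in the startup script above.
mono_entry = binfmt_entry("CLR", "MZ", "/usr/bin/mono")
```

Changing the interpreter argument to, for example, ``/usr/local/bin/mono`` gives the variant needed when Mono is installed elsewhere.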


@ -0,0 +1,286 @@
Parport
+++++++
The ``parport`` code provides parallel-port support under Linux. This
includes the ability to share one port between multiple device
drivers.
You can pass parameters to the ``parport`` code to override its automatic
detection of your hardware. This is particularly useful if you want
to use IRQs, since in general these can't be autoprobed successfully.
By default IRQs are not used even if they **can** be probed. This is
because there are a lot of people using the same IRQ for their
parallel port and a sound card or network card.
The ``parport`` code is split into two parts: generic (which deals with
port-sharing) and architecture-dependent (which deals with actually
using the port).
Parport as modules
==================
If you load the ``parport`` code as a module, say::
# insmod parport
to load the generic ``parport`` code. You then must load the
architecture-dependent code with (for example)::
# insmod parport_pc io=0x3bc,0x378,0x278 irq=none,7,auto
to tell the ``parport`` code that you want three PC-style ports, one at
0x3bc with no IRQ, one at 0x378 using IRQ 7, and one at 0x278 with an
auto-detected IRQ. Currently, PC-style (``parport_pc``), Sun ``bpp``,
Amiga, Atari, and MFC3 hardware is supported.
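The ``io=`` and ``irq=`` lists are matched up by position. A minimal Python sketch of that pairing follows; the helper name is hypothetical, and rejecting lists of different lengths is a choice made for this sketch rather than documented driver behaviour.

```python
def pair_ports(io: str, irq: str) -> list:
    """Pair each base address in ``io`` with the IRQ setting at the
    same position in ``irq``.

    IRQ settings are kept as strings because they may be a number,
    "none" or "auto".
    """
    addresses = [int(addr, 16) for addr in io.split(",")]
    irqs = irq.split(",")
    if len(addresses) != len(irqs):
        # Sketch-only policy: refuse mismatched lists.
        raise ValueError("io and irq lists differ in length")
    return list(zip(addresses, irqs))
```

For the example above, ``pair_ports("0x3bc,0x378,0x278", "none,7,auto")`` yields three pairs, one per port.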
PCI parallel I/O card support comes from ``parport_pc``. Base I/O
addresses should not be specified for supported PCI cards since they
are automatically detected.
modprobe
--------
If you use modprobe, you will find it useful to add lines like the ones below to a
configuration file in the ``/etc/modprobe.d/`` directory::
alias parport_lowlevel parport_pc
options parport_pc io=0x378,0x278 irq=7,auto
modprobe will load ``parport_pc`` (with the options ``io=0x378,0x278 irq=7,auto``)
whenever a parallel port device driver (such as ``lp``) is loaded.
Note that these are example lines only! You shouldn't in general need
to specify any options to ``parport_pc`` in order to be able to use a
parallel port.
Parport probe [optional]
------------------------
In 2.2 kernels there was a module called ``parport_probe``, which was used
for collecting IEEE 1284 device ID information. This has now been
enhanced and now lives with the IEEE 1284 support. When a parallel
port is detected, the devices that are connected to it are analysed,
and information is logged like this::
parport0: Printer, BJC-210 (Canon)
The probe information is available from files in ``/proc/sys/dev/parport/``.
Parport linked into the kernel statically
=========================================
If you compile the ``parport`` code into the kernel, then you can use
kernel boot parameters to get the same effect. Add something like the
following to your LILO command line::
parport=0x3bc parport=0x378,7 parport=0x278,auto,nofifo
You can have many ``parport=...`` statements, one for each port you want
to add. Adding ``parport=0`` to the kernel command-line will disable
parport support entirely. Adding ``parport=auto`` to the kernel
command-line will make ``parport`` use any IRQ lines or DMA channels that
it auto-detects.
Files in /proc
==============
If you have configured the ``/proc`` filesystem into your kernel, you will
see a new directory entry: ``/proc/sys/dev/parport``. In there will be a
directory entry for each parallel port for which parport is
configured. In each of those directories are a collection of files
describing that parallel port.
The ``/proc/sys/dev/parport`` directory tree looks like::
parport
|-- default
| |-- spintime
| `-- timeslice
|-- parport0
| |-- autoprobe
| |-- autoprobe0
| |-- autoprobe1
| |-- autoprobe2
| |-- autoprobe3
| |-- devices
| | |-- active
| | `-- lp
| | `-- timeslice
| |-- base-addr
| |-- irq
| |-- dma
| |-- modes
| `-- spintime
`-- parport1
|-- autoprobe
|-- autoprobe0
|-- autoprobe1
|-- autoprobe2
|-- autoprobe3
|-- devices
| |-- active
| `-- ppa
| `-- timeslice
|-- base-addr
|-- irq
|-- dma
|-- modes
`-- spintime
.. tabularcolumns:: |p{4.0cm}|p{13.5cm}|
======================= =======================================================
File Contents
======================= =======================================================
``devices/active`` A list of the device drivers using that port. A "+"
will appear by the name of the device currently using
the port (it might not appear against any). The
string "none" means that there are no device drivers
using that port.
``base-addr`` Parallel port's base address, or addresses if the port
has more than one in which case they are separated
with tabs. These values might not have any sensible
meaning for some ports.
``irq`` Parallel port's IRQ, or -1 if none is being used.
``dma`` Parallel port's DMA channel, or -1 if none is being
used.
``modes`` Parallel port's hardware modes, comma-separated,
meaning:
- PCSPP
PC-style SPP registers are available.
- TRISTATE
Port is bidirectional.
- COMPAT
Hardware acceleration for printers is
available and will be used.
- EPP
Hardware acceleration for EPP protocol
is available and will be used.
- ECP
Hardware acceleration for ECP protocol
is available and will be used.
- DMA
DMA is available and will be used.
Note that the current implementation will only take
advantage of COMPAT and ECP modes if it has an IRQ
line to use.
``autoprobe`` Any IEEE-1284 device ID information that has been
acquired from the (non-IEEE 1284.3) device.
``autoprobe[0-3]`` IEEE 1284 device ID information retrieved from
daisy-chain devices that conform to IEEE 1284.3.
``spintime`` The number of microseconds to busy-loop while waiting
for the peripheral to respond. You might find that
adjusting this improves performance, depending on your
peripherals. This is a port-wide setting, i.e. it
applies to all devices on a particular port.
``timeslice`` The number of milliseconds that a device driver is
allowed to keep a port claimed for. This is advisory,
and a driver can ignore it if it must.
``default/*`` The defaults for spintime and timeslice. When a new
port is registered, it picks up the default spintime.
When a new device is registered, it picks up the
default timeslice.
======================= =======================================================
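The file formats in the table are simple enough to parse by hand. The following Python sketch splits the ``modes`` and ``base-addr`` contents and applies the IRQ note from the table. All function names are hypothetical, and the sample strings used with them are illustrative rather than captured from a real system.

```python
def parse_modes(text: str) -> set:
    """Split the comma-separated contents of the ``modes`` file."""
    return {mode for mode in text.strip().split(",") if mode}


def parse_base_addr(text: str) -> list:
    """Split the tab-separated contents of ``base-addr`` into integers."""
    # int(..., 0) accepts decimal and 0x-prefixed values; which form the
    # kernel prints is not assumed here.
    return [int(field, 0) for field in text.split()]


def usable_accelerated_modes(modes: set, irq: int) -> set:
    """Drop COMPAT and ECP when no IRQ is in use (irq == -1), following
    the note that these modes are only used with an IRQ line."""
    if irq == -1:
        return modes - {"COMPAT", "ECP"}
    return set(modes)
```

In practice the text would come from reading, for example, ``/proc/sys/dev/parport/parport0/modes`` and the neighbouring ``irq`` file.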
Device drivers
==============
Once the parport code is initialised, you can attach device drivers to
specific ports. Normally this happens automatically; if the lp driver
is loaded it will create one lp device for each port found. You can
override this, though, by using parameters either when you load the lp
driver::
# insmod lp parport=0,2
or on the LILO command line::
lp=parport0 lp=parport2
Both the above examples would inform lp that you want ``/dev/lp0`` to be
the first parallel port, and ``/dev/lp1`` to be the **third** parallel port,
with no lp device associated with the second port (parport1). Note
that this is different to the way older kernels worked; there used to
be a static association between the I/O port address and the device
name, so ``/dev/lp0`` was always the port at 0x3bc. This is no longer the
case - if you only have one port, it will default to being ``/dev/lp0``,
regardless of base address.
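The numbering rule above can be illustrated with a short Python sketch: lp devices are numbered in the order the ports are listed, not by port number or I/O address. The helper name is hypothetical.

```python
def lp_device_map(parport_numbers):
    """Map /dev/lpN names to parports in the order the ports are given.

    lp_device_map([0, 2]) mirrors ``insmod lp parport=0,2``.
    """
    return {
        f"/dev/lp{index}": f"parport{port}"
        for index, port in enumerate(parport_numbers)
    }
```

With ``[0, 2]`` the second entry is ``/dev/lp1`` on ``parport2``, and ``parport1`` gets no lp device.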
Also:
* If you selected the IEEE 1284 support at compile time, you can say
``lp=auto`` on the kernel command line, and lp will create devices
only for those ports that seem to have printers attached.
* If you give PLIP the ``timid`` parameter, either with ``plip=timid`` on
the command line, or with ``insmod plip timid=1`` when using modules,
it will avoid any ports that seem to be in use by other devices.
* IRQ autoprobing works only for a few port types at the moment.
Reporting printer problems with parport
=======================================
If you are having problems printing, please go through these steps to
try to narrow down where the problem area is.
When reporting problems with parport, really you need to give all of
the messages that ``parport_pc`` spits out when it initialises. There are
several code paths:
- polling
- interrupt-driven, protocol in software
- interrupt-driven, protocol in hardware using PIO
- interrupt-driven, protocol in hardware using DMA
The kernel messages that ``parport_pc`` logs give an indication of which
code path is being used. (They could be a lot better actually..)
For normal printer protocol, having IEEE 1284 modes enabled or not
should not make a difference.
To turn off the 'protocol in hardware' code paths, disable
``CONFIG_PARPORT_PC_FIFO``. Note that when they are enabled they are not
necessarily **used**; it depends on whether the hardware is available,
enabled by the BIOS, and detected by the driver.
So, to start with, disable ``CONFIG_PARPORT_PC_FIFO``, and load ``parport_pc``
with ``irq=none``. See if printing works then. It really should,
because this is the simplest code path.
If that works fine, try with ``io=0x378 irq=7`` (adjust for your
hardware), to make it use interrupt-driven in-software protocol.
If **that** works fine, then one of the hardware modes isn't working
right. Enable ``CONFIG_PARPORT_PC_FIFO`` (no, it isn't a module option,
and yes, it should be), set the port to ECP mode in the BIOS and note
the DMA channel, and try with::
io=0x378 irq=7 dma=none (for PIO)
io=0x378 irq=7 dma=3 (for DMA)
----------
philb@gnu.org
tim@cyberelk.net


@ -5,34 +5,37 @@ Sergiu Iordache <sergiu@chromium.org>
Updated: 17 November 2011
0. Introduction
Introduction
------------
Ramoops is an oops/panic logger that writes its logs to RAM before the system
crashes. It works by logging oopses and panics in a circular buffer. Ramoops
needs a system with persistent RAM so that the content of that area can
survive after a restart.
1. Ramoops concepts
Ramoops concepts
----------------
Ramoops uses a predefined memory area to store the dump. The start and size
and type of the memory area are set using three variables:
* "mem_address" for the start
* "mem_size" for the size. The memory size will be rounded down to a
power of two.
* "mem_type" to specifiy if the memory type (default is pgprot_writecombine).
Typically the default value of mem_type=0 should be used as that sets the pstore
mapping to pgprot_writecombine. Setting mem_type=1 attempts to use
pgprot_noncached, which only works on some platforms. This is because pstore
* ``mem_address`` for the start
* ``mem_size`` for the size. The memory size will be rounded down to a
power of two.
* ``mem_type`` to specify the memory type (default is pgprot_writecombine).
Typically the default value of ``mem_type=0`` should be used as that sets the pstore
mapping to pgprot_writecombine. Setting ``mem_type=1`` attempts to use
``pgprot_noncached``, which only works on some platforms. This is because pstore
depends on atomic operations. At least on ARM, pgprot_noncached causes the
memory to be mapped strongly ordered, and atomic operations on strongly ordered
memory are implementation defined, and won't work on many ARMs such as omaps.
The memory area is divided into "record_size" chunks (also rounded down to
power of two) and each oops/panic writes a "record_size" chunk of
The memory area is divided into ``record_size`` chunks (also rounded down to
power of two) and each oops/panic writes a ``record_size`` chunk of
information.
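The rounding and chunking described above can be sketched in Python as follows. This follows only the simplified description in this section (one region split into equal ``record_size`` chunks); the function names are hypothetical and the real driver may reserve parts of the region for other purposes.

```python
def round_down_pow2(value: int) -> int:
    """Round a positive integer down to the nearest power of two."""
    if value <= 0:
        raise ValueError("value must be positive")
    return 1 << (value.bit_length() - 1)


def record_count(mem_size: int, record_size: int) -> int:
    """Number of record_size chunks that fit in the memory area, after
    both sizes have been rounded down to a power of two."""
    return round_down_pow2(mem_size) // round_down_pow2(record_size)
```

For instance, a 1 MiB area with a requested record size of 5000 bytes is treated as 4096-byte records.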
Dumping both oopses and panics can be done by setting 1 in the "dump_oops"
Dumping both oopses and panics can be done by setting 1 in the ``dump_oops``
variable while setting 0 in that variable dumps only the panics.
The module uses a counter to record multiple dumps but the counter gets reset
@ -43,7 +46,8 @@ This might be useful when a hardware reset was used to bring the machine back
to life (i.e. a watchdog triggered). In such cases, RAM may be somewhat
corrupt, but usually it is restorable.
2. Setting the parameters
Setting the parameters
----------------------
Setting the ramoops parameters can be done in several different manners:
@ -52,12 +56,13 @@ Setting the ramoops parameters can be done in several different manners:
boot and then use the reserved memory for ramoops. For example, assuming a
machine with > 128 MB of memory, the following kernel command line will tell
the kernel to use only the first 128 MB of memory, and place ECC-protected
ramoops region at 128 MB boundary:
"mem=128M ramoops.mem_address=0x8000000 ramoops.ecc=1"
ramoops region at 128 MB boundary::
mem=128M ramoops.mem_address=0x8000000 ramoops.ecc=1
B. Use Device Tree bindings, as described in
Documentation/device-tree/bindings/reserved-memory/ramoops.txt.
For example:
``Documentation/devicetree/bindings/reserved-memory/ramoops.txt``.
For example::
reserved-memory {
#address-cells = <2>;
@ -75,58 +80,63 @@ Setting the ramoops parameters can be done in several different manners:
C. Use a platform device and set the platform data. The parameters can then
be set through that platform data. An example of doing that is:
#include <linux/pstore_ram.h>
[...]
.. code-block:: c
static struct ramoops_platform_data ramoops_data = {
#include <linux/pstore_ram.h>
[...]
static struct ramoops_platform_data ramoops_data = {
.mem_size = <...>,
.mem_address = <...>,
.mem_type = <...>,
.record_size = <...>,
.dump_oops = <...>,
.ecc = <...>,
};
};
static struct platform_device ramoops_dev = {
static struct platform_device ramoops_dev = {
.name = "ramoops",
.dev = {
.platform_data = &ramoops_data,
},
};
};
[... inside a function ...]
int ret;
[... inside a function ...]
int ret;
ret = platform_device_register(&ramoops_dev);
if (ret) {
ret = platform_device_register(&ramoops_dev);
if (ret) {
printk(KERN_ERR "unable to register platform device\n");
return ret;
}
}
You can specify either RAM memory or peripheral devices' memory. However, when
specifying RAM, be sure to reserve the memory by issuing memblock_reserve()
very early in the architecture code, e.g.:
very early in the architecture code, e.g.::
#include <linux/memblock.h>
#include <linux/memblock.h>
memblock_reserve(ramoops_data.mem_address, ramoops_data.mem_size);
memblock_reserve(ramoops_data.mem_address, ramoops_data.mem_size);
3. Dump format
Dump format
-----------
The data dump begins with a header, currently defined as "====" followed by a
The data dump begins with a header, currently defined as ``====`` followed by a
timestamp and a new line. The dump then continues with the actual data.
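A reader of a dump can separate the header from the data with a few lines of Python. This sketch relies only on the layout stated above (``====``, a timestamp, a newline, then data); the function name is hypothetical and the timestamp is returned as an uninterpreted string because its exact format is not specified here.

```python
def split_dump(dump: str):
    """Split a ramoops dump into (timestamp_text, data).

    Raises ValueError if the dump does not start with the "====" header
    line described in the text.
    """
    header, newline, data = dump.partition("\n")
    if not newline or not header.startswith("===="):
        raise ValueError("missing ramoops header")
    return header[len("===="):], data
```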
4. Reading the data
Reading the data
----------------
The dump data can be read from the pstore filesystem. The format for these
files is "dmesg-ramoops-N", where N is the record number in memory. To delete
files is ``dmesg-ramoops-N``, where N is the record number in memory. To delete
a stored record from RAM, simply unlink the respective pstore file.
5. Persistent function tracing
Persistent function tracing
---------------------------
Persistent function tracing might be useful for debugging software or hardware
related hangs. The functions call chain log is stored in a "ftrace-ramoops"
file. Here is an example of usage:
related hangs. The functions call chain log is stored in a ``ftrace-ramoops``
file. Here is an example of usage::
# mount -t debugfs debugfs /sys/kernel/debug/
# echo 1 > /sys/kernel/debug/pstore/record_ftrace


@ -1,3 +1,8 @@
.. _reportingbugs:
Reporting bugs
++++++++++++++
Background
==========
@ -50,12 +55,13 @@ maintainer replies to you, make sure to 'Reply-all' in order to keep the
public mailing list(s) in the email thread.
If you know which driver is causing issues, you can pass one of the driver
files to the get_maintainer.pl script:
files to the get_maintainer.pl script::
perl scripts/get_maintainer.pl -f <filename>
If it is a security bug, please copy the Security Contact listed in the
MAINTAINERS file. They can help coordinate bugfix and disclosure. See
Documentation/SecurityBugs for more information.
:ref:`Documentation/admin-guide/security-bugs.rst <securitybugs>` for more information.
If you can't figure out which subsystem caused the issue, you should file
a bug in kernel.org bugzilla and send email to
@ -69,8 +75,9 @@ Tips for reporting bugs
If you haven't reported a bug before, please read:
http://www.chiark.greenend.org.uk/~sgtatham/bugs.html
http://www.catb.org/esr/faqs/smart-questions.html
http://www.chiark.greenend.org.uk/~sgtatham/bugs.html
http://www.catb.org/esr/faqs/smart-questions.html
It's REALLY important to report bugs that seem unrelated as separate email
threads or separate bugzilla entries. If you report several unrelated
@ -87,7 +94,7 @@ step-by-step instructions for how a user can trigger the bug.
If the failure includes an "OOPS:", take a picture of the screen, capture
a netconsole trace, or type the message from your screen into the bug
report. Please read "Documentation/oops-tracing.txt" before posting your
report. Please read "Documentation/admin-guide/oops-tracing.rst" before posting your
bug report. This explains what you should do with the "Oops" information
to make it useful to the recipient.
@ -99,34 +106,34 @@ relevant to your bug, feel free to exclude it.
First run the ver_linux script included as scripts/ver_linux, which
reports the version of some important subsystems. Run this script with
the command "sh scripts/ver_linux".
the command ``awk -f scripts/ver_linux``.
Use that information to fill in all fields of the bug report form, and
post it to the mailing list with a subject of "PROBLEM: <one line
summary from [1.]>" for easy identification by the developers.
summary from [1.]>" for easy identification by the developers::
[1.] One line summary of the problem:
[2.] Full description of the problem/report:
[3.] Keywords (i.e., modules, networking, kernel):
[4.] Kernel information
[4.1.] Kernel version (from /proc/version):
[4.2.] Kernel .config file:
[5.] Most recent kernel version which did not have the bug:
[6.] Output of Oops.. message (if applicable) with symbolic information
resolved (see Documentation/oops-tracing.txt)
[7.] A small shell script or example program which triggers the
problem (if possible)
[8.] Environment
[8.1.] Software (add the output of the ver_linux script here)
[8.2.] Processor information (from /proc/cpuinfo):
[8.3.] Module information (from /proc/modules):
[8.4.] Loaded driver and hardware information (/proc/ioports, /proc/iomem)
[8.5.] PCI information ('lspci -vvv' as root)
[8.6.] SCSI information (from /proc/scsi/scsi)
[8.7.] Other information that might be relevant to the problem
(please look in /proc and include all information that you
think to be relevant):
[X.] Other notes, patches, fixes, workarounds:
[1.] One line summary of the problem:
[2.] Full description of the problem/report:
[3.] Keywords (i.e., modules, networking, kernel):
[4.] Kernel information
[4.1.] Kernel version (from /proc/version):
[4.2.] Kernel .config file:
[5.] Most recent kernel version which did not have the bug:
[6.] Output of Oops.. message (if applicable) with symbolic information
resolved (see Documentation/admin-guide/oops-tracing.rst)
[7.] A small shell script or example program which triggers the
problem (if possible)
[8.] Environment
[8.1.] Software (add the output of the ver_linux script here)
[8.2.] Processor information (from /proc/cpuinfo):
[8.3.] Module information (from /proc/modules):
[8.4.] Loaded driver and hardware information (/proc/ioports, /proc/iomem)
[8.5.] PCI information ('lspci -vvv' as root)
[8.6.] SCSI information (from /proc/scsi/scsi)
[8.7.] Other information that might be relevant to the problem
(please look in /proc and include all information that you
think to be relevant):
[X.] Other notes, patches, fixes, workarounds:
Follow up
@ -153,7 +160,8 @@ Expectations for kernel maintainers
Linux kernel maintainers are busy, overworked human beings. Sometimes
they may not be able to address your bug in a day, a week, or two weeks.
If they don't answer your email, they may be on vacation, or at a Linux
conference. Check the conference schedule at LWN.net for more info:
conference. Check the conference schedule at https://LWN.net for more info:
https://lwn.net/Calendar/
In general, kernel maintainers take 1 to 5 business days to respond to


@ -8,8 +8,8 @@ like to know when a security bug is found so that it can be fixed and
disclosed as quickly as possible. Please report security bugs to the
Linux kernel security team.
1) Contact
----------
Contact
-------
The Linux kernel security team can be contacted by email at
<security@kernel.org>. This is a private list of security officers
@ -19,12 +19,12 @@ area maintainers to understand and fix the security vulnerability.
As it is with any bug, the more information provided the easier it
will be to diagnose and fix. Please review the procedure outlined in
REPORTING-BUGS if you are unclear about what information is helpful.
admin-guide/reporting-bugs.rst if you are unclear about what information is helpful.
Any exploit code is very helpful and will not be released without
consent from the reporter unless it has already been made public.
2) Disclosure
-------------
Disclosure
----------
The goal of the Linux kernel security team is to work with the
bug submitter to bug resolution as well as disclosure. We prefer
@ -39,8 +39,8 @@ disclosure is from immediate (esp. if it's already publicly known)
to a few weeks. As a basic default policy, we expect report date to
disclosure date to be on the order of 7 days.
3) Non-disclosure agreements
----------------------------
Non-disclosure agreements
-------------------------
The Linux kernel security team is not a formal body and therefore unable
to enter any non-disclosure agreements.


@ -1,15 +1,21 @@
Linux Serial Console
.. _serial_console:
Linux Serial Console
====================
To use a serial port as console you need to compile the support into your
kernel - by default it is not compiled in. For PC style serial ports
it's the config option next to "Standard/generic (dumb) serial support".
it's the config option under the menu entry:
:menuselection:`Character devices --> Serial drivers --> 8250/16550 and compatible serial support --> Console on 8250/16550 and compatible serial port`
You must compile serial support into the kernel and not as a module.
It is possible to specify multiple devices for console output. You can
define a new kernel command line option to select which device(s) to
use for console output.
The format of this option is:
The format of this option is::
console=device,options
@ -28,11 +34,11 @@ The format of this option is:
You can specify multiple console= options on the kernel command line.
Output will appear on all of them. The last device will be used when
you open /dev/console. So, for example:
you open ``/dev/console``. So, for example::
console=ttyS1,9600 console=tty0
defines that opening /dev/console will get you the current foreground
defines that opening ``/dev/console`` will get you the current foreground
virtual console, and kernel messages will appear on both the VGA
console and the 2nd serial port (ttyS1 or COM2) at 9600 baud.
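The rule that output goes to every listed console while ``/dev/console`` follows the last one can be sketched in Python. Both function names are hypothetical, and the sketch only splits each option at the first comma without interpreting the options field.

```python
def parse_consoles(cmdline: str) -> list:
    """Return (device, options) pairs for every console= option, in the
    order they appear on the kernel command line."""
    consoles = []
    for word in cmdline.split():
        if word.startswith("console="):
            device, _, options = word[len("console="):].partition(",")
            consoles.append((device, options))
    return consoles


def dev_console_target(cmdline: str):
    """Device that opening /dev/console refers to: the last console=
    entry, or None if the command line has none."""
    consoles = parse_consoles(cmdline)
    return consoles[-1][0] if consoles else None
```

For the example above, kernel messages go to both ``ttyS1`` and ``tty0``, and ``dev_console_target()`` returns ``tty0``.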
@ -44,61 +50,61 @@ first looks for a VGA card and then for a serial port. So if you don't
have a VGA card in your system the first serial port will automatically
become the console.
You will need to create a new device to use /dev/console. The official
/dev/console is now character device 5,1.
You will need to create a new device to use ``/dev/console``. The official
``/dev/console`` is now character device 5,1.
(You can also use a network device as a console. See
Documentation/networking/netconsole.txt for information on that.)
``Documentation/networking/netconsole.txt`` for information on that.)
Here's an example that will use /dev/ttyS1 (COM2) as the console.
Here's an example that will use ``/dev/ttyS1`` (COM2) as the console.
Replace the sample values as needed.
1. Create /dev/console (real console) and /dev/tty0 (master virtual
console):
1. Create ``/dev/console`` (real console) and ``/dev/tty0`` (master virtual
console)::
cd /dev
rm -f console tty0
mknod -m 622 console c 5 1
mknod -m 622 tty0 c 4 0
cd /dev
rm -f console tty0
mknod -m 622 console c 5 1
mknod -m 622 tty0 c 4 0
2. LILO can also take input from a serial device. This is a very
useful option. To tell LILO to use the serial port:
In lilo.conf (global section):
In lilo.conf (global section)::
serial = 1,9600n8 (ttyS1, 9600 bd, no parity, 8 bits)
serial = 1,9600n8 (ttyS1, 9600 bd, no parity, 8 bits)
3. Adjust the kernel flags for the new kernel,
again in lilo.conf (kernel section)
again in lilo.conf (kernel section)::
append = "console=ttyS1,9600"
append = "console=ttyS1,9600"
4. Make sure a getty runs on the serial port so that you can log in to
it once the system is done booting. This is done by adding a line
like this to /etc/inittab (exact syntax depends on your getty):
like this to ``/etc/inittab`` (exact syntax depends on your getty)::
S1:23:respawn:/sbin/getty -L ttyS1 9600 vt100
S1:23:respawn:/sbin/getty -L ttyS1 9600 vt100
5. Init and /etc/ioctl.save
5. Init and ``/etc/ioctl.save``
Sysvinit remembers its stty settings in a file in /etc, called
`/etc/ioctl.save'. REMOVE THIS FILE before using the serial
Sysvinit remembers its stty settings in a file in ``/etc``, called
``/etc/ioctl.save``. REMOVE THIS FILE before using the serial
console for the first time, because otherwise init will probably
set the baudrate to 38400 (baudrate of the virtual console).
6. /dev/console and X
6. ``/dev/console`` and X
Programs that want to do something with the virtual console usually
open /dev/console. If you have created the new /dev/console device,
open ``/dev/console``. If you have created the new ``/dev/console`` device,
and your console is NOT the virtual console some programs will fail.
Those are programs that want to access the VT interface, and use
/dev/console instead of /dev/tty0. Some of those programs are:
``/dev/console`` instead of ``/dev/tty0``. Some of those programs are::
Xfree86, svgalib, gpm, SVGATextMode
Xfree86, svgalib, gpm, SVGATextMode
It should be fixed in modern versions of these programs though.
Note that if you boot without a console= option (or with
console=/dev/tty0), /dev/console is the same as /dev/tty0. In that
case everything will still work.
Note that if you boot without a ``console=`` option (or with
``console=/dev/tty0``), ``/dev/console`` is the same as ``/dev/tty0``.
In that case everything will still work.
7. Thanks


@ -0,0 +1,192 @@
Rules on how to access information in sysfs
===========================================
The kernel-exported sysfs exports internal kernel implementation details
and depends on internal kernel structures and layout. It is agreed upon
by the kernel developers that the Linux kernel does not provide a stable
internal API. Therefore, there are aspects of the sysfs interface that
may not be stable across kernel releases.
To minimize the risk of breaking users of sysfs, which are in most cases
low-level userspace applications, with a new kernel release, the users
of sysfs must follow some rules to use an as-abstract-as-possible way to
access this filesystem. The current udev and HAL programs already
implement this and users are encouraged to plug, if possible, into the
abstractions these programs provide instead of accessing sysfs directly.
But if you really do want or need to access sysfs directly, please follow
the following rules and then your programs should work with future
versions of the sysfs interface.
- Do not use libsysfs
It makes assumptions about sysfs which are not true. Its API does not
offer any abstraction, it exposes all the kernel driver-core
implementation details in its own API. Therefore it is not better than
reading directories and opening the files yourself.
Also, it is not actively maintained, in the sense of reflecting the
current kernel development. The goal of providing a stable interface
to sysfs has failed; it causes more problems than it solves. It
violates many of the rules in this document.
- sysfs is always at ``/sys``
Parsing ``/proc/mounts`` is a waste of time. Other mount points are a
system configuration bug you should not try to solve. For test cases,
possibly support a ``SYSFS_PATH`` environment variable to overwrite the
application's behavior, but never try to search for sysfs. Never try
to mount it, if you are not an early boot script.
- devices are only "devices"
There are no such things as class, bus, or physical devices, interfaces,
and the like that you can rely on in userspace. Everything is simply a
"device". Class, bus, physical, ... types are just kernel implementation
details which should not be expected by applications that look for
devices in sysfs.
The properties of a device are:
- devpath (``/devices/pci0000:00/0000:00:1d.1/usb2/2-2/2-2:1.0``)
- identical to the DEVPATH value in the event sent from the kernel
at device creation and removal
- the unique key to the device at that point in time
- the kernel's path to the device directory without the leading
``/sys``, and always starting with a slash
- all elements of a devpath must be real directories. Symlinks
pointing to /sys/devices must always be resolved to their real
target and the target path must be used to access the device.
That way the devpath to the device matches the devpath of the
kernel used at event time.
- using or exposing symlink values as elements in a devpath string
is a bug in the application
- kernel name (``sda``, ``tty``, ``0000:00:1f.2``, ...)
- a directory name, identical to the last element of the devpath
- applications need to handle spaces and characters like ``!`` in
the name
- subsystem (``block``, ``tty``, ``pci``, ...)
- simple string, never a path or a link
- retrieved by reading the "subsystem"-link and using only the
last element of the target path
- driver (``tg3``, ``ata_piix``, ``uhci_hcd``)
- a simple string, which may contain spaces, never a path or a
link
- it is retrieved by reading the "driver"-link and using only the
last element of the target path
- devices which do not have "driver"-link just do not have a
driver; copying the driver value in a child device context is a
bug in the application
- attributes
- the files in the device directory or files below subdirectories
of the same device directory
- accessing attributes reached by a symlink pointing to another device,
like the "device"-link, is a bug in the application
Everything else is just a kernel driver-core implementation detail
that should not be assumed to be stable across kernel releases.
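The device properties listed above can be gathered with a few standard calls. The following is only a minimal sketch: the function name ``device_properties`` and its ``sysfs_root`` parameter are hypothetical and exist only for illustration (the parameter plays the role of the ``SYSFS_PATH`` override suggested earlier for test cases).

```python
import os


def device_properties(path, sysfs_root="/sys"):
    """Return devpath, kernel name, subsystem and driver of a device.

    `path` may be any directory or symlink that leads to the device,
    e.g. an entry below /sys/class or /sys/bus. Hypothetical helper,
    not part of any existing library.
    """
    root = os.path.realpath(sysfs_root)
    # Resolve every symlink element so the result matches the kernel's DEVPATH.
    real = os.path.realpath(path)
    if not real.startswith(root + os.sep):
        raise ValueError("%s is not below %s" % (path, sysfs_root))

    def link_name(name):
        # Only the last element of the link target is meaningful.
        link = os.path.join(real, name)
        if not os.path.islink(link):
            return None  # no link means no value; never copy it from a parent
        return os.path.basename(os.readlink(link))

    return {
        "devpath": real[len(root):],  # no leading /sys, starts with a slash
        "kernel_name": os.path.basename(real),
        "subsystem": link_name("subsystem"),
        "driver": link_name("driver"),
    }
```

A device without a "driver"-link simply yields ``None`` for the driver here, in line with the rule that such a device has no driver.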
- Properties of parent devices never belong in a child device.
Always look at the parent devices themselves for determining device
context properties. If the device ``eth0`` or ``sda`` does not have a
"driver"-link, then this device does not have a driver. Its value is empty.
Never copy any property of the parent-device into a child-device. Parent
device properties may change dynamically without any notice to the
child device.
- Hierarchy in a single device tree
There is only one valid place in sysfs where hierarchy can be examined
and this is below ``/sys/devices``.
It is planned that all device directories will end up in the tree
below this directory.
- Classification by subsystem
There are currently three places for classification of devices:
``/sys/block``, ``/sys/class`` and ``/sys/bus``. It is planned that these will
not contain any device directories themselves, but only flat lists of
symlinks pointing to the unified ``/sys/devices`` tree.
All three places have completely different rules on how to access
device information. It is planned to merge all three
classification directories into one place at ``/sys/subsystem``,
following the layout of the bus directories. All buses and
classes, including the converted block subsystem, will show up
there.
The devices belonging to a subsystem will create a symlink in the
"devices" directory at ``/sys/subsystem/<name>/devices``.
If ``/sys/subsystem`` exists, ``/sys/bus``, ``/sys/class`` and ``/sys/block``
can be ignored. If it does not exist, you always have to scan all three
places, as the kernel is free to move a subsystem from one place to
the other, as long as the devices are still reachable by the same
subsystem name.
Assuming ``/sys/class/<subsystem>`` and ``/sys/bus/<subsystem>``, or
``/sys/block`` and ``/sys/class/block`` are not interchangeable is a bug in
the application.
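The scanning rule for subsystems can be sketched roughly as follows. This is an illustration under assumptions, not a reference implementation: ``subsystem_devices`` and ``sysfs_root`` are hypothetical names, and the sketch assumes that every directory entry in the classification directories leads to a device directory.

```python
import os


def subsystem_devices(name, sysfs_root="/sys"):
    """Return the resolved device directories of one subsystem.

    Hypothetical helper: prefers /sys/subsystem when it exists and
    otherwise scans all the legacy classification directories.
    """
    unified = os.path.join(sysfs_root, "subsystem")
    if os.path.isdir(unified):
        candidates = [os.path.join(unified, name, "devices")]
    else:
        # The kernel may move a subsystem between these places,
        # so all of them are scanned.
        candidates = [
            os.path.join(sysfs_root, "bus", name, "devices"),
            os.path.join(sysfs_root, "class", name),
        ]
        if name == "block":
            candidates.append(os.path.join(sysfs_root, "block"))

    found = set()
    for directory in candidates:
        if not os.path.isdir(directory):
            continue
        for entry in os.listdir(directory):
            full = os.path.join(directory, entry)
            if os.path.isdir(full):  # follows symlinks, skips plain files
                # The set removes devices reachable from several places.
                found.add(os.path.realpath(full))
    return sorted(found)
```

For the block subsystem the result is a flat list in which disks and partitions appear side by side.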
- Block
The converted block subsystem at ``/sys/class/block`` or
``/sys/subsystem/block`` will contain the links for disks and partitions
at the same level, never in a hierarchy. Assuming the block subsystem to
contain only disks and not partition devices in the same flat list is
a bug in the application.
- "device"-link and <subsystem>:<kernel name>-links
Never depend on the "device"-link. The "device"-link is a workaround
for the old layout, where class devices are not created in
``/sys/devices/`` like the bus devices. If the link-resolving of a
device directory does not end in ``/sys/devices/``, you can use the
"device"-link to find the parent devices in ``/sys/devices/``. That is the
single valid use of the "device"-link; it must never appear in any
path as an element. Assuming the existence of the "device"-link for
a device in ``/sys/devices/`` is a bug in the application.
Accessing ``/sys/class/net/eth0/device`` is a bug in the application.
Never depend on the class-specific links back to the ``/sys/class``
directory. These links are also a workaround for the design mistake
that class devices are not created in ``/sys/devices``. If a device
directory does not contain directories for child devices, these links
may be used to find the child devices in ``/sys/class``. That is the single
valid use of these links; they must never appear in any path as an
element. Assuming the existence of these links for devices which are
real child device directories in the ``/sys/devices`` tree is a bug in
the application.
It is planned to remove all these links when all class device
directories live in ``/sys/devices``.
- Position of devices along device chain can change.
Never depend on a specific parent device position in the devpath,
or the chain of parent devices. The kernel is free to insert devices into
the chain. You must always request the parent device you are looking for
by its subsystem value. You need to walk up the chain until you find
the device that matches the expected subsystem. Depending on a specific
position of a parent device or exposing relative paths using ``../`` to
access the chain of parents is a bug in the application.
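Walking up the chain by subsystem, rather than by position, might look like the sketch below. The names ``find_parent`` and ``sysfs_root`` are hypothetical, and the devpath passed in is assumed to be already resolved (no symlink elements).

```python
import os


def find_parent(devpath, subsystem, sysfs_root="/sys"):
    """Return the devpath of the nearest ancestor in `subsystem`, or None.

    `devpath` is a resolved kernel devpath such as
    /devices/pci0000:00/0000:00:1d.1/usb2/2-2/2-2:1.0.
    """
    current = os.path.dirname(devpath)
    # Stop at the top of the device tree; never rely on a fixed depth.
    while current not in ("", "/", "/devices"):
        link = os.path.join(sysfs_root + current, "subsystem")
        if os.path.islink(link):
            if os.path.basename(os.readlink(link)) == subsystem:
                return current
        current = os.path.dirname(current)
    return None
```

Because the search matches on the subsystem value, it keeps working if the kernel inserts additional devices into the chain.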
- When reading and writing sysfs device attribute files, avoid dependency
on specific error codes wherever possible. This minimizes coupling to
the error handling implementation within the kernel.
In general, failures to read or write sysfs device attributes shall
propagate errors wherever possible. Common errors include, but are not
limited to:
``-EIO``: The read or store operation is not supported, typically
returned by the sysfs system itself if the read or store pointer
is ``NULL``.
``-ENXIO``: The read or store operation failed.
Error codes will not be changed without good reason, and should a change
to error codes result in user-space breakage, it will be fixed, or the
offending change will be reverted.
Userspace applications can, however, expect the format and contents of
the attribute files to remain consistent in the absence of a version
attribute change in the context of a given attribute.

@ -0,0 +1,289 @@
Linux Magic System Request Key Hacks
====================================
Documentation for sysrq.c
What is the magic SysRq key?
~~~~~~~~~~~~~~~~~~~~~~~~~~~~
It is a 'magical' key combo you can hit which the kernel will respond to
regardless of whatever else it is doing, unless it is completely locked up.
How do I enable the magic SysRq key?
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
You need to say "yes" to 'Magic SysRq key (CONFIG_MAGIC_SYSRQ)' when
configuring the kernel. When running a kernel with SysRq compiled in,
/proc/sys/kernel/sysrq controls the functions allowed to be invoked via
the SysRq key. The default value in this file is set by the
CONFIG_MAGIC_SYSRQ_DEFAULT_ENABLE config symbol, which itself defaults
to 1. Here is the list of possible values in /proc/sys/kernel/sysrq:
- 0 - disable sysrq completely
- 1 - enable all functions of sysrq
- >1 - bitmask of allowed sysrq functions (see below for detailed function
description)::
2 = 0x2 - enable control of console logging level
4 = 0x4 - enable control of keyboard (SAK, unraw)
8 = 0x8 - enable debugging dumps of processes etc.
16 = 0x10 - enable sync command
32 = 0x20 - enable remount read-only
64 = 0x40 - enable signalling of processes (term, kill, oom-kill)
128 = 0x80 - allow reboot/poweroff
256 = 0x100 - allow nicing of all RT tasks
You can set the value in the file by the following command::
echo "number" >/proc/sys/kernel/sysrq
The number may be written here either as decimal or as hexadecimal
with the 0x prefix. CONFIG_MAGIC_SYSRQ_DEFAULT_ENABLE must always be
written in hexadecimal.
Note that the value of ``/proc/sys/kernel/sysrq`` influences only the invocation
via a keyboard. Invocation of any operation via ``/proc/sysrq-trigger`` is
always allowed (by a user with admin privileges).
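To make the bitmask concrete, here is a small sketch that decodes a value read from ``/proc/sys/kernel/sysrq`` into the function groups listed above. The names ``SYSRQ_BITS`` and ``decode_sysrq`` are hypothetical; the bit values are taken from the list in this section.

```python
# Bit values as listed in the table above.
SYSRQ_BITS = {
    0x2: "control of console logging level",
    0x4: "control of keyboard (SAK, unraw)",
    0x8: "debugging dumps of processes etc.",
    0x10: "sync command",
    0x20: "remount read-only",
    0x40: "signalling of processes (term, kill, oom-kill)",
    0x80: "reboot/poweroff",
    0x100: "nicing of all RT tasks",
}


def decode_sysrq(value):
    """Return the function groups enabled by a sysrq setting."""
    if value == 0:
        return []  # sysrq disabled completely
    if value == 1:
        return list(SYSRQ_BITS.values())  # everything enabled
    # Any other value is treated as a bitmask of allowed groups.
    return [name for bit, name in SYSRQ_BITS.items() if value & bit]
```

For example, ``decode_sysrq(176)`` (0xb0) reports the sync, remount read-only and reboot/poweroff groups.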
How do I use the magic SysRq key?
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
On x86 - You press the key combo :kbd:`ALT-SysRq-<command key>`.
.. note::
   Some keyboards may not have a key labeled 'SysRq'. The 'SysRq' key is
   also known as the 'Print Screen' key. Also, some keyboards cannot
   handle so many keys being pressed at the same time, so you might
   have better luck with: press :kbd:`Alt`, press :kbd:`SysRq`,
   release :kbd:`SysRq`, press :kbd:`<command key>`, release everything.
On SPARC - You press :kbd:`ALT-STOP-<command key>`, I believe.
On the serial console (PC style standard serial ports only)
You send a ``BREAK``, then within 5 seconds a command key. Sending
``BREAK`` twice is interpreted as a normal BREAK.
On PowerPC
Press :kbd:`ALT - Print Screen` (or :kbd:`F13`) - :kbd:`<command key>`,
:kbd:`Print Screen` (or :kbd:`F13`) - :kbd:`<command key>` may suffice.
On other
If you know of the key combos for other architectures, please
let me know so I can add them to this section.
On all
Write a character to ``/proc/sysrq-trigger``, e.g.::
echo t > /proc/sysrq-trigger
What are the 'command' keys?
~~~~~~~~~~~~~~~~~~~~~~~~~~~~
=========== ===================================================================
Command     Function
=========== ===================================================================
``b``       Will immediately reboot the system without syncing or unmounting
            your disks.
``c``       Will perform a system crash by a NULL pointer dereference.
            A crashdump will be taken if configured.
``d``       Shows all locks that are held.
``e``       Send a SIGTERM to all processes, except for init.
``f``       Will call the oom killer to kill a memory hog process, but do not
            panic if nothing can be killed.
``g``       Used by kgdb (kernel debugger)
``h``       Will display help (actually any other key than those listed
            here will display help, but ``h`` is easy to remember :-)
``i``       Send a SIGKILL to all processes, except for init.
``j``       Forcibly "Just thaw it" - filesystems frozen by the FIFREEZE ioctl.
``k``       Secure Access Key (SAK) Kills all programs on the current virtual
            console. NOTE: See important comments below in SAK section.
``l``       Shows a stack backtrace for all active CPUs.
``m``       Will dump current memory info to your console.
``n``       Used to make RT tasks nice-able
``o``       Will shut your system off (if configured and supported).
``p``       Will dump the current registers and flags to your console.
``q``       Will dump per CPU lists of all armed hrtimers (but NOT regular
            timer_list timers) and detailed information about all
            clockevent devices.
``r``       Turns off keyboard raw mode and sets it to XLATE.
``s``       Will attempt to sync all mounted filesystems.
``t``       Will dump a list of current tasks and their information to your
            console.
``u``       Will attempt to remount all mounted filesystems read-only.
``v``       Forcefully restores framebuffer console
``v``       Causes ETM buffer dump [ARM-specific]
``w``       Dumps tasks that are in uninterruptible (blocked) state.
``x``       Used by xmon interface on ppc/powerpc platforms.
            Show global PMU Registers on sparc64.
            Dump all TLB entries on MIPS.
``y``       Show global CPU Registers [SPARC-64 specific]
``z``       Dump the ftrace buffer
``0``-``9`` Sets the console log level, controlling which kernel messages
            will be printed to your console. (``0``, for example would make
            it so that only emergency messages like PANICs or OOPSes would
            make it to your console.)
=========== ===================================================================
Okay, so what can I use them for?
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
Well, unraw(r) is very handy when your X server or a svgalib program crashes.
sak(k) (Secure Access Key) is useful when you want to be sure there is no
trojan program running at the console which could grab your password
when you try to log in. It will kill all programs on the given console,
thus letting you make sure that the login prompt you see is actually
the one from init, not some trojan program.
.. important::
In its true form it is not a true SAK like the one in a
c2 compliant system, and it should not be mistaken as
such.
Others find it useful as a System Attention Key, which is
handy when you want to exit a program that will not let you switch consoles.
(For example, X or a svgalib program.)
``reboot(b)`` is good when you're unable to shut down. But you should also
``sync(s)`` and ``umount(u)`` first.
``crash(c)`` can be used to manually trigger a crashdump when the system is hung.
Note that this just triggers a crash if there is no dump mechanism available.
``sync(s)`` is great when your system is locked up: it allows you to sync your
disks and will certainly lessen the chance of data loss and fscking. Note
that the sync hasn't taken place until you see the "OK" and "Done" appear
on the screen. (If the kernel is really in strife, you may not ever get the
OK or Done message...)
``umount(u)`` is basically useful in the same ways as ``sync(s)``. I generally
``sync(s)``, ``umount(u)``, then ``reboot(b)`` when my system locks. It's saved
me many a fsck. Again, the unmount (remount read-only) hasn't taken place until
you see the "OK" and "Done" message appear on the screen.
The loglevels ``0``-``9`` are useful when your console is being flooded with
kernel messages you do not want to see. Selecting ``0`` will prevent all but
the most urgent kernel messages from reaching your console. (They will
still be logged if syslogd/klogd are alive, though.)
``term(e)`` and ``kill(i)`` are useful if you have some sort of runaway process
you are unable to kill any other way, especially if it's spawning other
processes.
``just thaw it(j)`` is useful if your system becomes unresponsive due to a
frozen (probably root) filesystem via the FIFREEZE ioctl.
Sometimes SysRq seems to get 'stuck' after using it, what can I do?
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
That happens to me, also. I've found that tapping shift, alt, and control
on both sides of the keyboard, and hitting an invalid sysrq sequence again
will fix the problem. (i.e., something like :kbd:`alt-sysrq-z`). Switching to
another virtual console (:kbd:`ALT+Fn`) and then back again should also help.
I hit SysRq, but nothing seems to happen, what's wrong?
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
There are some keyboards that produce a different keycode for SysRq than the
pre-defined value of 99 (see ``KEY_SYSRQ`` in ``include/linux/input.h``), or
which don't have a SysRq key at all. In these cases, run ``showkey -s`` to find
an appropriate scancode sequence, and use ``setkeycodes <sequence> 99`` to map
this sequence to the usual SysRq code (e.g., ``setkeycodes e05b 99``). It's
probably best to put this command in a boot script. Oh, and by the way, you
exit ``showkey`` by not typing anything for ten seconds.
I want to add SysRQ key events to a module, how does it work?
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
In order to register a basic function with the table, you must first include
the header ``include/linux/sysrq.h``, which defines everything else you need.
Next, you must create a ``sysrq_key_op`` struct, and populate it with A) the key
handler function you will use, B) a help_msg string, that will print when SysRQ
prints help, and C) an action_msg string, that will print right before your
handler is called. Your handler must conform to the prototype in ``sysrq.h``.
After the ``sysrq_key_op`` is created, you can call the kernel function
``register_sysrq_key(int key, struct sysrq_key_op *op_p)``; this will
register the operation pointed to by ``op_p`` at table key ``key``,
if that slot in the table is blank. At module unload time, you must call
the function ``unregister_sysrq_key(int key, struct sysrq_key_op *op_p)``, which
will remove the key op pointed to by ``op_p`` from the key ``key``, if and only if
it is currently registered in that slot. This is in case the slot has been
overwritten since you registered it.
The Magic SysRQ system works by registering key operations against a key op
lookup table, which is defined in 'drivers/tty/sysrq.c'. This key table has
a number of operations registered into it at compile time, but is mutable,
and 2 functions are exported for interface to it::
register_sysrq_key and unregister_sysrq_key.
Of course, never ever leave an invalid pointer in the table. I.e., when
your module that called register_sysrq_key() exits, it must call
unregister_sysrq_key() to clean up the sysrq key table entry that it used.
Null pointers in the table are always safe. :)
If for some reason you feel the need to call the handle_sysrq function from
within a function called by handle_sysrq, you must be aware that you are in
a lock (you are also in an interrupt handler, which means don't sleep!), so
you must call ``__handle_sysrq_nolock`` instead.
When I hit a SysRq key combination only the header appears on the console?
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
Sysrq output is subject to the same console loglevel control as all
other console output. This means that if the kernel was booted 'quiet',
as is common on distro kernels, the output may not appear on the actual
console, even though it will appear in the dmesg buffer, and be accessible
via the dmesg command and to the consumers of ``/proc/kmsg``. As a specific
exception the header line from the sysrq command is passed to all console
consumers as if the current loglevel was maximum. If only the header
is emitted it is almost certain that the kernel loglevel is too low.
Should you require the output on the console channel then you will need
to temporarily up the console loglevel using :kbd:`alt-sysrq-8` or::
echo 8 > /proc/sysrq-trigger
Remember to return the loglevel to normal after triggering the sysrq
command you are interested in.
I have more questions, who can I ask?
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
Just ask them on the linux-kernel mailing list:
linux-kernel@vger.kernel.org
Credits
~~~~~~~
Written by Mydraal <vulpyne@vulpyne.net>
Updated by Adam Sulmicki <adam@cfar.umd.edu>
Updated by Jeremy M. Dolan <jmd@turbogeek.org> 2001/01/28 10:15:59
Added to by Crutcher Dunnavant <crutcher+kernel@datastacks.com>

@ -0,0 +1,59 @@
Tainted kernels
---------------
Some oops reports contain the string **'Tainted: '** after the program
counter. This indicates that the kernel has been tainted by some
mechanism. The string is followed by a series of position-sensitive
characters, each representing a particular tainted value.
1) ``G`` if all modules loaded have a GPL or compatible license, ``P`` if
any proprietary module has been loaded. Modules without a
MODULE_LICENSE or with a MODULE_LICENSE that is not recognised by
insmod as GPL compatible are assumed to be proprietary.
2) ``F`` if any module was force loaded by ``insmod -f``, ``' '`` if all
modules were loaded normally.
3) ``S`` if the oops occurred on an SMP kernel running on hardware that
hasn't been certified as safe to run multiprocessor.
Currently this occurs only on various Athlons that are not
SMP capable.
4) ``R`` if a module was force unloaded by ``rmmod -f``, ``' '`` if all
modules were unloaded normally.
5) ``M`` if any processor has reported a Machine Check Exception,
``' '`` if no Machine Check Exceptions have occurred.
6) ``B`` if a page-release function has found a bad page reference or
some unexpected page flags.
7) ``U`` if a user or user application specifically requested that the
Tainted flag be set, ``' '`` otherwise.
8) ``D`` if the kernel has died recently, i.e. there was an OOPS or BUG.
9) ``A`` if the ACPI table has been overridden.
10) ``W`` if a warning has previously been issued by the kernel.
(Though some warnings may set more specific taint flags.)
11) ``C`` if a staging driver has been loaded.
12) ``I`` if the kernel is working around a severe bug in the platform
firmware (BIOS or similar).
13) ``O`` if an externally-built ("out-of-tree") module has been loaded.
14) ``E`` if an unsigned module has been loaded in a kernel supporting
module signature.
15) ``L`` if a soft lockup has previously occurred on the system.
16) ``K`` if the kernel has been live patched.
The primary reason for the **'Tainted: '** string is to tell kernel
debuggers if this is a clean kernel or if anything unusual has
occurred. Tainting is permanent: even if an offending module is
unloaded, the tainted value remains to indicate that the kernel is not
trustworthy.
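As an illustration, the flag characters can be turned back into the descriptions above with a simple lookup. This is only a sketch: ``TAINT_FLAGS`` and ``decode_taint`` are hypothetical names, and the input is assumed to be the flag characters that follow ``Tainted:`` in the report.

```python
# One entry per flag character described in the list above.
TAINT_FLAGS = {
    "G": "all loaded modules have a GPL or compatible license",
    "P": "a proprietary module has been loaded",
    "F": "a module was force loaded",
    "S": "SMP kernel on hardware not certified as SMP-safe",
    "R": "a module was force unloaded",
    "M": "a Machine Check Exception has been reported",
    "B": "a bad page reference or unexpected page flags were found",
    "U": "taint requested by a user or user application",
    "D": "the kernel has died recently (OOPS or BUG)",
    "A": "the ACPI table has been overridden",
    "W": "a warning has previously been issued",
    "C": "a staging driver has been loaded",
    "I": "working around a severe platform firmware bug",
    "O": "an out-of-tree module has been loaded",
    "E": "an unsigned module has been loaded",
    "L": "a soft lockup has previously occurred",
    "K": "the kernel has been live patched",
}


def decode_taint(flags):
    """Describe each flag character; blanks mean 'flag not set'."""
    result = []
    for char in flags:
        if char == " ":
            continue
        result.append(TAINT_FLAGS.get(char, "unknown flag %r" % char))
    return result
```

Since every letter is used for exactly one flag, the lookup does not need to track the position of each character.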

@ -1,12 +1,16 @@
Unicode support
===============
Last update: 2005-01-17, version 1.4
This file is maintained by H. Peter Anvin <unicode@lanana.org> as part
of the Linux Assigned Names And Numbers Authority (LANANA) project.
The current version can be found at:
http://www.lanana.org/docs/unicode/unicode.txt
http://www.lanana.org/docs/unicode/admin-guide/unicode.rst
Introduction
------------
The Linux kernel code has been rewritten to use Unicode to map
characters to fonts. By downloading a single Unicode-to-font table,
@ -16,12 +20,14 @@ the font as indicated.
This changes the semantics of the eight-bit character tables subtly.
The four character tables are now:
=============== =============================== ================
Map symbol Map name Escape code (G0)
=============== =============================== ================
LAT1_MAP Latin-1 (ISO 8859-1) ESC ( B
GRAF_MAP DEC VT100 pseudographics ESC ( 0
IBMPC_MAP IBM code page 437 ESC ( U
USER_MAP User defined ESC ( K
=============== =============================== ================
In particular, ESC ( U is no longer "straight to font", since the font
might be completely different than the IBM character set. This
@ -55,10 +61,12 @@ In addition, the following characters not present in Unicode 1.1.4
have been defined; these are used by the DEC VT graphics map. [v1.2]
THIS USE IS OBSOLETE AND SHOULD NO LONGER BE USED; PLEASE SEE BELOW.
====== ======================================
U+F800 DEC VT GRAPHICS HORIZONTAL LINE SCAN 1
U+F801 DEC VT GRAPHICS HORIZONTAL LINE SCAN 3
U+F803 DEC VT GRAPHICS HORIZONTAL LINE SCAN 7
U+F804 DEC VT GRAPHICS HORIZONTAL LINE SCAN 9
====== ======================================
The DEC VT220 uses a 6x10 character matrix, and these characters form
a smooth progression in the DEC VT graphics character set. I have
@ -74,10 +82,12 @@ keyboard symbols that are unlikely to ever be added to Unicode proper
since they are horribly vendor-specific. This, of course, is an
excellent example of horrible design.
====== ======================================
U+F810 KEYBOARD SYMBOL FLYING FLAG
U+F811 KEYBOARD SYMBOL PULLDOWN MENU
U+F812 KEYBOARD SYMBOL OPEN APPLE
U+F813 KEYBOARD SYMBOL SOLID APPLE
====== ======================================
Klingon language support
------------------------
@ -99,8 +109,10 @@ of the dingbats/symbols/forms type and this is a language, I have
located it at the end, on a 16-cell boundary in keeping with standard
Unicode practice.
.. note::
This range is now officially managed by the ConScript Unicode
Registry. The normative reference is at:
http://www.evertype.com/standards/csur/klingon.html
@ -112,6 +124,7 @@ However, since the set of symbols appear to be consistent throughout,
with only the actual shapes being different, in keeping with standard
Unicode practice these differences are considered font variants.
====== =======================================================
U+F8D0 KLINGON LETTER A
U+F8D1 KLINGON LETTER B
U+F8D2 KLINGON LETTER CH
@ -155,6 +168,7 @@ U+F8F9 KLINGON DIGIT NINE
U+F8FD KLINGON COMMA
U+F8FE KLINGON FULL STOP
U+F8FF KLINGON SYMBOL FOR EMPIRE
====== =======================================================
Other Fictional and Artificial Scripts
--------------------------------------

@ -0,0 +1,66 @@
Software cursor for VGA
=======================
by Pavel Machek <pavel@atrey.karlin.mff.cuni.cz>
and Martin Mares <mj@atrey.karlin.mff.cuni.cz>
Linux now has some ability to manipulate cursor appearance. Normally, you
can set the size of hardware cursor (and also work around some ugly bugs in
those miserable Trident cards [#f1]_). You can now play a few new tricks:
you can make your cursor look
like a non-blinking red block, make it invert the background of the character
it's over, or highlight that character, and still choose whether the original
hardware cursor should remain visible or not. There may be other things I have
never thought of.
The cursor appearance is controlled by a ``<ESC>[?1;2;3c`` escape sequence
where 1, 2 and 3 are parameters described below. If you omit any of them,
they will default to zeroes.
first parameter
specifies cursor size::
0=default
1=invisible
2=underline,
...
8=full block
+ 16 if you want the software cursor to be applied
+ 32 if you want to always change the background color
+ 64 if you dislike having the background the same as the
foreground.
Highlights are ignored for the last two flags.
second parameter
selects character attribute bits you want to change
(by simply XORing them with the value of this parameter). On standard
VGA, the high four bits specify background and the low four the
foreground. In both groups, low three bits set color (as in normal
color codes used by the console) and the most significant one turns
on highlight (or sometimes blinking -- it depends on the configuration
of your VGA).
third parameter
consists of character attribute bits you want to set.
Bit setting takes place before bit toggling, so you can simply clear a
bit by including it in both the set mask and the toggle mask.
.. [#f1] see ``#define TRIDENT_GLITCH`` in ``drivers/video/vgacon.c``.
Examples
--------
To get normal blinking underline, use::
echo -e '\033[?2c'
To get blinking block, use::
echo -e '\033[?6c'
To get red non-blinking block, use::
echo -e '\033[?17;0;64c'
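The same sequences can be built programmatically. The sketch below is illustrative only: ``cursor_sequence`` and the three flag constants are hypothetical names, with the flag values taken from the description of the first parameter.

```python
# Flags that may be added to the first parameter (see above).
SOFTWARE_CURSOR = 16
ALWAYS_CHANGE_BACKGROUND = 32
AVOID_SAME_FG_BG = 64


def cursor_sequence(*params):
    """Build the <ESC>[?1;2;3c sequence from up to three parameters.

    Omitted parameters are left out; the console treats them as zero.
    """
    if len(params) > 3:
        raise ValueError("at most three parameters are defined")
    return "\033[?" + ";".join(str(p) for p in params) + "c"
```

``cursor_sequence(1 | SOFTWARE_CURSOR, 0, 64)`` reproduces the red non-blinking block example: 17 is an invisible hardware cursor plus the software cursor, and 64 (0x40) sets one of the background bits in the attribute byte. Printing the returned string to the console applies it.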

@ -51,7 +51,7 @@ As an alternative, the boot loader can pass the relevant 'console='
option to the kernel via the tagged lists specifying the port, and
serial format options as described in
Documentation/admin-guide/kernel-parameters.rst.
3. Detect the machine type

@ -1,574 +0,0 @@
========================================
GENERIC ASSOCIATIVE ARRAY IMPLEMENTATION
========================================
Contents:
- Overview.
- The public API.
- Edit script.
- Operations table.
- Manipulation functions.
- Access functions.
- Index key form.
- Internal workings.
- Basic internal tree layout.
- Shortcuts.
- Splitting and collapsing nodes.
- Non-recursive iteration.
- Simultaneous alteration and iteration.
========
OVERVIEW
========
This associative array implementation is an object container with the following
properties:
(1) Objects are opaque pointers. The implementation does not care where they
point (if anywhere) or what they point to (if anything).
[!] NOTE: Pointers to objects _must_ be zero in the least significant bit.
(2) Objects do not need to contain linkage blocks for use by the array. This
permits an object to be located in multiple arrays simultaneously.
Rather, the array is made up of metadata blocks that point to objects.
(3) Objects require index keys to locate them within the array.
(4) Index keys must be unique. Inserting an object with the same key as one
already in the array will replace the old object.
(5) Index keys can be of any length and can be of different lengths.
(6) Index keys should encode the length early on, before any variation due to
length is seen.
(7) Index keys can include a hash to scatter objects throughout the array.
(8) The array can be iterated over. The objects will not necessarily come out in
key order.
(9) The array can be iterated over whilst it is being modified, provided the
RCU readlock is being held by the iterator. Note, however, under these
circumstances, some objects may be seen more than once. If this is a
problem, the iterator should lock against modification. Objects will not
be missed, however, unless deleted.
(10) Objects in the array can be looked up by means of their index key.
(11) Objects can be looked up whilst the array is being modified, provided the
RCU readlock is being held by the thread doing the look up.
The implementation uses a tree of 16-pointer nodes internally that are indexed
on each level by nibbles from the index key in the same manner as in a radix
tree. To improve memory efficiency, shortcuts can be emplaced to skip over
what would otherwise be a series of single-occupancy nodes. Further, nodes
pack leaf object pointers into spare space in the node rather than making an
extra branch until such time as an object needs to be added to a full node.
==============
THE PUBLIC API
==============
The public API can be found in <linux/assoc_array.h>. The associative array is
rooted on the following structure:
struct assoc_array {
...
};
The code is selected by enabling CONFIG_ASSOCIATIVE_ARRAY.
EDIT SCRIPT
-----------
The insertion and deletion functions produce an 'edit script' that can later be
applied to effect the changes without risking ENOMEM. This retains the
preallocated metadata blocks that will be installed in the internal tree and
keeps track of the metadata blocks that will be removed from the tree when the
script is applied.
This is also used to keep track of dead blocks and dead objects after the
script has been applied so that they can be freed later. The freeing is done
after an RCU grace period has passed - thus allowing access functions to
proceed under the RCU read lock.
The script appears outside of the API as a pointer of the type:
struct assoc_array_edit;
There are two functions for dealing with the script:
(1) Apply an edit script.
void assoc_array_apply_edit(struct assoc_array_edit *edit);
This will perform the edit functions, interpolating various write barriers
to permit accesses under the RCU read lock to continue. The edit script
will then be passed to call_rcu() to free it and any dead stuff it points
to.
(2) Cancel an edit script.
void assoc_array_cancel_edit(struct assoc_array_edit *edit);
This frees the edit script and all preallocated memory immediately. If
this was for insertion, the new object is _not_ released by this function,
but must rather be released by the caller.
These functions are guaranteed not to fail.
OPERATIONS TABLE
----------------
Various functions take a table of operations:
struct assoc_array_ops {
...
};
This points to a number of methods, all of which need to be provided:
(1) Get a chunk of index key from caller data:
unsigned long (*get_key_chunk)(const void *index_key, int level);
This should return a chunk of caller-supplied index key starting at the
*bit* position given by the level argument. The level argument will be a
multiple of ASSOC_ARRAY_KEY_CHUNK_SIZE and the function should return
ASSOC_ARRAY_KEY_CHUNK_SIZE bits. No error is possible.
(2) Get a chunk of an object's index key.
unsigned long (*get_object_key_chunk)(const void *object, int level);
As the previous function, but gets its data from an object in the array
rather than from a caller-supplied index key.
(3) See if this is the object we're looking for.
bool (*compare_object)(const void *object, const void *index_key);
Compare the object against an index key and return true if it matches and
false if it doesn't.
(4) Diff the index keys of two objects.
int (*diff_objects)(const void *object, const void *index_key);
Return the bit position at which the index key of the specified object
differs from the given index key or -1 if they are the same.
(5) Free an object.
void (*free_object)(void *object);
Free the specified object. Note that this may be called an RCU grace
period after assoc_array_apply_edit() was called, so synchronize_rcu() may
be necessary on module unloading.
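As a rough illustration of what the key methods compute, here is a userspace sketch in Python. It assumes the index key is a byte string, that ``ASSOC_ARRAY_KEY_CHUNK_SIZE`` is 64 (one machine word on a 64-bit CPU), and that bits are numbered little-endian; none of these choices is mandated by the text above. The function names are illustrative only.

```python
ASSOC_ARRAY_KEY_CHUNK_SIZE = 64  # assumption: one 64-bit machine word


def get_key_chunk(index_key: bytes, level: int) -> int:
    """Return the chunk of key that starts at bit position `level`."""
    if level % ASSOC_ARRAY_KEY_CHUNK_SIZE != 0:
        raise ValueError("level must be a multiple of the chunk size")
    chunk_bytes = ASSOC_ARRAY_KEY_CHUNK_SIZE // 8
    start = level // 8
    # Bits past the end of the key read as zero, so no error is possible.
    raw = index_key[start:start + chunk_bytes].ljust(chunk_bytes, b"\x00")
    return int.from_bytes(raw, "little")


def first_differing_bit(key_a: bytes, key_b: bytes) -> int:
    """Return the first bit position where two keys differ, or -1."""
    length = max(len(key_a), len(key_b))
    a = int.from_bytes(key_a.ljust(length, b"\x00"), "little")
    b = int.from_bytes(key_b.ljust(length, b"\x00"), "little")
    delta = a ^ b
    if delta == 0:
        return -1
    # Isolate the lowest set bit and convert it to a bit index.
    return (delta & -delta).bit_length() - 1
```

Because missing bytes are read as zero in this sketch, ``b"a"`` and ``b"a\x00"`` compare as equal. This is one reason to encode the key length inside the key itself.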
MANIPULATION FUNCTIONS
----------------------
There are a number of functions for manipulating an associative array:
(1) Initialise an associative array.
void assoc_array_init(struct assoc_array *array);
This initialises the base structure for an associative array. It can't
fail.
(2) Insert/replace an object in an associative array.
struct assoc_array_edit *
assoc_array_insert(struct assoc_array *array,
const struct assoc_array_ops *ops,
const void *index_key,
void *object);
This inserts the given object into the array. Note that the least
significant bit of the pointer must be zero as it's used to type-mark
pointers internally.
If an object already exists for that key then it will be replaced with the
new object and the old one will be freed automatically.
The index_key argument should hold index key information and is
passed to the methods in the ops table when they are called.
This function makes no alteration to the array itself, but rather returns
an edit script that must be applied. -ENOMEM is returned in the case of
an out-of-memory error.
The caller should lock exclusively against other modifiers of the array.
(3) Delete an object from an associative array.
struct assoc_array_edit *
assoc_array_delete(struct assoc_array *array,
const struct assoc_array_ops *ops,
const void *index_key);
This deletes an object that matches the specified data from the array.
The index_key argument should hold index key information and is
passed to the methods in the ops table when they are called.
This function makes no alteration to the array itself, but rather returns
an edit script that must be applied. -ENOMEM is returned in the case of
an out-of-memory error. NULL will be returned if the specified object is
not found within the array.
The caller should lock exclusively against other modifiers of the array.
(4) Delete all objects from an associative array.
struct assoc_array_edit *
assoc_array_clear(struct assoc_array *array,
const struct assoc_array_ops *ops);
This deletes all the objects from an associative array and leaves it
completely empty.
This function makes no alteration to the array itself, but rather returns
an edit script that must be applied. -ENOMEM is returned in the case of
an out-of-memory error.
The caller should lock exclusively against other modifiers of the array.
(5) Destroy an associative array, deleting all objects.
void assoc_array_destroy(struct assoc_array *array,
const struct assoc_array_ops *ops);
This destroys the contents of the associative array and leaves it
completely empty. It is not permitted for another thread to be traversing
the array under the RCU read lock at the same time as this function is
destroying it as no RCU deferral is performed on memory release -
something that would require memory to be allocated.
The caller should lock exclusively against other modifiers and accessors
of the array.
(6) Garbage collect an associative array.
int assoc_array_gc(struct assoc_array *array,
const struct assoc_array_ops *ops,
bool (*iterator)(void *object, void *iterator_data),
void *iterator_data);
This iterates over the objects in an associative array and passes each one
to iterator(). If iterator() returns true, the object is kept. If it
returns false, the object will be freed. If the iterator() function
returns true, it must perform any appropriate refcount incrementing on the
object before returning.
The internal tree will be packed down if possible as part of the iteration
to reduce the number of nodes in it.
The iterator_data is passed directly to iterator() and is otherwise
ignored by the function.
The function will return 0 if successful and -ENOMEM if there wasn't
enough memory.
It is possible for other threads to iterate over or search the array under
the RCU read lock whilst this function is in progress. The caller should
lock exclusively against other modifiers of the array.
ACCESS FUNCTIONS
----------------
There are two functions for accessing an associative array:
(1) Iterate over all the objects in an associative array.
int assoc_array_iterate(const struct assoc_array *array,
int (*iterator)(const void *object,
void *iterator_data),
void *iterator_data);
This passes each object in the array to the iterator callback function.
iterator_data is private data for that function.
This may be used on an array at the same time as the array is being
modified, provided the RCU read lock is held. Under such circumstances,
it is possible for the iteration function to see some objects twice. If
this is a problem, then modification should be locked against. The
iteration algorithm should not, however, miss any objects.
The function will return 0 if no objects were in the array or else it will
return the result of the last iterator function called. Iteration stops
immediately if any call to the iteration function results in a non-zero
return.
(2) Find an object in an associative array.
void *assoc_array_find(const struct assoc_array *array,
const struct assoc_array_ops *ops,
const void *index_key);
This walks through the array's internal tree directly to the object
specified by the index key.
This may be used on an array at the same time as the array is being
modified, provided the RCU read lock is held.
The function will return the object if found or will return NULL if the
object was not found.
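The return-value convention of the iteration function can be sketched as follows. This is a Python model over a plain list, not the kernel implementation, and ``iterate_model`` is a hypothetical name. It shows only the rule stated above: 0 for an empty array, otherwise the result of the last callback, with iteration stopping at the first non-zero result.

```python
def iterate_model(objects, iterator, iterator_data):
    """Model of the assoc_array_iterate() return convention."""
    result = 0  # returned unchanged if there are no objects
    for obj in objects:
        result = iterator(obj, iterator_data)
        if result != 0:
            break  # a non-zero return stops the walk immediately
    return result


def stop_at_b(obj, seen):
    """Example callback: record each object, stop when "b" is seen."""
    seen.append(obj)
    return 1 if obj == "b" else 0
```

A callback can therefore use a non-zero return both to end the walk early and to report why it stopped.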
INDEX KEY FORM
--------------
The index key can be of any form, but since the algorithms aren't told how long
the key is, it is strongly recommended that the index key includes its length
very early on before any variation due to the length would have an effect on
comparisons.
This will cause leaves with different length keys to scatter away from each
other - and those with the same length keys to cluster together.
It is also recommended that the index key begin with a hash of the rest of the
key to maximise scattering throughout keyspace.
The better the scattering, the wider and lower the internal tree will be.
Poor scattering isn't too much of a problem as there are shortcuts and nodes
can contain mixtures of leaves and metadata pointers.
The index key is read in chunks of machine word. Each chunk is subdivided into
one nibble (4 bits) per level, so on a 32-bit CPU this is good for 8 levels and
on a 64-bit CPU, 16 levels. Unless the scattering is really poor, it is
unlikely that more than one word of any particular index key will have to be
used.
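One possible key layout that follows these recommendations is sketched below in Python. The layout (a 32-bit hash, then a 16-bit length, then the raw name) and the use of CRC-32 as the hash are assumptions made for illustration; the text above only asks that a hash come first and that the length appear early.

```python
import struct
import zlib


def make_index_key(name: bytes) -> bytes:
    """Build a key as: hash of the name, then its length, then the name."""
    digest = zlib.crc32(name) & 0xFFFFFFFF  # stand-in hash, an assumption
    # "<IH" = little-endian 32-bit hash followed by a 16-bit length, so
    # names longer than 65535 bytes are rejected by struct.pack().
    return struct.pack("<IH", digest, len(name)) + name


def levels_per_chunk(word_bits: int) -> int:
    """Each tree level consumes one nibble (4 bits) of a key chunk."""
    return word_bits // 4
```

With this layout the hash occupies the first eight nibbles of the key, so on a 64-bit CPU the first word already covers the hash, the length and the start of the name.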
=================
INTERNAL WORKINGS
=================
The associative array data structure has an internal tree. This tree is
constructed of two types of metadata blocks: nodes and shortcuts.
A node is an array of slots. Each slot can contain one of four things:
(*) A NULL pointer, indicating that the slot is empty.
(*) A pointer to an object (a leaf).
(*) A pointer to a node at the next level.
(*) A pointer to a shortcut.
BASIC INTERNAL TREE LAYOUT
--------------------------
Ignoring shortcuts for the moment, the nodes form a multilevel tree. The index
key space is strictly subdivided by the nodes in the tree and nodes occur on
fixed levels. For example:
Level: 0 1 2 3
=============== =============== =============== ===============
NODE D
NODE B NODE C +------>+---+
+------>+---+ +------>+---+ | | 0 |
NODE A | | 0 | | | 0 | | +---+
+---+ | +---+ | +---+ | : :
| 0 | | : : | : : | +---+
+---+ | +---+ | +---+ | | f |
| 1 |---+ | 3 |---+ | 7 |---+ +---+
+---+ +---+ +---+
: : : : | 8 |---+
+---+ +---+ +---+ | NODE E
| e |---+ | f | : : +------>+---+
+---+ | +---+ +---+ | 0 |
| f | | | f | +---+
+---+ | +---+ : :
| NODE F +---+
+------>+---+ | f |
| 0 | NODE G +---+
+---+ +------>+---+
: : | | 0 |
+---+ | +---+
| 6 |---+ : :
+---+ +---+
: : | f |
+---+ +---+
| f |
+---+
In the above example, there are 7 nodes (A-G), each with 16 slots (0-f).
Assuming no other meta data nodes in the tree, the key space is divided thusly:
KEY PREFIX NODE
========== ====
137* D
138* E
13[0-69-f]* C
1[0-24-f]* B
e6* G
e[0-57-f]* F
[02-df]* A
So, for instance, keys with the following example index keys will be found in
the appropriate nodes:
INDEX KEY PREFIX NODE
=============== ======= ====
13694892892489 13 C
13795289025897 137 D
13889dde88793 138 E
138bbb89003093 138 E
1394879524789 13 C
1458952489 1 B
9431809de993ba - A
b4542910809cd - A
e5284310def98 e F
e68428974237 e6 G
e7fffcbd443 e F
f3842239082 - A
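The mapping from index key to node in the example tree can be reproduced with a short walk. The sketch below is a Python model: keys are written as lowercase hex strings, and the ``TREE`` dict hard-codes only the metadata pointers from the diagram (slot 1 of A to B, and so on). It ignores shortcuts and says nothing about which slot a leaf occupies.

```python
# Metadata pointers of the example tree: node -> {slot nibble: child node}
TREE = {
    "A": {"1": "B", "e": "F"},
    "B": {"3": "C"},
    "C": {"7": "D", "8": "E"},
    "F": {"6": "G"},
    "D": {},
    "E": {},
    "G": {},
}


def locate(index_key: str, root: str = "A"):
    """Return (consumed key prefix, node) for a hex index key."""
    node, depth = root, 0
    while depth < len(index_key):
        child = TREE[node].get(index_key[depth])
        if child is None:
            break  # no metadata pointer in this slot: the key stays here
        node = child
        depth += 1
    return index_key[:depth], node
```

For instance, ``locate("13795289025897")`` follows slot 1, then slot 3, then slot 7, and ends in node D with prefix ``137``.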
To save memory, if a node can hold all the leaves in its portion of keyspace,
then the node will have all those leaves in it and will not have any metadata
pointers - even if some of those leaves would like to be in the same slot.
A node can contain a heterogeneous mix of leaves and metadata pointers.
Metadata pointers must be in the slots that match their subdivisions of key
space. The leaves can be in any slot not occupied by a metadata pointer. It
is guaranteed that none of the leaves in a node will match a slot occupied by a
metadata pointer. If the metadata pointer is there, any leaf whose key matches
the metadata key prefix must be in the subtree that the metadata pointer points
to.
In the above example list of index keys, node A will contain:
SLOT CONTENT INDEX KEY (PREFIX)
==== =============== ==================
1 PTR TO NODE B 1*
any LEAF 9431809de993ba
any LEAF b4542910809cd
e PTR TO NODE F e*
any LEAF f3842239082
and node B:
3 PTR TO NODE C 13*
any LEAF 1458952489
SHORTCUTS
---------
Shortcuts are metadata records that jump over a piece of keyspace. A shortcut
is a replacement for a series of single-occupancy nodes ascending through the
levels. Shortcuts exist to save memory and to speed up traversal.
It is possible for the root of the tree to be a shortcut - say, for example,
the tree contains at least 17 nodes all with key prefix '1111'. The insertion
algorithm will insert a shortcut to skip over the '1111' keyspace in a single
bound and get to the fourth level where these actually become different.
SPLITTING AND COLLAPSING NODES
------------------------------
Each node has a maximum capacity of 16 leaves and metadata pointers. If the
insertion algorithm finds that it is trying to insert a 17th object into a
node, that node will be split such that at least two leaves that have a common
key segment at that level end up in a separate node rooted on that slot for
that common key segment.
If the leaves in a full node and the leaf that is being inserted are
sufficiently similar, then a shortcut will be inserted into the tree.
When the number of objects in the subtree rooted at a node falls to 16 or
fewer, then the subtree will be collapsed down to a single node - and this will
ripple towards the root if possible.
NON-RECURSIVE ITERATION
-----------------------
Each node and shortcut contains a back pointer to its parent and the number of
the slot in that parent that points to it. Non-recursive iteration uses these
to proceed rootwards through the tree, going to the parent node, slot N + 1 to
make sure progress is made without the need for a stack.
The backpointers, however, make simultaneous alteration and iteration tricky.
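A stackless walk driven by back pointers might look like the sketch below. It is a simplified Python model with hypothetical names (``Node``, ``add_child``, ``iterate``): it has no shortcuts and no concurrency, and it visits slots strictly in order, whereas the kernel is described elsewhere in this document as examining a node's leaves before following its metadata pointers.

```python
class Node:
    """A 16-slot node that remembers where it hangs in its parent."""

    def __init__(self, parent=None, parent_slot=0):
        self.slots = [None] * 16
        self.parent = parent            # back pointer
        self.parent_slot = parent_slot  # slot in parent pointing at us


def add_child(parent, slot):
    """Create a child node and hook it into the given parent slot."""
    child = Node(parent, slot)
    parent.slots[slot] = child
    return child


def iterate(root):
    """Yield every leaf without recursion or an explicit stack."""
    node, slot = root, 0
    while node is not None:
        if slot < len(node.slots):
            entry = node.slots[slot]
            if isinstance(entry, Node):
                node, slot = entry, 0  # descend into the child node
                continue
            if entry is not None:
                yield entry            # a leaf
            slot += 1
        else:
            # Node exhausted: resume in the parent at slot N + 1.
            node, slot = node.parent, node.parent_slot + 1
```

The only state carried between steps is the current node and slot number, which is what makes the back pointer and the parent slot number sufficient.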
SIMULTANEOUS ALTERATION AND ITERATION
-------------------------------------
There are a number of cases to consider:
(1) Simple insert/replace. This involves simply replacing a NULL or old
matching leaf pointer with the pointer to the new leaf after a barrier.
The metadata blocks don't change otherwise. An old leaf won't be freed
until after the RCU grace period.
(2) Simple delete. This involves just clearing an old matching leaf. The
metadata blocks don't change otherwise. The old leaf won't be freed until
after the RCU grace period.
(3) Insertion replacing part of a subtree that we haven't yet entered. This
may involve replacement of part of that subtree - but that won't affect
the iteration as we won't have reached the pointer to it yet and the
ancestry blocks are not replaced (the layout of those does not change).
(4) Insertion replacing nodes that we're actively processing. This isn't a
problem as we've passed the anchoring pointer and won't switch onto the
new layout until we follow the back pointers - at which point we've
already examined the leaves in the replaced node (we iterate over all the
leaves in a node before following any of its metadata pointers).
We might, however, re-see some leaves that have been split out into a new
branch that's in a slot further along than we were at.
(5) Insertion replacing nodes that we're processing a dependent branch of.
This won't affect us until we follow the back pointers. Similar to (4).
(6) Deletion collapsing a branch under us. This doesn't affect us because the
back pointers will get us back to the parent of the new node before we
could see the new node. The entire collapsed subtree is thrown away
unchanged - and will still be rooted on the same slot, so we shouldn't
process it a second time as we'll go back to slot + 1.
Note:
(*) Under some circumstances, we need to simultaneously change the parent
pointer and the parent slot pointer on a node (say, for example, we
inserted another node before it and moved it up a level). We cannot do
this without locking against a read - so we have to replace that node too.
However, when we're changing a shortcut into a node this isn't a problem
as shortcuts only have one slot and so the parent slot number isn't used
when traversing backwards over one. This means that it's okay to change
the slot number first - provided suitable barriers are used to make sure
the parent slot number is read after the back pointer.
Obsolete blocks and leaves are freed up after an RCU grace period has passed,
so as long as anyone doing walking or iteration holds the RCU read lock, the
old superstructure should not go away on them.

@ -1,45 +0,0 @@
March 2008
Jan-Simon Moeller, dl9pf@gmx.de
How to deal with bad memory e.g. reported by memtest86+ ?
#########################################################
There are three possibilities I know of:
1) Reinsert/swap the memory modules
2) Buy new modules (best!) or try to exchange the memory
if you have spare-parts
3) Use BadRAM or memmap
This Howto is about number 3) .
BadRAM
######
BadRAM is actively developed and available as a kernel patch
here: http://rick.vanrein.org/linux/badram/
For more details see the BadRAM documentation.
memmap
######
memmap is already in the kernel and usable as kernel-parameter at
boot-time. Its syntax is slightly strange and you may need to
calculate the values by yourself!
Syntax to exclude a memory area (see kernel-parameters.txt for details):
memmap=<size>$<address>
Example: memtest86+ reported errors at addresses 0x18691458, 0x18698424 and
some others. All had 0x1869xxxx in common, so I chose a pattern of
0x18690000,0xffff0000.
With the numbers of the example above:
memmap=64K$0x18690000
or
memmap=0x10000$0x18690000
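The calculation behind these values can be sketched as a small helper. This is an illustrative Python snippet, not a kernel tool: ``memmap_exclude`` is a hypothetical name, and it assumes 32-bit addresses and a mask whose set bits are contiguous from the top, as in the 0xffff0000 pattern above.

```python
def memmap_exclude(bad_addresses, mask=0xFFFF0000):
    """Build a memmap=<size>$<address> parameter covering the bad region."""
    bases = {addr & mask for addr in bad_addresses}
    if len(bases) != 1:
        raise ValueError("addresses do not share a single masked base")
    base = bases.pop()
    # The cleared bits of the mask give the size of the excluded region.
    size = (~mask & 0xFFFFFFFF) + 1
    return f"memmap={size:#x}${base:#x}"
```

Called with the two addresses from the example, the helper returns the second form shown above, in which 0x10000 bytes is the same as 64K.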

@ -1,56 +0,0 @@
These instructions are deliberately very basic. If you want something clever,
go read the real docs ;-) Please don't add more stuff, but feel free to
correct my mistakes ;-) (mbligh@aracnet.com)
Thanks to John Levon, Dave Hansen, et al. for help writing this.
<test> is the thing you're trying to measure.
Make sure you have the correct System.map / vmlinux referenced!
It is probably easiest to use "make install" for linux and hack
/sbin/installkernel to copy vmlinux to /boot, in addition to vmlinuz,
config, System.map, which are usually installed by default.
Readprofile
-----------
A recent readprofile command is needed for 2.6, such as found in util-linux
2.12a, which can be downloaded from:
http://www.kernel.org/pub/linux/utils/util-linux/
Most distributions will ship it already.
Add "profile=2" to the kernel command line.
clear readprofile -r
<test>
dump output readprofile -m /boot/System.map > captured_profile
Oprofile
--------
Get the source (see Changes for required version) from
http://oprofile.sourceforge.net/ and add "idle=poll" to the kernel command
line.
Configure with CONFIG_PROFILING=y and CONFIG_OPROFILE=y & reboot on new kernel
./configure --with-kernel-support
make install
For superior results, be sure to enable the local APIC. If opreport sees
a 0Hz CPU, APIC was not on. Be aware that idle=poll may mean a performance
penalty.
One time setup:
opcontrol --setup --vmlinux=/boot/vmlinux
clear opcontrol --reset
start opcontrol --start
<test>
stop opcontrol --stop
dump output opreport > output_file
To only report on the kernel, run opreport -l /boot/vmlinux > output_file
A reset is needed to clear old statistics, which survive a reboot.

@ -1,131 +0,0 @@
Kernel Support for miscellaneous (your favourite) Binary Formats v1.1
=====================================================================
This Kernel feature allows you to invoke almost (for restrictions see below)
every program by simply typing its name in the shell.
This includes for example compiled Java(TM), Python or Emacs programs.
To achieve this you must tell binfmt_misc which interpreter has to be invoked
with which binary. Binfmt_misc recognises the binary-type by matching some bytes
at the beginning of the file with a magic byte sequence (masking out specified
bits) you have supplied. Binfmt_misc can also recognise a filename extension
such as '.com' or '.exe'.
First you must mount binfmt_misc:
mount binfmt_misc -t binfmt_misc /proc/sys/fs/binfmt_misc
To actually register a new binary type, you have to set up a string looking like
:name:type:offset:magic:mask:interpreter:flags (where you can choose the ':'
separator to suit your needs) and echo it to /proc/sys/fs/binfmt_misc/register.
Here is what the fields mean:
- 'name' is an identifier string. A new /proc file will be created with this
name below /proc/sys/fs/binfmt_misc; cannot contain slashes '/' for obvious
reasons.
- 'type' is the type of recognition. Give 'M' for magic and 'E' for extension.
- 'offset' is the offset of the magic/mask in the file, counted in bytes. This
defaults to 0 if you omit it (i.e. you write ':name:type::magic...'). Ignored
when using filename extension matching.
- 'magic' is the byte sequence binfmt_misc is matching for. The magic string
may contain hex-encoded characters like \x0a or \xA4. Note that you must
escape any NUL bytes; parsing halts at the first one. In a shell environment
you might have to write \\x0a to prevent the shell from eating your \.
If you chose filename extension matching, this is the extension to be
recognised (without the '.', the \x0a specials are not allowed). Extension
matching is case sensitive, and slashes '/' are not allowed!
- 'mask' is an (optional, defaults to all 0xff) mask. You can mask out some
bits from matching by supplying a string like magic and as long as magic.
The mask is anded with the byte sequence of the file. Note that you must
escape any NUL bytes; parsing halts at the first one. Ignored when using
filename extension matching.
- 'interpreter' is the program that should be invoked with the binary as first
argument (specify the full path)
- 'flags' is an optional field that controls several aspects of the invocation
of the interpreter. It is a string of capital letters, each controls a
certain aspect. The following flags are supported -
'P' - preserve-argv[0]. Legacy behavior of binfmt_misc is to overwrite
the original argv[0] with the full path to the binary. When this
flag is included, binfmt_misc will add an argument to the argument
vector for this purpose, thus preserving the original argv[0].
e.g. If your interp is set to /bin/foo and you run `blah` (which is
in /usr/local/bin), then the kernel will execute /bin/foo with
argv[] set to ["/bin/foo", "/usr/local/bin/blah", "blah"]. The
interp has to be aware of this so it can execute /usr/local/bin/blah
with argv[] set to ["blah"].
'O' - open-binary. Legacy behavior of binfmt_misc is to pass the full path
of the binary to the interpreter as an argument. When this flag is
included, binfmt_misc will open the file for reading and pass its
descriptor as an argument, instead of the full path, thus allowing
the interpreter to execute non-readable binaries. This feature
should be used with care - the interpreter has to be trusted not to
emit the contents of the non-readable binary.
'C' - credentials. Currently, the behavior of binfmt_misc is to calculate
the credentials and security token of the new process according to
the interpreter. When this flag is included, these attributes are
calculated according to the binary. It also implies the 'O' flag.
This feature should be used with care as the interpreter
will run with root permissions when a setuid binary owned by root
is run with binfmt_misc.
'F' - fix binary. The usual behaviour of binfmt_misc is to spawn the
binary lazily when the misc format file is invoked. However,
this doesn't work very well in the face of mount namespaces and
changeroots, so the F mode opens the binary as soon as the
emulation is installed and uses the opened image to spawn the
emulator, meaning it is always available once installed,
regardless of how the environment changes.
There are some restrictions:
- the whole register string may not exceed 1920 characters
- the magic must reside in the first 128 bytes of the file, i.e.
offset+size(magic) has to be less than 128
- the interpreter string may not exceed 127 characters
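The field layout and the restrictions above can be checked before anything is written to the register file. The helper below is a sketch in Python with hypothetical names (``build_register_string``, ``_escape``); it only builds and validates the string, always uses ':' as the separator, and escapes every non-alphanumeric magic or mask byte as \xNN, which is one safe choice rather than a requirement.

```python
def _escape(data: bytes) -> str:
    """Hex-escape every byte that is not an ASCII letter or digit."""
    return "".join(
        chr(b) if bytes([b]).isalnum() else f"\\x{b:02x}" for b in data
    )


def build_register_string(name, rtype, interpreter, magic=b"",
                          offset=None, mask=b"", flags=""):
    """Return a ':name:type:offset:magic:mask:interpreter:flags' string."""
    if "/" in name:
        raise ValueError("name cannot contain '/'")
    if rtype not in ("M", "E"):
        raise ValueError("type must be 'M' (magic) or 'E' (extension)")
    if len(interpreter) > 127:
        raise ValueError("interpreter string may not exceed 127 characters")
    if rtype == "M":
        if (offset or 0) + len(magic) >= 128:
            raise ValueError("offset + size(magic) has to be less than 128")
        if mask and len(mask) != len(magic):
            raise ValueError("mask must be as long as magic")
        magic_field, mask_field = _escape(magic), _escape(mask)
    else:
        if b"/" in magic:
            raise ValueError("extension cannot contain '/'")
        # Extension matching: plain text, no \x escapes, mask ignored.
        magic_field, mask_field = magic.decode("ascii"), ""
    offset_field = "" if offset is None else str(offset)
    entry = ":" + ":".join([name, rtype, offset_field, magic_field,
                            mask_field, interpreter, flags])
    if len(entry) > 1920:
        raise ValueError("register string may not exceed 1920 characters")
    return entry
```

The resulting string would then be written to /proc/sys/fs/binfmt_misc/register, which is not done here because it requires a mounted binfmt_misc and suitable privileges.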
To use binfmt_misc you have to mount it first. You can mount it with
"mount -t binfmt_misc none /proc/sys/fs/binfmt_misc" command, or you can add
a line "none /proc/sys/fs/binfmt_misc binfmt_misc defaults 0 0" to your
/etc/fstab so it auto mounts on boot.
You may want to add the binary formats in one of your /etc/rc scripts during
boot-up. Read the manual of your init program to figure out how to do this
right.
Think about the order of adding entries! Later added entries are matched first!
A few examples (assumed you are in /proc/sys/fs/binfmt_misc):
- enable support for em86 (like binfmt_em86, for Alpha AXP only):
echo ':i386:M::\x7fELF\x01\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x02\x00\x03:\xff\xff\xff\xff\xff\xfe\xfe\xff\xff\xff\xff\xff\xff\xff\xff\xff\xfb\xff\xff:/bin/em86:' > register
echo ':i486:M::\x7fELF\x01\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x02\x00\x06:\xff\xff\xff\xff\xff\xfe\xfe\xff\xff\xff\xff\xff\xff\xff\xff\xff\xfb\xff\xff:/bin/em86:' > register
- enable support for packed DOS applications (pre-configured dosemu hdimages):
echo ':DEXE:M::\x0eDEX::/usr/bin/dosexec:' > register
- enable support for Windows executables using wine:
echo ':DOSWin:M::MZ::/usr/local/bin/wine:' > register
For java support see Documentation/java.txt
You can enable/disable binfmt_misc or one binary type by echoing 0 (to disable)
or 1 (to enable) to /proc/sys/fs/binfmt_misc/status or /proc/.../the_name.
Catting the file tells you the current status of binfmt_misc/the entry.
You can remove one entry or all entries by echoing -1 to /proc/.../the_name
or /proc/sys/fs/binfmt_misc/status.
HINTS:
======
If you want to pass special arguments to your interpreter, you can
write a wrapper script for it. See Documentation/java.txt for an
example.
Your interpreter should NOT look in the PATH for the filename; the kernel
passes it the full filename (or the file descriptor) to use. Using $PATH can
cause unexpected behaviour and can be a security hazard.
Richard Günther <rguenth@tat.physik.uni-tuebingen.de>

@ -184,7 +184,7 @@ infrequently used and the primary purpose of Smart Array controllers is to
act as a RAID controller for disk drives, so the vast majority of commands
are allocated for disk devices. However, if you have more than a few tape
drives attached to a smart array, the default number of commands may not be
enought (for example, if you have 8 tape drives, you could only rewind 6
enough (for example, if you have 8 tape drives, you could only rewind 6
at one time with the default number of commands.) The cciss_tape_cmds module
parameter allows more commands (up to 16 more) to be allocated for use by
tape drives. For example:

@ -14,7 +14,7 @@ Contents:
The RAM disk driver is a way to use main system memory as a block device. It
is required for initrd, an initial filesystem used if you need to load modules
in order to access the root filesystem (see Documentation/initrd.txt). It can
in order to access the root filesystem (see Documentation/admin-guide/initrd.rst). It can
also be used for a temporary filesystem for crypto work, since the contents
are erased on reboot.

@ -1,34 +0,0 @@
Linux Braille Console
To get early boot messages on a braille device (before userspace screen
readers can start), you first need to compile the support for the usual serial
console (see serial-console.txt), and for braille device (in Device Drivers -
Accessibility).
Then you need to specify a console=brl, option on the kernel command line, the
format is:
console=brl,serial_options...
where serial_options... are the same as described in serial-console.txt
So for instance you can use console=brl,ttyS0 if the braille device is connected
to the first serial port, and console=brl,ttyS0,115200 to override the baud rate
to 115200, etc.
By default, the braille device will just show the last kernel message (console
mode). To review previous messages, press the Insert key to switch to the VT
review mode. In review mode, the arrow keys let you browse the VT content, the
page up/down keys go to the top/bottom of the screen, and the home key goes back
to the cursor, hence providing a very basic screen reviewing facility.
Sound feedback can be obtained by adding the braille_console.sound=1 kernel
parameter.
For simplicity, only one braille console can be enabled, other uses of
console=brl,... will be discarded. Also note that it does not interfere with
the console selection mechanism described in serial-console.txt
For now, only the VisioBraille device is supported.
Samuel Thibault <samuel.thibault@ens-lyon.org>

@ -8,7 +8,7 @@ cpuacct.txt
- CPU Accounting Controller; account CPU usage for groups of tasks.
cpusets.txt
- documents the cpusets feature; assign CPUs and Mem to a set of tasks.
devices.txt
admin-guide/devices.rst
- Device Whitelist Controller; description, interface and security.
freezer-subsystem.txt
- checkpointing; rationale to not use signals, interface.

@ -161,7 +161,7 @@ The producer will look something like this:
unsigned long head = buffer->head;
/* The spin_unlock() and next spin_lock() provide needed ordering. */
unsigned long tail = ACCESS_ONCE(buffer->tail);
unsigned long tail = READ_ONCE(buffer->tail);
if (CIRC_SPACE(head, tail, buffer->size) >= 1) {
/* insert one item into the buffer */
@ -222,7 +222,7 @@ This will instruct the CPU to make sure the index is up to date before reading
the new item, and then it shall make sure the CPU has finished reading the item
before it writes the new tail pointer, which will erase the item.
Note the use of ACCESS_ONCE() and smp_load_acquire() to read the
Note the use of READ_ONCE() and smp_load_acquire() to read the
opposition index. This prevents the compiler from discarding and
reloading its cached value - which some compilers will do across
smp_read_barrier_depends(). This isn't strictly needed if you can

@ -34,10 +34,10 @@ from load_config import loadConfig
# Add any Sphinx extension module names here, as strings. They can be
# extensions coming with Sphinx (named 'sphinx.ext.*') or your custom
# ones.
extensions = ['kernel-doc', 'rstFlatTable', 'kernel_include', 'cdomain']
extensions = ['kerneldoc', 'rstFlatTable', 'kernel_include', 'cdomain']
# The name of the math extension changed on Sphinx 1.4
if minor > 3:
if major == 1 and minor > 3:
extensions.append("sphinx.ext.imgmath")
else:
extensions.append("sphinx.ext.pngmath")
@ -136,7 +136,7 @@ pygments_style = 'sphinx'
todo_include_todos = False
primary_domain = 'C'
highlight_language = 'guess'
highlight_language = 'none'
# -- Options for HTML output ----------------------------------------------
@ -332,18 +332,32 @@ latex_elements = {
'''
}
# Fix reference escape troubles with Sphinx 1.4.x
if major == 1 and minor > 3:
latex_elements['preamble'] += '\\renewcommand*{\\DUrole}[2]{ #2 }\n'
# Grouping the document tree into LaTeX files. List of tuples
# (source start file, target name, title,
# author, documentclass [howto, manual, or own class]).
latex_documents = [
('doc-guide/index', 'kernel-doc-guide.tex', 'Linux Kernel Documentation Guide',
'The kernel development community', 'manual'),
('admin-guide/index', 'linux-user.tex', 'Linux Kernel User Documentation',
'The kernel development community', 'manual'),
('core-api/index', 'core-api.tex', 'The kernel core API manual',
'The kernel development community', 'manual'),
('driver-api/index', 'driver-api.tex', 'The kernel driver API manual',
'The kernel development community', 'manual'),
('kernel-documentation', 'kernel-documentation.tex', 'The Linux Kernel Documentation',
'The kernel development community', 'manual'),
('development-process/index', 'development-process.tex', 'Linux Kernel Development Documentation',
('process/index', 'development-process.tex', 'Linux Kernel Development Documentation',
'The kernel development community', 'manual'),
('gpu/index', 'gpu.tex', 'Linux GPU Driver Developer\'s Guide',
'The kernel development community', 'manual'),
('media/index', 'media.tex', 'Linux Media Subsystem Documentation',
'The kernel development community', 'manual'),
('security/index', 'security.tex', 'The kernel security subsystem manual',
'The kernel development community', 'manual'),
]
# The name of an image file (relative to this directory) to place at the top of

@ -0,0 +1,551 @@
========================================
Generic Associative Array Implementation
========================================
Overview
========
This associative array implementation is an object container with the following
properties:
1. Objects are opaque pointers. The implementation does not care where they
point (if anywhere) or what they point to (if anything).
.. note:: Pointers to objects _must_ be zero in the least significant bit.
2. Objects do not need to contain linkage blocks for use by the array. This
permits an object to be located in multiple arrays simultaneously.
Rather, the array is made up of metadata blocks that point to objects.
3. Objects require index keys to locate them within the array.
4. Index keys must be unique. Inserting an object with the same key as one
already in the array will replace the old object.
5. Index keys can be of any length and can be of different lengths.
6. Index keys should encode the length early on, before any variation due to
length is seen.
7. Index keys can include a hash to scatter objects throughout the array.
8. The array can be iterated over. The objects will not necessarily come out in
key order.
9. The array can be iterated over whilst it is being modified, provided the
RCU readlock is being held by the iterator. Note, however, under these
circumstances, some objects may be seen more than once. If this is a
problem, the iterator should lock against modification. Objects will not
be missed, however, unless deleted.
10. Objects in the array can be looked up by means of their index key.
11. Objects can be looked up whilst the array is being modified, provided the
RCU read lock is being held by the thread doing the look up.
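Property 1 asks for bit 0 of every object pointer to be zero because the array uses that bit to type-mark pointers internally. The following userspace sketch illustrates the general tagging technique only; the names (``DEMO_PTR_META``, ``demo_tag_meta()`` and so on) are hypothetical and are not the kernel's internal helpers.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical tag values for this sketch: bit 0 clear marks a leaf
 * (object) pointer, bit 0 set marks a pointer to internal metadata. */
#define DEMO_PTR_TYPE_MASK ((uintptr_t)0x1)
#define DEMO_PTR_META      ((uintptr_t)0x1)

/* An object pointer is stored unchanged, so its bit 0 must already be zero. */
static uintptr_t demo_tag_leaf(void *object)
{
	assert(((uintptr_t)object & DEMO_PTR_TYPE_MASK) == 0);
	return (uintptr_t)object;
}

/* A metadata pointer gets bit 0 set so it can be told apart from a leaf. */
static uintptr_t demo_tag_meta(void *meta)
{
	assert(((uintptr_t)meta & DEMO_PTR_TYPE_MASK) == 0);
	return (uintptr_t)meta | DEMO_PTR_META;
}

static bool demo_is_meta(uintptr_t slot)
{
	return (slot & DEMO_PTR_TYPE_MASK) == DEMO_PTR_META;
}

/* Strip the tag bit to recover the original pointer. */
static void *demo_untag(uintptr_t slot)
{
	return (void *)(slot & ~DEMO_PTR_TYPE_MASK);
}
```

An object whose address had bit 0 set would be indistinguishable from a metadata pointer in such a scheme, which is why the alignment requirement exists.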
The implementation uses a tree of 16-pointer nodes internally that are indexed
on each level by nibbles from the index key in the same manner as in a radix
tree. To improve memory efficiency, shortcuts can be emplaced to skip over
what would otherwise be a series of single-occupancy nodes. Further, nodes
pack leaf object pointers into spare space in the node rather than making an
extra branch until such time as an object needs to be added to a full node.
The Public API
==============
The public API can be found in ``<linux/assoc_array.h>``. The associative
array is rooted on the following structure::
struct assoc_array {
...
};
The code is selected by enabling ``CONFIG_ASSOCIATIVE_ARRAY`` with::
./scripts/config -e ASSOCIATIVE_ARRAY
Edit Script
-----------
The insertion and deletion functions produce an 'edit script' that can later be
applied to effect the changes without risking ``ENOMEM``. This retains the
preallocated metadata blocks that will be installed in the internal tree and
keeps track of the metadata blocks that will be removed from the tree when the
script is applied.
This is also used to keep track of dead blocks and dead objects after the
script has been applied so that they can be freed later. The freeing is done
after an RCU grace period has passed - thus allowing access functions to
proceed under the RCU read lock.
The script appears outside of the API as a pointer of the type::
struct assoc_array_edit;
There are two functions for dealing with the script:
1. Apply an edit script::
void assoc_array_apply_edit(struct assoc_array_edit *edit);
This will perform the edit functions, interpolating various write barriers
to permit accesses under the RCU read lock to continue. The edit script
will then be passed to ``call_rcu()`` to free it and any dead stuff it points
to.
2. Cancel an edit script::
void assoc_array_cancel_edit(struct assoc_array_edit *edit);
This frees the edit script and all preallocated memory immediately. If
this was for insertion, the new object is *not* released by this function,
but must rather be released by the caller.
These functions are guaranteed not to fail.
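The split between building an edit script and applying it can be modelled in userspace. This is a minimal sketch of the two-phase pattern, not the kernel implementation: ``struct demo_array``, ``struct demo_edit`` and the ``demo_*()`` functions are hypothetical, the "array" is a single slot, and dead memory is freed immediately where the kernel would defer it through ``call_rcu()``.

```c
#include <stdlib.h>

/* A one-slot "array", just large enough to show the pattern. */
struct demo_array {
	int *slot;
};

struct demo_edit {
	struct demo_array *array;
	int *new_block;		/* preallocated; installed on apply */
	int *dead_block;	/* old contents; released after apply */
};

/* Phase 1: every allocation happens here, so only this step can fail. */
static struct demo_edit *demo_prepare_set(struct demo_array *array, int value)
{
	struct demo_edit *edit = malloc(sizeof(*edit));

	if (!edit)
		return NULL;
	edit->new_block = malloc(sizeof(*edit->new_block));
	if (!edit->new_block) {
		free(edit);
		return NULL;
	}
	*edit->new_block = value;
	edit->array = array;
	edit->dead_block = NULL;
	return edit;
}

/* Phase 2a: install the preallocated block. Nothing here can fail. */
static void demo_apply_edit(struct demo_edit *edit)
{
	edit->dead_block = edit->array->slot;
	edit->array->slot = edit->new_block;
	/* The sketch frees at once; an RCU user would defer this. */
	free(edit->dead_block);
	free(edit);
}

/* Phase 2b: throw the script and its preallocated memory away. */
static void demo_cancel_edit(struct demo_edit *edit)
{
	free(edit->new_block);
	free(edit);
}
```

The point of the design is that the caller can take its locks, decide whether to go ahead, and then commit or back out without any further risk of running out of memory.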
Operations Table
----------------
Various functions take a table of operations::
struct assoc_array_ops {
...
};
This points to a number of methods, all of which need to be provided:
1. Get a chunk of index key from caller data::
unsigned long (*get_key_chunk)(const void *index_key, int level);
This should return a chunk of caller-supplied index key starting at the
*bit* position given by the level argument. The level argument will be a
multiple of ``ASSOC_ARRAY_KEY_CHUNK_SIZE`` and the function should return
``ASSOC_ARRAY_KEY_CHUNK_SIZE`` bits. No error is possible.
2. Get a chunk of an object's index key::
unsigned long (*get_object_key_chunk)(const void *object, int level);
As the previous function, but gets its data from an object in the array
rather than from a caller-supplied index key.
3. See if this is the object we're looking for::
bool (*compare_object)(const void *object, const void *index_key);
Compare the object against an index key and return ``true`` if it matches and
``false`` if it doesn't.
4. Diff the index keys of two objects::
int (*diff_objects)(const void *object, const void *index_key);
Return the bit position at which the index key of the specified object
differs from the given index key or -1 if they are the same.
5. Free an object::
void (*free_object)(void *object);
Free the specified object. Note that this may be called an RCU grace period
after ``assoc_array_apply_edit()`` was called, so ``synchronize_rcu()`` may be
necessary on module unloading.
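As an illustration of the method signatures above, here is a userspace sketch of the first four methods for objects keyed by a NUL-terminated string. Everything named ``demo_*`` or ``DEMO_*`` is hypothetical. The sketch assumes that key bytes are packed into each chunk starting from the least significant byte, that keys are zero-padded past their end, and that bit 0 is the least significant bit of the first byte; none of these choices is mandated by the text.

```c
#include <limits.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Stand-in for ASSOC_ARRAY_KEY_CHUNK_SIZE: the bits in one machine word. */
#define DEMO_KEY_CHUNK_BITS ((int)(sizeof(unsigned long) * CHAR_BIT))

struct demo_object {
	const char *name;	/* the index key is stored in the object */
	int payload;
};

/* Pack up to one word of key bytes, starting at bit position 'level'. */
static unsigned long demo_chunk_of(const char *key, int level)
{
	size_t len = strlen(key);
	size_t first = (size_t)level / CHAR_BIT;
	unsigned long chunk = 0;
	size_t i;

	for (i = 0; i < sizeof(chunk) && first + i < len; i++)
		chunk |= (unsigned long)(unsigned char)key[first + i]
			 << (i * CHAR_BIT);
	return chunk;
}

static unsigned long demo_get_key_chunk(const void *index_key, int level)
{
	return demo_chunk_of(index_key, level);
}

static unsigned long demo_get_object_key_chunk(const void *object, int level)
{
	const struct demo_object *obj = object;

	return demo_chunk_of(obj->name, level);
}

static bool demo_compare_object(const void *object, const void *index_key)
{
	const struct demo_object *obj = object;

	return strcmp(obj->name, index_key) == 0;
}

/* Bit position of the first difference between the two keys, or -1. */
static int demo_diff_objects(const void *object, const void *index_key)
{
	const struct demo_object *obj = object;
	const char *a = obj->name;
	const char *b = index_key;
	size_t len_a = strlen(a), len_b = strlen(b);
	size_t longest = len_a > len_b ? len_a : len_b;
	size_t i;

	for (i = 0; i < longest; i++) {
		unsigned int ca = i < len_a ? (unsigned char)a[i] : 0;
		unsigned int cb = i < len_b ? (unsigned char)b[i] : 0;
		unsigned int diff = ca ^ cb;
		int bit = 0;

		if (!diff)
			continue;
		while (!(diff & 1)) {	/* find the lowest differing bit */
			diff >>= 1;
			bit++;
		}
		return (int)(i * CHAR_BIT) + bit;
	}
	return -1;
}
```

A plain string key like this scatters poorly and does not encode its length early, so a real user would probably want to follow the advice in the index key section rather than copy this layout.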
Manipulation Functions
----------------------
There are a number of functions for manipulating an associative array:
1. Initialise an associative array::
void assoc_array_init(struct assoc_array *array);
This initialises the base structure for an associative array. It can't fail.
2. Insert/replace an object in an associative array::
struct assoc_array_edit *
assoc_array_insert(struct assoc_array *array,
const struct assoc_array_ops *ops,
const void *index_key,
void *object);
This inserts the given object into the array. Note that the least
significant bit of the pointer must be zero as it's used to type-mark
pointers internally.
If an object already exists for that key then it will be replaced with the
new object and the old one will be freed automatically.
The ``index_key`` argument should hold index key information and is
passed to the methods in the ops table when they are called.
This function makes no alteration to the array itself, but rather returns
an edit script that must be applied. ``-ENOMEM`` is returned in the case of
an out-of-memory error.
The caller should lock exclusively against other modifiers of the array.
3. Delete an object from an associative array::
struct assoc_array_edit *
assoc_array_delete(struct assoc_array *array,
const struct assoc_array_ops *ops,
const void *index_key);
This deletes an object that matches the specified data from the array.
The ``index_key`` argument should hold index key information and is
passed to the methods in the ops table when they are called.
This function makes no alteration to the array itself, but rather returns
an edit script that must be applied. ``-ENOMEM`` is returned in the case of
an out-of-memory error. ``NULL`` will be returned if the specified object is
not found within the array.
The caller should lock exclusively against other modifiers of the array.
4. Delete all objects from an associative array::
struct assoc_array_edit *
assoc_array_clear(struct assoc_array *array,
const struct assoc_array_ops *ops);
This deletes all the objects from an associative array and leaves it
completely empty.
This function makes no alteration to the array itself, but rather returns
an edit script that must be applied. ``-ENOMEM`` is returned in the case of
an out-of-memory error.
The caller should lock exclusively against other modifiers of the array.
5. Destroy an associative array, deleting all objects::
void assoc_array_destroy(struct assoc_array *array,
const struct assoc_array_ops *ops);
This destroys the contents of the associative array and leaves it
completely empty. It is not permitted for another thread to be traversing
the array under the RCU read lock at the same time as this function is
destroying it as no RCU deferral is performed on memory release -
something that would require memory to be allocated.
The caller should lock exclusively against other modifiers and accessors
of the array.
6. Garbage collect an associative array::
int assoc_array_gc(struct assoc_array *array,
const struct assoc_array_ops *ops,
bool (*iterator)(void *object, void *iterator_data),
void *iterator_data);
This iterates over the objects in an associative array and passes each one to
``iterator()``. If ``iterator()`` returns ``true``, the object is kept. If it
returns ``false``, the object will be freed. If the ``iterator()`` function
returns ``true``, it must perform any appropriate refcount incrementing on the
object before returning.
The internal tree will be packed down if possible as part of the iteration
to reduce the number of nodes in it.
The ``iterator_data`` is passed directly to ``iterator()`` and is otherwise
ignored by the function.
The function will return ``0`` if successful and ``-ENOMEM`` if there wasn't
enough memory.
It is possible for other threads to iterate over or search the array under
the RCU read lock whilst this function is in progress. The caller should
lock exclusively against other modifiers of the array.
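The callback contract for garbage collection can be sketched as follows. The object layout (``struct demo_entry`` with an expiry time and a plain integer reference count) and the ``struct demo_gc_data`` argument are assumptions made for illustration; only the callback signature comes from the prototype above.

```c
#include <stdbool.h>

/* Hypothetical object type: expires at a given time, counts references. */
struct demo_entry {
	long expires_at;
	int refcount;
};

/* Hypothetical private data passed through iterator_data. */
struct demo_gc_data {
	long now;
};

/* Has the same shape as the iterator argument of assoc_array_gc(). */
static bool demo_gc_keep(void *object, void *iterator_data)
{
	struct demo_entry *entry = object;
	const struct demo_gc_data *gc = iterator_data;

	if (entry->expires_at <= gc->now)
		return false;	/* expired: the array may free it */

	entry->refcount++;	/* kept: take the reference described above */
	return true;
}
```

Kernel code would normally use a proper reference-counting primitive rather than a bare ``int``; the integer just keeps the sketch self-contained.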
Access Functions
----------------
There are two functions for accessing an associative array:
1. Iterate over all the objects in an associative array::
int assoc_array_iterate(const struct assoc_array *array,
int (*iterator)(const void *object,
void *iterator_data),
void *iterator_data);
This passes each object in the array to the iterator callback function.
``iterator_data`` is private data for that function.
This may be used on an array at the same time as the array is being
modified, provided the RCU read lock is held. Under such circumstances,
it is possible for the iteration function to see some objects twice. If
this is a problem, then modification should be locked against. The
iteration algorithm should not, however, miss any objects.
The function will return ``0`` if no objects were in the array or else it will
return the result of the last iterator function called. Iteration stops
immediately if any call to the iteration function results in a non-zero
return.
2. Find an object in an associative array::
void *assoc_array_find(const struct assoc_array *array,
const struct assoc_array_ops *ops,
const void *index_key);
This walks through the array's internal tree directly to the object
specified by the index key.
This may be used on an array at the same time as the array is being
modified, provided the RCU read lock is held.
The function will return the object if found or will return ``NULL`` if the
object was not found.
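The return-value rules for iteration can be modelled with a flat list in userspace. In this sketch ``demo_iterate_objects()`` is a hypothetical stand-in for the real tree walk: it only mimics the documented behaviour of returning ``0`` for an empty array, stopping at the first non-zero result, and otherwise returning the result of the last callback. ``struct demo_search`` and ``demo_find_value()`` are likewise illustrative.

```c
#include <stddef.h>

/* Stand-in for the traversal: feeds n objects to the iterator in turn. */
static int demo_iterate_objects(void *const *objects, size_t n,
				int (*iterator)(const void *object,
						void *iterator_data),
				void *iterator_data)
{
	int ret = 0;
	size_t i;

	for (i = 0; i < n; i++) {
		ret = iterator(objects[i], iterator_data);
		if (ret)
			break;	/* a non-zero return stops the iteration */
	}
	return ret;
}

/* Private iterator data: what to look for and how many objects were seen. */
struct demo_search {
	int wanted;
	int visited;
};

/* Same shape as the iterator argument of assoc_array_iterate(). */
static int demo_find_value(const void *object, void *iterator_data)
{
	const int *value = object;
	struct demo_search *search = iterator_data;

	search->visited++;
	return *value == search->wanted;
}
```

Because an iterator running under the RCU read lock may see an object twice, a callback that counts or accumulates, like this one, would need the array locked against modification to give exact answers.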
Index Key Form
--------------
The index key can be of any form, but since the algorithms aren't told how long
the key is, it is strongly recommended that the index key includes its length
very early on before any variation due to the length would have an effect on
comparisons.
This will cause leaves with different length keys to scatter away from each
other - and those with the same length keys to cluster together.
It is also recommended that the index key begin with a hash of the rest of the
key to maximise scattering throughout keyspace.
The better the scattering, the wider and lower the internal tree will be.
Poor scattering isn't too much of a problem as there are shortcuts and nodes
can contain mixtures of leaves and metadata pointers.
The index key is read in chunks of one machine word. Each chunk is subdivided into
one nibble (4 bits) per level, so on a 32-bit CPU this is good for 8 levels and
on a 64-bit CPU, 16 levels. Unless the scattering is really poor, it is
unlikely that more than one word of any particular index key will have to be
used.
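The arithmetic above can be sketched in a few lines. The macro and function names are hypothetical, and consuming the least significant nibble of the chunk first is an assumption of this sketch rather than something the text states.

```c
#include <limits.h>

#define DEMO_LEVEL_STEP		4	/* one nibble of key per tree level */
#define DEMO_FAN_OUT		16	/* slots per node */
#define DEMO_CHUNK_BITS		((int)(sizeof(unsigned long) * CHAR_BIT))
#define DEMO_LEVELS_PER_CHUNK	(DEMO_CHUNK_BITS / DEMO_LEVEL_STEP)

/*
 * 'level' is a bit position within the whole key and 'chunk' is the
 * machine word of key that contains that position.  The slot number is
 * the nibble found at that position inside the word.
 */
static unsigned int demo_slot_for_level(unsigned long chunk, int level)
{
	int shift = level % DEMO_CHUNK_BITS;

	return (unsigned int)((chunk >> shift) & (DEMO_FAN_OUT - 1));
}
```

With 4 bits per level, ``DEMO_LEVELS_PER_CHUNK`` works out to 8 for a 32-bit word and 16 for a 64-bit word, matching the figures in the text.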
Internal Workings
=================
The associative array data structure has an internal tree. This tree is
constructed of two types of metadata blocks: nodes and shortcuts.
A node is an array of slots. Each slot can contain one of four things:
* A NULL pointer, indicating that the slot is empty.
* A pointer to an object (a leaf).
* A pointer to a node at the next level.
* A pointer to a shortcut.
Basic Internal Tree Layout
--------------------------
Ignoring shortcuts for the moment, the nodes form a multilevel tree. The index
key space is strictly subdivided by the nodes in the tree and nodes occur on
fixed levels. For example::
 Level: 0               1               2               3
        =============== =============== =============== ===============
                                                        NODE D
                        NODE B          NODE C  +------>+---+
                +------>+---+   +------>+---+   |       | 0 |
        NODE A  |       | 0 |   |       | 0 |   |       +---+
        +---+   |       +---+   |       +---+   |       :   :
        | 0 |   |       :   :   |       :   :   |       +---+
        +---+   |       +---+   |       +---+   |       | f |
        | 1 |---+       | 3 |---+       | 7 |---+       +---+
        +---+           +---+           +---+
        :   :           :   :           | 8 |---+
        +---+           +---+           +---+   |       NODE E
        | e |---+       | f |           :   :   +------>+---+
        +---+   |       +---+           +---+           | 0 |
        | f |   |                       | f |           +---+
        +---+   |                       +---+           :   :
                |       NODE F                          +---+
                +------>+---+                           | f |
                        | 0 |           NODE G          +---+
                        +---+   +------>+---+
                        :   :   |       | 0 |
                        +---+   |       +---+
                        | 6 |---+       :   :
                        +---+           +---+
                        :   :           | f |
                        +---+           +---+
                        | f |
                        +---+
In the above example, there are 7 nodes (A-G), each with 16 slots (0-f).
Assuming no other metadata nodes in the tree, the key space is divided
thusly::
    KEY PREFIX      NODE
    ==========      ====
    137*            D
    138*            E
    13[0-69-f]*     C
    1[0-24-f]*      B
    e6*             G
    e[0-57-f]*      F
    [02-df]*        A
So, for instance, keys with the following example index keys will be found in
the appropriate nodes::
    INDEX KEY       PREFIX  NODE
    =============== ======= ====
    13694892892489  13      C
    13795289025897  137     D
    13889dde88793   138     E
    138bbb89003093  138     E
    1394879524789   13      C
    1458952489      1       B
    9431809de993ba  -       A
    b4542910809cd   -       A
    e5284310def98   e       F
    e68428974237    e6      G
    e7fffcbd443     e       F
    f3842239082     -       A
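The mapping from key prefix to node in this example can be written out as a small function. This is purely a transcription of the example layout (nodes A to G) for hex-digit string keys; ``demo_node_for_key()`` is a hypothetical helper and does not correspond to anything in the implementation.

```c
/*
 * Return the letter of the node in the example layout whose portion of
 * key space contains the given hex-digit key.
 */
static char demo_node_for_key(const char *key)
{
	if (key[0] == '1') {
		if (key[1] == '3') {
			if (key[2] == '7')
				return 'D';	/* 137* */
			if (key[2] == '8')
				return 'E';	/* 138* */
			return 'C';		/* 13[0-69-f]* */
		}
		return 'B';			/* 1[0-24-f]* */
	}
	if (key[0] == 'e')
		return key[1] == '6' ? 'G' : 'F';	/* e6* or e[0-57-f]* */
	return 'A';				/* [02-df]* */
}
```

Each test on a successive character corresponds to descending one level in the diagram, one nibble of key at a time.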
To save memory, if a node can hold all the leaves in its portion of keyspace,
then the node will have all those leaves in it and will not have any metadata
pointers - even if some of those leaves would like to be in the same slot.
A node can contain a heterogeneous mix of leaves and metadata pointers.
Metadata pointers must be in the slots that match their subdivisions of key
space. The leaves can be in any slot not occupied by a metadata pointer. It
is guaranteed that none of the leaves in a node will match a slot occupied by a
metadata pointer. If the metadata pointer is there, any leaf whose key matches
the metadata key prefix must be in the subtree that the metadata pointer points
to.
In the above example list of index keys, node A will contain::
    SLOT    CONTENT         INDEX KEY (PREFIX)
    ====    =============== ==================
    1       PTR TO NODE B   1*
    any     LEAF            9431809de993ba
    any     LEAF            b4542910809cd
    e       PTR TO NODE F   e*
    any     LEAF            f3842239082
and node B::
    3       PTR TO NODE C   13*
    any     LEAF            1458952489
Shortcuts
---------
Shortcuts are metadata records that jump over a piece of keyspace. A shortcut
is a replacement for a series of single-occupancy nodes ascending through the
levels. Shortcuts exist to save memory and to speed up traversal.
It is possible for the root of the tree to be a shortcut - say, for example,
the tree contains at least 17 nodes all with key prefix ``1111``. The
insertion algorithm will insert a shortcut to skip over the ``1111`` keyspace
in a single bound and get to the fourth level where these actually become
different.
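How much keyspace a shortcut could skip depends on how many leading nibbles the keys share. A rough way to picture this, assuming keys written as hex-digit strings, is the hypothetical helper below; it is an illustration of the idea, not the insertion algorithm itself.

```c
#include <stddef.h>

/* Count the leading hex digits (nibbles) common to all n keys. */
static size_t demo_common_prefix_nibbles(const char *const *keys, size_t n)
{
	size_t depth = 0;
	size_t i;

	if (n == 0)
		return 0;

	for (;;) {
		char c = keys[0][depth];

		if (c == '\0')
			return depth;	/* ran off the end of the first key */
		for (i = 1; i < n; i++)
			if (keys[i][depth] != c)
				return depth;	/* keys diverge here */
		depth++;
	}
}
```

For keys that all begin with ``1111`` and then diverge, the result is 4, which is the number of levels the shortcut in the example above jumps over.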
Splitting And Collapsing Nodes
------------------------------
Each node has a maximum capacity of 16 leaves and metadata pointers. If the
insertion algorithm finds that it is trying to insert a 17th object into a
node, that node will be split such that at least two leaves that have a common
key segment at that level end up in a separate node rooted on that slot for
that common key segment.
If the leaves in a full node and the leaf that is being inserted are
sufficiently similar, then a shortcut will be inserted into the tree.
When the number of objects in the subtree rooted at a node falls to 16 or
fewer, then the subtree will be collapsed down to a single node - and this will
ripple towards the root if possible.
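The splitting rule relies on a counting argument: 17 objects cannot occupy 16 slots without at least two of them sharing a key segment at that level. The sketch below finds such a segment. ``demo_pick_split_slot()`` is hypothetical, and returning the lowest-numbered qualifying slot is an arbitrary choice of this sketch; the text does not say which segment the real algorithm picks.

```c
#define DEMO_FAN_OUT 16

/*
 * leaf_slots[i] holds the key nibble of leaf i at the current level.
 * Return a slot number shared by at least two leaves, or -1 if every
 * leaf has a different nibble.
 */
static int demo_pick_split_slot(const unsigned char *leaf_slots, int n)
{
	int count[DEMO_FAN_OUT] = { 0 };
	int i;

	for (i = 0; i < n; i++)
		count[leaf_slots[i] & (DEMO_FAN_OUT - 1)]++;

	for (i = 0; i < DEMO_FAN_OUT; i++)
		if (count[i] >= 2)
			return i;
	return -1;
}
```

If all 17 leaves land in the same slot, splitting at this level separates nothing, which corresponds to the "sufficiently similar" case in which a shortcut is inserted instead.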
Non-Recursive Iteration
-----------------------
Each node and shortcut contains a back pointer to its parent and the number of
the slot in that parent that points to it. Non-recursive iteration uses these to
proceed rootwards through the tree, going to the parent node, slot N + 1 to
make sure progress is made without the need for a stack.
The backpointers, however, make simultaneous alteration and iteration tricky.
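A stackless walk driven by back pointers might look like the following userspace sketch. The structures are simplified and hypothetical: slots carry an explicit kind instead of a tagged pointer, shortcuts are left out, and no barriers or RCU are involved. It visits all the leaves of a node before following any of that node's child pointers, and resumes at slot N + 1 of the parent when a subtree is finished.

```c
#include <stddef.h>

#define DEMO_FAN_OUT 16

enum demo_kind { DEMO_EMPTY, DEMO_LEAF, DEMO_NODE };

struct demo_node;

struct demo_slot {
	enum demo_kind kind;
	union {
		const void *leaf;
		struct demo_node *child;
	} u;
};

struct demo_node {
	struct demo_node *back_pointer;	/* parent, NULL at the root */
	int parent_slot;		/* slot in the parent pointing here */
	struct demo_slot slots[DEMO_FAN_OUT];
};

/* Visit every leaf with neither recursion nor a stack; returns the count. */
static int demo_walk_leaves(struct demo_node *root,
			    void (*visit)(const void *leaf, void *data),
			    void *data)
{
	struct demo_node *node = root;
	int slot = 0, visited = 0, i;

	while (node) {
		if (slot == 0) {
			/* Slot 0 means first entry: handle the leaves now. */
			for (i = 0; i < DEMO_FAN_OUT; i++) {
				if (node->slots[i].kind == DEMO_LEAF) {
					visit(node->slots[i].u.leaf, data);
					visited++;
				}
			}
		}
		/* Look for the next child at or after the current slot. */
		while (slot < DEMO_FAN_OUT &&
		       node->slots[slot].kind != DEMO_NODE)
			slot++;
		if (slot < DEMO_FAN_OUT) {
			node = node->slots[slot].u.child;
			slot = 0;
			continue;
		}
		/* Node finished: resume in the parent at slot N + 1. */
		slot = node->parent_slot + 1;
		node = node->back_pointer;
	}
	return visited;
}

/* Example visitor: add each int leaf into the total behind 'data'. */
static void demo_sum_leaf(const void *leaf, void *data)
{
	*(int *)data += *(const int *)leaf;
}
```

Returning to the parent always lands on a slot number of at least 1, so the ``slot == 0`` test cleanly separates first entry from re-entry.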
Simultaneous Alteration And Iteration
-------------------------------------
There are a number of cases to consider:
1. Simple insert/replace. This involves simply replacing a NULL or old
matching leaf pointer with the pointer to the new leaf after a barrier.
The metadata blocks don't change otherwise. An old leaf won't be freed
until after the RCU grace period.
2. Simple delete. This involves just clearing an old matching leaf. The
metadata blocks don't change otherwise. The old leaf won't be freed until
after the RCU grace period.
3. Insertion replacing part of a subtree that we haven't yet entered. This
may involve replacement of part of that subtree - but that won't affect
the iteration as we won't have reached the pointer to it yet and the
ancestry blocks are not replaced (the layout of those does not change).
4. Insertion replacing nodes that we're actively processing. This isn't a
problem as we've passed the anchoring pointer and won't switch onto the
new layout until we follow the back pointers - at which point we've
already examined the leaves in the replaced node (we iterate over all the
leaves in a node before following any of its metadata pointers).
We might, however, re-see some leaves that have been split out into a new
branch that's in a slot further along than we were at.
5. Insertion replacing nodes that we're processing a dependent branch of.
This won't affect us until we follow the back pointers. Similar to (4).
6. Deletion collapsing a branch under us. This doesn't affect us because the
back pointers will get us back to the parent of the new node before we
could see the new node. The entire collapsed subtree is thrown away
unchanged - and will still be rooted on the same slot, so we shouldn't
process it a second time as we'll go back to slot + 1.
.. note::
Under some circumstances, we need to simultaneously change the parent
pointer and the parent slot pointer on a node (say, for example, we
inserted another node before it and moved it up a level). We cannot do
this without locking against a read - so we have to replace that node too.
However, when we're changing a shortcut into a node this isn't a problem
as shortcuts only have one slot and so the parent slot number isn't used
when traversing backwards over one. This means that it's okay to change
the slot number first - provided suitable barriers are used to make sure
the parent slot number is read after the back pointer.
Obsolete blocks and leaves are freed up after an RCU grace period has passed,
so as long as anyone doing walking or iteration holds the RCU read lock, the
old superstructure should not go away on them.


@ -1,36 +1,42 @@
Semantics and Behavior of Atomic and
Bitmask Operations
=======================================================
Semantics and Behavior of Atomic and Bitmask Operations
=======================================================
David S. Miller
:Author: David S. Miller
This document is intended to serve as a guide to Linux port
This document is intended to serve as a guide to Linux port
maintainers on how to implement atomic counter, bitops, and spinlock
interfaces properly.
The atomic_t type should be defined as a signed integer and
Atomic Type And Operations
==========================
The atomic_t type should be defined as a signed integer and
the atomic_long_t type as a signed long integer. Also, they should
be made opaque such that any kind of cast to a normal C integer type
will fail. Something like the following should suffice:
will fail. Something like the following should suffice::
typedef struct { int counter; } atomic_t;
typedef struct { long counter; } atomic_long_t;
Historically, counter has been declared volatile. This is now discouraged.
See Documentation/volatile-considered-harmful.txt for the complete rationale.
See :ref:`Documentation/process/volatile-considered-harmful.rst
<volatile_considered_harmful>` for the complete rationale.
local_t is very similar to atomic_t. If the counter is per CPU and only
updated by one CPU, local_t is probably more appropriate. Please see
Documentation/local_ops.txt for the semantics of local_t.
:ref:`Documentation/core-api/local_ops.rst <local_ops>` for the semantics of
local_t.
The first operations to implement for atomic_t's are the initializers and
plain reads.
plain reads. ::
#define ATOMIC_INIT(i) { (i) }
#define atomic_set(v, i) ((v)->counter = (i))
The first macro is used in definitions, such as:
The first macro is used in definitions, such as::
static atomic_t my_counter = ATOMIC_INIT(1);
static atomic_t my_counter = ATOMIC_INIT(1);
The initializer is atomic in that the return values of the atomic operations
are guaranteed to be correct reflecting the initialized value if the
@ -38,10 +44,10 @@ initializer is used before runtime. If the initializer is used at runtime, a
proper implicit or explicit read memory barrier is needed before reading the
value with atomic_read from another thread.
As with all of the atomic_ interfaces, replace the leading "atomic_"
with "atomic_long_" to operate on atomic_long_t.
As with all of the ``atomic_`` interfaces, replace the leading ``atomic_``
with ``atomic_long_`` to operate on atomic_long_t.
The second interface can be used at runtime, as in:
The second interface can be used at runtime, as in::
struct foo { atomic_t counter; };
...
@ -59,7 +65,7 @@ been set with this operation or set with another operation. A proper implicit
or explicit memory barrier is needed before the value set with the operation
is guaranteed to be readable with atomic_read from another thread.
Next, we have:
Next, we have::
#define atomic_read(v) ((v)->counter)
@ -73,36 +79,37 @@ initialization by any other thread is visible yet, so the user of the
interface must take care of that with a proper implicit or explicit memory
barrier.
*** WARNING: atomic_read() and atomic_set() DO NOT IMPLY BARRIERS! ***
.. warning::
Some architectures may choose to use the volatile keyword, barriers, or inline
assembly to guarantee some degree of immediacy for atomic_read() and
atomic_set(). This is not uniformly guaranteed, and may change in the future,
so all users of atomic_t should treat atomic_read() and atomic_set() as simple
C statements that may be reordered or optimized away entirely by the compiler
or processor, and explicitly invoke the appropriate compiler and/or memory
barrier for each use case. Failure to do so will result in code that may
suddenly break when used with different architectures or compiler
optimizations, or even changes in unrelated code which changes how the
compiler optimizes the section accessing atomic_t variables.
``atomic_read()`` and ``atomic_set()`` DO NOT IMPLY BARRIERS!
*** YOU HAVE BEEN WARNED! ***
Some architectures may choose to use the volatile keyword, barriers, or
inline assembly to guarantee some degree of immediacy for atomic_read()
and atomic_set(). This is not uniformly guaranteed, and may change in
the future, so all users of atomic_t should treat atomic_read() and
atomic_set() as simple C statements that may be reordered or optimized
away entirely by the compiler or processor, and explicitly invoke the
appropriate compiler and/or memory barrier for each use case. Failure
to do so will result in code that may suddenly break when used with
different architectures or compiler optimizations, or even changes in
unrelated code which changes how the compiler optimizes the section
accessing atomic_t variables.
Properly aligned pointers, longs, ints, and chars (and unsigned
equivalents) may be atomically loaded from and stored to in the same
sense as described for atomic_read() and atomic_set(). The ACCESS_ONCE()
macro should be used to prevent the compiler from using optimizations
that might otherwise optimize accesses out of existence on the one hand,
or that might create unsolicited accesses on the other.
sense as described for atomic_read() and atomic_set(). The READ_ONCE()
and WRITE_ONCE() macros should be used to prevent the compiler from using
optimizations that might otherwise optimize accesses out of existence on
the one hand, or that might create unsolicited accesses on the other.
For example consider the following code:
For example consider the following code::
while (a > 0)
do_something();
If the compiler can prove that do_something() does not store to the
variable a, then the compiler is within its rights transforming this to
the following:
the following::
tmp = a;
if (a > 0)
@ -110,14 +117,14 @@ the following:
do_something();
If you don't want the compiler to do this (and you probably don't), then
you should use something like the following:
you should use something like the following::
while (ACCESS_ONCE(a) < 0)
while (READ_ONCE(a) < 0)
do_something();
Alternatively, you could place a barrier() call in the loop.
For another example, consider the following code:
For another example, consider the following code::
tmp_a = a;
do_something_with(tmp_a);
@ -125,7 +132,7 @@ For another example, consider the following code:
If the compiler can prove that do_something_with() does not store to the
variable a, then the compiler is within its rights to manufacture an
additional load as follows:
additional load as follows::
tmp_a = a;
do_something_with(tmp_a);
@ -139,15 +146,15 @@ The compiler would be likely to manufacture this additional load if
do_something_with() was an inline function that made very heavy use
of registers: reloading from variable a could save a flush to the
stack and later reload. To prevent the compiler from attacking your
code in this manner, write the following:
code in this manner, write the following::
tmp_a = ACCESS_ONCE(a);
tmp_a = READ_ONCE(a);
do_something_with(tmp_a);
do_something_else_with(tmp_a);
For a final example, consider the following code, assuming that the
variable a is set at boot time before the second CPU is brought online
and never changed later, so that memory barriers are not needed:
and never changed later, so that memory barriers are not needed::
if (a)
b = 9;
@ -155,7 +162,7 @@ and never changed later, so that memory barriers are not needed:
b = 42;
The compiler is within its rights to manufacture an additional store
by transforming the above code into the following:
by transforming the above code into the following::
b = 42;
if (a)
@ -163,20 +170,22 @@ by transforming the above code into the following:
This could come as a fatal surprise to other code running concurrently
that expected b to never have the value 42 if a was zero. To prevent
the compiler from doing this, write something like:
the compiler from doing this, write something like::
if (a)
ACCESS_ONCE(b) = 9;
WRITE_ONCE(b, 9);
else
ACCESS_ONCE(b) = 42;
WRITE_ONCE(b, 42);
Don't even -think- about doing this without proper use of memory barriers,
locks, or atomic operations if variable a can change at runtime!
*** WARNING: ACCESS_ONCE() DOES NOT IMPLY A BARRIER! ***
.. warning::
``READ_ONCE()`` OR ``WRITE_ONCE()`` DO NOT IMPLY A BARRIER!
Now, we move onto the atomic operation interfaces typically implemented with
the help of assembly code.
the help of assembly code. ::
void atomic_add(int i, atomic_t *v);
void atomic_sub(int i, atomic_t *v);
@ -192,7 +201,7 @@ One very important aspect of these two routines is that they DO NOT
require any explicit memory barriers. They need only perform the
atomic_t counter update in an SMP safe manner.
Next, we have:
Next, we have::
int atomic_inc_return(atomic_t *v);
int atomic_dec_return(atomic_t *v);
@ -214,7 +223,7 @@ If the atomic instructions used in an implementation provide explicit
memory barrier semantics which satisfy the above requirements, that is
fine as well.
Let's move on:
Let's move on::
int atomic_add_return(int i, atomic_t *v);
int atomic_sub_return(int i, atomic_t *v);
@ -224,7 +233,7 @@ explicit counter adjustment is given instead of the implicit "1".
This means that like atomic_{inc,dec}_return(), the memory barrier
semantics are required.
Next:
Next::
int atomic_inc_and_test(atomic_t *v);
int atomic_dec_and_test(atomic_t *v);
@ -234,13 +243,13 @@ given atomic counter. They return a boolean indicating whether the
resulting counter value was zero or not.
Again, these primitives provide explicit memory barrier semantics around
the atomic operation.
the atomic operation::
int atomic_sub_and_test(int i, atomic_t *v);
This is identical to atomic_dec_and_test() except that an explicit
decrement is given instead of the implicit "1". This primitive must
provide explicit memory barrier semantics around the operation.
provide explicit memory barrier semantics around the operation::
int atomic_add_negative(int i, atomic_t *v);
@ -249,7 +258,7 @@ is return which indicates whether the resulting counter value is negative.
This primitive must provide explicit memory barrier semantics around
the operation.
Then:
Then::
int atomic_xchg(atomic_t *v, int new);
@ -257,14 +266,14 @@ This performs an atomic exchange operation on the atomic variable v, setting
the given new value. It returns the old value that the atomic variable v had
just before the operation.
atomic_xchg must provide explicit memory barriers around the operation.
atomic_xchg must provide explicit memory barriers around the operation. ::
int atomic_cmpxchg(atomic_t *v, int old, int new);
This performs an atomic compare exchange operation on the atomic value v,
with the given old and new values. Like all atomic_xxx operations,
atomic_cmpxchg will only satisfy its atomicity semantics as long as all
other accesses of *v are performed through atomic_xxx operations.
other accesses of \*v are performed through atomic_xxx operations.
atomic_cmpxchg must provide explicit memory barriers around the operation,
although if the comparison fails then no memory ordering guarantees are
@ -273,7 +282,7 @@ required.
The semantics for atomic_cmpxchg are the same as those defined for 'cas'
below.
Finally:
Finally::
int atomic_add_unless(atomic_t *v, int a, int u);
@ -289,12 +298,12 @@ atomic_inc_not_zero, equivalent to atomic_add_unless(v, 1, 0)
If a caller requires memory barrier semantics around an atomic_t
operation which does not return a value, a set of interfaces are
defined which accomplish this:
defined which accomplish this::
void smp_mb__before_atomic(void);
void smp_mb__after_atomic(void);
For example, smp_mb__before_atomic() can be used like so:
For example, smp_mb__before_atomic() can be used like so::
obj->dead = 1;
smp_mb__before_atomic();
@ -315,67 +324,69 @@ atomic_t implementation above can have disastrous results. Here is
an example, which follows a pattern occurring frequently in the Linux
kernel. It is the use of atomic counters to implement reference
counting, and it works such that once the counter falls to zero it can
be guaranteed that no other entity can be accessing the object:
be guaranteed that no other entity can be accessing the object::
static void obj_list_add(struct obj *obj, struct list_head *head)
{
obj->active = 1;
list_add(&obj->list, head);
}
static void obj_list_add(struct obj *obj, struct list_head *head)
{
obj->active = 1;
list_add(&obj->list, head);
}
static void obj_list_del(struct obj *obj)
{
list_del(&obj->list);
obj->active = 0;
}
static void obj_list_del(struct obj *obj)
{
list_del(&obj->list);
obj->active = 0;
}
static void obj_destroy(struct obj *obj)
{
BUG_ON(obj->active);
kfree(obj);
}
static void obj_destroy(struct obj *obj)
{
BUG_ON(obj->active);
kfree(obj);
}
struct obj *obj_list_peek(struct list_head *head)
{
if (!list_empty(head)) {
struct obj *obj_list_peek(struct list_head *head)
{
if (!list_empty(head)) {
struct obj *obj;
obj = list_entry(head->next, struct obj, list);
atomic_inc(&obj->refcnt);
return obj;
}
return NULL;
}
void obj_poke(void)
{
struct obj *obj;
obj = list_entry(head->next, struct obj, list);
atomic_inc(&obj->refcnt);
return obj;
spin_lock(&global_list_lock);
obj = obj_list_peek(&global_list);
spin_unlock(&global_list_lock);
if (obj) {
obj->ops->poke(obj);
if (atomic_dec_and_test(&obj->refcnt))
obj_destroy(obj);
}
}
return NULL;
}
void obj_poke(void)
{
struct obj *obj;
void obj_timeout(struct obj *obj)
{
spin_lock(&global_list_lock);
obj_list_del(obj);
spin_unlock(&global_list_lock);
spin_lock(&global_list_lock);
obj = obj_list_peek(&global_list);
spin_unlock(&global_list_lock);
if (obj) {
obj->ops->poke(obj);
if (atomic_dec_and_test(&obj->refcnt))
obj_destroy(obj);
}
}
void obj_timeout(struct obj *obj)
{
spin_lock(&global_list_lock);
obj_list_del(obj);
spin_unlock(&global_list_lock);
.. note::
if (atomic_dec_and_test(&obj->refcnt))
obj_destroy(obj);
}
(This is a simplification of the ARP queue management in the
generic neighbour discover code of the networking. Olaf Kirch
found a bug wrt. memory barriers in kfree_skb() that exposed
the atomic_t memory barrier requirements quite clearly.)
This is a simplification of the ARP queue management in the generic
neighbour discover code of the networking. Olaf Kirch found a bug wrt.
memory barriers in kfree_skb() that exposed the atomic_t memory barrier
requirements quite clearly.
Given the above scheme, it must be the case that the obj->active
update done by the obj list deletion be visible to other processors
@ -383,7 +394,7 @@ before the atomic counter decrement is performed.
Otherwise, the counter could fall to zero, yet obj->active would still
be set, thus triggering the assertion in obj_destroy(). The error
sequence looks like this:
sequence looks like this::
cpu 0 cpu 1
obj_poke() obj_timeout()
@ -420,6 +431,10 @@ same scheme.
Another note is that the atomic_t operations returning values are
extremely slow on an old 386.
Atomic Bitmask
==============
We will now cover the atomic bitmask operations. You will find that
their SMP and memory barrier semantics are similar in shape and scope
to the atomic_t ops above.
@ -427,7 +442,7 @@ to the atomic_t ops above.
Native atomic bit operations are defined to operate on objects aligned
to the size of an "unsigned long" C data type, and are at least of that
size. The endianness of the bits within each "unsigned long" is the
native endianness of the cpu.
native endianness of the cpu. ::
void set_bit(unsigned long nr, volatile unsigned long *addr);
void clear_bit(unsigned long nr, volatile unsigned long *addr);
@ -437,7 +452,7 @@ These routines set, clear, and change, respectively, the bit number
indicated by "nr" on the bit mask pointed to by "addr".
They must execute atomically, yet there are no implicit memory barrier
semantics required of these interfaces.
semantics required of these interfaces. ::
int test_and_set_bit(unsigned long nr, volatile unsigned long *addr);
int test_and_clear_bit(unsigned long nr, volatile unsigned long *addr);
@ -466,7 +481,7 @@ must provide explicit memory barrier semantics around their execution.
All memory operations before the atomic bit operation call must be
made visible globally before the atomic bit operation is made visible.
Likewise, the atomic bit operation must be visible globally before any
subsequent memory operation is made visible. For example:
subsequent memory operation is made visible. For example::
obj->dead = 1;
if (test_and_set_bit(0, &obj->flags))
@ -479,7 +494,7 @@ done by test_and_set_bit() becomes visible. Likewise, the atomic
memory operation done by test_and_set_bit() must become visible before
"obj->killed = 1;" is visible.
Finally there is the basic operation:
Finally there is the basic operation::
int test_bit(unsigned long nr, __const__ volatile unsigned long *addr);
@ -488,13 +503,13 @@ pointed to by "addr".
If explicit memory barriers are required around {set,clear}_bit() (which do
not return a value, and thus do not need to provide memory barrier
semantics), two interfaces are provided:
semantics), two interfaces are provided::
void smp_mb__before_atomic(void);
void smp_mb__after_atomic(void);
They are used as follows, and are akin to their atomic_t operation
brothers:
brothers::
/* All memory operations before this call will
* be globally visible before the clear_bit().
@ -511,7 +526,7 @@ There are two special bitops with lock barrier semantics (acquire/release,
same as spinlocks). These operate in the same way as their non-_lock/unlock
postfixed variants, except that they are to provide acquire/release semantics,
respectively. This means they can be used for bit_spin_trylock and
bit_spin_unlock type operations without specifying any more barriers.
bit_spin_unlock type operations without specifying any more barriers. ::
int test_and_set_bit_lock(unsigned long nr, unsigned long *addr);
void clear_bit_unlock(unsigned long nr, unsigned long *addr);
@ -526,7 +541,7 @@ provided. They are used in contexts where some other higher-level SMP
locking scheme is being used to protect the bitmask, and thus less
expensive non-atomic operations may be used in the implementation.
They have names similar to the above bitmask operation interfaces,
except that two underscores are prefixed to the interface name.
except that two underscores are prefixed to the interface name. ::
void __set_bit(unsigned long nr, volatile unsigned long *addr);
void __clear_bit(unsigned long nr, volatile unsigned long *addr);
@ -542,9 +557,11 @@ The routines xchg() and cmpxchg() must provide the same exact
memory-barrier semantics as the atomic and bit operations returning
values.
Note: If someone wants to use xchg(), cmpxchg() and their variants,
linux/atomic.h should be included rather than asm/cmpxchg.h, unless
the code is in arch/* and can take care of itself.
.. note::
If someone wants to use xchg(), cmpxchg() and their variants,
linux/atomic.h should be included rather than asm/cmpxchg.h, unless the
code is in arch/* and can take care of itself.
Spinlocks and rwlocks have memory barrier expectations as well.
The rule to follow is simple:
@ -558,7 +575,7 @@ The rule to follow is simple:
Which finally brings us to _atomic_dec_and_lock(). There is an
architecture-neutral version implemented in lib/dec_and_lock.c,
but most platforms will wish to optimize this in assembler.
but most platforms will wish to optimize this in assembler. ::
int _atomic_dec_and_lock(atomic_t *atomic, spinlock_t *lock);
@ -573,7 +590,7 @@ sure the spinlock operation is globally visible before any
subsequent memory operation.
We can demonstrate this operation more clearly if we define
an abstract atomic operation:
an abstract atomic operation::
long cas(long *mem, long old, long new);
@ -584,48 +601,48 @@ an abstract atomic operation:
3) Regardless, the current value at "mem" is returned.
As an example usage, here is what an atomic counter update
might look like:
might look like::
void example_atomic_inc(long *counter)
{
long old, new, ret;
void example_atomic_inc(long *counter)
{
long old, new, ret;
while (1) {
old = *counter;
new = old + 1;
while (1) {
old = *counter;
new = old + 1;
ret = cas(counter, old, new);
if (ret == old)
break;
}
}
Let's use cas() in order to build a pseudo-C atomic_dec_and_lock():
int _atomic_dec_and_lock(atomic_t *atomic, spinlock_t *lock)
{
long old, new, ret;
int went_to_zero;
went_to_zero = 0;
while (1) {
old = atomic_read(atomic);
new = old - 1;
if (new == 0) {
went_to_zero = 1;
spin_lock(lock);
}
ret = cas(atomic, old, new);
if (ret == old)
break;
if (went_to_zero) {
spin_unlock(lock);
went_to_zero = 0;
ret = cas(counter, old, new);
if (ret == old)
break;
}
}
return went_to_zero;
}
Let's use cas() in order to build a pseudo-C atomic_dec_and_lock()::
int _atomic_dec_and_lock(atomic_t *atomic, spinlock_t *lock)
{
long old, new, ret;
int went_to_zero;
went_to_zero = 0;
while (1) {
old = atomic_read(atomic);
new = old - 1;
if (new == 0) {
went_to_zero = 1;
spin_lock(lock);
}
ret = cas(atomic, old, new);
if (ret == old)
break;
if (went_to_zero) {
spin_unlock(lock);
went_to_zero = 0;
}
}
return went_to_zero;
}
Now, as far as memory barriers go, as long as spin_lock()
strictly orders all subsequent memory operations (including
@ -635,6 +652,7 @@ Said another way, _atomic_dec_and_lock() must guarantee that
a counter dropping to zero is never made visible before the
spinlock being acquired.
Note that this also means that for the case where the counter
is not dropping to zero, there are no memory ordering
requirements.
.. note::
Note that this also means that for the case where the counter is not
dropping to zero, there are no memory ordering requirements.
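The cas()-based pseudo-C above can be tried out in user space. The following is a minimal sketch, assuming C11 ``<stdatomic.h>`` rather than the kernel's atomic_t and spinlock_t; ``toy_lock``, ``toy_lock_acquire()``, ``toy_lock_release()`` and ``toy_dec_and_lock()`` are hypothetical names invented for this illustration and are not kernel interfaces. It only mirrors the control flow of the pseudo-C and says nothing about how any architecture actually implements _atomic_dec_and_lock().

```c
#include <stdatomic.h>

/* Hypothetical stand-in for a kernel spinlock; not the kernel API. */
struct toy_lock {
	atomic_flag held;
};

static void toy_lock_acquire(struct toy_lock *lock)
{
	/* Spin until the flag was previously clear. */
	while (atomic_flag_test_and_set_explicit(&lock->held,
						 memory_order_acquire))
		;
}

static void toy_lock_release(struct toy_lock *lock)
{
	atomic_flag_clear_explicit(&lock->held, memory_order_release);
}

/* cas() as described in the text: the value found at "mem" is
 * returned whether or not the store of "new_val" took place.
 */
static long cas(atomic_long *mem, long old, long new_val)
{
	long expected = old;

	/* On failure, "expected" is overwritten with the current value. */
	atomic_compare_exchange_strong(mem, &expected, new_val);
	return expected;
}

/* Returns 1 with the lock held if the counter dropped to zero,
 * otherwise returns 0 with the lock not held.
 */
static int toy_dec_and_lock(atomic_long *counter, struct toy_lock *lock)
{
	long old, new_val, ret;
	int went_to_zero = 0;

	for (;;) {
		old = atomic_load(counter);
		new_val = old - 1;
		if (new_val == 0) {
			/* Take the lock before the zero becomes visible. */
			went_to_zero = 1;
			toy_lock_acquire(lock);
		}
		ret = cas(counter, old, new_val);
		if (ret == old)
			break;
		/* Lost a race: undo the speculative lock and retry. */
		if (went_to_zero) {
			toy_lock_release(lock);
			went_to_zero = 0;
		}
	}
	return went_to_zero;
}
```

The point of the structure is that the lock is acquired before the compare-and-swap that writes zero, so no observer can see the zero count without the lock already being held.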


@ -0,0 +1,10 @@
# -*- coding: utf-8; mode: python -*-
project = "Core-API Documentation"
tags.add("subproject")
latex_documents = [
('index', 'core-api.tex', project,
'The kernel development community', 'manual'),
]


@ -0,0 +1,310 @@
============================================
The object-lifetime debugging infrastructure
============================================
:Author: Thomas Gleixner
Introduction
============
debugobjects is a generic infrastructure to track the life time of
kernel objects and validate the operations on those.
debugobjects is useful to check for the following error patterns:
- Activation of uninitialized objects
- Initialization of active objects
- Usage of freed/destroyed objects
debugobjects is not changing the data structure of the real object so it
can be compiled in with a minimal runtime impact and enabled on demand
with a kernel command line option.
Howto use debugobjects
======================
A kernel subsystem needs to provide a data structure which describes the
object type and add calls into the debug code at appropriate places. The
data structure to describe the object type needs at minimum the name of
the object type. Optional functions can and should be provided to fixup
detected problems so the kernel can continue to work and the debug
information can be retrieved from a live system instead of hard core
debugging with serial consoles and stack trace transcripts from the
monitor.
The debug calls provided by debugobjects are:
- debug_object_init
- debug_object_init_on_stack
- debug_object_activate
- debug_object_deactivate
- debug_object_destroy
- debug_object_free
- debug_object_assert_init
Each of these functions takes the address of the real object and a
pointer to the object type specific debug description structure.
Each detected error is reported in the statistics and a limited number
of errors are printk'ed including a full stack trace.
The statistics are available via /sys/kernel/debug/debug_objects/stats.
They provide information about the number of warnings and the number of
successful fixups along with information about the usage of the internal
tracking objects and the state of the internal tracking objects pool.
Debug functions
===============
.. kernel-doc:: lib/debugobjects.c
:functions: debug_object_init
This function is called whenever the initialization function of a real
object is called.
When the real object is already tracked by debugobjects it is checked,
whether the object can be initialized. Initializing is not allowed for
active and destroyed objects. When debugobjects detects an error, then
it calls the fixup_init function of the object type description
structure if provided by the caller. The fixup function can correct the
problem before the real initialization of the object happens. E.g. it
can deactivate an active object in order to prevent damage to the
subsystem.
When the real object is not yet tracked by debugobjects, debugobjects
allocates a tracker object for the real object and sets the tracker
object state to ODEBUG_STATE_INIT. It verifies that the object is not
on the caller's stack. If it is on the caller's stack then a limited
number of warnings including a full stack trace is printk'ed. The
calling code must use debug_object_init_on_stack() and remove the
object before leaving the function which allocated it. See next section.
.. kernel-doc:: lib/debugobjects.c
:functions: debug_object_init_on_stack
This function is called whenever the initialization function of a real
object which resides on the stack is called.
When the real object is already tracked by debugobjects it is checked,
whether the object can be initialized. Initializing is not allowed for
active and destroyed objects. When debugobjects detects an error, then
it calls the fixup_init function of the object type description
structure if provided by the caller. The fixup function can correct the
problem before the real initialization of the object happens. E.g. it
can deactivate an active object in order to prevent damage to the
subsystem.
When the real object is not yet tracked by debugobjects, debugobjects
allocates a tracker object for the real object and sets the tracker
object state to ODEBUG_STATE_INIT. It verifies that the object is on
the caller's stack.
An object which is on the stack must be removed from the tracker by
calling debug_object_free() before the function which allocates the
object returns. Otherwise we keep track of stale objects.
.. kernel-doc:: lib/debugobjects.c
:functions: debug_object_activate
This function is called whenever the activation function of a real
object is called.
When the real object is already tracked by debugobjects it is checked,
whether the object can be activated. Activating is not allowed for
active and destroyed objects. When debugobjects detects an error, then
it calls the fixup_activate function of the object type description
structure if provided by the caller. The fixup function can correct the
problem before the real activation of the object happens. E.g. it can
deactivate an active object in order to prevent damage to the subsystem.
When the real object is not yet tracked by debugobjects then the
fixup_activate function is called if available. This is necessary to
allow the legitimate activation of statically allocated and initialized
objects. The fixup function checks whether the object is valid and calls
the debug_object_init() function to initialize the tracking of this
object.
When the activation is legitimate, then the state of the associated
tracker object is set to ODEBUG_STATE_ACTIVE.
.. kernel-doc:: lib/debugobjects.c
:functions: debug_object_deactivate
This function is called whenever the deactivation function of a real
object is called.
When the real object is tracked by debugobjects it is checked, whether
the object can be deactivated. Deactivating is not allowed for untracked
or destroyed objects.
When the deactivation is legitimate, then the state of the associated
tracker object is set to ODEBUG_STATE_INACTIVE.
.. kernel-doc:: lib/debugobjects.c
:functions: debug_object_destroy
This function is called to mark an object destroyed. This is useful to
prevent the usage of invalid objects, which are still available in
memory: either statically allocated objects or objects which are freed
later.
When the real object is tracked by debugobjects it is checked, whether
the object can be destroyed. Destruction is not allowed for active and
destroyed objects. When debugobjects detects an error, then it calls the
fixup_destroy function of the object type description structure if
provided by the caller. The fixup function can correct the problem
before the real destruction of the object happens. E.g. it can
deactivate an active object in order to prevent damage to the subsystem.
When the destruction is legitimate, then the state of the associated
tracker object is set to ODEBUG_STATE_DESTROYED.
.. kernel-doc:: lib/debugobjects.c
:functions: debug_object_free
This function is called before an object is freed.
When the real object is tracked by debugobjects it is checked, whether
the object can be freed. Free is not allowed for active objects. When
debugobjects detects an error, then it calls the fixup_free function of
the object type description structure if provided by the caller. The
fixup function can correct the problem before the real free of the
object happens. E.g. it can deactivate an active object in order to
prevent damage to the subsystem.
Note that debug_object_free removes the object from the tracker. Later
usage of the object is detected by the other debug checks.
.. kernel-doc:: lib/debugobjects.c
:functions: debug_object_assert_init
This function is called to assert that an object has been initialized.
When the real object is not tracked by debugobjects, it calls
fixup_assert_init of the object type description structure provided by
the caller, with the hardcoded object state ODEBUG_STATE_NOTAVAILABLE. The
fixup function can correct the problem by calling debug_object_init
and other specific initializing functions.
When the real object is already tracked by debugobjects it is ignored.
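The rules in this section can be summarized as a small state machine. The sketch below is a user-space toy, assuming nothing beyond what the text above states; ``toy_state``, ``toy_op`` and ``toy_apply()`` are hypothetical names and not part of lib/debugobjects.c. Where the text does not say what happens (for example destroying an untracked object), the sketch treats the call as a harmless no-op, which is an assumption. An untracked activation, which the real code hands to fixup_activate, is simply reported as not legitimate here.

```c
#include <stdbool.h>

/* Toy tracker states; TOY_STATE_NONE means "not tracked". */
enum toy_state {
	TOY_STATE_NONE,
	TOY_STATE_INIT,
	TOY_STATE_INACTIVE,
	TOY_STATE_ACTIVE,
	TOY_STATE_DESTROYED,
};

enum toy_op {
	TOY_OP_INIT,
	TOY_OP_ACTIVATE,
	TOY_OP_DEACTIVATE,
	TOY_OP_DESTROY,
	TOY_OP_FREE,
};

/* Applies "op" to "*state". Returns true and updates the state when
 * the operation is legitimate; returns false and leaves the state
 * alone when the text above describes it as an error.
 */
static bool toy_apply(enum toy_state *state, enum toy_op op)
{
	enum toy_state cur = *state;

	switch (op) {
	case TOY_OP_INIT:
		/* Not allowed for active and destroyed objects. */
		if (cur == TOY_STATE_ACTIVE || cur == TOY_STATE_DESTROYED)
			return false;
		*state = TOY_STATE_INIT;
		return true;
	case TOY_OP_ACTIVATE:
		/* Untracked objects would go through fixup_activate. */
		if (cur == TOY_STATE_NONE || cur == TOY_STATE_ACTIVE ||
		    cur == TOY_STATE_DESTROYED)
			return false;
		*state = TOY_STATE_ACTIVE;
		return true;
	case TOY_OP_DEACTIVATE:
		/* Not allowed for untracked or destroyed objects. */
		if (cur == TOY_STATE_NONE || cur == TOY_STATE_DESTROYED)
			return false;
		*state = TOY_STATE_INACTIVE;
		return true;
	case TOY_OP_DESTROY:
		/* Not allowed for active and destroyed objects. */
		if (cur == TOY_STATE_ACTIVE || cur == TOY_STATE_DESTROYED)
			return false;
		/* Assumption: destroying an untracked object is a no-op. */
		if (cur != TOY_STATE_NONE)
			*state = TOY_STATE_DESTROYED;
		return true;
	case TOY_OP_FREE:
		/* Not allowed for active objects; removes the tracker. */
		if (cur == TOY_STATE_ACTIVE)
			return false;
		*state = TOY_STATE_NONE;
		return true;
	}
	return false;
}
```

A ``false`` return corresponds to the point where debugobjects would report the error and call the matching fixup function, if one was provided.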
Fixup functions
===============
Debug object type description structure
---------------------------------------
.. kernel-doc:: include/linux/debugobjects.h
:internal:
fixup_init
-----------
This function is called from the debug code whenever a problem in
debug_object_init is detected. The function takes the address of the
object and the state which is currently recorded in the tracker.
Called from debug_object_init when the object state is:
- ODEBUG_STATE_ACTIVE
The function returns true when the fixup was successful, otherwise
false. The return value is used to update the statistics.
Note that the function needs to call the debug_object_init() function
again, after the damage has been repaired in order to keep the state
consistent.
fixup_activate
---------------
This function is called from the debug code whenever a problem in
debug_object_activate is detected.
Called from debug_object_activate when the object state is:
- ODEBUG_STATE_NOTAVAILABLE
- ODEBUG_STATE_ACTIVE
The function returns true when the fixup was successful, otherwise
false. The return value is used to update the statistics.
Note that the function needs to call the debug_object_activate()
function again after the damage has been repaired in order to keep the
state consistent.
The activation of statically initialized objects is a special case. When
debug_object_activate() has no tracked object for this object address
then fixup_activate() is called with object state
ODEBUG_STATE_NOTAVAILABLE. The fixup function needs to check whether
this is a legitimate case of a statically initialized object or not. If
it is, it calls debug_object_init() and debug_object_activate()
to make the object known to the tracker and marked active. In this case
the function should return false because this is not a real fixup.
fixup_destroy
--------------
This function is called from the debug code whenever a problem in
debug_object_destroy is detected.
Called from debug_object_destroy when the object state is:
- ODEBUG_STATE_ACTIVE
The function returns true when the fixup was successful, otherwise
false. The return value is used to update the statistics.
fixup_free
-----------
This function is called from the debug code whenever a problem in
debug_object_free is detected. Further it can be called from the debug
checks in kfree/vfree, when an active object is detected from the
debug_check_no_obj_freed() sanity checks.
Called from debug_object_free() or debug_check_no_obj_freed() when
the object state is:
- ODEBUG_STATE_ACTIVE
The function returns true when the fixup was successful, otherwise
false. The return value is used to update the statistics.
fixup_assert_init
-------------------
This function is called from the debug code whenever a problem in
debug_object_assert_init is detected.
Called from debug_object_assert_init() with a hardcoded state
ODEBUG_STATE_NOTAVAILABLE when the object is not found in the debug
bucket.
The function returns true when the fixup was successful, otherwise
false. The return value is used to update the statistics.
Note that this function should make sure debug_object_init() is called
before returning.
The handling of statically initialized objects is a special case. The
fixup function should check if this is a legitimate case of a statically
initialized object or not. If it is, only debug_object_init()
should be called to make the object known to the tracker. Then the
function should return false because this is not a real fixup.
Known Bugs And Assumptions
==========================
None (knock on wood).


@ -0,0 +1,33 @@
======================
Core API Documentation
======================
This is the beginning of a manual for core kernel APIs. The conversion
(and writing!) of documents for this manual is much appreciated!
Core utilities
==============
.. toctree::
:maxdepth: 1
assoc_array
atomic_ops
local_ops
workqueue
Interfaces for kernel debugging
===============================
.. toctree::
:maxdepth: 1
debug-objects
tracepoint
.. only:: subproject
Indices
=======
* :ref:`genindex`


@ -0,0 +1,206 @@
.. _local_ops:
=================================================
Semantics and Behavior of Local Atomic Operations
=================================================
:Author: Mathieu Desnoyers
This document explains the purpose of the local atomic operations, how
to implement them for any given architecture and shows how they can be used
properly. It also stresses the precautions that must be taken when reading
those local variables across CPUs when the order of memory writes matters.
.. note::
Note that ``local_t`` based operations are not recommended for general
kernel use. Please use the ``this_cpu`` operations instead unless there is
really a special purpose. Most uses of ``local_t`` in the kernel have been
replaced by ``this_cpu`` operations. ``this_cpu`` operations combine the
relocation with the ``local_t`` like semantics in a single instruction and
yield more compact and faster executing code.
Purpose of local atomic operations
==================================
Local atomic operations are meant to provide fast and highly reentrant per CPU
counters. They minimize the performance cost of standard atomic operations by
removing the LOCK prefix and memory barriers normally required to synchronize
across CPUs.
Having fast per CPU atomic counters is interesting in many cases: it does not
require disabling interrupts to protect from interrupt handlers and it permits
coherent counters in NMI handlers. It is especially useful for tracing purposes
and for various performance monitoring counters.
Local atomic operations only guarantee variable modification atomicity wrt the
CPU which owns the data. Therefore, care must be taken to make sure that only one
CPU writes to the ``local_t`` data. This is done by using per cpu data and
making sure that we modify it from within a preemption safe context. It is
however permitted to read ``local_t`` data from any CPU: it will then appear to
be written out of order wrt other memory writes by the owner CPU.
Implementation for a given architecture
=======================================
It can be done by slightly modifying the standard atomic operations: only
their UP variant must be kept. It typically means removing LOCK prefix (on
i386 and x86_64) and any SMP synchronization barrier. If the architecture does
not have a different behavior between SMP and UP, including
``asm-generic/local.h`` in your architecture's ``local.h`` is sufficient.
The ``local_t`` type is defined as an opaque ``signed long`` by embedding an
``atomic_long_t`` inside a structure. This is made so a cast from this type to
a ``long`` fails. The definition looks like::
typedef struct { atomic_long_t a; } local_t;
Rules to follow when using local atomic operations
==================================================
* Variables touched by local ops must be per cpu variables.
* *Only* the CPU owner of these variables must write to them.
* This CPU can use local ops from any context (process, irq, softirq, nmi, ...)
to update its ``local_t`` variables.
* Preemption (or interrupts) must be disabled when using local ops in
process context to make sure the process won't be migrated to a
different CPU between getting the per-cpu variable and doing the
actual local op.
* When using local ops in interrupt context, no special care must be
taken on a mainline kernel, since they will run on the local CPU with
preemption already disabled. I suggest, however, to explicitly
disable preemption anyway to make sure it will still work correctly on
-rt kernels.
* Reading the local cpu variable will provide the current copy of the
variable.
* Reads of these variables can be done from any CPU, because updates to
"``long``", aligned, variables are always atomic. Since no memory
synchronization is done by the writer CPU, an outdated copy of the
variable can be read when reading some *other* cpu's variables.
How to use local atomic operations
==================================
::
#include <linux/percpu.h>
#include <asm/local.h>
static DEFINE_PER_CPU(local_t, counters) = LOCAL_INIT(0);
Counting
========
Counting is done on all the bits of a signed long.
In preemptible context, use ``get_cpu_var()`` and ``put_cpu_var()`` around
local atomic operations: it makes sure that preemption is disabled around write
access to the per cpu variable. For instance::
local_inc(&get_cpu_var(counters));
put_cpu_var(counters);
If you are already in a preemption-safe context, you can use
``this_cpu_ptr()`` instead::
local_inc(this_cpu_ptr(&counters));
Reading the counters
====================
Those local counters can be read from foreign CPUs to sum the count. Note that
the data seen by local_read across CPUs must be considered to be out of order
relative to other memory writes happening on the CPU that owns the data::
long sum = 0;
for_each_online_cpu(cpu)
sum += local_read(&per_cpu(counters, cpu));
If you want to use a remote local_read to synchronize access to a resource
between CPUs, explicit ``smp_wmb()`` and ``smp_rmb()`` memory barriers must be used
respectively on the writer and the reader CPUs. It would be the case if you use
the ``local_t`` variable as a counter of bytes written in a buffer: there should
be a ``smp_wmb()`` between the buffer write and the counter increment and also a
``smp_rmb()`` between the counter read and the buffer read.
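The buffer-plus-byte-counter pattern described above can be sketched in user space. This is only an analogy, assuming C11 ``<stdatomic.h>``: a release fence stands in for ``smp_wmb()`` and an acquire fence for ``smp_rmb()``, although the C11 fences are not identical to the kernel barriers. ``toy_ring``, ``toy_write()`` and ``toy_read()`` are hypothetical names, and the sketch assumes a single writer.

```c
#include <stdatomic.h>
#include <stddef.h>
#include <string.h>

#define TOY_BUF_SIZE 64

struct toy_ring {
	char data[TOY_BUF_SIZE];
	atomic_long written;	/* bytes published so far */
};

/* Writer side: fill the buffer first, then publish the new count.
 * Returns the number of bytes actually stored.
 */
static size_t toy_write(struct toy_ring *ring, const char *src, size_t len)
{
	long pos = atomic_load_explicit(&ring->written, memory_order_relaxed);
	size_t room = TOY_BUF_SIZE - (size_t)pos;

	if (len > room)
		len = room;
	memcpy(ring->data + pos, src, len);
	/* Buffer write must be visible before the counter update. */
	atomic_thread_fence(memory_order_release);
	atomic_store_explicit(&ring->written, pos + (long)len,
			      memory_order_relaxed);
	return len;
}

/* Reader side: read the count first, then the bytes it covers.
 * Returns the number of bytes copied into "dst".
 */
static size_t toy_read(struct toy_ring *ring, char *dst, size_t cap)
{
	long avail = atomic_load_explicit(&ring->written, memory_order_relaxed);
	size_t n = (size_t)avail < cap ? (size_t)avail : cap;

	/* Counter read must be ordered before the buffer read. */
	atomic_thread_fence(memory_order_acquire);
	memcpy(dst, ring->data, n);
	return n;
}
```

Without the two fences, a reader on another CPU could observe the updated count and still read stale bytes from the buffer, which is the out-of-order effect the text warns about.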
Here is a sample module which implements a basic per cpu counter using
``local.h``::
/* test-local.c
*
* Sample module for local.h usage.
*/
#include <asm/local.h>
#include <linux/module.h>
#include <linux/timer.h>
static DEFINE_PER_CPU(local_t, counters) = LOCAL_INIT(0);
static struct timer_list test_timer;
/* IPI called on each CPU. */
static void test_each(void *info)
{
/* Increment the counter from a non preemptible context */
printk("Increment on cpu %d\n", smp_processor_id());
local_inc(this_cpu_ptr(&counters));
/* This is what incrementing the variable would look like within a
* preemptible context (it disables preemption) :
*
* local_inc(&get_cpu_var(counters));
* put_cpu_var(counters);
*/
}
static void do_test_timer(unsigned long data)
{
int cpu;
/* Increment the counters */
on_each_cpu(test_each, NULL, 1);
/* Read all the counters */
printk("Counters read from CPU %d\n", smp_processor_id());
for_each_online_cpu(cpu) {
printk("Read : CPU %d, count %ld\n", cpu,
local_read(&per_cpu(counters, cpu)));
}
del_timer(&test_timer);
test_timer.expires = jiffies + 1000;
add_timer(&test_timer);
}
static int __init test_init(void)
{
/* initialize the timer that will increment the counter */
init_timer(&test_timer);
test_timer.function = do_test_timer;
test_timer.expires = jiffies + 1;
add_timer(&test_timer);
return 0;
}
static void __exit test_exit(void)
{
del_timer_sync(&test_timer);
}
module_init(test_init);
module_exit(test_exit);
MODULE_LICENSE("GPL");
MODULE_AUTHOR("Mathieu Desnoyers");
MODULE_DESCRIPTION("Local Atomic Ops");


@ -0,0 +1,55 @@
===============================
The Linux Kernel Tracepoint API
===============================
:Author: Jason Baron
:Author: William Cohen
Introduction
============
Tracepoints are static probe points that are located in strategic points
throughout the kernel. 'Probes' register/unregister with tracepoints via
a callback mechanism. The 'probes' are strictly typed functions that are
passed a unique set of parameters defined by each tracepoint.
From this simple callback mechanism, 'probes' can be used to profile,
debug, and understand kernel behavior. There are a number of tools that
provide a framework for using 'probes'. These tools include Systemtap,
ftrace, and LTTng.
Tracepoints are defined in a number of header files via various macros.
Thus, the purpose of this document is to provide a clear accounting of
the available tracepoints. The intention is to understand not only what
tracepoints are available but also to understand where future
tracepoints might be added.
The API presented has functions of the form:
``trace_tracepointname(function parameters)``. These are the tracepoint
callbacks that are found throughout the code. Registering and
unregistering probes with these callback sites is covered in the
``Documentation/trace/*`` directory.
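To make the callback mechanism concrete, here is a user-space toy, not the kernel tracepoint implementation. All names (``toy_tracepoint``, ``toy_register_probe()``, ``toy_unregister_probe()``, ``toy_trace_irq()``, ``toy_sum_probe()``) are hypothetical. It only illustrates the shape described above: probes are strictly typed functions, they register and unregister with a tracepoint, and the call site passes each registered probe the tracepoint's parameters.

```c
#include <stddef.h>

#define TOY_MAX_PROBES 4

/* Every probe attached to this toy tracepoint must have this type. */
typedef void (*toy_irq_probe_t)(void *data, int irq);

struct toy_tracepoint {
	toy_irq_probe_t probes[TOY_MAX_PROBES];
	void *data[TOY_MAX_PROBES];
};

/* Attach a probe; returns 0 on success, -1 if no slot is free. */
static int toy_register_probe(struct toy_tracepoint *tp,
			      toy_irq_probe_t probe, void *data)
{
	for (size_t i = 0; i < TOY_MAX_PROBES; i++) {
		if (!tp->probes[i]) {
			tp->probes[i] = probe;
			tp->data[i] = data;
			return 0;
		}
	}
	return -1;
}

/* Detach a probe; returns 0 on success, -1 if it was not attached. */
static int toy_unregister_probe(struct toy_tracepoint *tp,
				toy_irq_probe_t probe, void *data)
{
	for (size_t i = 0; i < TOY_MAX_PROBES; i++) {
		if (tp->probes[i] == probe && tp->data[i] == data) {
			tp->probes[i] = NULL;
			tp->data[i] = NULL;
			return 0;
		}
	}
	return -1;
}

/* Call site, analogous in shape to trace_tracepointname(parameters). */
static void toy_trace_irq(struct toy_tracepoint *tp, int irq)
{
	for (size_t i = 0; i < TOY_MAX_PROBES; i++) {
		if (tp->probes[i])
			tp->probes[i](tp->data[i], irq);
	}
}

/* Example probe: adds up the irq numbers it is shown. */
static void toy_sum_probe(void *data, int irq)
{
	*(long *)data += irq;
}
```

When no probe is registered, the call site does nothing beyond walking an empty table, which is the property that lets such call sites sit permanently in the code.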
IRQ
===
.. kernel-doc:: include/trace/events/irq.h
:internal:
SIGNAL
======
.. kernel-doc:: include/trace/events/signal.h
:internal:
Block IO
========
.. kernel-doc:: include/trace/events/block.h
:internal:
Workqueue
=========
.. kernel-doc:: include/trace/events/workqueue.h
:internal:


@ -1,21 +1,14 @@
====================================
Concurrency Managed Workqueue (cmwq)
====================================
September, 2010 Tejun Heo <tj@kernel.org>
Florian Mickler <florian@mickler.org>
CONTENTS
1. Introduction
2. Why cmwq?
3. The Design
4. Application Programming Interface (API)
5. Example Execution Scenarios
6. Guidelines
7. Debugging
:Date: September, 2010
:Author: Tejun Heo <tj@kernel.org>
:Author: Florian Mickler <florian@mickler.org>
1. Introduction
Introduction
============
There are many cases where an asynchronous process execution context
is needed and the workqueue (wq) API is the most commonly used
@ -32,7 +25,8 @@ there is no work item left on the workqueue the worker becomes idle.
When a new work item gets queued, the worker begins executing again.
2. Why cmwq?
Why cmwq?
=========
In the original wq implementation, a multi threaded (MT) wq had one
worker thread per CPU and a single threaded (ST) wq had one worker
@ -71,7 +65,8 @@ focus on the following goals.
the API users don't need to worry about such details.
3. The Design
The Design
==========
In order to ease the asynchronous execution of functions a new
abstraction, the work item, is introduced.
@ -102,7 +97,7 @@ aspects of the way the work items are executed by setting flags on the
workqueue they are putting the work item on. These flags include
things like CPU locality, concurrency limits, priority and more. To
get a detailed overview refer to the API description of
alloc_workqueue() below.
``alloc_workqueue()`` below.
When a work item is queued to a workqueue, the target worker-pool is
determined according to the queue parameters and workqueue attributes
@ -136,7 +131,7 @@ them.
For unbound workqueues, the number of backing pools is dynamic.
Unbound workqueue can be assigned custom attributes using
apply_workqueue_attrs() and workqueue will automatically create
``apply_workqueue_attrs()`` and workqueue will automatically create
backing worker pools matching the attributes. The responsibility of
regulating concurrency level is on the users. There is also a flag to
mark a bound wq to ignore the concurrency management. Please refer to
@ -151,94 +146,95 @@ pressure. Else it is possible that the worker-pool deadlocks waiting
for execution contexts to free up.
4. Application Programming Interface (API)
Application Programming Interface (API)
=======================================
alloc_workqueue() allocates a wq. The original create_*workqueue()
functions are deprecated and scheduled for removal. alloc_workqueue()
takes three arguments - @name, @flags and @max_active. @name is the
name of the wq and also used as the name of the rescuer thread if
there is one.
``alloc_workqueue()`` allocates a wq. The original
``create_*workqueue()`` functions are deprecated and scheduled for
removal. ``alloc_workqueue()`` takes three arguments - ``@name``,
``@flags`` and ``@max_active``. ``@name`` is the name of the wq and
also used as the name of the rescuer thread if there is one.
A wq no longer manages execution resources but serves as a domain for
forward progress guarantee, flush and work item attributes. @flags
and @max_active control how work items are assigned execution
forward progress guarantee, flush and work item attributes. ``@flags``
and ``@max_active`` control how work items are assigned execution
resources, scheduled and executed.
@flags:
WQ_UNBOUND
``flags``
---------
Work items queued to an unbound wq are served by the special
worker-pools which host workers which are not bound to any
specific CPU. This makes the wq behave as a simple execution
context provider without concurrency management. The unbound
worker-pools try to start execution of work items as soon as
possible. Unbound wq sacrifices locality but is useful for
the following cases.
``WQ_UNBOUND``
Work items queued to an unbound wq are served by the special
worker-pools which host workers which are not bound to any
specific CPU. This makes the wq behave as a simple execution
context provider without concurrency management. The unbound
worker-pools try to start execution of work items as soon as
possible. Unbound wq sacrifices locality but is useful for
the following cases.
* Wide fluctuation in the concurrency level requirement is
expected and using bound wq may end up creating large number
of mostly unused workers across different CPUs as the issuer
hops through different CPUs.
* Wide fluctuation in the concurrency level requirement is
expected and using bound wq may end up creating large number
of mostly unused workers across different CPUs as the issuer
hops through different CPUs.
* Long running CPU intensive workloads which can be better
managed by the system scheduler.
* Long running CPU intensive workloads which can be better
managed by the system scheduler.
WQ_FREEZABLE
``WQ_FREEZABLE``
A freezable wq participates in the freeze phase of the system
suspend operations. Work items on the wq are drained and no
new work item starts execution until thawed.
A freezable wq participates in the freeze phase of the system
suspend operations. Work items on the wq are drained and no
new work item starts execution until thawed.
``WQ_MEM_RECLAIM``
All wq which might be used in the memory reclaim paths **MUST**
have this flag set. The wq is guaranteed to have at least one
execution context regardless of memory pressure.
WQ_MEM_RECLAIM
``WQ_HIGHPRI``
Work items of a highpri wq are queued to the highpri
worker-pool of the target cpu. Highpri worker-pools are
served by worker threads with elevated nice level.
All wq which might be used in the memory reclaim paths _MUST_
have this flag set. The wq is guaranteed to have at least one
execution context regardless of memory pressure.
Note that normal and highpri worker-pools don't interact with
each other. Each maintains its separate pool of workers and
implements concurrency management among its workers.
WQ_HIGHPRI
``WQ_CPU_INTENSIVE``
Work items of a CPU intensive wq do not contribute to the
concurrency level. In other words, runnable CPU intensive
work items will not prevent other work items in the same
worker-pool from starting execution. This is useful for bound
work items which are expected to hog CPU cycles so that their
execution is regulated by the system scheduler.
Work items of a highpri wq are queued to the highpri
worker-pool of the target cpu. Highpri worker-pools are
served by worker threads with elevated nice level.
Although CPU intensive work items don't contribute to the
concurrency level, start of their executions is still
regulated by the concurrency management and runnable
non-CPU-intensive work items can delay execution of CPU
intensive work items.
Note that normal and highpri worker-pools don't interact with
each other. Each maintain its separate pool of workers and
implements concurrency management among its workers.
This flag is meaningless for unbound wq.
WQ_CPU_INTENSIVE
Note that the flag ``WQ_NON_REENTRANT`` no longer exists as all
workqueues are now non-reentrant - any work item is guaranteed to be
executed by at most one worker system-wide at any given time.
Work items of a CPU intensive wq do not contribute to the
concurrency level. In other words, runnable CPU intensive
work items will not prevent other work items in the same
worker-pool from starting execution. This is useful for bound
work items which are expected to hog CPU cycles so that their
execution is regulated by the system scheduler.
Although CPU intensive work items don't contribute to the
concurrency level, start of their executions is still
regulated by the concurrency management and runnable
non-CPU-intensive work items can delay execution of CPU
intensive work items.
``max_active``
--------------
This flag is meaningless for unbound wq.
Note that the flag WQ_NON_REENTRANT no longer exists as all workqueues
are now non-reentrant - any work item is guaranteed to be executed by
at most one worker system-wide at any given time.
@max_active:
@max_active determines the maximum number of execution contexts per
CPU which can be assigned to the work items of a wq. For example,
with @max_active of 16, at most 16 work items of the wq can be
``@max_active`` determines the maximum number of execution contexts
per CPU which can be assigned to the work items of a wq. For example,
with ``@max_active`` of 16, at most 16 work items of the wq can be
executing at the same time per CPU.
Currently, for a bound wq, the maximum limit for @max_active is 512
and the default value used when 0 is specified is 256. For an unbound
wq, the limit is higher of 512 and 4 * num_possible_cpus(). These
values are chosen sufficiently high such that they are not the
limiting factor while providing protection in runaway cases.
Currently, for a bound wq, the maximum limit for ``@max_active`` is
512 and the default value used when 0 is specified is 256. For an
unbound wq, the limit is the higher of 512 and 4 *
``num_possible_cpus()``. These values are chosen sufficiently high
such that they are not the limiting factor while providing protection
in runaway cases.
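The limits described above can be sketched as a small model. This is only an illustration of the rule as stated in the text, not the kernel's implementation: the constant names, the function name and the lower bound of 1 are assumptions made for this sketch.

```python
# Hypothetical names; the values are the ones quoted in the text above.
BOUND_LIMIT = 512        # maximum max_active for a bound wq
DEFAULT_ACTIVE = 256     # value used when 0 is specified
UNBOUND_CPU_FACTOR = 4   # unbound limit considers 4 * num_possible_cpus()


def effective_max_active(requested, unbound=False, possible_cpus=1):
    """Return the max_active a wq would end up with in this simplified model."""
    if requested == 0:
        requested = DEFAULT_ACTIVE
    limit = BOUND_LIMIT
    if unbound:
        # "the higher of 512 and 4 * num_possible_cpus()"
        limit = max(BOUND_LIMIT, UNBOUND_CPU_FACTOR * possible_cpus)
    # Clamp into [1, limit]; the lower bound of 1 is an assumption of this sketch.
    return max(1, min(requested, limit))
```

In this model a request of 1000 on a bound wq is capped at 512, while the same request on an unbound wq with 256 possible CPUs is left alone because the limit there is 1024.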
The number of active work items of a wq is usually regulated by the
users of the wq, more specifically, by how many work items the users
@ -247,13 +243,14 @@ throttling the number of active work items, specifying '0' is
recommended.
Some users depend on the strict execution ordering of ST wq. The
combination of @max_active of 1 and WQ_UNBOUND is used to achieve this
behavior. Work items on such wq are always queued to the unbound
worker-pools and only one work item can be active at any given time thus
achieving the same ordering property as ST wq.
combination of ``@max_active`` of 1 and ``WQ_UNBOUND`` is used to
achieve this behavior. Work items on such wq are always queued to the
unbound worker-pools and only one work item can be active at any given
time thus achieving the same ordering property as ST wq.
5. Example Execution Scenarios
Example Execution Scenarios
===========================
The following example execution scenarios try to illustrate how cmwq
behaves under different configurations.
@ -265,7 +262,7 @@ behave under different configurations.
Ignoring all other tasks, works and processing overhead, and assuming
simple FIFO scheduling, the following is one highly simplified version
of possible sequences of events with the original wq.
of possible sequences of events with the original wq. ::
TIME IN MSECS EVENT
0 w0 starts and burns CPU
@ -279,7 +276,7 @@ of possible sequences of events with the original wq.
40 w2 sleeps
50 w2 wakes up and finishes
And with cmwq with @max_active >= 3,
And with cmwq with ``@max_active`` >= 3, ::
TIME IN MSECS EVENT
0 w0 starts and burns CPU
@ -293,7 +290,7 @@ And with cmwq with @max_active >= 3,
20 w1 wakes up and finishes
25 w2 wakes up and finishes
If @max_active == 2,
If ``@max_active`` == 2, ::
TIME IN MSECS EVENT
0 w0 starts and burns CPU
@ -308,7 +305,7 @@ If @max_active == 2,
35 w2 wakes up and finishes
Now, let's assume w1 and w2 are queued to a different wq q1 which has
WQ_CPU_INTENSIVE set,
``WQ_CPU_INTENSIVE`` set, ::
TIME IN MSECS EVENT
0 w0 starts and burns CPU
@ -322,13 +319,15 @@ WQ_CPU_INTENSIVE set,
25 w2 wakes up and finishes
6. Guidelines
Guidelines
==========
* Do not forget to use WQ_MEM_RECLAIM if a wq may process work items
which are used during memory reclaim. Each wq with WQ_MEM_RECLAIM
set has an execution context reserved for it. If there is
dependency among multiple work items used during memory reclaim,
they should be queued to separate wq each with WQ_MEM_RECLAIM.
* Do not forget to use ``WQ_MEM_RECLAIM`` if a wq may process work
items which are used during memory reclaim. Each wq with
``WQ_MEM_RECLAIM`` set has an execution context reserved for it. If
there is dependency among multiple work items used during memory
reclaim, they should be queued to separate wq each with
``WQ_MEM_RECLAIM``.
* Unless strict ordering is required, there is no need to use ST wq.
@ -337,30 +336,31 @@ WQ_CPU_INTENSIVE set,
well under the default limit.
* A wq serves as a domain for forward progress guarantee
(WQ_MEM_RECLAIM, flush and work item attributes. Work items which
are not involved in memory reclaim and don't need to be flushed as a
part of a group of work items, and don't require any special
attribute, can use one of the system wq. There is no difference in
execution characteristics between using a dedicated wq and a system
wq.
(``WQ_MEM_RECLAIM``), flush and work item attributes. Work items
which are not involved in memory reclaim and don't need to be
flushed as a part of a group of work items, and don't require any
special attribute, can use one of the system wq. There is no
difference in execution characteristics between using a dedicated wq
and a system wq.
* Unless work items are expected to consume a huge amount of CPU
cycles, using a bound wq is usually beneficial due to the increased
level of locality in wq operations and work item execution.
7. Debugging
Debugging
=========
Because the work functions are executed by generic worker threads
there are a few tricks needed to shed some light on misbehaving
workqueue users.
Worker threads show up in the process list as:
Worker threads show up in the process list as: ::
root 5671 0.0 0.0 0 0 ? S 12:07 0:00 [kworker/0:1]
root 5672 0.0 0.0 0 0 ? S 12:07 0:00 [kworker/1:2]
root 5673 0.0 0.0 0 0 ? S 12:12 0:00 [kworker/0:0]
root 5674 0.0 0.0 0 0 ? S 12:13 0:00 [kworker/1:0]
root 5671 0.0 0.0 0 0 ? S 12:07 0:00 [kworker/0:1]
root 5672 0.0 0.0 0 0 ? S 12:07 0:00 [kworker/1:2]
root 5673 0.0 0.0 0 0 ? S 12:12 0:00 [kworker/0:0]
root 5674 0.0 0.0 0 0 ? S 12:13 0:00 [kworker/1:0]
If kworkers are going crazy (using too much cpu), there are two types
of possible problems:
@ -368,7 +368,7 @@ of possible problems:
1. Something being scheduled in rapid succession
2. A single work item that consumes lots of cpu cycles
The first one can be tracked using tracing:
The first one can be tracked using tracing: ::
$ echo workqueue:workqueue_queue_work > /sys/kernel/debug/tracing/set_event
$ cat /sys/kernel/debug/tracing/trace_pipe > out.txt
@ -380,9 +380,15 @@ the output and the offender can be determined with the work item
function.
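One way to find the dominating work item function in the captured trace is to count how often each function is queued. The sketch below assumes that each ``workqueue_queue_work`` line in ``out.txt`` carries a ``function=<name>`` field; the helper name and the sample lines in the usage note are hypothetical.

```python
import re
from collections import Counter

# Assumed field layout: "... workqueue_queue_work: ... function=<name> ..."
FUNCTION_FIELD = re.compile(r"\bfunction=(\S+)")


def count_queued_functions(lines):
    """Count queueing events per work item function in trace output lines."""
    counts = Counter()
    for line in lines:
        if "workqueue_queue_work" not in line:
            continue  # ignore unrelated trace events
        match = FUNCTION_FIELD.search(line)
        if match:
            counts[match.group(1)] += 1
    return counts


if __name__ == "__main__":
    with open("out.txt") as trace:
        for name, hits in count_queued_functions(trace).most_common(5):
            print(hits, name)
```

Run against the captured file, the functions at the top of the list are the first candidates to inspect.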
For the second type of problems it should be possible to just check
the stack trace of the offending worker thread.
the stack trace of the offending worker thread. ::
$ cat /proc/THE_OFFENDING_KWORKER/stack
The work item's function should be trivially visible in the stack
trace.
Kernel Inline Documentations Reference
======================================
.. kernel-doc:: include/linux/workqueue.h


@ -84,9 +84,9 @@ are added or removed anytime. Trimming it accurately for your system needs
upfront can save some boot time memory. See below for how we use heuristics
in x86_64 case to keep this under check.
cpu_online_mask: Bitmap of all CPUs currently online. Its set in __cpu_up()
after a cpu is available for kernel scheduling and ready to receive
interrupts from devices. Its cleared when a cpu is brought down using
cpu_online_mask: Bitmap of all CPUs currently online. It's set in __cpu_up()
after a CPU is available for kernel scheduling and ready to receive
interrupts from devices. It's cleared when a CPU is brought down using
__cpu_disable(), before which all OS services including interrupts are
migrated to another target CPU.
@ -181,7 +181,7 @@ To support physical addition/removal, one would need some BIOS hooks and
the platform should have something like an attention button in PCI hotplug.
CONFIG_ACPI_HOTPLUG_CPU enables ACPI support for physical add/remove of CPUs.
Q: How do i logically offline a CPU?
Q: How do I logically offline a CPU?
A: Do the following.
#echo 0 > /sys/devices/system/cpu/cpuX/online
@ -191,15 +191,15 @@ Once the logical offline is successful, check
#cat /proc/interrupts
You should now not see the CPU that you removed. Also online file will report
the state as 0 when a cpu if offline and 1 when its online.
the state as 0 when a CPU is offline and 1 when it's online.
#To display the current cpu state.
#cat /sys/devices/system/cpu/cpuX/online
Q: Why can't i remove CPU0 on some systems?
Q: Why can't I remove CPU0 on some systems?
A: Some architectures may have some special dependency on a certain CPU.
For e.g in IA64 platforms we have ability to sent platform interrupts to the
For example, on IA64 platforms we have the ability to send platform interrupts to the
OS. a.k.a Corrected Platform Error Interrupts (CPEI). In current ACPI
specifications, we didn't have a way to change the target CPU. Hence if the
current ACPI version doesn't support such re-direction, we disable that CPU
@ -231,7 +231,7 @@ either by CONFIG_BOOTPARAM_HOTPLUG_CPU0 or by kernel parameter cpu0_hotplug.
--Fenghua Yu <fenghua.yu@intel.com>
Q: How do i find out if a particular CPU is not removable?
Q: How do I find out if a particular CPU is not removable?
A: Depending on the implementation, some architectures may show this by the
absence of the "online" file. This is done if it can be determined ahead of
time that this CPU cannot be removed.
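The checks described in these answers can be sketched as a small helper that reads the per-CPU ``online`` file. This is an illustration only: the function name and the returned strings are made up for this sketch, and the ``sysfs_root`` parameter exists so the helper can be pointed at a test directory.

```python
from pathlib import Path


def cpu_online_state(cpu, sysfs_root="/sys/devices/system/cpu"):
    """Report the state of one CPU based on its sysfs "online" file.

    Returns "online" for 1, "offline" for 0, and "no-online-file" when the
    file is absent, which on some architectures indicates that the CPU
    cannot be removed.
    """
    online_file = Path(sysfs_root) / f"cpu{cpu}" / "online"
    try:
        value = online_file.read_text().strip()
    except FileNotFoundError:
        return "no-online-file"
    return "online" if value == "1" else "offline"
```

For example, ``cpu_online_state(0)`` may return ``"no-online-file"`` on systems where CPU0 cannot be offlined.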
@ -250,7 +250,7 @@ A: The following happen, listed in no particular order :-)
- All processes are migrated away from this outgoing CPU to new CPUs.
The new CPU is chosen from each process' current cpuset, which may be
a subset of all online CPUs.
- All interrupts targeted to this CPU is migrated to a new CPU
- All interrupts targeted to this CPU are migrated to a new CPU
- Timers, bottom halves and tasklets are also migrated to a new CPU
- Once all services are migrated, kernel calls an arch specific routine
__cpu_disable() to perform arch specific cleanup.
@ -259,10 +259,10 @@ A: The following happen, listed in no particular order :-)
CPU is being offlined).
"It is expected that each service cleans up when the CPU_DOWN_PREPARE
notifier is called, when CPU_DEAD is called its expected there is nothing
notifier is called, when CPU_DEAD is called it's expected there is nothing
running on behalf of this CPU that was offlined"
Q: If i have some kernel code that needs to be aware of CPU arrival and
Q: If I have some kernel code that needs to be aware of CPU arrival and
departure, how do I arrange for proper notification?
A: This is what you would need in your kernel code to receive notifications.
@ -311,7 +311,7 @@ things will happen if a notifier in path sent a BAD notify code.
Q: I don't see my action being called for all CPUs already up and running?
A: Yes, CPU notifiers are called only when new CPUs are on-lined or offlined.
If you need to perform some action for each cpu already in the system, then
If you need to perform some action for each CPU already in the system, then
do this:
for_each_online_cpu(i) {
@ -363,8 +363,8 @@ A: Yes, CPU notifiers are called only when new CPUs are on-lined or offlined.
callbacks as well as initialize the already online CPUs.
Q: If i would like to develop cpu hotplug support for a new architecture,
what do i need at a minimum?
Q: If I would like to develop CPU hotplug support for a new architecture,
what do I need at a minimum?
A: The following are what is required for CPU hotplug infrastructure to work
correctly.
@ -382,8 +382,8 @@ A: The following are what is required for CPU hotplug infrastructure to work
per_cpu state to be set, to ensure the processor
dead routine is called to be sure positively.
Q: I need to ensure that a particular cpu is not removed when there is some
work specific to this cpu is in progress.
Q: I need to ensure that a particular CPU is not removed when there is some
work specific to this CPU in progress.
A: There are two ways. If your code can be run in interrupt context, use
smp_call_function_single(), otherwise use work_on_cpu(). Note that
work_on_cpu() is slow, and can fail due to out of memory:


@ -0,0 +1,10 @@
# -*- coding: utf-8; mode: python -*-
project = "Development tools for the kernel"
tags.add("subproject")
latex_documents = [
('index', 'dev-tools.tex', project,
'The kernel development community', 'manual'),
]


@ -201,7 +201,9 @@ Appendix A: gather_on_build.sh
------------------------------
Sample script to gather coverage meta files on the build machine
(see 6a)::
(see 6a):
.. code-block:: sh
#!/bin/bash
@ -232,7 +234,9 @@ Appendix B: gather_on_test.sh
-----------------------------
Sample script to gather coverage data files on the test machine
(see 6b)::
(see 6b):
.. code-block:: sh
#!/bin/bash -e


@ -23,3 +23,11 @@ whole; patches welcome!
kmemleak
kmemcheck
gdb-kernel-debugging
.. only:: subproject and html
Indices
=======
* :ref:`genindex`


@ -24,7 +24,9 @@ Profiling data will only become accessible once debugfs has been mounted::
mount -t debugfs none /sys/kernel/debug
The following program demonstrates kcov usage from within a test program::
The following program demonstrates kcov usage from within a test program:
.. code-block:: c
#include <stdio.h>
#include <stddef.h>


@ -1,9 +0,0 @@
Linux Kernel Development Documentation
======================================
Contents:
.. toctree::
:maxdepth: 2
development-process


@ -17,7 +17,7 @@ The target is named "raid" and it accepts the following parameters:
raid0 RAID0 striping (no resilience)
raid1 RAID1 mirroring
raid4 RAID4 with dedicated last parity disk
raid5_n RAID5 with dedicated last parity disk suporting takeover
raid5_n RAID5 with dedicated last parity disk supporting takeover
Same as raid4
-Transitory layout
raid5_la RAID5 left asymmetric
@ -36,7 +36,7 @@ The target is named "raid" and it accepts the following parameters:
- rotating parity N (right-to-left) with data continuation
raid6_n_6 RAID6 with dedicated parity disks
- parity and Q-syndrome on the last 2 disks;
laylout for takeover from/to raid4/raid5_n
layout for takeover from/to raid4/raid5_n
raid6_la_6 Same as "raid5_la" plus dedicated last Q-syndrome disk
- layout for takeover from raid5_la from/to raid6
raid6_ra_6 Same as "raid5_ra" plus dedicated last Q-syndrome disk
@ -137,8 +137,8 @@ The target is named "raid" and it accepts the following parameters:
device removal (negative value) or device addition (positive
value) to any reshape supporting raid levels 4/5/6 and 10.
RAID levels 4/5/6 allow for addition of devices (metadata
and data device tupel), raid10_near and raid10_offset only
allow for device addtion. raid10_far does not support any
and data device tuple), raid10_near and raid10_offset only
allow for device addition. raid10_far does not support any
reshaping at all.
A minimum of devices have to be kept to enforce resilience,
which is 3 devices for raid4/5 and 4 devices for raid6.

View File

@ -1,7 +1,7 @@
* Maxim DS3231 Real Time Clock
Required properties:
see: Documentation/devicetree/bindings/i2c/trivial-devices.txt
see: Documentation/devicetree/bindings/i2c/trivial-admin-guide/devices.rst
Optional property:
- #clock-cells: Should be 1.


@ -3,7 +3,7 @@
Philips PCF8563/Epson RTC8564 Real Time Clock
Required properties:
see: Documentation/devicetree/bindings/i2c/trivial-devices.txt
see: Documentation/devicetree/bindings/i2c/trivial-admin-guide/devices.rst
Optional property:
- #clock-cells: Should be 0.


@ -3,7 +3,7 @@
I. For patch submitters
0) Normal patch submission rules from Documentation/SubmittingPatches
0) Normal patch submission rules from Documentation/process/submitting-patches.rst
applies.
1) The Documentation/ portion of the patch should be a separate patch.


@ -0,0 +1,10 @@
# -*- coding: utf-8; mode: python -*-
project = 'Linux Kernel Documentation Guide'
tags.add("subproject")
latex_documents = [
('index', 'kernel-doc-guide.tex', 'Linux Kernel Documentation Guide',
'The kernel development community', 'manual'),
]


@ -0,0 +1,90 @@
DocBook XML [DEPRECATED]
========================
.. attention::
This section describes the deprecated DocBook XML toolchain. Please do not
create new DocBook XML template files. Please consider converting existing
DocBook XML templates files to Sphinx/reStructuredText.
Converting DocBook to Sphinx
----------------------------
Over time, we expect all of the documents under ``Documentation/DocBook`` to be
converted to Sphinx and reStructuredText. For most DocBook XML documents, a good
enough solution is to use the simple ``Documentation/sphinx/tmplcvt`` script,
which uses ``pandoc`` under the hood. For example::
$ cd Documentation/sphinx
$ ./tmplcvt ../DocBook/in.tmpl ../out.rst
Then edit the resulting rst files to fix any remaining issues, and add the
document in the ``toctree`` in ``Documentation/index.rst``.
Components of the kernel-doc system
-----------------------------------
Many places in the source tree have extractable documentation in the form of
block comments above functions. The components of this system are:
- ``scripts/kernel-doc``
This is a perl script that hunts for the block comments and can mark them up
directly into reStructuredText, DocBook, man, text, and HTML. (No, not
texinfo.)
- ``Documentation/DocBook/*.tmpl``
These are XML template files, which are normal XML files with special
place-holders for where the extracted documentation should go.
- ``scripts/docproc.c``
This is a program for converting XML template files into XML files. When a
file is referenced it is searched for symbols exported (EXPORT_SYMBOL), to be
able to distinguish between internal and external functions.
It invokes kernel-doc, giving it the list of functions that are to be
documented.
Additionally it is used to scan the XML template files to locate all the files
referenced herein. This is used to generate dependency information as used by
make.
- ``Makefile``
The targets 'xmldocs', 'psdocs', 'pdfdocs', and 'htmldocs' are used to build
DocBook XML files, PostScript files, PDF files, and html files in
Documentation/DocBook. The older target 'sgmldocs' is equivalent to 'xmldocs'.
- ``Documentation/DocBook/Makefile``
This is where C files are associated with SGML templates.
How to use kernel-doc comments in DocBook XML template files
------------------------------------------------------------
DocBook XML template files (\*.tmpl) are like normal XML files, except that they
can contain escape sequences where extracted documentation should be inserted.
``!E<filename>`` is replaced by the documentation, in ``<filename>``, for
functions that are exported using ``EXPORT_SYMBOL``: the function list is
collected from files listed in ``Documentation/DocBook/Makefile``.
``!I<filename>`` is replaced by the documentation for functions that are **not**
exported using ``EXPORT_SYMBOL``.
``!D<filename>`` is used to name additional files to search for functions
exported using ``EXPORT_SYMBOL``.
``!F<filename> <function [functions...]>`` is replaced by the documentation, in
``<filename>``, for the functions listed.
``!P<filename> <section title>`` is replaced by the contents of the ``DOC:``
section titled ``<section title>`` from ``<filename>``. Spaces are allowed in
``<section title>``; do not quote the ``<section title>``.
``!C<filename>`` is replaced by nothing, but makes the tools check that all DOC:
sections and documented functions, symbols, etc. are used. This makes sense to
use when you use ``!F`` or ``!P`` only and want to verify that all documentation
is included.
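To make the escape sequences above concrete, here is a minimal sketch of how a template line could be classified. It is not how ``scripts/docproc.c`` works internally; the function name, the returned tuple layout and the assumption that the filename follows the directive letter without angle brackets are choices made for this illustration.

```python
# Directive letters and their meaning, as described in the text above.
DIRECTIVES = {
    "E": "documentation for exported functions",
    "I": "documentation for non-exported functions",
    "D": "additional file to search for exported symbols",
    "F": "documentation for the listed functions",
    "P": "contents of a DOC: section",
    "C": "check that all documentation is used",
}


def parse_directive(line):
    """Split a template line into (kind, filename, arguments) or return None."""
    line = line.strip()
    if len(line) < 2 or line[0] != "!" or line[1] not in DIRECTIVES:
        return None
    kind = line[1]
    parts = line[2:].split(None, 1)
    if not parts:
        return None  # a directive without a filename is not usable
    filename = parts[0]
    rest = parts[1].strip() if len(parts) > 1 else ""
    if kind == "F":
        arguments = rest.split()            # one or more function names
    elif kind == "P":
        arguments = [rest] if rest else []  # section title may contain spaces
    else:
        arguments = []
    return kind, filename, arguments
```

Note how ``!P`` keeps the section title as a single unquoted argument, matching the rule that spaces are allowed in the title.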


@ -0,0 +1,20 @@
.. _doc_guide:
=================================
How to write kernel documentation
=================================
.. toctree::
:maxdepth: 1
sphinx.rst
kernel-doc.rst
parse-headers.rst
docbook.rst
.. only:: subproject and html
Indices
=======
* :ref:`genindex`


@ -1,228 +1,3 @@
==========================
Linux Kernel Documentation
==========================
Introduction
============
The Linux kernel uses `Sphinx`_ to generate pretty documentation from
`reStructuredText`_ files under ``Documentation``. To build the documentation in
HTML or PDF formats, use ``make htmldocs`` or ``make pdfdocs``. The generated
documentation is placed in ``Documentation/output``.
.. _Sphinx: http://www.sphinx-doc.org/
.. _reStructuredText: http://docutils.sourceforge.net/rst.html
The reStructuredText files may contain directives to include structured
documentation comments, or kernel-doc comments, from source files. Usually these
are used to describe the functions and types and design of the code. The
kernel-doc comments have some special structure and formatting, but beyond that
they are also treated as reStructuredText.
There is also the deprecated DocBook toolchain to generate documentation from
DocBook XML template files under ``Documentation/DocBook``. The DocBook files
are to be converted to reStructuredText, and the toolchain is slated to be
removed.
Finally, there are thousands of plain text documentation files scattered around
``Documentation``. Some of these will likely be converted to reStructuredText
over time, but the bulk of them will remain in plain text.
Sphinx Build
============
The usual way to generate the documentation is to run ``make htmldocs`` or
``make pdfdocs``. There are also other formats available, see the documentation
section of ``make help``. The generated documentation is placed in
format-specific subdirectories under ``Documentation/output``.
To generate documentation, Sphinx (``sphinx-build``) must obviously be
installed. For prettier HTML output, the Read the Docs Sphinx theme
(``sphinx_rtd_theme``) is used if available. For PDF output, ``rst2pdf`` is also
needed. All of these are widely available and packaged in distributions.
To pass extra options to Sphinx, you can use the ``SPHINXOPTS`` make
variable. For example, use ``make SPHINXOPTS=-v htmldocs`` to get more verbose
output.
To remove the generated documentation, run ``make cleandocs``.
Writing Documentation
=====================
Adding new documentation can be as simple as:
1. Add a new ``.rst`` file somewhere under ``Documentation``.
2. Refer to it from the Sphinx main `TOC tree`_ in ``Documentation/index.rst``.
.. _TOC tree: http://www.sphinx-doc.org/en/stable/markup/toctree.html
This is usually good enough for simple documentation (like the one you're
reading right now), but for larger documents it may be advisable to create a
subdirectory (or use an existing one). For example, the graphics subsystem
documentation is under ``Documentation/gpu``, split to several ``.rst`` files,
and has a separate ``index.rst`` (with a ``toctree`` of its own) referenced from
the main index.
See the documentation for `Sphinx`_ and `reStructuredText`_ on what you can do
with them. In particular, the Sphinx `reStructuredText Primer`_ is a good place
to get started with reStructuredText. There are also some `Sphinx specific
markup constructs`_.
.. _reStructuredText Primer: http://www.sphinx-doc.org/en/stable/rest.html
.. _Sphinx specific markup constructs: http://www.sphinx-doc.org/en/stable/markup/index.html
Specific guidelines for the kernel documentation
------------------------------------------------
Here are some specific guidelines for the kernel documentation:
* Please don't go overboard with reStructuredText markup. Keep it simple.
* Please stick to this order of heading adornments:
1. ``=`` with overline for document title::
==============
Document title
==============
2. ``=`` for chapters::
Chapters
========
3. ``-`` for sections::
Section
-------
4. ``~`` for subsections::
Subsection
~~~~~~~~~~
Although RST doesn't mandate a specific order ("Rather than imposing a fixed
number and order of section title adornment styles, the order enforced will be
the order as encountered."), having the higher levels the same overall makes
it easier to follow the documents.
the C domain
------------
The `Sphinx C Domain`_ (name c) is suited for documentation of C API. E.g. a
function prototype:
.. code-block:: rst
.. c:function:: int ioctl( int fd, int request )
The C domain of the kernel-doc has some additional features. E.g. you can
*rename* the reference name of a function with a common name like ``open`` or
``ioctl``:
.. code-block:: rst
.. c:function:: int ioctl( int fd, int request )
:name: VIDIOC_LOG_STATUS
The func-name (e.g. ioctl) remains in the output but the ref-name changed from
``ioctl`` to ``VIDIOC_LOG_STATUS``. The index entry for this function is also
changed to ``VIDIOC_LOG_STATUS`` and the function can now referenced by:
.. code-block:: rst
:c:func:`VIDIOC_LOG_STATUS`
list tables
-----------
We recommend the use of *list table* formats. The *list table* formats are
double-stage lists. Compared to the ASCII-art they might not be as
comfortable for
readers of the text files. Their advantage is that they are easy to
create or modify and that the diff of a modification is much more meaningful,
because it is limited to the modified content.
The ``flat-table`` is a double-stage list similar to the ``list-table`` with
some additional features:
* column-span: with the role ``cspan`` a cell can be extended through
additional columns
* row-span: with the role ``rspan`` a cell can be extended through
additional rows
* auto span rightmost cell of a table row over the missing cells on the right
side of that table-row. With Option ``:fill-cells:`` this behavior can
changed from *auto span* to *auto fill*, which automatically inserts (empty)
cells instead of spanning the last cell.
options:
* ``:header-rows:`` [int] count of header rows
* ``:stub-columns:`` [int] count of stub columns
* ``:widths:`` [[int] [int] ... ] widths of columns
* ``:fill-cells:`` instead of auto-spanning missing cells, insert missing cells
roles:
* ``:cspan:`` [int] additional columns (*morecols*)
* ``:rspan:`` [int] additional rows (*morerows*)
The example below shows how to use this markup. The first level of the staged
list is the *table-row*. In the *table-row* there is only one markup allowed,
the list of the cells in this *table-row*. Exceptions are *comments* ( ``..`` )
and *targets* (e.g. a ref to ``:ref:`last row <last row>``` / :ref:`last row
<last row>`).
.. code-block:: rst
.. flat-table:: table title
:widths: 2 1 1 3
* - head col 1
- head col 2
- head col 3
- head col 4
* - column 1
- field 1.1
- field 1.2 with autospan
* - column 2
- field 2.1
- :rspan:`1` :cspan:`1` field 2.2 - 3.3
* .. _`last row`:
- column 3
Rendered as:
.. flat-table:: table title
:widths: 2 1 1 3
* - head col 1
- head col 2
- head col 3
- head col 4
* - column 1
- field 1.1
- field 1.2 with autospan
* - column 2
- field 2.1
- :rspan:`1` :cspan:`1` field 2.2 - 3.3
* .. _`last row`:
- column 3
Including kernel-doc comments
=============================
@ -484,7 +259,10 @@ span multiple lines. The continuation lines may contain indentation.
In-line member documentation comments
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
The structure members may also be documented in-line within the definition::
The structure members may also be documented in-line within the definition.
There are two styles, single-line comments where both the opening ``/**`` and
closing ``*/`` are on the same line, and multi-line comments where they are each
on a line of their own, like all other kernel-doc comments::
/**
* struct foo - Brief description.
@ -502,6 +280,8 @@ The structure members may also be documented in-line within the definition::
* Here, the member description may contain several paragraphs.
*/
int baz;
/** @foobar: Single line description. */
int foobar;
}
Private members
@ -586,94 +366,3 @@ file.
Data structures visible in kernel include files should also be documented using
kernel-doc formatted comments.
DocBook XML [DEPRECATED]
========================
.. attention::
This section describes the deprecated DocBook XML toolchain. Please do not
create new DocBook XML template files. Please consider converting existing
DocBook XML templates files to Sphinx/reStructuredText.
Converting DocBook to Sphinx
----------------------------
Over time, we expect all of the documents under ``Documentation/DocBook`` to be
converted to Sphinx and reStructuredText. For most DocBook XML documents, a good
enough solution is to use the simple ``Documentation/sphinx/tmplcvt`` script,
which uses ``pandoc`` under the hood. For example::
$ cd Documentation/sphinx
$ ./tmplcvt ../DocBook/in.tmpl ../out.rst
Then edit the resulting rst files to fix any remaining issues, and add the
document in the ``toctree`` in ``Documentation/index.rst``.
Components of the kernel-doc system
-----------------------------------
Many places in the source tree have extractable documentation in the form of
block comments above functions. The components of this system are:
- ``scripts/kernel-doc``
This is a perl script that hunts for the block comments and can mark them up
directly into reStructuredText, DocBook, man, text, and HTML. (No, not
texinfo.)
- ``Documentation/DocBook/*.tmpl``
These are XML template files, which are normal XML files with special
place-holders for where the extracted documentation should go.
- ``scripts/docproc.c``
This is a program for converting XML template files into XML files. When a
file is referenced it is searched for symbols exported (EXPORT_SYMBOL), to be
able to distinguish between internal and external functions.
It invokes kernel-doc, giving it the list of functions that are to be
documented.
Additionally it is used to scan the XML template files to locate all the files
referenced herein. This is used to generate dependency information as used by
make.
- ``Makefile``
The targets 'xmldocs', 'psdocs', 'pdfdocs', and 'htmldocs' are used to build
DocBook XML files, PostScript files, PDF files, and html files in
Documentation/DocBook. The older target 'sgmldocs' is equivalent to 'xmldocs'.
- ``Documentation/DocBook/Makefile``
This is where C files are associated with SGML templates.
How to use kernel-doc comments in DocBook XML template files
------------------------------------------------------------
DocBook XML template files (\*.tmpl) are like normal XML files, except that they
can contain escape sequences where extracted documentation should be inserted.
``!E<filename>`` is replaced by the documentation, in ``<filename>``, for
functions that are exported using ``EXPORT_SYMBOL``: the function list is
collected from files listed in ``Documentation/DocBook/Makefile``.
``!I<filename>`` is replaced by the documentation for functions that are **not**
exported using ``EXPORT_SYMBOL``.
``!D<filename>`` is used to name additional files to search for functions
exported using ``EXPORT_SYMBOL``.
``!F<filename> <function [functions...]>`` is replaced by the documentation, in
``<filename>``, for the functions listed.
``!P<filename> <section title>`` is replaced by the contents of the ``DOC:``
section titled ``<section title>`` from ``<filename>``. Spaces are allowed in
``<section title>``; do not quote the ``<section title>``.
``!C<filename>`` is replaced by nothing, but makes the tools check that all DOC:
sections and documented functions, symbols, etc. are used. This makes sense to
use when you use ``!F`` or ``!P`` only and want to verify that all documentation
is included.

@ -0,0 +1,192 @@
===========================
Including uAPI header files
===========================
Sometimes, it is useful to include header files and C example code in
order to describe the userspace API and to generate cross-references
between the code and the documentation. Adding cross-references for
userspace API files has an additional advantage: Sphinx will generate warnings
if a symbol is not found in the documentation. That helps to keep the
uAPI documentation in sync with the Kernel changes.
The :ref:`parse_headers.pl <parse_headers>` script provides a way to generate such
cross-references. It has to be called via Makefile, while building the
documentation. Please see ``Documentation/media/Makefile`` for an example
of how to use it inside the Kernel tree.
.. _parse_headers:
parse_headers.pl
^^^^^^^^^^^^^^^^
NAME
****
parse_headers.pl - parse a C file, in order to identify functions, structs,
enums and defines and create cross-references to a Sphinx book.
SYNOPSIS
********
\ **parse_headers.pl**\ [<options>] <C_FILE> <OUT_FILE> [<EXCEPTIONS_FILE>]
Where <options> can be: --debug, --usage or --help.
OPTIONS
*******
\ **--debug**\
Put the script in verbose mode, useful for debugging.
\ **--usage**\
Prints a brief help message and exits.
\ **--help**\
Prints a more detailed help message and exits.
DESCRIPTION
***********
Convert a C header or source file (C_FILE) into a reStructuredText file,
to be included via a ..parsed-literal block, with cross-references to the
documentation files that describe the API. It accepts an optional
EXCEPTIONS_FILE which describes what elements will be either ignored or
be pointed to a non-default reference.
The output is written to OUT_FILE.
It is capable of identifying defines, functions, structs, typedefs,
enums and enum symbols and of creating cross-references for all of them.
It is also capable of distinguishing #defines used for specifying a Linux
ioctl.
The EXCEPTIONS_FILE contains two types of statements: \ **ignore**\ or \ **replace**\ .
The syntax for the ignore tag is:
ignore \ **type**\ \ **name**\
The \ **ignore**\ means that it won't generate cross references for a
\ **name**\ symbol of type \ **type**\ .
The syntax for the replace tag is:
replace \ **type**\ \ **name**\ \ **new_value**\
The \ **replace**\ means that it will generate cross references for a
\ **name**\ symbol of type \ **type**\ , but, instead of using the default
replacement rule, it will use \ **new_value**\ .
For both statements, \ **type**\ can be either one of the following:
\ **ioctl**\
The ignore or replace statement will apply to ioctl definitions like:
#define VIDIOC_DBG_S_REGISTER _IOW('V', 79, struct v4l2_dbg_register)
\ **define**\
The ignore or replace statement will apply to any other #define found
at C_FILE.
\ **typedef**\
The ignore or replace statement will apply to typedef statements at C_FILE.
\ **struct**\
The ignore or replace statement will apply to the name of struct statements
at C_FILE.
\ **enum**\
The ignore or replace statement will apply to the name of enum statements
at C_FILE.
\ **symbol**\
The ignore or replace statement will apply to the names of enum values
(the symbols inside an enum) at C_FILE.
For replace statements, \ **new_value**\ will automatically use :c:type:
references for \ **typedef**\ , \ **enum**\ and \ **struct**\ types. It will use :ref:
for \ **ioctl**\ , \ **define**\ and \ **symbol**\ types. The type of reference can
also be explicitly defined at the replace statement.
EXAMPLES
********
ignore define _VIDEODEV2_H
Ignore a #define _VIDEODEV2_H at the C_FILE.
ignore symbol PRIVATE
On an enum like:
enum foo { BAR1, BAR2, PRIVATE };
It won't generate cross-references for \ **PRIVATE**\ .
replace symbol BAR1 :c:type:\`foo\`
replace symbol BAR2 :c:type:\`foo\`
On an enum like:
enum foo { BAR1, BAR2, PRIVATE };
It will make the BAR1 and BAR2 enum symbols cross-reference the foo
symbol in the C domain.
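The statements above are simple enough to model in a few lines. The following
Python sketch is not taken from ``parse_headers.pl``: the function name
``parse_exceptions`` and the ``my_t``/``my-type`` entry used further down are
hypothetical. It only illustrates one way the ignore and replace statements,
and the default choice between ``:c:type:`` and ``:ref:`` described above,
could be interpreted:

```python
# Default reference role per statement type, as listed in the text above.
DEFAULT_ROLE = {
    "typedef": ":c:type:",
    "enum": ":c:type:",
    "struct": ":c:type:",
    "ioctl": ":ref:",
    "define": ":ref:",
    "symbol": ":ref:",
}


def parse_exceptions(text):
    """Parse EXCEPTIONS_FILE-style text (illustrative model only).

    Returns (ignored, replaced):
      ignored  -- set of (type, name) pairs that get no cross-reference
      replaced -- dict mapping (type, name) to the reference to emit
    """
    ignored = set()
    replaced = {}
    for raw in text.splitlines():
        fields = raw.split(None, 3)
        if not fields:
            continue  # blank line
        if len(fields) < 3 or fields[1] not in DEFAULT_ROLE:
            raise ValueError("unrecognised statement: %r" % raw)
        keyword, kind, name = fields[:3]
        if keyword == "ignore" and len(fields) == 3:
            ignored.add((kind, name))
        elif keyword == "replace" and len(fields) == 4:
            new_value = fields[3].strip()
            # An explicit role such as :c:type:`foo` is kept as written;
            # otherwise the default role for this type is applied.
            if not new_value.startswith(":"):
                new_value = "%s`%s`" % (DEFAULT_ROLE[kind], new_value)
            replaced[(kind, name)] = new_value
        else:
            raise ValueError("unrecognised statement: %r" % raw)
    return ignored, replaced
```

Fed the statements from the examples above plus a hypothetical
``replace typedef my_t my-type`` line, this model would ignore ``PRIVATE`` and
``_VIDEODEV2_H``, keep the explicit ``:c:type:`` role given for ``BAR1``, and
wrap ``my-type`` in the default ``:c:type:`` role.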
BUGS
****
Report bugs to Mauro Carvalho Chehab <mchehab@s-opensource.com>
COPYRIGHT
*********
Copyright (c) 2016 by Mauro Carvalho Chehab <mchehab@s-opensource.com>.
License GPLv2: GNU GPL version 2 <http://gnu.org/licenses/gpl.html>.
This is free software: you are free to change and redistribute it.
There is NO WARRANTY, to the extent permitted by law.

@ -0,0 +1,219 @@
Introduction
============
The Linux kernel uses `Sphinx`_ to generate pretty documentation from
`reStructuredText`_ files under ``Documentation``. To build the documentation in
HTML or PDF formats, use ``make htmldocs`` or ``make pdfdocs``. The generated
documentation is placed in ``Documentation/output``.
.. _Sphinx: http://www.sphinx-doc.org/
.. _reStructuredText: http://docutils.sourceforge.net/rst.html
The reStructuredText files may contain directives to include structured
documentation comments, or kernel-doc comments, from source files. Usually these
are used to describe the functions and types and design of the code. The
kernel-doc comments have some special structure and formatting, but beyond that
they are also treated as reStructuredText.
There is also the deprecated DocBook toolchain to generate documentation from
DocBook XML template files under ``Documentation/DocBook``. The DocBook files
are to be converted to reStructuredText, and the toolchain is slated to be
removed.
Finally, there are thousands of plain text documentation files scattered around
``Documentation``. Some of these will likely be converted to reStructuredText
over time, but the bulk of them will remain in plain text.
Sphinx Build
============
The usual way to generate the documentation is to run ``make htmldocs`` or
``make pdfdocs``. There are also other formats available, see the documentation
section of ``make help``. The generated documentation is placed in
format-specific subdirectories under ``Documentation/output``.
To generate documentation, Sphinx (``sphinx-build``) must obviously be
installed. For prettier HTML output, the Read the Docs Sphinx theme
(``sphinx_rtd_theme``) is used if available. For PDF output, ``rst2pdf`` is also
needed. All of these are widely available and packaged in distributions.
To pass extra options to Sphinx, you can use the ``SPHINXOPTS`` make
variable. For example, use ``make SPHINXOPTS=-v htmldocs`` to get more verbose
output.
To remove the generated documentation, run ``make cleandocs``.
Writing Documentation
=====================
Adding new documentation can be as simple as:
1. Add a new ``.rst`` file somewhere under ``Documentation``.
2. Refer to it from the Sphinx main `TOC tree`_ in ``Documentation/index.rst``.
.. _TOC tree: http://www.sphinx-doc.org/en/stable/markup/toctree.html
This is usually good enough for simple documentation (like the one you're
reading right now), but for larger documents it may be advisable to create a
subdirectory (or use an existing one). For example, the graphics subsystem
documentation is under ``Documentation/gpu``, split to several ``.rst`` files,
and has a separate ``index.rst`` (with a ``toctree`` of its own) referenced from
the main index.
See the documentation for `Sphinx`_ and `reStructuredText`_ on what you can do
with them. In particular, the Sphinx `reStructuredText Primer`_ is a good place
to get started with reStructuredText. There are also some `Sphinx specific
markup constructs`_.
.. _reStructuredText Primer: http://www.sphinx-doc.org/en/stable/rest.html
.. _Sphinx specific markup constructs: http://www.sphinx-doc.org/en/stable/markup/index.html
Specific guidelines for the kernel documentation
------------------------------------------------
Here are some specific guidelines for the kernel documentation:
* Please don't go overboard with reStructuredText markup. Keep it simple.
* Please stick to this order of heading adornments:
1. ``=`` with overline for document title::
==============
Document title
==============
2. ``=`` for chapters::
Chapters
========
3. ``-`` for sections::
Section
-------
4. ``~`` for subsections::
Subsection
~~~~~~~~~~
Although RST doesn't mandate a specific order ("Rather than imposing a fixed
number and order of section title adornment styles, the order enforced will be
the order as encountered."), having the higher levels the same overall makes
it easier to follow the documents.
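To make the "order as encountered" rule quoted above concrete, here is a small
Python sketch. It is not part of docutils or of the kernel tooling, and the
function names are made up for illustration. It scans a text for adorned
titles and numbers each adornment style by the order in which it first
appears, which is roughly how the nesting level of a heading is decided:

```python
ADORNMENT_CHARS = set("=-~^\"'`#*+:._")


def is_adornment(line):
    """True for a line made of one repeated punctuation character."""
    s = line.rstrip()
    return len(s) >= 2 and s[0] in ADORNMENT_CHARS and s == s[0] * len(s)


def heading_levels(text):
    """Return [(level, title), ...] with levels assigned in encounter order."""
    lines = text.splitlines()
    styles = []  # adornment styles in order of first appearance
    found = []
    i = 0
    while i < len(lines):
        cur = lines[i].rstrip()
        nxt = lines[i + 1].rstrip() if i + 1 < len(lines) else ""
        aft = lines[i + 2].rstrip() if i + 2 < len(lines) else ""
        if is_adornment(cur) and nxt and not is_adornment(nxt) and aft == cur:
            # overline + title + matching underline
            style, title, step = (cur[0], "overline"), nxt.strip(), 3
        elif (cur and not is_adornment(cur) and is_adornment(nxt)
              and len(nxt) >= len(cur)):
            # title + underline
            style, title, step = (nxt[0], "underline"), cur.strip(), 2
        else:
            i += 1
            continue
        if style not in styles:
            styles.append(style)
        found.append((styles.index(style) + 1, title))
        i += step
    return found
```

A document that follows the order recommended above yields levels 1 to 4 for
title, chapter, section and subsection. A document that happens to use ``-``
before ``=`` would get the two levels swapped, which is why sticking to a
single order across the tree makes documents easier to follow.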
the C domain
------------
The `Sphinx C Domain`_ (name c) is suited for documentation of C API. E.g. a
function prototype:
.. code-block:: rst
.. c:function:: int ioctl( int fd, int request )
The C domain of the kernel-doc has some additional features. E.g. you can
*rename* the reference name of a function with a common name like ``open`` or
``ioctl``:
.. code-block:: rst
.. c:function:: int ioctl( int fd, int request )
:name: VIDIOC_LOG_STATUS
The func-name (e.g. ioctl) remains in the output but the ref-name changed from
``ioctl`` to ``VIDIOC_LOG_STATUS``. The index entry for this function is also
changed to ``VIDIOC_LOG_STATUS`` and the function can now be referenced by:
.. code-block:: rst
:c:func:`VIDIOC_LOG_STATUS`
list tables
-----------
We recommend the use of *list table* formats. The *list table* formats are
double-stage lists. Compared to the ASCII-art they might not be as
comfortable for
readers of the text files. Their advantage is that they are easy to
create or modify and that the diff of a modification is much more meaningful,
because it is limited to the modified content.
The ``flat-table`` is a double-stage list similar to the ``list-table`` with
some additional features:
* column-span: with the role ``cspan`` a cell can be extended through
additional columns
* row-span: with the role ``rspan`` a cell can be extended through
additional rows
* auto span: the rightmost cell of a table row spans over the missing cells on
the right side of that table-row. With the option ``:fill-cells:`` this behavior
can be changed from *auto span* to *auto fill*, which automatically inserts
(empty) cells instead of spanning the last cell.
options:
* ``:header-rows:`` [int] count of header rows
* ``:stub-columns:`` [int] count of stub columns
* ``:widths:`` [[int] [int] ... ] widths of columns
* ``:fill-cells:`` instead of auto-spanning missing cells, insert missing cells
roles:
* ``:cspan:`` [int] additional columns (*morecols*)
* ``:rspan:`` [int] additional rows (*morerows*)
The example below shows how to use this markup. The first level of the staged
list is the *table-row*. In the *table-row* there is only one markup allowed,
the list of the cells in this *table-row*. Exceptions are *comments* ( ``..`` )
and *targets* (e.g. a ref to ``:ref:`last row <last row>``` / :ref:`last row
<last row>`).
.. code-block:: rst
.. flat-table:: table title
:widths: 2 1 1 3
* - head col 1
- head col 2
- head col 3
- head col 4
* - column 1
- field 1.1
- field 1.2 with autospan
* - column 2
- field 2.1
- :rspan:`1` :cspan:`1` field 2.2 - 3.3
* .. _`last row`:
- column 3
Rendered as:
.. flat-table:: table title
:widths: 2 1 1 3
* - head col 1
- head col 2
- head col 3
- head col 4
* - column 1
- field 1.1
- field 1.2 with autospan
* - column 2
- field 2.1
- :rspan:`1` :cspan:`1` field 2.2 - 3.3
* .. _`last row`:
- column 3
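The spanning rules can also be read as a small grid-filling procedure. The
Python sketch below is a simplified model written for this explanation, not the
code of the ``flat-table`` extension. The function ``layout`` and the
three-column sample table are hypothetical, and the model does no validation
and does not try to reproduce the extension's behavior in corner cases (for
instance, a short row that sits under a row-span). Each cell is given as
``(text, rspan, cspan)``, where the spans count *additional* rows and columns
as in the roles above:

```python
def layout(rows, ncols, fill_cells=False):
    """Return a grid (list of rows) showing which cell occupies each slot."""
    grid = [[None] * ncols for _ in rows]

    def mark(text, top, left, rspan, cspan):
        for r in range(top, top + rspan + 1):
            for c in range(left, left + cspan + 1):
                grid[r][c] = text

    for r, cells in enumerate(rows):
        col = 0
        for i, (text, rspan, cspan) in enumerate(cells):
            # Skip slots already taken by a row-span from a row above.
            while grid[r][col] is not None:
                col += 1
            if i == len(cells) - 1:
                # Rightmost cell: count the free slots to its right.
                end = col + cspan + 1
                free = 0
                while end + free < ncols and grid[r][end + free] is None:
                    free += 1
                if fill_cells:
                    for c in range(end, end + free):
                        grid[r][c] = ""      # auto fill: empty cells
                else:
                    cspan += free            # auto span: widen last cell
            mark(text, r, col, rspan, cspan)
            col += cspan + 1
    return grid


SAMPLE = [
    [("h1", 0, 0), ("h2", 0, 0), ("h3", 0, 0)],
    [("a", 0, 0), ("b", 0, 0)],   # "b" is the rightmost cell of a short row
    [("c", 1, 0), ("d", 0, 1)],   # "c" spans two rows, "d" two columns
    [("e", 0, 0), ("f", 0, 0)],   # starts in column 2, below "c"
]
```

With the default auto span, ``layout(SAMPLE, 3)`` lets ``b`` cover the last two
columns of its row. With ``fill_cells=True`` the same row ends in an empty
cell instead.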

@ -3,3 +3,8 @@
project = "Linux 802.11 Driver Developer's Guide"
tags.add("subproject")
latex_documents = [
('index', '80211.tex', project,
'The kernel development community', 'manual'),
]

@ -9,7 +9,7 @@ Linux 802.11 Driver Developer's Guide
mac80211
mac80211-advanced
.. only:: subproject
.. only:: subproject and html
Indices
=======

@ -0,0 +1,10 @@
# -*- coding: utf-8; mode: python -*-
project = "The Linux driver implementer's API guide"
tags.add("subproject")
latex_documents = [
('index', 'driver-api.tex', project,
'The kernel development community', 'manual'),
]

@ -0,0 +1,279 @@
============
Device links
============
By default, the driver core only enforces dependencies between devices
that are borne out of a parent/child relationship within the device
hierarchy: When suspending, resuming or shutting down the system, devices
are ordered based on this relationship, i.e. children are always suspended
before their parent, and the parent is always resumed before its children.
Sometimes there is a need to represent device dependencies beyond the
mere parent/child relationship, e.g. between siblings, and have the
driver core automatically take care of them.
Secondly, the driver core by default does not enforce any driver presence
dependencies, i.e. that one device must be bound to a driver before
another one can probe or function correctly.
Often these two dependency types come together, so a device depends on
another one both with regards to driver presence *and* with regards to
suspend/resume and shutdown ordering.
Device links allow representation of such dependencies in the driver core.
In its standard form, a device link combines *both* dependency types:
It guarantees correct suspend/resume and shutdown ordering between a
"supplier" device and its "consumer" devices, and it guarantees driver
presence on the supplier. The consumer devices are not probed before the
supplier is bound to a driver, and they're unbound before the supplier
is unbound.
When driver presence on the supplier is irrelevant and only correct
suspend/resume and shutdown ordering is needed, the device link may
simply be set up with the ``DL_FLAG_STATELESS`` flag. In other words,
enforcing driver presence on the supplier is optional.
Another optional feature is runtime PM integration: By setting the
``DL_FLAG_PM_RUNTIME`` flag on addition of the device link, the PM core
is instructed to runtime resume the supplier and keep it active
whenever and for as long as the consumer is runtime resumed.
Usage
=====
The earliest point in time when device links can be added is after
:c:func:`device_add()` has been called for the supplier and
:c:func:`device_initialize()` has been called for the consumer.
It is legal to add them later, but care must be taken that the system
remains in a consistent state: E.g. a device link cannot be added in
the midst of a suspend/resume transition, so either commencement of
such a transition needs to be prevented with :c:func:`lock_system_sleep()`,
or the device link needs to be added from a function which is guaranteed
not to run in parallel to a suspend/resume transition, such as from a
device ``->probe`` callback or a boot-time PCI quirk.
Another example of an inconsistent state would be a device link that
represents a driver presence dependency, yet is added from the consumer's
``->probe`` callback while the supplier hasn't probed yet: Had the driver
core known about the device link earlier, it wouldn't have probed the
consumer in the first place. The onus is thus on the consumer to check
presence of the supplier after adding the link, and defer probing on
non-presence.
If a device link is added in the ``->probe`` callback of the supplier or
consumer driver, it is typically deleted in its ``->remove`` callback for
symmetry. That way, if the driver is compiled as a module, the device
link is added on module load and orderly deleted on unload. The same
restrictions that apply to device link addition (e.g. exclusion of a
parallel suspend/resume transition) apply equally to deletion.
Several flags may be specified on device link addition, two of which
have already been mentioned above: ``DL_FLAG_STATELESS`` to express that no
driver presence dependency is needed (but only correct suspend/resume and
shutdown ordering) and ``DL_FLAG_PM_RUNTIME`` to express that runtime PM
integration is desired.
Two other flags are specifically targeted at use cases where the device
link is added from the consumer's ``->probe`` callback: ``DL_FLAG_RPM_ACTIVE``
can be specified to runtime resume the supplier upon addition of the
device link. ``DL_FLAG_AUTOREMOVE`` causes the device link to be automatically
purged when the consumer fails to probe or later unbinds. This obviates
the need to explicitly delete the link in the ``->remove`` callback or in
the error path of the ``->probe`` callback.
Limitations
===========
Driver authors should be aware that a driver presence dependency (i.e. when
``DL_FLAG_STATELESS`` is not specified on link addition) may cause probing of
the consumer to be deferred indefinitely. This can become a problem if the
consumer is required to probe before a certain initcall level is reached.
Worse, if the supplier driver is blacklisted or missing, the consumer will
never be probed.
Sometimes drivers depend on optional resources. They are able to operate
in a degraded mode (reduced feature set or performance) when those resources
are not present. An example is an SPI controller that can use a DMA engine
or work in PIO mode. The controller can determine presence of the optional
resources at probe time but on non-presence there is no way to know whether
they will become available in the near future (due to a supplier driver
probing) or never. Consequently it cannot be determined whether to defer
probing or not. It would be possible to notify drivers when optional
resources become available after probing, but it would come at a high cost
for drivers as switching between modes of operation at runtime based on the
availability of such resources would be much more complex than a mechanism
based on probe deferral. In any case optional resources are beyond the
scope of device links.
Examples
========
* An MMU device exists alongside a busmaster device, both are in the same
power domain. The MMU implements DMA address translation for the busmaster
device and shall be runtime resumed and kept active whenever and as long
as the busmaster device is active. The busmaster device's driver shall
not bind before the MMU is bound. To achieve this, a device link with
runtime PM integration is added from the busmaster device (consumer)
to the MMU device (supplier). The effect with regards to runtime PM
is the same as if the MMU was the parent of the master device.
The fact that both devices share the same power domain would normally
suggest usage of a :c:type:`struct dev_pm_domain` or :c:type:`struct
generic_pm_domain`, however these are not independent devices that
happen to share a power switch, but rather the MMU device serves the
busmaster device and is useless without it. A device link creates a
synthetic hierarchical relationship between the devices and is thus
more apt.
* A Thunderbolt host controller comprises a number of PCIe hotplug ports
and an NHI device to manage the PCIe switch. On resume from system sleep,
the NHI device needs to re-establish PCI tunnels to attached devices
before the hotplug ports can resume. If the hotplug ports were children
of the NHI, this resume order would automatically be enforced by the
PM core, but unfortunately they're aunts. The solution is to add
device links from the hotplug ports (consumers) to the NHI device
(supplier). A driver presence dependency is not necessary for this
use case.
* Discrete GPUs in hybrid graphics laptops often feature an HDA controller
for HDMI/DP audio. In the device hierarchy the HDA controller is a sibling
of the VGA device, yet both share the same power domain and the HDA
controller is only ever needed when an HDMI/DP display is attached to the
VGA device. A device link from the HDA controller (consumer) to the
VGA device (supplier) aptly represents this relationship.
* ACPI allows definition of a device start order by way of _DEP objects.
A classical example is when ACPI power management methods on one device
are implemented in terms of I\ :sup:`2`\ C accesses and require a specific
I\ :sup:`2`\ C controller to be present and functional for the power
management of the device in question to work.
* In some SoCs a functional dependency exists from display, video codec and
video processing IP cores on transparent memory access IP cores that handle
burst access and compression/decompression.
Alternatives
============
* A :c:type:`struct dev_pm_domain` can be used to override the bus,
class or device type callbacks. It is intended for devices sharing
a single on/off switch; however, it does not guarantee a specific
suspend/resume ordering, which needs to be implemented separately.
It also does not by itself track the runtime PM status of the involved
devices and turn off the power switch only when all of them are runtime
suspended. Furthermore it cannot be used to enforce a specific shutdown
ordering or a driver presence dependency.
* A :c:type:`struct generic_pm_domain` is a lot more heavyweight than a
device link and does not allow for shutdown ordering or driver presence
dependencies. It also cannot be used on ACPI systems.
Implementation
==============
The device hierarchy, which -- as the name implies -- is a tree,
becomes a directed acyclic graph once device links are added.
Ordering of these devices during suspend/resume is determined by the
dpm_list. During shutdown it is determined by the devices_kset. With
no device links present, the two lists are flattened, one-dimensional
representations of the device tree such that a device is placed behind
all its ancestors. That is achieved by traversing the ACPI namespace
or OpenFirmware device tree top-down and appending devices to the lists
as they are discovered.
Once device links are added, the lists need to satisfy the additional
constraint that a device is placed behind all its suppliers, recursively.
To ensure this, upon addition of the device link the consumer and the
entire sub-graph below it (all children and consumers of the consumer)
are moved to the end of the list. (Call to :c:func:`device_reorder_to_tail()`
from :c:func:`device_link_add()`.)
To prevent introduction of dependency loops into the graph, it is
verified upon device link addition that the supplier is not dependent
on the consumer or any children or consumers of the consumer.
(Call to :c:func:`device_is_dependent()` from :c:func:`device_link_add()`.)
If that constraint is violated, :c:func:`device_link_add()` will return
``NULL`` and a ``WARNING`` will be logged.
Notably this also prevents the addition of a device link from a parent
device to a child. However the converse is allowed, i.e. a device link
from a child to a parent. Since the driver core already guarantees
correct suspend/resume and shutdown ordering between parent and child,
such a device link only makes sense if a driver presence dependency is
needed on top of that. In this case driver authors should weigh
carefully if a device link is at all the right tool for the purpose.
A more suitable approach might be to simply use deferred probing or
add a device flag causing the parent driver to be probed before the
child one.
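The list handling described in this section can be modelled outside the
kernel. The Python sketch below is only an illustration under simplifying
assumptions: a single list stands in for both dpm_list and devices_kset, and
the class ``Device`` and the helpers ``is_dependent``, ``reorder_to_tail`` and
``link_add`` are hypothetical stand-ins that merely mirror the roles of
:c:func:`device_is_dependent()`, :c:func:`device_reorder_to_tail()` and
:c:func:`device_link_add()`:

```python
class Device:
    def __init__(self, name, parent=None):
        self.name = name
        self.children = []
        self.consumers = []
        if parent is not None:
            parent.children.append(self)


def is_dependent(dev, target):
    """True if target is dev or sits in the sub-graph below dev."""
    if dev is target:
        return True
    return any(is_dependent(d, target) for d in dev.children + dev.consumers)


def reorder_to_tail(order, dev):
    """Move dev, then everything below it, to the end of the list."""
    order.remove(dev)
    order.append(dev)
    for d in dev.children + dev.consumers:
        reorder_to_tail(order, d)


def link_add(order, consumer, supplier):
    """Add a link unless it would create a dependency loop."""
    if is_dependent(consumer, supplier):
        return False  # the supplier already depends on the consumer
    supplier.consumers.append(consumer)
    reorder_to_tail(order, consumer)
    return True


# A tree loosely shaped like the Thunderbolt example: the hotplug port is
# an "aunt" of the NHI, so the tree alone does not order the two.
root = Device("root")
port = Device("port", parent=root)
bridge = Device("bridge", parent=root)
nhi = Device("nhi", parent=bridge)
order = [root, port, bridge, nhi]  # top-down discovery order
```

In this model, ``link_add(order, port, nhi)`` moves ``port`` behind ``nhi``,
so the supplier precedes its consumer in the same way a parent precedes its
children. A link with ``root`` as consumer and ``port`` as supplier is
refused, matching the statement above that a link from a parent to a child
cannot be added.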
State machine
=============
.. kernel-doc:: include/linux/device.h
:functions: device_link_state
::
.=============================.
| |
v |
DORMANT <=> AVAILABLE <=> CONSUMER_PROBE => ACTIVE
^ |
| |
'============ SUPPLIER_UNBIND <============'
* The initial state of a device link is automatically determined by
:c:func:`device_link_add()` based on the driver presence on the supplier
and consumer. If the link is created before any devices are probed, it
is set to ``DL_STATE_DORMANT``.
* When a supplier device is bound to a driver, links to its consumers
progress to ``DL_STATE_AVAILABLE``.
(Call to :c:func:`device_links_driver_bound()` from
:c:func:`driver_bound()`.)
* Before a consumer device is probed, presence of supplier drivers is
verified by checking that links to suppliers are in ``DL_STATE_AVAILABLE``
state. The state of the links is updated to ``DL_STATE_CONSUMER_PROBE``.
(Call to :c:func:`device_links_check_suppliers()` from
:c:func:`really_probe()`.)
This prevents the supplier from unbinding.
(Call to :c:func:`wait_for_device_probe()` from
:c:func:`device_links_unbind_consumers()`.)
* If the probe fails, links to suppliers revert back to ``DL_STATE_AVAILABLE``.
(Call to :c:func:`device_links_no_driver()` from :c:func:`really_probe()`.)
* If the probe succeeds, links to suppliers progress to ``DL_STATE_ACTIVE``.
(Call to :c:func:`device_links_driver_bound()` from :c:func:`driver_bound()`.)
* When the consumer's driver is later on removed, links to suppliers revert
back to ``DL_STATE_AVAILABLE``.
(Call to :c:func:`__device_links_no_driver()` from
:c:func:`device_links_driver_cleanup()`, which in turn is called from
:c:func:`__device_release_driver()`.)
* Before a supplier's driver is removed, links to consumers that are not
bound to a driver are updated to ``DL_STATE_SUPPLIER_UNBIND``.
(Call to :c:func:`device_links_busy()` from
:c:func:`__device_release_driver()`.)
This prevents the consumers from binding.
(Call to :c:func:`device_links_check_suppliers()` from
:c:func:`really_probe()`.)
Consumers that are bound are freed from their driver; consumers that are
probing are waited for until they are done.
(Call to :c:func:`device_links_unbind_consumers()` from
:c:func:`__device_release_driver()`.)
Once all links to consumers are in ``DL_STATE_SUPPLIER_UNBIND`` state,
the supplier driver is released and the links revert to ``DL_STATE_DORMANT``.
(Call to :c:func:`device_links_driver_cleanup()` from
:c:func:`__device_release_driver()`.)
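The bullet list above can be condensed into a transition table. The following
Python sketch is a reading aid, not kernel code: the event names are
hypothetical labels invented here, and only the transitions spelled out in the
list are modelled (the diagram may show additional arrows). An event that the
list does not allow in a given state is simply rejected:

```python
from enum import Enum


class LinkState(Enum):
    DORMANT = "DL_STATE_DORMANT"
    AVAILABLE = "DL_STATE_AVAILABLE"
    CONSUMER_PROBE = "DL_STATE_CONSUMER_PROBE"
    ACTIVE = "DL_STATE_ACTIVE"
    SUPPLIER_UNBIND = "DL_STATE_SUPPLIER_UNBIND"


S = LinkState

# (current state, event) -> next state; event names are illustrative only.
TRANSITIONS = {
    (S.DORMANT, "supplier_bound"): S.AVAILABLE,
    (S.AVAILABLE, "consumer_probe_start"): S.CONSUMER_PROBE,
    (S.CONSUMER_PROBE, "consumer_probe_failed"): S.AVAILABLE,
    (S.CONSUMER_PROBE, "consumer_bound"): S.ACTIVE,
    (S.ACTIVE, "consumer_unbound"): S.AVAILABLE,
    (S.AVAILABLE, "supplier_unbind_start"): S.SUPPLIER_UNBIND,
    # Drawn as a direct arrow in the diagram; per the text the bound
    # consumer is freed from its driver on the way.
    (S.ACTIVE, "supplier_unbind_start"): S.SUPPLIER_UNBIND,
    (S.SUPPLIER_UNBIND, "supplier_unbound"): S.DORMANT,
}


def step(state, event):
    """Return the next link state, or raise ValueError if not allowed."""
    try:
        return TRANSITIONS[(state, event)]
    except KeyError:
        raise ValueError(
            "%s is not allowed in %s" % (event, state.name)) from None
```

Starting from ``LinkState.DORMANT``, the events ``supplier_bound``,
``consumer_probe_start`` and ``consumer_bound`` lead to ``LinkState.ACTIVE``,
whereas ``consumer_probe_start`` on a dormant link raises ``ValueError``,
reflecting that a consumer is not probed before its supplier is bound.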
API
===
.. kernel-doc:: drivers/base/core.c
:functions: device_link_add device_link_del

@ -0,0 +1,73 @@
Buffer Sharing and Synchronization
==================================
The dma-buf subsystem provides the framework for sharing buffers for
hardware (DMA) access across multiple device drivers and subsystems, and
for synchronizing asynchronous hardware access.
This is used, for example, by drm "prime" multi-GPU support, but is of
course not limited to GPU use cases.
The three main components of this are: (1) dma-buf, representing a
sg_table and exposed to userspace as a file descriptor to allow passing
between devices, (2) fence, which provides a mechanism to signal when
one device has finished access, and (3) reservation, which manages the
shared or exclusive fence(s) associated with the buffer.
Shared DMA Buffers
------------------
.. kernel-doc:: drivers/dma-buf/dma-buf.c
:export:
.. kernel-doc:: include/linux/dma-buf.h
:internal:
Reservation Objects
-------------------
.. kernel-doc:: drivers/dma-buf/reservation.c
:doc: Reservation Object Overview
.. kernel-doc:: drivers/dma-buf/reservation.c
:export:
.. kernel-doc:: include/linux/reservation.h
:internal:
DMA Fences
----------
.. kernel-doc:: drivers/dma-buf/dma-fence.c
:export:
.. kernel-doc:: include/linux/dma-fence.h
:internal:
Seqno Hardware Fences
~~~~~~~~~~~~~~~~~~~~~
.. kernel-doc:: drivers/dma-buf/seqno-fence.c
:export:
.. kernel-doc:: include/linux/seqno-fence.h
:internal:
DMA Fence Array
~~~~~~~~~~~~~~~
.. kernel-doc:: drivers/dma-buf/dma-fence-array.c
:export:
.. kernel-doc:: include/linux/dma-fence-array.h
:internal:
DMA Fence uABI/Sync File
~~~~~~~~~~~~~~~~~~~~~~~~
.. kernel-doc:: drivers/dma-buf/sync_file.c
:export:
.. kernel-doc:: include/linux/sync_file.h
:internal:

@ -16,11 +16,23 @@ available subsections can be seen below.
basics
infrastructure
dma-buf
device_link
message-based
sound
frame-buffer
input
usb
spi
i2c
hsi
miscellaneous
vme
80211/index
.. only:: subproject and html
Indices
=======
* :ref:`genindex`

@ -46,76 +46,6 @@ Device Drivers Base
.. kernel-doc:: drivers/base/bus.c
:export:
Buffer Sharing and Synchronization
----------------------------------
The dma-buf subsystem provides the framework for sharing buffers for
hardware (DMA) access across multiple device drivers and subsystems, and
for synchronizing asynchronous hardware access.
This is used, for example, by drm "prime" multi-GPU support, but is of
course not limited to GPU use cases.
The three main components of this are: (1) dma-buf, representing a
sg_table and exposed to userspace as a file descriptor to allow passing
between devices, (2) fence, which provides a mechanism to signal when
one device has finished access, and (3) reservation, which manages the
shared or exclusive fence(s) associated with the buffer.
dma-buf
~~~~~~~
.. kernel-doc:: drivers/dma-buf/dma-buf.c
:export:
.. kernel-doc:: include/linux/dma-buf.h
:internal:
reservation
~~~~~~~~~~~
.. kernel-doc:: drivers/dma-buf/reservation.c
:doc: Reservation Object Overview
.. kernel-doc:: drivers/dma-buf/reservation.c
:export:
.. kernel-doc:: include/linux/reservation.h
:internal:
fence
~~~~~
.. kernel-doc:: drivers/dma-buf/dma-fence.c
:export:
.. kernel-doc:: include/linux/dma-fence.h
:internal:
.. kernel-doc:: drivers/dma-buf/seqno-fence.c
:export:
.. kernel-doc:: include/linux/seqno-fence.h
:internal:
.. kernel-doc:: drivers/dma-buf/dma-fence-array.c
:export:
.. kernel-doc:: include/linux/dma-fence-array.h
:internal:
.. kernel-doc:: drivers/dma-buf/reservation.c
:export:
.. kernel-doc:: include/linux/reservation.h
:internal:
.. kernel-doc:: drivers/dma-buf/sync_file.c
:export:
.. kernel-doc:: include/linux/sync_file.h
:internal:
Device Drivers DMA Management
-----------------------------

@ -0,0 +1,748 @@
===========================
The Linux-USB Host Side API
===========================
Introduction to USB on Linux
============================
A Universal Serial Bus (USB) is used to connect a host, such as a PC or
workstation, to a number of peripheral devices. USB uses a tree
structure, with the host as the root (the system's master), hubs as
interior nodes, and peripherals as leaves (and slaves). Modern PCs
support several such trees of USB devices, usually
a few USB 3.0 (5 GBit/s) or USB 3.1 (10 GBit/s) and some legacy
USB 2.0 (480 MBit/s) busses just in case.
That master/slave asymmetry was designed-in for a number of reasons, one
being ease of use. It is not physically possible to mistake upstream and
downstream connectors, or, with a type C plug, it does not matter (or the
cable is built into the peripheral). Also, the host software doesn't need to deal with
distributed auto-configuration since the pre-designated master node
manages all that.
Kernel developers added USB support to Linux early in the 2.2 kernel
series and have been developing it further since then. Besides support
for each new generation of USB, various host controllers gained support,
new drivers for peripherals have been added and advanced features for latency
measurement and improved power management introduced.
Linux can run inside USB devices as well as on the hosts that control
the devices. But USB device drivers running inside those peripherals
don't do the same things as the ones running inside hosts, so they've
been given a different name: *gadget drivers*. This document does not
cover gadget drivers.
USB Host-Side API Model
=======================
Host-side drivers for USB devices talk to the "usbcore" APIs. There are
two. One is intended for *general-purpose* drivers (exposed through
driver frameworks), and the other is for drivers that are *part of the
core*. Such core drivers include the *hub* driver (which manages trees
of USB devices) and several different kinds of *host controller
drivers*, which control individual busses.
The device model seen by USB drivers is relatively complex.
- USB supports four kinds of data transfers (control, bulk, interrupt,
and isochronous). Two of them (control and bulk) use bandwidth as
it's available, while the other two (interrupt and isochronous) are
scheduled to provide guaranteed bandwidth.
- The device description model includes one or more "configurations"
per device, only one of which is active at a time. Devices are supposed
to be capable of operating at lower than their top
speeds and may provide a BOS descriptor showing the lowest speed they
remain fully operational at.
- From USB 3.0 on, configurations have one or more "functions", which
provide a common functionality and are grouped together for purposes
of power management.
- Configurations or functions have one or more "interfaces", each of which may have
"alternate settings". Interfaces may be standardized by USB "Class"
specifications, or may be specific to a vendor or device.
USB device drivers actually bind to interfaces, not devices. Think of
them as "interface drivers", though you may not see many devices
where the distinction is important. *Most USB devices are simple,
with only one function, one configuration, one interface, and one alternate
setting.*
- Interfaces have one or more "endpoints", each of which supports one
type and direction of data transfer such as "bulk out" or "interrupt
in". The entire configuration may have up to sixteen endpoints in
each direction, allocated as needed among all the interfaces.
- Data transfer on USB is packetized; each endpoint has a maximum
packet size. Drivers must often be aware of conventions such as
flagging the end of bulk transfers using "short" (including zero
length) packets.
- The Linux USB API supports synchronous calls for control and bulk
messages. It also supports asynchronous calls for all kinds of data
transfer, using request structures called "URBs" (USB Request
Blocks).
Accordingly, the USB Core API exposed to device drivers covers quite a
lot of territory. You'll probably need to consult the USB 3.0
specification, available online from www.usb.org at no cost, as well as
class or device specifications.
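
As a rough illustration of the transfer types and endpoint directions
described in the list above, the following user-space sketch classifies an
endpoint from two descriptor fields. The helper names are invented for this
example; only the bit layout (transfer type in the low two bits of
bmAttributes, direction in bit 7 of bEndpointAddress) is taken from chapter 9
of the USB specification.

.. code-block:: c

    #include <stdbool.h>
    #include <stdint.h>

    /* Transfer type codes held in the low two bits of an endpoint
     * descriptor's bmAttributes field. */
    enum xfer_type {
        XFER_CONTROL = 0,
        XFER_ISOC    = 1,
        XFER_BULK    = 2,
        XFER_INT     = 3,
    };

    /* Hypothetical helper: extract the transfer type. */
    static enum xfer_type ep_xfer_type(uint8_t bmAttributes)
    {
        return (enum xfer_type)(bmAttributes & 0x03);
    }

    /* Interrupt and isochronous endpoints are the scheduled
     * ("periodic") ones; control and bulk use leftover bandwidth. */
    static bool ep_is_periodic(uint8_t bmAttributes)
    {
        enum xfer_type t = ep_xfer_type(bmAttributes);

        return t == XFER_ISOC || t == XFER_INT;
    }

    /* Bit 7 of bEndpointAddress is set for IN (device-to-host). */
    static bool ep_is_in(uint8_t bEndpointAddress)
    {
        return (bEndpointAddress & 0x80) != 0;
    }

Kernel drivers would normally use the accessors that usbcore already provides
instead of open-coding these masks.
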
The only host-side drivers that actually touch hardware (reading/writing
registers, handling IRQs, and so on) are the HCDs. In theory, all HCDs
provide the same functionality through the same API. In practice, that's
becoming more true, but there are still differences
that crop up especially with fault handling on the less common controllers.
Different controllers don't
necessarily report the same aspects of failures, and recovery from
faults (including software-induced ones like unlinking an URB) isn't yet
fully consistent. Device driver authors should make a point of doing
disconnect testing (while the device is active) with each different host
controller driver, to make sure drivers don't have bugs of their own as
well as to make sure they aren't relying on some HCD-specific behavior.
USB-Standard Types
==================
In ``<linux/usb/ch9.h>`` you will find the USB data types defined in
chapter 9 of the USB specification. These data types are used throughout
USB, and in APIs including this host side API, gadget APIs, and usbfs.
.. kernel-doc:: include/linux/usb/ch9.h
:internal:
Host-Side Data Types and Macros
===============================
The host side API exposes several layers to drivers, some of which are
more necessary than others. These support lifecycle models for host side
drivers and devices, and support passing buffers through usbcore to some
HCD that performs the I/O for the device driver.
.. kernel-doc:: include/linux/usb.h
:internal:
USB Core APIs
=============
There are two basic I/O models in the USB API. The most elemental one is
asynchronous: drivers submit requests in the form of an URB, and the
URB's completion callback handles the next step. All USB transfer types
support that model, although there are special cases for control URBs
(which always have setup and status stages, but may not have a data
stage) and isochronous URBs (which allow large packets and include
per-packet fault reports). Built on top of that is synchronous API
support, where a driver calls a routine that allocates one or more URBs,
submits them, and waits until they complete. There are synchronous
wrappers for single-buffer control and bulk transfers (which are awkward
to use in some driver disconnect scenarios), and for scatterlist based
streaming i/o (bulk or interrupt).
USB drivers need to provide buffers that can be used for DMA, although
they don't necessarily need to provide the DMA mapping themselves. There
are APIs to use when allocating DMA buffers, which can prevent use
of bounce buffers on some systems. In some cases, drivers may be able to
rely on 64-bit DMA to eliminate another kind of bounce buffer.
.. kernel-doc:: drivers/usb/core/urb.c
:export:
.. kernel-doc:: drivers/usb/core/message.c
:export:
.. kernel-doc:: drivers/usb/core/file.c
:export:
.. kernel-doc:: drivers/usb/core/driver.c
:export:
.. kernel-doc:: drivers/usb/core/usb.c
:export:
.. kernel-doc:: drivers/usb/core/hub.c
:export:
Host Controller APIs
====================
These APIs are only for use by host controller drivers, most of which
implement standard register interfaces such as XHCI, EHCI, OHCI, or UHCI. UHCI
was one of the first interfaces, designed by Intel and also used by VIA;
it doesn't do much in hardware. OHCI was designed later, to have the
hardware do more work (bigger transfers, tracking protocol state, and so
on). EHCI was designed with USB 2.0; its design has features that
resemble OHCI (hardware does much more work) as well as UHCI (some parts
of ISO support, TD list processing). XHCI was designed with USB 3.0. It
continues to shift support for functionality into hardware.
There are host controllers other than the "big three", although most PCI
based controllers (and a few non-PCI based ones) use one of those
interfaces. Not all host controllers use DMA; some use PIO, and there is
also a simulator and a virtual host controller to pipe USB over the network.
The same basic APIs are available to drivers for all those controllers.
For historical reasons they are in two layers: :c:type:`struct
usb_bus <usb_bus>` is a rather thin layer that became available
in the 2.2 kernels, while :c:type:`struct usb_hcd <usb_hcd>`
is a more featureful layer
that lets HCDs share common code, to shrink driver size and
significantly reduce hcd-specific behaviors.
.. kernel-doc:: drivers/usb/core/hcd.c
:export:
.. kernel-doc:: drivers/usb/core/hcd-pci.c
:export:
.. kernel-doc:: drivers/usb/core/buffer.c
:internal:
The USB Filesystem (usbfs)
==========================
This chapter presents the Linux *usbfs*. You may prefer to avoid writing
new kernel code for your USB driver; that's the problem that usbfs set
out to solve. User mode device drivers are usually packaged as
applications or libraries, and may use usbfs through some programming
library that wraps it. Such libraries include
`libusb <http://libusb.sourceforge.net>`__ for C/C++, and
`jUSB <http://jUSB.sourceforge.net>`__ for Java.
**Note**
This particular documentation is incomplete, especially with respect
to the asynchronous mode. As of kernel 2.5.66 the code and this
(new) documentation need to be cross-reviewed.
Configure usbfs into Linux kernels by enabling the *USB filesystem*
option (CONFIG_USB_DEVICEFS), and you get basic support for user mode
USB device drivers. Until relatively recently it was often (confusingly)
called *usbdevfs* although it wasn't solving what *devfs* was. Every USB
device will appear in usbfs, regardless of whether or not it has a
kernel driver.
What files are in "usbfs"?
--------------------------
Conventionally mounted at ``/proc/bus/usb``, usbfs features include:
- ``/proc/bus/usb/devices`` ... a text file showing each of the USB
devices known to the kernel, and their configuration descriptors.
You can also poll() this to learn about new devices.
- ``/proc/bus/usb/BBB/DDD`` ... magic files exposing each device's
configuration descriptors, and supporting a series of ioctls for
making device requests, including I/O to devices. (Purely for access
by programs.)
Each bus is given a number (BBB) based on when it was enumerated; within
each bus, each device is given a similar number (DDD). Those BBB/DDD
paths are not "stable" identifiers; expect them to change even if you
always leave the devices plugged in to the same hub port. *Don't even
think of saving these in application configuration files.* Stable
identifiers are available, for user mode applications that want to use
them. HID and networking devices expose these stable IDs, so that for
example you can be sure that you told the right UPS to power down its
second server. "usbfs" doesn't (yet) expose those IDs.
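
Because the BBB/DDD numbers can change, a program is better off rebuilding the
device path each time it runs than storing it. The sketch below shows one way
to do that; ``usbfs_path()`` is a made-up helper, and the three-digit,
zero-padded naming is assumed from the conventional usbfs layout described
above.

.. code-block:: c

    #include <stdio.h>

    /*
     * Hypothetical helper: format "<mount>/BBB/DDD" into buf.
     * Returns 0 on success, -1 if the numbers do not fit in three
     * digits or the buffer is too small.
     */
    static int usbfs_path(char *buf, size_t len, const char *mount,
                          unsigned int bus, unsigned int dev)
    {
        int n;

        if (bus > 999 || dev > 999)
            return -1;

        n = snprintf(buf, len, "%s/%03u/%03u", mount, bus, dev);
        if (n < 0 || (size_t)n >= len)
            return -1;

        return 0;
    }
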
Mounting and Access Control
---------------------------
There are a number of mount options for usbfs, which will be of most
interest to you if you need to override the default access control
policy. That policy is that only root may read or write device files
(``/proc/bus/usb/BBB/DDD``) although anyone may read the ``devices`` or
``drivers`` files. I/O requests to the device also need the
CAP_SYS_RAWIO capability.
The significance of that is that by default, all user mode device
drivers need super-user privileges. You can change modes or ownership in
a driver setup when the device hotplugs, or maybe just start the driver
right then, as a privileged server (or some activity within one). That's
the most secure approach for multi-user systems, but for single user
systems ("trusted" by that user) it's more convenient just to grant
everyone all access (using the *devmode=0666* option) so the driver can
start whenever it's needed.
The mount options for usbfs, usable in /etc/fstab or in command line
invocations of *mount*, are:
*busgid*\ =NNNNN
Controls the GID used for the /proc/bus/usb/BBB directories.
(Default: 0)
*busmode*\ =MMM
Controls the file mode used for the /proc/bus/usb/BBB directories.
(Default: 0555)
*busuid*\ =NNNNN
Controls the UID used for the /proc/bus/usb/BBB directories.
(Default: 0)
*devgid*\ =NNNNN
Controls the GID used for the /proc/bus/usb/BBB/DDD files. (Default:
0)
*devmode*\ =MMM
Controls the file mode used for the /proc/bus/usb/BBB/DDD files.
(Default: 0644)
*devuid*\ =NNNNN
Controls the UID used for the /proc/bus/usb/BBB/DDD files. (Default:
0)
*listgid*\ =NNNNN
Controls the GID used for the /proc/bus/usb/devices and drivers
files. (Default: 0)
*listmode*\ =MMM
Controls the file mode used for the /proc/bus/usb/devices and
drivers files. (Default: 0444)
*listuid*\ =NNNNN
Controls the UID used for the /proc/bus/usb/devices and drivers
files. (Default: 0)
Note that many Linux distributions hard-wire the mount options for usbfs
in their init scripts, such as ``/etc/rc.d/rc.sysinit``, rather than
making it easy to set this per-system policy in ``/etc/fstab``.
/proc/bus/usb/devices
---------------------
This file is handy for status viewing tools in user mode, which can scan
the text format and ignore most of it. More detailed device status
(including class and vendor status) is available from device-specific
files. For information about the current format of this file, see the
``Documentation/usb/proc_usb_info.txt`` file in your Linux kernel
sources.
This file, in combination with the poll() system call, can also be used
to detect when devices are added or removed:
::
    /* Needs <fcntl.h> for open() and <poll.h> for poll(). */
    int fd;
    struct pollfd pfd;

    fd = open("/proc/bus/usb/devices", O_RDONLY);
    pfd.fd = fd;
    pfd.events = POLLIN;
    pfd.revents = 0;
    for (;;) {
        /* The first time through, this call will return immediately. */
        poll(&pfd, 1, -1);

        /* To see what's changed, compare the file's previous and current
           contents or scan the filesystem.  (Scanning is more precise.) */
    }
Note that this behavior is intended to be used for informational and
debug purposes. It would be more appropriate to use programs such as
udev or HAL to initialize a device or start a user-mode helper program,
for instance.
/proc/bus/usb/BBB/DDD
---------------------
Use these files in one of these basic ways:
*They can be read,* producing first the device descriptor (18 bytes) and
then the descriptors for the current configuration. See the USB 2.0 spec
for details about those binary data formats. You'll need to convert most
multibyte values from little endian format to your native host byte
order, although a few of the fields in the device descriptor (both of
the BCD-encoded fields, and the vendor and product IDs) will be
byteswapped for you. Note that configuration descriptors include
descriptors for interfaces, altsettings, endpoints, and maybe additional
class descriptors.
*Perform USB operations* using *ioctl()* requests to make endpoint I/O
requests (synchronously or asynchronously) or manage the device. These
requests need the CAP_SYS_RAWIO capability, as well as filesystem
access permissions. Only one ioctl request can be made on one of these
device files at a time. This means that if you are synchronously reading
an endpoint from one thread, you won't be able to write to a different
endpoint from another thread until the read completes. This works for
*half duplex* protocols, but otherwise you'd use asynchronous i/o
requests.
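
For the read case, the data that follows the 18-byte device descriptor is a
sequence of descriptors, each starting with a length byte and a type byte. The
following sketch walks such a buffer and counts what it finds. The function and
struct names are invented for this example, and the buffer is assumed to hold
descriptors laid out as in chapter 9 of the USB specification; multibyte fields
such as wTotalLength are decoded from little endian by hand.

.. code-block:: c

    #include <stddef.h>
    #include <stdint.h>

    /* bDescriptorType values from chapter 9 of the USB specification. */
    #define DT_CONFIG    2
    #define DT_INTERFACE 4
    #define DT_ENDPOINT  5

    struct desc_counts {
        unsigned int configs, interfaces, endpoints, other;
    };

    /* Decode a 16-bit little-endian field such as wTotalLength. */
    static uint16_t get_le16(const uint8_t *p)
    {
        return (uint16_t)(p[0] | (p[1] << 8));
    }

    /*
     * Walk a descriptor buffer as read from a device file.
     * Returns 0 on success, -1 if a descriptor is truncated or
     * reports an impossible length.
     */
    static int walk_descriptors(const uint8_t *buf, size_t len,
                                struct desc_counts *out)
    {
        size_t off = 0;

        out->configs = out->interfaces = out->endpoints = out->other = 0;
        while (off < len) {
            uint8_t blen, btype;

            if (len - off < 2)
                return -1;
            blen = buf[off];        /* bLength */
            btype = buf[off + 1];   /* bDescriptorType */
            if (blen < 2 || blen > len - off)
                return -1;

            switch (btype) {
            case DT_CONFIG:    out->configs++;    break;
            case DT_INTERFACE: out->interfaces++; break;
            case DT_ENDPOINT:  out->endpoints++;  break;
            default:           out->other++;      break;
            }
            off += blen;
        }
        return 0;
    }

The device descriptor itself and any class-specific descriptors land in the
``other`` count here; a real driver would inspect them rather than skip them.
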
Life Cycle of User Mode Drivers
-------------------------------
Such a driver first needs to find a device file for a device it knows
how to handle. Maybe it was told about it because a ``/sbin/hotplug``
event handling agent chose that driver to handle the new device. Or
maybe it's an application that scans all the /proc/bus/usb device files,
and ignores most devices. In either case, it should :c:func:`read()`
all the descriptors from the device file, and check them against what it
knows how to handle. It might just reject everything except a particular
vendor and product ID, or need a more complex policy.
Never assume there will only be one such device on the system at a time!
If your code can't handle more than one device at a time, at least
detect when there's more than one, and have your users choose which
device to use.
Once your user mode driver knows what device to use, it interacts with
it in either of two styles. The simple style is to make only control
requests; some devices don't need more complex interactions than those.
(An example might be software using vendor-specific control requests for
some initialization or configuration tasks, with a kernel driver for the
rest.)
More likely, you need a more complex style driver: one using non-control
endpoints, reading or writing data and claiming exclusive use of an
interface. *Bulk* transfers are easiest to use, but only their sibling
*interrupt* transfers work with low speed devices. Both interrupt and
*isochronous* transfers offer service guarantees because their bandwidth
is reserved. Such "periodic" transfers are awkward to use through usbfs,
unless you're using the asynchronous calls. However, interrupt transfers
can also be used in a synchronous "one shot" style.
Your user-mode driver should never need to worry about cleaning up
request state when the device is disconnected, although it should close
its open file descriptors as soon as it starts seeing the ENODEV errors.
The ioctl() Requests
--------------------
To use these ioctls, you need to include the following headers in your
userspace program:
::
#include <linux/usb.h>
#include <linux/usbdevice_fs.h>
#include <asm/byteorder.h>
The standard USB device model requests, from "Chapter 9" of the USB 2.0
specification, are automatically included from the ``<linux/usb/ch9.h>``
header.
Unless noted otherwise, the ioctl requests described here will update
the modification time on the usbfs file to which they are applied
(unless they fail). A return of zero indicates success; otherwise, a
standard USB error code is returned. (These are documented in
``Documentation/usb/error-codes.txt`` in your kernel sources.)
Each of these files multiplexes access to several I/O streams, one per
endpoint. Each device has one control endpoint (endpoint zero) which
supports limited RPC-style access. Devices are configured by
hub_wq (in the kernel) setting a device-wide *configuration* that
affects things like power consumption and basic functionality. The
endpoints are part of USB *interfaces*, which may have *altsettings*
affecting things like which endpoints are available. Many devices only
have a single configuration and interface, so drivers for them will
ignore configurations and altsettings.
Management/Status Requests
~~~~~~~~~~~~~~~~~~~~~~~~~~
A number of usbfs requests don't deal very directly with device I/O.
They mostly relate to device management and status. These are all
synchronous requests.
USBDEVFS_CLAIMINTERFACE
This is used to force usbfs to claim a specific interface, which has
not previously been claimed by usbfs or any other kernel driver. The
ioctl parameter is an integer holding the number of the interface
(bInterfaceNumber from descriptor).
Note that if your driver doesn't claim an interface before trying to
use one of its endpoints, and no other driver has bound to it, then
the interface is automatically claimed by usbfs.
This claim will be released by a RELEASEINTERFACE ioctl, or by
closing the file descriptor. File modification time is not updated
by this request.
USBDEVFS_CONNECTINFO
Says whether the device is lowspeed. The ioctl parameter points to a
structure like this:
::
struct usbdevfs_connectinfo {
unsigned int devnum;
unsigned char slow;
};
File modification time is not updated by this request.
*You can't tell whether a "not slow" device is connected at high
speed (480 MBit/sec) or just full speed (12 MBit/sec).* You should
know the devnum value already, it's the DDD value of the device file
name.
USBDEVFS_GETDRIVER
Returns the name of the kernel driver bound to a given interface (a
string). Parameter is a pointer to this structure, which is
modified:
::
struct usbdevfs_getdriver {
unsigned int interface;
char driver[USBDEVFS_MAXDRIVERNAME + 1];
};
File modification time is not updated by this request.
USBDEVFS_IOCTL
Passes a request from userspace through to a kernel driver that has
an ioctl entry in the *struct usb_driver* it registered.
::
struct usbdevfs_ioctl {
int ifno;
int ioctl_code;
void *data;
};
/* user mode call looks like this.
* 'request' becomes the driver->ioctl() 'code' parameter.
* the size of 'param' is encoded in 'request', and that data
* is copied to or from the driver->ioctl() 'buf' parameter.
*/
static int
usbdev_ioctl (int fd, int ifno, unsigned request, void *param)
{
struct usbdevfs_ioctl wrapper;
wrapper.ifno = ifno;
wrapper.ioctl_code = request;
wrapper.data = param;
return ioctl (fd, USBDEVFS_IOCTL, &wrapper);
}
File modification time is not updated by this request.
This request lets kernel drivers talk to user mode code through
filesystem operations even when they don't create a character or
block special device. It's also been used to do things like ask
devices what device special file should be used. Two pre-defined
ioctls are used to disconnect and reconnect kernel drivers, so that
user mode code can completely manage binding and configuration of
devices.
USBDEVFS_RELEASEINTERFACE
This is used to release the claim usbfs made on an interface, either
implicitly or because of a USBDEVFS_CLAIMINTERFACE call, before the
file descriptor is closed. The ioctl parameter is an integer holding
the number of the interface (bInterfaceNumber from descriptor). File
modification time is not updated by this request.
**Warning**
*No security check is made to ensure that the task which made
the claim is the one which is releasing it. This means that one user
mode driver may interfere with other ones.*
USBDEVFS_RESETEP
Resets the data toggle value for an endpoint (bulk or interrupt) to
DATA0. The ioctl parameter is an integer endpoint number (1 to 15,
as identified in the endpoint descriptor), with USB_DIR_IN added
if the device's endpoint sends data to the host.
**Warning**
*Avoid using this request. It should probably be removed.* Using
it typically means the device and driver will lose toggle
synchronization. If you really lost synchronization, you likely
need to completely handshake with the device, using a request
like CLEAR_HALT or SET_INTERFACE.
USBDEVFS_DROP_PRIVILEGES
This is used to relinquish the ability to do certain operations
which are considered to be privileged on a usbfs file descriptor.
This includes claiming arbitrary interfaces, resetting a device on
which there are currently claimed interfaces from other users, and
issuing USBDEVFS_IOCTL calls. The ioctl parameter is a 32 bit mask
of interfaces the user is allowed to claim on this file descriptor.
You may issue this ioctl more than one time to narrow said mask.
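
As a minimal sketch of the management requests above, the wrappers below claim
and release an interface, and build the kind of 32-bit interface mask that
USBDEVFS_DROP_PRIVILEGES takes. The function names are invented for this
example; the request codes come from ``<linux/usbdevice_fs.h>``. Like any
ioctl(), these calls return -1 and set errno on failure.

.. code-block:: c

    #include <stddef.h>
    #include <stdint.h>
    #include <sys/ioctl.h>
    #include <linux/usbdevice_fs.h>

    /* Claim interface 'ifnum' (bInterfaceNumber) on an open device file. */
    static int claim_interface(int fd, unsigned int ifnum)
    {
        return ioctl(fd, USBDEVFS_CLAIMINTERFACE, &ifnum);
    }

    /* Release a claim made explicitly or implicitly by usbfs. */
    static int release_interface(int fd, unsigned int ifnum)
    {
        return ioctl(fd, USBDEVFS_RELEASEINTERFACE, &ifnum);
    }

    /*
     * Hypothetical helper: build a mask with one bit set per
     * interface number.  Numbers of 32 or more cannot be
     * represented in a 32-bit mask and are ignored.
     */
    static uint32_t interface_mask(const unsigned int *ifnums, size_t n)
    {
        uint32_t mask = 0;
        size_t i;

        for (i = 0; i < n; i++)
            if (ifnums[i] < 32)
                mask |= (uint32_t)1 << ifnums[i];
        return mask;
    }
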
Synchronous I/O Support
~~~~~~~~~~~~~~~~~~~~~~~
Synchronous requests involve the kernel blocking until the user mode
request completes, either by finishing successfully or by reporting an
error. In most cases this is the simplest way to use usbfs, although as
noted above it does prevent performing I/O to more than one endpoint at
a time.
USBDEVFS_BULK
Issues a bulk read or write request to the device. The ioctl
parameter is a pointer to this structure:
::
struct usbdevfs_bulktransfer {
unsigned int ep;
unsigned int len;
unsigned int timeout; /* in milliseconds */
void *data;
};
The "ep" value identifies a bulk endpoint number (1 to 15, as
identified in an endpoint descriptor), masked with USB_DIR_IN when
referring to an endpoint which sends data to the host from the
device. The length of the data buffer is identified by "len". Recent
kernels support requests up to about 128KBytes. *FIXME say how read
length is returned, and how short reads are handled.*
USBDEVFS_CLEAR_HALT
Clears endpoint halt (stall) and resets the endpoint toggle. This is
only meaningful for bulk or interrupt endpoints. The ioctl parameter
is an integer endpoint number (1 to 15, as identified in an endpoint
descriptor), masked with USB_DIR_IN when referring to an endpoint
which sends data to the host from the device.
Use this on bulk or interrupt endpoints which have stalled,
returning *-EPIPE* status to a data transfer request. Do not issue
the control request directly, since that could invalidate the host's
record of the data toggle.
USBDEVFS_CONTROL
Issues a control request to the device. The ioctl parameter points
to a structure like this:
::
struct usbdevfs_ctrltransfer {
__u8 bRequestType;
__u8 bRequest;
__u16 wValue;
__u16 wIndex;
__u16 wLength;
__u32 timeout; /* in milliseconds */
void *data;
};
The first eight bytes of this structure are the contents of the
SETUP packet to be sent to the device; see the USB 2.0 specification
for details. The bRequestType value is composed by combining a
USB_TYPE_\* value, a USB_DIR_\* value, and a USB_RECIP_\*
value (from *<linux/usb.h>*). If wLength is nonzero, it describes
the length of the data buffer, which is either written to the device
(USB_DIR_OUT) or read from the device (USB_DIR_IN).
At this writing, you can't transfer more than 4 KBytes of data to or
from a device; usbfs has a limit, and some host controller drivers
have a limit. (That's not usually a problem.) *Also* there's no way
to say it's not OK to get a short read back from the device.
USBDEVFS_RESET
Does a USB level device reset. The ioctl parameter is ignored. After
the reset, this rebinds all device interfaces. File modification
time is not updated by this request.
**Warning**
*Avoid using this call* until some usbcore bugs get fixed, since
it does not fully synchronize device, interface, and driver (not
just usbfs) state.
USBDEVFS_SETINTERFACE
Sets the alternate setting for an interface. The ioctl parameter is
a pointer to a structure like this:
::
struct usbdevfs_setinterface {
unsigned int interface;
unsigned int altsetting;
};
File modification time is not updated by this request.
Those struct members are from some interface descriptor applying to
the current configuration. The interface number is the
bInterfaceNumber value, and the altsetting number is the
bAlternateSetting value. (This resets each endpoint in the
interface.)
USBDEVFS_SETCONFIGURATION
Issues the :c:func:`usb_set_configuration()` call for the
device. The parameter is an integer holding the number of a
configuration (bConfigurationValue from descriptor). File
modification time is not updated by this request.
**Warning**
*Avoid using this call* until some usbcore bugs get fixed, since
it does not fully synchronize device, interface, and driver (not
just usbfs) state.
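
The synchronous requests above might be wrapped as in the following sketch.
The helper names are invented for this example, and it assumes that
``<linux/usb/ch9.h>`` and ``<linux/usbdevice_fs.h>`` are installed with the
kernel's user-space headers. It shows how bRequestType is composed from the
USB_DIR_\*, USB_TYPE_\*, and USB_RECIP_\* values, and how an endpoint number is
combined with USB_DIR_IN.

.. code-block:: c

    #include <stdint.h>
    #include <string.h>
    #include <sys/ioctl.h>
    #include <linux/usb/ch9.h>
    #include <linux/usbdevice_fs.h>

    /* Combine an endpoint number (1 to 15) with its direction bit. */
    static unsigned int ep_address(unsigned int ep_num, int in)
    {
        return (ep_num & 0x0f) | (in ? USB_DIR_IN : USB_DIR_OUT);
    }

    /* Fill in a vendor-specific, device-to-host control request. */
    static void fill_vendor_in(struct usbdevfs_ctrltransfer *ct,
                               uint8_t request, uint16_t value,
                               uint16_t index, void *buf, uint16_t len,
                               unsigned int timeout_ms)
    {
        memset(ct, 0, sizeof(*ct));
        ct->bRequestType = USB_DIR_IN | USB_TYPE_VENDOR | USB_RECIP_DEVICE;
        ct->bRequest = request;
        ct->wValue = value;
        ct->wIndex = index;
        ct->wLength = len;          /* size of the data stage */
        ct->timeout = timeout_ms;
        ct->data = buf;
    }

    /* Issue one blocking bulk transfer; the ioctl result is passed through. */
    static int bulk_transfer(int fd, unsigned int ep_num, int in,
                             void *buf, unsigned int len,
                             unsigned int timeout_ms)
    {
        struct usbdevfs_bulktransfer bt;

        memset(&bt, 0, sizeof(bt));
        bt.ep = ep_address(ep_num, in);
        bt.len = len;
        bt.timeout = timeout_ms;
        bt.data = buf;
        return ioctl(fd, USBDEVFS_BULK, &bt);
    }

A filled-in control request would then be sent with
``ioctl(fd, USBDEVFS_CONTROL, &ct)``.
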
Asynchronous I/O Support
~~~~~~~~~~~~~~~~~~~~~~~~
As mentioned above, there are situations where it may be important to
initiate concurrent operations from user mode code. This is particularly
important for periodic transfers (interrupt and isochronous), but it can
be used for other kinds of USB requests too. In such cases, the
asynchronous requests described here are essential. Rather than
submitting one request and having the kernel block until it completes,
the blocking is separate.
These requests are packaged into a structure that resembles the URB used
by kernel device drivers. (No POSIX Async I/O support here, sorry.) It
identifies the endpoint type (USBDEVFS_URB_TYPE_\*), endpoint
(number, masked with USB_DIR_IN as appropriate), buffer and length,
and a user "context" value serving to uniquely identify each request.
(It's usually a pointer to per-request data.) Flags can modify requests
(not as many as supported for kernel drivers).
Each request can specify a realtime signal number (between SIGRTMIN and
SIGRTMAX, inclusive) to request a signal be sent when the request
completes.
When usbfs returns these urbs, the status value is updated, and the
buffer may have been modified. Except for isochronous transfers, the
actual_length is updated to say how many bytes were transferred; if the
USBDEVFS_URB_DISABLE_SPD flag is set ("short packets are not OK") and
fewer bytes were read than were requested, you get an error report.
::
struct usbdevfs_iso_packet_desc {
unsigned int length;
unsigned int actual_length;
unsigned int status;
};
struct usbdevfs_urb {
unsigned char type;
unsigned char endpoint;
int status;
unsigned int flags;
void *buffer;
int buffer_length;
int actual_length;
int start_frame;
int number_of_packets;
int error_count;
unsigned int signr;
void *usercontext;
struct usbdevfs_iso_packet_desc iso_frame_desc[];
};
For these asynchronous requests, the file modification time reflects
when the request was initiated. This contrasts with the
synchronous requests, where it reflects when requests complete.
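
A minimal sketch of preparing one of these structures for a bulk endpoint
follows. The helper names are invented for this example, and the
submit-then-reap sequence rests on an assumption about how the requests listed
below are normally used: USBDEVFS_SUBMITURB queues the request, and
USBDEVFS_REAPURB blocks until some request completes and stores a pointer to
it.

.. code-block:: c

    #include <string.h>
    #include <sys/ioctl.h>
    #include <linux/usbdevice_fs.h>

    /* Prepare a bulk URB; 'context' identifies the request on completion. */
    static void init_bulk_urb(struct usbdevfs_urb *urb, unsigned char ep_addr,
                              void *buf, int len, void *context)
    {
        memset(urb, 0, sizeof(*urb));
        urb->type = USBDEVFS_URB_TYPE_BULK;
        urb->endpoint = ep_addr;    /* number, with USB_DIR_IN if needed */
        urb->buffer = buf;
        urb->buffer_length = len;
        urb->usercontext = context;
    }

    /*
     * Queue one URB and wait for a completion.  On success, *done
     * points at the completed URB, whose status and actual_length
     * fields have been updated.  Returns -1 with errno set on failure.
     */
    static int submit_and_reap(int fd, struct usbdevfs_urb *urb,
                               struct usbdevfs_urb **done)
    {
        if (ioctl(fd, USBDEVFS_SUBMITURB, urb) < 0)
            return -1;
        return ioctl(fd, USBDEVFS_REAPURB, done);
    }

With several requests in flight, the reaped pointer need not be the URB just
submitted, which is why the ``usercontext`` value is useful for matching
completions to requests.
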
USBDEVFS_DISCARDURB
*TBS* File modification time is not updated by this request.
USBDEVFS_DISCSIGNAL
*TBS* File modification time is not updated by this request.
USBDEVFS_REAPURB
*TBS* File modification time is not updated by this request.
USBDEVFS_REAPURBNDELAY
*TBS* File modification time is not updated by this request.
USBDEVFS_SUBMITURB
*TBS*


@ -1,13 +1,15 @@
VME Device Driver API
=====================
VME Device Drivers
==================
Driver registration
===================
-------------------
As with other subsystems within the Linux kernel, VME device drivers register
with the VME subsystem, typically called from the device's init routine. This is
achieved via a call to the following function:
.. code-block:: c
int vme_register_driver (struct vme_driver *driver, unsigned int ndevs);
If driver registration is successful this function returns zero, if an error
@ -17,6 +19,8 @@ A pointer to a structure of type 'vme_driver' must be provided to the
registration function. Along with ndevs, which is the number of devices your
driver is able to support. The structure is as follows:
.. code-block:: c
struct vme_driver {
struct list_head node;
const char *name;
@ -38,6 +42,8 @@ with the driver. The match function should return 1 if a device should be
probed and 0 otherwise. This example match function (from vme_user.c) limits
the number of devices probed to one:
.. code-block:: c
#define USER_BUS_MAX 1
...
static int vme_user_match(struct vme_dev *vdev)
@ -51,6 +57,8 @@ The '.probe' element should contain a pointer to the probe routine. The
probe routine is passed a 'struct vme_dev' pointer as an argument. The
'struct vme_dev' structure looks like the following:
.. code-block:: c
struct vme_dev {
int num;
struct vme_bridge *bridge;
@ -66,11 +74,13 @@ dev->bridge->num.
A function is also provided to unregister the driver from the VME core and is
usually called from the device driver's exit routine:
.. code-block:: c
void vme_unregister_driver (struct vme_driver *driver);
Resource management
===================
-------------------
Once a driver has registered with the VME core the provided match routine will
be called the number of times specified during the registration. If a match
@ -86,6 +96,8 @@ specific window or DMA channel (which may be used by a different driver) this
driver allows a resource to be assigned based on the required attributes of the
driver in question:
.. code-block:: c
struct vme_resource * vme_master_request(struct vme_dev *dev,
u32 aspace, u32 cycle, u32 width);
@ -112,6 +124,8 @@ Functions are also provided to free window allocations once they are no longer
required. These functions should be passed the pointer to the resource provided
during resource allocation:
.. code-block:: c
void vme_master_free(struct vme_resource *res);
void vme_slave_free(struct vme_resource *res);
@ -120,7 +134,7 @@ during resource allocation:
Master windows
==============
--------------
Master windows provide access from the local processor[s] out onto the VME bus.
The number of windows available and the available access modes is dependent on
@ -128,11 +142,13 @@ the underlying chipset. A window must be configured before it can be used.
Master window configuration
---------------------------
~~~~~~~~~~~~~~~~~~~~~~~~~~~
Once a master window has been assigned the following functions can be used to
configure it and retrieve the current settings:
.. code-block:: c
int vme_master_set (struct vme_resource *res, int enabled,
unsigned long long base, unsigned long long size, u32 aspace,
u32 cycle, u32 width);
@ -149,11 +165,13 @@ These functions return 0 on success or an error code should the call fail.
Master window access
--------------------
~~~~~~~~~~~~~~~~~~~~
The following functions can be used to read from and write to configured master
windows. These functions return the number of bytes copied:
.. code-block:: c
ssize_t vme_master_read(struct vme_resource *res, void *buf,
size_t count, loff_t offset);
@ -164,6 +182,8 @@ In addition to simple reads and writes, a function is provided to do a
read-modify-write transaction. This function returns the original value of the
VME bus location:
.. code-block:: c
unsigned int vme_master_rmw (struct vme_resource *res,
unsigned int mask, unsigned int compare, unsigned int swap,
loff_t offset);
@ -175,12 +195,14 @@ the value of swap is written the specified offset.
Parts of a VME window can be mapped into user space memory using the following
function:
.. code-block:: c
int vme_master_mmap(struct vme_resource *resource,
struct vm_area_struct *vma);
Slave windows
=============
-------------
Slave windows provide devices on the VME bus access into mapped portions of the
local memory. The number of windows available and the access modes that can be
@ -189,11 +211,13 @@ it can be used.
Slave window configuration
--------------------------
~~~~~~~~~~~~~~~~~~~~~~~~~~
Once a slave window has been assigned the following functions can be used to
configure it and retrieve the current settings:
.. code-block:: c
int vme_slave_set (struct vme_resource *res, int enabled,
unsigned long long base, unsigned long long size,
dma_addr_t mem, u32 aspace, u32 cycle);
@ -210,13 +234,15 @@ These functions return 0 on success or an error code should the call fail.
Slave window buffer allocation
------------------------------
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
Functions are provided to allow the user to allocate and free contiguous
buffers which will be accessible by the VME bridge. These functions do not have
to be used; other methods can be used to allocate a buffer, though care must be
taken to ensure that they are contiguous and accessible by the VME bridge:
.. code-block:: c
void * vme_alloc_consistent(struct vme_resource *res, size_t size,
dma_addr_t *mem);
@ -225,14 +251,14 @@ taken to ensure that they are contiguous and accessible by the VME bridge:
Slave window access
-------------------
~~~~~~~~~~~~~~~~~~~
Slave windows map local memory onto the VME bus; the standard methods for
accessing memory should be used.
DMA channels
============
------------
The VME DMA transfer provides the ability to run linked-list DMA transfers. The
API introduces the concept of DMA lists. Each DMA list is a linked list which can
@ -241,29 +267,35 @@ executed, reused and destroyed.
List Management
---------------
~~~~~~~~~~~~~~~
The following functions are provided to create and destroy DMA lists. Execution
of a list will not automatically destroy the list, thus enabling a list to be
reused for repetitive tasks:
.. code-block:: c
struct vme_dma_list *vme_new_dma_list(struct vme_resource *res);
int vme_dma_list_free(struct vme_dma_list *list);
List Population
---------------
~~~~~~~~~~~~~~~
An item can be added to a list using the following function (the source and
destination attributes need to be created before calling this function; this is
covered under "Transfer Attributes"):
.. code-block:: c
int vme_dma_list_add(struct vme_dma_list *list,
struct vme_dma_attr *src, struct vme_dma_attr *dest,
size_t count);
NOTE: The detailed attributes of the transfers source and destination
.. note::
The detailed attributes of the transfers source and destination
are not checked until an entry is added to a DMA list, the request
for a DMA channel purely checks the directions in which the
controller is expected to transfer data. As a result it is
@ -271,7 +303,7 @@ NOTE: The detailed attributes of the transfers source and destination
source or destination is in an unsupported VME address space.
Transfer Attributes
-------------------
~~~~~~~~~~~~~~~~~~~
The attributes for the source and destination are handled separately from adding
an item to a list. This is due to the diverse attributes required for each type
@ -280,33 +312,43 @@ and pattern sources and destinations (where appropriate):
Pattern source:
.. code-block:: c
struct vme_dma_attr *vme_dma_pattern_attribute(u32 pattern, u32 type);
PCI source or destination:
.. code-block:: c
struct vme_dma_attr *vme_dma_pci_attribute(dma_addr_t mem);
VME source or destination:
.. code-block:: c
struct vme_dma_attr *vme_dma_vme_attribute(unsigned long long base,
u32 aspace, u32 cycle, u32 width);
The following function should be used to free an attribute:
.. code-block:: c
void vme_dma_free_attribute(struct vme_dma_attr *attr);
List Execution
--------------
~~~~~~~~~~~~~~
The following function queues a list for execution. The function will return
once the list has been executed:
.. code-block:: c
int vme_dma_list_exec(struct vme_dma_list *list);
Interrupts
==========
----------
The VME API provides functions to attach and detach callbacks to specific VME
level and status ID combinations and for the generation of VME interrupts with
@ -314,13 +356,15 @@ specific VME level and status IDs.
Attaching Interrupt Handlers
----------------------------
~~~~~~~~~~~~~~~~~~~~~~~~~~~~
The following functions can be used to attach and free a specific VME level and
status ID combination. Any given combination can only be assigned a single
callback function. A void pointer parameter is provided, the value of which is
passed to the callback function; the use of this pointer is left to the user:
.. code-block:: c
int vme_irq_request(struct vme_dev *dev, int level, int statid,
void (*callback)(int, int, void *), void *priv);
@ -329,31 +373,37 @@ passed to the callback function, the use of this pointer is user undefined:
The callback parameters are as follows. Care must be taken in writing a callback
function, as callback functions run in interrupt context:
.. code-block:: c
void callback(int level, int statid, void *priv);
Interrupt Generation
--------------------
~~~~~~~~~~~~~~~~~~~~
The following function can be used to generate a VME interrupt at a given VME
level and VME status ID:
.. code-block:: c
int vme_irq_generate(struct vme_dev *dev, int level, int statid);
Location monitors
=================
-----------------
The VME API provides the following functionality to configure the location
monitor.
Location Monitor Management
---------------------------
~~~~~~~~~~~~~~~~~~~~~~~~~~~
The following functions are provided to request the use of a block of location
monitors and to free them after they are no longer required:
.. code-block:: c
struct vme_resource * vme_lm_request(struct vme_dev *dev);
void vme_lm_free(struct vme_resource * res);
@ -362,15 +412,19 @@ Each block may provide a number of location monitors, monitoring adjacent
locations. The following function can be used to determine how many locations
are provided:
.. code-block:: c
int vme_lm_count(struct vme_resource * res);
Location Monitor Configuration
------------------------------
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
Once a bank of location monitors has been allocated, the following functions
are provided to configure the location and mode of the location monitor:
.. code-block:: c
int vme_lm_set(struct vme_resource *res, unsigned long long base,
u32 aspace, u32 cycle);
@ -379,12 +433,14 @@ are provided to configure the location and mode of the location monitor:
Location Monitor Use
--------------------
~~~~~~~~~~~~~~~~~~~~
The following functions allow a callback to be attached and detached from each
location monitor location. Each location monitor can monitor a number of
adjacent locations:
.. code-block:: c
int vme_lm_attach(struct vme_resource *res, int num,
void (*callback)(void *));
@ -392,22 +448,27 @@ adjacent locations:
The callback function is declared as follows.
.. code-block:: c
void callback(void *data);
Slot Detection
==============
--------------
This function returns the slot ID of the provided bridge.
.. code-block:: c
int vme_slot_num(struct vme_dev *dev);
Bus Detection
=============
-------------
This function returns the bus ID of the provided bridge.
.. code-block:: c
int vme_bus_num(struct vme_dev *dev);


@ -1,340 +0,0 @@
Introduction
============
This document describes how to use the dynamic debug (dyndbg) feature.
Dynamic debug is designed to allow you to dynamically enable/disable
kernel code to obtain additional kernel information. Currently, if
CONFIG_DYNAMIC_DEBUG is set, then all pr_debug()/dev_dbg() and
print_hex_dump_debug()/print_hex_dump_bytes() calls can be dynamically
enabled per-callsite.
If CONFIG_DYNAMIC_DEBUG is not set, print_hex_dump_debug() is just a
shortcut for print_hex_dump(KERN_DEBUG).
For print_hex_dump_debug()/print_hex_dump_bytes(), the format string is
its 'prefix_str' argument if it is a constant string, or "hexdump"
if 'prefix_str' is built dynamically.
Dynamic debug has even more useful features:
* Simple query language allows turning on and off debugging
statements by matching any combination of 0 or 1 of:
- source filename
- function name
- line number (including ranges of line numbers)
- module name
- format string
* Provides a debugfs control file: <debugfs>/dynamic_debug/control
which can be read to display the complete list of known debug
statements, to help guide you
Controlling dynamic debug Behaviour
===================================
The behaviour of pr_debug()/dev_dbg() calls is controlled via writing to a
control file in the 'debugfs' filesystem. Thus, you must first mount
the debugfs filesystem, in order to make use of this feature.
Subsequently, we refer to the control file as:
<debugfs>/dynamic_debug/control. For example, if you want to enable
printing from source file 'svcsock.c', line 1603 you simply do:
nullarbor:~ # echo 'file svcsock.c line 1603 +p' >
<debugfs>/dynamic_debug/control
If you make a mistake with the syntax, the write will fail thus:
nullarbor:~ # echo 'file svcsock.c wtf 1 +p' >
<debugfs>/dynamic_debug/control
-bash: echo: write error: Invalid argument
Viewing Dynamic Debug Behaviour
===============================
You can view the currently configured behaviour of all the debug
statements via:
nullarbor:~ # cat <debugfs>/dynamic_debug/control
# filename:lineno [module]function flags format
/usr/src/packages/BUILD/sgi-enhancednfs-1.4/default/net/sunrpc/svc_rdma.c:323 [svcxprt_rdma]svc_rdma_cleanup =_ "SVCRDMA Module Removed, deregister RPC RDMA transport\012"
/usr/src/packages/BUILD/sgi-enhancednfs-1.4/default/net/sunrpc/svc_rdma.c:341 [svcxprt_rdma]svc_rdma_init =_ "\011max_inline : %d\012"
/usr/src/packages/BUILD/sgi-enhancednfs-1.4/default/net/sunrpc/svc_rdma.c:340 [svcxprt_rdma]svc_rdma_init =_ "\011sq_depth : %d\012"
/usr/src/packages/BUILD/sgi-enhancednfs-1.4/default/net/sunrpc/svc_rdma.c:338 [svcxprt_rdma]svc_rdma_init =_ "\011max_requests : %d\012"
...
You can also apply standard Unix text manipulation filters to this
data, e.g.
nullarbor:~ # grep -i rdma <debugfs>/dynamic_debug/control | wc -l
62
nullarbor:~ # grep -i tcp <debugfs>/dynamic_debug/control | wc -l
42
The third column shows the currently enabled flags for each debug
statement callsite (see below for definitions of the flags). The
default value, with no flags enabled, is "=_". So you can view all
the debug statement callsites with any non-default flags:
nullarbor:~ # awk '$3 != "=_"' <debugfs>/dynamic_debug/control
# filename:lineno [module]function flags format
/usr/src/packages/BUILD/sgi-enhancednfs-1.4/default/net/sunrpc/svcsock.c:1603 [sunrpc]svc_send p "svc_process: st_sendto returned %d\012"
Command Language Reference
==========================
At the lexical level, a command comprises a sequence of words separated
by spaces or tabs. So these are all equivalent:
nullarbor:~ # echo -c 'file svcsock.c line 1603 +p' >
<debugfs>/dynamic_debug/control
nullarbor:~ # echo -c ' file svcsock.c line 1603 +p ' >
<debugfs>/dynamic_debug/control
nullarbor:~ # echo -n 'file svcsock.c line 1603 +p' >
<debugfs>/dynamic_debug/control
Command submissions are bounded by a write() system call.
Multiple commands can be written together, separated by ';' or '\n'.
~# echo "func pnpacpi_get_resources +p; func pnp_assign_mem +p" \
> <debugfs>/dynamic_debug/control
If your query set is big, you can batch them too:
~# cat query-batch-file > <debugfs>/dynamic_debug/control
Another way is to use wildcards. The match rule supports '*' (matches
zero or more characters) and '?' (matches exactly one character). For
example, you can match all USB drivers:
~# echo "file drivers/usb/* +p" > <debugfs>/dynamic_debug/control
At the syntactical level, a command comprises a sequence of match
specifications, followed by a flags change specification.
command ::= match-spec* flags-spec
The match-spec's are used to choose a subset of the known pr_debug()
callsites to which to apply the flags-spec. Think of them as a query
with implicit ANDs between each pair. Note that an empty list of
match-specs will select all debug statement callsites.
A match specification comprises a keyword, which controls the
attribute of the callsite to be compared, and a value to compare
against. Possible keywords are:
match-spec ::= 'func' string |
'file' string |
'module' string |
'format' string |
'line' line-range
line-range ::= lineno |
'-'lineno |
lineno'-' |
lineno'-'lineno
// Note: line-range cannot contain space, e.g.
// "1-30" is valid range but "1 - 30" is not.
lineno ::= unsigned-int
The meanings of each keyword are:
func
The given string is compared against the function name
of each callsite. Example:
func svc_tcp_accept
file
The given string is compared against either the full pathname, the
src-root relative pathname, or the basename of the source file of
each callsite. Examples:
file svcsock.c
file kernel/freezer.c
file /usr/src/packages/BUILD/sgi-enhancednfs-1.4/default/net/sunrpc/svcsock.c
module
The given string is compared against the module name
of each callsite. The module name is the string as
seen in "lsmod", i.e. without the directory or the .ko
suffix and with '-' changed to '_'. Examples:
module sunrpc
module nfsd
format
The given string is searched for in the dynamic debug format
string. Note that the string does not need to match the
entire format, only some part. Whitespace and other
special characters can be escaped using C octal character
escape \ooo notation, e.g. the space character is \040.
Alternatively, the string can be enclosed in double quote
characters (") or single quote characters (').
Examples:
format svcrdma: // many of the NFS/RDMA server pr_debugs
format readahead // some pr_debugs in the readahead cache
format nfsd:\040SETATTR // one way to match a format with whitespace
format "nfsd: SETATTR" // a neater way to match a format with whitespace
format 'nfsd: SETATTR' // yet another way to match a format with whitespace
line
The given line number or range of line numbers is compared
against the line number of each pr_debug() callsite. A single
line number matches the callsite line number exactly. A
range of line numbers matches any callsite between the first
and last line number inclusive. An empty first number means
the first line in the file, an empty last number means the
last line in the file. Examples:
line 1603 // exactly line 1603
line 1600-1605 // the six lines from line 1600 to line 1605
line -1605 // the 1605 lines from line 1 to line 1605
line 1600- // all lines from line 1600 to the end of the file
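The line-range grammar can be sketched as a small user-space parser. This is
an illustration of the rules given above, not the kernel's implementation;
the names parse_lineno() and parse_line_range() are hypothetical, an open end
is represented here by UINT_MAX, and rejecting a range whose first number
exceeds its last is an assumption of this sketch:

```c
#include <ctype.h>
#include <errno.h>
#include <limits.h>
#include <stdlib.h>

/* Parse one unsigned decimal number at *s and advance *s past it. */
static int parse_lineno(const char **s, unsigned int *out)
{
	char *end;
	unsigned long v;

	/* Require a digit so that spaces and signs are rejected. */
	if (!isdigit((unsigned char)**s))
		return -1;
	errno = 0;
	v = strtoul(*s, &end, 10);
	if (errno != 0 || v > UINT_MAX)
		return -1;
	*out = (unsigned int)v;
	*s = end;
	return 0;
}

/*
 * Parse "N", "-N", "N-" or "N-M" (no spaces allowed).
 * Returns 0 on success and fills in the inclusive range.
 */
int parse_line_range(const char *s, unsigned int *first, unsigned int *last)
{
	int have_first = 0;

	*first = 1;          /* empty first number: first line of the file */
	*last = UINT_MAX;    /* empty last number: last line of the file */

	if (*s != '-') {
		if (parse_lineno(&s, first) != 0)
			return -1;
		have_first = 1;
		if (*s == '\0') {
			/* A single line number matches exactly that line. */
			*last = *first;
			return 0;
		}
		if (*s != '-')
			return -1;
	}
	s++;	/* skip the '-' */

	if (*s == '\0')
		return have_first ? 0 : -1;	/* "N-" is fine, a bare "-" is not */
	if (parse_lineno(&s, last) != 0 || *s != '\0')
		return -1;

	/* Assumption of this sketch: a reversed range is an error. */
	return *first <= *last ? 0 : -1;
}
```

Because every character must be consumed by the grammar, "1-30" parses while
"1 - 30" is rejected.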
The flags specification comprises a change operation followed
by one or more flag characters. The change operation is one
of the characters:
- remove the given flags
+ add the given flags
= set the flags to the given flags
The flags are:
p enables the pr_debug() callsite.
f Include the function name in the printed message
l Include line number in the printed message
m Include module name in the printed message
t Include thread ID in messages not generated from interrupt context
_ No flags are set. (Or'd with others on input)
For print_hex_dump_debug() and print_hex_dump_bytes(), only the 'p' flag
has meaning; other flags are ignored.
For display, the flags are preceded by '='
(mnemonic: what the flags are currently equal to).
Note the regexp ^[-+=][flmpt_]+$ matches a flags specification.
To clear all flags at once, use "=_" or "-flmpt".
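As a rough illustration, the regexp quoted above can be checked in user space
before writing a command to the control file. The helper name
ddebug_flags_spec_ok() is hypothetical and this is not how the kernel itself
validates the input:

```c
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/*
 * Return true if spec matches ^[-+=][flmpt_]+$ : one change operation
 * followed by one or more flag characters. Illustrative helper only.
 */
bool ddebug_flags_spec_ok(const char *spec)
{
	size_t len;

	if (spec == NULL || spec[0] == '\0')
		return false;
	/* The first character must be a change operation. */
	if (strchr("-+=", spec[0]) == NULL)
		return false;
	/* The rest must be non-empty and consist only of flag characters. */
	len = strlen(spec + 1);
	return len > 0 && strspn(spec + 1, "flmpt_") == len;
}
```

For example, "+p", "=_" and "-flmpt" are accepted, while "p", "+" and "+px"
are rejected.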
Debug messages during Boot Process
==================================
To activate debug messages for core code and built-in modules during
the boot process, even before userspace and debugfs exists, use
dyndbg="QUERY", module.dyndbg="QUERY", or ddebug_query="QUERY"
(ddebug_query is obsoleted by dyndbg, and deprecated). QUERY follows
the syntax described above, but must not exceed 1023 characters. Your
bootloader may impose lower limits.
These dyndbg params are processed just after the ddebug tables are
processed, as part of the arch_initcall. Thus you can enable debug
messages in all code run after this arch_initcall via this boot
parameter.
On an x86 system for example ACPI enablement is a subsys_initcall and
dyndbg="file ec.c +p"
will show early Embedded Controller transactions during ACPI setup if
your machine (typically a laptop) has an Embedded Controller.
PCI (or other devices) initialization also is a hot candidate for using
this boot parameter for debugging purposes.
If the foo module is not built-in, foo.dyndbg will still be processed at
boot time, without effect, but will be reprocessed when the module is
loaded later. ddebug_query= and bare dyndbg= are only processed at
boot.
Debug Messages at Module Initialization Time
============================================
When "modprobe foo" is called, modprobe scans /proc/cmdline for
foo.params, strips "foo.", and passes them to the kernel along with
params given in modprobe args or /etc/modprobe.d/*.conf files,
in the following order:
1. # parameters given via /etc/modprobe.d/*.conf
options foo dyndbg=+pt
options foo dyndbg # defaults to +p
2. # foo.dyndbg as given in boot args, "foo." is stripped and passed
foo.dyndbg=" func bar +p; func buz +mp"
3. # args to modprobe
modprobe foo dyndbg==pmf # override previous settings
These dyndbg queries are applied in order, with last having final say.
This allows boot args to override or modify those from /etc/modprobe.d
(sensible, since 1 is system wide, 2 is kernel or boot specific), and
modprobe args to override both.
In the foo.dyndbg="QUERY" form, the query must exclude "module foo".
"foo" is extracted from the param-name, and applied to each query in
"QUERY", and only 1 match-spec of each type is allowed.
The dyndbg option is a "fake" module parameter, which means:
- modules do not need to define it explicitly
- every module gets it tacitly, whether they use pr_debug or not
- it doesn't appear in /sys/module/$module/parameters/
To see it, grep the control file, or inspect /proc/cmdline.
For CONFIG_DYNAMIC_DEBUG kernels, any settings given at boot-time (or
enabled by the -DDEBUG flag during compilation) can be disabled later via
the debugfs control file if the debug messages are no longer needed:
echo "module module_name -p" > <debugfs>/dynamic_debug/control
Examples
========
// enable the message at line 1603 of file svcsock.c
nullarbor:~ # echo -n 'file svcsock.c line 1603 +p' >
<debugfs>/dynamic_debug/control
// enable all the messages in file svcsock.c
nullarbor:~ # echo -n 'file svcsock.c +p' >
<debugfs>/dynamic_debug/control
// enable all the messages in the NFS server module
nullarbor:~ # echo -n 'module nfsd +p' >
<debugfs>/dynamic_debug/control
// enable all 12 messages in the function svc_process()
nullarbor:~ # echo -n 'func svc_process +p' >
<debugfs>/dynamic_debug/control
// disable all 12 messages in the function svc_process()
nullarbor:~ # echo -n 'func svc_process -p' >
<debugfs>/dynamic_debug/control
// enable messages for NFS calls READ, READLINK, READDIR and READDIR+.
nullarbor:~ # echo -n 'format "nfsd: READ" +p' >
<debugfs>/dynamic_debug/control
// enable messages in files of which the paths include string "usb"
nullarbor:~ # echo -n '*usb* +p' > <debugfs>/dynamic_debug/control
// enable all messages
nullarbor:~ # echo -n '+p' > <debugfs>/dynamic_debug/control
// add module, function to all enabled messages
nullarbor:~ # echo -n '+mf' > <debugfs>/dynamic_debug/control
// boot-args example, with newlines and comments for readability
Kernel command line: ...
// see what's going on in dyndbg=value processing
dynamic_debug.verbose=1
// enable pr_debugs in 2 builtins, #cmt is stripped
dyndbg="module params +p #cmt ; module sys +p"
// enable pr_debugs in 2 functions in a module loaded later
pc87360.dyndbg="func pc87360_init_device +p; func pc87360_find +p"


@ -19,7 +19,7 @@ forever.
This should not cause problems for anybody, since everybody using a
2.1.x kernel should have updated their C library to a suitable version
anyway (see the file "Documentation/Changes".)
anyway (see the file "Documentation/process/changes.rst".)
1.2 Allow Mixed Locks Again
---------------------------


@ -11,7 +11,7 @@ Updated 2006 by Horms <horms@verge.net.au>
In order to use a diskless system, such as an X-terminal or printer server
for example, it is necessary for the root filesystem to be present on a
non-disk device. This may be an initramfs (see Documentation/filesystems/
ramfs-rootfs-initramfs.txt), a ramdisk (see Documentation/initrd.txt) or a
ramfs-rootfs-initramfs.txt), a ramdisk (see Documentation/admin-guide/initrd.rst) or a
filesystem mounted via NFS. The following text describes how to use NFS
for the root filesystem. For the rest of this text 'client' means the
diskless system, and 'server' means the NFS server.
@ -284,7 +284,7 @@ They depend on various facilities being available:
"kernel <relative-path-below /tftpboot>". The nfsroot parameters
are passed to the kernel by adding them to the "append" line.
It is common to use a serial console in conjunction with pxelinux,
see Documentation/serial-console.txt for more information.
see Documentation/admin-guide/serial-console.rst for more information.
For more information on isolinux, including how to create bootdisks
for prebuilt kernels, see http://syslinux.zytor.com/


@ -1305,7 +1305,16 @@ second). The meanings of the columns are as follows, from left to right:
- nice: niced processes executing in user mode
- system: processes executing in kernel mode
- idle: twiddling thumbs
- iowait: waiting for I/O to complete
- iowait: In short, iowait stands for time spent waiting for I/O to complete.
However, there are several problems:
1. A CPU does not wait for I/O to complete; iowait is the time that a task
is waiting for I/O to complete. When a CPU goes idle because of
outstanding task I/O, another task will be scheduled on this CPU.
2. On a multi-core CPU, the task waiting for I/O to complete is not running
on any CPU, so the iowait of each CPU is difficult to calculate.
3. The value of the iowait field in /proc/stat will decrease under certain
conditions.
So the iowait value read from /proc/stat is not reliable.
- irq: servicing interrupts
- softirq: servicing softirqs
- steal: involuntary wait
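As a hedged illustration of the column order listed above, the following
user-space sketch parses the aggregate "cpu" line of /proc/stat. The struct
and function names are hypothetical, and clamping a negative iowait delta to
zero is just one possible way to cope with the counter decreasing; it does
not make the value reliable:

```c
#include <stdio.h>
#include <string.h>

/* First eight columns of a "cpu" line, left to right. */
struct cpu_times {
	unsigned long long user, nice, system, idle;
	unsigned long long iowait, irq, softirq, steal;
};

/* Parse the aggregate "cpu ..." line; returns 0 on success, -1 otherwise. */
int parse_cpu_line(const char *line, struct cpu_times *t)
{
	/* Reject per-CPU lines such as "cpu0 ..." and anything else. */
	if (strncmp(line, "cpu", 3) != 0 || (line[3] != ' ' && line[3] != '\t'))
		return -1;

	if (sscanf(line + 3, "%llu %llu %llu %llu %llu %llu %llu %llu",
		   &t->user, &t->nice, &t->system, &t->idle,
		   &t->iowait, &t->irq, &t->softirq, &t->steal) != 8)
		return -1;
	return 0;
}

/* iowait change between two samples, clamped to zero if it went backwards. */
unsigned long long iowait_delta(const struct cpu_times *prev,
				const struct cpu_times *cur)
{
	return cur->iowait >= prev->iowait ? cur->iowait - prev->iowait : 0;
}
```

A caller would typically read the first line of /proc/stat twice, some
interval apart, and pass both parsed samples to iowait_delta().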


@ -119,7 +119,7 @@ separated by spaces:
253:0 Device with major 253 and minor 0
Authoritative information can be found in
"Documentation/kernel-parameters.txt".
"Documentation/admin-guide/kernel-parameters.rst".
(*) rw


@ -3,3 +3,8 @@
project = "Linux GPU Driver Developer's Guide"
tags.add("subproject")
latex_documents = [
('index', 'gpu.tex', project,
'The kernel development community', 'manual'),
]


@ -188,7 +188,7 @@ Connectors state change detection must be cleanup up with a call to
Output discovery and initialization example
-------------------------------------------
::
.. code-block:: c
void intel_crt_init(struct drm_device *dev)
{
