mirror of
https://github.com/jart/cosmopolitan.git
synced 2025-06-27 06:48:31 +00:00
parent
ee49b71be2
commit
2f1679e5cf
20 changed files with 11715 additions and 0 deletions
632
third_party/awk/awk.1
vendored
Normal file
@ -0,0 +1,632 @@
.de EX
.nf
.ft CW
..
.de EE
.br
.fi
.ft 1
..
.de TF
.IP "" "\w'\fB\\$1\ \ \fP'u"
.PD 0
..
.TH AWK 1
.CT 1 files prog_other
.SH NAME
awk \- pattern-directed scanning and processing language
.SH SYNOPSIS
.B awk
[
.BI \-F
.I fs
]
[
.BI \-v
.I var=value
]
[
.I 'prog'
|
.BI \-f
.I progfile
]
[
.I file ...
]
.SH DESCRIPTION
.I Awk
scans each input
.I file
for lines that match any of a set of patterns specified literally in
.I prog
or in one or more files
specified as
.B \-f
.IR progfile .
With each pattern
there can be an associated action that will be performed
when a line of a
.I file
matches the pattern.
Each line is matched against the
pattern portion of every pattern-action statement;
the associated action is performed for each matched pattern.
The file name
.B \-
means the standard input.
Any
.I file
of the form
.I var=value
is treated as an assignment, not a filename,
and is executed at the time it would have been opened if it were a filename.
The option
.B \-v
followed by
.I var=value
is an assignment to be done before
.I prog
is executed;
any number of
.B \-v
options may be present.
The
.B \-F
.I fs
option defines the input field separator to be the regular expression
.IR fs .
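.PP
As a minimal sketch of these options (the variable names
.B greeting
and
.B sep
and the sample input are made up for illustration),
.B \-v
assigns before the program starts, while a
.I var=value
operand is assigned when it is reached in the argument list:
.PP
.EX
echo x | awk \-v greeting=hello '{ print greeting, $1 }'
echo a:b | awk \-F: '{ print $2, sep }' sep=+ \-
.EE
.PP
Under those assumptions the first command should print
.B "hello x"
and the second
.BR "b +" .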
.PP
An input line is normally made up of fields separated by white space,
or by the regular expression
.BR FS .
The fields are denoted
.BR $1 ,
.BR $2 ,
\&..., while
.B $0
refers to the entire line.
If
.BR FS
is null, the input line is split into one field per character.
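.PP
A small illustration of default field splitting, assuming made-up input:
.PP
.EX
echo 'alpha beta gamma' | awk '{ print $2; print NF; print $0 }'
.EE
.PP
This should print
.BR beta ,
then
.BR 3 ,
then the whole line.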
.PP
A pattern-action statement has the form:
.IP
.IB pattern " { " action " }
.PP
A missing
.BI { " action " }
means print the line;
a missing pattern always matches.
Pattern-action statements are separated by newlines or semicolons.
.PP
An action is a sequence of statements.
A statement can be one of the following:
.PP
.EX
.ta \w'\f(CWdelete array[expression]\fR'u
.RS
.nf
.ft CW
if(\fI expression \fP)\fI statement \fP\fR[ \fPelse\fI statement \fP\fR]\fP
while(\fI expression \fP)\fI statement\fP
for(\fI expression \fP;\fI expression \fP;\fI expression \fP)\fI statement\fP
for(\fI var \fPin\fI array \fP)\fI statement\fP
do\fI statement \fPwhile(\fI expression \fP)
break
continue
{\fR [\fP\fI statement ... \fP\fR] \fP}
\fIexpression\fP #\fR commonly\fP\fI var = expression\fP
print\fR [ \fP\fIexpression-list \fP\fR] \fP\fR[ \fP>\fI expression \fP\fR]\fP
printf\fI format \fP\fR[ \fP,\fI expression-list \fP\fR] \fP\fR[ \fP>\fI expression \fP\fR]\fP
return\fR [ \fP\fIexpression \fP\fR]\fP
next #\fR skip remaining patterns on this input line\fP
nextfile #\fR skip rest of this file, open next, start at top\fP
delete\fI array\fP[\fI expression \fP] #\fR delete an array element\fP
delete\fI array\fP #\fR delete all elements of array\fP
exit\fR [ \fP\fIexpression \fP\fR]\fP #\fR exit immediately; status is \fP\fIexpression\fP
.fi
.RE
.EE
.DT
.PP
Statements are terminated by
semicolons, newlines or right braces.
An empty
.I expression-list
stands for
.BR $0 .
String constants are quoted \&\f(CW"\ "\fR,
with the usual C escapes recognized within.
Expressions take on string or numeric values as appropriate,
and are built using the operators
.B + \- * / % ^
(exponentiation), and concatenation (indicated by white space).
The operators
.B
! ++ \-\- += \-= *= /= %= ^= > >= < <= == != ?:
are also available in expressions.
Variables may be scalars, array elements
(denoted
.IB x [ i ] \fR)
or fields.
Variables are initialized to the null string.
Array subscripts may be any string,
not necessarily numeric;
this allows for a form of associative memory.
Multiple subscripts such as
.B [i,j,k]
are permitted; the constituents are concatenated,
separated by the value of
.BR SUBSEP .
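.PP
The following sketch (sample words and the array names
.B count
and
.B grid
are invented for illustration) counts repeated input values in an
associative array and stores a value under a two-part subscript; the
.B in
test joins its subscripts with
.B SUBSEP
in the same way:
.PP
.EX
{ echo a; echo b; echo a; } | awk '
{ count[$1]++ }
END {
    grid[1, 2] = "x"
    hit = ((1, 2) in grid)
    print count["a"], count["b"], hit
}'
.EE
.PP
With that input the output should be
.BR "2 1 1" .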
.PP
The
.B print
statement prints its arguments on the standard output
(or on a file if
.BI > " file
or
.BI >> " file
is present or on a pipe if
.BI | " cmd
is present), separated by the current output field separator,
and terminated by the output record separator.
.I file
and
.I cmd
may be literal names or parenthesized expressions;
identical string values in different statements denote
the same open file.
The
.B printf
statement formats its expression list according to the
.I format
(see
.IR printf (3)).
The built-in function
.BI close( expr )
closes the file or pipe
.IR expr .
The built-in function
.BI fflush( expr )
flushes any buffered output for the file or pipe
.IR expr .
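.PP
As a hedged example of output redirected to a pipe (the
.B sort
command and the sample words are only illustrative choices):
.PP
.EX
{ echo pear; echo apple; } | awk '
{ print $1 | "sort" }
END { close("sort"); printf "%d lines%s", NR, ORS }'
.EE
.PP
Closing the pipe in
.B END
lets
.B sort
finish before the final
.B printf
output is written, so the sorted lines should appear first,
followed by
.BR "2 lines" .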
.PP
The mathematical functions
.BR atan2 ,
.BR cos ,
.BR exp ,
.BR log ,
.BR sin ,
and
.B sqrt
are built in.
Other built-in functions:
.TF length
.TP
.B length
the length of its argument
taken as a string,
number of elements in an array for an array argument,
or length of
.B $0
if no argument.
.TP
.B rand
random number on [0,1).
.TP
.B srand
sets seed for
.B rand
and returns the previous seed.
.TP
.B int
truncates to an integer value.
.TP
\fBsubstr(\fIs\fB, \fIm\fR [\fB, \fIn\^\fR]\fB)\fR
the
.IR n -character
substring of
.I s
that begins at position
.I m
counted from 1.
If no
.IR n ,
use the rest of the string.
.TP
.BI index( s , " t" )
the position in
.I s
where the string
.I t
occurs, or 0 if it does not.
.TP
.BI match( s , " r" )
the position in
.I s
where the regular expression
.I r
occurs, or 0 if it does not.
The variables
.B RSTART
and
.B RLENGTH
are set to the position and length of the matched string.
.TP
\fBsplit(\fIs\fB, \fIa \fR[\fB, \fIfs\^\fR]\fB)\fR
splits the string
.I s
into array elements
.IB a [1] \fR,
.IB a [2] \fR,
\&...,
.IB a [ n ] \fR,
and returns
.IR n .
The separation is done with the regular expression
.I fs
or with the field separator
.B FS
if
.I fs
is not given.
An empty string as field separator splits the string
into one array element per character.
.TP
\fBsub(\fIr\fB, \fIt \fR[, \fIs\^\fR]\fB)
substitutes
.I t
for the first occurrence of the regular expression
.I r
in the string
.IR s .
If
.I s
is not given,
.B $0
is used.
.TP
\fBgsub(\fIr\fB, \fIt \fR[, \fIs\^\fR]\fB)
same as
.B sub
except that all occurrences of the regular expression
are replaced;
.B sub
and
.B gsub
return the number of replacements.
.TP
.BI sprintf( fmt , " expr" , " ...\fB)
the string resulting from formatting
.I expr ...
according to the
.IR printf (3)
format
.IR fmt .
.TP
.BI system( cmd )
executes
.I cmd
and returns its exit status. This will be \-1 upon error,
.IR cmd 's
exit status upon a normal exit,
256 +
.I sig
upon death-by-signal, where
.I sig
is the number of the murdering signal,
or 512 +
.I sig
if there was a core dump.
.TP
.BI tolower( str )
returns a copy of
.I str
with all upper-case characters translated to their
corresponding lower-case equivalents.
.TP
.BI toupper( str )
returns a copy of
.I str
with all lower-case characters translated to their
corresponding upper-case equivalents.
.PD
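.PP
A short sketch exercising several of the string functions listed above
(the strings and the names
.BR s ,
.BR parts ,
.BR n ,
and
.B k
are arbitrary choices for illustration):
.PP
.EX
awk 'BEGIN {
    s = "hello world"
    print substr(s, 1, 5)
    print index(s, "wor")
    if (match(s, /o+/)) print RSTART, RLENGTH
    n = split("a:b:c", parts, ":")
    print n, parts[2]
    k = gsub(/o/, "0", s)
    print k, s
    print toupper(substr(s, 1, 1)) substr(s, 2)
}'
.EE
.PP
The six lines of output should be
.BR hello ,
.BR 7 ,
.BR "5 1" ,
.BR "3 b" ,
.BR "2 hell0 w0rld" ,
and
.BR "Hell0 w0rld" .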
.PP
The ``function''
.B getline
sets
.B $0
to the next input record from the current input file;
.B getline
.BI < " file
sets
.B $0
to the next record from
.IR file .
.B getline
.I x
sets variable
.I x
instead.
Finally,
.IB cmd " | getline
pipes the output of
.I cmd
into
.BR getline ;
each call of
.B getline
returns the next line of output from
.IR cmd .
In all cases,
.B getline
returns 1 for a successful input,
0 for end of file, and \-1 for an error.
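.PP
A minimal sketch of reading command output with
.B getline
(the command string and the variable names are illustrative assumptions).
Comparing the return value with 0 stops the loop on both end of file and error:
.PP
.EX
awk 'BEGIN {
    cmd = "echo one; echo two"
    while ((cmd | getline line) > 0) n++
    close(cmd)
    print n, line
}'
.EE
.PP
This should print
.BR "2 two" ,
since
.B line
keeps the last record that was read successfully.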
.PP
Patterns are arbitrary Boolean combinations
(with
.BR "! || &&" )
of regular expressions and
relational expressions.
Regular expressions are as in
.IR egrep ;
see
.IR grep (1).
Isolated regular expressions
in a pattern apply to the entire line.
Regular expressions may also occur in
relational expressions, using the operators
.B ~
and
.BR !~ .
.BI / re /
is a constant regular expression;
any string (constant or variable) may be used
as a regular expression, except in the position of an isolated regular expression
in a pattern.
.PP
A pattern may consist of two patterns separated by a comma;
in this case, the action is performed for all lines
from an occurrence of the first pattern
through an occurrence of the second.
.PP
A relational expression is one of the following:
.IP
.I expression matchop regular-expression
.br
.I expression relop expression
.br
.IB expression " in " array-name
.br
.BI ( expr , expr,... ") in " array-name
.PP
where a
.I relop
is any of the six relational operators in C,
and a
.I matchop
is either
.B ~
(matches)
or
.B !~
(does not match).
A conditional is an arithmetic expression,
a relational expression,
or a Boolean combination
of these.
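.PP
As an illustration, the pattern below combines a relational expression
and a match expression with
.BR && ;
the input records are invented for the example:
.PP
.EX
{ echo 'ann 42'; echo 'bob 7'; echo 'amy 3'; } |
awk '$2 > 5 && $1 ~ /^a/ { print $1 }'
.EE
.PP
Only the first record satisfies both conditions, so the output should be
.BR ann .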
.PP
The special patterns
.B BEGIN
and
.B END
may be used to capture control before the first input line is read
and after the last.
.B BEGIN
and
.B END
do not combine with other patterns.
They may appear multiple times in a program and execute
in the order they are read by
.IR awk .
.PP
Variable names with special meanings:
.TF FILENAME
.TP
.B ARGC
argument count, assignable.
.TP
.B ARGV
argument array, assignable;
non-null members are taken as filenames.
.TP
.B CONVFMT
conversion format used when converting numbers
(default
.BR "%.6g" ).
.TP
.B ENVIRON
array of environment variables; subscripts are names.
.TP
.B FILENAME
the name of the current input file.
.TP
.B FNR
ordinal number of the current record in the current file.
.TP
.B FS
regular expression used to separate fields; also settable
by option
.BI \-F fs\fR.
.TP
.BR NF
number of fields in the current record.
.TP
.B NR
ordinal number of the current record.
.TP
.B OFMT
output format for numbers (default
.BR "%.6g" ).
.TP
.B OFS
output field separator (default space).
.TP
.B ORS
output record separator (default newline).
.TP
.B RLENGTH
the length of a string matched by
.BR match .
.TP
.B RS
input record separator (default newline).
If empty, blank lines separate records.
If more than one character long,
.B RS
is treated as a regular expression, and records are
separated by text matching the expression.
.TP
.B RSTART
the start position of a string matched by
.BR match .
.TP
.B SUBSEP
separates multiple subscripts (default octal 034).
.PD
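.PP
A brief sketch showing a few of these variables together
(the input and the choice of
.B OFS
are assumptions for illustration):
.PP
.EX
{ echo 'a b c'; echo 'd e'; } |
awk 'BEGIN { OFS = "-" } { print NR, NF, $NF }'
.EE
.PP
This should print
.B 1-3-c
and then
.BR 2-2-e .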
.PP
Functions may be defined (at the position of a pattern-action statement) thus:
.IP
.B
function foo(a, b, c) { ...; return x }
.PP
Parameters are passed by value if scalar and by reference if array name;
functions may be called recursively.
Parameters are local to the function; all other variables are global.
Thus local variables may be created by providing excess parameters in
the function definition.
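.PP
The sketch below illustrates both rules; the function names
.B fact
and
.B fill
are hypothetical and exist only for this example.
The extra parameters
.B i
and
.B r
act as locals, so the global
.B i
is untouched, while the array argument is modified in place:
.PP
.EX
awk '
function fact(n,    i, r) {      # i and r are extra parameters used as locals
    r = 1
    for (i = 2; i <= n; i++) r *= i
    return r
}
function fill(arr) { arr["k"] = "set" }   # arrays are passed by reference
BEGIN { i = 99; fill(a); print fact(5), i, a["k"] }'
.EE
.PP
The expected output is
.BR "120 99 set" .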
.SH ENVIRONMENT VARIABLES
If
.B POSIXLY_CORRECT
is set in the environment, then
.I awk
follows the POSIX rules for
.B sub
and
.B gsub
with respect to consecutive backslashes and ampersands.
.SH EXAMPLES
.TP
.EX
length($0) > 72
.EE
Print lines longer than 72 characters.
.TP
.EX
{ print $2, $1 }
.EE
Print first two fields in opposite order.
.PP
.EX
BEGIN { FS = ",[ \et]*|[ \et]+" }
{ print $2, $1 }
.EE
.ns
.IP
Same, with input fields separated by comma and/or spaces and tabs.
.PP
.EX
.nf
{ s += $1 }
END { print "sum is", s, " average is", s/NR }
.fi
.EE
.ns
.IP
Add up first column, print sum and average.
.TP
.EX
/start/, /stop/
.EE
Print all lines between start/stop pairs.
.PP
.EX
.nf
BEGIN { # Simulate echo(1)
    for (i = 1; i < ARGC; i++) printf "%s ", ARGV[i]
    printf "\en"
    exit }
.fi
.EE
.SH SEE ALSO
.IR grep (1),
.IR lex (1),
.IR sed (1)
.br
A. V. Aho, B. W. Kernighan, P. J. Weinberger,
.IR "The AWK Programming Language" ,
Addison-Wesley, 1988. ISBN 0-201-07981-X.
.SH BUGS
There are no explicit conversions between numbers and strings.
To force an expression to be treated as a number add 0 to it;
to force it to be treated as a string concatenate
\&\f(CW""\fP to it.
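.PP
A small sketch of why forcing the type matters (the values are
arbitrary): two string constants compare as strings, while adding 0
makes the comparison numeric:
.PP
.EX
awk 'BEGIN {
    x = "10"; y = "9"
    s = (x < y)
    n = (x + 0 < y + 0)
    print s, n
}'
.EE
.PP
This should print
.BR "1 0" ,
because the string "10" sorts before "9" but the number 10 is not less than 9.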
.PP
The scope rules for variables in functions are a botch;
the syntax is worse.
.PP
Only eight-bit character sets are handled correctly.
.SH UNUSUAL FLOATING-POINT VALUES
.I Awk
was designed before IEEE 754 arithmetic defined Not-A-Number (NaN)
and Infinity values, which are supported by all modern floating-point
hardware.
.PP
Because
.I awk
uses
.IR strtod (3)
and
.IR atof (3)
to convert string values to double-precision floating-point values,
modern C libraries also convert strings starting with
.B inf
and
.B nan
into infinity and NaN values respectively. This led to strange results,
with something like this:
.PP
.EX
.nf
echo nancy | awk '{ print $1 + 0 }'
.fi
.EE
.PP
printing
.B nan
instead of zero.
.PP
.I Awk
now follows GNU AWK, and prefilters string values before attempting
to convert them to numbers, as follows:
.TP
.I "Hexadecimal values"
Hexadecimal values (allowed since C99) convert to zero, as they did
prior to C99.
.TP
.I "NaN values"
The two strings
.B +nan
and
.B \-nan
(case independent) convert to NaN. No others do.
(NaNs can have signs.)
.TP
.I "Infinity values"
The two strings
.B +inf
and
.B \-inf
(case independent) convert to positive and negative infinity, respectively.
No others do.