Awk
Awk, whose name comes from the initials of its three creators, Alfred V. Aho, Peter J. Weinberger and Brian W. Kernighan, is a line-oriented text-processing language available on most Unix systems, and on Windows notably through Cygwin. It is mainly used to process text files: searching, complex substitutions and transformations. It acts as a programmable filter: it takes a stream of lines as input (from files or directly from standard input) and writes to standard output, which can be redirected to a file or piped to other programs. Awk reads its input line by line and selects (or not) the lines to process using regular expressions (and possibly line numbers). Once a line is selected, it is split into fields according to an input field separator, set in an Awk program through the variable FS (by default a single space, which makes any run of spaces and tabs act as the separator). The fields are then available in the variables $1 (first field), $2 (second field), $3 (third field), …, $NF (last field).
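As an illustration of the default field splitting, here is a minimal sketch assuming a POSIX awk run from a POSIX shell; the function name split_demo and the sample line are hypothetical, chosen only for the demonstration.

```shell
# Hypothetical helper: one line with leading blanks, a run of spaces and a tab.
# With the default FS, leading blanks are ignored and any run of blanks separates fields.
split_demo() {
  printf '  alice   42\tparis\n' | awk '{ print NF; print $1; print $NF }'
}

split_demo
# 3
# alice
# paris
```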
“.awk” is also the file name extension used, though rarely, for scripts written in this language.
The syntax is inspired by C:
awk [options] 'program' [file ...]
where the program has the structure:
'pattern1 { action1 }
pattern2 { action2 } …'
Each input line is tested against each pattern in turn (usually a regular expression, and more generally any Boolean expression), and the action of every pattern that evaluates to true is executed, in program order; the next statement can be used to skip the remaining patterns. A pattern without an action prints the matching line, and an action without a pattern is applied to every line. Once all patterns have been tried, the program reads the next line of the file and tests it against the patterns again, starting from the first.
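The following sketch, assuming a POSIX awk and shell, shows that every matching pattern fires, not only the first; classify is a hypothetical name and the two thresholds are arbitrary.

```shell
# Hypothetical helper: two independent rules; a single line can trigger both.
classify() {
  awk '
    $1 > 1  { print "big:", $1 }
    $1 > 10 { print "huge:", $1 }
  '
}

printf '5\n15\n' | classify
# big: 5
# big: 15
# huge: 15
```

Adding next at the end of the first action would stop line 15 from reaching the second rule, so huge: 15 would no longer be printed.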
Some options:
- -F separator: changes the field separator;
- -f file: reads the program from a file.
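A sketch combining both options, assuming a POSIX shell with the widely available (but non-POSIX) mktemp utility; the helper name first_fields and the sample input are hypothetical.

```shell
# Hypothetical helper: store a two-rule Awk program in a temporary file,
# run it with -f, and use -F: so that fields are separated by colons.
first_fields() {
  prog=$(mktemp) || return 1
  cat > "$prog" <<'EOF'
{ print $1 }
END { print NR " lines" }
EOF
  awk -F: -f "$prog"
  status=$?
  rm -f "$prog"
  return "$status"
}

printf 'root:x:0\nalice:x:1000\n' | first_fields
# root
# alice
# 2 lines
```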
Detailed description
A file is divided into lines (records), which are themselves divided into fields:
- records: separator newline; counter NR (number of records read so far);
- fields: separator spaces or tabs; counter NF (number of fields in the current record).
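To see both counters at work, here is a minimal sketch assuming a POSIX awk and shell (count_demo is a hypothetical name); note that an empty line is still a record, with zero fields.

```shell
# Hypothetical helper: three records, the last one empty.
count_demo() {
  printf 'a b\nc d e\n\n' | awk '{ print NR, NF }'
}

count_demo
# 1 2
# 2 3
# 3 0
```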
The input and output separators are stored in variables and can be modified:
- records: variables RS and ORS;
- fields: variables FS and OFS.
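The sketch below, assuming a POSIX awk and shell, reads colon-separated fields and writes them dash-separated, ending each output record with a semicolon (ORS) instead of a newline; resep is a hypothetical name.

```shell
# Hypothetical helper: change FS, OFS and ORS in the BEGIN block.
# Assigning $1 = $1 forces awk to rebuild $0 using OFS; without it,
# print would output each line with its original colons.
resep() {
  printf 'a:b:c\nd:e:f\n' |
    awk 'BEGIN { FS = ":"; OFS = "-"; ORS = ";" } { $1 = $1; print }'
}

resep   # prints a-b-c;d-e-f; with no final newline
```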
To access a field:
- $n, where n is a strictly positive integer;
- $0 returns the whole line.
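A short sketch, assuming a POSIX awk and shell (field_demo is a hypothetical name); it also shows that the number after $ can be any expression, such as NF - 1.

```shell
# Hypothetical helper: second field, next-to-last field, then the whole line.
field_demo() {
  echo "one two three four" | awk '{ print $2; print $(NF - 1); print $0 }'
}

field_demo
# two
# three
# one two three four
```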
Two special patterns:
- BEGIN: its action is executed before any input is read;
- END: its action is executed after all input has been read.
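A minimal sketch of the two special patterns, assuming a POSIX awk and shell; report is a hypothetical name. Adding 0 to n makes END print 0 rather than an empty string when the input is empty.

```shell
# Hypothetical helper: BEGIN runs before the first line, END after the last one.
report() {
  awk 'BEGIN { print "start" } { n++ } END { print "lines:", n + 0 }'
}

printf 'x\ny\nz\n' | report
# start
# lines: 3
```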
To select a range of lines, separate two patterns with a comma:
- NR == 1, NR == 10: the associated action is applied to lines 1 through 10.
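The range form can be tried as follows, assuming a POSIX awk and shell (range_demo is a hypothetical name). The pattern has no action here, so awk simply prints the selected lines.

```shell
range_demo() {
  # Select lines 2 through 4 of a six-line input.
  printf '%s\n' a b c d e f | awk 'NR == 2, NR == 4'
}

range_demo
# b
# c
# d
```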
Several functions are built in:
- print, printf: output statements;
- cos(expr), sin(expr), exp(expr), log(expr): mathematical functions;
- getline: reads the next input line; returns 1 on success, 0 at end of file (EOF), and -1 on error;
- index(s1, s2): returns the position of string s2 in s1, or 0 if s2 does not occur in s1;
- int(expr): integer part of an expression;
- length(s): length of string s;
- substr(s, n, l): returns the substring of s of length l starting at position n (positions start at 1).
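A sketch exercising several of these functions, assuming a POSIX awk and shell; string_demo and pair_lines are hypothetical names. The second function uses getline to join lines in pairs, relying on its return value to detect the end of the input.

```shell
# Hypothetical helper: string and numeric built-ins on constant data.
string_demo() {
  awk 'BEGIN {
    s = "hello world"
    print length(s)           # 11
    print index(s, "world")   # 7
    print substr(s, 1, 5)     # hello
    print int(3.9)            # 3
    printf "%.2f\n", exp(0)   # 1.00
  }'
}

# Hypothetical helper: read the next line with getline and print lines in pairs.
pair_lines() {
  awk '{
    first = $0
    if ((getline second) > 0) {
      print first " + " second
    } else {
      print first
    }
  }'
}

string_demo
printf '1\n2\n3\n' | pair_lines
# 1 + 2
# 3
```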
Control structures: the syntax comes directly from C:
- if (test) {actions} else {actions}
- while (test) {actions}
- do {actions} while (test)
- for (expr1; expr2; expr3) {actions}
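A sketch using three of these structures, assuming a POSIX awk and shell (control_demo is a hypothetical name): a for loop sums the fields of each line, if/else labels the total, and a do/while loop in the END block counts down.

```shell
control_demo() {
  awk '{
    total = 0
    for (i = 1; i <= NF; i++) total += $i   # sum every field of the line
    if (total > 10) { label = "large" } else { label = "small" }
    print NR ": " label " (" total ")"
  }
  END {
    i = 3
    do { print "countdown:", i; i-- } while (i > 0)
  }'
}

printf '1 2 3\n4 5 6\n' | control_demo
# 1: small (6)
# 2: large (15)
# countdown: 3
# countdown: 2
# countdown: 1
```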
Some examples
- awk '{print $0}' file: prints every line of file (same as cat file).
- awk '/2/ {print $0}' ref.txt: prints every line containing the character 2 (same as grep "2" ref.txt).
- awk '$1 ~ /2/ {print $0}' ref.txt: prints every line whose first field contains the character 2.
- awk '{print NR ": " $0}' file: prints the contents of file, each line preceded by its number.
- awk -F: '{print $1}' /etc/passwd: prints the list of users (same as cut -d: -f1 /etc/passwd).
- awk 'BEGIN {FS = ":"} {print $1}' /etc/passwd: same as the previous command.
- awk '{s = s + $1} END {print s}' file: prints the sum of all the numbers in the first column of file.
Implementations
Several programs implement the syntax of the original Awk; the best known are:
- nawk (short for new awk), which extends the functionality of the original version;
- mawk, a version known for its speed in some cases;
- gawk, the GNU version, available on the main operating systems, with an extension that allows it to communicate over TCP/IP networks.