Linux awk Command Explained

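awk is a powerful text-analysis tool. Where grep searches and sed edits, awk is strongest at analyzing data and generating reports. Put simply, awk reads a file line by line, splits each line into fields using whitespace as the default separator, and then analyzes and processes those fields.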

There are three versions of awk: awk, nawk, and gawk. Unless stated otherwise, awk here generally means gawk, the GNU version of AWK.

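The name awk comes from the first letters of the surnames of its creators, Alfred Aho, Peter Weinberger, and Brian Kernighan. AWK is in fact a language of its own, the AWK programming language, which its three creators formally defined as a "pattern scanning and processing language". It lets you write short programs that read input files, sort data, process data, perform calculations on the input, and generate reports, among many other things.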

Usage

Although the operations can be complex, the syntax always has the same form, awk 'pattern { action }' filenames, where pattern is what awk looks for in the data and action is a series of commands executed when a match is found. The curly braces ({}) do not have to appear in every program; they group a series of instructions that belong to a particular pattern. A pattern is typically a regular expression enclosed in slashes.
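A minimal sketch of the pattern and action form follows. The input lines and the variable names are invented for illustration and do not come from the original article.

```shell
# Hypothetical input: three lines of "name score".
input='alice 90
bob 72
carol 85'

# The pattern /o/ selects lines that contain the letter o;
# the action in braces prints the first field of each selected line.
matched=$(printf '%s\n' "$input" | awk '/o/ { print $1 }')
printf '%s\n' "$matched"
```

Only the lines for bob and carol contain an o, so only those two names are printed.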

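The most basic function of the awk language is to browse and extract information from files or strings according to specified rules; only after awk has extracted the information can other text operations follow. Complete awk scripts are typically used to format the information in text files.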

Normally awk processes a file one line at a time: it receives each line of the file and then runs the corresponding commands to process the text.

Calling awk

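There are three ways to call awk: directly on the command line, from a script file whose first line names awk as the interpreter, or by putting the awk program in a separate file and passing it with the -f option.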
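As a rough sketch, the command-line form and the -f form can be compared side by side. The file names below are hypothetical and live in a temporary directory; the interpreter-line form is left out because the path to awk differs between systems.

```shell
# Hypothetical file names; mktemp keeps the sketch away from real files.
workdir=$(mktemp -d)
printf '%s\n' '{ print $1 }' > "$workdir/first_field.awk"
printf '%s\n' 'alpha beta' 'gamma delta' > "$workdir/input.txt"

# Program given directly on the command line.
inline=$(awk '{ print $1 }' "$workdir/input.txt")

# The same program read from a separate file with -f.
from_file=$(awk -f "$workdir/first_field.awk" "$workdir/input.txt")

printf '%s\n' "$inline" "$from_file"
rm -r "$workdir"
```

Both invocations run the same program, so they print the same two words.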

This chapter focuses on the command-line approach.

Getting started examples

Suppose we start from the output of last -n 5, which lists the five most recent logins, one per line.

To show only the accounts of the five most recent logins, print the first field of each line, as in last -n 5 | awk '{print $1}'.
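The sketch below imitates that pipeline with a fixed stand-in for the output of last, so that it behaves the same on any machine. The user names, terminals, and addresses are invented.

```shell
# Hypothetical stand-in for the output of `last -n 5` (all values invented).
last_output='root     pts/1   192.168.1.100  Tue Feb 10 11:21   still logged in
root     pts/1   192.168.1.100  Tue Feb 10 00:46 - 02:28  (01:41)
alice    pts/0   192.168.1.101  Mon Feb  9 11:41 - 18:30  (06:48)'

# On a real system: last -n 5 | awk '{ print $1 }'
accounts=$(printf '%s\n' "$last_output" | awk '{ print $1 }')
printf '%s\n' "$accounts"
```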

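The awk workflow is as follows: read one record, delimited by the newline character '\n', then split the record into fields according to the field separator and populate the fields. $0 is the whole record, $1 is the first field, and $n is the nth field. The default field separator is whitespace (spaces or tabs), so $1 is the login user, $3 is the login IP, and so on.

To show only the accounts in /etc/passwd, print the first field with ':' as the field separator, as in awk -F: '{print $1}' /etc/passwd.

This is an example of awk + action, where the action {print $1} is executed for every line.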
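A small sketch of the same command, run against two invented lines in /etc/passwd format rather than the real file, so the result does not depend on the machine:

```shell
# Hypothetical sample in /etc/passwd format (7 colon-separated fields).
passwd_sample='root:x:0:0:root:/root:/bin/bash
daemon:x:1:1:daemon:/usr/sbin:/usr/sbin/nologin'

# On a real system: awk -F: '{ print $1 }' /etc/passwd
names=$(printf '%s\n' "$passwd_sample" | awk -F: '{ print $1 }')
printf '%s\n' "$names"
```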

-F sets the field separator to ':'.

To show only the accounts in /etc/passwd and each account's shell, separated by a tab, print $1 and $7 with "\t" between them.

To show the accounts and their shells separated by a comma, with the column header name,shell printed before all lines and "blue,/bin/nosh" added as the last line, use a BEGIN block for the header and an END block for the last line.

In this case the workflow is: first execute BEGIN, then read the file. awk reads one record delimited by the newline character \n, splits it into fields according to the field separator, and populates the fields ($0 is the whole record, $1 the first field, $n the nth field), then executes the action associated with the pattern. It then reads the second record, and so on until every record has been read, and finally executes END.
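Both tasks can be sketched as follows, again on an invented two-line sample instead of the real /etc/passwd:

```shell
# Hypothetical sample in /etc/passwd format.
passwd_sample='root:x:0:0:root:/root:/bin/bash
daemon:x:1:1:daemon:/usr/sbin:/usr/sbin/nologin'

# Account and shell separated by a tab.
tabbed=$(printf '%s\n' "$passwd_sample" | awk -F: '{ print $1 "\t" $7 }')

# BEGIN runs before the first record, END after the last one.
report=$(printf '%s\n' "$passwd_sample" | awk -F: '
    BEGIN { print "name,shell" }
          { print $1 "," $7 }
    END   { print "blue,/bin/nosh" }')

printf '%s\n' "$tabbed" "$report"
```

The report starts with the header line, lists one account per line, and ends with the extra line printed by END.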

Search /etc/passwd for all lines containing the keyword root, as in awk -F: '/root/' /etc/passwd.

This is an example of using a pattern: only lines that match the pattern (here, root) execute the action. No action is specified, so the default action prints each matching line.
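A short sketch of a pattern without an action, using three invented lines in /etc/passwd format:

```shell
# Hypothetical sample; the operator line mentions root only in its home directory.
passwd_sample='root:x:0:0:root:/root:/bin/bash
daemon:x:1:1:daemon:/usr/sbin:/usr/sbin/nologin
operator:x:11:0:operator:/root:/sbin/nologin'

# No action given, so every line that matches /root/ is printed unchanged.
matches=$(printf '%s\n' "$passwd_sample" | awk -F: '/root/')
printf '%s\n' "$matches"
```

The daemon line does not contain root anywhere, so it is the only line left out.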

The search pattern supports regular expressions. For example, to find lines that begin with root: awk -F: '/^root/' /etc/passwd

Search /etc/passwd for all lines containing the keyword root and display the corresponding shell, as in awk -F: '/root/{print $7}' /etc/passwd. Here the action {print $7} is specified.
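The difference between an unanchored and an anchored pattern, combined with an explicit action, can be sketched on an invented sample like this:

```shell
# Hypothetical sample in /etc/passwd format.
passwd_sample='root:x:0:0:root:/root:/bin/bash
daemon:x:1:1:daemon:/usr/sbin:/usr/sbin/nologin
operator:x:11:0:operator:/root:/sbin/nologin'

# Shell of every line that contains root anywhere.
shells=$(printf '%s\n' "$passwd_sample" | awk -F: '/root/ { print $7 }')

# Shell of lines that begin with root.
anchored=$(printf '%s\n' "$passwd_sample" | awk -F: '/^root/ { print $7 }')

printf '%s\n' "$shells" "$anchored"
```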

awk built-in variables

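awk has many built-in variables that hold environment information, and these variables can be changed. The most commonly used include FILENAME (the current file name), NR (the number of records read so far), NF (the number of fields in the current record), FS (the input field separator), OFS (the output field separator), RS (the input record separator), and ORS (the output record separator).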

Print statistics for /etc/passwd using FILENAME, NR, NF, and $0: the file name, the line number, the number of columns in each line, and the full content of each line.
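A sketch of such a report follows. It reads a small temporary file with invented content, because FILENAME is only meaningful when awk is given a file name; the labels in the output are an arbitrary choice.

```shell
# Hypothetical sample file in /etc/passwd format.
workdir=$(mktemp -d)
sample_file="$workdir/passwd.sample"
printf '%s\n' 'root:x:0:0:root:/root:/bin/bash' \
              'daemon:x:1:1:daemon:/usr/sbin:/usr/sbin/nologin' > "$sample_file"

# On a real system the last argument would be /etc/passwd.
stats=$(awk -F: '{ print "filename:" FILENAME ",linenumber:" NR ",columns:" NF ",linecontent:" $0 }' "$sample_file")
printf '%s\n' "$stats"
rm -r "$workdir"
```

Each sample line has seven colon-separated fields, so NF is 7 on every line while NR counts up from 1.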

Using printf instead of print makes the same code more concise and easier to read.
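The same kind of report written with printf might look like the sketch below. The temporary file, its content, and the output labels are assumptions made for the example.

```shell
# Hypothetical sample file in /etc/passwd format.
workdir=$(mktemp -d)
sample_file="$workdir/passwd.sample"
printf '%s\n' 'root:x:0:0:root:/root:/bin/bash' > "$sample_file"

# One format string replaces the chain of concatenated pieces.
formatted=$(awk -F: '{ printf("filename:%s,linenumber:%d,columns:%d,linecontent:%s\n", FILENAME, NR, NF, $0) }' "$sample_file")
printf '%s\n' "$formatted"
rm -r "$workdir"
```

Unlike print, printf does not add a newline by itself, which is why the format string ends with \n.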

print and printf

awk provides two ways to print output: print and printf.

The arguments to print can be variables, numeric values, or strings. Strings must be enclosed in double quotes, and arguments are separated by commas. Without the commas, the arguments are concatenated and can no longer be told apart. The comma here plays the role of the output field separator, which is a space by default.
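The effect of the comma can be seen in a tiny sketch; the input line "a b" is made up:

```shell
# With a comma, the arguments are joined by the output field separator (a space by default).
with_comma=$(printf 'a b\n' | awk '{ print $1, $2 }')

# Without a comma, the arguments are concatenated.
no_comma=$(printf 'a b\n' | awk '{ print $1 $2 }')

# Changing OFS changes what the comma produces.
custom_ofs=$(printf 'a b\n' | awk 'BEGIN { OFS = "-" } { print $1, $2 }')

printf '%s\n' "$with_comma" "$no_comma" "$custom_ofs"
```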

printf is used much like printf in C and can format strings. When the output is complex, printf is easier to use and makes the code easier to understand.

awk programming

In addition to its built-in variables, awk lets you define your own variables.

The following counts the number of accounts in /etc/passwd.

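count is a user-defined variable. The earlier action blocks each contained a single print, but print is just one statement, and an action {} can hold several statements separated by semicolons.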
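A sketch of the counting example, run on two invented lines in /etc/passwd format rather than the real file:

```shell
# Hypothetical sample in /etc/passwd format.
passwd_sample='root:x:0:0:root:/root:/bin/bash
daemon:x:1:1:daemon:/usr/sbin:/usr/sbin/nologin'

# Two statements in one action: increment the counter, then print the line.
counted=$(printf '%s\n' "$passwd_sample" | awk '{ count++; print $0 } END { print "user count is", count }')
printf '%s\n' "$counted"
```

The last line of the output reports one account per input line, here 2.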

count is not initialized here. Although it defaults to 0, the proper approach is to initialize it to 0 explicitly, for example in a BEGIN block.
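With explicit initialization the same count could be written as in this sketch; the sample lines and the [start] and [end] labels are illustrative choices:

```shell
# Hypothetical sample in /etc/passwd format.
passwd_sample='root:x:0:0:root:/root:/bin/bash
daemon:x:1:1:daemon:/usr/sbin:/usr/sbin/nologin'

initialized=$(printf '%s\n' "$passwd_sample" | awk '
    BEGIN { count = 0; print "[start] user count is", count }
          { count = count + 1; print $0 }
    END   { print "[end] user count is", count }')
printf '%s\n' "$initialized"
```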

Count the number of bytes occupied by the files in a directory by summing the size column ($5) of ls -l.
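A sketch of that sum, using a fixed and entirely invented ls -l listing so the total is predictable:

```shell
# Hypothetical stand-in for the output of `ls -l` (names and sizes invented).
listing='total 24
-rw-r--r-- 1 alice alice 1048576 Jan  1 10:00 data.bin
drwxr-xr-x 2 alice alice    4096 Jan  1 10:00 subdir
-rw-r--r-- 1 alice alice 2097152 Jan  1 10:00 video.bin'

# On a real system: ls -l | awk '...'
# The "total" line has no fifth field, so it adds 0 to the sum.
total_bytes=$(printf '%s\n' "$listing" | awk 'BEGIN { size = 0 } { size = size + $5 } END { print "[end] size is", size }')
printf '%s\n' "$total_bytes"
```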

To display the total in MB, divide it by 1024 twice.
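The conversion might be sketched as below, on an invented ls -l listing. Using printf with two decimal places is a choice made for this example so that the output has a fixed shape.

```shell
# Hypothetical stand-in for the output of `ls -l` (names and sizes invented).
listing='total 24
-rw-r--r-- 1 alice alice 1048576 Jan  1 10:00 data.bin
drwxr-xr-x 2 alice alice    4096 Jan  1 10:00 subdir
-rw-r--r-- 1 alice alice 2097152 Jan  1 10:00 video.bin'

# Sum the size column, then divide by 1024 twice to convert bytes to MB.
total_mb=$(printf '%s\n' "$listing" | awk 'BEGIN { size = 0 } { size += $5 } END { printf("[end] size is %.2f M\n", size / 1024 / 1024) }')
printf '%s\n' "$total_mb"
```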

Note that the total does not include the contents of the directory's subdirectories.

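Conditional statements in awk are borrowed from C: if (expression) statement, optionally followed by else if and else branches, with curly braces grouping multiple statements.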

Count the number of bytes occupied by the files in a directory, filtering out entries with a size of 4096 (usually directories).
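A sketch of the filtered sum with an if statement, on an invented ls -l listing:

```shell
# Hypothetical stand-in for the output of `ls -l` (names and sizes invented).
listing='total 24
-rw-r--r-- 1 alice alice 1048576 Jan  1 10:00 data.bin
drwxr-xr-x 2 alice alice    4096 Jan  1 10:00 subdir
-rw-r--r-- 1 alice alice 2097152 Jan  1 10:00 video.bin'

# Skip entries whose size is exactly 4096 bytes (usually directories).
filtered=$(printf '%s\n' "$listing" | awk '
    BEGIN { size = 0 }
    { if ($5 != 4096) size += $5 }
    END { print "[end] size is", size }')
printf '%s\n' "$filtered"
```

A regular file that happens to be exactly 4096 bytes long would be skipped as well, so this filter is only a heuristic.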

Loops in awk are likewise borrowed from C: while, do/while, for, break, and continue are supported, and these keywords have the same semantics as in C.
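Two short sketches of these loops follow; the input line of four numbers is made up for the example.

```shell
# for loop: add up every field on the line.
field_sum=$(printf '1 2 3 4\n' | awk '{
    total = 0
    for (i = 1; i <= NF; i++) total += $i
    print total
}')

# while loop with break: stop at the first field greater than 2.
first_big=$(printf '1 2 3 4\n' | awk '{
    i = 1
    while (i <= NF) {
        if ($i > 2) break
        i++
    }
    print $i
}')

printf '%s\n' "$field_sum" "$first_big"
```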

Because array subscripts in awk can be numbers or strings, a subscript is usually called a key. Keys and their values are stored in an internal hash table. Because a hash table is not stored sequentially, the contents of an array may not be displayed in the order you expect. Arrays, like variables, are created automatically when they are first used, and awk also determines automatically whether they hold numbers or strings. In general, arrays in awk are used to collect information from records: computing sums, counting words, tracking how many times a pattern has matched, and so on.
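As a sketch of string keys, the following counts how many accounts use each shell in an invented sample. Piping the result through sort is an addition for this example, needed because for (key in array) visits keys in no guaranteed order.

```shell
# Hypothetical sample in /etc/passwd format.
passwd_sample='root:x:0:0:root:/root:/bin/bash
daemon:x:1:1:daemon:/usr/sbin:/usr/sbin/nologin
alice:x:1000:1000:Alice:/home/alice:/bin/bash'

# users[shell] is created on first use and incremented once per account.
shell_counts=$(printf '%s\n' "$passwd_sample" |
    awk -F: '{ users[$7]++ } END { for (shell in users) print shell, users[shell] }' |
    LC_ALL=C sort)
printf '%s\n' "$shell_counts"
```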

Show the accounts in /etc/passwd by storing them in an array and printing the array at the end.
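One way to sketch this, assuming numeric keys that start at 0 and an invented two-line sample in place of the real /etc/passwd:

```shell
# Hypothetical sample in /etc/passwd format.
passwd_sample='root:x:0:0:root:/root:/bin/bash
daemon:x:1:1:daemon:/usr/sbin:/usr/sbin/nologin'

# Store each account under a numeric key, then walk the keys in order in END.
indexed=$(printf '%s\n' "$passwd_sample" | awk -F: '
    BEGIN { count = 0 }
          { name[count] = $1; count++ }
    END   { for (i = 0; i < NR; i++) print i, name[i] }')
printf '%s\n' "$indexed"
```

Looping over the numeric keys with a C-style for loop keeps the accounts in input order, which for (key in array) would not guarantee.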

Original: https://www.cnblogs.com/yymn/p/5675995.html
Author: 菜鸡一枚
Title: linux awk命令详解
