cut processes its input one line at a time, treating each line as a separate unit; sed works the same way.
1. What does cut use to locate what it extracts? In other words, how do I tell cut which part of a line I want?
cut supports three ways of locating what to extract:
- First, bytes, with the -b option
- Second, characters, with the -c option
- Third, fields, with the -f option
2. What is the simplest example of positioning by byte?
For example, running the who command produces output similar to the following:
[rocrocket@rocrocket programming]$ who
rocrocket :0 2009-01-08 11:07
rocrocket pts/0 2009-01-08 11:23 (:0.0)
rocrocket pts/1 2009-01-08 14:15 (:0.0)
To extract the third byte of each line:
[rocrocket@rocrocket programming]$ who|cut -b 3
c
c
c
As you can see, the number after -b is the byte position to extract. The space between -b and 3 is actually optional, but adding it is recommended.
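You can also try this without relying on who by running the same extraction on fixed text. This is a minimal sketch. The sample_who function and its lines are hypothetical stand-ins modeled on the who output above:

```shell
# Hypothetical stand-in for `who` output, so the example runs anywhere.
sample_who() {
    printf '%s\n' \
        'rocrocket :0           2009-01-08 11:07' \
        'rocrocket pts/0        2009-01-08 11:23 (:0.0)' \
        'rocrocket pts/1        2009-01-08 14:15 (:0.0)'
}

# Third byte of every line: the "c" in "rocrocket".
sample_who | cut -b 3

# The space after -b is optional; this prints the same three lines.
sample_who | cut -b3
```

Both commands print c three times, which confirms that the space after -b is optional.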
3. What if I want to extract the 3rd, 4th, 5th, and 8th bytes?
-b accepts ranges such as 3-5, and multiple positions are separated by commas. For example:
[rocrocket@rocrocket programming]$ who|cut -b 3-5,8
croe
croe
croe
Note, however, that with -b, cut first sorts all the positions listed after -b in ascending order and only then extracts them. You therefore cannot reorder the output by reordering the list. This example shows it:
[rocrocket@rocrocket programming]$ who|cut -b 8,3-5
croe
croe
croe
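The sorting rule is easy to check on a single line. This small sketch uses a hypothetical line variable copied from the who output above. Both commands print croe:

```shell
# Hypothetical sample line, copied from the who output above.
line='rocrocket pts/0 2009-01-08 11:23 (:0.0)'

# cut sorts the positions first, so both commands print "croe".
printf '%s\n' "$line" | cut -b 3-5,8
printf '%s\n' "$line" | cut -b 8,3-5
```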
4. What other shorthand like "3-5" is there? List it all!
[rocrocket@rocrocket programming]$ who
rocrocket :0 2009-01-08 11:07
rocrocket pts/0 2009-01-08 11:23 (:0.0)
rocrocket pts/1 2009-01-08 14:15 (:0.0)
[rocrocket@rocrocket programming]$ who|cut -b -3
roc
roc
roc
[rocrocket@rocrocket programming]$ who|cut -b 3-
crocket :0 2009-01-08 11:07
crocket pts/0 2009-01-08 11:23 (:0.0)
crocket pts/1 2009-01-08 14:15 (:0.0)
As you can see, -3 means from the first byte through the third, and 3- means from the third byte to the end of the line. Note that both forms include the third byte, "c".
So what happens if I run who|cut -b -3,3-?
The answer: cut prints the whole line without a doubled "c", because overlapping positions are output only once. Look:
[rocrocket@rocrocket programming]$ who|cut -b -3,3-
rocrocket :0 2009-01-08 11:07
rocrocket pts/0 2009-01-08 11:23 (:0.0)
rocrocket pts/1 2009-01-08 14:15 (:0.0)
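You can check the open-ended forms on a single word. In this sketch, the word variable is a hypothetical sample, and the trailing comments show what each command prints:

```shell
# Hypothetical sample word.
word='rocrocket'

printf '%s\n' "$word" | cut -b -3     # bytes 1 through 3: roc
printf '%s\n' "$word" | cut -b 3-     # byte 3 to end of line: crocket
printf '%s\n' "$word" | cut -b -3,3-  # byte 3 is printed once: rocrocket
```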
5. What is the simplest example of positioning by character?
[rocrocket@rocrocket programming]$ who|cut -c 3-5,8
croe
croe
croe
But then how is this any different from -b? Do -b and -c work the same? They only appear to, because this example is a poor one: who outputs single-byte characters, so -b and -c behave identically. Extract Chinese text and the difference shows up:
[rocrocket@rocrocket programming]$ cat cut_ch.txt
星期一
星期二
星期三
星期四
[rocrocket@rocrocket programming]$ cut -b 3 cut_ch.txt
�
�
�
�
[rocrocket@rocrocket programming]$ cut -c 3 cut_ch.txt
一
二
三
四
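With -c, cut counts in characters, so the third character of each line (一, 二, 三, 四: one, two, three, four) comes out intact. With -b, cut naively counts bytes (8 bits each). A single byte of a multibyte character is only a fragment, so the output is garbled. Note that this depends on your cut implementation. GNU coreutils' cut has historically treated -c the same as -b, so character-aware output like the above requires a multibyte-aware build, such as the internationalization-patched coreutils that some Linux distributions ship.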
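Why does -b garble the Chinese text? In UTF-8, each of these characters takes three bytes. This is a minimal sketch that assumes UTF-8 encoded text, and the text variable is a hypothetical sample. Bytes 1-3 together give back 星. Byte 3 alone is only the final byte of that character, so the sketch dumps it as hex with od instead of printing it:

```shell
# Assumes UTF-8: the character 星 is stored as the three bytes e6 98 9f.
text='星期一'

# Bytes 1-3 together form the complete first character.
printf '%s\n' "$text" | cut -b 1-3

# Byte 3 alone is only the last byte of 星; show it as hex instead of printing it.
printf '%s\n' "$text" | cut -b 3 | od -An -tx1
```

The first command prints 星. The second shows 9f 0a, which is the stray byte followed by the newline.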
Since we're on this topic, here is one more point for those who want to go further.
For multibyte characters there is also the -n option, which tells cut not to split a multibyte character. (GNU coreutils' cut has historically documented -n as a no-op, so this example needs a multibyte-aware cut.) For example:
[rocrocket@rocrocket programming]$ cat cut_ch.txt |cut -b 2
�
�
�
�
[rocrocket@rocrocket programming]$ cat cut_ch.txt |cut -nb 2
[rocrocket@rocrocket programming]$ cat cut_ch.txt |cut -nb 1,2,3
星
星
星
星
6. What about fields?
Why extract by field? The -b and -c options covered above only work on fixed-width text, and they are of little use when the layout is not fixed. That is where fields come in.
(The explanation below assumes you are reasonably familiar with the contents and layout of the /etc/passwd file.)
If you look at /etc/passwd, you will find that, unlike the output of who, it has no fixed layout: the entries vary in width. However, colons play a key role on every line, because they separate the items from one another.
Fortunately, cut supports exactly this kind of extraction: set the delimiter, then pick the fields to extract.
Take the first five lines of /etc/passwd as an example:
[rocrocket@rocrocket programming]$ cat /etc/passwd|head -n 5
root:x:0:0:root:/root:/bin/bash
bin:x:1:1:bin:/bin:/sbin/nologin
daemon:x:2:2:daemon:/sbin:/sbin/nologin
adm:x:3:4:adm:/var/adm:/sbin/nologin
lp:x:4:7:lp:/var/spool/lpd:/sbin/nologin
[rocrocket@rocrocket programming]$ cat /etc/passwd|head -n 5|cut -d : -f 1
root
bin
daemon
adm
lp
As you can see, -d sets the delimiter to a colon and -f 1 selects the first field. Press Enter and all the user names are listed!
Of course, -f also accepts forms like 3-5 or 4-:
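You can try the same field extraction on sample data instead of your real /etc/passwd. In this sketch, sample_passwd is a hypothetical helper that prints three passwd-format lines taken from the listing above:

```shell
# Hypothetical helper printing lines in /etc/passwd format (taken from the listing above).
sample_passwd() {
    printf '%s\n' \
        'root:x:0:0:root:/root:/bin/bash' \
        'bin:x:1:1:bin:/bin:/sbin/nologin' \
        'daemon:x:2:2:daemon:/sbin:/sbin/nologin'
}

# Colon as the delimiter; field 1 is the user name.
sample_passwd | cut -d : -f 1
```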
[rocrocket@rocrocket programming]$ cat /etc/passwd|head -n 5|cut -d : -f 1,3-5
root:0:0:root
bin:1:1:bin
daemon:2:2:daemon
adm:3:4:adm
lp:4:7:lp
[rocrocket@rocrocket programming]$ cat /etc/passwd|head -n 5|cut -d : -f 1,3-5,7
root:0:0:root:/bin/bash
bin:1:1:bin:/sbin/nologin
daemon:2:2:daemon:/sbin/nologin
adm:3:4:adm:/sbin/nologin
lp:4:7:lp:/sbin/nologin
[rocrocket@rocrocket programming]$ cat /etc/passwd|head -n 5|cut -d : -f -2
root:x
bin:x
daemon:x
adm:x
lp:x
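Here are two more behaviors worth knowing, sketched on one hypothetical passwd-style line. First, an open-ended range such as 6- runs to the last field. Second, just as with -b, fields are printed in input order no matter how the list is written:

```shell
# Hypothetical passwd-style sample line.
line='root:x:0:0:root:/root:/bin/bash'

# Open-ended range: field 6 through the last field, i.e. /root:/bin/bash
printf '%s\n' "$line" | cut -d : -f 6-

# Fields come out in input order, so "7,1" prints root:/bin/bash
printf '%s\n' "$line" | cut -d : -f 7,1
```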
7. How can I tell spaces and tabs apart? I find them a little confusing. What should I do?
Tabs can be hard to spot. Here is a way to check whether a gap is made of several spaces or a single tab:
[rocrocket@rocrocket programming]$ cat tab_space.txt
this is tab finish.
this is several space finish.
[rocrocket@rocrocket programming]$ sed -n l tab_space.txt
this is tab\tfinish.$
this is several space finish.$
As you can see, a tab is displayed as \t, while spaces are shown as they are.
This lets you tell tabs and spaces apart.
Note that the character after sed -n above is a lowercase letter L, not the digit 1. Don't misread it.
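You can reproduce the check without creating tab_space.txt. In this sketch, sample_text is a hypothetical helper that prints one tab-separated line and one line with several spaces:

```shell
# Hypothetical helper: one tab-separated line, one line with several spaces.
sample_text() {
    printf 'this is tab\tfinish.\nthis is several space   finish.\n'
}

# The l command prints each line unambiguously: a tab appears as \t, the line end as $.
sample_text | sed -n l
```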
8. What should I pass to cut -d to use a tab or a space as the delimiter?
Here's a tip: cut's default delimiter is the tab (\t). When your fields are tab-separated, you can omit -d entirely and just use -f to get the field. Trust me, it works!
To use a space as the delimiter:
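Here is a quick way to confirm the tab default on a hypothetical tab-separated line. The second command passes a literal tab to -d explicitly and gives the same result. It builds the tab with printf because not every sh supports bash's $'\t' syntax:

```shell
# No -d given: tab is the default delimiter, so this prints "beta".
printf 'alpha\tbeta\tgamma\n' | cut -f 2

# The same, passing a literal tab to -d explicitly.
tab=$(printf '\t')
printf 'alpha\tbeta\tgamma\n' | cut -d "$tab" -f 2
```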
[rocrocket@rocrocket programming]$ cat tab_space.txt |cut -d ' ' -f 1
this
this
Note that there must be a space between the two single quotes, so don't leave it out.
Also, -d accepts exactly one space, not several, because cut only allows a single-character delimiter:
[rocrocket@rocrocket programming]$ cat tab_space.txt |cut -d '  ' -f 1
cut: the delimiter must be a single character
Try `cut --help' for more information
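In a script, this rejection shows up as a nonzero exit status that you can check directly. In this sketch, note that the exact error text varies between cut implementations, so only the status is tested:

```shell
# Two spaces inside the quotes: cut rejects this delimiter and exits nonzero.
if printf 'this is\n' | cut -d '  ' -f 1 2>/dev/null; then
    echo 'accepted'
else
    echo 'rejected: the delimiter must be a single character'
fi
```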
9. What are cut's shortcomings?
Can you guess? Right: it struggles with runs of multiple spaces.
If some fields in a file are separated by several spaces, cut becomes awkward to use, because it only handles text whose fields are separated by a single delimiter character.
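Here is a quick sketch of the problem, plus one common workaround that is not part of the original discussion: squeezing each run of spaces into one with tr -s before calling cut. The line variable is a hypothetical sample:

```shell
# Hypothetical sample with runs of spaces between columns.
line='alpha   beta  gamma'

# Every single space is a delimiter, so field 2 is empty and this prints a blank line.
printf '%s\n' "$line" | cut -d ' ' -f 2

# Workaround: squeeze each run of spaces into one, then cut prints "beta".
printf '%s\n' "$line" | tr -s ' ' | cut -d ' ' -f 2
```

Note that a line starting with spaces still yields an empty first field after squeezing, because one leading space remains.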