Chapter 3

3. Text-Fu

This is the third chapter for learning Linux on Let’s Learn Linux.

Learn basic text manipulation and navigation.

Subsections of 3. Text-Fu

1. stdout (Standard Out)

Lesson Content

By now, we’ve become familiar with many commands and their output, which brings us to our next subject: I/O (input/output) streams. Let’s run the following command and then discuss how it works.

$ echo Hello World > peanuts.txt

What just happened? Check the directory where you ran that command and, lo and behold, you should see a file called peanuts.txt. Look inside that file and you should see the text Hello World. A lot happened in that one command, so let’s break it down.

Let’s start with the first part:

$ echo Hello World

We know this prints Hello World to the screen, but how? Processes use I/O streams to receive input and return output. By default, a process reads its input (standard input, or stdin) from the keyboard and writes its output (standard output, or stdout) to the screen. The echo command doesn’t read stdin at all; it simply writes its arguments to stdout, which is why typing echo Hello World in your shell puts Hello World on the screen. I/O redirection lets us change these defaults, for example by sending output to a file instead of the screen.

Let’s proceed to the next part of the command:

 > 

The > is a redirection operator that allows us to change where standard output goes. It lets us send the output of echo Hello World to a file instead of the screen. If the file does not already exist, it is created for us. If it does exist, its contents are overwritten (some shells have an option to prevent this; in Bash it is set -o noclobber).

And that’s basically how stdout redirection works!

But what if I didn’t want to overwrite my peanuts.txt? Luckily there is a redirection operator for that as well: >>

$ echo Hello World >> peanuts.txt

This appends Hello World to the end of the peanuts.txt file. If the file doesn’t already exist, it is created, just as with the > redirector.
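To see the difference between > and >> side by side, here is a minimal sketch. It assumes a POSIX-style shell and works in a throwaway directory created with mktemp, so the peanuts.txt it writes is not the one from the lesson.

```shell
# Work in a throwaway directory so no real files are touched.
cd "$(mktemp -d)"

echo "first"  > peanuts.txt    # creates the file
echo "second" > peanuts.txt    # overwrites: the file now holds only "second"
echo "third" >> peanuts.txt    # appends: the file now holds "second" and "third"

cat peanuts.txt
```

After the three commands, the file holds two lines: the overwrite discarded "first", and the append kept "second".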

Exercise

Try a couple of commands:

$ ls -l /var/log > myoutput.txt
$ echo Hello World > rm
$ > somefile.txt 

Quiz Question

# What redirector do you use to append output to a file?

> It should be noted that in Linux all these streams are treated as if they were files. Also, Linux assigns a unique value to each of these data streams: `0 = stdin, 1 = stdout, 2 = stderr`

1. [ ] add
2. [ ] append
3. [ ] \>
4. [x] \>\>

2. stdin (Standard In)

Lesson Content

In our previous lesson we learned that stdout can be sent to different places, such as a file or the screen. Standard input (stdin) can likewise come from different sources. We know that stdin can come from devices like the keyboard, but it can also come from files, from the output of other processes, and from the terminal. Let’s see an example.

Let’s use the peanuts.txt file from the previous lesson for this example; remember, it had the text Hello World in it.

$ cat < peanuts.txt > banana.txt 

Just like we had > for stdout redirection, we can use < for stdin redirection.

Normally you give cat a filename as an argument and it opens that file itself. In this case, we instead redirected peanuts.txt to be cat’s stdin, and with no filename argument cat reads from stdin. The output of cat, which is Hello World, is then redirected to another file called banana.txt.
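The two ways of feeding a file to cat can be compared directly. This is a small sketch, again run in a temporary directory so the file names are stand-ins for the ones in the lesson.

```shell
cd "$(mktemp -d)"
echo "Hello World" > peanuts.txt

# cat opens the file named in its argument.
cat peanuts.txt

# Here the shell opens peanuts.txt and connects it to cat's stdin;
# cat gets no filename and simply copies stdin to stdout,
# which in turn is redirected into banana.txt.
cat < peanuts.txt > banana.txt

cat banana.txt
```

Both forms produce the same text; the difference is only in who opens the file, cat or the shell.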

Exercise

Try out a couple of commands:

$ echo < peanuts.txt > banana.txt
$ ls < peanuts.txt > banana.txt
$ pwd < peanuts.txt > banana.txt

Quiz Question

# What redirector do you use to redirect stdin?

> stdin is an input stream where data is sent to and read by a program. It is a file descriptor in Unix-like operating systems, and programming languages, such as C, Perl, and Java.

1. [ ] \<\~
2. [ ] \<\-
3. [ ] \<\<
4. [x] \<

3. stderr (Standard Error)

Lesson Content

Let’s try something a little different now: let’s list the contents of a directory that doesn’t exist on your system and redirect the output to the peanuts.txt file again.

$ ls /fake/directory > peanuts.txt 

What you should see is:

ls: cannot access /fake/directory: No such file or directory

Now you’re probably thinking, shouldn’t that message have been sent to the file? There is actually another I/O stream in play here called standard error (stderr). By default, stderr also goes to the screen, but it is a completely separate stream from stdout, so it has to be redirected separately.

The redirector is not quite as tidy as < or >, but it’s pretty close. We have to use file descriptors. A file descriptor is a non-negative number that is used to access a file or stream. We will go in depth about this later, but for now know that the file descriptors for stdin, stdout and stderr are 0, 1, and 2 respectively.

So now if we want to redirect our stderr to the file we can do this:

$ ls /fake/directory 2> peanuts.txt

You should see just the stderr messages in peanuts.txt.

Now what if I wanted to see both stderr and stdout in the peanuts.txt file? It’s possible to do this with file descriptors as well:

$ ls /fake/directory > peanuts.txt 2>&1

This sends the output of ls /fake/directory to the peanuts.txt file and then redirects stderr to stdout via 2>&1. The order here matters: 2>&1 sends stderr to wherever stdout is pointing at that moment. Since stdout is already pointing to the file, 2>&1 sends stderr to the file as well. A command that wrote to both streams would leave both stderr and stdout in peanuts.txt; in our case, the command only produces stderr output.

In Bash (and some other shells), there is a shorter way to redirect both stdout and stderr to a file:
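The ordering rule is easier to believe once you try both orders. The sketch below assumes /fake/directory does not exist on your system; the output file names both.txt and wrong-order.txt are made up for the illustration.

```shell
cd "$(mktemp -d)"

# stdout goes to the file first, then stderr is pointed at the same place:
# the error message lands in both.txt.
# (ls exits with an error status here, hence the "|| true".)
ls /fake/directory > both.txt 2>&1 || true

# Reversed order: stderr is pointed at the current stdout (the terminal),
# and only afterwards is stdout sent to the file. The error still shows on
# the terminal and wrong-order.txt stays empty.
ls /fake/directory 2>&1 > wrong-order.txt || true

# Compare the sizes of the two files.
wc -c both.txt wrong-order.txt
```

Redirections are processed left to right, which is why swapping the two changes where the error message ends up.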

$ ls /fake/directory &> peanuts.txt

Now what if I don’t want any of that cruft and want to get rid of stderr messages completely? You can redirect output to a special file called /dev/null, which discards anything written to it.

$ ls /fake/directory 2> /dev/null

Exercise

What is the following command doing?

$ ls /fake/directory >> /dev/null 2>&1

Quiz Question

# What is the redirector for stderr?

> It would be more correct to say that `stdin, stdout, and stderr` are "I/O streams" rather than files. As you've noticed, these entities do not live in the filesystem. But the Unix philosophy, as far as I/O is concerned, is "everything is a file". In practice, that really means that you can use the same library functions and interfaces (printf, scanf, read, write, select, etc.) without worrying about whether the I/O stream is connected to a keyboard, a disk file, a socket, a pipe, or some other I/O abstraction.

1. [ ] \>\~
2. [ ] \>2
3. [ ] 1\>
4. [x] 2>

4. pipe and tee

Lesson Content

Let’s get into some plumbing now, not really but kinda. Let’s try a command:

$ ls -la /etc

You should see a very long list of items; it’s a little hard to read, actually. Instead of redirecting this output to a file, wouldn’t it be nice if we could just view the output in another command like less? Well, we can!

$ ls -la /etc | less 

The pipe operator |, represented by a vertical bar, takes the stdout of one command and makes it the stdin of another process. In this case, we took the stdout of ls -la /etc and piped it to the less command. The pipe operator is extremely useful and we will continue to use it for all eternity.

What if I wanted to write the output of my command to two different streams? That’s possible with the tee command, which copies its stdin both to stdout and to a file:

$ ls | tee peanuts.txt

You should see the output of ls on your screen and if you open up the peanuts.txt file you should see the same information!
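Because tee passes its input through to stdout, it can also sit in the middle of a pipeline. The sketch below is one way to try this; the fruit directory and listing.txt are hypothetical names, and -a is tee’s standard append flag.

```shell
cd "$(mktemp -d)"
mkdir fruit
touch fruit/apple.txt fruit/pear.txt

# tee copies its stdin to the file and to stdout, so the listing can
# continue down the pipeline (here, to wc -l) while also being saved.
ls fruit | tee listing.txt | wc -l

# -a appends to the file instead of overwriting it.
echo "one more line" | tee -a listing.txt

cat listing.txt
```

Without -a, tee behaves like > and replaces the file’s contents each time.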

Exercise

Try the following command:

$ ls | tee peanuts.txt banana.txt

Quiz Question

# What key represents the pipe operator?

> Pipes let you chain two or more commands so that the output of one becomes the input of the next. Combining small commands this way lets you perform complex tasks in a jiffy.

1. [ ] \<\^
2. [ ] \>
3. [ ] \<
4. [x] |

5. env (Environment)

Lesson Content

Run the following command:

$ echo $HOME

You should see the path to your home directory; mine looks like /home/pete.

What about this command?

$ echo $USER

You should see your username!

Where is this information coming from? It’s coming from your environment variables. You can view these by typing:

$ env

This outputs a whole lot of information about the environment variables you currently have set. These variables contain useful information that the shell and other processes can use.

Here is a short example:

PATH=/usr/local/sbin:/usr/local/bin:/usr/sbin:/bin
PWD=/home/pete
USER=pete

One particularly important variable is PATH. You can read any of these variables by putting a $ in front of the variable name, like so:

$ echo $PATH
/usr/local/sbin:/usr/local/bin:/usr/sbin:/bin

This returns a colon-separated list of directories that your shell searches when you run a command. Let’s say you manually download and install a package from the internet, put it in a non-standard directory, and want to run it. You type coolcommand and the shell says command not found. That seems silly: you are looking at the binary in a folder and know it exists. What is happening is that the directory isn’t listed in PATH, so the shell never looks there and reports an error.

If you had lots of binaries you wanted to run from that directory, you could add the directory to your PATH environment variable.
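As a rough illustration of modifying PATH, the sketch below builds a stand-in for the coolcommand scenario: the temporary directory and the tiny script are invented for the example, not a real package.

```shell
# Hypothetical setup: a directory holding a tiny script named coolcommand.
tooldir="$(mktemp -d)"
printf '#!/bin/sh\necho "cool!"\n' > "$tooldir/coolcommand"
chmod +x "$tooldir/coolcommand"

# Add the directory to the end of PATH for the current shell session only.
export PATH="$PATH:$tooldir"

# The shell now finds the script by name.
coolcommand
```

The change lasts only for the current shell session. To keep it, the export line is usually placed in a shell startup file; which file depends on your shell (for Bash, ~/.bashrc is a common choice).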

Exercise

What does the following output? Why?

$ echo $HOME

Quiz Question

# How do you see your environment variables?

> The env command allows you to display your current environment or run a specified command in a changed environment. If no flags or parameters are specified, the env command displays your current environment, showing one Name=Value pair per line.

1. [ ] show env
2. [ ] envar
3. [ ] PATH
4. [x] env

6. cut

Lesson Content

We’re going to learn a couple of useful commands for processing text. Before we get started, let’s create a file to work with. Copy and paste the following command, then add a TAB between lazy and dog (press Ctrl-v, then TAB).

$ echo 'The quick brown; fox jumps over the lazy  dog' > sample.txt

The first command we’ll learn about is cut. It extracts portions of text from a file.

To extract contents by a list of characters:

$ cut -c 5 sample.txt

This outputs the 5th character of each line of the file. In this case it is “q”; note that the space also counts as a character.

To extract the contents by field, we use a different flag:

$ cut -f 2 sample.txt

The -f (field) flag cuts text based on fields. By default it uses TABs as delimiters, so everything separated by a TAB is considered a field. You should see “dog” as your output.

You can combine the field flag with the delimiter flag to extract the contents by a custom delimiter:

$ cut -f 1 -d ";" sample.txt

This changes the delimiter from TAB to “;”, and since we are cutting the first field, the result should be “The quick brown”.
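Typing a literal TAB into a terminal can be fiddly, so here is an alternative sketch that builds the same sample.txt with printf, which turns \t into a real TAB. It runs in a temporary directory and then repeats the three cut commands from this lesson.

```shell
cd "$(mktemp -d)"
# printf turns \t into a real TAB character.
printf 'The quick brown; fox jumps over the lazy\tdog\n' > sample.txt

cut -c 5 sample.txt          # 5th character: q
cut -f 2 sample.txt          # 2nd TAB-separated field: dog
cut -f 1 -d ';' sample.txt   # 1st ;-separated field: The quick brown
```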

Exercise

What do the following commands do? Why?

$ cut -c 5-10 sample.txt
$ cut -c 5- sample.txt
$ cut -c -5 sample.txt

Quiz Question

# What command would you use to get the first character of every line in a file?

> Use the cut command to write selected bytes, characters, or fields from each line of a file to standard output.

1. [ ] cut -c
2. [ ] cut -s
3. [ ] cut 1
4. [x] cut -c 1

7. paste

Lesson Content

The paste command is similar to the cat command, but it can also merge the lines of a file together. Let’s create a new file with the following contents:

sample2.txt
The
quick
brown
fox

Let’s combine all these lines into one line:

$ paste -s sample2.txt

The default delimiter for paste is TAB, so now there is one line with TABs separating each word.

Let’s change this delimiter (-d) to something a little more readable:

$ paste -d ' ' -s sample2.txt

Now everything should be on one line delimited by spaces.
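The -d flag accepts other delimiters too. The sketch below recreates sample2.txt with printf in a temporary directory and tries a few; the comma is just an arbitrary choice for illustration.

```shell
cd "$(mktemp -d)"
printf 'The\nquick\nbrown\nfox\n' > sample2.txt

paste -s sample2.txt          # one line, words separated by TABs
paste -s -d ' ' sample2.txt   # one line, words separated by spaces
paste -s -d ',' sample2.txt   # one line, words separated by commas
```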

Exercise

Try to paste multiple files together. What happens?

Quiz Question

# What flag do you use with paste to make everything go on one line?

> The paste command is one of the useful commands in Unix or Linux operating systems. It is used to join files horizontally (parallel merging) by outputting lines consisting of lines from each file specified, separated by a TAB as delimiter, to the standard output. When no file is specified, or a dash ("-") is given instead of a file name, paste reads from standard input and echoes it back until an interrupt [Ctrl-c] is given.

1. [ ] -c
2. [ ] -o
3. [ ] -l
4. [x] -s

8. head

Lesson Content

Let’s say we have a very long file. In fact we have many to choose from; go ahead and cat /var/log/syslog. You should see pages upon pages of text. What if I just wanted to see the first few lines of this file? We can do that with the head command, which by default shows the first 10 lines of a file.

$ head /var/log/syslog

You can also change the line count to whatever you choose. Let’s say I wanted to see the first 15 lines instead.

$ head -n 15 /var/log/syslog

The -n flag stands for number of lines.
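Log files differ from system to system, so a predictable file makes the line counts easier to check. This sketch assumes the seq command is available (it is part of GNU coreutils) and uses a made-up numbers.txt.

```shell
cd "$(mktemp -d)"
seq 1 100 > numbers.txt   # one number per line, 1 through 100

head numbers.txt          # lines 1 to 10
head -n 3 numbers.txt     # lines 1 to 3
```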

Exercise

What does the following command do and why?

$ head -c 15 /var/log/syslog

Quiz Question

# What flag would you use to change the number of lines you want to view for the head command?

> It is the complement of the tail command. The head command, as the name implies, prints the first N lines of the given input. By default, it prints the first 10 lines of the specified files.

1. [ ] -nl
2. [ ] -h
3. [ ] head
4. [x] -n

9. tail

Lesson Content

Similar to the head command, the tail command lets you see the last 10 lines of a file by default.

$ tail /var/log/syslog

As with head, you can change the number of lines you want to see.

$ tail -n 10 /var/log/syslog

Another great option is the -f (follow) flag, which follows the file as it grows. Give it a try and see what happens.

$ tail -f /var/log/syslog

Your syslog file changes continually while you interact with your system, and with tail -f you can watch everything that gets added to it.
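Since tail -f keeps running until you interrupt it, the sketch below sticks to -n. It assumes seq is available and uses a made-up numbers.txt, and it also shows how head and tail can be piped together to pick lines out of the middle of a file.

```shell
cd "$(mktemp -d)"
seq 1 100 > numbers.txt   # one number per line, 1 through 100

tail -n 3 numbers.txt                # lines 98 to 100
head -n 15 numbers.txt | tail -n 5   # lines 11 to 15
```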

Exercise

Look at the man page for tail and read about some of the other options we didn’t discuss.

$ man tail

Quiz Question

# What is the flag used to follow a file in tail?

> It is the complement of the head command. The tail command, as the name implies, prints the last N lines of the given input. By default it prints the last 10 lines of the specified files.

1. [ ] tail -c
2. [ ] tailf
3. [ ] follow
4. [x] -f

10. expand and unexpand

Lesson Content

In our lesson on the cut command, our sample.txt file contained a TAB. TABs usually show up as a noticeable gap, but not every viewer makes them obvious, and TABs may not give you the spacing you want. To change your TABs to spaces, use the expand command.

$ expand sample.txt

The command above will print output with each TAB converted into a group of spaces. To save this output in a file, use output redirection like below.

$ expand sample.txt > result.txt

Conversely, we can convert each group of spaces back to a TAB with the unexpand command:

$ unexpand -a result.txt
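TABs and spaces look alike on screen, so this sketch uses od -c to print each character and make the difference visible. The file names are hypothetical, and the example assumes the default tab stops of 8 columns.

```shell
cd "$(mktemp -d)"
printf 'lazy\tdog\n' > tabbed.txt

# od -c prints each character, so TABs show up as \t.
od -c tabbed.txt

expand tabbed.txt > spaces.txt     # the TAB becomes spaces up to the next tab stop
od -c spaces.txt

unexpand -a spaces.txt > back.txt  # runs of spaces ending at a tab stop become TABs again
cmp tabbed.txt back.txt && echo "round trip restored the original"
```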

Exercise

What happens if you just type expand with no file input?

Quiz Question

# What command is used to convert TABs to spaces?

> The expand command writes the named files or standard input to standard output, and replaces the tab characters with one or more space characters.

1. [ ] unexpand
2. [ ] space
3. [ ] tab
4. [x] expand

11. join and split

Lesson Content

The join command allows you to join two files together by a common field.

Let’s say I had two files that I wanted to join together:

file1.txt
1 John
2 Jane
3 Mary

file2.txt
1 Doe
2 Doe
3 Sue

$ join file1.txt file2.txt
1 John Doe
2 Jane Doe
3 Mary Sue

See how it joined my files together? By default they are joined on the first field, and lines are only paired up when that field is identical in both files. join also expects both files to be sorted on the join field; if they are not, sort them first. In this case the files are joined via 1, 2, 3.
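When an input is not in order, sorting it before the join is one way to handle it. In this sketch, first.txt and last.txt are hypothetical stand-ins for the two files, with the first one deliberately shuffled.

```shell
cd "$(mktemp -d)"
printf '3 Mary\n1 John\n2 Jane\n' > first.txt   # deliberately out of order
printf '1 Doe\n2 Doe\n3 Sue\n'    > last.txt    # already sorted on field 1

# Sort the shuffled file on its join field, then join as usual.
sort first.txt > first.sorted.txt
join first.sorted.txt last.txt
```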

How would we join the following files?

file1.txt
John 1
Jane 2
Mary 3

file2.txt
1 Doe
2 Doe
3 Sue

To join these files you need to specify which fields you are joining on. In this case we want field 2 of file1.txt and field 1 of file2.txt, so the command looks like this:

$ join -1 2 -2 1 file1.txt file2.txt
1 John Doe
2 Jane Doe
3 Mary Sue

-1 refers to file1.txt and -2 refers to file2.txt. Pretty neat. You can also split a file up into different files with the split command:

$ split somefile

This splits the file into several smaller files; by default a new file is started every 1000 lines. The output files are named xaa, xab, xac and so on by default.
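A quick way to watch split at work is to feed it a file of known length. This sketch assumes seq is available; the part_ prefix is an arbitrary name chosen for the example, and -l is split’s standard option for setting the number of lines per piece.

```shell
cd "$(mktemp -d)"
seq 1 2500 > somefile    # 2500 lines

split somefile           # default: 1000 lines per piece
wc -l x*                 # xaa and xab hold 1000 lines each, xac the last 500

split -l 500 somefile part_   # -l sets lines per piece; the last argument is the name prefix
ls part_*
```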

Exercise

Join two files with a different number of lines in each file. What happens?

Quiz Question

# What command would you use to join files named cat dog cow?

> - The join command reads the files specified by the File1 and File2 parameters, joins lines in the files according to the flags, and writes the results to standard output. The File1 and File2 parameters must be text files.
> - Split command in Linux is used to split large files into smaller files. It splits the files into 1000 lines per file (by default) and even allows users to change the number of lines as per requirement.

1. [ ] join cat cow dog
2. [ ] join cow dog cat
3. [ ] join dog cat cow
4. [x] join cat dog cow

12. sort

Lesson Content

The sort command is useful for sorting lines.

file1.txt
dog
cow
cat
elephant
bird

$ sort file1.txt
bird
cat
cow
dog
elephant

You can also do a reverse sort:

$ sort -r file1.txt
elephant
dog
cow
cat
bird

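And also sort by numerical value:

$ sort -n file1.txt
bird
cat
cow
dog
elephant

None of these lines are numbers, so -n treats them all as equal and falls back to the ordinary ordering.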
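Numeric sorting is easiest to see on lines that actually hold numbers. This is a small sketch using a made-up numbers.txt.

```shell
cd "$(mktemp -d)"
printf '10\n9\n100\n1\n' > numbers.txt

sort numbers.txt      # character by character: 1, 10, 100, 9
sort -n numbers.txt   # by numeric value:       1, 9, 10, 100
```

Without -n, sort compares the lines as text, so 100 comes before 9 because the character 1 sorts before 9.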

Exercise

The real power of sort comes from its ability to be combined with other commands. Try the following command and see what happens.

$ ls /etc | sort -rn

Quiz Question

# What flag do you use to do a reverse sort?

> The sort command is used to sort a file, arranging the records in a particular order. By default, the sort command sorts a file assuming the contents are ASCII. Options to the sort command can also be used to sort numerically.

1. [ ] sort -d
2. [ ] revsort
3. [ ] rev
4. [x] -r

13. tr (Translate)

Lesson Content

The tr (translate) command allows you to translate a set of characters into another set of characters. Let’s try an example of translating all lower case characters to uppercase characters.

$ tr a-z A-Z
hello
HELLO

As you can see, we mapped the range a-z to A-Z, so any lowercase text we type comes back in uppercase.
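tr only reads from stdin, so in scripts it usually sits at the end of a pipe rather than waiting for typed input. A few hedged examples follow; the sample strings are arbitrary, and -s is tr’s standard squeeze flag.

```shell
echo "hello world" | tr 'a-z' 'A-Z'    # HELLO WORLD
echo "hello world" | tr ' ' '_'        # hello_world
echo "hello    world" | tr -s ' '      # squeeze repeated spaces: hello world
```

Quoting the character sets keeps the shell from treating them as filename patterns.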

Exercise

Try the following command. What happens?

$ tr -d ello
hello

Quiz Question

# What command is used to translate characters?

> The tr command is a Linux command-line utility that translates or deletes characters from standard input (stdin) and writes the result to standard output (stdout). Use tr to perform different text transformations, including case conversion, squeezing or deleting characters, and basic text replacement.

1. [ ] change
2. [ ] translate
3. [ ] convert
4. [x] tr

14. uniq (Unique)

Lesson Content

The uniq (unique) command is another useful tool for parsing text.

Let’s say you had a file with lots of duplicates:

reading.txt
book
book
paper
paper
article
article
magazine

And you wanted to remove the duplicates, well you can use the uniq command:

$ uniq reading.txt
book
paper
article
magazine

Let’s count how many times each line occurs:

$ uniq -c reading.txt
2 book
2 paper
2 article
1 magazine

Let’s get only the lines that are not repeated:

$ uniq -u reading.txt
magazine

Let’s get only the lines that are repeated:

$ uniq -d reading.txt
book
paper
article

Note: uniq does not detect duplicate lines unless they are adjacent. For example:

Let’s say you had a file with duplicates which are not adjacent:

reading.txt
book
paper
book
paper
article
magazine
article

$ uniq reading.txt
book
paper
book
paper
article
magazine
article

Unlike the very first example, the result returned by uniq still contains all the entries.

To overcome this limitation of uniq we can use sort in combination with uniq:

$ sort reading.txt | uniq
article
book
magazine
paper
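The sort and uniq pairing extends naturally to counting. The sketch below uses a hypothetical reading.txt with non-adjacent duplicates and builds a frequency table from it; sort -u is shown as a shorthand that gives the same result as sort | uniq.

```shell
cd "$(mktemp -d)"
printf 'book\npaper\nbook\npaper\narticle\nmagazine\narticle\nbook\n' > reading.txt

# sort groups identical lines, uniq -c counts each group,
# and sort -rn puts the largest counts first.
sort reading.txt | uniq -c | sort -rn

# sort -u sorts and drops duplicates in one step.
sort -u reading.txt
```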

Exercise

What result would you get if you tried uniq -uc?

Quiz Question

# What command would you use to remove duplicates in a file?

> The uniq command deletes repeated lines in a file. The uniq command reads either standard input or a file specified by the InFile parameter. The command first compares adjacent lines and then removes the second and succeeding duplications of a line. Duplicated lines must be adjacent. (Before issuing the uniq command, use the sort command to make all duplicate lines adjacent.) Finally, the uniq command writes the resultant unique lines either to standard output or to the file specified by the OutFile parameter. The InFile and OutFile parameters must specify different files.

1. [ ] one
2. [ ] only
3. [ ] distinct
4. [x] uniq

15. wc and nl

Lesson Content

The wc (word count) command shows the number of lines, words and bytes in a file.

$ wc /etc/passwd
 96     265    5925 /etc/passwd

It displays the number of lines, number of words and number of bytes, respectively.

To see just one of these counts, use -l, -w, or -c respectively.

$ wc -l /etc/passwd
96 /etc/passwd
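When wc is given a filename it prints that name after the counts. If you only want the number, for instance to use it in a script, one option is to feed the file through stdin instead. This is a small sketch with a hypothetical file1.txt.

```shell
cd "$(mktemp -d)"
printf 'i\nlike\nturtles\n' > file1.txt

wc file1.txt        # lines, words, bytes, then the filename
wc -w file1.txt     # word count and filename
wc -l < file1.txt   # reading from stdin: no filename to print, just the count
```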

Another command you can use to check the number of lines in a file is the nl (number lines) command.

file1.txt
i
like
turtles

$ nl file1.txt
     1	i
     2	like
     3	turtles

Exercise

How would you get the total count of lines using the nl command without searching through the entire output? Hint: Use some of the other commands you learned in this course.

Quiz Question

# What command would you use to get only the total number of words in a file?

> The nl command is a Unix/Linux utility that is used for numbering lines, accepting input either from a file or from STDIN. It copies each specified file to STDOUT, with line numbers added before the lines.

1. [ ] wc -c
2. [ ] wc -l
3. [ ] wc
4. [x] wc -w

16. grep

Lesson Content

The grep command is quite possibly the most common text processing command you will use. It allows you to search files for lines that match a certain pattern. What if you wanted to know if a file existed in a certain directory, or if a string was found in a file? You certainly wouldn’t dig through every line of text; you would use grep!

Let’s use our sample.txt file as an example:

$ grep fox sample.txt

You should see that grep found fox in the sample.txt file.

You can also make the match case insensitive with the -i flag:

$ grep -i somepattern somefile

To get even more flexible with grep, you can combine it with other commands using the pipe operator |.

$ env | grep -i User

As you can see grep is pretty versatile. You can even use regular expressions in your pattern:

$ ls /somedir | grep '\.txt$'

This should return all files in somedir whose names end with .txt. The backslash makes the dot match a literal “.” rather than any character.
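The effect of escaping the dot shows up as soon as a directory contains a name like mytxt. The file names below are invented for the illustration; -c is grep’s standard flag for counting matching lines.

```shell
cd "$(mktemp -d)"
touch notes.txt report.txt mytxt archive.txt.bak

ls | grep '.txt$'       # unescaped dot matches any character: notes.txt, report.txt and mytxt
ls | grep '\.txt$'      # escaped dot matches only a literal ".": notes.txt and report.txt
ls | grep -c '\.txt$'   # -c prints the number of matching lines instead of the lines
```

archive.txt.bak matches neither pattern, because $ anchors the match to the end of the line.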

Exercise

You may have heard of egrep or fgrep. These are deprecated variants of grep that have since been replaced by grep -E and grep -F. Read the grep manpage to learn more.

Quiz Question

# What command do you use to find a certain pattern?

> - Do not run the grep command on a special file because it produces unpredictable results. Input lines should not contain the NULL character.
> - Input files should end with the newline character. The newline character will not be matched by the regular expressions.
> - Although some flags can be specified simultaneously, some flags override others. For example, the -l option takes precedence over all other flags. And if you specify both the -E and -F flags, the last one specified takes priority.

1. [ ] look
2. [ ] find
3. [ ] search
4. [x] grep