Table of Contents
- Text Files
- Viewing Files in the Terminal
- Viewing Files Using a Pager
- Viewing Parts of a File
- Input/Output (I/O) Redirection
- Sorting Files or STDIN
- Viewing File Statistics
- Filtering Files
- Regular Expressions (Regex)
- Command-Line Pipes
Text Files
Text files only contain text, without the formatting features of a word-processing file.
- A significant number of commands exist to help users manipulate (view and modify) text files.
- The shell also provides features, such as redirection and pipes, to control where the output of commands goes.
Viewing Files in the Terminal
Displaying a file directly in the terminal is ideal for quickly viewing small files.
- The `cat` command, short for concatenate, can be used to create and display text files, as well as to combine (concatenate) copies of text files.
- To display the content of a file on standard output using this command, type the command followed by the file name.
- Although the terminal is the default output of this command, the file content can also be redirected to other files, or used as input to another command, by using redirection characters.
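A minimal sketch of both uses of `cat`, assuming a Linux shell with GNU coreutils. It works in a throwaway directory created by `mktemp -d`, and the file names `part1.txt`, `part2.txt`, and `combined.txt` are hypothetical:

```shell
# Work in a scratch directory so no real files are touched
cd "$(mktemp -d)" || exit 1

# Create two small sample files (hypothetical names)
printf 'line one\nline two\n' > part1.txt
printf 'line three\n' > part2.txt

# Display one file on STDOUT (the terminal)
cat part1.txt

# Concatenate both files, in order, and redirect the result into a new file
cat part1.txt part2.txt > combined.txt
cat combined.txt
```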
Viewing Files Using a Pager
Pagers are ideal for viewing large files, as they allow users to move around (scroll through) a document.
- There are two commonly used pager commands:
  - The `less` command provides advanced paging capabilities, but is not included with all Linux distributions.
    - On systems that provide it (such as Debian-based distributions), the `pager` command usually refers to the `less` command.
  - The `more` command has fewer features than `less`, but is available in most Linux distributions.
- To view a file using the `pager` command, type the command followed by the file name.
Viewing Parts of a File
The `head` and `tail` commands are used to display only the first or last few lines of a file.
- `head -n count path/to/file` displays the first “count” lines of a file.
- `tail -n count path/to/file` displays the last “count” lines of a file.
- `tail -n +count path/to/file` displays the file starting at line number “count”.
- Without the `-n` (or `--lines`) option, both commands default to 10 lines.
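The commands above can be tried on a generated file. This is a minimal sketch assuming GNU coreutils; `numbers.txt` is a hypothetical file holding the numbers 1 to 15, one per line, created with `seq` in a scratch directory:

```shell
cd "$(mktemp -d)" || exit 1
seq 15 > numbers.txt       # hypothetical file: the numbers 1 to 15, one per line

head -n 3 numbers.txt      # first 3 lines: 1, 2, 3
tail -n 3 numbers.txt      # last 3 lines: 13, 14, 15
tail -n +14 numbers.txt    # from line 14 to the end: 14, 15
head numbers.txt | wc -l   # no -n option, so the default of 10 lines is printed
```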
Monitoring Live File Changes
The `tail` command can also keep reading a file and print new lines as they are added to it, which is useful for watching log files. For example, if you were to log in as the root user, you could troubleshoot issues with the email server by viewing live changes to the /var/log/mail.log log file.
Print the last lines of a file and keep reading it until Ctrl + C is pressed:
tail --follow path/to/file
Keep reading a file until Ctrl + C, even if the file is inaccessible:
tail --retry --follow path/to/file
Show the last “count” lines of a file and check it for changes every “seconds” seconds:
tail --lines count --sleep-interval seconds --follow path/to/file
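Following a file is normally interactive, so the sketch below swaps in `timeout` (from GNU coreutils) to stop `tail` after a few seconds in place of Ctrl + C, and appends to a hypothetical `app.log` while `tail` is running. The file names and timing values are arbitrary assumptions:

```shell
cd "$(mktemp -d)" || exit 1
printf 'first entry\n' > app.log   # hypothetical log file

# Follow app.log in the background; timeout stops tail after 3 seconds,
# standing in for pressing Ctrl + C
timeout 3 tail --follow app.log > followed.txt &

sleep 1
printf 'second entry\n' >> app.log   # simulate a new line being written to the log

wait              # wait until timeout has stopped tail
cat followed.txt  # first entry, then second entry
```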
Input/Output (I/O) Redirection
I/O redirection, using the < or > characters, allows command-line input and output to be redirected to or from different streams.
- Redirection is achieved by using the arrow characters, `<` and `>`.
- It allows the user to redirect:
  - STDIN so that input comes from a file.
  - STDOUT/STDERR so that output goes to a file.
Standard Output (STDOUT)
Standard output (STDOUT) is stream #1, the normal output of commands that run correctly (without errors).
Using the `>` character, the STDOUT of a command can be redirected to a file:
ls /usr/games > stdout.txt
- The file contains the output of the `ls` command, i.e. the list of files in the /usr/games directory.
The `>` character overwrites the file (creating it if it does not exist), whereas `>>` appends to the end of it.
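A minimal sketch showing the difference between `>` and `>>`, using `echo` in place of `ls` so that the result does not depend on what is installed in /usr/games. The scratch directory and the text written are assumptions for illustration:

```shell
cd "$(mktemp -d)" || exit 1

# > sends STDOUT to a file, creating it or overwriting what was there
echo "first run" > stdout.txt
echo "second run" > stdout.txt   # replaces "first run"

# >> appends STDOUT to the end of the file instead
echo "third run" >> stdout.txt

cat stdout.txt   # second run, then third run
```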
Standard Error (STDERR)
Standard error (STDERR) is stream #2, the error messages generated by commands.
Using the `2>` characters, the STDERR of a command can be redirected to a file:
ls /fake 2> stderr.txt
- Because the /fake directory does not exist, the file contains the error message of the `ls` command.
- The error message is written to this file instead of the terminal.
The `2>` characters overwrite the file, whereas `2>>` appends to it.
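A minimal sketch of `2>` and `2>>`, assuming (as in the example above) that /fake does not exist. The `||` branch only reports that `ls` failed; it is not needed for the redirection itself:

```shell
cd "$(mktemp -d)" || exit 1

# ls cannot find /fake, so its error message goes to STDERR (stream 2),
# which 2> redirects into stderr.txt instead of the terminal
ls /fake 2> stderr.txt || echo "ls failed with exit status $?"

cat stderr.txt

# 2>> appends a second error message instead of overwriting the file
ls /fake 2>> stderr.txt || echo "ls failed with exit status $?"
wc -l < stderr.txt   # 2: one line per failed ls call
```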
Redirecting Multiple Streams
In Bash, using the `&>` characters, both STDOUT and STDERR of a command can be redirected to the same file:
ls /fake /usr/games &> both_streams.txt
- The file contains:
  - The error message, since the /fake directory does not exist.
  - All files under the /usr/games directory.
To redirect STDOUT and STDERR to different files, use both `>` and `2>` in the same command:
ls /fake /usr/games > stdout.txt 2> stderr.txt
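Since `&>` is a Bash shorthand, a sketch of the same redirections in portable form may help: `> file 2>&1` first points STDOUT at the file, then makes STDERR a copy of STDOUT. A hypothetical `games` directory with two empty files stands in for /usr/games, and /fake is again assumed not to exist:

```shell
cd "$(mktemp -d)" || exit 1
mkdir games && touch games/chess games/tetris   # stand-in for /usr/games

# Both streams into one file (same effect as Bash's "&> both_streams.txt");
# ls exits non-zero because /fake is missing, and "|| true" just ignores that status
ls /fake games > both_streams.txt 2>&1 || true

# Each stream into its own file
ls /fake games > stdout.txt 2> stderr.txt || true

cat both_streams.txt   # the error for /fake plus the listing of games
```

The order matters: `ls /fake games 2>&1 > both_streams.txt` would still send the error to the terminal, because STDERR is duplicated before STDOUT is redirected.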
Standard Input (STDIN)
Standard input (STDIN) is information entered by the user as input to a command.
It has a file descriptor of 0.
Commands that read STDIN normally take that data from the keyboard, but with the `<` character the shell reads it from a file instead:
tr 'a-z' 'A-Z' < example.txt > new_example.txt
- The `tr` command is a translation utility that replaces single characters based on character sets.
- Every lowercase letter, from a to z, in the file is converted to uppercase.
- The resulting output is saved by redirecting it into another file.
Most commands accept file names as arguments, so this use case is relatively rare. However, for commands that do not (such as `tr`), this method makes the shell read from the file instead of relying on the command to have this ability.
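A minimal sketch of the `tr` example above, with a hypothetical one-line `example.txt` created first in a scratch directory:

```shell
cd "$(mktemp -d)" || exit 1
printf 'hello world\n' > example.txt

# The shell opens example.txt and connects it to tr's STDIN (tr takes no file
# name arguments); tr's STDOUT is redirected into new_example.txt
tr 'a-z' 'A-Z' < example.txt > new_example.txt

cat new_example.txt   # HELLO WORLD
```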
Sorting Files or STDIN
The `sort` command is used to rearrange the lines of STDIN or files into order.
Sort a file in ascending order:
sort path/to/file
Sort a file in descending order:
sort --reverse path/to/file
Sort a file case-insensitively:
sort --ignore-case path/to/file
Sort a file numerically rather than alphabetically:
sort --numeric-sort path/to/file
Sort /etc/passwd by the 3rd field of each line numerically, using “:” as a field delimiter:
sort --field-separator=: --key=3n /etc/passwd
Sort a file while deleting duplicate lines:
sort --unique path/to/file
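Sort a file, writing the output to the specified output file (if the output file is the same as the input file, the file is sorted in place):
sort --output=path/to/output_file path/to/input_file
Sort numbers written with exponents (scientific notation such as 1.5e3):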
sort --general-numeric-sort path/to/file
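A minimal sketch of a few of these options on small hypothetical files, assuming GNU `sort`. It shows why `--numeric-sort` matters: compared as text, 100 sorts before 9:

```shell
cd "$(mktemp -d)" || exit 1
printf '10\n9\n100\n9\n' > numbers.txt
printf 'b:3\na:20\nc:1\n' > pairs.txt

sort numbers.txt                            # as text: 10, 100, 9, 9
sort --numeric-sort numbers.txt             # as numbers: 9, 9, 10, 100
sort --numeric-sort --unique numbers.txt    # duplicates removed: 9, 10, 100
sort --numeric-sort --reverse numbers.txt   # descending: 100, 10, 9, 9

# Sort by the 2nd ":"-separated field, numerically: c:1, b:3, a:20
sort --field-separator=: --key=2n pairs.txt

# Sort in place by naming the input file as the output file
sort --numeric-sort --output=numbers.txt numbers.txt
```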
Viewing File Statistics
The `wc` command prints the number of lines, words, and bytes in a file (for plain ASCII text, 1 byte = 1 character), plus a final line with the totals when more than one file is specified.
Count all lines in a file:
wc --lines path/to/file
Count all words in a file:
wc --words path/to/file
Count all bytes in a file:
wc --bytes path/to/file
Count all characters in a file (taking multi-byte characters into account):
wc --chars path/to/file
Count all lines, words, and bytes from stdin:
find . | wc
Print the length of the longest line, in characters:
wc --max-line-length path/to/file
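A minimal sketch on two hypothetical files, assuming GNU `wc`. The second file contains the non-ASCII letter é, which takes 2 bytes in UTF-8, so its byte and character counts differ (the character count assumes a UTF-8 locale):

```shell
cd "$(mktemp -d)" || exit 1
printf 'one two\nthree\n' > a.txt   # 2 lines, 3 words, 14 bytes
printf 'café\n' > b.txt             # 1 line, 1 word, 6 bytes

wc a.txt                 # lines, words, bytes: 2, 3, and 14
wc --lines a.txt b.txt   # one count per file, then a "total" line
wc --bytes b.txt         # 6
wc --chars b.txt         # 5 in a UTF-8 locale (é is one character)
```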
Filtering Files
The `cut` command can extract columns of text from STDIN or files. To extract fields, the input must contain columns separated by a delimiter (by default, the Tab character).
Print a specific character or field range of each line (choose either `--characters` or `--fields`, and one of the range forms `1`, `1,10`, `1-10`, `1-`, or `-10`):
command | cut --characters|fields=1|1,10|1-10|1-|-10
Print a field range of each line, using a specific delimiter:
command | cut --delimiter="," --fields=1
Print a range of characters from each line of a specific file:
cut --characters=1 path/to/file
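A minimal sketch of `cut` on a hypothetical comma-separated file, assuming GNU `cut`; the last command shows that Tab is the default delimiter:

```shell
cd "$(mktemp -d)" || exit 1
# Hypothetical comma-separated data
printf 'name,age,city\nalice,30,paris\nbob,25,rome\n' > people.csv

cut --delimiter="," --fields=1 people.csv     # name, alice, bob
cut --delimiter="," --fields=1,3 people.csv   # name,city / alice,paris / bob,rome
cut --characters=1-3 people.csv               # nam, ali, bob

# Tab is the default delimiter, so no --delimiter is needed here
printf 'a\tb\tc\n' | cut --fields=2           # b
```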
The `grep` command can be used to filter the lines of STDIN or files that match a specified pattern:
Search for a pattern within a file:
grep "search_pattern" path/to/file
Search for an exact string (disables regular expressions):
grep --fixed-strings "exact_string" path/to/file
Search for a pattern in all files recursively in a directory, showing line numbers of matches, ignoring binary files:
grep --recursive --line-number --binary-files=without-match "search_pattern" path/to/directory
Use extended regular expressions (supports ?, +, {}, (), and |), in case-insensitive mode:
grep --extended-regexp --ignore-case "search_pattern" path/to/file
Print 3 lines of context around, before, or after each match (use one of `--context`, `--before-context`, or `--after-context`):
grep --context|before-context|after-context=3 "search_pattern" path/to/file
Print file name and line number for each match with colour output:
grep --with-filename --line-number --color=always "search_pattern" path/to/file
Search for lines matching a pattern, printing only the matched text:
grep --only-matching "search_pattern" path/to/file
Search stdin for lines that do not match a pattern:
cat path/to/file | grep --invert-match "search_pattern"
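A minimal sketch exercising a few of the `grep` options above on a hypothetical three-line log file:

```shell
cd "$(mktemp -d)" || exit 1
printf 'Error: disk full\ninfo: started\nerror: timeout\n' > app.log

grep "error" app.log                         # matching is case-sensitive: error: timeout
grep --ignore-case "error" app.log           # both the Error and error lines
grep --line-number "timeout" app.log         # 3:error: timeout
grep --invert-match "info" app.log           # every line except info: started
grep --count --ignore-case "error" app.log   # 2 (number of matching lines)
```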
Regular Expressions (Regex)
Regular expressions (regex) are a collection of normal and special characters that are used to match simple or complex patterns in files.
- Normal characters are alphanumeric characters, which match themselves.
  - e.g. an a character would match an a.
- Special characters have special meanings when used within patterns by commands such as `grep`, `more`, and `less`.
- Put quotes around the pattern (the examples below use double quotes) so that the shell passes special characters such as `*` to the command unchanged instead of interpreting them itself.
Basic Regular Expression (BRE)
Basic regular expressions are available to a wide variety of Linux commands.
Character | Matches |
---|---|
. | Any single character except for the newline character |
[ ] | A single character from the list or range of possible characters. If the first character is the caret ^, it means any character not in the list |
* | Zero or more occurrences of the character or pattern preceding it |
^ | At the start of a pattern, anchors the match to the beginning of the line; elsewhere, just a literal ^ |
$ | At the end of a pattern, anchors the match to the end of the line; elsewhere, just a literal $ |
\ | Escapes the character following it, so that a special character such as * is matched literally |
grep "r..f" file.txt
- The letter r followed by exactly two characters and then the letter f (e.g. roof)
grep "...." file.txt
- Lines with at least four characters (e.g., roof, rooftop)
grep "[0-9]" file.txt
- Lines containing a numeric character
grep "[abcd]" file.txt
or
grep "[a-d]" file.txt
- Lines containing one of the four letters a, b, c, or d
grep "[^0-9]" file.txt
- Lines containing at least one character that is not numeric (lines made up only of digits do not match)
grep "re*d" file.txt
- The letter r, then the letter e repeated zero or more times, then the letter d (e.g., rd, red, reed)
grep "r[oe]*d" file.txt
- The letter r, then any mix of the letters o and e repeated zero or more times, then the letter d (e.g., red, rod, rood)
grep "^root" file.txt
- Lines that begin with the word root
grep "r$" file.txt
- Lines that end with the letter r
grep "re\*" file.md
- The letters re followed by a literal asterisk (e.g., `**beware**` in a Markdown document)
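A minimal sketch that runs several of the patterns above against a hypothetical word list (assuming GNU grep), so the matches can be compared with the descriptions:

```shell
cd "$(mktemp -d)" || exit 1
printf 'roof\nrd\nred\nreed\nrod\nrooftop\nroot\n123\n' > words.txt

grep "r..f" words.txt      # roof, rooftop
grep "re*d" words.txt      # rd, red, reed
grep "r[oe]*d" words.txt   # rd, red, reed, rod
grep "^root" words.txt     # root
grep "[^0-9]" words.txt    # every line except 123
```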
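Extended Regular Expression (ERE)
Extended regular expressions are available to more advanced Linux commands and add further special characters to those of BREs.
- To use extended regular expressions, use either:
  - The `-E` (or `--extended-regexp`) option to the `grep` command
  - The `egrep` command (obsolescent in recent GNU grep versions, which recommend `grep -E` instead)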
Character | Matches |
---|---|
? | Zero or one occurrence of the character or pattern preceding it |
+ | One or more occurrences of the character or pattern preceding it |
`\|` | Either the character or pattern preceding it or the one following it |
Brackets | Description |
---|---|
() | Groups a pattern, e.g. to limit the scope of the `\|` character |
{} | Specifies the number of times the character or pattern preceding it should be repeated |
egrep "colou?r" vocab.txt
- The letters colo, then zero or one occurrence of the letter u, then the letter r (e.g., color, colour)
egrep "colou+r" vocab.txt
- The letter u is repeated one or more times (e.g., colour, colouur, but not color)
egrep "learnt|learned" vocab.txt
or
egrep "learn(t|ed)" vocab.txt
- Either the word learnt or the word learned
egrep "[0-9]{3}" /etc/passwd
- Lines containing at least three consecutive digits
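A minimal sketch of the ERE examples against a hypothetical word list. It uses `grep -E` rather than `egrep`, since recent GNU grep versions warn that `egrep` is obsolescent:

```shell
cd "$(mktemp -d)" || exit 1
printf 'color\ncolour\ncolouur\nlearnt\nlearned\nlearns\nuid 1000\n' > vocab.txt

grep -E "colou?r" vocab.txt       # color, colour
grep -E "colou+r" vocab.txt       # colour, colouur
grep -E "learn(t|ed)" vocab.txt   # learnt, learned
grep -E "[0-9]{3}" vocab.txt      # uid 1000
```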
Command-Line Pipes
A command-line pipe, created with the | character, makes the output of one command (before the |) become the input of the next command (after the |).
Multiple pipes can be used consecutively to link multiple commands together.
Each command only sees the output of the previous command.
ls /usr/games | nl -b p"^git"
- The `nl` command is a utility for numbering lines from STDIN or files.
- The full output of the `ls` command is passed to the `nl` command by the shell instead of being printed to the screen.
- The `nl` command takes this output as input data and, because of `-b p"^git"`, numbers only the lines that match the BRE `^git`; the other lines are printed without a number.
- The output of `nl` is then printed to the screen.
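A minimal sketch of the pipe above, using a hypothetical `games` directory instead of /usr/games so that some names start with git. The second pipeline chains three commands:

```shell
cd "$(mktemp -d)" || exit 1
mkdir games && touch games/gitgame games/chess games/github-quest   # hypothetical names

# ls writes one name per line into the pipe; nl numbers only lines matching the BRE ^git
ls games | nl -b p"^git"

# Pipes can be chained: keep the names starting with "git", then count them
ls games | grep "^git" | wc -l   # 2
```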