Text Files

Text files contain only text; unlike a word processing file, they have no formatting features.

  • A significant number of commands exist to help users manipulate (view and modify) text files.
  • There are features available for the shell to control the output of commands.

Viewing Files in the Terminal

Printing a file directly in the terminal is ideal for quickly viewing small files.

  • The cat command, short for concatenate, can be used to create and display text files, as well as to combine copies of text files.
  • To display the content of a file in the standard output using this command, type the command followed by the file name.
  • Although the terminal is the default output of this command, the file content can also be redirected to other files by using redirection characters, or passed as input to another command by using a pipe.
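
As a minimal sketch of the three uses of cat described above, assuming a throwaway directory and hypothetical file names (a.txt, b.txt, combined.txt):

```shell
# Work in a temporary directory so no real files are touched.
tmp=$(mktemp -d) && cd "$tmp"

# Create a file: cat copies its standard input (here, a here-document) into a.txt.
cat > a.txt <<'EOF'
first
EOF
printf 'second\n' > b.txt

# Display one file on standard output.
cat a.txt

# Concatenate both files and redirect the result into a third file.
cat a.txt b.txt > combined.txt

# Pass the content to another command instead of the terminal.
cat combined.txt | wc -l
```

The last command counts the lines of the combined file rather than printing them.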

Viewing Files Using a Pager

Pagers are ideal for viewing large files because they allow users to move around a document.

  • There are two commonly used pager commands:
    1. The less command provides advanced paging capability, but is not included with all Linux distributions.
      • On systems that provide it, the pager command typically references the less command.
    2. The more command has fewer features than less, but is available in most Linux distributions.
  • To view a file using a pager command, type the command followed by the file name.

Viewing Parts of a File

The head and tail commands are used to display only the first or last few lines of a file.

  • head -n count path/to/file displays the first “count” lines of a file.
  • tail -n count path/to/file displays the last “count” lines of a file.
  • tail -n +count path/to/file displays the file starting at the specified line number.
  • Without the -n (or --lines) option, both commands display 10 lines by default.
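
The options above can be tried on a small generated file. This is a sketch only; numbers.txt is a hypothetical file holding the numbers 1 to 20, one per line:

```shell
# Work in a temporary directory and generate a 20-line sample file.
tmp=$(mktemp -d) && cd "$tmp"
seq 1 20 > numbers.txt

head -n 3 numbers.txt     # first three lines: 1, 2, 3
tail -n 3 numbers.txt     # last three lines: 18, 19, 20
tail -n +18 numbers.txt   # everything from line 18 onwards: 18, 19, 20
head numbers.txt          # no -n option: the first 10 lines
```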

Monitoring Live File Changes

The tail command can keep a file open and print new lines as they are written. For example, if you were logged in as the root user, you could troubleshoot issues with the email server by viewing live changes to the /var/log/mail.log log file.

  • Print the last lines of a given file and keep reading it until Ctrl + C is pressed:

    tail --follow path/to/file

  • Keep reading a file until Ctrl + C is pressed, even if the file is inaccessible:

    tail --retry --follow path/to/file

  • Show the last “count” lines of a file and refresh every “seconds” seconds:

    tail --lines count --sleep-interval seconds --follow path/to/file

Input/Output (I/O) Redirection

I/O redirection, using the < or > characters, allows command-line input and output to be redirected to different streams.

  • Redirection is achieved by using the arrow characters, < and >.
  • It allows the user to redirect:
  1. STDIN so that input comes from a file.
  2. STDOUT/STDERR so that output goes to a file.

Standard Output (STDOUT)

STDOUT is stream #1, the normal output of commands that run correctly (without errors).

  • Using the > character, the STDOUT of a command can be redirected to a file:

    ls /usr/games > stdout.txt

    • The file contains the output of the ls command.
  • The > character overwrites a file.

  • The >> character only appends to a file.
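
The difference between > and >> can be seen with a short sketch. The echo messages are illustrative only, and stdout.txt is created in a temporary directory:

```shell
# Work in a temporary directory so no real files are touched.
tmp=$(mktemp -d) && cd "$tmp"

echo "first run"  > stdout.txt    # creates the file
echo "second run" > stdout.txt    # overwrites it: "first run" is gone
echo "third run" >> stdout.txt    # appends a new line to the end

cat stdout.txt                    # prints "second run" then "third run"
```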

Standard Error (STDERR)

STDERR is stream #2, the error messages generated by commands.

  • Using the 2> character, the STDERR of a command can be redirected to a file:

    ls /fake 2> stderr.txt

    • The file contains the error message of the ls command, because the specified directory does not exist.
  • The 2> operator overwrites a file.

  • The 2>> operator only appends to a file.
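
A minimal sketch of capturing error messages, assuming that ./fake and ./also_fake are hypothetical directories that do not exist:

```shell
# Work in a temporary directory so no real files are touched.
tmp=$(mktemp -d) && cd "$tmp"

# ls fails, so its message goes to STDERR, which 2> sends to the file.
# "|| true" only keeps the sketch going after the expected failure.
ls ./fake 2> stderr.txt || true

# 2>> appends a second error message instead of overwriting the first.
ls ./also_fake 2>> stderr.txt || true

cat stderr.txt   # both error messages, one per line
```

Nothing is printed by the two ls commands themselves, because their only output was an error message and that stream was redirected.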

Redirecting Multiple Streams

  • Using the &> character, both STDOUT and STDERR of a command can be redirected to a file.

    ls /fake /usr/games &> both_streams.txt

    • The file contains:
      1. The error message, since the specified directory does not exist.
      2. All files under the /usr/games directory.
  • To redirect STDOUT and STDERR to different files, use both > and 2> together:

    ls /fake /usr/games > stdout.txt 2> stderr.txt
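
The following sketch separates and then merges the two streams. The directories ./real and ./fake are hypothetical stand-ins created for the example. It uses > file 2>&1, the portable spelling of what bash abbreviates to &>, so that it also runs in shells without the &> shorthand:

```shell
# Work in a temporary directory with one real and one missing directory.
tmp=$(mktemp -d) && cd "$tmp"
mkdir real && touch real/game1 real/game2

# Separate files: the listing goes to stdout.txt, the error to stderr.txt.
ls ./fake ./real > stdout.txt 2> stderr.txt || true

# One file for both streams: STDOUT goes to the file, then STDERR (2)
# is pointed at the same place as STDOUT (&1).
ls ./fake ./real > both_streams.txt 2>&1 || true

cat both_streams.txt
```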

Standard Input (STDIN)

STDIN is the information entered by the user as input to a command.

  • It has a file descriptor of 0.

  • Some commands read their data from STDIN, which comes from the keyboard by default or from a file when the < character is used.

    tr 'a-z' 'A-Z' < example.txt > new_example.txt

    • The tr command is a translation utility that runs replacements based on single characters and character sets.
    • Every lowercase letter, from a to z, in the file is converted to uppercase.
    • The resulting output is saved by redirecting it into another file.
  • Most commands do accept file names as arguments, so this use case is relatively rare. However, for those that do not, this method could be used to have the shell read from the file instead of relying on the command to have this ability.
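
The tr example above can be reproduced end to end with a sample file. This is a sketch in which the content of example.txt is invented for illustration:

```shell
# Work in a temporary directory and create a sample input file.
tmp=$(mktemp -d) && cd "$tmp"
printf 'hello world\n' > example.txt

# tr does not take file names, so the shell feeds the file to it via STDIN (<)
# and stores what tr writes to STDOUT in a new file (>).
tr 'a-z' 'A-Z' < example.txt > new_example.txt

cat new_example.txt   # HELLO WORLD
```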

Sorting Files or STDIN

The sort command is used to rearrange the lines of STDIN or files.

  • Sort a file in ascending order:

    sort path/to/file

  • Sort a file in descending order:

    sort --reverse path/to/file

  • Sort a file in a case-insensitive way:

    sort --ignore-case path/to/file

  • Sort a file numerically rather than alphabetically:

    sort --numeric-sort path/to/file

  • Sort /etc/passwd by the 3rd field of each line numerically, using “:” as a field delimiter:

    sort --field-separator=: --key=3n /etc/passwd

  • Sort a file while deleting duplicate lines:

    sort --unique path/to/file

  • Sort a file, printing the output to the specified output file (can be used to sort a file in-place):

    sort --output=file1_name file2_name

  • Sort numbers with exponents:

    sort --general-numeric-sort path/to/file
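
A short sketch contrasting a few of these options on invented data. The files numbers.txt and users.txt are hypothetical, and the long options assume GNU sort:

```shell
# Work in a temporary directory and create two small sample files.
tmp=$(mktemp -d) && cd "$tmp"
printf '10\n9\n100\n9\n' > numbers.txt
printf 'carol:30\nalice:4\nbob:12\n' > users.txt

sort numbers.txt                           # text order: 10, 100, 9, 9
sort --numeric-sort numbers.txt            # numeric order: 9, 9, 10, 100
sort --numeric-sort --unique numbers.txt   # duplicates removed: 9, 10, 100

# Sort by the 2nd ":"-separated field, numerically: alice:4, bob:12, carol:30
sort --field-separator=: --key=2n users.txt

# Sort in place: the output file is the same as the input file.
sort --numeric-sort --output=numbers.txt numbers.txt
```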

Viewing File Statistics

The wc command provides the number of lines, words, and bytes (1 byte = 1 character in an ASCII text file) for a file, and totals if more than one file is specified.

  • Count all lines in a file:

    wc --lines path/to/file

  • Count all words in a file:

    wc --words path/to/file

  • Count all bytes in a file:

    wc --bytes path/to/file

  • Count all characters in a file (taking multi-byte characters into account):

    wc --chars path/to/file

  • Count all lines, words, and bytes from stdin:

    find . | wc

  • Count the length of the longest line in number of characters:

    wc --max-line-length path/to/file
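
As a small sketch of how the counts relate, assuming a hypothetical two-line file sample.txt and GNU wc for the long options:

```shell
# Work in a temporary directory and create a sample file:
# 2 lines, 3 words, 14 bytes (including the two newline characters).
tmp=$(mktemp -d) && cd "$tmp"
printf 'one two\nthree\n' > sample.txt

wc sample.txt             # lines, words, bytes, then the file name
wc --lines sample.txt     # 2
wc --words < sample.txt   # 3 (no file name is printed when reading STDIN)

# A multi-byte character: "é" encoded in UTF-8 is two bytes, plus the newline.
printf '\303\251\n' | wc --bytes   # 3
printf '\303\251\n' | wc --chars   # depends on the current locale
```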

Filtering Files

The cut command can extract columns of text from STDIN or files. To extract fields, the input must contain columns separated by a delimiter (by default, the Tab character).

  • Print a specific character/field range of each line:

    command | cut --characters|fields=1|1,10|1-10|1-|-10

  • Print a range of each line with a specific delimiter:

    command | cut --delimiter="," --fields=1

  • Print a range of each line of the specific file:

    cut --characters=1 path/to/file
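
A minimal sketch of field and character extraction, assuming a hypothetical comma-separated file people.csv and GNU cut for the long options:

```shell
# Work in a temporary directory and create a small comma-separated file.
tmp=$(mktemp -d) && cd "$tmp"
printf 'alice,30,london\nbob,25,paris\n' > people.csv

cut --delimiter="," --fields=1 people.csv     # alice, bob
cut --delimiter="," --fields=1,3 people.csv   # alice,london and bob,paris
cut --characters=1-3 people.csv               # ali, bob

# cut also reads STDIN, so it can sit at the end of a pipe.
cat people.csv | cut --delimiter="," --fields=2   # 30, 25
```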

The grep command can be used to filter lines from STDIN or files that match a specified pattern:

  • Search for a pattern within a file:

    grep "search_pattern" path/to/file

  • Search for an exact string (disables regular expressions):

    grep --fixed-strings "exact_string" path/to/file

  • Search for a pattern in all files recursively in a directory, showing line numbers of matches, ignoring binary files:

    grep --recursive --line-number --binary-files=without-match "search_pattern" path/to/directory

  • Use extended regular expressions (supports ?, +, {}, (), and |), in case-insensitive mode:

    grep --extended-regexp --ignore-case "search_pattern" path/to/file

  • Print 3 lines of context around, before, or after each match:

    grep --context|before-context|after-context=3 "search_pattern" path/to/file

  • Print file name and line number for each match with colour output:

    grep --with-filename --line-number --color=always "search_pattern" path/to/file

  • Search for lines matching a pattern, printing only the matched text:

    grep --only-matching "search_pattern" path/to/file

  • Search stdin for lines that do not match a pattern:

    cat path/to/file | grep --invert-match "search_pattern"
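
A few of these options can be tried on an invented log file. In this sketch, app.log and its contents are hypothetical:

```shell
# Work in a temporary directory and create a small sample log.
tmp=$(mktemp -d) && cd "$tmp"
printf 'error: disk full\nwarning: low memory\nERROR: cpu hot\nok\n' > app.log

grep "error" app.log                                # only the lowercase "error" line
grep --ignore-case "error" app.log                  # both "error" and "ERROR" lines
grep --line-number --ignore-case "error" app.log    # same lines, prefixed with 1: and 3:
grep --only-matching "disk f..l" app.log            # prints just: disk full

# Lines that do NOT match, read from STDIN.
cat app.log | grep --invert-match --ignore-case "error"   # warning and ok lines
```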

Regular Expressions (Regex)

Regular expressions are a collection of normal and special characters that are used to filter simple or complex patterns in files.

  1. Normal characters are alphanumeric characters which match themselves.
    • e.g. an a character would match an a.
  2. Special characters have special meanings when used within patterns by commands, such as the grep, more, and less commands.
  • Quotes around the pattern keep the shell from interpreting its special characters before the command receives the regular expression.

Basic Regular Expression (BRE)

Basic regular expressions are available to a wide variety of Linux commands.

Character   Matches
.           Any single character except the newline character
[ ]         A single character from the list or range of possible characters. If the first character is the caret ^, any character not in the list
*           Zero or more occurrences of the character or pattern preceding it
^           The pattern following it at the beginning of the line; elsewhere, just a literal ^
$           The pattern preceding it at the end of the line; elsewhere, just a literal $
\           The character following it literally, removing its special regular expression meaning

  • grep "r..f" file.txt
    • The letter r followed by exactly two characters and then the letter f (e.g. roof)
  • grep "...." file.txt
    • At least four characters (e.g., roof, rooftop)
  • grep "[0-9]" file.txt
    • A numeric character
  • grep "[abcd]" file.txt or grep "[a-d]" file.txt
    • One of the four letters (a, b, c, and d)
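  • grep "[^0-9]" file.txt
    • A single character that is not a digit, so any line containing at least one non-numeric character
  • grep "re*d" file.txt
    • The letter r, the letter e repeated zero or more times, then the letter d (e.g., rd, red, reed)
  • grep "r[oe]*d" file.txt
    • The letter r, the letter o OR the letter e repeated zero or more times, then the letter d (e.g., rd, red, rod, reed)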
  • grep "^Root" file.txt
    • The line begins with the word Root
  • grep "r$" file.txt
    • The line ends with the letter r
  • grep "re\*" file.md
    • The literal string re* (e.g. in a Markdown document: **beware**)
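
The patterns above can be checked against a small word list. This is a sketch only, and the contents of words.txt are invented so that each pattern has a known set of matches:

```shell
# Work in a temporary directory and create a sample word list.
tmp=$(mktemp -d) && cd "$tmp"
printf 'roof\nred\nreed\nrd\nRoot user\ncar\nroom 101\n**beware**\n' > words.txt

grep "r..f" words.txt     # roof
grep "re*d" words.txt     # red, reed, rd
grep "^Root" words.txt    # Root user
grep "r$" words.txt       # Root user, car
grep "[0-9]" words.txt    # room 101
grep "re\*" words.txt     # **beware** (the * is matched literally)
```

Adding the --count option to any of these commands prints the number of matching lines instead of the lines themselves.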

Extended Regular Expression (ERE)

Extended regular expressions are available to more advanced Linux commands.

  • To use extended regular expressions with grep, use either:
  1. The -E (or --extended-regexp) option to the grep command
  2. The egrep command (reported as obsolescent by recent GNU grep releases, which recommend grep -E instead)

Character   Matches
?           Zero or one occurrence of the character or pattern preceding it
+           One or more occurrences of the character or pattern preceding it
|           Either the character or pattern preceding it or the one following it

Brackets    Description
()          Used to limit the scope of the | character
{}          Used to specify the number of times the pattern preceding it should be repeated

  • egrep "colou?r" vocab.txt
    • The string colo followed by zero or one occurrence of the u character, then the letter r (e.g., color, colour)
  • egrep "colou+r" vocab.txt
    • The letter u is repeated one or more times (e.g., colour, colouur)
  • egrep "learnt|learned" file.txt or egrep "learn(t|ed)" vocab.txt
    • Either the word learnt or the word learned
  • egrep "[0-9]{3}" /etc/passwd
    • Three consecutive digits (e.g., 100, or the first three digits of 1000)
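
A sketch of the same patterns using grep -E, which accepts the same extended syntax as egrep. The contents of vocab.txt are invented for illustration:

```shell
# Work in a temporary directory and create a sample vocabulary file.
tmp=$(mktemp -d) && cd "$tmp"
printf 'color\ncolour\ncolouur\ncolr\nlearnt\nlearned\nlearning\nuid 1000\nuid 42\n' > vocab.txt

grep -E "colou?r" vocab.txt       # color, colour
grep -E "colou+r" vocab.txt       # colour, colouur
grep -E "learn(t|ed)" vocab.txt   # learnt, learned
grep -E "[0-9]{3}" vocab.txt      # uid 1000 (42 has only two digits)
```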

Command-Line Pipes

The | character makes the output of one command (before |) become the input for the next command (after |).

  • Multiple pipes can be used consecutively to link multiple commands together.

  • Each command only sees the output of the previous command.

    ls /usr/games | nl -b p"^git"

    • The nl command is a utility for numbering lines from STDIN or files.
    • The full output of the ls command is passed to the nl command by the shell instead of being printed to the screen.
    • The nl command takes this output as input data and numbers the lines that match a BRE pattern.
    • The output of nl is then printed to the screen.
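
The same pipeline can be reproduced without relying on the contents of /usr/games. In this sketch, the games directory and its file names are hypothetical stand-ins:

```shell
# Work in a temporary directory and create a stand-in for /usr/games.
tmp=$(mktemp -d) && cd "$tmp"
mkdir games && touch games/gitk games/git-shell games/mines games/sol

# ls writes four names; nl numbers only those that start with "git".
ls games | nl -b 'p^git'

# Several pipes in a row: each command only sees the previous command's output.
ls games | grep "^git" | wc -l   # 2
```

In the first pipeline, the lines for mines and sol are still printed, only without a number.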