Compression

Compression reduces the amount of data needed to store or transmit a file, encoding it in such a way that the file can be restored.

  • The compression algorithm is a procedure the computer uses to encode the original file, and as a result, make it smaller.
  • When talking about compression, there are two types:
  1. Lossless: No information is removed from the file.
    • Compressing a file and decompressing it leaves something identical to the original.
    • e.g., GIFs and PNGs
  2. Lossy: Information might be removed from the file.
    • It is compressed in such a way that uncompressing a file will result in a file that is slightly different from the original.
    • For instance, an image with two subtly different shades of green might be made smaller by treating those two shades as the same.
    • e.g. JPEGs
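
The lossless case can be checked directly. A minimal sketch, assuming gzip, seq, and cmp are installed; the temporary directory and the file names are made up for illustration:

```shell
# Work in a throwaway directory so nothing real is touched.
workdir=$(mktemp -d)
seq 1 1000 > "$workdir/original.txt"

# Compress to a separate file (-c writes to stdout), then decompress it again (-d).
gzip -c "$workdir/original.txt" > "$workdir/original.txt.gz"
gzip -dc "$workdir/original.txt.gz" > "$workdir/restored.txt"

# cmp -s exits with status 0 only when the two files are byte-for-byte identical.
if cmp -s "$workdir/original.txt" "$workdir/restored.txt"; then
    roundtrip="identical"
else
    roundtrip="different"
fi
echo "round trip: $roundtrip"

rm -rf "$workdir"
```

cmp compares raw bytes, so "identical" means the round trip lost nothing. A lossy format such as JPEG would generally not pass the same check against its source image.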

The gzip command compresses/decompresses files using the Lempel-Ziv data compression algorithm (LZ77).

  • Compress a file, replacing it with a gzipped compressed version:

    gzip file.ext

  • Compress every file in a directory recursively, replacing each file with a gzipped version (gzip does not bundle the directory into a single file):

    gzip --recursive directory/

  • Compress a file, keeping the original file:

    gzip --keep file.ext

  • Compress a file specifying the output filename:

    gzip --stdout file.ext > compressed_file.ext.gz

  • Specify the compression level. 1=Fastest (Worst), 9=Slowest (Best); defaults to 6:

    gzip -9 --stdout file.ext > compressed_file.ext.gz

  • List information about a compressed file, including compressed size, uncompressed size, ratio, and uncompressed name:

    gzip --list compressed_file.ext.gz
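
To see what the compression level changes in practice, the sizes can be compared side by side. A rough sketch, assuming gzip, seq, and wc are available; the sample file is generated on the spot and is purely illustrative:

```shell
workdir=$(mktemp -d)
seq 1 100000 > "$workdir/numbers.txt"

# wc -c counts bytes; tr strips the padding some wc implementations add.
original_size=$(wc -c < "$workdir/numbers.txt" | tr -d ' ')
fast_size=$(gzip -1 -c "$workdir/numbers.txt" | wc -c | tr -d ' ')
best_size=$(gzip -9 -c "$workdir/numbers.txt" | wc -c | tr -d ' ')

echo "original: $original_size bytes"
echo "gzip -1:  $fast_size bytes"
echo "gzip -9:  $best_size bytes"

rm -rf "$workdir"
```

The exact numbers depend on the input and on the gzip version. On text like this, -9 typically produces the smaller file at the cost of more CPU time.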

Decompression

Decompression restores compressed data to its original form.

  • Decompress a file, replacing it with the original uncompressed version:

    gzip --decompress file.ext.gz

  • Decompress every gzipped file in a directory recursively, replacing each with its uncompressed version:

    gzip --decompress --recursive directory/

  • Decompress a gzipped file specifying the output file name:

    gzip --stdout --decompress file.ext.gz > uncompressed_file.ext

The gunzip command is just a script that calls gzip with the right parameters.

  • Decompress a gzipped archive, replacing archive.tar.gz with archive.tar (the tar file itself is not unpacked):

    gunzip archive.tar.gz

  • Decompress every gzipped file in a directory recursively:

    gunzip --recursive directory/

  • Decompress to a chosen output file, leaving the compressed file in place:

    gunzip --stdout archive.tar.gz > archive.tar

  • Decompress a file and keep the compressed file:

    gunzip --keep archive.tar.gz

  • List the contents of a compressed file:

    gunzip --list file.txt.gz

  • Decompress an archive from stdin:

    cat path/to/archive.gz | gunzip
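
Because both tools read stdin and write stdout, they can sit in the middle of a pipeline without touching the disk. A small sketch; the message text is arbitrary:

```shell
message='hello, gzip'

# gzip -c compresses stdin to stdout; gunzip -c reverses it in the same pipeline.
restored=$(printf '%s\n' "$message" | gzip -c | gunzip -c)

echo "$restored"
```

The same pattern is what makes it possible to compress data on its way over a network connection or into another command.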

Other Compression Utilities

  • Different compression utilities offer different features (e.g. some can protect the archive file with a password) and use different compression techniques.
  • Depending on the situation, one utility may be favoured over the others.
  • The bzip2 utilities use a different compression algorithm called Burrows-Wheeler block sorting, which can compress files smaller than gzip at the expense of more CPU time.
  • Files compressed with the bzip2 command use the .bz2 extension (less commonly .bz).
  • The xz and unxz utilities are used in much the same way as gzip and gunzip, but they implement the Lempel-Ziv-Markov chain algorithm (LZMA), which can keep decompression CPU time on a par with gzip while providing the better compression ratios typically associated with the bzip2 tools.
  • Files compressed with the xz command use the .xz extension.
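
A quick way to compare the three tools on the same input is sketched below. It assumes that whichever of gzip, bzip2, and xz are installed accept -c to write to stdout; missing tools are reported rather than treated as errors, and the sample data is made up:

```shell
workdir=$(mktemp -d)
seq 1 100000 > "$workdir/numbers.txt"

for tool in gzip bzip2 xz; do
    if command -v "$tool" > /dev/null 2>&1; then
        # Each tool reads stdin and, with -c, writes the compressed stream to stdout.
        size=$("$tool" -c < "$workdir/numbers.txt" | wc -c | tr -d ' ')
        echo "$tool: $size bytes" >> "$workdir/sizes.txt"
    else
        echo "$tool: not installed" >> "$workdir/sizes.txt"
    fi
done

sizes=$(cat "$workdir/sizes.txt")
echo "$sizes"

rm -rf "$workdir"
```

Results vary with the data, so it is worth running this kind of comparison on a representative file before settling on a utility.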

Archiving

Archiving combines many files or directories into one file, which is often compressed as well.

The tar archiving utility:

  • Creates a tar file, called a tarball, from multiple files.
  • Is often combined with a compression method, such as gzip or bzip2.
  • Descends recursively into subdirectories by default, so there is no need for an -r option.
  • Has three modes:
    1. Create - makes a new archive out of a series of files.
    2. Extract - pulls one or more files out of an archive.
    3. List - shows the contents of the archive without extracting.

Remembering the modes is key to figuring out the command-line options necessary to do what you want.

In addition to the mode, remember where to specify the name of the archive, as you may be entering multiple file names on a command line.

Create Mode

  • [c]reate an archive and write it to a [f]ile:

    tar cf path/to/target.tar path/to/file1 path/to/file2 ...
    tar --create --file path/to/target.tar path/to/file1 path/to/file2 ...

  • [c]reate a g[z]ipped archive and write it to a [f]ile:

    tar czf path/to/target.tar.gz path/to/file1 path/to/file2 ...
    tar --create --gzip --file path/to/target.tar.gz path/to/file1 path/to/file2 ...

  • [c]reate a g[z]ipped archive from a directory using relative paths:

    tar czf path/to/target.tar.gz -C path/to/directory .
    tar --create --gzip --file path/to/target.tar.gz --directory=path/to/directory .

  • [c]reate a compressed archive and write it to a [f]ile, using [a]rchive suffix to determine the compression program:

    tar caf path/to/target.tar.xz path/to/file1 path/to/file2 ...

Extract Mode

  • E[x]tract a (compressed) archive [f]ile into the current directory [v]erbosely:

    tar xvf path/to/source.tar[.gz|.bz2|.xz]
    tar --extract --verbose --file path/to/source.tar[.gz|.bz2|.xz]

  • E[x]tract a (compressed) archive [f]ile into the target directory:

    tar xf path/to/source.tar[.gz|.bz2|.xz] --directory=path/to/directory

  • E[x]tract files matching a pattern from an archive [f]ile:

    tar xf path/to/source.tar[.gz|.bz2|.xz] --wildcards "*.html"

List Mode

  • Lis[t] the contents of a tar [f]ile [v]erbosely:

    tar tvf path/to/source.tar
    tar --list --verbose --file path/to/source.tar
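
The three modes can be tried together in one round trip. This is a sketch under the assumption that tar has gzip support built in (the z option); the src and out directories and their files are invented for the example:

```shell
workdir=$(mktemp -d)
mkdir -p "$workdir/src/sub" "$workdir/out"
echo 'alpha' > "$workdir/src/a.txt"
echo 'beta'  > "$workdir/src/sub/b.txt"

# Create: c = create, z = gzip, f = the next argument is the archive name.
tar czf "$workdir/backup.tar.gz" -C "$workdir" src

# List: t = show the contents without extracting anything.
listing=$(tar tzf "$workdir/backup.tar.gz")
echo "$listing"

# Extract: x = extract, -C = unpack into this directory.
tar xzf "$workdir/backup.tar.gz" -C "$workdir/out"

# diff -r exits 0 when both trees hold the same files with the same content.
if diff -r "$workdir/src" "$workdir/out/src" > /dev/null; then
    trees="match"
else
    trees="differ"
fi
echo "trees: $trees"

rm -rf "$workdir"
```

Note that src/sub/b.txt appears in the listing even though only src was named, which is the recursive behaviour described earlier.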

Microsoft ZIP Files

ZIP is the default archive format on Microsoft Windows. Although less prevalent on Linux, ZIP files are still well supported through the zip and unzip commands.

  • With tar and gzip/gunzip, the same options can largely be used for both creation and extraction; this is not the case with zip and unzip.
  • The same option can have different meanings in the two commands.

Compression

  • Add files/directories to a specific archive ([r]ecursively):

    zip -r path/to/compressed.zip path/to/file_or_directory1 path/to/file_or_directory2 ...

  • Remove files/directories from a specific archive ([d]elete):

    zip -d path/to/compressed.zip path/to/file_or_directory1 path/to/file_or_directory2 ...

  • Archive files/directories e[x]cluding specified ones:

    zip -r path/to/compressed.zip path/to/file_or_directory1 path/to/file_or_directory2 ... -x path/to/excluded_files_or_directories

  • Archive files/directories with a specific compression level. -1=Fastest (Worst), -9=Slowest (Best); defaults to -6:

    zip -r -9 path/to/compressed.zip path/to/file_or_directory1 path/to/file_or_directory2 ...

  • Create an [e]ncrypted archive, protected by a password that the user enters at a prompt:

    zip -r -e path/to/compressed.zip path/to/file_or_directory1 path/to/file_or_directory2 ...

  • Archive files/directories to a multi-part [s]plit zip file (e.g. 3 GB parts):

    zip -r -s 3g path/to/compressed.zip path/to/file_or_directory1 path/to/file_or_directory2 ...

  • Print the contents of a specific archive:

    zip -sf path/to/compressed.zip

Decompression

  • Extract all files/directories from specific archives into the current directory:

    unzip path/to/archive1.zip path/to/archive2.zip ...

  • Extract archives to a specific path:

    unzip path/to/archive1.zip path/to/archive2.zip ... -d path/to/output

  • Extract files/directories from archives to stdout, printing each file name before its contents:

    unzip -c path/to/archive1.zip path/to/archive2.zip ...

  • Extract archives whose file names are stored in a non-UTF-8 encoding such as GBK (the -O option is only available in some distribution builds of unzip):

    unzip -O gbk path/to/archive1.zip path/to/archive2.zip ...

  • List the contents of a specific archive without extracting them:

    unzip -l path/to/archive.zip

  • Extract specific files from an archive, discarding their stored directory paths ([j]unk paths):

    unzip -j path/to/archive.zip path/to/file_in_archive1 path/to/file_in_archive2 ...
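
The point that one option can mean different things in the two commands shows up with -d: zip uses it to delete entries, unzip uses it to pick the output directory. A hedged sketch, which only runs the demo if both zip and unzip are installed; the site directory and its files are hypothetical:

```shell
if command -v zip > /dev/null 2>&1 && command -v unzip > /dev/null 2>&1; then
    workdir=$(mktemp -d)
    mkdir -p "$workdir/site" "$workdir/out"
    echo '<p>hi</p>' > "$workdir/site/index.html"
    echo 'scratch'   > "$workdir/site/notes.tmp"

    # zip -r: add the directory recursively (-q keeps the output quiet).
    (cd "$workdir" && zip -q -r site.zip site)

    # zip -d: delete an entry from the existing archive.
    zip -q -d "$workdir/site.zip" site/notes.tmp

    # unzip -d: choose the directory to extract into.
    unzip -q "$workdir/site.zip" -d "$workdir/out"

    # List what was actually extracted.
    zip_demo=$(cd "$workdir/out" && find . -type f | sort)
    rm -rf "$workdir"
else
    zip_demo="zip/unzip not installed"
fi
echo "$zip_demo"
```

When both tools are present, only ./site/index.html is extracted, because notes.tmp was removed from the archive before unpacking.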

Advantages of Archiving and Compression

  1. When making a large number of files available, such as the source code to an application or a collection of documents, it is easier for people to download a compressed archive than it is to download files individually.
  2. Log files have a habit of filling disks, so it is helpful to split them by date and compress older versions.
  3. When backing up directories, it is easier to keep them all in one archive than it is to version (update) each file.
  4. Some streaming devices, such as tapes, perform better if a stream of data is sent rather than individual files.
  5. It can often be faster to compress a file before sending it to a tape drive or over a slower network and decompress it at the other end than it would be to send it uncompressed.
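
The log file case from point 2 can be sketched with find and gzip. The 7-day threshold, the directory, and the log file names below are assumptions chosen for the example, not a recommended policy:

```shell
logdir=$(mktemp -d)
echo 'old entries'   > "$logdir/app-2020-01-01.log"
echo 'fresh entries' > "$logdir/app-today.log"

# Backdate the first file so it looks old (touch -t takes CCYYMMDDhhmm).
touch -t 202001010000 "$logdir/app-2020-01-01.log"

# Compress every .log file last modified more than 7 days ago.
find "$logdir" -name '*.log' -mtime +7 -exec gzip {} +

listing=$(ls "$logdir")
echo "$listing"

rm -rf "$logdir"
```

Only the backdated file is replaced by a .gz version; the recent log is left alone so that it can still be appended to.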