Table of Contents
- Compression
- Decompression
- Other Compression Utilities
- Archiving
- Microsoft ZIP Files
- Advantages of Archiving and Compression
Compression
Compression reduces the amount of data needed to store or transmit a file while storing it in such a way that the file can be restored.
- A compression algorithm is a procedure the computer uses to encode the original file and, as a result, make it smaller.
- When talking about compression, there are two types:
  - Lossless: No information is removed from the file.
    - Compressing a file and decompressing it leaves something identical to the original.
    - e.g. GIFs and PNGs
  - Lossy: Information might be removed from the file.
    - It is compressed in such a way that uncompressing a file will result in a file that is slightly different from the original.
    - For instance, an image with two subtly different shades of green might be made smaller by treating those two shades as the same.
    - e.g. JPEGs
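The lossless property can be checked directly: compress a file, decompress it, and compare the result with a saved copy. The following is a minimal sketch, assuming gzip and cmp are installed; the file names and the scratch directory are made up for illustration.

```shell
#!/bin/sh
# Sketch: check that a gzip round trip is lossless.
set -eu
workdir=$(mktemp -d)              # scratch directory (hypothetical)
cd "$workdir"

# Build a repetitive sample file so that it compresses well.
i=0
while [ "$i" -lt 1000 ]; do
    echo "the quick brown fox jumps over the lazy dog"
    i=$((i + 1))
done > sample.txt

cp sample.txt original.txt        # keep a reference copy
gzip sample.txt                   # replaces sample.txt with sample.txt.gz
gzip -d sample.txt.gz             # restores sample.txt

# cmp -s exits with status 0 only if the files are byte-for-byte identical.
if cmp -s original.txt sample.txt; then
    roundtrip=identical
else
    roundtrip=different
fi
echo "round trip: $roundtrip"
```

A lossy format such as JPEG would not pass this kind of byte-for-byte comparison against the original image data.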
The gzip command compresses and decompresses files using the Lempel-Ziv data compression algorithm (LZ77).
Compress a file, replacing it with a gzipped compressed version:
gzip file.ext
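Compress every file in a directory recursively, replacing each file with a gzipped version (gzip compresses files individually and does not produce a single archive of the directory):
gzip --recursive directory/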
Compress a file, keeping the original file:
gzip --keep file.ext
Compress a file specifying the output filename:
gzip --stdout file.ext > compressed_file.ext.gz
Specify the compression level. 1=Fastest (Worst), 9=Slowest (Best); defaults to 6:
gzip -9 --stdout file.ext > compressed_file.ext.gz
List information about a compressed file, including compressed size, uncompressed size, ratio, and uncompressed name:
gzip --list compressed_file.ext.gz
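To see the effect of the compression level, the same file can be compressed at levels 1 and 9 and the sizes compared. This is a rough sketch on generated sample data, assuming gzip, awk, and wc are available; the file names are hypothetical, and the size difference on real files will vary with their content.

```shell
#!/bin/sh
# Sketch: compare gzip levels -1 (fastest) and -9 (best) on the same input.
set -eu
workdir=$(mktemp -d)              # scratch directory (hypothetical)
cd "$workdir"

# Generate sample data: the integers 1..200000, one per line.
awk 'BEGIN { for (i = 1; i <= 200000; i++) print i }' > numbers.txt

gzip -1 -c numbers.txt > fast.gz  # -c writes to stdout, keeping the original
gzip -9 -c numbers.txt > best.gz

# $(( ... )) strips any padding that some wc implementations add.
size_orig=$(($(wc -c < numbers.txt)))
size_fast=$(($(wc -c < fast.gz)))
size_best=$(($(wc -c < best.gz)))

echo "original: $size_orig bytes"
echo "level 1:  $size_fast bytes"
echo "level 9:  $size_best bytes"

# The built-in listing reports the same sizes plus the ratio.
gzip -l fast.gz best.gz
```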
Decompression
Decompression restores compressed data to its original form.
Decompress a file, replacing it with the original uncompressed version:
gzip --decompress file.ext.gz
Decompress every gzipped file in a directory recursively, replacing each with the original uncompressed version:
gzip --decompress --recursive directory/
Decompress a gzipped file specifying the output file name:
gzip --stdout --decompress file.ext.gz > uncompressed_file.ext
The gunzip command is just a script that calls gzip with the right parameters.
Decompress a gzipped archive, replacing the .gz file with the decompressed file (add --force to overwrite an existing output file without being asked):
gunzip archive.tar.gz
Decompress every gzipped file in a directory recursively:
gunzip --recursive directory/
Decompress a file to a target destination:
gunzip --stdout archive.tar.gz > archive.tar
Decompress a file and keep the compressed file:
gunzip --keep archive.tar.gz
List the contents of a compressed file:
gunzip --list file.txt.gz
Decompress an archive from stdin:
cat path/to/archive.gz | gunzip
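Since gunzip is equivalent to gzip in decompress mode, the different invocations above should produce the same output. A small sketch to confirm this, assuming gzip and gunzip are both installed; note.txt is a made-up file name.

```shell
#!/bin/sh
# Sketch: gunzip, gzip -d, and reading from stdin all yield the same data.
set -eu
workdir=$(mktemp -d)              # scratch directory (hypothetical)
cd "$workdir"

printf 'hello, gzip\n' > note.txt
gzip -c note.txt > note.txt.gz    # compress to stdout, keep the original

via_gunzip=$(gunzip -c note.txt.gz)    # gunzip writing to stdout
via_gzip=$(gzip -d -c note.txt.gz)     # gzip in decompress mode
via_stdin=$(gunzip < note.txt.gz)      # compressed data arriving on stdin

echo "gunzip:  $via_gunzip"
echo "gzip -d: $via_gzip"
echo "stdin:   $via_stdin"
```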
Other Compression Utilities
- Different compression utilities offer different features (e.g. some can password-protect the archive file) and use different compression techniques.
- Depending on the situation, one utility may be favoured over the others.
- The bzip2 utilities use a different compression algorithm called Burrows-Wheeler block sorting, which can compress files smaller than gzip at the expense of more CPU time.
  - Files compressed with the bzip2 command use the .bz or .bz2 extension.
- The xz and unxz utilities are functionally similar to gzip and gunzip, but they use the Lempel-Ziv-Markov chain algorithm (LZMA), which can give decompression CPU times on a par with gzip while providing the better compression ratios typically associated with the bzip2 tools.
  - Files compressed with the xz command use the .xz extension.
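One way to choose between these utilities is to run each on a representative file and compare the output sizes. The sketch below assumes gzip is installed and skips bzip2 or xz if they are missing; the sample data and the output names (sample.gzip, sample.bzip2, sample.xz) are invented for the comparison and are not the conventional extensions.

```shell
#!/bin/sh
# Sketch: compare output sizes of gzip, bzip2, and xz on the same input.
set -eu
workdir=$(mktemp -d)              # scratch directory (hypothetical)
cd "$workdir"

# Generate some compressible sample text.
awk 'BEGIN { for (i = 1; i <= 50000; i++) print "record", i, "status=ok" }' > sample.txt

orig_size=$(($(wc -c < sample.txt)))
echo "original: $orig_size bytes"

for tool in gzip bzip2 xz; do
    if command -v "$tool" >/dev/null 2>&1; then
        # All three tools accept -c to write compressed data to stdout.
        "$tool" -c sample.txt > "sample.$tool"
        size=$(($(wc -c < "sample.$tool")))
        echo "$tool: $size bytes"
    else
        echo "$tool: not installed, skipped"
    fi
done
```

Timing each command (for example with the shell's time keyword) would show the CPU cost side of the trade-off.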
Archiving
Archiving is when many files or directories are combined into one file.
The tar archiving utility:
- Creates a tar file, called a tarball, from multiple files.
- Is often combined with a compression method, such as gzip or bzip2.
- Travels recursively into subdirectories by default, so there is no need for an -r option.
- Has three modes:
  - Create - makes a new archive out of a series of files.
  - Extract - pulls one or more files out of an archive.
  - List - shows the contents of the archive without extracting.
Remembering the modes is key to figuring out the command-line options necessary to do what you want.
In addition to the mode, remember where to specify the name of the archive, as you may be entering multiple file names on a command line.
Create Mode
[c]reate an archive and write it to a [f]ile:
tar cf path/to/target.tar path/to/file1 path/to/file2 ...
tar --create --file path/to/target.tar path/to/file1 path/to/file2 ...
[c]reate a g[z]ipped archive and write it to a [f]ile:
tar czf path/to/target.tar.gz path/to/file1 path/to/file2 ...
tar --create --gzip --file path/to/target.tar.gz path/to/file1 path/to/file2 ...
[c]reate a g[z]ipped archive from a directory using relative paths:
tar czf path/to/target.tar.gz -C path/to/directory .
tar --create --gzip --file path/to/target.tar.gz --directory=path/to/directory .
[c]reate a compressed archive and write it to a [f]ile, using [a]rchive suffix to determine the compression program:
tar caf path/to/target.tar.xz path/to/file1 path/to/file2 ...
Extract Mode
E[x]tract a (compressed) archive [f]ile into the current directory [v]erbosely:
tar xvf path/to/source.tar[.gz|.bz2|.xz]
tar --extract --verbose --file path/to/source.tar[.gz|.bz2|.xz]
E[x]tract a (compressed) archive [f]ile into the target directory:
tar xf path/to/source.tar[.gz|.bz2|.xz] --directory=path/to/directory
E[x]tract files matching a pattern from an archive [f]ile:
tar xf path/to/source.tar[.gz|.bz2|.xz] --wildcards "*.html"
List Mode
Lis[t] the contents of a tar [f]ile [v]erbosely:
tar tvf path/to/source.tar
tar --list --verbose --file path/to/source.tar
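The three modes fit together as a create, list, extract cycle. Here is a minimal sketch, assuming a tar that supports gzip via the z option; the project directory and its files are invented for the example.

```shell
#!/bin/sh
# Sketch: one pass through tar's create, list, and extract modes.
set -eu
workdir=$(mktemp -d)              # scratch directory (hypothetical)
cd "$workdir"

# Build a small directory tree to archive.
mkdir -p project/docs
echo "readme" > project/README
echo "guide"  > project/docs/guide.txt

# Create: -C switches into project/ first, so stored paths are relative to it.
tar -czf project.tar.gz -C project .

# List: show the archive contents without extracting anything.
listing=$(tar -tzf project.tar.gz)
echo "$listing"

# Extract: unpack into a separate target directory.
mkdir restored
tar -xzf project.tar.gz -C restored

# diff -r exits with status 0 when both trees have identical contents.
if diff -r project restored >/dev/null; then
    tree_match=yes
else
    tree_match=no
fi
echo "trees match: $tree_match"
```

Note that the archive name always follows the f option, which is why f comes last in the bundled option letters.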
Cheatsheet
tar cheatsheet for system administrators: pic.twitter.com/ljmIyaBkrb (Linuxopsys, @linuxopsys, June 12, 2023)
Microsoft ZIP Files
ZIP is the default archive format on Microsoft Windows. Although not prevalent in Linux, ZIP files are still well supported through the zip and unzip commands.
- Although the same options can be used interchangeably for creation and extraction with tar and gzip/gunzip, this is not the case with zip and unzip.
- The same option can have different meanings for the two commands.
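The -d option is a convenient illustration of that difference: zip treats it as "delete these entries from the archive", while unzip treats it as "extract into this directory". The following sketch assumes Info-ZIP's zip and unzip and skips itself if either is missing; the data directory and file names are hypothetical.

```shell
#!/bin/sh
# Sketch: the -d option means different things to zip and unzip.
set -eu
workdir=$(mktemp -d)              # scratch directory (hypothetical)
cd "$workdir"

if command -v zip >/dev/null 2>&1 && command -v unzip >/dev/null 2>&1; then
    mkdir data
    echo "keep" > data/keep.txt
    echo "drop" > data/drop.txt

    zip -q -r data.zip data            # archive the directory recursively

    # zip: -d deletes the named entry from the archive.
    zip -q -d data.zip data/drop.txt

    # unzip: -d names the directory to extract into.
    unzip -q data.zip -d extracted

    result=$(ls extracted/data)        # only the entry that was not deleted
else
    result="skipped"
fi
echo "extracted entries: $result"
```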
Compression
Add files/directories to a specific archive ([r]ecursively):
zip -r path/to/compressed.zip path/to/file_or_directory1 path/to/file_or_directory2 ...
Remove files/directories from a specific archive ([d]elete):
zip -d path/to/compressed.zip path/to/file_or_directory1 path/to/file_or_directory2 ...
Archive files/directories e[x]cluding specified ones:
zip -r path/to/compressed.zip path/to/file_or_directory1 path/to/file_or_directory2 ... -x path/to/excluded_files_or_directories
Archive files/directories with a specific compression level. -1=Fastest (Worst), -9=Slowest (Best); defaults to -6:
zip -r -9 path/to/compressed.zip path/to/file_or_directory1 path/to/file_or_directory2 ...
Create an [e]ncrypted archive, protected by a password that the user enters at a prompt:
zip -r -e path/to/compressed.zip path/to/file_or_directory1 path/to/file_or_directory2 ...
Archive files/directories to a multi-part [s]plit zip file (e.g. 3 GB parts):
zip -r -s 3g path/to/compressed.zip path/to/file_or_directory1 path/to/file_or_directory2 ...
Print the contents of a specific archive:
zip -sf path/to/compressed.zip
Decompression
Extract all files/directories from a specific archive into the current directory (unzip takes a single archive argument, and any further arguments name members to extract; to process several archives, pass a quoted wildcard such as '*.zip'):
unzip path/to/archive.zip
Extract an archive to a specific path:
unzip path/to/archive.zip -d path/to/output
Extract the contents of the file(s) to stdout alongside the extracted file names:
unzip -c path/to/archive.zip
Extract an archive created on Windows whose file names use a non-UTF-8 encoding such as GBK (the -O option is provided by some distributions' builds of unzip and may not be available everywhere):
unzip -O gbk path/to/archive.zip
List the contents of a specific archive without extracting them:
unzip -l path/to/archive.zip
Extract specific files from an archive, [j]unking their stored directory paths:
unzip -j path/to/archive.zip path/to/file_in_archive1 path/to/file_in_archive2 ...
Advantages of Archiving and Compression
- When making a large number of files available, such as the source code to an application or a collection of documents, it is easier for people to download a compressed archive than it is to download files individually.
- Log files have a habit of filling disks, so it is helpful to split them by date and compress older versions.
- When backing up directories, it is easier to keep them all in one archive than it is to version (update) each file.
- Some streaming devices, such as tapes, perform better if a stream of data is sent rather than individual files.
- It can often be faster to compress a file before sending it to a tape drive or over a slower network and decompress it at the other end than it would be to send it uncompressed.
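As an illustration of the log file and backup points, older logs can be compressed in place and the whole directory bundled into a dated archive. This is a sketch only: the logs directory, the file names, and the seven-day threshold are assumptions chosen for the example, and a real system would more likely rely on a tool such as logrotate.

```shell
#!/bin/sh
# Sketch: compress logs older than a week, then archive the directory by date.
set -eu
workdir=$(mktemp -d)              # scratch directory (hypothetical)
cd "$workdir"

mkdir logs
echo "old entry" > logs/app-old.log
echo "new entry" > logs/app-new.log

# Backdate one log so that it counts as old (timestamp format: CCYYMMDDhhmm).
touch -t 202001010000 logs/app-old.log

# Compress every .log file last modified more than 7 days ago.
find logs -name '*.log' -mtime +7 -exec gzip {} +

# Bundle the directory into a single archive named after today's date.
stamp=$(date +%Y-%m-%d)
tar -czf "logs-$stamp.tar.gz" logs

ls logs
```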