A Basic Guide to Data Compression

Data is all around us: every image you see, every video you watch, every song you hear, and every document you read is some form of data. The need for data is greater today than ever before. Organizations of every size use it, from digital marketing companies and large corporations to your local pizza joint. Even streaming a movie on YouTube or Netflix means the service is sending data to you. But challenges arise when transferring data, especially large amounts of it.

This is where data compression comes in. As the name suggests, data compression reduces the size of data. How much the data is compressed depends on the user’s requirements, but the underlying concept stays the same.

In this article, we’ll look at what data compression is and what types of it exist. It’s aimed at anyone interested in learning about data compression, whether as a hobby or as a career.

What is Data Compression?

As mentioned earlier, data compression is the process of reducing the size of a piece or set of data. The objective is to make the file smaller while losing as little quality as possible, ideally none.

People use data compression for various reasons. One is easier sharing. Suppose you want to share a 100 GB file. You cannot send a file that large via email; you’ll have to use an online file-sharing service. Even then, uploading the file takes a long time. In situations like this, data compression comes in handy. Reducing the file from 100 GB to around 60 or 75 GB makes upload and download times more bearable.
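To put rough numbers on that example, here is a minimal sketch of the transfer-time arithmetic. The 50 Mbps connection speed is an assumption chosen for illustration, as is the use of decimal units (1 GB = 1000 MB); the function name `transfer_hours` is hypothetical.

```python
def transfer_hours(size_gb: float, speed_mbps: float) -> float:
    """Hours needed to move size_gb gigabytes at speed_mbps megabits per second."""
    # Decimal units (an assumption): 1 GB = 1000 MB, and 1 byte = 8 bits.
    megabits = size_gb * 1000 * 8
    seconds = megabits / speed_mbps
    return seconds / 3600


# Assumed 50 Mbps link; the file sizes are the ones used in the example above.
for size in (100, 75, 60):
    print(f"{size} GB: {transfer_hours(size, 50):.1f} hours")
```

Under these assumptions, the 100 GB file takes about 4.4 hours, while the 75 GB and 60 GB versions take about 3.3 and 2.7 hours. Real transfers also depend on protocol overhead and network conditions, which this sketch ignores.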

Data compression does have its limitations, though. You cannot keep compressing a file indefinitely. Once its redundancy has been removed, compressing it again saves little or nothing. Shrinking it further means discarding information, and the quality of the data takes a hit.
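One way to see this limit is to compress the same data several times and watch the size. The sketch below uses Python’s standard `zlib` module; the sample text, the word list, and the function name `sizes_after_passes` are made up for illustration.

```python
import random
import zlib


def sizes_after_passes(data: bytes, passes: int = 3) -> list[int]:
    """Return the size in bytes of the original and after each zlib pass."""
    sizes = [len(data)]
    for _ in range(passes):
        # Each pass compresses the output of the previous pass.
        data = zlib.compress(data, 9)
        sizes.append(len(data))
    return sizes


# Hypothetical sample: text built from a small vocabulary (fixed seed for repeatability).
rng = random.Random(0)
words = ["data", "file", "image", "audio", "video", "share", "upload", "size"]
sample = " ".join(rng.choice(words) for _ in range(20000)).encode("ascii")

sizes = sizes_after_passes(sample)
print(sizes)  # original size, then the size after pass 1, 2, and 3
```

The first pass should shrink the text considerably. Later passes typically gain nothing and may even add a few bytes of overhead, because the output of a compressor has very little redundancy left to remove.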

There are two types of data compression:

  1. Lossy Compression
  2. Lossless Compression

We’ll discuss them in detail.

Lossy Compression

In lossy compression, the size of a file is reduced by permanently removing information judged to be less important. Lossy compression is common for audio, video, and image files. Popular lossy formats include MP3 and JPEG.

We’ll use these two formats to understand lossy compression better.

When an image is compressed to JPEG, detail that the eye is unlikely to miss is discarded. A simplified example is merging several close shades of a color into one basic shade, which trades subtle color detail for a smaller file.
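The shade-merging idea can be sketched in a few lines. This is not how JPEG works internally; it is only a simplified illustration of discarding color detail, and the function name `quantize` and the sample pixel values are hypothetical.

```python
def quantize(values: list[int], levels: int = 4) -> list[int]:
    """Map 8-bit channel values (0-255) onto `levels` evenly spaced shades."""
    step = 256 // levels
    # Snap each value to the middle of the band it falls into.
    return [v // step * step + step // 2 for v in values]


# Hypothetical row of pixel intensities: five light shades, three darker ones.
row = [200, 201, 203, 198, 205, 90, 92, 91]
reduced = quantize(row)

print(reduced)
print(len(set(row)), "distinct shades before,", len(set(reduced)), "after")
```

With only two distinct shades left, the row is far easier to store compactly, but the original eight values cannot be recovered. That one-way loss is what makes the approach lossy.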

The same goes for MP3. When audio is encoded as MP3, sounds that most listeners cannot hear are omitted. This results in a smaller file without noticeable loss of quality, because the parts that were removed have little effect on the listening experience. However, the more heavily the audio is compressed, the more apparent the loss in quality becomes.

Lossy compression works well for everyday media. But it isn’t recommended for tasks where every bit of information is vital. For example, photographers who need accurate colors and crisp edges free of compression artifacts generally avoid lossy compression when reducing the size of their images.

Lossless Compression

Lossless compression is a type of data compression in which no information is lost when a file is compressed. Unlike lossy compression, it preserves every bit of the original, important or not, so the file can be reconstructed exactly. Lossless compression achieves this by eliminating redundancy.

Say, for example, that you want to compress the string “AAAAAAABBCCCCCCCCCDDD”. One simple lossless technique, known as run-length encoding, groups each run of repeated characters together. Instead of storing every character separately, it stores the character once, followed by the number of times it repeats. So the string “AAAAAAABBCCCCCCCCCDDD” is stored as “A7B2C9D3”, which takes 8 characters instead of 21. For data with long runs of repeated values, this saves a lot of space without losing any data.
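The example above can be sketched as a pair of functions. This is a minimal illustration that assumes the input contains no digits; the names `rle_encode` and `rle_decode` are hypothetical.

```python
import re
from itertools import groupby


def rle_encode(text: str) -> str:
    """Replace each run of a repeated character with the character and its count."""
    return "".join(f"{char}{len(list(run))}" for char, run in groupby(text))


def rle_decode(encoded: str) -> str:
    """Expand character/count pairs back into the original text."""
    pairs = re.findall(r"(\D)(\d+)", encoded)
    return "".join(char * int(count) for char, count in pairs)


original = "A" * 7 + "B" * 2 + "C" * 9 + "D" * 3
encoded = rle_encode(original)

print(encoded)                          # A7B2C9D3
print(rle_decode(encoded) == original)  # True
```

Decoding returns exactly the original string, which is what makes the scheme lossless. Note that the scheme only pays off when runs are long: a string such as “ABC” would grow to “A1B1C1”.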

A common example of lossless data compression is the zip file. Zip archives often hold data that must survive intact, such as important documents or executable files. Zip files are often extracted using WinRAR, but there are alternatives as well, including the built-in support in most operating systems.
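As a rough illustration, the sketch below writes some data into a zip archive in memory and reads it back using Python’s standard `zipfile` module. The file name `notes.txt`, the payload, and the helper `zip_round_trip` are made up for the example.

```python
import io
import zipfile


def zip_round_trip(name: str, payload: bytes) -> tuple[int, bytes]:
    """Store payload in an in-memory zip archive, then extract it again."""
    buffer = io.BytesIO()
    # ZIP_DEFLATED enables compression; the default mode only stores files.
    with zipfile.ZipFile(buffer, "w", compression=zipfile.ZIP_DEFLATED) as archive:
        archive.writestr(name, payload)
    archive_bytes = buffer.getvalue()

    with zipfile.ZipFile(io.BytesIO(archive_bytes)) as archive:
        restored = archive.read(name)
    return len(archive_bytes), restored


# Hypothetical payload with plenty of repetition.
payload = b"hello compression " * 500
archive_size, restored = zip_round_trip("notes.txt", payload)

print(len(payload), "bytes before,", archive_size, "bytes as a zip archive")
print(restored == payload)  # True
```

The extracted bytes match the original exactly, even though the archive is much smaller than the payload.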

Nobody likes losing bits of vital data when compressing, and for some files any loss is unacceptable. Applying lossy compression to an executable would corrupt it, because every part of the file is required for it to run properly. An executable with missing or altered bits is unusable.

You now know the basics of data compression. What’s discussed in this article is just the tip of the iceberg. The field also covers specific algorithms, a variety of tools, multiple compression techniques, and more.

Data compression is a vital part of everyday life. You may not know it, but many websites, especially audio and video streaming websites, rely on data compression for quick transfer and use of data. Since its earliest instance in 1838 in the form of Morse code, data compression has come a long way. One can expect more changes in the years to come.