The Size of Data

I frequently get questions about data sizes, bit rates, bandwidth and data speeds, particularly with regard to media (user data, photos, sound and video) and internet usage.  A bit is just a piece of information in a computer, equivalent to the status of a light switch: is it on or off?  If it’s on, it’s stored as a 1 in the computer’s head.  If it’s off, it’s a 0.  All data is manipulated and stored in this way, as 1s and 0s.  It is the computer’s job to use all these “light switches” to manipulate (calculate), store and retrieve the information they contain.

Now for some definitions:

1 byte = 8 bits

1 kilobyte (KB) = 1024 bytes = about 8 thousand bits

1 megabyte (MB) = 1024 kilobytes = about a million bytes = about 8 million bits

1 gigabyte (GB) = 1024 megabytes = about a billion bytes = about 8 billion bits

1 terabyte (TB) = 1024 gigabytes = about a million megabytes = about a trillion bytes = about 8 trillion bits

As you can see, each multiple goes up by a factor of 1024, which is pretty close to a thousand.  If we don’t mind rounding off a bit, it gets pretty easy to approximate conversions from kilobytes to gigabytes and back again.  For example, if you wanted to know how many 5 MB (megabyte) pictures could fit on a 1 TB (terabyte) hard drive, you just divide: a million megabytes divided by 5 equals 200,000 pictures.  If instead you were storing 700 MB movies that you pirated off the net, you could only get about 1400 movies on there.  If they were 7 GB Blu-Ray movie rips, you could only get about 140 onto that same 1 TB hard drive.
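The rounding-friendly math above is easy to sketch (the function name is my own):

```python
# Back-of-the-envelope capacity math, rounding 1024 to 1000 as the
# article does: 1 TB is about a million MB.
TB_IN_MB = 1_000_000

def how_many_fit(drive_mb, file_mb):
    """How many files of a given size fit on a drive (both sizes in MB)."""
    return drive_mb // file_mb

print(how_many_fit(TB_IN_MB, 5))      # 5 MB photos -> 200000
print(how_many_fit(TB_IN_MB, 700))    # 700 MB movies -> 1428
print(how_many_fit(TB_IN_MB, 7000))   # 7 GB Blu-Ray rips -> 142
```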

As an aside, your hard drive manufacturer very misleadingly uses an even billion bytes to define a gigabyte, when it is really 1,073,741,824 bytes, at least as far as the computer is concerned.  That is why a terabyte hard drive only really formats out to 931 gigabytes; it wasn’t really a terabyte in the first place.  (1,000,000,000,000 bytes divided by 1,073,741,824 bytes per gigabyte equals 931 GB.)  You can now see that my previous examples are actually estimated a little too high in practical use.  Oh well.
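The 931 figure falls straight out of the arithmetic:

```python
# A "1 TB" drive counted in the maker's decimal bytes, re-expressed in
# the binary gigabytes (1 GB = 1024**3 bytes) the computer reports.
advertised_bytes = 1_000_000_000_000
binary_gb = 1024 ** 3                        # 1,073,741,824 bytes
print(round(advertised_bytes / binary_gb))   # -> 931
```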

Typical File Sizes:

Word document (in fact, this document): 20 KB

MP3 song file, 4 minutes at 320 kbps: 8-10 MB

Webcam JPG photo, 640 x 480: 50 KB

High-end camera set on medium JPG: 3-6 MB

Same photo in RAW format: 20 MB or more

90-minute DVD movie in AVI: 700 to 2200 MB

Same movie ripped from Blu-Ray: 2 to 25 GB (4 GB is optimal for me)


Bitrate (or Bit Rate, take yer pick)

1 bit per second means that only one of these switches can be read or changed per second.  The English alphabet requires 8 bits to represent each letter, so if a computer were only capable of 1 bps and you wanted to store your name on it, it would take 8 seconds times the number of letters in your name to do so.  Fortunately, computers are MUCH faster than that.
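The arithmetic is trivial to write down ("Alice" is just an example name):

```python
# At 1 bit per second, storing text takes 8 seconds per letter,
# since each letter needs 8 bits.
def seconds_to_store(name, bits_per_second=1):
    return len(name) * 8 / bits_per_second

print(seconds_to_store("Alice"))   # 5 letters * 8 bits -> 40.0 seconds
```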

We care about bitrate because it determines performance.  How fast is our internet connection or that big download?  How good is the sound I am listening to?  How sharp is the picture on the screen while watching a Blu-Ray?  These are all affected by the bitrates involved.  As a very general rule, the higher the bitrate, the better the performance.  Also, it is important to understand that these rates are almost always specified in bits, while actual data sizes are almost always in bytes.

 

Bitrate Examples

Audio

A retail audio CD can contain up to about 650 MB (megabytes; 5,452,595,200 bits) of data, and usually runs up to about 70 minutes.  If we reverse calculate the real bitrate, it comes out to 5,452,595,200 bits divided by 4200 seconds = 1.3 megabits per second (Mbps).

Going the other direction, audio CDs are created at a sampling rate of 44,100 Hz, at 16 bits per sample, times 2 channels for stereo.  I know, I know, that all sounds pretty complicated, but if we multiply it all out for 70 minutes of run time, it comes to a total of 5,927,040,000 bits created, so we are at least in the ballpark.  The point is, audio playback from CD runs at about 1.3 Mbps.
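Both calculations from the two paragraphs above, side by side:

```python
# CD audio bitrate, computed two ways.
# 1) From disc capacity: 650 MB over 70 minutes.
capacity_bits = 650 * 1024 * 1024 * 8   # 5,452,595,200 bits
seconds = 70 * 60                       # 4,200 s
print(capacity_bits / seconds)          # ~1.3 million bits per second

# 2) From the sampling parameters: rate * bit depth * channels.
stream_bps = 44_100 * 16 * 2            # 1,411,200 bits/s
print(stream_bps * seconds)             # 5927040000 bits over 70 minutes
```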

MP3 files are created according to an encoding bitrate as well.  First, all of the data is extracted from the audio CD in a process known as “ripping”.  Then that raw data (usually in WAV format) is “encoded” at a bitrate that will determine the final audio quality.  Encoding is the process by which the software determines which bits of the raw data to keep and which to discard, then compresses the remaining data into MP3 format.

320 kbps – Almost indistinguishable from the original CD; the new standard bitrate.

128 kbps – Lowest reasonable audio quality for music, used to be typical for MP3s ten years ago.

64 kbps – Not recommended for music, but good for speech, audio books.

32 kbps – Poor quality, often used for web audio and internet radio stations.

At 320 kbps, MP3 files take up about a quarter as much space as a raw rip.  Audio purists swear by a format called FLAC, which is lossless: it preserves 100% of the data ripped from the CD.  Its compression is much gentler than MP3’s, though, so a FLAC rip takes several times the space of an MP3 on your hard drive, plus you have to have FLAC capability in your playback software or devices.
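The “quarter as much space” figure checks out against the raw CD stream rate:

```python
# 320 kbps MP3 vs. the raw 1,411.2 kbps CD stream.
cd_kbps = 44_100 * 16 * 2 / 1000   # uncompressed CD audio, kilobits/s
mp3_kbps = 320
print(mp3_kbps / cd_kbps)          # ~0.23, i.e. roughly a quarter
```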

 

Video

Video quality varies tremendously with the bitrate used to create the file.  Blu-Ray disks typically play back at approximately 40 Mbps, and DVDs play back at about 10 Mbps.  Rips are at a much lower bitrate.  If you download a pirated Blu-Ray rip that is 6 gigabytes in size, and it is exactly 2 hours long, the playback bitrate works out to a little over 7 Mbps.  A typical 700 MB AVI file that runs 90 minutes works out to about 1 Mbps.  The lower you go, the worse the picture gets.
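The rip bitrates above can be recomputed from file size and running time; this sketch treats sizes as binary megabytes, which matches the figures in the text:

```python
# Average playback bitrate (Mbps) from file size and running time.
def playback_mbps(size_mb, minutes):
    bits = size_mb * 1024 * 1024 * 8
    return bits / (minutes * 60) / 1_000_000

print(playback_mbps(6 * 1024, 120))   # 6 GB rip, 2 hours -> ~7.2 Mbps
print(playback_mbps(700, 90))         # 700 MB AVI, 90 min -> ~1.1 Mbps
```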

 

Data Transfer, Internet Speed

When it comes to data transfer, it is important to distinguish between bandwidth and throughput.  I see some discrepancies on the net about how these two terms are applied, but generally bandwidth is the size of the pipe and throughput is how much is actually going through it.  Bandwidth is usually expressed in multiples of bits/second while throughput is usually in multiples of bytes/second.

For example, you may have purchased a 5 megabit connection from your internet service provider.  That means it is theoretically capable of transferring 5 megabits of information every second.  In practical terms, though, it usually only works out to about 80% of that, and it’s expressed in bytes per second: 5 Mbps of bandwidth works out to maybe 500 kilobytes per second of throughput.  Under those conditions, your connection is saturated, or entirely used up in the transfer, such that you can’t actually do anything else with it.  If you can’t surf the net at home, or it is super slow, you need to go through the house and see what devices are downloading what.  If your two teens are both downloading movies at the same time, and they have not configured their torrent clients properly, they will suck the pipe dry and you won’t be able to check your email.
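The conversion is just divide-by-8 plus an overhead haircut; 80% efficiency here is a rough assumption, and real-world numbers vary:

```python
# Bandwidth (bits/s) to practical throughput (kilobytes/s),
# assuming roughly 80% efficiency after protocol overhead.
def throughput_kb_per_s(bandwidth_mbps, efficiency=0.8):
    return bandwidth_mbps * 1_000_000 / 8 * efficiency / 1000

print(throughput_kb_per_s(5))   # 5 Mbps line -> about 500 KB/s
```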

 

USB Connections, Hard Drives

There are basically three flavors of USB, versions 1.1, 2.0 and 3.0, which run at 12, 480, and 5000 megabits/second respectively.  However, USB carries a lot of data overhead with it, so typical throughputs are much lower.  Your USB 2.0 flash drive (the standard on the vast majority of computers) can theoretically transfer stuff at 60 megabytes/second maximum, but in reality it will be much closer to 10-12 MB/second.  This also applies to your external hard drive.  If you only have USB 2.0, it can only transfer about 10-12 megabytes/second to the drive, regardless of how fast the drive itself may be.

Most external drives are now sold with USB 3.0 capability, which should give you up to about 10 times the throughput, so maybe 100 MB/s.  You do have to have USB 3.0 on your machine, though; those are the blue ports.  Black ports are only USB 2.0; save those for your mouse, keyboard, printers, etc.

The SATA interface that a hard drive uses inside a modern computer runs at either 3000 or 6000 megabits/second, and drives are rated for one standard or the other.  The SATA bus is much more efficient than USB, so practical throughput is much closer to optimal.  A 6000 megabit/second hard drive can deliver 600 megabytes/second of throughput, where the maximum theoretical would be 750 megabytes/second.
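The 600 MB/s figure actually falls out of a detail of the SATA spec not mentioned above: SATA uses 8b/10b line encoding, so every 8 data bits travel as 10 bits on the wire.

```python
# SATA 6 Gbps line rate, minus 8b/10b encoding overhead, in MB/s.
line_rate_mbps = 6000
data_mbps = line_rate_mbps * 8 / 10   # 4800 Mbps of actual data
print(data_mbps / 8)                  # -> 600.0 MB/s
```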

Posted in Tech Ed