Bits, Bytes and Words

A single binary digit is known as a bit, and it can represent either the value 0 or 1. Bits can be implemented in computer hardware using switches. A simple way to think about it is that if the switch is on, the bit is 1, and if the switch is off, the bit is 0. Of course, the underlying electrical engineering is much more involved than this, but this is the essential idea.
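
To make the on/off idea concrete, here is a minimal sketch in Python (the choice of language is an assumption; the text names none) that models a single bit as a switch:

    # Model a single bit as an on/off switch.
    switch_on = True
    bit = 1 if switch_on else 0
    print(bit)                    # prints 1

    switch_on = False             # flip the switch off
    bit = 1 if switch_on else 0
    print(bit)                    # prints 0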

A byte is a sequence of bits. Since the mid-1960s a byte has been standardized to be 8 bits in length. 01000001 is an example of a value that may be represented by a single byte. Since there are 8 bits in a byte, there are 2^8 different possible sequences for one byte, ranging from 00000000 to 11111111. This means that a byte can be used to represent 2^8, or 256, distinct values. Like bits, bytes too may be used in sequence to allow for more possibilities. Two contiguous bytes allow for 2^16, or 65,536, distinct values to be represented, while 4 contiguous bytes allow for 2^32, or 4,294,967,296, distinct values.
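
These counts can be verified with a few lines of Python, used here only as a calculator; any language would do:

    # Number of distinct values representable by 1, 2 and 4 contiguous bytes.
    for num_bytes in (1, 2, 4):
        num_bits = num_bytes * 8
        print(f"{num_bytes} byte(s) = {num_bits} bits -> {2 ** num_bits:,} values")

    # 1 byte(s) = 8 bits -> 256 values
    # 2 byte(s) = 16 bits -> 65,536 values
    # 4 byte(s) = 32 bits -> 4,294,967,296 values

    # The example byte above, 01000001, interpreted as a binary number:
    print(int("01000001", 2))     # prints 65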

Computer keyboards have a limited number of keys, and even with the multiple values provided by the use of the Shift, Alt, Ctrl and other such keys, the number of distinct key values is less than 256. Thus, every keystroke may be represented by a unique one-byte binary value.
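
The sketch below (again in Python, as an illustrative assumption) shows a few keystrokes, the one-byte numeric value assigned to each, and that value written out as 8 bits:

    # Character -> numeric code -> 8-bit pattern (these codes follow ASCII).
    for key in ("A", "a", "7", "?"):
        code = ord(key)                       # numeric value of the character
        print(key, code, format(code, "08b"))

    # A 65 01000001
    # a 97 01100001
    # 7 55 00110111
    # ? 63 00111111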

Since each character (letters, decimal digits and special characters such as punctuation marks, etc.) can be represented by a single byte value, a standard is needed to ensure that the mapping of characters to byte values is consistent across computer systems. There are two standard codes that use one byte to represent a character: ASCII (ass'-key) and EBCDIC (ib'-suh-dik). ASCII, the American Standard Code for Information Interchange, is the code that is most commonly used today. EBCDIC, the Extended Binary Coded Decimal Interchange Code, was used by IBM on its large mainframe computers in the past. Since these codes are limited to 256 possible combinations, certain character sets, such as Chinese, Arabic, Japanese, Klingon and others, cannot be represented using them. This problem has been solved by developing another standard, Unicode, which uses 2 bytes for each character. This extension allows 2^16 different symbols to be represented, a total of 65,536. The use of Unicode provides for international standardization and uniformity but consumes twice the amount of computer resources: each character requires two bytes of memory, and twice as many bytes must be transmitted across communication channels.
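
The difference in storage cost can be illustrated with Python's built-in codecs; "utf-16-be" is used here only as a stand-in for the fixed two-bytes-per-character scheme described above:

    # One byte per character (ASCII) versus two bytes per character.
    text = "HELLO"
    ascii_bytes = text.encode("ascii")            # 1 byte per character
    two_byte_encoding = text.encode("utf-16-be")  # 2 bytes per character

    print(len(ascii_bytes))        # prints 5
    print(len(two_byte_encoding))  # prints 10
    print(list(ascii_bytes))       # prints [72, 69, 76, 76, 79]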

The term word refers to the standard number of bits that are manipulated as a unit by any particular CPU. For decades most CPUs had a word size of 32 bits (4 contiguous bytes), but word sizes of 64 bits are now commonplace. The significance of a computer system's word size is that it reflects the amount of data that can be transferred between memory and the processor in one chunk. Likewise, it may reflect the size of the data values that the CPU's ALU can manipulate in one cycle. Computers can process data of larger sizes, but the word size reflects the size of the data values the computer has been designed to process directly. All other things being equal (and they never are), a larger word size implies faster and more capable processing.
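
On a real machine the word size can be probed from Python, for example by asking for the size of a C pointer, which on common desktop systems matches the word size. This is a sketch under that assumption, not a universal rule:

    import struct
    import platform

    word_bytes = struct.calcsize("P")    # size of a C pointer in bytes
    print(f"{word_bytes} bytes = {word_bytes * 8}-bit word")
    print(platform.machine())            # e.g. 'x86_64' on a 64-bit system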

Further Information

Wikipedia contains exhaustive information about bits, bytes and words.

HowStuffWorks also has an article, How Bits and Bytes Work.