How Numbers are Stored

Numbers, as stored in a computer, are of two basic types: fixed point and floating point.

Fixed point is a term used here to refer to whole numbers (numbers without a decimal point), or integers. Floating point is a term used to refer to numbers that have a decimal point. Thus, from the point of view of computing, the number 45 is different from the number 45.0. Intuitively, fixed point numbers are used to count and floating point numbers are used to measure.
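To make the difference between 45 and 45.0 concrete, here is a small sketch in Python. It assumes the standard struct module's 32 bit integer format ("i") and its 32 bit IEEE 754 floating point format ("f"); the variable names count and measure are only illustrative.

```python
import struct

count = 45      # a whole number, used to count
measure = 45.0  # a number with a decimal point, used to measure

# The two values compare equal, but they are different types...
print(type(count).__name__, type(measure).__name__)  # int float
print(count == measure)                               # True

# ...and their 32 bit patterns (shown in hexadecimal) are completely different.
print(struct.pack(">i", count).hex())    # 0000002d
print(struct.pack(">f", measure).hex())  # 42340000
```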

Numbers are stored using the binary number system with a fixed size. For purposes of illustration, I will assume a fixed size of 32 bits.

Storage of Integer (Fixed Point) Numbers

With a 32 bit number size, the first bit is used for the sign, with a positive value represented by a 0 and a negative value represented by a 1. The remaining 31 bits are used to store the binary representation of the number. As we saw in the binary number system link, the decimal number 91 has a binary representation of 1011011. This representation takes 7 places, so to fill the 31 bits we need 24 more digits, all zeros, which gives us 0000000000000000000000001011011. With the sign bit of 0 in front, the full 32 bit pattern is 00000000000000000000000001011011.

The representation of -91 is more complicated. It uses a method called two's complement: invert every bit of the positive value and then add 1. If negative numbers are represented in two's complement form, computers don't need separate circuitry to perform subtraction; they can just add a negative value instead. If you are curious about two's complement, Wikipedia has an article (link). The representation of -91 is 11111111111111111111111110100101.
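The bit patterns above can be reproduced with a short Python sketch. The helper name to_twos_complement is made up for this illustration; it relies on the fact that masking a Python integer to 32 bits yields its two's complement pattern.

```python
def to_twos_complement(value, bits=32):
    """Return the two's complement bit pattern of value as a string of 0s and 1s."""
    if not -(1 << (bits - 1)) <= value < (1 << (bits - 1)):
        raise ValueError(f"{value} does not fit in {bits} bits")
    # Keeping only the low `bits` bits turns a negative value into its
    # two's complement pattern and leaves a positive value unchanged.
    return format(value & ((1 << bits) - 1), f"0{bits}b")

print(to_twos_complement(91))   # 00000000000000000000000001011011
print(to_twos_complement(-91))  # 11111111111111111111111110100101

# Subtraction by addition: 100 - 91 is computed by adding the pattern for -91
# and discarding any carry out of the 32nd bit.
mask = (1 << 32) - 1
difference = (100 + int(to_twos_complement(-91), 2)) & mask
print(difference)  # 9
```

The last lines show why the scheme is convenient: adding the pattern for -91 to 100 and keeping only 32 bits gives the same answer as subtracting 91.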

Since we only have 31 digits to represent the size of a whole number, there is a limit to the whole numbers that can be represented. The largest whole number is 01111111111111111111111111111111, which converts to the decimal value 2,147,483,647. The smallest value is -2,147,483,648. An easier form to remember is that the largest 32 bit integer is 2^31 - 1 and the smallest is -2^31. Note that the exponent is one less than the number of bits. If you wish to represent an integer larger than this value, you will have to use other means.
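These limits are easy to check. The sketch below computes them in Python and then uses ctypes.c_int32, which does no overflow checking, to show what happens when a fixed 32 bit integer is pushed past its largest value. Note that Python's own int type is not limited to 32 bits, so it is one example of the "other means" mentioned above.

```python
import ctypes

BITS = 32
largest = 2 ** (BITS - 1) - 1   # exponent is one less than the number of bits
smallest = -(2 ** (BITS - 1))

print(largest)   # 2147483647
print(smallest)  # -2147483648

# A true 32 bit integer wraps around when it goes past the largest value.
wrapped = ctypes.c_int32(largest + 1).value
print(wrapped)   # -2147483648

# Python's built-in int simply grows as needed.
print(largest + 1)  # 2147483648
```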

Storage of Real (Floating Point) Numbers

Floating point numbers are stored in exponent notation. For example, the number 214.7 in exponent notation is 0.2147 x 10^3. The 0.2147 is called the significand (or fractional part) and the 3 is the exponent. The number 12.75 in decimal converts to binary as 1100.11, so in exponent notation it would be 0.110011 x 2^4, with a significand of 110011 and an exponent of 4 (100 in binary). Representing this number in a 32 bit word uses the first bit for the sign, the next 8 bits for the exponent, and the remaining 23 bits for the significand. This type of representation is sometimes referred to as single precision. (The widely used IEEE 754 standard follows this layout but normalizes the number as 1.10011 x 2^3 and stores the exponent with a fixed offset; the idea is the same.)

Since the significand is limited to 23 bits, any significand that requires more than 23 bits cannot be represented exactly. Most decimal fractions require more than 23 binary digits, which means that most floating point numbers are only approximated on a computer. For example, the decimal number 0.2 when converted to binary is 0.001100110011... with the 0011 pattern repeating forever. Single precision gives approximately 7 significant decimal digits of precision. When arithmetic is performed, the precision of the result tends to get worse.

To combat this problem, many computers allow floating point numbers to be represented in a longer form. Double precision representation uses 64 bits to store a real number, with one bit for the sign, 11 bits for the exponent, and 52 bits for the significand. This works out to approximately 16 significant decimal digits of precision.
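As a rough illustration, the following Python sketch splits the single precision pattern of 12.75 into its three fields and shows that 0.2 is only stored approximately. It assumes the IEEE 754 format used by Python's struct module, which writes 12.75 as 1.10011 x 2^3 (the leading 1 is not stored) and adds 127 to the exponent, so the stored fields differ slightly from the 0.110011 x 2^4 form used above. The helper name float32_fields is made up for this example.

```python
import struct

def float32_fields(x):
    """Return the sign, exponent and significand bits of x in single precision."""
    (raw,) = struct.unpack(">I", struct.pack(">f", x))
    bits = format(raw, "032b")
    return bits[0], bits[1:9], bits[9:]  # 1 + 8 + 23 bits

sign, exponent, significand = float32_fields(12.75)
print(sign, exponent, significand)  # 0 10000010 10011000000000000000000
print(int(exponent, 2) - 127)       # 3, the exponent with the offset removed

# 0.2 cannot be stored exactly: squeezing it into single precision and
# reading it back does not return the value we started with.
stored = struct.unpack(">f", struct.pack(">f", 0.2))[0]
print(stored == 0.2)     # False

# Arithmetic on approximations gives approximate results, even in
# double precision (Python's float type).
print(0.1 + 0.2 == 0.3)  # False
```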

Floating point numbers also have a limit on the size that can be stored. For single precision the limit is approximately 10^-38 to 10^38. For double precision the limit is approximately 10^-308 to 10^308.
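The limits can be inspected directly. In this sketch, sys.float_info describes Python's float type, which is double precision on common platforms. The single precision values are rebuilt from their IEEE 754 bit patterns, and the hexadecimal patterns used for them are an assumption of that format rather than something stated above.

```python
import struct
import sys

# Double precision limits (Python's float).
print(sys.float_info.max)  # about 1.8 x 10^308
print(sys.float_info.min)  # about 2.2 x 10^-308 (smallest normalized value)

# Single precision limits, rebuilt from their bit patterns.
max32 = struct.unpack(">f", bytes.fromhex("7f7fffff"))[0]
min32 = struct.unpack(">f", bytes.fromhex("00800000"))[0]
print(max32)  # about 3.4 x 10^38
print(min32)  # about 1.2 x 10^-38

# A value beyond the single precision limit cannot be packed into 32 bits.
try:
    struct.pack(">f", 1e39)
    fits = True
except OverflowError:
    fits = False
print(fits)  # False
```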

More details

Floating Point
Fixed Point