C has two main floating-point data types:

• Float
• Double

## Float:

A variable declared with the float data type stores a 32-bit floating-point value (e.g. 33.34). Representing a fractional value in binary works quite differently from converting a whole number to binary.

To represent a fractional value in binary, the computer follows the procedure defined in the IEEE 754 floating-point standard.

The range of float is approximately -3.4E+38 to +3.4E+38, so a float variable can store values with a magnitude of up to about 3.4 x 10^38 (E+38 means 10^38).

## IEEE floating-point representation for float:

For a variable with the float data type, the 32 bits are divided into three parts:

• Sign (1 bit)
• Exponent (8 bits)
• Mantissa (23 bits)

The step-by-step example below derives the IEEE floating-point representation of the value 33.34 stored as a float.

## Double:

The size of double is 8 bytes, i.e. 64 bits.

• Double values are represented using the same IEEE 754 floating-point standard.
• The range of double is approximately -1.7E+308 to +1.7E+308, so a double variable can store values with a magnitude of up to about 1.7 x 10^308.
• A double works like a float but has more exponent and mantissa bits, so it offers both a larger range and higher precision (about 15 significant decimal digits versus about 6 for float).

## IEEE floating-point representation for double:

For a variable with the double data type, the 64 bits are divided into three parts:

• Sign (1 bit)
• Exponent (11 bits)
• Mantissa (52 bits)

For example, the computer stores a floating-point value by following the procedure defined in IEEE 754. Let's go through that procedure step by step for the value 33.34 stored as a float.

Step 1: Consider the decimal value 33.34.

• 33.34 is positive, so the sign bit = 0.

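Step 2: Find the binary equivalent of the integer part (the value to the left of the decimal point), 33. Dividing repeatedly by 2 and reading the remainders from last to first gives 33 = 100001.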

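Step 3: Find the binary equivalent of the fractional part, 0.34. Repeatedly multiply the fractional part by 2 and take the integer part of each product (0 or 1) as the next bit, continuing until a fractional part that has already appeared comes up again; from that point on the bits repeat.

For 0.34 this gives 010101110000101000111, so 33.34 in binary is 100001.010101110000101000111... (the fraction bits repeat indefinitely).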

Step 4: Normalize the number by moving the binary point to the left until only a single '1' remains in front of it. Here the point moves 5 places, so the value can be written as:

1.00001010101110000101000111 x 2^5

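The bits after the binary point form the mantissa; the leading 1 is implicit and is not stored. A float keeps only 23 mantissa bits, so the 26 bits above are rounded to the nearest 23-bit value. The first discarded bit is 1 and the bits after it are not all zero, so the last kept bit is rounded up:

Mantissa = 00001010101110000101001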

Step 5: Add the bias to the exponent.

Exponent = 127 + 5 = 132, which is 10000100 in 8-bit binary.

• The bias is calculated with the formula 2^(k-1) - 1, where k is the number of exponent bits.
• For float, k = 8, so the bias is 2^7 - 1 = 127.
• For double, k = 11, so the bias is 2^10 - 1 = 1023.

Putting it all together, 33.34 stored as a float has:

• sign bit: 0 (the value is positive)
• exponent: 10000100
• mantissa: 00001010101110000101001 (the mantissa bits rounded to the 23 bits a float can hold)

## How to declare and define the float and double variable?

To declare a float variable, write the keyword float before the variable name; to declare a double variable, use the keyword double.

Example C program using the float and double data types:

```c
#include <stdio.h>

int main(void)
{
    /* Syntax: <datatype> <variable 1>, <variable 2>, ..., <variable N>; */
    float value;     /* Declaring a float variable */
    double value2;   /* Declaring a double variable */

    value = 33.34f;  /* Assigning a value to the float variable */
    value2 = 33.34;  /* Assigning a value to the double variable */

    /* Printing the values to the screen */
    printf("The float value is %f and double value is %lf\n", value, value2);
    return 0;
}
```

Explanation:

• When assigning a literal to a float variable, add the suffix "f" or "F" (e.g. 33.34f). Without the suffix, the literal has type double and is converted to float on assignment.
• In printf, %f prints either a float or a double, because a float argument is promoted to double when it is passed to printf.
• %lf also prints a double in printf (since C99). In scanf, however, the specifier must match the type exactly: %f reads into a float and %lf reads into a double.
• The signed and unsigned type modifiers cannot be applied to float or double; floating-point types are always signed.