C has two main floating-point types:

• Float
• Double

## Float:

A variable declared with the `float` data type stores a 32-bit value that can have a fractional part (e.g. 33.34). Converting such a value to binary works differently from converting a whole number to binary.

To represent a value with a fractional part in binary, the computer follows the procedure defined in the IEEE 754 floating-point standard.

The range of `float` is approximately -3.4E+38 to +3.4E+38, so a float variable can store a maximum value of about 3.4 x 10^38. Here E+38 means 10^38.

## IEEE floating-point representation for float:

For a variable with the float data type, the 32 bits are divided into three parts:

• Sign (1 bit)
• Exponent (8 bits)
• Mantissa (23 bits)

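For example, the value `33.34` is stored with sign `0`, exponent `10000100` and mantissa `00001010101110000101001`. The procedure that produces these bits is worked through step by step further on.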

## Double:

The size of a double is 8 bytes, i.e. 64 bits.

• The same IEEE 754 floating-point standard is used to represent double values.
• The range of double is approximately -1.7E+308 to +1.7E+308, so a double variable can store a maximum value of about 1.7×10^308.
• A double works the same way as a float, but its extra bits give it both a wider range and more precision: about 15 significant decimal digits, compared with about 6 for a float.

## IEEE floating-point representation for double:

For a variable with the double data type, the 64 bits are divided into three parts:

• Sign (1 bit)
• Exponent (11 bits)
• Mantissa (52 bits)

For example, here is how the computer stores the value 33.34 as a float, following the procedure given in IEEE 754 step by step. The same steps apply to a double, using the wider exponent and mantissa fields.

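Step 1: Consider the value 33.34.

• 33.34 is a positive value, so the sign bit = 0.

Step 2: Find the binary equivalent of the part to the left of the decimal point, i.e. 33. Dividing by 2 repeatedly and reading the remainders in reverse order gives 100001.

Step 3: Find the binary equivalent of the fractional part, 0.34. Multiply the fraction by 2, record the integer part of the result (0 or 1) as the next bit, and continue with the remaining fraction. Repeat until the fraction starts to repeat or enough bits have been produced.

Combining both parts, we get 100001.010101110000101000111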

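Step 4: Shift the binary point to the left until only the leading ‘1’ remains in front of it. Here that takes 5 shifts, so we can write the value as:

1.00001010101110000101000111 x 2^5

The bits after the binary point form the mantissa. A float stores only 23 of them, and the discarded bits (111...) are more than half of the last kept place, so the value is rounded up to:

`Mantissa = 00001010101110000101001`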

Step 5: Now we need to bias the exponent value.

Exponent = (127 + 5) = 132, which is 10000100 in binary.

• The bias is calculated with the formula `2^(k-1) - 1`, where k is the number of exponent bits.
• For float, the bias is `2^(8-1) - 1`, i.e. 127.
• For double, the bias is `2^(11-1) - 1`, i.e. 1023.

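The value we took is positive, so the stored fields are:

• The sign bit is 0.
• The exponent is 10000100.
• The mantissa is 00001010101110000101001 (23 bits).

Together they form the 32-bit pattern 0 10000100 00001010101110000101001, which is 0x42055C29 in hexadecimal.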

## How to declare and define the float and double variable?

Write the keyword `float` (or `double`) before the variable name to declare a variable of that type.

Example C program using the float and double data types:

```c
#include <stdio.h>

int main(void)
{
    /* <datatype> <variable 1>,<variable 2>,<variable 3>...... <variable N>; */

    /* Declaring a float variable */
    float value;
    /* Declaring a double variable */
    double value2;
    /* Assigning a value to the float variable (note the f suffix) */
    value = 33.34f;
    /* Assigning a value to the double variable */
    value2 = 33.34;
    /* Printing the values on the screen */
    printf("The float value is %f and double value is %lf\n", value, value2);
    return 0;
}
```

Explanation:

• When assigning a constant to a float variable, append `f` or `F` to it; without the suffix, the constant has type `double`.
• `%f` is the `printf` format specifier for floating-point values. A `float` argument is promoted to `double` when it is passed to `printf`, so `%f` works for both types.
• `%lf` is also accepted by `printf` for a `double` (since C99) and behaves the same as `%f`. With `scanf` the distinction matters: `%f` reads a `float` and `%lf` reads a `double`.
• The type modifiers `signed` and `unsigned` cannot be applied to float and double variables.
