C provides several floating-point types; the two used most often are

  • Float
  • Double

Float:

A variable declared with the float data type stores a 32-bit floating-point value (e.g. 33.34). The binary representation of a value with a fractional part is completely different from the plain decimal-to-binary conversion used for integers.

To represent a fractional value in binary, we follow the procedure defined in the IEEE 754 floating-point standard.

The range of float is approximately -3.4E+38 to +3.4E+38, so a float variable can store a maximum value of about 3.4 x 10^38. Here E+38 means x 10^38.

IEEE floating-point representation for float:

For a variable of type float, the 32 bits are divided into three parts. They are

  • Sign (1 bit)
  • Exponent (8 bits)
  • Mantissa (23 bits)

The IEEE 754 representation of the value 33.34 is derived step by step in the example further below.

Double:

The size of a double is 8 bytes, i.e. 64 bits.

  • The same IEEE 754 floating-point standard is used to represent double values.
  • The range of double is approximately -1.8E+308 to +1.8E+308 (often quoted as 1.7E+308). A double variable can store a maximum value of about 1.797×10^308.
  • A double works the same way as a float, but its wider exponent gives a larger range and its wider mantissa gives more precision (about 15 decimal digits, versus about 6 for float).

IEEE floating-point representation for double:

For a variable of type double, the 64 bits are divided into

  • Sign (1 bit)
  • Exponent (11 bits)
  • Mantissa (52 bits)

For example, a computer stores a given floating-point value by following the procedure in IEEE 754. Let’s see the procedure step by step, using a float.

Step 1: Consider the value 33.34.

  • 33.34 is a positive value, so the sign bit = 0.

Step 2: Find the binary equivalent of the value to the left of the decimal point, i.e. 33. Dividing repeatedly by 2 and reading the remainders from last to first gives 100001.

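Step 3: Now find the binary equivalent of the fractional part, 0.34. Multiply the fraction by 2, write down the integer part (0 or 1) as the next bit, and repeat with the remaining fraction. 0.34 never terminates in binary, so we stop once a fraction value repeats.

Combining both parts, we get 100001.010101110000101000111...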

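Step 4: Shift the binary point to the left until only the leading ‘1’ remains in front of it. Here that takes 5 shifts, so we can write the value as:

1.00001010101110000101000111... x 2^5

The mantissa is the part after the binary point; the leading 1 is not stored. A float has room for only 23 mantissa bits, so the value is rounded to the nearest 23-bit pattern:

Mantissa = 00001010101110000101001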

Step 5: Now we need to bias the exponent value.

Exponent = (127 + 5) = 132

  • The bias is calculated using the formula 2^(k-1) - 1, where k is the number of bits in the exponent.
  • For float, the bias is 2^(8-1) - 1, i.e. 127.
  • For double, the bias is 2^(11-1) - 1, i.e. 1023.

Putting the pieces together for 33.34 stored as a float:

  • The sign bit is 0, because the value is positive.
  • The exponent is 132, i.e. 10000100 in binary.
  • The mantissa is 00001010101110000101001 (rounded to 23 bits).

How to declare and define float and double variables?

Write the keyword float or double before the variable name to specify the variable's type.

Example C program using the float and double data types

#include <stdio.h>

int main(void)
{
    /* <datatype> <variable 1>, <variable 2>, <variable 3>, ... <variable N>; */

    /* Declaring a float variable */
    float value;
    /* Declaring a double variable */
    double value2;
    /* Assigning a value to the float variable (note the f suffix) */
    value = 33.34f;
    /* Assigning a value to the double variable */
    value2 = 33.34;
    /* Printing both values on the screen */
    printf("The float value is %f and double value is %lf\n", value, value2);
    return 0;
}

Explanation:

  • When assigning a constant to a float variable, we should add “f” or “F” at the end; otherwise the constant is treated as a double by default.
  • %f is the format specifier printf uses for floating-point values. A float argument is automatically promoted to double when it is passed to printf.
  • %lf is also accepted by printf for a double (since C99) and behaves the same as %f. The distinction matters in scanf, where %f reads a float and %lf reads a double.
  • The type modifiers signed and unsigned cannot be applied to float and double variables.
