C provides two commonly used floating-point types. They are
- Float
- Double
Float:
Variables declared with the float data type can store a 32-bit floating-point value (e.g. 33.34). Representing a fractional value in binary works differently from the ordinary decimal-to-binary conversion used for integers.
To represent such a value in binary, computers follow the rules given in "The IEEE 754 floating-point standard".
The range of float is -3.4E+38 to +3.4E+38, so a float variable can store a maximum value of about 3.4 x 10^38. Here E+38 means 10^38.
IEEE floating-point representation for float:
For a variable with the float data type, the 32 bits are divided into three parts. They are
- Sign (1 bit)
- Exponent (8 bits)
- Mantissa (23 bits)
The step-by-step example later in this article derives the IEEE floating-point representation of the value 33.34.
Double:
The size of a double is 8 bytes, i.e. 64 bits.
- The same IEEE 754 floating-point standard is used to represent double values.
- The range of double is -1.7E+308 to +1.7E+308, so a double variable can store a maximum value of about 1.7 x 10^308.
- A double works the same way as a float, but its wider exponent and mantissa give it both a larger range and more precision (about 15 significant decimal digits, compared with about 6 for float).
IEEE floating-point representation for double:
For a variable with the double data type, the 64 bits are divided into three parts:
- Sign (1 bit)
- Exponent (11 bits)
- Mantissa (52 bits)
For example, the computer stores a floating-point value by following the procedure given in IEEE 754. Let's walk through the procedure step by step for a float.
Step 1: Consider the decimal value 33.34.
- 33.34 is positive, so the sign bit = 0.
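Step 2: Find the binary equivalent of the integer part (the value to the left of the decimal point), i.e. 33. Repeatedly dividing by 2 and reading the remainders in reverse order gives 33 = 100001 in binary.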
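Step 3: Now we need to find the binary equivalent of the fractional part, 0.34. Multiply the fraction by 2, record the integer part of the result (0 or 1) as the next bit, and continue with the remaining fraction. Repeat until the fraction becomes 0 or a previous fraction repeats.
For 0.34 the fraction never becomes 0 (0.68 reappears after 21 steps, so the bit pattern repeats). Combining both parts, we get 100001.010101110000101000111...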
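Step 4: Shift the binary point to the left until only a single '1' remains to its left. Here the point moves 5 places, so the exponent is 5:
1.00001010101110000101000111... x 2^5
The mantissa is the part after the binary point, kept to 23 bits. The bits after the 23rd (111...) are more than half of the last place, so the value is rounded up:
Mantissa = 00001010101110000101001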
Step 5: Now we need to bias the exponent value.
Exponent = (127 + 5) = 132, which is 10000100 in 8-bit binary.
- The bias value is calculated using the formula 2^(k-1) - 1, where k is the number of bits in the exponent.
- For float, the bias is 2^(8-1) - 1 = 2^7 - 1, i.e. 127.
- For double, the bias is 2^(11-1) - 1 = 2^10 - 1, i.e. 1023.
As we know, the value we took was positive, so:
- The sign bit is 0
- The exponent is 10000100
- The mantissa is 00001010101110000101001
Put together, the 32 bits are 0 10000100 00001010101110000101001, which is 0x42055C29 in hexadecimal.
How to declare and define float and double variables?
Use the keyword float (or double) before the variable name to declare a variable of type float (or double).
Example C program using the float and double data types
#include <stdio.h>

int main(void)
{
    /* <datatype> <variable 1>,<variable 2>,<variable 3>...... <variable N>; */
    /* Declaring a float variable */
    float value;
    /* Declaring a double variable */
    double value2;
    /* Assigning a value to the float variable */
    value = 33.34f;
    /* Assigning a value to the double variable */
    value2 = 33.34;
    /* Printing the values on the screen */
    printf("The float value is %f and double value is %lf\n", value, value2);
    return 0;
}
Explanation:
- When assigning a value to a float variable, add the suffix "f" or "F" to the literal; otherwise the literal has type double by default and is converted to float on assignment.
- %f is the format specifier used to print a float (a float passed to printf is promoted to double).
- %lf is the format specifier used for a double. In printf, %f and %lf both print a double (since C99); in scanf, %lf is required to read into a double.
- Type modifiers such as signed and unsigned cannot be applied to float and double variables.