

The ANSI standard for C has a provision that allows expressions to be evaluated in single-precision arithmetic if there is no double (or long double) operand in the expression. The C compiler supports this provision.
Floating-point constants have type double unless they are explicitly made float. For example, in the statements

    float a, b;
    ...
    a = b + 1.0;

the constant 1.0 has type double, so b is promoted to double before the addition and the result is converted back to float. However, the constant can explicitly be made a float:

    a = b + 1.0f;

or

    a = b + (float) 1.0;

In this case no conversions are needed, and the statement can potentially be compiled to a single instruction. Single-precision operations also tend to be faster than double-precision operations.
Whether a computation can be done in single precision is decided separately for each operator, based on that operator's operands. Consider the following:

    float s;
    double d;
    d = d + s * s;

Here s * s is computed to produce a single-precision result, which is then promoted to double precision and added to d. Note that using single-precision (as opposed to double-precision) arithmetic can result in loss of precision, as illustrated in the following example.

    float f = 8191.f * 8191.f;   /* evaluate as a float */
    double d = 8191. * 8191.;    /* evaluate as a double */
    printf("As float: %f\nAs double: %f\n", f, d);

The result is:
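
    As float: 67092480.000000
    As double: 67092481.000000

The exact product, 67092481, needs 26 significant bits, but a float holds only 24, so it is rounded to the nearest representable float.

Also, long int variables (the same size as int) have more precision than float variables. Consider the following example: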
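
    int i, j;
    i = 0x7fffffff;
    j = i * 1.0;
    printf("j = %x\n", j);
    j = i * 1.0f;
    printf("j = %x\n", j);

The first printf() statement outputs j = 7fffffff, because a double represents 0x7fffffff exactly. The second prints j = 0.

The second printf() prints 0 because the nearest float to 0x7fffffff has the value 0x80000000, which is outside the range of int. When that value is converted to an integer, the result is 0, and a floating-point imprecise result exception occurs. A trap occurs if this exception is enabled. (ISO C leaves the conversion of an out-of-range floating value to an integer type undefined, so other compilers and processors may produce a different result.)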
A function that is declared to return a float may actually return either a float or a double. If the function declaration is a prototype in which at least one of the parameters has type float, the function returns a float. Otherwise, it returns a double whose precision is limited to that of a float. (All of this is transparent to the program.) For example:

    float retflt(float);   /* actually returns a float */
    float retdbl1();       /* actually returns a double */
    float retdbl2(int);    /* actually returns a double */

Arguments work as follows:

    double takeflt(float x);    /* prototype: takes a float */
    double takedbl(x) float x;  /* old-style definition: takes a double */