
# Single-precision floating point operations

The ANSI standard for C has a provision that allows expressions to be evaluated in single-precision arithmetic if there is no double (or long double) operand in the expression. The C compiler supports this provision.

Floating point constants are double-precision, unless explicitly stated to be float. For example, in the statements

```
float a, b;
...
a = b + 1.0;
```
because the constant 1.0 has type double, b is promoted to double before the addition and the result is converted back to float. However, the constant can be made explicitly a float:
```
a = b + 1.0f;
```
or
```
a = b + (float) 1.0;
```
With a float constant, every operand is single-precision, so the statement can potentially be compiled to a single instruction. Single-precision operations tend to be faster than double-precision operations.

Whether a computation can be done in single-precision is decided based on the operands of each operator. Consider the following:

```
float s;
double d;

d = d + s * s;
```
`s * s` is computed to produce a single-precision result, which is then promoted to double-precision and added to `d`. Note that using single-precision rather than double-precision arithmetic can result in loss of precision, as illustrated in the following example.
```
float f  = 8191.f * 8191.f; /* evaluate as a float  */
double d = 8191.  * 8191. ; /* evaluate as a double */
printf ("As float:  %f\nAs double: %f\n", f, d);
```
The result is:
```
As float:  67092480.000000
As double: 67092481.000000
```
Also, long int variables (which are the same as int) have more precision than float variables: a 32-bit integer can hold 31 significant bits, while a float significand holds 24. Consider the following example:
```
int i, j;
i = 0x7fffffff;
j = i * 1.0;
printf("j = %x\n", j);
j = i * 1.0f;
printf("j = %x\n", j);
```
The first printf() statement outputs `7fffffff`, while the second prints `0`. The second printf() prints `0` because the nearest float to 0x7fffffff has a value of 0x80000000, which is too large for an int. When the value is converted to an integer, the result is 0, and a floating point imprecise result exception occurs. A trap occurs if this exception is enabled.

A function that is declared to return a float may actually return either a float or a double. If the function declaration is a prototype declaration in which at least one of the parameters is float, the function returns a float. Otherwise, it returns a double with precision limited to that of a float. (All of this is transparent.) For example:

```
float retflt(float);        /* actually returns a float  */
float retdbl1();            /* actually returns a double */
float retdbl2(int);         /* actually returns a double */
```
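Arguments follow a similar rule: a float parameter declared in a prototype is passed as a float, while a float parameter declared in an old-style definition is passed as a double:
```
double takeflt(float x);    /* takes a float  */

double takedbl(x)
float x;                    /* takes a double */
```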
