GSoC #5: Sum Problem of Fixed-Point Type

The original blog post is here.

The Problem

If you want to sum up several `fixbv` fixed-point variables, what will you do?

``````a + b + c + d
``````

It seems to be correct but there would be a trap on the bit width.

Here we use `(total_width, integer_width, fractional_width)` to indicate the bit width of the `fixbv` variable, its integer part, and its fractional part.

As written in MEP-111, if two fixed-point variables are added, the fractional part of the result should be the longest of the operands; and the integer part of the result should be the longest plus one, in order to avoid overflow.

This is an example of adding a variable in format `(8, 3, 4)` and another variable in format `(8, 0, 7)`:

``````  siii.ffff000
+ ssss.fffffff
--------------
siiii.fffffff
``````

We can see that the first operand has been added zeros in the tail of the fractional part, and the second operand has been added sign bits in the beginning, in order to perform point alignment.

A Critical Example

Considering the following situation:

``````>>> a.format
(16, 4, 11)
>>> b.format
(16, 4, 11)
>>> c.format
(8, 3, 4)
>>> d.format
(8, 7, 0)
``````

If we add like `a + b + c + d`, the format of the final result will be:

`````` (16, 4, [11])+(16, 4, [11])+(8, 3, [4])+(8, 7, [0])
=(17, 5, [11])+(8, 3, [4])+(8, 7, [0])
=(18, 6, [11])+(8, 7, [0])
=(20, 8, [11])
``````

However, if we add in `d + a + b + c`, the result should be:

`````` (8, 7, [0])+(16, 4, [11])+(16, 4, [11])+(8, 3, [4])
=(20, 8, [11])+(16, 4, [11])+(8, 3, [4])
=(21, 9, [11])+(8, 3, [4])
=(22, 10, [11])
``````

Assignments

If we use the assignment like:

``````x = a + b + c + d
``````

this problem should be consider. But, if we have already defined the format of x and perform the operation as

``````x[:] = a + b + c + d
``````

it would be safer because the format of `x` has already been decided. However, rounding and overflow must be considered if necessary.

A New Sum Function for Fixed-Point Variables

So here, in order to make the result unique in different orders, a new function `fxsum` is needed in this theme. The usage is similar to built-in `sum`:

``````fxsum(iterable, start=0)
``````

`fxsum` requires an iterable parameter containing `fixbv` instances, and returns the sum as a `fixbv` variable. The width of the fractional part would be the longest in the iterable, while the width of the width of the integer part is the longest plus `ceil(log(N, 2))`, in which `N` is the width of the longest integer part.

Instead of using `a + b + c + d`, we could write

``````fxsum((a, b, c, d))
``````

to avoid ambiguity of bit width in the result.

As a result, in the previous example, we could obtain that

``````>>> fxsum((a, b, c, d)).format
(21, 9, [11])
``````