Using decorators for common hardware structures

This is my first non-trivial MyHDL program. I made two decorators to automate the MyHDL evaluation step and reduce code duplication. The first decorator, @chain, connects hardware blocks serially, with one input and one output per block. A simple use for this is in a VGA timer. The VGA timer takes a clk and an enable in and outputs the VGA timing signals along with an x and y position, which is used later to decide what color that pixel is. The VGA timer also outputs h_refresh and v_refresh, two one-clk-wide pulses that are sent out when the horizontal and vertical counters, respectively, overflow. First I wrote something like this:

@block
def vga_timer(clk, en, x, y, h_refresh, v_refresh, h_sync, v_sync, video_on):
    
    line_width =  # differs by resolution
    num_lines =  # differs by resolution
    # h and v count when the screen is displaying as well as during the blanking period
    h = Signal(intbv(0, min=0, max=line_width))
    v = Signal(intbv(0, min=0, max=num_lines))
    
    @always(clk.posedge)
    def count_hv():
        if en:
            # both refresh pulses default to low, so each one lasts a single clk
            h_refresh.next = 0
            v_refresh.next = 0
            h.next = h + 1
            if h == line_width - 1:
                h.next = 0
                h_refresh.next = 1
                v.next = v + 1
                # compare the current value of v, not v.next
                if v == num_lines - 1:
                    v.next = 0
                    v_refresh.next = 1

    @always_comb
    def output_xy():
        x.next = h[len(x):]
        y.next = v[len(y):]

    # logic for h_sync, v_sync, and video_on left out because it's irrelevant to this post

    return count_hv, output_xy  # plus the generators for the omitted logic

This works just fine for two counters that chain into each other, but if for some reason 5 or more counters were needed then there would be a lot of code duplication. For this reason I wrote a counter block that takes a clk, enable, and count signal in and outputs a pulse when count overflows.

@block
def counter(clk, count, max_value, en=True, pulse=None):

Now count_hv can be replaced with instances of counter:

h_counter = counter(clk, h, line_width, en=en, pulse=h_refresh)
v_counter = counter(clk, v, num_lines, en=h_refresh, pulse=v_refresh)
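To check the intended behaviour of chained counters without running a MyHDL simulation, here is a rough behavioural model in plain Python. It is only a sketch: simulate_counters and limits are names I made up for illustration, and the model passes the enable along within the same tick, so it ignores the one-clk delay a registered pulse adds in real hardware.

```python
def simulate_counters(limits, ticks):
    """Step a chain of wrapping counters and record counts and overflow pulses.

    limits[i] is the number of states of counter i (a plain-Python stand-in
    for max_value). Counter 0 is enabled on every tick; counter i > 0 is
    enabled only on ticks where counter i - 1 wraps.
    """
    counts = [0] * len(limits)
    history = []
    for _ in range(ticks):
        enable = True
        pulses = []
        for i, limit in enumerate(limits):
            wrapped = False
            if enable:
                counts[i] += 1
                if counts[i] == limit:
                    counts[i] = 0
                    wrapped = True
            pulses.append(wrapped)
            enable = wrapped  # the pulse of this counter enables the next one
        history.append((tuple(counts), tuple(pulses)))
    return history


# A tiny "screen" with 3 columns and 2 rows.
history = simulate_counters([3, 2], ticks=6)
# history[2] == ((0, 1), (True, False))   first counter wraps, second advances
# history[5] == ((0, 0), (True, True))    both counters wrap together
```

Every entry of history is a (counts, pulses) pair, so the second pulse only ever appears on a tick where the first one does.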

This solution also allows the use of n counters with only a list of signals for each input or output of counter. I have found a list of chained instances to be a very common pattern. It is annoying to have to type the same pattern for a chain of small blocks especially when only the global input and output matter.

in_sigs = [global_in] + [Signal(False) for _ in range(1, num_sigs)]
out_sigs = [Signal(False) for _ in range(1, num_sigs)] + [global_out]
# out_sigs[i] still has to be connected to in_sigs[i + 1] for every i
small_blocks = [small_block(in_sigs[i], out_sigs[i]) for i in range(num_sigs)]

Python is great at removing code duplication, thankfully. My solution for the problem is the decorator @chain. Here’s how @chain is used in the vga timer.

@chain('en', 'pulse')
@block
def counter(clk, count, max_value, en=True, pulse=None):

counter_insts = counter(clk, [h, v], [line_width, num_lines], en=en, pulse=[h_refresh, v_refresh], chain_length=2)

Lone arguments like clk are given to every instance of counter. List arguments like count are distributed so that counter instance i gets count[i]. List arguments must have chain_length elements. The @chain decorator takes two positional inputs, chain_in and chain_out (‘en’ and ‘pulse’ in this case), which name the keyword arguments to chain together. Inside the decorator, each instance’s chain_out is connected to the next instance’s chain_in. If the chain_in or chain_out argument isn’t a list, or any element of the list is None, then signals of type chain_type are created to fill the gaps. chain_type is the final argument to chain; it is a named input that defaults to False, so it wasn’t included in this example.
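The distribution rule for lone and list arguments can be sketched in plain Python, separately from any MyHDL machinery. distribute_arguments is a hypothetical helper written for this illustration (it is not part of @chain), and strings stand in for signals and constants.

```python
def distribute_arguments(args, kwargs, chain_length):
    """Return one (args, kwargs) pair per chain instance.

    Lone values are repeated for every instance; list values are indexed,
    so instance i receives element i of each list.
    """
    def pick(value, i):
        if isinstance(value, list):
            if len(value) != chain_length:
                raise ValueError('list arguments must match the chain length')
            return value[i]
        return value

    return [
        (tuple(pick(a, i) for a in args),
         {key: pick(v, i) for key, v in kwargs.items()})
        for i in range(chain_length)
    ]


# Strings stand in for the real signals and constants.
per_instance = distribute_arguments(
    ('clk', ['h', 'v'], ['line_width', 'num_lines']),
    {'pulse': ['h_refresh', 'v_refresh']},
    chain_length=2,
)
# per_instance[0] == (('clk', 'h', 'line_width'), {'pulse': 'h_refresh'})
# per_instance[1] == (('clk', 'v', 'num_lines'), {'pulse': 'v_refresh'})
```

One consequence of treating every list as a per-instance argument is that a block which genuinely wants a list as a single argument can't be chained this way without wrapping that list in something else, such as a tuple.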

In the future I would like to make chain_length implied (unless there are no list arguments) and allow the argument for chain_out (pulse in this case) to be None. I’m also considering allowing custom connection logic so that multiple chain_ins and chain_outs can be set. Here’s the full code for @chain, which still needs to be cleaned up and refactored.

from myhdl import Signal, always_comb, block

def chain(chain_in, chain_out, chain_type=False):
    def chain_decorator(func):
        @block
        def chain_wrapper(*args, chain_length=1, **kwargs):
            assert chain_length > 0
            if chain_length == 1:
                return func(*args, **kwargs)

            lone_input = True
            lone_output = True
            c_in = kwargs.get(chain_in)
            c_out = kwargs.get(chain_out)
            chain_in_sigs = []
            if isinstance(c_in, list):
                lone_input = False
                chain_in_sigs = kwargs.pop(chain_in)
                assert len(chain_in_sigs) == chain_length, 'list arguments must match the chain length'
                for i in range(chain_length):
                    if chain_in_sigs[i] is None:
                        chain_in_sigs[i] = Signal(chain_type)
            else:
                chain_in_sigs = [Signal(chain_type) for _ in range(chain_length)]

            chain_out_sigs = []
            # chain_in_sigs and chain_out_sigs can each be given either as 1 signal (the input or output of the
            # entire system) or as a list of signals. any missing or None signals are added
            if isinstance(c_out, list):
                lone_output = False
                chain_out_sigs = kwargs.pop(chain_out)
                assert len(chain_out_sigs) == chain_length, 'list arguments must match the chain length'
                for i in range(chain_length):
                    if chain_out_sigs[i] is None:
                        chain_out_sigs[i] = Signal(chain_type)
            else:
                chain_out_sigs = [Signal(chain_type) for _ in range(chain_length)]
            funcs = []
            for i in range(chain_length):
                arg_sigs = []
                for arg in args:
                    if isinstance(arg, list):
                        assert len(arg) == chain_length, 'list arguments must match the chain length'
                        arg_sigs.append(arg[i])
                    else:
                        arg_sigs.append(arg)
                kwarg_sigs = {}
                for key, arg in kwargs.items():
                    if isinstance(arg, list):
                        assert len(arg) == chain_length, 'list arguments must match the chain length'
                        kwarg_sigs[key] = arg[i]
                    else:
                        kwarg_sigs[key] = arg
                kwarg_sigs[chain_in] = chain_in_sigs[i]
                kwarg_sigs[chain_out] = chain_out_sigs[i]
                funcs.append(func(*tuple(arg_sigs), **kwarg_sigs))

            @always_comb
            def connect():
                for j in range(1, chain_length):
                    chain_in_sigs[j].next = chain_out_sigs[j-1]

            # myhdl can't detect when a signal is renamed by being added to a list of signals so this compensates
            if lone_input:
                if lone_output:
                    @always_comb
                    def foo():
                        chain_in_sigs[0].next = c_in
                        c_out.next = chain_out_sigs[-1]
                    return funcs, connect, foo
                @always_comb
                def foo():
                    chain_in_sigs[0].next = c_in
                return funcs, connect, foo
            elif lone_output:
                @always_comb
                def foo():
                    c_out.next = chain_out_sigs[-1]
                return funcs, connect, foo
            return funcs, connect
        return chain_wrapper
    return chain_decorator
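One of the future improvements mentioned above, making chain_length implied, could look something like the sketch below. infer_chain_length is a hypothetical helper that is not wired into @chain; it assumes that every list argument is meant to be distributed across the chain.

```python
def infer_chain_length(args, kwargs):
    """Return the common length of all list arguments, or None if there are none."""
    values = list(args) + list(kwargs.values())
    lengths = {len(v) for v in values if isinstance(v, list)}
    if not lengths:
        # No list arguments: the caller still has to pass chain_length explicitly.
        return None
    if len(lengths) > 1:
        raise ValueError('list arguments must all have the same length')
    return lengths.pop()
```

The wrapper could call this first and fall back to the explicit chain_length keyword only when it returns None.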

The second decorator @tree is for blocks which have a list input and a single output. Unlike the chain decorator, generating a tree structure out of hardware block instances is non-trivial without a recursive wrapper. A natural example for @tree is an adder. The adder I wrote can take up to four inputs in terms.

@block
def adder(terms, result):
    if len(terms) == 1:
        @always_comb
        def add():
            result.next = terms[0]
    elif len(terms) == 2:
        @always_comb
        def add():
            result.next = terms[0] + terms[1]
    elif len(terms) == 3:
        @always_comb
        def add():
            result.next = terms[0] + terms[1] + terms[2]
    else:
        @always_comb
        def add():
            result.next = terms[0] + terms[1] + terms[2] + terms[3]
    return add

Every hardware block that is decorated with @tree must have terms as its first positional input and result as its second (the actual names don’t matter, though). The @tree decorator makes a tree structure with each leaf being a term and each internal node being an instance of adder. For 16 terms it will generate four adders with four of the terms each and a final adder to add together the results of those four adders. @tree passes any args and kwargs that aren’t terms and result to every instance of adder. Here is the decorated adder:

@tree(4, adder_result_width)
@block
def adder(terms, result):

The first input to @tree is num_branches, which tells @tree the maximum number of branches per node. The second is get_result_width, a function that determines the minimum width of the result of each adder instance. An adder with inputs of width w will have a result width of w + ceil(log2(num_terms)). Here’s the implementation of adder_result_width.

def adder_result_width(terms):
    from math import log2, ceil

    max_len = 1
    for term in terms:
        max_len = max(max_len, len(term))
    return max_len + ceil(log2(len(terms)))
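As a sanity check on the width rule, the sketch below compares it with the exact number of bits needed for the largest possible sum. It assumes unsigned terms and works on plain bit widths instead of signals; result_width and exact_width are illustrative names that are not part of the design.

```python
from math import ceil, log2


def result_width(term_widths):
    """Same rule as adder_result_width, but taking bit widths directly."""
    return max(1, max(term_widths)) + ceil(log2(len(term_widths)))


def exact_width(term_widths):
    """Bits needed to hold the largest possible sum of unsigned terms."""
    largest_sum = sum(2 ** w - 1 for w in term_widths)
    return max(1, largest_sum.bit_length())


# Four 8-bit terms: the largest sum is 4 * 255 = 1020, which needs 10 bits.
# result_width([8, 8, 8, 8]) == 10 and exact_width([8, 8, 8, 8]) == 10
# Three 1-bit terms: the largest sum is 3, which needs 2 bits, but the rule gives 3.
# result_width([1, 1, 1]) == 3 and exact_width([1, 1, 1]) == 2
```

So the rule is always wide enough for unsigned terms, and it can be one bit wider than strictly necessary when the number of terms isn’t a power of two.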

The full code of @tree:

from myhdl import Signal, block, intbv

def _balanced_tree_indexer(num_leaves, num_branches):
    """Makes num_branches buckets and fills the buckets as evenly as possible with num_leaves leaves then returns a 
    set of start and end indices for slicing the list of leaves."""
    floor = num_leaves//num_branches
    widths = [floor]*num_branches
    for i in range(num_leaves % num_branches):
        widths[i] += 1
    branch_indices = []
    cur_index = 0
    for i in range(num_branches):
        branch_indices.append((cur_index, cur_index + widths[i]))
        cur_index += widths[i]
    return branch_indices

def tree(num_branches, get_result_width):
    """Decorates a hardware block that takes a list of signal inputs, terms, and a signal result. This decorator uses 
    a tree structure with num_branches branches.  Any additional args or kwargs will be passed to every hardware 
    block instance. """
    def tree_decorator(func):
        @block
        def wrapper(terms, root_result, *args, **kwargs):
            assert len(terms) > 0
            if len(terms) <= num_branches:
                return func(terms, root_result, *args, **kwargs)

            branch_indices = _balanced_tree_indexer(len(terms), num_branches)
            branches = []
            branch_results = []
            for i in range(num_branches):
                branch_terms = terms[branch_indices[i][0]:branch_indices[i][1]]
                result_width = get_result_width(branch_terms)
                branch_result = Signal(intbv(0)[result_width:])
                branches.append(wrapper(branch_terms, branch_result, *args, **kwargs))
                branch_results.append(branch_result)

            root = wrapper(branch_results, root_result, *args, **kwargs)
            return root, branches

        return wrapper

    return tree_decorator
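To see what shape the recursion produces without building any hardware, the splitting rule can be mirrored in plain Python. This is only a sketch: split_sizes and count_instances are illustrative helpers, not part of @tree, and num_branches is assumed to be at least 2 (with 1 the split never gets smaller).

```python
def split_sizes(num_leaves, num_branches):
    """Bucket sizes of the even split (same arithmetic as _balanced_tree_indexer)."""
    base, extra = divmod(num_leaves, num_branches)
    return [base + 1 if i < extra else base for i in range(num_branches)]


def count_instances(num_terms, num_branches):
    """Number of block instances the recursive wrapper would create."""
    if num_terms <= num_branches:
        return 1  # the terms fit in a single block instance
    in_branches = sum(count_instances(size, num_branches)
                      for size in split_sizes(num_terms, num_branches))
    # The root call receives one result per branch, i.e. num_branches terms.
    return in_branches + count_instances(num_branches, num_branches)


# split_sizes(10, 4) == [3, 3, 2, 2]
# count_instances(16, 4) == 5    four leaf adders plus the final adder
# count_instances(64, 4) == 21   16 + 4 + 1
```

Note that as soon as there are more than num_branches terms, the split always uses all num_branches buckets, so 5 terms with 4 branches gives buckets of 2, 1, 1, and 1, and three of the adders just pass a single term through.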

Suggestions, criticism, and comments welcome.
