Large memory consumption

We have very large MyHDL designs and are in the process of migrating from MyHDL 0.8 to 0.10. Unfortunately, some of our designs run out of memory (> 128 Gbyte) when generating Verilog.

For some of our designs, memory consumption with 0.10 appears to be at least 4x higher than with 0.8.

The memory usage is high with both the toVerilog function and the .convert method.

I am working on memory profiling, but any hints as to what might be causing the large difference between 0.8 and 0.10 would be helpful.

Kenny

Do you use the same Python version (including 32/64 bit) for both?

Yes, Python 2.7.12. I also run with PyPy 5.1.2 with similar results.

Any chance you could give Python 3 a try?

It is a large code base, so migrating to Python 3 is not an option right now.

Can you provide a sample to reproduce the problem?
How do you measure memory consumption?

Initially I just measured using “top” but now I’m running guppy/heapy to try to identify where the memory is consumed.
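For anyone who wants a number from inside the script rather than from “top”, here is a minimal sketch using the standard `resource` module. It is not what was used for the measurements in this thread; the helper name `peak_rss_kbyte` is made up for illustration, and the unit handling assumes Linux (kbyte) or macOS (bytes). The module is not available on Windows.

```python
import resource
import sys


def peak_rss_kbyte():
    """Return the peak resident set size of this process in kbyte."""
    peak = resource.getrusage(resource.RUSAGE_SELF).ru_maxrss
    # Linux reports ru_maxrss in kbyte, macOS reports it in bytes.
    if sys.platform == 'darwin':
        peak //= 1024
    return peak


if __name__ == '__main__':
    before = peak_rss_kbyte()
    data = [0] * (10 * 1000 * 1000)  # allocate something to move the peak
    after = peak_rss_kbyte()
    print('peak RSS grew by %d kbyte' % (after - before))
```

Calling the helper once after elaboration and once after conversion should give a rough per-phase picture. It only ever reports the peak, so it cannot show memory being released again.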

One large part is symdict in the _Instantiator class, which holds each block's variables and is used to identify which of them are Signals. But I’m not sure this is the only source.

I have an example that consumes a lot of memory, which I will post. It is not from the real design, but it shows at least part of the memory consumption issue.

Here’s a small example. It uses about 900 Mbyte of memory (compared to 500 Mbyte on MyHDL 0.8). It is also much slower to convert on 0.10.

0.10:

907512 kbyte
elab time: 30.4093239307
gen  time: 43.0679249763

0.8:

518860 kbyte
elab+gen time: 31.7385931015

The example consists of 15000 inverters connected in series.

import time
from myhdl import block
from myhdl import always_comb, Signal
from myhdl import modbv


@block
def inverter(inp, outp):

    s1 = Signal(modbv(0)[1:])

    @always_comb
    def c():
        s1.next = not inp

    @always_comb
    def c2():
        outp.next = s1 

    return c,c2

@block
def inverters(inp,outp,nr_inverters=1):
    i_invs = []
    s = [Signal(modbv(0)[1:]) for i in range(nr_inverters)]
    sprev = inp
    for i in range(nr_inverters):
        i_invs.append( inverter(sprev,s[i]))
        sprev = s[i]
    @always_comb
    def conn():
        outp.next = sprev
    return i_invs,conn

@block
def top(inp,outp):
    inv = Signal(modbv()[1:])

    i_inv1 = inverter( inp, inv )
    i_invs = inverters(inv, outp,15000)

    return i_inv1, i_invs

if __name__ == '__main__':

    inp = Signal(modbv(0)[1:])
    outp = Signal(modbv()[1:])

    t0 = time.time()
    i_top = top(inp,outp)
    t1 = time.time()

    i_top.convert()
    t2 = time.time()
    print 'elab time:',t1-t0
    print 'gen  time:',t2-t1

@kranerup thanks for sharing this issue and an example. I have not encountered memory issues during conversion, but I haven’t looked at the amount of memory used either.

I don’t know if anyone has much bandwidth right now to look at this issue, but it is something that I would like to understand and improve if possible.

@cfelton
I had a quick look at it.
I used CPython 2.7.16 and vprof for profiling.
I used 1500 inverters.

MyHDL 0.8 used about 27MB
MyHDL 0.10 used about 43MB

Here are the most used resources:

| objects | MyHDL 0.8 | MyHDL 0.10 | remark for 0.10 |
|---|---|---|---|
| type dict | 13823 | 33233 | |
| type tuple | 11008 | 7925 | |
| class myhdl._intbv.intbv | 9012 | 18018 | (myhdl._modbv.modbv) |
| type instancemethod | 6028 | 15016 | |
| type set | 6011 | 9017 | |
| type list | 4577 | 10595 | |
| type cell | 4506 | 4505 | |
| type function | 4497 | 3790 | |
| class myhdl._Signal._Signal | 3004 | 6006 | |
| class myhdl._Signal._WaiterList | 3004 | 6006 | |
| class myhdl._Signal._NegedgeWaiterList | 3004 | 6006 | |
| type generator | 3004 | 3004 | |
| class myhdl._Signal._PosedgeWaiterList | 3004 | 6006 | |
| class myhdl._always_comb._AlwaysComb | 3003 | 3003 | |
| class myhdl._Waiter._SignalWaiter | 3003 | None | |
| class myhdl._extractHierarchy._Instance | 1503 | None | |
| type weakref | 234 | 243 | |
| type type | 231 | 238 | |
| type classobj | 130 | 13 | |
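A comparable per-type count can be approximated without a profiler by using the standard `gc` module. This is only a sketch, not how vprof produces its numbers. The function names are made up for illustration, and `gc.get_objects()` only sees objects tracked by the garbage collector (containers and instances, not plain ints or strings).

```python
import gc
from collections import Counter


def count_objects_by_type():
    """Return a Counter mapping type name -> number of gc-tracked objects."""
    counts = Counter()
    for obj in gc.get_objects():
        counts[type(obj).__name__] += 1
    return counts


def diff_counts(before, after, top=10):
    """Return the types whose object count grew the most between snapshots."""
    delta = Counter(after)
    delta.subtract(before)
    return [(name, n) for name, n in delta.most_common(top) if n > 0]
```

Taking one snapshot before elaboration and one after conversion, then calling `diff_counts` on the two, should show which object types grew the most under each MyHDL version.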

Looks similar to what I got from Heapy. The dict that consumed most according to Heapy was the symdict in the _Instantiator class.

I tried various tricks to make it more efficient, such as replacing the dict with a list (very CPU-inefficient, of course) and pruning unused entries in the dict like __doc__, __package__ and __name__. That did reduce the memory consumption, but I got nowhere near the 0.8 usage.
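To make the pruning idea concrete, here is a minimal sketch of the kind of filter I mean. It is not the actual patch and not MyHDL code: `prune_dunder_entries` is a hypothetical helper, and whether the converter needs any of the dropped entries later is an assumption I have not verified.

```python
def prune_dunder_entries(symdict):
    """Return a copy of symdict without module-level dunder entries."""
    pruned = {}
    for name, value in symdict.items():
        # Skip names such as __doc__, __package__ or __name__.
        if name.startswith('__') and name.endswith('__'):
            continue
        pruned[name] = value
    return pruned
```

This only removes a handful of entries per block, which fits with the saving being modest compared to the overall difference between 0.8 and 0.10.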

I think my bottom-up approach is not sufficient. An understanding of how this dict is used and why it is different from 0.8 is probably necessary to get anywhere on this issue.

I don’t know about 0.8, but 0.10 uses a two-pass conversion algorithm.
Maybe this is the source of the difference in memory consumption.

I noticed that the converter actually does three passes, one of which is marked as a workaround. Not that great, I’d say …

I noticed the workaround but didn’t understand its purpose on first read.
In any case, running a conversion executes the MyHDL code twice.