MyHDL Discourse

Large memory consumption

#1

We have very large MyHDL designs and are in the process of migrating from MyHDL 0.8 to 0.10. Unfortunately, some of our designs run out of memory (> 128 GByte) when generating Verilog.

It seems there is at least a 4x higher memory consumption with 0.10 vs 0.8 for some of our designs.

The memory usage is high both when using toVerilog and .convert methods.

I am working on memory profiling but any hints to what might be causing the large difference between 0.8 and 0.10 would be helpful.

Kenny
0 Likes

#2

Do you use the same Python version (including 32/64 bit)?

0 Likes

#3

Yes, Python 2.7.12. I also run with PyPy 5.1.2 with similar results.

0 Likes

#4

Any chance you could give Python 3 a try?

0 Likes

#5

It is a large code base, so migrating to Python 3 is not an option right now.

0 Likes

#6

Can you provide a sample to reproduce the problem?
How do you measure memory consumption?

0 Likes

#7

Initially I just measured using “top” but now I’m running guppy/heapy to try to identify where the memory is consumed.
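
For reproducible numbers it can help to record peak RSS from inside the script instead of eyeballing "top". A minimal sketch using only the standard-library resource module (Unix only; note that ru_maxrss is reported in Kbyte on Linux but in bytes on macOS):

```python
import resource
import sys


def peak_rss_kbyte():
    """Return this process's peak resident set size, normalized to Kbyte."""
    peak = resource.getrusage(resource.RUSAGE_SELF).ru_maxrss
    # Linux reports ru_maxrss in Kbyte, macOS in bytes.
    return peak // 1024 if sys.platform == 'darwin' else peak


# Example: allocate roughly 10 Mbyte and then read the peak.
data = [bytearray(1024) for _ in range(10000)]
print('peak RSS: %d Kbyte' % peak_rss_kbyte())
```

Calling this before and after elaboration and after conversion separates the two phases' contributions.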

One large contributor is the symdict in the _Instantiator class, which holds each block's variables and is used to identify which of them are Signals. But I'm not sure this is the only source.

I have an example that consumes a lot of memory that I will post. It is not from the real design, but it shows at least part of the memory consumption issue.

0 Likes

#8

Here’s a small example. It uses about 900 Mbyte of memory, compared to about 500 Mbyte on MyHDL 0.8. It is also much slower to convert on 0.10.

0.10:

907512 Kbyte
elab time: 30.4093239307
gen  time: 43.0679249763

0.8:

518860 Kbyte
elab+gen time: 31.7385931015

The example consists of 15000 inverters connected in series.

import time
from myhdl import block
from myhdl import always_comb, Signal
from myhdl import modbv


@block
def inverter(inp, outp):

    s1 = Signal(modbv(0)[1:])

    @always_comb
    def c():
        s1.next = not inp

    @always_comb
    def c2():
        outp.next = s1

    return c, c2


@block
def inverters(inp, outp, nr_inverters=1):
    i_invs = []
    s = [Signal(modbv(0)[1:]) for i in range(nr_inverters)]
    sprev = inp
    for i in range(nr_inverters):
        i_invs.append(inverter(sprev, s[i]))
        sprev = s[i]

    @always_comb
    def conn():
        outp.next = sprev

    return i_invs, conn


@block
def top(inp, outp):
    inv = Signal(modbv()[1:])

    i_inv1 = inverter(inp, inv)
    i_invs = inverters(inv, outp, 15000)

    return i_inv1, i_invs


if __name__ == '__main__':

    inp = Signal(modbv(0)[1:])
    outp = Signal(modbv()[1:])

    t0 = time.time()
    i_top = top(inp, outp)
    t1 = time.time()

    i_top.convert()
    t2 = time.time()
    print 'elab time:', t1 - t0
    print 'gen  time:', t2 - t1
0 Likes

#9

@kranerup thanks for sharing this issue and an example. I have not encountered memory issues during conversion, however I haven’t looked at the amount of memory used either.

I don’t know if anyone has much bandwidth right now to look at this issue, but it is something that I would like to understand and improve if possible.

0 Likes

#10

@cfelton
I had a quick look at it.
I used CPython 2.7.16 and vprof for profiling.
I used 1500 inverters.

MyHDL 0.8 used about 27 MB
MyHDL 0.10 used about 43 MB

Here are the most used resources:

objects                                    MyHDL 0.8   MyHDL 0.10   remark for 0.10
type dict                                      13823        33233
type tuple                                     11008         7925
class myhdl._intbv.intbv                        9012        18018   (myhdl._modbv.modbv)
type instancemethod                             6028        15016
type set                                        6011         9017
type list                                       4577        10595
type cell                                       4506         4505
type function                                   4497         3790
class myhdl._Signal._Signal                     3004         6006
class myhdl._Signal._WaiterList                 3004         6006
class myhdl._Signal._NegedgeWaiterList          3004         6006
type generator                                  3004         3004
class myhdl._Signal._PosedgeWaiterList          3004         6006
class myhdl._always_comb._AlwaysComb            3003         3003
class myhdl._Waiter._SignalWaiter               3003         None
class myhdl._extractHierarchy._Instance         1503         None
type weakref                                     234          243
type type                                        231          238
type classobj                                    130           13
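
A per-type object census like the one above can also be taken without an external profiler, using only the gc module. A minimal sketch (run it once before and once after elaboration and diff the counters to see what the conversion created):

```python
import gc
from collections import Counter


def object_census(top_n=10):
    """Count live objects by type name, most common first."""
    counts = Counter(type(obj).__name__ for obj in gc.get_objects())
    return counts.most_common(top_n)


for name, n in object_census():
    print('%-30s %d' % (name, n))
```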
0 Likes

#11

Looks similar to what I got from Heapy. The dict that consumed the most memory according to Heapy was the symdict in the _Instantiator class.

I tried various tricks to make it more efficient, like replacing the dict with a list (very CPU-inefficient of course) and pruning unused entries in the dict such as __doc__, __package__ and __name__. That did reduce the memory consumption, but I got nowhere near the 0.8 usage.

I think my bottom-up approach is not sufficient. An understanding of how this dict is used, and why it differs from 0.8, is probably necessary to get anywhere on this issue.
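
For reference, the pruning I tried amounts to something like the following (standalone sketch; the real symdict is built inside MyHDL's _Instantiator, so the dict here is just a stand-in):

```python
def prune_symdict(symdict):
    """Drop module-level dunder entries that the converter never looks up."""
    unused = ('__doc__', '__package__', '__name__', '__builtins__')
    return {k: v for k, v in symdict.items() if k not in unused}


# Stand-in for a block's symdict: a mix of real symbols and module dunders.
sym = {'clk': 'Signal', 'rst': 'Signal', '__doc__': None, '__name__': 'top'}
print(prune_symdict(sym))  # only 'clk' and 'rst' remain
```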

0 Likes

#12

I don’t know about V0.8, but V0.10 uses a two-pass conversion algorithm.
Maybe this is the source of the memory consumption difference.

0 Likes