MyHDL Discourse

Large memory consumption

#1

We have very large MyHDL designs and are in the process of migrating from MyHDL 0.8 to 0.10. Unfortunately, some of our designs run out of memory (> 128 GByte) when generating Verilog.

It seems there is at least a 4x higher memory consumption with 0.10 vs 0.8 for some of our designs.

The memory usage is high both when using toVerilog and .convert methods.

I am working on memory profiling but any hints to what might be causing the large difference between 0.8 and 0.10 would be helpful.

Kenny
0 Likes

#2

Do you use the same Python version (including 32/64 bit)?

0 Likes

#3

Yes, Python 2.7.12. I also run with PyPy 5.1.2 with similar results.

0 Likes

#4

Any chance you could give Python 3 a try?

0 Likes

#5

It is a large code base, so migrating to Python 3 is not an option right now.

0 Likes

#6

Can you provide a sample to reproduce the problem?
How do you measure memory consumption?

0 Likes

#7

Initially I just measured using “top” but now I’m running guppy/heapy to try to identify where the memory is consumed.
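
For reproducible numbers it can help to record peak RSS from inside the script instead of eyeballing "top". A minimal sketch using only the standard-library resource module (Unix only; note that ru_maxrss is reported in Kbyte on Linux but in bytes on macOS):

```python
import resource
import sys


def peak_rss_kbyte():
    """Return this process's peak resident set size, normalized to Kbyte."""
    peak = resource.getrusage(resource.RUSAGE_SELF).ru_maxrss
    # Linux reports ru_maxrss in Kbyte, macOS in bytes.
    return peak // 1024 if sys.platform == 'darwin' else peak


# Example: allocate roughly 10 Mbyte and then read the peak.
data = [bytearray(1024) for _ in range(10000)]
print('peak RSS: %d Kbyte' % peak_rss_kbyte())
```

Calling this before and after elaboration and after conversion separates the two phases' contributions.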

One large contributor is the symdict in the _Instantiator class, which holds each block's variables and is used to identify which of them are Signals. But I'm not sure this is the only source.

I have an example that consumes a lot of memory that I will post. It is not from the real design, but it shows at least part of the memory consumption issue.

0 Likes

#8

Here’s a small example. It uses about 900 Mbyte of memory, compared to about 500 Mbyte on MyHDL 0.8. It is also much slower to convert on 0.10.

0.10:

907512 Kbyte
elab time: 30.4093239307
gen  time: 43.0679249763

0.8:

518860 Kbyte
elab+gen time: 31.7385931015

The example consists of 15000 inverters connected in series.

import time
from myhdl import block
from myhdl import always_comb, Signal
from myhdl import modbv


@block
def inverter(inp, outp):

    s1 = Signal(modbv(0)[1:])

    @always_comb
    def c():
        s1.next = not inp

    @always_comb
    def c2():
        outp.next = s1

    return c, c2


@block
def inverters(inp, outp, nr_inverters=1):
    i_invs = []
    s = [Signal(modbv(0)[1:]) for i in range(nr_inverters)]
    sprev = inp
    for i in range(nr_inverters):
        i_invs.append(inverter(sprev, s[i]))
        sprev = s[i]

    @always_comb
    def conn():
        outp.next = sprev

    return i_invs, conn


@block
def top(inp, outp):
    inv = Signal(modbv()[1:])

    i_inv1 = inverter(inp, inv)
    i_invs = inverters(inv, outp, 15000)

    return i_inv1, i_invs


if __name__ == '__main__':

    inp = Signal(modbv(0)[1:])
    outp = Signal(modbv()[1:])

    t0 = time.time()
    i_top = top(inp, outp)
    t1 = time.time()

    i_top.convert()
    t2 = time.time()
    print 'elab time:', t1 - t0
    print 'gen  time:', t2 - t1
0 Likes

#9

@kranerup thanks for sharing this issue and an example. I have not encountered memory issues during conversion, however I haven’t looked at the amount of memory used either.

I don’t know if anyone has much bandwidth right now to look at this issue, but it is something that I would like to understand and improve if possible.

0 Likes

#10

@cfelton
I had a quick look at it.
I used CPython 2.7.16 and vprof for profiling.
I used 1500 inverters.

MyHDL 0.8 used about 27 MB
MyHDL 0.10 used about 43 MB

Here are the most used resources:

objects                                    MyHDL 0.8   MyHDL 0.10   remark for 0.10
type dict                                      13823        33233
type tuple                                     11008         7925
class myhdl._intbv.intbv                        9012        18018   (myhdl._modbv.modbv)
type instancemethod                             6028        15016
type set                                        6011         9017
type list                                       4577        10595
type cell                                       4506         4505
type function                                   4497         3790
class myhdl._Signal._Signal                     3004         6006
class myhdl._Signal._WaiterList                 3004         6006
class myhdl._Signal._NegedgeWaiterList          3004         6006
type generator                                  3004         3004
class myhdl._Signal._PosedgeWaiterList          3004         6006
class myhdl._always_comb._AlwaysComb            3003         3003
class myhdl._Waiter._SignalWaiter               3003         None
class myhdl._extractHierarchy._Instance         1503         None
type weakref                                     234          243
type type                                        231          238
type classobj                                    130           13
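
A per-type object census like the one above can also be taken without an external profiler, using only the gc module. A minimal sketch (run it once before and once after elaboration and diff the counters to see what the conversion created):

```python
import gc
from collections import Counter


def object_census(top_n=10):
    """Count live objects by type name, most common first."""
    counts = Counter(type(obj).__name__ for obj in gc.get_objects())
    return counts.most_common(top_n)


for name, n in object_census():
    print('%-30s %d' % (name, n))
```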
0 Likes

#11

Looks similar to what I got from Heapy. The dict that consumed the most memory according to Heapy was the symdict in the _Instantiator class.

I tried various tricks to make it more efficient, like replacing the dict with a list (very CPU-inefficient of course) and pruning unused entries in the dict such as __doc__, __package__ and __name__. That did reduce the memory consumption, but I got nowhere near the 0.8 usage.

I think my bottom-up approach is not sufficient. An understanding of how this dict is used, and why it differs from 0.8, is probably necessary to get anywhere on this issue.
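
For reference, the pruning I tried amounts to something like the following (standalone sketch; the real symdict is built inside MyHDL's _Instantiator, so the dict here is just a stand-in):

```python
def prune_symdict(symdict):
    """Drop module-level dunder entries that the converter never looks up."""
    unused = ('__doc__', '__package__', '__name__', '__builtins__')
    return {k: v for k, v in symdict.items() if k not in unused}


# Stand-in for a block's symdict: a mix of real symbols and module dunders.
sym = {'clk': 'Signal', 'rst': 'Signal', '__doc__': None, '__name__': 'top'}
print(prune_symdict(sym))  # only 'clk' and 'rst' remain
```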

0 Likes

#12

I don’t know about V0.8, but V0.10 uses a two-pass conversion algorithm.
Maybe this is the source of the memory consumption difference.

0 Likes