<!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 4.01 Transitional//EN">
<html> <head> <META HTTP-EQUIV="CONTENT-TYPE" CONTENT="text/html; charset=iso-8859-1"> <title>Memory Model infrastructure</title> <META NAME="AUTHOR" CONTENT="Vladimir Voskresensky">
<style type="text/css"> <!-- body {color: #000000; background-color: #ffffff; font-family: Monospaced} table {color: #000000; background-color: #e9e8e2; font-family: Monospaced} .java-block-comment {color: #737373} .java-layer-method {font-family: Monospaced; font-weight: bold} .java-keywords {color: #000099; font-family: Monospaced; font-weight: bold} --> </style></head>
<body LANG="en-US" >
<h1>Architecture Review Opinion</h1>
<dl> <dt><b>Issue:</b> <a href="http://www.netbeans.org/issues/show_bug.cgi?id=91546">91546</a></dt> <dt><b>Submitter:</b> <a href="mailto:vv159170@netbeans.org">Vladimir Voskresensky</a></dt> <dt><b>History:</b> <a href="http://www.netbeans.org/source/browse/cnd/www/dev/reviews/opinions_91546.html">in CVS</a></dt> <dt><b>Date:</b> Dec 25, 2006</dt> <dt><b>Reviewers:</b> </dt> </dl>
<hr/>
<dl> <dt><b>Contents</b></dt> <dd> <ul> <li><a href="#summary">Summary</a></li> <li><a href="#decision">Decision</a></li> <li><a href="#opinion">Opinion</a></li> <li><a href="#mem_working_set">Memory Working Set</a></li> <li><a href="#minutes">Minutes</a></li> <li><a href="#issue_detailes">Issue details</a></li> <li><a href="#minority">Minority Opinion</a></li> <li><a href="#advisory">Advisory Information</a></li> <li><a href="#appendices">Appendices</a> <ul> <li><a href="#TCRs">Appendix A: TCRs</a></li> <li><a href="#TCAs">Appendix B: TCAs</a></li> <li><a href="#references">Appendix C: Reference Material</a></li> </ul> </li> </ul> </dd> </dl>
<hr/>
<h2><a name="summary">Summary</a></h2>
<p> There are many complaints about memory usage, and several bugs are still open: </p>
<ul> <li> <a href="http://www.netbeans.org/issues/show_bug.cgi?id=87921"> Out of Memory Error </a> </li> <li> <a
href="http://www.netbeans.org/issues/show_bug.cgi?id=89648"> Code model memory consumption shouldn't depend linearly on project size </a> </li> <li> <a href="http://www.netbeans.org/issues/show_bug.cgi?id=87302"> APTStringManager uses more memory than necessary. </a> </li> </ul>
<p> The <q>memory problem</q> appears to be one of the most important open issues and should be addressed. How to address it is the subject of this review. The general feeling so far has been that the solution should be based on a repository engine. </p>
<h2><a name="decision">Decision</a></h2>
<p> </p>
<!-- <p>[Keep this short; details will be given in <a href="#opinion">Opinion</a> section. Mark one of three outcomes:</p> <ul> <li><b>Accepted</b> (Go to implement or commit, based on phase of review)</li> <li><b>Accepted with change requests</b> (Go to implement or commit with completed Technical Change requests)</li> <li><b>Rejected</b> (No, go back and do it again.)</li> </ul> <p>]</p> -->
<h2><a name="opinion">Opinion</a></h2>
<p>The following significant issues were discussed at the inception review.</p>
<h2><a name="mem_working_set">Memory Working Set</a></h2>
<p> The underlying memory metric is the "memory working set": the minimum amount of memory required to run the application while achieving adequate performance. <img src="91546/memory-set.png" alt="memory-set"/> </p>
<h2><a name="minutes">Minutes</a></h2>
<p> Memory Model [all high priority] </p>
<ul> <li> (9)a Design and implement a runtime repository to maintain parsed data </li> <li> (9)b Implement the client for accessing and using parsed data </li> </ul>
<h3>Memory</h3>
<p> The underlying memory metric is the "memory working set": the minimum amount of memory required to run the application while achieving adequate performance. Detailed targets for the required memory were not discussed.
The team agreed that the selected solution should scale "nicely" with application size so that, for example, even large applications like Firefox could run in standard system configurations. Additional quantification is needed here. </p>
<p> The development team proposed adding a repository to the language model. Two options were discussed: </p>
<ul> <li> (i) use of Lucene </li> <li> (ii) a custom solution. </li> </ul>
<p> Development agreed to: </p>
<ul> <li> (i) execute a due diligence effort to identify available open source solutions (beyond Lucene) for a repository implementation </li> <li> (ii) simulate the memory and performance profile for Lucene with large production applications </li> <li> (iii) size the effort for a tuned custom implementation. </li> </ul>
<p> The selected repository approach needs to serve all language model needs, including the symbol table (required for the accuracy work), the core parsing, and a future cross reference ("xref"). The decision on which approach to use is pending exploration of the listed options and a top-level design.
</p>
<h2>API design</h2>
<h3>Code Model</h3>
<p> <img src="91546/code-model-detailed.png" alt="detailed-csm"/> </p>
<h3>Memory model: Client proposal</h3>
<p>TBD</p>
<h3>Memory model: Repository proposal</h3>
<p> <pre>
<span class="java-block-comment">//////////////////////////////////////////////////////////////////////////////</span>
<span class="java-block-comment">// The first two interfaces should be implemented by the client to use the repository.</span>

<span class="java-block-comment">/**</span>
<span class="java-block-comment"> * Interface which classes should implement to be persistable</span>
<span class="java-block-comment"> */</span>
<span class="java-keywords">public</span> <span class="java-keywords">interface</span> Persistent {
    <span class="java-block-comment">/**</span>
    <span class="java-block-comment"> * Serialization</span>
    <span class="java-block-comment"> */</span>
    <span class="java-keywords">void</span> <span class="java-layer-method">write</span>(OutputStream out);

    <span class="java-block-comment">/**</span>
    <span class="java-block-comment"> * Deserialization</span>
    <span class="java-block-comment"> */</span>
    <span class="java-keywords">void</span> <span class="java-layer-method">read</span>(InputStream in);
}

<span class="java-keywords">public</span> <span class="java-keywords">interface</span> PersistentObjectFactory {
    <span class="java-block-comment">/**</span>
    <span class="java-block-comment"> * create an object by handle;</span>
    <span class="java-block-comment"> * the handle tells the factory which kind (class) of object</span>
    <span class="java-block-comment"> * should be created</span>
    <span class="java-block-comment"> */</span>
    Persistent <span class="java-layer-method">createPersistent</span>(<span class="java-keywords">int</span> handle);

    <span class="java-block-comment">/**</span>
    <span class="java-block-comment"> * retrieve the handle for the object's class</span>
    <span class="java-block-comment"> */</span>
    <span class="java-keywords">int</span> <span class="java-layer-method">getHandle</span>(Persistent obj);
}

<span class="java-block-comment">//////////////////////////////////////////////////////////////</span>
<span class="java-block-comment">// This interface would be implemented by the repository provider.</span>

<span class="java-keywords">public</span> <span class="java-keywords">interface</span> Repository {
    <span class="java-block-comment">/**</span>
    <span class="java-block-comment"> * initialize and provide Repository with objects factory</span>
    <span class="java-block-comment"> */</span>
    <span class="java-keywords">void</span> <span class="java-layer-method">init</span>(PersistentObjectFactory factory, String repositoryId);

    <span class="java-block-comment">/**</span>
    <span class="java-block-comment"> * store an object (perhaps the Id should come from the object itself)</span>
    <span class="java-block-comment"> */</span>
    <span class="java-keywords">void</span> <span class="java-layer-method">put</span>(Identifier id, Persistent obj);

    <span class="java-block-comment">/**</span>
    <span class="java-block-comment"> * retrieve object</span>
    <span class="java-block-comment"> */</span>
    Persistent <span class="java-layer-method">get</span>(Identifier id);

    <span class="java-block-comment">/**</span>
    <span class="java-block-comment"> * stop storing object</span>
    <span class="java-block-comment"> */</span>
    <span class="java-keywords">void</span> <span class="java-layer-method">remove</span>(Identifier id);

    <span class="java-block-comment">/**</span>
    <span class="java-block-comment"> * store all objects to permanent location;</span>
    <span class="java-block-comment"> * should be called, e.g., during IDE shutdown or project closing</span>
    <span class="java-block-comment"> */</span>
    <span class="java-keywords">void</span> <span class="java-layer-method">flush</span>();
}

<span class="java-block-comment">//////////////////////////////////////////////////////////////</span>
<span class="java-block-comment">// Accessor.</span>

<span class="java-keywords">public</span> <span class="java-keywords">class</span> RepositoryAccessor {
    <span class="java-keywords">private</span> <span class="java-layer-method">RepositoryAccessor</span>() {}

    <span class="java-keywords">private</span> <span class="java-keywords">static</span> Repository instance;

    <span class="java-block-comment">/**</span>
    <span class="java-block-comment"> * Default way for clients to get instance</span>
    <span class="java-block-comment"> */</span>
    <span class="java-keywords">public static</span> Repository <span class="java-layer-method">getRepository</span>(String repositoryId) {
        <span class="java-keywords">if</span> (instance == <span class="java-keywords">null</span>) {
            instance = (Repository)Lookup.<span class="java-layer-method">getDefault</span>().<span class="java-layer-method">lookup</span>(Repository.<span class="java-keywords">class</span>);
        }
        <span class="java-keywords">return</span> instance;
    }
}
</pre> </p>
<h2><a name="issue_detailes">Issue details</a></h2>
<h3>Base Level</h3>
<p> The most memory-critical part is the API Implementation component. </p>
<h3>Tasks</h3>
<p> <b>Action item:</b> Design the repository client and its place in the memory model </p>
<p> <img src="91546/repository.png" alt="repository"/> </p>
<p> <b>Action item:</b> Analyze the current state of memory usage using a profiler, with MySQL as the sample project. Treat Library and Project elements separately: </p>
<ul> <li> Amount of memory used by different elements </li> <li> Number of objects for different elements </li> </ul>
<p> <b>Action item:</b> Prototype the Light Weight Elements approach. </p>
<p> <b>Action item:</b> Introduce CsmID as a replacement for hard references to objects (for API clients) </p>
<p> <img src="91546/UID.png" alt="UID"/> </p>
<p> <b>Action item:</b> Prototype the use of SoftReferences as the Java-level approach to memory management </p>
<p> <b>Action item:</b> Rewrite the model to use RID instead of hard references.
Use KeyBasedUID in most cases. </p>
<p> <img src="91546/keyUID.png" alt="keyUID"/> </p>
<p> <b>Action item:</b> Update API clients to use CsmID instead of hard references </p>
<p> <img src="91546/fileImpl.png" alt="fileImpl"/> </p>
<h3>Some Implementation Notes to consider and not forget</h3>
<p> <b>Action item:</b> FileBufferFile handles java.io.File objects (perhaps the path alone is enough) </p>
<h4>Memory info for MySQL on Opteron (19-01-2007)</h4>
<table border="2"> <thead> <tr> <th>Package</th> <th>Objects</th> <th>Shallow Size (bytes)</th> <th>Retained Size (bytes)</th> </tr> </thead> <tbody> <tr bgcolor="white"> <td>all model</td> <td>2,677,267 (100%)</td> <td>75,214,328 (100%)</td> <td>185,727,576 (100%)</td> </tr> <tr> <td>modelimpl</td> <td>1,417,137 (53%)</td> <td>44,408,944 (59%)</td> <td>177,009,664 (95%)</td> </tr> <tr> <td>apt</td> <td>1,260,093 (47%)</td> <td>30,804,776 (41%)</td> <td>82,807,280 (45%)</td> </tr> <tr> <td>repository</td> <td>0</td> <td>0</td> <td>0</td> </tr> </tbody> </table>
<h4>Memory info for MySQL on Opteron (22-01-2007) with "clean snapshot" prototype</h4>
<table border="2"> <thead> <tr> <th>Package</th> <th>Objects</th> <th>Shallow Size (bytes)</th> <th>Retained Size (bytes)</th> </tr> </thead> <tbody> <tr bgcolor="white"> <td>all model</td> <td>1,418,969 (100%)</td> <td>44,453,680 (100%)</td> <td>112,002,680 (100%)</td> </tr> <tr> <td>modelimpl</td> <td>1,415,948 (100%)</td> <td>44,380,408 (100%)</td> <td>103,736,544 (93%)</td> </tr> <tr> <td>apt</td> <td>2,984 (0%)</td> <td>72,664 (0%)</td> <td>9,110,824 (8%)</td> </tr> <tr> <td>repository</td> <td>0</td> <td>0</td> <td>0</td> </tr> </tbody> </table>
<p> The APT States handled by ProjectBase in modelimpl also affect memory usage. </p>
<p> <img src="91546/APTStateHandlers.png" alt="APTState"/> </p>
<pre>
Some details about "clean snapshot" (most improvements are in APT size).

                     Nr Objects              Shallow Size     Retained Size
APT part of DDD      317,000->1,200          8MB->28KB        21MB->3MB
full DDD             663,000->347,000        18.5MB->11MB     48MB->29.5MB
APT part of MySQL    1,260,000->3,000        31MB->72KB       82MB->9MB
full MySQL           2,700,000->1,400,000    75MB->44.5MB     186MB->112MB
</pre>
<h2><a name="minority">Minority Opinion</a></h2>
<p> to be done </p>
<pre> </pre>
<p> </p>
<h2><a name="advisory">Advisory Information</a></h2>
<p>[List any non-blocking issues and suggestions for improvement.]</p>
<p> Comment about repository from Nik: </p>
<pre>
Sun Studio compilers have a "-sb" option. This option is used to generate
source browser data. It is a kind of repository, and it could probably be
used as a candidate for a "custom solution" repository. I don't know if it
will fit, but it seems worth checking out.
</pre>
<p> </p>
<p> Comment about stand-alone parsing from Nik: </p>
<pre>
Add the ability to parse projects outside of the IDE. This is the only way
to solve the problem with the memory limit. The IDE cannot increase its own
memory limit on the fly, but it can start a child process with an increased
maximum memory limit.
</pre>
<p> Comment about NIO from Tim: </p>
<pre>
One option that seems not to be mentioned is using NIO memory mapped files
for indexing sources. It doesn't solve the problem of nondeterministic
behavior due to swapping, but it is a way around the heap size limit, and
generally works quite well, and since you control the cache, some
access-level optimizations are possible. If you can either do
fixed-record-length caches or cache file + metadata lookup table, you can
probably get around the heap limit quite nicely.

I wrote some (pretty embryonic and untested) code to build such caches in
contrib/misc/cache - it's based on code I wrote for the NetBeans output
window, which had to be able to handle 400MB of text and still be able to
scroll and line-wrap without a hiccup, and I was surprised at just how well
it ended up working.

Ideally some existing library will do that for you, but this is the sort of
thing that often needs to be really optimized for the particular use-case.
Some people would argue that mapping objects to data kept in records off
the Java heap is not "object oriented" enough, but it certainly is practical
if you're dealing with huge data sets.
</pre>
<h2><a name="appendices">Appendices</a></h2>
<h3><a name="TCRs">Appendix A: Technical Changes Required</a></h3>
<p>[File all TCRs in Issuezilla with P1 or P2 priority and make the issue representing this review depend on them.]</p>
<h3><a name="TCAs">Appendix B: Technical Changes Advised</a></h3>
<p>[File all TCAs in Issuezilla with P3 to P5 priority and make the issue representing this review depend on them.]</p>
<h3><a name="references">Appendix C: Reference Material</a></h3>
<p>[List additional materials relevant to reviewed case]</p>
</body> </html>