<!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 4.01 Transitional//EN">
<html>
<head>
<META HTTP-EQUIV="CONTENT-TYPE" CONTENT="text/html; charset=iso-8859-1">
<title>Memory Model Accuracy</title>
<META NAME="AUTHOR" CONTENT="Vladimir Voskresensky">
</head>
<body LANG="en-US" >
<h1>Memory Model Accuracy: Architecture Review Opinion</h1>
<dl>
<dt><b>Issue:</b> <a href="http://www.netbeans.org/issues/show_bug.cgi?id=92584">92584</a></dt>
<dt><b>Submitter:</b> <a href="mailto:vkvahin@netbeans.org">Vladimir Kvashin</a></dt>
<dt><b>History:</b> <a href="http://www.netbeans.org/source/browse/cnd/www/dev/reviews/opinions_92584.html">in CVS</a></dt>
<dt><b>Date:</b> Jan 16, 2006</dt>
<dt><b>Reviewers:</b> </dt>
</dl>
<hr/>
<dl>
<dt><b>Contents</b></dt>
<dd>
<ul>
<li><a href="#definitions">Definitions</a></li>
<li><a href="#architecture">Brief code model architecture overview</a></li>
<li><a href="#issues">Issues</a></li>
<li><a href="#statistics">Some statistics</a></li>
<li><a href="#solutions">Solutions</a></li>
<li><a href="#advisory">Advisory Information</a></li>
<li><a href="#appendices">Appendices</a>
<ul>
<li><a href="#TCRs">Appendix A: TCRs</a></li>
<li><a href="#TCAs">Appendix B: TCAs</a></li>
<li><a href="#references">Appendix C: Reference Material</a></li>
</ul>
</li>
</ul>
</dd>
</dl>
<h2><a name="definitions">Definitions</a></h2>
<dl>
<dt>AST</dt>
<dd> <p> AST stands for Abstract Syntax Tree: a tree that represents the code of an entire compilation unit (see <a href="http://en.wikipedia.org/wiki/Abstract_syntax_tree">Wikipedia AST definition</a> for more details). In the CND code model, the AST is produced by the parser and then processed by a special component (called the Renderer) that builds the implementation of the code model API. This keeps the parser separate from the other components. </p> </dd>
<dt>Symbol table</dt>
<dd> <p> A symbol table is a mechanism where each identifier in a program's source code is associated with information such as its type, scope level and sometimes its location.
This mechanism should make it possible to look up information about an identifier efficiently. (See <a href="http://en.wikipedia.org/wiki/Symbol_table"> Wikipedia symbol table definition</a> or the "Dragon Book", chapter 1.3.) </p> </dd>
<dt>Dynamic vs static symbol table</dt>
<dd> <p> Associating an identifier with the element (variable, function, type, etc.) that it represents can happen either during the parsing phase or after the AST has been built. In the former case the symbol table is created and used by the parser and is dynamic (i.e. each time the parser leaves a scope, the scope's content "disappears" from the symbol table). In the latter case it is static: anyone can ask about any identifier at any time. </p> </dd>
<dt>Resolver</dt>
<dd> <p> Resolver is a historical term for a static symbol table. </p> </dd>
<dt>Code model architecture</dt>
<dd> <p> Below is a code model architecture chart. </p> <img src="92584/code-model-detailed.png" alt="Detailed code model diagram"/> </dd>
</dl>
<h2><a name="issues">Issues</a></h2>
For now, the code model isn't accurate enough. Ideally, it should be 100% accurate, assuming that the code does not contain compiler errors. By accuracy we mean the accuracy of the data exposed via the code model API. Code model clients (completion, classview, etc.) might have their own issues that lower accuracy from the end user's point of view; we aren't going to discuss these issues here, since they are the subject of separate discussions, IZs, reviews, etc. Particular issues are:
<dl>
<dt><p><b>Lack of symbol table at parsing time</b></dt>
<dd>There are several constructs that cannot be correctly recognized during the parsing phase:
<br><code> A (a); <font color="gray"> // function call or variable declaration? </font></code>
<br><code> A&lt;B&gt; c; <font color="gray"> // expression or variable declaration? </font></code>
<br><code> (B)(c) ; <font color="gray"> // function call or cast expression?
</font></code>
<br><code> A b(t); <font color="gray"> // function declaration or variable declaration? </font></code> </dd>
<dt><p><b>Poor resolver (static symbol table)</b></dt>
<dd> For now, we have a static symbol table (AKA Resolver). <br>The algorithm is poor and in some cases outright incorrect. </dd>
<!-- <dt><p><b>Inefficient resolver (static symbol table)</b></dt> <dd> Resolver searches the model recursively to find out what the given name refers to. On subsequent calls it isn't able to reuse any information gathered on previous calls. This makes the algorithm inefficient. </dd> -->
<dt><p><b>Using canonical parameter types representation</b></dt>
<dd> The following declarations are treated as different:
<br><code> void foo(string) </code>
<br><code> void foo(std::string) </code>
<br> Fixing this needs a resolver that is both reliable and efficient. </dd>
<dt><p><b>Lack of C/C++ distinction</b></dt>
<dd> The following code means different things in C and C++.
<br><code> void foo(); </code>
<br><code> void foo(int p) { // ... } ; </code>
<br> In C++ it declares two functions, while in C it is the same function (there are no overloads in C!). </dd>
<dt><p><b>Parser errors</b></dt>
<dd> We still have some parser errors (situations in which parser is not able to process </dd>
</dl>
<h2><a name="statistics">Some statistics</a></h2>
These statistics aren't complete: for many projects the current (old) whitebox tests run out of memory, and the new whitebox tests aren't ready yet. Although incomplete, the table below gives a lot of interesting information.
<pre>
project             Dwarf   Model  Delta  Accuracy  Parser err  Unresolved
-----------------  ------  ------  -----  --------  ----------  ----------
clucene.2           4.855   4.317    538    88,9 %          73         344
litesql.1           1.612   1.250    362    77,5 %           7         104
mico.1 (partial)    6.822   5.409  1.413    79,3 %          24       3.285
mysql.3 (partial)  43.291  39.267  4.024    90,7 %         189         274
python.2           14.782  13.905    877    94,1 %          13           2
Total              71.362  64.148  7.214    89,9 %         306       4.009
</pre>
<h2><a name="solutions">Solutions</a></h2>
<h3><a name="solutions-dynamic-symtav">Dynamic Symbol Table</a></h3>
The dynamic symbol table interface looks as follows.
<pre>
//--------------------
// Reading
//--------------------

/** Represents different kinds of identifiers */
enum Kind {
    Type,
    Function,
    Variable
}

/** Determines the kind of the given identifier */
Kind getKind(String name);

//--------------------
// Modification
//--------------------

/** Is called when entering a scope */
void push();

/** Is called when leaving a scope */
void pop();

/** Adds an element to the current frame */
void add(String name, Kind kind);

/** Adds all symbols from the given namespace.
 * Is called when entering namespace definition.
 */
void addFromNamespace(String namespaceName);

/** Adds symbols from the given class
 * Is called when entering a class that extends given class
 */
void addFromClass(String className, boolean isPublic);
</pre>
<h3><a name="solutions-static-symtab">Static Symbol Table</a></h3>
The static symbol table interface looks as follows.
<pre>
public class ResolverFactory {
    public static Resolver createResolver(CsmFile file, int offset);
}

public interface Resolver {

    /**
     * Resolves identifier name.
     *
     * @param nameTokens tokenized name to resolve
     * (for example, for std::vector it is new String[] { "std", "vector" })
     */
    public CsmObject resolve(String[] nameTokens);

    /**
     * Resolves identifier name.
     *
     * @param qualifiedName name to resolve
     */
    public CsmObject resolve(String qualifiedName);
}
</pre>
<h3><a name="solutions-pros-and-cons">Static vs Dynamic Pros and Cons</a></h3>
<h2><a name="advisory">Advisory Information</a></h2>
<p>[List any non-blocking issues and suggestions for improvement.]</p>
<h2><a name="appendices">Appendices</a></h2>
<h3><a name="TCRs">Appendix A: Technical Changes Required</a></h3>
<p>[File all TCRs in Issuezilla with P1 or P2 priority and make the issue representing this review depend on them.]</p>
<h3><a name="TCAs">Appendix B: Technical Changes Advised</a></h3>
<p>[File all TCAs in Issuezilla with P3 to P5 priority and make the issue representing this review depend on them.]</p>
<h3><a name="references">Appendix C: Reference Material</a></h3>
<p>[List additional materials relevant to reviewed case]</p>
</body>
</html>