This Bugzilla instance is a read-only archive of historic NetBeans bug reports. To report a bug in NetBeans please follow the project's instructions for reporting issues.

Bug 189004 - Infinite Loop in Error Cache while Scanning Linked directories
Summary: Infinite Loop in Error Cache while Scanning Linked directories
Status: RESOLVED FIXED
Alias: None
Product: editor
Classification: Unclassified
Component: Parsing & Indexing
Version: 7.0
Hardware: PC Linux
Importance: P2 normal
Assignee: Tomas Zezula
URL:
Keywords:
Duplicates: 194730
Depends on:
Blocks:
 
Reported: 2010-07-28 04:56 UTC by codeslinger_compsalot
Modified: 2011-08-05 15:01 UTC (History)
CC List: 10 users

See Also:
Issue Type: DEFECT
Exception Reporter:


Attachments
Simple folder structure (213.22 KB, application/zip)
2011-08-05 14:05 UTC, Hickeroar

Description codeslinger_compsalot 2010-07-28 04:56:57 UTC
I am running the PHP-only version of NetBeans on Linux.  I can repro this behavior on 6.8, 6.9rc1, and dev builds.

I finally figured out the cause of some performance issues I've been seeing.

The problem is that my PHP project contains a symbolic link that points to its parent directory.  In this instance the link was accidental, but there are also valid scenarios where this occurs, for instance certain website layouts.

What I discovered is that when NetBeans scans the code it builds a cache of the directory structure.  The scan clearly gets confused by the infinite loop in the directory structure and tries to replicate it in the cache.  It gives up after 37 recursions...

Given the following directory structure:

dir  /somewhere/projects/mycode
dir  /somewhere/projects/mycode/adir/
file /somewhere/projects/mycode/adir/fileN.php

link  /somewhere/projects/mycode/fu -> /somewhere/projects/mycode

Make sure that fileN.php contains a syntax error.

------
Now run NetBeans and let it scan that structure.


Result:
Look in /home/~user/.netbeans/6.9rc1/var/cache/index/

inside of /index/  the actual path varies but in one repro it was:
/s5/errors/1/

and then inside of that folder is 
/somewhere/projects/mycode/adir/fileN.php
/somewhere/projects/mycode/fu

/somewhere/projects/mycode/fu/adir/fileN.php
/somewhere/projects/mycode/fu/fu

/somewhere/projects/mycode/fu/fu/adir/fileN.php
/somewhere/projects/mycode/fu/fu/fu

etc...  the full path looks something like this

/home/~user/.netbeans/6.9rc1/var/cache/index/s5/errors/1/somewhere/projects/mycode/fu/fu/fu/fu/fu/fu/fu/fu/.../adir/
 
It of course reparses the entire set of files that it finds with each iteration. Interestingly, only files with errors in them get cached in this way.
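A minimal sketch of the layout above and of why the same file shows up under ever longer paths. It is written in Python, not NetBeans' Java, and assumes a system where unprivileged symlink creation works (e.g. Linux). `build_repro` and `naive_collect` are hypothetical names, and the `max_depth` cap only stands in for whatever limit eventually stops the real scan.

```python
import os
import tempfile


def build_repro(root):
    """Create mycode/adir/fileN.php plus the symlink mycode/fu -> mycode."""
    project = os.path.join(root, "mycode")
    os.makedirs(os.path.join(project, "adir"))
    with open(os.path.join(project, "adir", "fileN.php"), "w") as handle:
        handle.write("<?php echo 'unterminated string;\n")  # deliberate syntax error
    os.symlink(project, os.path.join(project, "fu"), target_is_directory=True)
    return project


def naive_collect(directory, max_depth, depth=0):
    """Follow every directory, symlinked or not, down to max_depth levels."""
    found = []
    for name in sorted(os.listdir(directory)):
        path = os.path.join(directory, name)
        if os.path.isdir(path):  # True for a symlink that points to a directory
            if depth < max_depth:
                found.extend(naive_collect(path, max_depth, depth + 1))
        else:
            found.append(path)
    return found


with tempfile.TemporaryDirectory() as tmp:
    project = build_repro(tmp)
    files = naive_collect(project, max_depth=3)
    relative_files = [os.path.relpath(path, project) for path in files]

print(relative_files)
```

Every extra level of depth allowed adds one more `fu/` prefix in front of the same `adir/fileN.php`, which matches the repeated paths seen in the errors cache.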

After I removed the link, I deleted the contents of 
/home/~user/.netbeans/6.9rc1/var/cache/

and then ran NetBeans. It rescanned everything, but this time, without the infinite loop, it was much faster. Equally important to me, the cache went from 536 MB to only 35 MB.

So the fix here is to detect directory loops and refuse to rescan the same files multiple times, but at the same time not break the ability to follow symbolic links -- since the lack of this ability is the major reason I stopped using Eclipse.

Might I suggest that, as you walk the dirs, you maintain a list of inodes, and with each new dir check whether its inode is already in the list and skip it if so...
Comment 1 Peter Pis 2010-07-28 07:45:48 UTC
Please evaluate.
Comment 2 Tomas Mysik 2010-08-01 09:59:58 UTC
Honzo, can you please have a look at this issue? Or maybe Tomas?

Thanks a lot.
Comment 3 Tomas Zezula 2010-08-04 06:37:57 UTC
NetBeans does not detect symlinks (via getCanonicalFile), so it just follows them. A symlink to a parent causes the infinite loop described above. Unfortunately, on JDK < 1.7 there is no way to detect a symlink other than getCanonicalFile (which resolves to the original file), but getCanonicalFile is very expensive and cannot be used.
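For illustration only, the canonical-path idea can be sketched in Python, where `os.path.realpath` plays roughly the role that `File.getCanonicalFile` plays in Java. This is an analogy under that assumption, not the NetBeans code, and it says nothing about the cost concern raised here. `links_back_to_ancestor` is a hypothetical helper name.

```python
import os
import tempfile


def links_back_to_ancestor(path):
    """Return True if `path` resolves to the directory that contains it,
    or to one of that directory's ancestors, so following it would loop."""
    resolved = os.path.realpath(path)
    parent = os.path.realpath(os.path.dirname(path))
    return parent == resolved or parent.startswith(resolved.rstrip(os.sep) + os.sep)


with tempfile.TemporaryDirectory() as tmp:
    project = os.path.join(tmp, "mycode")
    os.makedirs(os.path.join(project, "adir"))
    # fu -> mycode (link to the parent), adir/up -> mycode (link to a grandparent)
    os.symlink(project, os.path.join(project, "fu"), target_is_directory=True)
    os.symlink(project, os.path.join(project, "adir", "up"), target_is_directory=True)

    fu_loops = links_back_to_ancestor(os.path.join(project, "fu"))
    up_loops = links_back_to_ancestor(os.path.join(project, "adir", "up"))
    adir_loops = links_back_to_ancestor(os.path.join(project, "adir"))

print(fu_loops, up_loops, adir_loops)
```

The check resolves two paths for every directory entry, which is the kind of per-file overhead this comment is worried about.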
Comment 4 Jan Lahoda 2010-08-04 09:30:24 UTC
Not directly related to the errors cache, which only reflects what was parsed (other metadata about the files is stored in different parts of the cache). I agree with Tomas: I do not think we can reasonably check for symlink cycles, as that would make indexing slower for everyone (checking for hardlink/mount --bind cycles is currently impossible in Java, and testing via native/JNA code would be even slower). In 1.7, there is java.nio.file.Files.walkFileTree, which allows walking a directory recursively while detecting cycles (I did not test it).
Comment 5 codeslinger_compsalot 2010-08-06 23:39:30 UTC
Here is an approach that ought to work:

On Windows this is a non-issue because it does not support symlinks in the way that *nix does.  End of story.


On *nix, looking up the inode number of a directory is normally a very fast operation.   I don't know Java, so I don't know what your limitations are, but other languages have no problem requesting an inode number.  You do not need to use getCanonicalFile.


All we care about is preventing loops via linkage to a parent directory, so you do not need to maintain a big list, only the list of the directory hierarchy that you are currently walking.

Method:

As you recurse the dirs, you look up the inode of each dir -- this is normally available in an extended properties request.

You push that inode onto a stack.

Every time you enter a new dir, you get its inode and check whether it is currently on the stack.  If it is, you skip that dir; it's a loop.

As you leave a dir, you pop its inode from the stack.

This is a low overhead way to prevent loops.  Max stack depth should never exceed 50 levels and would typically be much smaller.  If you did hit 50 levels then you ought to start asking some questions about the validity of the file system.


Symlinks are very common on *nix systems and the performance of walking an infinite loop is truly horrible....   The above method is a net gain.
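The stack method described above can be sketched as follows. This is Python rather than Java, and it assumes a POSIX file system where `os.stat` reports meaningful `st_dev` and `st_ino` values; `collect_files` is a hypothetical name, not anything from the NetBeans code base.

```python
import os
import tempfile


def collect_files(directory, ancestors=None):
    """Depth-first walk that skips any directory already on the current path.

    `ancestors` is the stack of (device, inode) pairs for the directories
    between the root and `directory`. os.stat() follows symlinks, so a link
    back to a parent reports the parent's own identity.
    """
    if ancestors is None:
        ancestors = []
    info = os.stat(directory)
    identity = (info.st_dev, info.st_ino)
    if identity in ancestors:
        return []  # already being walked higher up: this is a loop, skip it
    ancestors.append(identity)  # push on entering the directory
    found = []
    try:
        for name in sorted(os.listdir(directory)):
            path = os.path.join(directory, name)
            if os.path.isdir(path):
                found.extend(collect_files(path, ancestors))
            else:
                found.append(path)
    finally:
        ancestors.pop()  # pop on leaving the directory
    return found


with tempfile.TemporaryDirectory() as tmp:
    project = os.path.join(tmp, "mycode")
    os.makedirs(os.path.join(project, "adir"))
    with open(os.path.join(project, "adir", "fileN.php"), "w") as handle:
        handle.write("<?php\n")
    os.symlink(project, os.path.join(project, "fu"), target_is_directory=True)

    found = [os.path.relpath(p, project) for p in collect_files(project)]

print(found)
```

The sketch stores the device number alongside the inode because inode numbers are only unique within one file system. Symlinks that do not point back into the current path are still followed, so ordinary linked directories keep working.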
Comment 6 Tomas Zezula 2010-08-09 08:41:27 UTC
The inode number is problematic. Java has no API to get the inode number; it would require a JNI call, which is slow. The second problem is that it works only for hardlinks, not for symlinks. A symlink is a special file pointing to the original, whereas a hardlink is just another reference to the same inode.
Comment 7 bluescrubbie 2010-08-13 01:22:01 UTC
codeslinger_compsalot@netbeans.org wrote:
"On Windows this is a non-issue because it does not support symlinks in the way
that *nix does.  End of story."

This is not the case.  Windows has supported symbolic links since Vista (mklink).

I'm developing with Magento (a bulky e-commerce platform) that implements multi-store through upward 'junction' symlinks (mklink /j), and the infinite scanning makes the IDE almost unusable; it hangs frequently.
Comment 8 codeslinger_compsalot 2010-08-13 02:16:48 UTC
Yes, I've experienced the same behavior -- continuous scanning.

I suspect that a number of your "performance bugs", especially the non-reproducible ones, are actually caused by this infinite looping on symlinks.  I know that at one point I nearly gave up on using NetBeans because of this problem.  But I thought it was just a matter of project size, because when I defined the projects to be smaller the performance improved dramatically.  Now I realize that my smaller project did not include the recursive symlink.

-------

Well, since this apparently can't be done in Java, the question is: can Java request this info from an external program?

For instance this directory parser would be trivial to write in PHP -- I'd be willing to write it for you -- and then it could return the list of directories to be searched, back to the caller -- perhaps as a temp file on disk.

That would solve it for Linux for certain.   For Vista etc. that would need more research to see how Vista makes the equivalent info available.


Symlinks are way too useful to ignore.
Comment 9 codeslinger_compsalot 2010-08-13 02:23:20 UTC
quote:  The second problem is that it works only for hardlinks, not for symlinks. A symlink is a special file pointing to the original, whereas a hardlink is just another reference to the same inode.


Right, that's fine.  It's not the symlink that we care about.  We only need to know the inode of what the symlink points to, because we want to check whether that inode is already on our stack.
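The distinction being drawn here can be checked with a short Python sketch, assuming a POSIX system: `os.stat` follows a symlink and reports the target's inode, while `os.lstat` reports the inode of the link file itself. The variable names are illustrative only.

```python
import os
import tempfile

with tempfile.TemporaryDirectory() as tmp:
    target = os.path.join(tmp, "mycode")
    os.mkdir(target)
    link = os.path.join(target, "fu")
    os.symlink(target, link, target_is_directory=True)

    target_info = os.stat(target)
    followed_info = os.stat(link)    # follows the link to its target
    link_info = os.lstat(link)       # describes the link file itself

    target_id = (target_info.st_dev, target_info.st_ino)
    same_when_followed = (followed_info.st_dev, followed_info.st_ino) == target_id
    same_when_not_followed = (link_info.st_dev, link_info.st_ino) == target_id

print(same_when_followed, same_when_not_followed)
```

So the symlink does have its own inode, as noted in the quoted comment, but a loop check only needs the identity obtained by following the link.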
Comment 10 esminis 2010-08-21 13:17:41 UTC
I think this is not a netbeans.php problem; it is a problem for the whole of NetBeans, and another bug has already been filed for it.

*** This bug has been marked as a duplicate of bug 178180 ***
Comment 11 Alexander Simon 2010-08-24 09:06:49 UTC
The CND part was fixed (see BZ#178180).
The indexer has the same problem:
- infinite recursion in:
org.netbeans.modules.parsing.impl.indexing.FileObjectCrawler.collect(FileObjectCrawler.java:179)
This issue is not a duplicate of 178180.
Comment 12 Filip Zamboj 2010-09-15 12:55:04 UTC
No change for a long time; looks fixed.
Comment 13 Petr Pisl 2011-02-01 15:36:53 UTC
The issue is still there, but for the reasons described by Tomas and Jan it cannot be fixed now.
Comment 14 Petr Pisl 2011-02-01 15:48:03 UTC
The issue is still there. Aikar has a similar problem, described in issue #178180. I'm reassigning this issue to the parsing API, because it should be handled there.
Comment 15 Marian Mirilovic 2011-04-07 12:34:42 UTC
Based on the feedback from issue 178180, we should solve this problem in NB 7.0.1. Adding a waiver for 7.0 and setting TM to 7.0.1.
Comment 16 Tomas Zezula 2011-04-07 12:37:08 UTC
Unfortunately there is no solution for JDK < 1.7 other than the slow JNA.
Comment 17 Tomas Zezula 2011-05-11 11:28:03 UTC
Fixed jet-main bdd15f76b1a9
Comment 18 Quality Engineering 2011-05-12 04:37:10 UTC
Integrated into 'main-golden', will be available in build *201105120000* on http://bits.netbeans.org/dev/nightly/ (upload may still be in progress)
Changeset: http://hg.netbeans.org/main/rev/bdd15f76b1a9
User: Tomas Zezula <tzezula@netbeans.org>
Log: #189004:Infinite Loop in Error Cache while Scanning Linked directories
Comment 19 Tomas Mysik 2011-06-03 10:10:18 UTC
*** Bug 194730 has been marked as a duplicate of this bug. ***
Comment 20 Hickeroar 2011-08-05 13:30:22 UTC
The issue still stands in 7.0.1. I have a directory in my project which contains a symlink to the directory itself. If I include it in my project, it hangs. If I don't include it, it works fine. Other than this symlink, there are just some JS files in the directory.
Comment 21 Tomas Zezula 2011-08-05 13:38:40 UTC
Works for me, provided getCanonicalFile() resolves links correctly on your system.
Please attach a reproducible case.
Comment 22 Hickeroar 2011-08-05 13:43:50 UTC
This is frustrating. When I add the folder in question to my project it hangs. When I create a project around that folder by itself, it doesn't hang at all.

Any idea why this might occur?
Comment 23 Tomas Zezula 2011-08-05 13:49:40 UTC
Please attach at least the structure of the project, like:

Project-
      -src
         -folder
         -folder2
         -link -> src

Otherwise I cannot help.
Thanks
Comment 24 Hickeroar 2011-08-05 14:05:05 UTC
Created attachment 109818
Simple folder structure

There are many duplicates of the "post" folders (each representing a site), which all contain the same img symlink along with some other standard folders for CSS and JS.
Comment 25 Hickeroar 2011-08-05 14:05:25 UTC
I've been unable to reproduce the problem on a small scale, but when it's added to my whole project it eats up a whole core of my CPU and the RAM usage steadily climbs to a couple GB until the system is unresponsive.

Comments added to the attachment with specifics about the folder structure.
Comment 26 Tomas Zezula 2011-08-05 14:46:36 UTC
The behavior does not depend on the size of the project.
One more thing you can help with: can you attach a self-profiler snapshot? It will show where the IDE is cycling. I will look at the attached structure and try to reproduce the problem with it.
Thanks
Comment 27 Hickeroar 2011-08-05 15:01:20 UTC
I emailed the profile to you. I didn't want to post it up here publicly.

Thanks.