Bug 190675 - Incremental update of local repo index
Status: NEW
Alias: None
Product: projects
Classification: Unclassified
Component: Maven
Version: 7.0
Hardware: All All
Importance: P1 normal with 1 vote
Assignee: Tomas Stupka
URL:
Keywords:
Depends on: 197965 197966
Blocks:
Reported: 2010-09-29 15:15 UTC by Jesse Glick
Modified: 2014-05-12 13:09 UTC
CC List: 6 users

See Also:
Issue Type: ENHANCEMENT
Exception Reporter:


Description Jesse Glick 2010-09-29 15:15:26 UTC
Currently it seems that the local repository is indexed only in batch, upon request or on whatever schedule is configured (say, weekly). This is not great because things you did in the past six days are not reflected in pom.xml completion, the repository browser, the local archetype list, ...

We should at least be updating the index incrementally to reflect newly added entries due to:

1. "Downloaded..." messages from Maven builds.

2. "Installed..." messages from Maven builds.

3. Artifacts downloaded using the embedder.

You would still be missing changes done from the command line, but that seems acceptable.

There is at least one piece of code in the IDE which *removes* artifacts from the local repository - the Remove button in the archetype browser. This would of course also need to update the index.

Need to figure out how to get Nexus and Lucene to make incremental changes.
Comment 1 Jesse Glick 2010-09-29 16:46:05 UTC
In fact it is partially implemented already: RepositoryIndexer.updateIndexWithArtifacts. Need to call this more often and more reliably.
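
Whichever trigger is used, the indexer ultimately needs artifact coordinates rather than a raw file path. The following is only a minimal sketch of that mapping, assuming the default local repository layout (group/artifact/version/file); LocalRepoPaths and coordinatesOf are hypothetical names, not part of NetBeans or Maven, and nothing here reflects the actual signature of RepositoryIndexer.updateIndexWithArtifacts.

```java
import java.nio.file.Path;
import java.nio.file.Paths;

// Hypothetical helper for illustration only; not NetBeans or Maven API.
final class LocalRepoPaths {

    /**
     * Derives {groupId, artifactId, version} from a file inside the local
     * repository, assuming the default layout:
     * group/as/dirs/artifactId/version/artifactId-version.ext
     */
    static String[] coordinatesOf(Path repoRoot, Path artifactFile) {
        Path rel = repoRoot.relativize(artifactFile);
        int n = rel.getNameCount();
        if (n < 4) {
            throw new IllegalArgumentException("not an artifact path: " + rel);
        }
        // The last element is the file itself; its parent is the version,
        // the grandparent is the artifactId, everything before is the groupId.
        String version = rel.getName(n - 2).toString();
        String artifactId = rel.getName(n - 3).toString();
        StringBuilder groupId = new StringBuilder();
        for (int i = 0; i < n - 3; i++) {
            if (i > 0) {
                groupId.append('.');
            }
            groupId.append(rel.getName(i).toString());
        }
        return new String[] {groupId.toString(), artifactId, version};
    }

    public static void main(String[] args) {
        Path repo = Paths.get(System.getProperty("user.home"), ".m2", "repository");
        Path pom = repo.resolve("grp/sib/1.0-SNAPSHOT/sib-1.0-SNAPSHOT.pom");
        String[] gav = coordinatesOf(repo, pom);
        System.out.println(gav[0] + ":" + gav[1] + ":" + gav[2]);
    }
}
```

A caller reacting to a "Downloaded..." or "Installed..." message would still have to extract the file path from the message text first; that step is left out because the exact message format is not given here.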
Comment 2 athompson 2010-10-01 19:55:09 UTC
I don't really know how this stuff works, so if this is an incredibly stupid idea please say so, but the IDE seems to already be able to detect what files/folders have changed when the IDE regains focus.  Can that code be used to detect changes in the local repository to incrementally update if the user uses the command line as well?
Comment 3 Jesse Glick 2010-10-01 20:17:38 UTC
(In reply to comment #2)
> if this is an incredibly stupid idea please say so

It's actually a good question. I did think of simply attaching a listener to ~/.m2/repository/ and updating the index whenever something changes.

(There are other things that this could be good for as well: if a project is marked as being in error because of a missing dependency, you might expect the badge to be cleared the moment that dep appears in the local repo by any means. Currently this works if the dep is from a SNAPSHOT sibling project you have opened at least once, since the IDE internally substitutes ../sib/pom.xml for ~/.m2/repository/grp/sib/1.0-SNAPSHOT/sib-1.0-SNAPSHOT.pom; or after selecting "Download Dependencies", since this triggers a refresh when it is done; or after running a build on a project. But for other cases, such as when building one project downloaded artifacts also needed by a dozen other projects, you have to "Reload Project".)

The question is whether such a listener can be efficient enough to be practical even if you have an enormous local repo. (Mine currently has 7419 folders and 17649 files.) Certainly using the polling implementation in NB 6.9- this is out of the question: the IDE would be stat'ing the whole repo on every focus change.

6.10 uses native file change listeners on common platforms, which makes it more plausible. Unfortunately the native implementations vary in scalability: some systems let you add one listener to a whole directory tree; some require one listener per subfolder (and on Linux there is a limit on the number of such listeners which in common distributions is set rather low); some even require one listener per file or subfolder, and can consume an open file handle.

In the case of indexed source roots, e.g. src/main/java, we consider the immediate refresh from external processes like 'svn up' important enough that we just try to make the listener performance reasonable even for big projects. Petr would be able to comment on whether this is a good idea for the local Maven repo, or whether it is better to rely on a limited set of triggers such as Maven messages in the Output Window.
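
To make the "one listener per subfolder" case concrete, here is a rough sketch using the JDK's own java.nio.file.WatchService (Java 7+), which is not what masterfs uses but has the same shape: every directory in the tree has to be registered separately. RepoWatchSketch and registerTree are hypothetical names; the sketch does not register directories created after startup, which a real implementation would have to do.

```java
import java.io.IOException;
import java.nio.file.FileVisitResult;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.nio.file.SimpleFileVisitor;
import java.nio.file.StandardWatchEventKinds;
import java.nio.file.WatchEvent;
import java.nio.file.WatchKey;
import java.nio.file.WatchService;
import java.nio.file.attribute.BasicFileAttributes;
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.TimeUnit;

// Hypothetical demo class; not NetBeans API.
final class RepoWatchSketch {

    /** Registers every directory under root; returns one WatchKey per directory. */
    static List<WatchKey> registerTree(final WatchService ws, Path root) throws IOException {
        final List<WatchKey> keys = new ArrayList<WatchKey>();
        Files.walkFileTree(root, new SimpleFileVisitor<Path>() {
            @Override
            public FileVisitResult preVisitDirectory(Path dir, BasicFileAttributes attrs)
                    throws IOException {
                // register() may fail with an IOException if an OS limit is reached.
                keys.add(dir.register(ws,
                        StandardWatchEventKinds.ENTRY_CREATE,
                        StandardWatchEventKinds.ENTRY_DELETE,
                        StandardWatchEventKinds.ENTRY_MODIFY));
                return FileVisitResult.CONTINUE;
            }
        });
        return keys;
    }

    public static void main(String[] args) throws IOException, InterruptedException {
        Path root = args.length > 0
                ? Paths.get(args[0])
                : Paths.get(System.getProperty("user.home"), ".m2", "repository");
        if (!Files.isDirectory(root)) {
            System.err.println("no such directory: " + root);
            return;
        }
        try (WatchService ws = root.getFileSystem().newWatchService()) {
            System.err.println("watching " + registerTree(ws, root).size() + " directories");
            WatchKey key;
            // Stop after 10 seconds without any event so the demo terminates.
            while ((key = ws.poll(10, TimeUnit.SECONDS)) != null) {
                Path dir = (Path) key.watchable();
                for (WatchEvent<?> ev : key.pollEvents()) {
                    if (ev.kind() == StandardWatchEventKinds.OVERFLOW) {
                        continue; // events were lost; a real client would rescan
                    }
                    System.err.println(ev.kind().name() + ": " + dir.resolve((Path) ev.context()));
                }
                key.reset();
            }
        }
    }
}
```

Note that events arrive as plain paths relative to the watched directory, so the cost is dominated by the number of registrations rather than by any per-file objects.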
Comment 4 Petr Nejedly 2010-10-26 09:20:46 UTC
As to the resource usage and limits:
Windows and MacOS implementations do use recursive listening on the FS root(s),
so their OS resource usage is minimal and 7419 folders cause no harm to them (except that the IDE would need to process the events and also have corresponding FileObjects in memory, to have something to fire the events off).

Linux implementation needs an OS-level record per watched folder and the default (OS-wide) limit is 8192 entries. It can be raised by the admin (root), but you don't want to force _every_ user to go fishing in /etc/sysctl.d, do you?
(I believe it is acceptable to tell those _few_ users with enormous source bases to tweak their system, though).
Comment 5 Jesse Glick 2010-10-26 16:40:50 UTC
(In reply to comment #4)
> have corresponding FileObjects in memory, to have something to fire the events off

This is a real problem, I think. I don't *want* to have all these FileObjects in memory; I just want to be told when a File under some root changes. The Filesystems API is a poor match for the problem here, unless masterfs can create the necessary FileObjects for the FileEvent on demand. (Ideally FileEvent would be enhanced to provide a URL of the changed file, so you could check this and avoid even asking for a FileObject.)

> you don't want to force _every_ user go fishing in /etc/sysctl.d

See bug #185135.
Comment 6 Xypron 2011-04-13 22:34:12 UTC
>> Linux implementation needs an OS-level record per watched folder

This seems to be a description of "dnotify".

According to the documentation of "inotify" only one file handle is
needed for all directories watched.

See:
http://linux.die.net/man/7/inotify
http://www.kernel.org/pub/linux/kernel/people/rml/inotify/README
Comment 7 Jesse Glick 2011-04-14 20:53:29 UTC
(In reply to comment #6)
> According to the documentation of "inotify" only one file handle is
> needed for all directories watched.

Yes, one _file handle_, but one watch record per dir - limited by default to 8192 unless you set fs.inotify.max_user_watches to something more reasonable (I use 99999). My current ~/.m2/repository requires >1700 watches, and that is after having recently deleted it so as to be recreated from Nexus cache, so hitting that ceiling would not take long.
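
A quick way to see how close a given repository is to that ceiling is to count its directories and compare against the current setting. This is just a sketch: WatchBudget and its methods are hypothetical names, it assumes one watch per directory as described above, and it reads the limit from /proc/sys/fs/inotify/max_user_watches, which only exists on Linux.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.FileVisitResult;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.nio.file.SimpleFileVisitor;
import java.nio.file.attribute.BasicFileAttributes;

// Hypothetical helper for illustration only.
final class WatchBudget {

    static final Path LIMIT_FILE = Paths.get("/proc/sys/fs/inotify/max_user_watches");

    /** Counts root and all directories below it (symlinks are not followed). */
    static int countDirectories(Path root) throws IOException {
        final int[] count = {0};
        Files.walkFileTree(root, new SimpleFileVisitor<Path>() {
            @Override
            public FileVisitResult preVisitDirectory(Path dir, BasicFileAttributes attrs) {
                count[0]++;
                return FileVisitResult.CONTINUE;
            }
        });
        return count[0];
    }

    /** Returns fs.inotify.max_user_watches, or -1 if it cannot be read (e.g. not Linux). */
    static long maxUserWatches() {
        try {
            byte[] raw = Files.readAllBytes(LIMIT_FILE);
            return Long.parseLong(new String(raw, StandardCharsets.US_ASCII).trim());
        } catch (IOException | NumberFormatException e) {
            return -1;
        }
    }

    /** True if watching that many directories alone would go over a known limit. */
    static boolean wouldExceed(int directories, long limit) {
        return limit >= 0 && directories > limit;
    }

    public static void main(String[] args) throws IOException {
        Path root = args.length > 0
                ? Paths.get(args[0])
                : Paths.get(System.getProperty("user.home"), ".m2", "repository");
        if (!Files.isDirectory(root)) {
            System.err.println("no such directory: " + root);
            return;
        }
        int dirs = countDirectories(root);
        long limit = maxUserWatches();
        System.out.println(dirs + " directories, limit "
                + (limit < 0 ? "unknown" : String.valueOf(limit)));
        if (wouldExceed(dirs, limit)) {
            System.out.println("recursive watching would exceed fs.inotify.max_user_watches");
        }
    }
}
```

The comparison is optimistic, since watches already held by the IDE for source roots and by other applications of the same user count against the same limit.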

It looks like masterfs could actually catch ENOSPC and create an extra file handle; on my Ubuntu fs.inotify.max_user_instances is set to 128, so assuming not many other apps are using inotify (*), you could in principle listen to up to a million dirs before hitting EMFILE on inotify_init1. This would make LinuxNotifier more complex since it could not use a simple call to read(...) on the handle; it would need to select(...).

Anyway the problem would remain that FileUtil.addRecursiveListener will create a FileObject in heap for each file or folder in the repo, which would add considerable overhead. (True on all platforms, not just Linux.) This is just a basic design flaw in NB's API, or more precisely the implementation of its API.

Demo (also useful to insert some logging into FileObj.<init> and LinuxNotifier.addWatch):

package m;

import java.io.File;
import org.openide.filesystems.FileChangeAdapter;
import org.openide.filesystems.FileChangeListener;
import org.openide.filesystems.FileEvent;
import org.openide.filesystems.FileUtil;
import org.openide.modules.ModuleInstall;

// Module installer that logs every file event under ~/.m2/repository.
public class Installer extends ModuleInstall {
    private static final FileChangeListener l = new FileChangeAdapter() {
        // Prints the event type and the display name of the affected file.
        private void event(FileEvent fe, String m) {
            System.err.println(m + ": " + FileUtil.getFileDisplayName(fe.getFile()));
        }
        @Override
        public void fileChanged(FileEvent fe) {
            event(fe, "changed");
        }
        @Override
        public void fileDataCreated(FileEvent fe) {
            event(fe, "dataCreated");
        }
        @Override
        public void fileFolderCreated(FileEvent fe) {
            event(fe, "folderCreated");
        }
        @Override
        public void fileDeleted(FileEvent fe) {
            event(fe, "deleted");
        }
    };
    // Called when the module is loaded at IDE startup.
    @Override
    public void restored() {
        FileUtil.addRecursiveListener(l, new File(System.getProperty("user.home"), ".m2/repository"));
        System.err.println("listening...");
    }
}

(*) I know Beagle has the same problem. Some interesting discussion here: https://bugs.launchpad.net/ubuntu/+source/beagle/+bug/455884
Comment 8 Jaroslav Tulach 2011-04-15 06:17:08 UTC
Report an enhancement for the improved usage of LinuxNotifier; it is interesting to hear that there is a way to overcome the 8192 barrier. I'll see what I'll be able to do.