This Bugzilla instance is a read-only archive of historic NetBeans bug reports. To report a bug in NetBeans please follow the project's instructions for reporting issues.
Summary: | FileUtil::normalizeFileOnWindows() is slow due to IO especially on network drives | ||
---|---|---|---|
Product: | platform | Reporter: | victork |
Component: | Filesystems | Assignee: | Jaroslav Havlin <jhavlin> |
Status: | NEW --- | ||
Severity: | normal | ||
Priority: | P3 | ||
Version: | 8.0 | ||
Hardware: | PC | ||
OS: | Windows 7 | ||
Issue Type: | ENHANCEMENT | Exception Reporter: | |
Attachments: |
normalizeFileOnWindows update to not use IO to the filesystem
normalizeFileOnWindows update to not use IO to the filesystem. |
Description
victork
2013-12-10 12:28:43 UTC
Created attachment 143037
normalizeFileOnWindows update to not use IO to the filesystem.
Changes since previous patch:
Added wildcard checking in the path (from Java's internal C routine).
Added invalid-dot checking in the path elements (from Java's internal C routine).
Replaced the C-style "for" with the short iterating "for" and updated its body accordingly.
Updated to use static methods of the Character class without creating a new object.
getNormalizedPathOnWindows() now throws an exception when the above checks fail, which is caught by normalizeFileOnWindows().
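For readers without the attachment, the kind of checks listed above might look roughly like the sketch below. It is not the attached patch: the class `WindowsPathSketch` and the method `normalizeWithoutIo` are hypothetical names, and the exact rules (reject `*` and `?`, reject elements that end in a dot, collapse `.` and `..`) are assumptions based only on the change list.

```java
import java.util.ArrayDeque;
import java.util.Deque;

/** Hypothetical illustration only; not the code from attachment 143037. */
final class WindowsPathSketch {

    private WindowsPathSketch() {
    }

    /**
     * Normalizes an absolute drive-letter path such as "C:/a/./b/../c" with
     * string operations only (no filesystem access). Throws
     * IllegalArgumentException when a check fails, so that a caller could
     * fall back to the I/O-based route.
     * Note: this cannot expand 8.3 short names or restore the on-disk case.
     */
    static String normalizeWithoutIo(String path) {
        if (path.length() < 3
                || !Character.isLetter(path.charAt(0))
                || path.charAt(1) != ':'
                || (path.charAt(2) != '\\' && path.charAt(2) != '/')) {
            throw new IllegalArgumentException("Not an absolute drive path: " + path);
        }
        Deque<String> elements = new ArrayDeque<>();
        // Split on runs of either separator; empty elements are skipped below.
        for (String element : path.substring(3).split("[\\\\/]+")) {
            if (element.isEmpty() || element.equals(".")) {
                continue; // "." refers to the current directory
            }
            if (element.equals("..")) {
                if (elements.isEmpty()) {
                    throw new IllegalArgumentException("'..' above the drive root: " + path);
                }
                elements.removeLast(); // step back to the parent
                continue;
            }
            // Assumed wildcard check: '*' and '?' are not valid in file names.
            if (element.indexOf('*') >= 0 || element.indexOf('?') >= 0) {
                throw new IllegalArgumentException("Wildcard in path: " + path);
            }
            // Assumed dot check: an element must not end with a dot.
            if (element.endsWith(".")) {
                throw new IllegalArgumentException("Trailing dot in element: " + element);
            }
            elements.addLast(element);
        }
        // Keep the drive prefix as given and join with backslashes.
        return path.substring(0, 2) + "\\" + String.join("\\", elements);
    }
}
```

For example, `WindowsPathSketch.normalizeWithoutIo("C:/a/./b/../c")` yields `C:\a\c`, while `C:\dir\*.txt` is rejected with an exception.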
It would be great if we could prevent any I/O in normalizeFile, but unfortunately we need it. On Windows, files can still have 8.3 names, e.g. C:\PROGRA~1, which is just another name for C:\Program Files. The 8.3 name is stored in the filesystem, as well as the long name. So, if we get the short (8.3) name, we need to call File.getCanonicalPath() to retrieve the long name.

If we get two paths for the same file, we should know (for several reasons, e.g. caching, filesystem operations) that we are working with a single file. That's why we should always work with canonical (unique) paths. If two files have different canonical names, they are surely different (not speaking about symbolic links now).

That's also why we need to know the correct case of file names. "C:\file.txt", "c:\FILE.TXT" and "c:\File.txt" are different paths for the same file. But which of these paths should be used by NetBeans as the unique path for this file? We could theoretically use the lower-case or the upper-case variant, but it's much better to use the name that was specified by the user (or operating system). This name is stored in the filesystem, so we need to call File.getCanonicalPath(), which performs some I/O to read the correct name.

Having some heuristics for avoiding disk access would be nice, but it is quite complicated, mainly because the case of file names is quite important on Windows, although the file system is case insensitive.

Thank you very much for your great effort to fix this issue, but I cannot integrate this patch. It can work fine if you work with canonical paths only, but that cannot be guaranteed, so there is a high risk of severe problems. I'm changing the issue type to Enhancement.

I knew it can't get in(As i've wrote - i predicted it then i wrote it :)) - can be still useful for others.

Why we get canonical every time instead of converting to canonical then we store data about file (Open,Drag&Drop,etc).

If user renamed a file to same with other case outside of the IDE then invalidate caches for old name and reindex.

"We could theoretically use the lower-case or the upper-case variant, but it's much better to use the name that was specified by the user (or operating system)"

Operating system shall provide canonical name already, then user provides good name then it's ok as well - problems arise then user provides bad name -> And here then user provides a name (Open dialog,etc) it should be canonicalized so core will always work with already canonicalized paths and will not need to recanonicalize stored invalid custom path with dots,diff case,etc over and over again. Or do you say the system became so complicated that there is no way to find all external "path input" spots in the code, or is the fear about problems with indexes of projects relying on bad absolute addresses? Or am I missing something?

Solution should exist - it's just all about architecting/planning it cooperatively with devs managing simultaneous parts of the code(Possibly can take a lot of time up to coming to conclusion that writing new IDE from scratch is better).

Btw. i've heard Microsoft plans to remove legacy 8.3 names support in upcoming releases of Windows ;) and there is special registry key which can enable case sensitivity on Windows FS and is quite dangerous and ofc by default off(exists for years) :)

(In reply to victork from comment #3)
> I knew it can't get in(As i've wrote - i predicted it then i wrote it :)) -
> can be still useful for others.

I agree. Thus, the bug is still open for experimenting and discussion.

> Why we get canonical every time instead of converting to canonical then we
> store data about file (Open,Drag&Drop,etc).

It should not be used every time, but whenever a path is passed to the IDE from outside - a settings file, file chooser (it's not guaranteed that file choosers return canonical names), drag&drop, command-line argument, etc.
(We store data about the file even if we see the file, because we need e.g. detect MIME-Type, and some info is cached.)
If you find a place from which normalizeFile() is called for a file that is surely canonical, please let me know.

> If user renamed a file to same with other case outside of the IDE then
> invalidate caches for old name and reindex.

It works this way currently.

> Operating system shall provide canonical name already, then user provides
> good name then it's ok as well - problems arise then user provides bad name
> -> And here then user provides a name (Open dialog,etc) it should be
> canonicalized so core will always work with already canonicalized paths and
> will not need to recanonicalize stored invalid custom path with dots,diff
> case,etc over and over again

I absolutely agree, but it should work this way already.

> Or am I missing something?

Surely not. See javadoc for FileUtil.html#normalizeFile(java.io.File), it contains the same recommendations as you are saying.

> Solution should exist - it's just all about architecting/planning it
> cooperatively with devs managing simultaneous parts of the code(Possibly can
> take a lot of time up to coming to conclusion that writing new IDE from
> scratch is better).

The IDE is already designed this way: normalizeFile() should be called only if a path is passed from outside of the IDE. Of course, there can be bugs that violate the design, which should be detected and fixed. (The IDE also tries to make the normalization faster by using a cache for normalized paths.)

> Btw. i've heard Microsoft plans to remove legacy 8.3 names support in
> upcoming releases of Windows ;) and there is special registry key which can
> enable case sensitivity on Windows FS and is quite dangerous and ofc by
> default off(exists for years) :)

Thank you for the info! :-)

Thanks for the descriptive answer :)
""""
> Why we get canonical every time instead of converting to canonical then we
> store data about file (Open,Drag&Drop,etc).
It should not be used every time, but whenever a path is passed to the IDE from outside - a settings file, file chooser (it's not guaranteed that file choosers return canonical names), drag&drop, command-line argument, etc.
(We store data about the file even if we see the file, because we need e.g. detect MIME-Type, and some info is cached.)
If you find a place from which normalizeFile() is called for a file that is surely canonical, please let me know.
""""
In this specific case I'm not sure whether it's canonical. I would need to open the whole NetBeans tree and precisely check all flows/xrefs to be sure, and even if it's OK, no guarantee can be made that someone will not add an "unsanitized" path in the future (ideally that would be rejected by the code review process, but not every patch gets enough reviews :().
There may be uses of normalize that came into some projects via good old "Copy/Paste" of an existing code chunk handling files, where normalization was not actually needed because the path is already in its normal state.
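The design discussed in this thread (canonicalize a path once when it enters from outside, cache the result, and let the rest of the code trust it) could be sketched as below. This is an illustration under assumptions, not NetBeans code: `BoundaryNormalizer`, `fromOutside`, `invalidate` and `cachedEntries` are hypothetical names, and the fallback to the absolute path on failure is a choice made only for this sketch. The only real API calls are the JDK's `File.getAbsolutePath()`, `File.getAbsoluteFile()` and `File.getCanonicalFile()`.

```java
import java.io.File;
import java.io.IOException;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

/** Hypothetical sketch of "normalize at the boundary, then cache". */
final class BoundaryNormalizer {

    // Maps the absolute path as received to its canonical File.
    private final Map<String, File> cache = new ConcurrentHashMap<>();

    /**
     * Call this for paths coming from outside (file chooser, settings file,
     * drag and drop, command line). The filesystem is consulted only on a
     * cache miss.
     */
    File fromOutside(File raw) {
        return cache.computeIfAbsent(raw.getAbsolutePath(), key -> {
            try {
                // May perform I/O, e.g. to resolve 8.3 names and case on Windows.
                return new File(key).getCanonicalFile();
            } catch (IOException e) {
                // Assumption for this sketch: fall back to the absolute path.
                return new File(key).getAbsoluteFile();
            }
        });
    }

    /** Drops a cached entry, e.g. after the file was renamed externally. */
    void invalidate(File raw) {
        cache.remove(raw.getAbsolutePath());
    }

    /** Number of cached entries; exposed only to make the sketch observable. */
    int cachedEntries() {
        return cache.size();
    }
}
```

Code that receives a `File` returned by `fromOutside` would then not need to normalize it again, which is the property the discussion above says cannot be verified without auditing every caller.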