
Bug 9724 - search should specifically not index some things
Status: RESOLVED INVALID
Alias: None
Product: obsolete
Classification: Unclassified
Component: collabnet
Version: 3.x
Hardware: PC Windows ME/2000
Importance: P4 blocker
Assignee: jcatchpoole
URL:
Keywords:
Depends on:
Blocks:
 
Reported: 2001-02-20 18:20 UTC by jcatchpoole
Modified: 2009-11-08 02:27 UTC
CC List: 1 user

See Also:
Issue Type: ENHANCEMENT
Exception Reporter:


Description jcatchpoole 2001-02-20 18:20:34 UTC
This bug was part of #9163, which I've broken up into several 
separate bugs so that they can be tracked better.
--------------------------------------------------------------
Must specifically *not* index certain things, like
- cvs mail archives (actually this seems to be already done ?);
- collabnet/2 test projects (I get to
  http://www.netbeans.org/project/collabnet/index.html via a search);
- nbmoduleadmin mail archives (?? are these public ?  Should these be
  browsable?)
- www-request_cert and www-request_passwd - you can't actually see
  the msgs if you browse to them, which is good, but they shouldn't
  show up in the search results at all;
- changelogs ??
Comment 1 Unknown 2001-03-17 02:11:42 UTC
This is an enhancement request to our search functionality (which we are 
reifying right now).  Assigning to support so that they can put this in PCN for 
tracking for search enhancements for SC.
Comment 2 Unknown 2001-03-24 00:39:35 UTC
Requested status from stack@collab.net on pcn #3291.
Comment 3 Unknown 2001-03-27 02:17:33 UTC
Heard from ms:
<keric> ok.  so www and mail are going  to be the two options available to nb 
users when we move.
<ms> yes, but seperately
<keric> ok.  Thanks.

So there are going to be two independent search facilities that don't overlap -- 
one for mail archives (eyebrowse) and one for HTML.

Keri
Comment 4 jcatchpoole 2001-03-27 09:56:18 UTC
** Note these last 2 entries are pertinent to 9721 rather
than this issue **

Sounds good - will "eyebrowse" search allow you to select 
*which* mailing list archives you search, as described in
9721 ?  Eg I want to search nbdev and dev@openide, but not 
nbusers ?
Comment 5 Unknown 2001-03-27 18:52:29 UTC

*** This issue has been marked as a duplicate of 9721 ***
Comment 6 jcatchpoole 2001-05-29 12:15:40 UTC
Trying to clean up and clarify :

As noted in #9721, this bug is *not* a duplicate of 9721.
Even if this issue is resolved, I would like some info as
to exactly *what* the various searches index.

For example, in 9721 Keri notes that there will be a separate
HTML and mail-archive search.  What exactly is indexed for 
the HTML search ?  Can we exclude certain directories like 

- /source/browse/
- testwww/
- there were a couple of Collab test projects at one stage
- ... ?

Is this configurable by us ?  If I wanted to stop a certain
directory/project/xxx from being indexed, how would I do it ?
Comment 7 Taska 2001-05-29 22:32:05 UTC
Collabnet internal issue SC52.
Comment 8 Taska 2001-06-05 23:22:05 UTC
Setting internal Target Milestone of 1.1.
Comment 9 Taska 2001-06-20 23:00:28 UTC
This has been moved back to 2.0 upgrade.  (May change back to 1.1- if so, we
will reopen.)
Comment 10 Taska 2001-10-05 00:54:22 UTC
Reopening and marking P5.
Comment 11 Taska 2001-10-09 23:15:16 UTC
Accepting issue.
Comment 12 Taska 2001-11-08 20:34:49 UTC
The internal issue (PCN3291) has been targeted for SC1.3.
Comment 13 jcatchpoole 2002-03-26 11:23:31 UTC
Also wondering why P5 here ...
Comment 14 Unknown 2002-08-20 22:07:20 UTC
This enhancement is targeted for the Danube release of 
SourceCast. During the upgrade to that release we can 
confirm this issue or reopen if necessary.
Comment 15 Unknown 2005-03-18 11:29:07 UTC
Verified in 2.6.
Comment 16 jcatchpoole 2006-03-23 10:39:08 UTC
Sounds like this is in place, but I don't know how to use it.  Quoting my
comment of May 29 11:15:40 +0000 2001 :

> Is this configurable by us ?  If I wanted to stop a certain
> directory/project/xxx from being indexed, how would I do it ?

Comment 17 Unknown 2006-12-19 05:46:58 UTC
started...
Comment 18 Unknown 2006-12-19 05:54:06 UTC
- cvs mail archives (actually this seems to be already done ?):

Yes, the corresponding host admin option has been disabled already. 

- collabnet/2 test projects:	

Note: Only one test project exists now. 

To keep any project from being indexed, adding "/servlets/ProjectHome" to the 
Disallow entry should fix it. 
  
  User-agent: CEE
  Disallow: /servlets/ProjectHome

To keep the whole project from being indexed:

  User-agent: CEE
  Disallow: /

- nbmoduleadmin mail archives

	User-agent: CEE
	Disallow: /servlets/ReadMsg?list=nbmoduleadmin
	Disallow: /servlets/SummarizeList?listName=nbmoduleadmin

- www-request_cert and www-request_passwd:
    
Again, the same kind of setting as above in robots.txt should help here. 

- To keep directories such as /source from being indexed:

	User-agent: CEE
	Disallow: /source/

Note: Changes to robots.txt apply only to newly indexed data; for data that is 
already indexed, a full index rebuild is required.
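
The rules quoted above in this comment can be sanity-checked offline before they are deployed. The following is only a sketch: it uses Python's standard urllib.robotparser, which may not match paths exactly the way the CEE indexer does, and it merges the rules into a single "User-agent: CEE" group (an assumption made here, because Python's parser applies only the first group that matches an agent). The "Googlebot" agent and the nbusers list are used purely as illustrative counterexamples.

```python
from urllib.robotparser import RobotFileParser

# Rules taken from this comment, merged into ONE "User-agent: CEE" group
# (an assumption for this sketch: Python's parser only applies the first
# group that matches a given agent).
ROBOTS_TXT = """\
User-agent: CEE
Disallow: /servlets/ReadMsg?list=nbmoduleadmin
Disallow: /servlets/SummarizeList?listName=nbmoduleadmin
Disallow: /source/
"""

BASE = "http://www.netbeans.org"


def build_parser(text):
    """Parse robots.txt text without fetching anything over the network."""
    parser = RobotFileParser()
    parser.parse(text.splitlines())
    return parser


parser = build_parser(ROBOTS_TXT)

# "nbusers" and "Googlebot" are illustrative only: a list that stays
# indexable, and an agent that the CEE-specific rules do not apply to.
for agent, path in [
    ("CEE", "/servlets/ReadMsg?list=nbmoduleadmin"),
    ("CEE", "/source/browse/"),
    ("CEE", "/servlets/ReadMsg?list=nbusers"),
    ("Googlebot", "/source/browse/"),
]:
    verdict = "allowed" if parser.can_fetch(agent, BASE + path) else "blocked"
    print(f"{agent:10} {path:45} {verdict}")
```

A check like this only confirms what the rules say under one parser's interpretation; whether already indexed pages disappear still depends on the index rebuild mentioned in the note above.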
Comment 19 Unknown 2007-01-16 07:58:22 UTC
Jack, this was waiting for your review; however, as per your email, I am closing this.

***************
2) SC should *not* index some lists :

   http://www.netbeans.org/issues/show_bug.cgi?id=9724

This is still open though I think should be closed/fixed - it looks like we 
can now do this with edits to robots.txt, cool.
Comment 20 jcatchpoole 2007-01-16 16:21:27 UTC
Reopening.  The fix isn't in place yet; until it is, the issue should stay open
so it doesn't get lost.
Comment 21 jcatchpoole 2007-02-14 16:08:11 UTC
Added the following to www/www/robots.txt

# Don't want SC indexer to return results from rollup cvs and issues lists
# See http://www.netbeans.org/issues/show_bug.cgi?id=9724
User-agent: CEE
Disallow: /servlets/ReadMsg?list=nbcvs
Disallow: /servlets/SummarizeList?listName=nbcvs
Disallow: /servlets/ReadMsg?list=nbbugs
Disallow: /servlets/SummarizeList?listName=nbbugs

In fact I don't find any hits from those lists anyway, so I'm not sure this is
really needed.
Comment 22 jcatchpoole 2007-02-14 16:31:56 UTC
In fact, if the archives are deleted (issue 36647) they won't show up in search
results anyway.  But using robots.txt this way might be useful for other lists
we don't want to index.
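
If other lists do need excluding later, the pair of Disallow lines used for nbcvs and nbbugs could be generated rather than typed by hand. This is a minimal sketch, assuming every list archive is reachable through the same two servlet URLs (ReadMsg and SummarizeList) seen in this issue; the helper name disallow_rules is made up for illustration.

```python
def disallow_rules(list_names, user_agent="CEE"):
    """Build one robots.txt group that hides the given mailing lists.

    Hypothetical helper: it assumes each archive is served through the
    ReadMsg and SummarizeList servlets, as in the rules added for this issue.
    """
    lines = [f"User-agent: {user_agent}"]
    for name in list_names:
        lines.append(f"Disallow: /servlets/ReadMsg?list={name}")
        lines.append(f"Disallow: /servlets/SummarizeList?listName={name}")
    return "\n".join(lines) + "\n"


# Reproduces the group added to www/www/robots.txt for nbcvs and nbbugs.
print(disallow_rules(["nbcvs", "nbbugs"]), end="")
```

Keeping all lists in a single group avoids relying on how a given crawler treats several groups that name the same user agent.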
Comment 23 Marian Mirilovic 2009-11-08 02:27:21 UTC
We recently moved off Collabnet's infrastructure.