This Bugzilla instance is a read-only archive of historic NetBeans bug reports. To report a bug in NetBeans please follow the project's instructions for reporting issues.

Bug 84497 - Search for mail archive not working at all
Summary: Search for mail archive not working at all
Status: RESOLVED INVALID
Alias: None
Product: obsolete
Classification: Unclassified
Component: collabnet
Version: 5.x
Hardware: All All
Importance: P1 blocker
Assignee: support
URL:
Keywords:
Depends on:
Blocks:
 
Reported: 2006-09-08 06:49 UTC by Masaki Katakai
Modified: 2009-11-08 02:36 UTC
CC List: 3 users

See Also:
Issue Type: DEFECT
Exception Reporter:


Attachments

Description Masaki Katakai 2006-09-08 06:49:26 UTC
It seems that searching the mail archives does not work at all. No results
are returned for any keyword.

Here is an example:

http://translatedfiles.netbeans.org/servlets/SearchList?list=dev&searchText=katakai&defaultField=author&Search=Search
Comment 1 artibee 2006-09-08 18:16:29 UTC
It is as if there are no mail archives to be searched.

Here's another random example:

http://www.netbeans.org/servlets/SearchList?list=broken_builds&searchText=build&defaultField=subject&Search=Search

Nothing is found in broken_builds with "build" in the Subject. Ha!
Here's one of the emails that has "build" in its Subject:

http://www.netbeans.org/servlets/ReadMsg?list=broken_builds&msgNo=4451
Comment 2 jcatchpoole 2006-09-14 08:55:30 UTC
P2 defect with no response from support in almost a week. Please respond ASAP.
Mailing list search is a critical piece of site functionality.
Comment 3 Unknown 2006-09-14 16:56:56 UTC
Hi,

Let me look at this ASAP; I will update you with my findings.

Thanks,
Kavitha
Support Operations
Comment 4 Unknown 2006-09-14 17:25:28 UTC
Hi,

This seems to be a site-wide issue, so I am treating it as high priority. I will
update you soon on any progress.

Thanks,
Kavitha
Support Operations
Comment 5 Unknown 2006-09-15 06:05:31 UTC
Changing the status.
Comment 6 Unknown 2006-09-15 12:33:48 UTC
Jack,

This again has to do with the robots exclusion option. I will update you with
more info once I have consolidated it.

-Priya
Comment 7 Unknown 2006-09-15 23:00:16 UTC
Hi Jack,

To fix a few earlier requests (see issue 22183), we turned on
"Enable robots exclusion", which means that on netbeans the indexer will not
search /servlets (because it is disallowed in the site's robots.txt). We enabled
it so that robots.txt would be respected.

On netbeans, the host-level attribute "Enable robots exclusion" was not enabled,
yet the site was trying to exclude some URLs from indexing. At some point we
enabled it, which fixed those issues but caused problems for the other
projects.

Hence we need the list of projects that really need to be excluded from
indexing.

Thanks,
Kavitha
Support Operations
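The effect described above can be modelled with Python's standard `urllib.robotparser`. This is a minimal sketch, not CEE's actual indexer: `indexer_may_fetch` is a hypothetical helper, the `exclusion_enabled` flag stands in for the host-level "Enable robots exclusion" attribute, the user-agent token `CEE` is taken from the robots.txt record quoted later in this issue, and the robots.txt content is the domain-level default described there (assumed to be what netbeans served at the time). It shows why mail-archive URLs under /servlets/ drop out of the index once the switch is on.

```python
import urllib.robotparser

# Domain-level default described in this issue: every robot ("*") is kept out
# of /source/, /search/, /issues/ and /servlets/. There is no separate record
# for the internal indexer, so it falls under "*" as well.
DOMAIN_DEFAULT = """\
User-agent: *
Disallow: /source/
Disallow: /search/
Disallow: /issues/
Disallow: /servlets/
"""

# Assumed user-agent token of the internal indexer.
INDEXER_AGENT = "CEE"


def indexer_may_fetch(robots_txt: str, url: str, exclusion_enabled: bool) -> bool:
    """Hypothetical model of the "Enable robots exclusion" switch.

    Switch off: the internal indexer ignores robots.txt entirely.
    Switch on: it obeys robots.txt like any external crawler.
    """
    if not exclusion_enabled:
        return True
    parser = urllib.robotparser.RobotFileParser()
    parser.parse(robots_txt.splitlines())  # parse text directly, no network fetch
    return parser.can_fetch(INDEXER_AGENT, url)


if __name__ == "__main__":
    msg = "http://www.netbeans.org/servlets/ReadMsg?list=broken_builds&msgNo=4451"
    print(indexer_may_fetch(DOMAIN_DEFAULT, msg, exclusion_enabled=False))  # True
    print(indexer_may_fetch(DOMAIN_DEFAULT, msg, exclusion_enabled=True))   # False
```

With the switch on and no indexer-specific record, every /servlets/ page (including each mail-archive message) is skipped, which matches an empty mailing-list search.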
Comment 8 Unknown 2006-09-19 10:42:06 UTC
Changing the status; waiting for a response.
Comment 9 Jan Pirek 2006-09-20 12:42:40 UTC
As far as I know, the testwww project should definitely be excluded. There are
probably others, but this is the only one I am 100% sure about. Jack, which others?

jan
Comment 10 jcatchpoole 2006-09-25 10:06:29 UTC
I guess the question is: "on which nb.org projects can we allow servlets/ to be
indexed by robots". Is that correct? I.e. we have mailing lists on www, so we
need to update our robots.txt file to say "robots can index
www.netbeans.org/servlets/". Yes?

Indexing of servlets/ was disallowed to stop robots from bringing nb.org down. If we
re-allow indexing on some project sites, aren't we asking for trouble? Are
there no other options? E.g., have the SourceCast mailing list indexer ignore the
robots.txt file?

Pls advise.
Comment 11 Unknown 2006-09-25 13:56:48 UTC
Jack,

I will discuss about this again with the engineer and will update you ASAP.

Thanks,
Priya
Comment 12 Unknown 2006-09-26 12:20:57 UTC
I think the confusion here is between the internal CEE indexer (internal robot)
and external spiders/crawlers (external robots).
The internal robot is the CEE indexer that powers the site-wide search.
Prior to the Danube release our internal robot did not respect robots.txt; in
Danube we introduced the option "Enable robots exclusion" to force the
CEE indexer to respect robots.txt.

I guess the question is: "on which nb.org projects can we allow servlets/ to
be indexed by robots". Is that correct?

>>>> No. The question is: "On nb.org, which projects should the
**internal** robot/indexer be disallowed from indexing?"

I.e. we have mailing lists on www, so we need to update our robots.txt file to
say "robots can index www.netbeans.org/servlets/". Yes?
>>>> No. Again, this is not about external robots. It's about the internal CEE
indexer.

Indexing of servlets/ was disallowed to stop robots bringing nb.org down.  
>>>> Yes, that is still true. We are not going to change anything here.

If we re-allow indexing on some project sites, aren't we asking for trouble?
>>>> I think you are referring to external robots here. For external robots, the
functionality remains the same. The question is about allowing the internal
robot, i.e. the CEE internal indexer, to index the project artifacts.

Here is a snippet that may help:

<snip>
A robots.txt file is used to tell external spiders and indexers (such as
Google's Googlebot, or anyone else's) which web pages not to add to their
index. Since there are some portions of a typical CEE installation we don't
want indexed by external indexers, there will be a default robots.txt for the
domain and for each project (achieved through rewrite rules), which customers
may override if they wish. Customers can also use such an override to tell
CEE's own indexer not to index certain portions of a project or a domain.

The default robots.txt at the domain level currently excludes all the URL 
patterns matching '/source/', '/search/', '/issues/' and '/servlets/'. But for 
the CEE internal indexer, we won't want to exclude these by default. This can 
be achieved by having a separate record for the CEE internal indexer. The 
following robots.txt file should be used as the default.

User-agent: CEE
Disallow:

User-agent: *
Disallow: /source/
Disallow: /search/
Disallow: /issues/
Disallow: /servlets/

The above robots.txt file can be read as "No robot should index any page in 
the site matching the URL patterns '/source/', '/search/', '/issues/' 
and '/servlets/' except the robot CEE for which there are no restrictions".
</snip>
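To check how a standard robots.txt parser reads the proposed default, here is a sketch using Python's `urllib.robotparser`. `load` is a hypothetical helper, and the assumption that CEE's own indexer matches user-agent records the same way this parser does is not confirmed by the snippet. The empty `Disallow:` line under `User-agent: CEE` is what grants the internal indexer unrestricted access.

```python
import urllib.robotparser

# Default robots.txt proposed in the snippet above. An empty "Disallow:" for CEE
# means "no restrictions"; every other robot keeps the four exclusions.
PROPOSED_DEFAULT = """\
User-agent: CEE
Disallow:

User-agent: *
Disallow: /source/
Disallow: /search/
Disallow: /issues/
Disallow: /servlets/
"""


def load(robots_txt: str) -> urllib.robotparser.RobotFileParser:
    """Parse robots.txt text directly, without fetching anything over the network."""
    parser = urllib.robotparser.RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser


if __name__ == "__main__":
    rules = load(PROPOSED_DEFAULT)
    archive = "http://www.netbeans.org/servlets/ReadMsg?list=broken_builds&msgNo=4451"
    for agent in ("CEE", "Googlebot"):
        print(agent, rules.can_fetch(agent, archive))
    # CEE True
    # Googlebot False
```

Because the parser checks agent-specific records before falling back to `*`, the CEE record takes precedence for the internal indexer, while external crawlers such as Googlebot still fall through to the `/servlets/` exclusion.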
Comment 13 jcatchpoole 2006-10-02 12:19:20 UTC
*** Issue 86075 has been marked as a duplicate of this issue. ***
Comment 14 jcatchpoole 2006-10-02 14:56:19 UTC
>>>> No. The question is: "On nb.org, which projects should the
**internal** robot/indexer be disallowed from indexing?"

For mailing lists (and probably other servlets/ content), the internal indexer
should index every project. Please enable ASAP, thanks.

HTML content is different, e.g. we don't want testwww/ indexed by anything. But
so far we have no issues with the way HTML content is indexed, so nothing should
change here.
Comment 15 Unknown 2006-10-03 09:30:10 UTC
I have updated the engineers and will make sure all of this is set in the
config.
Comment 16 padmar 2006-10-05 06:56:53 UTC
*** Issue 86512 has been marked as a duplicate of this issue. ***
Comment 17 Unknown 2006-10-06 05:51:46 UTC
OK. At last, robots.txt has been edited as required, and we ran the full
indexer for all the plugins. I verified all the complaints above about
mailing-list search and domain search, and it should now work as expected.
Please verify and let us know your feedback.

Thanks for your patience on this issue.

-Priya
Comment 18 Unknown 2006-10-17 10:36:32 UTC
I have verified all the concerns users raised above about 'search', and it now
works as expected, so I am closing this. Please feel free to reopen if you find
anything that does not work as expected.

-Priya
Comment 19 padmar 2006-12-06 11:08:48 UTC
Verified the search works now. 
Comment 20 Marian Mirilovic 2009-11-08 02:36:26 UTC
We recently moved off Collabnet's infrastructure.