Bug 8150 - www-nbdev/ getting very big
Summary: www-nbdev/ getting very big
Status: RESOLVED INVALID
Alias: None
Product: obsolete
Classification: Unclassified
Component: collabnet
Version: 3.x
Hardware: Other Other
Importance: P2 critical
Assignee: issues@www
URL:
Keywords:
Depends on:
Blocks: 8094 8592
Reported: 2000-11-02 16:49 UTC by jcatchpoole
Modified: 2009-11-08 02:26 UTC (History)
0 users

See Also:
Issue Type: DEFECT
Exception Reporter:


Description jcatchpoole 2000-11-02 16:49:04 UTC
The mailing list archives are currently indexed on a single
page.  This means that http://www.netbeans.org/www-nbdev/ for
example currently spans June 5th -> Nov 2nd, and the index
page comes to around 360K.  nbdev is probably the worst atm,
but the others are of course growing also.

It would probably be good if this index was split up by week
or month, and some kind of navigation mechanism implemented to
move around between each month's index.
Comment 1 Unknown 2000-11-02 19:09:59 UTC
I agree.  This will be an enhancement, but I think a good one to put on the
roadmap for the product.  I can't say it will be resolved soon, but I will put
in a feature request for sourcecast due out 1/5/01.
http://projects.collab.net/issues/show_bug.cgi?id=785
Comment 2 jcatchpoole 2000-11-22 16:50:59 UTC
Extra comments from Evan -

Assuming the archive was broken into some kind of chronological
chunks, the older the chunk, the longer period it could cover.
i.e. nbdev archives today should probably be split by day, definitely by
week.  However, for messages 6+ months old a single page for the whole
month is probably better.
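
A minimal sketch of the age-based split suggested above, in Python.  The
function name chunk_key, the 182-day cutoff and the label formats are
assumptions made for illustration; nothing in this thread says the archive
software supports this.

```python
from datetime import date


def chunk_key(msg_date: date, today: date, cutoff_days: int = 182) -> str:
    """Return a label for the index page a message would belong to.

    Hypothetical helper: messages older than cutoff_days (roughly six
    months) share one page per month, newer ones one page per ISO week.
    """
    age_days = (today - msg_date).days
    if age_days > cutoff_days:
        # Old message: coarse, month-sized chunk.
        return msg_date.strftime("%Y-%m")
    # Recent message: finer, week-sized chunk.
    iso_year, iso_week, _ = msg_date.isocalendar()
    return f"{iso_year}-W{iso_week:02d}"
```

Note that a message moves from a weekly page to a monthly page as it ages
past the cutoff, so the older pages would have to be regenerated from time
to time.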
Comment 3 Unknown 2000-11-23 00:36:59 UTC
duplicate of 8531.

*** This bug has been marked as a duplicate of 8531 ***
Comment 4 Unknown 2000-11-28 02:26:59 UTC
Reopened, closing 8531.  I had closed this one and left the other one open
because 8531 was more general.  This is not just an issue with nbdev, but as was
pointed out -- this bug contains more information.
Comment 5 Unknown 2000-11-28 02:27:59 UTC
*** Bug 8531 has been marked as a duplicate of this bug. ***
Comment 6 Unknown 2000-12-02 02:23:59 UTC
This is the issue I'm really pushing on right now.  I'm marking it P2 so
visually it gets bumped up in the hierarchy.
Comment 7 jcatchpoole 2000-12-05 12:35:59 UTC
Bill Shannon has pointed out -

The IMAP mailing list also uses Mhonarc, v2.4.6 according to the
page footer : http://www.washington.edu/imap/listarch/current/

Their archives are broken down by year, and then approximately by
month, though not exactly: there are 18 pages for 1999, and 20 so far
for 2000.  Anyway, it is broken down into quite small chunks.

As I understand it we are also using Mhonarc.  If breaking up the
index page of the mailing list archives is simply a configuration
option, then let's pursue it and do this now!  I had been thinking this
was a limitation of Mhonarc, and changing it would involve hacking
source and considerable work.  Tweaking config however should be
much easier.

A side note - it would be cool if this Mhonarc config was also in
our control, maybe a .cfg type file under the look/ project.
Comment 8 jcatchpoole 2000-12-05 13:37:59 UTC
PS - the IMAP site also seems to use a custom mail search utility,
i.e. specifically for searching mail archives, and not www.  I say this
as the results returned after a search are pretty customised, showing
subject and sender as well as a one line extract of the msg.

Unless there exists a search tool that is clever enough to search
and index mail archives and www pages differently, I guess that would
require 2 search tools on the site.  This doesn't sound great, but
the current format of search results - "msg02010.html" and a URL,
no further info - leaves a lot to be desired.
Comment 9 Jesse Glick 2001-01-29 15:08:59 UTC
This is getting to be more and more of a problem BTW. Loading this page takes a
minute at least, plus another minute of Netscape freezing while it computes the
table layout...
Comment 10 Unknown 2001-01-30 10:20:59 UTC
I'm going to divide up the archives.  It's not just a configuration option, but
it can be scripted.  Do you have any preference as to how you'd like the URLs to
look?  Here are some possibilities:

http://www.openoffice.org/www-dev/YYYY/MMM/
http://www.openoffice.org/www-dev/YYYY-MM/
http://www.openoffice.org/www-dev/{count}/

count would simply increase by one for every N messages (or approx N messages).
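
The three URL schemes can be sketched in Python as follows.  This is only
an illustration: the function names are made up, MMM is assumed to mean a
three-letter English month abbreviation, and {count} is assumed to start
at 1.

```python
from datetime import date

# Base URL copied from the examples above.
BASE = "http://www.openoffice.org/www-dev/"

# Spelled out explicitly so the result does not depend on the locale.
MONTHS = ["Jan", "Feb", "Mar", "Apr", "May", "Jun",
          "Jul", "Aug", "Sep", "Oct", "Nov", "Dec"]


def url_year_month_name(d: date) -> str:
    """YYYY/MMM/ scheme (assumed three-letter month name)."""
    return f"{BASE}{d.year:04d}/{MONTHS[d.month - 1]}/"


def url_year_month(d: date) -> str:
    """YYYY-MM/ scheme."""
    return f"{BASE}{d.year:04d}-{d.month:02d}/"


def url_count(msg_index: int, n: int) -> str:
    """{count}/ scheme: msg_index is the 0-based position of the message
    in the archive, and the count goes up by one every n messages."""
    return f"{BASE}{msg_index // n + 1}/"
```

The date-based schemes give URLs that can be guessed from the date of a
message; the count-based one keeps every group roughly the same size.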

I think there would be no link from one of these groupings to another,
unfortunately ... but at least the archives would be more usable.  (I have tried
to view them, over a fast connection; it's possible for me, but painful.)

http://www.openoffice.org/www-dev/current/ will always work for the current
grouping (whatever that may be).

I'm expecting that http://www.openoffice.org/www-dev/ would be an index which
would have a URL for each subgrouping.

Doing this transformation will break the archive for a limited period of time --
I'm not sure how long, though I can determine the precise amount.  It will also
involve delaying message delivery until it's complete (so as to avoid having new
messages included in the wrong grouping).

Besides nbdev -- are there any other lists for which you'd like this done?

So far as the issues w/ searching -- that's something we're generally addressing
in sourcecast, though there are outstanding issues in that right now.  In the
short term, I don't think it's going to be possible to have a better search
interface until we upgrade netbeans.org to sourcecast.

Without significant additional work, it won't be possible to use different
groupings for older messages, compared with new ones.  (Jack suggested weeks for
recent messages, and months for messages > 6months old in one of his comments.)
I'm not planning on doing such a thing right now.
Comment 11 jcatchpoole 2001-01-30 16:07:59 UTC
I am surprised this is so difficult.  Doesn't Mhonarc have all this
built in ?  There is even a free mail archival site using Mhonarc
which handles this better than we do - http://www.mail-archive.com/
Our archives are embarrassing and useless atm.

> I think there would be no link from one of these groupings to another,

The archives need to be navigable.  There should be links between
any separate pages.

> Besides nbdev -- are there any other lists for which you'd like this
> done?

This should be set up for all mailing lists that have web archives.  No
list archive is going to shrink - if it doesn't need it now it will next
week/month/xxx

> until we upgrade netbeans.org to sourcecast.

When can we expect this ?

RE Search - http://www.mhonarc.org/ includes a link to a "search
engine for Mhonarc archives" called marc-search.  Sounds ideal - we
could run it as a separate CGI, and put up a form with a separate text
box (separate from the normal search box) for mail archive searches.
Comment 12 Unknown 2001-01-30 22:15:59 UTC
This indeed has been hanging on too long.  Working on getting resolution.
Comment 13 Unknown 2001-01-31 03:50:59 UTC
My experience playing with MHonArc's conf is that it does indeed suck, and this
is not nearly as easy as it should be.  I'm looking into why the index by date
which I'm generating is missing links (see the first link, below)... you can see
the indexes, by dates or by threads:

http://www.netbeans.org/test-www-nbdev/index.html
	http://www.netbeans.org/test-www-nbdev/mail2.html
	...
http://www.netbeans.org/test-www-nbdev/threads.html

Personally, I'd rather use a simpler solution.  I'm willing to spend a little
more time on this, but not much -- I've got other work for netbeans, and
spending time on this takes away from the time I've got to spend on that stuff.

So far as adding another search engine -- one of the basic problems which we
have w/ searching is the size of some of the MHonArc archives.  I'm not
interested in spending time verifying that yet another piece of software is at
least somewhat secure and that it performs at least adequately.  Besides --
we're not going to be using that on other sourcecast installations, so
installing this would make netbeans more custom and less maintainable.
Comment 14 Unknown 2001-01-31 04:43:59 UTC
Ok, I've worked out some more of MHonArc's strangeness.  Please take a look at
this and tell me what you think:

http://www.netbeans.org/test-www-nbdev/

(There's a link to the threaded index embedded.)  I'm not entirely happy with
this implementation -- MHonArc is quite inefficient, and there are problems with
it.  But it should give you an idea of how this would work out.

Minor changes in formatting may well be possible -- certainly they are if you
want to edit the MHonArc conf file and send it back.  MHonArc's conf language
provides a fair bit of flexibility.  OTOH, I see no evidence that it's possible
to do groups of different size based on how old the grouping is.  The grouping
which you see in this is for 150 messages per page; that number is configurable.

Changing this for all the lists can be done, but involves add'l load and delayed
mail during the time when the change is being done.
Comment 15 jcatchpoole 2001-01-31 13:55:59 UTC
Test page looks good, thanks!  Only suggestion is to include the
navigation links at the bottom of the page as well as the top, if
possible.

RE searching - "we're not going to be using that on other sourcecast
installations" - what does this mean ?  Is there similar functionality
already in sourcecast ?  If not, pls let us know, as I would file this
as an enhancement request.
Comment 16 Unknown 2001-02-07 01:42:59 UTC
*** Bug 9457 has been marked as a duplicate of this bug. ***
Comment 17 Unknown 2001-02-07 09:35:59 UTC
I believe this is resolved now: I've changed our configuration so that MHonArc
index pages have at most 150 elements in them.  Lists will be put into that
format as new mail arrives.  I added navigation links at the bottom and made a
few minor changes in the HTML ...
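
As a rough illustration of what such paging amounts to (this is not
MHonArc's own code), here is a Python sketch.  The function names are
hypothetical; the file names index.html and mail2.html follow the test
archive links earlier in this issue, and the naming of further pages is
assumed.

```python
from typing import List, Optional, Sequence, Tuple


def paginate(items: Sequence[str], per_page: int = 150) -> List[Sequence[str]]:
    """Split the index entries into pages of at most per_page elements."""
    return [items[i:i + per_page] for i in range(0, len(items), per_page)]


def page_name(page: int) -> str:
    """File name of a 1-based page; the first page is the index itself."""
    return "index.html" if page == 1 else f"mail{page}.html"


def nav_links(page: int, total: int) -> Tuple[Optional[str], Optional[str]]:
    """Return the (previous, next) file names for a 1-based page.

    Either entry is None when the page is at that end of the archive.
    The same pair can be written at the top and bottom of each page.
    """
    prev_page = page_name(page - 1) if page > 1 else None
    next_page = page_name(page + 1) if page < total else None
    return prev_page, next_page
```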

If you notice any problems, please reopen this issue.

I am concerned about MHonArc's performance -- it's O(n*n) w/ the number of
messages in each archive, which is only going to get worse.  It might be worth
considering splitting nbdev and nbusers into separate archives for 2000 and 2001
... this seems to be the standard practice w/ MHonArc, and I have noticed
slowdowns which seem related to MHonArc.  I haven't made any changes related to
this, though, since it'd also require changing links to the archives.
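
To show why splitting by year would help if the cost really does grow with
the square of the archive size, here is a small Python sketch.  The cost
model is only the n*n estimate above, the helper names are hypothetical,
and the message counts in the usage note are made up.

```python
from collections import Counter
from datetime import date
from typing import Dict, Iterable


def messages_per_year(dates: Iterable[date]) -> Dict[int, int]:
    """Count how many messages would land in each per-year archive."""
    return dict(Counter(d.year for d in dates))


def relative_cost(archive_sizes: Iterable[int]) -> int:
    """Sum of n*n over the archives: a unitless stand-in for the total
    work under the assumed quadratic model."""
    return sum(n * n for n in archive_sizes)
```

For example, relative_cost([20000]) is 400000000 for one archive of 20000
messages, while relative_cost([10000, 10000]) is 200000000 for the same
messages split evenly in two, i.e. half the work under this model.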

So far as search goes -- in sourcecast, we will have search for mailing lists,
which presents more meaningful results pages.  But currently it searches mailing
lists separately from other documents -- and it only searches one list at a
time.  I intend to make sure it's possible to search more than one list at a
time ... but it'd be worthwhile to file an enhancement request outlining the
features which you consider useful, important, or necessary.
Comment 18 Jesse Glick 2001-02-07 11:27:59 UTC
Note: the thread index seems to have different navigation from the date index,
specifically no indication of which # page you are on, nor the <<<< and >>>>
links.
Comment 19 jcatchpoole 2001-02-07 14:48:59 UTC
Looks good, many thanks.

Searching mailing lists separately from other docs sounds perfect
to me, this is what I have requested in the past.  Only searching
one list at a time is not so great - ideally the search page would
have a series of checkboxes for which lists and/or other doc sets
(like the website, or the FAQs) to search.  Please see
http://www.netbeans.org/bugs-cgi/show_bug.cgi?id=9163 for my current
list of search requests/bugs - this covers both site searches and
mail archive searches.
Comment 20 Marian Mirilovic 2009-11-08 02:26:56 UTC
We recently moved out from Collabnet's infrastructure