Posts: 58 | Thanked: 42 times | Joined on Jan 2010
#1
Hi!

Version 3 of Evopedia, the offline Wikipedia reader for (not only) Maemo, has now been available in Extras-testing for quite a while. Unfortunately, there is no recent dump of the English edition of Wikipedia. Evopedia uses a compressed database of pre-rendered article pages (called a dump) to deliver articles even faster than reading them online. The drawback of this approach is the amount of time needed to pre-render every single article when creating such a dump, which is why there is no current English dump yet.

To remedy this situation, dump at home was created: a system for distributed rendering of Wikipedia pages. If you have a Linux computer with some spare CPU cycles and want a recent English Wikipedia dump (or any other edition, but at the moment English has priority), please consider joining the project. More information is available on the dump at home project site: http://dumpathome.evopedia.info/contribute
Note that the platform is still somewhat in a beta state, so please forgive me if there are still some bugs (and please also report them).
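
Roughly speaking, a contributing node just loops over "fetch a job, render it, upload the result". The sketch below is purely illustrative: the coordinator URL, the endpoints and the render script are made-up placeholders, and the real client from the project site handles all of this for you.

Code:
#!/bin/sh
# Purely illustrative sketch of a dump at home worker loop.
# COORDINATOR, get_job/submit_job and render_slice.php are placeholders,
# not the real client's interface.
COORDINATOR="http://dumpathome.example.org"

while true; do
    # Ask the coordinator for the next slice of articles to render.
    curl -s "$COORDINATOR/get_job" -o job.txt || break

    # Render the slice with the local MediaWiki installation.
    php render_slice.php job.txt rendered/

    # Compress the rendered pages and send them back, then start over.
    tar czf slice.tar.gz -C rendered . && \
        curl -s -F "slice=@slice.tar.gz" "$COORDINATOR/submit_job"
done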

Thanks for your time and interest in the project.

Last edited by crei; 2010-03-10 at 19:36.
 

The Following 3 Users Say Thank You to crei For This Useful Post:
Posts: 355 | Thanked: 566 times | Joined on Nov 2009 @ Redstone Canyon, Colorado
#2
Cool! Here's a quick-and-dirty script that works on Ubuntu Karmic (and likely any Debian-based system):

http://gitorious.org/freemoe/freemoe...-evopedia-node

I just started it on an Amazon EC2 node. It appears to be working. When it starts generating results I'll launch more nodes.
 

The Following User Says Thank You to jebba For This Useful Post:
Posts: 278 | Thanked: 303 times | Joined on Feb 2010 @ Norwich, UK
#3
Am I right in thinking this isn't terribly stable at the moment? I left this running on two VMs overnight and each processed a few chunks fine, then both downloaded new copies of the 11.3 GB dump, and since then they're throwing out various MediaWiki/database errors every time they're given a new job to process.
 

The Following User Says Thank You to nidO For This Useful Post:
Posts: 58 | Thanked: 42 times | Joined on Jan 2010
#4
Originally Posted by nidO
Am I right in thinking this isn't terribly stable at the moment? I left this running on two VMs overnight and each processed a few chunks fine, then both downloaded new copies of the 11.3 GB dump, and since then they're throwing out various MediaWiki/database errors every time they're given a new job to process.
Thank you for setting up the VMs. I can see that your clients are uploading zero-length archives, and I also received your error logs. If only the user database is damaged, the next automatic update should fix the issue. If important databases (i.e. databases with actual content) are damaged, you should remove the files "wikilang", "wikidate" and "commonsdate" in the state subdirectory to force the client to fetch fresh database files.
If you want to check whether everything is OK, you can point your Apache at the MediaWiki directory and open it in a browser.
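
For example, assuming the client's working directory is ~/dumpathome (just a guess, adjust the path to wherever you actually unpacked the client):

Code:
# Force the client to fetch fresh database files on its next run.
# ~/dumpathome is an assumed location - use your own working directory.
cd ~/dumpathome/state
rm -f wikilang wikidate commonsdate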
 
Posts: 278 | Thanked: 303 times | Joined on Feb 2010 @ Norwich, UK
#5
Thanks for the info - it looks like my copies are broken. Both clients did update a short while ago, but after doing so I'm still getting one of two faults; exactly which one appears varies each time I kick off the client:
- "Incorrect information" errors for ./wikidb/user.frm (the user table seems to be empty, even after the client update)
- Spurious "MediaWiki internal error. Exception caught inside exception handler" faults after the client is assigned a job and the static HTML dump folder is created.

So I'm now re-downloading both the commons and en data from scratch, and will see how it gets on.
 
Posts: 355 | Thanked: 566 times | Joined on Nov 2009 @ Redstone Canyon, Colorado
#6
Originally Posted by nidO
Thanks for the info - it looks like my copies are broken. Both clients did update a short while ago, but after doing so I'm still getting one of two faults; exactly which one appears varies each time I kick off the client:
- "Incorrect information" errors for ./wikidb/user.frm (the user table seems to be empty, even after the client update)
- Spurious "MediaWiki internal error. Exception caught inside exception handler" faults after the client is assigned a job and the static HTML dump folder is created.

So I'm now re-downloading both the commons and en data from scratch, and will see how it gets on.
Are you on a 64-bit system? You may need to install libc6-i686 (Debian) or glibc.i686 (Fedora).
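
If so, something along these lines should take care of it (same package names as above; the exact command depends on the distribution):

Code:
# Debian/Ubuntu
sudo apt-get install libc6-i686

# Fedora
sudo yum install glibc.i686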
 

The Following User Says Thank You to jebba For This Useful Post:
Posts: 278 | Thanked: 303 times | Joined on Feb 2010 @ Norwich, UK
#7
Originally Posted by jebba
Are you on a 64-bit system? You may need to install libc6-i686 (Debian) or glibc.i686 (Fedora).
The VMs are 32-bit. Dump generation is technically working: over the space of two days I managed to get roughly 150 MB of work done on each of the two VMs. But for some reason, every few hours each VM would submit a completed slice back to the system and then get a new job telling it to reload one or both of the commons and en wiki dumps, despite the local commonsdate and wikidate files already recording the same dump version - i.e. the dumpathome client was deciding to re-download and overwrite the existing local dumps with exactly the same freshly downloaded dump every few hours.
After doing so, the client then seems to have about a 50% chance of either carrying on processing fine or starting to throw out the errors listed above. Presumably something's amiss with having to re-download the wiki dumps so frequently in the first place - I can't see any real reason they would need re-downloading at all, and fetching the roughly 15 GB of dumps twice a day is taking up more time than the VMs were actually able to spend processing (while both were working they averaged a slice every 10 minutes or so between them).
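
For what it's worth, the check I'd expect the client to make before re-downloading is roughly the following - purely illustrative, the file name is the one from the state directory mentioned earlier and ASSIGNED_WIKIDATE stands in for whatever version the assigned job actually specifies:

Code:
# Skip the download when the local dump already matches the assigned version.
LOCAL_WIKIDATE=$(cat state/wikidate 2>/dev/null)
if [ "$LOCAL_WIKIDATE" = "$ASSIGNED_WIKIDATE" ]; then
    echo "Dump $LOCAL_WIKIDATE already present, skipping re-download."
else
    echo "Fetching dump $ASSIGNED_WIKIDATE ..."
    # (download and unpack the new dump here)
fi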
 
Posts: 58 | Thanked: 42 times | Joined on Jan 2010
#8
We created the first dump! Thanks to all who helped! The dump can be downloaded from http://wiki.maemo.org/Evopedia (I hope the archive does not break again...).

I'm currently uploading the commons image database, and I think we will start on the Dutch Wikipedia at the weekend.
 
Posts: 8 | Thanked: 0 times | Joined on Aug 2010
#9
I installed Evopedia with the Italian dumps, but when I open the application it says it cannot find any dump. What should I do? I can see the files in the file manager, with .idx and .dat extensions. Please help, thanks.
 