Before starting, let me say thanks for this helpful post highlighting some important performance issues.


Originally Posted by onion_cfe
Very interesting project. I gave up on the original gPodder port a long time ago due to the speed issues. I'm having similar issues with this newer version, but they've been brought up plenty of times earlier in the thread, so the reasons are clearly known.
Yep. The performance issues aren't inherent to gPodder; they are something that could be fixed by someone sitting down for a week or two and streamlining the code.

Originally Posted by onion_cfe
I'm currently using Videocenter for podcatching. It has a major bug (it fails to rename some files after download) and now seems to be a dead project with no chance of being fixed. It did show promise for a while, but it would be fantastic to be done with it and instead use a project that has clearly had some care and attention spent on it: gPodder.
Thanks for the kind words.

Originally Posted by onion_cfe
I'd really like to use gPodder. It's much more geared toward audio content, but Videocenter undeniably handles the RSS a lot faster.
gPodder runs on Python, an interpreted language. It also uses feedparser (feedparser.org) to parse its feeds, which (IMHO) makes it more compatible with strange RSS/Atom feeds than other solutions. Sadly, that robustness also comes with a performance cost.
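Just to illustrate where the time goes (this is a minimal sketch, not gPodder's actual code, and the feed URL is made up), a feedparser-based check boils down to something like this, and every entry in the document gets parsed no matter how old it is:

[code]
import feedparser

# Parse the whole feed document; feedparser downloads and walks all of it.
d = feedparser.parse('http://example.com/podcast.xml')  # made-up URL
print(d.feed.get('title', '(no title)'))
for entry in d.entries:
    # Some podcasts omit titles or publishing dates entirely.
    print(entry.get('title', '(untitled)'),
          entry.get('published', '(no date)'))
[/code]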

One possibility would be to have a web service carry out the hard work and have gPodder only "read" the data from that web service, with no heavy lifting done on the device. This would make checking for updates really (I mean, really, really!) fast, but it would of course make gPodder dependent on such a web service.
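Purely hypothetical sketch of the idea (the service URL and the JSON layout are invented here just to show the shape of it): the service parses the feed on the server, and the client only reads the pre-digested result:

[code]
import json
import urllib.parse
import urllib.request

def fetch_parsed_feed(feed_url):
    # Hypothetical web service that parses feed_url server-side
    # and returns a small JSON summary.
    service = ('http://example.com/parse?url='
               + urllib.parse.quote(feed_url, safe=''))
    with urllib.request.urlopen(service) as response:
        return json.load(response)

# The client then only walks an already-normalized structure:
# data = fetch_parsed_feed('http://example.com/podcast.xml')
# for episode in data['episodes']:
#     print(episode['title'], episode['url'])
[/code]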

Originally Posted by onion_cfe
I throw a lot at these applications, so I seem to have a harder time than many others writing here, who manage to use gPodder, something I haven't yet managed. I've imported 21 feeds. Most of them have sensible publishers that keep a fixed number of episodes in the feed, but a few insist on retaining all episodes. One such feed is around the 850-show mark, another around 250. I guess these might be harder to deal with, but the new episodes seem to be at the top of the feeds. Are the whole feeds being parsed regardless of size? Do they need to be? I'm no expert on RSS. Does the mechanism use SAX- or DOM-style parsing? Is there a guarantee that new episodes will be at the top of feeds, or could they be in any order?
Yep, changed feeds get parsed completely from start to end, and you're right: the reason for this is that episodes can theoretically appear in any order (heck, some podcasts don't even provide titles or publishing dates, which makes it really hard to present those episodes nicely in a GUI).
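That ordering problem is also why presenting episodes needs a fallback. Roughly like this (a sketch with a made-up URL, using the epoch as an arbitrary fallback date for undated entries):

[code]
import time
import feedparser

d = feedparser.parse('http://example.com/podcast.xml')  # made-up URL

def sort_key(entry):
    # 'published_parsed' is a time.struct_time, or missing entirely.
    return entry.get('published_parsed') or time.gmtime(0)

# Newest first, with undated episodes sinking to the bottom.
for entry in sorted(d.entries, key=sort_key, reverse=True):
    print(entry.get('title', '(untitled)'))
[/code]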

To my knowledge (and after a short peek at the source), feedparser uses SAX to parse feeds.
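SAX keeps memory usage low, but it still streams events for the whole document from start to end. A rough illustration with plain xml.sax (not feedparser's internals; assumes a local podcast.xml):

[code]
import xml.sax

class ItemCounter(xml.sax.ContentHandler):
    def __init__(self):
        super().__init__()
        self.items = 0

    def startElement(self, name, attrs):
        if name == 'item':  # one RSS <item> per episode
            self.items += 1

handler = ItemCounter()
xml.sax.parse('podcast.xml', handler)  # still reads the entire file
print(handler.items, 'items seen')
[/code]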

Originally Posted by onion_cfe
Looking at the verbose output, it appears that if a feed is unchanged, no time is spent reviewing it, but otherwise the same processing that was required when first importing is required again. Is this the case? For this list of feeds that takes in excess of half an hour, perhaps longer, and I'd say maybe half of the feeds I subscribe to update daily. Videocenter takes about 5 minutes to do the same. I'm not sure how well it handles bad RSS, but it seems to manage these 21 well enough.
Oh, by the way: if you are using gPodder on your Linux desktop, you can also download podcasts there and sync them to your Tablet. Although this is probably not what you want.

Originally Posted by onion_cfe
gPodder seems to reread some feeds that, episode-wise, haven't changed. I can only assume that they must have changed in some other way. Perhaps if somebody pushes an unchanged XML file to their host every so often, the change to a timestamp is enough to convince gPodder that the feed has changed. If that happens to be the 850-show feed, gPodder then checks each item. Even at 2-3 items a second, this obviously still takes some time. And even if there is a new episode in the feed, it still seems to need to read all 850 items before it can move on.
gPodder relies on the "ETag" and "Last-Modified" HTTP headers (the same caching mechanism modern web browsers use) to detect changed files. This should work most of the time, especially for static files, but of course if the feed author doesn't take care of this (e.g. the feed is generated dynamically by a script), it won't work and we have no chance of detecting that nothing has changed.
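feedparser actually exposes this directly through its etag and modified parameters. A sketch of the handshake (the URL is made up, and the stored values would normally live in a local database between runs):

[code]
import feedparser

url = 'http://example.com/podcast.xml'  # made-up URL
d = feedparser.parse(url)
etag, modified = d.get('etag'), d.get('modified')

# On the next check, send the stored values back; a well-behaved
# server answers "304 Not Modified" and there is nothing to parse.
d2 = feedparser.parse(url, etag=etag, modified=modified)
if d2.get('status') == 304:
    print('Feed unchanged, nothing to parse')
[/code]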

Originally Posted by onion_cfe
It was mentioned further back in the thread that there was work to be done on the RSS side, time permitting. Has progress been made on this? I ask from the standpoint of somebody who can't offer any help (beyond testing and suggesting, I suppose) but would really like to use gPodder and get rid of Videocenter.
There have been some generic speed improvements to the database code, but no direct work on optimizing the feedparser path.

Originally Posted by onion_cfe
If anything I've said here is inaccurate, then please let me know. I've only brought up so many specifics because they don't seem to have been discussed yet, and they seem to be the main barrier to me using gPodder successfully at the moment.
Have you tried the latest Maemo version (0.13.0) of gPodder yet? If so, is it just the parsing speed of feeds that you don't like? How would you like gPodder if parsing were blazingly fast (with the added dependency on a web service)? Thanks for your feedback!