maemo.org - Talk - Is there anyway to download a whole thread

Is there anyway to download a whole thread

I was wondering if there's any way to download a whole thread so I can read it offline. I know I can do a page at a time.

Thanks

Re: Is there anyway to download a whole thread

I would also like this.

Re: Is there anyway to download a whole thread

Quote:

Originally Posted by ziggadebo (Post 1106535)

I was wondering if there's any way to download a whole thread so I can read it offline. I know I can do a page at a time.

Thanks

Do you want to download each page of the thread as an HTML file?

Code:

#!/bin/bash
# Arguments: $1 = output file name, $2 = thread link, $3 = start page, $4 = end page
for ((i = $3; i <= $4; i++))
do
    echo "grabbing page $i"
    wget "$2&page=$i" -O "$1-page$i.html"
done

A Python alternative:

Code:

#!/usr/bin/python
# Python 2 script; the io module needs Python 2.6 or later.

import io
import urllib
import sys


def get_thread(fileName, link, start_page, end_page):
    for num in range(int(start_page), int(end_page) + 1):
        print "grabbing page " + str(num)
        page = urllib.urlopen("%s&page=%d" % (link, num)).read()
        # Write the raw bytes so the page keeps whatever encoding the server sent.
        # The with block closes the file, so no explicit close() is needed.
        with io.open("%s-page%d.html" % (fileName, num), 'wb') as out:
            out.write(page)


if __name__ == '__main__':
    sys.exit(get_thread(sys.argv[1], sys.argv[2], sys.argv[3], sys.argv[4]))

In both cases you would supply the following arguments:

1. The filename you wish to save to (the page number is appended).

2. The link to the thread.

3. Start page (usually 1).

4. End page (usually the last page of the thread).
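The Python script above is written for Python 2. On Python 3, `urllib.urlopen` and the `print` statement no longer exist, so a rough equivalent might look like the sketch below. It takes the same four arguments. The helper names `page_url` and `page_filename` are made up here for illustration, and the script has not been tried against the forum itself.

```python
#!/usr/bin/env python3
import sys
import urllib.request


def page_url(link, num):
    """Build the URL of one page of the thread (hypothetical helper)."""
    return "%s&page=%d" % (link, num)


def page_filename(file_name, num):
    """Name of the local file for one page (hypothetical helper)."""
    return "%s-page%d.html" % (file_name, num)


def get_thread(file_name, link, start_page, end_page):
    for num in range(int(start_page), int(end_page) + 1):
        print("grabbing page %d" % num)
        with urllib.request.urlopen(page_url(link, num)) as response:
            page = response.read()
        # Write raw bytes so the page's own encoding is left untouched.
        with open(page_filename(file_name, num), "wb") as out:
            out.write(page)


if __name__ == "__main__":
    if len(sys.argv) == 5:
        get_thread(*sys.argv[1:5])
    else:
        print("usage: NAME LINK START_PAGE END_PAGE")
```

Splitting out the two helpers keeps the URL and file name patterns in one place, which makes them easy to check without touching the network.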

Re: Is there anyway to download a whole thread

Click on Thread Tools, choose Printable Version, then "save as" or print to file (PDF).

Edit:

My mistake, both this and the archive/index.php resort to pagination for large threads.

Re: Is there anyway to download a whole thread

Quote:

Originally Posted by marxian (Post 1106552)

Do you want to download each page of the thread as an HTML file?

Thanks. I'm learning Python at the moment (early days), so I'll play with this tonight.

Cheers

Re: Is there anyway to download a whole thread

Quote:

Originally Posted by ziggadebo (Post 1106535)

I was wondering if there's any way to download a whole thread so I can read it offline. I know I can do a page at a time.

Thanks

If you use Firefox and want a point-and-click solution, look into the Re-Pagination add-on at https://addons.mozilla.org/en-US/fir...re-pagination/. This can turn a multi-page thread into one long page (this is useful if you want to search within a thread, by the way). Unfortunately you cannot then save the long page as HTML, but you could "print" it to a PDF if you have the correct software.
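If the pages have already been saved one file per page with one of the scripts earlier in the thread, a similar "one long page" can be approximated offline by joining the files. The sketch below is only that: `merge_pages` and `page_number` are hypothetical names, it assumes the NAME-pageN.html naming used above, and simply concatenating whole HTML documents does not produce valid HTML, although browsers are usually tolerant of it.

```python
import glob
import re


def page_number(path):
    """Extract N from a name like test-page12.html (0 if it is missing)."""
    match = re.search(r"-page(\d+)\.html$", path)
    return int(match.group(1)) if match else 0


def merge_pages(file_name, output_path):
    """Concatenate file_name-page*.html into output_path in page order."""
    pattern = glob.escape(file_name) + "-page*.html"
    # Sort numerically so that page 10 comes after page 2, not before it.
    paths = sorted(glob.glob(pattern), key=page_number)
    with open(output_path, "wb") as out:
        for path in paths:
            with open(path, "rb") as page:
                out.write(page.read())
            # A horizontal rule marks where one saved page ends.
            out.write(b"\n<hr>\n")
    return paths
```

For example, `merge_pages("test", "test-all.html")` would join test-page1.html, test-page2.html and so on into test-all.html, which can then be searched as a single page.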

Re: Is there anyway to download a whole thread

OK,

Can someone point me in the right direction? I'm trying to run the Python code, and the version below is where I've got to. I think I've defined the variables correctly?

I get an error saying 'test' is not defined. How do I correctly assign the file name variable?

Code:

#!/usr/bin/python

import io
import urllib
import sys

fileName = test
link = urllib.urlopen("http://talk.maemo.org/showthread.php?t=73315.html")
start_page = 1
end_page = 255

def get_thread(fileName, link, start_page, end_page):
    for num in range(int(start_page), int(end_page) + 1):
        #print "grabbing page " + str(num)
        page = urllib.urlopen("%s&page=%d" % (link, num)).read()
        with io.open("%s-page%d.html" % (fileName, num) , 'w') as file:
            file.write(unicode(page, 'utf-8'))
            file.close()

if __name__ == '__main__':
    sys.exit(get_thread(sys.argv[1], sys.argv[2], sys.argv[3], sys.argv[4]))

I'm not after the full solution just a friendly pointer in the right direction.

Thanks

Re: Is there anyway to download a whole thread

Quote:

Originally Posted by ziggadebo (Post 1106703)

Your code contains the following errors:

1. The fileName variable should be a string, i.e. "test". Without the quotes, your code tries to assign the value of an undefined variable named test to fileName.

2. The link variable should also be a string. Your code calls urllib.urlopen() when assigning link, so link holds the object returned by that call rather than the URL. get_thread() then formats that object into a string and passes it to urllib.urlopen() a second time.

3. You have added ".html" to the thread link. The link should end after the t parameter, i.e. "http://talk.maemo.org/showthread.php?t=73315".

Solution:

Code:

#!/usr/bin/python

import io
import urllib
import sys

file_name = "test"
link = "http://talk.maemo.org/showthread.php?t=73315"
start_page = 1
end_page = 255


def get_thread(file_name, link, start_page, end_page):
    for num in range(int(start_page), int(end_page) + 1):
        #print "grabbing page " + str(num)
        page = urllib.urlopen("%s&page=%d" % (link, num)).read()
        # Write the raw bytes; the with block closes the file for us.
        with io.open("%s-page%d.html" % (file_name, num), 'wb') as out:
            out.write(page)


if __name__ == '__main__':
    # Use the values defined above rather than command-line arguments.
    sys.exit(get_thread(file_name, link, start_page, end_page))

In my earlier example, I had a mixture of styles for the variable names, so I changed fileName to file_name. :)

Re: Is there anyway to download a whole thread

Firstly, marxian, a big thank you for your help; you certainly pointed me in the right direction to get this working. The code below is yours, changed just enough to get it working for me.

Try as I might, I couldn't get your Python code to run (probably more to do with my complete lack of understanding than with your code). However, it definitely won't run on the N900, as Python 2.5 doesn't include the io library.

Anyway, I've taken what you gave me and, with some help (well, a lot of help), I've managed to get it working on the N900.

If anyone wants to use it, the steps are as follows.
(I'm writing this as an absolute beginner, so feel free to point out any errors.)

I'm assuming that Python 2.5 is installed on your N900. If not, install it first.

First, open a text editor on your N900 (I use leafpad), copy the code below into it and save the file as getathread.py. Save it to your MyDocs folder so that the output will be easy to reach when we need it.

Code:

import urllib
import sys


def get_thread(fileName, link, start_page, end_page):
    for num in range(start_page, end_page + 1):
        print "grabbing page " + str(num)
        page = urllib.urlopen("%s&page=%d" % (link, num)).read()
        # Python 2.5 has no io module, so use the built-in open()
        # and write the page bytes exactly as they arrived.
        out = open("%s-page%d.html" % (fileName, num), 'wb')
        out.write(page)
        print "downloaded: " + str(num)
        out.close()


if __name__ == '__main__':
    # Check the arguments here so that a missing or non-numeric
    # page number gives a usage message instead of a traceback.
    try:
        first = int(sys.argv[3])
        last = int(sys.argv[4])
    except (IndexError, ValueError):
        print "Usage: python getathread.py NAME LINK FIRST_PAGE LAST_PAGE"
        sys.exit(1)
    get_thread(sys.argv[1], sys.argv[2], first, last)
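The script checks by hand that the two page arguments are numbers. On a desktop Python (2.7 or 3.x, so not the N900's 2.5), the standard `argparse` module can do the same check and print a usage message by itself. A minimal sketch, in which `parse_args` and the argument names are chosen here purely for illustration:

```python
import argparse


def parse_args(argv):
    """Parse NAME LINK START_PAGE END_PAGE; exits with status 2 on bad input."""
    parser = argparse.ArgumentParser(
        description="Save the pages of a forum thread as HTML files.")
    parser.add_argument("file_name", help="prefix for the saved files")
    parser.add_argument("link", help="link to the thread")
    parser.add_argument("start_page", type=int, help="first page to download")
    parser.add_argument("end_page", type=int, help="last page to download")
    args = parser.parse_args(argv)
    # argparse has already rejected non-numeric pages; also check the range.
    if args.start_page < 1 or args.end_page < args.start_page:
        parser.error("need 1 <= start_page <= end_page")
    return args
```

A script would call `parse_args(sys.argv[1:])` and pass `args.file_name`, `args.link`, `args.start_page` and `args.end_page` on to get_thread().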

Now open your terminal and change to the MyDocs directory by typing:

Code:

cd MyDocs

To run the script we need four pieces of information:

1. The file name you want to save the output as
2. The link to the thread
3. The first page of the thread to download
4. The last page of the thread to download

These are passed to the script on the command line.

For this example, I will use the popular Kernel Power V49 thread.

So our values are:
1. power49
2. http://talk.maemo.org/showthread.php?p=1105192
3. 1
4. 186

So to run our code we would type the following in the terminal (the four values are separated by single spaces, and the link is in quotes so that the shell leaves the ? alone):

Code:

python getathread.py power49 "http://talk.maemo.org/showthread.php?p=1105192" 1 186

You should then see output like this on the screen:
grabbing page 1
downloaded: 1
grabbing page 2
downloaded: 2

and so on.

When it has finished, simply open any of the files from the file manager or from your browser.