maemo.org - Talk - View Single Post - Is there anyway to download a whole thread

ziggadebo	2011-10-10 , 14:02
Posts: 99 \| Thanked: 36 times \| Joined on Mar 2010	#5

Originally Posted by marxian

Do you want to download each page of the thread as an HTML file?
Code:
#!/bin/bash
# Arguments: $1 = output filename, $2 = thread link, $3 = start page, $4 = end page
for ((i = $3; i <= $4; i++))
do
    echo "grabbing page $i"
    wget "$2&page=$i" -O "$1-page$i.html"
done
A Python alternative:
Code:
#!/usr/bin/python
# Python 2 script: saves each page of a thread as <fileName>-page<N>.html

import io
import urllib
import sys

def get_thread(fileName, link, start_page, end_page):
    for num in range(int(start_page), int(end_page) + 1):
        print "grabbing page " + str(num)
        page = urllib.urlopen("%s&page=%d" % (link, num)).read()
        # the with block closes the file on exit
        with io.open("%s-page%d.html" % (fileName, num), 'w', encoding='utf-8') as out:
            # replace undecodable bytes rather than abort on a non-UTF-8 page
            out.write(unicode(page, 'utf-8', 'replace'))

if __name__ == '__main__':
    sys.exit(get_thread(sys.argv[1], sys.argv[2], sys.argv[3], sys.argv[4]))
In both cases you would supply the following arguments:

1. The filename you wish to save to (the page number is appended).

2. The link to the thread.

3. Start page (usually 1).

4. End page (usually the last page of the thread).
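
For anyone on Python 3, where urllib.urlopen and the print statement no longer exist, the same four arguments could be handled roughly as below. This is only a sketch, not the original poster's code: the helper name page_targets is made up for illustration, and it assumes the thread link already contains a query string so that "&page=N" can be appended, as the scripts above do.

```python
from urllib.request import urlopen


def page_targets(file_name, link, start_page, end_page):
    """Return (url, output filename) pairs for every page, end page included."""
    return [
        ("%s&page=%d" % (link, num), "%s-page%d.html" % (file_name, num))
        for num in range(int(start_page), int(end_page) + 1)
    ]


def get_thread(file_name, link, start_page, end_page):
    """Download each page of the thread and save it as its own HTML file."""
    for url, out_name in page_targets(file_name, link, start_page, end_page):
        print("grabbing " + url)
        with urlopen(url) as response:
            data = response.read()
        # Write the raw bytes so no guess about the page encoding is needed.
        with open(out_name, "wb") as out:
            out.write(data)
```

Calling get_thread with sys.argv[1] to sys.argv[4], as in the script above, gives the same command-line behaviour. Keeping the URL and filename building in a separate function means the page range can be checked without touching the network.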

Thanks. I'm learning Python at the moment (early days), so I'll play with this tonight.

Cheers
