#3
Originally Posted by ziggadebo
I was wondering if there's any way to download a whole thread so I can read it offline. I know I can do a page at a time.

Thanks
Do you want to download each page of the thread as an HTML file? If so, a short bash script will do it:

Code:
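#!/bin/bash
# Arguments: $1 = output name, $2 = thread link, $3 = start page, $4 = end page
for ((i = $3; i <= $4; i++))
do
    echo "grabbing page $i"
    # Save each page as NAME-pageN.html
    wget "$2&page=$i" -O "$1-page$i.html"
done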
A Python alternative:

Code:
#!/usr/bin/env python3

import sys
import urllib.request

def get_thread(file_name, link, start_page, end_page):
    # Fetch every page in the inclusive range and save each one to its own file.
    for num in range(int(start_page), int(end_page) + 1):
        print("grabbing page %d" % num)
        page = urllib.request.urlopen("%s&page=%d" % (link, num)).read()
        # Write the raw bytes so the page keeps whatever encoding the forum served.
        with open("%s-page%d.html" % (file_name, num), "wb") as out:
            out.write(page)

if __name__ == '__main__':
    get_thread(sys.argv[1], sys.argv[2], sys.argv[3], sys.argv[4])
In both cases you would supply the following arguments:

1. The filename you wish to save to (the page number is appended).

2. The link to the thread.

3. Start page (usually 1).

4. End page (usually the last page of the thread).
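
If you want to check what will be fetched before downloading anything, here is a rough sketch that only builds the URLs and file names the same way both scripts do. The function name page_targets and the example.com link are made up for illustration; substitute your real thread link. Note that because the page number is added with "&page=", the link you pass needs to already contain a query string (something like "?t=...").

```python
def page_targets(file_name, link, start_page, end_page):
    """Return (url, output_path) pairs for each page in the inclusive range."""
    targets = []
    for num in range(int(start_page), int(end_page) + 1):
        url = "%s&page=%d" % (link, num)             # same URL pattern as the scripts
        path = "%s-page%d.html" % (file_name, num)   # same file naming as the scripts
        targets.append((url, path))
    return targets

if __name__ == "__main__":
    # Hypothetical thread link, used only to show the output format.
    example_link = "http://example.com/showthread.php?t=12345"
    for url, path in page_targets("mythread", example_link, 1, 3):
        print(url, "->", path)
```

Nothing is downloaded here, so it is a cheap way to confirm the arguments are in the right order and the page range is what you expect.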

Last edited by marxian; 2011-10-10 at 13:40.
 
