View Single Post
Posts: 99 | Thanked: 36 times | Joined on Mar 2010
#5
Originally Posted by marxian View Post
Do you want to download each page of the thread as an HTML file?

Code:
for ((i = $3; i <= $4; i++))
do
    echo "grabbing page $i"
    wget "$2&page=$i" -O "$1-page$i.html"
done
A python alternative:

Code:
#!/usr/bin/python

import io
import urllib
import sys

def get_thread(fileName, link, start_page, end_page):
    for num in range(int(start_page), int(end_page) + 1):
        print "grabbing page " + str(num)
        page = urllib.urlopen("%s&page=%d" % (link, num)).read()
        with io.open("%s-page%d.html" % (fileName, num) , 'w') as file:
            file.write(unicode(page, 'utf-8'))
            file.close()

if __name__ == '__main__':
    sys.exit(get_thread(sys.argv[1], sys.argv[2], sys.argv[3], sys.argv[4]))
In both cases you would supply the following arguments:

1. The filename you wish to save to (the page number is appended).

2. The link to the thread.

3. Start page (usually 1).

4. End page (usually the last page of the thread).
Thanks, Am learning python at the moment, early days. So will play with this tonight.

Cheers