#2 | 2011-10-10, 13:20 | Posts: 1,503 | Thanked: 2,688 times | Joined on Oct 2010 @ Denmark
#3 | 2011-10-10, 13:35 | Posts: 2,448 | Thanked: 9,523 times | Joined on Aug 2010 @ Wigan, UK
I was wondering if there's any way to download a whole thread so I can read it offline. I know I can do a page at a time.
Thanks
#4 | 2011-10-10, 13:53 | Posts: 244 | Thanked: 354 times | Joined on Jul 2010 @ Scotland
#5 | 2011-10-10, 14:02 | Posts: 99 | Thanked: 36 times | Joined on Mar 2010
Do you want to download each page of the thread as an HTML file?
Code:
    for ((i = $3; i <= $4; i++))
    do
        echo "grabbing page $i"
        wget "$2&page=$i" -O "$1-page$i.html"
    done
A Python alternative:
Code:
    #!/usr/bin/python
    import io
    import urllib
    import sys

    def get_thread(fileName, link, start_page, end_page):
        for num in range(int(start_page), int(end_page) + 1):
            print "grabbing page " + str(num)
            page = urllib.urlopen("%s&page=%d" % (link, num)).read()
            with io.open("%s-page%d.html" % (fileName, num) , 'w') as file:
                file.write(unicode(page, 'utf-8'))
                file.close()

    if __name__ == '__main__':
        sys.exit(get_thread(sys.argv[1], sys.argv[2], sys.argv[3], sys.argv[4]))
In both cases you would supply the following arguments:
1. The filename you wish to save to (the page number is appended).
2. The link to the thread.
3. Start page (usually 1).
4. End page (usually the last page of the thread).
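For anyone on Python 3, here is a rough sketch of how those four arguments turn into URLs and file names. It is only an illustration: `build_jobs` is a helper name I made up, and it assumes the thread link already contains a `?` query (as in `showthread.php?t=...`) so that `&page=N` can simply be appended, the same way the scripts above do it.

```python
import urllib.request


def build_jobs(file_name, link, start_page, end_page):
    """Return (url, output_path) pairs for every page, first to last inclusive."""
    jobs = []
    for num in range(int(start_page), int(end_page) + 1):
        url = "%s&page=%d" % (link, num)             # argument 2 plus the page number
        path = "%s-page%d.html" % (file_name, num)   # argument 1 plus the page number
        jobs.append((url, path))
    return jobs


def get_thread(file_name, link, start_page, end_page):
    """Download each page and save the raw bytes, so no decoding step is needed."""
    for url, path in build_jobs(file_name, link, start_page, end_page):
        print("grabbing " + url)
        with urllib.request.urlopen(url) as response:
            data = response.read()
        with open(path, "wb") as out:
            out.write(data)
```

Calling `get_thread("test", "http://talk.maemo.org/showthread.php?t=73315", 1, 3)` would then try to write `test-page1.html` through `test-page3.html` in the current directory.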
#6 | 2011-10-10, 14:45 | Posts: 2,121 | Thanked: 1,540 times | Joined on Mar 2008 @ Oxford, UK
#7 | 2011-10-10, 18:53 | Posts: 99 | Thanked: 36 times | Joined on Mar 2010
Code:
    #!/usr/bin/python
    import io
    import urllib
    import sys

    fileName = test
    link = urllib.urlopen("http://talk.maemo.org/showthread.php?t=73315.html")
    start_page = 1
    end_page = 255

    def get_thread(fileName, link, start_page, end_page):
        for num in range(int(start_page), int(end_page) + 1):
            #print "grabbing page " + str(num)
            page = urllib.urlopen("%s&page=%d" % (link, num)).read()
            with io.open("%s-page%d.html" % (fileName, num) , 'w') as file:
                file.write(unicode(page, 'utf-8'))
                file.close()

    if __name__ == '__main__':
        sys.exit(get_thread(sys.argv[1], sys.argv[2], sys.argv[3], sys.argv[4]))
#8 | 2011-10-11, 17:19 | Posts: 2,448 | Thanked: 9,523 times | Joined on Aug 2010 @ Wigan, UK
Ok,
Can someone point me in the right direction? I'm trying to run the Python code, and this is where I'm at with it. I think I've defined the variables correctly?
I get the error 'test' is not defined. How do I correctly assign the file name variable?
I'm not after the full solution, just a friendly pointer in the right direction.
Thanks
Code:
    #!/usr/bin/python
    import io
    import urllib
    import sys

    # String values need quotes; numbers do not.
    file_name = "test"
    link = "http://talk.maemo.org/showthread.php?t=73315"
    start_page = 1
    end_page = 255

    def get_thread(file_name, link, start_page, end_page):
        for num in range(int(start_page), int(end_page) + 1):
            #print "grabbing page " + str(num)
            page = urllib.urlopen("%s&page=%d" % (link, num)).read()
            with io.open("%s-page%d.html" % (file_name, num) , 'w') as file:
                file.write(unicode(page, 'utf-8'))
                file.close()

    if __name__ == '__main__':
        # Use the variables defined above rather than command-line arguments.
        sys.exit(get_thread(file_name, link, start_page, end_page))
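On the 'test' is not defined error from the question: a bare word like `test` is read by Python as the name of a variable, whereas `"test"` in quotes is a piece of text. The little sketch below just demonstrates the difference; `lookup` is a throwaway helper written for this post, and it is written for Python 3, where `print` is a function.

```python
def lookup(expression):
    """Evaluate a tiny expression in an empty namespace and report what happened."""
    try:
        return eval(expression, {})
    except NameError as err:
        return "NameError: %s" % err


unquoted = lookup('test')     # bare name: Python searches for a variable called test
quoted = lookup('"test"')     # string literal: evaluates to the text itself

print(unquoted)
print(quoted)
```

The same reasoning applies to `link`: the function formats it into a URL with `"%s&page=%d"`, so it should be a plain quoted string rather than the result of calling `urlopen`.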
#9 | 2011-10-11, 19:50 | Posts: 99 | Thanked: 36 times | Joined on Mar 2010
Code:
    import urllib
    import sys

    try:
        sys.argv[3] = int(sys.argv[3])
        sys.argv[4] = int(sys.argv[4])
    except ValueError:
        print "Use numbers for the last two arguments or whatever"
        sys.exit(1)

    def get_thread(fileName, link, start_page, end_page):
        for num in range(int(start_page), int(end_page) + 1):
            print "grabbing page " + str(num)
            page = urllib.urlopen("%s&page=%d" % (link, num)).read()
            file = open("%s-page%d.html" % (fileName, num) , 'w')
            file.write(page)
            print "downloaded: "+ str(num)
            file.close()

    if __name__ == '__main__':
        sys.exit(get_thread(sys.argv[1], sys.argv[2], sys.argv[3], sys.argv[4]))
cd MyDocs
python getathread.py power49 http://talk.maemo.org/showthread.php?p=1105192 1 186
Thanks
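As a possible tidy-up of the number check in the script above, the standard `argparse` module can do the integer conversion and print a usage message by itself. This is only a sketch for Python 3 (or 2.7); `parse_args` and the argument names are my own choices, and the extra check that the start page is not after the end page is an addition, not something the original script does.

```python
import argparse


def parse_args(argv):
    """Parse the four positional arguments, converting the page numbers to int."""
    parser = argparse.ArgumentParser(description="Save a forum thread page by page.")
    parser.add_argument("file_name", help="prefix for the saved files")
    parser.add_argument("link", help="link to the thread")
    parser.add_argument("start_page", type=int, help="first page, usually 1")
    parser.add_argument("end_page", type=int, help="last page of the thread")
    args = parser.parse_args(argv)
    # Extra sanity check (an assumption on my part, not in the original script).
    if args.start_page > args.end_page:
        parser.error("start_page must not be greater than end_page")
    return args
```

In a script you would call it as `args = parse_args(sys.argv[1:])` and then pass `args.file_name`, `args.link`, `args.start_page` and `args.end_page` on to `get_thread`. A missing or non-numeric page argument makes it exit with a usage message instead of a traceback.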