maemo.org - Talk - View Single Post - TEASER: Maegios

	Ralph	2010-02-02 , 15:24
	Posts: 14 \| Thanked: 10 times \| Joined on Jan 2010 @ Italy	#37

Here's my small contribution: a regular expression that splits the URL into protocol, host and path:

Code:

Type "help", "copyright", "credits" or "license" for more information.
>>> import re
>>> p = re.compile(r'(http|https)?:?/?/?([a-z]*\.?[a-z]*\.[a-z]*)/?(.*)')
>>> m = re.search(p,'http://www.nagios.org')
>>> m.group(1)
'http'
>>> m.group(2)
'www.nagios.org'
>>> m.group(3)
''
>>> m = re.search(p,'https://www.nagios.org/foo')
>>> m.group(1)
'https'
>>> m.group(2)
'www.nagios.org'
>>> m.group(3)
'foo'
>>> m = re.search(p,'nagios.org/foo')
>>> m.group(1)
>>> m.group(2)
'nagios.org'
>>> m.group(3)
'foo'
>>>

There are still some potential issues if the URL is written in uppercase or if it has a trailing slash after a subfolder: http://WWW.NAGIOS.ORG/nagios/ is not parsed correctly by the above regexp. With a str.lower() and a str.rstrip('/') applied before matching, though, you get a lowercase string without the trailing /.
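To make that last point concrete, here is a minimal sketch of the clean-up step, assuming the same pattern as above. The helper name split_url is made up for this example and is not part of Maegios. Note that lowercasing the whole string also lowercases the path, which may not be what you want on a case-sensitive server.

```python
import re

# Same pattern as in the session above, written as a raw string.
URL_RE = re.compile(r'(http|https)?:?/?/?([a-z]*\.?[a-z]*\.[a-z]*)/?(.*)')

def split_url(url):
    """Hypothetical helper: return (protocol, host, path) or None."""
    # Normalise first: drop surrounding whitespace, lowercase,
    # then remove any trailing slashes.
    cleaned = url.strip().lower().rstrip('/')
    m = URL_RE.search(cleaned)
    if m is None:
        return None
    return m.group(1), m.group(2), m.group(3)

print(split_url('http://WWW.NAGIOS.ORG/nagios/'))
# ('http', 'www.nagios.org', 'nagios')
```

With this, nagios.org/foo/ comes back as (None, 'nagios.org', 'foo'), the same as in the session above but without the trailing slash getting in the way.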

Hope this helps.
