Posts: 309 | Thanked: 456 times | Joined on Jan 2010
#1
Hi,

I'm trying to use a regex on some HTML to extract a URL, so far with limited success.

I use a wget command to save the page to a local .htm file:
Code:
wget --user-agent="" -q -t 1 -T 10 -O "/home/user/MyDocs/.images/queen_beecon_dir/test.htm" "http://uk.weather.yahoo.com/england/tyne-and-wear/newcastle-upon-tyne-30079/"
So far, this grep command brings back the entire line containing the information I need:

Code:
~ $ grep 'forecast-icon.*http.*png' /home/user/MyDocs/.images/queen_beecon_dir/test.htm

</script></div><div></div><div id="yw-forecast" class="night" style="height:auto"><em>Current conditions as of 10:50 PM GMT</em><div id="yw-cond">Mostly Cloudy</div><dl><dt>Feels Like:</dt><dd>1 &deg;C</dd><dt>Barometer:</dt><dd style="position:relative;">1,015.92 mb and steady</dd><dt>Humidity:</dt><dd>87 %</dd><dt>Visibility:</dt><dd>9.99 km</dd><dt>Dewpoint:</dt><dd>-1 °C</dd><dt>Wind:</dt><dd>NW 9.66 km/h</dd><dt>UV Index:</dt><dd>--</dd><dt> UV Description:</dt><dd>Low</dd><dt>Sunrise:</dt><dd>8:32 AM</dd><dt>Sunset:</dt><dd>3:48 PM</dd></dl><div class="forecast-temp"><div id="yw-temp">1°</div><p>High: 3° Low: -2°</p><ul><li>&raquo; <a class=action href="/climo/UKXX0098_c.html?woeid=30079">Records and Averages</a><li>&raquo; <a class="action" href="http://us.rd.yahoo.com/evt=37752/*http://widgets.yahoo.com/widgets/yahoo-weather">Get Yahoo! Weather on your desktop</a></li></ul></div><div class="forecast-icon" style="background:url('http://l.yimg.com/a/i/us/nws/weather/gr/27n.png'); _background-image/* */: none; filter:progid:DXImageTransform.Microsoft.AlphaImageLoader(src='http://l.yimg.com/a/i/us/nws/weather/gr/27n.png', sizingMethod='crop'); "></div></div>
However, I have been unable to capture just the URL of the PNG:
Code:
~ $ grep 'forecast-icon.*(http.*png)' /home/user/MyDocs/.images/queen_beecon_dir/test.htm
No syntax errors, but no results either. Am I missing a grep option for printing a captured group, or is my regex too simple?

Last edited by [DarkGUNMAN]; 2011-01-03 at 17:31.
 
Posts: 237 | Thanked: 193 times | Joined on Feb 2010 @ Brighton, UK
#2
By default, grep prints the lines of the source file that match your expression, and in your first example it is doing just that: outputting the matching line (albeit a long one).

Are you trying to extract the url of the background image being served up instead?

Try using grep with the -o option and working on the regex some more.
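
A minimal sketch of what -o changes. The sample file below is a made-up, shortened stand-in for the saved page (only the two icon URLs are taken from the line quoted in the first post), and the pattern is an illustrative one rather than anything tested on the device:

```shell
# Build a small stand-in for test.htm (hypothetical sample, not the real page).
sample=$(mktemp)
cat > "$sample" <<'EOF'
<div id="yw-cond">Mostly Cloudy</div>
<div class="forecast-icon" style="background:url('http://l.yimg.com/a/i/us/nws/weather/gr/27n.png'); filter:progid:DXImageTransform.Microsoft.AlphaImageLoader(src='http://l.yimg.com/a/i/us/nws/weather/gr/27n.png', sizingMethod='crop'); "></div>
EOF

# Without -o: grep prints each matching line in full.
whole_line=$(grep 'forecast-icon.*http.*png' "$sample")

# With -o: grep prints only the matched text, one match per output line.
# [^']* means "any run of characters that are not a single quote".
only_match=$(grep -o "http://[^']*\.png" "$sample")

printf '%s\n' "$only_match"
rm -f "$sample"
```

Because the URL appears twice on that line, -o prints it twice, so some way of keeping only the first match is still needed.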
 
Posts: 309 | Thanked: 456 times | Joined on Jan 2010
#3
That's exactly what I am trying to do. As you can tell, I'm a total newbie with grep and am just going on what I know of regex from coding in AutoIt.
The -o option gets me closer, but using brackets to capture doesn't work.
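
On the capture attempt: in grep's default (basic) regex syntax, an unescaped ( or ) matches a literal parenthesis, which is why the bracketed pattern in the first post finds nothing. Even with groups enabled, grep has no option to print a group on its own. One common alternative is sed, which can replace the whole line with the first group. A sketch, assuming a sed that supports POSIX basic-regex groups, run against a made-up one-line sample:

```shell
# One-line stand-in for the matching line of test.htm (hypothetical sample).
line=$(cat <<'EOF'
<div class="forecast-icon" style="background:url('http://l.yimg.com/a/i/us/nws/weather/gr/27n.png'); "></div>
EOF
)

# -n suppresses sed's default output; the trailing p prints only lines where
# the substitution succeeded. \( \) is a capture group and \1 refers to it.
icon_url=$(printf '%s\n' "$line" |
  sed -n "s/.*forecast-icon.*url('\(http[^']*\.png\)').*/\1/p")

printf '%s\n' "$icon_url"
```

This relies on the icon appearing as url('...') after forecast-icon on the same line, as it does in the quoted page source.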
 
Posts: 237 | Thanked: 193 times | Joined on Feb 2010 @ Brighton, UK
#4
Code:
grep 'forecast-icon.*http.*png' /home/user/MyDocs/.images/queen_beecon_dir/test.htm | grep -o http*[a-zA-Z0-9\.\-\\_/\:]*.png
will get you closer, but will give you two results, I think. If I've understood you correctly, you save an HTML file from Yahoo and then grep it for 'forecast-icon.*http.*png'. I'm suggesting you pipe that result into grep again with the second regex, although I think that will give you two matches. That's assuming I've understood what you are trying to do.

so maybe
Code:
grep 'forecast-icon.*http.*png' /home/user/MyDocs/.images/queen_beecon_dir/test.htm | grep -o http*[a-zA-Z0-9\.\-\\_/\:]*.png | grep -m 1 http
will work? I'm sure someone could rationalise this a lot better.

not tested at all though.

UPDATE - just tested this on the device and it worked fine - with the correction I've just posted.

Last edited by mr id; 2011-01-03 at 18:00. Reason: Typo in command
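
The pipeline above can be tightened a little. This is a sketch, not tested on the device: quoting the second pattern stops the shell from interpreting *, [ and the backslashes before grep sees them; http:// replaces http* (which actually means "htt" followed by any number of "p"); \.png matches a literal dot; and head -n 1 keeps the first match. The sample file is a made-up stand-in for the saved page.

```shell
# Hypothetical one-line sample containing the icon URL twice, as on the real page.
page=$(mktemp)
cat > "$page" <<'EOF'
<div class="forecast-icon" style="background:url('http://l.yimg.com/a/i/us/nws/weather/gr/27n.png'); filter:progid:DXImageTransform.Microsoft.AlphaImageLoader(src='http://l.yimg.com/a/i/us/nws/weather/gr/27n.png', sizingMethod='crop'); "></div>
EOF

# 1) pick the forecast-icon line, 2) print only the URLs, 3) keep the first.
# Inside [...] the hyphen goes last so it is taken literally.
first_url=$(grep 'forecast-icon.*http.*png' "$page" |
  grep -o 'http://[a-zA-Z0-9._/:-]*\.png' |
  head -n 1)

printf '%s\n' "$first_url"
rm -f "$page"
```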
 
Posts: 309 | Thanked: 456 times | Joined on Jan 2010
#5
That worked a charm - thank you!
How do I pipe the result into a wget command?
 
Posts: 237 | Thanked: 193 times | Joined on Feb 2010 @ Brighton, UK
#6
If you wrap the grep pipeline in backticks (`), the shell runs it and substitutes its output into the command line, so you can use:

Code:
wget `grep 'forecast-icon.*http.*png' /home/user/MyDocs/.images/queen_beecon_dir/test.htm | grep -o http*[a-zA-Z0-9\.\-\\_/\:]*.png | grep -m 1 http` -O MYFILEPATH.png

Last edited by mr id; 2011-01-03 at 18:30. Reason: CODE tag.
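
A slightly more defensive variant of the same idea, as a sketch: $(...) does the same job as backticks but is easier to read and nest, and checking that a URL was actually found avoids running wget without one if the page layout changes. The function name fetch_icon and its two arguments are made up for illustration; the wget options are the ones already used in this thread.

```shell
# fetch_icon PAGE OUTFILE
# Download the first forecast icon referenced in PAGE to OUTFILE.
# (fetch_icon is a hypothetical helper name, not part of any tool.)
fetch_icon() {
  page=$1
  out=$2

  # Command substitution: the pipeline's output becomes the value of url.
  url=$(grep 'forecast-icon.*http.*png' "$page" |
    grep -o 'http://[a-zA-Z0-9._/:-]*\.png' |
    head -n 1)

  # Bail out rather than call wget with an empty URL.
  if [ -z "$url" ]; then
    echo "no icon URL found in $page" >&2
    return 1
  fi

  wget -q -t 1 -T 10 -O "$out" "$url"
}

# Usage (paths as in the thread):
# fetch_icon /home/user/MyDocs/.images/queen_beecon_dir/test.htm MYFILEPATH.png
```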
 
Posts: 309 | Thanked: 456 times | Joined on Jan 2010
#7
Excellent! This is working a charm as part of a qbw widget to display the weather forecast. Thank you for your help
 
Posts: 309 | Thanked: 456 times | Joined on Jan 2010
#8
Thanks again Mr ID, here's the result of your help
Posts: 237 | Thanked: 193 times | Joined on Feb 2010 @ Brighton, UK
#9
Originally Posted by [DarkGUNMAN]
Thanks again Mr ID, here's the result of your help
Looks fantastic, nice job!

How about releasing your scripts to the rest of the community? It looks like people would just need to update the weather URL to their own location to use it themselves.

cheers
 
Posts: 309 | Thanked: 456 times | Joined on Jan 2010
#10
I'm going to refine the layout a bit more. When it's ready I'll add the code as a post on the main QBW thread, along with the graphics I created for the flipclock.
 
