maemo.org - Talk (https://talk.maemo.org/index.php)
-   Nokia N900 (https://talk.maemo.org/forumdisplay.php?f=44)
-   -   [WORK AROUND] sgx_misr randomly jumps to 98% CPU use (https://talk.maemo.org/showthread.php?t=66660)

woody14619 2010-12-06 20:08

[WORK AROUND] sgx_misr randomly jumps to 98% CPU use
 
1 Attachment(s)
For those not aware, some users are having an issue with the graphics driver for the N900. It started back in PR1.1, and while changes have been made to try to prevent the problem, it's still occurring for some people in PR1.3. (I'm one of them, lucky me.)

It seems to be triggered by a combination of heavy graphics usage, high CPU usage, and possibly GPS usage, though the latter isn't always involved. Users who play intense Flash games in the browser, or run demanding programs such as mapping apps (modRana, Mappero, etc), seem to see this more often. It's listed in the bug tracker here. Feel free to vote for it. :)

My biggest issue with this is that it can happen just about any time, even when CPU use is low. The end result is a dead phone, as the process jumps silently to 98% CPU usage, burns through the battery, and the device shuts down, usually without a warning signal. Short of carrying a spare battery, you can't even turn it back on before charging it.

To help stop this until Nokia fixes it, I made a little script that checks whether the process (a kernel-level driver process in this case) is eating a lot of CPU. It only checks every 30 seconds or so, and is rather non-invasive. If it detects activity, it watches more closely and issues warnings via espeak if things are going bad. I added the warnings in case you're actively using the device (on a call, etc). After a few warnings, it reboots the device to prevent the system from draining the battery.
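
For readers without the attachment, here is a minimal sketch of the loop described above. It is not the attached script: it is written in Python rather than shell, and the 5-second watch interval, the 90% threshold, and the three-warning limit are assumptions chosen for illustration. Only the 30-second check, the espeak warnings, and the reboot come from the description.

```python
import os
import time

PROCESS_NAME = "sgx_misr"   # kernel process named in this thread
IDLE_INTERVAL = 30          # seconds between routine checks (from the post)
WATCH_INTERVAL = 5          # assumption: closer watch once activity is seen
CPU_THRESHOLD = 90.0        # assumption: percent that counts as "pegged"
MAX_WARNINGS = 3            # assumption: warnings issued before rebooting


def parse_stat(line):
    """Return (name, cpu_ticks) from the contents of /proc/<pid>/stat."""
    open_paren = line.index("(")
    close_paren = line.rindex(")")
    name = line[open_paren + 1:close_paren]
    fields = line[close_paren + 2:].split()
    # fields[0] is the state (field 3 of the file), so utime and stime
    # (fields 14 and 15) sit at indexes 11 and 12.
    return name, int(fields[11]) + int(fields[12])


def cpu_percent(ticks_before, ticks_after, seconds, hz):
    """CPU share used between two samples, as a percentage of one core."""
    return 100.0 * (ticks_after - ticks_before) / (hz * seconds)


def next_step(warnings, percent):
    """Decide what to do after one sample; returns (action, new_warnings)."""
    if percent < CPU_THRESHOLD:
        return "ok", 0
    if warnings >= MAX_WARNINGS:
        return "reboot", warnings
    return "warn", warnings + 1


def find_ticks(name):
    """Scan /proc for a process called `name`; None if it is not found."""
    for entry in os.listdir("/proc"):
        if not entry.isdigit():
            continue
        try:
            with open("/proc/%s/stat" % entry) as f:
                found, ticks = parse_stat(f.read())
        except (IOError, OSError, ValueError, IndexError):
            continue  # process exited or the line was unreadable
        if found == name:
            return ticks
    return None


def watch():
    """Poll forever; must run as root for the reboot to succeed."""
    hz = os.sysconf("SC_CLK_TCK")
    warnings, interval = 0, IDLE_INTERVAL
    before = find_ticks(PROCESS_NAME)
    while True:
        time.sleep(interval)
        after = find_ticks(PROCESS_NAME)
        if before is None or after is None:
            before = after
            continue
        percent = cpu_percent(before, after, interval, hz)
        action, warnings = next_step(warnings, percent)
        before = after
        if action == "ok":
            interval = IDLE_INTERVAL
        elif action == "warn":
            interval = WATCH_INTERVAL
            os.system("espeak 'Warning, graphics driver is stuck'")
        else:
            os.system("reboot")
```

Nothing runs until `watch()` is called, for example from an init script. Keeping the decision in `next_step` separate from the polling loop makes it easy to swap espeak for another notification, or to drop the warnings entirely.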

Currently there's no way to repair this other than rebooting. It would be great if there were a way to re-init the chipset/driver involved, but it's not a module from what I can see. If the custom kernel folks could compile this driver as a module, that would rock, since the script could then just re-init (or unload/reload) the graphics driver, and we could re-start X or what not, vs rebooting the whole device.

For now, this script handles the immediate issue of reaching into one's pocket and finding a dead device with no battery left. :P I placed it in /usr/sbin/ on my device and made an RC script to auto-start it at boot. It needs to be run as root to pull off the reboot, so keep that in mind. Hope this helps those having this issue frequently. :)

PS: The script assumes you have espeak installed. If you don't, you may want to install it, or replace the espeak lines with whatever notification mechanism you want to have. Or just delete those lines if you don't care about being warned.

Radicalz38 2010-12-06 20:29

Re: [WORK AROUND] sgx_misr randomly jumps to 98% CPU use
 
I encountered this problem in the past, where it raised my CPU to 100%... I overclocked my N900 to 1150 MHz and it still maxed out... I think I removed QBW v1.3, and I haven't seen it in conky since :p

rajil.s 2010-12-06 20:29

Re: [WORK AROUND] sgx_misr randomly jumps to 98% CPU use
 
I had this issue just a few minutes ago for the first time. I have compiled my own kernel, which has patches for iphb and ppp_async, and NAT enabled. Thanks for the script.

woody14619 2010-12-06 23:29

Re: [WORK AROUND] sgx_misr randomly jumps to 98% CPU use
 
I'm pretty sure this is a driver issue, which means anyone could see it. I think it just happens more often when one or more of the following is going on:
  • Program is doing CPU intensive work
  • System free memory is low
  • Graphic redraw is high (high frame rate apps)
  • GPS is active
  • Heavy Wifi/3G/GPRS data activity, and/or mode switching.

If you have an app doing several of these things at once, like a flash game (CPU/graphics/wifi), or a navigation app (GPS/CPU/memory), it's just more likely to happen. :(

My phone woke me at 5am today, warning of a reboot... Totally inactive, but sure enough sgx_misr was pegging the CPU & the screen wouldn't turn on. It rebooted itself and everything was fine. Had I not had the script, it would have drained the battery (even while plugged in), and I'd have had a dead/off phone this morning when I awoke.

mardy 2010-12-07 06:51

Re: [WORK AROUND] sgx_misr randomly jumps to 98% CPU use
 
Quote:

Originally Posted by woody14619 (Post 890903)
To help stop this until Nokia fixes it, I made a little script that checks whether the process (a kernel-level driver process in this case) is eating a lot of CPU. It only checks every 30 seconds or so, and is rather non-invasive. If it detects activity, it watches more closely and issues warnings via espeak if things are going bad. I added the warnings in case you're actively using the device (on a call, etc). After a few warnings, it reboots the device to prevent the system from draining the battery.

Waking up the device every 30 seconds is not very power-friendly either. Maybe there is some way to tune the N900 watchdog?

F2thaK 2010-12-07 07:01

Re: [WORK AROUND] sgx_misr randomly jumps to 98% CPU use
 
Rufff..................!!!

Vertikar 2010-12-07 07:55

Re: [WORK AROUND] sgx_misr randomly jumps to 98% CPU use
 
Quote:

Originally Posted by woody14619 (Post 890903)
To help stop this until Nokia fixes it, I made a little script that checks whether the process (a kernel-level driver process in this case) is eating a lot of CPU. It only checks every 30 seconds or so, and is rather non-invasive. If it detects activity, it watches more closely and issues warnings via espeak if things are going bad. I added the warnings in case you're actively using the device (on a call, etc). After a few warnings, it reboots the device to prevent the system from draining the battery.

The only issue I see with this script is the fact that I've only ever seen the bug after manually issuing a reboot. That said, it's possible there is no longer an issue with that.

woody14619 2010-12-07 18:58

Re: [WORK AROUND] sgx_misr randomly jumps to 98% CPU use
 
Quote:

Originally Posted by Vertikar (Post 891263)
The only issue I see with this script is the fact that I've only ever seen the bug after manually issuing a reboot. That said, it's possible there is no longer an issue with that.

While I'm glad that you only see this after a reboot, you are not the common case. In the bug tracker there's no indication that this happens after a manual reboot, and I can verify that I've had this happen on many occasions where I had not manually rebooted.

In fact, I've often had to replace the battery because this bug drained it to the point that the system asked for a time/date. On changing to a spare battery, I've booted directly (no reboot involved) only to have it go back into this loop within an hour because I was then using it as a navigation aid.

Even if it did occur only after a reboot, devices need to be able to autonomously reboot themselves and return to a stable state. If they can't, then there's a flaw in the driver: it needs to either put the chipset in a known state before a system reboot, or be able to handle/reset the state it's in. (If it were properly programmed to do so, I suspect it wouldn't get into this state in the first place.)

Vertikar 2010-12-07 19:02

Re: [WORK AROUND] sgx_misr randomly jumps to 98% CPU use
 
Quote:

Originally Posted by woody14619 (Post 891682)
Even if it did occur only after a reboot, devices need to be able to autonomously reboot themselves and return to a stable state. If they can't, then there's a flaw in the driver: it needs to either put the chipset in a known state before a system reboot, or be able to handle/reset the state it's in. (If it were properly programmed to do so, I suspect it wouldn't get into this state in the first place.)

IIRC (I'm not awake enough yet), one of the issues with the driver was that on a system reboot the chipset wouldn't properly clear the memory, leaving it in a state where it was possible to see this issue without a huge amount of load. That said, I haven't rebooted since PR1.3 came out.

Edit: Basically the driver needs fixing...either Nokia or someone else who's worked on it.

woody14619 2010-12-07 19:06

Re: [WORK AROUND] sgx_misr randomly jumps to 98% CPU use
 
Quote:

Originally Posted by mardy (Post 891217)
Waking up the device every 30 seconds is not very power-friendly either. Maybe there is some way to tune the N900 watchdog?

There are plenty of things that wake the system more often than this script. And again, this is a stop-gap measure, not a solution. I find this much more power friendly than reaching into my pocket to find my battery 100% drained because of a kernel bug...

I would love to be able to tune the watchdog to look for this. And I'm sure writing it as a native app in C would lower its processing time further. If I could, I would have it hook into the frequency scaler so it only bothers checking when the CPU is running at top speed. But I don't know how to do that from a shell script. Is there a dbus message I can hook against to only wake when this is happening? I'm very open to changes. :D
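
One way to approximate the "only check at top speed" idea without a real frequency-scaler hook is to read the standard Linux cpufreq files in sysfs before doing the more expensive process scan. This is only a sketch, assuming the N900 kernel exposes `scaling_cur_freq` and `scaling_max_freq` under cpu0; the helper names are made up for illustration. It still polls, so it makes each wake-up cheaper rather than removing it, and it does not answer the dbus question.

```python
import os

# Standard cpufreq sysfs directory on Linux; assumed to exist on the N900.
CPUFREQ_DIR = "/sys/devices/system/cpu/cpu0/cpufreq"


def read_khz(path):
    """Read one cpufreq sysfs file, which holds a frequency in kHz."""
    with open(path) as f:
        return int(f.read().strip())


def at_top_speed(cpufreq_dir=CPUFREQ_DIR):
    """True when the CPU is currently clocked at its configured maximum."""
    current = read_khz(os.path.join(cpufreq_dir, "scaling_cur_freq"))
    ceiling = read_khz(os.path.join(cpufreq_dir, "scaling_max_freq"))
    return current >= ceiling
```

A watcher could call `at_top_speed()` first on each wake-up and go straight back to sleep when it returns False, since a pegged sgx_misr should keep the CPU at its highest frequency.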


All times are GMT.