[WORK AROUND] sgx_misr randomly jumps to 98% CPU use
1 Attachment(s)
For those not aware, some users are having an issue with the graphics driver for the N900. It started back in PR1.1, and while changes have been made to try to prevent the problem, it's still occurring for some people in PR1.3. (I'm one of them, lucky me.)
It seems to be triggered by a combination of heavy graphics usage, high CPU usage, and possibly GPS usage, though the latter isn't always the case. Users who play intense Flash games in the browser, or use demanding programs such as mapping apps (modRana, Mappero, etc.), seem to see this more often. It's listed in the bug tracker here. Feel free to vote for it. :)

My biggest issue with this is that it can happen just about any time, even when CPU use is low. The end result is a dead phone: the process silently jumps to 98% CPU usage, burns through the battery, and the device shuts down, usually without a warning signal. Short of carrying a spare battery, you can't even turn it back on before charging it.

To help stop this until Nokia fixes it, I made a little script that checks whether the process (a kernel-level driver process in this case) is eating a lot of CPU. It only checks every 30 seconds or so, and is rather non-invasive. If it detects activity, it watches more closely and issues warnings via espeak if things are going bad. I added the warnings in case you're actively using the device (on a call, etc.). After a few warnings, it reboots the device to prevent the system from draining the battery.

Currently there's no way to repair this other than rebooting. It would be great if there were a way to re-init the chipset/driver involved, but it's not a module from what I can see. If the custom kernel folks could compile this driver as a module, that would rock, since the script could then just re-init (or unload/reload) the graphics driver, and we could restart X or whatnot instead of rebooting the whole device. For now, this script handles the immediate issue of reaching into one's pocket and finding a dead device with no battery left. :P

I placed it in /usr/sbin/ on my device and made an RC script to auto-start it at boot. It needs to be run as root to pull off the reboot, so keep that in mind. Hope this helps those having this issue frequently. :)

PS: The script assumes you have espeak installed. If you don't, you may want to install it, or replace the espeak lines with whatever notification mechanism you prefer. Or just delete those lines if you don't care about being warned.
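For readers who can't get at the attachment, here is a minimal sketch of the kind of loop described above. It is not the attached script. The thresholds, intervals, function names, and the approach of reading tick counters from /proc/<pid>/stat are all assumptions made for illustration; only the process name, the roughly 30-second check, the espeak warnings, and the final reboot come from the post.

```shell
#!/bin/sh
# Hypothetical sgx_misr watchdog sketch (NOT the attached script).
# All values below are illustrative assumptions.
PROC_NAME="sgx_misr"
IDLE_SLEEP=30     # seconds between checks while everything looks fine
SAMPLE_SLEEP=5    # seconds over which CPU usage is sampled
THRESHOLD=80      # percent CPU treated as "stuck"
MAX_STRIKES=3     # consecutive bad samples before rebooting

# Print the PID of the first process whose comm field equals $1.
find_pid() {
    for f in /proc/[0-9]*/stat; do
        if grep -q "^[0-9]* ($1) " "$f" 2>/dev/null; then
            f=${f#/proc/}
            echo "${f%/stat}"
            return 0
        fi
    done
    return 1
}

# Read one /proc/<pid>/stat line on stdin, print utime+stime in ticks.
ticks_from_stat() {
    # Drop "pid (comm) " first; utime/stime are then fields 12 and 13.
    sed 's/^.*) //' | awk '{ print $12 + $13 }'
}

# $1 = tick delta, $2 = seconds elapsed, $3 = ticks per second.
cpu_percent() {
    echo $(( $1 * 100 / ($2 * $3) ))
}

watch_loop() {
    strikes=0
    hz=$(getconf CLK_TCK 2>/dev/null || echo 100)
    while :; do
        pid=$(find_pid "$PROC_NAME") || { sleep "$IDLE_SLEEP"; continue; }
        t1=$(ticks_from_stat < "/proc/$pid/stat")
        sleep "$SAMPLE_SLEEP"
        t2=$(ticks_from_stat < "/proc/$pid/stat")
        pct=$(cpu_percent $((t2 - t1)) "$SAMPLE_SLEEP" "$hz")
        if [ "$pct" -ge "$THRESHOLD" ]; then
            strikes=$((strikes + 1))
            espeak "Warning, graphics driver stuck, strike $strikes" 2>/dev/null
            if [ "$strikes" -ge "$MAX_STRIKES" ]; then
                reboot    # needs root
            fi
        else
            strikes=0
            sleep "$IDLE_SLEEP"
        fi
    done
}

# Only start looping when invoked as: script run
if [ "${1:-}" = "run" ]; then
    watch_loop
fi
```

Once a sample looks bad, the loop skips the long sleep and re-samples every few seconds, which is one way to get the "watch it a little closer" behaviour without polling hard all the time.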
Re: [WORK AROUND] sgx_misr randomly jumps to 98% CPU use
I encountered this problem in the past, where it raised my CPU to 100%... I overclocked my N900 to 1150 MHz and it still hit the maximum... I think I removed QBW v1.3, and I've never seen it again in conky since :p
Re: [WORK AROUND] sgx_misr randomly jumps to 98% CPU use
I had this issue just a few minutes ago for the first time. I have compiled my own kernel, which has patches for iphb and ppp_async, and NAT enabled. Thanks for the script.
Re: [WORK AROUND] sgx_misr randomly jumps to 98% CPU use
I'm pretty sure this is a driver issue, which means anyone could see it. I think it just happens more often when one or more of the following is going on:
If you have an app doing several of these things at once, like a Flash game (CPU/graphics/wifi) or a navigation app (GPS/CPU/memory), it's just more likely to happen. :(

My phone woke me at 5am today, warning of a reboot... Totally inactive, but sure enough sgx_misr was pegging the CPU and the screen wouldn't turn on. It rebooted itself and everything was fine. Had I not had the script, it would have drained the battery (even while plugged in), and I'd have had a dead/off phone when I awoke this morning.
Re: [WORK AROUND] sgx_misr randomly jumps to 98% CPU use
Quote:
Re: [WORK AROUND] sgx_misr randomly jumps to 98% CPU use
Rufff..................!!!
Re: [WORK AROUND] sgx_misr randomly jumps to 98% CPU use
Quote:
Re: [WORK AROUND] sgx_misr randomly jumps to 98% CPU use
Quote:
In fact, I've often had to replace the battery because this bug drained it to the point that the system asked for a time/date. On changing to a spare battery, I've booted directly (no reboot involved) only to have it go back into this loop within an hour because I was then using it as a navigation aid.

Even if it did occur only after a reboot, devices need to be able to autonomously reboot themselves and return to a stable state. If they can't, then there's a flaw in the driver, which needs to either put the chipset in a known state before a system reboot or be able to handle/reset the state it's in. (If it were properly programmed to do so, I suspect it wouldn't get into this state in the first place.)
Re: [WORK AROUND] sgx_misr randomly jumps to 98% CPU use
Quote:
Edit: Basically the driver needs fixing, either by Nokia or by someone else who's worked on it.
Re: [WORK AROUND] sgx_misr randomly jumps to 98% CPU use
Quote:
I would love to be able to tune the watchdog to look for this. And I'm sure writing it as a native app in C would lower its processing time further. If I could, I would have it hook into the frequency scaler to only bother checking when the CPU is running at top speed. But I don't know how to do that from a shell script. Is there a dbus message I can hook into so that it only wakes when this is happening? I'm very open to changes. :D
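One possible way to do the "only check at top speed" part from a shell script is to compare the current and maximum frequency reported by the standard Linux cpufreq sysfs files. This is only a sketch: it assumes the running kernel exposes scaling_cur_freq and scaling_max_freq under the path below, and the function name at_top_speed is made up for illustration. It still polls rather than being woken by an event, so it doesn't answer the dbus question.

```shell
# Assumed cpufreq location; override CPUFREQ_DIR if the kernel differs.
CPUFREQ_DIR="${CPUFREQ_DIR:-/sys/devices/system/cpu/cpu0/cpufreq}"

# Succeeds only when the CPU is currently clocked at its maximum.
at_top_speed() {
    cur=$(cat "$CPUFREQ_DIR/scaling_cur_freq" 2>/dev/null) || return 1
    max=$(cat "$CPUFREQ_DIR/scaling_max_freq" 2>/dev/null) || return 1
    [ "$cur" -ge "$max" ]
}

# Usage idea inside a watchdog loop:
#   at_top_speed || { sleep 30; continue; }
```

Reading two small sysfs files is much cheaper than sampling process CPU time, so the expensive check would only run when the scaler has already ramped the CPU up.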
Re: [WORK AROUND] sgx_misr randomly jumps to 98% CPU use
BTW won't this be a battery whore? Well, the script gets stuck in an infinite loop...
Re: [WORK AROUND] sgx_misr randomly jumps to 98% CPU use
Quote:
The funny part is that until PR1.3 I only had this happen once or twice at most, and wasn't sure what was causing it. After the PR1.3 update it started happening several times a week, though my usage patterns did change around the same time, so it may be I just wasn't tasking it enough to cause the issue.
Re: [WORK AROUND] sgx_misr randomly jumps to 98% CPU use
Quote:
Re: [WORK AROUND] sgx_misr randomly jumps to 98% CPU use
Will this be in the extras-devel repository?
Re: [WORK AROUND] sgx_misr randomly jumps to 98% CPU use
Quote:
Once I get my dev environment set up, if I make this into a real program, I'll make one. I didn't want that delay to hold up releasing something quick for people to use until a real solution is out, though. Hopefully Nokia will fix this and there will be no reason to make such a package. (And pigs may fly, and glittery rainbows may shoot from my bum... just as likely.) For now you can download it from the first post and just run it (as root).
Re: [WORK AROUND] sgx_misr randomly jumps to 98% CPU use
Quote:
I can type "sudo gainroot" in X Terminal and then "./script" to start it, but the script will be stopped as soon as that X Terminal window is closed, won't it? I would like to run it at startup, if possible. Can it be done?

I have just hit the bug for at least the third or maybe fourth time.

While you are looping and there is nothing suspicious, can you keep the last 30/60 seconds of dmesg output in memory, so that when something suspicious does happen you can dump the logs from the recent past into a file? A human could then read the dmesg logs of what happened before the problem and what could have caused it. http://ibot.rikers.org/%23maemo/20100128.html.gz

What's the way to resolve the problem, besides taking out the battery?

EDIT: I'm trying to reproduce the bug (with the help of modRana, USB internet, cellular connectivity, Filebox, Leafpad and Fennec), but I only get [sgx_misr] CPU usage of around 0.2-0.4-0.5-0.6-0.7.
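On the logging idea: dmesg already prints the kernel's ring buffer of recent messages, so a watchdog doesn't need to hold its own copy in memory. It only has to write the tail of that buffer to a file before it reboots. A minimal sketch follows, assuming a writable dump directory; the path, file naming, and the function name save_kernel_log are hypothetical, and LOG_CMD exists only so the function can be tried without real kernel output.

```shell
# Hypothetical settings; adjust to taste.
DUMP_DIR="${DUMP_DIR:-/home/user/MyDocs}"   # assumed writable location
LOG_CMD="${LOG_CMD:-dmesg}"                 # command producing the log

# Save the last N lines (default 200) of the kernel log to a
# timestamped file and print the file name.
save_kernel_log() {
    out="$DUMP_DIR/sgx-dmesg-$(date +%Y%m%d-%H%M%S).txt"
    $LOG_CMD | tail -n "${1:-200}" > "$out"
    echo "$out"
}

# Usage idea: call save_kernel_log just before the watchdog reboots.
```

Because each dump gets its own timestamped name, several incidents can be compared later without one overwriting another.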
Re: [WORK AROUND] sgx_misr randomly jumps to 98% CPU use
Quote:
Code:
sudo gainroot
Quote:
The QueenBee wiki has examples of how to make a widget start something at system boot, and it's totally happy with running scripts, including running them as root.
Quote:
A high-CPU app (modRana/microb/flash) is running
The network is active (GPRS or wifi)
An SMS arrives while all of the above is going on.
Oddly, I've not been able to get it to do it recently, but have had other issues. I've also had my SMS log corrupted by this, and now have some other odd side-effect issues affecting glogarchive and my SMS service. :( So I'm on the verge of re-flashing to fix that...
Quote:
Quote:
Re: [WORK AROUND] sgx_misr randomly jumps to 98% CPU use
Quote:
This drawing to the screen when not necessary is one of the reasons Nokia people have given why this graphics driver spinning may happen.
Re: [WORK AROUND] sgx_misr randomly jumps to 98% CPU use
It's one they've been giving, but I've had it happen when the phone was completely idle. It may happen more often when that's going on, but it's not the only factor. The main factor in my case seems to be that every time it happens, it's as a new SMS comes in (but the SMS is not displayed or signaled until after the reboot). This doesn't happen with every SMS, but it does happen as a result of it.
Any improvement to modRana is a good thing though, as I do love that program and use it regularly. :)
Re: [WORK AROUND] sgx_misr randomly jumps to 98% CPU use
Quick reply...
dmesg doesn't keep long enough logs. After a reboot I cannot see the SGX errors in the log; the oldest lines are from the last reboot, because the boot process writes so many lines to dmesg. I'm going to put dmesg logging into the script.

Does dmesg >> output.txt overwrite output.txt or add new lines to it? And how can I filter out lines containing "slide" or "kb_lock"?
Re: [WORK AROUND] sgx_misr randomly jumps to 98% CPU use
Quote:
An example from grep SGX /var/log/syslog Code:
Dec 14 09:41:32 Nokia-N900 kernel: [68078.694793] HWRecoveryResetSGX: SGX Hardware Recovery triggered
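To see how often this has happened, the same syslog can be summarised in a couple of lines. This sketch assumes syslog is being written to /var/log/syslog as in the example above and that each recovery logs the HWRecoveryResetSGX string shown; the function names are made up for illustration.

```shell
SYSLOG_FILE="${SYSLOG_FILE:-/var/log/syslog}"

# Print the timestamp (month, day, time) of every SGX recovery entry.
sgx_recovery_times() {
    awk '/HWRecoveryResetSGX/ { print $1, $2, $3 }' "$SYSLOG_FILE"
}

# Print how many SGX recovery entries the log contains.
sgx_recovery_count() {
    grep -c "HWRecoveryResetSGX" "$SYSLOG_FILE"
}
```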
Re: [WORK AROUND] sgx_misr randomly jumps to 98% CPU use
Quote:
I Quote:
Code:
dmesg | grep -v "kb_lock" >> output.txt
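To spell out both halves of the question: `>>` appends to the file (a single `>` would overwrite it), and grep can drop several patterns at once by repeating `-e`. A small sketch, with a hypothetical helper name append_filtered and LOG_OUT standing in for output.txt:

```shell
LOG_OUT="${LOG_OUT:-output.txt}"

# Read log lines on stdin, drop any containing "slide" or "kb_lock",
# and append the rest to $LOG_OUT (">>" never truncates the file).
append_filtered() {
    grep -v -e "slide" -e "kb_lock" >> "$LOG_OUT"
}

# Usage: dmesg | append_filtered
```

Running it repeatedly keeps growing the file, so a long-running script may want to trim or rotate it now and then.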