maemo.org - Talk (https://talk.maemo.org/index.php)
-   Nokia N900 (https://talk.maemo.org/forumdisplay.php?f=44)
-   -   N900 Unexpected reboots (https://talk.maemo.org/showthread.php?t=35055)

hypnotik 2009-11-27 20:35

Re: N900 Unexpected reboots
 
It just rebooted again. 3 reboots in 7 days. I was just browsing this site in MicroB when it happened, with no other apps running.

Bratag 2009-11-27 21:09

Re: N900 Unexpected reboots
 
Quote:

Originally Posted by hypnotik (Post 396704)
It just rebooted again. 3 reboots in 7 days. I was just browsing this site in MicroB when it happened, with no other apps running.

Interesting. Are you running a logging script?

It wouldn't be a bad idea to run vmstat at, say, 10-second intervals and write the output to a file on your SD card. Perhaps log the output of top to a file as well.

hypnotik 2009-11-27 21:43

Re: N900 Unexpected reboots
 
Quote:

Originally Posted by Bratag (Post 396754)
Interesting. Are you running a logging script?

It wouldn't be a bad idea to run vmstat at, say, 10-second intervals and write the output to a file on your SD card. Perhaps log the output of top to a file as well.

I've done this but the file keeps getting corrupted. Maybe I need a way to netcat it to a remote server.
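
A rough sketch of the kind of logging loop suggested above, written so that every sample is flushed to storage before the next one is taken. That may reduce how much of the log is lost or damaged when the device resets, though it cannot guarantee an intact file. The function names, and the use of /proc/loadavg and /proc/meminfo as a stand-in for vmstat (which may not be installed), are assumptions made for illustration.

```shell
# Hypothetical helpers, not part of any N900 tooling.

# Append one timestamped sample to the log file given as $1.
log_sample() {
    logfile=$1
    {
        echo "=== $(date '+%Y-%m-%d %H:%M:%S')"
        cat /proc/loadavg
        grep -E '^(MemFree|SwapFree):' /proc/meminfo
    } >> "$logfile"
    # Ask the kernel to write buffered data out to storage now.
    sync
}

# Take $2 samples, $3 seconds apart, into log file $1.
log_loop() {
    logfile=$1; count=$2; interval=$3
    i=0
    while [ "$i" -lt "$count" ]; do
        log_sample "$logfile"
        i=$((i + 1))
        if [ "$i" -lt "$count" ]; then
            sleep "$interval"
        fi
    done
}
```

For example, `log_loop /media/mmc1/reboot-log.txt 360 10` would log for about an hour at 10-second intervals. The path is an assumption about where the memory card is mounted.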

hypnotik 2009-11-27 21:55

Re: N900 Unexpected reboots
 
So are these reboots related to some kernel-level issue? Would user-level processes cause a random reboot? I thought worst case the web browser would crash.

Bratag 2009-11-27 22:00

Re: N900 Unexpected reboots
 
I am wondering if we might be dealing with some bad memory chips here. It's possible, I guess, that when the swap/RAM reaches a certain spot in the memory, what's written gets corrupted and things go pear-shaped.

That might explain the random nature of the reboots.

jjx 2009-11-27 22:09

Re: N900 Unexpected reboots
 
Quote:

Originally Posted by hypnotik (Post 396838)
So are these reboots related to some kernel-level issue? Would user-level processes cause a random reboot? I thought worst case the web browser would crash.

The reboot note says it's due to the watchdog.

Note: The following is based on generic experience with other devices. Don't have my N900 yet :)

Normally you'd hook the watchdog through the kernel to one or more userspace apps, so that if userspace locks up or a critical daemon dies, it will reboot. There's no point in the system continuing to run if the kernel is fine but a critical part of userspace is gone.

However, if it's due to a userspace app dying, or closing its watchdog file descriptor, or failing to ping the kernel watchdog device, the kernel could log which userspace app is the reason for triggering the reboot, before rebooting.

If it's due to the kernel crashing, it would not be able to log anything, and the reboot-reason register would simply record it as a watchdog hardware reboot.

So there should be enough data to tell the difference, and if not, it could be added with a kernel patch.
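
To make the userspace side of this concrete, here is a minimal sketch based on the generic Linux watchdog device interface, where opening the device arms the timer and any write to it counts as a ping. It is not a description of how the N900 handles its watchdog. The function name `feed_watchdog` and its arguments are hypothetical.

```shell
# Generic sketch, not N900-specific. Pointing this at a real /dev/watchdog
# can reboot the machine, so try it on a scratch file first.
#
# $1: watchdog device   $2: seconds between pings   $3: health-check command
feed_watchdog() {
    dev=$1; interval=$2; check=$3
    [ -w "$dev" ] || return 1
    exec 3> "$dev"              # on Linux, opening the device arms the timer
    while $check; do
        printf '.' >&3          # any write counts as a ping
        sleep "$interval"
    done
    # The health check failed: stop pinging so the timer can expire.
    echo "watchdog: '$check' failed, no longer feeding $dev" >&2
    exec 3>&-
}
```

A call might look like `feed_watchdog /dev/watchdog 10 my_daemon_is_alive`, where `my_daemon_is_alive` is a placeholder for whatever check matters on the system. The message printed just before the pings stop is the kind of record that would let someone tell a userspace-triggered reset from a kernel hang afterwards.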

jjx 2009-11-27 22:14

Re: N900 Unexpected reboots
 
Quote:

Originally Posted by Bratag (Post 396846)
I am wondering if we might be dealing with some bad memory chips here. It's possible, I guess, that when the swap/RAM reaches a certain spot in the memory, what's written gets corrupted and things go pear-shaped.

That might explain the random nature of the reboots.

Yes, it could.

It could also be PCB quality control: with all the signal timings being so tight, a few PCBs may not meet them consistently, and random faults occur. I've had this problem with other devices (I work in embedded Linux generally).

Also coming to mind as a potential culprit: the "swap to flash" feature, where there's 256MB RAM swapping onto 1GB flash. That's not a feature commonly used as far as I know. If there are timing-dependent or flash-layout-dependent bugs in that area, it could have the same effect as bad memory - i.e. random failures - but wouldn't be due to faulty hardware so much as natural variations in the flash response triggering software issues. If there's enough spare memory, this could be tested with, I presume, a "swapoff -a" command and then using the device long enough to see if the reboots have subsided.
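
Before trying "swapoff -a", it may be worth checking whether the data currently in swap would plausibly fit back into RAM. The following is only a sketch under that assumption: the function names are made up, the MemFree + Buffers + Cached estimate is a rough heuristic, and it assumes awk is available on the device.

```shell
# Hypothetical helpers for a pre-swapoff sanity check.

# Print total swap in use (kB) from a /proc/swaps-style file ($1).
swap_used_kb() {
    awk 'NR > 1 { used += $4 } END { print used + 0 }' "${1:-/proc/swaps}"
}

# Print a rough estimate of reclaimable RAM (kB): MemFree + Buffers + Cached.
ram_available_kb() {
    awk '$1 == "MemFree:" || $1 == "Buffers:" || $1 == "Cached:" { sum += $2 }
         END { print sum + 0 }' "${1:-/proc/meminfo}"
}

# Succeed only if the swapped-out data would plausibly fit back into RAM.
# $1 and $2 optionally override the two /proc files (handy for testing).
safe_to_swapoff() {
    [ "$(swap_used_kb "$1")" -lt "$(ram_available_kb "$2")" ]
}
```

Something like `safe_to_swapoff && swapoff -a` (as root) would then only disable swap when the check passes. If the reboots stop with swap off, that points toward the swap path rather than bad RAM.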

sony123 2009-11-27 22:30

Re: N900 Unexpected reboots
 
Quote:

Originally Posted by jjx (Post 396869)
Yes, it could.

It could also be PCB quality control: with all the signal timings being so tight, a few PCBs may not meet them consistently, and random faults occur. I've had this problem with other devices (I work in embedded Linux generally).

Also coming to mind as a potential culprit: the "swap to flash" feature, where there's 256MB RAM swapping onto 1GB flash. That's not a feature commonly used as far as I know. If there are timing-dependent or flash-layout-dependent bugs in that area, it could have the same effect as bad memory - i.e. random failures - but wouldn't be due to faulty hardware so much as natural variations in the flash response triggering software issues. If there's enough spare memory, this could be tested with, I presume, a "swapoff -a" command and then using the device long enough to see if the reboots have subsided.

That's insightful... random failure really takes experience to isolate the root causes, and the last few posts may point out a few areas that Nokia can look into.

This random reboot issue is the most critical among the issues I've read about on the N900 because of its randomness. I can't help thinking Eldar was unfortunately right about the N900 not being stable... although the instability doesn't seem to affect every N900, Eldar did have a point, as we can see now.

Bratag 2009-11-27 22:34

Re: N900 Unexpected reboots
 
Quote:

Originally Posted by sony123 (Post 396880)
That's insightful... random failure really takes experience to isolate the root causes, and the last few posts may point out a few areas that Nokia can look into.

This random reboot issue is the most critical among the issues I've read about on the N900 because of its randomness. I can't help thinking Eldar was unfortunately right about the N900 not being stable... although the instability doesn't seem to affect every N900, Eldar did have a point, as we can see now.

Yep - I guess it's possible he may have gotten a bad unit. If it is bad memory, it's going to be a very tricky issue to nail down.

mikec 2009-11-27 22:40

Re: N900 Unexpected reboots
 
Guys,

If you read the bug report, you will see that the Nokia engineer is asking for information. Please try to provide it to help pin this down.

=============from bug report==================
If the reboot happens with pre-installed or flashed 42-11, the most important
information is what this command outputs in XTerm:
cat /proc/bootreason

If the reboot reason is "sw_rst" (instead of "32wd_to"), please provide also
output from this command:
cat /var/lib/dsme/stats/*

(i.e. which crucial system service going down needed device reboot.)


(In reply to comment #3)
> Two random device reboots here so far. Bootreasons sw_rst and 32wd_to,
> both happened while browsing the web. Unable to reproduce.
>
> ~ $ cat /var/lib/dsme/stats/lifeguard_resets
> /usr/bin/Xorg -logfile /tmp/Xorg.0.log -logverbose 1 -n: 1

This Xorg related sw_rst is most likely something we know about (hairy & very
rare SGX 3D driver memory management issue) and should be fixed in the next
release.

If they continue after the next release, please provide us Xorg core-dumps with
Crash reporter (they could be useful also before that).


The 32wd_to (HW watchdog rebooting the device due to it not being updated) is
more worrying.

For 32wd_to reboots we need to know when exactly they occur:
- After device goes to sleep (screen blanks etc)?
- When device wakes / is woken up from sleep?
- When device is being used?
-> If yes, do the reboots happen also in offline mode?

If (32wd_to) reboots happen only when NOT using offline mode, we need to know
whether this happens only in some particular networking environment.

I.e. if you for example have reboots at home, do they happen somewhere else
(e.g. at work) where you use a different WLAN/phone access point (with
potentially different power management etc settings). If they happen only with
some specific access points, please file a separate bug about that and provide
the exact model of that access point and whether you've changed any of its
default settings (or device default connectivity settings).

===========================================
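
The two commands from the bug report can be wrapped into one small helper that prints the boot reason and, for sw_rst, the DSME stats as well. This is just a convenience sketch: the function name `report_reboot` and its optional arguments are made up here, and only the two file locations come from the bug report.

```shell
# Hypothetical helper that gathers the output the bug report asks for.
# $1: boot reason file (default /proc/bootreason)
# $2: DSME stats directory (default /var/lib/dsme/stats)
report_reboot() {
    reason_file=${1:-/proc/bootreason}
    stats_dir=${2:-/var/lib/dsme/stats}
    reason=$(cat "$reason_file" 2>/dev/null)
    echo "bootreason: ${reason:-unknown}"
    case $reason in
        sw_rst)
            # A crucial system service went down; show which one.
            echo "software reset; DSME stats follow:"
            cat "$stats_dir"/* 2>/dev/null
            ;;
        32wd_to)
            echo "hardware watchdog timeout; note what the device was doing and whether it was in offline mode"
            ;;
        *)
            echo "other boot reason"
            ;;
    esac
}
```

Running `report_reboot` in XTerm right after an unexpected reboot and pasting the output into the bug report should cover the first two requests above.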


All times are GMT.