Posts: 486 | Thanked: 154 times | Joined on Sep 2009 @ New York City
#111
It just rebooted again: 3 reboots in 7 days. I was only browsing this site in MicroB when it happened, with no other apps running.
 
Posts: 2,014 | Thanked: 1,581 times | Joined on Sep 2009
#112
Originally Posted by hypnotik
It just rebooted again: 3 reboots in 7 days. I was only browsing this site in MicroB when it happened, with no other apps running.
Interesting. Are you running a logging script?

It wouldn't be a bad idea to run vmstat at, say, 10-second intervals, writing out to a file on your SD card. Perhaps log the output of top to a file as well.
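
A minimal sketch of such a logging loop, assuming a POSIX-style shell on the device. The function name `log_snapshots` and its parameters are made up for illustration. It timestamps each sample and calls `sync` after every write so that a sudden reboot loses as little of the log as possible.

```shell
#!/bin/sh
# Append timestamped snapshots of a command's output to a log file.
# log_snapshots and its parameter names are illustrative, not from the thread.

log_snapshots() {
    log=$1        # output file, e.g. somewhere on the memory card
    interval=$2   # seconds to wait between samples
    count=$3      # number of samples; 0 means run until interrupted
    shift 3       # everything left is the command to sample

    i=0
    while [ "$count" -eq 0 ] || [ "$i" -lt "$count" ]; do
        {
            echo "=== $(date '+%Y-%m-%d %H:%M:%S') ==="
            "$@" 2>&1
        } >> "$log"
        sync      # flush buffered data to storage before the next sample
        i=$((i + 1))
        # Skip the final sleep when a fixed number of samples was requested.
        if [ "$count" -eq 0 ] || [ "$i" -lt "$count" ]; then
            sleep "$interval"
        fi
    done
}
```

For example, `log_snapshots /path/on/sdcard/vmstat.log 10 0 vmstat` would sample vmstat every 10 seconds until stopped (the path is a placeholder). If your top supports a batch mode, the same function could sample it too.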
__________________
Class .. : Power Poster, Potential Coder
Humor .. : [*********] Alignment: Chaotic Evil
Patience : [***-------] Weapon(s): +2 Logic Mace
Agro ... : |*****-----] Relic(s) : G1, N900

 
Posts: 486 | Thanked: 154 times | Joined on Sep 2009 @ New York City
#113
Originally Posted by Bratag
Interesting. Are you running a logging script?

It wouldn't be a bad idea to run vmstat at, say, 10-second intervals, writing out to a file on your SD card. Perhaps log the output of top to a file as well.
I've done this, but the file keeps getting corrupted. Maybe I need a way to netcat it to a remote server.
 
Posts: 486 | Thanked: 154 times | Joined on Sep 2009 @ New York City
#114
So are these reboots related to some kernel-level issue? Would user-level processes cause a random reboot? I thought that, worst case, the web browser would crash.
 
Posts: 2,014 | Thanked: 1,581 times | Joined on Sep 2009
#115
I am wondering if we might be dealing with some bad memory chips here. It's possible, I guess, that when swap/RAM usage reaches a certain spot in memory, what's written gets corrupted and things go pear-shaped.

That might explain the random nature of the reboots.

 
Posts: 474 | Thanked: 283 times | Joined on Oct 2009 @ Oxford, UK
#116
Originally Posted by hypnotik
So are these reboots related to some kernel-level issue? Would user-level processes cause a random reboot? I thought that, worst case, the web browser would crash.
The reboot note says it's due to the watchdog.

Note: The following is based on generic experience with other devices. I don't have my N900 yet.

Normally you'd hook the watchdog through the kernel to one or more userspace apps, so that if userspace locks up or a critical daemon dies, it will reboot. There's no point in the system continuing to run if the kernel is fine but a critical part of userspace is gone.

However, if it's due to a userspace app dying, closing its watchdog file descriptor, or failing to ping the kernel watchdog device, the kernel could log which userspace app triggered the reboot before rebooting.

If it's due to the kernel crashing, it might not be able to note anything, and the reboot-reason register would simply record it as a watchdog hardware reboot.

So there should be enough data to tell the difference, and if not, it could be added with a kernel patch.
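
To make the "ping the watchdog device" idea concrete, here is a rough sketch of the generic Linux watchdog device convention. It is not necessarily how the N900's own system software does it. The function name `pet_watchdog` and its parameters are hypothetical. Don't point it at the real watchdog node on a device you care about: on a typical Linux system, opening that node arms the timer, and a missed ping reboots the machine.

```shell
#!/bin/sh
# Illustration of a userspace watchdog "petting" loop using the generic
# Linux watchdog device convention. pet_watchdog is a hypothetical helper.

pet_watchdog() {
    dev=$1        # watchdog device node (pass a scratch file to experiment)
    interval=$2   # seconds between pings; must be shorter than the timeout
    pings=$3      # how many pings to send before shutting down cleanly

    {
        n=0
        while [ "$n" -lt "$pings" ]; do
            printf '.' >&3    # any write resets the hardware countdown
            n=$((n + 1))
            sleep "$interval"
        done
        # "Magic close": writing V before closing asks the driver to disarm
        # the timer, if the driver supports it and it is not built "no way out".
        printf 'V' >&3
    } 3> "$dev"               # fd 3 stays open for the whole loop
}
```

If the loop stalls or the process is killed, the pings stop and the timer expires. From the outside that looks the same whether the cause was a dead daemon or a wedged system, which is why the extra logging matters.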
 
Posts: 474 | Thanked: 283 times | Joined on Oct 2009 @ Oxford, UK
#117
Originally Posted by Bratag
I am wondering if we might be dealing with some bad memory chips here. It's possible, I guess, that when swap/RAM usage reaches a certain spot in memory, what's written gets corrupted and things go pear-shaped.

That might explain the random nature of the reboots.
Yes, it could.

It could also be PCB quality control: with all the signal timings being so tight, a few PCBs may not meet them consistently, and random faults occur. I've had this problem with other devices (I work in embedded Linux generally).

Also coming to mind as a potential culprit: the "swap to flash" feature, where there's 256MB RAM swapping onto 1GB flash. That's not a feature commonly used as far as I know. If there are timing-dependent or flash-layout-dependent bugs in that area, it could have the same effect as bad memory - i.e. random failures - but wouldn't be due to faulty hardware so much as natural variations in the flash response triggering software issues. If there's enough spare memory, this could be tested with, I presume, a "swapoff -a" command and then using the device long enough to see if the reboots have subsided.
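
A hedged sketch of the "is there enough spare memory" check before trying swapoff. The helper name `swap_fits_in_ram` is made up. It reads the standard /proc/meminfo fields and treats free, buffer, and cache memory as reclaimable, which is only a rough heuristic.

```shell
#!/bin/sh
# Rough check: does the swap currently in use fit into reclaimable RAM?
# swap_fits_in_ram is a hypothetical helper; the field names are the
# standard ones from /proc/meminfo (values in kB).

swap_fits_in_ram() {
    meminfo=${1:-/proc/meminfo}
    awk '
        $1 == "MemFree:"   { free  = $2 }
        $1 == "Buffers:"   { buf   = $2 }
        $1 == "Cached:"    { cache = $2 }
        $1 == "SwapTotal:" { stot  = $2 }
        $1 == "SwapFree:"  { sfree = $2 }
        END {
            used  = stot - sfree
            avail = free + buf + cache
            printf "swap used: %d kB, reclaimable RAM: %d kB\n", used, avail
            if (used <= avail) exit 0
            exit 1
        }' "$meminfo"
}
```

Used as `swap_fits_in_ram && swapoff -a` (as root), it only disables swap when the heuristic passes. Afterwards, `cat /proc/swaps` should list no active swap areas.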
 

Posts: 650 | Thanked: 619 times | Joined on Nov 2009
#118
Originally Posted by jjx
Yes, it could.

It could also be PCB quality control: with all the signal timings being so tight, a few PCBs may not meet them consistently, and random faults occur. I've had this problem with other devices (I work in embedded Linux generally).

Also coming to mind as a potential culprit: the "swap to flash" feature, where there's 256MB RAM swapping onto 1GB flash. That's not a feature commonly used as far as I know. If there are timing-dependent or flash-layout-dependent bugs in that area, it could have the same effect as bad memory - i.e. random failures - but wouldn't be due to faulty hardware so much as natural variations in the flash response triggering software issues. If there's enough spare memory, this could be tested with, I presume, a "swapoff -a" command and then using the device long enough to see if the reboots have subsided.
That's insightful... Isolating the root causes of random failures really takes experience, and the last few posts may point out a few areas that Nokia can look into.

This random reboot issue is the most critical of the N900 issues I've read about, because of its randomness. I can't help thinking Eldar was unfortunately right about the N900 not being stable... Although the instability doesn't seem to affect every N900, Eldar did have a point, as we can see now.
 
Posts: 2,014 | Thanked: 1,581 times | Joined on Sep 2009
#119
Originally Posted by sony123
That's insightful... Isolating the root causes of random failures really takes experience, and the last few posts may point out a few areas that Nokia can look into.

This random reboot issue is the most critical of the N900 issues I've read about, because of its randomness. I can't help thinking Eldar was unfortunately right about the N900 not being stable... Although the instability doesn't seem to affect every N900, Eldar did have a point, as we can see now.
Yep - I guess it's possible he may have gotten a bad unit. If it is bad memory, it's going to be a very tricky issue to nail down.

 
Posts: 1,366 | Thanked: 1,185 times | Joined on Jan 2006
#120
Guys,

If you read the bug report, you will see that the Nokia engineer is asking for information. Please try to provide it to help pin this down.

=============from bug report==================
If the reboot happens with pre-installed or flashed 42-11, the most important
information is what this command outputs in XTerm:
cat /proc/bootreason

If the reboot reason is "sw_rst" (instead of "32wd_to"), please provide also
output from this command:
cat /var/lib/dsme/stats/*

(i.e. which crucial system service going down needed device reboot.)


(In reply to comment #3)
> Two random device reboots here so far. Bootreasons sw_rst and 32wd_to,
> both happened while browsing the web. Unable to reproduce.
>
> ~ $ cat /var/lib/dsme/stats/lifeguard_resets
> /usr/bin/Xorg -logfile /tmp/Xorg.0.log -logverbose 1 -n: 1

This Xorg related sw_rst is most likely something we know about (hairy & very
rare SGX 3D driver memory management issue) and should be fixed in the next
release.

If they continue after the next release, please provide us Xorg core-dumps with
Crash reporter (they could be useful also before that).


The 32wd_to (HW watchdog rebooting the device due to it not being updated) is
more worrying.

For 32wd_to reboots we need to know when exactly they occur:
- After device goes to sleep (screen blanks etc)?
- When device wakes / is woken up from sleep?
- When device is being used?
-> If yes, do the reboots happen also in offline mode?

If (32wd_to) reboots happen only when NOT using offline mode, we need to know
whether this happens only in some particular networking environment.

I.e. if you for example have reboots at home, do they happen somewhere else
(e.g. at work) where you use a different WLAN/phone access point (with
potentially different power management etc settings). If they happen only with
some specific access points, please file a separate bug about that and provide
the exact model of that access point and whether you've changed any of its
default settings (or device default connectivity settings).

===========================================
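
To save some typing after a reboot, the two commands from the bug report could be wrapped up roughly like this. The function name `collect_reboot_info` is made up. The two default paths are the ones quoted in the report, and the `sw_rst` / `32wd_to` values are the only reasons it mentions.

```shell
#!/bin/sh
# Gather the information requested in the bug report after a reboot.
# collect_reboot_info is a hypothetical helper; the default paths are
# the ones quoted in the bug report.

collect_reboot_info() {
    reason_file=${1:-/proc/bootreason}
    stats_dir=${2:-/var/lib/dsme/stats}

    reason=$(cat "$reason_file" 2>/dev/null) || reason=unknown
    echo "bootreason: $reason"

    case $reason in
        sw_rst)
            # Software reset: show which system service going down caused it.
            for f in "$stats_dir"/*; do
                [ -f "$f" ] || continue
                echo "--- $f"
                cat "$f"
            done
            ;;
        32wd_to)
            # Hardware watchdog timeout: there are no stats to dump, so
            # remind the user what the engineer asked them to record.
            echo "hardware watchdog timeout: note whether the device was asleep, waking up or in use, and whether offline mode was on"
            ;;
    esac
}
```

Running `collect_reboot_info` in XTerm right after a reboot and pasting the output into the bug report should cover the first two requests.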

Last edited by mikec; 2009-11-27 at 22:47. Reason: added information request from bug report
 
