maemo.org - Talk (https://talk.maemo.org/index.php)
-   Troubleshooting (https://talk.maemo.org/forumdisplay.php?f=6)
-   -   how to troubleshoot boot sequence? (https://talk.maemo.org/showthread.php?t=16056)

z2n 2008-02-02 08:05

how to troubleshoot boot sequence?
 
I'm running the latest version of OS2008 on my N810, and having consistent problems booting from the internal mmc card.

The boot process hangs at the "Nokia" splash screen--the progress bar fills the screen, but nothing happens after that. The problem is the same with or without the charger connected, and the tablet still has not finished booting even after several hours.

I don't think there's any problem with the mmc format or data--booting from the internal flash allows me to access the mmc card with no errors.

Here's what I've done to try to trace the boot process:
  • Changed the bootmenu.conf script to save the "dmesg" output to /tmp. There are no errors shown in the dmesg output.

  • Added an "echo" statement to each of the /etc/init.d scripts, logging each script name to a file in /tmp, in order to determine if the boot process was hanging in a script. Nothing is logged, but this may be because files written to mmc2:/tmp when booting from mmc2 are not available when I later reboot from internal flash to view the logfile.
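A possible way around the lost logfile, sketched under assumptions: if /tmp is a RAM-backed filesystem (which I believe is the case on OS2008), anything written there is gone after the reboot. Appending to a file on the card's own root filesystem and calling sync after each line should leave something readable from the other system. The names bootlog and BOOTLOG and the path /var/log/bootseq.log are made up for illustration.

```shell
#!/bin/sh
# Sketch only: bootlog and BOOTLOG are illustrative names, not part of Maemo.
# BOOTLOG must point at a filesystem that survives a reboot (not a tmpfs).
BOOTLOG=${BOOTLOG:-/var/log/bootseq.log}

bootlog() {
    # Append one line per call, then flush it to storage before continuing,
    # so the entry is not lost if the device hangs or resets right after.
    echo "$*" >> "$BOOTLOG"
    sync
}
```

Each init script (or the rc script) could then call something like `bootlog "starting $0"`. After the failed boot, the file should be found under the mount point of the mmc root filesystem.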

So, are there any suggestions for additional ways to trace and troubleshoot the boot process? What happens in the Maemo boot process after the init scripts complete that can cause the process to hang, and how can I get more detail on what step is hanging?

Thanks!

fanoush 2008-02-02 11:32

Re: how to troubleshoot boot sequence?
 
Instead of editing each script in /etc/init.d, you can edit /etc/init.d/rc and add debug code to the startup() function, like this:

Code:

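#
# Print a one-line debug message on the boot screen.
#
dbgout() {
        # Clear the strip where the message goes, then draw the message.
        chroot /mnt/initfs text2screen -x 0 -y 60 -w 800 -h 20 -c
        # "$*" passes all arguments to -t as one string.
        chroot /mnt/initfs text2screen -s 2 -H center -y 60 -T 0 -t "$*"
}

#
# Start script or program.
#
startup() {
        dbgout "$@"
        case "$1" in
        *.sh)
                $debug sh "$@"
                ;;
        *)
                $debug "$@"
                ;;
        esac
}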

Then you can see what runs at startup and where it stops.

You can also install syslog and look at /var/log/syslog after an unsuccessful boot.

To install something on the non-booting system, first boot the working system, connect to the network, mount the non-booting system and chroot into it:
Code:

mount /dev/mmcblk0p2 /opt
chroot /opt

Then you can run 'apt-get install sysklogd' from the maemo repository, exit the shell and unmount the partition. Try to boot the bad system again and wait until it hangs or reboots. Then boot the working system, mount the bad system and look at the end of /var/log/syslog.

If there is no /var/log/syslog, you may need to create it the first time (not sure about this).
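The last step (reading the log of the bad system from the working one) could be wrapped up roughly like this. It is only a sketch: run() and show_bad_syslog() are hypothetical helpers, the device and mount point are the ones from the commands above, and it needs root. With DRY_RUN set, the commands are printed instead of executed.

```shell
#!/bin/sh
# Sketch only: run() and show_bad_syslog() are illustrative helpers.
# With DRY_RUN set to a non-empty value, commands are printed, not executed.
run() {
    if [ -n "$DRY_RUN" ]; then
        echo "+ $*"
    else
        "$@"
    fi
}

show_bad_syslog() {
    dev=${1:-/dev/mmcblk0p2}   # partition holding the non-booting system
    mnt=${2:-/opt}             # mount point used in the post above
    run mount "$dev" "$mnt" || return 1
    # Show the end of the log written during the failed boot.
    run tail -n 50 "$mnt/var/log/syslog"
    run umount "$mnt"
}
```

Calling `show_bad_syslog` with no arguments uses the defaults; other partitions or mount points can be passed as the first and second argument.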

z2n 2008-02-03 04:20

Re: how to troubleshoot boot sequence?
 
Thanks for responding so quickly!

Quote:

Originally Posted by fanoush (Post 137088)
Instead of editing each script in /etc/init.d, you can edit /etc/init.d/rc and add debug code to the startup() function, like this:

OK, that'll be a big help. Editing the individual scripts was no problem...I scripted the process.


The debugging shows that all of the scripts are called...now the display freezes with a screen that shows:


Booting from mmcint1 ...
/etc/rc2.d/S99zzinitdone


In a change from the boot process before editing /etc/init.d/rc, there is no blue progress bar visible at the bottom of the screen, and no "Nokia" graphic.

Do you have any suggestions about what comes next in the startup process...

Is there any way to boot the tablet in text mode (runlevel 3), or to get more debugging on the steps that follow S99zzinitdone?

SNIP!
Quote:

Originally Posted by fanoush (Post 137088)
you can also install syslog and see /var/log/syslog after unsuccessful boot

There doesn't seem to be a syslog package built for chinook...or at least it's not in any of the gronmayer repositories.

Thanks!

fanoush 2008-02-03 10:47

Re: how to troubleshoot boot sequence?
 
Quote:

Originally Posted by z2n (Post 137500)
In a change from the boot process before editing /etc/init.d/rc, there is no blue progress bar visible at the bottom of the screen, and no "Nokia" graphic.

It is not related. On my device the same change boots normally, apart from one line of the screen being overwritten with the message.
Quote:

Originally Posted by z2n (Post 137500)
Do you have any suggestions about what comes next in the startup process...

Is there any way to boot the tablet in text mode (runlevel 3), or to get more debugging on the steps that follow S99zzinitdone?

There is no text mode (except the serial console). zzinitdone is last. Try the same change on your working system: the hands show at S50, then the desktop appears, and many scripts are still being started when the desktop is already running.
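To compare the start order on the working and the non-booting system, the S* entries of a runlevel directory can be listed in the order the shell expands them, which is the order the rc script walks through them. A minimal sketch: list_start_scripts is an illustrative name, and the default directory is simply the /etc/rc2.d seen on the screen above.

```shell
#!/bin/sh
# Sketch only: list_start_scripts is an illustrative name.
# Prints the S* entries of a runlevel directory in glob (lexical) order.
list_start_scripts() {
    dir=${1:-/etc/rc2.d}
    for f in "$dir"/S*; do
        # Skip the unexpanded pattern when the directory has no S* entries.
        [ -e "$f" ] && basename "$f"
    done
}
```

For the mounted bad system the directory would be given explicitly, for example `list_start_scripts /opt/etc/rc2.d`.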
Quote:

Originally Posted by z2n (Post 137500)
There doesn't seem to be a syslog package built for chinook...or at least it's not in any of the gronmayer repositories.

It is called sysklogd.

charlie 2008-02-03 22:13

Re: how to troubleshoot boot sequence?
 
I'm getting a virtually identical set of symptoms when booting off the internal MMC on my N810.

I've updated /etc/init.d/rc and added a number of logging statements to help identify where booting freezes and the watchdog process reboots. Here's the tail of syslog after a failed boot:

Code:

Feb  3 21:34:10 Nokia-N810-50-2 user: Starting temp-reaper-startup.sh
Feb  3 21:34:10 Nokia-N810-50-2 DSME: Accepted new client connection
Feb  3 21:34:10 Nokia-N810-50-2 DSME: Closed a client connection
Feb  3 21:34:10 Nokia-N810-50-2 user: Starting dbus-sessionbus.sh
Feb  3 21:34:10 Nokia-N810-50-2 DSME: Accepted new client connection
Feb  3 21:34:10 Nokia-N810-50-2 DSME: Closed a client connection
Feb  3 21:34:10 Nokia-N810-50-2 user: Waiting for X
Feb  3 21:34:10 Nokia-N810-50-2 user: Starting sapwood
Feb  3 21:34:10 Nokia-N810-50-2 DSME: Accepted new client connection
Feb  3 21:34:10 Nokia-N810-50-2 DSME: Closed a client connection
Feb  3 21:34:10 Nokia-N810-50-2 user: Starting matchbox
Feb  3 21:34:10 Nokia-N810-50-2 DSME: Accepted new client connection
Feb  3 21:34:10 Nokia-N810-50-2 DSME: Closed a client connection
Feb  3 21:34:10 Nokia-N810-50-2 user: Waiting for D-BUS
Feb  3 21:34:10 Nokia-N810-50-2 waitdbus[932]: trying to connect to the system bus
Feb  3 21:34:10 Nokia-N810-50-2 waitdbus[932]: got connection
Feb  3 21:34:10 Nokia-N810-50-2 user: Starting media server
Feb  3 21:34:10 Nokia-N810-50-2 DSME: Accepted new client connection
Feb  3 21:34:12 Nokia-N810-50-2 init: Switching to runlevel: 6
Feb  3 21:34:12 Nokia-N810-50-2 DSME: Closed a client connection
Feb  3 21:34:13 Nokia-N810-50-2 exiting on signal 15

This section of the log was generated by /etc/osso-af-init/real-af-services and suggested to me that the lockup is being caused by the media server failing to start cleanly. However, I get the same reboot, at the same point, even when I comment out this section of the script.

Is it possible that a process started by a previous init script has hung and caused the watchdog to restart the device? If so, has anybody got any pointers for tracking it down?

Thanks

fanoush 2008-02-04 09:59

Re: how to troubleshoot boot sequence?
 
Quote:

Originally Posted by charlie (Post 137754)
Is it possible that a process started by a previous init script has hung and caused the watchdog to restart the device?

Well, yes, sort of, but not exactly. The hardware (Retu) watchdog reboots the device only if the kernel hangs completely or dsme dies, because then nobody pings the watchdog anymore (the Retu watchdog timeout is 63 seconds; dsme pings it every ~5 seconds). The problem here is dsme and its policy. DSME tries to start the system services, and when a service it considers critical fails to start, or starts but exits later, dsme gives up and switches to runlevel 6. So perhaps something dies here. Maybe the X server?

One can also boot into the usb networking recovery mode, log in, leave the shell running, then let the boot continue and examine the system (via ps or whatever) when it hangs somewhere. This needs a modification of bootmenu.sh so that it does not shut down usb networking. Here is the change for binding it to the menu key (the KEY_MENU test, and the if ... fi around the shutdown commands):
Code:

while true ; do
        key=`evkey -u -t 100000 /dev/input/${EVNAME}`
        [ "$key" = "$KEY_ESC" ] && break
        # the menu key also leaves the loop
        [ "$key" = "$KEY_MENU" ] && break
done
${T2S} -c
# shut down usb networking only when ESC was pressed
if [ "$key" = "$KEY_ESC" ] ; then
        killall dropbear
        killall utelnetd
        #sleep 1
        ifconfig usb0 down
        umount /dev/pts
        rmmod g_ether.ko
fi

The root filesystem gets switched under this shell too, so don't be confused; it should still work.

Also, if the system reboots, one more modification is needed to stop it from doing so (/etc/init.d/minireboot).

charlie 2008-02-05 07:05

Re: how to troubleshoot boot sequence?
 
Thanks, that approach is working and letting me see further into the boot sequence.

I've updated bootmenu.sh (but not disabled minireboot yet). Here's the process list just before (within 1 sec of) the boot failing and restarting

Code:

  PID Uid          VSZ Stat Command
    1 root        1468 SW   init [5]
    2 root             SWN  [ksoftirqd/0]
    3 root             SW   [watchdog/0]
    4 root             SW<  [events/0]
    5 root             SW<  [khelper]
    6 root             SW<  [kthread]
   16 root             SW<  [dvfs/0]
   67 root             SW<  [kblockd/0]
   68 root             SW<  [kseriod]
   81 root             SW<  [OMAP McSPI/0]
   88 root             SW<  [ksuspend_usbd]
   91 root             SW<  [khubd]
  115 root             SW   [pdflush]
  116 root             SW   [pdflush]
  117 root             SW<  [kswapd0]
  118 root             SW<  [aio/0]
  121 root             SW<  [mipid_esd]
  246 root             SW   [mtdblockd]
  287 root             SW<  [kondemand/0]
  288 root             SW<  [kmmcd]
  300 root             SW<  [krfcommd]
  313 root             SW<  [mmcqd]
  345 root        1084 SW<  dsme -d -l syslog -v 4 -p /usr/lib/dsme/libstartup.so
  350 root         564 SW   /usr/sbin/kicker
  355 root         776 SW   /usr/bin/bme_RX-44
  576 root         152 RW   /usr/sbin/utelnetd -l /bin/sh -d
  585 root         376 SW   /usr/sbin/dropbear -d /tmp/dropbear_dss_host_key -r /
  741 root        1044 SW   /bin/sh
 1406 root             SW<  [cx3110x]
 1459 root        1576 SW<  /sbin/udevd --daemon
 1660 root        1540 SW   /sbin/syslogd
 1714 root        1468 SW   /sbin/klogd
 1773 messagebus  1916 SW<  /usr/bin/dbus-daemon --system
 1779 haldaemon   3980 SW   /usr/sbin/hald
 1780 root        2800 SW   hald-runner
 1787 root        2436 SW   /usr/lib/hal/hald-addon-omap-gpio
 1788 root        2436 SW   /usr/lib/hal/hald-addon-omap-gpio
 1789 root        2436 SW   /usr/lib/hal/hald-addon-omap-gpio
 1790 root        2436 SW   /usr/lib/hal/hald-addon-omap-gpio
 1791 root        2436 SW   /usr/lib/hal/hald-addon-omap-gpio
 1792 root        2436 SW   /usr/lib/hal/hald-addon-omap-gpio
 1793 haldaemon   2508 SW   hald-addon-usb-cable: listening on /sys/devices/plat
 1794 root        2940 SW   hald-addon-input: Listening on /dev/input/event2 /dev
 1795 root        2436 SW   /usr/lib/hal/hald-addon-mmc
 1796 root        2436 SW   /usr/lib/hal/hald-addon-mmc
 1798 root        2952 SW   /usr/lib/hal/hald-addon-cpufreq
 1825 root        3636 SW<  /sbin/mce --force-syslog
 1828 messagebus  3324 SW   /usr/lib/gconf2/gconfd-2
 1875 user        1312 SW<  /usr/sbin/temp-reaper
 1879 user        1916 SW<  /usr/bin/dbus-daemon --session
 1885 user        6776 SW<  /usr/lib/sapwood/sapwood-server
 1890 user        5760 SW<  /usr/bin/matchbox-window-manager -theme echo -use_tit
 1903 root             SW<  [dsp/0]
 1906 root             SW<  [dsp/0]
 1909 root        2952 SW   /usr/sbin/dsp_dld -p --disable-restart -c /lib/dsp/ds
 1917 root        2792 SW<  /usr/bin/bme-dbus-proxy -N
 1980 root        4804 SW   /usr/sbin/multimediad
 1987 root        2176 SW<  /usr/bin/esd
 1994 root        1960 RW   ps

Being a newcomer to maemo, I'm not sure what should be running at this point and no obvious problems stand out to me - any advice welcomed!

I'll disable minireboot next and see what results that gives.

fanoush 2008-02-05 08:29

Re: how to troubleshoot boot sequence?
 
There is no X server running (/usr/bin/Xomap), and this is fairly critical. The matchbox window manager has already been started, so the X server should be up by then too. I think this is the line
Code:

Feb  3 21:34:10 Nokia-N810-50-2 user: Waiting for X
in the previous syslog output. It waits and after some time (perhaps when matchbox times out trying to connect to the display) it gives up. Try to run /etc/init.d/x-server by hand to see possible errors.
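The same check can be made quickly from the recovery shell, without reading through the whole process list. A small sketch; proc_running is an illustrative helper, not a Maemo tool. It reads a process listing on stdin and succeeds if the given name appears in it.

```shell
#!/bin/sh
# Sketch only: proc_running is an illustrative helper.
# Reads a process listing on stdin; exit status 0 means the name was found.
proc_running() {
    # Drop grep's own entries first so the search cannot match itself.
    grep -v 'grep' | grep -q -- "$1"
}
```

Used as `ps | proc_running Xomap || echo "X server is not running"`, it should print the message for the listing above, since no Xomap line appears in it.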

charlie 2008-02-05 09:02

Re: how to troubleshoot boot sequence?
 
I can see how that might be classed as fairly critical! :)

I'll see if manually restarting X gives any clues.

Thanks.

charlie 2008-02-05 21:20

Re: how to troubleshoot boot sequence?
 
1 Attachment(s)
Well, sure enough X is exiting - here's the evidence from syslog:

Code:

Feb  5 20:28:06 Nokia-N810-50-2 DSME: Closed a client connection
Feb  5 20:28:06 Nokia-N810-50-2 DSME: process '/usr/bin/Xomap -mouse tslib -nozap -dpi 96 -wr -nolisten tcp' with pid 1014 exited with return value: 1
Feb  5 20:28:06 Nokia-N810-50-2 DSME: '/usr/bin/Xomap -mouse tslib -nozap -dpi 96 -wr -nolisten tcp' exited with RESET policy -> reset
Feb  5 20:28:06 Nokia-N810-50-2 DSME: Here we will request for sw reset
Feb  5 20:28:06 Nokia-N810-50-2 DSME: Here we could do some bookkeeping..
Feb  5 20:28:06 Nokia-N810-50-2 user: Starting temp-reaper-startup.sh
Feb  5 20:28:06 Nokia-N810-50-2 DSME: Accepted new client connection
Feb  5 20:28:06 Nokia-N810-50-2 DSME: Closed a client connection

I've attached a zip containing the full syslog from this boot (truncated at the point where matchbox is repeatedly restarted).
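For a long syslog it may help to filter out only the lines where DSME reports a process exit or init changes runlevel. A minimal sketch, assuming the message wording seen in the excerpts in this thread; dsme_events is a made-up name.

```shell
#!/bin/sh
# Sketch only: dsme_events is an illustrative name.
# Keeps the syslog lines where DSME reports that a process exited,
# plus the lines where init switches runlevel.
dsme_events() {
    grep -e 'DSME: .*exited with' -e 'init: Switching to runlevel' "$1"
}
```

Run against the log of the bad system, for example `dsme_events /opt/var/log/syslog`, the first line printed should name the process whose exit triggered the reset.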

Manually starting X (executing "/usr/bin/Xomap -mouse tslib -nozap -dpi 96 -wr -nolisten tcp") produces the following:

Code:

The XKEYBOARD keymap compiler (xkbcomp) reports:
> Warning:          Multiple names for keycode 138
>                  Using <I138>, ignoring <PROP>
> Warning:          Multiple names for keycode 140
>                  Using <I140>, ignoring <FRNT>
> Warning:          Multiple names for keycode 211
>                  Using <I211>, ignoring <AB11>
Errors from xkbcomp are not fatal to the X server

X appears to continue running; however, there are no visible changes on the tablet screen at this point, and nothing further is reported to the shell connection.
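Since DSME logged an exit value of 1 while the manual run seems to keep going, it might be worth capturing the output and the exit status of the manual run in a file. A sketch under assumptions: run_logged is a hypothetical helper, and the log path in the usage note is only an example.

```shell
#!/bin/sh
# Sketch only: run_logged is an illustrative helper.
# Runs a command, appends its stdout and stderr to a log file,
# records the exit status there too, and returns that status.
run_logged() {
    log=$1
    shift
    "$@" >> "$log" 2>&1
    status=$?
    echo "exit status: $status" >> "$log"
    return "$status"
}
```

For example: `run_logged /home/user/xomap.log /usr/bin/Xomap -mouse tslib -nozap -dpi 96 -wr -nolisten tcp`. The status line is only written once the command ends, so if it never shows up, X really is still running.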

Any other suggestions, or should I give up on this installation and roll back to a previous backup?


All times are GMT.