[Announce] System monitoring solution based on collectd/rrdtool/SystemDataScope
I would like to announce a system monitoring solution that is developed to be lightweight and to provide information relevant to mobile devices. It's based on the Sailfish port of collectd, rrdtool, and the GUI SystemDataScope.
Examples of the questions this solution can answer: Does my phone enter sleep? For how long? Which CPU frequencies are used? When did my RAM run out? What is the battery current? How much cellular traffic did I use? All these questions are relevant for monitoring different aspects of your device's performance, for example when determining the reasons behind battery drain. Users are expected to be able to run this solution 24x7 without any noticeable impact on battery life, CPU, RAM, or storage usage.

collectd
Homepage: https://collectd.org/
Sailfish port: https://github.com/rinigus/collectd
Packages: https://openrepos.net/content/rinigus/collectd

rrdtool
Sailfish packaging scripts: https://github.com/rinigus/pkg-rrdtool
Packages: https://openrepos.net/content/rinigus/rrdtool

SystemDataScope
Homepage: https://github.com/rinigus/systemdatascope
Packages: https://openrepos.net/content/rinigus/systemdatascope

Screenshots: see the SystemDataScope and collectd OpenRepos package descriptions.

When in use, the data is recorded by collectd and stored in RRD datasets. collectd runs as a daemon that should be enabled on installation. Data can be visualized using SystemDataScope, which uses rrdtool to generate graphs. SystemDataScope also shows a selected graph on its cover, allowing you to follow recent data (whether your device is entering CPU sleep, for example).

The goal is to keep records that cover several time windows, from hours and days up to a year in the default configuration. The data can be viewed on the device, and the GUI also allows you to generate reports that you could send as feedback on relevant forums.

The collected data covers CPU usage and sleep, battery, RAM and storage, network, radios, system load and processes. In addition to overall system data, you can also follow specific apps and see their CPU, RAM, and I/O usage. This is useful if you develop an app and want to profile it.
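Since the data lives in ordinary RRD files, it can also be inspected outside the GUI with rrdtool itself. The following is only a minimal sketch, not how SystemDataScope works internally: it parses text shaped like the output of `rrdtool fetch <file> AVERAGE`. The sample text, its numbers, and the data source name `value` are made up for illustration.

```python
def parse_rrd_fetch(text):
    """Parse `rrdtool fetch`-style text into (timestamp, [values]) rows.

    Unknown samples ("nan" / "-nan") are returned as None.
    """
    rows = []
    for line in text.splitlines():
        # Data lines look like "1481200000: 1.5e+00". The header line with
        # data source names and the blank line contain no colon; skip them.
        if ":" not in line:
            continue
        stamp, _, rest = line.partition(":")
        values = [None if "nan" in tok.lower() else float(tok)
                  for tok in rest.split()]
        rows.append((int(stamp), values))
    return rows


# Illustrative sample (hypothetical numbers), shaped like rrdtool fetch output.
SAMPLE = """\
                          value

1481200000: 2.3500000000e-02
1481200150: -nan
1481200300: 2.1000000000e-02
"""

for stamp, values in parse_rrd_fetch(SAMPLE):
    print(stamp, values)
```

Rows with None mark readouts for which no data was recorded in that time window.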
Used resources

collectd has a very small impact on CPU (~0.1% of wall time), RAM (~15MB RSS for collectd and the datasets kept in RAM), and storage (~10MB for all default datasets). collectd wakes up the device once every 2.5 minutes to perform the readout. SystemDataScope's main impact is RAM usage: ~55MB RSS used by the GUI and by rrdtool running in the background. Its CPU usage is negligible when minimized (the cover is redrawn once every 2 minutes) and average when you scroll through the graphs. As such, I would expect it to have no noticeable impact.

Current state

In general, everything is expected to work. There are several plugins that I developed for collectd which are not yet merged upstream. Judging by earlier experience with other plugins, I expect that the recorded data may change for these plugins, and users would then have to remove the old datasets recorded by them. Eventually, when everything is merged upstream, this inconvenience will disappear.

If something does not work, please send bug reports via GitHub by opening an issue, or report here.

Licenses: Open Source, see the corresponding package or module. |
Re: [Announce] System monitoring solution based on collectd/rrdtool/SystemDataScope
Hello, rinigus.
Thanks for the useful application, but sometimes I see on the charts Battery_current=0 A and Battery_power_consumption=hundreds of mW over the same time period (a few hours). It's strange. |
Re: [Announce] System monitoring solution based on collectd/rrdtool/SystemDataScope
cat /run/state/namespaces/Battery/Current
If that shows a correct non-zero value and you get 0 in the graphs, then we should look further. For that, maybe you could also post some example graphs (you could generate them with Gui/Report), your current status (Gui/Status) and /etc/collectd.conf? You could easily do that by opening an issue on https://github.com/rinigus/systemdatascope
cheers, rinigus |
Re: [Announce] System monitoring solution based on collectd/rrdtool/SystemDataScope
Tonight it was better for current/power_consumption, i.e. 0mA/98mW in the graphs. That gives 98mW/4.15V≈23.6mA. That is excessive for standby, but this is another problem... Maybe the current value in the history is different?.. I'll try GitHub, it's not very simple :-) I don't know how to add a screenshot here from a PC. |
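The estimate in the post above follows from I = P / V. A minimal sketch of that arithmetic, using the numbers quoted in the post (98 mW, 4.15 V); the function name is only illustrative:

```python
def current_ma(power_mw, voltage_v):
    """Average current in mA for a given power draw (mW) and voltage (V)."""
    if voltage_v <= 0:
        raise ValueError("voltage must be positive")
    # mW divided by V gives mA directly.
    return power_mw / voltage_v


print(round(current_ma(98, 4.15), 1))
```

This assumes the battery voltage stays roughly constant over the averaging window, which is only an approximation.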
Re: [Announce] System monitoring solution based on collectd/rrdtool/SystemDataScope
With screenshots, I usually mail them to myself and then upload from a PC. But with the latest version of SystemDataScope you can generate a report (Pulley menu, Report). When you call this function, all defined graphs are saved as PNG under /home/nemo/Documents/SystemDataScope/[DateTime]. You can see them in the Gallery app with your photos. Select and send the ones that are needed and, after sending them, delete the folder /home/nemo/Documents/SystemDataScope. That will also clean up your Gallery from all the graphs. The main advantage of the report graphs is that they are made on a white background and each file contains exactly one graph. If it's easier to paste pictures on talk.maemo.org, please do so.

Coming back to your problem. It could be induced by the fact that your device is in deep sleep. In general, that is very good news, and the small annoyance with the graphs should not distract us from the fact that your device is behaving as intended. If your explanation is right (which it probably is), you should see very long times under CPU details/CPU sleep details/Duration of a single suspend. By long times I mean anything significantly longer than 150 seconds. If you are going to post the graphs, then please also post "CPU sleep" and "Duration of a single suspend".

The problem that you see could be a more general issue that is hard to solve at present. In general, data is acquired by collectd in several threads and written to RRDs. On a PC, where the CPU is always on, that works very well. On Sailfish, I have to wake up the device using the keepalive library, read out the data, and let the device go back to sleep. I suspect that in your case, and it has happened on my phone as well, the phone sometimes either does not wake up (keepalive event is not fired?) or manages to fall asleep before the data is recorded.
Since there is inevitable variability in such a wakeup/sleep cycle, I had to increase the allowed time window for RRD writing, which could lead to the effect that you see. Namely, old data gets interpolated over a longer period of time. If the problem is that collectd is not able to record all the data during the awake window, this may get fixed when upstream developers help me with the port. I did submit the merge request in summer (https://github.com/collectd/collectd/pull/1736), but, since it's a rather complex problem, I haven't had a chance to work on it with the developers who know how to tackle collectd's multi-threaded internals :)

The interpolation is a problem for values that are recorded as just "current values". Namely, you record a datapoint and hope that it's an adequate representation of a variable during that time window. Values in StateFS represent "current values" and, as a result, could fluctuate a bit too much. I presume that's why you sometimes see such a high power consumption that is later interpolated over the whole deep sleep time window.

Fortunately, many values reported by the kernel use an additive approach. For example, the kernel keeps counters on internet connections that are incremented. For these values, collectd takes the derivative, which is a more accurate way to represent the network traffic irrespective of whether the device was sleeping in between or not. These should be better represented in your case as well (CPU sleep, for example).

I hope that this explanation is helpful. If your explanation is right, there are no problems with your device and you managed to hit either an issue with collectd data recording or a case where Sailfish just ignored the keepalive request and did not wake up in between. If it is a collectd problem, we'll get a chance to work on it as a part of https://github.com/collectd/collectd/pull/1736.
cheers, rinigus |
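The difference between "current value" readings and kernel counters described above can be illustrated with a small sketch. It only mimics the idea and is not collectd's or rrdtool's actual code; all numbers are invented, and the 150 s spacing is taken from the wake-up interval mentioned in the thread.

```python
def average_from_gauge(samples):
    """Average of instantaneous readings; each reading is assumed to
    represent the whole interval around it."""
    return sum(value for _, value in samples) / len(samples)


def average_rate_from_counter(samples):
    """Average rate from a monotonically increasing counter.

    Only the first and last readings matter, so readouts missed in
    between (for example during deep sleep) do not change the result.
    """
    (t0, c0), (t1, c1) = samples[0], samples[-1]
    return (c1 - c0) / (t1 - t0)


# (time in seconds, counter value in bytes), hypothetical numbers
counter = [(0, 0), (150, 3000), (300, 6000), (450, 9000), (600, 12000)]
counter_missed = [counter[0], counter[-1]]  # middle readouts lost

print(average_rate_from_counter(counter))         # 20.0
print(average_rate_from_counter(counter_missed))  # 20.0

# (time in seconds, power in mW): one reading catches a short spike
gauge = [(0, 100), (150, 100), (300, 900), (450, 100)]
gauge_missed = [(0, 100), (300, 900)]  # spike now covers half the window

print(average_from_gauge(gauge))         # 300.0
print(average_from_gauge(gauge_missed))  # 500.0
```

With missed readouts the counter-based rate is unchanged, while the gauge average shifts because a single spiky reading stands in for a much longer period.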
Re: [Announce] System monitoring solution based on collectd/rrdtool/SystemDataScope
Hi, rinigus.
Many thanks for this detailed explanation. I'll see and try. This story with the battery originates from the N9. There is a good application for N9 (and Symbian devices), EnergyProfiler. I know the current in standby mode (only 2G on) for N9 is 7...9mA (2mA for N52), and for a 1450 mAh battery -> 1450/9=161h=6.7days. The Aqua Fish battery has 2500 mAh, and at a current of 23 mA I get 108.7h=4.5days only. That is close to what I see in practice at the moment. It's bad. I don't know why consumption is three times higher, but I would like to reduce it. I know the difference between N9 and Jolla_C, but... :-) |
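The standby estimates in the post above are capacity divided by average current. A minimal sketch with the figures quoted there, assuming (as a simplification) that the full nominal capacity is usable and the current is constant:

```python
def standby_hours(capacity_mah, current_ma):
    """Idealized standby time in hours: capacity (mAh) / current (mA)."""
    return capacity_mah / current_ma


for name, capacity, current in [("N9", 1450, 9), ("Aqua Fish", 2500, 23)]:
    hours = standby_hours(capacity, current)
    print(f"{name}: {hours:.1f} h = {hours / 24:.1f} days")
```

Real standby time will be shorter, since usable capacity is below nominal and the current is not constant.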
Re: [Announce] System monitoring solution based on collectd/rrdtool/SystemDataScope
I guess the obvious places to look at are:
* CPU sleep %
* how long a single sleep lasts
* whether sleep is interrupted frequently: check the number of forks per second
* whether suspend attempts have a high success rate; if there are many failures, you can try to debug why
* the distribution of CPU frequencies; usually the lowest frequency dominates, since your phone is mainly waiting for some network packet to arrive
* cellular / wifi radio signal strength
Good luck! rinigus |
Re: [Announce] System monitoring solution based on collectd/rrdtool/SystemDataScope
@rinigus Battery_current and Battery_power_consumption are strange. The current is very low for this consumption. I'm not a programmer, and I don't understand how this is possible.
https://ptpb.pw/_Fyj.png https://ptpb.pw/qSnv.png https://ptpb.pw/fhZh.png https://ptpb.pw/RD1P.png https://ptpb.pw/5HVM.png https://ptpb.pw/UMD2.png https://ptpb.pw/G_T-.png https://ptpb.pw/2W1S.png https://ptpb.pw/IG8j.png https://ptpb.pw/wCmQ.png |
Re: [Announce] System monitoring solution based on collectd/rrdtool/SystemDataScope
From the graphs I can see that the device wakes up once in ~150 s and spends probably about 4.5 s awake. I think that the reporting on the collectd side should be fine here. Maybe on your device StateFS values are not updated during this period and that's why you see a bogus 0 for the current. While you stated earlier that the value is non-zero during sleep, we would have to see it by calling cat on /run/state/namespaces/Battery/Current . The tricky part is that you have to do it while the device is in deep sleep. One way is to start a terminal and enter
sleep 15m; date; cat /run/state/namespaces/Battery/Current
and hope that it fires during the deep sleep part (sleep would usually be counted on awake CPU time). If it's zero, then there is nothing I can do: it's just reported wrong by the OS. If it's some other value and collectd is reporting it wrong, then we would look into it deeper.

Now, looking at your graphs, all seems to be fine with the exception of the used CPU frequency. For some reason, your phone does not use frequencies below 800MHz. On my OnePlus X and Nexus 4, the lowest frequency was used the most. I think that's where you should get some improvement. I think I saw something regarding optimization of kernel settings on TJC. You may want to check there or on some AquaFish-related forums over here.

Please let us know whether changes to the governor help. It would be great to see the frequency distribution and whether it helped your battery. Even if we cannot always get the current graph perfectly right, the battery % reduction would already tell something. |
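One way to cross-check the CPU frequency distribution outside the GUI is the kernel's cpufreq statistics. The sketch below assumes that the device exposes /sys/devices/system/cpu/cpu0/cpufreq/stats/time_in_state (this depends on the kernel configuration and is not confirmed for any device in this thread) and that each line holds a frequency in kHz followed by the time spent at it. The sample numbers are invented.

```python
def frequency_shares(text):
    """Return {frequency_khz: share_of_time} from time_in_state-style text."""
    times = {}
    for line in text.splitlines():
        parts = line.split()
        if len(parts) != 2:
            continue  # skip blank or unexpected lines
        times[int(parts[0])] = int(parts[1])
    total = sum(times.values())
    if total == 0:
        return {freq: 0.0 for freq in times}
    return {freq: spent / total for freq, spent in times.items()}


# Hypothetical sample: "<frequency in kHz> <time spent>" per line.
SAMPLE = """\
200000 100
400000 300
800000 5000
1094400 600
"""

for freq, share in sorted(frequency_shares(SAMPLE).items()):
    print(f"{freq / 1000:.0f} MHz: {share:.1%}")
```

On a device, the text would come from reading the sysfs file instead of SAMPLE; comparing two readouts taken some hours apart gives the distribution for that period.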
Re: [Announce] System monitoring solution based on collectd/rrdtool/SystemDataScope
I can not download SystemDataScope. It complains about libkeepalive-glib.rpm in repos.
|
Re: [Announce] System monitoring solution based on collectd/rrdtool/SystemDataScope
Code:
pkcon refresh |
Re: [Announce] System monitoring solution based on collectd/rrdtool/SystemDataScope
I would like to make users aware of a bug reported by @ossi1967: https://github.com/rinigus/systemdatascope/issues/39 . At this stage, I don't know what causes it. However, please watch out for it on Sailfish X (maybe also on some other devices?). Those of you who are experiencing something like it, please report via GitHub or here.
|
Re: [Announce] System monitoring solution based on collectd/rrdtool/SystemDataScope
Never experienced this behaviour on the Nexus5 with this excellent piece of software!
|
Re: [Announce] System monitoring solution based on collectd/rrdtool/SystemDataScope
New minor updates of collectd and SystemDataScope are out and add swap statistics. The swap module was suggested by @Self-Perfection, thank you!
For those interested in swap usage and I/O (probably J1 and JC users), the plugin should be enabled automatically if you haven't altered /etc/collectd.conf manually. If you have and want swap stats, just uncomment the line https://github.com/rinigus/collectd/...ectd.conf#L186 . Swap plots will be enabled after you run the new collectd for a little while and press Generate in SystemDataScope/Settings. The plots are under the Memory overview section. |
Re: [Announce] System monitoring solution based on collectd/rrdtool/SystemDataScope
I will have to look into it, but devices with root partitions on sda (Xperia XZ2, XZ3; Pro1; OnePlus 5) will have lots of disk-activity related logs. As a result, collectd will be slow and may have an impact on battery. I am planning to fix it and will probably look into whether to update the collectd stack as well.
As a workaround, you can adjust /etc/collectd.conf to have the disk section as
Code:
<Plugin disk> |
Re: [Announce] System monitoring solution based on collectd/rrdtool/SystemDataScope
Updated collectd and rrdtool are out. Note that the updated collectd is built against the updated rrdtool and will probably not work with the older rrdtool. So, update both of them.
The main changes are in the configuration of collectd, which should now record disk activity only for the main storage devices. This change was required by new SFOS ports such as Xperia Tama and, I suspect, Pro 1 and OnePlus 5. In addition, the script syncing logs between /tmp and ~nemo has been enhanced to avoid false skipping of changed files by rsync. That was rather common on XZ2 for me. I have not decided whether to work on merging my changes with collectd upstream, as it's a significant amount of work that is not guaranteed to be merged. Maybe later. |
Re: [Announce] System monitoring solution based on collectd/rrdtool/SystemDataScope
Hi rinigus,
after upgrading my i4293 to 3.3.0.14 I noticed SDS and collectd had been uninstalled. Trying to reinstall resulted in error messages about nothing providing libgcrypt.so.11 which is being needed by collectd-5.5.0.git ... Is there any way around this? |
Re: [Announce] System monitoring solution based on collectd/rrdtool/SystemDataScope
collectd has been found to break the SFOS update to 3.3: https://together.jolla.com/question/...emove-failure/
A fix is released, but it has not been tested on an update yet. As the risk of messing up the update is too large, I would recommend uninstalling SystemDataScope and collectd before doing the SFOS update. If someone has tested the updated collectd while updating SFOS, please let me know whether it was successful. There are no changes in functionality; it's an update of the RPM install script only. |
Re: [Announce] System monitoring solution based on collectd/rrdtool/SystemDataScope
Hi rinigus,
I uninstalled/installed SDS and collectd for updating collectd to "collectd-5.5.0.git_.2020.04.29-1.19.1.jolla_.armv7hl.rpm". And "after upgrading my" JC to 3.3.0.16 "I noticed SDS and collectd had been uninstalled. Trying to reinstall resulted in error messages about nothing providing libgcrypt.so.11 which is being needed by collectd-5.5.0.git ..." |
Re: [Announce] System monitoring solution based on collectd/rrdtool/SystemDataScope
Unfortunately, we have https://github.com/rinigus/collectd/issues/10 . We will have to wait until 3.3 hits OBS. And then I will get an issue with supporting newer and older SFOS versions :(
|
Re: [Announce] System monitoring solution based on collectd/rrdtool/SystemDataScope
New build of collectd is available at OBS. This is not tested as I don't have my device updated yet.
OBS project: https://build.merproject.org/project...me:rinigus:sds |
Re: [Announce] System monitoring solution based on collectd/rrdtool/SystemDataScope
As tested and approved by @cy8aer, a recompile fixed the issues with collectd. Please use version collectd-5.5.0.git_.2020.05.05 if you are on SFOS 3.3.0. Otherwise, please use an earlier version. The package is updated at OpenRepos.
|
Re: [Announce] System monitoring solution based on collectd/rrdtool/SystemDataScope
rinigus, many thanks! It's OK.
|
Re: [Announce] System monitoring solution based on collectd/rrdtool/SystemDataScope
Hello.
This looks quite neat. I downloaded this because I had a strange problem that I thought monitoring might help in solving. On a couple of occasions, the device has discharged the battery from almost full to totally empty in just a few hours. This suggested a runaway process looping or otherwise chewing cycles. I realise it's probably a pain to do and maintain, but a sub-level view of top process CPU usage over time would help me isolate (and report) what is going on. Obviously it happens while I'm asleep.
Dweeb |
Re: [Announce] System monitoring solution based on collectd/rrdtool/SystemDataScope
As StateFS is not installed by default in SFOS 3.4.0.24, users have to install the missing StateFS support:
devel-su pkcon install statefs statefs-provider-bluez statefs-provider-connman statefs-provider-mce statefs-provider-ofono statefs-provider-power-udev statefs-provider-profile statefs-provider-qt5
Reboot after installation. After that, collectd will be able to start normally.
I have asked what is expected to be a replacement for StateFS, but no reply yet. Reference: https://forum.sailfishos.org/t/state...for-stats/3751 |
Re: [Announce] System monitoring solution based on collectd/rrdtool/SystemDataScope
@rinigus
Do not expect an 'answer' from Jolla on FSO. If you would like to know, the best/only thing is to raise it at the next dev community meeting. |
Re: [Announce] System monitoring solution based on collectd/rrdtool/SystemDataScope
New collectd and SystemDataScope have been released and should work on SFOS 4.0.1.
collectd has been synced with upstream to version 5.12. I had to adjust the daemon implementation that we use on SFOS, but it seems that it all works. As StateFS got deprecated, new plugins were written to track network and cellular signal strength. As they get the data from the ofono and connman DBus APIs, the corresponding plugins were written in Python. That turned out to be a rather simple way to extend collectd (see https://github.com/rinigus/collectd/...ish/src/python for those who are interested). As some plugins are using Python now, collectd-python will also be pulled in when installing updates. This should be done automatically when you update SDS.
SystemDataScope has been updated to support the new ofono and connman plugins. In addition, the graph definition script has been ported to python3, and a bug with CPU frequency distribution stats that was triggered on some phones has been fixed. |
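For readers curious what a Python plugin for collectd looks like, here is a minimal sketch. It is not the code from the repository linked above. The plugin name `example_signal` and the `read_strength` placeholder are hypothetical, and a real plugin would query ofono or connman over D-Bus instead of returning a fixed number. Only `collectd.register_read`, `collectd.Values` and `dispatch` are part of collectd's Python plugin interface.

```python
try:
    import collectd  # provided by collectd's python plugin at runtime
except ImportError:
    collectd = None  # lets the sketch run outside the collectd daemon


def read_strength():
    """Hypothetical placeholder for a signal strength query (percent)."""
    return 62


def clamp_percent(value):
    """Keep the reading inside the 0..100 range of the 'percent' type."""
    return max(0, min(100, value))


def read_callback():
    """Called by collectd on every read interval; dispatches one sample."""
    value = clamp_percent(read_strength())
    if collectd is not None:
        sample = collectd.Values(plugin="example_signal", type="percent",
                                 type_instance="strength")
        sample.dispatch(values=[value])
    return value


if collectd is not None:
    collectd.register_read(read_callback)
else:
    print(read_callback())
```

Inside collectd, such a module would be loaded through the python plugin section of /etc/collectd.conf; run on its own, it just prints the placeholder value.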
Re: [Announce] System monitoring solution based on collectd/rrdtool/SystemDataScope
Just pushed a bugfix for collectd. It should now be using the same ofono signal strength source as connman is using. Earlier, I was using strength from different ofono API access points. As a result, it seems that signal strength was underestimated (assuming that connman is doing it correctly).
|