maemo.org - Talk - N900, ohmd, syspart, VM & swap tweaks
https://talk.maemo.org/showthread.php?t=71115

N900, ohmd, syspart, VM & swap tweaks

Hi all,

This post is intentionally kept short and clear so that it can serve as a scratch page, updated only with confirmed findings and conclusions. My goal is to find the best configuration (if any) of system files for a given use case. When something is clearly working (meaning many people report that it works for their use case), we can move that single item to a wiki.

Here's the usual warning:
WARNING FOR EVERYBODY - MANY OF THESE TWEAKS COULD LEAD TO A NON-WORKING N900, REQUIRING A REFLASH TO RESTORE IT IF YOU MAKE A MISTAKE - SO BE AWARE!!!

Now you can go on reading; you've been warned.

NO DATA CONFIRMED OR AVAILABLE YET

Things under investigation:
VM tweaks
SYSPART / OHMD tweaks
SCHEDULER / sys/block tweaks
OTHER tweaks

Re: N900, ohmd, syspart, VM & swap tweaks

First of all, if a single document covering these topics already exists, please point me in the right direction; that would be very kind of you! That said, this post is a long one.
Lots of comments have been posted during the last year and a half, together with scripts, script collections and programs later completed with a GUI (thank you DeBernardis & Saturn!!! and all the others who contributed; I hope I thanked each of you whenever I found something useful), but until today I have not found a summary with explanations and conclusions on the many details mentioned in the subject. One thing we can all agree on, I think, is that most of the lagginess on our beloved 900s is due to memory constraints. With 512M of RAM, the number of threads full of people wanting to smash the phone against a wall would probably be 1/10 or 1/100 of what it is today, knowing that the VSYNC problem will anyway make our 900s feel slow, perhaps forever (or until Stskeeps changes his mind and finally decides to try to compile vsync against 2.6.28 kernels... :) I know, I know, you said many times it is almost impossible due to the many changes required in the source, but we can always hope!).
That said, over a year I tried a LOT of these tweaks. I am a curious person, and I learnt a lot about the kernel internals of the VM and the like. On one side it has been rather rewarding, sometimes frustrating, but for a week now I have really felt that 'urge to change something' going down. To verify it is not a placebo effect, this morning I used my 900 for half a day in its original configuration (not from scratch: with the full bloat of software I have installed, but with no tweaks applied, in its standard configuration except for the overclock), and now I can rather firmly claim that, for my use case, the difference does exist. I have now happily reverted to the tweaked configuration.
I also ran some rough benchmarks, and their results often left me without clear ideas. So I decided the final judge would be the way I use the phone day by day. On one side this decision is very important because, whatever we might say, we have an N900 to USE it, not only to hack on it. On the other side, it is rather hard to stay objective when travelling in the realm of feelings, so I decided to write this post to find some other testers willing to compare opinions.
I also installed some test tools from the tools repository, IOSTAT perhaps being the most important. I modified Conky to show dirty pages, writebacks and uninterruptible processes, updated once per second. I started working in a systematic manner around a month ago, but decided not to share anything until I had some clear ideas on what I wanted and what I was looking for from those experiments. By working, I mean not just changing something and saying: "yes, it feels better". I mean methodically changing 1 parameter, firing a test script with 128M dd to and from swap partitions, firing at least three memory hogs (for example the browser with standard pages with flash ads, Maps and mobilestellarium), and keeping htop and my modified Conky running in the background all day. When testing, this modified Conky+htop alone keeps the system load around 1 when the screen is on and Xorg is working, so I hope the stress test is good enough. The battery never reached the 4-hour mark while testing, with the phone always warm... I cannot tell how many times my phone rebooted under such high loads with modified parameters, and especially how many phone calls I lost while doing those tests :)!!!
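A minimal sketch of a dd-style throughput test like the one described above. The post's script wrote directly to swap partitions; this version uses an ordinary file instead, which avoids clobbering an active swap area. The function names and block sizes are illustrative assumptions, not the author's script.

```python
import os
import time

MIB = 1024 * 1024

def write_throughput(path, total_mib=128, block_kib=1024):
    """Write total_mib MiB of zeros to path and fsync; return (MiB/s, seconds)."""
    block = b"\0" * (block_kib * 1024)
    blocks = total_mib * 1024 // block_kib
    start = time.monotonic()
    with open(path, "wb") as f:
        for _ in range(blocks):
            f.write(block)
        f.flush()
        os.fsync(f.fileno())  # include the time to reach the device, not just the page cache
    elapsed = max(time.monotonic() - start, 1e-9)
    return total_mib / elapsed, elapsed

def read_throughput(path, block_kib=1024):
    """Read path sequentially; return (MiB/s, seconds)."""
    size = os.path.getsize(path)
    start = time.monotonic()
    with open(path, "rb") as f:
        while f.read(block_kib * 1024):
            pass
    elapsed = max(time.monotonic() - start, 1e-9)
    return size / MIB / elapsed, elapsed
```

Point it at a file on the partition under test. A read pass right after the write mostly measures RAM; running `sync; echo 3 > /proc/sys/vm/drop_caches` as root in between makes the read hit the card instead.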

I think a definitive conclusion will be almost impossible to reach, because VM organization and prioritizing is not a simple matter, and a good part of it is pure math. But gaining at least some confidence that, if I use my N900 as a media server for example, some modifications will help seems a reasonable goal to me!

The next post will summarize my tests. The idea is to keep the thread as clear as possible in order to collect all the info in the first post. So please, if somebody would like to join and share their experiences, try to follow the scheme I am explaining:
SETUP:
N900
mmc: yes/no, which one
kernel: stock/modified, which one
USAGE
short description of your use case
TWEAKS APPLIED
grouped by the area they affect
WHY THOSE TWEAKS?
This is the trickiest part. It would be nice to explain WHY and HOW you came to the conclusion that the modifications make the N900 feel better: technical background and kind of response (feeling, stress test, benchmark...)

So let's start, hoping somebody will follow me in this crazy job. After all the work I did, my feeling is that Nokia engineers did a very good job on their part, in spite of some comments stating the opposite. Keep in mind that they had to provide a resilient machine rather than a top-performing one, optimizing what they had, and, considering the kind of device the 900 is, I think they did a really great job.

Re: N900, ohmd, syspart, VM & swap tweaks

All that said, here is a summary of my experiences so far.

SETUP:
N900
8GB class 6 uSD card
Power kernel, std overclock 850 (sometimes I go up to 1100 while watching a quick video or using Gnumeric heavily)

USAGE:
Browser (maemo.org, home banking, other forums; no flash video, flash adverts blocked), 4 online IM accounts OR (mutually exclusive) bluetooth tethering for my PC, mediaplayer for OGGs, Sygic maps, games from time to time, calendar and obviously PHONE

TWEAKS:

VM
Swap on MMC

modified /proc/sys/vm as follows:

swappiness 70

dirty_ratio 8

dirty_background_ratio 4

vfs_cache_pressure 1000

page-cluster 6

oom_kill_allocating_task 1

min_free_kbytes 4096

MMC QUEUE
modified /sys/block/mmcblk1/queue

nr_requests 32

read_ahead_kb 512
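The VM and queue values above are plain numbers written into files under /proc/sys/vm and /sys/block/mmcblk1/queue. A minimal sketch of applying them, assuming root and that mmcblk1 is the uSD card as in the post; `apply` is a hypothetical helper, not part of any N900 package, and it defaults to a dry run that only reports old and new values.

```python
from pathlib import Path

# Values from the post; keys are the file names in each directory.
VM = {
    "swappiness": 70,
    "dirty_ratio": 8,
    "dirty_background_ratio": 4,
    "vfs_cache_pressure": 1000,
    "page-cluster": 6,
    "oom_kill_allocating_task": 1,
    "min_free_kbytes": 4096,
}
QUEUE = {"nr_requests": 32, "read_ahead_kb": 512}

def apply(settings, directory, dry_run=True):
    """Return (path, old, new) for each setting; write only if dry_run is False."""
    changes = []
    for name, value in settings.items():
        path = Path(directory) / name
        old = path.read_text().strip() if path.exists() else None
        changes.append((str(path), old, str(value)))
        if not dry_run:
            path.write_text(f"{value}\n")  # needs root; lost on reboot
    return changes
```

For example, `for path, old, new in apply(VM, "/proc/sys/vm"): print(path, old, "->", new)` previews the change, and `apply(QUEUE, "/sys/block/mmcblk1/queue", dry_run=False)` writes it. Nothing here survives a reboot, so it would have to run from a boot script.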

OHMD/SYSPART (modified /usr/share/policy/etc/rx51/syspart.conf)
Partitions:

partition desktop memory-limit 70M

partition desktop cpu-shares 4096

partition active_ui memory-limit 130M

partition standby_ui memory-limit 95M

partition background memory-limit 25M
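The five partition lines can be checked mechanically. A sketch that parses exactly the `partition <name> memory-limit|cpu-shares <value>` shape shown above; the mapping to the cgroup v1 files `memory.limit_in_bytes` and `cpu.shares`, and reading `M` as MiB, are assumptions about how ohmd/syspart applies them.

```python
import re

# Only the two directives that appear in the post are recognised.
LINE = re.compile(r"^partition\s+(\S+)\s+(memory-limit|cpu-shares)\s+(\d+)([KMG]?)$")
UNIT = {"": 1, "K": 1024, "M": 1024 ** 2, "G": 1024 ** 3}
# Assumed cgroup v1 control file behind each directive.
CGROUP_FILE = {"memory-limit": "memory.limit_in_bytes", "cpu-shares": "cpu.shares"}

POST_PARTITIONS = """\
partition desktop memory-limit 70M
partition desktop cpu-shares 4096
partition active_ui memory-limit 130M
partition standby_ui memory-limit 95M
partition background memory-limit 25M
"""

def parse_partitions(text):
    """Return {partition: {cgroup_file: value}} for the recognised lines."""
    parts = {}
    for raw in text.splitlines():
        m = LINE.match(raw.strip())
        if not m:
            continue
        name, key, number, unit = m.groups()
        value = int(number) * UNIT[unit] if key == "memory-limit" else int(number)
        parts.setdefault(name, {})[CGROUP_FILE[key]] = value
    return parts

def total_memory_mib(parts):
    """Sum of all memory caps, in MiB."""
    return sum(p.get("memory.limit_in_bytes", 0) for p in parts.values()) // 1024 ** 2

parts = parse_partitions(POST_PARTITIONS)
print(parts["desktop"])         # {'memory.limit_in_bytes': 73400320, 'cpu.shares': 4096}
print(total_memory_mib(parts))  # 320
```

The caps add up to 320 MiB, more than the N900's 256 MB of RAM, so they act as per-group ceilings rather than a split of physical memory.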

Rules:
I have flashlight-extra and panorama installed, therefore I added to
[classify camera]
/usr/bin/flashlight-extra
/usr/bin/panorama

[classify desktop]
removed /usr/bin/matchbox-window-manager
added /usr/bin/hildon-sv-notification-daemon

[classify mediarend]
added /usr/bin/matchbox-window-manager

[classify mediasrc]
removed /usr/bin/hildon-sv-notification-daemon
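The rule changes above amount to moving two binaries between groups (plus adding two camera apps). A hypothetical in-memory model of the `[classify <group>]` sections makes that easy to check, including the mistake of listing the same binary in two groups. It holds only the entries the post mentions; the real syspart.conf has many more, and its exact syntax may allow more than the one-path-per-line shape modelled here.

```python
def apply_changes(classify, added, removed):
    """Return a new {group: set(paths)} with `removed` taken out and `added` put in."""
    new = {group: set(paths) for group, paths in classify.items()}
    for group, paths in removed.items():
        new.get(group, set()).difference_update(paths)
    for group, paths in added.items():
        new.setdefault(group, set()).update(paths)
    return new

def groups_of(classify, path):
    """All groups listing `path`; more than one usually means a config mistake."""
    return sorted(g for g, paths in classify.items() if path in paths)

def render(classify):
    """Write the model back out as [classify <group>] headers plus one path per line."""
    lines = []
    for group in sorted(classify):
        lines.append(f"[classify {group}]")
        lines.extend(sorted(classify[group]))
        lines.append("")
    return "\n".join(lines)

WM = "/usr/bin/matchbox-window-manager"
NOTIFY = "/usr/bin/hildon-sv-notification-daemon"
# Reduced model of the stock file: only the entries the post touches.
stock = {"desktop": {WM}, "mediasrc": {NOTIFY}, "mediarend": set(), "camera": set()}
tweaked = apply_changes(
    stock,
    added={"camera": {"/usr/bin/flashlight-extra", "/usr/bin/panorama"},
           "desktop": {NOTIFY}, "mediarend": {WM}},
    removed={"desktop": {WM}, "mediasrc": {NOTIFY}},
)
print(groups_of(tweaked, WM), groups_of(tweaked, NOTIFY))  # ['mediarend'] ['desktop']
```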

OTHER
modified WSEGL_UseHWSync=1

---------------

MODS EXPLANATION

Swap on uSD. I think there is still a lot to understand about this tweak. I made some dd benchmarks on both the eMMC (on an ext3 partition) and the card, and I can tell the eMMC is slightly faster than my class 6 memory card when reading but slower when writing. Roughly, timings with 128MB files put the card at approx 1.2 MB/sec write and 14 MB/sec read. Read timings are probably affected by buffers kicking in at the start of reading, but then we should also consider running processes, system overhead and everything else. The internal memory measured around 0.95 MB/s write and 16 MB/sec read.
This difference in speed (eMMC roughly 20% slower when writing and roughly 14% faster when reading) is more or less confirmed, albeit with different figures, under real usage as swap partitions. IOSTAT near usage peaks during the day showed top figures of 180 KB/s reading and 17 KB/s writing for the eMMC, and respectively 160 and 20 KB/s for the uSD.
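A quick check of the relative figures, taking the post's dd numbers (MB/s, 128 MB files) as given:

```python
def pct_diff(a, b):
    """How much faster (+) or slower (-) a is than b, in percent."""
    return (a - b) / b * 100.0

usd = {"write": 1.2, "read": 14.0}    # MB/s, class 6 uSD card (from the post)
emmc = {"write": 0.95, "read": 16.0}  # MB/s, internal eMMC, ext3 (from the post)

for op in ("write", "read"):
    print(f"{op}: eMMC vs uSD {pct_diff(emmc[op], usd[op]):+.0f}%, 128 MB takes "
          f"{128 / emmc[op]:.0f} s on eMMC vs {128 / usd[op]:.0f} s on uSD")
```

This prints `write: eMMC vs uSD -21%, 128 MB takes 135 s on eMMC vs 107 s on uSD` and `read: eMMC vs uSD +14%, 128 MB takes 8 s on eMMC vs 9 s on uSD`.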
The changes in VM management have been tested mostly by feeling, ranging through the whole scale of values. I tried swappiness 0, very low and very high dirty ratios, and different philosophies as explained in so many threads on TMO. The final settings were tuned by feeling, after deciding to keep memory clean via very low dirty ratios: I noticed that on Maemo never more than 2 pdflush threads are spawned, probably due to the bandwidth bottleneck, so having too much memory to flush at once is not a good idea when you suddenly need RAM. With a dirty_background_ratio of 3 or less, I noticed the nr_dirty value always being 0 and the nr_dirty_writebacks values increasing every time I did something, so that was probably the point where it became too aggressive. The same procedure was followed for the dirty_ratio setting. Many sources mention that some kernel versions have a lower bound of 5 for these values, but this did not seem to be the case for my kernel.
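To put the ratios into absolute numbers: the kernel applies dirty_background_ratio and dirty_ratio to the memory it considers dirtyable (roughly free plus page-cache memory, not total RAM; the exact formula varies between kernel versions). A sketch assuming a hypothetical 200 MB of such memory and the 1.2 MB/s card write speed measured above:

```python
def dirty_thresholds(dirtyable_mb, background_ratio, dirty_ratio):
    """Return (background, foreground) dirty limits in MB."""
    return (dirtyable_mb * background_ratio / 100.0,
            dirtyable_mb * dirty_ratio / 100.0)

DIRTYABLE_MB = 200  # hypothetical: free + reclaimable memory on a busy N900
WRITE_MB_S = 1.2    # uSD write speed measured in the post

for bg, fg in ((4, 8), (10, 20)):  # the post's values, then a larger hypothetical pair
    bg_mb, fg_mb = dirty_thresholds(DIRTYABLE_MB, bg, fg)
    print(f"bg={bg}% fg={fg}%: background flush starts at {bg_mb:.0f} MB, "
          f"writers throttled at {fg_mb:.0f} MB (~{fg_mb / WRITE_MB_S:.0f} s to drain)")
```

With the post's values this gives 8 MB and 16 MB (about 13 s of card writeback); the larger pair gives 20 MB and 40 MB (about 33 s). That is one way to read the reasoning above: at card speeds, even small limits mean many seconds of writeback.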
I would like some confirmation or refutation of what follows, because it is what I understood from reading about VM settings on the internet: vfs_cache_pressure has been greatly increased in order to convince the kernel to almost always discard filesystem cache in favour of free RAM. With solid-state memory there is little penalty in reading the values again, so it is always better to look up the data again than to start swapping. Reading data from the cache or fetching it from anywhere on the disk takes the same time. With so little RAM and no penalty, I prefer to keep in memory some useful pages of user programs rather than their data. To speed up this process, the I/O scheduler for the uSD has been switched to NOOP, in order to write data out as soon as it is ready without using any extra computational power. We don't have moving parts and we don't need elevators!
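A small helper for the scheduler switch described above. Each block device exposes its scheduler in `/sys/block/<dev>/queue/scheduler`, with the active one in brackets, and writing a name selects it. The helper names are hypothetical and the change is not persistent. For context on the vfs_cache_pressure part: the kernel's sysctl documentation describes it as controlling reclaim of the dentry and inode caches (values above 100 make the kernel prefer reclaiming them), rather than the page cache holding file contents.

```python
from pathlib import Path

def active_scheduler(text):
    """Parse a queue/scheduler line such as 'noop anticipatory deadline [cfq]'."""
    for token in text.split():
        if token.startswith("[") and token.endswith("]"):
            return token[1:-1]
    return None

def set_scheduler(dev="mmcblk1", name="noop", sysfs_block=Path("/sys/block")):
    """Select `name` as the I/O scheduler of `dev` (root only, not persistent)."""
    path = Path(sysfs_block) / dev / "queue" / "scheduler"
    available = [t.strip("[]") for t in path.read_text().split()]
    if name not in available:
        raise ValueError(f"{name!r} not offered by this kernel: {available}")
    path.write_text(name)
```

Reading the file back afterwards with `active_scheduler` should show `noop` in brackets on a real kernel.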
Since the HW block size of the uSD is reported by my kernel as 512K, and every single page is 4K, I set page-cluster to 7. It means that the kernel will try to swap pages in groups of 2^7 = 128 pages * 4 KB each = 512 KB. I wonder if I am missing something, because Nokia engineers could not have got this wrong. If anyone has a pointer, it will be very welcome. Lastly, min_free_kbytes has been slightly increased and oom_kill_allocating_task enabled, but to be honest, after the modifications in syspart.conf I never saw the OOM killer kicking in. The uSD nr_requests has been lowered after reading the comments on I/O pressure. In my use cases it made no perceptible difference, and benchmarks show no difference either. read_ahead_kb was set equal to the physical block size. Honestly, r/w benchmarks did not show any difference when changing these values, and I don't know whether they are of any benefit overall.
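The page-cluster arithmetic above, in both directions, assuming 4 KiB pages: page-cluster is a base-2 logarithm of the number of pages per swap I/O.

```python
PAGE_KIB = 4

def swap_cluster_kib(page_cluster, page_kib=PAGE_KIB):
    """Size of one swap I/O cluster: 2**page_cluster pages."""
    return (2 ** page_cluster) * page_kib

def page_cluster_for(target_kib, page_kib=PAGE_KIB):
    """page-cluster value giving a cluster of exactly target_kib."""
    if target_kib % page_kib:
        raise ValueError("target must be a multiple of the page size")
    pages = target_kib // page_kib
    if pages < 1 or pages & (pages - 1):
        raise ValueError("cluster must be a power-of-two number of pages")
    return pages.bit_length() - 1

for pc in (5, 6, 7):
    print(pc, swap_cluster_kib(pc), "KiB")
```

The loop prints 128, 256 and 512 KiB for page-cluster 5, 6 and 7; going the other way, `page_cluster_for(128)` is 5 and `page_cluster_for(512)` is 7.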
Then come the ohmd/syspart modifications. I have to say that these, combined with moving the swap to uSD, are what made my day. I don't know if everybody gets the same tremendous slowness when a notification has to arrive. You know, those 4-5 seconds in which you see your phone stop all of its activities and you wait to feel it rumble, and when it rumbles you know that, if everything goes well, in 3 or 4 seconds an annoying yellow balloon will appear at the top of the screen. It could be an SMS, or an IM, who knows? If at that moment somebody decides to call you, and you have an uptime of more than 10 hours and perhaps one or two browser windows open in the background, there is no way you will be able to answer, and you will have to call back that unfortunate girl who urgently needed to hear your voice...
The Nokians decided to put the notification daemon in the group of essential services, such as telepathy or gstreamer. Well, if I see a yellow balloon 5 seconds later, it is not a problem for me, is it? So I slightly reorganized the assignment of syspart partitions (and thus priorities), also taking the occasion to promote matchbox (the window manager). I also dedicated a little less memory to applications and a little more to the desktop and essential services. Everything will be clear if you take some time to read the values I put in /usr/share/policy/etc/rx51/syspart.conf. I also reduced the CPU slice of the desktop group.

FINAL COMMENTS

Modifications in syspart: UNTIL NOW they have had no visible drawbacks for me - so far so good. Feedback on that will be greatly appreciated.
Moving swap to uSD worked for me (TM) - it would be nice to understand WHY, because the difference between internal memory and card is not so big. We really need more details from somebody who knows the N900 hardware very well (Stskeeps??? Where are you?!?!?)
VM values are a balancing act and probably the ones most in need of adapting to each use case. After lowering the dirty ratios and increasing vfs_cache_pressure for the reasons above, I slowly increased swappiness until I could see, in real time, a certain balance between nr_dirty and nr_dirty_writebacks with the system under high load.
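Watching that balance does not need Conky. A minimal sketch that samples `/proc/vmstat`, whose counters `nr_dirty` and `nr_writeback` are both in pages; we assume these are what the post calls nr_dirty and nr_dirty_writebacks.

```python
import time
from pathlib import Path

FIELDS = ("nr_dirty", "nr_writeback")

def parse_vmstat(text, fields=FIELDS):
    """Pick the requested counters (values are in pages) out of /proc/vmstat text."""
    values = {}
    for line in text.splitlines():
        parts = line.split()
        if len(parts) == 2 and parts[0] in fields:
            values[parts[0]] = int(parts[1])
    return values

def watch(count=10, interval=1.0, path=Path("/proc/vmstat")):
    """Print nr_dirty / nr_writeback every `interval` seconds, `count` times."""
    for _ in range(count):
        v = parse_vmstat(Path(path).read_text())
        print(f"nr_dirty={v.get('nr_dirty', 0):7d}  nr_writeback={v.get('nr_writeback', 0):7d}")
        time.sleep(interval)
```

Running `watch(count=60)` in a terminal while loading the phone gives one line per second, similar to the modified Conky.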

With the settings reported here and my use case, I see 14 uninterruptible processes in the queue and the processor at 100% @850, with a reaction time always under 3-4 s at most. During the last week I had to leave the N900 to settle for some minutes before it became responsive again only once. Try to launch, without waiting between your clicks, MicroB, contacts (my list is over 580 buddies), mediaplayer, Angry Birds, calendar, Bounce Evolution, mobilestellarium, Gnumeric and panorama - you get it!
But the best thing happened this morning: I was testing with tons of active apps, going back and forth between them, system load over 4, 12 D-state processes, processor ranging from 50 to 100%, I was messaging and the phone rang. OK, I thought, let's see who I will have to call back now... and 1 second later the phone interface appeared! At that point I decided it was time to post on TMO :)

So that's all folks! I hope this is only the start of a constructive process of better understanding the internals of the 900, and at the same time the start of a good 'optimization based on use cases' wiki, or even better, some CSSU package adaptations based on use cases that any user could then choose!

Cheers, everybody.

PS: please don't blame me too much for grammar and English mistakes - English is not my native language!
EDIT - And thank you for your patience if you read everything - I tried to clean it up a little with formatting after vi_'s suggestion

Re: N900, ohmd, syspart, VM & swap tweaks

Quote:

Originally Posted by jurop88 (Post 969193)

Everything said, here follows a resume of my experiences so far.

Holy WALL OF TEXT bro. Please, some formatting!

Re: N900, ohmd, syspart, VM & swap tweaks

Tried to clean up a little; a lot of text meant a lot of formatting. TY for the suggestion, I didn't think about that.

Re: N900, ohmd, syspart, VM & swap tweaks

amazing post jurop88, this should be implemented in swappolube :)

Re: N900, ohmd, syspart, VM & swap tweaks

Quote:

Originally Posted by vi_ (Post 969218)

Whoa. Mind. Blown.

Kaboom.

Great finds here with the modifications to ohmd process priorities. I've had pulseaudio et al. media at a high stack priority to reduce jittering for a few months now, but never felt the need to play around more.

Thanks for the testing you've done. Seriously.

Re: N900, ohmd, syspart, VM & swap tweaks

Moving hildon-sv-notification-daemon out of [mediasrc] closes the socket and doesn't allow any sound?

Re: N900, ohmd, syspart, VM & swap tweaks

Made some of these changes and will see how it pans out over the next few days. I am a fairly heavy user so it will be interesting.

Re: N900, ohmd, syspart, VM & swap tweaks

Quote:

Originally Posted by hawaii (Post 969398)

Moving hildon-sv-notification-daemon out of [mediasrc] closes the socket and doesn't allow any sound?

Why should it? As far as I understand, changing syspart.conf just changes resource utilization on a per-process basis, and that's the whole reason ohmd exists. So it should simply lower its priority.
What I can affirm is that on my machine, in its current state, the balloons are now delayed (even 5 or 10 seconds) while chatting; I don't know about emails, but I hear both vibration and notification sounds.

Re: N900, ohmd, syspart, VM & swap tweaks

if your configurations really make the n900 snappier and more responsive without sacrificing anything else, it should really be in a wiki or included via swappolube or something...

This is just great work that you've done. Marvellous

Re: N900, ohmd, syspart, VM & swap tweaks

As for the kernel reporting the mmcblk block size as "512k", it's not. It's saying the logical block size is 512 bytes. This is meaningless for your purposes though; it only tells you the smallest request size the mmc will accept. Internally it then translates a 512-byte write into a read-modify-erase-write cycle of 128k or 256k, whatever its true block size is.

This brings us to the "noop" scheduler issue. You are correct that there are no moving parts, but the huge block size calls for scheduling writes close to each other anyway, to minimize the number of read-modify-erase-write cycles the mmc/uSD has to do.

Imagine the kernel sends a request for writing 4k at position 2M, then 4k at position 8M, then 4k at 2M+4k, 4k at 8M+4k, and so on. Each request makes the uSD/eMMC internally read 128k (assuming that's the true erase block size), change 4k of that 128k, erase another 128k block, and write 128k to that block. A write amplification factor of 32. You can divide your raw write rate of a nominal 6 MB/s for Class 6 by 32 to get an estimated 192 kilobytes/sec...
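shadowjk's arithmetic as a worst-case model, where every 4 KiB random write costs one full read-modify-erase-write of an erase block. Real cards have controllers and caches that may do better; the nominal 6 MiB/s follows the post's 6 × 1024 / 32 = 192 calculation.

```python
def write_amplification(erase_block_kib, write_kib):
    """Bytes physically rewritten per byte requested, worst case."""
    return erase_block_kib / write_kib

def effective_write_kib_s(raw_mib_s, erase_block_kib, write_kib):
    """Useful throughput when every small write costs a whole erase block."""
    return raw_mib_s * 1024 / write_amplification(erase_block_kib, write_kib)

for block in (128, 256):
    print(f"{block}k erase block, 4k writes: x{write_amplification(block, 4):.0f}, "
          f"~{effective_write_kib_s(6, block, 4):.0f} KiB/s")
```

This prints `x32, ~192 KiB/s` for a 128k erase block and `x64, ~96 KiB/s` for 256k.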
So ideally we'd want an elevator that knows about the special properties of flash, but we don't have one, so we use CFQ, which at least has some heuristics for distributing IO "fairly" between processes.
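A toy model of why ordering still matters on flash. It assumes that consecutive writes landing in the same erase block share one read-modify-erase-write cycle, and that every switch to another block starts a new one (a simplification: real controllers buffer and remap). Sorting by offset stands in for what an elevator does; NOOP keeps arrival order.

```python
def erase_cycles(offsets_kib, erase_block_kib=128):
    """Count read-modify-erase-write cycles for a sequence of small writes,
    where consecutive writes in the same erase block share one cycle."""
    cycles, current = 0, None
    for offset in offsets_kib:
        block = offset // erase_block_kib
        if block != current:
            cycles += 1
            current = block
    return cycles

# shadowjk's example: 4 KiB writes alternating between 2M and 8M.
arrival = [off for i in range(16) for off in (2048 + 4 * i, 8192 + 4 * i)]
print(erase_cycles(arrival), erase_cycles(sorted(arrival)))  # 32 2
```

For these 32 writes, arrival order costs 32 cycles and sorted order 2.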

Incidentally, this is where the explanation for why moving swap to uSD seems to improve performance begins too.

The heaviest loads for the eMMC are swap and anything that uses databases like sqlite. That includes dialer and conversations, calendar, and many third-party apps. Why is this a heavy load? Because these things typically write tiny amounts of data and then request fsync() to ensure the data is on the disk. This triggers the writeout of all unwritten data in memory and the update of all the filesystem structures. Remember that a tiny amount of data spread out randomly triggers a massive amount of writing inside the eMMC. Worse, while this goes on, all other requests are blocked.
And what else besides /home and swap is on the eMMC? /opt, which these days contains both apps and vital parts of the OS. The CPU is starved for data, waiting for the pending writes to finish so that the reads of the apps' demand-paged executable code can complete.
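The small-write-plus-fsync pattern described above, in its simplest form. This is not what sqlite does internally (a commit typically involves more than one sync, for the journal and the database); it only contrasts syncing after every tiny record with batching them into one sync.

```python
import os

def write_records(path, records, fsync_each=True):
    """Append records to path; return how many fsync() calls were made."""
    syncs = 0
    with open(path, "ab") as f:
        for record in records:
            f.write(record)
            if fsync_each:              # commit-per-record: force it to storage now
                f.flush()
                os.fsync(f.fileno())
                syncs += 1
        if not fsync_each:              # batched: one flush and sync at the end
            f.flush()
            os.fsync(f.fileno())
            syncs += 1
    return syncs
```

Ten 64-byte records cost ten syncs one way and one sync the other; on flash each of those syncs can trigger the erase-block rewrites described above.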

Btw, for Harmattan I'm told sqlite will be using a more optimized db that essentially works like one gigantic journal. Sequential writing is fast and good on flash; random in-place updates are bad.

Moving swap to uSD gives swap a path that is always free (well, almost always, unless you access the uSD heavily by other means), and offloading swap from the eMMC means less random IO load on the eMMC.

Re: N900, ohmd, syspart, VM & swap tweaks

@jurop88

lots of respect and thanks..that's fantastic, and a lot of mind-blowing effort you have put in.

it took me 3 reads just to understand the things you have tried out..

very impressive..hope u do some more r&d and we can make the n900 even better

Thanks

Re: N900, ohmd, syspart, VM & swap tweaks

Hi Shadowjk,

thank you for your participation.

Quote:

Originally Posted by shadowjk (Post 971407)

Fair enough, and rather consistent with what I found on the internet. Two questions:
1) why does 512k mean 512 bytes? Can you point me somewhere, also in the kernel source? I just started digging into the matter and found relevant code in the mmc driver (I hope I'm on the right path to understanding something), but I must admit my C knowledge is rather rusty
2) where can I find the true HW block size? Is it reported somewhere, or do I have to get it directly from the uSD manufacturer?
The 128k size, though, explains why the Nokians chose to set page-cluster to 5: 32*4 = 128 and that's it
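On the second question, a sketch of where one might look. Assumptions: `queue/hw_sector_size` reports the sector size in bytes (likely the 512 shadowjk mentions); `queue/logical_block_size` and the card attributes `device/erase_size` and `device/preferred_erase_size` exist only on kernels newer than the N900's 2.6.28, so on the phone most of them will probably be missing. The helper simply reports whichever files exist.

```python
from pathlib import Path

# Candidate attributes (all in bytes); which ones exist depends on the kernel.
CANDIDATES = {
    "hw_sector_size": "block/{dev}/queue/hw_sector_size",
    "logical_block_size": "block/{dev}/queue/logical_block_size",
    "erase_size": "block/{dev}/device/erase_size",
    "preferred_erase_size": "block/{dev}/device/preferred_erase_size",
}

def block_sizes(dev="mmcblk1", sysfs=Path("/sys")):
    """Return {attribute: bytes} for every candidate file present under sysfs."""
    found = {}
    for name, rel in CANDIDATES.items():
        path = Path(sysfs) / rel.format(dev=dev)
        if path.is_file():
            found[name] = int(path.read_text().split()[0])
    return found
```

If none of the erase-size attributes are available, the manufacturer's datasheet remains the fallback.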

Quote:

Originally Posted by shadowjk (Post 971407)

This brings us to the "noop" scheduler issue. You are correct that there are no moving parts, but the huge blocksize calls for scheduling writes close to eachother anyway, to minimize the amount of read-modify-erase-write cycles the mmc/usd has to do. Imagine... (CUT)

From Wikipedia,

Quote:

The NOOP scheduler inserts all incoming I/O requests into a simple, unordered FIFO queue and implements request merging

It means, AFAIU, that when a block is ready to be written (request merging), it is written and the memory is freed.
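A toy version of the behaviour quoted above, to check the reading of "FIFO plus merging": requests keep their arrival order, and a new request is only glued onto a queued one that ends exactly where it starts (a back merge). The `Request` type and sector numbers are illustrative; this ignores request-size limits and the other merge cases the block layer handles.

```python
from dataclasses import dataclass

@dataclass
class Request:
    sector: int  # start, in 512-byte sectors
    count: int   # length, in sectors

def noop_queue(incoming):
    """FIFO queue with back merges; no sorting."""
    queue = []
    by_end = {}  # end sector -> index in queue, for back-merge lookups
    for req in incoming:
        i = by_end.pop(req.sector, None)
        if i is not None:
            old = queue[i]
            queue[i] = Request(old.sector, old.count + req.count)
            by_end[old.sector + old.count + req.count] = i
        else:
            queue.append(req)
            by_end[req.sector + req.count] = len(queue) - 1
    return queue
```

With 4 KiB (8-sector) writes at 2M, 8M, 2M+4k and 8M+4k (sectors 4096, 16384, 4104, 16392), the four requests collapse into two 8 KiB requests, but nothing reorders them.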
Wikipedia again,

Quote:

CFQ works by placing synchronous requests submitted by processes into a number of per-process queues and then allocating timeslices for each of the queues to access the disk. The length of the time slice and the number of requests a queue is allowed to submit depends on the IO priority of the given process (...) It can be considered a natural extension of granting IO time slices to a process

So it doesn't work at a 'write as few blocks as possible on the uSD' level; the goal is to give all processes a time slice, 'hoping' that most writing and reading will be done in the same area. I gave the code a quick read, and it looked like the 'elevator' part has a huge weight, allowing some backward seeks (I am not an expert in this area, so take everything with a grain of salt). The overhead is considerable, and at first sight brings almost no advantage for an IO device with no moving mechanical parts.
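To make the contrast concrete, a deliberately simplified model of what the quoted text describes: requests are grouped into per-process queues, each queue is kept sorted by sector, and queues take turns dispatching. Real CFQ measures slices in time and weights them by I/O priority; here `slice_for` (a hypothetical callback) returns a number of requests per turn instead.

```python
from collections import OrderedDict, deque

def cfq_like_order(requests, slice_for):
    """requests: (pid, sector) pairs in arrival order.
    slice_for(pid): requests a process may dispatch per turn (hypothetical).
    Returns the dispatch order."""
    grouped = OrderedDict()
    for pid, sector in requests:
        grouped.setdefault(pid, []).append(sector)
    # Each process's requests are kept sorted by sector.
    queues = OrderedDict((pid, deque(sorted(s))) for pid, s in grouped.items())
    order = []
    while queues:
        for pid in list(queues):  # one round: every process gets a turn
            q = queues[pid]
            for _ in range(max(1, slice_for(pid))):
                if not q:
                    break
                order.append((pid, q.popleft()))
            if not q:
                del queues[pid]
    return order
```

With two requests per turn, `[(1, 12), (2, 100), (1, 10), (2, 101), (1, 11)]` is dispatched as process 1's sectors 10 and 11, then process 2's 100 and 101, then process 1's 12: fair between processes, but with no notion of erase blocks.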
After using the settings from the first page for some days, I have to say that with NOOP fragmentation is probably higher, but the feeling is that it works faster AS LONG AS IT WORKS. Another member on the forum (I don't remember precisely who) set up a swap rotation during the night in order to avoid this fragmentation, and I can confirm that after two days my N900 started 'choking' and a swapon/swapoff/swapon/swapoff made it fly again, consistent with the issue being caused by swap fragmentation.
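A sketch of the nightly swap rotation mentioned above, built on the standard `swapoff`/`swapon` commands, with `swapon -p` to restore a priority (assuming your swapon supports it). Device paths are up to you; check `/proc/swaps`. With `dry_run=True` it only prints the commands. swapoff has to pull every swapped page back into RAM or into another swap area, so it can fail or stall on a loaded phone; rotating one device at a time keeps the others available.

```python
import subprocess

def swap_rotation_commands(devices):
    """devices: list of (path, priority or None). One device at a time:
    swapoff, then swapon again, so the others stay available meanwhile."""
    cmds = []
    for path, priority in devices:
        cmds.append(["swapoff", path])
        if priority is None:
            cmds.append(["swapon", path])
        else:
            cmds.append(["swapon", "-p", str(priority), path])
    return cmds

def rotate_swap(devices, dry_run=True):
    """Print (dry run) or execute the rotation; executing needs root."""
    cmds = swap_rotation_commands(devices)
    for cmd in cmds:
        if dry_run:
            print(" ".join(cmd))
        else:
            subprocess.run(cmd, check=True)  # stop at the first failing step
    return cmds
```

For example, `rotate_swap([("/dev/mmcblk1p3", 10), ("/dev/mmcblk0p3", 0)])` previews a rotation of a uSD swap partition and the eMMC one; both paths are examples only.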

Quote:

Originally Posted by shadowjk (Post 971407)

So ideally we'd want an elevator that knows about the special properties of flash. but we don't have one, so we use CFQ. which atleast has some heuristics for distributing IO "fairly" between processes.

The point is that we don't care about 'per-process' I/O but about writes of exactly 128KB, in order to speed them up as much as possible.
What we ideally need is a scheduler saying:

Code:

- kernel: we need some free room. 

- scheduler: ok let's have a look at the discardable pages. Here they are. Just a second please. 

- scheduler chooses exactly 128KB ready for writing (and that's the page-cluster tunable at the kernel level, right?)

- scheduler frees the memory requested with a single page-writing

- scheduler: here I am again, you have those requested memory free

- kernel: thank you
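On the page-cluster question: the kernel's vm sysctl documentation describes page-cluster as a logarithmic value (0 means 1 page, 1 means 2 pages, and so on; the kernel default is 3, i.e. 8 pages). Newer versions of that documentation describe it specifically as swap-in readahead, so whether it shapes the write-out size on the N900's 2.6.28 kernel is still an open question here. The arithmetic itself is simple; this sketch assumes 4 KiB pages, as on the N900's ARM kernel, so 128KB corresponds to 32 pages, i.e. page-cluster 5.

```shell
# Size in KiB of one page-cluster unit; page-cluster is a log2 value:
# 0 -> 1 page, 1 -> 2 pages, 3 -> 8 pages, 5 -> 32 pages.
# ASSUMPTION: 4 KiB pages.
PAGE_KB=4

page_cluster_kb() {
    echo $(( (1 << $1) * PAGE_KB ))
}
```

`page_cluster_kb "$(cat /proc/sys/vm/page-cluster)"` shows the current value in KiB; as root, `echo 5 > /proc/sys/vm/page-cluster` selects 32-page (128KB) clusters.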

The fact that many pages then end up fragmented does not matter, since the read penalty is very low compared to, for example, a hard disk.
I have already found an example of a NOOP scheduler written in C on the internet, and it does not look too hard to implement. We are talking brute force here, not high math ;) A simple modified NOOP algorithm suited to flash could look like this:

Code:

- check whether the page to be unloaded is already cached and not dirty, or already in the current queue
    if yes -> load the requested page and discard the unloaded one
    if no -> put the page to be unloaded in the queue and serve the page to be loaded
- is the queue 128KB?
    if yes -> write it out and update the table of swapped pages
    if no -> job done

I know that the actual writing will be performed by the uSD hardware controller, but why the hell would the controller split a perfectly aligned 128KB write? Your thoughts? Any kernel gurus in the neighbourhood? Am I missing something? It looks too simple for nobody to have thought of it...
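The queueing rule of the modified NOOP idea can be played with as a toy user-space model before anyone touches kernel code. This is a hypothetical sketch: evict_page and flush_batch are made-up names, a "write" is only a counter plus a printed line, and it assumes 4 KB pages so that a 128KB batch is 32 pages. It says nothing about what the hardware controller does with the write; it only shows the batching logic.

```shell
# Toy model of "only write full 128KB batches". HYPOTHETICAL: not kernel
# code; names and structure are illustrative only.
PAGE_KB=4
BATCH_KB=128
queue=""
queued_kb=0
flushes=0

# evict_page ID CLEAN
#   ID:    identifier of the page being pushed out of RAM
#   CLEAN: 1 if an up-to-date copy already exists, 0 if the page is dirty
evict_page() {
    if [ "$2" -eq 1 ]; then
        return 0                      # clean copy exists: just drop it
    fi
    case " $queue " in
        *" $1 "*) return 0 ;;         # already waiting in the current batch
    esac
    queue="$queue $1"
    queued_kb=$((queued_kb + PAGE_KB))
    if [ "$queued_kb" -ge "$BATCH_KB" ]; then
        flush_batch
    fi
}

# Stand-in for one aligned 128KB write plus the swap-table update.
flush_batch() {
    flushes=$((flushes + 1))
    echo "write batch $flushes:$queue"
    queue=""
    queued_kb=0
}
```

Feeding it 32 dirty pages triggers exactly one batch write; clean pages and pages already in the queue never grow the batch.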

Quote:

Originally Posted by shadowjk (Post 971407)

Moving swap to uSD gives swap a path that is always free (well, almost always, unless you do heavy accesses to the uSD by other means), and offloading swap from the eMMC means less random IO load on the eMMC.

This sounds very reasonable and is consistent with other findings.
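A minimal sketch of the swap-on-uSD setup, assuming you have already created a swap partition on the card. /dev/mmcblk1p2 is a hypothetical name (mmcblk1 is usually the uSD on the N900), and swapon must support -p: util-linux does, BusyBox only if built with that option. The kernel allocates from the highest-priority swap area first and only falls back to lower-priority areas when it is full (equal priorities are used round-robin). Giving the uSD swap a higher priority therefore moves most swap traffic off the eMMC while keeping the eMMC swap as overflow.

```shell
# ASSUMPTIONS: USD_SWAP is a swap partition you created on the uSD
# (hypothetical name below) and swapon supports -p.
USD_SWAP=${USD_SWAP:-/dev/mmcblk1p2}

# Higher priority wins: this area is filled before lower-priority ones.
prefer_usd_swap() {
    swapon -p 10 "$USD_SWAP"
}

# Print "device priority" for each active swap area (skips the header).
swap_prios() {
    awk 'NR > 1 {print $1, $5}' "${1:-/proc/swaps}"
}
```

If the uSD swap is already active, `swapoff` it first; afterwards `swap_prios` should list it with a higher priority than the eMMC swap.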

On a side note, I am digging into the ohmd & cgroups realm, and I am happy to have learnt a lot of things :) The parameters on the first page will probably be tuned again after some days of usage, once I have looked at the patterns that emerge in terms of load and memory use.
EDIT: oh, and I forgot to mention https://bugs.maemo.org/show_bug.cgi?id=6203, where many hints on ohmd & syspart are given!

Re: N900, ohmd, syspart, VM & swap tweaks

hehe, it looks like I mixed up kswapd and the IO scheduler. Still learning a lot while I'm off sick :)

Re: N900, ohmd, syspart, VM & swap tweaks

Hi jurop88,

I've spent at least 20 minutes trying to find this thread again, as I'm doing some experiments with information that is split across multiple threads:

And this one ;)

Have you made any more progress?

Re: N900, ohmd, syspart, VM & swap tweaks

Quote:

Originally Posted by ivgalvez (Post 974634)

Hi jurop88,

I've spent at least 20 minutes trying to find this thread again, as I'm doing some experiments with information that is split across multiple threads:
(...)
Have you made any more progress?

That was pretty much my goal too. Now I'm back at work, so the pace has slowed, but what I can say is that Nokia's engineers already did a lot of work on the subject, and the phone is probably already well optimized for the general use case.
Since writing the original post I have made some slight modifications, but I haven't updated them here yet. Perhaps I'll do it over the weekend.

Re: N900, ohmd, syspart, VM & swap tweaks

Quote:

Originally Posted by ivgalvez (Post 974634)

Hi jurop88,

(...)

Have you made any more progress?

You'll want to look at the BFS kernel thread, the mlocker thread (in my signature) and the 4-line cgroup patch, too!

Re: N900, ohmd, syspart, VM & swap tweaks

> partition desktop memory-limit 70M

With cgroups mounted, I noticed that the desktop group only needs 25M.

So it's better to write partition desktop memory-limit 25M

or echo "25M" > /dev/cgroup/cpu/desktop/memory.limit_in_bytes.
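Before picking a limit, it helps to read the group's own usage counters. This is a minimal sketch: pass the group directory, e.g. the /dev/cgroup/cpu/desktop directory used above. The memory.* files only exist if the memory controller is mounted on that hierarchy. In the v1 memory controller the peak (memory.max_usage_in_bytes) covers the time since the group was created or since the counter was reset by writing 0 to it, and the usage includes page cache, so a limit set right at the observed peak leaves no headroom.

```shell
# Show a memory cgroup's current and peak usage in MiB.
cg_mem_report() {
    cur=$(cat "$1/memory.usage_in_bytes") || return 1
    peak=$(cat "$1/memory.max_usage_in_bytes") || return 1
    echo "current: $((cur / 1048576)) MiB, peak: $((peak / 1048576)) MiB"
}
```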

Re: N900, ohmd, syspart, VM & swap tweaks

Has anyone ever tried the deadline scheduler and:

Code:

echo 1 > /sys/block/mmcblkX/queue/iosched/fifo_batch

? I've just compiled BFS with the config option enabled, and will be giving it a try over the next few days. I got the idea from here
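For anyone who wants to try the deadline + fifo_batch combination, here is a minimal sketch. It assumes the kernel was built with the deadline scheduler and that you pass the device to tune (mmcblk0 is the internal eMMC on the N900; the uSD is usually mmcblk1). The iosched/ directory only shows deadline's tunables once deadline is the active scheduler, which is why the switch comes first. SYSFS can be overridden for testing.

```shell
SYSFS=${SYSFS:-/sys}

# set_deadline_batch DEV [N]: select deadline on DEV, then set fifo_batch.
set_deadline_batch() {
    q=$SYSFS/block/$1/queue
    echo deadline > "$q/scheduler" || return 1
    # deadline serves requests in batches and only checks expiry deadlines
    # between batches; fifo_batch caps the batch length (kernel default 16).
    # 1 checks after every request: lowest latency, some throughput cost.
    echo "${2:-1}" > "$q/iosched/fifo_batch"
}
```

As root: `set_deadline_batch mmcblk0` (or pass a second argument to try other batch sizes).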

Re: N900, ohmd, syspart, VM & swap tweaks

I'd also like to try the anticipatory scheduler, a lot of the Android guys have been switching over to it...

Re: N900, ohmd, syspart, VM & swap tweaks

I considered anticipatory, but it was removed from the kernel altogether as of 2.6.33, and CFQ replaced it as the default scheduler in 2.6.18. Also, I'm never sure whether heuristics are a good idea or not. I suppose it wouldn't hurt to add it to the config as a module, though.

Re: N900, ohmd, syspart, VM & swap tweaks

OK, I've built the latest BFS from the git tree, with both deadline and anticipatory enabled as modules in the config. Everything runs as smoothly as before, with no added overhead. To enable either of them, echo deadline or anticipatory to /sys/block/mmcblk0/queue/scheduler; this seems to insert the corresponding module automagically (it's probably best to rmmod it if you later switch back to noop or cfq, though, as it stays loaded). I did find a thread about anticipatory on Android and found this snippet quite interesting:

Quote:

For instance, the anticipatory scheduler focuses on continuous reads and on avoiding head movement, which is not the case for flash drives.

Completely different rules apply here.

The community (Linux SSD) recommends NOOP or DEADLINE schedulers; I preferred DEADLINE for its focus on preventing read starvation.
It works the best (elevator = deadline) for my Asus EEE PC 901 with SSD drives and attached SD cards.
Writes are slower, but there's much less lag when browsing web or opening folders because "read" operations have priority.
The same seems to work on my Galaxy S with stock JM1 with or without one-click-lag-fix.

..which makes me think deadline might be the best scheduler to use, even though the N900's drive isn't exactly an SSD?
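The echo-then-rmmod routine Tigerite describes can be wrapped in two small helpers. This is a sketch under stated assumptions: the scheduler file lists every available elevator with the active one in brackets, and the modules are assumed to load as deadline_iosched and as_iosched (anticipatory), so confirm the real names with lsmod on your kernel. SYSFS can be overridden for testing.

```shell
SYSFS=${SYSFS:-/sys}

# Print the active elevator, e.g. "cfq" from "noop anticipatory deadline [cfq]".
active_sched() {
    sed -n 's/.*\[\([^]]*\)\].*/\1/p' "$SYSFS/block/$1/queue/scheduler"
}

# switch_sched DEV NEW: select NEW, then try to unload the module of the
# previously active elevator, since it otherwise stays loaded.
# ASSUMPTION: module names deadline_iosched / as_iosched.
switch_sched() {
    old=$(active_sched "$1")
    echo "$2" > "$SYSFS/block/$1/queue/scheduler" || return 1
    case "$old" in
        deadline) mod=deadline_iosched ;;
        anticipatory) mod=as_iosched ;;
        *) return 0 ;;                # noop/cfq: built in, nothing to unload
    esac
    if [ "$old" != "$2" ]; then
        rmmod "$mod" 2>/dev/null || echo "could not unload $mod" >&2
    fi
}
```

As root, `switch_sched mmcblk0 deadline` followed later by `switch_sched mmcblk0 cfq` should leave no stray scheduler module loaded.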

Re: N900, ohmd, syspart, VM & swap tweaks

Quote:

Originally Posted by Tigerite (Post 1020062)

Probably because most processes hang on read operations. Giving them higher priority should speed things up for normal usage. So I think when you're downloading something and at the same time are trying to start an app etc. the device should be snappier, while you couldn't care less about the download.
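The read preference comes from a few deadline tunables, which this sketch simply prints. The kernel defaults are read_expire=500 and write_expire=5000 (milliseconds before a queued request counts as expired), writes_starved=2 (how many times reads are preferred over waiting writes before a write batch is dispatched) and fifo_batch=16. SYSFS can be overridden for testing, and deadline must be the active scheduler for the iosched/ files to exist.

```shell
SYSFS=${SYSFS:-/sys}

# Print the deadline tunables behind its read-over-write preference.
show_deadline() {
    d=$SYSFS/block/$1/queue/iosched
    for t in read_expire write_expire writes_starved fifo_batch; do
        printf '%s=%s\n' "$t" "$(cat "$d/$t")"
    done
}
```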

Re: N900, ohmd, syspart, VM & swap tweaks

Quote:

Originally Posted by vi_ (Post 969197)

Holy WALL OF TEXT bro. Please, some formatting!

HOLY, and you fullquote it, to add that one liner comment :-S
/j

Re: N900, ohmd, syspart, VM & swap tweaks

Quote:

Originally Posted by jurop88 (Post 969193)

Everything said, here follows a summary of my experiences so far.
...

Thanks, just applied all your settings. See how things go. :D

Re: N900, ohmd, syspart, VM & swap tweaks

Hi all,

I've also been trying to tweak /usr/share/policy/etc/current/syspart.conf

One thing I noticed, though, is that sometimes the values are applied for a little while and then get overridden again by ohmd.

For example, I've tried changing the desktop cpu-shares to, say, 4096, then running "stop ohmd" and "start ohmd". For a short time, cat-ing /syspart/desktop/cpu.shares shows that the value 4096 has been set.

But then after a while, ohmd seems to write back the original value 6144 to it.

Any ideas?

Thanks.
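To see exactly when ohmd rewrites the value, a simple poller can log every change. This is a minimal sketch that only observes; it does not stop ohmd from overriding anything. The path to watch is the /syspart/desktop/cpu.shares file mentioned above, and the optional COUNT argument exists only so the loop can be bounded.

```shell
# watch_value FILE [INTERVAL] [COUNT]: poll FILE every INTERVAL seconds
# (default 5) and print a timestamped line whenever its content changes.
# COUNT limits the number of polls; 0 (the default) means forever.
watch_value() {
    file=$1 interval=${2:-5} count=${3:-0} n=0 last=
    while :; do
        cur=$(cat "$file")
        if [ "$cur" != "$last" ]; then
            echo "$(date '+%H:%M:%S') $cur"
            last=$cur
        fi
        n=$((n + 1))
        if [ "$count" -ne 0 ] && [ "$n" -ge "$count" ]; then
            break
        fi
        sleep "$interval"
    done
}
```

For example, start `watch_value /syspart/desktop/cpu.shares 10 &` right after "stop ohmd" and "start ohmd"; the timestamp of the switch back to 6144 may help match it against ohmd's activity.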

Re: N900, ohmd, syspart, VM & swap tweaks

I don't remember exactly (I worked on that a long time ago), but there is a policy file somewhere, written in Prolog (?) and compiled, that ohmd reads, and it's responsible for that.
Long story short, what I remember is that I gave up after a lot of searching. I am really short on time for the next month, but if you are interested in looking into it, I can dig out what I collected and send it all to you: some material found on wikis, IRC logs and the like.

Re: N900, ohmd, syspart, VM & swap tweaks

Quote:

Originally Posted by jurop88 (Post 1179219)

...or even better send it to some of the CSSU developers/maintainers (me, Pali, merlin1991,...)

Thanks.