Infrastructure maintenance on 19.11.
Hi everybody,
Sorry for the short notice, but we will do some heavy maintenance on the maemo.org infrastructure tomorrow, starting at 10:00 CET (09:00 UTC). All systems will be affected. We expect to be down for at least 6 hours while we upgrade the underlying hypervisors. What we will do:
Sorry for any inconvenience this might cause. Best, Falk
Re: Infrastructure maintenance on 19.11.
Thanks for the notification.
@tmo admin: could this possibly be made a sticky at the global level?
Re: Infrastructure maintenance on 19.11.
Hi everyone,
tl;dr: half of the infrastructure is broken, a fix is expected early next week, film at eleven.

This maintenance didn't go to plan; here's a short post-mortem:

Timeline:
10:00 - start of updates and backups on blade-a
14:30 - backups and updates complete on blade-a, reboot confirmed successful
14:31 - uptime-induced filesystem check after 1347 days
15:00 - start of backups on blade-b
17:12 - filesystem check complete, blade-a up and running
17:30 - first systems on blade-a confirmed up and working
18:30 - software upgrade on stage and mail complete
20:15 - backups of blade-b finished and copied to the backup space on blade-a
20:16 - start of updates on blade-b
21:00 - updates on blade-b complete, reboot
21:01 - blade-b stuck at boot with a corrupt BIOS image in flash
23:30 - all available remote recovery options tried, none working
23:40 - decision to go for Plan B: boot talk.maemo.org on blade-a and redirect everything else to talk.m.o
23:45 - blade-b turned off through IPMI
23:53 - talk.m.o available again

Fallbacks in place:
www.maemo.org, wiki.maemo.org and garage.maemo.org are redirected to talk.maemo.org.

Next action items:
I'll visit the datacenter on Monday after work (around 18:00 CET) to try to recover the BIOS of the broken machine with a physical USB stick. If this is successful, we'll migrate talk.m.o back to its original host and re-enable www.m.o, wiki.m.o and garage.m.o through DNS once the VMs and the blade are confirmed working.

Best,
xes & falk
Re: Infrastructure maintenance on 19.11.
My browser complains about a wrong certificate; is this a side effect of the update? Is it temporary?
(Details: the name on the certificate does not match the URL.)
Re: Infrastructure maintenance on 19.11.
1 Attachment(s)
Quote:
A hint for all remaining N9 users: once again there is no automatic network (WLAN auto/manual) detection. A nice screenshot is attached (or will be later; my N9 does not let me select it :))
--edit
Quote:
Re: Infrastructure maintenance on 19.11.
2 Attachment(s)
Let me share the screen that our Supermicro server showed to reward us for a day of work...
http://www.supermicro.nl/products/sy...cfm?parts=SHOW
Then we also discovered that Supermicro charges for a license to flash the BIOS remotely via IPMI (and we are not even sure that would work to recover the BIOS). Supermicro: really, thanks.
Re: Infrastructure maintenance on 19.11.
Possible to replace the chip?
Re: Infrastructure maintenance on 19.11.
@win7mac
At the moment I can't say how serious the problem we are facing is; tomorrow Falk will run some tests while trying to restore the blade. With your personal PC, board or laptop you can try any hack or trick you want, because you have nothing to lose. With servers you have to take a different perspective: you have to weigh the risks, the best options, the time to fix, the quality of the result and the possibility of causing more damage. So my answer is: I don't think anyone would try to remove a chip from a server mainboard without a spare board or some guarantee that it will work.
Re: Infrastructure maintenance on 19.11.
I wasn't suggesting any tricks or hacks. Some BIOS chips are replaceable, but since it's not listed on that parts list, that's probably not an option. :(
Re: Infrastructure maintenance on 19.11.
Quote:
Best, Falk
Re: Infrastructure maintenance on 19.11.
What's the cost of a new blade if it turns out the BIOS chip in blade-b cannot be recovered by flashing?
Are there different options for which blades can be used in our server chassis, and what do they cost?
Re: Infrastructure maintenance on 19.11.
Quote:
I am posting from work now and cannot reproduce it on my work PC (in Pale Moon on Windows 7). I could not reproduce it on my Jolla either, but that's how it appeared on my daughter's Android tablet. I can try again later today when I get home.
Re: Infrastructure maintenance on 19.11.
2 Attachment(s)
Webcat:
sfdroid browser:
Re: Infrastructure maintenance on 19.11.
Hi. Sorry, I didn't quite follow the previous discussion, but I just noticed that if I select Intro, Downloads, Development, Community or News, I am taken directly to Talk.
EDIT: OK, I see this is mentioned in fstern's post #3. Sorry again.
Re: Infrastructure maintenance on 19.11.
Short update from today's datacenter visit:
Quote:
Blade-b is totally broken. If I attach USB devices, it just signals a different POST error. I couldn't swap blade-b for a spare blade, because I couldn't remove blade-b from the chassis to move the CPU and memory over. To swap the two blades I have to take the box out of the rack (or at least uncable it). This is planned for this Saturday; sadly I don't have time earlier. It will include a full power-down of all maemo servers for about 30 minutes while the work is finished. Best, Falk
Re: Infrastructure maintenance on 19.11.
Quote:
Re: Infrastructure maintenance on 19.11.
New maintenance notice.
I'm going to stop stage (the repository) to create a backup copy. Due to the size of the VM, this operation could take many hours. Thank you for your patience.
Re: Infrastructure maintenance on 19.11.
Quote:
To answer the actual question: for now, the blade mainboards will be swapped. Techstaff and the board will work out the details of populating the then-empty slots (2) with hardware that is not doomed to fail; the current hardware is actually being replaced in the datacenters of everyone I have talked to over the past 3 days. We have started discussing what the best-fit setup would look like for us, and so far we have come up with the idea of a 2 (old) : 1 (new) setup, where the one new blade can take the load off both old blades, with the option of making it 2:1:1 if we get enough funds for that. We also need to increase storage capacity, and it might be a good idea to replace the PSUs while we are at it.
Re: Infrastructure maintenance on 19.11.
Maintenance Notice:
stage (the repository) is up and running.
Re: Infrastructure maintenance on 19.11.
By the way, could anyone with the proper rights add a link to this topic as a TMO notice (just like the coding competition one)?
Letting non-daily TMO readers know that maemo (the infrastructure) is not dead or dying but just under maintenance sounds like a reasonable idea to me :)
Re: Infrastructure maintenance on 19.11.
# Maintenance Notice: MAEMO IS UP AND RUNNING.
More details about the current status and the work done will follow.
Re: Infrastructure maintenance on 19.11.
What I noticed is that the maemo extras assistant page is working but is not passing the files on to the autobuilder (no notification from extras-cauldron-list).
If this is already known, ignore me. Thanks for all the hard work the faulty hardware caused you.
Re: Infrastructure maintenance on 19.11.
Re: Infrastructure maintenance on 19.11.
Quote:
So I guess it was just hanging somewhere. I really appreciate the overall effort, well done.