Thread: Unplanned outage

  1. #1
    Nexus is offline Administrator
    Join Date
    Nov 2008
    Location
    Toowoomba
    Age
    44
    Posts
    1,322

    Unplanned outage

    A power interruption took down the VIC datacentre this morning. Unfortunately I didn't spot it before I left for work, so I was unable to restart the systems until lunch. Sorry for the downtime. I think we've used up about half of this year's downtime allowance for the site since moving (the target is 99.5% uptime).
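
    For reference, a 99.5% uptime target works out to a yearly downtime allowance of roughly 44 hours, so half of it is about 22 hours. A minimal sketch of that arithmetic, assuming a 365-day year (the function name is only illustrative):

```python
HOURS_PER_YEAR = 365 * 24  # 8760 hours, ignoring leap years (assumption)


def downtime_budget_hours(uptime_target, period_hours=HOURS_PER_YEAR):
    """Hours of downtime allowed in the period for a given uptime fraction."""
    return (1.0 - uptime_target) * period_hours


budget = downtime_budget_hours(0.995)
print(f"Yearly allowance: {budget:.1f} h")      # Yearly allowance: 43.8 h
print(f"Half of it:       {budget / 2:.1f} h")  # Half of it:       21.9 h
```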

    This was caused by an uninterruptible power supply (UPS) that has a minor fault: it is not producing a high enough voltage on its output (210 V instead of 240 V). As a result, a short brownout can sometimes cause the server to stall and panic-stop.

    It's a known problem, and the faulty UPS is due to be replaced in the near future.
    "Blue Meanie" 2007 Aurora Blue MPS 3 - 18x7.5+48 Enkei RPF1 - 225/45R18 - 3.5" ETS TMIC - CPE stg 2 mount - HKS/CPE BPV - 2XS inlet - 2XS short shift - 2XS turbo manifold - Hypertech tune - Leather/Aluminium handbrake - Momo shifty knob - 7" touchscreen - JDM Mazda Retractable dashtop screen assembly - PC based GPS and instrumentation - 36AH reserve battery and C-TEK isolator - TEIN Street Advanced coilovers 1" drop - Superpro bushings - 220Kw/410Nm.

    "Lipstick" 2013 Velocity Red MPS 3 - 225/40R18 Federal RS-RR - CPE TMIC - CPE stg 2 mount - COBB Stage 1 98 octane tune - COBB shifty knob - 2XS short shift.

  2. #2
    Dog
    Dog is offline Senior Member
    Join Date
    Sep 2016
    Posts
    170

    You know the saying...."shit happens"....?

  3. #3
    Nexus is offline Administrator

    Know it well. It usually isn't well received in Info Tech circles.
    We are likely to have occasional interruptions like this until that UPS is replaced. It turns out it isn't faulty - it does scale the output up to 240V under load - but I don't like its design, so I won't be buying the same brand again. It'll do for light loads, but obviously it can't hack what it's currently supporting, even though that's only 30% load at most.

    There will be some more brief outages as I move stuff around, re-cable and perform maintenance. Things are likely to settle over the next few months. It's unlikely I'll manage 99.5% uptime for ozmps this year, but I will aim for it all the same.

  4. #4
    Nexus is offline Administrator

    Another outage this afternoon, from about 2:30 till 6:30 or so. I expect there was a significant power fluctuation on the Bellarine Peninsula: we lost power for a second at my work office building in Geelong, and I suspect it was out for several minutes at the datacentre. I couldn't do anything about it till I finished work. Three of the four UPSes gave out this time. If the faulting one had held out we may have stayed online.

    I'll be making this thread a sticky, along with another one for planned outages.

  5. #5
    Nexus is offline Administrator

    Some poor performance was noted today and late last night. At this stage I don't expect it to be related to the router kernel updates that were performed last night, but I may test that theory if I decide it's warranted. My current guess is that an external web service integrated with our site is having performance issues, which is making for some slow page loads.

  6. #6
    Nexus is offline Administrator

    Internet outage in Geelong and on the Bellarine Peninsula impacting the VIC datacentre, from 11:50PM Sat to 12:20AM Sun. Now back online.

  7. #7
    Nexus is offline Administrator

    Unplanned outage today, apologies. The site was taken offline as a precaution due to an unexpected change in the behaviour of the administration tools. This type of malfunction is a red flag for admins, hence the site was taken offline to be checked. After investigating and researching, I've not identified a hack, and I have reset the affected portion of the site software, which resolved the problem. Documentation suggests that this behaviour is not unheard of for this software, although I still see it as a red flag.

    It could also have been induced by a bug or admin error.

    For the record, in addition to periodic backups I keep snapshots of the ozmps site that are readily available. I can revert the entire site to a snapshot in seconds. In this case the most recent snapshot was from a bit over a week ago. In the event that something is compromised, the quickest recovery path is to roll back to a previous clean snapshot. While the forum was closed today, the red-flagged system state was frozen while the site ran off the previous snapshot.

    Usually the snapshots are a little more frequent than that. The thing to note is that if I revert to a snapshot and make the site live, everything entered into the site between the snapshot and the current time is unavailable and potentially lost. It can be restored from backup too, but that is a more lengthy and annoying process.
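
    To illustrate the trade-off, here is a rough sketch of which posts would go missing after a rollback. The timestamps and the function name are made up for the example and are not taken from the actual site or its software:

```python
from datetime import datetime


def posts_lost_on_rollback(post_times, snapshot_time):
    """Return the posts made after the snapshot, i.e. those a rollback would drop."""
    return [t for t in post_times if t > snapshot_time]


# Hypothetical data: a snapshot taken a bit over a week before the incident.
snapshot = datetime(2017, 6, 16, 9, 0)
posts = [
    datetime(2017, 6, 15, 20, 0),   # before the snapshot: survives the rollback
    datetime(2017, 6, 18, 12, 30),  # after the snapshot: needs restoring from backup
    datetime(2017, 6, 24, 8, 15),   # after the snapshot: needs restoring from backup
]

lost = posts_lost_on_rollback(posts, snapshot)
print(len(lost))  # 2
```

    The less frequent the snapshots, the larger that list gets, which is why restoring the gap from backup is the slower second step.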
    Last edited by Nexus; 24-06-2017 at 08:48 PM.

  8. #8
    Nexus is offline Administrator

    Unplanned outage today, from about 4:30PM through to 8:30PM AEST: the Toowoomba datacentre was taken offline by an upstream provider fault. This was a minor outage for ozmps; around 50% of users may have noticed some page load failures and possibly "source unavailable" errors. I anticipate that most of the failures were fairly quickly replaced by successful connections.

    I'd appreciate feedback from anyone who saw fault behaviour. ozmps is published via two public IP addresses. When one of the two IPs is down, as in this case, browsers are expected to fall back to the one that is still online. Here the online site was the one where the source ozmps server lives, so the expected impact on users was fairly minimal, and at least half of users are expected not to have experienced an outage at all. If you were in the half that might have noticed, I'd like to know how the system behaved for you.

  9. #9
    Nexus is offline Administrator

    Internet faults similar to yesterday evening's persist. At this stage they don't seem to be preventing access to ozmps, as the VIC datacentre appears stable. The QLD datacentre, however, is flapping up and down like a yoyo. It appears to be a widespread internet disruption.

  10. #10
    Nexus is offline Administrator

    The VIC datacentre is having some internet reliability issues. They appear to be related to rainy weather here on the Bellarine. Another example of why the copper NBN network is rubbish compared to the original all-fibre plan.
    QLD operates on fibre optic, VIC on copper.
    The fault is causing short dropouts when accessing ozmps. You may see timeout or "source unreachable" messages while browsing. Generally the site will be accessible again after a short while.

  11. #11
    Nexus is offline Administrator

    Confirmed tonight that the short outages are weather related. We just had a few minutes of downtime coinciding with rain. I've noticed the pattern and it is very convincing. I expect the signal quality on the cable varies with moisture and humidity, and when it's wet the signal quality drops below the thresholds established when the link was created, so the connection drops and reconnects. The last outage lasted about three minutes. ozmps is technically offline while this happens, and it affects nearly all ozmps users who try to open a page while the link is down.

  12. #12
    Nexus is offline Administrator

    Overnight outage from the early hours of this morning, about 3:00AM through to about 7:00AM AEST. It was likely caused by a power fluctuation. The UPS has been replaced and reliability has been much better since, however the PCI-ex chassis is very sensitive to power fluctuations, and even a UPS switching from battery back to mains could potentially have induced a system lockup. Long term I'll consider a double-conversion UPS, which doesn't fluctuate the output when switching to battery ($$$$). Before that I'll try out independent UPSes for the PCI-ex chassis and the rack server, and daisy-chaining the second UPS if it behaves OK in testing. The fault is unlikely to occur more than three or four times a year. Beyond that, a new server will be added at some point and some of the web services migrated to it, with the PCI-ex chassis and its associated server becoming more dedicated to the recording studio.
