Thread: Unplanned outage

  1. #1
    Join Date
    Nov 2008
    Location
    Craigieburn
    Age
    50
    Posts
    1,642
    Blog Entries
    1

    Unplanned outage

    A power interruption took down the VIC datacentre this morning. Unfortunately I didn't spot it before I left for work, so I was unable to restart the systems until lunch. Sorry for the downtime. I think we've used up about half of this year's downtime allowance for the site since moving (the target is 99.5% uptime).
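For a sense of scale, the 99.5% target can be turned into a yearly downtime allowance. This is only a rough sketch: it assumes a 365-day year and that uptime is measured over the whole year, neither of which is stated above, and the function name is made up for illustration.

```python
HOURS_PER_YEAR = 365 * 24  # assumption: non-leap year, measured over a full year


def downtime_budget_hours(uptime_target: float, period_hours: float = HOURS_PER_YEAR) -> float:
    """Hours of downtime permitted by an uptime target given as a fraction (e.g. 0.995)."""
    if not 0.0 <= uptime_target <= 1.0:
        raise ValueError("uptime_target must be between 0 and 1")
    return (1.0 - uptime_target) * period_hours


budget = downtime_budget_hours(0.995)
print(f"Yearly allowance at 99.5%: {budget:.1f} hours")      # 43.8 hours
print(f"Half of that allowance:    {budget / 2:.1f} hours")  # 21.9 hours
```

On those assumptions, an outage lasting from early morning until lunch plausibly accounts for a noticeable share of the yearly allowance.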

    This was caused by an uninterruptible power supply (UPS) with a minor fault: it is not producing a high enough voltage on the output (210 V instead of 240 V). As a result, a short brownout can sometimes cause the server to stall and panic-stop.

    It's a known problem, and the faulty UPS is due to be replaced in the near future.
    "Blue Meanie" 2007 Aurora Blue MPS 3 - 18x8.5+44 SSR GTX01 - 235/40R18 Michelin PS5 - 3.5" ETS TMIC - CPE stg 2 mount - HKS/CPE BPV - 2XS inlet - 2XS short shift - Corksport turbo manifold - HT 98 octane tune - Leather/Aluminium handbrake - Momo shifty knob - 7" touchscreen - JDM Mazda Retractable dashtop screen assembly - Bespoke Raspberry Pi Android based GPS/Carplay and instrumentation - 36AH reserve battery and C-TEK isolator - TEIN Street Advanced coilovers 1" drop - Superpro bushings - 220Kw/410Nm.

    "Lipstick" 2013 Velocity Red MPS 3 - 18x7.5+48 Enkei RPF1 -225/40R18 Federal RS-RR - CPE TMIC - COBB inlet - CPE stg 2 mount - COBB Stage 1 98 octane tune - COBB shifty knob - 2XS short shift - 2XS turbo manifold.

  2. #2

    You know the saying...."shit happens"....?

  3. #3
    Join Date
    Nov 2008
    Location
    Craigieburn
    Age
    50
    Posts
    1,642
    Blog Entries
    1

    Know it well. It usually isn't well received in Info Tech circles.
    We are likely to have occasional interruptions like this until that UPS is replaced. It turns out it isn't faulty - it does scale the output up to 240V under load - but I don't like the design, so I won't be buying the same brand again. It'll do for light loads, but obviously it can't handle what it's currently supporting, even though that's only 30% load at most.

    There will be some more brief outages as I move stuff around, re-cable and perform maintenance. Things are likely to settle over the next few months. I'm unlikely to manage 99.5% uptime for ozmps this year, but will aim for it all the same.

  4. #4
    Join Date
    Nov 2008
    Location
    Craigieburn
    Age
    50
    Posts
    1,642
    Blog Entries
    1

    Another outage this afternoon, from about 2:30 till 6:30 or so. I expect there was a significant power fluctuation on the Bellarine Peninsula - we lost power for a second at my office building in Geelong, and I suspect it was out for several minutes at the datacentre. I couldn't do anything about it till I finished work. Three of the four UPSes gave out this time. If the faulting one had held out we may have stayed online.

    I'll be making this thread sticky, along with another thread for planned outages.

  5. #5
    Join Date
    Nov 2008
    Location
    Craigieburn
    Age
    50
    Posts
    1,642
    Blog Entries
    1

    Some poor performance was noted today and late last night. At this stage it's not expected to be related to the router kernel updates that were performed last night, but I may test that theory if I decide it's warranted. My current guess is that an external web service integrated with our site is having performance issues, which is making for some slow page loads.

  6. #6
    Join Date
    Nov 2008
    Location
    Craigieburn
    Age
    50
    Posts
    1,642
    Blog Entries
    1

    Internet outage in Geelong and Bellarine peninsula impacting VIC datacentre. 11:50PM Sat --> 12:20AM Sun. Now back online.

  7. #7
    Join Date
    Nov 2008
    Location
    Craigieburn
    Age
    50
    Posts
    1,642
    Blog Entries
    1

    Unplanned outage today, apologies. The site was taken offline as a precaution due to an unexpected change in behaviour of the administration tools. This type of malfunction is a red flag for admins, hence the site was taken offline to be checked. After investigating and researching, I've not identified a hack and have reset the affected portion of the site software, which resolved the problem. Documentation suggests that this behaviour is not unheard of for this software, although I still see it as a red flag.

    It could also have been induced by a bug or admin error.

    For the record, in addition to periodic backups I keep snapshots of the ozmps site that are readily available. I can revert the entire site to a snapshot in seconds. In this case the most recent snapshot was a bit over a week old. In the event that something is compromised, the quickest recovery path is to roll back to a previous clean snapshot. While the forum was closed today, the red-flagged system state was frozen and the site ran off the previous snapshot.

    Usually the snapshots are taken a little more often than that. The thing to note, however, is that if I revert to a snapshot and make the site live, everything entered into the site between the snapshot and the current time is unavailable and potentially lost. It can be restored from backup too, but that is a lengthier and more annoying process.
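The trade-off described above (fast rollback, but everything entered since the snapshot goes missing) can be sketched as a small calculation. This is an illustration only: the function names, timestamps and the idea of tracking entries by time are assumptions, not a description of how the site's snapshot tooling actually works.

```python
from datetime import datetime, timedelta
from typing import Iterable, List


def rollback_loss_window(snapshot_time: datetime, revert_time: datetime) -> timedelta:
    """Span of site activity that is missing after reverting to a snapshot."""
    if revert_time < snapshot_time:
        raise ValueError("the snapshot must be older than the revert time")
    return revert_time - snapshot_time


def entries_at_risk(entry_times: Iterable[datetime],
                    snapshot_time: datetime,
                    revert_time: datetime) -> List[datetime]:
    """Entries made after the snapshot; these would need restoring from backup."""
    return [t for t in entry_times if snapshot_time < t <= revert_time]


# Hypothetical timestamps: a snapshot a bit over a week older than the revert.
snapshot = datetime(2024, 1, 1, 3, 0)
revert = datetime(2024, 1, 9, 15, 0)
posts = [
    datetime(2023, 12, 31, 22, 0),  # before the snapshot, so preserved
    datetime(2024, 1, 2, 9, 30),    # after the snapshot, so missing after rollback
    datetime(2024, 1, 9, 14, 0),    # after the snapshot, so missing after rollback
]

print(rollback_loss_window(snapshot, revert))         # 8 days, 12:00:00
print(len(entries_at_risk(posts, snapshot, revert)))  # 2
```

The more often snapshots are taken, the smaller that window gets, which is why a week-old snapshot is on the long side.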
    Last edited by Nexus; 24-06-2017 at 08:48 PM.

  8. #8
    Join Date
    Nov 2008
    Location
    Craigieburn
    Age
    50
    Posts
    1,642
    Blog Entries
    1

    Unplanned outage today - from about 4:30PM AEST through to 8:30PM; the Toowoomba datacentre was taken offline by an upstream provider fault. This was a minor outage for ozmps and 50% of users may have noticed some page load failures and possibly source unavailable failures. I anticipate that most of the failures should have been fairly quickly replaced by successful connections.

    I'd appreciate feedback from anyone who saw fault behaviour. ozmps is published via two public IP addresses. With one of the two IPs down, it's anticipated that users' browsers will fall back to the one that is online. In this case the online site was where the source ozmps server lives, so the expected impact on users was fairly minimal, and at least half of users are expected not to have experienced an outage at all. If you were in the half that might have noticed, I'd like to know how the system behaved for you.

  9. #9
    Join Date
    Nov 2008
    Location
    Craigieburn
    Age
    50
    Posts
    1,642
    Blog Entries
    1

    Internet faults similar to yesterday evening persist. At this stage it doesn't seem to be preventing access to ozmps, as the VIC datacenter appears stable. QLD datacenter however is flapping up and down like a yoyo. Appears to be a widespread internet disruption.

  10. #10
    Join Date
    Nov 2008
    Location
    Craigieburn
    Age
    50
    Posts
    1,642
    Blog Entries
    1

    VIC datacenter is having some internet reliability issues. Appears to be related to rainy weather here on the Bellarine. Another example of why the copper NBN network is rubbish compared to the original all fibre plan.
    QLD operates on fibre optic, VIC on copper.
    Fault is causing short dropouts accessing ozmps. You may see timeout messages or source unreachable messages while browsing. Generally the site will be accessible again after a short while.

  11. #11
    Join Date
    Nov 2008
    Location
    Craigieburn
    Age
    50
    Posts
    1,642
    Blog Entries
    1

    Confirmed tonight that the short outages are weather related. Just had a few minutes of downtime coinciding with rain. I've noticed the pattern and it is very convincing. I expect the signal quality on the cable varies with moisture and humidity, and when it's wet the signal quality drops below the thresholds established when the link was created, so the connection drops and reconnects. The last outage lasted about three minutes. ozmps is technically offline while this happens, and it affects nearly all ozmps users who try to open a page while the link is down.

  12. #12
    Join Date
    Nov 2008
    Location
    Craigieburn
    Age
    50
    Posts
    1,642
    Blog Entries
    1

    Overnight outage from the early hours of the morning, ~3:00AM through to about 7:00AM AEST. Likely caused by a power fluctuation. The UPS has been replaced and reliability has been much better since, however the PCI-ex chassis is very sensitive to power fluctuations, and even a UPS switching from battery back to mains could potentially have induced a system lockup. Long term I'll consider a double-conversion UPS, which doesn't fluctuate the output when switching to battery ($$$$). Before that I'll try out independent UPSes for the PCI-ex chassis and the rack server, and daisy chaining the second UPS if it behaves OK in testing. The fault is unlikely to occur more than three or four times a year. Beyond that, a new server will be added at some point and some of the web services migrated to it, with the PCI-ex chassis and associated server becoming more dedicated to the recording studio.

  13. #13
    Join Date
    Nov 2008
    Location
    Craigieburn
    Age
    50
    Posts
    1,642
    Blog Entries
    1

    Systems are now back online after a VIC power outage lasting approximately 30min.

  14. #14
    Join Date
    Nov 2008
    Location
    Craigieburn
    Age
    50
    Posts
    1,642
    Blog Entries
    1

    Discovered today that one of the two reverse proxy servers was not responding. It's been restarted and is up now. It doesn't appear to have had any noticeable effect; I've seen no problem accessing the website. The proxies are redundant, and if your computer can't connect through the first one it tries, it's expected to try the other. Possibly some users have seen unusually slow page load times while their computer tries one and then the other before getting a response.
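Because the redundancy hides a dead proxy from normal browsing, a fault like this is easy to miss until someone looks. A periodic check along the following lines would flag it sooner. This is a minimal sketch, not the monitoring actually in use: the proxy names and URLs are placeholders, and treating any HTTP status below 500 as "responding" is an assumption.

```python
import http.client
import urllib.error
import urllib.request
from typing import Callable, Dict, List


def http_probe(url: str, timeout: float = 5.0) -> bool:
    """Return True if the URL answers with an HTTP status below 500."""
    try:
        with urllib.request.urlopen(url, timeout=timeout) as resp:
            return resp.status < 500
    except urllib.error.HTTPError as exc:
        # The server answered, just with an error status (e.g. 404 or 503).
        return exc.code < 500
    except (OSError, http.client.HTTPException):
        # Timeouts, refused connections, DNS failures, broken responses.
        return False


def unresponsive_proxies(proxies: Dict[str, str],
                         probe: Callable[[str], bool] = http_probe) -> List[str]:
    """Names of the proxies whose probe failed."""
    return [name for name, url in proxies.items() if not probe(url)]


# Placeholder endpoints; substitute the real per-proxy addresses.
PROXIES = {
    "vic": "https://proxy-vic.example/",
    "qld": "https://proxy-qld.example/",
}
```

Calling `unresponsive_proxies(PROXIES)` from a scheduled job and alerting when the list is non-empty would be one way to use it. Each proxy has to be probed by its own address, since probing the shared site name would be masked by the same failover that hides the fault from users.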

  15. #15
    Join Date
    Nov 2008
    Location
    Craigieburn
    Age
    50
    Posts
    1,642
    Blog Entries
    1

    Second event similar to the above, this time due to an upstream provider internet outage in QLD. No known impact on ozmpsclub.com browsing. Again, some users may have seen unusually slow page load times while their computer tries one proxy and then the other before getting a response.
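The "tries one proxy then the other" behaviour can be sketched roughly as below. This is a simplified illustration under stated assumptions: real browsers have their own retry and timeout logic, the function names are invented, and the addresses are reserved documentation addresses rather than the real proxies. The connector is injectable so the example runs without touching the network.

```python
import socket
from typing import Any, Callable, Iterable, Optional, Tuple

Address = Tuple[str, int]


def tcp_connect(addr: Address, timeout: float) -> socket.socket:
    """Default connector: open a plain TCP connection."""
    return socket.create_connection(addr, timeout=timeout)


def connect_with_fallback(addresses: Iterable[Address],
                          timeout: float = 3.0,
                          connect: Callable[[Address, float], Any] = tcp_connect):
    """Try each address in order and return (address, connection) for the first success."""
    last_error: Optional[OSError] = None
    for addr in addresses:
        try:
            return addr, connect(addr, timeout)
        except OSError as exc:
            # A dead proxy costs up to `timeout` seconds before the next is tried,
            # which is where the slow page loads come from.
            last_error = exc
    raise ConnectionError(f"all addresses failed: {last_error}")


def fake_connect(addr: Address, timeout: float) -> str:
    """Stand-in connector simulating the first proxy being unreachable."""
    if addr[0] == "192.0.2.1":
        raise TimeoutError("simulated outage")
    return "connected"


used, _ = connect_with_fallback([("192.0.2.1", 443), ("198.51.100.1", 443)],
                                connect=fake_connect)
print(used)  # ('198.51.100.1', 443)
```

A client that happens to try the healthy address first sees no delay at all, which matches the expectation that only some users notice anything.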

  16. #16
    Join Date
    Nov 2008
    Location
    Craigieburn
    Age
    50
    Posts
    1,642
    Blog Entries
    1

    Had an obscure problem today and it's not very clear what caused it. It may have been a problem with the configuration of the new 'ozmps.club' website, or an ISP fault.
    Affected users were unable to browse the forum.
    An ISP fault is suspected. Browsing was affected for up to 75% of users. I'm leaning towards an ISP fault over a config error as I was unable to fault the site myself, but could see fewer users online than usual.
    While debugging I did get one failed test out of multiple tests. The failed test suggested the new 'ozmps.club' website had affected the site, however it's not clear if that was the cause or not. I haven't found a problem with the config yet.
    I've rolled back the changes for the new website for now and will investigate further over the weekend.
    Browsing is now normal and the number of online users has climbed to usual numbers, so problem appears resolved.

    If you saw a problem loading the website today or last night, any specific details you can provide - such as "I saw a default Ubuntu web page" or "I saw a 404 error" - will be helpful.

  17. #17
    Join Date
    Nov 2008
    Location
    Craigieburn
    Age
    50
    Posts
    1,642
    Blog Entries
    1

    Server restarted to recover from a performance issue - approximately 5 min outage due to web server performance fault.
    I have also been checking backend systems, and corrected a fault with PayPal and subscriptions. Testing isn't complete yet. VIP subscriptions due to renew since August have been failing.
    I have also found what appears to be a minor database fault.
    'ozmps.club' is under construction and there'll be another test deploy of it this week.

  18. #18
    Join Date
    Nov 2008
    Location
    Craigieburn
    Age
    50
    Posts
    1,642
    Blog Entries
    1

    Brief outage earlier. Not sure what caused it - don't know exactly how long it lasted for. Site was partially functional. Don't know how many users affected. Server restarted, issue appears resolved.

  19. #19
    Join Date
    Nov 2008
    Location
    Craigieburn
    Age
    50
    Posts
    1,642
    Blog Entries
    1

    Had an outage of more than two hours this morning, from approx 9:00AM --> 11:30AM, due to a widespread major fault with an upstream internet provider. The outage impacted both the Queensland and Victoria server sites, resulting in all services being unreachable.

  20. #20
    Join Date
    Nov 2008
    Location
    Craigieburn
    Age
    50
    Posts
    1,642
    Blog Entries
    1

    Another internet outage from approx 2:45 --> 4:15PM - upstream provider outage.
