Thread: Unplanned outage

  1. #61
    Join Date
    Nov 2008
    Location
    Craigieburn
    Age
    50
    Posts
    1,642
    Blog Entries
    1

    WHOOPS!

    Significant outage tonight due to filesystem corruption. We lost two or three of the latest ozmps posts, apologies. I had to repair the operating system and then restore a day-old database.

    What happened?

    Well, I investigated a memory issue Saturday evening. I shut down, tested memory configurations, made the intended change and brought the server back online. No problems were apparent then, except that after the maintenance the array controller (server storage, like your laptop hard disk) reported a battery/capacitor failure; its cable had detached. The fault occurred while I was reconnecting the battery/capacitor.

    What does the array battery/capacitor do?

    It's there to protect the hard disk storage in the event of a hard crash like an uncontrolled power outage.
    The array controller spreads the storage across a whole bunch of hard disks (unlike your laptop) and applies "parity" so the storage can survive hard disks dying.
    In our case up to two hard disks can die and be replaced without losing any access, shutting down or damaging data.
    The array controller has several gigabytes of memory on it. When the server saves data to disk, it's sent to the memory, not directly to the disks.
    So if a power outage wiped those few GB of memory before the data was written to the disks, that data would be lost.
    The battery saves the memory contents during a power outage and the controller writes the data to disk when the system powers back on.
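
To make the "parity" idea above concrete, here is a minimal sketch of single XOR parity, the simplest form of the technique. It is an illustration only: the block contents and function names are made up, and an array that survives two dead disks (as described above) needs a second, independent parity calculation that this sketch does not attempt.

```python
from functools import reduce


def xor_parity(blocks):
    """Return the byte-wise XOR of equally sized blocks."""
    return bytes(reduce(lambda a, b: a ^ b, column) for column in zip(*blocks))


def rebuild_missing(surviving_blocks, parity):
    """Recover the single missing data block from the survivors plus parity."""
    return xor_parity(list(surviving_blocks) + [parity])


# Hypothetical stripe: three data disks and one parity disk.
data = [b"ozmp", b"s fo", b"rum!"]
parity = xor_parity(data)

# Pretend the second disk died; rebuild its block from the other two and parity.
recovered = rebuild_missing([data[0], data[2]], parity)
print(recovered == data[1])
```

Because XOR is its own inverse, the same function both builds the parity and rebuilds a lost block, which is why one dead disk costs no data in this scheme.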

    Yeah, so...how did that break shit?

    Well, everything was fine on shutdown. I investigated the battery/capacitor fault and found the cable was unplugged. Easy fix, yeah? But the cable is impossible to plug in while the controller is installed, so I removed the controller, attached the cable, and reinstalled it. This really should have been just fine, but when the battery was reattached to the memory and the card reinstalled, it appears to have scrambled the cache memory, which should have been blank. The controller then behaved as though there had been a power outage, which means that when the server was restarted it wrote the scrambled data to disk, and that corrupted the array filesystems.
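
The failure mode described above can be sketched with a toy model of a write-back cache. Everything here is an assumption made for illustration (the class, the block numbers, the checksum step); it does not describe how the actual controller's firmware validates its cache, which the post does not say.

```python
import zlib


class ToyController:
    """Toy write-back cache model; not any real controller's firmware."""

    def __init__(self):
        self.disk = {}    # block number -> bytes already safe on disk
        self.cache = {}   # block number -> (bytes, crc32) waiting to be flushed

    def write(self, block, payload):
        # Writes land in cache first, with a checksum taken at write time.
        self.cache[block] = (payload, zlib.crc32(payload))

    def power_on_replay(self, verify):
        """Flush whatever the cache holds, as the controller does after a power loss."""
        rejected = []
        for block, (payload, crc) in sorted(self.cache.items()):
            if verify and zlib.crc32(payload) != crc:
                rejected.append(block)   # refuse to write damaged data
                continue
            self.disk[block] = payload
        self.cache.clear()
        return rejected


def scrambled_replay(verify):
    ctl = ToyController()
    ctl.disk[7] = b"good filesystem metadata"
    ctl.write(7, b"newer filesystem metadata")
    # Simulate the cached bytes being damaged while the card was out of the server.
    payload, crc = ctl.cache[7]
    ctl.cache[7] = (bytes([payload[0] ^ 0xFF]) + payload[1:], crc)
    rejected = ctl.power_on_replay(verify)
    return ctl.disk[7], rejected


print(scrambled_replay(verify=False))  # damaged bytes reach the disk
print(scrambled_replay(verify=True))   # damaged block is rejected instead
```

In the unverified run the replay overwrites good data with garbage, which is the same shape as the corruption described in the post.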

    It did significant damage, taking out 7 of the 13 operating systems running on this server. I'll probably be sorting out the fallout for a week.

    AARGH. That's easily the worst fault we've had since 2016.
    Last edited by Nexus; 13-09-2021 at 01:49 AM.
    "Blue Meanie" 2007 Aurora Blue MPS 3 - 18x8.5+44 SSR GTX01 - 235/40R18 Michelin PS5 - 3.5" ETS TMIC - CPE stg 2 mount - HKS/CPE BPV - 2XS inlet - 2XS short shift - Corksport turbo manifold - HT 98 octane tune - Leather/Aluminium handbrake - Momo shifty knob - 7" touchscreen - JDM Mazda Retractable dashtop screen assembly - Bespoke Raspberry Pi Android based GPS/Carplay and instrumentation - 36AH reserve battery and C-TEK isolator - TEIN Street Advanced coilovers 1" drop - Superpro bushings - 220Kw/410Nm.

    "Lipstick" 2013 Velocity Red MPS 3 - 18x7.5+48 Enkei RPF1 -225/40R18 Federal RS-RR - CPE TMIC - COBB inlet - CPE stg 2 mount - COBB Stage 1 98 octane tune - COBB shifty knob - 2XS short shift - 2XS turbo manifold.

  2. #62

    Unplanned outage from approx 6:30PM AEDT 22/12/21 through to 2:00PM 23/12/21 (today) due to a stalled storage controller; the system required a hard reset.
    Minor fault apparent on restart: the ozmps website email service fails to start, so any email reminder the site tries to send you won't go out.
    I will investigate the service failures tonight. It doesn't seem urgent, so it may wait until after Xmas.

  3. #63

    Major fault found with email systems today resulting in a large backlog of emails being sent.
    Apologies if you received unexpected ozmps messages such as birthday greetings at the wrong time.
    This fault affected a number of features on the website.

    The fault is not completely cleared yet. To avoid sending any more email that might be perceived as spam, I may purge any remaining email queues.
    I will update again once I am confident the system is fully functional.

  4. #64

    OzMPS website email functions appear to be working again. If you have any problems, please let us know either by private message or on Facebook.
    While investigating the email fault I did find some abuse, and because the backlog attempted to send all at once, we were briefly blacklisted this afternoon.
    Email security has been tightened by turning off some infrequently used features.
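
One common way to keep a backlog from going out in a single burst is to release it in small timed batches. The sketch below only plans the batches; the function name, the limit of three per minute, and the message labels are all hypothetical, and it is not a description of how the site's mail system was actually fixed.

```python
def schedule_backlog(messages, max_per_minute):
    """Split a backlog into per-minute batches instead of one burst."""
    if max_per_minute < 1:
        raise ValueError("max_per_minute must be at least 1")
    return [
        (minute, messages[start:start + max_per_minute])
        for minute, start in enumerate(range(0, len(messages), max_per_minute))
    ]


backlog = [f"reminder-{n}" for n in range(7)]   # hypothetical queued mail
for minute, batch in schedule_backlog(backlog, max_per_minute=3):
    print(f"minute {minute}: send {batch}")
```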

    Thanks to rumblebee for notifying me of the problem this morning. Free VIP for you


    Originally posted by Nexus:
    Major fault found with email systems today resulting in a large backlog of emails being sent.
    Apologies if you received unexpected ozmps messages such as birthday greetings at the wrong time.
    This fault affected a number of features on the website.

    The fault is not completely cleared yet. To avoid sending any more email that might be perceived as spam, I may purge any remaining email queues.
    I will update again once I am confident the system is fully functional.

  5. #65
    Slowness

    Hi everyone, apologies for some slowness over the last 24 hours or so due to a failed gateway in Queensland. I've adjusted DNS so that all traffic comes in via Victoria, and once this takes effect performance will return to normal. The timing will vary from user to user, up to 12 hours from now.

  6. #66

    Whoops! Sorry everyone, this is entirely on me. The changes made last week were not entirely effective: I only changed one of two DNS entries, and as a result you would have been experiencing slow access to the website about 50% of the time. I should have double-checked 12 hours after making the changes. I didn't notice because on each occasion I was browsing, the random choice of network route gave me the fast route. Today it pulled a go-slow on me and I realized my mistake. The problem should now be resolved, but once again it may take a few hours to become effective for everyone.
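
A check along the following lines can catch a half-finished DNS change without relying on which route a browser happens to pick. It is a sketch under assumptions: it supposes the two entries were address records for the same name, and the addresses shown come from the reserved documentation ranges, not from the real site.

```python
import socket


def stale_addresses(resolved, expected):
    """Return resolved addresses that are not in the expected set."""
    return sorted(set(resolved) - set(expected))


def resolve_all(hostname):
    """Return every IPv4 address the local resolver hands back for hostname."""
    infos = socket.getaddrinfo(
        hostname, 80, family=socket.AF_INET, type=socket.SOCK_STREAM
    )
    return sorted({info[4][0] for info in infos})


# Hypothetical addresses from the documentation ranges (RFC 5737).
expected = {"192.0.2.10"}                      # the gateway that should remain
resolved = ["192.0.2.10", "198.51.100.20"]     # what a lookup still returned
print(stale_addresses(resolved, expected))     # ['198.51.100.20']
```

Against a live name the two helpers would be combined as `stale_addresses(resolve_all(hostname), expected)`; an empty list means every returned address is one that was meant to stay.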

    Originally posted by Nexus:
    Hi everyone, apologies for some slowness over the last 24 hours or so due to a failed gateway in Queensland. I've adjusted DNS so that all traffic comes in via Victoria, and once this takes effect performance will return to normal. The timing will vary from user to user, up to 12 hours from now.

  7. #67

    Significant power outage here in VIC resulted in a 36-hour outage of the website; the server needed some attention to bring everything back online.
    Back online today, 14/12.

  8. #68

    Hi everyone, we've been offline for 24 hours after server maintenance. It should have been an hour or two, but the ozmps server was in a paused snapshot state, and the auto-start behavior of the updated hypervisor doesn't seem to start virtual machines in this state. The network monitoring software was also disrupted, so I didn't pick it up until I performed maintenance on the system monitoring this morning.

    So the ozmps server failed to start, and monitoring didn't alert me until today.
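
When the monitoring itself goes quiet, a heartbeat check run from somewhere independent is one way to notice. The sketch below shows the core comparison only; the service names, timestamps, and 15-minute threshold are invented for the example and are not taken from the actual setup.

```python
from datetime import datetime, timedelta


def overdue(last_seen, now, max_silence):
    """Return names whose most recent heartbeat is older than max_silence."""
    return sorted(
        name for name, seen in last_seen.items() if now - seen > max_silence
    )


now = datetime(2024, 1, 1, 9, 0)   # arbitrary example time
last_seen = {
    "web-vm": now - timedelta(hours=24),       # VM left paused after maintenance
    "monitoring": now - timedelta(hours=23),   # the monitor itself went quiet
    "dns": now - timedelta(minutes=2),
}
print(overdue(last_seen, now, max_silence=timedelta(minutes=15)))
```

The useful property is that silence counts as failure, so a monitor that stops reporting shows up in the same list as the services it was supposed to watch.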

  9. #69

    12/02/24 DNS outage.
    Major outage that prevented access to all domains hosted on lateralcognition.net, including ozmpsclub.com.
    Fault traced to the private master DNS server, which failed an update, went offline, and stopped talking to the publicly accessible slave DNS servers.

    While this fault was active, all domain name records timed out and expired, effectively removing the domain(s) from the internet.

    Service has been restored this morning and over the next few hours behavior should return to normal as DNS records propagate.
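
One plausible reading of the expiry described above is the standard SOA timer behaviour: a secondary that cannot reach its primary keeps answering until the zone's expire timer runs out, then stops. The sketch below models just that decision. The timer values are hypothetical, since the post does not give the real ones.

```python
def secondary_state(seconds_since_refresh, refresh, expire):
    """Classify a secondary's copy of a zone from SOA-style timers (seconds)."""
    if seconds_since_refresh >= expire:
        return "expired"   # secondary stops answering for the zone
    if seconds_since_refresh >= refresh:
        return "stale"     # still answering, but a refresh is overdue
    return "fresh"


# Hypothetical timers: refresh hourly, expire after one week.
REFRESH, EXPIRE = 3600, 7 * 24 * 3600
for hours in (0.5, 48, 200):
    print(hours, secondary_state(int(hours * 3600), REFRESH, EXPIRE))
```

The "stale" window is the time available to notice and repair the primary before the public servers give up on the zone.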
