Tuesday, September 28, 2010

SAS 5/iR FAIL!

The Dell E2K-UCS-51 (B), known simply as the UCS-51, is an inexpensive RAID 0 and 1 card for SAS drives. It had worked dutifully for a couple of years until our mission-critical CentOS 5.3 box froze with:
sd 0:1:0:0 rejecting I/O to offline device

I thought to myself that a hard drive I/O error would have said sda, not just sd. Surely the controller didn't go bad - I've never had a RAID controller die on me. Upon reboot, the kernel quickly blurted out:
mptbase: ioc0: ERROR - Diagnostic reset FAILED! (ffffffffh)
mptbase: ioc0: ERROR - didn't initialize properly (-1)
Since this is our mission-critical server, my heart fell to the floor. On the next boot I pressed CTRL-C to get into the card's RAID configuration and received this:
I/O card parity interrupt at 41A7:A382
So now the card's own BIOS is crashing. Great, did it corrupt my RAID 1 array?! Frantically, I purchased another card off eBay and drove there and back as quickly as possible. The new card worked. All I had to do was reactivate the array, which resynced the secondary hard drive automatically. CentOS boots fine.
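For anyone who wants an earlier warning than a frozen box, here is a minimal sketch of a script that scans kernel log text for the two messages quoted above. The patterns come straight from those messages; the function name and the idea of piping dmesg into it are my own assumptions, not anything provided by the mptbase driver or by Dell.

```python
import re
import sys

# Patterns copied from the kernel messages quoted in this post.
# The list and the function name are illustrative only.
PATTERNS = [
    re.compile(r"rejecting I/O to offline device"),
    re.compile(r"mptbase: ioc\d+: ERROR"),
]


def find_controller_errors(log_text):
    """Return the log lines that match any of the failure patterns."""
    return [
        line
        for line in log_text.splitlines()
        if any(p.search(line) for p in PATTERNS)
    ]


if __name__ == "__main__":
    # Read log text from standard input and print only the suspicious lines.
    for line in find_controller_errors(sys.stdin.read()):
        print(line)
```

Saved under a hypothetical name such as check_mpt.py, it could be run as `dmesg | python check_mpt.py`. It only matches the exact wording I saw, so other failure modes would need their own patterns.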
Although I have backups, I did a lot of praying over the weekend because putting a backup box into production is a bit harder than just swapping out a card. The Lord Jesus brought me through it.
-eric wood

Thursday, September 2, 2010

Dead AB Micrologix 1000

When an operator replaces a blown 2/10 amp fuse with a 20 amp one, eventually some device is going to have a bad day - in my case it was an Allen Bradley Micrologix 1000 PLC. I knew absolutely nothing about PLCs, and after repeated failed attempts to find a replacement from the now defunct OEM, I asked a local engineering firm for help. They quoted $8000 to enter about 100 rungs of ladder logic into a new PLC. I was desperate, but I wasn't willing to be taken advantage of. Regretfully, they declined my counteroffer of $1500.
So I purchased the same PLC off eBay for $99, determined to take a stab at programming it using AB's free version of RSLogix, called RSLogix Micro Starter Lite (v8.10). Even with the ladder logic printed in the back of the machine manual, I realized this was over my head. So I tried swapping the power supplies instead (the power supply is the bottom board). Yeah, it worked. Why didn't we think of that in the first place? The top board (where the firmware and logic live) was obviously spared from the (lightning?) damage. I went ahead and purchased a serial cable and downloaded the ladder logic from the PLC into an .rss file for safekeeping.
So why $8000? Does having an EE degree give you a license to rip people off? Turns out that even $1500 may be too cheap to do what I did for $99, even when the code is already provided in printed form. One PLC guy recently told me he got out of the automation field and into IT because liability insurance costs of $50,000 were prohibitive. Because PLCs typically start, stop, and run heavy equipment, insurance companies claim high risk exposure for personal injury. So he's sticking to the IT industry, where $500 liability costs are palatable.