The Dell E2K-UCS-51 (B), known simply as the UCS-51, is an inexpensive RAID 0 and 1 card for SAS drives. It had worked dutifully for a couple of years until our mission-critical CentOS 5.3 box froze with:
sd 0:1:0:0 rejecting I/O to offline device
I thought to myself that a hard drive I/O error would have said sda, not just sd. Surely the controller hadn't gone bad - I'd never had a RAID controller die on me. Upon reboot the kernel quickly blurted out:
mptbase: ioc0: ERROR - Diagnostic reset FAILED! (ffffffffh)
mptbase: ioc0: ERROR - didn't initialize properly (-1)
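For anyone who wants to watch their logs for the same messages, here is a rough Python sketch that pulls apart the lines quoted above. It assumes the four numbers after "sd" are the usual Linux SCSI address (host:channel:target:lun), and the function names parse_sd_line and is_mpt_error are made up for illustration; they are not part of any driver or tool.

```python
import re

# Matches a line like "sd 0:1:0:0 rejecting I/O to offline device".
# Assumption: the four numbers are host:channel:target:lun.
SCSI_ADDR = re.compile(r"^sd (\d+):(\d+):(\d+):(\d+):?\s+(.*)$")

# Matches mptbase error lines such as the two quoted above.
MPT_ERROR = re.compile(r"^mptbase: ioc\d+: ERROR\b")


def parse_sd_line(line):
    """Return (host, channel, target, lun, message), or None if no match."""
    m = SCSI_ADDR.match(line.strip())
    if m is None:
        return None
    host, channel, target, lun = (int(g) for g in m.groups()[:4])
    return host, channel, target, lun, m.group(5)


def is_mpt_error(line):
    """True if the line looks like an mptbase controller error."""
    return MPT_ERROR.match(line.strip()) is not None


if __name__ == "__main__":
    print(parse_sd_line("sd 0:1:0:0 rejecting I/O to offline device"))
    print(is_mpt_error("mptbase: ioc0: ERROR - Diagnostic reset FAILED! (ffffffffh)"))
```

The address only says which SCSI device the kernel gave up on. It does not say whether the disk or the controller is at fault, so treat a match as a reason to look closer rather than a diagnosis.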
Since this is our mission-critical server, my heart fell to the floor. On the next boot I pressed CTRL-C to go into the card's RAID configuration and received this:
I/O card parity interrupt at 41A7:A382
So now the card's own BIOS is crashing. Great, did it corrupt my RAID-1 array?! Frantically I purchased another card off eBay and drove there and back as quickly as possible. The new card worked. All I had to do was reactivate the array, which resynced the secondary hard drive automatically. CentOS boots fine.
Although I have backups, I did a lot of praying over the weekend because putting a backup box into production is a bit harder than just swapping out a card. The Lord Jesus brought me through it.
-eric wood