SUN 1.05Gb Disk Troubles (Media Error)

  My original posting:

========================================================================

  Our Sparc-10, Model 512 is having some serious problems.

Configuration:

         Sparc-10, Model 512

         128Mb RAM

         2 Internal 1.05Gb Seagate Drives

         SunOS 5.3

I'm getting the following console message:

WARNING: /iommu@f,e0000000/sbus@f,e0001000/espdma@f,400000/esp@f,800000/sd@1,0 (sd1):

  Error for command 'write'    Error Level: Fatal

  Requested Block 91120, Error Block: 91234

  Apr 19 18:58:01 sedan.tamu.edu unix: Sense Key: Media Error

  Vendor 'SEAGATE': ASC = 0x12 (no addr mark), ASCQ = 0x0, FRU = 0xe8
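
For readers who want to track messages like this across a log, here is a minimal Python sketch that pulls the main fields out of the warning and reports how far the failing block lies from the requested one. The regular expressions and the function name parse_sd_warning are illustrative assumptions written for this one message layout; they are not part of any Sun tool and may need adjusting for other driver versions.

```python
import re

# Console text as reported in the post (whitespace normalised).
MESSAGE = """\
WARNING: /iommu@f,e0000000/sbus@f,e0001000/espdma@f,400000/esp@f,800000/sd@1,0 (sd1):
Error for command 'write' Error Level: Fatal
Requested Block 91120, Error Block: 91234
Sense Key: Media Error
Vendor 'SEAGATE': ASC = 0x12 (no addr mark), ASCQ = 0x0, FRU = 0xe8
"""

# Hypothetical patterns, written for this one message layout only.
PATTERNS = {
    "device": r"\((sd\d+)\)",
    "command": r"Error for command '(\w+)'",
    "level": r"Error Level:\s*(\w+)",
    "requested_block": r"Requested Block:?\s*(\d+)",
    "error_block": r"Error Block:\s*(\d+)",
    "sense_key": r"Sense Key:\s*([A-Za-z ]+)",
    "asc": r"ASC = (0x[0-9a-fA-F]+)",
    "ascq": r"ASCQ = (0x[0-9a-fA-F]+)",
}


def parse_sd_warning(text):
    """Return a dict of the fields found in one sd driver warning."""
    fields = {}
    for name, pattern in PATTERNS.items():
        match = re.search(pattern, text)
        if match:
            fields[name] = match.group(1).strip()
    # Block numbers are decimal; the sense codes are hexadecimal.
    for key in ("requested_block", "error_block"):
        if key in fields:
            fields[key] = int(fields[key])
    for key in ("asc", "ascq"):
        if key in fields:
            fields[key] = int(fields[key], 16)
    return fields


info = parse_sd_warning(MESSAGE)
offset = info["error_block"] - info["requested_block"]
print(info["device"], info["sense_key"], "offset", offset)
```

If the same error block keeps turning up in later warnings, that points to a fixed bad spot on the media rather than a transient fault.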

I shut the machine down and rebooted. When the reboot process went

through the file system check, it threw me into a shell and

requested that I run fsck manually on the filesystem in question

(mounted as /scratch). I did this, receiving many, many warnings and

errors indicating inconsistencies in the inodes, etc., etc.

  After completing the manual fsck, the machine came back up without

any other noticeable problems. However, each time I attempt to access

/scratch, I get the above errors again.

  I was under the impression that the manual fsck should correct this

problem -- was I mistaken, or is this an indication of a physical

anomaly on the disk? We're not concerned with recovering the data

on the drive (as the name indicates, it's just a scratch disk), but

I would like to get rid of the above error each time something is

written to the disk.

========================================================================

On the "fsck" command:

  A couple of respondents pointed out that "fsck" only repairs

  inconsistencies in the file system's own structures (inodes,

  directories, free-block maps). It cannot repair the physical media,

  so there is no reason to expect it to fix a hardware problem like this.

Things to check:

  Several of the respondents suggested that we run the non-destructive

  options under "analyze" in the format command to see if the drive

  could recover. We ran format/analyze/read and format/analyze/refresh

  to test the disk thoroughly. Sure enough, a couple of blocks showed

  up with "fatal" errors from this analysis.
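
  The idea behind a read-only surface check can be sketched in a few lines of Python: read the device chunk by chunk and note every offset where the read fails. This is only an illustration of the principle and not how format's analyze menu works internally. The function name scan_readable, the chunk size, and the error limit are assumptions. On a real system the path would be a raw disk device, and the script would need read permission on it. The demonstration below uses an ordinary temporary file so that it is safe to run.

```python
import os
import tempfile


def scan_readable(path, chunk_size=64 * 1024, max_errors=100):
    """Read `path` sequentially; return (bytes_read, bad_offsets).

    Nothing is written to `path`, so the scan is non-destructive.
    """
    bad_offsets = []
    bytes_read = 0
    offset = 0
    fd = os.open(path, os.O_RDONLY)
    try:
        while len(bad_offsets) < max_errors:
            try:
                os.lseek(fd, offset, os.SEEK_SET)
                data = os.read(fd, chunk_size)
            except OSError:
                # Record the failing chunk and skip past it.
                bad_offsets.append(offset)
                offset += chunk_size
                continue
            if not data:
                break  # end of file or device
            bytes_read += len(data)
            offset += len(data)
    finally:
        os.close(fd)
    return bytes_read, bad_offsets


# Demonstration on an ordinary temporary file (stand-in for a disk device).
with tempfile.NamedTemporaryFile(delete=False) as tmp:
    tmp.write(b"\0" * 200_000)
    demo_path = tmp.name
try:
    total, bad = scan_readable(demo_path)
finally:
    os.remove(demo_path)
print(total, "bytes read,", len(bad), "unreadable chunks")
```

  A non-empty list of bad offsets from a scan like this would match what the analysis reported here: blocks that the drive can no longer read back.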

  Since I was a bit pressed for time (and folks were breathing down

  my neck to get the machine back up), I then reformatted the entire

  drive. Interesting to note that format didn't complain about the

  bad blocks previously found, but when I ran "newfs" it couldn't

  allocate one of the blocks, so it skipped the entire sector (pardon

  me if the block/sector terminology is off -- I'm doing this from

  a somewhat faulty memory!). This got us up and running with a

  slightly smaller partition and no problems so far.

Solution:

  Most eloquently put by Joseph Mervini:

             REPLACE THE SUCKER!!!!!!

  

  Almost all the respondents indicated that this is a hardware problem

  and that the drive should be replaced immediately if not sooner.

  Since it's still under warranty and now that I have a bit of time,

  I'm going to start pestering Sun to send out the replacement drive.

--Michael Zika

  Nuclear Engineering

  Texas A&M University

  (zika@trinity.tamu.edu)

Many thanx to all the respondents:

  daniel@CANR.Hydro.Qc.CA (Daniel Hurtubise)

  Steve Elliott <se@computing.lancaster.ac.uk>

  peter.allan@aea.orgn.uk (Peter Allan)

  rae@nvg_troy.nvg.com (James Rae)

  "Ray W. Hiltbrand" <Ray.W.Hiltbrand@Eng.Auburn.EDU>

  sckhoo@emtds1.nsc.com (Swee-Chuan Khoo)

  jamervi@sandia.gov (1236 Joseph A. Mervini)

  mike@trdlnk.com (Michael Sullivan)
