SunOS 4.1 multi-user dump causes crashes (RESOLVED!)

Summary [you can skip to the end if you already know the story]:

25-May-90:

  Upgraded from SunOS 4.0.1 to SunOS 4.1 on Sun-4/280s (with 1 ALM-II,

  2 Hitachi disks on a Xylogics 451 controller, 1 tape drive on a

  Xylogics 472 controller, two 8 MB and one 32 MB memory boards). During

  the first post-upgrade multi-user full dump (logins disabled), the

  system crashed with:

    Memory Error Register 1d4<INTR,INTENA,CE_ENA,WBACKERR>

    DVMA=1, context=0, virtual address=fff3cfc0

    pme=0, physical address=fc0

    panic: writeback error

    syncing file system... {at this point it hangs and we have to reset

                             from the cpu board, though in one of the 20

                             or so crashes it saved a core image}

1-Jun-90:

  My first message to sun-spots/sun-managers. Got a few responses

  describing similar occurrences, but no suggested solution worked.

20-Jun-90:

  Frustrated by Sun's lack of responsiveness in looking into the

  problem (hardware support people worked hard, swapping boards,

  building test systems, etc. despite their suspicions that the

  problem was software related), I posted my second message to

  sun-spots/sun-managers, and received even more reports of similar

  problems, including one other site that received a similar brush-off

  ("multi-user dumps aren't supported").

31-Jul-90:

  After repeated calls to Sun and getting various managers involved

  and having the problem "escalated" even further, the problem was

  finally identified.

**********************************************************************

Fix:

Remove from /etc/fstab the line:

        /dev/xy0b swap swap rw 0 0

Apparently in SunOS 4.1, if you have an fstab entry for the default

swap partition, then when you go multi-user and run swapon(8) the

default swap gets added again. This eventually leads to the kernel

crashing when dump runs and causes the system to swap. This is an

unconfirmed theory (we are still waiting for our sources), but

removing the fstab entry stopped the system from crashing. We are now

back to daily multi-user incremental dumps on our systems. Now all we

have to do is get one of our machines, whose disk got trashed when a

faulty disk controller was swapped in during one of numerous

experiments, back into full service.
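
For anyone who wants to check other machines for the same entry, here is a
minimal sketch in Python that scans fstab text for swap entries. The function
name find_swap_entries and the sample fstab contents are hypothetical and for
illustration only; the only line taken from this report is the /dev/xy0b swap
entry. It only reports matching lines and does not edit the file.

```python
def find_swap_entries(fstab_text, device=None):
    """Return (line_number, line) pairs for swap entries in fstab text.

    Hypothetical helper: assumes whitespace-separated fields with the
    filesystem type in the third column, as in the entry shown above.
    """
    matches = []
    for number, line in enumerate(fstab_text.splitlines(), start=1):
        stripped = line.strip()
        # Skip blank lines and comments.
        if not stripped or stripped.startswith("#"):
            continue
        fields = stripped.split()
        if len(fields) >= 3 and fields[2] == "swap":
            # Optionally restrict the match to a single device.
            if device is None or fields[0] == device:
                matches.append((number, line))
    return matches


# Illustrative fstab contents; only the swap line comes from the report.
SAMPLE = """\
/dev/xy0a / 4.2 rw 1 1
/dev/xy0b swap swap rw 0 0
/dev/xy0g /usr 4.2 rw 1 2
"""

if __name__ == "__main__":
    for number, line in find_swap_entries(SAMPLE, device="/dev/xy0b"):
        print(f"line {number}: {line}")
```

To run it against a real system, read the contents of /etc/fstab into a string
and pass that in place of SAMPLE. Whether a reported line should be removed
depends on whether that partition is the kernel's default swap device.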

Thanks to everyone who responded with suggestions and reports of

similar occurrences. It helped put the pressure on Sun to get them to

look at the problem seriously.

                                                --Fuat

Internet: fuat@columbia.edu                    U.S. MAIL: Columbia University

  BITNET: fuat@cunixf                                     Center for Computing Activities

    UUCP: ...!rutgers!columbia!cunixf!fuat                712 Watson Labs, 612 W115th St.

   Phone: (212) 854-5128  Fax: (212) 662-6442             New York, NY 10025
