rpc.ttdbserverd runaway

>>> On 14 Apr 1997 15:01:47 -0500,js@cctechnol.com (Johnie Stafford) said:

 js> I've noticed on one machine in the office (a Sparc 5 running Solaris

 js> 2.5/CDE 1.0.1), that rpc.ttdbserverd runs away, taking up 80%-90% of

 js> the CPU. We kill it and it behaves for a while but eventually it

 js> will happen again.

Thanks to "Rick von Richter" <rickv@mwh.com>, for the answer.

 rvr> I have almost the same setup, Sparc 5, 2.5.1, CDE 1.0.2 and have had the same

 rvr> probs. Here is a bug report from Sun on the issue. What I did was bring the

 rvr> system down to single user and then mount the filesystems. At the top of each

 rvr> of your filesystems is a TT_DB dir. Remove these directories and all stuff

 rvr> underneath them. They will be recreated by the system. Then reboot and

 rvr> continue as normal. Sometimes it will run away again. Sun knows about this but

 rvr> I haven't heard if they are doing anything about it.
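Before removing anything, it may help to see which TT_DB directories actually exist. The following is a minimal sketch, not part of Rick's answer: `list_tt_db` is a hypothetical helper, and the mount points passed to it are whatever local filesystems you have. It only prints paths and deletes nothing.

```shell
#!/bin/sh
# Sketch only: print the TT_DB directory at the top of each given mount point.
# Nothing is deleted; review the list before removing anything by hand.
list_tt_db() {
    for mnt in "$@"; do
        # Avoid a double slash when the mount point is "/".
        case "$mnt" in
            /) dir=/TT_DB ;;
            *) dir=$mnt/TT_DB ;;
        esac
        [ -d "$dir" ] && echo "$dir"
    done
    return 0
}
```

For example, `list_tt_db / /usr /var /export/home` would print only those of the four that contain a TT_DB directory. The removal itself is left as a manual step, done in single-user mode as described above.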

Here is the attached bug report:

                        Bug Reports document 4017415


----------------------------------------------------------------------------

 Bug Id: 4017415

 Category: tooltalk

 Subcategory: dbserver

 State: evaluated

 Synopsis: rpc.ttdbserverd spinning, consuming nearly all cpu time

 Description:

Customer reports an error situation with rpc.ttdbserverd consuming nearly

all CPU time on an Ultra-2 running 2.5.1. CDE is not installed, only SUNWdtcor.

Unfortunately, I could not reproduce the error, but at least found a

workaround that could help to understand the error situation after the

fact and take measures against it.

# uname -a

SunOS kora2 5.5.1 Generic_103640-02 sun4u sparc SUNW,Ultra-2

Local Tooltalk databases:

/usr/TT_DB

/var/TT_DB

/export/root/TT_DB

/export/home/kora2/ac-home/TT_DB

/export/home/kora2/bv-home/TT_DB

/export/home/kora2/km-home/TT_DB

/export/home/kora2/inf-home/TT_DB

/export/home/kora2/nv-home/TT_DB

/TT_DB

# more /etc/inetd.conf | grep ttdb

100083/1 stream rpc/tcp wait root /usr/dt/bin/rpc.ttdbserverd rpc.ttdbserverd

After a while (5 minutes up to 48 hours) after having cleaned out the

TT_DB databases and having rebooted the machine, the rpc.ttdbserverd

started spinning:

# w

  8:34am up 58 min(s), 3 users, load average: 1.20, 1.07, 1.02

User tty login@ idle JCPU PCPU what

root pts/0 7:36am 57 -sh

root pts/1 7:43am 7 4 truss -p 619

root pts/3 8:29am 1 w

# ps -ef|grep rpc.ttdb

    root 2006 1967 0 08:34:41 pts/3 0:00 grep rpc.ttdb

    root 619 216 91 07:42:20 ? 41:05 rpc.ttdbserverd
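A check like the one above can be scripted. This is a rough sketch under assumptions: `find_spinning_ttdb` is a hypothetical helper, it treats the fourth field of `ps -ef` output (the C column, 91 in the listing above) as the CPU figure, and it needs an awk that accepts `-v` (on Solaris that would be nawk rather than the old awk).

```shell
#!/bin/sh
# Sketch only: read `ps -ef` output on stdin and print the PID and C value of
# any rpc.ttdbserverd process whose C column (4th field) exceeds a threshold.
# The default threshold of 80 is an arbitrary choice for illustration.
find_spinning_ttdb() {
    threshold=${1:-80}
    awk -v t="$threshold" \
        '$NF ~ /rpc\.ttdbserverd$/ && $4 + 0 > t { print $2, $4 }'
}
```

Typical use would be `ps -ef | find_spinning_ttdb 80`. The `grep rpc.ttdb` line in the listing is ignored because its last field does not end in `rpc.ttdbserverd`.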

# kill -ABRT <ttdbserverd-pid> yielded the following stacktrace:

Reading symbolic information for /usr/dt/bin/rpc.ttdbserverd

warning: core object name "rpc.ttdbserver" matches object name "rpc.ttdbserverd" within the limit of 14. assuming they match

core file header read successfully

core file read error: address 0x5050c not in data space

core file read error: address 0x5050c not in data space

core file read error: address 0x5050c not in data space

Reading symbolic information for rtld /usr/lib/ld.so.1

core file read error: address 0x5050c not in data space

warning: cannot get address of PLT for "/usr/dt/bin/rpc.ttdbserverd"

detected a multi-LWP program

(l@1) terminated by signal ABRT (Abort)

(debugger) where

=>[1] 0xef70cdcc(0x5bdb9, 0xefffda84, 0x9, 0xfff898c3, 0x29de18, 0xefffdb44), at 0xef70cdcb

  [2] 0xef7039f0(0x14f4e0, 0xefffdb24, 0xefffdb40, 0xefffdb3c, 0xd24ff, 0x14f4e0), at 0xef7039ef

  [3] 0xef7037f0(0xefffdbd7, 0xefffdbcc, 0x0, 0xef716710, 0xef71670c, 0xefffdb24), at 0xef7037ef

  [4] isamfatalerror(0xefffdc60, 0xefffdc70, 0xefffdc78, 0xefffdc68, 0x7cda0, 0x7cda0), at 0x23c5c

  [5] _tt_create_obj_1(0xefffdcec, 0xcfe80, 0x1, 0x454, 0xef5fec08, 0x0), at 0x1e46c

  [6] db_server_svc_C:__sti(0x77658, 0xcfe80, 0x77658, 0x547a8, 0x548a4, 0x1e438), at 0x25690

  [7] 0xef5be1e4(0xcd0e8, 0x77658, 0xcff28, 0xcfe88, 0xef5ff210, 0xcfe80), at 0xef5be1e3

  [8] 0xef5be104(0xefffdee0, 0x0, 0xef5fec60, 0xef5ff210, 0xef773e90, 0x16), at 0xef5be103

  [9] 0xef5bffac(0x0, 0xffffe000, 0xef5f46ec, 0xef5fec60, 0xef5ff210, 0x17), at 0xef5bffab

  [10] _tt_process_transaction(0x71a30, 0x71a20, 0x796b8, 0x71a28, 0x71a2c, 0x71a18), at 0x246cc

(debugger)

 Work around:

Clearing out the databases did not help. At least ttdbck did not find

any problems.

Creating a partition map and mapping all the ToolTalk databases to

one single TT_DB did not prevent the error situation either, but it

did leave only one TT_DB to be cleared out.

Starting rpc.ttdbserverd from a shell with an increased number of

file descriptors (128 instead of 64) avoided the problem permanently.
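The report does not show the exact commands used. One way this might be done is sketched below, assuming a shell whose `ulimit` builtin accepts `-n`; `run_with_nofile` is a hypothetical helper name, and the daemon path is the one from the inetd.conf entry quoted earlier.

```shell
#!/bin/sh
# Sketch only: run a command with a different limit on open file descriptors.
# The limit is changed inside a subshell so the calling shell is unaffected.
run_with_nofile() {
    limit=$1
    shift
    (
        # Fails if the requested value is above the hard limit.
        ulimit -n "$limit" || exit 1
        exec "$@"
    )
}
```

Usage would look like `run_with_nofile 128 /usr/dt/bin/rpc.ttdbserverd`. Note that in the report's configuration the daemon is normally started by inetd; the report does not say how the inetd entry was handled while the daemon was run by hand.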

        Integrated in releases:

 Duplicate of:

 Patch id:

 See also:

 Summary:

The dbserver can run out of file descriptors between it and the various

libtts from clients that connect to it. The dbserver should raise the number

of file descriptors from 64 to some larger number (probably 1024).

----------------------------------------------------------------------------

     Copyright 1997 Sun Microsystems, Inc. 2550 Garcia Ave., Mt. View, CA

     94043-1100 USA. All rights reserved.
