[concurrency-interest] RE: Hanging Night Job with 100% CPU Usage

Jacob Solotaroff Jacob.Solotaroff@evant.com
Fri, 27 Jun 2003 14:25:55 -0700



I won't bug anyone on this anymore, but we actually got this to happen
on Solaris a few days ago.  We ran kill -11 on the process and got a
binary core file.
I tried pstack, gdb, and a few other tools, but I can't get anything
other than native stack traces.  Does anyone know how to get a Java
stack trace out of a binary core file?

Thanks so much.

-----Original Message-----
From: Jacob Solotaroff
Sent: Wednesday, June 25, 2003 11:20 AM
To: 'concurrency-interest@altair.cs.oswego.edu'
Subject: Hanging Night Job with 100% CPU Usage

Hi,
We've had a problem for the last couple of months.  Our product involves
a batch job that is run every night.  At one of our client sites
(running Java 1.3.1 on AIX) the batch job hangs every 7-10 days.  We
have been unable to reproduce the problem either in-house or on their
box with the exact same data.  The difficulty in reproducing it
suggests a threading problem, and we use the Oswego concurrency
libraries quite a bit.  However, when it hangs, the CPU usage goes to
100%.  We've checked and this is not due to paging.  Generally a
threading-related hang is due to deadlock and has 0% CPU usage.  Has
anyone seen a hang like this?  One that you can't reproduce but that
has 100% CPU usage?
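
To illustrate the distinction above, here is a minimal, hypothetical
sketch (not taken from our code): one thread busy-waits on a flag and
stays RUNNABLE, which is the kind of hang that shows up as 100% CPU,
while another blocks in wait() and uses no CPU, as a deadlocked thread
would.  The class and method names are made up for the example, and
Thread.getState() requires Java 5 or later, so it would not run on the
1.3.1 JVM mentioned above.

```java
public class SpinVersusBlock {
    // Flag the spinning thread polls; volatile so a later write is visible.
    static volatile boolean stop = false;

    /** Starts a thread that busy-waits on the flag; it stays RUNNABLE and burns CPU. */
    static Thread startSpinner() {
        Thread t = new Thread(new Runnable() {
            public void run() {
                while (!stop) {
                    // spin: no blocking call, so the scheduler keeps running this thread
                }
            }
        }, "spinner");
        t.setDaemon(true);
        t.start();
        return t;
    }

    /** Starts a thread that blocks in wait(); it consumes no CPU while parked. */
    static Thread startWaiter(final Object lock) {
        Thread t = new Thread(new Runnable() {
            public void run() {
                synchronized (lock) {
                    try {
                        lock.wait(); // nobody calls notify(), so this never returns normally
                    } catch (InterruptedException e) {
                        // interrupted: fall through and exit
                    }
                }
            }
        }, "waiter");
        t.setDaemon(true);
        t.start();
        return t;
    }

    /** Polls until the thread reaches the expected state or the timeout expires. */
    static boolean reaches(Thread t, Thread.State expected, long timeoutMs)
            throws InterruptedException {
        long deadline = System.currentTimeMillis() + timeoutMs;
        while (System.currentTimeMillis() < deadline) {
            if (t.getState() == expected) {
                return true;
            }
            Thread.sleep(10);
        }
        return t.getState() == expected;
    }

    public static void main(String[] args) throws InterruptedException {
        Thread spinner = startSpinner();
        Thread waiter = startWaiter(new Object());
        System.out.println("spinner RUNNABLE: " + reaches(spinner, Thread.State.RUNNABLE, 2000));
        System.out.println("waiter WAITING: " + reaches(waiter, Thread.State.WAITING, 2000));
        stop = true;        // let the spinner exit
        waiter.interrupt(); // release the waiter
    }
}
```

In a thread dump, the first kind of thread would appear as runnable
inside a loop, and the second as waiting on a monitor.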

We've tried sending it kill -QUIT and kill -11 several times, but the
ASCII javacore file gives us "Exception 2" instead of the stack trace,
and the binary core file does not show up.  This could be due to
permission problems, since the job is run through sudo (we're looking
into this).

Thanks,
Jacob
Disclaimer:

This e-mail message, including any attachments, is for the sole use of
the intended recipient(s) and may contain confidential and privileged
information. Any unauthorized review, use, disclosure or distribution is
prohibited. If you are not the intended recipient, please contact the
sender by reply e-mail as well as admin@evant.com, and destroy all
copies of the original message.