[concurrency-interest] Hanging Night Job with 100% CPU Usage

Jacob Solotaroff Jacob.Solotaroff@evant.com
Wed, 25 Jun 2003 11:19:53 -0700


This is a multi-part message in MIME format.

------_=_NextPart_001_01C33B46.5F953285
Content-Type: text/plain;
	charset="us-ascii"
Content-Transfer-Encoding: quoted-printable

Hi,
We've had a problem for the last couple of months.  Our product involves
a batch job that is run every night.  At one of our client sites
(running Java 1.3.1 on AIX) the batch job hangs every 7-10 days.  We
have been unable to reproduce the problem either in house or on their
same box with the exact same data.  The difficulty in reproducing it
would suggest a threading problem.  We use the Oswego concurrency
libraries quite a bit.  However, when it hangs, the CPU usage goes to
100%.  We've checked and this is not due to paging.  Generally a
threading-related hang is due to deadlock and has 0% CPU usage.  Has
anyone seen a hang like this?  One that you can't reproduce but has 100%
CPU usage?

We've tried doing various kill -QUIT and kill -11 commands on it, but
the ascii javacore file gives us "Exception 2" instead of the stack
trace and the binary core file does not show up.  This could be due to
permission problems since the job is run through SUDO (we're looking
into this).

Thanks,
Jacob
Disclaimer:

This e-mail message, including any attachments,is for the sole use of =
the intended recipient(s) and may contain confidential and privileged =
information. Any unauthorized review, use, disclosure or distribution is =
prohibited. If you are not the intended recipient, please contact the =
sender by reply e-mail as well as admin@evant.com, and destroy all =
copies of the original message.
------_=_NextPart_001_01C33B46.5F953285
Content-Type: text/html;
	charset="us-ascii"
Content-Transfer-Encoding: quoted-printable

<!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 3.2//EN">
<HTML>
<HEAD>
<META HTTP-EQUIV=3D"Content-Type" CONTENT=3D"text/html; =
charset=3Dus-ascii">
<META NAME=3D"Generator" CONTENT=3D"MS Exchange Server version =
6.0.6249.1">
<TITLE>Hanging Night Job with 100% CPU Usage</TITLE>
</HEAD>
<BODY><FONT COLOR=3DBLACK>
<!-- Converted from text/rtf format -->

<P ALIGN=3DLEFT><SPAN LANG=3D"en-us"><FONT SIZE=3D2 =
FACE=3D"Arial">Hi,</FONT></SPAN><SPAN LANG=3D"en-us"></SPAN><SPAN =
LANG=3D"en-us"></SPAN></P>

<P ALIGN=3DLEFT><SPAN LANG=3D"en-us"><FONT SIZE=3D2 =
FACE=3D"Arial">We</FONT></SPAN><SPAN LANG=3D"en-us"></SPAN><SPAN =
LANG=3D"en-us"><FONT SIZE=3D2 FACE=3D"Arial">&#8217;</FONT></SPAN><SPAN =
LANG=3D"en-us"></SPAN><SPAN LANG=3D"en-us"><FONT SIZE=3D2 =
FACE=3D"Arial">ve had a problem for the last couple of =
months.&nbsp;</FONT></SPAN><SPAN LANG=3D"en-us"></SPAN><SPAN =
LANG=3D"en-us"> <FONT SIZE=3D2 FACE=3D"Arial">Our product involves a =
batch job that is run every night.&nbsp;</FONT></SPAN><SPAN =
LANG=3D"en-us"></SPAN><SPAN LANG=3D"en-us"> <FONT SIZE=3D2 =
FACE=3D"Arial">At one of our client sites (running Java 1.3.1 on =
AIX)</FONT></SPAN><SPAN LANG=3D"en-us"></SPAN><SPAN LANG=3D"en-us"><FONT =
SIZE=3D2 FACE=3D"Arial"> the batch job hangs every 7-10 days.&nbsp; We =
have been unable to reproduce the problem either in house or on their =
same box</FONT></SPAN><SPAN LANG=3D"en-us"></SPAN><SPAN LANG=3D"en-us"> =
<FONT SIZE=3D2 FACE=3D"Arial">with th</FONT></SPAN><SPAN =
LANG=3D"en-us"></SPAN><SPAN LANG=3D"en-us"><FONT SIZE=3D2 =
FACE=3D"Arial">e exact same data.&nbsp;</FONT></SPAN><SPAN =
LANG=3D"en-us"></SPAN><SPAN LANG=3D"en-us"> <FONT SIZE=3D2 =
FACE=3D"Arial">The difficulty in reproducing it would suggest a =
threading problem.&nbsp; We use the Oswego concurrency libraries quite a =
bit.&nbsp; However, when it han</FONT></SPAN><SPAN =
LANG=3D"en-us"></SPAN><SPAN LANG=3D"en-us"><FONT SIZE=3D2 =
FACE=3D"Arial">gs, the CPU usage goes to 100%.&nbsp;</FONT></SPAN><SPAN =
LANG=3D"en-us"></SPAN><SPAN LANG=3D"en-us"> <FONT SIZE=3D2 =
FACE=3D"Arial">We</FONT></SPAN><SPAN LANG=3D"en-us"></SPAN><SPAN =
LANG=3D"en-us"><FONT SIZE=3D2 FACE=3D"Arial">&#8217;</FONT></SPAN><SPAN =
LANG=3D"en-us"></SPAN><SPAN LANG=3D"en-us"><FONT SIZE=3D2 =
FACE=3D"Arial">ve check</FONT></SPAN><SPAN LANG=3D"en-us"></SPAN><SPAN =
LANG=3D"en-us"><FONT SIZE=3D2 FACE=3D"Arial">ed</FONT></SPAN><SPAN =
LANG=3D"en-us"></SPAN><SPAN LANG=3D"en-us"><FONT SIZE=3D2 =
FACE=3D"Arial"> and t</FONT></SPAN><SPAN LANG=3D"en-us"></SPAN><SPAN =
LANG=3D"en-us"><FONT SIZE=3D2 FACE=3D"Arial">his is not due to =
paging.&nbsp; Generally a threading-related hang</FONT></SPAN><SPAN =
LANG=3D"en-us"></SPAN><SPAN LANG=3D"en-us"><FONT SIZE=3D2 =
FACE=3D"Arial"> is due to deadlock and has 0% CPU usage.&nbsp; Has =
anyone seen a hang like this?&nbsp;</FONT></SPAN><SPAN =
LANG=3D"en-us"></SPAN><SPAN LANG=3D"en-us"> <FONT SIZE=3D2 =
FACE=3D"Arial">One that you can</FONT></SPAN><SPAN =
LANG=3D"en-us"></SPAN><SPAN LANG=3D"en-us"><FONT SIZE=3D2 =
FACE=3D"Arial">&#8217;</FONT></SPAN><SPAN LANG=3D"en-us"></SPAN><SPAN =
LANG=3D"en-us"><FONT SIZE=3D2 FACE=3D"Arial">t reproduce but has 100% =
CPU usage?</FONT></SPAN><SPAN LANG=3D"en-us"></SPAN><SPAN =
LANG=3D"en-us"></SPAN></P>

<P ALIGN=3DLEFT><SPAN LANG=3D"en-us"><FONT SIZE=3D2 =
FACE=3D"Arial">We</FONT></SPAN><SPAN LANG=3D"en-us"></SPAN><SPAN =
LANG=3D"en-us"><FONT SIZE=3D2 FACE=3D"Arial">&#8217;</FONT></SPAN><SPAN =
LANG=3D"en-us"></SPAN><SPAN LANG=3D"en-us"><FONT SIZE=3D2 =
FACE=3D"Arial">ve tried doing various kill</FONT></SPAN><SPAN =
LANG=3D"en-us"></SPAN><SPAN LANG=3D"en-us"> <FONT SIZE=3D2 =
FACE=3D"Arial">&#8211;</FONT></SPAN><SPAN LANG=3D"en-us"></SPAN><SPAN =
LANG=3D"en-us"><FONT SIZE=3D2 FACE=3D"Arial">QUIT and kill -11 commands =
on it, but the ascii javacore file gives us</FONT></SPAN><SPAN =
LANG=3D"en-us"></SPAN><SPAN LANG=3D"en-us"> <FONT SIZE=3D2 =
FACE=3D"Arial">&#8220;</FONT></SPAN><SPAN LANG=3D"en-us"></SPAN><SPAN =
LANG=3D"en-us"><FONT SIZE=3D2 FACE=3D"Arial">Exception =
2</FONT></SPAN><SPAN LANG=3D"en-us"></SPAN><SPAN LANG=3D"en-us"><FONT =
SIZE=3D2 FACE=3D"Arial">&#8221;</FONT></SPAN><SPAN =
LANG=3D"en-us"></SPAN><SPAN LANG=3D"en-us"><FONT SIZE=3D2 =
FACE=3D"Arial"> instead of the stack trace and the binary core file does =
not show up.&nbsp; This could be</FONT></SPAN><SPAN =
LANG=3D"en-us"></SPAN><SPAN LANG=3D"en-us"> <FONT SIZE=3D2 =
FACE=3D"Arial">due to permission problems since the job is run through =
SUDO (we</FONT></SPAN><SPAN LANG=3D"en-us"></SPAN><SPAN =
LANG=3D"en-us"><FONT SIZE=3D2 FACE=3D"Arial">&#8217;</FONT></SPAN><SPAN =
LANG=3D"en-us"></SPAN><SPAN LANG=3D"en-us"><FONT SIZE=3D2 =
FACE=3D"Arial">re looking into this).</FONT></SPAN></P>

<P ALIGN=3DLEFT><SPAN LANG=3D"en-us"><FONT SIZE=3D2 =
FACE=3D"Arial">Thanks,</FONT></SPAN></P>

<P ALIGN=3DLEFT><SPAN LANG=3D"en-us"><FONT SIZE=3D2 =
FACE=3D"Arial">Jacob</FONT></SPAN><SPAN LANG=3D"en-us"></SPAN><SPAN =
LANG=3D"en-us"></SPAN></P>

<br><font FACE=3D"Courier" SIZE=3D"2" >Disclaimer:<br> <br> This e-mail =
message, including any attachments,is for the sole use of the intended =
recipient(s) and may contain confidential and privileged information. =
Any unauthorized review, use, disclosure or distribution is prohibited. =
If you are not the intended recipient, please contact the sender by =
reply e-mail as well as <a =
href=3D"mailto:admin@evant.com">admin@evant.com</a>, and destroy all =
copies of the original message.
<br>
</font>
</BODY>
</HTML>
------_=_NextPart_001_01C33B46.5F953285--