Received: from mnm [127.0.0.1]
	by localhost with POP3 (fetchmail-5.9.0)
	for akpm@localhost (single-drop); Tue, 10 Jun 2003 23:40:00 -0700 (PDT)
Received: from digeo-e2k04.digeo.com ([192.168.2.24])
	by pao-ex01.pao.digeo.com with Microsoft SMTPSVC(5.0.2195.5329);
	Tue, 10 Jun 2003 23:35:00 -0700
Received: from digeo-nav01.digeo.com ([192.168.1.233])
	by digeo-e2k04.digeo.com with Microsoft SMTPSVC(5.0.2195.5329);
	Tue, 10 Jun 2003 23:34:59 -0700
Received: from packet.digeo.com ([192.168.17.15])
	by digeo-nav01.digeo.com (SAVSMTP 3.1.1.32) with SMTP id M2003061023371314547
	for ; Tue, 10 Jun 2003 23:37:13 -0700
Received: from e5.ny.us.ibm.com (e5.ny.us.ibm.com [32.97.182.105])
	by packet.digeo.com (8.12.8/8.12.8) with ESMTP id h5B6YvX8013081
	for ; Tue, 10 Jun 2003 23:34:57 -0700 (PDT)
Received: from northrelay02.pok.ibm.com (northrelay02.pok.ibm.com [9.56.224.150])
	by e5.ny.us.ibm.com (8.12.9/8.12.2) with ESMTP id h5B6Yqtd174654;
	Wed, 11 Jun 2003 02:34:53 -0400
Received: from sparklet.in.ibm.com (d01av02.pok.ibm.com [9.56.224.216])
	by northrelay02.pok.ibm.com (8.12.9/NCO/VER6.5) with ESMTP id h5B6YkAh246374;
	Wed, 11 Jun 2003 02:34:49 -0400
Received: (from suparna@localhost)
	by sparklet.in.ibm.com (8.11.6/8.11.0) id h5B6dtx02392;
	Wed, 11 Jun 2003 12:09:55 +0530
Date: Wed, 11 Jun 2003 12:09:55 +0530
From: Suparna Bhattacharya
To: Andrew Morton
Cc: philip.copeland@oracle.com
Subject: Re: -mm7 go boom
Message-ID: <20030611120955.A2385@in.ibm.com>
Reply-To: suparna@in.ibm.com
References: <1055298788.3224.57.camel@emerald> <20030610204950.2d783a89.akpm@digeo.com>
	<20030611112601.A2267@in.ibm.com> <20030610225358.3682764b.akpm@digeo.com>
Mime-Version: 1.0
Content-Type: multipart/mixed; boundary="J/dobhs11T7y2rNN"
Content-Disposition: inline
User-Agent: Mutt/1.2.5.1i
In-Reply-To: <20030610225358.3682764b.akpm@digeo.com>; from akpm@digeo.com on Tue, Jun 10, 2003 at 10:53:58PM -0700
X-Scanned-By: MIMEDefang 2.30 (www . roaringpenguin . com / mimedefang)
X-OriginalArrivalTime: 11 Jun 2003 06:34:59.0460 (UTC) FILETIME=[94A79C40:01C32FE3]
X-Spam-Status: No, hits=-39.0 required=6.0
	tests=BAYES_01,EMAIL_ATTRIBUTION,IN_REP_TO,PATCH_UNIFIED_DIFF,
	QUOTED_EMAIL_TEXT,REFERENCES,REPLY_WITH_QUOTES,
	USER_AGENT_MUTT
	autolearn=ham version=2.53
X-Spam-Level:
X-Spam-Checker-Version: SpamAssassin 2.53 (1.174.2.15-2003-03-30-exp)

--J/dobhs11T7y2rNN
Content-Type: text/plain; charset=us-ascii
Content-Disposition: inline

On Tue, Jun 10, 2003 at 10:53:58PM -0700, Andrew Morton wrote:
> Suparna Bhattacharya wrote:
> >
> > >
> > > I'm suspecting that there's garbage on the workqueue pointed to
> > > by local var `cwq'. ie: a kioctx got freed up while it was still
> > > queued up via schedule_work().
> > >
> >
> > Hmm, looking at the code, I don't see a protection against this
> > case, so that's something to fix anyway I guess. However, we'd
> > see this only if the program is done with the ioctx (e.g exit or
> > Ctrl C, or an explicit call to destroy the ioctx).
> >
> > Phil, Do you see this when the program is about to exit (normally
> > or due to some other signal) ?
>
> Apparently tasks were exitting at the time. The test runs over
> a thousand processes.

OK. Could you try the attached patch and see if it helps?
It's a little strong, but I didn't find a direct way to just
flush/delete a particular workqueue entry.

Regards
Suparna

--
Suparna Bhattacharya (suparna@in.ibm.com)
Linux Technology Center
IBM Software Labs, India

--J/dobhs11T7y2rNN
Content-Type: text/plain; charset=us-ascii
Content-Disposition: attachment; filename="aio-flush-workqueue.patch"

diff -ur -X dontdiff linux-2.5.70-mm5/fs/aio.c linux-2.5.70-mm5-dbg/fs/aio.c
--- linux-2.5.70-mm5/fs/aio.c	Fri Jun 6 17:45:26 2003
+++ linux-2.5.70-mm5-dbg/fs/aio.c	Wed Jun 11 13:35:55 2003
@@ -346,6 +347,11 @@
 	aio_cancel_all(ctx);
 	wait_for_all_aios(ctx);
+	/*
+	 * this is an overkill, but ensures we don't leave
+	 * the ctx on the aio_wq
+	 */
+	flush_workqueue(aio_wq);
 	if (1 != atomic_read(&ctx->users))
 		printk(KERN_DEBUG
@@ -1147,6 +1171,11 @@
 	aio_cancel_all(ioctx);
 	wait_for_all_aios(ioctx);
+	/*
+	 * this is an overkill, but ensures we don't leave
+	 * the ctx on the aio_wq
+	 */
+	flush_workqueue(aio_wq);
 	put_ioctx(ioctx);	/* once for the lookup */
 }

--J/dobhs11T7y2rNN--
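
The ordering the patch relies on (drain the queue first, drop the last reference afterwards) can be sketched in plain userspace C. This is only a toy, single-threaded model and not the kernel code: every `toy_*` name is hypothetical, and `toy_flush_workqueue()` simply runs the pending entries in place, whereas the kernel's `flush_workqueue()` waits for worker threads to finish them.

```c
#include <stddef.h>
#include <stdlib.h>

/* Toy stand-ins for the kernel objects; every toy_* name is hypothetical. */
struct toy_work {
	struct toy_work *next;
	void (*fn)(struct toy_work *w);
	int pending;			/* set while the entry sits on a queue */
};

struct toy_wq {
	struct toy_work *head, *tail;
};

struct toy_ctx {
	struct toy_work work;		/* first member: embedded work entry */
	int users;			/* reference count */
	int runs;			/* times the handler ran */
};

static int toy_ctx_freed;		/* contexts released so far */

/* Append a work entry unless it is already queued. */
static void toy_queue_work(struct toy_wq *wq, struct toy_work *w)
{
	if (w->pending)
		return;
	w->pending = 1;
	w->next = NULL;
	if (wq->tail)
		wq->tail->next = w;
	else
		wq->head = w;
	wq->tail = w;
}

/* Run every queued entry; afterwards the queue references no context. */
static void toy_flush_workqueue(struct toy_wq *wq)
{
	while (wq->head) {
		struct toy_work *w = wq->head;

		wq->head = w->next;
		if (!wq->head)
			wq->tail = NULL;
		w->pending = 0;
		w->fn(w);
	}
}

static void toy_ctx_handler(struct toy_work *w)
{
	/* 'work' is the first member, so the pointers coincide. */
	struct toy_ctx *ctx = (struct toy_ctx *)w;

	ctx->runs++;
}

static struct toy_ctx *toy_ctx_alloc(void)
{
	struct toy_ctx *ctx = calloc(1, sizeof(*ctx));

	if (!ctx)
		return NULL;
	ctx->work.fn = toy_ctx_handler;
	ctx->users = 1;
	return ctx;
}

static void toy_put_ctx(struct toy_ctx *ctx)
{
	if (--ctx->users == 0) {
		free(ctx);
		toy_ctx_freed++;
	}
}

/* Same ordering as the patch: flush the queue, then drop the reference. */
static void toy_destroy_ctx(struct toy_wq *wq, struct toy_ctx *ctx)
{
	toy_flush_workqueue(wq);
	toy_put_ctx(ctx);
}

/* Returns 0 if the queue is empty after the context is destroyed. */
int toy_demo(void)
{
	struct toy_wq wq = { NULL, NULL };
	struct toy_ctx *ctx = toy_ctx_alloc();

	if (!ctx)
		return -1;
	toy_queue_work(&wq, &ctx->work);
	toy_destroy_ctx(&wq, ctx);
	return wq.head == NULL ? 0 : 1;
}
```

Calling `toy_demo()` from a `main` exercises the path. If `toy_destroy_ctx()` skipped the flush, the queue would still hold a pointer into freed memory, which is the failure suspected in the thread.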