From: "Stephen C. Tweedie" For the past few months there has been a slow but steady trickle of reports of oopses in kjournald. Recently I got a couple of reports that were repeatable enough to rerun with extra debugging code. It turns out that we're releasing a journal_head while it is still linked onto the transaction's t_locked_list. The exact location is in journal_unmap_buffer(). On several exit paths, that does: spin_unlock(&journal->j_list_lock); jbd_unlock_bh_state(bh); spin_unlock(&journal->j_state_lock); journal_put_journal_head(jh); releasing the jh *after* dropping the j_list_lock and j_state_lock. kjournald can then be doing journal_commit_transaction(): spin_lock(&journal->j_list_lock); ... if (buffer_locked(bh)) { BUFFER_TRACE(bh, "locked"); if (!inverted_lock(journal, bh)) goto write_out_data; __journal_unfile_buffer(jh); __journal_file_buffer(jh, commit_transaction, BJ_Locked); jbd_unlock_bh_state(bh); The problem happens if journal_unmap_buffer()'s own put_journal_head() manages to get in between kjournald's *unfile_buffer and the following *file_buffer. Because journal_unmap_buffer() has dropped its bh_state lock by this point, there's nothing to prevent this, leading to a variety of unpleasant situations. In particular, the jh is unfiled at this point, so there's nothing to stop the put_journal_head() from freeing the memory we're just about to link onto the BJ_Locked list. I _think_ that the attached patch deals with this, but I'm still awaiting further testing to be sure. I thought I might as well get some other ext3 eyes on it while I wait for that -- I'll let you know as soon as I hear back from the other testing. The patch works by making sure that the various exits from journal_unmap_buffer() always call journal_put_journal_head() *before* unlocking the j_list_lock. This is correct according to the documented lock ranking, and it also matches the order in the existing main exit path at the end of the function. Signed-off-by: Andrew Morton --- /dev/null | 0 25-akpm/fs/jbd/transaction.c | 6 +++--- 2 files changed, 3 insertions(+), 3 deletions(-) diff -puN fs/jbd/transaction.c~ext3-jbd-race-releasing-in-use-journal_heads fs/jbd/transaction.c --- 25/fs/jbd/transaction.c~ext3-jbd-race-releasing-in-use-journal_heads Fri Mar 4 15:47:50 2005 +++ 25-akpm/fs/jbd/transaction.c Fri Mar 4 15:47:50 2005 @@ -1774,10 +1774,10 @@ static int journal_unmap_buffer(journal_ JBUFFER_TRACE(jh, "checkpointed: add to BJ_Forget"); ret = __dispose_buffer(jh, journal->j_running_transaction); + journal_put_journal_head(jh); spin_unlock(&journal->j_list_lock); jbd_unlock_bh_state(bh); spin_unlock(&journal->j_state_lock); - journal_put_journal_head(jh); return ret; } else { /* There is no currently-running transaction. So the @@ -1788,10 +1788,10 @@ static int journal_unmap_buffer(journal_ JBUFFER_TRACE(jh, "give to committing trans"); ret = __dispose_buffer(jh, journal->j_committing_transaction); + journal_put_journal_head(jh); spin_unlock(&journal->j_list_lock); jbd_unlock_bh_state(bh); spin_unlock(&journal->j_state_lock); - journal_put_journal_head(jh); return ret; } else { /* The orphan record's transaction has @@ -1812,10 +1812,10 @@ static int journal_unmap_buffer(journal_ journal->j_running_transaction); jh->b_next_transaction = NULL; } + journal_put_journal_head(jh); spin_unlock(&journal->j_list_lock); jbd_unlock_bh_state(bh); spin_unlock(&journal->j_state_lock); - journal_put_journal_head(jh); return 0; } else { /* Good, the buffer belongs to the running transaction. diff -L fs/jbd/transaction.c.=K0000=.orig -puN /dev/null /dev/null _