From: Abhijit Karmarkar <abhijitk@veritas.com>

It's common practice to msync a large address range regularly, when often
only a few ptes have actually been dirtied since the previous pass.
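
For instance, a checkpointing process might flush its whole mapping on
every pass, as in this minimal userspace sketch (the mapping and names
are hypothetical, just to show the pattern):

	#include <sys/mman.h>

	/* Flush the whole mapping each pass, though typically only a
	 * few of its pages have been dirtied since the last pass.
	 */
	static void checkpoint_pass(void *map, size_t len)
	{
		msync(map, len, MS_SYNC);
	}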

sync_pte_range then goes much faster if it tests whether the pte is dirty
before locating and accessing each struct page cacheline; and it is hardly
slowed by ptep_clear_flush_dirty repeating that test in the opposite case,
when every pte actually is dirty.
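
In outline, the patched loop does this (a simplified sketch of
sync_pte_range from this era, not the verbatim source; the actual
two-line change is in the diff below):

	do {
		if (!pte_present(*pte))
			continue;
		if (!pte_maybe_dirty(*pte))	/* new: cheap test on the */
			continue;		/* pte, no struct page touched */
		pfn = pte_pfn(*pte);
		if (!pfn_valid(pfn))
			continue;
		/* ... pfn_to_page(), ptep_clear_flush_dirty(),
		 *     set_page_dirty() ... */
	} while (pte++, addr += PAGE_SIZE, addr != end);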

But beware, s390's pte_dirty always says false, since its dirty bit is kept
in the storage key, located via the struct page address.  So skip this
optimization in that case: use a pte_maybe_dirty macro which just says true
whenever page_test_and_clear_dirty is implemented.
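
Roughly how the new macro resolves at compile time (a hedged sketch of
the preprocessor outcome; the actual definitions are in the hunk below):

	/* On an architecture without __HAVE_ARCH_PAGE_TEST_AND_CLEAR_DIRTY:
	 *	if (!pte_maybe_dirty(*pte))  becomes  if (!pte_dirty(*pte))
	 * On one which defines it, such as s390:
	 *	if (!pte_maybe_dirty(*pte))  becomes  if (!1)
	 * which the compiler discards, so no pte is ever wrongly skipped.
	 */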

Signed-off-by: Abhijit Karmarkar <abhijitk@veritas.com>
Signed-off-by: Hugh Dickins <hugh@veritas.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
---

 include/asm-generic/pgtable.h |    3 +++
 mm/msync.c                    |    2 ++
 2 files changed, 5 insertions(+)

diff -puN include/asm-generic/pgtable.h~msync-check-pte-dirty-earlier include/asm-generic/pgtable.h
--- 25/include/asm-generic/pgtable.h~msync-check-pte-dirty-earlier	Mon Jun  6 15:25:13 2005
+++ 25-akpm/include/asm-generic/pgtable.h	Mon Jun  6 15:25:13 2005
@@ -125,6 +125,9 @@ static inline void ptep_set_wrprotect(st
 
 #ifndef __HAVE_ARCH_PAGE_TEST_AND_CLEAR_DIRTY
 #define page_test_and_clear_dirty(page) (0)
+#define pte_maybe_dirty(pte)		pte_dirty(pte)
+#else
+#define pte_maybe_dirty(pte)		(1)
 #endif
 
 #ifndef __HAVE_ARCH_PAGE_TEST_AND_CLEAR_YOUNG
diff -puN mm/msync.c~msync-check-pte-dirty-earlier mm/msync.c
--- 25/mm/msync.c~msync-check-pte-dirty-earlier	Mon Jun  6 15:25:13 2005
+++ 25-akpm/mm/msync.c	Mon Jun  6 15:25:13 2005
@@ -34,6 +34,8 @@ static void sync_pte_range(struct vm_are
 
 		if (!pte_present(*pte))
 			continue;
+		if (!pte_maybe_dirty(*pte))
+			continue;
 		pfn = pte_pfn(*pte);
 		if (!pfn_valid(pfn))
 			continue;
_