From: Hugh Dickins anobjrmap 2/6 free page->mapping for use by anon Tracking anonymous pages by mm,address needs a pointer,offset pair in struct page: mapping,index the natural choice. However, swapcache already uses them for &swapper_space,swp_entry_t. But it's trivial to separate swapcache from pagecache with radix tree; most of swapper_space is actually unused, just a fiction to pretend swap like file; and page->private is a good place to keep swp_entry_t now swap never uses bufferheads. Define page_mapping(page) macro to give NULL when PageAnon, whatever that may put in page->mapping; define PG_swapcache bit, deduce swapper_space from that. This does mean more conditionals (many hidden in page_mapping), but I believe they'll be worth it. Sorry to lose another PG_ bit? Don't worry, I'm sure we can deduce PageSwapCache from PageAnon && private when tight; but that will demand a little care with the Anon/Swap transitions, at present they're pleasantly independent. Who owns page->list, Anon or Swap? Dunno, at present neither, useful for testing. Separating the caches slightly simplifies the tmpfs swizzling, can use functions with fewer underscores since page can briefly be in both caches. I've taken the liberty of moving a page_cache_release into remove_from_page_cache, and getting rid of ___add_to_page_cache. Seemed natural to add nr_swapcache separate from nr_pagecache; but if that poses a vmstat compatibility problem, easily reverted. Similarly, natural to remove the total_swapcache_pages double counting bogosity from vm_enough_memory: it dates from a time when we didn't free the swap when a swapcache page was freed (but though bogus in 2.4 also, I'd hesitate to change it there). It is likely that I've screwed up on the "Morton pages", those ext3 journal pages locked at truncate time which then turn into fish with wings: please check them out, I never manage to wrap my head around them. Certainly don't want a page using private for both bufferheads and swp_entry_t. Although these patches are for applying to 2.5.65-mm2, they're really based the version of rmap.c before Dave added the lovely but flawed find_pte (sorry, you _need_ page_table_lock), and the very unlovely page_convert_anon. My idea for anon pages doesn't let them be in the pagecache too, so I've had to erase page_convert_anon in this patch (sys_remap_file pages will get lost until freed, as if locked): 5/6 then puts that right. Similarly, I'm not calling those !page->mapping driver pages anon: count them in and out, but don't attempt to unmap them (unless I'm mistaken, they're usually pages a driver has allocated, has a reference to, can't be freed anyway). In passing, noticed shrink_list's CONFIG_SWAP was excluding try_to_unmap of file-backed pages, surely that's wrong? Moved its #endif up to allow them; but suspect !CONFIG_MMU was relying on !CONFIG_SWAP there? So added a try_to_unmap stub in linux/rmap.h; but I've not tested these configs. include/linux/swap.h | 3 mm/truncate.c | 1 (forwarded by akpm@digeo.com) 25-akpm/fs/buffer.c | 14 +-- 25-akpm/fs/proc/proc_misc.c | 4 25-akpm/include/linux/mm.h | 28 ++---- 25-akpm/include/linux/page-flags.h | 16 +-- 25-akpm/include/linux/pagemap.h | 11 -- 25-akpm/include/linux/rmap.h | 14 +-- 25-akpm/include/linux/swap.h | 3 25-akpm/mm/filemap.c | 29 +++--- 25-akpm/mm/fremap.c | 10 -- 25-akpm/mm/memory.c | 13 +-- 25-akpm/mm/mmap.c | 8 - 25-akpm/mm/page-writeback.c | 32 ++++--- 25-akpm/mm/page_alloc.c | 17 +++- 25-akpm/mm/page_io.c | 19 ---- 25-akpm/mm/rmap.c | 155 +++++++------------------------------ 25-akpm/mm/swap_state.c | 152 ++++++++++++++++-------------------- 25-akpm/mm/swapfile.c | 21 +++-- 25-akpm/mm/truncate.c | 1 25-akpm/mm/vmscan.c | 32 ++++--- 19 files changed, 224 insertions(+), 355 deletions(-) diff -puN fs/buffer.c~anobjrmap-2-mapping fs/buffer.c --- 25/fs/buffer.c~anobjrmap-2-mapping Thu Mar 20 17:46:03 2003 +++ 25-akpm/fs/buffer.c Thu Mar 20 17:46:03 2003 @@ -1457,7 +1457,7 @@ static inline void discard_buffer(struct */ int try_to_release_page(struct page *page, int gfp_mask) { - struct address_space * const mapping = page->mapping; + struct address_space * const mapping = page_mapping(page); if (!PageLocked(page)) BUG(); @@ -2717,22 +2717,18 @@ failed: int try_to_free_buffers(struct page *page) { - struct address_space * const mapping = page->mapping; + struct address_space * const mapping = page_mapping(page); struct buffer_head *buffers_to_free = NULL; int ret = 0; + BUG_ON(!mapping); BUG_ON(!PageLocked(page)); if (PageWriteback(page)) return 0; - if (mapping == NULL) { /* swapped-in anon page */ - ret = drop_buffers(page, &buffers_to_free); - goto out; - } - spin_lock(&mapping->private_lock); ret = drop_buffers(page, &buffers_to_free); - if (ret && !PageSwapCache(page)) { + if (ret) { /* * If the filesystem writes its buffers by hand (eg ext3) * then we can have clean buffers against a dirty page. We @@ -2744,7 +2740,7 @@ int try_to_free_buffers(struct page *pag clear_page_dirty(page); } spin_unlock(&mapping->private_lock); -out: + if (buffers_to_free) { struct buffer_head *bh = buffers_to_free; diff -puN fs/proc/proc_misc.c~anobjrmap-2-mapping fs/proc/proc_misc.c --- 25/fs/proc/proc_misc.c~anobjrmap-2-mapping Thu Mar 20 17:46:03 2003 +++ 25-akpm/fs/proc/proc_misc.c Thu Mar 20 17:46:03 2003 @@ -184,8 +184,8 @@ static int meminfo_read_proc(char *page, K(i.totalram), K(i.freeram), K(i.bufferram), - K(ps.nr_pagecache-total_swapcache_pages-i.bufferram), - K(total_swapcache_pages), + K(ps.nr_pagecache-i.bufferram), + K(ps.nr_swapcache), K(active), K(inactive), K(i.totalhigh), diff -puN include/linux/mm.h~anobjrmap-2-mapping include/linux/mm.h --- 25/include/linux/mm.h~anobjrmap-2-mapping Thu Mar 20 17:46:03 2003 +++ 25-akpm/include/linux/mm.h Thu Mar 20 17:46:03 2003 @@ -363,6 +363,16 @@ void page_address_init(void); #endif /* + * On an anonymous page mapped into a user virtual memory area, + * page->mapping points to its anonmm, not to a struct address_space. + * + * Please note that, confusingly, "page_mapping" refers to the inode + * address_space which maps the page from disk; whereas "page_mapped" + * refers to user virtual address space into which the page is mapped. + */ +#define page_mapping(page) (PageAnon(page)? NULL: (page)->mapping) + +/* * Return true if this page is mapped into pagetables. Subtle: test pte.direct * rather than pte.chain. Because sometimes pte.direct is 64-bit, and .chain * is only 32-bit. @@ -430,6 +440,7 @@ int get_user_pages(struct task_struct *t int __set_page_dirty_buffers(struct page *page); int __set_page_dirty_nobuffers(struct page *page); +int set_page_dirty(struct page *page); int set_page_dirty_lock(struct page *page); /* @@ -456,23 +467,6 @@ extern struct shrinker *set_shrinker(int extern void remove_shrinker(struct shrinker *shrinker); /* - * If the mapping doesn't provide a set_page_dirty a_op, then - * just fall through and assume that it wants buffer_heads. - * FIXME: make the method unconditional. - */ -static inline int set_page_dirty(struct page *page) -{ - if (page->mapping) { - int (*spd)(struct page *); - - spd = page->mapping->a_ops->set_page_dirty; - if (spd) - return (*spd)(page); - } - return __set_page_dirty_buffers(page); -} - -/* * On a two-level page table, this ends up being trivial. Thus the * inlining and the symmetry break with pte_alloc_map() that does all * of this out-of-line. diff -puN include/linux/page-flags.h~anobjrmap-2-mapping include/linux/page-flags.h --- 25/include/linux/page-flags.h~anobjrmap-2-mapping Thu Mar 20 17:46:03 2003 +++ 25-akpm/include/linux/page-flags.h Thu Mar 20 17:46:03 2003 @@ -74,7 +74,9 @@ #define PG_mappedtodisk 17 /* Has blocks allocated on-disk */ #define PG_reclaim 18 /* To be reclaimed asap */ #define PG_compound 19 /* Part of a compound page */ -#define PG_anon 20 /* Anonymous page */ + +#define PG_anon 20 /* Anonymous page: anonmm in mapping */ +#define PG_swapcache 21 /* Swap page: swp_entry_t in private */ /* * Global page accounting. One instance per CPU. Only unsigned longs are @@ -84,6 +86,7 @@ struct page_state { unsigned long nr_dirty; /* Dirty writeable pages */ unsigned long nr_writeback; /* Pages under writeback */ unsigned long nr_pagecache; /* Pages in pagecache */ + unsigned long nr_swapcache; /* Pages in swapcache */ unsigned long nr_page_table_pages;/* Pages used for pagetables */ unsigned long nr_reverse_maps; /* includes PageDirect */ unsigned long nr_mapped; /* mapped into pagetables */ @@ -261,15 +264,12 @@ extern void get_full_page_state(struct p #define SetPageAnon(page) set_bit(PG_anon, &(page)->flags) #define ClearPageAnon(page) clear_bit(PG_anon, &(page)->flags) -/* - * The PageSwapCache predicate doesn't use a PG_flag at this time, - * but it may again do so one day. - */ #ifdef CONFIG_SWAP -extern struct address_space swapper_space; -#define PageSwapCache(page) ((page)->mapping == &swapper_space) +#define PageSwapCache(page) test_bit(PG_swapcache, &(page)->flags) +#define SetPageSwapCache(page) set_bit(PG_swapcache, &(page)->flags) +#define ClearPageSwapCache(page) clear_bit(PG_swapcache, &(page)->flags) #else -#define PageSwapCache(page) 0 +#define PageSwapCache(page) 0 #endif struct page; /* forward declaration */ diff -puN include/linux/pagemap.h~anobjrmap-2-mapping include/linux/pagemap.h --- 25/include/linux/pagemap.h~anobjrmap-2-mapping Thu Mar 20 17:46:03 2003 +++ 25-akpm/include/linux/pagemap.h Thu Mar 20 17:46:03 2003 @@ -74,17 +74,6 @@ int add_to_page_cache_lru(struct page *p extern void remove_from_page_cache(struct page *page); extern void __remove_from_page_cache(struct page *page); -static inline void ___add_to_page_cache(struct page *page, - struct address_space *mapping, unsigned long index) -{ - list_add(&page->list, &mapping->clean_pages); - page->mapping = mapping; - page->index = index; - - mapping->nrpages++; - inc_page_state(nr_pagecache); -} - extern void FASTCALL(__lock_page(struct page *page)); extern void FASTCALL(unlock_page(struct page *page)); diff -puN include/linux/rmap.h~anobjrmap-2-mapping include/linux/rmap.h --- 25/include/linux/rmap.h~anobjrmap-2-mapping Thu Mar 20 17:46:03 2003 +++ 25-akpm/include/linux/rmap.h Thu Mar 20 17:46:03 2003 @@ -22,7 +22,6 @@ static inline void pte_chain_free(struct struct pte_chain *FASTCALL( page_add_rmap(struct page *, pte_t *, struct pte_chain *)); void FASTCALL(page_remove_rmap(struct page *, pte_t *)); -void page_convert_anon(struct page *page); /* * Called from mm/vmscan.c to handle paging out @@ -30,6 +29,13 @@ void page_convert_anon(struct page *page int FASTCALL(page_referenced(struct page *)); int FASTCALL(try_to_unmap(struct page *)); +#else /* !CONFIG_MMU */ + +#define page_referenced(page) TestClearPageReferenced(page) +#define try_to_unmap(page) SWAP_FAIL + +#endif /* CONFIG_MMU */ + /* * Return values of try_to_unmap */ @@ -37,12 +43,6 @@ int FASTCALL(try_to_unmap(struct page *) #define SWAP_AGAIN 1 #define SWAP_FAIL 2 -#else /* !CONFIG_MMU */ - -#define page_referenced(page) TestClearPageReferenced(page) - -#endif /* CONFIG_MMU */ - static inline void pte_chain_lock(struct page *page) { /* diff -puN include/linux/swap.h~anobjrmap-2-mapping include/linux/swap.h --- 25/include/linux/swap.h~anobjrmap-2-mapping Thu Mar 20 17:46:03 2003 +++ 25-akpm/include/linux/swap.h Thu Mar 20 17:46:03 2003 @@ -179,9 +179,7 @@ extern int rw_swap_page_sync(int, swp_en /* linux/mm/swap_state.c */ extern struct address_space swapper_space; -#define total_swapcache_pages swapper_space.nrpages extern void show_swap_cache_info(void); -extern int add_to_swap_cache(struct page *, swp_entry_t); extern int add_to_swap(struct page *); extern void __delete_from_swap_cache(struct page *); extern void delete_from_swap_cache(struct page *); @@ -219,7 +217,6 @@ extern spinlock_t swaplock; #else /* CONFIG_SWAP */ #define total_swap_pages 0 -#define total_swapcache_pages 0UL #define si_swapinfo(val) \ do { (val)->freeswap = (val)->totalswap = 0; } while (0) diff -puN mm/filemap.c~anobjrmap-2-mapping mm/filemap.c --- 25/mm/filemap.c~anobjrmap-2-mapping Thu Mar 20 17:46:03 2003 +++ 25-akpm/mm/filemap.c Thu Mar 20 17:46:03 2003 @@ -81,7 +81,7 @@ void __remove_from_page_cache(struct pag { struct address_space *mapping = page->mapping; - BUG_ON(PageDirty(page) && !PageSwapCache(page)); + BUG_ON(PageDirty(page)); radix_tree_delete(&mapping->page_tree, page->index); list_del(&page->list); @@ -95,20 +95,22 @@ void remove_from_page_cache(struct page { struct address_space *mapping = page->mapping; - if (unlikely(!PageLocked(page))) - PAGE_BUG(page); + BUG_ON(!PageLocked(page)); write_lock(&mapping->page_lock); __remove_from_page_cache(page); write_unlock(&mapping->page_lock); + page_cache_release(page); } static inline int sync_page(struct page *page) { - struct address_space *mapping = page->mapping; + struct address_space *mapping = page_mapping(page); if (mapping && mapping->a_ops && mapping->a_ops->sync_page) return mapping->a_ops->sync_page(page); + if (PageSwapCache(page)) + blk_run_queues(); return 0; } @@ -192,16 +194,9 @@ restart: * This adds a page to the page cache, starting out as locked, unreferenced, * not uptodate and with no errors. * - * This function is used for two things: adding newly allocated pagecache - * pages and for moving existing anon pages into swapcache. - * - * In the case of pagecache pages, the page is new, so we can just run - * SetPageLocked() against it. The other page state flags were set by - * rmqueue() - * - * In the case of swapcache, try_to_swap_out() has already locked the page, so - * SetPageLocked() is ugly-but-OK there too. The required page state has been - * set up by swap_out_add_to_swap_cache(). + * This function is used to add newly allocated pagecache pages: + * the page is new, so we can just run SetPageLocked() against it. + * The other page state flags were set by rmqueue(). * * This function does not add the page to the LRU. The caller must do that. */ @@ -216,7 +211,11 @@ int add_to_page_cache(struct page *page, error = radix_tree_insert(&mapping->page_tree, offset, page); if (!error) { SetPageLocked(page); - ___add_to_page_cache(page, mapping, offset); + list_add(&page->list, &mapping->clean_pages); + page->mapping = mapping; + page->index = offset; + mapping->nrpages++; + inc_page_state(nr_pagecache); } else { page_cache_release(page); } diff -puN mm/fremap.c~anobjrmap-2-mapping mm/fremap.c --- 25/mm/fremap.c~anobjrmap-2-mapping Thu Mar 20 17:46:03 2003 +++ 25-akpm/mm/fremap.c Thu Mar 20 17:46:03 2003 @@ -57,21 +57,11 @@ int install_page(struct mm_struct *mm, s pgd_t *pgd; pmd_t *pmd; struct pte_chain *pte_chain; - unsigned long pgidx; pte_chain = pte_chain_alloc(GFP_KERNEL); if (!pte_chain) goto err; - /* - * Convert this page to anon for objrmap if it's nonlinear - */ - pgidx = (addr - vma->vm_start) >> PAGE_SHIFT; - pgidx += vma->vm_pgoff; - pgidx >>= PAGE_CACHE_SHIFT - PAGE_SHIFT; - if (!PageAnon(page) && (page->index != pgidx)) - page_convert_anon(page); - pgd = pgd_offset(mm, addr); spin_lock(&mm->page_table_lock); diff -puN mm/memory.c~anobjrmap-2-mapping mm/memory.c --- 25/mm/memory.c~anobjrmap-2-mapping Thu Mar 20 17:46:03 2003 +++ 25-akpm/mm/memory.c Thu Mar 20 17:46:03 2003 @@ -419,8 +419,8 @@ zap_pte_range(struct mmu_gather *tlb, pm if (!PageReserved(page)) { if (pte_dirty(pte)) set_page_dirty(page); - if (page->mapping && pte_young(pte) && - !PageSwapCache(page)) + if (pte_young(pte) && + page_mapping(page)) mark_page_accessed(page); tlb->freed++; page_remove_rmap(page, ptep); @@ -1329,6 +1329,7 @@ do_no_page(struct mm_struct *mm, struct struct page * new_page; pte_t entry; struct pte_chain *pte_chain; + int anon = 0; int ret; if (!vma->vm_ops || !vma->vm_ops->nopage) @@ -1349,10 +1350,6 @@ do_no_page(struct mm_struct *mm, struct if (!pte_chain) goto oom; - /* See if nopage returned an anon page */ - if (!new_page->mapping || PageSwapCache(new_page)) - SetPageAnon(new_page); - /* * Should we do an early C-O-W break? */ @@ -1365,8 +1362,8 @@ do_no_page(struct mm_struct *mm, struct copy_user_highpage(page, new_page, address); page_cache_release(new_page); lru_cache_add_active(page); - SetPageAnon(page); new_page = page; + anon = 1; } spin_lock(&mm->page_table_lock); @@ -1391,6 +1388,8 @@ do_no_page(struct mm_struct *mm, struct if (write_access) entry = pte_mkwrite(pte_mkdirty(entry)); set_pte(page_table, entry); + if (anon) + SetPageAnon(new_page); pte_chain = page_add_rmap(new_page, page_table, pte_chain); pte_unmap(page_table); } else { diff -puN mm/mmap.c~anobjrmap-2-mapping mm/mmap.c --- 25/mm/mmap.c~anobjrmap-2-mapping Thu Mar 20 17:46:03 2003 +++ 25-akpm/mm/mmap.c Thu Mar 20 17:46:03 2003 @@ -87,14 +87,6 @@ int vm_enough_memory(long pages) free += nr_swap_pages; /* - * This double-counts: the nrpages are both in the - * page-cache and in the swapper space. At the same time, - * this compensates for the swap-space over-allocation - * (ie "nr_swap_pages" being too small). - */ - free += total_swapcache_pages; - - /* * The code below doesn't account for free space in the * inode and dentry slab cache, slab cache fragmentation, * inodes and dentries which will become freeable under diff -puN mm/page_alloc.c~anobjrmap-2-mapping mm/page_alloc.c --- 25/mm/page_alloc.c~anobjrmap-2-mapping Thu Mar 20 17:46:03 2003 +++ 25-akpm/mm/page_alloc.c Thu Mar 20 17:46:03 2003 @@ -80,6 +80,10 @@ static void bad_page(const char *functio 1 << PG_lru | 1 << PG_active | 1 << PG_dirty | + 1 << PG_chainlock | + 1 << PG_direct | + 1 << PG_anon | + 1 << PG_swapcache | 1 << PG_writeback); set_page_count(page, 0); page->mapping = NULL; @@ -216,12 +220,14 @@ static inline void free_pages_check(cons 1 << PG_locked | 1 << PG_active | 1 << PG_reclaim | + 1 << PG_chainlock | + 1 << PG_direct | + 1 << PG_anon | + 1 << PG_swapcache | 1 << PG_writeback ))) bad_page(function, page); if (PageDirty(page)) ClearPageDirty(page); - if (PageAnon(page)) - ClearPageAnon(page); } /* @@ -322,6 +328,10 @@ static void prep_new_page(struct page *p 1 << PG_active | 1 << PG_dirty | 1 << PG_reclaim | + 1 << PG_chainlock | + 1 << PG_direct | + 1 << PG_anon | + 1 << PG_swapcache | 1 << PG_writeback ))) bad_page(__FUNCTION__, page); @@ -869,7 +879,7 @@ unsigned long get_page_cache_size(void) struct page_state ps; get_page_state(&ps); - return ps.nr_pagecache; + return ps.nr_pagecache + ps.nr_swapcache; } void si_meminfo(struct sysinfo *val) @@ -1442,6 +1452,7 @@ static char *vmstat_text[] = { "nr_dirty", "nr_writeback", "nr_pagecache", + "nr_swapcache", "nr_page_table_pages", "nr_reverse_maps", "nr_mapped", diff -puN mm/page_io.c~anobjrmap-2-mapping mm/page_io.c --- 25/mm/page_io.c~anobjrmap-2-mapping Thu Mar 20 17:46:03 2003 +++ 25-akpm/mm/page_io.c Thu Mar 20 17:46:03 2003 @@ -16,8 +16,6 @@ #include #include #include -#include /* for block_sync_page() */ -#include #include #include @@ -32,7 +30,7 @@ get_swap_bio(int gfp_flags, struct page swp_entry_t entry; BUG_ON(!PageSwapCache(page)); - entry.val = page->index; + entry.val = page->private; sis = get_swap_info_struct(swp_type(entry)); bio->bi_sector = map_swap_page(sis, swp_offset(entry)) * @@ -130,13 +128,6 @@ out: return ret; } -struct address_space_operations swap_aops = { - .writepage = swap_writepage, - .readpage = swap_readpage, - .sync_page = block_sync_page, - .set_page_dirty = __set_page_dirty_nobuffers, -}; - /* * A scruffy utility function to read or write an arbitrary swap page * and wait on the I/O. @@ -149,10 +140,8 @@ int rw_swap_page_sync(int rw, swp_entry_ }; lock_page(page); - - BUG_ON(page->mapping); - page->mapping = &swapper_space; - page->index = entry.val; + SetPageSwapCache(page); + page->private = entry.val; if (rw == READ) { ret = swap_readpage(NULL, page); @@ -161,7 +150,7 @@ int rw_swap_page_sync(int rw, swp_entry_ ret = swap_writepage(page, &swap_wbc); wait_on_page_writeback(page); } - page->mapping = NULL; + ClearPageSwapCache(page); if (ret == 0 && (!PageUptodate(page) || PageError(page))) ret = -EIO; return ret; diff -puN mm/page-writeback.c~anobjrmap-2-mapping mm/page-writeback.c --- 25/mm/page-writeback.c~anobjrmap-2-mapping Thu Mar 20 17:46:03 2003 +++ 25-akpm/mm/page-writeback.c Thu Mar 20 17:46:03 2003 @@ -477,21 +477,12 @@ EXPORT_SYMBOL(write_one_page); * FIXME: may need to call ->reservepage here as well. That's rather up to the * address_space though. * - * For now, we treat swapper_space specially. It doesn't use the normal - * block a_ops. - * * FIXME: this should move over to fs/buffer.c - buffer_heads have no business in mm/ */ #include int __set_page_dirty_buffers(struct page *page) { struct address_space * const mapping = page->mapping; - int ret = 0; - - if (mapping == NULL) { - SetPageDirty(page); - goto out; - } if (!PageUptodate(page)) buffer_error(); @@ -523,8 +514,7 @@ int __set_page_dirty_buffers(struct page __mark_inode_dirty(mapping->host, I_DIRTY_PAGES); } -out: - return ret; + return 0; } EXPORT_SYMBOL(__set_page_dirty_buffers); @@ -566,6 +556,24 @@ int __set_page_dirty_nobuffers(struct pa EXPORT_SYMBOL(__set_page_dirty_nobuffers); /* + * If the mapping doesn't provide a set_page_dirty a_op, then + * just fall through and assume that it wants buffer_heads. + */ +int set_page_dirty(struct page *page) +{ + struct address_space *mapping = page_mapping(page); + int (*spd)(struct page *); + + if (!mapping) { + SetPageDirty(page); + return 0; + } + spd = mapping->a_ops->set_page_dirty; + return spd? (*spd)(page): __set_page_dirty_buffers(page); +} +EXPORT_SYMBOL(set_page_dirty); + +/* * set_page_dirty() is racy if the caller has no reference against * page->mapping->host, and if the page is unlocked. This is because another * CPU could truncate the page off the mapping and then free the mapping. @@ -592,7 +600,7 @@ int set_page_dirty_lock(struct page *pag int test_clear_page_dirty(struct page *page) { if (TestClearPageDirty(page)) { - struct address_space *mapping = page->mapping; + struct address_space *mapping = page_mapping(page); if (mapping && !mapping->backing_dev_info->memory_backed) dec_page_state(nr_dirty); diff -puN mm/rmap.c~anobjrmap-2-mapping mm/rmap.c --- 25/mm/rmap.c~anobjrmap-2-mapping Thu Mar 20 17:46:03 2003 +++ 25-akpm/mm/rmap.c Thu Mar 20 17:46:03 2003 @@ -37,6 +37,20 @@ /* #define DEBUG_RMAP */ /* + * Something oopsable to put for now in the page->mapping + * of an anonymous page, to test that it is ignored. + */ +#define ANON_MAPPING_DEBUG ((struct address_space *) 1) + +static inline void +clear_page_anon(struct page *page) +{ + BUG_ON(page->mapping != ANON_MAPPING_DEBUG); + page->mapping = NULL; + ClearPageAnon(page); +} + +/* * Shared pages have a chain of pte_chain structures, used to locate * all the mappings to this page. We only need a pointer to the pte * here, the page struct for the page table page contains the process @@ -185,7 +199,7 @@ page_referenced_obj(struct page *page) return 0; if (!mapping) - BUG(); + return 0; if (PageSwapCache(page)) BUG(); @@ -298,8 +312,6 @@ page_add_rmap(struct page *page, pte_t * * find the mappings by walking the object vma chain for that object. */ if (!PageAnon(page)) { - if (!page->mapping) - BUG(); if (PageSwapCache(page)) BUG(); if (!page->pte.mapcount) @@ -331,6 +343,8 @@ page_add_rmap(struct page *page, pte_t * } #endif + page->mapping = ANON_MAPPING_DEBUG; + if (page->pte.direct == 0) { page->pte.direct = pte_paddr; SetPageDirect(page); @@ -392,8 +406,6 @@ void page_remove_rmap(struct page * page BUG(); if (!pfn_valid(page_to_pfn(page)) || PageReserved(page)) return; - if (!page_mapped(page)) - return; /* remap_page_range() from a driver? */ pte_chain_lock(page); @@ -402,8 +414,6 @@ void page_remove_rmap(struct page * page * find the mappings by walking the object vma chain for that object. */ if (!PageAnon(page)) { - if (!page->mapping) - BUG(); if (PageSwapCache(page)) BUG(); if (!page->pte.mapcount) @@ -472,10 +482,11 @@ void page_remove_rmap(struct page * page #endif out: - pte_chain_unlock(page); - if (!page_mapped(page)) + if (!page_mapped(page)) { dec_page_state(nr_mapped); - return; + clear_page_anon(page); + } + pte_chain_unlock(page); } /** @@ -565,8 +576,9 @@ try_to_unmap_obj(struct page *page) goto out; } + /* We lose track of sys_remap_file pages (for now) */ if (page->pte.mapcount) - BUG(); + ret = SWAP_FAIL; out: up(&mapping->i_shared_sem); @@ -628,12 +640,13 @@ static int try_to_unmap_one(struct page pte = ptep_get_and_clear(ptep); flush_tlb_page(vma, address); - if (PageSwapCache(page)) { + if (PageAnon(page)) { + swp_entry_t entry = { .val = page->private }; /* * Store the swap location in the pte. * See handle_pte_fault() ... */ - swp_entry_t entry = { .val = page->index }; + BUG_ON(!PageSwapCache(page)); swap_duplicate(entry); set_pte(ptep, swp_entry_to_pte(entry)); BUG_ON(pte_file(*ptep)); @@ -690,7 +703,7 @@ int try_to_unmap(struct page * page) if (!PageLocked(page)) BUG(); /* We need backing store to swap out a page. */ - if (!page->mapping) + if (!page_mapping(page) && !PageSwapCache(page)) BUG(); /* @@ -757,118 +770,12 @@ int try_to_unmap(struct page * page) } } out: - if (!page_mapped(page)) + if (!page_mapped(page)) { dec_page_state(nr_mapped); - return ret; -} - -/** - * page_convert_anon - Convert an object-based mapped page to pte_chain-based. - * @page: the page to convert - * - * Find all the mappings for an object-based page and convert them - * to 'anonymous', ie create a pte_chain and store all the pte pointers there. - * - * This function takes the address_space->i_shared_sem and the pte_chain_lock - * for the page. It jumps through some hoops to preallocate the correct number - * of pte_chain structures to ensure that it can complete without releasing - * the lock. - */ -void page_convert_anon(struct page *page) -{ - struct address_space *mapping = page->mapping; - struct vm_area_struct *vma; - struct pte_chain *pte_chain = NULL, *ptec; - pte_t *pte; - pte_addr_t pte_paddr; - int mapcount; - int index = 0; - - if (PageAnon(page)) - goto out; - -retry: - /* - * Preallocate the pte_chains outside the lock. - */ - mapcount = page->pte.mapcount; - if (mapcount > 1) { - for (; index < mapcount; index += NRPTE) { - ptec = pte_chain_alloc(GFP_KERNEL); - ptec->next = pte_chain; - pte_chain = ptec; - } - } - down(&mapping->i_shared_sem); - pte_chain_lock(page); - - /* - * Check to make sure the number of mappings didn't change. If they - * did, either retry or free enough pte_chains to compensate. - */ - if (mapcount < page->pte.mapcount) { - pte_chain_unlock(page); - up(&mapping->i_shared_sem); - goto retry; - } else if ((mapcount > page->pte.mapcount) && (mapcount > 1)) { - mapcount = page->pte.mapcount; - while ((index - NRPTE) > mapcount) { - index -= NRPTE; - ptec = pte_chain->next; - pte_chain_free(pte_chain); - pte_chain = ptec; - } - if (mapcount <= 1) - pte_chain_free(pte_chain); - } - SetPageAnon(page); - - if (mapcount == 0) - goto out_unlock; - else if (mapcount == 1) { - SetPageDirect(page); - page->pte.direct = 0; - } else - page->pte.chain = pte_chain; - - index = NRPTE-1; - list_for_each_entry(vma, &mapping->i_mmap, shared) { - pte = find_pte(vma, page, NULL); - if (pte) { - pte_paddr = ptep_to_paddr(pte); - pte_unmap(pte); - if (PageDirect(page)) { - page->pte.direct = pte_paddr; - goto out_unlock; - } - pte_chain->ptes[index] = pte_paddr; - if (!--index) { - pte_chain = pte_chain->next; - index = NRPTE-1; - } - } + if (PageAnon(page)) + clear_page_anon(page); } - list_for_each_entry(vma, &mapping->i_mmap_shared, shared) { - pte = find_pte(vma, page, NULL); - if (pte) { - pte_paddr = ptep_to_paddr(pte); - pte_unmap(pte); - if (PageDirect(page)) { - page->pte.direct = pte_paddr; - goto out_unlock; - } - pte_chain->ptes[index] = pte_paddr; - if (!--index) { - pte_chain = pte_chain->next; - index = NRPTE-1; - } - } - } -out_unlock: - pte_chain_unlock(page); - up(&mapping->i_shared_sem); -out: - return; + return ret; } /** diff -puN mm/swapfile.c~anobjrmap-2-mapping mm/swapfile.c --- 25/mm/swapfile.c~anobjrmap-2-mapping Thu Mar 20 17:46:03 2003 +++ 25-akpm/mm/swapfile.c Thu Mar 20 17:46:03 2003 @@ -242,7 +242,7 @@ static int exclusive_swap_page(struct pa struct swap_info_struct * p; swp_entry_t entry; - entry.val = page->index; + entry.val = page->private; p = swap_info_get(entry); if (p) { /* Is the only swap cache user the cache itself? */ @@ -310,7 +310,7 @@ int remove_exclusive_swap_page(struct pa if (page_count(page) != 2) /* 2: us + cache */ return 0; - entry.val = page->index; + entry.val = page->private; p = swap_info_get(entry); if (!p) return 0; @@ -348,8 +348,14 @@ void free_swap_and_cache(swp_entry_t ent p = swap_info_get(entry); if (p) { - if (swap_entry_free(p, swp_offset(entry)) == 1) - page = find_trylock_page(&swapper_space, entry.val); + if (swap_entry_free(p, swp_offset(entry)) == 1) { + read_lock(&swapper_space.page_lock); + page = radix_tree_lookup(&swapper_space.page_tree, + entry.val); + if (page && TestSetPageLocked(page)) + page = NULL; + read_unlock(&swapper_space.page_lock); + } swap_info_put(p); } if (page) { @@ -959,15 +965,14 @@ int page_queue_congested(struct page *pa struct backing_dev_info *bdi; BUG_ON(!PageLocked(page)); /* It pins the swap_info_struct */ - - bdi = page->mapping->backing_dev_info; if (PageSwapCache(page)) { - swp_entry_t entry = { .val = page->index }; + swp_entry_t entry = { .val = page->private }; struct swap_info_struct *sis; sis = get_swap_info_struct(swp_type(entry)); bdi = sis->bdev->bd_inode->i_mapping->backing_dev_info; - } + } else + bdi = page->mapping->backing_dev_info; return bdi_write_congested(bdi); } #endif diff -puN mm/swap_state.c~anobjrmap-2-mapping mm/swap_state.c --- 25/mm/swap_state.c~anobjrmap-2-mapping Thu Mar 20 17:46:03 2003 +++ 25-akpm/mm/swap_state.c Thu Mar 20 17:46:03 2003 @@ -13,40 +13,35 @@ #include #include #include -#include /* block_sync_page() */ +#include #include /* - * swapper_inode doesn't do anything much. It is really only here to - * avoid some special-casing in other parts of the kernel. + * Try to avoid special-casing swapcache in shrink_list. */ -static struct inode swapper_inode = { - .i_mapping = &swapper_space, -}; - static struct backing_dev_info swap_backing_dev_info = { .ra_pages = 0, /* No readahead */ .memory_backed = 1, /* Does not contribute to dirty memory */ }; -extern struct address_space_operations swap_aops; +static struct address_space_operations swap_aops = { + .writepage = swap_writepage, + .readpage = swap_readpage, + /* + * sync_page and set_page_dirty are special-cased. + */ +}; +/* + * Only a few fields of swapper_space, those initialized below, + * are ever used: leave most fields null to oops if ever used. + */ struct address_space swapper_space = { .page_tree = RADIX_TREE_INIT(GFP_ATOMIC), .page_lock = RW_LOCK_UNLOCKED, - .clean_pages = LIST_HEAD_INIT(swapper_space.clean_pages), - .dirty_pages = LIST_HEAD_INIT(swapper_space.dirty_pages), - .io_pages = LIST_HEAD_INIT(swapper_space.io_pages), - .locked_pages = LIST_HEAD_INIT(swapper_space.locked_pages), - .host = &swapper_inode, .a_ops = &swap_aops, .backing_dev_info = &swap_backing_dev_info, - .i_mmap = LIST_HEAD_INIT(swapper_space.i_mmap), - .i_mmap_shared = LIST_HEAD_INIT(swapper_space.i_mmap_shared), - .i_shared_sem = __MUTEX_INITIALIZER(swapper_space.i_shared_sem), - .private_lock = SPIN_LOCK_UNLOCKED, - .private_list = LIST_HEAD_INIT(swapper_space.private_list), }; #define INC_CACHE_INFO(x) do { swap_cache_info.x++; } while (0) @@ -68,30 +63,50 @@ void show_swap_cache_info(void) swap_cache_info.noent_race, swap_cache_info.exist_race); } -int add_to_swap_cache(struct page *page, swp_entry_t entry) +static int __add_to_swap_cache(struct page *page, swp_entry_t entry) +{ + int error; + + BUG_ON(PageSwapCache(page)); + BUG_ON(PagePrivate(page)); + error = radix_tree_preload(GFP_ATOMIC); + if (!error) { + page_cache_get(page); + write_lock(&swapper_space.page_lock); + error = radix_tree_insert(&swapper_space.page_tree, + entry.val, page); + if (!error) { + SetPageLocked(page); + SetPageSwapCache(page); + page->private = entry.val; + inc_page_state(nr_swapcache); + } else + page_cache_release(page); + write_unlock(&swapper_space.page_lock); + radix_tree_preload_end(); + } + return error; +} + +static int add_to_swap_cache(struct page *page, swp_entry_t entry) { int error; - if (page->mapping) - BUG(); if (!swap_duplicate(entry)) { INC_CACHE_INFO(noent_race); return -ENOENT; } - error = add_to_page_cache(page, &swapper_space, entry.val, GFP_ATOMIC); + error = __add_to_swap_cache(page, entry); /* - * Anon pages are already on the LRU, we don't run lru_cache_add here. + * Anon pages are already on the LRU, + * we don't run lru_cache_add here. */ - if (error != 0) { + if (error) { swap_free(entry); if (error == -EEXIST) INC_CACHE_INFO(exist_race); return error; } - if (!PageLocked(page)) - BUG(); - if (!PageSwapCache(page)) - BUG(); INC_CACHE_INFO(add_total); return 0; } @@ -105,7 +120,10 @@ void __delete_from_swap_cache(struct pag BUG_ON(!PageLocked(page)); BUG_ON(!PageSwapCache(page)); BUG_ON(PageWriteback(page)); - __remove_from_page_cache(page); + + radix_tree_delete(&swapper_space.page_tree, page->private); + ClearPageSwapCache(page); + dec_page_state(nr_swapcache); INC_CACHE_INFO(del_total); } @@ -149,8 +167,7 @@ int add_to_swap(struct page * page) /* * Add it to the swap cache and mark it dirty */ - err = add_to_page_cache(page, &swapper_space, - entry.val, GFP_ATOMIC); + err = __add_to_swap_cache(page, entry); if (pf_flags & PF_MEMALLOC) current->flags |= PF_MEMALLOC; @@ -158,8 +175,7 @@ int add_to_swap(struct page * page) switch (err) { case 0: /* Success */ SetPageUptodate(page); - ClearPageDirty(page); - set_page_dirty(page); + SetPageDirty(page); INC_CACHE_INFO(add_total); return 1; case -EEXIST: @@ -185,11 +201,12 @@ void delete_from_swap_cache(struct page { swp_entry_t entry; + BUG_ON(!PageSwapCache(page)); BUG_ON(!PageLocked(page)); BUG_ON(PageWriteback(page)); BUG_ON(page_has_buffers(page)); - entry.val = page->index; + entry.val = page->private; write_lock(&swapper_space.page_lock); __delete_from_swap_cache(page); @@ -201,27 +218,12 @@ void delete_from_swap_cache(struct page int move_to_swap_cache(struct page *page, swp_entry_t entry) { - struct address_space *mapping = page->mapping; - int err; - - write_lock(&swapper_space.page_lock); - write_lock(&mapping->page_lock); - - err = radix_tree_insert(&swapper_space.page_tree, entry.val, page); - if (!err) { - __remove_from_page_cache(page); - ___add_to_page_cache(page, &swapper_space, entry.val); - } - - write_unlock(&mapping->page_lock); - write_unlock(&swapper_space.page_lock); - + int err = __add_to_swap_cache(page, entry); if (!err) { + remove_from_page_cache(page); if (!swap_duplicate(entry)) BUG(); - /* shift page from clean_pages to dirty_pages list */ - BUG_ON(PageDirty(page)); - set_page_dirty(page); + SetPageDirty(page); INC_CACHE_INFO(add_total); } else if (err == -EEXIST) INC_CACHE_INFO(exist_race); @@ -231,29 +233,9 @@ int move_to_swap_cache(struct page *page int move_from_swap_cache(struct page *page, unsigned long index, struct address_space *mapping) { - swp_entry_t entry; - int err; - - BUG_ON(!PageLocked(page)); - BUG_ON(PageWriteback(page)); - BUG_ON(page_has_buffers(page)); - - entry.val = page->index; - - write_lock(&swapper_space.page_lock); - write_lock(&mapping->page_lock); - - err = radix_tree_insert(&mapping->page_tree, index, page); - if (!err) { - __delete_from_swap_cache(page); - ___add_to_page_cache(page, mapping, index); - } - - write_unlock(&mapping->page_lock); - write_unlock(&swapper_space.page_lock); - + int err = add_to_page_cache(page, mapping, index, GFP_ATOMIC); if (!err) { - swap_free(entry); + delete_from_swap_cache(page); /* shift page from clean_pages to dirty_pages list */ ClearPageDirty(page); set_page_dirty(page); @@ -261,7 +243,6 @@ int move_from_swap_cache(struct page *pa return err; } - /* * If we are the only user, then try to free up the swap cache. * @@ -319,9 +300,15 @@ void free_pages_and_swap_cache(struct pa */ struct page * lookup_swap_cache(swp_entry_t entry) { - struct page *found; + struct page *page; - found = find_get_page(&swapper_space, entry.val); + read_lock(&swapper_space.page_lock); + page = radix_tree_lookup(&swapper_space.page_tree, entry.val); + if (page) { + page_cache_get(page); + INC_CACHE_INFO(find_success); + } + read_unlock(&swapper_space.page_lock); /* * Unsafe to assert PageSwapCache and mapping on page found: * if SMP nothing prevents swapoff from deleting this page from @@ -329,9 +316,7 @@ struct page * lookup_swap_cache(swp_entr * that, but no need to change: we _have_ got the right page. */ INC_CACHE_INFO(find_total); - if (found) - INC_CACHE_INFO(find_success); - return found; + return page; } /* @@ -352,7 +337,12 @@ struct page * read_swap_cache_async(swp_ * that would confuse statistics: use find_get_page() * directly. */ - found_page = find_get_page(&swapper_space, entry.val); + read_lock(&swapper_space.page_lock); + found_page = radix_tree_lookup(&swapper_space.page_tree, + entry.val); + if (found_page) + page_cache_get(found_page); + read_unlock(&swapper_space.page_lock); if (found_page) break; diff -puN mm/truncate.c~anobjrmap-2-mapping mm/truncate.c --- 25/mm/truncate.c~anobjrmap-2-mapping Thu Mar 20 17:46:03 2003 +++ 25-akpm/mm/truncate.c Thu Mar 20 17:46:03 2003 @@ -54,7 +54,6 @@ truncate_complete_page(struct address_sp ClearPageUptodate(page); ClearPageMappedToDisk(page); remove_from_page_cache(page); - page_cache_release(page); /* pagecache ref */ } /* diff -puN mm/vmscan.c~anobjrmap-2-mapping mm/vmscan.c --- 25/mm/vmscan.c~anobjrmap-2-mapping Thu Mar 20 17:46:03 2003 +++ 25-akpm/mm/vmscan.c Thu Mar 20 17:46:03 2003 @@ -176,20 +176,20 @@ static int shrink_slab(long scanned, un /* Must be called with page's pte_chain_lock held. */ static inline int page_mapping_inuse(struct page *page) { - struct address_space *mapping = page->mapping; + struct address_space *mapping; /* Page is in somebody's page tables. */ if (page_mapped(page)) return 1; - /* XXX: does this happen ? */ - if (!mapping) - return 0; - /* Be more reluctant to reclaim swapcache than pagecache */ if (PageSwapCache(page)) return 1; + mapping = page_mapping(page); + if (!mapping) + return 0; + /* File is mmap'd by somebody. */ if (!list_empty(&mapping->i_mmap)) return 1; @@ -261,22 +261,25 @@ shrink_list(struct list_head *page_list, goto activate_locked; } - mapping = page->mapping; + mapping = page_mapping(page); #ifdef CONFIG_SWAP /* - * Anonymous process memory without backing store. Try to - * allocate it some swap space here. + * Anonymous process memory has backing store? + * Try to allocate it some swap space here. * * XXX: implement swap clustering ? */ - if (page_mapped(page) && !mapping && !PagePrivate(page)) { + if (PageSwapCache(page)) + mapping = &swapper_space; + else if (PageAnon(page)) { pte_chain_unlock(page); if (!add_to_swap(page)) goto activate_locked; pte_chain_lock(page); - mapping = page->mapping; + mapping = &swapper_space; } +#endif /* CONFIG_SWAP */ /* * The page is mapped into the page tables of one or more @@ -294,7 +297,6 @@ shrink_list(struct list_head *page_list, ; /* try to free the page below */ } } -#endif /* CONFIG_SWAP */ pte_chain_unlock(page); /* @@ -335,7 +337,9 @@ shrink_list(struct list_head *page_list, .for_reclaim = 1, }; - list_move(&page->list, &mapping->locked_pages); + if (!PageSwapCache(page)) + list_move(&page->list, + &mapping->locked_pages); write_unlock(&mapping->page_lock); SetPageReclaim(page); @@ -399,7 +403,7 @@ shrink_list(struct list_head *page_list, #ifdef CONFIG_SWAP if (PageSwapCache(page)) { - swp_entry_t swap = { .val = page->index }; + swp_entry_t swap = { .val = page->private }; __delete_from_swap_cache(page); write_unlock(&mapping->page_lock); swap_free(swap); @@ -641,7 +645,7 @@ refill_inactive_zone(struct zone *zone, * FIXME: need to consider page_count(page) here if/when we * reap orphaned pages via the LRU (Daniel's locking stuff) */ - if (total_swap_pages == 0 && !page->mapping && + if (total_swap_pages == 0 && !page_mapping(page) && !PagePrivate(page)) { list_add(&page->lru, &l_active); continue; _