From: Patrick Mochel

I'm pleased to announce the release of the first patchset of power
management changes for 2.6.0. The purpose of this release is to give people
a chance to review and test the PM code before it's sent on to Linus.

These patches include a number of cleanups and fixes to the PM core code,
the driver core PM code, and swsusp.

I have verified that all suspend states (standby, suspend-to-ram, and
suspend-to-disk) work on a number of personal systems using ACPI as the
low-level power interface. However, this is with limited functionality
(from a VGA console with minimal processes running).

These patches should restore suspend functionality for those who were able
to successfully do it before -test3 and -test4. My apologies for the
inconvenience my previous changes caused.

These patches will probably not allow any more people to suspend/resume
than before. The net benefit of these, and the already committed ones, is
a cleaner power management subsystem and the development of the proper
framework for successfully suspending and resuming the entire system.

There are still several rough edges, though we seem to be making headway on
those relatively rapidly, and they are my sole focus at the moment. My main
concerns right now are:

- Platform devices, and more generally, devices that may belong to more
  than one class.

  It's mainly a driver model problem, though it has PM implications that
  appear to be holding a few people up.

- Drivers

  Drivers have always been the main impediment to having a working PM core,
  and it's been difficult to make a lot of progress. I have a number of
  devices that I will verify work properly, and I will be in contact with
  the maintainers if necessary. (Though, I seem to be having more problems
  with IRQ routing at the moment.)

- Getting it to work on more systems.

  Hopefully we will not run into any serious issues, though the PM code has
  traditionally been finicky.
  I have a wide array of test machines and willing testers, so this should
  move quickly.

- APM

  I unfortunately have not had a chance to look into the reported APM
  problems. But, I'm happy to say that I finally dug out an old laptop that
  has APM on it. I should gain traction soon.

I encourage willing people to download the patch, test, and report any
problems back to me and/or the list. I cannot guarantee definite or timely
results for systems where PM simply doesn't work. However, the more systems
we characterize, the easier this will become in the future. Please be
patient.

If you're using BitKeeper, you can pull the tree from:

	bk://kernel.bkbits.net:/home/mochel/linux-2.5-power

Or, a GNU patch is available at:

	http://developer.osdl.org/~mochel/patches/test4-pm1/test4-pm1.diff.bz2

There are split patches for each BK revision in that directory. The
changelogs are appended.


(03/08/30 1.1301)
   [acpi] Replace /proc/acpi/sleep

   - Bad to remove proc file now, even though it's nearly useless.
     Reinstated in the name of compatibility.
   - Restored original semantics - if software_suspend() is enabled, then
     just call that (and never go into low-power state). Otherwise, call
     acpi_suspend().
   - acpi_suspend() is simply a wrapper for pm_suspend(), passing down the
     right argument. This is so we don't have to do everything manually
     anymore.
   - Fixed long-standing bug by checking for "4b" in the string written to
     the file to determine if we want to enter S4bios.

(03/08/30 1.1300)
   [swsusp] Restore software_suspend() call.

   - Allows 'backdoor' interface to swsusp, as requested by Pavel.
   - Simply a wrapper to pm_suspend(), though guaranteeing that swsusp is
     used, and the system is shut down (and put into low-power state).
   - Call in sys_reboot() changed back to call to software_suspend().

(03/08/30 1.1299)
   [swsusp] Use BIO interface when reading from swap.

   - bios are the preferred method for doing this type of stuff in 2.6.
     __bread() uses bios in the end anyway.
   - bios make it really easy to implement write functionality, so we are
     able to reset the swap signature immediately after checking it during
     resume. So, if something happens while resuming, we will still have
     valid swap to use.
   - Thanks to Jens for some help in getting it working several months ago.

(03/08/29 1.1298)
   [swsusp] Minor cleanups in read_suspend_image()

   - Make resume_bdev global to file, so we don't have to pass it around (we
     always use the same one, so it shouldn't make a difference).
   - Allocate cur in read_suspend_image(), since it's the only function that
     uses it.
   - Check all errors and make sure we free cur if any happen.
   - Make sure to return errors from the functions called, not our own.
   - Free the pagedir if we hit an error after we allocate it.

(03/08/27 1.1297)
   [acpi] Move register save closer to call to enter sleep state.

   - By moving acpi_{save,restore}_state_mem() into acpi_pm_enter(), implying
     after interrupts have been disabled and nothing else is running on the
     system, S3 is able to resume properly.

(03/08/27 1.1296)
   [power] Make sure devices get added to the PM lists before bus_add_device().

   - Prevents ordering issues when drivers add more devices in ->probe().

(03/08/26 1.1295)
   [power] Separate suspend-to-disk from other suspend sequences.

   - Put in kernel/power/disk.c
   - Make compilation depend on CONFIG_SOFTWARE_SUSPEND (should probably be
     renamed to CONFIG_PM_STD or some such).

(03/08/25 1.1294)
   [power] Fix handling of pm_users.

   - Actually decrement on device_pm_release()
   - Call from device_pm_remove().

(03/08/25 1.1292)
   [power] Fix device suspend handling

   - Handle -EAGAIN in device_suspend() properly: keep going, with error
     reset to 0.
   - Call dpm_resume() if we got a real error, instead of device_resume(),
     which would deadlock.

(03/08/22 1.1276.19.8)
   [power] swsusp Cleanups

   - do_magic()
     - Rename to swsusp_arch_suspend().
     - Move declaration to swsusp.c

   - arch_prepare_suspend()
     - Return an int
     - Fix x86 version to return -EFAULT if cpu does not have pse, instead of
       calling panic().
     - Call from swsusp_save().

   - do_magic_suspend_1()
     - Move body to pm_suspend_disk()
     - Remove.

   - do_magic_suspend_2()
     - Rename to swsusp_suspend()
     - Move IRQ fiddling to suspend_save_image(), since that's the only call
       that needs it.
     - Return an int.

   - do_magic_resume_1()
     - Move body to pm_resume().
     - Remove

   - do_magic_resume_2()
     - Rename to swsusp_resume().
     - Return an int.

   - swsusp general
     - Remove unnecessary includes.
     - Remove suspend_pagedir_lock, since it was only used to disable IRQs.
     - Change swsusp_{suspend,resume} to return an int, so pm_suspend_disk()
       knows if anything failed.

(03/08/22 1.1276.19.7)
   [power] Move i386-specific swsusp code to arch/i386/power/

(03/08/22 1.1276.19.6)
   [power] Fix up sysfs state handling.

(03/08/22 1.1276.19.5)
   [power] Make sure console level is high when suspending.

(03/08/22 1.1276.20.1)
   [power] Fix sysfs state reporting.

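To make changelog 1.1292 above a little more concrete, here is a minimal
user-space sketch of the error handling it describes: -EAGAIN from a device
is treated as "skip it and keep going", while any other error aborts the
walk and resumes whatever was already suspended. This only models the
control flow. struct mock_dev, suspend_all(), resume_all() and the three
callbacks are hypothetical names made up for the illustration, and the
sketch does not model the PM lists or dpm_sem.

```c
#include <errno.h>
#include <stddef.h>

/* Hypothetical stand-in for a device; not the kernel's struct device. */
struct mock_dev {
	int (*suspend)(struct mock_dev *dev);	/* returns 0 or -errno */
	int suspended;				/* 1 once the device is off */
};

/* Undo step: mark every device as running again. */
static void resume_all(struct mock_dev *devs, size_t n)
{
	for (size_t i = 0; i < n; i++)
		devs[i].suspended = 0;
}

/*
 * Walk the devices from last to first (the patch takes entries from the
 * tail of dpm_active). -EAGAIN is not fatal: the device is skipped and
 * the walk continues. Any other error aborts and undoes earlier work.
 */
static int suspend_all(struct mock_dev *devs, size_t n)
{
	for (size_t i = n; i-- > 0; ) {
		int error = devs[i].suspend(&devs[i]);

		if (error == -EAGAIN)
			continue;		/* not a real failure */
		if (error) {
			resume_all(devs, n);	/* real error: back out */
			return error;
		}
		devs[i].suspended = 1;
	}
	return 0;
}

/* Three hypothetical driver callbacks used to exercise the walk. */
static int ok_suspend(struct mock_dev *dev) { (void)dev; return 0; }
static int again_suspend(struct mock_dev *dev) { (void)dev; return -EAGAIN; }
static int broken_suspend(struct mock_dev *dev) { (void)dev; return -EIO; }
```

Calling suspend_all() on a set that contains an again_suspend device still
succeeds, with that one device left running. A set that contains a
broken_suspend device returns its error with every device back in the
running state.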
 /dev/null                    |  235 --------------------------
 arch/i386/Makefile           |    1
 arch/i386/kernel/Makefile    |    2
 arch/i386/power/Makefile     |    2
 arch/i386/power/cpu.c        |  141 +++++++++++++++
 arch/i386/power/swsusp.S     |   92 ++++++++++
 drivers/acpi/sleep/main.c    |   53 +++--
 drivers/acpi/sleep/proc.c    |   73 ++++++++
 drivers/acpi/sleep/sleep.h   |    3
 drivers/base/core.c          |   33 +--
 drivers/base/power/main.c    |   13 -
 drivers/base/power/power.h   |    3
 drivers/base/power/resume.c  |   21 +-
 drivers/base/power/suspend.c |   10 -
 include/asm-i386/suspend.h   |    7
 include/linux/suspend.h      |    1
 kernel/power/Makefile        |    2
 kernel/power/console.c       |    2
 kernel/power/disk.c          |  337 +++++++++++++++++++++++++++++++++++++
 kernel/power/main.c          |  384 ++++++-------------------------------------
 kernel/power/power.h         |   39 +---
 kernel/power/swsusp.c        |  341 ++++++++++++++++++++++----------------
 kernel/sys.c                 |    2
 23 files changed, 1006 insertions(+), 791 deletions(-)

diff -puN arch/i386/kernel/Makefile~test4-pm1 arch/i386/kernel/Makefile
--- 25/arch/i386/kernel/Makefile~test4-pm1	2003-09-05 00:50:16.000000000 -0700
+++ 25-akpm/arch/i386/kernel/Makefile	2003-09-05 00:50:16.000000000 -0700
@@ -18,9 +18,7 @@ obj-$(CONFIG_KGDB)		+= kgdb_stub.o
 obj-$(CONFIG_X86_MSR)		+= msr.o
 obj-$(CONFIG_X86_CPUID)		+= cpuid.o
 obj-$(CONFIG_MICROCODE)		+= microcode.o
-obj-$(CONFIG_PM)		+= suspend.o
 obj-$(CONFIG_APM)		+= apm.o
-obj-$(CONFIG_SOFTWARE_SUSPEND)	+= suspend_asm.o
 obj-$(CONFIG_X86_SMP)		+= smp.o smpboot.o
 obj-$(CONFIG_X86_TRAMPOLINE)	+= trampoline.o
 obj-$(CONFIG_X86_MPPARSE)	+= mpparse.o
diff -puN -L arch/i386/kernel/suspend_asm.S arch/i386/kernel/suspend_asm.S~test4-pm1 /dev/null
--- 25/arch/i386/kernel/suspend_asm.S
+++ /dev/null	2002-08-30 16:31:37.000000000 -0700
@@ -1,94 +0,0 @@
-.text
-
-/* Originally gcc generated, modified by hand */
-
-#include
-#include
-#include
-
-	.text
-
-ENTRY(do_magic)
-	pushl %ebx
-	cmpl $0,8(%esp)
-	jne .L1450
-	call do_magic_suspend_1
-	call save_processor_state
-
-	movl %esp, saved_context_esp
-	movl %eax, saved_context_eax
-	movl %ebx, saved_context_ebx
-	movl %ecx, saved_context_ecx
-	movl %edx, saved_context_edx
-	movl %ebp, saved_context_ebp
-	movl %esi, saved_context_esi
-	movl %edi, saved_context_edi
-	pushfl ; popl saved_context_eflags
-
-	call do_magic_suspend_2
-	jmp .L1449
-	.p2align 4,,7
-.L1450:
-	movl $swapper_pg_dir-__PAGE_OFFSET,%ecx
-	movl %ecx,%cr3
-
-	call do_magic_resume_1
-	movl $0,loop
-	cmpl $0,nr_copy_pages
-	je .L1453
-	.p2align 4,,7
-.L1455:
-	movl $0,loop2
-	.p2align 4,,7
-.L1459:
-	movl pagedir_nosave,%ecx
-	movl loop,%eax
-	movl loop2,%edx
-	sall $4,%eax
-	movl 4(%ecx,%eax),%ebx
-	movl (%ecx,%eax),%eax
-	movb (%edx,%eax),%al
-	movb %al,(%edx,%ebx)
-	movl %cr3, %eax;
-	movl %eax, %cr3; # flush TLB
-
-	movl loop2,%eax
-	leal 1(%eax),%edx
-	movl %edx,loop2
-	movl %edx,%eax
-	cmpl $4095,%eax
-	jbe .L1459
-	movl loop,%eax
-	leal 1(%eax),%edx
-	movl %edx,loop
-	movl %edx,%eax
-	cmpl nr_copy_pages,%eax
-	jb .L1455
-	.p2align 4,,7
-.L1453:
-	movl $__USER_DS,%eax
-
-	movw %ax, %ds
-	movw %ax, %es
-	movl saved_context_esp, %esp
-	movl saved_context_ebp, %ebp
-	movl saved_context_eax, %eax
-	movl saved_context_ebx, %ebx
-	movl saved_context_ecx, %ecx
-	movl saved_context_edx, %edx
-	movl saved_context_esi, %esi
-	movl saved_context_edi, %edi
-	call restore_processor_state
-	pushl saved_context_eflags ; popfl
-	call do_magic_resume_2
-.L1449:
-	popl %ebx
-	ret
-
-	.section .data.nosave
-loop:
-	.quad 0
-loop2:
-	.quad 0
-	.previous
-
\ No newline at end of file
diff -puN -L arch/i386/kernel/suspend.c arch/i386/kernel/suspend.c~test4-pm1 /dev/null
--- 25/arch/i386/kernel/suspend.c
+++ /dev/null	2002-08-30 16:31:37.000000000 -0700
@@ -1,141 +0,0 @@
-/*
- * Suspend support specific for i386.
- *
- * Distribute under GPLv2
- *
- * Copyright (c) 2002 Pavel Machek
- * Copyright (c) 2001 Patrick Mochel
- */
-
-#include
-#include
-#include
-#include
-#include
-#include
-#include
-#include
-#include
-#include
-#include
-#include
-#include
-#include
-#include
-#include
-#include
-#include
-
-static struct saved_context saved_context;
-static void fix_processor_context(void);
-
-unsigned long saved_context_eax, saved_context_ebx;
-unsigned long saved_context_ecx, saved_context_edx;
-unsigned long saved_context_esp, saved_context_ebp;
-unsigned long saved_context_esi, saved_context_edi;
-unsigned long saved_context_eflags;
-
-extern void enable_sep_cpu(void *);
-
-void save_processor_state(void)
-{
-	kernel_fpu_begin();
-
-	/*
-	 * descriptor tables
-	 */
-	asm volatile ("sgdt %0" : "=m" (saved_context.gdt_limit));
-	asm volatile ("sidt %0" : "=m" (saved_context.idt_limit));
-	asm volatile ("sldt %0" : "=m" (saved_context.ldt));
-	asm volatile ("str %0" : "=m" (saved_context.tr));
-
-	/*
-	 * segment registers
-	 */
-	asm volatile ("movw %%es, %0" : "=m" (saved_context.es));
-	asm volatile ("movw %%fs, %0" : "=m" (saved_context.fs));
-	asm volatile ("movw %%gs, %0" : "=m" (saved_context.gs));
-	asm volatile ("movw %%ss, %0" : "=m" (saved_context.ss));
-
-	/*
-	 * control registers
-	 */
-	asm volatile ("movl %%cr0, %0" : "=r" (saved_context.cr0));
-	asm volatile ("movl %%cr2, %0" : "=r" (saved_context.cr2));
-	asm volatile ("movl %%cr3, %0" : "=r" (saved_context.cr3));
-	asm volatile ("movl %%cr4, %0" : "=r" (saved_context.cr4));
-}
-
-static void
-do_fpu_end(void)
-{
-	/* restore FPU regs if necessary */
-	/* Do it out of line so that gcc does not move cr0 load to some stupid place */
-	kernel_fpu_end();
-}
-
-void restore_processor_state(void)
-{
-
-	/*
-	 * control registers
-	 */
-	asm volatile ("movl %0, %%cr4" :: "r" (saved_context.cr4));
-	asm volatile ("movl %0, %%cr3" :: "r" (saved_context.cr3));
-	asm volatile ("movl %0, %%cr2" :: "r" (saved_context.cr2));
-	asm volatile ("movl %0, %%cr0" :: "r" (saved_context.cr0));
-
-	/*
-	 * segment registers
-	 */
-	asm volatile ("movw %0, %%es" :: "r" (saved_context.es));
-	asm volatile ("movw %0, %%fs" :: "r" (saved_context.fs));
-	asm volatile ("movw %0, %%gs" :: "r" (saved_context.gs));
-	asm volatile ("movw %0, %%ss" :: "r" (saved_context.ss));
-
-	/*
-	 * now restore the descriptor tables to their proper values
-	 * ltr is done i fix_processor_context().
-	 */
-	asm volatile ("lgdt %0" :: "m" (saved_context.gdt_limit));
-	asm volatile ("lidt %0" :: "m" (saved_context.idt_limit));
-	asm volatile ("lldt %0" :: "m" (saved_context.ldt));
-
-	/*
-	 * sysenter MSRs
-	 */
-	if (boot_cpu_has(X86_FEATURE_SEP))
-		enable_sep_cpu(NULL);
-
-	fix_processor_context();
-	do_fpu_end();
-}
-
-static void fix_processor_context(void)
-{
-	int cpu = smp_processor_id();
-	struct tss_struct * t = init_tss + cpu;
-
-	set_tss_desc(cpu,t);	/* This just modifies memory; should not be necessary. But... This is necessary, because 386 hardware has concept of busy TSS or some similar stupidity. */
-	cpu_gdt_table[cpu][GDT_ENTRY_TSS].b &= 0xfffffdff;
-
-	load_TR_desc();				/* This does ltr */
-	load_LDT(&current->active_mm->context);	/* This does lldt */
-
-	/*
-	 * Now maybe reload the debug registers
-	 */
-	if (current->thread.debugreg[7]){
-		loaddebug(&current->thread, 0);
-		loaddebug(&current->thread, 1);
-		loaddebug(&current->thread, 2);
-		loaddebug(&current->thread, 3);
-		/* no 4 and 5 */
-		loaddebug(&current->thread, 6);
-		loaddebug(&current->thread, 7);
-	}
-
-}
-
-EXPORT_SYMBOL(save_processor_state);
-EXPORT_SYMBOL(restore_processor_state);
diff -puN arch/i386/Makefile~test4-pm1 arch/i386/Makefile
--- 25/arch/i386/Makefile~test4-pm1	2003-09-05 00:50:16.000000000 -0700
+++ 25-akpm/arch/i386/Makefile	2003-09-05 00:50:16.000000000 -0700
@@ -103,6 +103,7 @@ drivers-$(CONFIG_MATH_EMULATION)	+= arch
 drivers-$(CONFIG_PCI)			+= arch/i386/pci/
 # must be linked after kernel/
 drivers-$(CONFIG_OPROFILE)		+= arch/i386/oprofile/
+drivers-$(CONFIG_PM)			+= arch/i386/power/
 
 CFLAGS += $(mflags-y)
 AFLAGS += $(mflags-y)
diff -puN /dev/null arch/i386/power/cpu.c
--- /dev/null	2002-08-30 16:31:37.000000000 -0700
+++ 25-akpm/arch/i386/power/cpu.c	2003-09-05 00:50:16.000000000 -0700
@@ -0,0 +1,141 @@
+/*
+ * Suspend support specific for i386.
+ *
+ * Distribute under GPLv2
+ *
+ * Copyright (c) 2002 Pavel Machek
+ * Copyright (c) 2001 Patrick Mochel
+ */
+
+#include
+#include
+#include
+#include
+#include
+#include
+#include
+#include
+#include
+#include
+#include
+#include
+#include
+#include
+#include
+#include
+#include
+#include
+
+static struct saved_context saved_context;
+static void fix_processor_context(void);
+
+unsigned long saved_context_eax, saved_context_ebx;
+unsigned long saved_context_ecx, saved_context_edx;
+unsigned long saved_context_esp, saved_context_ebp;
+unsigned long saved_context_esi, saved_context_edi;
+unsigned long saved_context_eflags;
+
+extern void enable_sep_cpu(void *);
+
+void save_processor_state(void)
+{
+	kernel_fpu_begin();
+
+	/*
+	 * descriptor tables
+	 */
+	asm volatile ("sgdt %0" : "=m" (saved_context.gdt_limit));
+	asm volatile ("sidt %0" : "=m" (saved_context.idt_limit));
+	asm volatile ("sldt %0" : "=m" (saved_context.ldt));
+	asm volatile ("str %0" : "=m" (saved_context.tr));
+
+	/*
+	 * segment registers
+	 */
+	asm volatile ("movw %%es, %0" : "=m" (saved_context.es));
+	asm volatile ("movw %%fs, %0" : "=m" (saved_context.fs));
+	asm volatile ("movw %%gs, %0" : "=m" (saved_context.gs));
+	asm volatile ("movw %%ss, %0" : "=m" (saved_context.ss));
+
+	/*
+	 * control registers
+	 */
+	asm volatile ("movl %%cr0, %0" : "=r" (saved_context.cr0));
+	asm volatile ("movl %%cr2, %0" : "=r" (saved_context.cr2));
+	asm volatile ("movl %%cr3, %0" : "=r" (saved_context.cr3));
+	asm volatile ("movl %%cr4, %0" : "=r" (saved_context.cr4));
+}
+
+static void
+do_fpu_end(void)
+{
+	/* restore FPU regs if necessary */
+	/* Do it out of line so that gcc does not move cr0 load to some stupid place */
+	kernel_fpu_end();
+}
+
+void restore_processor_state(void)
+{
+
+	/*
+	 * control registers
+	 */
+	asm volatile ("movl %0, %%cr4" :: "r" (saved_context.cr4));
+	asm volatile ("movl %0, %%cr3" :: "r" (saved_context.cr3));
+	asm volatile ("movl %0, %%cr2" :: "r" (saved_context.cr2));
+	asm volatile ("movl %0, %%cr0" :: "r" (saved_context.cr0));
+
+	/*
+	 * segment registers
+	 */
+	asm volatile ("movw %0, %%es" :: "r" (saved_context.es));
+	asm volatile ("movw %0, %%fs" :: "r" (saved_context.fs));
+	asm volatile ("movw %0, %%gs" :: "r" (saved_context.gs));
+	asm volatile ("movw %0, %%ss" :: "r" (saved_context.ss));
+
+	/*
+	 * now restore the descriptor tables to their proper values
+	 * ltr is done i fix_processor_context().
+	 */
+	asm volatile ("lgdt %0" :: "m" (saved_context.gdt_limit));
+	asm volatile ("lidt %0" :: "m" (saved_context.idt_limit));
+	asm volatile ("lldt %0" :: "m" (saved_context.ldt));
+
+	/*
+	 * sysenter MSRs
+	 */
+	if (boot_cpu_has(X86_FEATURE_SEP))
+		enable_sep_cpu(NULL);
+
+	fix_processor_context();
+	do_fpu_end();
+}
+
+static void fix_processor_context(void)
+{
+	int cpu = smp_processor_id();
+	struct tss_struct * t = init_tss + cpu;
+
+	set_tss_desc(cpu,t);	/* This just modifies memory; should not be necessary. But... This is necessary, because 386 hardware has concept of busy TSS or some similar stupidity. */
+	cpu_gdt_table[cpu][GDT_ENTRY_TSS].b &= 0xfffffdff;
+
+	load_TR_desc();				/* This does ltr */
+	load_LDT(&current->active_mm->context);	/* This does lldt */
+
+	/*
+	 * Now maybe reload the debug registers
+	 */
+	if (current->thread.debugreg[7]){
+		loaddebug(&current->thread, 0);
+		loaddebug(&current->thread, 1);
+		loaddebug(&current->thread, 2);
+		loaddebug(&current->thread, 3);
+		/* no 4 and 5 */
+		loaddebug(&current->thread, 6);
+		loaddebug(&current->thread, 7);
+	}
+
+}
+
+EXPORT_SYMBOL(save_processor_state);
+EXPORT_SYMBOL(restore_processor_state);
diff -puN /dev/null arch/i386/power/Makefile
--- /dev/null	2002-08-30 16:31:37.000000000 -0700
+++ 25-akpm/arch/i386/power/Makefile	2003-09-05 00:50:16.000000000 -0700
@@ -0,0 +1,2 @@
+obj-$(CONFIG_PM)		+= cpu.o
+obj-$(CONFIG_SOFTWARE_SUSPEND)	+= swsusp.o
diff -puN /dev/null arch/i386/power/swsusp.S
--- /dev/null	2002-08-30 16:31:37.000000000 -0700
+++ 25-akpm/arch/i386/power/swsusp.S	2003-09-05 00:50:16.000000000 -0700
@@ -0,0 +1,92 @@
+.text
+
+/* Originally gcc generated, modified by hand */
+
+#include
+#include
+#include
+
+	.text
+
+ENTRY(swsusp_arch_suspend)
+	pushl %ebx
+	cmpl $0,8(%esp)
+	jne .L1450
+	call save_processor_state
+
+	movl %esp, saved_context_esp
+	movl %eax, saved_context_eax
+	movl %ebx, saved_context_ebx
+	movl %ecx, saved_context_ecx
+	movl %edx, saved_context_edx
+	movl %ebp, saved_context_ebp
+	movl %esi, saved_context_esi
+	movl %edi, saved_context_edi
+	pushfl ; popl saved_context_eflags
+
+	call swsusp_suspend
+	jmp .L1449
+	.p2align 4,,7
+.L1450:
+	movl $swapper_pg_dir-__PAGE_OFFSET,%ecx
+	movl %ecx,%cr3
+
+	movl $0,loop
+	cmpl $0,nr_copy_pages
+	je .L1453
+	.p2align 4,,7
+.L1455:
+	movl $0,loop2
+	.p2align 4,,7
+.L1459:
+	movl pagedir_nosave,%ecx
+	movl loop,%eax
+	movl loop2,%edx
+	sall $4,%eax
+	movl 4(%ecx,%eax),%ebx
+	movl (%ecx,%eax),%eax
+	movb (%edx,%eax),%al
+	movb %al,(%edx,%ebx)
+	movl %cr3, %eax;
+	movl %eax, %cr3; # flush TLB
+
+	movl loop2,%eax
+	leal 1(%eax),%edx
+	movl %edx,loop2
+	movl %edx,%eax
+	cmpl $4095,%eax
+	jbe .L1459
+	movl loop,%eax
+	leal 1(%eax),%edx
+	movl %edx,loop
+	movl %edx,%eax
+	cmpl nr_copy_pages,%eax
+	jb .L1455
+	.p2align 4,,7
+.L1453:
+	movl $__USER_DS,%eax
+
+	movw %ax, %ds
+	movw %ax, %es
+	movl saved_context_esp, %esp
+	movl saved_context_ebp, %ebp
+	movl saved_context_eax, %eax
+	movl saved_context_ebx, %ebx
+	movl saved_context_ecx, %ecx
+	movl saved_context_edx, %edx
+	movl saved_context_esi, %esi
+	movl saved_context_edi, %edi
+	call restore_processor_state
+	pushl saved_context_eflags ; popfl
+	call swsusp_resume
+.L1449:
+	popl %ebx
+	ret
+
+	.section .data.nosave
+loop:
+	.quad 0
+loop2:
+	.quad 0
+	.previous
+
diff -puN drivers/acpi/sleep/main.c~test4-pm1 drivers/acpi/sleep/main.c
--- 25/drivers/acpi/sleep/main.c~test4-pm1	2003-09-05 00:50:16.000000000 -0700
+++ 25-akpm/drivers/acpi/sleep/main.c	2003-09-05 00:50:16.000000000 -0700
@@ -41,7 +41,6 @@ static u32 acpi_suspend_states[] = {
 
 static int acpi_pm_prepare(u32 state)
 {
-	int error = 0;
 	u32 acpi_state = acpi_suspend_states[state];
 
 	if (!sleep_states[acpi_state])
@@ -56,21 +55,9 @@ static int acpi_pm_prepare(u32 state)
 		acpi_set_firmware_waking_vector(
 			(acpi_physical_address) acpi_wakeup_address);
 	}
-
 	ACPI_FLUSH_CPU_CACHE();
-
-	/* Do arch specific saving of state. */
-	if (state > PM_SUSPEND_STANDBY) {
-		if ((error = acpi_save_state_mem()))
-			goto Err;
-	}
-
 	acpi_enter_sleep_state_prep(acpi_state);
-
 	return 0;
- Err:
-	acpi_set_firmware_waking_vector(0);
-	return error;
 }
 
 
@@ -90,6 +77,15 @@ static int acpi_pm_enter(u32 state)
 	u32 acpi_state = acpi_suspend_states[state];
 
 	ACPI_FLUSH_CPU_CACHE();
+
+	/* Do arch specific saving of state. */
+	if (state > PM_SUSPEND_STANDBY) {
+		int error = acpi_save_state_mem();
+		if (error)
+			return error;
+	}
+
+
 	local_irq_save(flags);
 	switch (state)
 	{
@@ -114,6 +110,15 @@ static int acpi_pm_enter(u32 state)
 	local_irq_restore(flags);
 	printk(KERN_DEBUG "Back to C!\n");
 
+	/* restore processor state
+	 * We should only be here if we're coming back from STR or STD.
+	 * And, in the case of the latter, the memory image should have already
+	 * been loaded from disk.
+	 */
+	if (state > PM_SUSPEND_STANDBY)
+		acpi_restore_state_mem();
+
+
 	return ACPI_SUCCESS(status) ? 0 : -EFAULT;
 }
 
@@ -130,14 +135,6 @@ static int acpi_pm_finish(u32 state)
 {
 	acpi_leave_sleep_state(state);
 
-	/* restore processor state
-	 * We should only be here if we're coming back from STR or STD.
-	 * And, in the case of the latter, the memory image should have already
-	 * been loaded from disk.
-	 */
-	if (state > ACPI_STATE_S1)
-		acpi_restore_state_mem();
-
 	/* reset firmware waking vector */
 	acpi_set_firmware_waking_vector((acpi_physical_address) 0);
 
@@ -149,6 +146,20 @@ static int acpi_pm_finish(u32 state)
 }
 
 
+int acpi_suspend(u32 acpi_state)
+{
+	u32 states[] = {
+		[1] = PM_SUSPEND_STANDBY,
+		[3] = PM_SUSPEND_MEM,
+		[4] = PM_SUSPEND_DISK,
+	};
+
+	if (acpi_state <= 4 && states[acpi_state])
+		return pm_suspend(states[acpi_state]);
+	return -EINVAL;
+}
+
+
 static struct pm_ops acpi_pm_ops = {
 	.prepare	= acpi_pm_prepare,
 	.enter		= acpi_pm_enter,
diff -puN drivers/acpi/sleep/proc.c~test4-pm1 drivers/acpi/sleep/proc.c
--- 25/drivers/acpi/sleep/proc.c~test4-pm1	2003-09-05 00:50:16.000000000 -0700
+++ 25-akpm/drivers/acpi/sleep/proc.c	2003-09-05 00:50:16.000000000 -0700
@@ -13,12 +13,71 @@
 
 #include "sleep.h"
 
+#define ACPI_SYSTEM_FILE_SLEEP		"sleep"
 #define ACPI_SYSTEM_FILE_ALARM		"alarm"
 
 #define _COMPONENT		ACPI_SYSTEM_COMPONENT
 ACPI_MODULE_NAME		("sleep")
 
 
+static int acpi_system_sleep_seq_show(struct seq_file *seq, void *offset)
+{
+	int			i;
+
+	ACPI_FUNCTION_TRACE("acpi_system_sleep_seq_show");
+
+	for (i = 0; i <= ACPI_STATE_S5; i++) {
+		if (sleep_states[i]) {
+			seq_printf(seq,"S%d ", i);
+			if (i == ACPI_STATE_S4 && acpi_gbl_FACS->S4bios_f)
+				seq_printf(seq, "S4bios ");
+		}
+	}
+
+	seq_puts(seq, "\n");
+
+	return 0;
+}
+
+static int acpi_system_sleep_open_fs(struct inode *inode, struct file *file)
+{
+	return single_open(file, acpi_system_sleep_seq_show, PDE(inode)->data);
+}
+
+static int
+acpi_system_write_sleep (
+	struct file		*file,
+	const char		*buffer,
+	size_t			count,
+	loff_t			*ppos)
+{
+	char			str[12];
+	u32			state = 0;
+	int			error = 0;
+
+	if (count > sizeof(str) - 1)
+		goto Done;
+	memset(str,0,sizeof(str));
+	if (copy_from_user(str, buffer, count))
+		return -EFAULT;
+
+	/* Check for S4 bios request */
+	if (!strcmp(str,"4b")) {
+		error = acpi_suspend(4);
+		goto Done;
+	}
+	state = simple_strtoul(str, NULL, 0);
+#ifdef CONFIG_SOFTWARE_SUSPEND
+	if (state == 4) {
+		error = software_suspend();
+		goto Done;
+	}
+#endif
+	error = acpi_suspend(state);
+ Done:
+	return error ? error : count;
+}
+
 static int acpi_system_alarm_seq_show(struct seq_file *seq, void *offset)
 {
 	u32			sec, min, hr;
@@ -294,6 +353,14 @@ end:
 }
 
 
+static struct file_operations acpi_system_sleep_fops = {
+	.open		= acpi_system_sleep_open_fs,
+	.read		= seq_read,
+	.write		= acpi_system_write_sleep,
+	.llseek		= seq_lseek,
+	.release	= single_release,
+};
+
 static struct file_operations acpi_system_alarm_fops = {
 	.open		= acpi_system_alarm_open_fs,
 	.read		= seq_read,
@@ -307,6 +374,12 @@ static int acpi_sleep_proc_init(void)
 {
 	struct proc_dir_entry	*entry = NULL;
 
+	/* 'sleep' [R/W]*/
+	entry = create_proc_entry(ACPI_SYSTEM_FILE_SLEEP,
+				  S_IFREG|S_IRUGO|S_IWUSR, acpi_root_dir);
+	if (entry)
+		entry->proc_fops = &acpi_system_sleep_fops;
+
 	/* 'alarm' [R/W] */
 	entry = create_proc_entry(ACPI_SYSTEM_FILE_ALARM,
 		S_IFREG|S_IRUGO|S_IWUSR, acpi_root_dir);
diff -puN drivers/acpi/sleep/sleep.h~test4-pm1 drivers/acpi/sleep/sleep.h
--- 25/drivers/acpi/sleep/sleep.h~test4-pm1	2003-09-05 00:50:16.000000000 -0700
+++ 25-akpm/drivers/acpi/sleep/sleep.h	2003-09-05 00:50:16.000000000 -0700
@@ -1,5 +1,4 @@
 
 extern u8 sleep_states[];
-
-extern acpi_status acpi_suspend (u32 state);
+extern int acpi_suspend (u32 state);
 
diff -puN drivers/base/core.c~test4-pm1 drivers/base/core.c
--- 25/drivers/base/core.c~test4-pm1	2003-09-05 00:50:16.000000000 -0700
+++ 25-akpm/drivers/base/core.c	2003-09-05 00:50:16.000000000 -0700
@@ -225,28 +225,30 @@ int device_add(struct device *dev)
 		dev->kobj.parent = &parent->kobj;
 
 	if ((error = kobject_add(&dev->kobj)))
-		goto register_done;
-
-	/* now take care of our own registration */
-
+		goto Error;
+	if ((error = device_pm_add(dev)))
+		goto PMError;
+	if ((error = bus_add_device(dev)))
+		goto BusError;
 	down_write(&devices_subsys.rwsem);
 	if (parent)
 		list_add_tail(&dev->node,&parent->children);
 	up_write(&devices_subsys.rwsem);
 
-	bus_add_device(dev);
-
-	device_pm_add(dev);
-
 	/* notify platform of device entry */
 	if (platform_notify)
 		platform_notify(dev);
-
- register_done:
-	if (error && parent)
-		put_device(parent);
+ Done:
 	put_device(dev);
 	return error;
+ BusError:
+	device_pm_remove(dev);
+ PMError:
+	kobject_unregister(&dev->kobj);
+ Error:
+	if (parent)
+		put_device(parent);
+	goto Done;
 }
 
 
@@ -312,8 +314,6 @@ void device_del(struct device * dev)
 {
 	struct device * parent = dev->parent;
 
-	device_pm_remove(dev);
-
 	down_write(&devices_subsys.rwsem);
 	if (parent)
 		list_del_init(&dev->node);
@@ -324,14 +324,11 @@ void device_del(struct device * dev)
 	 */
 	if (platform_notify_remove)
 		platform_notify_remove(dev);
-
 	bus_remove_device(dev);
-
+	device_pm_remove(dev);
 	kobject_del(&dev->kobj);
-
 	if (parent)
 		put_device(parent);
-
 }
 
 /**
diff -puN drivers/base/power/main.c~test4-pm1 drivers/base/power/main.c
--- 25/drivers/base/power/main.c~test4-pm1	2003-09-05 00:50:16.000000000 -0700
+++ 25-akpm/drivers/base/power/main.c	2003-09-05 00:50:16.000000000 -0700
@@ -36,12 +36,14 @@ DECLARE_MUTEX(dpm_sem);
 
 static inline void device_pm_hold(struct device * dev)
 {
-	atomic_inc(&dev->power.pm_users);
+	if (dev)
+		atomic_inc(&dev->power.pm_users);
 }
 
 static inline void device_pm_release(struct device * dev)
 {
-	atomic_inc(&dev->power.pm_users);
+	if (dev)
+		atomic_dec(&dev->power.pm_users);
 }
 
 
@@ -61,11 +63,9 @@ static inline void device_pm_release(str
 void device_pm_set_parent(struct device * dev, struct device * parent)
 {
 	struct device * old_parent = dev->power.pm_parent;
-	if (old_parent)
-		device_pm_release(old_parent);
+	device_pm_release(old_parent);
 	dev->power.pm_parent = parent;
-	if (parent)
-		device_pm_hold(parent);
+	device_pm_hold(parent);
 }
 EXPORT_SYMBOL(device_pm_set_parent);
 
@@ -91,6 +91,7 @@ void device_pm_remove(struct device * de
 		 dev->bus ? dev->bus->name : "No Bus", dev->kobj.name);
 	down(&dpm_sem);
 	dpm_sysfs_remove(dev);
+	device_pm_release(dev->power.pm_parent);
 	list_del(&dev->power.entry);
 	up(&dpm_sem);
 }
diff -puN drivers/base/power/power.h~test4-pm1 drivers/base/power/power.h
--- 25/drivers/base/power/power.h~test4-pm1	2003-09-05 00:50:16.000000000 -0700
+++ 25-akpm/drivers/base/power/power.h	2003-09-05 00:50:16.000000000 -0700
@@ -58,7 +58,8 @@ extern void dpm_sysfs_remove(struct devi
 /*
  * resume.c
  */
-extern int dpm_resume(void);
+
+extern void dpm_resume(void);
 extern void dpm_power_up(void);
 extern int resume_device(struct device *);
 
diff -puN drivers/base/power/resume.c~test4-pm1 drivers/base/power/resume.c
--- 25/drivers/base/power/resume.c~test4-pm1	2003-09-05 00:50:16.000000000 -0700
+++ 25-akpm/drivers/base/power/resume.c	2003-09-05 00:50:16.000000000 -0700
@@ -28,6 +28,19 @@ int resume_device(struct device * dev)
 }
 
+
+void dpm_resume(void)
+{
+	while(!list_empty(&dpm_off)) {
+		struct list_head * entry = dpm_off.next;
+		struct device * dev = to_device(entry);
+		list_del_init(entry);
+		resume_device(dev);
+		list_add_tail(entry,&dpm_active);
+	}
+}
+
+
 
 /**
  * device_resume - Restore state of each device in system.
  *
@@ -38,13 +51,7 @@ int resume_device(struct device * dev)
 void device_resume(void)
 {
 	down(&dpm_sem);
-	while(!list_empty(&dpm_off)) {
-		struct list_head * entry = dpm_off.next;
-		struct device * dev = to_device(entry);
-		list_del_init(entry);
-		resume_device(dev);
-		list_add_tail(entry,&dpm_active);
-	}
+	dpm_resume();
 	up(&dpm_sem);
 }
 
diff -puN drivers/base/power/suspend.c~test4-pm1 drivers/base/power/suspend.c
--- 25/drivers/base/power/suspend.c~test4-pm1	2003-09-05 00:50:16.000000000 -0700
+++ 25-akpm/drivers/base/power/suspend.c	2003-09-05 00:50:16.000000000 -0700
@@ -81,14 +81,18 @@ int device_suspend(u32 state)
 	while(!list_empty(&dpm_active)) {
 		struct list_head * entry = dpm_active.prev;
 		struct device * dev = to_device(entry);
-		if ((error = suspend_device(dev,state)))
-			goto Error;
+		if ((error = suspend_device(dev,state))) {
+			if (error != -EAGAIN)
+				goto Error;
+			else
+				error = 0;
+		}
 	}
  Done:
 	up(&dpm_sem);
 	return error;
  Error:
-	device_resume();
+	dpm_resume();
 	goto Done;
 }
 
diff -puN include/asm-i386/suspend.h~test4-pm1 include/asm-i386/suspend.h
--- 25/include/asm-i386/suspend.h~test4-pm1	2003-09-05 00:50:16.000000000 -0700
+++ 25-akpm/include/asm-i386/suspend.h	2003-09-05 00:50:16.000000000 -0700
@@ -6,11 +6,12 @@
 #include
 #include
 
-static inline void
+static inline int
 arch_prepare_suspend(void)
 {
 	if (!cpu_has_pse)
-		panic("pse required");
+		return -EPERM;
+	return 0;
 }
 
 /* image of the saved processor state */
@@ -38,8 +39,6 @@ struct saved_context {
 extern void save_processor_state(void);
 extern void restore_processor_state(void);
 
-extern int do_magic(int resume);
-
 #ifdef CONFIG_ACPI_SLEEP
 extern unsigned long saved_eip;
 extern unsigned long saved_esp;
diff -puN include/linux/suspend.h~test4-pm1 include/linux/suspend.h
--- 25/include/linux/suspend.h~test4-pm1	2003-09-05 00:50:16.000000000 -0700
+++ 25-akpm/include/linux/suspend.h	2003-09-05 00:50:16.000000000 -0700
@@ -53,6 +53,7 @@ extern suspend_pagedir_t *pagedir_nosave
 
 extern void do_suspend_lowlevel(int resume);
 extern void do_suspend_lowlevel_s4bios(int resume);
+extern int software_suspend(void);
 #else	/* CONFIG_SOFTWARE_SUSPEND */
 static inline int software_suspend(void)
 {
diff -puN kernel/power/console.c~test4-pm1 kernel/power/console.c
--- 25/kernel/power/console.c~test4-pm1	2003-09-05 00:50:16.000000000 -0700
+++ 25-akpm/kernel/power/console.c	2003-09-05 00:50:16.000000000 -0700
@@ -8,7 +8,7 @@
 #include
 #include "power.h"
 
-static int new_loglevel = 7;
+static int new_loglevel = 10;
 static int orig_loglevel;
 static int orig_fgconsole, orig_kmsg;
 
diff -puN /dev/null kernel/power/disk.c
--- /dev/null	2002-08-30 16:31:37.000000000 -0700
+++ 25-akpm/kernel/power/disk.c	2003-09-05 00:50:16.000000000 -0700
@@ -0,0 +1,337 @@
+/*
+ * kernel/power/disk.c - Suspend-to-disk support.
+ *
+ * Copyright (c) 2003 Patrick Mochel
+ * Copyright (c) 2003 Open Source Development Lab
+ *
+ * This file is release under the GPLv2
+ *
+ */
+
+#define DEBUG
+
+
+#include
+#include
+#include
+#include
+#include
+#include "power.h"
+
+
+extern u32 pm_disk_mode;
+extern struct pm_ops * pm_ops;
+
+extern int swsusp_save(void);
+extern int swsusp_write(void);
+extern int swsusp_read(void);
+extern int swsusp_restore(void);
+extern int swsusp_free(void);
+
+extern long sys_sync(void);
+
+
+/**
+ *	power_down - Shut machine down for hibernate.
+ *	@mode:		Suspend-to-disk mode
+ *
+ *	Use the platform driver, if configured so, and return gracefully if it
+ *	fails.
+ *	Otherwise, try to power off and reboot. If they fail, halt the machine,
+ *	there ain't no turning back.
+ */ + +static int power_down(u32 mode) +{ + unsigned long flags; + int error = 0; + + local_irq_save(flags); + device_power_down(PM_SUSPEND_DISK); + switch(mode) { + case PM_DISK_PLATFORM: + error = pm_ops->enter(PM_SUSPEND_DISK); + break; + case PM_DISK_SHUTDOWN: + printk("Powering off system\n"); + machine_power_off(); + break; + case PM_DISK_REBOOT: + machine_restart(NULL); + break; + } + machine_halt(); + device_power_up(); + local_irq_restore(flags); + return 0; +} + + +static int in_suspend __nosavedata = 0; + + +/** + * free_some_memory - Try to free as much memory as possible + * + * ... but do not OOM-kill anyone + * + * Notice: all userland should be stopped at this point, or + * livelock is possible. + */ + +static void free_some_memory(void) +{ + printk("Freeing memory: "); + while (shrink_all_memory(10000)) + printk("."); + printk("|\n"); + blk_run_queues(); +} + + +static inline void platform_finish(void) +{ + if (pm_disk_mode == PM_DISK_PLATFORM) { + if (pm_ops && pm_ops->finish) + pm_ops->finish(PM_SUSPEND_DISK); + } +} + +static void finish(void) +{ + device_resume(); + platform_finish(); + thaw_processes(); + pm_restore_console(); +} + + +static int prepare(void) +{ + int error; + + pm_prepare_console(); + + sys_sync(); + if (freeze_processes()) { + error = -EBUSY; + goto Thaw; + } + + if (pm_disk_mode == PM_DISK_PLATFORM) { + if (pm_ops && pm_ops->prepare) { + if ((error = pm_ops->prepare(PM_SUSPEND_DISK))) + goto Thaw; + } + } + + /* Free memory before shutting down devices. */ + free_some_memory(); + + if ((error = device_suspend(PM_SUSPEND_DISK))) + goto Finish; + + return 0; + Finish: + platform_finish(); + Thaw: + thaw_processes(); + pm_restore_console(); + return error; +} + + +/** + * pm_suspend_disk - The granpappy of power management. + * + * If we're going through the firmware, then get it over with quickly. + * + * If not, then call swsusp to do it's thing, then figure out how + * to power down the system. 
+ */ + +int pm_suspend_disk(void) +{ + int error; + + if ((error = prepare())) + return error; + + pr_debug("PM: Attempting to suspend to disk.\n"); + if (pm_disk_mode == PM_DISK_FIRMWARE) + return pm_ops->enter(PM_SUSPEND_DISK); + + pr_debug("PM: snapshotting memory.\n"); + in_suspend = 1; + local_irq_disable(); + if ((error = swsusp_save())) + goto Done; + + pr_debug("PM: writing image.\n"); + + /* + * FIXME: Leftover from swsusp. Are they necessary? + */ + mb(); + barrier(); + + error = swsusp_write(); + if (!error && in_suspend) { + error = power_down(pm_disk_mode); + pr_debug("PM: Power down failed.\n"); + } else + pr_debug("PM: Image restored successfully.\n"); + swsusp_free(); + Done: + local_irq_enable(); + finish(); + return error; +} + + +/** + * pm_resume - Resume from a saved image. + * + * Called as a late_initcall (so all devices are discovered and + * initialized), we call swsusp to see if we have a saved image or not. + * If so, we quiesce devices, the restore the saved image. We will + * return above (in pm_suspend_disk() ) if everything goes well. + * Otherwise, we fail gracefully and return to the normally + * scheduled program. + * + */ + +static int pm_resume(void) +{ + int error; + + pr_debug("PM: Reading swsusp image.\n"); + + if ((error = swsusp_read())) + goto Done; + + pr_debug("PM: Preparing system for restore.\n"); + + if ((error = prepare())) + goto Free; + + barrier(); + mb(); + local_irq_disable(); + + /* FIXME: The following (comment and mdelay()) are from swsusp. + * Are they really necessary? + * + * We do not want some readahead with DMA to corrupt our memory, right? + * Do it with disabled interrupts for best effect. That way, if some + * driver scheduled DMA, we have good chance for DMA to finish ;-). 
+ */ + pr_debug("PM: Waiting for DMAs to settle down.\n"); + mdelay(1000); + + pr_debug("PM: Restoring saved image.\n"); + swsusp_restore(); + local_irq_enable(); + pr_debug("PM: Restore failed, recovering.n"); + finish(); + Free: + swsusp_free(); + Done: + pr_debug("PM: Resume from disk failed.\n"); + return 0; +} + +late_initcall(pm_resume); + + +static char * pm_disk_modes[] = { + [PM_DISK_FIRMWARE] = "firmware", + [PM_DISK_PLATFORM] = "platform", + [PM_DISK_SHUTDOWN] = "shutdown", + [PM_DISK_REBOOT] = "reboot", +}; + +/** + * disk - Control suspend-to-disk mode + * + * Suspend-to-disk can be handled in several ways. The greatest + * distinction is who writes memory to disk - the firmware or the OS. + * If the firmware does it, we assume that it also handles suspending + * the system. + * If the OS does it, then we have three options for putting the system + * to sleep - using the platform driver (e.g. ACPI or other PM registers), + * powering off the system or rebooting the system (for testing). + * + * The system will support either 'firmware' or 'platform', and that is + * known a priori (and encoded in pm_ops). But, the user may choose + * 'shutdown' or 'reboot' as alternatives. + * + * show() will display what the mode is currently set to. + * store() will accept one of + * + * 'firmware' + * 'platform' + * 'shutdown' + * 'reboot' + * + * It will only change to 'firmware' or 'platform' if the system + * supports it (as determined from pm_ops->pm_disk_mode). 
+ */ + +static ssize_t disk_show(struct subsystem * subsys, char * buf) +{ + return sprintf(buf,"%s\n",pm_disk_modes[pm_disk_mode]); +} + + +static ssize_t disk_store(struct subsystem * s, const char * buf, size_t n) +{ + int error = 0; + int i; + u32 mode = 0; + + down(&pm_sem); + for (i = PM_DISK_FIRMWARE; i < PM_DISK_MAX; i++) { + if (!strcmp(buf,pm_disk_modes[i])) { + mode = i; + break; + } + } + if (mode) { + if (mode == PM_DISK_SHUTDOWN || mode == PM_DISK_REBOOT) + pm_disk_mode = mode; + else { + if (pm_ops && pm_ops->enter && + (mode == pm_ops->pm_disk_mode)) + pm_disk_mode = mode; + else + error = -EINVAL; + } + } else + error = -EINVAL; + + pr_debug("PM: suspend-to-disk mode set to '%s'\n", + pm_disk_modes[mode]); + up(&pm_sem); + return error ? error : n; +} + +power_attr(disk); + +static struct attribute * g[] = { + &disk_attr.attr, + NULL, +}; + + +static struct attribute_group attr_group = { + .attrs = g, +}; + + +static int __init pm_disk_init(void) +{ + return sysfs_create_group(&power_subsys.kset.kobj,&attr_group); +} + +core_initcall(pm_disk_init); diff -puN kernel/power/main.c~test4-pm1 kernel/power/main.c --- 25/kernel/power/main.c~test4-pm1 2003-09-05 00:50:16.000000000 -0700 +++ 25-akpm/kernel/power/main.c 2003-09-05 00:50:16.000000000 -0700 @@ -8,32 +8,23 @@ * */ +#define DEBUG + #include #include -#include #include +#include #include #include #include -#include #include "power.h" -static DECLARE_MUTEX(pm_sem); - -static struct pm_ops * pm_ops = NULL; - -static u32 pm_disk_mode = PM_DISK_SHUTDOWN; - -#ifdef CONFIG_SOFTWARE_SUSPEND -static int have_swsusp = 1; -#else -static int have_swsusp = 0; -#endif - -extern long sys_sync(void); +DECLARE_MUTEX(pm_sem); +struct pm_ops * pm_ops = NULL; +u32 pm_disk_mode = PM_DISK_SHUTDOWN; /** * pm_set_ops - Set the global power method table. @@ -51,171 +42,6 @@ void pm_set_ops(struct pm_ops * ops) /** - * pm_suspend_standby - Enter 'standby' state. - * - * 'standby' is also known as 'Power-On Suspend'. 
Here, we power down - * devices, disable interrupts, and enter the state. - */ - -static int pm_suspend_standby(void) -{ - int error = 0; - unsigned long flags; - - if (!pm_ops || !pm_ops->enter) - return -EPERM; - - local_irq_save(flags); - if ((error = device_power_down(PM_SUSPEND_STANDBY))) - goto Done; - error = pm_ops->enter(PM_SUSPEND_STANDBY); - local_irq_restore(flags); - device_power_up(); - Done: - return error; -} - - -/** - * pm_suspend_mem - Enter suspend-to-RAM state. - * - * Identical to pm_suspend_standby() - we power down devices, disable - * interrupts, and enter the low-power state. - */ - -static int pm_suspend_mem(void) -{ - int error = 0; - unsigned long flags; - - if (!pm_ops || !pm_ops->enter) - return -EPERM; - - local_irq_save(flags); - if ((error = device_power_down(PM_SUSPEND_STANDBY))) - goto Done; - error = pm_ops->enter(PM_SUSPEND_STANDBY); - local_irq_restore(flags); - device_power_up(); - Done: - return error; -} - - -/** - * power_down - Shut machine down for hibernate. - * @mode: Suspend-to-disk mode - * - * Use the platform driver, if configured so, and return gracefully if it - * fails. - * Otherwise, try to power off and reboot. If they fail, halt the machine, - * there ain't no turning back. - */ - -static int power_down(u32 mode) -{ - unsigned long flags; - int error = 0; - - local_irq_save(flags); - device_power_down(PM_SUSPEND_DISK); - switch(mode) { - case PM_DISK_PLATFORM: - error = pm_ops->enter(PM_SUSPEND_DISK); - if (error) { - device_power_up(); - local_irq_restore(flags); - return error; - } - case PM_DISK_SHUTDOWN: - machine_power_off(); - break; - case PM_DISK_REBOOT: - machine_restart(NULL); - break; - } - machine_halt(); - return 0; -} - - -static int in_suspend __nosavedata = 0; - - -/** - * free_some_memory - Try to free as much memory as possible - * - * ... but do not OOM-kill anyone - * - * Notice: all userland should be stopped at this point, or - * livelock is possible. 
- */ - -static void free_some_memory(void) -{ - printk("Freeing memory: "); - while (shrink_all_memory(10000)) - printk("."); - printk("|\n"); - blk_run_queues(); -} - - -/** - * pm_suspend_disk - The granpappy of power management. - * - * If we're going through the firmware, then get it over with quickly. - * - * If not, then call swsusp to do it's thing, then figure out how - * to power down the system. - */ - -static int pm_suspend_disk(void) -{ - int error; - - pr_debug("PM: Attempting to suspend to disk.\n"); - if (pm_disk_mode == PM_DISK_FIRMWARE) - return pm_ops->enter(PM_SUSPEND_DISK); - - if (!have_swsusp) - return -EPERM; - - pr_debug("PM: snapshotting memory.\n"); - in_suspend = 1; - if ((error = swsusp_save())) - goto Done; - - if (in_suspend) { - pr_debug("PM: writing image.\n"); - error = swsusp_write(); - if (!error) - error = power_down(pm_disk_mode); - pr_debug("PM: Power down failed.\n"); - } else - pr_debug("PM: Image restored successfully.\n"); - swsusp_free(); - Done: - return error; -} - - - -#define decl_state(_name) \ - { .name = __stringify(_name), .fn = pm_suspend_##_name } - -struct pm_state { - char * name; - int (*fn)(void); -} pm_states[] = { - [PM_SUSPEND_STANDBY] = decl_state(standby), - [PM_SUSPEND_MEM] = decl_state(mem), - [PM_SUSPEND_DISK] = decl_state(disk), - { NULL }, -}; - - -/** * suspend_prepare - Do prep work before entering low-power state. * @state: State we're entering. * @@ -228,23 +54,21 @@ static int suspend_prepare(u32 state) { int error = 0; + if (!pm_ops || !pm_ops->enter) + return -EPERM; + pm_prepare_console(); - sys_sync(); if (freeze_processes()) { error = -EAGAIN; goto Thaw; } - if (pm_ops && pm_ops->prepare) { + if (pm_ops->prepare) { if ((error = pm_ops->prepare(state))) goto Thaw; } - /* Free memory before shutting down devices. 
*/ - if (state == PM_SUSPEND_DISK) - free_some_memory(); - if ((error = device_suspend(state))) goto Finish; @@ -253,7 +77,7 @@ static int suspend_prepare(u32 state) pm_restore_console(); return error; Finish: - if (pm_ops && pm_ops->finish) + if (pm_ops->finish) pm_ops->finish(state); Thaw: thaw_processes(); @@ -261,6 +85,22 @@ static int suspend_prepare(u32 state) } +static int suspend_enter(u32 state) +{ + int error = 0; + unsigned long flags; + + local_irq_save(flags); + if ((error = device_power_down(state))) + goto Done; + error = pm_ops->enter(state); + local_irq_restore(flags); + device_power_up(); + Done: + return error; +} + + /** * suspend_finish - Do final work before exiting suspend sequence. * @state: State we're coming out of. @@ -279,6 +119,16 @@ static void suspend_finish(u32 state) } + + +char * pm_states[] = { + [PM_SUSPEND_STANDBY] = "standby", + [PM_SUSPEND_MEM] = "mem", + [PM_SUSPEND_DISK] = "disk", + NULL, +}; + + /** * enter_state - Do common work of entering low-power state. * @state: pm_state structure for state we're entering. @@ -293,7 +143,6 @@ static void suspend_finish(u32 state) static int enter_state(u32 state) { int error; - struct pm_state * s = &pm_states[state]; if (down_trylock(&pm_sem)) return -EBUSY; @@ -304,12 +153,17 @@ static int enter_state(u32 state) goto Unlock; } - pr_debug("PM: Preparing system for suspend.\n"); + if (state == PM_SUSPEND_DISK) { + error = pm_suspend_disk(); + goto Unlock; + } + + pr_debug("PM: Preparing system for suspend\n"); if ((error = suspend_prepare(state))) goto Unlock; pr_debug("PM: Entering state.\n"); - error = s->fn(); + error = suspend_enter(state); pr_debug("PM: Finishing up.\n"); suspend_finish(state); @@ -335,138 +189,10 @@ int pm_suspend(u32 state) } -/** - * pm_resume - Resume from a saved image. - * - * Called as a late_initcall (so all devices are discovered and - * initialized), we call swsusp to see if we have a saved image or not. 
- * If so, we quiesce devices, the restore the saved image. We will - * return above (in pm_suspend_disk() ) if everything goes well. - * Otherwise, we fail gracefully and return to the normally - * scheduled program. - * - */ - -static int pm_resume(void) -{ - int error; - - if (!have_swsusp) - return 0; - - pr_debug("PM: Reading swsusp image.\n"); - - if ((error = swsusp_read())) - goto Done; - - pr_debug("PM: Preparing system for restore.\n"); - - if ((error = suspend_prepare(PM_SUSPEND_DISK))) - goto Free; - - pr_debug("PM: Restoring saved image.\n"); - swsusp_restore(); - - pr_debug("PM: Restore failed, recovering.n"); - suspend_finish(PM_SUSPEND_DISK); - Free: - swsusp_free(); - Done: - pr_debug("PM: Resume from disk failed.\n"); - return 0; -} - -late_initcall(pm_resume); - decl_subsys(power,NULL,NULL); -#define power_attr(_name) \ -static struct subsys_attribute _name##_attr = { \ - .attr = { \ - .name = __stringify(_name), \ - .mode = 0644, \ - }, \ - .show = _name##_show, \ - .store = _name##_store, \ -} - - -static char * pm_disk_modes[] = { - [PM_DISK_FIRMWARE] = "firmware", - [PM_DISK_PLATFORM] = "platform", - [PM_DISK_SHUTDOWN] = "shutdown", - [PM_DISK_REBOOT] = "reboot", -}; - -/** - * disk - Control suspend-to-disk mode - * - * Suspend-to-disk can be handled in several ways. The greatest - * distinction is who writes memory to disk - the firmware or the OS. - * If the firmware does it, we assume that it also handles suspending - * the system. - * If the OS does it, then we have three options for putting the system - * to sleep - using the platform driver (e.g. ACPI or other PM registers), - * powering off the system or rebooting the system (for testing). - * - * The system will support either 'firmware' or 'platform', and that is - * known a priori (and encoded in pm_ops). But, the user may choose - * 'shutdown' or 'reboot' as alternatives. - * - * show() will display what the mode is currently set to. 
- * store() will accept one of - * - * 'firmware' - * 'platform' - * 'shutdown' - * 'reboot' - * - * It will only change to 'firmware' or 'platform' if the system - * supports it (as determined from pm_ops->pm_disk_mode). - */ - -static ssize_t disk_show(struct subsystem * subsys, char * buf) -{ - return sprintf(buf,"%s\n",pm_disk_modes[pm_disk_mode]); -} - - -static ssize_t disk_store(struct subsystem * s, const char * buf, size_t n) -{ - int error = 0; - int i; - u32 mode = 0; - - down(&pm_sem); - for (i = PM_DISK_FIRMWARE; i < PM_DISK_MAX; i++) { - if (!strcmp(buf,pm_disk_modes[i])) { - mode = i; - break; - } - } - if (mode) { - if (mode == PM_DISK_SHUTDOWN || mode == PM_DISK_REBOOT) - pm_disk_mode = mode; - else { - if (pm_ops && pm_ops->enter && - (mode == pm_ops->pm_disk_mode)) - pm_disk_mode = mode; - else - error = -EINVAL; - } - } else - error = -EINVAL; - - pr_debug("PM: suspend-to-disk mode set to '%s'\n", - pm_disk_modes[mode]); - up(&pm_sem); - return error ? error : n; -} - -power_attr(disk); - /** * state - control system power state. 
* @@ -480,27 +206,28 @@ power_attr(disk); static ssize_t state_show(struct subsystem * subsys, char * buf) { - struct pm_state * state; + int i; char * s = buf; - for (state = &pm_states[0]; state->name; state++) - s += sprintf(s,"%s ",state->name); + for (i = 0; i < PM_SUSPEND_MAX; i++) { + if (pm_states[i]) + s += sprintf(s,"%s ",pm_states[i]); + } s += sprintf(s,"\n"); return (s - buf); } static ssize_t state_store(struct subsystem * subsys, const char * buf, size_t n) { - u32 state; - struct pm_state * s; + u32 state = PM_SUSPEND_STANDBY; + char ** s; int error; - for (state = 0; state < PM_SUSPEND_MAX; state++) { - s = &pm_states[state]; - if (s->name && !strcmp(buf,s->name)) + for (s = &pm_states[state]; *s; s++, state++) { + if (!strcmp(buf,*s)) break; } - if (s) + if (*s) error = enter_state(state); else error = -EINVAL; @@ -511,7 +238,6 @@ power_attr(state); static struct attribute * g[] = { &state_attr.attr, - &disk_attr.attr, NULL, }; @@ -520,7 +246,7 @@ static struct attribute_group attr_group }; -static int pm_init(void) +static int __init pm_init(void) { int error = subsystem_register(&power_subsys); if (!error) diff -puN kernel/power/Makefile~test4-pm1 kernel/power/Makefile --- 25/kernel/power/Makefile~test4-pm1 2003-09-05 00:50:16.000000000 -0700 +++ 25-akpm/kernel/power/Makefile 2003-09-05 00:50:16.000000000 -0700 @@ -1,4 +1,4 @@ obj-y := main.o process.o console.o pm.o -obj-$(CONFIG_SOFTWARE_SUSPEND) += swsusp.o +obj-$(CONFIG_SOFTWARE_SUSPEND) += disk.o swsusp.o obj-$(CONFIG_MAGIC_SYSRQ) += poweroff.o diff -puN kernel/power/power.h~test4-pm1 kernel/power/power.h --- 25/kernel/power/power.h~test4-pm1 2003-09-05 00:50:16.000000000 -0700 +++ 25-akpm/kernel/power/power.h 2003-09-05 00:50:16.000000000 -0700 @@ -10,34 +10,27 @@ #ifdef CONFIG_SOFTWARE_SUSPEND -extern int swsusp_save(void); -extern int swsusp_write(void); -extern int swsusp_read(void); -extern int swsusp_restore(void); -extern int swsusp_free(void); +extern int pm_suspend_disk(void); + 
#else -static inline int swsusp_save(void) -{ - return 0; -} -static inline int swsusp_write(void) +static inline int pm_suspend_disk(void) { - return 0; -} -static inline int swsusp_read(void) -{ - return 0; -} -static inline int swsusp_restore(void) -{ - return 0; -} -static inline int swsusp_free(void) -{ - return 0; + return -EPERM; } #endif +extern struct semaphore pm_sem; +#define power_attr(_name) \ +static struct subsys_attribute _name##_attr = { \ + .attr = { \ + .name = __stringify(_name), \ + .mode = 0644, \ + }, \ + .show = _name##_show, \ + .store = _name##_store, \ +} + +extern struct subsystem power_subsys; extern int freeze_processes(void); extern void thaw_processes(void); diff -puN kernel/power/swsusp.c~test4-pm1 kernel/power/swsusp.c --- 25/kernel/power/swsusp.c~test4-pm1 2003-09-05 00:50:16.000000000 -0700 +++ 25-akpm/kernel/power/swsusp.c 2003-09-05 00:50:16.000000000 -0700 @@ -34,38 +34,21 @@ * For TODOs,FIXMEs also look in Documentation/swsusp.txt */ -#include #include +#include #include -#include -#include -#include #include -#include #include -#include -#include -#include -#include -#include -#include -#include -#include -#include -#include #include -#include #include #include -#include #include -#include -#include #include "power.h" -unsigned char software_suspend_enabled = 1; + +extern int swsusp_arch_suspend(int resume); #define __ADDRESS(x) ((unsigned long) phys_to_virt(x)) #define ADDRESS(x) __ADDRESS((x) << PAGE_SHIFT) @@ -76,9 +59,6 @@ extern char __nosave_begin, __nosave_end extern int is_head_of_free_region(struct page *); -/* Locks */ -spinlock_t suspend_pagedir_lock __nosavedata = SPIN_LOCK_UNLOCKED; - /* Variables to be preserved over suspend */ static int pagedir_order_check; static int nr_copy_pages_check; @@ -488,21 +468,30 @@ static int suspend_prepare_image(void) return 0; } + +/** + * suspend_save_image - Prepare and write saved image to swap. 
+ * + * IRQs are re-enabled here so we can resume devices and safely write + * to the swap devices. We disable them again before we leave. + * + * The second lock_swapdevices() will unlock ignored swap devices since + * writing is finished. + * It is important _NOT_ to umount filesystems at this point. We want + * them synced (in case something goes wrong) but we DO not want to mark + * filesystem clean: it is not. (And it does not matter, if we resume + * correctly, we'll mark system clean, anyway.) + */ + static int suspend_save_image(void) { int error; - + local_irq_enable(); device_resume(); - lock_swapdevices(); error = write_suspend_image(); - lock_swapdevices(); /* This will unlock ignored swap devices since writing is finished */ - - /* It is important _NOT_ to umount filesystems at this point. We want - * them synced (in case something goes wrong) but we DO not want to mark - * filesystem clean: it is not. (And it does not matter, if we resume - * correctly, we'll mark system clean, anyway.) - */ + lock_swapdevices(); + local_irq_disable(); return error; } @@ -510,66 +499,49 @@ static int suspend_save_image(void) * Magic happens here */ -void do_magic_resume_1(void) -{ - barrier(); - mb(); - spin_lock_irq(&suspend_pagedir_lock); /* Done to disable interrupts */ - PRINTK( "Waiting for DMAs to settle down...\n"); - /* We do not want some readahead with DMA to corrupt our memory, right? - Do it with disabled interrupts for best effect. That way, if some - driver scheduled DMA, we have good chance for DMA to finish ;-). 
*/ - mdelay(1000); -} - -void do_magic_resume_2(void) +int swsusp_resume(void) { BUG_ON (nr_copy_pages_check != nr_copy_pages); BUG_ON (pagedir_order_check != pagedir_order); /* Even mappings of "global" things (vmalloc) need to be fixed */ __flush_tlb_global(); - spin_unlock_irq(&suspend_pagedir_lock); + return 0; } -/* do_magic() is implemented in arch/?/kernel/suspend_asm.S, and basically does: +/* swsusp_arch_suspend() is implemented in arch/?/power/swsusp.S, + and basically does: if (!resume) { - do_magic_suspend_1(); save_processor_state(); SAVE_REGISTERS - do_magic_suspend_2(); + swsusp_suspend(); return; } GO_TO_SWAPPER_PAGE_TABLES - do_magic_resume_1(); COPY_PAGES_BACK RESTORE_REGISTERS restore_processor_state(); - do_magic_resume_2(); + swsusp_resume(); */ -void do_magic_suspend_1(void) -{ - mb(); - barrier(); - spin_lock_irq(&suspend_pagedir_lock); -} -int do_magic_suspend_2(void) +int swsusp_suspend(void) { - int is_problem; + int error; read_swapfiles(); - is_problem = suspend_prepare_image(); - spin_unlock_irq(&suspend_pagedir_lock); - if (!is_problem) - return suspend_save_image(); - printk(KERN_EMERG "%sSuspend failed, trying to recover...\n", name_suspend); - barrier(); - mb(); - mdelay(1000); - return -EFAULT; + error = suspend_prepare_image(); + if (!error) + error = suspend_save_image(); + if (error) { + printk(KERN_EMERG "%sSuspend failed, trying to recover...\n", + name_suspend); + barrier(); + mb(); + mdelay(1000); + } + return error; } /* More restore stuff */ @@ -701,61 +673,146 @@ static int __init sanity_check(struct su return 0; } -static int __init bdev_read_page(struct block_device *bdev, - long pos, void *buf) +static struct block_device * resume_bdev; + + +/** + * Using bio to read from swap. + * This code requires a bit more work than just using buffer heads + * but, it is the recommended way for 2.5/2.6. + * The following are to signal the beginning and end of I/O. 
Bios + * finish asynchronously, while we want them to happen synchronously. + * A simple atomic_t, and a wait loop take care of this problem. + */ + +static atomic_t io_done = ATOMIC_INIT(0); + +static void start_io(void) +{ + atomic_set(&io_done,1); +} + +static int end_io(struct bio * bio, unsigned int num, int err) { - struct buffer_head *bh; - BUG_ON (pos%PAGE_SIZE); - bh = __bread(bdev, pos/PAGE_SIZE, PAGE_SIZE); - if (!bh || (!bh->b_data)) { - return -1; - } - memcpy(buf, bh->b_data, PAGE_SIZE); /* FIXME: may need kmap() */ - BUG_ON(!buffer_uptodate(bh)); - brelse(bh); + atomic_set(&io_done,0); return 0; -} +} + +static void wait_io(void) +{ + blk_run_queues(); + while(atomic_read(&io_done)) + io_schedule(); +} + + +/** + * submit - submit BIO request. + * @rw: READ or WRITE. + * @off physical offset of page. + * @page: page we're reading or writing. + * + * Straight from the textbook - allocate and initialize the bio. + * If we're writing, make sure the page is marked as dirty. + * Then submit it and wait. 
+ */ + +static int submit(int rw, pgoff_t page_off, void * page) +{ + int error = 0; + struct bio * bio; + + bio = bio_alloc(GFP_ATOMIC,1); + if (!bio) + return -ENOMEM; + bio->bi_sector = page_off * (PAGE_SIZE >> 9); + bio_get(bio); + bio->bi_bdev = resume_bdev; + bio->bi_end_io = end_io; + + if (bio_add_page(bio, virt_to_page(page), PAGE_SIZE, 0) < PAGE_SIZE) { + printk("ERROR: adding page to bio at %ld\n",page_off); + error = -EFAULT; + goto Done; + } + + if (rw == WRITE) + bio_set_pages_dirty(bio); + start_io(); + submit_bio(rw,bio); + wait_io(); + Done: + bio_put(bio); + return error; +} + +static int +read_page(pgoff_t page_off, void * page) +{ + return submit(READ,page_off,page); +} + +static int +write_page(pgoff_t page_off, void * page) +{ + return submit(WRITE,page_off,page); +} + extern dev_t __init name_to_dev_t(const char *line); -static int __init read_suspend_image(struct block_device *bdev, - union diskpage *cur) + +#define next_entry(diskpage) diskpage->link.next + +static int __init read_suspend_image(void) { swp_entry_t next; int i, nr_pgdir_pages; + union diskpage *cur; + int error = 0; -#define PREPARENEXT \ - { next = cur->link.next; \ - next.val = swp_offset(next) * PAGE_SIZE; \ - } - - if (bdev_read_page(bdev, 0, cur)) return -EIO; + cur = (union diskpage *)get_zeroed_page(GFP_ATOMIC); + if (!cur) + return -ENOMEM; - if ((!memcmp("SWAP-SPACE",cur->swh.magic.magic,10)) || - (!memcmp("SWAPSPACE2",cur->swh.magic.magic,10))) { - printk(KERN_ERR "%sThis is normal swap space\n", name_resume ); - return -EINVAL; - } + if ((error = read_page(0, cur))) + goto Done; - PREPARENEXT; /* We have to read next position before we overwrite it */ + /* + * We have to read next position before we overwrite it + */ + next = next_entry(cur); if (!memcmp("S1",cur->swh.magic.magic,2)) memcpy(cur->swh.magic.magic,"SWAP-SPACE",10); else if (!memcmp("S2",cur->swh.magic.magic,2)) memcpy(cur->swh.magic.magic,"SWAPSPACE2",10); - else { - printk("swsusp: %s: Unable to 
find suspended-data signature (%.10s - misspelled?\n", - name_resume, cur->swh.magic.magic); - return -EFAULT; + else if ((!memcmp("SWAP-SPACE",cur->swh.magic.magic,10)) || + (!memcmp("SWAPSPACE2",cur->swh.magic.magic,10))) { + printk(KERN_ERR "swsusp: Partition is normal swap space\n"); + error = -EINVAL; + goto Done; + } else { + printk(KERN_ERR "swsusp: Invalid partition type.\n"); + error = -EINVAL; + goto Done; } + /* + * Reset swap signature now. + */ + if ((error = write_page(0,cur))) + goto Done; + printk( "%sSignature found, resuming\n", name_resume ); MDELAY(1000); - if (bdev_read_page(bdev, next.val, cur)) return -EIO; - if (sanity_check(&cur->sh)) /* Is this same machine? */ - return -EPERM; - PREPARENEXT; + if ((error = read_page(swp_offset(next), cur))) + goto Done; + /* Is this same machine? */ + if ((error = sanity_check(&cur->sh))) + goto Done; + next = next_entry(cur); pagedir_save = cur->sh.suspend_pagedir; nr_copy_pages = cur->sh.num_pbes; @@ -763,8 +820,10 @@ static int __init read_suspend_image(str pagedir_order = get_bitmask_order(nr_pgdir_pages); pagedir_nosave = (suspend_pagedir_t *)__get_free_pages(GFP_ATOMIC, pagedir_order); - if (!pagedir_nosave) - return -ENOMEM; + if (!pagedir_nosave) { + error = -ENOMEM; + goto Done; + } PRINTK( "%sReading pagedir, ", name_resume ); @@ -772,15 +831,17 @@ static int __init read_suspend_image(str for (i=nr_pgdir_pages-1; i>=0; i--) { BUG_ON (!next.val); cur = (union diskpage *)((char *) pagedir_nosave)+i; - if (bdev_read_page(bdev, next.val, cur)) return -EIO; - PREPARENEXT; + error = read_page(swp_offset(next), cur); + if (error) + goto FreePagedir; + next = next_entry(cur); } BUG_ON (next.val); - if (relocate_pagedir()) - return -ENOMEM; - if (check_pagedir()) - return -ENOMEM; + if ((error = relocate_pagedir())) + goto FreePagedir; + if ((error = check_pagedir())) + goto FreePagedir; printk( "Reading image data (%d pages): ", nr_copy_pages ); for(i=0; i < nr_copy_pages; i++) { @@ -789,11 +850,18 @@ 
static int __init read_suspend_image(str printk( "." ); /* You do not need to check for overlaps... ... check_pagedir already did this work */ - if (bdev_read_page(bdev, swp_offset(swap_address) * PAGE_SIZE, (char *)((pagedir_nosave+i)->address))) - return -EIO; + error = read_page(swp_offset(swap_address), + (char *)((pagedir_nosave+i)->address)); + if (error) + goto FreePagedir; } printk( "|\n" ); - return 0; + Done: + free_page((unsigned long)cur); + return error; + FreePagedir: + free_pages((unsigned long)pagedir_nosave,pagedir_order); + goto Done; } /** @@ -806,24 +874,23 @@ int swsusp_save(void) printk("swsusp is not supported with high- or discontig-mem.\n"); return -EPERM; #endif - return 0; + return arch_prepare_suspend(); } /** * swsusp_write - Write saved memory image to swap. * - * do_magic(0) returns after system is resumed. + * swsusp_arch_suspend(0) returns after system is resumed. * - * do_magic() copies all "used" memory to "free" memory, then - * unsuspends all device drivers, and writes memory to disk + * swsusp_arch_suspend() copies all "used" memory to "free" memory, + * then unsuspends all device drivers, and writes memory to disk * using normal kernel mechanism. 
*/ int swsusp_write(void) { - arch_prepare_suspend(); - return do_magic(0); + return swsusp_arch_suspend(0); } @@ -833,7 +900,6 @@ int swsusp_write(void) int __init swsusp_read(void) { - union diskpage *cur; int error; char b[BDEVNAME_SIZE]; @@ -844,19 +910,13 @@ int __init swsusp_read(void) printk("swsusp: Resume From Partition: %s, Device: %s\n", resume_file, __bdevname(resume_device, b)); - cur = (union diskpage *)get_zeroed_page(GFP_ATOMIC); - if (cur) { - struct block_device *bdev; - bdev = open_by_devnum(resume_device, FMODE_READ, BDEV_RAW); - if (!IS_ERR(bdev)) { - set_blocksize(bdev, PAGE_SIZE); - error = read_suspend_image(bdev, cur); - blkdev_put(bdev, BDEV_RAW); - } else - error = PTR_ERR(bdev); - free_page((unsigned long)cur); + resume_bdev = open_by_devnum(resume_device, FMODE_READ, BDEV_RAW); + if (!IS_ERR(resume_bdev)) { + set_blocksize(resume_bdev, PAGE_SIZE); + error = read_suspend_image(); + blkdev_put(resume_bdev, BDEV_RAW); } else - error = -ENOMEM; + error = PTR_ERR(resume_bdev); if (!error) PRINTK("Reading resume file was successful\n"); @@ -873,7 +933,7 @@ int __init swsusp_read(void) int __init swsusp_restore(void) { - return do_magic(1); + return swsusp_arch_suspend(1); } @@ -885,13 +945,20 @@ int swsusp_free(void) { PRINTK( "Freeing prev allocated pagedir\n" ); free_suspend_pagedir((unsigned long) pagedir_save); - - PRINTK( "Fixing swap signatures... 
" ); - mark_swapfiles(((swp_entry_t) {0}), MARK_SWAP_RESUME); - PRINTK( "ok\n" ); return 0; } + +int software_suspend(void) +{ + struct pm_ops swsusp_ops = { + .pm_disk_mode = PM_DISK_SHUTDOWN, + }; + + pm_set_ops(&swsusp_ops); + return pm_suspend(PM_SUSPEND_DISK); +} + static int __init resume_setup(char *str) { if (strlen(str)) diff -puN kernel/sys.c~test4-pm1 kernel/sys.c --- 25/kernel/sys.c~test4-pm1 2003-09-05 00:50:16.000000000 -0700 +++ 25-akpm/kernel/sys.c 2003-09-05 00:50:16.000000000 -0700 @@ -456,7 +456,7 @@ asmlinkage long sys_reboot(int magic1, i #ifdef CONFIG_SOFTWARE_SUSPEND case LINUX_REBOOT_CMD_SW_SUSPEND: - if (!pm_suspend(PM_SUSPEND_DISK)) + if (!software_suspend()) break; do_exit(0); break; _