Victim will be used as a new test case for error injection. It provides
a unified interface to export the physical address for CE/PFA/IFU/DCU
tests, even for eMCA.
Signed-off-by: zhilongx.liu <zhilongx.liu@intel.com>
Signed-off-by: Chen, Gong <gong.chen@linux.intel.com>
|
|
casefile stores the list of test cases that will actually be run, so
a proper introduction is necessary.
Also fix a spelling mistake in runmcetest.
Signed-off-by: Chen, Gong <gong.chen@linux.intel.com>
|
|
Fix two bugs in two test cases.
1) The test for disk file soft off-line often fails because the
file is mmaped in shared mode. Change it to private mode to
fit a wider range of test environments.
2) In run_soft.sh there is a spelling mistake that makes some
test cases fail.
Signed-off-by: Chen, Gong <gong.chen@linux.intel.com>
|
|
If the BIOS is bogus so that error injection can't be executed
as expected, the current test case will fail. Fix this bug.
Signed-off-by: Chen, Gong <gong.chen@linux.intel.com>
|
|
Too many BIOSes are bogus so that we have to disable
auto trigger mechanism for PFA test case.
Signed-off-by: Chen, Gong <gong.chen@linux.intel.com>
|
|
eMCA is a new mechanism for reporting H/W errors, available since the
IVB-EX platform. So far only eMCA Gen1 is supported, which
means only CE errors can be reported through this path.
Signed-off-by: Liu, ZhilongX <zhilongx.liu@intel.com>
Signed-off-by: Chen, Gong <gong.chen@linux.intel.com>
|
|
Add load checker of hwpoison-inject module for all other hwpoison
test cases besides run_hugepage_overcommit.sh.
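A load check of this kind could look like the sketch below. It assumes the module is listed as hwpoison_inject in a /proc/modules-style file; the helper name module_loaded and its optional file argument are hypothetical and not taken from the test suite.

```shell
# Hypothetical helper: succeed if the named module appears in a
# /proc/modules-style listing (the first column is the module name).
# The second argument exists only so the check can be tried on a sample file.
module_loaded()
{
	local name=$1 list=${2:-/proc/modules}
	awk -v m="$name" '$1 == m { found = 1 } END { exit !found }' "$list"
}

# Possible use at the top of a test case (not executed here):
# if ! module_loaded hwpoison_inject; then
#	modprobe hwpoison-inject || { echo "cannot load hwpoison-inject"; exit 1; }
# fi
```

Matching on the exact first column avoids false positives from modules whose names merely contain the same substring.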
Signed-off-by: Chen, Gong <gong.chen@linux.intel.com>
|
|
Add load checker of hwpoison-inject module for test case
run_hugepage_overcommit.sh.
NOTE: Gong revised this patch a little bit.
Signed-off-by: Naoya Horiguchi <n-horiguchi@ah.jp.nec.com>
Signed-off-by: Chen, Gong <gong.chen@linux.intel.com>
|
|
After a successful hugetlb page migration by soft offline, the source
page will either be freed into hugepage_freelists or buddy
(over-commit page). If page is in buddy, page_hstate(page) will be
NULL. It will hit a NULL pointer dereference in
dequeue_hwpoisoned_huge_page().
[ 890.677918] BUG: unable to handle kernel NULL pointer dereference at 0000000000000058
[ 890.685741] IP: [<ffffffff81163761>] dequeue_hwpoisoned_huge_page+0x131/0x1d0
[ 890.692861] PGD c23762067 PUD c24be2067 PMD 0
[ 890.697314] Oops: 0000 [#1] SMP
This test case is targeted for the bug reported by Jianguo Wu,
where we have NULL pointer access when we have to free source
hugepage under overcommitting situation.
Signed-off-by: Naoya Horiguchi <n-horiguchi@ah.jp.nec.com>
Signed-off-by: Chen, Gong <gong.chen@linux.intel.com>
|
|
Remove possible EDAC driver to avoid interference.
Signed-off-by: Chen Gong <gong.chen@linux.intel.com>
|
|
"&>>" can't be recognized on some Linux OS such as SuSE because it
uses an older BASH version, so use an equivalent redirection instead.
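For illustration, the append-both-streams redirection can be written in a form that older BASH versions also accept. This is a minimal sketch; log_append is a hypothetical helper name, not one used by the test scripts.

```shell
# Append stdout and stderr of a command to a log file without "&>>".
# "&>>" needs bash 4.0 or later; ">> file 2>&1" works in older shells too.
log_append()
{
	local logfile=$1
	shift
	"$@" >> "$logfile" 2>&1
}
```

The order matters: stdout is first redirected to the file in append mode, then stderr is pointed at the same descriptor.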
Signed-off-by: Shaoyong Wang <shaoyongx.wang@intel.com>
Signed-off-by: Chen Gong <gong.chen@linux.intel.com>
|
|
1. Don't use $ROOT to locate the BSP directory; use $TMP_DIR instead
2. Change the invoke sequence of variables (NUM_FAIL_CPU/NUM_PASS_CPU)
to avoid any complaint.
Signed-off-by: Shaoyong Wang <shaoyongx.wang@intel.com>
Signed-off-by: Chen Gong <gong.chen@linux.intel.com>
|
|
To avoid temporary files being saved in the wrong directory when the test
script is executed from its own directory, the TMP_DIR path should
be determined before the test.
Signed-off-by: Shaoyong Wang <shaoyongx.wang@intel.com>
Signed-off-by: Chen Gong <gong.chen@linux.intel.com>
|
|
A missing double quotation mark leads to a syntax error.
Signed-off-by: Shaoyong Wang <shaoyongx.wang@intel.com>
Signed-off-by: Chen Gong <gong.chen@linux.intel.com>
|
|
Fix incomplete dmesg information which is used for result analysis.
Put related dmesg/mcelog log under path/to/apei-inj/log/.
Signed-off-by: Shaoyong Wang <shaoyongx.wang@intel.com>
Signed-off-by: Chen Gong <gong.chen@linux.intel.com>
|
|
This test includes regular EINJ error injection test and
Vendor Extension Specific Error Injection test with ACPI5.0 enabled BIOS.
Signed-off-by: Shaoyong Wang <shaoyongx.wang@intel.com>
Signed-off-by: Chen Gong <gong.chen@linux.intel.com>
|
|
tinjpage and ttranshuge can get SIGCHLD(CLD_DUMPED) from their child
processes, but now they only check CLD_KILLED, so tests fail.
This kernel behavior might not be wrong, because the default
action of SIGBUS is 'coredump', not 'terminate' (see comments in
include/linux/signal.h).
With this patch, we accept SIGCHLD(CLD_DUMPED) as a correct behavior.
Signed-off-by: Naoya Horiguchi <n-horiguchi@ah.jp.nec.com>
Signed-off-by: Chen Gong <gong.chen@linux.intel.com>
|
|
Code displacement:
- moved common code into helper.sh to avoid duplicates,
- merged run-huge-test.sh into run_hugepage.sh and
run-transhuge-test.sh into run_thp.sh.
Minor improvements:
- added sysctl vm.memory_failure_early_kill=0 in the setup of each
testcase (some testcases change this global parameter, so it's safe
to reset it to 0 to avoid interference between testcases),
- added freeing resources (shmems, semaphores) and unpoisoning
in the cleanup of each testcase,
- added counter check ("HardwareCorrupted:" in /proc/meminfo)
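The counter check mentioned in the last item can be sketched as follows. It assumes the usual "HardwareCorrupted: N kB" line format; the helper name hw_corrupted_kb and its optional file argument are hypothetical.

```shell
# Print the HardwareCorrupted value (in kB) from a meminfo-style file.
# The optional argument lets the parser be tried on a sample file.
hw_corrupted_kb()
{
	awk '$1 == "HardwareCorrupted:" { print $2 }' "${1:-/proc/meminfo}"
}

# Possible use around an injection (not executed here):
# before=$(hw_corrupted_kb)
# ... inject and unpoison ...
# after=$(hw_corrupted_kb)
# [ "$before" = "$after" ] || echo "HardwareCorrupted changed: $before -> $after"
```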
Signed-off-by: Naoya Horiguchi <n-horiguchi@ah.jp.nec.com>
Signed-off-by: Chen Gong <gong.chen@linux.intel.com>
|
|
The new page-types fixes some bugs and supports THP, so update this
tool for mce-test.
Signed-off-by: Chen Gong <gong.chen@linux.intel.com>
|
|
One dot is missing in the Makefile, so GDB can't get the symbol
table from the binary when debugging.
Signed-off-by: Chen Gong <gong.chen@linux.intel.com>
|
|
The type parameter in a mount entry is arbitrary, especially for pseudo
filesystems, so we don't want to hardcode it.
Signed-off-by: Lans Zhang <jia.zhang@windriver.com>
Signed-off-by: Chen Gong <gong.chen@linux.intel.com>
|
|
anonymous hugepage, file backed hugepage and shared memory hugepage
need a mounted hugetlbfs.
Signed-off-by: Lans Zhang <jia.zhang@windriver.com>
Signed-off-by: Chen Gong <gong.chen@linux.intel.com>
|
|
Add missing file attributes for BSP test case.
Signed-off-by: Chen Gong <gong.chen@linux.intel.com>
|
|
Basic BSP online/offline tests include 3 modes: PER-CPU mode,
GROUP-CPU mode, and S3/S4 with CPU0 onlined or offlined,
respectively.
Signed-off-by: Shaoyong Wang <shaoyongx.wang@intel.com>
Signed-off-by: Chen Gong <gong.chen@linux.intel.com>
|
|
Coverage test cases are only for white-box testing during development
of some RAS features in the kernel. By now they are totally obsolete.
Mask these test cases to avoid confusing users.
They will be removed from the test suite after some time if no one
complains.
Signed-off-by: Chen Gong <gong.chen@linux.intel.com>
|
|
This fixes a compile warning.
open(2) manpage says:
... mode specifies the permissions to use in case a new
file is created. This argument must be supplied when
O_CREAT is specified in flags; ...
Signed-off-by: Thomas Renninger <trenn@suse.de>
Signed-off-by: Chen Gong <gong.chen@linux.intel.com>
|
|
Signed-off-by: Thomas Renninger <trenn@suse.de>
Signed-off-by: Chen Gong <gong.chen@linux.intel.com>
|
|
_XOPEN_SOURCE=500 must be defined for pread
but this will result in MAP_ANONYMOUS not being defined
-> also define _BSD_SOURCE for MAP_ANONYMOUS
Signed-off-by: Thomas Renninger <trenn@suse.de>
Signed-off-by: Chen Gong <gong.chen@linux.intel.com>
|
|
Some shebang lines are missing from the bash script headers.
Signed-off-by: Thomas Renninger <trenn@suse.de>
Signed-off-by: Chen Gong <gong.chen@linux.intel.com>
|
|
Add some information to remind users of possible reasons
when failures occur.
Signed-off-by: Chen Gong <gong.chen@linux.intel.com>
|
|
The output from some dialog versions contains double quotes even if
--separate-output is used. If so, strip them to ensure the output
looks like regular dialog output.
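Stripping the quotes can be as small as the filter below. This is only a sketch of the idea; strip_quotes is a hypothetical name and the sample items are made up.

```shell
# Remove double quotes from a list of selected items, one item per line.
strip_quotes()
{
	tr -d '"'
}

# Example: printf '"case1"\n"case2"\n' | strip_quotes
```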
Signed-off-by: Chen Gong <gong.chen@linux.intel.com>
|
|
On some platforms the OS doesn't support the notrigger parameter.
In this situation the injection procedure is dangerous
because it may cause a system oops/crash. If this parameter is not
available, the test should be terminated.
Signed-off-by: Chen Gong <gong.chen@linux.intel.com>
|
|
On some platforms PFA will not be triggered so that the PFA test
can't finish. So the timeout functionality is necessary to avoid
endless PFA test.
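A timeout of this kind can be sketched as a polling loop. The helper name wait_for, the one-second polling interval, and the check command in the usage comment are assumptions made for illustration.

```shell
# Hypothetical helper: run a check command once per second until it
# succeeds or the timeout (in seconds) expires.
wait_for()
{
	local timeout=$1 waited=0
	shift
	until "$@"; do
		if [ "$waited" -ge "$timeout" ]; then
			return 1
		fi
		sleep 1
		waited=$((waited + 1))
	done
	return 0
}

# Possible use (not executed here), with a made-up check function:
# wait_for 300 pfa_triggered || echo "PFA was not triggered within 300 seconds"
```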
Signed-off-by: Chen Gong <gong.chen@linux.intel.com>
|
|
If the ERST table is full, the test can't begin. To avoid
this potential issue, if an ERST record exists, erase
one record to release storage space and let the test
go on.
Because the ERST test may damage the data in the ERST
table, please back up the valid data in the ERST to
another safe place.
Signed-off-by: Chen Gong <gong.chen@linux.intel.com>
|
|
Some EDAC modules stop mcelog from collecting the error log from
the kernel mcelog buffer, which breaks the mcelog PFA function.
To avoid interference from the EDAC module, remove the specific EDAC
module before the test and restore it after the test.
Signed-off-by: Chen Gong <gong.chen@linux.intel.com>
|
|
On some platforms the original PFA case doesn't work well because
no actual read/write happens in time. This patch strengthens
the read/write operations to ensure the error can be triggered.
Signed-off-by: Chen Gong <gong.chen@linux.intel.com>
|
|
Some test scripts aren't interpreted correctly on some Linux distributions,
such as Ubuntu. Change the default *sh* to *bash*.
Signed-off-by: Shaoyong Wang <shaoyongx.wang@intel.com>
Signed-off-by: Chen Gong <gong.chen@intel.com>
|
|
This patch adds two SRAR functional test cases (DCU & IFU). The
SRAR test is highly BIOS dependent, so if the BIOS is bogus, the system
will hang or panic. By default these two test cases are
disabled; if one wants to test SRAR, please enable them.
Signed-off-by: Chen Gong <gong.chen@linux.intel.com>
|
|
On some platforms old methods can't find debugfs correctly,
so a new way via /proc/mounts is used to find debugfs path.
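Looking up the mount point through /proc/mounts could be done roughly as below. The sketch matches on the filesystem type column; find_debugfs and its optional file argument are hypothetical names, not the functions used in the test suite.

```shell
# Print the first debugfs mount point found in a /proc/mounts-style file.
# Fields per line: device, mount point, filesystem type, options, ...
find_debugfs()
{
	awk '$3 == "debugfs" { print $2; exit }' "${1:-/proc/mounts}"
}

# Possible use (not executed here):
# debugfs=$(find_debugfs)
# [ -n "$debugfs" ] || echo "debugfs is not mounted"
```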
Signed-off-by: Chen Gong <gong.chen@linux.intel.com>
|
|
Many minor fixes are added. Some for compatibility, some for
enhancement, and the others for bug fixes.
Signed-off-by: Chen Gong <gong.chen@linux.intel.com>
|
|
Old logic will filter out comment lines and the words containing
on/off letters in case list files when executing case selecting.
Signed-off-by: Shaoyong Wang <shaoyongx.wang@intel.com>
Signed-off-by: Chen Gong <gong.chen@linux.intel.com>
|
|
mcemenu and runmcetest are shell files and should own 'x' bit.
Signed-off-by: Chen Gong <gong.chen@linux.intel.com>
|
|
Signed-off-by: Andi Kleen <ak@linux.intel.com>
|
|
This new design reorganizes the entire structure of MCE-test. After
applying the new structure, MCE-test has a new unified output format
and interface.
In principle there is no functional change in this patch. Only some
minor fixes and updates are added, and a few new test cases such as
PFA are merged. Other test cases will be applied after this
change is merged into the current MCE-test.
Signed-off-by: Chen Gong <gong.chen@linux.intel.com>
Signed-off-by: Andi Kleen <ak@linux.intel.com>
|
|
param_extension is a new module parameter to support
param1/param2 as a BIOS extension for a specific vendor.
By default the tests need to enable this parameter
to get param1/param2.
Signed-off-by: Chen Gong <gong.chen@linux.intel.com>
Signed-off-by: Andi Kleen <ak@linux.intel.com>
|
|
Signed-off-by: Andi Kleen <ak@linux.intel.com>
|
|
This is part of the SRAR test cases. It is used
to test DCU error happening under user land and
other CPUs working in the user context, kernel
context, NMI context and IRQ context.
Signed-off-by: Chen Gong <gong.chen@linux.intel.com>
Signed-off-by: Andi Kleen <ak@linux.intel.com>
|
|
reformat erst-inject.c to make it to follow UNIX style
Signed-off-by: Chen Gong <gong.chen@linux.intel.com>
Signed-off-by: Andi Kleen <ak@linux.intel.com>
|
|
1) After updating the makefile rule in the last patch, I forgot to update
the corresponding shell script. The shell script's mode attribute was
not correct either.
2) Update the erst-inject tool to provide a friendlier prompt.
Signed-off-by: Chen Gong <gong.chen@linux.intel.com>
Signed-off-by: Andi Kleen <ak@linux.intel.com>
|
|
This case is used to test read/write/clear operations
on ERST.
Note: please use this case on kernel >= 2.6.39-rc1.
For more details, please refer to the test case itself.
BTW, this case doesn't consider the situation such as duplicate
or missing id because current firmware has bugs. It will be
updated after the firmware fixes this issue.
V3 -> V2: Makefile without recursive make
V2 -> V1: add copyright information
Signed-off-by: Chen Gong <gong.chen@linux.intel.com>
Signed-off-by: Andi Kleen <ak@linux.intel.com>
|
|
The guest_tmp usage is totally wrong. It assumes that
the same directory exists on the host and the guest. In fact, the
definition is only correct for the guest system. Otherwise,
the file guest_tmp can't be transferred to the host correctly.
Signed-off-by: Chen Gong <gong.chen@linux.intel.com>
|
|
When first connecting to the guest OS, the guest OS will transfer
its public key fingerprint to the host OS. To avoid interactive
operation in the test procedure, no strict check is necessary.
Signed-off-by: Chen Gong <gong.chen@linux.intel.com>
|
|
1) latest qemu monitor output format is changed, so
update the condition check
2) it looks like the starting anonymous memory addresses of simple_process
can't be used as injection addresses. Just skip them.
Signed-off-by: Chen Gong <gong.chen@linux.intel.com>
|
|
Here is the fix list:
1) rc3.d shouldn't be the default start position. It should be
assigned according to /etc/inittab.
2) When a test case quits unexpectedly, qemu should be killed, too.
3) Delete an extra local parameter.
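Deriving the rc directory from /etc/inittab, as the first item describes, might look like this sketch. It assumes the classic SysV "id:N:initdefault:" entry; default_runlevel and its optional file argument are hypothetical names.

```shell
# Print the default runlevel from an inittab-style file.
# Expected entry format: id:<runlevel>:initdefault:
default_runlevel()
{
	awk -F: '$1 !~ /^#/ && $3 == "initdefault" { print $2; exit }' \
		"${1:-/etc/inittab}"
}

# Possible use (not executed here):
# rc_dir=/etc/rc$(default_runlevel).d
```

Commented-out entries are skipped, so a disabled "#id:3:initdefault:" line does not shadow the active one.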
Signed-off-by: Chen Gong <gong.chen@linux.intel.com>
|
|
Add more content to the README to make it more readable and
operable. Besides the README update, some related
patches are added to the mce-test suite, too.
Signed-off-by: Chen Gong <gong.chen@linux.intel.com>
|
|
Some operations in the procedure of creating the guest image
can be done automatically, such as copying simple_process
and the page-types tool into the guest image.
Another update is about public/private keys. The original usage
may break the path relationship because the user can set the
public/private key file paths independently without HOST_DIR involved.
These settings are useless, so delete these options.
Signed-off-by: Chen Gong <gong.chen@linux.intel.com>
|
|
Here is the list:
1) EARLYKILL is defined but not used
2) some echo info outside functions include "!", which will
make shell confused and give wrong output
3) add page-types check on the host side
4) some $mnt usages are dangerous. Such as $mnt$get_tmp
will return wrong path
5) fix a spelling error in variable QEMU_PID
6) update p2v -> x-gpa2hva according to Ying's latest QEMU patch
7) the usage text says host_run.sh can be executed directly, but in fact
it can't. Add execute permission for it.
8) add "-h" description and option "h" should not be given a ":"
9) make "-m" option a consistent action as other options
10) add more conditions check before tests
11) simplify some statements
12) auto mount mce_inject module
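Item 8 refers to the getopts convention that a letter followed by ":" takes an argument. The sketch below shows the distinction; parse_opts and the variables mem and show_help are hypothetical and do not come from host_run.sh.

```shell
# Hypothetical option parser: -m takes an argument, -h does not.
# With "h:" in the option string, "-h -m 512" would swallow "-m" as
# the argument of -h, which is why "h" must not be followed by ":".
parse_opts()
{
	local opt OPTIND=1
	mem=
	show_help=0
	while getopts "hm:" opt; do
		case $opt in
		h) show_help=1 ;;
		m) mem=$OPTARG ;;
		*) return 1 ;;
		esac
	done
}
```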
Signed-off-by: Chen Gong <gong.chen@linux.intel.com>
|
|
This reverts commit 916cfd584ec37aa3dec3aae25b265e2701b35246.
|
|
This reverts commit b09f37e5d0d93d33fd5930222cc106708d85e1ed.
|
|
This reverts commit 5c854ab100dcbd6a445a0c07e2f35f40fefe2a59.
|
|
Add more content to the README to make it more readable and
operable. Besides the README update, some related
patches are added to the mce-test suite, too.
Signed-off-by: Chen Gong <gong.chen@linux.intel.com>
|
|
Some operations in the procedure of creating the guest image
can be done automatically, such as copying simple_process
and the page-types tool into the guest image.
Another update is about public/private keys. The original usage
may break the path relationship because the user can set the
public/private key file paths independently without HOST_DIR involved.
These settings are useless, so delete these options.
Signed-off-by: Chen Gong <gong.chen@linux.intel.com>
|
|
I did a quick review and found some defects. Here is the list:
1) EARLYKILL is defined but not used
2) some echo info outside functions include "!", which will
make shell confused and give wrong output
3) add page-types check on the host side
4) some $mnt usages are dangerous. Such as $mnt$get_tmp
will return wrong path
5) fix a spelling error in variable QEMU_PID
6) update p2v -> x-gpa2hva according to Ying's latest QEMU patch
7) the usage text says host_run.sh can be executed directly, but in fact
it can't. Add execute permission for it.
8) add "-h" description and option "h" should not be given a ":"
9) make "-m" option a consistent action as other options
10) add more conditions check before tests
11) simplify some statements
12) auto mount mce_inject module
All of these fixes don't touch actual functions.
Signed-off-by: Chen Gong <gong.chen@linux.intel.com>
|
|
Signed-off-by: Andi Kleen <ak@linux.intel.com>
|
|
THP is supported from v2.6.38-rc1, so add a hwpoison test to make testing it easier.
Signed-off-by: Jin Dongming <jin.dongming@np.css.fujitsu.com>
Reviewed-by: Hidetoshi Seto <seto.hidetoshi@jp.fujitsu.com>
Signed-off-by: Andi Kleen <ak@linux.intel.com>
|
|
Make the parameters of write/read_hugepage() easier to understand,
and add comments for write/read_hugepage().
Signed-off-by: Jin Dongming <jin.dongming@np.css.fujitsu.com>
Acked-by: Naoya Horiguchi <n-horiguchi@ah.jp.nec.com>
Signed-off-by: Andi Kleen <ak@linux.intel.com>
|
|
The addr of write/read_hugepage() is the mapping address of the file.
So no matter how many hugepages are mapped, addr is
the head address of all hugepages.
The avoid of write/read_hugepage() is the address which should not
be touched. So it could be the head address of any hugepage.
So addr == avoid in write/read_hugepage() never holds unless
avoid is the address of the first hugepage.
This patch fixes it.
Signed-off-by: Jin Dongming <jin.dongming@np.css.fujitsu.com>
Acked-by: Naoya Horiguchi <n-horiguchi@ah.jp.nec.com>
Signed-off-by: Andi Kleen <ak@linux.intel.com>
|
|
When the cowflag is valid, the child process should copy all the hugepages of
its parent. But now, no matter what cowflag is, the child process does not do
the copy-on-write operation, because the parameter (size==0) of
write_hugepage() makes write_hugepage() do nothing.
This problem is introduced by
commit c6a4c3d950385063db705e520bc9b6cda9587f57
Author: Naoya Horiguchi <n-horiguchi@ah.jp.nec.com>
With this patch, the state of parent and child processes will be as follows:
          Before this patch                        After this patch
NO-COW    Parent and child processes are killed.   Same as before.
COW       Parent and child processes are killed.   Only parent process is killed.
(Here a process is killed by memory-failure.)
Signed-off-by: Jin Dongming <jin.dongming@np.css.fujitsu.com>
Acked-by: Naoya Horiguchi <n-horiguchi@ah.jp.nec.com>
Signed-off-by: Andi Kleen <ak@linux.intel.com>
|
|
|
|
Signed-off-by: Andi Kleen <ak@linux.intel.com>
|
|
si_addr_lsb check in sighandler() is also extended to hugepage shift.
Signed-off-by: Naoya Horiguchi <n-horiguchi@ah.jp.nec.com>
Signed-off-by: Andi Kleen <ak@linux.intel.com>
|
|
Soft offlining is driven by using options '-O' and '-x'
Signed-off-by: Naoya Horiguchi <n-horiguchi@ah.jp.nec.com>
Signed-off-by: Andi Kleen <ak@linux.intel.com>
|
|
Add three testcases for hugepage soft offlining.
Signed-off-by: Naoya Horiguchi <n-horiguchi@ah.jp.nec.com>
Signed-off-by: Andi Kleen <ak@linux.intel.com>
|
|
Add routines allocating/freeing hugepages of the following types:
- hugepage on shared memory,
- anonymous hugepage,
- filebacked hugepage.
And also add read/write helper functions.
Signed-off-by: Naoya Horiguchi <n-horiguchi@ah.jp.nec.com>
Signed-off-by: Andi Kleen <ak@linux.intel.com>
|
|
Signed-off-by: Andi Kleen <ak@linux.intel.com>
|
|
Signed-off-by: Andi Kleen <ak@linux.intel.com>
|
|
This patch makes the following changes to the mce-test suite's kvm test.
(git://git.kernel.org/pub/scm/utils/cpu/mce/mce-test.git)
. Re-enable the late kill option (-l) on host_run.sh.
. Add a virtual guest RAM size option (-m) to host_run.sh that gets passed to
qemu-system-x86_64. This allows for testing guests >= 4069M in size.
. Allow for guest .img files to consist of LVM partitions.
Signed-off-by: Dean Nelson <dnelson@redhat.com>
Signed-off-by: Andi Kleen <ak@linux.intel.com>
|
|
Add failure statistics and an exit value check, so that
it is easy to run automated tests.
Signed-off-by: Chen Gong <gong.chen@linux.intel.com>
Signed-off-by: Andi Kleen <ak@linux.intel.com>
|
|
Signed-off-by: Andi Kleen <ak@linux.intel.com>
|
|
Signed-off-by: Andi Kleen <ak@linux.intel.com>
|
|
If hwpoison.sh is executed from the top level
Makefile, it doesn't compile/install the required binaries. The Makefile
in mce-test/stress works correctly.
...
Test aborted by unexpected error!
[error] !!! no bin subdir there !!!
Reported-by: Evan McNabb <emcnabb@redhat.com>
Signed-off-by: Haicheng Li <haicheng.li@intel.com>
Signed-off-by: Andi Kleen <ak@linux.intel.com>
|
|
This patch is to fix below problem:
...
result summary:
fs_metadata -- no test finished
details: /root/git/mce-test/stress/log/fs_metadata/fs_metadata.log
fsck.ext3 -- fsck on /dev/loop5 got pass
totally 1 task-groups report failures
...
...
[04-05 16:29:08] thread 0 starts with pid 25027
tee: ./hwpoison/fs_metadata/k-threads.pid: No such file or directory
25027
Signed-off-by: Evan McNabb <emcnabb@redhat.com>
Signed-off-by: Haicheng Li <haicheng.li@intel.com>
Signed-off-by: Andi Kleen <ak@linux.intel.com>
|
|
Signed-off-by: Andi Kleen <ak@linux.intel.com>
|
|
Supply correct err() and errmsg macro, don't use implicit
ones from glibc with different prototype
Signed-off-by: Andi Kleen <ak@linux.intel.com>
|
|
Signed-off-by: Andi Kleen <ak@linux.intel.com>
|
|
So far not hooked up to standard "make test" because
the kernel patches are not in yet.
Signed-off-by: Andi Kleen <ak@linux.intel.com>
|
|
Signed-off-by: Andi Kleen <ak@linux.intel.com>
|
|
Signed-off-by: Andi Kleen <ak@linux.intel.com>
|
|
Signed-off-by: Andi Kleen <ak@linux.intel.com>
|
|
In case something goes wrong in the kernel with the poisoned
mappings
Signed-off-by: Andi Kleen <ak@linux.intel.com>
|
|
If I run hwpoison.sh without -C option I get the following errors:
./hwpoison.sh: line 366: [: -eq: unary operator expected
./hwpoison.sh: line 371: [: -gt: unary operator expected
./hwpoison.sh: line 372: [: -eq: unary operator expected
The reason is that g_children is empty when it should be zero.
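The failure mode and one possible guard can be reproduced in a few lines. This is a sketch of the general technique (a default value plus quoting), not necessarily the exact fix that was applied to hwpoison.sh.

```shell
g_children=""                 # empty, as when the -C option is not given

# Unquoted and empty, "[ $g_children -eq 0 ]" expands to "[ -eq 0 ]",
# which triggers "unary operator expected".

g_children=${g_children:-0}   # fall back to zero before any numeric test
if [ "$g_children" -eq 0 ]; then
	echo "no children requested"
fi
```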
Reported-by: Evan McNabb <emcnabb@redhat.com>
Signed-off-by: Haicheng Li <haicheng.li@intel.com>
Signed-off-by: Andi Kleen <ak@linux.intel.com>
|
|
Not strictly needed due to line buffering, but more
future proof.
Signed-off-by: Andi Kleen <ak@linux.intel.com>
|
|
This way the child won't fail if there were already other
errors.
Signed-off-by: Andi Kleen <ak@linux.intel.com>
|
|
Based on a report from Evan McNabb
Signed-off-by: Andi Kleen <ak@linux.intel.com>
|
|
Signed-off-by: Haicheng Li <haicheng.li@linux.intel.com>
Signed-off-by: Andi Kleen <ak@linux.intel.com>
|
|
Signed-off-by: Haicheng Li <haicheng.li@linux.intel.com>
Signed-off-by: Andi Kleen <ak@linux.intel.com>
|
|
Signed-off-by: Haicheng Li <haicheng.li@linux.intel.com>
Signed-off-by: Andi Kleen <ak@linux.intel.com>
|
|
Signed-off-by: Haicheng Li <haicheng.li@linux.intel.com>
Signed-off-by: Andi Kleen <ak@linux.intel.com>
|
|
Signed-off-by: Haicheng Li <haicheng.li@linux.intel.com>
Signed-off-by: Andi Kleen <ak@linux.intel.com>
|
|
$ltp_root/pan/
- use invalid() to exit when error is related to command option.
- add die() to let stress tester work fine with common func check_debugfs().
Signed-off-by: Haicheng Li <haicheng.li@linux.intel.com>
Signed-off-by: Andi Kleen <ak@linux.intel.com>
|
|
Signed-off-by: Haicheng Li <haicheng.li@linux.intel.com>
Signed-off-by: Andi Kleen <ak@linux.intel.com>
|
|
Signed-off-by: Haicheng Li <haicheng.li@linux.intel.com>
Signed-off-by: Andi Kleen <ak@linux.intel.com>
|
|
Signed-off-by: Haicheng Li <haicheng.li@linux.intel.com>
Signed-off-by: Andi Kleen <ak@linux.intel.com>
|
|
Signed-off-by: Haicheng Li <haicheng.li@linux.intel.com>
Signed-off-by: Andi Kleen <ak@linux.intel.com>
|
|
Signed-off-by: Haicheng Li <haicheng.li@linux.intel.com>
Signed-off-by: Andi Kleen <ak@linux.intel.com>
|
|
Signed-off-by: Haicheng Li <haicheng.li@linux.intel.com>
Signed-off-by: Andi Kleen <ak@linux.intel.com>
|
|
On the Ubuntu platform, sh is linked to dash so that
all of these shell scripts can't run correctly. It needs to
be substituted with BASH.
Signed-off-by: Chen Gong <gong.chen@linux.intel.com>
Signed-off-by: Andi Kleen <ak@linux.intel.com>
|
|
Added clearing and backing up of old logs for the kdump driver. As testcases
cause a reboot and the script is re-run after each reboot, the test ends up in
an infinite loop (as the setup stamp is moved).
The second fix is for loading the mce-inject module. The kdump test driver is
apparently run with "set -ex", so all lines that can return non-zero (and
should not stop script execution) must be used only as part of a
conditional.
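The rule about "set -e" can be illustrated with a command that is allowed to fail. The probe function below is a stand-in invented for this sketch; it is not part of the kdump driver.

```shell
set -e

# Hypothetical probe that may legitimately fail.
probe()
{
	return 1
}

# A bare "probe" line here would end the script under "set -e".
# As a condition, its non-zero status is only tested, not fatal.
if probe; then
	result=present
else
	result=absent
fi

# The left-hand side of "||" is exempt from "set -e" as well.
probe || result="$result (fallback ran)"
echo "$result"
```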
Signed-off-by: Andi Kleen <ak@linux.intel.com>
|
|
Signed-off-by: Andi Kleen <ak@linux.intel.com>
|
|
This patch is to add KVM RAS test suite into mce-test, which is a
collection of test scripts for testing the Linux kernel MCE processing
features in KVM guest system.
Signed-off-by: Jiajia Zheng <jiajia.zheng@intel.com>
Signed-off-by: Huang Ying <ying.huang@intel.com>
|
|
1. auto-load einj module before apei test begins and update APEI_IF
definition to a proper place
2. fix typos in the check_debugfs
3. enhance the module check before stress test
Signed-off-by: Chen Gong <gong.chen@linux.intel.com>
Signed-off-by: Andi Kleen <ak@linux.intel.com>
|
|
Signed-off-by: Andi Kleen <ak@linux.intel.com>
|
|
1. test path shouldn't be placed under "/" in the stress/hwpoison.sh
2. to clear the log history, backup old test log with different names.
3. add execution attribute for apei test case
Signed-off-by: Chen Gong <gong.chen@linux.intel.com>
Signed-off-by: Andi Kleen <ak@linux.intel.com>
|
|
Clean up some confusing execution paths.
Signed-off-by: Chen Gong <gong.chen@linux.intel.com>
Signed-off-by: Andi Kleen <ak@linux.intel.com>
|
|
g_ltppan needs to be updated after g_ltproot is set.
BTW, I think g_ltppan should be directly under g_ltproot. It is
clearer.
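The ordering issue comes from shell variables being expanded at assignment time. In the sketch below the directory values are made up and the "pan" subdirectory layout is an assumption; only the variable names come from the commit message.

```shell
g_ltproot=/default/ltp
g_ltppan=$g_ltproot/pan     # expanded now; later changes are not tracked

g_ltproot=/opt/ltp          # e.g. overridden later by a command-line option
stale=$g_ltppan             # still points below the old root

g_ltppan=$g_ltproot/pan     # recompute once the final root is known
echo "stale: $stale, updated: $g_ltppan"
```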
Signed-off-by: Chen Gong <gong.chen@linux.intel.com>
Signed-off-by: Andi Kleen <ak@linux.intel.com>
|
|
1. more graceful output in the check_debugfs
2. eliminate trivial usage of parameter "debugfs" in hwpoison.sh
3) add some additional checks before the driver kicks off; otherwise
one may see messages such as "Failed: MCE log is different from input"
when in fact the mce_inject module simply isn't loaded.
Signed-off-by: Chen Gong <gong.chen@linux.intel.com>
Signed-off-by: Andi Kleen <ak@linux.intel.com>
|
|
Update the definition of APEI_IF. Now it can be located anywhere.
Signed-off-by: Chen Gong <gong.chen@linux.intel.com>
Signed-off-by: Andi Kleen <ak@linux.intel.com>
|
|
check_debugfs should not serve only mce,
so add a new function dedicated to mce.
Signed-off-by: Chen Gong <gong.chen@linux.intel.com>
Signed-off-by: Andi Kleen <ak@linux.intel.com>
|
|
Signed-off-by: Andi Kleen <ak@linux.intel.com>
|
|
Signed-off-by: Andi Kleen <ak@linux.intel.com>
|
|
Signed-off-by: Andi Kleen <ak@linux.intel.com>
|
|
Signed-off-by: Andi Kleen <ak@linux.intel.com>
|
|
This essentially renames test-simple to test
Also some minor fixes to the Makefile
Signed-off-by: Andi Kleen <ak@linux.intel.com>
|
|
This runs all the standard functional tests for a quick test in hwpoison
Signed-off-by: Andi Kleen <ak@linux.intel.com>
|
|
- Add way to specify random seed
- Add timeout
- Various new checks to be more user friendly
- Use standard option parsing
Signed-off-by: Andi Kleen <ak@linux.intel.com>
|
|
Signed-off-by: Andi Kleen <ak@linux.intel.com>
|
|
Signed-off-by: Andi Kleen <ak@linux.intel.com>
|
|
Signed-off-by: Andi Kleen <ak@linux.intel.com>
|
|
Signed-off-by: Andi Kleen <ak@linux.intel.com>
|
|
Signed-off-by: Andi Kleen <ak@linux.intel.com>
|
|
Signed-off-by: Andi Kleen <ak@linux.intel.com>
|
|
This is to handle kernel where the filter defaults to off.
Signed-off-by: Andi Kleen <ak@linux.intel.com>
|
|
Signed-off-by: Andi Kleen <ak@linux.intel.com>
|
|
Signed-off-by: Andi Kleen <ak@linux.intel.com>
|
|
Signed-off-by: Andi Kleen <ak@linux.intel.com>
|
|
Signed-off-by: Andi Kleen <ak@linux.intel.com>
|
|
Signed-off-by: Naoya Horiguchi <n-horiguchi@ah.jp.nec.com>
Signed-off-by: Andi Kleen <ak@linux.intel.com>
|
|
- Fix indentation
- Always report failure to parent
Signed-off-by: Andi Kleen <ak@linux.intel.com>
|
|
This patch adds the testcases for mmap/IPV shared pages.
The purpose of these testcases is as follows:
- We can check whether a process A is killed expectedly when it accesses
the page shared with and hwpoisoned by another process B
(in the late killing case).
- We can check whether a process A is killed at once when another process B
injected hwpoison into the page shared by both of them
(in the early killing case).
ChangeLog:
- Add synchronization code between parent and child process with semaphore.
- Share the common function do_shared() between mmap case and IPV case.
- Add error check code.
Signed-off-by: Naoya Horiguchi <n-horiguchi@ah.jp.nec.com>
Signed-off-by: Andi Kleen <ak@linux.intel.com>
|
|
Signed-off-by: Andi Kleen <ak@linux.intel.com>
|
|
Signed-off-by: Andi Kleen <ak@linux.intel.com>
|
|
This is the fs-metadata workload. fs-metadata is designed to test i-node
operations under heavy workload and make sure every i-node operation gets
the expected result. In detail, it first generates a huge directory
hierarchy on the target disk, then performs unlink operations on this
directory hierarchy and duplicates the directory, and finally
checks whether the two directories are the same as expected.
Acked-by: Andi Kleen <andi.kleen@intel.com>
Signed-off-by: Shaohui Zheng <shaohui.zheng@intel.com>
Signed-off-by: Haicheng Li <haicheng.li@intel.com>
Signed-off-by: Andi Kleen <ak@linux.intel.com>
|
|
page-poisoning test program is an extension of tinjpage test program with a
multi-process model. It spawns thousands of processes that inject HWPoison
errors into various pages simultaneously through the madvise syscall. Then it
checks if these errors get handled correctly, i.e. whether each test process
receives or doesn't receive a SIGBUS signal as expected.
In detail, page-poisoning is designed to cover all possible userspace page
types via the following two test operations:
- anonymous pages operations.
- file data operations.
Acked-by: Andi Kleen <andi.kleen@intel.com>
Signed-off-by: Haicheng Li <haicheng.li@intel.com>
Signed-off-by: Andi Kleen <ak@linux.intel.com>
|
|
Documentation of MCE stress test suite.
Reviewed-by: Jiajia Zheng <jiajia.zheng@intel.com>
Signed-off-by: Haicheng Li <haicheng.li@intel.com>
Signed-off-by: Andi Kleen <ak@linux.intel.com>
|
|
The MCE stress test suite is a collection of tools and test scripts intended
to stress test the Linux kernel MCA high level handlers,
including HWPoison page recovery, soft page offline, and so on.
In general, this test suite is designed to do stress testing through various
test interfaces, i.e. madvise syscall, HWPoison page injector, and APEI
injector (see ACPI4.0 spec). It supports most popular
Linux File Systems (FS); that is, there is an option for the user to specify
which FS type the test should run on.
The MCE stress test suite consists of four parts: test driver, workload
controller, customized workloads, and background workloads.
The main test idea is described as below:
- Test driver launches various customized workloads to continuously generate
lots of pages with expected page states. Note that all of these workloads know
their expected results, which should not be affected by the Linux MCE high
level handlers.
- Then test driver injects MCE errors into these pages through either the
madvise syscall, the HWPoison injector, or the APEI injector. While the Linux
kernel handles these MCE errors, all the workloads continue running normally.
- After a long run, test driver collects the test result of each
workload to see if any unexpected failures happened. In this way, it can
decide if any bug is found.
- If any system panic or FS corruption happens, there must be a
bug. This is the bottom line for deciding whether the test passes.
Test driver (a.k.a hwpoison.sh) drives the whole test procedure. It's
responsible for managing test environment, setting up error injection
interface, controlling test progress, launching workloads, injecting page
errors, as well as recordng test logs and reportng test result.
Acked-by: Andi Kleen <andi.kleen@intel.com>
Signed-off-by: Haicheng Li <haicheng.li@intel.com>
Signed-off-by: Andi Kleen <ak@linux.intel.com>
|
|
My debugging shows that mmap() with the MLOCKED flag sets the page dirty again,
so the kernel handler never enters the clean page handling logic.
So use fsync() after mmap() to make the page clean.
Signed-off-by: Haicheng Li <haicheng.li@intel.com>
Tested-by: Dean Nelson <dnelson@redhat.com>
Signed-off-by: Andi Kleen <ak@linux.intel.com>
|
|
Signed-off-by: Andi Kleen <ak@linux.intel.com>
|
|
(requires hwpoison-2.6.32)
Signed-off-by: Andi Kleen <ak@linux.intel.com>
|
|
tsrc tests hwpoison
Signed-off-by: Andi Kleen <ak@linux.intel.com>
|
|
Update howto.txt.
- recommend stopping cron before MCE testing.
- add an introduction to loop-mce-test as well.
Signed-off-by: Zheng Jiajia <jiajia.zheng@intel.com>
|
|
Some parameter changes and other minor changes.
Signed-off-by: Huang Ying <ying.huang@intel.com>
|
|
Rename to tools/loop-mce-test.sh to follow naming convention. chmod +x.
Signed-off-by: Huang Ying <ying.huang@intel.com>
|
|
Signed-off-by: Andi Kleen <ak@linux.intel.com>
|
|
Based on work from Dean Nelson
Signed-off-by: Zheng Jiajia <jiajia.zheng@intel.com>
Signed-off-by: Andi Kleen <ak@linux.intel.com>
|
|
|
|
Signed-off-by: Andi Kleen <ak@linux.intel.com>
|
|
- all second errors are optional because the VFS reports only once
- hole errors are optional because we can't propagate errors for holes
Signed-off-by: Andi Kleen <ak@linux.intel.com>
|
|
Signed-off-by: Andi Kleen <ak@linux.intel.com>
|
|
Signed-off-by: Andi Kleen <ak@linux.intel.com>
|
|
Also add Fengguang as author
Signed-off-by: Andi Kleen <ak@linux.intel.com>
|
|
Signed-off-by: Andi Kleen <ak@linux.intel.com>
|
|
On machines with SER_P, machine_check_poll in the kernel filters out MCEs
that have MCI_STATUS_S set instead of MCI_STATUS_UC. So for test cases that
run on machines both with and without SER_P, both UC and S should be set.
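The filtering rule described above can be sketched in shell arithmetic. This is only an illustration of the rule as stated in this message, not the kernel code: the function name poll_skips is hypothetical, and the bit positions (UC = bit 61, S = bit 56) are the ones used for IA32_MCi_STATUS.

```sh
#!/bin/sh
# Bit positions in IA32_MCi_STATUS.
MCI_STATUS_UC=$(( 1 << 61 ))   # uncorrected error
MCI_STATUS_S=$(( 1 << 56 ))    # signaled (only meaningful with SER_P)

# poll_skips STATUS SER_P  (hypothetical helper)
# Succeeds if the poll path would filter out a record with this status:
# with SER_P (SER_P=1) the S bit is checked, otherwise the UC bit.
poll_skips() {
    if [ "$2" -eq 1 ]; then
        mask=$MCI_STATUS_S
    else
        mask=$MCI_STATUS_UC
    fi
    [ $(( $1 & mask )) -ne 0 ]
}

# A status with both bits set is filtered out on either kind of machine.
status=$(( MCI_STATUS_UC | MCI_STATUS_S ))
printf '0x%x\n' "$status"      # prints 0x2100000000000000
```

A status with only UC set would be filtered on a machine without SER_P but not on one with it, which is why test cases meant for both kinds of machine set both bits.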
Signed-off-by: Huang Ying <ying.huang@intel.com>
|
|
SLE11 changed the kdump name from "kdump" to "boot.kdump". So
fix it in the usual way.
Signed-off-by: Chen Gong <gong.chen@intel.com>
|
|
Update the document for testing with the kdump test driver.
Signed-off-by: Jiajia Zheng <jiajia.zheng@intel.com>
|
|
Add a new test group -- poll_noser, add three cases -- fatal_poll,
srar_poll and uc_poll to test the conditional control statement in
machine_check_poll.
Signed-off-by: Jiajia Zheng <jiajia.zheng@intel.com>
Signed-off-by: Huang Ying <ying.huang@intel.com>
|
|
|
|
There was a conflict with the MADV_POISON value, so update to the
new one. Note -- you need a new kernel for testing now.
Signed-off-by: Andi Kleen <ak@linux.intel.com>
|
|
Add config/simple.conf and config/kdump.conf to make "make test" work
again. Only the minimal test cases that work on all machines are added to
these config files.
Signed-off-by: Huang Ying <ying.huang@intel.com>
|
|
Because the fake panic configuration file moved from sysfs to debugfs.
Signed-off-by: Huang Ying <ying.huang@intel.com>
|
|
LTP uses the Linux kernel coding style now. So fix some coding style issues
reported by checkpatch.pl.
Signed-off-by: Huang Ying <ying.huang@intel.com>
|
|
Because it has been removed.
Signed-off-by: Huang Ying <ying.huang@intel.com>
|
|
It is not used now.
Signed-off-by: Huang Ying <ying.huang@intel.com>
|
|
|
|
Signed-off-by: Andi Kleen <ak@linux.intel.com>
|
|
Cleanup README. Update kernel requirement in doc/howto.txt.
Signed-off-by: Huang Ying <ying.huang@intel.com>
|
|
Two cases, srao_mem_scrub_noripv and srao_ewb_noripv, were added to
panic_ucr; update the document accordingly.
Signed-off-by: Jiajia Zheng <jiajia.zheng@intel.com>
|
|
expecterr() and optionalerr() assume there was an error if the given
return value of a function call is 0, and no error otherwise. But this
assumption does not always hold (e.g. write(2)). So check for errors
on the caller side and then pass in the result.
Signed-off-by: Hidehiro Kawai <hidehiro.kawai.ez@hitachi.com>
Signed-off-by: Andi Kleen <ak@linux.intel.com>
|
|
In the file dirty case, the test tries to write some data to a test file
opened with O_RDONLY and gets an unexpected error. Fix it by
opening the file with the O_RDWR flag.
Signed-off-by: Hidehiro Kawai <hidehiro.kawai.ez@hitachi.com>
Signed-off-by: Andi Kleen <ak@linux.intel.com>
|
|
Because they cause system panic.
Signed-off-by: Huang Ying <ying.huang@intel.com>
|
|
The new cases are srao_ewb_noripv and srao_mem_scrub_noripv. After either
one of them is injected, the result of the case seems right. But the panic
type and exception message used as the standards for checking the result do
not actually make sense, because they are "NULL". So I modify them.
Signed-off-by: Jin Dongming <jin.dongming@np.css.fujitsu.com>
|
|
add two cases to recoverable_ucr.
--srao_mem_scrub_noripv
--srao_ewb_noripv
Signed-off-by: Jiajia Zheng <jiajia.zheng@intel.com>
|
|
Two cases were added, one (srar_no_en) for panic_ucr and the
other (ucna_over) for poll_ucr. Update the document for them.
Signed-off-by: Jin Dongming <jin.dongming@np.css.fujitsu.com>
|
|
Add two cases: ucna_over in poll_ucr and srar_no_en in panic_ucr.
Signed-off-by: Jiajia Zheng <jiajia.zheng@intel.com>
Signed-off-by: Huang Ying <ying.huang@intel.com>
|
|
Add document for test cases in recoverable_ucr.
- add soft-inj_recoverable_ucr.txt
Signed-off-by: Jiajia Zheng <jiajia.zheng@intel.com>
Signed-off-by: Huang Ying <ying.huang@intel.com>
|
|
Signed-off-by: Andi Kleen <ak@linux.intel.com>
|
|
Signed-off-by: Andi Kleen <ak@linux.intel.com>
|
|
Signed-off-by: Andi Kleen <ak@linux.intel.com>
|
|
Signed-off-by: Andi Kleen <ak@linux.intel.com>
|
|
It was too difficult to compile with kernel includes, so add
a hierarchy of fake kernel includes to stub out kernel functions.
Signed-off-by: Andi Kleen <ak@linux.intel.com>
|
|
And delete tmp file after shell exit
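This message does not show how the cleanup is done. One common way to remove a temporary file when a shell exits is an EXIT trap; the following is a minimal sketch under that assumption, and the function name with_tmpfile is hypothetical.

```sh
#!/bin/sh
# with_tmpfile  (hypothetical helper)
# Runs in a subshell: creates a temporary file, registers an EXIT trap that
# removes it, and prints the file name. The file is gone once the subshell
# has exited, however it exits.
with_tmpfile() {
    (
        tmp=$(mktemp) || exit 1
        trap 'rm -f "$tmp"' EXIT
        echo "$tmp"
    )
}
```

The same trap line can be placed at the top level of a script, so the file is removed when the script's own shell exits.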
Signed-off-by: Andi Kleen <ak@linux.intel.com>
|
|
tools/mce_shell.sh simulates the environment of the mce-test driver and
test case scripts, and is used for debugging. mce-test library functions
can be invoked interactively from mce-shell.
|
|
Update some of the documents that are out of date, and add new
documents for test cases that do not have one.
- modified soft-inj_non-panic.txt
- modified soft-inj_panic.txt
- modified soft-inj_panic_npcc.txt
- add soft-inj_panic_noser.txt
- add soft-inj_panic_ucr.txt
- add soft-inj_poll_ucr.txt
Signed-off-by: Jin Dongming <jin.dongming@np.css.fujitsu.com>
|
|
Some of the reference cases are the same as the test cases, so I think
they are no longer necessary.
- delete the reference case of fatal
- delete the reference case of fatal_over
Signed-off-by: Jin Dongming <jin.dongming@np.css.fujitsu.com>
|
|
Corrected machine checks should be logged in do_machine_check if the system
is going to panic.
Signed-off-by: Huang Ying <ying.huang@intel.com>
|
|
On some machines the kernel log is output too slowly to be caught.
There is also a random sleep mechanism for random testing.
So move the random sleep before kernel log extraction, and extend the
sleep time to at least 5 seconds.
Reported-by: Jin Dongming <jin.dongming@np.css.fujitsu.com>
|
|
Because it is obsolete now.
Signed-off-by: Huang Ying <ying.huang@intel.com>
|
|
On some systems /proc/sys/kernel/panic_on_oops is set to "1" by
default, so we want to resolve that trouble for mcetest with this
function.
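The function itself is not shown in this message. A minimal sketch of one plausible approach, assuming the intent is to clear the setting for the duration of a test and restore it afterwards, is given below. The function and variable names are hypothetical, and writing to the real file requires root.

```sh
#!/bin/sh
# File to operate on. It is a variable so the sketch can be tried on an
# ordinary file; on a real system it is the /proc entry below.
PANIC_ON_OOPS=${PANIC_ON_OOPS:-/proc/sys/kernel/panic_on_oops}

# save_and_clear_panic_on_oops  (hypothetical helper)
# Remember the current value, then write 0 so an oops does not panic.
save_and_clear_panic_on_oops() {
    saved_panic_on_oops=$(cat "$PANIC_ON_OOPS") || return 1
    echo 0 > "$PANIC_ON_OOPS"
}

# restore_panic_on_oops  (hypothetical helper)
# Put back the value remembered by save_and_clear_panic_on_oops.
restore_panic_on_oops() {
    echo "$saved_panic_on_oops" > "$PANIC_ON_OOPS"
}
```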
Signed-off-by: Jin Dongming <jin.dongming@np.css.fujitsu.com>
|
|
Start background testing in its own process group, so all processes
in background testing process group can be killed with:
kill -TERM -$pgrp
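How the process group is created is not shown here. A minimal sketch, assuming the setsid utility is available and the script runs without job control (the usual case for a non-interactive shell), follows. The function names and the BG_PGRP variable are hypothetical.

```sh
#!/bin/sh
# start_bg_group CMD [ARGS...]  (hypothetical helper)
# Run CMD in the background in a new session, and therefore in a new
# process group. Without job control setsid does not fork, so the PID of
# the background job is also the ID of the new process group.
start_bg_group() {
    setsid "$@" &
    BG_PGRP=$!
}

# stop_bg_group PGRP  (hypothetical helper)
# Send SIGTERM to every process in the group; the leading "-" makes kill
# treat the number as a process group ID.
stop_bg_group() {
    kill -TERM "-$1"
}
```

For example, "start_bg_group sleep 600" followed later by "stop_bg_group $BG_PGRP" terminates the sleep together with any children it had started. A short delay between the two calls gives the new group time to come into existence.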
|
|
It seems that /bin/cp does not work for debugfs seq files, so use
/bin/cat instead.
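The replacement amounts to streaming the file contents with a redirect, as in this minimal sketch. The function name copy_seq_file is hypothetical.

```sh
#!/bin/sh
# copy_seq_file SRC DST  (hypothetical helper)
# Copy a file by reading it sequentially with cat and redirecting the
# output, instead of relying on cp.
copy_seq_file() {
    cat "$1" > "$2"
}
```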
|
|
$KSRC_DIR may be a symbolic link, which may break "find" when used on it.
So convert it into canonical form before use; this also checks whether
it is a valid directory.
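The conversion method is not shown in this message. One portable way to get the canonical form and the validity check in a single step is "cd" followed by "pwd -P", sketched below. The function name canonical_dir is hypothetical.

```sh
#!/bin/sh
# canonical_dir PATH  (hypothetical helper)
# Print the physical path of PATH with all symbolic links resolved.
# Fails without output if PATH is not a directory that can be entered.
canonical_dir() {
    ( CDPATH= cd -- "$1" 2>/dev/null && pwd -P )
}
```

A caller could then write:

```sh
KSRC_DIR=$(canonical_dir "$KSRC_DIR") || { echo "invalid KSRC_DIR" >&2; exit 1; }
```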
|