Hello,
This is an RFC patch intended as a first review of the basic design of --split option support.
This version automatically appends the --split option if more than one CPU is available in the kdump 2nd kernel. I guess someone probably won't like multiple vmcores being generated implicitly, without any explicit user operation. So, I'd like comments on this design first.
Another idea is to introduce a new directive to specify the number of vmcores into which we split /proc/vmcore, and then append the --split option if and only if the directive is specified with a value greater than 1.
From e6afa242829768ee0b9e58637444acf3fed4b442 Mon Sep 17 00:00:00 2001
From: HATAYAMA Daisuke d.hatayama@jp.fujitsu.com
Date: Tue, 25 Mar 2014 17:09:42 +0900
Subject: [PATCH] Add --split support for dump on filesystem
This commit implements makedumpfile --split option support, allowing filtering and compression in parallel.
In this design, the --split option is automatically appended if more than one CPU is available. Also, the number of generated dump files is automatically set to the number of online CPUs.
To support --split for dump over the network, it's necessary to add a new feature to makedumpfile so that the --split option and the -F option can be specified at the same time. This is going to be done separately.
Signed-off-by: HATAYAMA Daisuke d.hatayama@jp.fujitsu.com
---
 dracut-kdump.sh | 19 ++++++++++++++-----
 1 file changed, 14 insertions(+), 5 deletions(-)
diff --git a/dracut-kdump.sh b/dracut-kdump.sh
index d9e65ac..76494ab 100755
--- a/dracut-kdump.sh
+++ b/dracut-kdump.sh
@@ -88,6 +88,8 @@ dump_fs()
 {
     local _dev=$(findmnt -k -f -n -r -o SOURCE $1)
     local _mp=$(findmnt -k -f -n -r -o TARGET $1)
+    local _savedir="$_mp/$KDUMP_PATH/$HOST_IP-$DATEDIR"
+    local _nr_cpus=$(grep processor /proc/cpuinfo | wc -l)

     echo "kdump: dump target is $_dev"

@@ -100,16 +102,23 @@ dump_fs()
     # Remove -F in makedumpfile case. We don't want a flat format dump here.
     [[ $CORE_COLLECTOR = *makedumpfile* ]] && CORE_COLLECTOR=`echo $CORE_COLLECTOR | sed -e "s/-F//g"`

-    echo "kdump: saving to $_mp/$KDUMP_PATH/$HOST_IP-$DATEDIR/"
+    echo "kdump: saving to $_savedir"

     mount -o remount,rw $_mp || return 1
-    mkdir -p $_mp/$KDUMP_PATH/$HOST_IP-$DATEDIR || return 1
+    mkdir -p $_savedir || return 1

-    save_vmcore_dmesg_fs ${DMESG_COLLECTOR} "$_mp/$KDUMP_PATH/$HOST_IP-$DATEDIR/"
+    save_vmcore_dmesg_fs ${DMESG_COLLECTOR} "$_savedir"

     echo "kdump: saving vmcore"
-    $CORE_COLLECTOR /proc/vmcore $_mp/$KDUMP_PATH/$HOST_IP-$DATEDIR/vmcore-incomplete || return 1
-    mv $_mp/$KDUMP_PATH/$HOST_IP-$DATEDIR/vmcore-incomplete $_mp/$KDUMP_PATH/$HOST_IP-$DATEDIR/vmcore
+    if [[ $CORE_COLLECTOR = *makedumpfile* && $_nr_cpus > 1 ]] ; then
+        $CORE_COLLECTOR --split /proc/vmcore $(seq -s " " -f "$_savedir/vmcore-incomplete-%g" $_nr_cpus) || return 1
+        for i in $(seq $_nr_cpus); do
+            mv $_savedir/vmcore-incomplete-$i $_savedir/vmcore-$i
+        done
+    else
+        $CORE_COLLECTOR /proc/vmcore $_savedir/vmcore-incomplete || return 1
+        mv $_savedir/vmcore-incomplete $_savedir/vmcore
+    fi
     sync

     echo "kdump: saving vmcore complete"
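For readers unfamiliar with seq -f: the argument list the patch hands to makedumpfile --split is generated as follows. A standalone sketch, with hypothetical stand-in values for _savedir and _nr_cpus:

```shell
#!/bin/sh
# Hypothetical stand-ins for the values computed in dump_fs() above.
_savedir="/var/crash/127.0.0.1-2014.03.25"
_nr_cpus=4

# seq -f applies the printf-style format to each of 1.._nr_cpus, and
# -s " " joins the results with spaces, producing one output pathname
# per CPU for makedumpfile --split.
seq -s " " -f "$_savedir/vmcore-incomplete-%g" "$_nr_cpus"
```

With 4 CPUs this prints vmcore-incomplete-1 through vmcore-incomplete-4 under $_savedir on a single line, which is exactly the shape of argument list --split expects.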
On Tue, Mar 25, 2014 at 08:08:48PM +0900, HATAYAMA, Daisuke wrote:
Hello,
This is an RFC patch intended as a first review of the basic design of --split option support.
This version automatically appends the --split option if more than one CPU is available in the kdump 2nd kernel. I guess someone probably won't like multiple vmcores being generated implicitly, without any explicit user operation. So, I'd like comments on this design first.
Hi Hatayama,
Can you give some more details about how the --split feature of makedumpfile works? I have never used it. Why should I split the file into multiple files? And how do I get back the original single file?
Also, I don't think we should be adding --split automatically. I want to stick to the user-specified core collector and options and not add things silently.
If users want to take advantage of parallelism, they need to modify nr_cpus and the core_collector line as well, and we should document it properly.
Also, can't we take advantage of parallelism for compression and write the compressed data to a single file? That way no special configuration will be required, and makedumpfile should be able to fork as many threads as the number of CPUs, do the compression, and write the output to a single file.
Thanks Vivek
[..]
kexec mailing list
kexec@lists.fedoraproject.org
https://lists.fedoraproject.org/mailman/listinfo/kexec
From: Vivek Goyal vgoyal@redhat.com
Subject: Re: [RFC][PATCH] Add --split support for dump on filesystem
Date: Wed, 26 Mar 2014 14:05:07 -0400
On Tue, Mar 25, 2014 at 08:08:48PM +0900, HATAYAMA, Daisuke wrote:
Hello,
This is an RFC patch intended as a first review of the basic design of --split option support.
This version automatically appends the --split option if more than one CPU is available in the kdump 2nd kernel. I guess someone probably won't like multiple vmcores being generated implicitly, without any explicit user operation. So, I'd like comments on this design first.
Hi Hatayama,
Can you give some more details about how the --split feature of makedumpfile works? I have never used it. Why should I split the file into multiple files? And how do I get back the original single file?
The crash utility supports vmcores split by makedumpfile --split. The syntax is:
$ crash vmlinux vmcore-0 vmcore-1 ... vmcore-{N-1}
Also, I don't think we should be adding --split automatically. I want to stick to the user-specified core collector and options and not add things silently.
If users want to take advantage of parallelism, they need to modify nr_cpus and the core_collector line as well, and we should document it properly.
The problem is that we currently have no way to specify the degree of parallelism in core_collector, since it is implied by the number of vmcore arguments given to --split.
How about this? We do parallel processing if
- in core_collector, makedumpfile is specified with the --split option, and
- nr_cpus is larger than 1.
i.e., if --split is specified explicitly, we assume the user intends to do parallel processing.
I'll post documentation after the design is fixed.
Also, can't we take advantage of parallelism for compression and write the compressed data to a single file? That way no special configuration will be required, and makedumpfile should be able to fork as many threads as the number of CPUs, do the compression, and write the output to a single file.
First, at least, the current makedumpfile cannot do it. To do it, we need to use pthreads; strictly speaking, it's not impossible with fork(), but doing it that way is harmful.
Historically, the reason makedumpfile chose --split was to avoid growing the initramfs by including libc.so. (But now this is no longer a problem, since we often include commands that link libc.so, such as scp, in the initramfs.)
Also, splitting the dump into multiple vmcores has another merit: it makes it possible to parallelize even the I/O, across multiple disks. This is necessary when we strongly need a full dump.
So, doing it is possible. It's easier to do with pthreads. I assume a design where multiple threads write compressed data into the same buffer, and the thread that detects that the buffer is full flushes it. But makedumpfile doesn't have this feature now; we need to newly implement it.
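The split-file style of parallelism can be mimicked in plain shell, which may help make the idea concrete. This is only a toy sketch: gzip stands in for makedumpfile's compression, GNU split carves the input, and every worker owns its own output file, which is why no shared buffer is needed:

```shell
#!/bin/sh
# Toy model of --split-style parallelism: N workers, N output files.
set -e
workdir=$(mktemp -d)
seq 1 20000 > "$workdir/vmcore"     # stand-in for /proc/vmcore
nr=4

# Carve the input into $nr pieces (GNU split): vmcore-part-aa, -ab, ...
split -n "$nr" "$workdir/vmcore" "$workdir/vmcore-part-"

# Compress every piece in the background, then wait for all workers;
# each worker writes its own file, so the workers never contend.
for part in "$workdir"/vmcore-part-*; do
    gzip "$part" &
done
wait

# The pieces decompress, in glob order, back to the original input.
zcat "$workdir"/vmcore-part-*.gz | cmp -s - "$workdir/vmcore" && echo "round trip OK"
```

The per-worker-file layout is exactly what makes --split easy to implement; the shared-buffer single-file design discussed above gives up that isolation in exchange for a single output.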
Thanks. HATAYAMA, Daisuke
On Thu, Mar 27, 2014 at 08:04:55AM +0100, HATAYAMA Daisuke wrote:
From: Vivek Goyal vgoyal@redhat.com
Subject: Re: [RFC][PATCH] Add --split support for dump on filesystem
Date: Wed, 26 Mar 2014 14:05:07 -0400
[..]
Can you give some more details about how the --split feature of makedumpfile works? I have never used it. Why should I split the file into multiple files? And how do I get back the original single file?
The crash utility supports vmcores split by makedumpfile --split. The syntax is:
$ crash vmlinux vmcore-0 vmcore-1 ... vmcore-{N-1}
What's the advantage of that? I would rather have a way to take all these fragments, come up with a single file, and pass that file to crash.
Anyway, saving multiple files and managing those files is not very convenient.
Also, I don't think we should be adding --split automatically. I want to stick to the user-specified core collector and options and not add things silently.
If users want to take advantage of parallelism, they need to modify nr_cpus and the core_collector line as well, and we should document it properly.
The problem is that we currently have no way to specify the degree of parallelism in core_collector, since it is implied by the number of vmcore arguments given to --split.
We can do two things.
- We can check the number of CPUs available in the second kernel in makedumpfile, and makedumpfile can fork off threads accordingly.
- Or we can create a new command line argument which specifies how many threads to fork off for compression. A user who is modifying nr_cpus can also modify this command line parameter.
I think we can in fact have both. The first will be the default behavior, which can be overridden with a command line option.
How about this? We do parallel processing if
- in core_collector, makedumpfile is specified with the --split option, and
- nr_cpus is larger than 1.
Instead of checking nr_cpus, just look into /sys (or somewhere else) and see how many processors are online, and fork off that many threads accordingly.
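For reference, the online-processor count is exposed in a couple of standard places on Linux; a sketch of reading it (the range-list parsing is illustrative, not the only way):

```shell
#!/bin/sh
# /sys/devices/system/cpu/online holds a range list such as "0-3" or
# "0,2-5"; expand it and count the entries.
online=$(cat /sys/devices/system/cpu/online)
count=0
for range in $(printf '%s\n' "$online" | tr ',' ' '); do
    first=${range%%-*}
    last=${range##*-}        # equals $first when the range is a single CPU
    count=$((count + last - first + 1))
done
echo "online cpus: $count"

# getconf reports the same figure via sysconf(_SC_NPROCESSORS_ONLN).
getconf _NPROCESSORS_ONLN
```

In the kdump 2nd kernel this reflects what is actually usable: with nr_cpus=1 on the kdump command line, the online file reads "0" and the count is 1, regardless of what the first kernel had.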
i.e., if --split is specified explicitly, we assume the user intends to do parallel processing.
I'll post a documentation after design is fixed.
--split implies that we are saving to multiple files. It might also imply parallel processing.
So instead of relying on --split, we could probably create a new command line parameter. We want to do parallel processing, but when it comes to saving the vmcore we still want to write a single vmcore file. Parallel processing can help with faster filtering and faster compression of pages.
Also, can't we take advantage of parallelism for compression and write the compressed data to a single file? That way no special configuration will be required, and makedumpfile should be able to fork as many threads as the number of CPUs, do the compression, and write the output to a single file.
First, at least, the current makedumpfile cannot do it. To do it, we need to use pthreads; strictly speaking, it's not impossible with fork(), but doing it that way is harmful.
Historically, the reason makedumpfile chose --split was to avoid growing the initramfs by including libc.so. (But now this is no longer a problem, since we often include commands that link libc.so, such as scp, in the initramfs.)
Yep, now libc is part of the initramfs, so we should be able to use pthreads.
Also, splitting the dump into multiple vmcores has another merit: it makes it possible to parallelize even the I/O, across multiple disks. This is necessary when we strongly need a full dump.
I can understand the need for --split in some cases. But it will be useful only in select corner cases.
If we enable writing to a single file with multiple threads doing filtering and compression, that is going to be more useful, I think.
So, doing it is possible. It's easier to do with pthreads. I assume a design where multiple threads write compressed data into the same buffer, and the thread that detects that the buffer is full flushes it.
This sounds reasonable.
But makedumpfile doesn't have this feature now; we need to newly implement it.
I agree. This looks like a new feature. It would be great to have it on large-memory machines, though.
Thanks Vivek
From: Vivek Goyal vgoyal@redhat.com
Subject: Re: [RFC][PATCH] Add --split support for dump on filesystem
Date: Thu, 27 Mar 2014 09:18:38 -0400
On Thu, Mar 27, 2014 at 08:04:55AM +0100, HATAYAMA Daisuke wrote:
From: Vivek Goyal vgoyal@redhat.com
Subject: Re: [RFC][PATCH] Add --split support for dump on filesystem
Date: Wed, 26 Mar 2014 14:05:07 -0400
[..]
The crash utility supports vmcores split by makedumpfile --split. The syntax is:
$ crash vmlinux vmcore-0 vmcore-1 ... vmcore-{N-1}
What's the advantage of that? I would rather have a way to take all these fragments, come up with a single file, and pass that file to crash.
That can be done with the --reassemble option. But if memory is huge, this operation is costly in time and disk space.
[--reassemble]: Reassemble multiple DUMPFILEs, which are created by --split option, into one DUMPFILE. dumpfile1 and dumpfile2 are reassembled into dumpfile.
Anyway, saving multiple files and managing those files is not very convenient.
Multiple vmcores are not such a big problem if they are explained well enough and users understand that. The current design discussed here assumes users specify the --split option explicitly in kdump.conf, so they know multiple vmcores will be generated. They never get surprised.
Also, I don't think we should be adding --split automatically. I want to stick to the user-specified core collector and options and not add things silently.
If users want to take advantage of parallelism, they need to modify nr_cpus and the core_collector line as well, and we should document it properly.
The problem is that we currently have no way to specify the degree of parallelism in core_collector, since it is implied by the number of vmcore arguments given to --split.
We can do two things.
We can check the number of CPUs available in the second kernel in makedumpfile, and makedumpfile can fork off threads accordingly.
Or we can create a new command line argument which specifies how many threads to fork off for compression. A user who is modifying nr_cpus can also modify this command line parameter.
I think we can in fact have both. The first will be the default behavior, which can be overridden with a command line option.
It seems you are assuming a new feature...
How about this? We do parallel processing if
- in core_collector, makedumpfile is specified with the --split option, and
- nr_cpus is larger than 1.
Instead of checking nr_cpus, just look into /sys (or somewhere else) and see how many processors are online, and fork off that many threads accordingly.
I see. It's just like the RFC patch.
i.e., if --split is specified explicitly, we assume the user intends to do parallel processing.
I'll post a documentation after design is fixed.
--split implies that we are saving to multiple files. It might also imply parallel processing.
So instead of relying on --split, we could probably create a new command line parameter. We want to do parallel processing, but when it comes to saving the vmcore we still want to write a single vmcore file. Parallel processing can help with faster filtering and faster compression of pages.
Also, can't we take advantage of parallelism for compression and write the compressed data to a single file? That way no special configuration will be required, and makedumpfile should be able to fork as many threads as the number of CPUs, do the compression, and write the output to a single file.
First, at least, the current makedumpfile cannot do it. To do it, we need to use pthreads; strictly speaking, it's not impossible with fork(), but doing it that way is harmful.
Historically, the reason makedumpfile chose --split was to avoid growing the initramfs by including libc.so. (But now this is no longer a problem, since we often include commands that link libc.so, such as scp, in the initramfs.)
Yep, now libc is part of initramfs so we should be able to use pthreads.
Also, splitting the dump into multiple vmcores has another merit: it makes it possible to parallelize even the I/O, across multiple disks. This is necessary when we strongly need a full dump.
I can understand the need for --split in some cases. But it will be useful only in select corner cases.
The use case might be a mere corner case for you, but it's important for us. For example, we sometimes want to use it to debug complicated bugs that relate to a wide range of kernel components, such as a bug in the flow of I/O among qemu/KVM guests and hosts (and so we cannot filter out user-space memory). In such cases it has merit as a last resort, even if it needs a lot of disk space and time.
If we enable writing to a single file with multiple threads doing filtering and compression, that is going to be more useful, I think.
I of course understand it's useful, but we don't have it now.
BTW, this is another topic, but if possible I want to extend kexec-tools to handle multiple disks and support the --split option.
So, doing it is possible. It's easier to do by pthread. I assume the logic that multiple threads write compressed data into the same buffer and the thread that detects the buffer is full, flushes the buffer.
This sounds reasonable.
I investigated a little more and found that I have to investigate further how to manage the buffer for compression and, in detail, how to divide the processing among the threads. The design might not scale well due to lock contention on the buffer.
It seems to me that writing multiple vmcores is not only good for ease of implementation but also rather good as a way to divide the processing.
Anyway, benchmark will be needed to discuss this topic in detail.
But makedumpfile doesn't have this feature now; we need to newly implement it.
I agree. This looks like a new feature. It would be great to have it on large-memory machines, though.
Thanks Vivek
Thanks.
HATAYAMA, Daisuke
On Fri, Mar 28, 2014 at 11:19:13AM +0100, HATAYAMA Daisuke wrote:
[..]
Multiple vmcores are not such a big problem if they are explained well enough and users understand that. The current design discussed here assumes users specify the --split option explicitly in kdump.conf, so they know multiple vmcores will be generated. They never get surprised.
It is relatively harder to manage multiple files. And this will be justified only if there is a huge benefit in creating multiple files instead of one.
If you are trying to save files across different adaptors, it means bringing up additional hardware in the second kernel. If the adaptors are of different types, then different drivers are used, which contributes to increased unreliability of the dump operation.
We don't have any support where one can specify bringing up multiple storage devices. So you must be carrying your own patches to make sure different devices can be brought up. All the code has been written keeping in mind that there will be a single device to dump to and a single "path" within the device. Now the kdump.conf syntax and backend implementation will get really complex if we try to support split files.
So there has to be a really huge benefit to justify supporting a split-file mode.
[..]
The problem is that we currently have no way to specify the degree of parallelism in core_collector, since it is implied by the number of vmcore arguments given to --split.
In the new mode where the destination is a single file, this new option can be used.
In fact, even with split, one should be able to use this option. For example, if you are bringing up 6 CPUs but writing 2 split files, then you could use 4 threads for filtering and compression while 2 threads do the writing.
So --split kind of specifies I/O-level parallelism and only provides a weak hint about CPU-level parallelism.
We can do two things.
We can check the number of CPUs available in the second kernel in makedumpfile, and makedumpfile can fork off threads accordingly.
Or we can create a new command line argument which specifies how many threads to fork off for compression. A user who is modifying nr_cpus can also modify this command line parameter.
I think we can in fact have both. The first will be the default behavior, which can be overridden with a command line option.
It seems you are assuming a new feature...
Yep, I am thinking of a new feature where the vmcore is saved to a single file but filtering and compression can happen in parallel, depending on the number of CPUs available.
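One property worth noting for such a single-file mode: with gzip-style formats, independently compressed chunks can simply be appended in order to form one valid file, so the hard part is the ordering and buffer management, not the on-disk format. A toy sketch (gzip standing in for the dump compression; not makedumpfile's real format):

```shell
#!/bin/sh
# Parallel compression, single output file: compress chunks in the
# background, then append the results in order; concatenated gzip
# streams decompress back to the concatenated input.
set -e
workdir=$(mktemp -d)
seq 1 20000 > "$workdir/input"

split -n 4 "$workdir/input" "$workdir/chunk-"   # GNU split: 4 pieces
for c in "$workdir"/chunk-*; do
    gzip "$c" &            # one worker per chunk
done
wait

cat "$workdir"/chunk-*.gz > "$workdir/single.gz"   # the single dump file
zcat "$workdir/single.gz" | cmp -s - "$workdir/input" && echo "single-file round trip OK"
```

In-process, the "cat in order" step corresponds to the ordered buffer flushing discussed earlier in this thread, which is where the lock-contention concern arises.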
[..]
Also, splitting the dump into multiple vmcores has another merit: it makes it possible to parallelize even the I/O, across multiple disks. This is necessary when we strongly need a full dump.
I am curious: in what cases do you need a full dump? Do you enable it by default for your customers? If not, when do you recommend that they capture a full dump instead of a filtered one?
Capturing a full dump on large multi-terabyte machines is not practical. It takes a very long time, and after saving the dump, sending those terabyte files to support is a big headache.
I can understand the need for --split in some cases. But it will be useful only in select corner cases.
The use case might be a mere corner case for you, but it's important for us. For example, we sometimes want to use it to debug complicated bugs that relate to a wide range of kernel components, such as a bug in the flow of I/O among qemu/KVM guests and hosts (and so we cannot filter out user-space memory). In such cases it has merit as a last resort, even if it needs a lot of disk space and time.
Ok, I get it. So nothing has crashed, but if the system is not performing well you will ask the customer to dump full memory and send it for analysis so that you can traverse the whole stack.
I am assuming that you are doing this to analyze performance issues? Otherwise, if it is a guest crash, then just the guest dump should be sufficient and one does not have to take a full host dump.
So do you enable it by default, or do you recommend it to specific customers based on their need?
If we enable writing to a single file with multiple threads doing filtering and compression, that is going to be more useful, I think.
I of course understand it's useful, but we don't have it now.
BTW, this is another topic, but if possible I want to extend kexec-tools to handle multiple disks and support the --split option.
That's a lot of work. Everywhere in the code it is assumed that there is a single device and a single path. I really don't want to support all that complication till we have proven that it is a huge win for most people.
I think supporting an intermediate mode of saving to a single file, while harnessing CPU power for filtering and compression, will be much easier (if it gives us a reasonable speedup).
[..]
I investigated a little more and found that I have to investigate further how to manage the buffer for compression and, in detail, how to divide the processing among the threads. The design might not scale well due to lock contention on the buffer.
It seems to me that writing multiple vmcores is not only good for ease of implementation but also rather good as a way to divide the processing.
It is only good if one is saving a full dump. The majority of people might not even require that. They probably want a fully filtered dump, but want to do it fast on a multi-terabyte machine.
Anyway, benchmark will be needed to discuss this topic in detail.
Agreed. First we need to implement that new mode and see if it gives us good performance or not.
Thanks Vivek
From: Vivek Goyal vgoyal@redhat.com
Subject: Re: [RFC][PATCH] Add --split support for dump on filesystem
Date: Fri, 28 Mar 2014 10:28:07 -0400
On Fri, Mar 28, 2014 at 11:19:13AM +0100, HATAYAMA Daisuke wrote:
[..]
Multiple vmcores are not such a big problem if they are explained well enough and users understand that. The current design discussed here assumes users specify the --split option explicitly in kdump.conf, so they know multiple vmcores will be generated. They never get surprised.
It is relatively harder to manage multiple files. And this will be justified only if there is a huge benefit in creating multiple files instead of one.
If you are trying to save files across different adaptors, it means bringing up additional hardware in the second kernel. If the adaptors are of different types, then different drivers are used, which contributes to increased unreliability of the dump operation.
This is right in the crash case.
We don't have any support where one can specify bringing up multiple storage devices. So you must be carrying your own patches to make sure different devices can be brought up. All the code has been written keeping in mind that there will be a single device to dump to and a single "path" within the device. Now the kdump.conf syntax and backend implementation will get really complex if we try to support split files.
So there has to be a really huge benefit to justify supporting a split-file mode.
[..]
The problem is that we currently have no way to specify the degree of parallelism in core_collector, since it is implied by the number of vmcore arguments given to --split.
In the new mode where the destination is a single file, this new option can be used.
In fact, even with split, one should be able to use this option. For example, if you are bringing up 6 CPUs but writing 2 split files, then you could use 4 threads for filtering and compression while 2 threads do the writing.
So --split kind of specifies I/O-level parallelism and only provides a weak hint about CPU-level parallelism.
We can do two things.
We can check the number of CPUs available in the second kernel in makedumpfile, and makedumpfile can fork off threads accordingly.
Or we can create a new command line argument which specifies how many threads to fork off for compression. A user who is modifying nr_cpus can also modify this command line parameter.
I think we can in fact have both. The first will be the default behavior, which can be overridden with a command line option.
It seems you are assuming a new feature...
Yep, I am thinking of a new feature where the vmcore is saved to a single file but filtering and compression can happen in parallel, depending on the number of CPUs available.
[..]
Also, splitting the dump into multiple vmcores has another merit: it makes it possible to parallelize even the I/O, across multiple disks. This is necessary when we strongly need a full dump.
I am curious: in what cases do you need a full dump? Do you enable it by default for your customers? If not, when do you recommend that they capture a full dump instead of a filtered one?
Capturing a full dump on large multi-terabyte machines is not practical. It takes a very long time, and after saving the dump, sending those terabyte files to support is a big headache.
I can understand the need for --split in some cases. But it will be useful only in select corner cases.
The use case might be a mere corner case for you, but it's important for us. For example, we sometimes want to use it to debug complicated bugs that relate to a wide range of kernel components, such as a bug in the flow of I/O among qemu/KVM guests and hosts (and so we cannot filter out user-space memory). In such cases it has merit as a last resort, even if it needs a lot of disk space and time.
Ok, I get it. So nothing has crashed, but if the system is not performing well you will ask the customer to dump full memory and send it for analysis so that you can traverse the whole stack.
I am assuming that you are doing this to analyze performance issues? Otherwise, if it is a guest crash, then just the guest dump should be sufficient and one does not have to take a full host dump.
In this use case, we want to see the behaviour across guests and the host. If a guest crashes with some I/O error due to a device failure, it's helpful to see the qemu side. Conversely, if the host ends up in an abnormal state under too much I/O pressure, it might be helpful to see what the guests are doing.
There are a variety of debugging features for qemu/KVM, such as tracing, but bugs in I/O between host and guests tend to be very complicated, and such features tend not to be helpful enough. Our experience has shown that it's still most efficient for a skillful engineer to debug vmcores manually.
So do you enable it by default, or do you recommend it to specific customers based on their need?
It's OF COURSE for specific customers only. I don't think it's possible to offer this setting to every customer. The default should be a partial dump. The full dump setting is for development, and a last resort for debugging.
If we enable writing to a single file with multiple threads doing filtering and compression, that is going to be more useful, I think.
I of course understand it's useful, but we don't have it now.
BTW, this is another topic, but if possible I want to extend kexec-tools to handle multiple disks and support the --split option.
That's a lot of work. Everywhere in the code it is assumed that there is a single device and a single path. I really don't want to support all that complication till we have proven that it is a huge win for most people.
I think supporting an intermediate mode of saving to a single file, while harnessing CPU power for filtering and compression, will be much easier (if it gives us a reasonable speedup).
Yes, I also think the design needs to be changed for multiple disks... So I'm not optimistic that you will react positively to the feature...
[..]
I investigated a little more and found that I have to look further into how to manage the buffer for compression, and in particular, how to divide the processing among the threads. The design might not scale well due to lock contention on the buffer.
It seems to me that writing multiple vmcores is not only good for ease of implementation but also a reasonable way to divide the processing.
It is only good if one is saving a full dump. The majority of people might not even require that. They probably want a fully filtered dump, but want to do it fast on a multi-terabyte machine.
Anyway, benchmark will be needed to discuss this topic in detail.
Agreed. First we need to implement that new mode and see if it gives us good performance or not.
I've already investigated the feature at the processing level, but I need some more time to make a patch set, because supporting pthreads requires a fair amount of change to makedumpfile. Please wait for now.
Thanks. HATAYAMA, Daisuke
On Mon, Mar 31, 2014 at 07:21:20PM +0900, HATAYAMA Daisuke wrote:
[..]
Yes, I also think the design needs to be changed for multiple disks...
So I'm not optimistic that you will react positively to the feature...
I am not sure what this means.
[..]
Anyway, benchmark will be needed to discuss this topic in detail.
Agreed. First we need to implement that new mode and see if it gives us good performance or not.
I've already investigated the feature at the processing level, but I need some more time to make a patch set, because supporting pthreads requires a fair amount of change to makedumpfile. Please wait for now.
Sure, not a problem. No rush. I was just brainstorming to figure out what features are already present and what new features are required. So at least we agree that dumping to a single file while having multiple threads for filtering and compression is a new feature.
Till we get that feature, I guess customers can achieve that parallelism by dumping to multiple files (to the same adaptor) and then reassembling those files manually later.
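That manual workflow can be sketched as below; this assumes makedumpfile's existing --reassemble option, which pairs with --split, and the function name and file names here are illustrative only, not from any patch:

```shell
# Emit the two commands of the manual split-then-reassemble workflow.
# $1 is the number of split parts (e.g. the number of online cpus).
manual_parallel_dump_cmds()
{
    local _n="$1" _i=0 _parts=""

    while [ "$_i" -lt "$_n" ]; do
        _parts="$_parts vmcore-$_i"
        _i=$((_i + 1))
    done
    # Step 1: dump in parallel, one part per requested split.
    echo "makedumpfile --split -l -d 31 /proc/vmcore$_parts"
    # Step 2: reassemble the parts into a single vmcore afterwards.
    echo "makedumpfile --reassemble$_parts vmcore"
}
```

For two parts this prints a --split invocation with vmcore-0 and vmcore-1, followed by the --reassemble invocation that merges them back into a single vmcore.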
Thanks Vivek
Sorry for the delayed response.
From: Vivek Goyal vgoyal@redhat.com
Subject: Re: [RFC][PATCH] Add --split support for dump on filesystem
Date: Mon, 31 Mar 2014 08:47:12 -0400
On Mon, Mar 31, 2014 at 07:21:20PM +0900, HATAYAMA Daisuke wrote:
[..]
Yes, I also think the design needs to be changed for multiple disks...
So I'm not optimistic that you will react positively to the feature...
I am not sure what this means.
[..]
Anyway, benchmark will be needed to discuss this topic in detail.
Agreed. First we need to implement that new mode and see if it gives us good performance or not.
I've already investigated the feature at the processing level, but I need some more time to make a patch set, because supporting pthreads requires a fair amount of change to makedumpfile. Please wait for now.
Sure, not a problem. No rush. I was just brainstorming to figure out what features are already present and what new features are required. So at least we agree that dumping to a single file while having multiple threads for filtering and compression is a new feature.
Till we get that feature, I guess customers can achieve that parallelism by dumping to multiple files (to the same adaptor) and then reassembling those files manually later.
Thanks Vivek
So, I'll post a patch to support the --split option as a means of supporting parallelism. Do you agree with this direction?
In the patch, --split is no longer automatically inserted. The user should specify it in core_collector explicitly, and then the kdump script detects it and appends multiple vmcore arguments accordingly. The number of arguments is the number of cpus running in the 2nd kernel, i.e., the number specified in nr_cpus.
That is, if in /etc/kdump.conf core_collector is specified as
core_collector makedumpfile --split -l -d 31
and the number of online cpus on the 2nd kernel is more than 2, say 3 here, then, the following command is executed:
$ makedumpfile --split -l -d 31 /proc/vmcore vmcore-0-incomplete vmcore-1-incomplete vmcore-2-incomplete
If makedumpfile --split is specified but the number of online cpus on the 2nd kernel is 1, then --split is removed, since makedumpfile would otherwise fail with a warning that more vmcore arguments are needed.
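The appending logic described here might be sketched as a small shell helper; the function name build_core_collector_cmd and the space-squeezing sed are illustrative assumptions, while the vmcore-N-incomplete names follow the example command above:

```shell
# Build the full core collector command line from the configured
# collector string and the number of online cpus in the 2nd kernel.
build_core_collector_cmd()
{
    local _collector="$1"   # e.g. "makedumpfile --split -l -d 31"
    local _nr_cpus="$2"     # online cpus in the 2nd kernel
    local _args="/proc/vmcore"
    local _i=0

    if [[ "$_collector" == *--split* ]]; then
        if [ "$_nr_cpus" -le 1 ]; then
            # With one cpu --split would fail (it demands more than one
            # output file), so drop it and fall back to a single vmcore.
            _collector=$(echo "$_collector" | \
                         sed -e "s/--split//g" -e "s/  */ /g")
            _args="$_args vmcore-incomplete"
        else
            # One output file per online cpu, marked incomplete until
            # the dump finishes successfully.
            while [ "$_i" -lt "$_nr_cpus" ]; do
                _args="$_args vmcore-$_i-incomplete"
                _i=$((_i + 1))
            done
        fi
    else
        _args="$_args vmcore-incomplete"
    fi
    echo "$_collector $_args"
}
```

With core_collector "makedumpfile --split -l -d 31" and 3 online cpus this yields exactly the command shown above; with 1 cpu it falls back to a single vmcore-incomplete output file.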
I'll post a patch for the how-to file to document this properly.
Thanks. HATAYAMA, Daisuke
On Wed, Apr 09, 2014 at 03:46:22PM +0900, HATAYAMA Daisuke wrote:
Hi Hatayama,
So, I'll post a patch to support the --split option as a means of supporting parallelism. Do you agree with this direction?
--split is already an existing parameter which very clearly means splitting a file into multiple files. I don't know how --split is implemented, but I am assuming it creates as many threads as there are split files, and these threads will do filtering, compression and IO.
Usage of --split requires that we *have* to specify multiple files as output files.
How about creating a new option, say --parallel-dump <nr-threads>. This option can take the number of threads to create for filtering and compression, and possibly for doing IO also (to a single file).
--parallel-dump will take only a single output file as argument. So the dump is always saved in a single file.
We can also make the <nr-threads> argument optional. If the user specifies a number of threads, that many threads will be launched. Otherwise makedumpfile can detect how many cpus are online and launch that many threads.
For example.
# makedumpfile --parallel-dump 2 /proc/vmcore /var/crash/saved-vmcore
In this case 2 threads will be launched which will coordinate internally on filtering and compression and possibly on doing IO too.
# makedumpfile --parallel-dump /proc/vmcore /var/crash/saved-vmcore
In this case makedumpfile will determine how many cpus are online and launch as many threads (one thread for each cpu). This is close to the "-j" option of the "make" utility. The difference is that "-j" without an argument launches as many threads as possible.
In the patch, --split is no longer automatically inserted. The user should specify it in core_collector explicitly, and then the kdump script detects it and appends multiple vmcore arguments accordingly. The number of arguments is the number of cpus running in the 2nd kernel, i.e., the number specified in nr_cpus.
That is, if in /etc/kdump.conf core_collector is specified as
core_collector makedumpfile --split -l -d 31
and the number of online cpus on the 2nd kernel is more than 2, say 3 here, then, the following command is executed:
$ makedumpfile --split -l -d 31 /proc/vmcore vmcore-0-incomplete vmcore-1-incomplete vmcore-2-incomplete
I want to avoid modifying core_collector internally by script. I want to honor core_collector as specified in /etc/kdump.conf file. That way user knows exactly what core collector will be used and user can configure it accordingly.
So if user wants dump to be saved into multiple files, they need to explicitly edit /etc/kdump.conf and specify --split as well as name of split files.
Right now we don't seem to have a way to specify destination vmcore file name. If need be, we can possibly create a new option to specify file names and then --split option should work.
If makedumpfile --split is specified but the number of online cpus on the 2nd kernel is 1, then --split is removed, since makedumpfile would otherwise fail with a warning that more vmcore arguments are needed.
Again, I don't want scripts to play with user specified core_collector. I want to use it as specified by user.
So if the user wants a --split dump, they need to configure it that way.
Once parallel dump is implemented, I think we can possibly change the default core collector to include --parallel-dump, and that way it will automatically launch multiple threads if there is more than 1 cpu in the second kernel booted.
core_collector makedumpfile -l --message-level 1 -d 31 --parallel-dump
Thanks Vivek
From: Vivek Goyal vgoyal@redhat.com
Subject: Re: [RFC][PATCH] Add --split support for dump on filesystem
Date: Thu, 10 Apr 2014 10:40:03 -0400
On Wed, Apr 09, 2014 at 03:46:22PM +0900, HATAYAMA Daisuke wrote:
Hi Hatayama,
So, I'll post a patch to support the --split option as a means of supporting parallelism. Do you agree with this direction?
--split is already an existing parameter which very clearly means splitting a file into multiple files. I don't know how --split is implemented, but I am assuming it creates as many threads as there are split files, and these threads will do filtering, compression and IO.
Usage of --split requires that we *have* to specify multiple files as output files.
How about creating a new option, say --parallel-dump <nr-threads>. This option can take the number of threads to create for filtering and compression, and possibly for doing IO also (to a single file).
--parallel-dump will take only a single output file as argument. So the dump is always saved in a single file.
We can also make the <nr-threads> argument optional. If the user specifies a number of threads, that many threads will be launched. Otherwise makedumpfile can detect how many cpus are online and launch that many threads.
For example.
# makedumpfile --parallel-dump 2 /proc/vmcore /var/crash/saved-vmcore
In this case 2 threads will be launched which will coordinate internally on filtering and compression and possibly on doing IO too.
# makedumpfile --parallel-dump /proc/vmcore /var/crash/saved-vmcore
In this case makedumpfile will determine how many cpus are online and launch as many threads (one thread for each cpu). This is close to the "-j" option of the "make" utility. The difference is that "-j" without an argument launches as many threads as possible.
This interface is almost the same as what I have in mind.
In the patch, --split is no longer automatically inserted. The user should specify it in core_collector explicitly, and then the kdump script detects it and appends multiple vmcore arguments accordingly. The number of arguments is the number of cpus running in the 2nd kernel, i.e., the number specified in nr_cpus.
That is, if in /etc/kdump.conf core_collector is specified as
core_collector makedumpfile --split -l -d 31
and the number of online cpus on the 2nd kernel is more than 2, say 3 here, then, the following command is executed:
$ makedumpfile --split -l -d 31 /proc/vmcore vmcore-0-incomplete vmcore-1-incomplete vmcore-2-incomplete
I want to avoid modifying core_collector internally by script. I want to honor core_collector as specified in /etc/kdump.conf file. That way user knows exactly what core collector will be used and user can configure it accordingly.
So if user wants dump to be saved into multiple files, they need to explicitly edit /etc/kdump.conf and specify --split as well as name of split files.
Right now we don't seem to have a way to specify destination vmcore file name. If need be, we can possibly create a new option to specify file names and then --split option should work.
How about the following directive?
# core_collector_vmcore_arguments
#
# - This directive allows you to specify vmcore file
#   names to makedumpfile. Default is vmcore.
#
#   A specific usecase of this directive is to specify
#   multiple vmcore file names for --split option.
#   See /sbin/makedumpfile --help for --split option.
#
#   This directive is ignored if core_collector is not
#   used.
Usage of this directive is like this:
core_collector_vmcore_arguments vmcore-0 vmcore-1 vmcore-2
Users will know from the documentation how many arguments they should specify here.
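A sketch of how the kdump script could consume the directive; the directive itself is only a proposal at this point, and the helper name get_vmcore_arguments and the fallback to the default name vmcore are assumptions drawn from the description above:

```shell
# Read the proposed core_collector_vmcore_arguments directive from a
# kdump.conf-style file; fall back to the single default name "vmcore".
get_vmcore_arguments()
{
    local _conf="$1"
    local _args

    # Pick up the directive's value, if present in the config file.
    _args=$(grep "^core_collector_vmcore_arguments" "$_conf" | \
            sed -e "s/^core_collector_vmcore_arguments[[:space:]]*//")
    # Default is a single file named vmcore, as documented above.
    echo "${_args:-vmcore}"
}
```

The kdump script would then append the returned names to the core_collector command line instead of hard-coding a single vmcore file.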
If makedumpfile --split is specified but the number of online cpus on the 2nd kernel is 1, then --split is removed, since makedumpfile would otherwise fail with a warning that more vmcore arguments are needed.
Again, I don't want scripts to play with user specified core_collector. I want to use it as specified by user.
So if the user wants a --split dump, they need to configure it that way.
Once parallel dump is implemented, I think we can possibly change the default core collector to include --parallel-dump, and that way it will automatically launch multiple threads if there is more than 1 cpu in the second kernel booted.
core_collector makedumpfile -l --message-level 1 -d 31 --parallel-dump
Thanks Vivek
The default must be no problem, because the 1-thread case is the same as the current behaviour.
Thanks. HATAYAMA, Daisuke
On Fri, Apr 11, 2014 at 10:47:14AM +0900, HATAYAMA Daisuke wrote:
From: Vivek Goyal vgoyal@redhat.com
Subject: Re: [RFC][PATCH] Add --split support for dump on filesystem
Date: Thu, 10 Apr 2014 10:40:03 -0400
On Wed, Apr 09, 2014 at 03:46:22PM +0900, HATAYAMA Daisuke wrote:
Hi Hatayama,
So, I'll post a patch to support the --split option as a means of supporting parallelism. Do you agree with this direction?
--split is already an existing parameter which very clearly means splitting a file into multiple files. I don't know how --split is implemented, but I am assuming it creates as many threads as there are split files, and these threads will do filtering, compression and IO.
Usage of --split requires that we *have* to specify multiple files as output files.
How about creating a new option, say --parallel-dump <nr-threads>. This option can take the number of threads to create for filtering and compression, and possibly for doing IO also (to a single file).
--parallel-dump will take only a single output file as argument. So the dump is always saved in a single file.
We can also make the <nr-threads> argument optional. If the user specifies a number of threads, that many threads will be launched. Otherwise makedumpfile can detect how many cpus are online and launch that many threads.
For example.
# makedumpfile --parallel-dump 2 /proc/vmcore /var/crash/saved-vmcore
In this case 2 threads will be launched which will coordinate internally on filtering and compression and possibly on doing IO too.
# makedumpfile --parallel-dump /proc/vmcore /var/crash/saved-vmcore
In this case makedumpfile will determine how many cpus are online and launch as many threads (one thread for each cpu). This is close to the "-j" option of the "make" utility. The difference is that "-j" without an argument launches as many threads as possible.
This interface is almost the same as what I have in mind.
So I have a question. How many threads does --split create? Does it create 1 thread right now to dump to multiple devices?
If yes, we could possibly extend the semantics of --split instead of introducing new option --parallel-dump.
#makedumpfile --split [<nr-threads>] /proc/vmcore <dump-files>
Now one can specify the number of threads to launch to do parallel processing when saving to dump files.
--split should allow even single dump file to be specified.
If nr-threads is not specified, then --split will launch as many threads as there are cpus available, to take max advantage of parallelism.
#makedumpfile --split /proc/vmcore dumpfile1 dumpfile2 .... - Launch as many threads as there are cpus.
#makedumpfile --split N /proc/vmcore dumpfile1 dumpfile2 .... - Launch N threads.
#makedumpfile --split N /proc/vmcore dumpfile - Launch N threads and save dump to 1 file.
#makedumpfile --split /proc/vmcore dumpfile - Launch as many threads as there are cpus and save dump to 1 file.
[..]
Right now we don't seem to have a way to specify destination vmcore file name. If need be, we can possibly create a new option to specify file names and then --split option should work.
How about the following directive?
# core_collector_vmcore_arguments
#
# - This directive allows you to specify vmcore file
#   names to makedumpfile. Default is vmcore.
#
#   A specific usecase of this directive is to specify
#   multiple vmcore file names for --split option.
#   See /sbin/makedumpfile --help for --split option.
#
#   This directive is ignored if core_collector is not
#   used.
Usage of this directive is like this:
core_collector_vmcore_arguments vmcore-0 vmcore-1 vmcore-2
Users will know from the documentation how many arguments they should specify here.
Maybe something like
vmcore_name vmcore-0 vmcore-1 vmcore-2
Other option could be that we start asking for absolute paths of vmcore files.
absolute_path /var/crash/vmcore-0 /var/crash/vmcore-1 .....
This way one would automatically determine the underlying disk based on the path, and the last component of the path would give the file name.
But I am not sure, I will have to think more about it.
We can first make the case of 1 dump file work and then look into extending kdump.conf to support multiple files. The above will turn into a more complicated project, with lots of error checks to make sure all paths lead to the same device. We really don't have a way to support multiple devices properly.
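One of those error checks, that all given paths lead to the same device, could be sketched with findmnt, which the kdump script already uses to resolve dump targets; the function name and message are illustrative only:

```shell
# Verify that every path given as an argument lives on the same
# underlying device; return non-zero on the first mismatch.
check_same_device()
{
    local _dev="" _p _d

    # If findmnt is unavailable we cannot decide; skip the check.
    command -v findmnt > /dev/null || return 0

    for _p in "$@"; do
        # Resolve the source device of the filesystem holding each path.
        _d=$(findmnt -n -o SOURCE --target "$(dirname "$_p")")
        [ -z "$_dev" ] && _dev="$_d"
        if [ "$_d" != "$_dev" ]; then
            echo "kdump: $_p is not on $_dev" >&2
            return 1
        fi
    done
    return 0
}
```

The script could run this over the configured absolute paths before starting the dump and bail out early on a mismatch, instead of failing partway through.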
Thanks Vivek