After discussing MOM / VDSM integration at length, two different strategies have emerged. I will call them Plan A and Plan B:
Plan A: MOM integration at the OS/Packaging level
Plan B: MOM integration as a new VDSM thread
This RFC is about Plan A. I will start another thread to discuss Plan B once I have properly prototyped the idea in code.
Integrating VDSM and MOM at the OS level is by far the simpler and less intrusive option. As you can see from the included patch, the changes to vdsm are very limited. In this model, VDSM interacts with MOM in the same way as it uses libvirt. Upon installation, VDSM installs its own MOM configuration file and restarts the MOM daemon (which continues to exist as an independent system-level daemon). Once restarted, MOM will load its policy from the VDSM configuration directory.
Pros:
- Simple and unobtrusive to either MOM or VDSM
- Clean API with no duplication or layering
- Maintain flexibility to tighten integration in the future
Cons:
- Momd runs as root (like supervdsm)
- If MOM will consume VDSM APIs, it must use the slower xmlrpc interface
Based on my experience while working on Plan A and Plan B, I feel that this approach is the best way to start. Once MOM and VDSM are commingled on the node, we can begin the interesting work of providing the actual dynamic policy to manage the system.
Sample code for Plan A follows:
commit 4464c07849cfd921d0e3446961c5b6471dd360d9
Author: Adam Litke <agl@us.ibm.com>
Date:   Mon Nov 28 08:46:22 2011 -0600
Integrate with MOM at the system/packaging level
diff --git a/vdsm.spec.in b/vdsm.spec.in
index cf12428..14588c6 100644
--- a/vdsm.spec.in
+++ b/vdsm.spec.in
@@ -151,6 +151,13 @@ rm -rf %{buildroot}
 /usr/sbin/saslpasswd2 -p -a libvirt vdsm@rhevh < \
     /etc/pki/vdsm/keys/libvirt_password
+# install the mom config file and restart momd
+if [ -f %{_sysconfdir}/momd.conf ]; then
+    mv -n %{_sysconfdir}/momd.conf %{_sysconfdir}/momd.conf.vdsmsave
+fi
+cp %{_sysconfdir}/%{vdsm_name}/momd.conf %{_sysconfdir}/momd.conf
+/sbin/service momd condrestart > /dev/null 2>&1
+
 %preun
 if [ "$1" -eq 0 ]
 then
@@ -176,6 +183,12 @@ _EOF
/usr/sbin/saslpasswd2 -p -a libvirt -d vdsm@rhevh
+    # Restore old MOM configuration
+    if [ -f %{_sysconfdir}/momd.conf.vdsmsave ]; then
+        mv %{_sysconfdir}/momd.conf.vdsmsave %{_sysconfdir}/momd.conf
+        /sbin/service momd condrestart > /dev/null 2>&1
+    fi
+
 %if 0%{?rhel}
 if /sbin/initctl status libvirtd >/dev/null 2>/dev/null ; then
     /sbin/initctl stop libvirtd >/dev/null 2>/dev/null
@@ -246,6 +259,8 @@ fi
 %config(noreplace) %{_sysconfdir}/%{vdsm_name}/logger.conf
 %config(noreplace) %{_sysconfdir}/logrotate.d/vdsm
 %config(noreplace) %{_sysconfdir}/rwtab.d/vdsm
+%{_sysconfdir}/%{vdsm_name}/mom.policy
+%{_sysconfdir}/%{vdsm_name}/momd.conf
 %{_sysconfdir}/sudoers.d/50_vdsm
 %{_sysconfdir}/cron.hourly/vdsm-logrotate
 %{_sysconfdir}/cron.d/vdsm-libvirt-logrotate
diff --git a/vdsm/Makefile.am b/vdsm/Makefile.am
index 7da9cad..a96a323 100644
--- a/vdsm/Makefile.am
+++ b/vdsm/Makefile.am
@@ -83,7 +83,9 @@ EXTRA_DIST = \
 	vdsm-restore-net-config.in \
 	vdsm.rwtab \
 	vdsm-sosplugin.py.in \
-	vdsm-store-net-config.in
+	vdsm-store-net-config.in \
+	mom.policy \
+	momd.conf
 # Reference:
 # http://www.gnu.org/software/automake/manual/html_node/Scripts.html
@@ -115,7 +117,7 @@ install-data-hook:
 install-data-local: install-data-init install-data-logger \
 		install-data-rwtab install-data-logrotate \
 		install-data-sudoers install-data-sosplugin \
-		install-data-libvirtpass
+		install-data-libvirtpass install-data-mom
 	$(MKDIR_P) $(DESTDIR)$(vdsmtsdir)/keys
 	$(MKDIR_P) $(DESTDIR)$(vdsmtsdir)/certs
 	$(MKDIR_P) $(DESTDIR)$(vdsmlogdir)
@@ -128,7 +130,7 @@ install-data-local: install-data-init install-data-logger \
 uninstall-local: uninstall-data-init uninstall-data-logger \
 		uninstall-data-rwtab uninstall-data-logrotate \
 		uninstall-data-sudoers uninstall-data-sosplugin \
-		uninstall-data-libvirtpass
+		uninstall-data-libvirtpass uninstall-data-mom
 install-data-init:
 	$(MKDIR_P) $(DESTDIR)$(sysconfdir)/rc.d/init.d
@@ -191,3 +193,13 @@ install-data-sosplugin:
 uninstall-data-sosplugin:
 	$(RM) $(DESTDIR)$(pythondir)/sos/plugins/vdsm.py
+
+install-data-mom:
+	$(INSTALL_DATA) mom.policy \
+		$(DESTDIR)$(vdsmconfdir)/mom.policy
+	$(INSTALL_DATA) momd.conf \
+		$(DESTDIR)$(vdsmconfdir)/momd.conf
+
+uninstall-data-mom:
+	$(RM) $(DESTDIR)$(vdsmconfdir)/mom.policy
+	$(RM) $(DESTDIR)$(vdsmconfdir)/momd.conf
diff --git a/vdsm/mom.policy b/vdsm/mom.policy
new file mode 100644
index 0000000..cb31526
--- /dev/null
+++ b/vdsm/mom.policy
@@ -0,0 +1,155 @@
+### KSM ########################################################################
+
+### Constants
+# The number of pages to add when increasing pages_to_scan
+(defvar ksm_pages_boost 300)
+
+# The number of pages to subtract when decreasing pages_to_scan
+(defvar ksm_pages_decay -50)
+
+# The min and max number of pages to scan per cycle when ksm is activated
+(defvar ksm_npages_min 64)
+(defvar ksm_npages_max 1250)
+
+# The number of ms to sleep between ksmd scans for a 16GB system.  Systems with
+# more memory will sleep less, while smaller systems will sleep more.
+(defvar ksm_sleep_ms_baseline 10)
+
+# A virtualization host tends to use most of its memory for running guests but
+# a certain amount is reserved for the host OS, non virtualization-related work,
+# and as a failsafe.  When free memory (including memory used for caches) drops
+# below this percentage of total memory, the host is deemed under pressure, and
+# KSM will be started to try and free up some memory.
+(defvar ksm_free_percent 0.20)
+
+### Helper functions
+(def change_npages (delta)
+{
+    (defvar newval (+ Host.ksm_pages_to_scan delta))
+    (if (> newval ksm_npages_max) (set newval ksm_npages_max) 1)
+    (if (< newval ksm_npages_min) (set newval ksm_npages_min) 0)
+    (Host.Control "ksm_pages_to_scan" newval)
+})
+
+### Main Script
+# Methodology: Since running KSM does incur some overhead, try to run it only
+# when necessary.  If the amount of committed KSM shareable memory is high or if
+# free memory is low, enable KSM to try to increase free memory.  Large memory
+# machines should scan more often than small ones.  Likewise, machines under
+# memory pressure should scan more aggressively than more idle machines.
+
+(defvar ksm_pressure_threshold (* Host.mem_available ksm_free_percent))
+(defvar ksm_committed Host.ksm_shareable)
+
+(if (and (< (+ ksm_pressure_threshold ksm_committed) Host.mem_available)
+         (> (Host.StatAvg "mem_free") ksm_pressure_threshold))
+    (Host.Control "ksm_run" 0)
+    { # else
+        (Host.Control "ksm_run" 1)
+        (Host.Control "ksm_sleep_millisecs"
+                      (/ (* ksm_sleep_ms_baseline 16777216) Host.mem_available))
+        (if (< (Host.StatAvg "mem_free") ksm_pressure_threshold)
+            (change_npages ksm_pages_boost)
+            (change_npages ksm_pages_decay))
+    }
+)
+
+### Auto-Balloon ###############################################################
+
+### Constants
+# If the percentage of host free memory drops below this value
+# then we will consider the host to be under memory pressure
+(defvar pressure_threshold 0.20)
+
+# If pressure threshold drops below this level, then the pressure
+# is critical and more aggressive ballooning will be employed.
+(defvar pressure_critical 0.05)
+
+# This is the minimum percentage of free memory that an unconstrained
+# guest would like to maintain
+(defvar min_guest_free_percent 0.20)
+
+# Don't change a guest's memory by more than this percent of total memory
+(defvar max_balloon_change_percent 0.05)
+
+# Only ballooning operations that change the balloon by this percentage
+# of current guest memory should be undertaken to avoid overhead
+(defvar min_balloon_change_percent 0.0025)
+
+### Helper functions
+# Check if the proposed new balloon value is a large-enough
+# change to justify a balloon operation.  This prevents us from
+# introducing overhead through lots of small ballooning operations
+(def change_big_enough (guest new_val)
+{
+    (if (> (abs (- new_val guest.libvirt_curmem))
+           (* min_balloon_change_percent guest.libvirt_curmem))
+        1 0)
+})
+
+(def shrink_guest (guest)
+{
+    # Determine the degree of host memory pressure
+    (if (<= host_free_percent pressure_critical)
+        # Pressure is critical:
+        #   Force guest to swap by making free memory negative
+        (defvar guest_free_percent (+ -0.05 host_free_percent))
+        # Normal pressure situation
+        #   Scale the guest free memory back according to host pressure
+        (defvar guest_free_percent (* min_guest_free_percent
+                                      (/ host_free_percent pressure_threshold))))
+
+    # Given current conditions, determine the ideal guest memory size
+    (defvar guest_used_mem (- (guest.StatAvg "libvirt_curmem")
+                              (guest.StatAvg "mem_unused")))
+    (defvar balloon_min (+ guest_used_mem
+                           (* guest_free_percent guest.libvirt_maxmem)))
+    # But do not change it too fast
+    (defvar balloon_size (* guest.libvirt_curmem
+                            (- 1 max_balloon_change_percent)))
+    (if (< balloon_size balloon_min)
+        (set balloon_size balloon_min)
+        0)
+    # Set the new target for the BalloonController.  Only set it if the
+    # value makes sense and is a large enough change to be worth it.
+    (if (and (<= balloon_size guest.libvirt_maxmem)
+             (change_big_enough guest balloon_size))
+        (guest.Control "balloon_target" balloon_size)
+        0)
+})
+
+(def grow_guest (guest)
+{
+    # There is only work to do if the guest is ballooned
+    (if (< guest.libvirt_curmem guest.libvirt_maxmem) {
+        # Minimally, increase so the guest has its desired free memory
+        (defvar guest_used_mem (- (guest.StatAvg "libvirt_curmem")
+                                  (guest.StatAvg "mem_unused")))
+        (defvar balloon_min (+ guest_used_mem (* min_guest_free_percent
+                                                 guest.libvirt_maxmem)))
+        # Otherwise, increase according to the max balloon change
+        (defvar balloon_size (* guest.libvirt_curmem
+                                (+ 1 max_balloon_change_percent)))
+
+        # Determine the new target for the BalloonController.  Only set
+        # if the value is large enough for the change to be worth it.
+        (if (> balloon_size guest.libvirt_maxmem)
+            (set balloon_size guest.libvirt_maxmem) 0)
+        (if (< balloon_size balloon_min)
+            (set balloon_size balloon_min) 0)
+        (if (change_big_enough guest balloon_size)
+            (guest.Control "balloon_target" balloon_size) 0)
+    } 0)
+})
+
+### Main script
+# Methodology: The goal is to shrink all guests fairly and by an amount
+# scaled to the level of host memory pressure.  If the host is under
+# severe pressure, scale back more aggressively.  We don't yet handle
+# symptoms of over-ballooning guests or try to balloon idle guests more
+# aggressively.  When the host is not under memory pressure, slowly
+# deflate the balloons.
+
+(defvar host_free_percent (/ (Host.StatAvg "mem_free") Host.mem_available))
+(if (< host_free_percent pressure_threshold)
+    (with Guests guest (shrink_guest guest))
+    (with Guests guest (grow_guest guest)))
diff --git a/vdsm/momd.conf b/vdsm/momd.conf
new file mode 100644
index 0000000..4d09f44
--- /dev/null
+++ b/vdsm/momd.conf
@@ -0,0 +1,83 @@
+### DO NOT REMOVE THIS COMMENT -- MOM Configuration for VDSM ###
+
+[main]
+# The wake up frequency of the main daemon (in seconds)
+main-loop-interval: 5
+
+# The data collection interval for host statistics (in seconds)
+host-monitor-interval: 5
+
+# The data collection interval for guest statistics (in seconds)
+guest-monitor-interval: 5
+
+# The wake up frequency of the guest manager (in seconds).  The guest manager
+# sets up monitoring and control for newly-created guests and cleans up after
+# deleted guests.
+guest-manager-interval: 5
+
+# The wake up frequency of the policy engine (in seconds).  During each
+# interval the policy engine evaluates the policy and passes the results
+# to each enabled controller plugin.
+policy-engine-interval: 10
+
+# A comma-separated list of Controller plugins to enable
+controllers: Balloon, KSM
+
+# Sets the maximum number of statistic samples to keep for the purpose of
+# calculating moving averages.
+sample-history-length: 10
+
+# The URI to use when connecting to this host's libvirt interface.  If this is
+# left blank then the system default URI is used.
+libvirt-hypervisor-uri: qemu:///system
+
+# Set this to an existing, writable directory to enable plotting.  For each
+# invocation of the program a subdirectory momplot-NNN will be created where NNN
+# is a sequence number.  Within that directory, tab-delimited data files will be
+# created and updated with all data generated by the configured Collectors.
+plot-dir:
+
+# Activate the RPC server on the designated port (-1 to disable).  RPC is
+# disabled by default until authentication is added to the protocol.
+rpc-port: -1
+
+# At startup, load a policy from the given file.  If empty, no policy is loaded
+policy: /etc/vdsm/mom.policy
+
+[logging]
+# Set the destination for program log messages.  This can be either 'stdio' or
+# a filename.  When the log goes to a file, log rotation will be done
+# automatically.
+log: /var/log/momd.log
+
+# Set the logging verbosity level.  The following levels are supported:
+#    5 or debug: Debugging messages
+#    4 or info: Detailed messages concerning normal program operation
+#    3 or warn: Warning messages (program operation may be impacted)
+#    2 or error: Errors that severely impact program operation
+#    1 or critical: Emergency conditions
+# This option can be specified by number or name.
+verbosity: info
+
+## The following two variables are used only when logging is directed to a file.
+# Set the maximum size of a log file (in bytes) before it is rotated.
+max-bytes: 2097152
+# Set the maximum number of rotated logs to retain.
+backup-count: 5
+
+[host]
+# A comma-separated list of Collector plugins to use for Host data collection.
+collectors: HostMemory, HostKSM
+
+[guest]
+# A comma-separated list of Collector plugins to use for Guest data collection.
+collectors: GuestQemuProc, GuestLibvirt
+
+# Collector-specific configuration for GuestQemuAgent
+[Collector: GuestQemuAgent]
+# Set the base path where the host-side sockets for guest communication can be
+# found.  The GuestQemuAgent Collector will try to open files with the following
+# names:
+#   <socket_path>/va-<guest-name>-virtio.sock - for virtio serial
+#   <socket_path>/va-<guest-name>-isa.sock - for isa serial
+socket_path: /var/lib/libvirt/qemu
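[Editor's note: as a reading aid for the KSM section of mom.policy above, here is the same arithmetic in plain Python. This is an illustrative sketch only, not MOM code; sizes are in KiB, which makes the policy's 16777216 constant read as 16GB.]

```python
# Sketch of the KSM policy arithmetic (not MOM code).  Sizes are in KiB.
KSM_SLEEP_MS_BASELINE = 10          # ksm_sleep_ms_baseline in the policy
KSM_FREE_PERCENT = 0.20             # ksm_free_percent in the policy
SIXTEEN_GB_KIB = 16 * 1024 * 1024   # == 16777216, the constant in the policy

def ksm_sleep_millisecs(mem_available_kib):
    """Bigger hosts sleep less between ksmd scan cycles."""
    return KSM_SLEEP_MS_BASELINE * SIXTEEN_GB_KIB // mem_available_kib

def host_under_pressure(mem_free_avg_kib, mem_available_kib):
    """True when average free memory drops below ksm_free_percent of total
    memory (the policy's ksm_pressure_threshold)."""
    return mem_free_avg_kib < KSM_FREE_PERCENT * mem_available_kib

print(ksm_sleep_millisecs(SIXTEEN_GB_KIB))       # 10 (the baseline, 16GB host)
print(ksm_sleep_millisecs(2 * SIXTEEN_GB_KIB))   # 5  (32GB host scans twice as often)
```

So a host under pressure gets ksm_run enabled, a shorter ksmd sleep on larger machines, and pages_to_scan boosted or decayed depending on whether average free memory is below the threshold.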
On Tue, Nov 29, 2011 at 10:29:41AM -0600, Adam Litke wrote:
After discussing MOM / VDSM integration at length, two different strategies have emerged. I will call them Plan A and Plan B:
Plan A: MOM integration at the OS/Packaging level
Plan B: MOM integration as a new VDSM thread
This RFC is about Plan A. I will start another thread to discuss Plan B once I have properly prototyped the idea in code.
Integrating VDSM and MOM at the OS level is by far the simpler and less intrusive option. As you can see from the included patch, the changes to vdsm are very limited. In this model, VDSM interacts with MOM in the same way as it uses libvirt. Upon installation, VDSM installs its own MOM configuration file and restarts the MOM daemon (which continues to exist as an independent system-level daemon). Once restarted, MOM will load its policy from the VDSM configuration directory.
Pros:
- Simple and unobtrusive to either MOM or VDSM
- Clean API with no duplication or layering
- Maintain flexibility to tighten integration in the future
Cons:
- Momd runs as root (like supervdsm)
- If MOM will consume VDSM APIs, it must use the slower xmlrpc interface
I'm curious about the 'runs as root' bit. By listing it as a Con here, you imply that if it were a VDSM thread, it could run as non-root? What is it that Momd has to do that requires root when run outside of VDSM, yet is not a problem when inside VDSM? IOW, can it not be made to run as a 'momd' user/group when standalone?
IMHO, having a Momd that is a separate process from VDSM is pretty desirable from the security POV. It is much more practical to write a strict SELinux security policy for a fairly self-contained set of functionality like Momd, than to write a policy for the very broad functionality of VDSM as a whole.
Also, if Momd is separate, and talking to libvirt at all, then we will also be able to take advantage of libvirt's future RBAC controls, to strictly limit what changes Momd can do down to the fine level of individual guests or groups of guests. eg you'll be able to write SELinux policy that allows a process of type 'momd_t' to only perform libvirt operations X & Y, against guests labelled "svirt_BLAH_t" while allowing operations X, Y & Z against guest labelled with "svirt_FOO_t".
Regards, Daniel
On Tue, Nov 29, 2011 at 04:50:19PM +0000, Daniel P. Berrange wrote:
On Tue, Nov 29, 2011 at 10:29:41AM -0600, Adam Litke wrote:
After discussing MOM / VDSM integration at length, two different strategies have emerged. I will call them Plan A and Plan B:
Plan A: MOM integration at the OS/Packaging level
Plan B: MOM integration as a new VDSM thread
This RFC is about Plan A. I will start another thread to discuss Plan B once I have properly prototyped the idea in code.
Integrating VDSM and MOM at the OS level is by far the simpler and less intrusive option. As you can see from the included patch, the changes to vdsm are very limited. In this model, VDSM interacts with MOM in the same way as it uses libvirt. Upon installation, VDSM installs its own MOM configuration file and restarts the MOM daemon (which continues to exist as an independent system-level daemon). Once restarted, MOM will load its policy from the VDSM configuration directory.
Pros:
- Simple and unobtrusive to either MOM or VDSM
- Clean API with no duplication or layering
- Maintain flexibility to tighten integration in the future
Cons:
- Momd runs as root (like supervdsm)
- If MOM will consume VDSM APIs, it must use the slower xmlrpc interface
I'm curious about the 'runs as root' bit. By listing it as a Con here, you imply that if it were a VDSM thread, it could run as non-root? What is it that Momd has to do that requires root when run outside of VDSM, yet is not a problem when inside VDSM? IOW, can it not be made to run as a 'momd' user/group when standalone?
Very good questions -- thanks for raising them. I've listed it as a con because others have raised it as a concern. MOM runs as root for a few reasons: to connect to the qemu:///system libvirt URI, to connect to guest agent sockets, and (most difficult to mitigate) to reconfigure KSM via sysfs. As MOM's Controllers expand in functionality, the need for root access will increase.
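[Editor's note: for reference, the KSM knobs in question are plain root-owned files under /sys/kernel/mm/ksm (run, pages_to_scan, sleep_millisecs, ...), so "reconfigure KSM" boils down to writes like the sketch below. The `base` parameter is not part of any real API; it exists only so the helper can be exercised against a scratch directory without root.]

```python
import os

KSM_SYSFS = "/sys/kernel/mm/ksm"   # run, pages_to_scan, sleep_millisecs, ...

def set_ksm_tunable(name, value, base=KSM_SYSFS):
    """Write a KSM tunable.  On a real host this requires root, since the
    files under /sys/kernel/mm/ksm are writable only by uid 0."""
    path = os.path.join(base, name)
    with open(path, "w") as f:
        f.write(str(value))

# What MOM's KSM controller effectively does when the policy enables KSM:
#   set_ksm_tunable("run", 1)
#   set_ksm_tunable("pages_to_scan", 300)
#   set_ksm_tunable("sleep_millisecs", 10)
```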
The way vdsm works around this problem is that it spawns a 'supervdsm' process that has root privileges. To make MOM work from within a VDSM thread, all root-requiring operations must be routed through the supervdsm API. In my opinion, this is far from optimal and causes the MOM Controller API to become unnecessarily layered. It also peppers the vdsm codebase with bits of privileged code. Over time, this will become increasingly difficult to manage.
IMHO, having a Momd that is a separate process from VDSM is pretty desirable from the security POV. It is much more practical to write a strict SELinux security policy for a fairly self-contained set of functionality like Momd, than to write a policy for the very broad functionality of VDSM as a whole.
I agree with this...
Also, if Momd is separate, and talking to libvirt at all, then we will also be able to take advantage of libvirt's future RBAC controls, to strictly limit what changes Momd can do down to the fine level of individual guests or groups of guests. eg you'll be able to write SELinux policy that allows a process of type 'momd_t' to only perform libvirt operations X & Y, against guests labelled "svirt_BLAH_t" while allowing operations X, Y & Z against guest labelled with "svirt_FOO_t".
... and this. Additionally, when running standalone, MOM can focus on being a system tuning mechanism. Some tuning and stats collection will necessarily go through the VDSM API but some will not need to. I think Plan A represents the UNIX model of orchestrating small components together to perform complex tasks.
On Tue, Nov 29, 2011 at 11:18:42AM -0600, Adam Litke wrote:
On Tue, Nov 29, 2011 at 04:50:19PM +0000, Daniel P. Berrange wrote:
On Tue, Nov 29, 2011 at 10:29:41AM -0600, Adam Litke wrote:
After discussing MOM / VDSM integration at length, two different strategies have emerged. I will call them Plan A and Plan B:
Plan A: MOM integration at the OS/Packaging level
Plan B: MOM integration as a new VDSM thread
This RFC is about Plan A. I will start another thread to discuss Plan B once I have properly prototyped the idea in code.
Integrating VDSM and MOM at the OS level is by far the simpler and less intrusive option. As you can see from the included patch, the changes to vdsm are very limited. In this model, VDSM interacts with MOM in the same way as it uses libvirt. Upon installation, VDSM installs its own MOM configuration file and restarts the MOM daemon (which continues to exist as an independent system-level daemon). Once restarted, MOM will load its policy from the VDSM configuration directory.
Pros:
- Simple and unobtrusive to either MOM or VDSM
- Clean API with no duplication or layering
- Maintain flexibility to tighten integration in the future
Cons:
- Momd runs as root (like supervdsm)
- If MOM will consume VDSM APIs, it must use the slower xmlrpc interface
I'm curious about the 'runs as root' bit. By listing it as a Con here, you imply that if it were a VDSM thread, it could run as non-root? What is it that Momd has to do that requires root when run outside of VDSM, yet is not a problem when inside VDSM? IOW, can it not be made to run as a 'momd' user/group when standalone?
Very good questions -- thanks for raising them. I've listed it as a con because others have raised it as a concern. MOM runs as root for a few reasons: to connect to the qemu:///system libvirt URI, to connect to guest agent sockets, and (most difficult to mitigate) to reconfigure KSM via sysfs. As MOM's Controllers expand in functionality, the need for root access will increase.
FWIW, connecting to qemu:///system does not require root. Traditionally VDSM configures SASL, so all that would be required is to create a SASL username and password for Momd. Alternatively, if the default policykit auth is in effect for libvirtd, the mom RPM could simply drop a policy file into the right location to allow processes under the 'momd' UNIX user to connect.
I don't have a clear answer for the KSM thing & other tunables momd might need to deal with. There is perhaps a gap here for a system tunable daemon to provide an RPC service over DBus, which momd then uses to change sysfs tunables. Or something...
The way vdsm works around this problem is that it spawns a 'supervdsm' process that has root privileges. To make MOM work from within a VDSM thread, all root-requiring operations must be routed through the supervdsm API. In my opinion, this is far from optimal and causes the MOM Controller API to become unnecessarily layered. It also peppers the vdsm codebase with bits of privileged code. Over time, this will become increasingly difficult to manage.
Yeah, I think that is really sub-optimal from the security POV, because both VDSM & supervdsm cover a very broad range of functionality, so it will be hard to offer a meaningful level of security policy around them, as compared to a dedicated Momd daemon.
IMHO, having a Momd that is a separate process from VDSM is pretty desirable from the security POV. It is much more practical to write a strict SELinux security policy for a fairly self-contained set of functionality like Momd, than to write a policy for the very broad functionality of VDSM as a whole.
I agree with this...
Also, if Momd is separate, and talking to libvirt at all, then we will also be able to take advantage of libvirt's future RBAC controls, to strictly limit what changes Momd can do down to the fine level of individual guests or groups of guests. eg you'll be able to write SELinux policy that allows a process of type 'momd_t' to only perform libvirt operations X & Y, against guests labelled "svirt_BLAH_t" while allowing operations X, Y & Z against guest labelled with "svirt_FOO_t".
... and this. Additionally, when running standalone, MOM can focus on being a system tuning mechanism. Some tuning and stats collection will necessarily go through the VDSM API but some will not need to. I think Plan A represents the UNIX model of orchestrating small components together to perform complex tasks.
--
Adam Litke <agl@us.ibm.com>
IBM Linux Technology Center
On Tue, Nov 29, 2011 at 05:44:23PM +0000, Daniel P. Berrange wrote:
On Tue, Nov 29, 2011 at 11:18:42AM -0600, Adam Litke wrote:
On Tue, Nov 29, 2011 at 04:50:19PM +0000, Daniel P. Berrange wrote:
On Tue, Nov 29, 2011 at 10:29:41AM -0600, Adam Litke wrote:
After discussing MOM / VDSM integration at length, two different strategies have emerged. I will call them Plan A and Plan B:
Plan A: MOM integration at the OS/Packaging level
Plan B: MOM integration as a new VDSM thread
This RFC is about Plan A. I will start another thread to discuss Plan B once I have properly prototyped the idea in code.
Integrating VDSM and MOM at the OS level is by far the simpler and less intrusive option. As you can see from the included patch, the changes to vdsm are very limited. In this model, VDSM interacts with MOM in the same way as it uses libvirt. Upon installation, VDSM installs its own MOM configuration file and restarts the MOM daemon (which continues to exist as an independent system-level daemon). Once restarted, MOM will load its policy from the VDSM configuration directory.
Pros:
- Simple and unobtrusive to either MOM or VDSM
- Clean API with no duplication or layering
- Maintain flexibility to tighten integration in the future
Cons:
- Momd runs as root (like supervdsm)
- If MOM will consume VDSM APIs, it must use the slower xmlrpc interface
I'm curious about the 'runs as root' bit. By listing it as a Con here, you imply that if it were a VDSM thread, it could run as non-root? What is it that Momd has to do that requires root when run outside of VDSM, yet is not a problem when inside VDSM? IOW, can it not be made to run as a 'momd' user/group when standalone?
Very good questions -- thanks for raising them. I've listed it as a con because others have raised it as a concern. MOM runs as root for a few reasons: to connect to the qemu:///system libvirt URI, to connect to guest agent sockets, and (most difficult to mitigate) to reconfigure KSM via sysfs. As MOM's Controllers expand in functionality, the need for root access will increase.
FWIW, connecting to qemu:///system does not require root. Traditionally VDSM configures SASL, so all that would be required is to create a SASL username and password for Momd. Alternatively, if the default policykit auth is in effect for libvirtd, the mom RPM could simply drop a policy file into the right location to allow processes under the 'momd' UNIX user to connect.
Yep. I've already got a patch for MOM to use SASL for libvirt connections.
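[Editor's note: for anyone unfamiliar with libvirt's auth hooks, the mechanism under discussion looks roughly like the sketch below. This is not Adam's actual patch; the credential-type constants are copied from libvirt.h so the fragment stands alone, and the username/password values are placeholders.]

```python
# Sketch: connecting to qemu:///system as an unprivileged 'momd' user via
# SASL, using libvirt-python's openAuth().  Constant values match libvirt.h.
VIR_CRED_AUTHNAME = 2     # request for a username
VIR_CRED_PASSPHRASE = 5   # request for a password

def make_request_cred(username, password):
    """Build the credential callback that openAuth() invokes during auth."""
    def request_cred(credentials, user_data):
        for cred in credentials:
            if cred[0] == VIR_CRED_AUTHNAME:
                cred[4] = username        # cred[4] is the result slot
            elif cred[0] == VIR_CRED_PASSPHRASE:
                cred[4] = password
        return 0                          # 0 signals success to libvirt
    return request_cred

# Actual connection (requires libvirt-python and a configured SASL account):
#   import libvirt
#   auth = [[VIR_CRED_AUTHNAME, VIR_CRED_PASSPHRASE],
#           make_request_cred("momd", "secret"), None]
#   conn = libvirt.openAuth("qemu:///system", auth, 0)
```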
I don't have a clear answer for the KSM thing & other tunables momd might need to deal with. There is perhaps a gap here for a system tunable daemon to provide an RPC service over DBus, which momd then uses to change sysfs tunables. Or something...
At the end of the day we have to let something run as root, so why not MOMd? It is already slim and narrowly focused on stats collection and response. Introducing yet another daemon just adds complexity and requires us to change more components each time we want to do something. A "generic" tunable daemon would likely not be part of oVirt and would not adhere to our release cadence, thus decreasing our ability to add new functionality.
The way vdsm works around this problem is that it spawns a 'supervdsm' process that has root privileges. To make MOM work from within a VDSM thread, all root-requiring operations must be routed through the supervdsm API. In my opinion, this is far from optimal and causes the MOM Controller API to become unnecessarily layered. It also peppers the vdsm codebase with bits of privileged code. Over time, this will become increasingly difficult to manage.
Yeah, I think that is really sub-optimal from the security POV, because both VDSM & supervdsm cover a very broad range of functionality, so it will be hard to offer a meaningful level of security policy around them, as compared to a dedicated Momd daemon.
IMHO, having a Momd that is a separate process from VDSM is pretty desirable from the security POV. It is much more practical to write a strict SELinux security policy for a fairly self-contained set of functionality like Momd, than to write a policy for the very broad functionality of VDSM as a whole.
I agree with this...
Also, if Momd is separate, and talking to libvirt at all, then we will also be able to take advantage of libvirt's future RBAC controls, to strictly limit what changes Momd can do down to the fine level of individual guests or groups of guests. eg you'll be able to write SELinux policy that allows a process of type 'momd_t' to only perform libvirt operations X & Y, against guests labelled "svirt_BLAH_t" while allowing operations X, Y & Z against guest labelled with "svirt_FOO_t".
... and this. Additionally, when running standalone, MOM can focus on being a system tuning mechanism. Some tuning and stats collection will necessarily go through the VDSM API but some will not need to. I think Plan A represents the UNIX model of orchestrating small components together to perform complex tasks.
--
Adam Litke <agl@us.ibm.com>
IBM Linux Technology Center
--
|: http://berrange.com -o- http://www.flickr.com/photos/dberrange/ :|
|: http://libvirt.org -o- http://virt-manager.org :|
|: http://autobuild.org -o- http://search.cpan.org/~danberr/ :|
|: http://entangle-photo.org -o- http://live.gnome.org/gtk-vnc :|
On Tue, Nov 29, 2011 at 01:11:18PM -0600, Adam Litke wrote:
On Tue, Nov 29, 2011 at 05:44:23PM +0000, Daniel P. Berrange wrote:
On Tue, Nov 29, 2011 at 11:18:42AM -0600, Adam Litke wrote:
On Tue, Nov 29, 2011 at 04:50:19PM +0000, Daniel P. Berrange wrote:
On Tue, Nov 29, 2011 at 10:29:41AM -0600, Adam Litke wrote:
After discussing MOM / VDSM integration at length, two different strategies have emerged. I will call them Plan A and Plan B:
Plan A: MOM integration at the OS/Packaging level
Plan B: MOM integration as a new VDSM thread
This RFC is about Plan A. I will start another thread to discuss Plan B once I have properly prototyped the idea in code.
Integrating VDSM and MOM at the OS level is by far the simpler and less intrusive option. As you can see from the included patch, the changes to vdsm are very limited. In this model, VDSM interacts with MOM in the same way as it uses libvirt. Upon installation, VDSM installs its own MOM configuration file and restarts the MOM daemon (which continues to exist as an independent system-level daemon). Once restarted, MOM will load its policy from the VDSM configuration directory.
Pros:
- Simple and unobtrusive to either MOM or VDSM
- Clean API with no duplication or layering
- Maintain flexibility to tighten integration in the future
Cons:
- Momd runs as root (like supervdsm)
- If MOM will consume VDSM APIs, it must use the slower xmlrpc interface
I'm curious about the 'runs as root' bit. By listing it as a Con here, you imply that if it were a VDSM thread, it could run as non-root? What is it that Momd has to do that requires root when run outside of VDSM, yet is not a problem when inside VDSM? IOW, can it not be made to run as a 'momd' user/group when standalone?
Very good questions -- thanks for raising them. I've listed it as a con because others have raised it as a concern. MOM runs as root for a few reasons: to connect to the qemu:///system libvirt URI, to connect to guest agent sockets, and (most difficult to mitigate) to reconfigure KSM via sysfs. As MOM's Controllers expand in functionality, the need for root access will increase.
FWIW, connecting to qemu:///system does not require root. Traditionally VDSM configures SASL, so all that would be required is to create a SASL username and password for Momd. Alternatively, if the default policykit auth is in effect for libvirtd, the mom RPM could simply drop a policy file into the right location to allow processes under the 'momd' UNIX user to connect.
Yep. I've already got a patch for MOM to use SASL for libvirt connections.
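For readers following along, the shape of that SASL client code is roughly the following. This is a sketch, not MOM's actual patch: the credential constants are local stand-ins for libvirt.VIR_CRED_AUTHNAME / VIR_CRED_PASSPHRASE, and the username/password are invented.

```python
# Stand-ins for libvirt.VIR_CRED_* so this sketch runs without libvirt-python.
VIR_CRED_AUTHNAME = 2
VIR_CRED_PASSPHRASE = 5

def make_auth(username, password):
    """Build the (credtypes, callback, opaque) triple that openAuth expects."""
    def request_credentials(credentials, user_data):
        # libvirt passes a list of [type, prompt, challenge, default, result]
        # entries; the callback fills in slot 4 and returns 0 on success.
        for cred in credentials:
            if cred[0] == VIR_CRED_AUTHNAME:
                cred[4] = username
            elif cred[0] == VIR_CRED_PASSPHRASE:
                cred[4] = password
            else:
                return -1
        return 0
    return [[VIR_CRED_AUTHNAME, VIR_CRED_PASSPHRASE], request_credentials, None]

# With libvirt-python installed, a non-root momd would then connect with:
#   conn = libvirt.openAuth("qemu:///system", make_auth("momd", "secret"), 0)
```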
I don't have a clear answer for the KSM thing & other tunables momd might need to deal with. There is perhaps a gap here for a system tunables daemon that provides an RPC service over D-Bus, which momd would then use to change sysfs tunables. Or something...
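Concretely, the KSM knobs in question are plain single-value files under /sys/kernel/mm/ksm. A minimal stdlib-only sketch of driving them (the base argument exists purely so the snippet can be exercised against a scratch directory instead of real sysfs):

```python
import os

KSM_DIR = "/sys/kernel/mm/ksm"  # writing here is what requires root today

def set_ksm_tunable(name, value, base=KSM_DIR):
    # each tunable is a single-value file, e.g. run, pages_to_scan,
    # sleep_millisecs
    with open(os.path.join(base, name), "w") as f:
        f.write(str(value))

def ksm_start(pages_to_scan=64, sleep_ms=10, base=KSM_DIR):
    # order mirrors what a controller would do: configure, then enable
    set_ksm_tunable("pages_to_scan", pages_to_scan, base)
    set_ksm_tunable("sleep_millisecs", sleep_ms, base)
    set_ksm_tunable("run", 1, base)
```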
At the end of the day, we have to let something be root, so why not momd? It is already slim and narrowly focused on stats collection and response. Introducing yet another daemon just adds complexity and requires us to change more components each time we want to do something. A "generic" tunables daemon would likely not be part of oVirt and would not adhere to our release cadence, thus decreasing our ability to add new functionality.
I guess my desire is that the bit that runs as root should do as little as possible: ideally, it would simply be a conduit for reading and writing a well-defined set of resources. Momd has pluggable policy which can, to some degree, be considered untrusted compared to the rest of momd. So if it were possible to run the actual policy-processing part of momd separately from the bit of momd that actually interfaces with the kernel, that could be a useful security barrier.
That all said, I think that confining momd with an effective SELinux security policy is more important than running as non-root. So if we had to choose, I'd focus on the SELinux policy side of things.
Regards, Daniel
On Tue, Nov 29, 2011 at 07:21:08PM +0000, Daniel P. Berrange wrote:
On Tue, Nov 29, 2011 at 01:11:18PM -0600, Adam Litke wrote:
On Tue, Nov 29, 2011 at 05:44:23PM +0000, Daniel P. Berrange wrote:
On Tue, Nov 29, 2011 at 11:18:42AM -0600, Adam Litke wrote:
On Tue, Nov 29, 2011 at 04:50:19PM +0000, Daniel P. Berrange wrote:
On Tue, Nov 29, 2011 at 10:29:41AM -0600, Adam Litke wrote:
After discussing MOM / VDSM integration at length, two different strategies have emerged. I will call them Plan A and Plan B:
Plan A: MOM integration at the OS/Packaging level
Plan B: MOM integration as a new VDSM thread
This RFC is about Plan A. I will start another thread to discuss Plan B once I have properly prototyped the idea in code.
Integrating VDSM and MOM at the OS level is by far the simpler and less intrusive option. As you can see from the included patch, the changes to vdsm are very limited. In this model, VDSM interacts with MOM in the same way it uses libvirt. Upon installation, VDSM installs its own MOM configuration file and restarts the MOM daemon (which continues to exist as an independent system-level daemon). Once restarted, MOM will load its policy from the VDSM configuration directory.
Pros:
- Simple and unobtrusive to either MOM or VDSM
- Clean API with no duplication or layering
- Maintain flexibility to tighten integration in the future
Agreement is boring, but I prefer this approach, too. Vdsm is begging for a modular breakup as it is; if we imported mom into Vdsm, in no time we would be tempted to bypass the Vdsm API and create new dependencies.
Cons:
- Momd runs as root (like supervdsm)
- If MOM is to consume VDSM APIs, it must use the slower xmlrpc interface
I'm curious about the 'runs as root' bit. By listing it as a con here, you imply that if it were a VDSM thread, it could run as non-root? What is it that momd has to do that requires root when run outside of VDSM, yet is not a problem when inside VDSM? IOW, can it not be made to run as a 'momd' user/group when standalone?
Very good questions -- thanks for raising them. I've listed it as a con because others have raised it as a concern. MOM runs as root for a few reasons: to connect to the qemu:///system libvirt URI, to connect to guest agent sockets, and (most difficult to mitigate) to reconfigure KSM via sysfs. As MOM's Controllers expand in functionality, the need for root access will increase.
FWIW, connecting to qemu:///system does not require root. Traditionally VDSM configures SASL, so all that would be required is to create a SASL username and password for momd. Alternatively, if the default PolicyKit auth is in effect for libvirtd, the mom RPM could simply drop a policy file into the right location to allow processes under the 'momd' UNIX user to connect.
Yep. I've already got a patch for MOM to use SASL for libvirt connections.
I don't have a clear answer for the KSM thing & other tunables momd might need to deal with. There is perhaps a gap here for a system tunables daemon that provides an RPC service over D-Bus, which momd would then use to change sysfs tunables. Or something...
At the end of the day, we have to let something be root, so why not momd? It is already slim and narrowly focused on stats collection and response. Introducing yet another daemon just adds complexity and requires us to change more components each time we want to do something. A "generic" tunables daemon would likely not be part of oVirt and would not adhere to our release cadence, thus decreasing our ability to add new functionality.
It does not have to be generic - the standard way to solve this problem is for momd to walk the supervdsm route (but do it properly): start as root, fork, setuid(mom). The root-owned parent can expose a very limited set of functions (tune ksm, deflate guest X) to its more complex child.
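That supervdsm-style split might look like this in outline. A sketch with invented operation names; the point is that the root-owned side validates everything against a short, fixed whitelist while the complex policy engine runs unprivileged.

```python
import os

# Invented operation names, purely for illustration.
ALLOWED_OPS = {"tune_ksm", "deflate_guest"}

def handle_request(cmd):
    """Runs in the root-owned parent: refuse anything off the whitelist."""
    if cmd.get("op") not in ALLOWED_OPS:
        return {"ok": False, "error": "operation not permitted"}
    # ... perform the (narrow) privileged action here ...
    return {"ok": True}

def spawn_unprivileged(main, uid, gid):
    """Fork; the child drops root and runs the complex policy engine."""
    pid = os.fork()
    if pid == 0:
        os.setgid(gid)   # group first, then user, or setuid locks us out
        os.setuid(uid)
        os._exit(main())
    return pid
```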
One comment regarding the suggested patch: I do not understand what is so special about Vdsm that it should carry its own momd policy. It is not as if momd has a gazillion competing policies... I'd rather have momd deployed with all of the handful of policies we can devise. Vdsm's task would then only be to configure the default policy for momd to use on boot.
Dan.
On Fri, Dec 02, 2011 at 02:54:17AM +0200, Dan Kenigsberg wrote:
On Tue, Nov 29, 2011 at 07:21:08PM +0000, Daniel P. Berrange wrote:
On Tue, Nov 29, 2011 at 01:11:18PM -0600, Adam Litke wrote:
On Tue, Nov 29, 2011 at 05:44:23PM +0000, Daniel P. Berrange wrote:
On Tue, Nov 29, 2011 at 11:18:42AM -0600, Adam Litke wrote:
On Tue, Nov 29, 2011 at 04:50:19PM +0000, Daniel P. Berrange wrote:
On Tue, Nov 29, 2011 at 10:29:41AM -0600, Adam Litke wrote:
<snip>
Agreement is boring, but I prefer this approach, too. Vdsm is begging for a modular breakup as it is; if we imported mom into Vdsm, in no time we would be tempted to bypass the Vdsm API and create new dependencies.
Heh -- Boring? Maybe. Welcomed by me? Definitely! Thanks for your comments.
One comment regarding the suggested patch: I do not understand what is so special about Vdsm that it should carry its own momd policy. It is not as if momd has a gazillion competing policies... I'd rather have momd deployed with all of the handful of policies we can devise. Vdsm's task would then only be to configure the default policy for momd to use on boot.
Sure, that's fine. Actually, for this patch I just copied the existing MOM policy to demonstrate the idea of vdsm shipping a policy. If we want it to remain with MOM, that is fine by me.
On 11/29/2011 06:29 PM, Adam Litke wrote:
After discussing MOM / VDSM integration at length, two different strategies have emerged. I will call them Plan A and Plan B:
Plan A: MOM integration at the OS/Packaging level
Plan B: MOM integration as a new VDSM thread
I think a form of plan B is more appropriate:
In general, we can look at MOM vs. VDSM much like the microkernel vs. Linux kernel approach. MOM can be an independent project, but then it will need to expose many more APIs for VDSM and vice versa.
For example, take live migration: there is no point in MOM ballooning a guest while it is migrating. So either you ignore that, which is bad, or you now need to listen to VDSM events on VM migration.
Think about hot plugging a vcpu or PCI device into a VM - if MOM previously applied some SLA to the VM, it now needs to change to cope with the new resources; again, more APIs/events for that.
Another thing - all of the per-VM settings for KSM/THP/Swap/Balloon will need to propagate from the vdsm API towards MOM.
I could go on like this.
VDSM is not libvirt; it has policies today, and there is no need to split it into two or more pieces.
For completeness, I do think that there is a place for MOM-like functionality within the OS. But I think, for the best of the oVirt project's goals, it would be most efficient to host it all in VDSM and keep our actions VM-specific.
Thanks, Dor
This RFC is about Plan A. I will start another thread to discuss Plan B once I have properly prototyped the idea in code.
Integrating VDSM and MOM at the OS level is by far the simpler and less intrusive option. As you can see from the included patch, the changes to vdsm are very limited. In this model, VDSM interacts with MOM in the same way it uses libvirt. Upon installation, VDSM installs its own MOM configuration file and restarts the MOM daemon (which continues to exist as an independent system-level daemon). Once restarted, MOM will load its policy from the VDSM configuration directory.
Pros:
- Simple and unobtrusive to either MOM or VDSM
- Clean API with no duplication or layering
- Maintain flexibility to tighten integration in the future
Cons:
- Momd runs as root (like supervdsm)
- If MOM is to consume VDSM APIs, it must use the slower xmlrpc interface
Based on my experience while working on Plan A and Plan B, I feel that this approach is the best way to start. Once MOM and VDSM are commingled on the node, we can begin the interesting work of providing the actual dynamic policy to manage the system.
Sample code for Plan A follows:
commit 4464c07849cfd921d0e3446961c5b6471dd360d9
Author: Adam Litke <agl@us.ibm.com>
Date:   Mon Nov 28 08:46:22 2011 -0600
Integrate with MOM at the system/packaging level
diff --git a/vdsm.spec.in b/vdsm.spec.in
index cf12428..14588c6 100644
--- a/vdsm.spec.in
+++ b/vdsm.spec.in
@@ -151,6 +151,13 @@ rm -rf %{buildroot}
 /usr/sbin/saslpasswd2 -p -a libvirt vdsm@rhevh < \
     /etc/pki/vdsm/keys/libvirt_password
 
+# install the mom config file and restart momd
+if [ -f %{_sysconfdir}/momd.conf ]; then
+    mv -n %{_sysconfdir}/momd.conf %{_sysconfdir}/momd.conf.vdsmsave
+fi
+cp %{_sysconfdir}/%{vdsm_name}/momd.conf %{_sysconfdir}/momd.conf
+/sbin/service momd condrestart > /dev/null 2>&1
+
 %preun
 if [ "$1" -eq 0 ]
 then
@@ -176,6 +183,12 @@ _EOF
     /usr/sbin/saslpasswd2 -p -a libvirt -d vdsm@rhevh
 
+    # Restore old MOM configuration
+    if [ -f %{_sysconfdir}/momd.conf.vdsmsave ]; then
+        mv %{_sysconfdir}/momd.conf.vdsmsave %{_sysconfdir}/momd.conf
+        /sbin/service momd condrestart > /dev/null 2>&1
+    fi
+
 %if 0%{?rhel}
     if /sbin/initctl status libvirtd >/dev/null 2>/dev/null ; then
         /sbin/initctl stop libvirtd >/dev/null 2>/dev/null
@@ -246,6 +259,8 @@ fi
 %config(noreplace) %{_sysconfdir}/%{vdsm_name}/logger.conf
 %config(noreplace) %{_sysconfdir}/logrotate.d/vdsm
 %config(noreplace) %{_sysconfdir}/rwtab.d/vdsm
+%{_sysconfdir}/%{vdsm_name}/mom.policy
+%{_sysconfdir}/%{vdsm_name}/momd.conf
 %{_sysconfdir}/sudoers.d/50_vdsm
 %{_sysconfdir}/cron.hourly/vdsm-logrotate
 %{_sysconfdir}/cron.d/vdsm-libvirt-logrotate
diff --git a/vdsm/Makefile.am b/vdsm/Makefile.am
index 7da9cad..a96a323 100644
--- a/vdsm/Makefile.am
+++ b/vdsm/Makefile.am
@@ -83,7 +83,9 @@ EXTRA_DIST = \
 	vdsm-restore-net-config.in \
 	vdsm.rwtab \
 	vdsm-sosplugin.py.in \
-	vdsm-store-net-config.in
+	vdsm-store-net-config.in \
+	mom.policy \
+	momd.conf
 
 # Reference:
 # http://www.gnu.org/software/automake/manual/html_node/Scripts.html
@@ -115,7 +117,7 @@ install-data-hook:
 install-data-local: install-data-init install-data-logger \
 		    install-data-rwtab install-data-logrotate \
 		    install-data-sudoers install-data-sosplugin \
-		    install-data-libvirtpass
+		    install-data-libvirtpass install-data-mom
 	$(MKDIR_P) $(DESTDIR)$(vdsmtsdir)/keys
 	$(MKDIR_P) $(DESTDIR)$(vdsmtsdir)/certs
 	$(MKDIR_P) $(DESTDIR)$(vdsmlogdir)
@@ -128,7 +130,7 @@ install-data-local: install-data-init install-data-logger \
 uninstall-local: uninstall-data-init uninstall-data-logger \
 		 uninstall-data-rwtab uninstall-data-logrotate \
 		 uninstall-data-sudoers uninstall-data-sosplugin \
-		 uninstall-data-libvirtpass
+		 uninstall-data-libvirtpass uninstall-data-mom
 
 install-data-init:
 	$(MKDIR_P) $(DESTDIR)$(sysconfdir)/rc.d/init.d
@@ -191,3 +193,13 @@ install-data-sosplugin:
 
 uninstall-data-sosplugin:
 	$(RM) $(DESTDIR)$(pythondir)/sos/plugins/vdsm.py
+
+install-data-mom:
+	$(INSTALL_DATA) mom.policy \
+		$(DESTDIR)$(vdsmconfdir)/mom.policy
+	$(INSTALL_DATA) momd.conf \
+		$(DESTDIR)$(vdsmconfdir)/momd.conf
+
+uninstall-data-mom:
+	$(RM) $(DESTDIR)$(vdsmconfdir)/mom.policy
+	$(RM) $(DESTDIR)$(vdsmconfdir)/momd.conf
diff --git a/vdsm/mom.policy b/vdsm/mom.policy
new file mode 100644
index 0000000..cb31526
--- /dev/null
+++ b/vdsm/mom.policy
@@ -0,0 +1,155 @@
+### KSM ########################################################################
+
+### Constants
+# The number of pages to add when increasing pages_to_scan
+(defvar ksm_pages_boost 300)
+
+# The number of pages to subtract when decreasing pages_to_scan
+(defvar ksm_pages_decay -50)
+
+# The min and max number of pages to scan per cycle when ksm is activated
+(defvar ksm_npages_min 64)
+(defvar ksm_npages_max 1250)
+
+# The number of ms to sleep between ksmd scans for a 16GB system.  Systems with
+# more memory will sleep less, while smaller systems will sleep more.
+(defvar ksm_sleep_ms_baseline 10)
+
+# A virtualization host tends to use most of its memory for running guests but
+# a certain amount is reserved for the host OS, non virtualization-related work,
+# and as a failsafe.  When free memory (including memory used for caches) drops
+# below this percentage of total memory, the host is deemed under pressure and
+# KSM will be started to try and free up some memory.
+(defvar ksm_free_percent 0.20)
+
+### Helper functions
+(def change_npages (delta)
+{
+    (defvar newval (+ Host.ksm_pages_to_scan delta))
+    (if (> newval ksm_npages_max) (set newval ksm_npages_max) 1)
+    (if (< newval ksm_npages_min) (set newval ksm_npages_min) 0)
+    (Host.Control "ksm_pages_to_scan" newval)
+})
+
+### Main Script
+# Methodology: Since running KSM does incur some overhead, try to run it only
+# when necessary.  If the amount of committed KSM shareable memory is high or
+# if free memory is low, enable KSM to try to increase free memory.  Large
+# memory machines should scan more often than small ones.  Likewise, machines
+# under memory pressure should scan more aggressively than more idle machines.
+
+(defvar ksm_pressure_threshold (* Host.mem_available ksm_free_percent))
+(defvar ksm_committed Host.ksm_shareable)
+
+(if (and (< (+ ksm_pressure_threshold ksm_committed) Host.mem_available)
+         (> (Host.StatAvg "mem_free") ksm_pressure_threshold))
+    (Host.Control "ksm_run" 0)
+    { # else
+        (Host.Control "ksm_run" 1)
+        (Host.Control "ksm_sleep_millisecs"
+            (/ (* ksm_sleep_ms_baseline 16777216) Host.mem_available))
+        (if (< (Host.StatAvg "mem_free") ksm_pressure_threshold)
+            (change_npages ksm_pages_boost)
+            (change_npages ksm_pages_decay))
+    }
+)
+
+### Auto-Balloon ###############################################################
+
+### Constants
+# If the percentage of host free memory drops below this value
+# then we will consider the host to be under memory pressure
+(defvar pressure_threshold 0.20)
+
+# If pressure threshold drops below this level, then the pressure
+# is critical and more aggressive ballooning will be employed.
+(defvar pressure_critical 0.05)
+
+# This is the minimum percentage of free memory that an unconstrained
+# guest would like to maintain
+(defvar min_guest_free_percent 0.20)
+
+# Don't change a guest's memory by more than this percent of total memory
+(defvar max_balloon_change_percent 0.05)
+
+# Only ballooning operations that change the balloon by this percentage
+# of current guest memory should be undertaken to avoid overhead
+(defvar min_balloon_change_percent 0.0025)
+
+### Helper functions
+# Check if the proposed new balloon value is a large-enough
+# change to justify a balloon operation.  This prevents us from
+# introducing overhead through lots of small ballooning operations
+(def change_big_enough (guest new_val)
+{
+    (if (> (abs (- new_val guest.libvirt_curmem))
+           (* min_balloon_change_percent guest.libvirt_curmem))
+        1 0)
+})
+
+(def shrink_guest (guest)
+{
+    # Determine the degree of host memory pressure
+    (if (<= host_free_percent pressure_critical)
+        # Pressure is critical:
+        #   Force guest to swap by making free memory negative
+        (defvar guest_free_percent (+ -0.05 host_free_percent))
+        # Normal pressure situation
+        #   Scale the guest free memory back according to host pressure
+        (defvar guest_free_percent (* min_guest_free_percent
+                                      (/ host_free_percent pressure_threshold))))
+
+    # Given current conditions, determine the ideal guest memory size
+    (defvar guest_used_mem (- (guest.StatAvg "libvirt_curmem")
+                              (guest.StatAvg "mem_unused")))
+    (defvar balloon_min (+ guest_used_mem
+                           (* guest_free_percent guest.libvirt_maxmem)))
+    # But do not change it too fast
+    (defvar balloon_size (* guest.libvirt_curmem
+                            (- 1 max_balloon_change_percent)))
+    (if (< balloon_size balloon_min)
+        (set balloon_size balloon_min)
+        0)
+
+    # Set the new target for the BalloonController.  Only set it if the
+    # value makes sense and is a large enough change to be worth it.
+    (if (and (<= balloon_size guest.libvirt_maxmem)
+             (change_big_enough guest balloon_size))
+        (guest.Control "balloon_target" balloon_size)
+        0)
+})
+
+(def grow_guest (guest)
+{
+    # There is only work to do if the guest is ballooned
+    (if (< guest.libvirt_curmem guest.libvirt_maxmem) {
+        # Minimally, increase so the guest has its desired free memory
+        (defvar guest_used_mem (- (guest.StatAvg "libvirt_curmem")
+                                  (guest.StatAvg "mem_unused")))
+        (defvar balloon_min (+ guest_used_mem (* min_guest_free_percent
+                                                 guest.libvirt_maxmem)))
+        # Otherwise, increase according to the max balloon change
+        (defvar balloon_size (* guest.libvirt_curmem
+                                (+ 1 max_balloon_change_percent)))
+
+        # Determine the new target for the BalloonController.  Only set
+        # if the value is a large enough change to be worth it.
+        (if (> balloon_size guest.libvirt_maxmem)
+            (set balloon_size guest.libvirt_maxmem) 0)
+        (if (< balloon_size balloon_min)
+            (set balloon_size balloon_min) 0)
+        (if (change_big_enough guest balloon_size)
+            (guest.Control "balloon_target" balloon_size) 0)
+    } 0)
+})
+
+### Main script
+# Methodology: The goal is to shrink all guests fairly and by an amount
+# scaled to the level of host memory pressure.  If the host is under
+# severe pressure, scale back more aggressively.  We don't yet handle
+# symptoms of over-ballooning guests or try to balloon idle guests more
+# aggressively.  When the host is not under memory pressure, slowly
+# deflate the balloons.
+
+(defvar host_free_percent (/ (Host.StatAvg "mem_free") Host.mem_available))
+(if (< host_free_percent pressure_threshold)
+    (with Guests guest (shrink_guest guest))
+    (with Guests guest (grow_guest guest)))
diff --git a/vdsm/momd.conf b/vdsm/momd.conf
new file mode 100644
index 0000000..4d09f44
--- /dev/null
+++ b/vdsm/momd.conf
@@ -0,0 +1,83 @@
+### DO NOT REMOVE THIS COMMENT -- MOM Configuration for VDSM ###
+
+[main]
+# The wake up frequency of the main daemon (in seconds)
+main-loop-interval: 5
+
+# The data collection interval for host statistics (in seconds)
+host-monitor-interval: 5
+
+# The data collection interval for guest statistics (in seconds)
+guest-monitor-interval: 5
+
+# The wake up frequency of the guest manager (in seconds).  The guest manager
+# sets up monitoring and control for newly-created guests and cleans up after
+# deleted guests.
+guest-manager-interval: 5
+
+# The wake up frequency of the policy engine (in seconds).  During each
+# interval the policy engine evaluates the policy and passes the results
+# to each enabled controller plugin.
+policy-engine-interval: 10
+
+# A comma-separated list of Controller plugins to enable
+controllers: Balloon, KSM
+
+# Sets the maximum number of statistic samples to keep for the purpose of
+# calculating moving averages.
+sample-history-length: 10
+
+# The URI to use when connecting to this host's libvirt interface.  If this is
+# left blank then the system default URI is used.
+libvirt-hypervisor-uri: qemu:///system
+
+# Set this to an existing, writable directory to enable plotting.  For each
+# invocation of the program a subdirectory momplot-NNN will be created where
+# NNN is a sequence number.  Within that directory, tab-delimited data files
+# will be created and updated with all data generated by the configured
+# Collectors.
+plot-dir:
+
+# Activate the RPC server on the designated port (-1 to disable).  RPC is
+# disabled by default until authentication is added to the protocol.
+rpc-port: -1
+
+# At startup, load a policy from the given file.  If empty, no policy is loaded
+policy: /etc/vdsm/mom.policy
+
+[logging]
+# Set the destination for program log messages.  This can be either 'stdio'
+# or a filename.  When the log goes to a file, log rotation will be done
+# automatically.
+log: /var/log/momd.log
+
+# Set the logging verbosity level.  The following levels are supported:
+#   5 or debug:    Debugging messages
+#   4 or info:     Detailed messages concerning normal program operation
+#   3 or warn:     Warning messages (program operation may be impacted)
+#   2 or error:    Errors that severely impact program operation
+#   1 or critical: Emergency conditions
+# This option can be specified by number or name.
+verbosity: info
+
+## The following two variables are used only when logging is directed to a file.
+# Set the maximum size of a log file (in bytes) before it is rotated.
+max-bytes: 2097152
+# Set the maximum number of rotated logs to retain.
+backup-count: 5
+
+[host]
+# A comma-separated list of Collector plugins to use for Host data collection.
+collectors: HostMemory, HostKSM
+
+[guest]
+# A comma-separated list of Collector plugins to use for Guest data collection.
+collectors: GuestQemuProc, GuestLibvirt
+
+# Collector-specific configuration for GuestQemuAgent
+[Collector: GuestQemuAgent]
+# Set the base path where the host-side sockets for guest communication can be
+# found.  The GuestQemuAgent Collector will try to open files with the
+# following names:
+#   <socket_path>/va-<guest-name>-virtio.sock - for virtio serial
+#   <socket_path>/va-<guest-name>-isa.sock - for isa serial
+socket_path: /var/lib/libvirt/qemu
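[Editorially, as a sanity check on the arithmetic in the quoted policy: the KSM sleep scaling and the minimum-balloon-change test can be redone in isolation. Python; values in KiB as MOM's memory collectors report them, and integer division is an assumption about the policy language.]

```python
ksm_sleep_ms_baseline = 10
min_balloon_change_percent = 0.0025

def ksm_sleep_ms(mem_available_kib):
    # calibrated so a 16 GiB (16777216 KiB) host sleeps the baseline 10 ms;
    # bigger hosts sleep less and therefore scan more often
    return ksm_sleep_ms_baseline * 16777216 // mem_available_kib

def change_big_enough(curmem_kib, new_val_kib):
    # skip balloon operations smaller than 0.25% of current guest memory
    return abs(new_val_kib - curmem_kib) > min_balloon_change_percent * curmem_kib

print(ksm_sleep_ms(16 * 1024 * 1024))  # 16 GiB host -> 10
print(ksm_sleep_ms(64 * 1024 * 1024))  # 64 GiB host -> 2
```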
On Thu, Dec 01, 2011 at 11:05:55AM +0200, Dor Laor wrote:
On 11/29/2011 06:29 PM, Adam Litke wrote:
After discussing MOM / VDSM integration at length, two different strategies have emerged. I will call them Plan A and Plan B:
Plan A: MOM integration at the OS/Packaging level
Plan B: MOM integration as a new VDSM thread
I think a form of plan B is more appropriate:
In general, we can look at MOM vs. VDSM much like the microkernel vs. Linux kernel approach. MOM can be an independent project, but then it will need to expose many more APIs for VDSM and vice versa.
For example, take live migration: there is no point in MOM ballooning a guest while it is migrating. So either you ignore that, which is bad, or you now need to listen to VDSM events on VM migration.
Think about hot plugging a vcpu or PCI device into a VM - if MOM previously applied some SLA to the VM, it now needs to change to cope with the new resources; again, more APIs/events for that.
Another thing - all of the per-VM settings for KSM/THP/Swap/Balloon will need to propagate from the vdsm API towards MOM.
Indeed, disagreement is much more interesting! I think that the information that Vdsm is expected to provide to momd is quite limited and slowly-changing. We may be better off defining an API for Vdsm to notify momd of VM state changes than entangling mom within Vdsm.
Dan.
On Fri, Dec 02, 2011 at 03:08:24AM +0200, Dan Kenigsberg wrote:
<snip>
Indeed, disagreement is much more interesting! I think that the information that Vdsm is expected to provide to momd is quite limited and slowly-changing. We may be better off defining an API for Vdsm to notify momd of VM state changes than entangling mom within Vdsm.
Yep. We will need to write a MOM GuestVDSM Collector in order to gather statistics from vdsm guests. This Collector could also fetch guest events using the vdsm API.
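Such a GuestVDSM Collector could be as small as the following sketch. Hypothetical: the class shape follows MOM's collect()-returns-a-dict Collector convention, but the vdsm call name and stat fields are invented, and the transport object is injected so xmlrpc remains a detail.

```python
class GuestVDSM:
    """Hypothetical MOM Collector that pulls per-VM data from vdsm."""

    def __init__(self, vdsm_api, vm_uuid):
        self.api = vdsm_api   # e.g. an xmlrpc.client.ServerProxy in practice
        self.uuid = vm_uuid

    def collect(self):
        stats = self.api.getVmStats(self.uuid)  # invented call name
        # expose only what the policy engine needs, e.g. so a policy can
        # pause ballooning for a migrating guest
        return {
            "vdsm_state": stats.get("status"),
            "vdsm_migrating": stats.get("migrating", False),
        }

# A stub stands in for vdsm when exercising the Collector off-host:
class StubVdsmAPI:
    def getVmStats(self, uuid):
        return {"status": "Up", "migrating": False}
```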
On 12/02/2011 04:15 PM, Adam Litke wrote:
<snip>
Yep. We will need to write a MOM GuestVDSM Collector in order to gather statistics from vdsm guests. This Collector could also fetch guest events using the vdsm API.
All these APIs are nice and required regardless of whether vdsm hosts MOM or MOM is independent. Still, there is a *huge* difference in development speed and overhead when committing to external APIs.
Correct me if I'm wrong, but the only value of keeping MOM separate is that it can be used as a general-purpose policy tool for the OS.
As I said before, I do think that there is a place for MOM-like functionality within the OS. But I think, for the *best of the oVirt project's goals*, it would be most efficient to host it all in VDSM and keep our actions VM-specific.
I'm not saying this just to keep Dan K. interested :)
Dor
On Sun, Dec 04, 2011 at 11:07:43PM +0200, Dor Laor wrote:
<snip>
All these APIs are nice and required regardless of whether vdsm hosts MOM or MOM is independent. Still, there is a *huge* difference in development speed and overhead when committing to external APIs.
Correct me if I'm wrong, but the only value of keeping MOM separate is that it can be used as a general-purpose policy tool for the OS.
This is one advantage, but certainly not the only one. More importantly, as pointed out by Dan K. and Dan B., keeping it separate will encourage the modularization that is greatly needed in vdsm. As part of this modularization, it will be easier to see, specifically, what MOM is allowed to do, which makes writing an SELinux policy for the policy engine much easier.
As I said before, I do think that there is a place for MOM-like functionality within the OS. But I think, for the *best of the oVirt project's goals*, it would be most efficient to host it all in VDSM and keep our actions VM-specific.
I'm not saying this just to keep Dan K. interested :)
Dor
On 12/05/2011 03:55 PM, Adam Litke wrote:
On Sun, Dec 04, 2011 at 11:07:43PM +0200, Dor Laor wrote:
On 12/02/2011 04:15 PM, Adam Litke wrote:
On Fri, Dec 02, 2011 at 03:08:24AM +0200, Dan Kenigsberg wrote:
On Thu, Dec 01, 2011 at 11:05:55AM +0200, Dor Laor wrote:
On 11/29/2011 06:29 PM, Adam Litke wrote:
After discussing MOM / VDSM integration at length, two different strategies have emerged. I will call them Plan A and Plan B:
Plan A: MOM integration at the OS/Packaging level Plan B: MOM integration as a new VDSM thread
I think a form of plan B is more appropriate:
In general we can look at MOM vs VDSM just like micro kernel vs linux kernel approach. MOM can be independent project but then it will need to expose much more apis for VDSM and wise verse.
For example, take live migration, there is no point of MOM balloon a guest while it is migrating. So either you ignore that which is bad or now need to listen to VDSM events on VM migration.
Think about hot plug vcpu/pci-device to a VM - if before MOM used some SLA for the VM, now it will need to change to cope w/ the new resources, again more api/events for that.
Another thing - all of the settings for per VM KSM/THP/Swap/Balloon - all will need to propagate from the vdsm api towards MOM.
Indeed, disagreement is much more interesting! I think that the information that Vdsm is expected to provide to momd is quite limited and slowly-changing. We may be better off defining an API for Vdsm to notify momd of VM state changes, than to entangle mom within Vdsm.
Yep. We will need to write a MOM GuestVDSM Collector in order to gather statistics from vdsm guests. This Collector could also fetch guest events using the vdsm API.
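For concreteness, here is a rough sketch of what such a GuestVDSM Collector might look like, assuming MOM's plugin convention of a collect() method that returns a flat dict of fields. The vdsm client object, the vm_id handling, and the stat names (memAvailable and friends) are illustrative stand-ins, not the real vdsm API:

```python
# Hypothetical MOM Collector that pulls per-guest stats from vdsm.
# The Collector-style interface (collect() returning a flat dict)
# follows MOM's plugin convention; everything else is invented.

class GuestVDSMCollector(object):
    def __init__(self, vdsm_client, vm_id):
        self.client = vdsm_client   # any object with getVmStats(vm_id)
        self.vm_id = vm_id

    def getFields(self):
        # Field names this Collector contributes to the guest monitor.
        return set(['mem_available', 'mem_unused', 'balloon_cur'])

    def collect(self):
        stats = self.client.getVmStats(self.vm_id)
        return {
            'mem_available': stats.get('memAvailable'),
            'mem_unused': stats.get('memUnused'),
            'balloon_cur': stats.get('balloonCur'),
        }


# Example with a fake vdsm client standing in for the real RPC proxy:
class FakeVdsmClient(object):
    def getVmStats(self, vm_id):
        return {'memAvailable': 1024, 'memUnused': 512, 'balloonCur': 2048}

collector = GuestVDSMCollector(FakeVdsmClient(), 'vm-1')
print(collector.collect()['mem_unused'])   # -> 512
```

The same object could grow a poll loop for guest events; the point is that the vdsm-facing surface stays confined to one small plugin.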
All these APIs are nice and required regardless of whether vdsm hosts MOM or MOM is independent. Still, there is a *huge* difference in terms of development speed and overhead by committing to external APIs.
Correct me if I'm wrong but the only value of keeping MOM separate is that it will be used as a general purpose policy tool for the OS.
This is one advantage, but certainly not the only one. More importantly, as pointed out by Dan K. and Dan B., keeping it separate will encourage the modularization that is greatly needed in vdsm. As part of this modularization, it will be easier to see exactly what MOM is allowed to do, which makes writing an SELinux policy for the policy engine much easier.
It's not a reason to commit to two separate remote APIs that will have to be supported for a very long period. Modularization and internal APIs should be achieved regardless. Moreover, since there is no modularization today, committing too early to new APIs might cause us pain in the future.
So my offer is to do the modularization and define APIs between mom and vdsm but keep it all internal. After a year we'll be able to judge whether we got the right set and whether it might be worth spinning off MOM.
As I said before, I do think that there is a place for MOM-like functionality within the OS. But I think that, in the best interest of the *ovirt project goals*, it would be most efficient to host it all in VDSM and keep our actions VM-specific.
I'm not saying this just to keep Dan K. interested :) Dor
On Tue, Dec 06, 2011 at 12:42:24PM +0200, Dor Laor wrote:
<snip>
This is one advantage, but certainly not the only one. More importantly, as pointed out by Dan K. and Dan B., keeping it separate will encourage the modularization that is greatly needed in vdsm. As part of this modularization, it will be easier to see exactly what MOM is allowed to do, which makes writing an SELinux policy for the policy engine much easier.
It's not a reason to commit to two separate remote APIs that will have to be supported for a very long period. Modularization and internal APIs should be achieved regardless. Moreover, since there is no modularization today, committing too early to new APIs might cause us pain in the future.
So my offer is to do the modularization and define APIs between mom and vdsm but keep it all internal. After a year we'll be able to judge whether we got the right set and whether it might be worth spinning off MOM.
Here I tend to agree with Dor; as I expressed on this list before, I do not find the suggested MOM setPolicy() API very useful, and I never found peace with its policy definition language. I am not very keen to expose it to the world. Maybe I'm narrow-minded, imagining only the ovirt-engine use-case.
Dan.
On Tue, Dec 06, 2011 at 12:42:24PM +0200, Dor Laor wrote:
On 12/05/2011 03:55 PM, Adam Litke wrote:
On Sun, Dec 04, 2011 at 11:07:43PM +0200, Dor Laor wrote:
On 12/02/2011 04:15 PM, Adam Litke wrote:
On Fri, Dec 02, 2011 at 03:08:24AM +0200, Dan Kenigsberg wrote:
On Thu, Dec 01, 2011 at 11:05:55AM +0200, Dor Laor wrote:
On 11/29/2011 06:29 PM, Adam Litke wrote:
After discussing MOM / VDSM integration at length, two different strategies have emerged. I will call them Plan A and Plan B:
Plan A: MOM integration at the OS/Packaging level
Plan B: MOM integration as a new VDSM thread
I think a form of plan B is more appropriate:
In general we can look at MOM vs VDSM just like the micro kernel vs linux kernel approach. MOM can be an independent project, but then it will need to expose many more APIs for VDSM and vice versa.
For example, take live migration: there is no point in MOM ballooning a guest while it is migrating. So either you ignore that, which is bad, or you now need to listen to VDSM events on VM migration.
Think about hot plugging a vcpu/pci-device to a VM - if MOM previously applied some SLA to the VM, it will now need to change to cope w/ the new resources; again, more APIs/events for that.
Another thing - all of the settings for per-VM KSM/THP/Swap/Balloon will need to propagate from the vdsm API towards MOM.
Indeed, disagreement is much more interesting! I think that the information that Vdsm is expected to provide to momd is quite limited and slowly-changing. We may be better off defining an API for Vdsm to notify momd of VM state changes, than to entangle mom within Vdsm.
Yep. We will need to write a MOM GuestVDSM Collector in order to gather statistics from vdsm guests. This Collector could also fetch guest events using the vdsm API.
All these APIs are nice and required regardless of whether vdsm hosts MOM or MOM is independent. Still, there is a *huge* difference in terms of development speed and overhead by committing to external APIs.
Correct me if I'm wrong but the only value of keeping MOM separate is that it will be used as a general purpose policy tool for the OS.
This is one advantage, but certainly not the only one. More importantly, as pointed out by Dan K. and Dan B., keeping it separate will encourage the modularization that is greatly needed in vdsm. As part of this modularization, it will be easier to see exactly what MOM is allowed to do, which makes writing an SELinux policy for the policy engine much easier.
It's not a reason to commit to two separate remote APIs that will have to be supported for a very long period. Modularization and internal APIs should be achieved regardless. Moreover, since there is no modularization today, committing too early to new APIs might cause us pain in the future.
So my offer is to do the modularization and define APIs between mom and vdsm but keep it all internal. After a year we'll be able to judge whether we got the right set and whether it might be worth spinning off MOM.
Well, let's give this plan a try. Based on other planning and discussions, vdsm is going to gain quite a few new threads: QMF agent threads, REST API server threads, MOM threads. A good place to start poking might be to ensure that we can handle the additional complexity that comes with these extra threads.
----- Original Message -----
On Tue, Dec 06, 2011 at 12:42:24PM +0200, Dor Laor wrote:
On 12/05/2011 03:55 PM, Adam Litke wrote:
On Sun, Dec 04, 2011 at 11:07:43PM +0200, Dor Laor wrote:
On 12/02/2011 04:15 PM, Adam Litke wrote:
On Fri, Dec 02, 2011 at 03:08:24AM +0200, Dan Kenigsberg wrote:
On Thu, Dec 01, 2011 at 11:05:55AM +0200, Dor Laor wrote:
On 11/29/2011 06:29 PM, Adam Litke wrote:
After discussing MOM / VDSM integration at length, two different strategies have emerged. I will call them Plan A and Plan B:
Plan A: MOM integration at the OS/Packaging level
Plan B: MOM integration as a new VDSM thread
I think a form of plan B is more appropriate:
In general we can look at MOM vs VDSM just like the micro kernel vs linux kernel approach. MOM can be an independent project, but then it will need to expose many more APIs for VDSM and vice versa.
For example, take live migration: there is no point in MOM ballooning a guest while it is migrating. So either you ignore that, which is bad, or you now need to listen to VDSM events on VM migration.
Think about hot plugging a vcpu/pci-device to a VM - if MOM previously applied some SLA to the VM, it will now need to change to cope w/ the new resources; again, more APIs/events for that.
Another thing - all of the settings for per-VM KSM/THP/Swap/Balloon will need to propagate from the vdsm API towards MOM.
Indeed, disagreement is much more interesting! I think that the information that Vdsm is expected to provide to momd is quite limited and slowly-changing. We may be better off defining an API for Vdsm to notify momd of VM state changes, than to entangle mom within Vdsm.
Yep. We will need to write a MOM GuestVDSM Collector in order to gather statistics from vdsm guests. This Collector could also fetch guest events using the vdsm API.
All these APIs are nice and required regardless of whether vdsm hosts MOM or MOM is independent. Still, there is a *huge* difference in terms of development speed and overhead by committing to external APIs.
Correct me if I'm wrong but the only value of keeping MOM separate is that it will be used as a general purpose policy tool for the OS.
This is one advantage, but certainly not the only one. More importantly, as pointed out by Dan K. and Dan B., keeping it separate will encourage the modularization that is greatly needed in vdsm. As part of this modularization, it will be easier to see exactly what MOM is allowed to do, which makes writing an SELinux policy for the policy engine much easier.
It's not a reason to commit to two separate remote APIs that will have to be supported for a very long period. Modularization and internal APIs should be achieved regardless. Moreover, since there is no modularization today, committing too early to new APIs might cause us pain in the future.
So my offer is to do the modularization and define APIs between mom and vdsm but keep it all internal. After a year we'll be able to judge whether we got the right set and whether it might be worth spinning off MOM.
Well, let's give this plan a try. Based on other planning and discussions, vdsm is going to gain quite a few new threads: QMF agent threads, REST API server threads, MOM threads. A good place to start poking might be to ensure that we can handle the additional complexity that comes with these extra threads.
Which plan?
In general, I think that Mom APIs should be kept separate from vdsm, and vdsm would interact with a public mom API. On the other hand, I do not think mom should be a daemon, as that quickly leads to it keeping a lot of state that needs to be synchronized between vdsm and mom. You gave libvirt as an example, but it is a very bad one, as the architecture there is subpar at best. Suffice it to see that the #loc when vdsm communicated directly with qemu was smaller than it is now going through libvirt, without additional direct functionality (there is a lot that we gained using libvirt in the form of svirt and other configurations libvirt does, but this should not have affected the integration at all). The reason for the code bloat is having to synchronize 2 stateful daemons. I believe that were libvirt actually a lib, things would have been much simpler and more straightforward. I think the same applies to mom.
Because mom is a generic policy engine, it stands to reason that any mom-client would want the following:
1. send mom the client's current state and rules/policy
2. mom would compute required state changes
3. run client-specific changes
Doing number 3 would require either integrating with the client's public API or just passing back to the client a list of required changes and having it execute them. I don't think that using the public API in this scenario is the right thing to do.
I think that the current mom implementation mixes the client and the engine functionality. If you separate current mom into 2 projects, the rule engine and the rest (data collectors and execution of actions), you will find that the engine needs no special privileges at all. The current mapping problem is that you're trying to use the client you already wrote with vdsm, which is wrong, as vdsm already has collectors of its own and knows the problem-specific actions that it can run. The confusion stems from the fact that the problem domain is the same (throttling VMs).
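To illustrate the split being proposed here (purely a sketch; the policy representation, field names, and numbers are all invented for the example), a rule engine reduced to a pure function needs no privileges and keeps no daemon state, while the client owns both collection and execution:

```python
# Toy version of the engine/client split: evaluate() only maps
# (state, policy) -> requested changes; the client collects the state
# and executes the changes itself.

def evaluate(state, policy):
    """Pure function: no I/O, no root, no daemon state."""
    changes = []
    for rule in policy:
        if rule['when'](state):
            changes.append(rule['then'](state))
    return changes

# The client (vdsm, here) supplies its own collected state ...
state = {'host_free_mem': 200, 'balloon_cur': 2048}

# ... and a policy; here: shrink the balloon when host memory is tight.
policy = [{
    'when': lambda s: s['host_free_mem'] < 256,
    'then': lambda s: ('set_balloon', int(s['balloon_cur'] * 0.9)),
}]

for action, arg in evaluate(state, policy):
    # vdsm would dispatch to one of its own internal actions here.
    print(action, arg)   # -> set_balloon 1843
```

Nothing in the engine half needs to know how the state was gathered or how `set_balloon` is carried out, which is exactly why it needs no special privileges.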
-- Adam Litke agl@us.ibm.com IBM Linux Technology Center
vdsm-devel mailing list vdsm-devel@lists.fedorahosted.org https://fedorahosted.org/mailman/listinfo/vdsm-devel
On 12/08/2011 12:33 PM, Ayal Baron wrote:
----- Original Message -----
On Tue, Dec 06, 2011 at 12:42:24PM +0200, Dor Laor wrote:
On 12/05/2011 03:55 PM, Adam Litke wrote:
On Sun, Dec 04, 2011 at 11:07:43PM +0200, Dor Laor wrote:
On 12/02/2011 04:15 PM, Adam Litke wrote:
On Fri, Dec 02, 2011 at 03:08:24AM +0200, Dan Kenigsberg wrote:
On Thu, Dec 01, 2011 at 11:05:55AM +0200, Dor Laor wrote:
On 11/29/2011 06:29 PM, Adam Litke wrote:
After discussing MOM / VDSM integration at length, two different strategies have emerged. I will call them Plan A and Plan B:
Plan A: MOM integration at the OS/Packaging level
Plan B: MOM integration as a new VDSM thread
I think a form of plan B is more appropriate:
In general we can look at MOM vs VDSM just like the micro kernel vs linux kernel approach. MOM can be an independent project, but then it will need to expose many more APIs for VDSM and vice versa.
For example, take live migration: there is no point in MOM ballooning a guest while it is migrating. So either you ignore that, which is bad, or you now need to listen to VDSM events on VM migration.
Think about hot plugging a vcpu/pci-device to a VM - if MOM previously applied some SLA to the VM, it will now need to change to cope w/ the new resources; again, more APIs/events for that.
Another thing - all of the settings for per-VM KSM/THP/Swap/Balloon will need to propagate from the vdsm API towards MOM.
Indeed, disagreement is much more interesting! I think that the information that Vdsm is expected to provide to momd is quite limited and slowly-changing. We may be better off defining an API for Vdsm to notify momd of VM state changes, than to entangle mom within Vdsm.
Yep. We will need to write a MOM GuestVDSM Collector in order to gather statistics from vdsm guests. This Collector could also fetch guest events using the vdsm API.
All these APIs are nice and required regardless of whether vdsm hosts MOM or MOM is independent. Still, there is a *huge* difference in terms of development speed and overhead by committing to external APIs.
Correct me if I'm wrong but the only value of keeping MOM separate is that it will be used as a general purpose policy tool for the OS.
This is one advantage, but certainly not the only one. More importantly, as pointed out by Dan K. and Dan B., keeping it separate will encourage the modularization that is greatly needed in vdsm. As part of this modularization, it will be easier to see exactly what MOM is allowed to do, which makes writing an SELinux policy for the policy engine much easier.
It's not a reason to commit to two separate remote APIs that will have to be supported for a very long period. Modularization and internal APIs should be achieved regardless. Moreover, since there is no modularization today, committing too early to new APIs might cause us pain in the future.
So my offer is to do the modularization and define APIs between mom and vdsm but keep it all internal. After a year we'll be able to judge whether we got the right set and whether it might be worth spinning off MOM.
Well, let's give this plan a try. Based on other planning and discussions, vdsm is going to gain quite a few new threads: QMF agent threads, REST API server threads, MOM threads. A good place to start poking might be to ensure that we can handle the additional complexity that comes with these extra threads.
Which plan?
A plan of first hosting MOM inside VDSM w/o committing to support public APIs. After a definition/stabilization period it will be possible/sensible to spin off MOM for independence.
AFAIK, Dan and Adam are on board with this plan; I can't tell from the email what your opinion about it is.
It doesn't matter how small you might keep MOM, it will take time to define the right APIs and to commit to long-term support. If miraculously one might manage to do this really fast, then no problem at all. Reading previous posts, it seems like VDSM will benefit from that too for its internal APIs.
In general, I think that Mom APIs should be kept separate from vdsm, and vdsm would interact with a public mom API. On the other hand, I do not think mom should be a daemon, as that quickly leads to it keeping a lot of state that needs to be synchronized between vdsm and mom. You gave libvirt as an example, but it is a very bad one, as the architecture there is subpar at best. Suffice it to see that the #loc when vdsm communicated directly with qemu was smaller than it is now going through libvirt, without additional direct functionality (there is a lot that we gained using libvirt in the form of svirt and other configurations libvirt does, but this should not have affected the integration at all). The reason for the code bloat is having to synchronize 2 stateful daemons. I believe that were libvirt actually a lib, things would have been much simpler and more straightforward. I think the same applies to mom.
Because mom is a generic policy engine, it stands to reason that any mom-client would want the following:
- send mom client's current state and rules/policy
- mom would compute required state changes
- run client specific changes.
Doing number 3 would require either integrating with the client's public API or just passing back to the client a list of required changes and having it execute them. I don't think that using the public API in this scenario is the right thing to do.
I think that the current mom implementation mixes the client and the engine functionality. If you separate current mom into 2 projects, the rule engine and the rest (data collectors and execution of actions), you will find that the engine needs no special privileges at all. The current mapping problem is that you're trying to use the client you already wrote with vdsm, which is wrong, as vdsm already has collectors of its own and knows the problem-specific actions that it can run. The confusion stems from the fact that the problem domain is the same (throttling VMs).
----- Original Message -----
On 12/08/2011 12:33 PM, Ayal Baron wrote:
----- Original Message -----
On Tue, Dec 06, 2011 at 12:42:24PM +0200, Dor Laor wrote:
On 12/05/2011 03:55 PM, Adam Litke wrote:
On Sun, Dec 04, 2011 at 11:07:43PM +0200, Dor Laor wrote:
On 12/02/2011 04:15 PM, Adam Litke wrote:
On Fri, Dec 02, 2011 at 03:08:24AM +0200, Dan Kenigsberg wrote:
On Thu, Dec 01, 2011 at 11:05:55AM +0200, Dor Laor wrote:
On 11/29/2011 06:29 PM, Adam Litke wrote:
After discussing MOM / VDSM integration at length, two different strategies have emerged. I will call them Plan A and Plan B:
Plan A: MOM integration at the OS/Packaging level
Plan B: MOM integration as a new VDSM thread
I think a form of plan B is more appropriate:
In general we can look at MOM vs VDSM just like the micro kernel vs linux kernel approach. MOM can be an independent project, but then it will need to expose many more APIs for VDSM and vice versa.
For example, take live migration: there is no point in MOM ballooning a guest while it is migrating. So either you ignore that, which is bad, or you now need to listen to VDSM events on VM migration.
Think about hot plugging a vcpu/pci-device to a VM - if MOM previously applied some SLA to the VM, it will now need to change to cope w/ the new resources; again, more APIs/events for that.
Another thing - all of the settings for per-VM KSM/THP/Swap/Balloon will need to propagate from the vdsm API towards MOM.
Indeed, disagreement is much more interesting! I think that the information that Vdsm is expected to provide to momd is quite limited and slowly-changing. We may be better off defining an API for Vdsm to notify momd of VM state changes, than to entangle mom within Vdsm.
Yep. We will need to write a MOM GuestVDSM Collector in order to gather statistics from vdsm guests. This Collector could also fetch guest events using the vdsm API.
All these APIs are nice and required regardless of whether vdsm hosts MOM or MOM is independent. Still, there is a *huge* difference in terms of development speed and overhead by committing to external APIs.
Correct me if I'm wrong but the only value of keeping MOM separate is that it will be used as a general purpose policy tool for the OS.
This is one advantage, but certainly not the only one. More importantly, as pointed out by Dan K. and Dan B., keeping it separate will encourage the modularization that is greatly needed in vdsm. As part of this modularization, it will be easier to see exactly what MOM is allowed to do, which makes writing an SELinux policy for the policy engine much easier.
It's not a reason to commit to two separate remote APIs that will have to be supported for a very long period. Modularization and internal APIs should be achieved regardless. Moreover, since there is no modularization today, committing too early to new APIs might cause us pain in the future.
So my offer is to do the modularization and define APIs between mom and vdsm but keep it all internal. After a year we'll be able to judge whether we got the right set and whether it might be worth spinning off MOM.
Well, let's give this plan a try. Based on other planning and discussions, vdsm is going to gain quite a few new threads: QMF agent threads, REST API server threads, MOM threads. A good place to start poking might be to ensure that we can handle the additional complexity that comes with these extra threads.
Which plan?
A plan of first hosting MOM inside VDSM w/o committing to support public APIs. After a definition/stabilization period it will be possible/sensible to spin off MOM for independence.
AFAIK, Dan and Adam are on board with this plan; I can't tell from the email what your opinion about it is.
It doesn't matter how small you might keep MOM, it will take time to define the right APIs and to commit to long-term support. If miraculously one might manage to do this really fast, then no problem at all. Reading previous posts, it seems like VDSM will benefit from that too for its internal APIs.
We can build it as a library from the get-go; that would immediately make sure we're passing things the right way, and it wouldn't mean that mom has to commit to a stable API. My take on it is that we shouldn't integrate with yet another daemon competing on the same resources (VMs), and that full integration as part of vdsm is wrong as well.
In general, I think that Mom APIs should be kept separate from vdsm, and vdsm would interact with a public mom API. On the other hand, I do not think mom should be a daemon, as that quickly leads to it keeping a lot of state that needs to be synchronized between vdsm and mom. You gave libvirt as an example, but it is a very bad one, as the architecture there is subpar at best. Suffice it to see that the #loc when vdsm communicated directly with qemu was smaller than it is now going through libvirt, without additional direct functionality (there is a lot that we gained using libvirt in the form of svirt and other configurations libvirt does, but this should not have affected the integration at all). The reason for the code bloat is having to synchronize 2 stateful daemons. I believe that were libvirt actually a lib, things would have been much simpler and more straightforward. I think the same applies to mom.
Because mom is a generic policy engine, it stands to reason that any mom-client would want the following:
- send mom client's current state and rules/policy
- mom would compute required state changes
- run client specific changes.
Doing number 3 would require either integrating with the client's public API or just passing back to the client a list of required changes and having it execute them. I don't think that using the public API in this scenario is the right thing to do.
I think that the current mom implementation mixes the client and the engine functionality. If you separate current mom into 2 projects, the rule engine and the rest (data collectors and execution of actions), you will find that the engine needs no special privileges at all. The current mapping problem is that you're trying to use the client you already wrote with vdsm, which is wrong, as vdsm already has collectors of its own and knows the problem-specific actions that it can run. The confusion stems from the fact that the problem domain is the same (throttling VMs).
On Thu, Dec 08, 2011 at 10:04:45AM -0500, Ayal Baron wrote:
----- Original Message -----
On 12/08/2011 12:33 PM, Ayal Baron wrote:
----- Original Message -----
On Tue, Dec 06, 2011 at 12:42:24PM +0200, Dor Laor wrote:
On 12/05/2011 03:55 PM, Adam Litke wrote:
On Sun, Dec 04, 2011 at 11:07:43PM +0200, Dor Laor wrote:
On 12/02/2011 04:15 PM, Adam Litke wrote:
On Fri, Dec 02, 2011 at 03:08:24AM +0200, Dan Kenigsberg wrote:
On Thu, Dec 01, 2011 at 11:05:55AM +0200, Dor Laor wrote:
On 11/29/2011 06:29 PM, Adam Litke wrote:
After discussing MOM / VDSM integration at length, two different strategies have emerged. I will call them Plan A and Plan B:
Plan A: MOM integration at the OS/Packaging level
Plan B: MOM integration as a new VDSM thread
I think a form of plan B is more appropriate:
In general we can look at MOM vs VDSM just like the micro kernel vs linux kernel approach. MOM can be an independent project, but then it will need to expose many more APIs for VDSM and vice versa.
For example, take live migration: there is no point in MOM ballooning a guest while it is migrating. So either you ignore that, which is bad, or you now need to listen to VDSM events on VM migration.
Think about hot plugging a vcpu/pci-device to a VM - if MOM previously applied some SLA to the VM, it will now need to change to cope w/ the new resources; again, more APIs/events for that.
Another thing - all of the settings for per-VM KSM/THP/Swap/Balloon will need to propagate from the vdsm API towards MOM.
Indeed, disagreement is much more interesting! I think that the information that Vdsm is expected to provide to momd is quite limited and slowly-changing. We may be better off defining an API for Vdsm to notify momd of VM state changes, than to entangle mom within Vdsm.
Yep. We will need to write a MOM GuestVDSM Collector in order to gather statistics from vdsm guests. This Collector could also fetch guest events using the vdsm API.
All these APIs are nice and required regardless of whether vdsm hosts MOM or MOM is independent. Still, there is a *huge* difference in terms of development speed and overhead by committing to external APIs.
Correct me if I'm wrong but the only value of keeping MOM separate is that it will be used as a general purpose policy tool for the OS.
This is one advantage, but certainly not the only one. More importantly, as pointed out by Dan K. and Dan B., keeping it separate will encourage the modularization that is greatly needed in vdsm. As part of this modularization, it will be easier to see exactly what MOM is allowed to do, which makes writing an SELinux policy for the policy engine much easier.
It's not a reason to commit to two separate remote APIs that will have to be supported for a very long period. Modularization and internal APIs should be achieved regardless. Moreover, since there is no modularization today, committing too early to new APIs might cause us pain in the future.
So my offer is to do the modularization and define APIs between mom and vdsm but keep it all internal. After a year we'll be able to judge whether we got the right set and whether it might be worth spinning off MOM.
Well, let's give this plan a try. Based on other planning and discussions, vdsm is going to gain quite a few new threads: QMF agent threads, REST API server threads, MOM threads. A good place to start poking might be to ensure that we can handle the additional complexity that comes with these extra threads.
Which plan?
A plan of first hosting MOM inside VDSM w/o committing to support public APIs. After a definition/stabilization period it will be possible/sensible to spin off MOM for independence.
AFAIK, Dan and Adam are on board with this plan; I can't tell from the email what your opinion about it is.
Well, that was not my original intent, but I'm perfectly fine with Vdsm using `import mom` and gaining all its capabilities via an internal - but well-defined - API. I hope mom would not have to spawn a root-owned process for that, but even that is acceptable to me.
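As a sketch of what that could look like (all names here are hypothetical; MOM does not currently expose this exact library interface), vdsm would own the process and talk to mom through plain method calls instead of IPC:

```python
import threading

class MomLibrary(object):
    """Hypothetical in-process MOM facade: vdsm instantiates it,
    pushes VM lifecycle events into it, and swaps policies through
    an internal, well-defined API -- no second daemon, no IPC."""

    def __init__(self):
        self._policy = None
        self._suspended = set()        # VMs mom must leave alone
        self._lock = threading.Lock()

    def setPolicy(self, policy_text):
        with self._lock:
            self._policy = policy_text

    def notify(self, event, vm_id):
        # e.g. vdsm tells mom a VM started migrating, so mom can
        # suspend ballooning for it (the migration case raised
        # earlier in this thread).
        with self._lock:
            if event == 'migration_start':
                self._suspended.add(vm_id)
            elif event == 'migration_end':
                self._suspended.discard(vm_id)

    def may_balloon(self, vm_id):
        with self._lock:
            return vm_id not in self._suspended

mom = MomLibrary()                     # vdsm owns the object
mom.setPolicy("(defvar pressure_threshold 0.20)")
mom.notify('migration_start', 'vm-1')
print(mom.may_balloon('vm-1'))   # -> False
```

Whether the policy text stays in MOM's existing lisp-style syntax is a separate question; the point is only that the event and policy paths are plain calls that can later be frozen into a public API if MOM is spun off.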
It doesn't matter how small you might keep MOM, it will take time to define the right APIs and to commit to long-term support. If miraculously one might manage to do this really fast, then no problem at all. Reading previous posts, it seems like VDSM will benefit from that too for its internal APIs.
We can build it as a library from the get-go; that would immediately make sure we're passing things the right way, and it wouldn't mean that mom has to commit to a stable API. My take on it is that we shouldn't integrate with yet another daemon competing on the same resources (VMs), and that full integration as part of vdsm is wrong as well.
Dan.
On Thu, Dec 01, 2011 at 11:05:55AM +0200, Dor Laor wrote:
On 11/29/2011 06:29 PM, Adam Litke wrote:
After discussing MOM / VDSM integration at length, two different strategies have emerged. I will call them Plan A and Plan B:
Plan A: MOM integration at the OS/Packaging level
Plan B: MOM integration as a new VDSM thread
I think a form of plan B is more appropriate:
In general we can look at MOM vs VDSM just like the micro kernel vs linux kernel approach. MOM can be an independent project, but then it will need to expose many more APIs for VDSM and vice versa.
Using either Plan A or Plan B, MOM will need to support data collection from vdsm and controlling/tuning via vdsm. The principal difference is whether MOM is required to interact with vdsm using a stable, external API (Plan A) or if we allow it to tightly couple with vdsm and call internal functions (Plan B). We all seem to agree that a public vdsm API should be created. MOM can be the perfect early adopter.
For example, take live migration: there is no point in MOM ballooning a guest while it is migrating. So either you ignore that, which is bad, or you now need to listen to VDSM events on VM migration.
Think about hot plugging a vcpu/pci-device to a VM - if MOM previously applied some SLA to the VM, it will now need to change to cope w/ the new resources; again, more APIs/events for that.
These are interesting cases and tie into the discussion about a VDSM public API. I can say with certainty that ISV applications would care about VM migration and device hotplug events. If these are made a part of the external API then MOM can easily listen for them and react appropriately. Yes, this will force us to think carefully about which APIs to expose but in the end that is a good thing.
Another thing - all of the settings for per-VM KSM/THP/Swap/Balloon will need to propagate from the vdsm API towards MOM.
Currently MOM does not control individual guest OS tunables, but if it gains that ability, it will be a part of the policy. MOM already provides an API for dynamically replacing the policy.
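MOM's policy-replacement call is exposed on the daemon's RPC interface; the round trip below fakes the daemon side with a stdlib XML-RPC server so the example is self-contained (the port is OS-assigned and the one-line policy text is illustrative, not a complete MOM policy):

```python
import threading
from xmlrpc.server import SimpleXMLRPCServer
from xmlrpc.client import ServerProxy

active = {}

def setPolicy(text):
    # A real momd would parse and validate the policy here; the fake
    # daemon just records it.
    active['policy'] = text
    return True

# Stand-in for momd's RPC interface, bound to an OS-assigned port.
server = SimpleXMLRPCServer(('localhost', 0), logRequests=False)
server.register_function(setPolicy)
# Serve exactly one request in the background, then the thread exits.
threading.Thread(target=server.handle_request).start()

# What a management client (or vdsm) would do to swap the live policy:
port = server.server_address[1]
ServerProxy('http://localhost:%d' % port).setPolicy(
    '(defvar pressure_threshold 0.20)')
print(active['policy'])   # -> (defvar pressure_threshold 0.20)
```

Since the call is synchronous, the new policy is in force by the time the client's setPolicy() returns, which is what makes dynamic replacement usable from a management flow.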
I can go on like this.
VDSM is not libvirt; it has policies today, and there is no need to split it up into two or more.
For completeness, I do think that there is a place for MOM-like functionality within the OS. But I think that, in the best interest of the ovirt project goals, it would be most efficient to host it all in VDSM and keep our actions VM-specific.
Thanks, Dor