After discussing MOM / VDSM integration at length, two different strategies have emerged. I will call them Plan A and Plan B:
Plan A: MOM integration at the OS/Packaging level
Plan B: MOM integration as a new VDSM thread
This RFC is about Plan A. I will start another thread to discuss Plan B once I have properly prototyped the idea in code.
Integrating VDSM and MOM at the OS level is by far the simplest and least intrusive option. As you can see from the included patch, the changes to VDSM are very limited. In this model, VDSM interacts with MOM in the same way it uses libvirt. Upon installation, VDSM installs its own MOM configuration file and restarts the MOM daemon (which continues to exist as an independent system-level daemon). Once restarted, MOM loads its policy from the VDSM configuration directory.
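For illustration, the install-time flow described above can be sketched in Python. The paths mirror the spec file scriptlet in the included patch; the function name and the restart toggle are purely illustrative:

```python
import os
import shutil
import subprocess

def install_vdsm_mom_config(sysconfdir="/etc", vdsm_name="vdsm", restart=True):
    """Sketch of the %post scriptlet: preserve any existing momd.conf,
    install the VDSM-supplied one, then conditionally restart momd."""
    momd_conf = os.path.join(sysconfdir, "momd.conf")
    backup = momd_conf + ".vdsmsave"
    vdsm_conf = os.path.join(sysconfdir, vdsm_name, "momd.conf")
    if os.path.isfile(momd_conf) and not os.path.exists(backup):
        shutil.move(momd_conf, backup)  # like `mv -n`: never clobber a backup
    shutil.copy(vdsm_conf, momd_conf)
    if restart:
        # condrestart only restarts momd if it is already running
        subprocess.call(["/sbin/service", "momd", "condrestart"])
```

On package removal, the %preun scriptlet reverses this by moving momd.conf.vdsmsave back into place.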
Pros:
 - Simple and unobtrusive to both MOM and VDSM
 - Clean API with no duplication or layering
 - Maintains flexibility to tighten the integration in the future
Cons:
 - momd runs as root (like supervdsm)
 - If MOM is to consume VDSM APIs, it must use the slower xmlrpc interface
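On the second con: each MOM-to-VDSM call would pay XML marshalling costs on both ends in addition to the work itself. A minimal sketch using Python's standard xmlrpc library (the `list` method name and its arguments are illustrative, not VDSM's actual API):

```python
from xmlrpc.client import dumps, loads

# Marshal a hypothetical request the way an XML-RPC client library would.
# Every argument is encoded to XML text on the caller's side...
request = dumps((["vm-uuid-1", "vm-uuid-2"],), methodname="list")

# ...and decoded again on the server side before any real work happens.
params, method = loads(request)
```

A binary or in-process interface avoids this encode/decode round trip entirely, which is part of the appeal of Plan B's tighter integration.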
Based on my experience while working on Plan A and Plan B, I feel that this approach is the best way to start. Once MOM and VDSM are commingled on the node, we can begin the interesting work of providing the actual dynamic policy to manage the system.
Sample code for Plan A follows:
commit 4464c07849cfd921d0e3446961c5b6471dd360d9
Author: Adam Litke <agl@us.ibm.com>
Date:   Mon Nov 28 08:46:22 2011 -0600

    Integrate with MOM at the system/packaging level
diff --git a/vdsm.spec.in b/vdsm.spec.in
index cf12428..14588c6 100644
--- a/vdsm.spec.in
+++ b/vdsm.spec.in
@@ -151,6 +151,13 @@ rm -rf %{buildroot}
 /usr/sbin/saslpasswd2 -p -a libvirt vdsm@rhevh < \
     /etc/pki/vdsm/keys/libvirt_password
 
+# install the mom config file and restart momd
+if [ -f %{_sysconfdir}/momd.conf ]; then
+    mv -n %{_sysconfdir}/momd.conf %{_sysconfdir}/momd.conf.vdsmsave
+fi
+cp %{_sysconfdir}/%{vdsm_name}/momd.conf %{_sysconfdir}/momd.conf
+/sbin/service momd condrestart > /dev/null 2>&1
+
 %preun
 if [ "$1" -eq 0 ]
 then
@@ -176,6 +183,12 @@ _EOF
 
     /usr/sbin/saslpasswd2 -p -a libvirt -d vdsm@rhevh
 
+    # Restore old MOM configuration
+    if [ -f %{_sysconfdir}/momd.conf.vdsmsave ]; then
+        mv %{_sysconfdir}/momd.conf.vdsmsave %{_sysconfdir}/momd.conf
+        /sbin/service momd condrestart > /dev/null 2>&1
+    fi
+
 %if 0%{?rhel}
     if /sbin/initctl status libvirtd >/dev/null 2>/dev/null ; then
         /sbin/initctl stop libvirtd >/dev/null 2>/dev/null
@@ -246,6 +259,8 @@ fi
 %config(noreplace) %{_sysconfdir}/%{vdsm_name}/logger.conf
 %config(noreplace) %{_sysconfdir}/logrotate.d/vdsm
 %config(noreplace) %{_sysconfdir}/rwtab.d/vdsm
+%{_sysconfdir}/%{vdsm_name}/mom.policy
+%{_sysconfdir}/%{vdsm_name}/momd.conf
 %{_sysconfdir}/sudoers.d/50_vdsm
 %{_sysconfdir}/cron.hourly/vdsm-logrotate
 %{_sysconfdir}/cron.d/vdsm-libvirt-logrotate
diff --git a/vdsm/Makefile.am b/vdsm/Makefile.am
index 7da9cad..a96a323 100644
--- a/vdsm/Makefile.am
+++ b/vdsm/Makefile.am
@@ -83,7 +83,9 @@ EXTRA_DIST = \
 	vdsm-restore-net-config.in \
 	vdsm.rwtab \
 	vdsm-sosplugin.py.in \
-	vdsm-store-net-config.in
+	vdsm-store-net-config.in \
+	mom.policy \
+	momd.conf
 
 # Reference:
 # http://www.gnu.org/software/automake/manual/html_node/Scripts.html
@@ -115,7 +117,7 @@ install-data-hook:
 install-data-local: install-data-init install-data-logger \
 		install-data-rwtab install-data-logrotate \
 		install-data-sudoers install-data-sosplugin \
-		install-data-libvirtpass
+		install-data-libvirtpass install-data-mom
 	$(MKDIR_P) $(DESTDIR)$(vdsmtsdir)/keys
 	$(MKDIR_P) $(DESTDIR)$(vdsmtsdir)/certs
 	$(MKDIR_P) $(DESTDIR)$(vdsmlogdir)
@@ -128,7 +130,7 @@ install-data-local: install-data-init install-data-logger \
 uninstall-local: uninstall-data-init uninstall-data-logger \
 		uninstall-data-rwtab uninstall-data-logrotate \
 		uninstall-data-sudoers uninstall-data-sosplugin \
-		uninstall-data-libvirtpass
+		uninstall-data-libvirtpass uninstall-data-mom
 
 install-data-init:
 	$(MKDIR_P) $(DESTDIR)$(sysconfdir)/rc.d/init.d
@@ -191,3 +193,13 @@ install-data-sosplugin:
 uninstall-data-sosplugin:
 	$(RM) $(DESTDIR)$(pythondir)/sos/plugins/vdsm.py
+
+install-data-mom:
+	$(INSTALL_DATA) mom.policy \
+		$(DESTDIR)$(vdsmconfdir)/mom.policy
+	$(INSTALL_DATA) momd.conf \
+		$(DESTDIR)$(vdsmconfdir)/momd.conf
+
+uninstall-data-mom:
+	$(RM) $(DESTDIR)$(vdsmconfdir)/mom.policy
+	$(RM) $(DESTDIR)$(vdsmconfdir)/momd.conf
diff --git a/vdsm/mom.policy b/vdsm/mom.policy
new file mode 100644
index 0000000..cb31526
--- /dev/null
+++ b/vdsm/mom.policy
@@ -0,0 +1,155 @@
+### KSM ########################################################################
+
+### Constants
+# The number of pages to add when increasing pages_to_scan
+(defvar ksm_pages_boost 300)
+
+# The number of pages to subtract when decreasing pages_to_scan
+(defvar ksm_pages_decay -50)
+
+# The min and max number of pages to scan per cycle when ksm is activated
+(defvar ksm_npages_min 64)
+(defvar ksm_npages_max 1250)
+
+# The number of ms to sleep between ksmd scans for a 16GB system.  Systems with
+# more memory will sleep less, while smaller systems will sleep more.
+(defvar ksm_sleep_ms_baseline 10)
+
+# A virtualization host tends to use most of its memory for running guests but
+# a certain amount is reserved for the host OS, non virtualization-related work,
+# and as a failsafe.  When free memory (including memory used for caches) drops
+# below this percentage of total memory, the host is deemed under pressure, and
+# KSM will be started to try and free up some memory.
+(defvar ksm_free_percent 0.20)
+
+### Helper functions
+(def change_npages (delta)
+{
+    (defvar newval (+ Host.ksm_pages_to_scan delta))
+    (if (> newval ksm_npages_max) (set newval ksm_npages_max) 1)
+    (if (< newval ksm_npages_min) (set newval ksm_npages_min) 0)
+    (Host.Control "ksm_pages_to_scan" newval)
+})
+
+### Main Script
+# Methodology: Since running KSM does incur some overhead, try to run it only
+# when necessary.  If the amount of committed KSM shareable memory is high or if
+# free memory is low, enable KSM to try to increase free memory.  Large memory
+# machines should scan more often than small ones.  Likewise, machines under
+# memory pressure should scan more aggressively than more idle machines.
+
+(defvar ksm_pressure_threshold (* Host.mem_available ksm_free_percent))
+(defvar ksm_committed Host.ksm_shareable)
+
+(if (and (< (+ ksm_pressure_threshold ksm_committed) Host.mem_available)
+         (> (Host.StatAvg "mem_free") ksm_pressure_threshold))
+    (Host.Control "ksm_run" 0)
+    { # else
+        (Host.Control "ksm_run" 1)
+        (Host.Control "ksm_sleep_millisecs"
+            (/ (* ksm_sleep_ms_baseline 16777216) Host.mem_available))
+        (if (< (Host.StatAvg "mem_free") ksm_pressure_threshold)
+            (change_npages ksm_pages_boost)
+            (change_npages ksm_pages_decay))
+    }
+)
+
+### Auto-Balloon ###############################################################
+
+### Constants
+# If the percentage of host free memory drops below this value
+# then we will consider the host to be under memory pressure
+(defvar pressure_threshold 0.20)
+
+# If pressure threshold drops below this level, then the pressure
+# is critical and more aggressive ballooning will be employed.
+(defvar pressure_critical 0.05)
+
+# This is the minimum percentage of free memory that an unconstrained
+# guest would like to maintain
+(defvar min_guest_free_percent 0.20)
+
+# Don't change a guest's memory by more than this percent of total memory
+(defvar max_balloon_change_percent 0.05)
+
+# Only ballooning operations that change the balloon by this percentage
+# of current guest memory should be undertaken to avoid overhead
+(defvar min_balloon_change_percent 0.0025)
+
+### Helper functions
+# Check if the proposed new balloon value is a large-enough change to
+# justify a balloon operation.  This prevents us from introducing overhead
+# through lots of small ballooning operations.
+(def change_big_enough (guest new_val)
+{
+    (if (> (abs (- new_val guest.libvirt_curmem))
+           (* min_balloon_change_percent guest.libvirt_curmem))
+        1 0)
+})
+
+(def shrink_guest (guest)
+{
+    # Determine the degree of host memory pressure
+    (if (<= host_free_percent pressure_critical)
+        # Pressure is critical:
+        #   Force guest to swap by making free memory negative
+        (defvar guest_free_percent (+ -0.05 host_free_percent))
+        # Normal pressure situation:
+        #   Scale the guest free memory back according to host pressure
+        (defvar guest_free_percent (* min_guest_free_percent
+                                      (/ host_free_percent pressure_threshold))))
+
+    # Given current conditions, determine the ideal guest memory size
+    (defvar guest_used_mem (- (guest.StatAvg "libvirt_curmem")
+                              (guest.StatAvg "mem_unused")))
+    (defvar balloon_min (+ guest_used_mem
+                           (* guest_free_percent guest.libvirt_maxmem)))
+    # But do not change it too fast
+    (defvar balloon_size (* guest.libvirt_curmem
+                            (- 1 max_balloon_change_percent)))
+    (if (< balloon_size balloon_min)
+        (set balloon_size balloon_min)
+        0)
+    # Set the new target for the BalloonController.  Only set it if the
+    # value makes sense and is a large enough change to be worth it.
+    (if (and (<= balloon_size guest.libvirt_maxmem)
+             (change_big_enough guest balloon_size))
+        (guest.Control "balloon_target" balloon_size)
+        0)
+})
+
+(def grow_guest (guest)
+{
+    # There is only work to do if the guest is ballooned
+    (if (< guest.libvirt_curmem guest.libvirt_maxmem) {
+        # Minimally, increase so the guest has its desired free memory
+        (defvar guest_used_mem (- (guest.StatAvg "libvirt_curmem")
+                                  (guest.StatAvg "mem_unused")))
+        (defvar balloon_min (+ guest_used_mem (* min_guest_free_percent
+                                                 guest.libvirt_maxmem)))
+        # Otherwise, increase according to the max balloon change
+        (defvar balloon_size (* guest.libvirt_curmem
+                                (+ 1 max_balloon_change_percent)))
+
+        # Determine the new target for the BalloonController.  Only set it
+        # if the value is large enough for the change to be worth it.
+        (if (> balloon_size guest.libvirt_maxmem)
+            (set balloon_size guest.libvirt_maxmem) 0)
+        (if (< balloon_size balloon_min)
+            (set balloon_size balloon_min) 0)
+        (if (change_big_enough guest balloon_size)
+            (guest.Control "balloon_target" balloon_size) 0)
+    } 0)
+})
+
+### Main script
+# Methodology: The goal is to shrink all guests fairly and by an amount
+# scaled to the level of host memory pressure.  If the host is under
+# severe pressure, scale back more aggressively.  We don't yet handle
+# symptoms of over-ballooning guests or try to balloon idle guests more
+# aggressively.  When the host is not under memory pressure, slowly
+# deflate the balloons.
+
+(defvar host_free_percent (/ (Host.StatAvg "mem_free") Host.mem_available))
+(if (< host_free_percent pressure_threshold)
+    (with Guests guest (shrink_guest guest))
+    (with Guests guest (grow_guest guest)))
diff --git a/vdsm/momd.conf b/vdsm/momd.conf
new file mode 100644
index 0000000..4d09f44
--- /dev/null
+++ b/vdsm/momd.conf
@@ -0,0 +1,83 @@
+### DO NOT REMOVE THIS COMMENT -- MOM Configuration for VDSM ###
+
+[main]
+# The wake up frequency of the main daemon (in seconds)
+main-loop-interval: 5
+
+# The data collection interval for host statistics (in seconds)
+host-monitor-interval: 5
+
+# The data collection interval for guest statistics (in seconds)
+guest-monitor-interval: 5
+
+# The wake up frequency of the guest manager (in seconds).  The guest manager
+# sets up monitoring and control for newly-created guests and cleans up after
+# deleted guests.
+guest-manager-interval: 5
+
+# The wake up frequency of the policy engine (in seconds).  During each
+# interval the policy engine evaluates the policy and passes the results
+# to each enabled controller plugin.
+policy-engine-interval: 10
+
+# A comma-separated list of Controller plugins to enable
+controllers: Balloon, KSM
+
+# Sets the maximum number of statistic samples to keep for the purpose of
+# calculating moving averages.
+sample-history-length: 10
+
+# The URI to use when connecting to this host's libvirt interface.  If this is
+# left blank then the system default URI is used.
+libvirt-hypervisor-uri: qemu:///system
+
+# Set this to an existing, writable directory to enable plotting.  For each
+# invocation of the program a subdirectory momplot-NNN will be created where NNN
+# is a sequence number.  Within that directory, tab-delimited data files will be
+# created and updated with all data generated by the configured Collectors.
+plot-dir:
+
+# Activate the RPC server on the designated port (-1 to disable).  RPC is
+# disabled by default until authentication is added to the protocol.
+rpc-port: -1
+
+# At startup, load a policy from the given file.  If empty, no policy is loaded
+policy: /etc/vdsm/mom.policy
+
+[logging]
+# Set the destination for program log messages.  This can be either 'stdio' or
+# a filename.  When the log goes to a file, log rotation will be done
+# automatically.
+log: /var/log/momd.log
+
+# Set the logging verbosity level.  The following levels are supported:
+# 5 or debug:    Debugging messages
+# 4 or info:     Detailed messages concerning normal program operation
+# 3 or warn:     Warning messages (program operation may be impacted)
+# 2 or error:    Errors that severely impact program operation
+# 1 or critical: Emergency conditions
+# This option can be specified by number or name.
+verbosity: info
+
+## The following two variables are used only when logging is directed to a file.
+# Set the maximum size of a log file (in bytes) before it is rotated.
+max-bytes: 2097152
+# Set the maximum number of rotated logs to retain.
+backup-count: 5
+
+[host]
+# A comma-separated list of Collector plugins to use for Host data collection.
+collectors: HostMemory, HostKSM
+
+[guest]
+# A comma-separated list of Collector plugins to use for Guest data collection.
+collectors: GuestQemuProc, GuestLibvirt
+
+# Collector-specific configuration for GuestQemuAgent
+[Collector: GuestQemuAgent]
+# Set the base path where the host-side sockets for guest communication can be
+# found.  The GuestQemuAgent Collector will try to open files with the following
+# names:
+#   <socket_path>/va-<guest-name>-virtio.sock - for virtio serial
+#   <socket_path>/va-<guest-name>-isa.sock    - for isa serial
+socket_path: /var/lib/libvirt/qemu
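As a sanity check on the policy arithmetic above, the KSM sleep scaling and the ballooning step size can be reproduced in plain Python. The memory figures below are hypothetical; units are KiB, which is why the policy's 16777216 constant corresponds to the 16GB baseline mentioned in its comments:

```python
# KSM: the sleep interval scales inversely with host memory size.
ksm_sleep_ms_baseline = 10
mem_available = 33554432                  # hypothetical 32 GiB host, in KiB
sleep_ms = ksm_sleep_ms_baseline * 16777216 // mem_available
print(sleep_ms)                           # twice the baseline memory -> half the sleep

# Ballooning: one shrink step removes at most 5% of current guest memory,
# and a change is applied only if it exceeds 0.25% of current memory.
max_balloon_change_percent = 0.05
min_balloon_change_percent = 0.0025
curmem = 1048576                          # hypothetical 1 GiB guest, in KiB
balloon_size = curmem * (1 - max_balloon_change_percent)
big_enough = abs(balloon_size - curmem) > min_balloon_change_percent * curmem
print(big_enough)                         # a full 5% step easily clears the floor
```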