FYI: https://groups.google.com/forum/#!topic/ansible-project/Mj6vmhqMED8
just beware before you do "yum upgrade".
On Wed, 26 Feb 2014 13:45:56 +0100 Miroslav Suchý msuchy@redhat.com wrote:
> FYI: https://groups.google.com/forum/#!topic/ansible-project/Mj6vmhqMED8
> just beware before you do "yum upgrade".
I'd be interested in short reproducers here... the changes between 1.4.3 and 1.4.5 are really minor:
1.4.5 "Could This Be Magic" - February 12, 2014 fixed issue with permissions being incorrect on fireball/accelerate keys when the umask setting was too loose.
1.4.4 "Could This Be Magic" - January 6, 2014 fixed a minor issue with newer versions of pip dropping the "use-mirrors" parameter.
You seem to be somehow missing the /usr/share/ansible/utilities/ path?
Both debug and wait_for are in that dir, and your errors show it's not looking in there?
kevin
On 02/27/2014 12:21 AM, Kevin Fenzi wrote:
> I'd be interested in short reproducers here... the changes between 1.4.3 and 1.4.5 are really minor:
That is really strange. The problem disappeared after I downgraded. I was not able to reproduce it on another machine. And today it started happening on the copr production machine again (with the old ansible). And then it suddenly disappeared. And it suddenly works even with the new version, and I swear I did not change anything relevant. Nothing to see here, go home :) Well, unless I am able to reproduce it.
It happened again. But now I have more traces and hints.
This morning (9:33 UTC) I got a Nagios alert:
  WARN: datanommer has not seen a copr message in 6 hours, 10 minutes, 39 seconds
which means that something happened sometime between 3:30 UTC and 4:30 UTC.
I logged in to copr-be and, to my surprise, without anything having been changed overnight:
  ansible-playbook -vvvv -c ssh /home/copr/provision/builderpb.yml
  ERROR: debug is not a legal parameter in an Ansible task or handler
To my surprise I found that:
  rpm -V ansible
  ...
  missing /usr/share/ansible/utilities
  missing /usr/share/ansible/utilities/accelerate
  missing /usr/share/ansible/utilities/debug
  missing /usr/share/ansible/utilities/fail
  missing /usr/share/ansible/utilities/include_vars
  missing /usr/share/ansible/utilities/pause
  missing /usr/share/ansible/utilities/set_fact
  missing /usr/share/ansible/utilities/wait_for
I.e. the whole content of /usr/share/ansible/utilities is missing. I quickly reinstalled the ansible package and everything started working again.
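For anyone who wants to script the same check, here is a minimal sketch that pulls the "missing" paths out of `rpm -V` output. The function names are made up for illustration, and it assumes the usual verify format: the word "missing", an optional one-letter file-type marker, then the path.

```python
import subprocess


def missing_paths(verify_output):
    """Return the paths that `rpm -V` reported as missing."""
    paths = []
    for line in verify_output.splitlines():
        # Missing files show up as "missing", an optional one-letter
        # file-type marker (e.g. "c" for config files), then the path.
        if line.startswith("missing") and "/" in line:
            paths.append(line[line.index("/"):].rstrip())
    return paths


def verify_package(name):
    """Run `rpm -V NAME` and return the missing paths (needs rpm installed)."""
    # rpm -V exits non-zero when it finds differences, so the return code
    # is deliberately not treated as an error here.
    result = subprocess.run(["rpm", "-V", name], capture_output=True, text=True)
    return missing_paths(result.stdout)
```

Something like `verify_package("ansible")` run from a periodic check could then raise an alert as soon as the list is non-empty, instead of waiting for a playbook to fail.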
Now I have to find the cause, otherwise I expect it will happen again tonight.
I checked syslog and the only relevant entries are:
1)
  Feb 28 03:46:22 dhcp-client03 systemd[1]: Got automount request for /proc/sys/fs/binfmt_misc, triggered by 24347 (find)
  Feb 28 03:46:22 dhcp-client03 systemd[1]: Mounting Arbitrary Executable File Formats File System...
  Feb 28 03:46:22 dhcp-client03 systemd[1]: Mounted Arbitrary Executable File Formats File System.
2)
  Feb 28 04:04:05 dhcp-client03 systemd-logind[291]: New session 24 of user root.
  Feb 28 04:04:05 dhcp-client03 ansible-yum: Invoked with CHECKMODE=True name=cloud-utils list=None disable_gpg_check=False conf_file=None state=present disablerepo=None enablerepo=None
  Feb 28 04:04:05 dhcp-client03 systemd-logind[291]: Removed session 24.
  Feb 28 04:04:05 dhcp-client03 systemd-logind[291]: New session 25 of user root.
  Feb 28 04:04:05 dhcp-client03 ansible-command: Invoked with executable=None shell=False args=growpart /dev/vda 2 removes=None creates=None chdir=None
  Feb 28 04:04:06 dhcp-client03 systemd-logind[291]: Removed session 25.
  Feb 28 04:04:06 dhcp-client03 systemd-logind[291]: New session 26 of user root.
  Feb 28 04:04:06 dhcp-client03 ansible-setup: Invoked with CHECKMODE=True filter=* fact_path=/etc/ansible/facts.d
  Feb 28 04:04:06 dhcp-client03 systemd-logind[291]: Removed session 26.
  Feb 28 04:04:07 dhcp-client03 systemd-logind[291]: New session 27 of user root.
  Feb 28 04:04:07 dhcp-client03 ansible-yum: Invoked with CHECKMODE=True name=fedmsg,libsemanage-python,python-psutil list=None disable_gpg_check=False conf_file=None state=installed disablerepo=None pkg=fedmsg,libsemanage-python,python-psutil enablerepo=None
  Feb 28 04:04:42 dhcp-client03 systemd-logind[291]: Removed session 27.
I am not sure about the first one.
The second one is some ansible playbook run (could it be nirik's check of differences?). But I'm really clueless how it could remove /usr/share/ansible/utilities/*. Does anybody have an idea?
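The syslog search above (everything logged inside the suspect window) can be sketched in a few lines. This is only an illustration: the function name is hypothetical, it assumes the classic "Feb 28 03:46:22" timestamp prefix seen in the excerpts, and it assumes English month abbreviations.

```python
from datetime import datetime


def lines_in_window(lines, start, end, year=2014):
    """Yield syslog lines whose timestamp falls within [start, end]."""
    for line in lines:
        # Classic syslog lines begin with a 15-character stamp such as
        # "Feb 28 03:46:22"; the year is not recorded, so it is passed in.
        try:
            stamp = datetime.strptime(f"{year} {line[:15]}", "%Y %b %d %H:%M:%S")
        except ValueError:
            continue  # skip lines without a parsable timestamp
        if start <= stamp <= end:
            yield line
```

Called with `datetime(2014, 2, 28, 3, 30)` and `datetime(2014, 2, 28, 4, 30)` over the lines of the log file (its location depends on the host), this keeps only the entries from the window in question.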
On Fri, 28 Feb 2014 11:15:00 +0100 Miroslav Suchý msuchy@redhat.com wrote:
> It happened again. But now I have more traces and hints.
...snip...
> I am not sure about the first one.
> The second one is some ansible playbook run (could it be nirik's check of differences?). But I'm really clueless how it could remove /usr/share/ansible/utilities/*. Does anybody have an idea?
Well, yeah, there is a nightly cron now that runs 'ansible-playbook --check --diff' on all the playbooks.
It really shouldn't make any changes to the hosts and I don't know why it would. ;(
However, we can run it manually with -vvv against the copr-be playbook when you are available and see if it does this and why?
Just ping me on irc when you are available to watch the copr-be end.
kevin
On 02/28/2014 06:52 PM, Kevin Fenzi wrote:
> Just ping me on irc when you are available to watch the copr-be end.
Will do.
In the meantime I 'solved' it by "chattr +i" on those files.
On Mon, 03 Mar 2014 08:42:50 +0100 Miroslav Suchý msuchy@redhat.com wrote:
> On 02/28/2014 06:52 PM, Kevin Fenzi wrote:
>> Just ping me on irc when you are available to watch the copr-be end.
> Will do.
> In the meantime I 'solved' it by "chattr +i" on those files.
:)
I did look at this this weekend and it was not the --check --diff run doing it, but I was unable to figure out what was. ;(
Perhaps we could set up incron to watch changes in that dir and record what process is doing it. It's very strange. ;(
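As a rough stand-in for that kind of watcher (this is not incron, just a plain polling sketch with hypothetical function names), something like the following would at least record when the entries vanish, so the timestamp can be matched against syslog. It cannot tell which process did it; that would need something like the Linux audit subsystem.

```python
import os
import time


def snapshot(directory):
    """Return the set of entry names in directory (empty if it is gone)."""
    try:
        return set(os.listdir(directory))
    except FileNotFoundError:
        return set()


def removed_entries(before, after):
    """Names present in the earlier snapshot but not in the later one."""
    return sorted(before - after)


def watch(directory, interval=5.0):
    """Poll directory forever and print a timestamp when entries vanish."""
    previous = snapshot(directory)
    while True:
        time.sleep(interval)
        current = snapshot(directory)
        gone = removed_entries(previous, current)
        if gone:
            print(time.strftime("%Y-%m-%d %H:%M:%S"), "removed:", ", ".join(gone))
        previous = current
```

Running `watch("/usr/share/ansible/utilities")` in the background overnight would narrow the removal down to within one polling interval.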
kevin
infrastructure@lists.fedoraproject.org