Hi,
In a dev environment on RHEL 6.3, I am running mash (latest git) every 5 minutes against 20 koji tags (both 1.6 and 1.7). I run "mash tag" sequentially. After a few hours my rpmdb is corrupted and I need to execute db_recover -h /var/lib/rpm. It seems "createrepo" (0.9.8-5.el6) doesn't release its locks correctly every time (according to /usr/lib/rpm/rpmdb_stat -CA). Has anyone already run into this issue before I investigate further?
cheers,
On 11/15/2012 02:58 PM, Thomas wrote:
After a few hours my rpmdb is corrupted and I need to execute db_recover -h /var/lib/rpm. It seems "createrepo" (0.9.8-5.el6) doesn't release its locks correctly every time.
What kind of errors are you getting from rpmdb? Nearly all the issues reported as "rpmdb corruption" are something else, such as an unclean shutdown from either a crash or being forcefully killed while inside Berkeley DB calls. That does prevent rpmdb opens until dealt with (db_recover being one possibility), but it isn't corruption per se.
- Panu -
When I run /usr/lib/rpm/rpmdb_stat -CA :
Default locking region information:
24857       Last allocated locker ID
0x7fffffff  Current maximum unused locker ID
5           Number of lock modes
1000        Maximum number of locks possible
1000        Maximum number of lockers possible
1000        Maximum number of lock objects possible
160         Number of lock object partitions
0           Number of current locks
20          Maximum number of locks at any one time
5           Maximum number of locks in any one bucket
0           Maximum number of locks stolen by for an empty partition
0           Maximum number of locks stolen for any one partition
999         Number of current lockers
1000        Maximum number of lockers at any one time
0           Number of current lock objects
5           Maximum number of lock objects at any one time
1           Maximum number of lock objects in any one bucket
0           Maximum number of objects stolen by for an empty partition
0           Maximum number of objects stolen for any one partition
90021       Total number of locks requested
90021       Total number of locks released
0           Total number of locks upgraded
13509       Total number of locks downgraded
18          Lock requests not available due to conflicts, for which we waited
0           Lock requests not available due to conflicts, for which we did not wait
0           Number of deadlocks
0           Lock timeout value
0           Number of locks that have timed out
0           Transaction timeout value
0           Number of transactions that have timed out
752KB       The size of the lock region
46          The number of partition locks that required waiting (0%)
20          The maximum number of times any partition lock was waited for (0%)
0           The number of object queue operations that required waiting (0%)
65          The number of locker allocations that required waiting (0%)
0           The number of region locks that required waiting (0%)
1           Maximum hash bucket length
=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=
It seems the number of current lockers (999 out of a maximum of 1000) is the issue. So the db may not be corrupted but just out of lockers. Any idea why?
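For anyone who wants to watch for this condition automatically, here is a minimal Python sketch of the check described above. It assumes the rpmdb_stat -CA output has one "value description" pair per line, as the tool normally prints it. The function names (parse_stats, locker_usage) are made up for illustration, and the sample values are taken from the output in this message.

```python
import re

# A few lines taken from the rpmdb_stat -CA output in this thread,
# one statistic per line (value, whitespace, description).
SAMPLE = """\
1000\tMaximum number of lockers possible
0\tNumber of current locks
999\tNumber of current lockers
1000\tMaximum number of lockers at any one time
"""


def parse_stats(text):
    """Map each description to its integer value.

    Lines whose first field is not a plain decimal number
    (e.g. '0x7fffffff' or '752KB') are skipped.
    """
    stats = {}
    for line in text.splitlines():
        match = re.match(r"\s*(\d+)\s+(.+?)\s*$", line)
        if match:
            stats[match.group(2)] = int(match.group(1))
    return stats


def locker_usage(stats):
    """Return (current, maximum, ratio) for the locker table."""
    current = stats["Number of current lockers"]
    maximum = stats["Maximum number of lockers possible"]
    return current, maximum, current / maximum


if __name__ == "__main__":
    current, maximum, ratio = locker_usage(parse_stats(SAMPLE))
    print(f"{current}/{maximum} lockers in use ({ratio:.1%})")
```

In practice the text would come from running /usr/lib/rpm/rpmdb_stat -CA rather than from the hard-coded sample. A ratio close to 1 means further rpmdb opens are about to start failing.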
On 11/15/2012 04:30 PM, Thomas wrote:
It seems the Number of current lockers is the issue. So the db may not be corrupted but just out of lockers.
Yup, running out of lockers is not corruption, it's just running out of resources. It does prevent further rpmdb opens unless dealt with though...
Any idea why?
Well, something is leaving open rpmdb / iterator handles around. To find what that something is, 'fuser -uv /var/lib/rpm/*' should give clues.
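If fuser is not at hand, roughly the same information can be pulled from /proc. The following is a Linux-only sketch, not a replacement for fuser: the function name pids_with_open_files is hypothetical, it only looks at open file descriptors (not memory-mapped files, which fuser also reports), and it silently skips processes it is not permitted to inspect.

```python
import os


def pids_with_open_files(directory):
    """Return {pid: [paths]} for processes holding files under `directory` open.

    Walks /proc/<pid>/fd and resolves each descriptor's symlink target.
    """
    prefix = os.path.realpath(directory) + os.sep
    holders = {}
    for entry in os.listdir("/proc"):
        if not entry.isdigit():
            continue  # not a process directory
        fd_dir = os.path.join("/proc", entry, "fd")
        try:
            fds = os.listdir(fd_dir)
        except OSError:
            continue  # process exited or permission denied
        for fd in fds:
            try:
                target = os.readlink(os.path.join(fd_dir, fd))
            except OSError:
                continue  # descriptor closed in the meantime
            if target.startswith(prefix):
                holders.setdefault(int(entry), []).append(target)
    return holders


if __name__ == "__main__":
    for pid, paths in sorted(pids_with_open_files("/var/lib/rpm").items()):
        print(pid, sorted(set(paths)))
```

Run as root to see every process. A long-lived PID that keeps showing up between mash runs would be a candidate for the process leaving handles open.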
One possibility is something (maybe mash) hitting this: http://rpm.org/ticket/820. While the dangling iterators issue is entirely avoidable with careful programming, older rpm versions (such as the one in RHEL 6) aren't doing a very good job of managing their resources. IIRC yum's API has, or at least had, some corners where it was all too easy to trigger this issue, which arguably is a bug in rpm.
Like Bill noted, an easy workaround should be running as non-root, as unprivileged rpmdb accesses use a private "locker room" which is wiped out of existence after use, so stale locks from unclosed / dangling iterators don't get to pile up.
- Panu -
Indeed running as non-root fixes the issue.
Thanks for the detailed description.
On Fri, Nov 16, 2012 at 9:51 AM, Panu Matilainen pmatilai@laiskiainen.org wrote:
Like Bill noted, an easy workaround should be running as non-root, as unprivileged rpmdb accesses use a private "locker room" which is wiped out of existence after use, so stale locks from unclosed / dangling iterators don't get to pile up.
--
buildsys mailing list
buildsys@lists.fedoraproject.org
https://admin.fedoraproject.org/mailman/listinfo/buildsys
Thomas (alphacc@gmail.com) said:
In a dev environment on RHEL 6.3, I am running mash (latest git) every 5 minutes against 20 koji tags (both 1.6 and 1.7). After a few hours my rpmdb is corrupted and I need to execute db_recover -h /var/lib/rpm.
It's sidestepping issues entirely, but you might consider running mash as non-root. (Unless you're already doing so and it's still happening?)
Bill