Gitweb: http://git.fedorahosted.org/git/cluster.git?p=cluster.git;a=commitdiff;h=d6…
Commit: d67b9cf67ffbd9f538b85fbb220bfb79d7f13ace
Parent: 0000000000000000000000000000000000000000
Author: Fabio M. Di Nitto <fdinitto(a)redhat.com>
AuthorDate: 2010-02-23 04:30 +0000
Committer: Fabio M. Di Nitto <fdinitto(a)redhat.com>
CommitterDate: 2010-02-23 04:30 +0000
annotated tag: cluster-3.0.8 has been created
at d67b9cf67ffbd9f538b85fbb220bfb79d7f13ace (tag)
tagging 5c55d7d3d16caf313091daa1995a988f4c807a05 (commit)
replaces cluster-3.0.7
cluster-3.0.8 release
Abhijith Das (7):
libgfs2: Bug 459630 - GFS2: changes needed to gfs2-utils due to gfs2meta fs changes in bz 457798
gfs_jadd: Bug 555363 - gfs_jadd does not resolve symbolic links
gfs2_convert: gfs2_convert should fix statfs file
mount.gfs2: Better error reporting when mounting a gfs fs without enough journals
Merge branch 'STABLE3' of ssh://git.fedoraproject.org/git/cluster into mySTABLE3
gfs-kernel: Flock on GFS fs file will error with "Resource tempory unavailable" for EWOULDBLOCK
gfs2_convert: gfs2_convert doesn't convert jdata files correctly
Bob Peterson (61):
Remove nvbuf_list and use fewer buffers
Eliminate bad_block linked block list
Simplify bitmap/block list structures
Streamline the bitmap code by always using 4-bit size per block
Misc blocklist optimizations
Separate eattr_block list from the rest for efficiency
gfs2: remove update_flags everywhere
fsck.gfs2: give comfort when processing lots of data blocks
fsck.gfs2: make query() count errors_found, errors_fixed
Attach buffers to rgrp_list structs
Make struct_out functions operate on bh's
Attach bh's to inodes
gfs2: Remove buf_lists
fsck.gfs2: Verify rgrps free space against bitmap
libgfs2: Consistent naming for blockmap functions
Move duplicate code from libgfs2 to fsck.gfs2
libgfs2, fsck.gfs2: simplify block_query code
gfs2: libgfs2 and fsck.gfs2 cleanups
libgfs2: fs_bits speed up bitmap operations
libgfs2: gfs2_log reform
fsck.gfs2: convert dup_list to a rbtree
fsck.gfs2: convert dir_info list to rbtree
fsck.gfs2: convert inode hash to rbtree
fsck.gfs2: pass1 should use gfs2_special_add not _set
libgfs2: Remove unneeded sdp parameter in gfs2_block_set
libgfs2: dir_split_leaf needs to zero out the new leaf
libgfs2: dir_split_leaf needs to check for allocation failure
libgfs2: Set block range based on rgrps, not device size
fsck.gfs2: should use the libgfs2 is_system_directory
fsck.gfs2: Journal replay should report what it's doing
fsck.gfs2: fix directories that have odd number of pointers.
libgfs2: Get rid of useless constants
fsck.gfs2: link.c should log why it's making a change for debugging
fsck.gfs2: Enforce consistent behavior in directory processing
fsck.gfs2: enforce consistency between bitmap and blockmap
fsck.gfs2: metawalk needs to check for no valid leaf blocks
fsck.gfs2: metawalk was not checking many directories
fsck.gfs2: separate check_data function in check_metatree
lost+found link count and connections were not properly managed
fsck.gfs2: reprocess lost+found and other inode metadata when blocks are added
Misc cleanups
fsck.gfs2: Check for massive amounts of pointer corruption
fsck.gfs2: use gfs2_meta_inval vs. gfs2_inval_inode
Eliminate unnecessary block_list from gfs2_edit
fsck.gfs2: rename gfs2_meta_other to gfs2_meta_rgrp.
Create a standard metadata delete interface
fsck.gfs2: cleanup: refactor pass3
fsck.gfs2: Make pass1 undo its work for unrecoverable inodes
fsck.gfs2: Overhaul duplicate reference processing
fsck.gfs2: invalidate invalid mode inodes
fsck.gfs2: Force intermediate lost+found inode updates
fsck.gfs2: Free metadata list memory we don't need
fsck.gfs2: Don't add extended attrib blocks to list twice
fsck.gfs2: small parameter passing optimization
fsck.gfs2: Free, don't invalidate, dinodes with bad depth
Misc cleanups
fsck.gfs2: If journal replay fails, give option to reinitialize journal
Fix white space errors
fsck.gfs2 fails on root fs: Device X is busy.
gfs2_edit savemeta: Don't release indirect buffers too soon
fsck.gfs2: Use fsck.ext3's method of dealing with root mounts
Christine Caulfield (4):
cman: use the typed objdb calls
cman: don't set token_retransmits_before_loss_const
cman: disable gfs plock_ownership when upgrading
config: Add schema entry for clvmd
David Teigland (15):
man pages: cluster.conf
cluster.rng: updates
man pages: fence_node, fenced
man pages: dlm_controld
cluster.rng: dlm updates
man pages: groupd
man pages: fenced
man pages: group_tool
cluster.rng: group/groupd_compat
man pages: gfs_controld
cluster.rng: gfs_controld
cluster.rng: fence, fencedevices
man pages: dlm_tool
man pages: gfs_control
dlm_controld: check all messages against enable options
Dyna Ares (1):
config: Make broadcast attr reflect documentation
Fabio M. Di Nitto (10):
release: don't build gfs-utils tarball
fence agents: man page clean up
cman init: propagate errors from fence_tool operations
gfs2: make init script LSB compliant
fence agents: fix several agents build
dlm_controld: fix linking
nss: fix linking
build: fix out-of-tree build of fence agents
release script rework
logrotate: fix logrotate default actions and set sane defaults
Jonathan E. Brassow (1):
rgmanager: halvm: Check ownership before stripping tags
Lon Hohberger (17):
config: Make nodeid attribute required
config: Make nodeid required in ldif schema
config: Fix license for value-list.[ch]
config: Sync LDIF w/ cluster.rng
rgmanager: isAlive error logging for file systems
config: Add fence_virt to cluster.rng
config: Update LDIF schema based on recent RelaxNG changes
rgmanager: Make relocate-to-offline consistent
qdisk: Fix logt_print which used to be perror()
resource-agents: SAPDatabase: remove $TEMPFILE
qdisk: Autoconfigure default timings
Revert "qdisk: Autoconfigure default timings"
rgmanager: Make VF timeout scale with token timeout
rgmanager: Clean up build warnings
qdisk: Autoconfigure default timings
qdiskd: Autoconfigure votes based on node count
qdisk: Fix uninitialized variable
Marek 'marx' Grac (2):
fencing: Add vendor URL to man pages
resource agents: Handle multiline pid files
Ryan O'Hara (2):
Remove open3 calls and replace with simple qx commands. This avoids
Always remove leading zeros from key value.
Shane Bradley (1):
resource-agents: Kill correct PIDs during force_unmount
Tatsuo Kawasaki (1):
qdisk: mkqdisk argument positioning
Gitweb: http://git.fedorahosted.org/git/cluster.git?p=cluster.git;a=commitdiff;h=77…
Commit: 77279c9976e222473c041f3896046f1134ee34fe
Parent: 704fd5bb382ff452fc8403b9a2322e73b918619d
Author: Lon Hohberger <lhh(a)redhat.com>
AuthorDate: Fri Feb 19 12:18:58 2010 -0500
Committer: Lon Hohberger <lhh(a)redhat.com>
CommitterDate: Fri Feb 19 12:18:58 2010 -0500
qdiskd: Autoconfigure votes based on node count
This makes qdiskd easier to use for most configurations
and does not override existing configurations.
Signed-off-by: Lon Hohberger <lhh(a)redhat.com>
---
cman/man/qdisk.5 | 5 +++--
cman/qdisk/main.c | 36 ++++++++++++++++++++++++++++++++++++
2 files changed, 39 insertions(+), 2 deletions(-)
diff --git a/cman/man/qdisk.5 b/cman/man/qdisk.5
index 1a74470..abd80dc 100644
--- a/cman/man/qdisk.5
+++ b/cman/man/qdisk.5
@@ -73,7 +73,7 @@ the amount of synchronous I/O contention on the shared quorum disk.
* Cluster node IDs must be statically configured in cluster.conf and
must be numbered from 1..16 (there can be gaps, of course).
-* Cluster node votes should be more or less equal.
+* Cluster node votes should all be 1.
* CMAN must be running before the qdisk program can operate in full
capacity. If CMAN is not running, qdisk will wait for it.
@@ -239,7 +239,8 @@ exceed \fBtko\fP.
\fIvotes\fP\fB="\fP3\fB"\fP
.in 12
This is the number of votes the quorum daemon advertises to CMAN when it
-has a high enough score.
+has a high enough score. The default is the number of nodes in the cluster
+minus 1. For example, in a 4 node cluster, the default is 3.
.in 9
\fIlog_level\fP\fB="\fP4\fB"\fP
diff --git a/cman/qdisk/main.c b/cman/qdisk/main.c
index 85a0563..5d984dc 100644
--- a/cman/qdisk/main.c
+++ b/cman/qdisk/main.c
@@ -1469,6 +1469,40 @@ get_dynamic_config_data(qd_ctx *ctx, int ccsfd)
static int
+auto_qdisk_votes(int desc)
+{
+ int x, ret = 0;
+ char buf[128];
+ char *name;
+
+ if (desc < 0) {
+ return 1;
+ }
+
+ while (++x) {
+ snprintf(buf, sizeof(buf)-1,
+ "/cluster/clusternodes/clusternode[%d]/@name", x);
+
+ name = NULL;
+ if (ccs_get(desc, buf, &name) != 0)
+ break;
+
+ free(name);
+ ret = x;
+ }
+
+ --ret;
+ if (ret <= 0) {
+ ret = 1;
+ }
+
+ logt_print(LOG_DEBUG, "Setting votes to %d\n", ret);
+
+ return (ret);
+}
+
+
+static int
get_static_config_data(qd_ctx *ctx, int ccsfd)
{
char *val = NULL;
@@ -1574,6 +1608,8 @@ get_static_config_data(qd_ctx *ctx, int ccsfd)
free(val);
if (ctx->qc_votes < 0)
ctx->qc_votes = 0;
+ } else {
+ ctx->qc_votes = auto_qdisk_votes(ccsfd);
}
/* Get device */
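
Note on the vote autoconfiguration above: as posted, auto_qdisk_votes() declares x without an initial value and then enters while (++x), so the first index it probes is undefined. The changelog for this tag also lists "qdisk: Fix uninitialized variable"; whether that commit addresses this exact variable is an assumption here, not something the patch text confirms. The following is a minimal standalone sketch of the intended rule (votes = node count minus 1, never below 1), not the code in main.c. The names sketch_auto_votes, node_lookup_fn and count_lookup are hypothetical, and the callback stands in for the ccs_get() query on /cluster/clusternodes/clusternode[N]/@name.

```c
#include <assert.h>
#include <stddef.h>

/*
 * Hypothetical stand-in for the ccs_get() lookup in the patch:
 * returns 0 if the 1-based node index exists, non-zero otherwise.
 */
typedef int (*node_lookup_fn)(int index, void *arg);

/* Count nodes by probing indices 1, 2, 3, ... until a lookup fails. */
static int
sketch_auto_votes(node_lookup_fn lookup, void *arg)
{
	int x = 0;	/* initialized, so ++x starts probing at index 1 */
	int nodes = 0;

	assert(lookup != NULL);

	while (++x) {
		if (lookup(x, arg) != 0)
			break;
		nodes = x;
	}

	/* Default votes: node count minus 1, clamped to a minimum of 1. */
	return (nodes - 1 <= 0) ? 1 : nodes - 1;
}

/* Example lookup for illustration: treats *arg as the node count. */
static int
count_lookup(int index, void *arg)
{
	return (index <= *(int *)arg) ? 0 : -1;
}
```

With count_lookup and a node count of 4, the sketch yields 3 votes, matching the example added to qdisk.5. Unlike the patch, the sketch does not handle a negative descriptor, since it has no descriptor at all.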
Gitweb: http://git.fedorahosted.org/git/cluster.git?p=cluster.git;a=commitdiff;h=70…
Commit: 704fd5bb382ff452fc8403b9a2322e73b918619d
Parent: e3fdd916da5ad54e37bbc457f336fb482dbe3b14
Author: Lon Hohberger <lhh(a)redhat.com>
AuthorDate: Fri Feb 19 09:45:17 2010 -0500
Committer: Lon Hohberger <lhh(a)redhat.com>
CommitterDate: Fri Feb 19 11:56:47 2010 -0500
qdisk: Autoconfigure default timings
Qdiskd has historically had bad default timings in STABLE3
and STABLE2. This patch makes qdiskd scale automatically
with the Totem token timeout.
Signed-off-by: Lon Hohberger <lhh(a)redhat.com>
---
cman/man/qdisk.5 | 30 +++++++++++++++++++++++++++---
cman/qdisk/disk.h | 2 +-
cman/qdisk/main.c | 45 ++++++++++++++++++++++++++++++++++++++++++---
3 files changed, 70 insertions(+), 7 deletions(-)
diff --git a/cman/man/qdisk.5 b/cman/man/qdisk.5
index f578e92..1a74470 100644
--- a/cman/man/qdisk.5
+++ b/cman/man/qdisk.5
@@ -80,7 +80,8 @@ capacity. If CMAN is not running, qdisk will wait for it.
* CMAN's eviction timeout should be at least 2x the quorum daemon's
to give the quorum daemon adequate time to converge on a master during a
-failure + load spike situation.
+failure + load spike situation. See section 3.3.1 for specific
+details.
* For 'all-but-one' failure operation, the total number of votes assigned
to the quorum device should be equal to or greater than the total number
@@ -211,6 +212,7 @@ This is the frequency of read/write cycles, in seconds.
\fItko\fP\fB="\fP10\fB"\fP
.in 12
This is the number of cycles a node must miss in order to be declared dead.
+The default for this number is dependent on the configured token timeout.
.in 9
\fItko_up\fP\fB="\fPX\fB"\fP
@@ -289,7 +291,7 @@ This option requires careful tuning of the CMAN timeout, the qdiskd
timeout, and CMAN's quorum_dev_poll value. As a rule of thumb,
CMAN's quorum_dev_poll value should be equal to Totem's token timeout
and qdiskd's timeout (interval*tko) should be less than half of
-Totem's token timeout.
+Totem's token timeout. See section 3.3.1 for more information.
This option only takes effect if there are no heuristics
configured. Usage of this option in configurations with more than
@@ -372,7 +374,7 @@ label. This is useful in configurations where the block device name
differs on a per-node basis.
.in 9
-\fIcman_label\fP\fB="\fPmylabel\fB"/>\fP
+\fIcman_label\fP\fB="\fPmylabel\fB"\fP
.in 12
This overrides the label advertised to CMAN if present. If specified,
the quorum daemon will register with this name instead of the actual
@@ -391,6 +393,28 @@ qdiskd is running. This option is ignored if io_timeout is set to 1.
\fB/>\fP
.in 0
+.SH "3.3.1. Quorum Disk Timings"
+Qdiskd should not be used in environments requiring failure detection
+times of less than approximately 10 seconds.
+
+Qdiskd will attempt to automatically configure timings based on the
+totem timeout and the TKO. If configuring manually, Totem's token
+timeout \fBmust\fP be set to a value at least 1 interval greater than
+the the following function:
+
+ interval * (tko + master_wait + upgrade_wait)
+
+So, if you have an interval of 2, a tko of 7, master_wait of 2 and
+upgrade_wait of 2, the token timeout should be at least 24 seconds
+(24000 msec).
+
+It is recommended to have at least 3 intervals to reduce the risk of
+quorum loss during heavy I/O load. As a rule of thumb, using a totem
+timeout more than 2x of qdiskd's timeout will result in good behavior.
+
+An improper timing configuration will cause CMAN to give up on qdiskd,
+causing a temporary loss of quorum during master transition.
+
.SH "3.2. The <heuristic> tag"
This tag is a child of the <quorumd> tag. Heuristics may not be changed
while qdiskd is running.
diff --git a/cman/qdisk/disk.h b/cman/qdisk/disk.h
index c5b3d18..8678ca7 100644
--- a/cman/qdisk/disk.h
+++ b/cman/qdisk/disk.h
@@ -246,7 +246,7 @@ typedef struct {
int qc_max_error_cycles;
int qc_master; /* Master?! */
int qc_config;
- int qc_pad;
+ int qc_token_timeout;
disk_node_state_t qc_disk_status;
disk_node_state_t qc_status;
run_flag_t qc_flags;
diff --git a/cman/qdisk/main.c b/cman/qdisk/main.c
index eb3ab3c..85a0563 100644
--- a/cman/qdisk/main.c
+++ b/cman/qdisk/main.c
@@ -24,6 +24,7 @@
#include <ccs.h>
#include <liblogthread.h>
#include "score.h"
+#include "../daemon/cman.h"
#include <sys/syslog.h>
#define LOG_DAEMON_NAME "qdiskd"
@@ -1472,6 +1473,7 @@ get_static_config_data(qd_ctx *ctx, int ccsfd)
{
char *val = NULL;
char query[256];
+ int qdisk_fo;
if (ccsfd < 0)
return -1;
@@ -1486,14 +1488,36 @@ get_static_config_data(qd_ctx *ctx, int ccsfd)
if (ctx->qc_interval < 1)
ctx->qc_interval = 1;
}
+
+ snprintf(query, sizeof(query), "/cluster/totem/@token");
+ if (ccs_get(ccsfd, query, &val) == 0) {
+ ctx->qc_token_timeout = atoi(val);
+ free(val);
+ if (ctx->qc_token_timeout < 10000) {
+ logt_print(LOG_DEBUG, "Token timeout %d is too fast "
+ "to use with qdiskd!\n",
+ ctx->qc_token_timeout);
+ }
+ } else {
+ ctx->qc_token_timeout = DEFAULT_TOKEN_TIMEOUT;
+ }
/* Get tko */
snprintf(query, sizeof(query), "/cluster/quorumd/@tko");
if (ccs_get(ccsfd, query, &val) == 0) {
ctx->qc_tko = atoi(val);
free(val);
- if (ctx->qc_tko < 3)
- ctx->qc_tko = 3;
+ } else {
+ ctx->qc_tko = ((ctx->qc_token_timeout / 1000) -
+ ctx->qc_interval) / 2;
+ logt_print(LOG_DEBUG, "Auto-configured TKO as %d based on "
+ "token=%d interval=%d\n", ctx->qc_tko,
+ ctx->qc_token_timeout, ctx->qc_interval);
+ }
+
+ if (ctx->qc_tko < 4) {
+ logt_print(LOG_ERR, "Quorum disk TKO (%d) is too low!\n",
+ ctx->qc_tko);
}
/* Get up-tko (transition off->online) */
@@ -1527,7 +1551,22 @@ get_static_config_data(qd_ctx *ctx, int ccsfd)
}
if (ctx->qc_master_wait <= ctx->qc_tko_up)
ctx->qc_master_wait = ctx->qc_tko_up + 1;
-
+
+ qdisk_fo = ctx->qc_interval * (ctx->qc_master_wait +
+ ctx->qc_upgrade_wait +
+ ctx->qc_tko) * 1000;
+ if (qdisk_fo >= ctx->qc_token_timeout) {
+ logt_print(LOG_WARNING, "Quorum disk timings are too slow for "
+ "configured token timeout\n");
+ logt_print(LOG_WARNING, " * Totem Token timeout: %dms\n",
+ ctx->qc_token_timeout);
+ logt_print(LOG_WARNING, " * Min. Master recovery time: %dms\n",
+ qdisk_fo);
+ logt_print(LOG_WARNING,
+ "Please set token timeout to at least %dms\n",
+ qdisk_fo + (ctx->qc_interval * 1000));
+ }
+
/* Get votes */
snprintf(query, sizeof(query), "/cluster/quorumd/@votes");
if (ccs_get(ccsfd, query, &val) == 0) {
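
Note on the timing arithmetic above: the patch derives a default TKO from the token timeout and warns when interval * (master_wait + upgrade_wait + tko) is not below the token timeout. The sketch below restates those two formulas as standalone functions so the man page example (interval 2, tko 7, master_wait 2, upgrade_wait 2, giving 24000 msec) can be checked by hand. The struct and function names are hypothetical and are not part of qdiskd; only the formulas are taken from the diff.

```c
#include <assert.h>

/* Hypothetical container for the values the patch reads from cluster.conf. */
struct sketch_timings {
	int interval;		/* seconds per read/write cycle */
	int tko;		/* missed cycles before a node is declared dead */
	int master_wait;	/* cycles */
	int upgrade_wait;	/* cycles */
	int token_timeout;	/* Totem token timeout, in milliseconds */
};

/* Default TKO when none is configured: ((token / 1000) - interval) / 2. */
static int
sketch_auto_tko(int token_timeout_ms, int interval_s)
{
	assert(interval_s >= 1);
	return ((token_timeout_ms / 1000) - interval_s) / 2;
}

/* Minimum master recovery time in milliseconds (qdisk_fo in the patch). */
static int
sketch_failover_ms(const struct sketch_timings *t)
{
	return t->interval *
	    (t->master_wait + t->upgrade_wait + t->tko) * 1000;
}

/* Token timeout the warning message recommends: recovery time + 1 interval. */
static int
sketch_min_token_ms(const struct sketch_timings *t)
{
	return sketch_failover_ms(t) + t->interval * 1000;
}

/* Non-zero when the patch would log the "timings are too slow" warning. */
static int
sketch_too_slow(const struct sketch_timings *t)
{
	return sketch_failover_ms(t) >= t->token_timeout;
}
```

For the man page example, sketch_failover_ms() gives 2 * (2 + 2 + 7) * 1000 = 22000 and sketch_min_token_ms() gives 24000, which agrees with the "at least 24 seconds" figure in section 3.3.1. A 24000 msec token with a 2 second interval gives an automatic TKO of (24 - 2) / 2 = 11.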