Hi Rich,

 

One correction to Step-4, the recreation of the “cn=replica” entry for the suffix: as in the example given below, the suffix is “o=USA”.

 

-          Recreate the “cn=replica” entry for the suffix as below.

dn: cn=replica,cn=o=USA,cn=mapping tree,cn=config

changetype: add

objectClass: nsds5replica

objectClass: top

nsDS5ReplicaRoot: o=USA

nsDS5ReplicaType: 3

nsDS5Flags: 1

nsDS5ReplicaId: 10  -- Assign the same nsDS5ReplicaId value that the master had. In my case, the original master's replica ID was 10.

nsds5ReplicaPurgeDelay: 1

nsds5ReplicaTombstonePurgeInterval: -1

cn: replica

 

Regards,

Jyoti

 

From: Das, Jyoti Ranjan (STSD)
Sent: Monday, October 31, 2011 2:38 PM
To: 'Rich Megginson'; General discussion list for the 389 Directory server project.
Subject: RE: [389-users] Data inconsitency during replication

 

Hi Rich,

 

Thanks a lot for your response. Please find the sample reproducer details below. I am not sure how to file a bug; I will explore and do it.

 

 

Reproducer:

 

 

Step-1:

 

Have a topology with the Master replicating to the Slave, and the Slave replicating to the Consumer.

 

Master -> Slave-> Consumer.

 

Step-2:

Make sure that all replicas are in sync at this time. As an example, assume all are in sync up to CSN5 (5 records added to the master, CSN1 through CSN5).
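As a sketch of how the sync check could be done (the hostname, port, and credentials below are placeholders, not from the original), the RUV tombstone entry can be read on each replica and the maxcsn values in nsds50ruv compared:

Ex: ldapsearch -x -h master.example.com -p 389 -D "cn=Directory Manager" -W -b "o=USA" "(&(nsuniqueid=ffffffff-ffffffff-ffffffff-ffffffff)(objectclass=nstombstone))" nsds50ruv

Run the same search against each replica; all replicas are in sync when the maxcsn values agree.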

 

Step-3:

 

Delete the replication agreements: from the Master to the Slave, and from the Slave to the Consumer.
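For illustration only (the agreement name "MasterToSlave" is a placeholder, not from the original), an agreement can be deleted with ldapmodify on the supplier side:

Ex: dn: cn=MasterToSlave,cn=replica,cn=o=USA,cn=mapping tree,cn=config

changetype: delete

The same form, with the appropriate agreement name and suffix entry, applies on the Slave for its agreement to the Consumer.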

 

Step-4:

 

Promote the Slave to Master. The promotion steps are given below.

 

-          Delete the Supplier DN (cn=suppdn,cn=config) from the Slave

-          Delete the “cn=replica” entry for the suffix “o=USA” using ldapmodify. As a side effect, this deletes the changelog file.

Ex: dn: cn=replica,cn=o=USA,cn=mapping tree,cn=config

changetype: delete

-          Modify the cn=o=USA,cn=mapping tree,cn=config entry as below

EX: dn: cn=o=USA,cn=mapping tree,cn=config

changetype: modify

replace: nsslapd-state

nsslapd-state: backend

 

dn: cn=o=USA,cn=mapping tree,cn=config

changetype: modify

delete: nsslapd-referral

-          Recreate the “cn=replica” entry for the suffix as below.

dn: cn=replica,cn=o=SWIFT,cn=mapping tree,cn=config

changetype: add

objectClass: nsds5replica

objectClass: top

nsDS5ReplicaRoot: o=SWIFT

nsDS5ReplicaType: 3

nsDS5Flags: 1

nsDS5ReplicaId: 10  -- Assign the same nsDS5ReplicaId value that the master had. In my case, the original master's replica ID was 10.

nsds5ReplicaPurgeDelay: 1

nsds5ReplicaTombstonePurgeInterval: -1

cn: replica

-          Restart the slapd process. The Slave has now become the Master.

 

Am I missing anything in the promotion procedure, or is this not the right way to do the promotion?

 

Step -5:

 

Add a replication agreement between the Slave (newly promoted Master) and the Consumer. At this time both the Slave and the Consumer are in sync up to CSN5. During agreement creation, do not initialize the consumer.

 

           Slave (newly promoted as Master) -> Consumer.
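For reference, an agreement of this kind could be created with ldapmodify roughly as below (the agreement name, host, port, bind DN, and password are placeholders, not from the original). Since no nsDS5BeginReplicaRefresh attribute is set, the consumer is not initialized:

Ex: dn: cn=SlaveToConsumer,cn=replica,cn=o=USA,cn=mapping tree,cn=config

changetype: add

objectClass: top

objectClass: nsds5replicationagreement

cn: SlaveToConsumer

nsDS5ReplicaRoot: o=USA

nsDS5ReplicaHost: consumer.example.com

nsDS5ReplicaPort: 389

nsDS5ReplicaBindDN: cn=replication manager,cn=config

nsDS5ReplicaCredentials: secret

nsDS5ReplicaBindMethod: SIMPLE

nsDS5ReplicaTransportInfo: LDAP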

 

Step-6:

 

Add another 5 entries to the Slave that was promoted to Master above. Assume the CSNs for these 5 entries are CSN6 through CSN10.

 

Step-7:

 

Now you will see that, of the last 5 entries, only the last few get replicated, and replication continues without halting.

 

 

Regards,

Jyoti

 

 

 

From: Rich Megginson [mailto:rmeggins@redhat.com]
Sent: Friday, October 28, 2011 10:54 PM
To: General discussion list for the 389 Directory server project.
Cc: Das, Jyoti Ranjan (STSD)
Subject: Re: [389-users] Data inconsitency during replication

 

On 10/20/2011 12:45 AM, Das, Jyoti Ranjan (STSD) wrote:

Hi,

 

I am new to 389 directory server. Could you please help me in the below mentioned query?

Thank you very much in advance.

 

Problem statement:

 

Data is lost during replication between the Supplier and the Consumer when the master changelog db file is deleted for some reason, the consumer is imported with stale data, and the consumer is not initialized when the new replication agreement is created. The test scenario is given below.

 

Test scenario:

Steps:

Topology

Supplier -----------Replication agreement-----------------> Hub

Both replicas are in sync at this time as mentioned below.   

As a sample example: five entries have been added, CSN1 through CSN5.

Take a db2ldif export with the “-r” option from the Hub replica.
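A possible invocation, run on the Hub (the backend name userRoot and the output path are assumptions, not from the original; -r includes the replication state in the export):

Ex: db2ldif -r -n userRoot -a /tmp/hub-replica.ldif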

Add another 5 entries on the supplier. Assume their CSNs are CSN6 through CSN10.

Delete the replication agreements

Before or after CSN6 to CSN10 have been replicated to the Hub?

Delete the master changelog db file from the changelogdb directory.

Supplier or Hub?

Add another 5 entries on the supplier. Assume their CSNs are CSN11 through CSN15.

Import the LDIF file taken in Step-2 into the Hub replica (i.e., an initialization of the consumer with stale data).

Create the replication agreement between master and hub with the “do not initialize” option.

Now we see data loss from CSN6 through CSN14. Only the entry with CSN15 is replicated to the consumer, and replication continues successfully thereafter.

 

 

Questions:

Is it correct in this scenario to continue with replication even though there is data loss, instead of halting replication?

From the code analysis:

File: “ldapserver/ldap/servers/plugins/replication/cl5_api.c”

If the requested CSN is not found in the changelog db file and is also not in the purge list, the code makes the following assumption and continues with replication:

 

/* there is a special case which can occur just after migration - in this case,

  the consumer RUV will contain the last state of the supplier before migration,

  but the supplier will have an empty changelog, or the supplier changelog will

  not contain any entries within the consumer min and max CSN - also, since

  the purge RUV contains no CSNs, the changelog has never been purged

  ASSUMPTIONS - it is assumed that the supplier had no pending changes to send

  to any consumers; that is, we can assume that no changes were lost due to

  either changelog purging or database reload - bug# 603061 - richm@netscape.com */

 

                 Would it be a correct approach in this scenario to halt the replication with a fatal error message in the error log file?

Probably, but then this code would have to be a lot smarter to figure out that the problem is due to stale data being imported into the consumer.  Please file a bug with exact steps to reproduce this problem.

 

 

Regards,

Jyoti

 

 

 
 
--
389 users mailing list
389-users@lists.fedoraproject.org
https://admin.fedoraproject.org/mailman/listinfo/389-users