Hi,
While reading the VDSM code for SPM operations such as
HSM.deleteImage(), I found that VDSM does not check whether the
operation is being run on the SPM host. It only checks that the storage
pool has been acquired by some SPM host, which is not necessarily the
host the SPM operation was sent to. The code looks like this:
HSM.deleteImage()
{
    ...
    HSM._spmSchedule()
    {
        self.validateSPM(spUUID)  <--- only checks that the storage pool
                                       was acquired by some host, not
                                       necessarily this host
    }
    ...
}
So VDSM relies entirely on the node management application, i.e.
ovirt-engine, to dispatch SPM operations to the right VDSM host. The
VDSM host itself does not check whether it is the SPM host and is
allowed to execute the operation. To me, this is a bit broken. When the
engine queries VDSM for the SPM host, it gets the right answer at that
moment. However, the host may fail for some reason after the engine has
recorded it as the SPM host. The host then loses the SPM role and
another host takes it over, but the engine continues to send SPM
operations to the broken host. As a result, an SPM operation can be
launched on a non-SPM host. So I think there is a small race window in
which the VDSM hosts' metadata can be corrupted. I think the VDSM host
should check that it is the SPM before the SPM job is scheduled. If the
host has already lost the SPM role, it should fail the RPC call from the
engine, so that the engine can retry the operation once it learns that
the earlier call failed.
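
To make the proposal concrete, here is a minimal sketch in Python of the
kind of check I mean. It is not the actual VDSM code: NotSpmError,
StoragePool, Host, validate_local_spm() and spm_schedule() are all
hypothetical names, and the shared spm_host_id field only stands in for
however a real host would verify that it still holds the SPM role.

```python
class NotSpmError(Exception):
    """Raised when an SPM-only operation reaches a host that is not the SPM."""


class StoragePool:
    """Hypothetical stand-in for a storage pool and its SPM ownership."""

    def __init__(self, sp_uuid, spm_host_id=None):
        self.sp_uuid = sp_uuid
        # Id of the host currently holding the SPM role, or None if nobody does.
        self.spm_host_id = spm_host_id


class Host:
    """Hypothetical stand-in for the HSM object on one VDSM host."""

    def __init__(self, host_id, pools):
        self.host_id = host_id
        self.pools = pools      # maps sp_uuid -> StoragePool
        self.scheduled = []     # names of the SPM jobs accepted by this host

    def validate_local_spm(self, sp_uuid):
        # Proposed check: the pool must be held by *this* host,
        # not merely by some host.
        pool = self.pools[sp_uuid]
        if pool.spm_host_id != self.host_id:
            raise NotSpmError(
                "host %s is not the SPM of pool %s" % (self.host_id, sp_uuid))

    def spm_schedule(self, sp_uuid, name, func, *args):
        # Fail the call before anything is queued, so the caller
        # (the engine) sees the error and can retry on the real SPM.
        self.validate_local_spm(sp_uuid)
        self.scheduled.append(name)
        return func(*args)
```

With this in place, a call such as spm_schedule("pool-1", "deleteImage",
...) on a host that has lost the SPM role raises NotSpmError instead of
queueing the job, and the engine can treat that as the signal to look up
the current SPM and resend the operation.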
--
---
舒明 Shu Ming
Open Virtualization Engineering; CSTL, IBM Corp.
Tel: 86-10-82451626 Tieline: 9051626 E-mail: shuming(a)cn.ibm.com or shuming(a)linux.vnet.ibm.com
Address: 3/F Ring Building, ZhongGuanCun Software Park, Haidian District, Beijing 100193, PRC