Occasionally a hypervisor falls into the NonOperational state and does not return to Up. When this happens, vdsm.log shows a Permission denied message like the one below.
2019-03-06 13:35:06,342+0000 INFO (jsonrpc/0) [storage.StoragePool] updating pool 95f329c1-3361-420a-b04e-d4c81a84adda backend from type NoneType instance 0x7f889a421dc0 to type StoragePoolMemoryBackend instance 0x7f884c0f4838 (sp:157)
2019-03-06 13:35:06,342+0000 INFO (jsonrpc/0) [storage.StoragePool] Connect host #2 to the storage pool 95f329c1-3361-420a-b04e-d4c81a84adda with master domain: 3b47449f-e149-4867-a027-3f5deaebade5 (ver = 1) (sp:688)
2019-03-06 13:35:06,790+0000 INFO (jsonrpc/0) [vdsm.api] FINISH connectStoragePool error=[Errno 13] Permission denied: '/dev/3b47449f-e149-4867-a027-3f5deaebade5/metadata' from=::ffff:10.12.69.248,58394, flow_id=78dfc5cd, task_id=6b2419ab-eceb-43b7-99b9-d8b96a0764c8 (api:52)
2019-03-06 13:35:06,791+0000 ERROR (jsonrpc/0) [storage.TaskManager.Task] (Task='6b2419ab-eceb-43b7-99b9-d8b96a0764c8') Unexpected error (task:875)
Traceback (most recent call last):
  File "/usr/lib/python2.7/site-packages/vdsm/storage/task.py", line 882, in _run
    return fn(*args, **kargs)
  File "<string>", line 2, in connectStoragePool
  File "/usr/lib/python2.7/site-packages/vdsm/common/api.py", line 50, in method
    ret = func(*args, **kwargs)
  File "/usr/lib/python2.7/site-packages/vdsm/storage/hsm.py", line 1034, in connectStoragePool
    spUUID, hostID, msdUUID, masterVersion, domainsMap)
  File "/usr/lib/python2.7/site-packages/vdsm/storage/hsm.py", line 1096, in _connectStoragePool
    res = pool.connect(hostID, msdUUID, masterVersion)
  File "/usr/lib/python2.7/site-packages/vdsm/storage/sp.py", line 700, in connect
    self.__rebuild(msdUUID=msdUUID, masterVersion=masterVersion)
  File "/usr/lib/python2.7/site-packages/vdsm/storage/sp.py", line 1274, in __rebuild
    self.setMasterDomain(msdUUID, masterVersion)
  File "/usr/lib/python2.7/site-packages/vdsm/storage/sp.py", line 1491, in setMasterDomain
    domain = sdCache.produce(msdUUID)
  File "/usr/lib/python2.7/site-packages/vdsm/storage/sdc.py", line 110, in produce
    domain.getRealDomain()
  File "/usr/lib/python2.7/site-packages/vdsm/storage/sdc.py", line 51, in getRealDomain
    return self._cache._realProduce(self._sdUUID)
  File "/usr/lib/python2.7/site-packages/vdsm/storage/sdc.py", line 134, in _realProduce
    domain = self._findDomain(sdUUID)
  File "/usr/lib/python2.7/site-packages/vdsm/storage/sdc.py", line 151, in _findDomain
    return findMethod(sdUUID)
  File "/usr/lib/python2.7/site-packages/vdsm/storage/blockSD.py", line 1652, in findDomain
    return BlockStorageDomain(BlockStorageDomain.findDomainPath(sdUUID))
  File "/usr/lib/python2.7/site-packages/vdsm/storage/blockSD.py", line 962, in __init__
    manifest = self.manifestClass(sdUUID)
  File "/usr/lib/python2.7/site-packages/vdsm/storage/blockSD.py", line 430, in __init__
    metadata = selectMetadata(sdUUID)
  File "/usr/lib/python2.7/site-packages/vdsm/storage/blockSD.py", line 399, in selectMetadata
    if len(mdProvider) > 0:
  File "/usr/lib/python2.7/site-packages/vdsm/storage/persistent.py", line 67, in __len__
    return len(self.keys())
  File "/usr/lib/python2.7/site-packages/vdsm/storage/persistent.py", line 112, in keys
    return list(self.__iter__())
  File "/usr/lib/python2.7/site-packages/vdsm/storage/persistent.py", line 109, in __iter__
    self._dict.__iter__())
  File "/usr/lib/python2.7/site-packages/vdsm/storage/persistent.py", line 227, in __iter__
    with self._accessWrapper():
  File "/usr/lib64/python2.7/contextlib.py", line 17, in __enter__
    return self.gen.next()
  File "/usr/lib/python2.7/site-packages/vdsm/storage/persistent.py", line 154, in _accessWrapper
    self.refresh()
  File "/usr/lib/python2.7/site-packages/vdsm/storage/persistent.py", line 232, in refresh
    lines = self._metaRW.readlines()
  File "/usr/lib/python2.7/site-packages/vdsm/storage/blockSD.py", line 340, in readlines
    refresh=self._needs_refresh())
  File "/usr/lib/python2.7/site-packages/vdsm/storage/blockSD.py", line 376, in _needs_refresh
    lv_size = fsutils.size(self.metavol)
  File "/usr/lib/python2.7/site-packages/vdsm/storage/fsutils.py", line 30, in size
    with io.open(filename, "rb") as f:
IOError: [Errno 13] Permission denied: '/dev/3b47449f-e149-4867-a027-3f5deaebade5/metadata'
2019-03-06 13:35:06,791+0000 INFO (jsonrpc/0) [storage.TaskManager.Task] (Task='6b2419ab-eceb-43b7-99b9-d8b96a0764c8') aborting: Task is aborted: u"[Errno 13] Permission denied: '/dev/3b47449f-e149-4867-a027-3f5deaebade5/metadata'" - code 100 (task:1181)
2019-03-06 13:35:06,791+0000 ERROR (jsonrpc/0) [storage.Dispatcher] FINISH connectStoragePool error=[Errno 13] Permission denied: '/dev/3b47449f-e149-4867-a027-3f5deaebade5/metadata' (dispatcher:87)
Traceback (most recent call last):
  File "/usr/lib/python2.7/site-packages/vdsm/storage/dispatcher.py", line 74, in wrapper
    result = ctask.prepare(func, *args, **kwargs)
  File "/usr/lib/python2.7/site-packages/vdsm/storage/task.py", line 108, in wrapper
    return m(self, *a, **kw)
  File "/usr/lib/python2.7/site-packages/vdsm/storage/task.py", line 1189, in prepare
    raise self.error
IOError: [Errno 13] Permission denied: '/dev/3b47449f-e149-4867-a027-3f5deaebade5/metadata'
2019-03-06 13:35:06,791+0000 INFO (jsonrpc/0) [jsonrpc.JsonRpcServer] RPC call StoragePool.connect failed (error 302) in 0.45 seconds (__init__:312)
The key line is this one:
IOError: [Errno 13] Permission denied: '/dev/3b47449f-e149-4867-a027-3f5deaebade5/metadata'
It means exactly what it says: the host cannot access the storage domain's metadata volume, and that is why the hypervisor drops to NonOperational.
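To check whether another host is hitting the same failure, one option is to pull the affected device paths out of its log. This is only a minimal sketch: the function name `denied_paths` is made up for illustration, and it assumes the log lines have the same `Permission denied: '<path>'` shape as the excerpt above.

```shell
# List the distinct paths that appear in "Permission denied: '<path>'" messages.
# $1 is the log file to scan (for vdsm this is usually /var/log/vdsm/vdsm.log).
denied_paths() {
    grep -o "Permission denied: '[^']*'" "$1" | cut -d"'" -f2 | sort -u
}
```

Running `denied_paths /var/log/vdsm/vdsm.log` on the affected host should print each device path once, however many times the error was logged.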
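In most cases the cause is an ownership or permission problem on the device node.
Normally the metadata volume and the other devices and directories alongside it should be owned by vdsm:qemu or vdsm:sanlock. On a problem system the ownership is wrong, for example root:root, and that produces this error. In most cases this is related to SELinux labeling.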
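The ownership can be inspected with `stat`. Below is a minimal sketch, assuming GNU coreutils; the helper names `owner_of` and `check_owner` are hypothetical, and the expected owner is passed in by the caller rather than hard-coded, since it depends on which volume is being checked.

```shell
# Print "user:group" of the file a path resolves to.
# -L follows symlinks, which matters because /dev/<vg>/<lv> is a symlink.
owner_of() {
    stat -L -c '%U:%G' "$1"
}

# Compare the actual owner of $1 with the expected "user:group" in $2.
# Returns 0 on a match, 1 on a mismatch, 2 if the path cannot be read.
check_owner() {
    actual=$(owner_of "$1") || return 2
    if [ "$actual" = "$2" ]; then
        echo "OK: $1 is $actual"
    else
        echo "MISMATCH: $1 is $actual, expected $2"
        return 1
    fi
}
```

For the log above, a call might look like `check_owner /dev/3b47449f-e149-4867-a027-3f5deaebade5/metadata vdsm:qemu`. To see the SELinux context as well, `ls -lLZ` on the same path shows it next to the owner and mode.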
The fix is to have SELinux relabel the file system.
$ touch /.autorelabel
$ reboot
On reboot the system reapplies the SELinux labels. Once the host is back up, check the ownership and permissions of the device or directory again.
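One quick way to tell whether the relabel has actually run is to look for the flag file, which on RHEL-family systems is normally removed once relabeling completes. The helper below is a small sketch; `relabel_pending` is a hypothetical name, and the optional root argument exists only so the check can be tried against a scratch directory.

```shell
# Return 0 while the relabel flag file still exists under the given root
# (default "/"), meaning a relabel was requested but has not finished yet.
relabel_pending() {
    root=${1:-/}
    [ -e "${root%/}/.autorelabel" ]
}
```

After the reboot, `relabel_pending && echo "relabel still pending"` should print nothing. If it does print, the relabel did not complete and the ownership check is not yet meaningful.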