We are using Recovery Services to back up a CentOS Linux VM. Every two days the backup job hangs in the "Take Snapshot" sub task with the error details "Could not communicate with the VM agent for snapshot status. Snapshot VM sub task timed out". The VM state is reported as "running", but every access path is blocked (no SSH, no HTTP, and so on), so we have to restart the VM to restore normal service.
On the first night after the restart the backup job runs without trouble. On the second night the VM freezes. We have not been able to find any error messages in the log files (Linux or Azure).
Here is the last message in extension.log:
2015/06/03 04:04:39 [Microsoft.Azure.RecoveryServices.VMSnapshotLinux-1.0]2015-06-03 04:04:39.822409 Info doing freeze now...
2015/06/03 04:04:39
Here are the last messages in CommandExecution.log:
2015/06/03 04:04:39 Found RuntimeSettings for Microsoft.Azure.RecoveryServices.VMSnapshotLinux V 1.0.1.0
2015/06/03 04:04:39 Spawned main/handle.py -enable PID 36012
Here are the last messages in waagent.log:
2015/06/03 04:04:39 AgentBackupLinuxExtension started to handle.
2015/06/03 04:04:39 [AgentBackupLinuxExtension-0.0]cwd is /var/lib/waagent/Microsoft.Azure.RecoveryServices.VMSnapshotLinux-1.0.1.0
2015/06/03 04:04:39 [Microsoft.Azure.RecoveryServices.VMSnapshotLinux-1.0]Change log file to /var/log/azure/Microsoft.Azure.RecoveryServices.VMSnapshotLinux/1.0.1.0/extension.log
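All three logs stop at the same second, right after the "doing freeze now..." entry. For anyone who wants to check the same thing on their own VM, here is a minimal sketch that pulls the last timestamped entry out of a log. It assumes every entry starts with the "YYYY/MM/DD HH:MM:SS" prefix seen in the excerpts above; the names `last_entry` and `last_entry_in_file` are made up for this illustration and are not part of waagent or the backup extension.

```python
import re
from datetime import datetime

# Matches the "YYYY/MM/DD HH:MM:SS" prefix seen in the log excerpts above.
# Assumption: every entry in these logs starts with this prefix.
STAMP = re.compile(r"^(\d{4}/\d{2}/\d{2} \d{2}:\d{2}:\d{2})\s*(.*)$")


def last_entry(lines):
    """Return (timestamp, message) for the last timestamped line, or None."""
    for line in reversed(list(lines)):
        match = STAMP.match(line.strip())
        if match:
            stamp = datetime.strptime(match.group(1), "%Y/%m/%d %H:%M:%S")
            return stamp, match.group(2)
    return None


def last_entry_in_file(path):
    """Hypothetical helper: same as last_entry, but reads a log file from disk."""
    with open(path, errors="replace") as handle:
        return last_entry(handle)
```

Calling `last_entry_in_file` on /var/log/azure/Microsoft.Azure.RecoveryServices.VMSnapshotLinux/1.0.1.0/extension.log (the path reported in waagent.log above) and on the other two logs gives the timestamps to compare. Identical timestamps across all logs would suggest the whole VM stalled at the freeze step, not just the extension.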
Any input would be welcome.
Thanks