Hi
I am not able to run the startup script on my VM instance. First I added the script to the instance metadata, then restarted the instance and got the error below.
`
systemd[1]: google-startup-scripts.service: Main process exited, code=exited, status=1/FAILURE
systemd[1]: google-startup-scripts.service: Failed with result 'exit-code'.
systemd[1]: Failed to start Google Compute Engine Startup Scripts.
systemd[1]: Startup finished in 6.527s (kernel) + 59.028s (userspace) = 1min 5.555s.
`
And my script is:
`
#! /bin/bash
apt-get remove --auto-remove sshguard
apt-get purge --auto-remove sshguard
`
Note: I am not able to connect to my instance through SSH. I was trying to debug the issue by removing 'sshguard' with the startup script.
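For context, this is roughly how the script was added to the metadata (a sketch; VM_NAME and startup.sh are placeholders):
gcloud compute instances add-metadata VM_NAME \
    --metadata-from-file startup-script=startup.sh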
Thanks and regards
Finally resolved the issue.
When I inspected the server logs, I found that some dependencies were corrupted or missing ('python3-distro' and 'python3-netifaces'). A service called 'sshguard' was also running on the instance, which might be another reason SSH connections were blocked. Startup scripts were not executing either (probably due to the missing Python modules).
Create a new instance to serve as the rescue instance. Name this instance rescue. This rescue instance does not need to run the same Linux OS as the problematic instance. This example uses Debian 9 on the rescue instance.
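A minimal sketch of creating the rescue instance, assuming your default zone is already configured and using the public Debian 9 image family:
gcloud compute instances create rescue \
    --image-family=debian-9 \
    --image-project=debian-cloud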
Stop the problematic instance and create a copy of its boot disk.
Set a variable for the problematic instance's name (replace VM_NAME below with your instance name). This makes it easier to reference the instance in later steps.
export PROB_INSTANCE_NAME=VM_NAME
Stop the problematic instance
gcloud compute instances stop "$PROB_INSTANCE_NAME"
Get the name of the boot disk for the problem instance.
export PROB_INSTANCE_DISK="$(gcloud compute instances describe \
    "$PROB_INSTANCE_NAME" --format='json' | jq -r \
    '.disks[] | select(.boot == true) | .source')"
Create a snapshot of the boot disk.
export DISK_SNAPSHOT="${PROB_INSTANCE_NAME}-snapshot"
gcloud compute disks snapshot "$PROB_INSTANCE_DISK" \
    --snapshot-names "$DISK_SNAPSHOT"
Create a new disk from the snapshot.
export NEW_DISK="${PROB_INSTANCE_NAME}-new-disk"
gcloud compute disks create "$NEW_DISK" \
    --source-snapshot="$DISK_SNAPSHOT"
Delete the snapshot:
gcloud compute snapshots delete "$DISK_SNAPSHOT"
Attach the new disk to the rescue instance and mount its root volume there. Because this procedure attaches only one additional disk, the device identifier of the new disk is /dev/sdb. Ubuntu labels its root volume 1 by default, so the volume identifier should be /dev/sdb1. For other layouts, use lsblk to determine the volume identifier, as shown after the SSH step below.
gcloud compute instances attach-disk rescue --disk "$NEW_DISK"
Connect to the rescue instance using SSH
gcloud compute ssh rescue
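Once connected, it can help to confirm the device and partition names before mounting; lsblk lists the attached disks (the exact layout varies by image):
lsblk -o NAME,SIZE,TYPE,MOUNTPOINT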
Run the following steps on the rescue instance.
Mount the root volume of the new disk.
export NEW_DISK_MOUNT_POINT="/tmp/sdb-root-vol"
DEV="/dev/sdb1"
sudo mkdir "$NEW_DISK_MOUNT_POINT"
sudo mount "$DEV" "$NEW_DISK_MOUNT_POINT"
To prepare for chrooting, run these additional commands to bind the system mounts:
sudo mount -t sysfs none /tmp/sdb-root-vol/sys
sudo mount -t proc none /tmp/sdb-root-vol/proc
sudo mount --bind /dev/ /tmp/sdb-root-vol/dev
sudo mount --bind /dev/pts /tmp/sdb-root-vol/dev/pts
sudo mount -o bind /etc/resolv.conf /tmp/sdb-root-vol/etc/resolv.conf
Change root
sudo chroot /tmp/sdb-root-vol
Root has now been changed to the mounted disk, so the following commands operate on the problematic instance's boot disk.
Install the missing dependencies and remove sshguard:
sudo apt install python3-distro
sudo apt install python3-netifaces
sudo apt-get remove --auto-remove sshguard
sudo apt-get purge --auto-remove sshguard
Exit the chroot shell:
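exit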
Unmount the bind mounts, then the disk.
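Release the bind mounts created earlier first, otherwise unmounting the root volume may fail with 'target is busy' (paths match the mounts made above):
sudo umount /tmp/sdb-root-vol/etc/resolv.conf
sudo umount /tmp/sdb-root-vol/dev/pts
sudo umount /tmp/sdb-root-vol/dev
sudo umount /tmp/sdb-root-vol/proc
sudo umount /tmp/sdb-root-vol/sys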
sudo umount "$NEW_DISK_MOUNT_POINT" && sudo rmdir "$NEW_DISK_MOUNT_POINT"
Disconnect the attached disk from the rescue instance.
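A sketch of the detach command, reusing the $NEW_DISK variable from earlier:
gcloud compute instances detach-disk rescue --disk "$NEW_DISK"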
Attach this disk as the boot disk of the problematic instance.
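One possible sequence (a sketch; $PROB_INSTANCE_DISK holds the full disk URI captured by the describe step, which gcloud generally accepts in place of a bare name):
gcloud compute instances detach-disk "$PROB_INSTANCE_NAME" --disk "$PROB_INSTANCE_DISK"
gcloud compute instances attach-disk "$PROB_INSTANCE_NAME" --disk "$NEW_DISK" --boot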
Start the problematic instance and try to connect to it through SSH.
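For example:
gcloud compute instances start "$PROB_INSTANCE_NAME"
gcloud compute ssh "$PROB_INSTANCE_NAME"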
Delete the temporary instance 'rescue' and snapshots.
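Cleanup sketch (the snapshot was already deleted in an earlier step; repeat that command only if you skipped it):
gcloud compute instances delete rescue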
[1] https://cloud.google.com/compute/docs/images/install-guest-environment#ubuntu_1
[2] https://cloud.google.com/compute/docs/images/install-guest-environment#update-guest
[3] https://stackoverflow.com/questions/56652200/modulenotfounderror-no-module-named-distro
[4] https://stackoverflow.com/questions/19332554/importerror-no-module-named-netifaces
[5] https://cloud.google.com/compute/docs/disks/snapshot-best-practices
[6] https://cloud.google.com/architecture/disaster-recovery
1st - Try to run the commands with sudo.
2nd - Instead of writing the script in the Metadata section, write it in the Automation section.
Good luck
Hi @rohithkp ,
Great to know that you were able to solve the issue!
Thank you for sharing the detailed steps. You helped me and the community learn the workflow for a practical use case of disk snapshots.