Hi
I am not able to run the startup script on my VM instance. First I added the script to the instance metadata, then restarted the instance and got the error below.
`
systemd[1]: google-startup-scripts.service: Main process exited, code=exited, status=1/FAILURE
systemd[1]: google-startup-scripts.service: Failed with result 'exit-code'.
systemd[1]: Failed to start Google Compute Engine Startup Scripts.
systemd[1]: Startup finished in 6.527s (kernel) + 59.028s (userspace) = 1min 5.555s.
`
And my script is:
`
#! /bin/bash
apt-get remove --auto-remove sshguard
apt-get purge --auto-remove sshguard
`
Note: I am not able to connect to my instance through SSH. I was trying to debug the issue by removing 'sshguard' with the startup script.
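For context, this is roughly how the script was added to the metadata (a sketch; VM_NAME and startup.sh are placeholders):
gcloud compute instances add-metadata VM_NAME \
    --metadata-from-file startup-script=startup.sh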
Thanks and regards
Finally resolved the issue.
When I inspected the server logs, I found that some dependencies were corrupted or missing ('python3-distro' and 'python3-netifaces'). A service called 'sshguard' was also running on the instance, which might be another reason SSH connections were blocked. Startup scripts were not executing either (probably due to the missing Python modules).
Create a new instance to serve as the rescue instance. Name this instance rescue. This rescue instance does not need to run the same Linux OS as the problematic instance. This example uses Debian 9 on the rescue instance.
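A minimal sketch of creating the rescue instance, assuming your default zone is already configured and using the public Debian 9 image family:
gcloud compute instances create rescue \
    --image-family=debian-9 \
    --image-project=debian-cloud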
Stop the problematic instance and create a copy of its boot disk.
Set a variable for the problematic instance's name (replace VM_NAME below with your instance name). This makes it easier to reference the instance in later steps.
export PROB_INSTANCE_NAME=VM_NAME
Stop the problematic instance
gcloud compute instances stop "$PROB_INSTANCE_NAME"
Get the name of the boot disk for the problem instance.
export PROB_INSTANCE_DISK="$(gcloud compute instances describe \
    "$PROB_INSTANCE_NAME" --format='json' | jq -r \
    '.disks[] | select(.boot == true) | .source')"
Create a snapshot of the boot disk.
export DISK_SNAPSHOT="${PROB_INSTANCE_NAME}-snapshot"
gcloud compute disks snapshot "$PROB_INSTANCE_DISK" \
    --snapshot-names "$DISK_SNAPSHOT"
Create a new disk from the snapshot.
export NEW_DISK="${PROB_INSTANCE_NAME}-new-disk"
gcloud compute disks create "$NEW_DISK" \
    --source-snapshot="$DISK_SNAPSHOT"
Delete the snapshot:
gcloud compute snapshots delete "$DISK_SNAPSHOT"
Attach the new disk to the rescue instance and mount its root volume there. Because this procedure attaches only one additional disk, the device identifier of the new disk is /dev/sdb. Ubuntu labels its root volume 1 by default, so the volume identifier should be /dev/sdb1. For other layouts, use lsblk to determine the volume identifier, as shown after the SSH step below.
gcloud compute instances attach-disk rescue --disk "$NEW_DISK"
Connect to the rescue instance using SSH
gcloud compute ssh rescue
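Once connected, it can help to confirm the device and partition names before mounting; lsblk lists the attached disks (the exact layout varies by image):
lsblk -o NAME,SIZE,TYPE,MOUNTPOINT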
Run the following steps on the rescue instance.
Mount the root volume of the new disk.
export NEW_DISK_MOUNT_POINT="/tmp/sdb-root-vol"
DEV="/dev/sdb1"
sudo mkdir "$NEW_DISK_MOUNT_POINT"
sudo mount "$DEV" "$NEW_DISK_MOUNT_POINT"
To prepare for chrooting, run these additional commands to bind the system mounts:
sudo mount -t sysfs none /tmp/sdb-root-vol/sys
sudo mount -t proc none /tmp/sdb-root-vol/proc
sudo mount --bind /dev/ /tmp/sdb-root-vol/dev
sudo mount --bind /dev/pts /tmp/sdb-root-vol/dev/pts
sudo mount -o bind /etc/resolv.conf /tmp/sdb-root-vol/etc/resolv.conf
Change root
sudo chroot /tmp/sdb-root-vol
Root has now been changed to the mounted disk, so the following commands operate on the problematic instance's boot disk.
Install the missing dependencies and remove sshguard:
sudo apt install python3-distro
sudo apt install python3-netifaces
sudo apt-get remove --auto-remove sshguard
sudo apt-get purge --auto-remove sshguard
Exit the chroot shell:
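exit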
Unmount the bind mounts, then the disk.
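Release the bind mounts created earlier first, otherwise unmounting the root volume may fail with 'target is busy' (paths match the mounts made above):
sudo umount /tmp/sdb-root-vol/etc/resolv.conf
sudo umount /tmp/sdb-root-vol/dev/pts
sudo umount /tmp/sdb-root-vol/dev
sudo umount /tmp/sdb-root-vol/proc
sudo umount /tmp/sdb-root-vol/sys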
sudo umount "$NEW_DISK_MOUNT_POINT" && sudo rmdir "$NEW_DISK_MOUNT_POINT"
Disconnect the attached disk from the rescue instance.
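A sketch of the detach command, reusing the $NEW_DISK variable from earlier:
gcloud compute instances detach-disk rescue --disk "$NEW_DISK"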
Attach this disk as the boot disk of the problematic instance.
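One possible sequence (a sketch; $PROB_INSTANCE_DISK holds the full disk URI captured by the describe step, which gcloud generally accepts in place of a bare name):
gcloud compute instances detach-disk "$PROB_INSTANCE_NAME" --disk "$PROB_INSTANCE_DISK"
gcloud compute instances attach-disk "$PROB_INSTANCE_NAME" --disk "$NEW_DISK" --boot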
Start the problematic instance and try to connect to it through SSH.
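For example:
gcloud compute instances start "$PROB_INSTANCE_NAME"
gcloud compute ssh "$PROB_INSTANCE_NAME"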
Delete the temporary instance 'rescue' and snapshots.
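Cleanup sketch (the snapshot was already deleted in an earlier step; repeat that command only if you skipped it):
gcloud compute instances delete rescue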
[1] https://cloud.google.com/compute/docs/images/install-guest-environment#ubuntu_1
[2] https://cloud.google.com/compute/docs/images/install-guest-environment#update-guest
[3] https://stackoverflow.com/questions/56652200/modulenotfounderror-no-module-named-distro
[4] https://stackoverflow.com/questions/19332554/importerror-no-module-named-netifaces
[5] https://cloud.google.com/compute/docs/disks/snapshot-best-practices
[6] https://cloud.google.com/architecture/disaster-recovery
1st - Try to run the commands with sudo.
2nd - Instead of writing the script in the Metadata section, write it in the Automation section.
Good luck
Hi @rohithkp ,
Great to know that you were able to solve the issue!
Thank you for sharing the detailed steps. You helped me and the community learn the workflow for a practical use case of disk snapshots.