
Unable to start startup script in VM instance

Hi 

I am not able to run the startup script in my VM instance. I first added the script to the instance metadata, then restarted the instance and got the error below.

`
systemd[1]: google-startup-scripts.service: Main process exited, code=exited, status=1/FAILURE
systemd[1]: google-startup-scripts.service: Failed with result 'exit-code'.
systemd[1]: Failed to start Google Compute Engine Startup Scripts.
systemd[1]: Startup finished in 6.527s (kernel) + 59.028s (userspace) = 1min 5.555s.
`

And my script is:

#!/bin/bash
apt-get remove --auto-remove sshguard
apt-get purge --auto-remove sshguard

Note: I am not able to connect to my instance through SSH. I was trying to debug the issue by removing 'sshguard' with a startup script.
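For reference, the startup script can be set and its logs inspected from the gcloud CLI roughly as follows (VM_NAME and startup.sh are placeholders):

`
# Add the script to the instance metadata
gcloud compute instances add-metadata VM_NAME \
    --metadata-from-file startup-script=startup.sh

# Check boot and startup-script output without SSH (serial console)
gcloud compute instances get-serial-port-output VM_NAME

# From inside the guest, when SSH works, check the service logs
sudo journalctl -u google-startup-scripts.service
`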

Thanks and regards

1 ACCEPTED SOLUTION

Finally resolved the issue.
When I inspected the server logs, I found that some dependencies were corrupted or missing ('python3-distro' and 'python3-netifaces'). A service called 'sshguard' was also running on the instance, which might have been another reason SSH connections were being blocked. The startup scripts were not executing either, probably because of the missing Python modules.

Steps followed to resolve the issue

  • Create a new instance to serve as the rescue instance. Name this instance rescue. This rescue instance does not need to run the same Linux OS as the problematic instance. This example uses Debian 9 on the rescue instance.

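    For illustration, the rescue instance can be created with a command like the following (the zone and image family here are placeholders; any recent Debian image works):

    gcloud compute instances create rescue \
       --image-family=debian-11 --image-project=debian-cloud \
       --zone=us-central1-a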
  • Stop the problematic instance and create a copy of its boot disk.

    • Set a variable name for the problematic instance. This makes it easier to reference the instance in later steps.

      export PROB_INSTANCE_NAME=VM_NAME
    • Stop the problematic instance

      gcloud compute instances stop "$PROB_INSTANCE_NAME"
    • Get the name of the boot disk for the problem instance.

      export PROB_INSTANCE_DISK="$(gcloud compute instances describe \
      "$PROB_INSTANCE_NAME" --format='json' |  jq -r \
      '.disks[] | select(.boot == true) | .source')"
    • Create a snapshot of the boot disk.

      export DISK_SNAPSHOT="${PROB_INSTANCE_NAME}-snapshot"
      
      gcloud compute disks snapshot "$PROB_INSTANCE_DISK" \
         --snapshot-names "$DISK_SNAPSHOT"
    • Create a new disk from the snapshot.

      export NEW_DISK="${PROB_INSTANCE_NAME}-new-disk"
      
      gcloud compute disks create "$NEW_DISK" \
         --source-snapshot="$DISK_SNAPSHOT"
    • Delete the snapshot:

      gcloud compute snapshots delete "$DISK_SNAPSHOT"
  • Attach the new disk to the rescue instance and mount its root volume on the rescue instance. Because this procedure attaches only one additional disk, the device identifier of the new disk is /dev/sdb. Ubuntu labels its root volume 1 by default, so the volume identifier should be /dev/sdb1. For other layouts, use lsblk to determine the volume identifier (see the example after the attach command below).

    gcloud compute instances attach-disk rescue --disk "$NEW_DISK"
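    On the rescue instance, lsblk can be used to confirm the device and partition names; the output below is illustrative, not taken from the original post:

    lsblk
    # NAME    MAJ:MIN RM  SIZE RO TYPE MOUNTPOINT
    # sda       8:0    0   10G  0 disk
    # └─sda1    8:1    0  9.9G  0 part /
    # sdb       8:16   0   10G  0 disk
    # └─sdb1    8:17   0  9.9G  0 part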
  • Connect to the rescue instance using SSH

    gcloud compute ssh rescue
  • Run the following steps on the rescue instance.

    • Mount the root volume of the new disk.

      export NEW_DISK_MOUNT_POINT="/tmp/sdb-root-vol"
      DEV="/dev/sdb1"
      sudo mkdir "$NEW_DISK_MOUNT_POINT"
      sudo mount "$DEV" "$NEW_DISK_MOUNT_POINT"
    • To prepare for chrooting, run these additional commands to bind-mount the system directories.

      sudo mount -t sysfs none /tmp/sdb-root-vol/sys;
      sudo mount -t proc none /tmp/sdb-root-vol/proc;
      sudo mount --bind /dev/ /tmp/sdb-root-vol/dev;
      sudo mount --bind /dev/pts /tmp/sdb-root-vol/dev/pts;
      sudo mount -o bind /etc/resolv.conf /tmp/sdb-root-vol/etc/resolv.conf;
    • Change root

      sudo chroot /tmp/sdb-root-vol
    • Root has now been changed to the mounted disk.

    • Install the dependencies and remove sshguard.

      sudo apt install python3-distro
      sudo apt install python3-netifaces
      sudo apt-get remove --auto-remove sshguard 
      sudo apt-get purge --auto-remove sshguard
    • Exit chroot

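    • Release the bind mounts created for the chroot; otherwise unmounting the root volume will fail with a "target is busy" error. The paths below assume the mount points used above:

      sudo umount /tmp/sdb-root-vol/etc/resolv.conf /tmp/sdb-root-vol/dev/pts \
         /tmp/sdb-root-vol/dev /tmp/sdb-root-vol/proc /tmp/sdb-root-vol/sys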
    • Unmount the disk.

      sudo umount "$NEW_DISK_MOUNT_POINT" && sudo rmdir "$NEW_DISK_MOUNT_POINT"
  • Detach the new disk from the rescue instance.

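    Using the variables defined earlier, this can be done with, for example:

    gcloud compute instances detach-disk rescue --disk "$NEW_DISK"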
  • Attach this disk as the boot disk to the problematic instance.

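    The original boot disk has to be detached from the problematic instance before the repaired disk can be attached as its boot disk; OLD_BOOT_DISK_NAME below is a placeholder for that original disk's name:

    gcloud compute instances detach-disk "$PROB_INSTANCE_NAME" --disk OLD_BOOT_DISK_NAME
    gcloud compute instances attach-disk "$PROB_INSTANCE_NAME" --disk "$NEW_DISK" --boot
    # Start the instance again so it can be reached over SSH
    gcloud compute instances start "$PROB_INSTANCE_NAME"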
  • Try to connect to the instance through SSH.

    • At this point I was able to connect to the instance without any issues.
  • Delete the temporary instance 'rescue' and snapshots.
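  For example (double-check the names before deleting anything; any remaining snapshots can be removed with gcloud compute snapshots delete, as in the earlier step):

    gcloud compute instances delete rescue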

References

[1] https://cloud.google.com/compute/docs/images/install-guest-environment#ubuntu_1
[2] https://cloud.google.com/compute/docs/images/install-guest-environment#update-guest
[3] https://stackoverflow.com/questions/56652200/modulenotfounderror-no-module-named-distro
[4] https://stackoverflow.com/questions/19332554/importerror-no-module-named-netifaces
[5] https://cloud.google.com/compute/docs/disks/snapshot-best-practices
[6] https://cloud.google.com/architecture/disaster-recovery


3 REPLIES

1st - Try running the commands with sudo.
2nd - Instead of writing the script in the Metadata section, write it in the Automation section.
Good luck


Hi @rohithkp,
Great to know that you were able to solve the issue!
Thank you for sharing the detailed steps. You helped me and the community learn a practical rescue workflow that uses disk snapshots.