Solved: Network failure changing network from legacy to shared VPC (only happening in Ubuntu Bionic & Focal)

jmcausing · 12-24-2022 05:16 PM

Hi and good day! I need some help troubleshooting this networking issue.

Please note that this only happens with the Ubuntu Bionic and Focal images. There is no network issue if I use a Debian image.

After changing the VM's network from the legacy network to a shared VPC, the instance has no network connectivity and no IP address (ens4 stays DOWN):

root@instance-1-eng-2819:~# ip a
1: lo: <LOOPBACK,UP,LOWER_UP> mtu 65536 qdisc noqueue state UNKNOWN group default qlen 1000
    link/loopback 00:00:00:00:00:00 brd 00:00:00:00:00:00
    inet 127.0.0.1/8 scope host lo
       valid_lft forever preferred_lft forever
    inet6 ::1/128 scope host 
       valid_lft forever preferred_lft forever
2: ens4: <BROADCAST,MULTICAST> mtu 1500 qdisc mq state DOWN group default qlen 1000
    link/ether 42:01:0a:80:00:2c brd ff:ff:ff:ff:ff:ff
    altname enp0s4
root@instance-1-eng-2819:~# ip link
1: lo: <LOOPBACK,UP,LOWER_UP> mtu 65536 qdisc noqueue state UNKNOWN mode DEFAULT group default qlen 1000
    link/loopback 00:00:00:00:00:00 brd 00:00:00:00:00:00
2: ens4: <BROADCAST,MULTICAST> mtu 1500 qdisc mq state DOWN mode DEFAULT group default qlen 1000
    link/ether 42:01:0a:80:00:2c brd ff:ff:ff:ff:ff:ff
    altname enp0s4

I got these messages from serial port 1 (console):

root@instance-1-eng-2819:~# Dec 25 01:02:53 instance-1-eng-2819 OSConfigAgent[4112]: 2022-12-25T01:02:53.4655Z OSConfigAgent Critical main.go:100: Error parsing metadata, agent cannot start: network error when requesting metadata, make sure your instance has an active network and can reach the metadata server: Get http://169.254.169.254/computeMetadata/v1/?recursive=true&alt=json&wait_for_change=true&last_etag=0&timeout_sec=60: dial tcp 169.254.169.254:80: connect: network is unreachable
Dec 25 01:02:53 instance-1-eng-2819 systemd[1]: google-osconfig-agent.service: Main process exited, code=exited, status=1/FAILURE
Dec 25 01:02:53 instance-1-eng-2819 systemd[1]: google-osconfig-agent.service: Failed with result 'exit-code'.
Dec 25 01:02:54 instance-1-eng-2819 systemd[1]: google-osconfig-agent.service: Scheduled restart job, restart counter is at 76.
Dec 25 01:02:54 instance-1-eng-2819 systemd[1]: Stopped Google OSConfig Agent.
Dec 25 01:02:54 instance-1-eng-2819 systemd[1]: Started Google OSConfig Agent.
Dec 25 01:03:39 instance-1-eng-2819 systemd[1]: google-guest-agent.service: State 'stop-sigterm' timed out. Killing.

I also saw a "Permission denied" error in the logs:

Dec 24 23:45:19 instance-1-eng-2819 dhclient[419]: execve (/bin/true, ...): Permission denied
Dec 24 23:45:19 instance-1-eng-2819 dhclient[415]: Listening on LPF/ens4/42:01:0a:80:00:2c
Dec 24 23:45:19 instance-1-eng-2819 dhclient[415]: Sending on   LPF/ens4/42:01:0a:80:00:2c
Dec 24 23:45:19 instance-1-eng-2819 dhclient[415]: Sending on   Socket/fallback
Dec 24 23:45:19 instance-1-eng-2819 dhclient[415]: DHCPDISCOVER on ens4 to 255.255.255.255 port 67 interval 3 (xid=0xbe742848)
Dec 24 23:45:19 instance-1-eng-2819 dhclient[415]: DHCPOFFER of 10.128.0.44 from 169.254.169.254
Dec 24 23:45:19 instance-1-eng-2819 dhclient[415]: DHCPREQUEST for 10.128.0.44 on ens4 to 255.255.255.255 port 67 (xid=0x482874be)
Dec 24 23:45:19 instance-1-eng-2819 dhclient[415]: DHCPACK of 10.128.0.44 from 169.254.169.254 (xid=0xbe742848)
Dec 24 23:45:19 instance-1-eng-2819 dhclient[420]: execve (/bin/true, ...): Permission denied

So I think the network failure happens because the instance can't communicate with the GCP metadata server at 169.254.169.254.

Is this an Ubuntu-specific issue, given that it works fine with the Debian image?

UPDATE:

It turns out that after I changed the network in GCP, the interface got a new MAC address, but the netplan config was not updated with it.

Current MAC address of ens4:

ip a | grep link/ether
    link/ether 42:01:0a:80:00:2c brd ff:ff:ff:ff:ff:ff

MAC address in the netplan config:

cat /etc/netplan/50-cloud-init.yaml | grep mac 
    macaddress: 42:01:0a:f0:02:42
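Side note on the two values: the MACs in this thread appear to embed an internal IPv4 address in their last four bytes (the live MAC 42:01:0a:80:00:2c lines up with the DHCP-assigned 10.128.0.44 in the dhclient log). Assuming that pattern holds (an observation from these logs, not something confirmed here), a small POSIX sh helper can decode both MACs. The name mac_to_ipv4 is hypothetical.

```shell
#!/bin/sh
# Decode a MAC like 42:01:0a:80:00:2c into the IPv4 address held in its last
# four bytes. ASSUMPTION: inferred from this thread's logs, not a documented rule.
mac_to_ipv4() (
    # The "( ... )" body runs in a subshell, so the IFS change stays local.
    IFS=:
    # Split the MAC on ':' into $1..$6.
    set -- $1
    if [ "$#" -ne 6 ]; then
        echo "not a MAC address" >&2
        exit 1
    fi
    # printf reads the 0x-prefixed arguments as hexadecimal numbers for %d.
    printf '%d.%d.%d.%d\n' "0x$3" "0x$4" "0x$5" "0x$6"
)

mac_to_ipv4 42:01:0a:80:00:2c   # live ens4 MAC     -> 10.128.0.44
mac_to_ipv4 42:01:0a:f0:02:42   # MAC in netplan    -> 10.240.2.66
```

The netplan MAC decodes to a different address than the new 10.128.0.44, which is consistent with netplan still holding the MAC from before the network change.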

After changing the MAC address in the netplan config to match the interface, I got an IP address and the network came up:

ip a | grep ens4
2: ens4: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1460 qdisc mq state UP group default qlen 1000
    inet 10.128.0.44/32 scope global dynamic ens4
root@instance-1-eng-2819:~# ping google.com
PING google.com (142.251.161.100) 56(84) bytes of data.
64 bytes from ig-in-f100.1e100.net (142.251.161.100): icmp_seq=1 ttl=109 time=1.83 ms
64 bytes from ig-in-f100.1e100.net (142.251.161.100): icmp_seq=2 ttl=109 time=1.24 ms
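The manual fix above (making the netplan macaddress match the interface's real MAC) can be sketched in POSIX sh as follows. This is a sketch under assumptions taken from this thread: the interface is ens4, the file is /etc/netplan/50-cloud-init.yaml, and it contains a single macaddress: line. The helper names netplan_mac and sync_netplan_mac are hypothetical. The script reads the configured value, compares it with the live MAC passed in, and rewrites it (keeping a .bak copy) only when they differ.

```shell
#!/bin/sh
# ASSUMPTIONS (from this thread): one 'macaddress:' line in the netplan file,
# interface ens4, path /etc/netplan/50-cloud-init.yaml. Helper names are hypothetical.

# Print the first 'macaddress:' value found in a netplan YAML file.
netplan_mac() {
    sed -n 's/^[[:space:]]*macaddress:[[:space:]]*\([0-9a-fA-F:]*\).*/\1/p' "$1" | head -n 1
}

# sync_netplan_mac LIVE_MAC NETPLAN_FILE
# Rewrite the netplan macaddress to LIVE_MAC if it differs; backup kept as FILE.bak.
sync_netplan_mac() (
    live=$1
    file=$2
    configured=$(netplan_mac "$file")
    if [ -z "$configured" ]; then
        echo "no macaddress: line found in $file" >&2
        exit 1
    fi
    if [ "$configured" = "$live" ]; then
        echo "netplan MAC already matches the interface ($live)"
        exit 0
    fi
    echo "netplan MAC $configured differs from interface MAC $live, updating $file"
    cp "$file" "$file.bak"
    # GNU sed in-place edit (as shipped on Ubuntu).
    sed -i "s/macaddress:[[:space:]]*$configured/macaddress: $live/" "$file"
)

# Example on the affected VM (as root), then apply the updated config:
#   sync_netplan_mac "$(cat /sys/class/net/ens4/address)" /etc/netplan/50-cloud-init.yaml
#   netplan apply
```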

So it looks like the real issue is cloud-init keeping the old (wrong) MAC address in its netplan config after the VPC network is changed.

jmcausing

I found a fix: running cloud-init clean before changing the VPC network fixed it.
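The workaround above can be sketched as a small POSIX sh script run on the VM itself (for example over SSH) while it still has its old network, before it is stopped and moved to the new VPC. As we understand it, cloud-init clean removes cloud-init's cached instance state, so the next boot is handled as a first boot and the netplan config is rendered again with the new MAC. The function names and the DRY_RUN switch are hypothetical, added only so the commands can be previewed without running them.

```shell
#!/bin/sh
# Sketch of the workaround from this thread, run ON THE VM before it is moved
# to the new VPC network. Function names and DRY_RUN are hypothetical helpers.

# Execute a command, or only print it when DRY_RUN=1.
run_or_print() {
    if [ "${DRY_RUN:-0}" = 1 ]; then
        echo "+ $*"
    else
        "$@"
    fi
}

prepare_for_vpc_change() {
    # Remove cloud-init's cached instance state (and its logs), so the next
    # boot is treated as a first boot and the network config is rendered again.
    run_or_print cloud-init clean --logs
    # Stop the VM; the network change itself is then made from the GCP side.
    run_or_print shutdown -h now
}

# Preview:          DRY_RUN=1 prepare_for_vpc_change
# Real run (root):  prepare_for_vpc_change
```

Because the cleaned state makes cloud-init repeat its per-instance steps on the next boot, other first-boot actions may also run again (for example, SSH host keys may be regenerated, so SSH clients can report a changed host key).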

celestialsahil

I am unable to SSH to the VM instance. Can you please explain where the cloud-init clean command should be run, and what happens when I run it?
