8 Replies Latest reply on Feb 25, 2016 11:31 PM by jcuzella

    Edison lost it's IP, seems to be IPv6 only now

    MarkOg

      I received two Edisons today. Both seemed to be in perfect working order, flashing (Yocto 05-15, the most recent one) and configuration went fine and I could even connect over SSH to blink some LEDs.

       

      However, I lost contact with one of my Edisons. One is called 'eddy1', the other 'eddy2'. After connecting with the lost one over USB (well, tty over 115200), the following showed up.

       

      CGqgvTv.jpg

       

      As far as I know, the Edison isn't even IPv6 capable. Why did eddy1 lose it's IPv4 address and how do I get it back?

      Ofcourse, I could re-flash the whole thing, but I'd prefer to find a better alternative. Flashing is always a last resort.

        • 1. Re: Edison lost it's IP, seems to be IPv6 only now
          PalmerCliff

          Hi MarkOg, I'd recommend reflashing.  It has a great chance of fixing any problems that occurred, and it's certainly going to get you up and running faster.

          HTH

          Cliff

          • 2. Re: Edison lost it's IP, seems to be IPv6 only now
            MarkOg

            Hi Cliff,

             

            There wasn't much data on the modules now, so reflashing is a possible solution (but as said, only as last resort). If this would happen again while there's much more information on the modules, reflashing may very well not be an acceptable solution. The problem has a cause, which either must be fixed or prevented.

            • 3. Re: Edison lost it's IP, seems to be IPv6 only now
              massca

              Hi MarkOg,

              I had the same problem. I'm not sure, but it seems to me a bug in this build related to the length of the Edison name. I solved the problem re-flashing and using a longer name, at least 8 chars, although I didn't do many test on it.

               

              Before flash you can also try a new configure with a longer name.

               

              -max

              • 4. Re: Edison lost it's IP, seems to be IPv6 only now
                MarkOg

                After reflashing, the edison required a password of at least 8 characters. This requirement wasn't there in the past. The names haven't changed, but after this discovery I decided to reflash the one without problems as well. Now they both require (and have) a password of at least 8 characters.

                 

                They haven't lost their IP yet, approaching 24h uptime.

                • 5. Re: Edison lost it's IP, seems to be IPv6 only now
                  MarkOg

                  Every once in a while, it still loses it's IP. It can take an hour, or two, or way over 5 days.

                  I'm not sure what's triggering it, but I think either the wireless chip is unstable or the OS isn't handling it the way it should.

                   

                  Most of the times logging in over USB and issuing a reboot will do the trick. However, the first time and this morning, a simple reboot wasn't enough and a re-flash was required.

                  • 6. Re: Edison lost it's IP, seems to be IPv6 only now
                    jcuzella

                    Not to just add a "me too", but I can verify that you're not the only one experiencing this bug.  It's been happening off & on for a couple days now on the current Edison project I'm working on. Usually a reboot via /sbin/reboot or systemctl halt and then unplug/plug cycle will fix it for a period of time.  Until it inevitably happens again!

                     

                    After looking at what's going on, I suspect that this has something to do with the interaction of these services:


                    • wpa_supplicant_wlan0_event.service : Runs wpa_cli in "action_file" daemon mode.  Executes actions from wpa_supplicant.service by passing events to a script (in our case: /etc/wpa_supplicant/wpa_cli-actions.sh )
                    • wpa_supplicant.service : The main wpa_supplicant daemon that's actually associating with the AP.  Reads config from: /etc/wpa_supplicant/wpa_supplicant.conf  (`configure_edison --wifi` also outputs to this file in order to configure the WiFi connection )

                     

                    In my wpa_supplicant logs (via: systemctl status wpa_supplicant.service), I see that it has been connecting & disconnecting over a short time period (actual BSSID, SSID, and freq are scrubbed for anonymity):


                    Feb 26 04:40:09 my-edison wpa_supplicant[188]: wlan0: WPA: Key negotiation completed with 12:34:56:78:9a:bc [PTK=CCMP GTK=TKIP]

                    Feb 26 04:40:09 my-edison wpa_supplicant[188]: wlan0: CTRL-EVENT-CONNECTED - Connection to 12:34:56:78:9a:bc completed (auth) [id=0 id_str=]

                    Feb 26 04:40:09 my-edison wpa_supplicant[188]: wlan0: CTRL-EVENT-DISCONNECTED bssid=12:34:56:78:9a:bc reason=0

                    Feb 26 04:40:09 my-edison wpa_supplicant[188]: wlan0: Trying to associate with 12:34:56:78:9a:bc (SSID='Example_AP' freq=1234 MHz)

                    Feb 26 04:40:09 my-edison wpa_supplicant[188]: wlan0: Associated with 12:34:56:78:9a:bc

                    Feb 26 04:40:09 my-edison wpa_supplicant[188]: wlan0: WPA: Key negotiation completed with 12:34:56:78:9a:bc [PTK=CCMP GTK=TKIP]

                    Feb 26 04:40:09 my-edison wpa_supplicant[188]: wlan0: CTRL-EVENT-CONNECTED - Connection to 12:34:56:78:9a:bc completed (auth) [id=0 id_str=]

                    Feb 26 04:40:09 my-edison wpa_supplicant[188]: wlan0: CTRL-EVENT-DISCONNECTED bssid=12:34:56:78:9a:bc reason=0

                    Feb 26 04:40:09 my-edison wpa_supplicant[188]: wlan0: Trying to associate with 12:34:56:78:9a:bc (SSID='Example_AP' freq=5123 MHz)

                    Feb 26 04:40:09 my-edison wpa_supplicant[188]: wlan0: Associated with 12:34:56:78:9a:bc

                     

                    When I look at this log data from JournalD, it has very many of this same ( DISCONNECT, CONNECT ) pattern repeated over and over... (..and over 11 times in a single second!):

                     

                    $ journalctl --since='2016-02-26 04:40:08' --until='2016-02-26 04:40:10' --unit=wpa_supplicant.service --no-pager | wc -l

                    58

                    $ journalctl --since='2016-02-26 04:40:08' --until='2016-02-26 04:40:10' --unit=wpa_supplicant.service --no-pager | grep -c CTRL-EVENT-DISCONNECTED

                    11

                    $ journalctl --since='2016-02-26 04:40:08' --until='2016-02-26 04:40:10' --unit=wpa_supplicant.service --no-pager | grep -c CTRL-EVENT-CONNECTED

                    11

                    $ journalctl --since='2016-02-26 04:40:08' --until='2016-02-26 04:40:10' --unit=wpa_supplicant.service --no-pager | head

                    -- Logs begin at Sat 2000-01-01 00:00:11 UTC, end at Fri 2016-02-26 05:02:17 UTC. --

                    Feb 26 04:40:09 my-edison wpa_supplicant[188]: wlan0: Trying to associate with 12:34:56:78:9a:bc (SSID='Example_AP' freq=5123 MHz)

                    Feb 26 04:40:09 my-edison wpa_supplicant[188]: wlan0: Associated with 12:34:56:78:9a:bc

                    Feb 26 04:40:09 my-edison wpa_supplicant[188]: wlan0: WPA: Key negotiation completed with 12:34:56:78:9a:bc [PTK=CCMP GTK=TKIP]

                    Feb 26 04:40:09 my-edison wpa_supplicant[188]: wlan0: CTRL-EVENT-CONNECTED - Connection to 12:34:56:78:9a:bc completed (auth) [id=0 id_str=]

                    Feb 26 04:40:09 my-edison wpa_supplicant[188]: wlan0: CTRL-EVENT-DISCONNECTED bssid=12:34:56:78:9a:bc reason=0

                    Feb 26 04:40:09 my-edison wpa_supplicant[188]: wlan0: Trying to associate with 12:34:56:78:9a:bc (SSID='Example_AP' freq=5123 MHz)

                    Feb 26 04:40:09 my-edison wpa_supplicant[188]: wlan0: Associated with 12:34:56:78:9a:bc

                    Feb 26 04:40:09 my-edison wpa_supplicant[188]: wlan0: WPA: Key negotiation completed with 12:34:56:78:9a:bc [PTK=CCMP GTK=TKIP]

                    Feb 26 04:40:09 my-edison wpa_supplicant[188]: wlan0: CTRL-EVENT-CONNECTED - Connection to 12:34:56:78:9a:bc completed (auth) [id=0 id_str=]

                     

                    In the wpa_supplicant_wlan0_event.service logs around the same time, I see:

                     

                    $ journalctl --since='-20m' --until='2016-02-26 04:40:10' --unit=wpa_supplicant_wlan0_event.service

                    Feb 26 04:39:59 my-edison udhcpc[908]: Sending discover...

                    Feb 26 04:39:59 my-edison wpa_cli[286]: Sending discover...

                    Feb 26 04:40:02 my-edison udhcpc[908]: Sending discover...

                    Feb 26 04:40:02 my-edison wpa_cli[286]: Sending discover...

                    Feb 26 04:40:05 my-edison udhcpc[908]: Sending discover...

                    Feb 26 04:40:05 my-edison wpa_cli[286]: Sending discover...

                    ( ... Log output ends here for some reason ... )


                     

                    The action_file script has this for each event:

                     

                    if [ "$CMD" = "CONNECTED" ]; then

                        kill_daemon udhcpc /var/run/udhcpc-$IFNAME.pid

                        udhcpc -i $IFNAME -p /var/run/udhcpc-$IFNAME.pid -S

                    fi

                     

                     

                    if [ "$CMD" = "DISCONNECTED" ]; then

                        kill_daemon udhcpc /var/run/udhcpc-$IFNAME.pid

                        ifconfig $IFNAME 0.0.0.0

                    fi

                     

                    Watching the process list and journalctl logs later on, I see that it's executing the script (/etc/wpa_supplicant/wpa_cli-actions.sh) for CONNECTED and DISCONNECTED events (again, sensitive info has been anonymized):

                     

                    Feb 26 05:01:12 my-edison wpa_cli[286]: event CONNECTED received from wpa_supplicant

                    Feb 26 05:01:12 my-edison udhcpc[1645]: udhcpc (v1.22.1) started

                    Feb 26 05:01:12 my-edison wpa_cli[286]: udhcpc (v1.22.1) started

                    Feb 26 05:01:12 my-edison udhcpc[1645]: Sending discover...

                    Feb 26 05:01:12 my-edison wpa_cli[286]: Sending discover...

                    Feb 26 05:01:12 my-edison udhcpc[1645]: Sending select for 10.1.2.3...

                    Feb 26 05:01:12 my-edison wpa_cli[286]: Sending select for 10.1.2.3...

                    Feb 26 05:01:12 my-edison udhcpc[1645]: Lease of 10.1.2.3 obtained, lease time 86400

                    Feb 26 05:01:12 my-edison wpa_cli[286]: Lease of 10.1.2.3 obtained, lease time 86400

                    Feb 26 05:01:12 my-edison systemd-timesyncd[174]: Network configuration changed, trying to establish connection.

                    Feb 26 05:01:12 my-edison wpa_cli[286]: /etc/udhcpc.d/50default: Adding DNS 10.1.2.1

                    Feb 26 05:01:12 my-edison systemd-timesyncd[174]: Using NTP server 216.239.38.15:123 (time4.google.com).

                    Feb 26 05:01:13 my-edison systemd-timesyncd[174]: interval/delta/delay/jitter/drift 64s/-0.023s/0.033s/0.221s/+320ppm

                    Feb 26 05:01:13 my-edison wpa_cli[286]: event DISCONNECTED received from wpa_supplicant

                    Feb 26 05:01:13 my-edison wpa_cli[286]: 1659 root      2656 S    udhcpc -i wlan0 -p /var/run/udhcpc-wlan0.pid -S

                    Feb 26 05:01:13 my-edison udhcpc[1659]: Received SIGTERM

                    Feb 26 05:01:13 my-edison wpa_cli[286]: rm: can't remove '/var/run/udhcpc-wlan0.pid': No such file or directory

                    Feb 26 05:01:13 my-edison wpa_cli[286]: event CONNECTED received from wpa_supplicant

                    Feb 26 05:01:13 my-edison udhcpc[1670]: udhcpc (v1.22.1) started

                    Feb 26 05:01:13 my-edison wpa_cli[286]: udhcpc (v1.22.1) started

                    Feb 26 05:01:13 my-edison udhcpc[1670]: Sending discover...

                    Feb 26 05:01:13 my-edison wpa_cli[286]: Sending discover...

                    [...SNIP...]

                    [... Then a couple minutes later on ...]

                    Feb 26 05:01:15 my-edison udhcpc[1767]: Sending select for 192.168.1.132...

                    Feb 26 05:01:15 my-edison wpa_cli[286]: Sending select for 192.168.1.132...

                    Feb 26 05:01:15 my-edison udhcpc[1767]: Lease of 192.168.1.132 obtained, lease time 86400

                    Feb 26 05:01:15 my-edison wpa_cli[286]: Lease of 192.168.1.132 obtained, lease time 86400

                    Feb 26 05:01:15 my-edison wpa_cli[286]: /etc/udhcpc.d/50default: Adding DNS 192.168.1.1

                    Feb 26 05:01:15 my-edison wpa_cli[286]: event DISCONNECTED received from wpa_supplicant

                    Feb 26 05:01:15 my-edison wpa_cli[286]: 1781 root      2656 S    udhcpc -i wlan0 -p /var/run/udhcpc-wlan0.pid -S

                    Feb 26 05:01:15 my-edison udhcpc[1781]: Received SIGTERM

                    Feb 26 05:01:15 my-edison wpa_cli[286]: rm: can't remove '/var/run/udhcpc-wlan0.pid': No such file or directory

                    Feb 26 05:02:17 my-edison systemd-timesyncd[174]: Using NTP server [2001:4860:4802:38::f]:123 (time4.google.com).

                     

                    So, it appears that this is happening usually rather slow... but in my case it is triggering the events very fast.  I suspect that when it's doing this and trying to start & stop udhcpc very many times quickly in a row, it's overloading the edison (however I have no concrete proof yet... just a theory).  Later on it seems to recover shortly and then ends up killing udhcpc with SIGTERM. It then stays in this state.


                    So the problem is twofold:


                    1. WiFi is for some reason disconnecting & re-connecting very fast
                    2. Events to start & kill udhcpc are being triggered very fast

                     

                    This causes the Edison to eventually usually end up in a state where it has re-associated with the WiFi AP, and then either only ends up with IPv6 address and no IPv4, OR no IP addresses at all, as in this example:

                     

                    $ iwconfig wlan0

                    wlan0     IEEE 802.11abgn  ESSID:"Example_AP"

                              Mode:Managed  Frequency:5.123 GHz  Access Point: 12:34:56:78:9a:bc

                              Bit Rate=150 Mb/s   Tx-Power=31 dBm

                              Retry  long limit:7   RTS thr:off   Fragment thr:off

                              Encryption key:off

                              Power Management:on

                              Link Quality=70/70  Signal level=-40 dBm

                              Rx invalid nwid:0  Rx invalid crypt:0  Rx invalid frag:0

                              Tx excessive retries:15875  Invalid misc:331   Missed beacon:0

                    $ ifconfig wlan0

                    wlan0     Link encap:Ethernet  HWaddr ab:cd:ef:00:01:02

                              UP BROADCAST RUNNING MULTICAST  MTU:1500  Metric:1

                              RX packets:1093 errors:0 dropped:0 overruns:0 frame:0

                              TX packets:1680 errors:0 dropped:0 overruns:0 carrier:0

                              collisions:0 txqueuelen:1000

                              RX bytes:251015 (245.1 KiB)  TX bytes:373905 (365.1 KiB)

                     

                    So two things can hopefully be fixed:

                     

                    1. wpa_supplicant needs to stop disconnecting & re-connecting so fast (or at least just stop disconnecting immediately after it reconnects the first time!)
                    2. Event script could be modified to rate-limit how fast it kills & restarts udhcpc

                     

                    So far I've only had the time to diagnose it this far... no real solution yet.   Maybe some Intel Engineers are watching this thread & can chime in?

                    • 7. Re: Edison lost it's IP, seems to be IPv6 only now
                      jcuzella

                      I should also add a P.S. to any Intel folks: any help debugging this would be awesome!

                      • 8. Re: Edison lost it's IP, seems to be IPv6 only now
                        jcuzella

                        I have just been successful in recovering the Edison from this state WITHOUT a reboot!

                        First, Over the USB serial console, I connect to the Edison:

                         

                        $ EDISON_SERIAL_DEVICE=$(find /dev -iname 'cu.usbserial-*' 2>/dev/null | head -n1)

                        $ screen -L $EDISON_SERIAL_DEVICE 115200

                         

                        Then, I stop the CONNECT / DISCONNECT event handling service:

                         

                         

                        $ systemctl stop wpa_supplicant_wlan0_event.service ; journalctl -xn --unit=wpa_supplicant_wlan0_event.service -f

                         

                        Feb 26 05:01:15 my-edison wpa_cli[286]: rm: can't remove '/var/run/udhcpc-wlan0.pid': No such file or directory

                        Feb 26 05:59:47 my-edison systemd[1]: Stopping Daemon to receive the wpa_supplicant event...

                        Feb 26 05:59:49 my-edison wpa_cli[286]: eloop: could not process SIGINT or SIGTERM in two seconds. Looks like there

                        Feb 26 05:59:49 my-edison wpa_cli[286]: is a bug that ends up in a busy loop that prevents clean shutdown.

                        Feb 26 05:59:49 my-edison wpa_cli[286]: Killing program forcefully.

                        Feb 26 05:59:49 my-edison systemd[1]: wpa_supplicant_wlan0_event.service: main process exited, code=exited, status=1/FAILURE

                        Feb 26 05:59:49 my-edison systemd[1]: Stopped Daemon to receive the wpa_supplicant event.

                        Feb 26 05:59:49 my-edison systemd[1]: Unit wpa_supplicant_wlan0_event.service entered failed state.

                         

                         

                        Then, I ensure that there are also no stray udhcpc processes lying around:

                         

                         

                        $ ps  | grep -i udhcpc | grep -v grep

                        [... NO OUTPUT... THERE ARE NONE! ...]

                        # NOTE: If there were... kill them with:

                        kill $(ps | grep -i udhcpc | grep -v grep | awk '{ print $1 }')

                        # Or perhaps:

                        ps | grep -i udhcpc | grep -v grep | awk '{ print $1 }' | xargs kill

                         

                         

                        Then I am able to:

                         

                        1. First restart wpa_supplicant_wlan0_event.service
                        2. Then tell wpa_supplicant to reconnect via: wpa_cli reconnect
                        3. Verify in JournalD logs for what happens directly after I quickly execute these commands in succession

                         

                         

                        $ systemctl start wpa_supplicant_wlan0_event.service ;  wpa_cli  reconnect ; journalctl -xn --unit=wpa_supplicant_wlan0_event.service -f

                         

                         

                        Feb 26 06:00:52 my-edison systemd[1]: Starting Daemon to receive the wpa_supplicant event...

                        Feb 26 06:00:52 my-edison systemd[1]: Started Daemon to receive the wpa_supplicant event.

                         

                        Feb 26 06:01:12 my-edison wpa_cli[1891]: event CONNECTED received from wpa_supplicant

                        Feb 26 06:01:12 my-edison udhcpc[1897]: udhcpc (v1.22.1) started

                        Feb 26 06:01:12 my-edison wpa_cli[1891]: udhcpc (v1.22.1) started

                        Feb 26 06:01:12 my-edison udhcpc[1897]: Sending discover...

                        Feb 26 06:01:12 my-edison wpa_cli[1891]: Sending discover...

                        Feb 26 06:01:12 my-edison udhcpc[1897]: Sending select for 10.1.2.3...

                        Feb 26 06:01:12 my-edison wpa_cli[1891]: Sending select for 10.1.2.3...

                        Feb 26 06:01:12 my-edison udhcpc[1897]: Lease of 10.1.2.3 obtained, lease time 86400

                        Feb 26 06:01:12 my-edison wpa_cli[1891]: Lease of 10.1.2.3 obtained, lease time 86400

                        Feb 26 06:01:12 my-edison wpa_cli[1891]: /etc/udhcpc.d/50default: Adding DNS 192.168.1.1

                         

                         

                        SUCCESS! It is able to:

                         

                         

                        1. Start the wpa_cli event handler script via wpa_supplicant_wlan0_event.service

                        2. Receive the CONNECTED event from wpa_supplicant

                        3. Start udhcpc

                        4. Get an IP lease over DHCP

                         

                         

                        The Edison is now re-connected and also has an IP:

                         

                         

                        $ wpa_cli status

                         

                        Selected interface 'wlan0'

                        bssid=12:34:56:78:9a:bc

                        ssid=Example_AP

                        id=0

                        mode=station

                        pairwise_cipher=CCMP

                        group_cipher=TKIP

                        key_mgmt=WPA2-PSK

                        wpa_state=COMPLETED

                        ip_address=10.1.2.3

                        p2p_device_address=ab:00:00:00:12:34

                        address=ab:cd:ef:00:01:02

                         

                        $ iwconfig wlan0

                        wlan0    IEEE 802.11abgn  ESSID:"Example_AP"

                                  Mode:Managed  Frequency:5.123 GHz  Access Point: 12:34:56:78:9a:bc

                                  Bit Rate=150 Mb/s  Tx-Power=31 dBm

                                  Retry  long limit:7  RTS thr:off  Fragment thr:off

                                  Encryption key:off

                                  Power Management:on

                                  Link Quality=70/70  Signal level=-40 dBm

                                  Rx invalid nwid:0  Rx invalid crypt:0  Rx invalid frag:0

                                  Tx excessive retries:16036  Invalid misc:331  Missed beacon:0

                         

                         

                        $ ifconfig wlan0

                        wlan0    Link encap:Ethernet  HWaddr ab:cd:ef:00:01:02

                                  inet addr:10.1.2.3  Bcast:10.1.2.255  Mask:255.255.255.0

                                  UP BROADCAST RUNNING MULTICAST  MTU:1500  Metric:1

                                  RX packets:2237 errors:0 dropped:0 overruns:0 frame:0

                                  TX packets:2438 errors:0 dropped:0 overruns:0 carrier:0

                                  collisions:0 txqueuelen:1000

                                  RX bytes:389580 (380.4 KiB)  TX bytes:582464 (568.8 KiB)

                         

                         

                        And finally, I am able to ping it from another local terminal on my laptop to verify it's on the network:

                         

                         

                        $ ping 10.1.2.3

                        PING 10.1.2.3 (10.1.2.3): 56 data bytes

                        64 bytes from 10.1.2.3: icmp_seq=0 ttl=64 time=31.626 ms

                        64 bytes from 10.1.2.3: icmp_seq=1 ttl=64 time=338.899 ms

                        ^C

                        --- 10.1.2.3 ping statistics ---

                        2 packets transmitted, 2 packets received, 0.0% packet loss

                        round-trip min/avg/max/stddev = 31.626/185.262/338.899/153.637 ms


                        Note that I consider this a WORKAROUND only!   The complete solution would be to prevent the disconnect & perhaps rate-limit the event handler script so it doesn't try to start and stop udhcpc so fast.