6 Replies Latest reply on Jul 18, 2016 4:08 PM by Intel Corporation

    SPI bug (program freezes and becomes unkillable) during Edison startup

    celskeggs

      TL;DR: I ran into an SPI issue on the Edison: my program hangs and becomes unkillable if it starts early enough in the startup process. The workaround is to force it to start later.

       

      I'm using libmraa through its Python wrapper. The relevant snippets of the larger project look approximately like this:

      import os
      
      import mraa
      
      # to init SPI
      # fix power-on SPI glitches
      os.system("echo on >/sys/devices/pci0000\:00/0000\:00\:07.1/power/control")
      spi = mraa.Spi(0)
      spi.mode(mraa.SPI_MODE3)
      spi.frequency(8000000)
      
      # to write SPI, which occurs about once every 10 milliseconds
      byte_array = bytearray(396)
      # ... populate byte array ...
      spi.write(byte_array)
      

      I manage this code with (more or less) the following systemd unit:

      [Unit]
      Description=CFRS Main Application
      Requires=bluetooth.target bluetooth.service pulseaudio.service
      After=bluetooth.target bluetooth.service
      
      [Service]
      ExecStart=/usr/bin/python2 /home/root/cfrs/main.py
      Restart=always
      
      [Install]
      WantedBy=multi-user.target
      

      The program immediately becomes unresponsive to other devices attempting to contact it over its serial interface, which runs in a different thread from the SPI handler. When I run "systemctl stop" on this unit, it waits indefinitely, and I have to interrupt it (^C). At this point, running "systemctl status" on this unit displays a result similar to the following:

       

      root@cfrs-edison-alpha:~# systemctl status cfrs
      ==> cfrs.service - CFRS Main Application
         Loaded: loaded (/usr/lib/systemd/system/cfrs.service; enabled)
         Active: deactivating (stop-sigterm) since Fri 2016-07-01 21:37:42 UTC; 1min 3s ago
       Main PID: 204 (python2)
         CGroup: /system.slice/cfrs.service
                 ==> 204 [python2]
      

       

      This means that systemd is attempting to kill the process, but the process is not exiting. I can then try "killall -9 python2", but the program stays there:

       

      root@cfrs-edison-alpha:~# ps | grep python
         204 root         0 Z    [python2]
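The `Z` in the `ps` output above describes only the main thread. A process whose main thread has exited stays listed as a zombie while any other thread is still alive, so it can help to look at every thread. The following is a minimal sketch, assuming a Linux `/proc` filesystem; the helper names are illustrative and not part of the original project. A thread shown as `D` (uninterruptible sleep) would be a candidate for whatever is stuck in the kernel.

```python
import os

# One-letter task states as reported in /proc/<pid>/stat (see proc(5)).
STATE_NAMES = {
    "R": "running",
    "S": "sleeping (interruptible)",
    "D": "uninterruptible sleep (usually waiting on I/O)",
    "Z": "zombie",
    "T": "stopped",
}


def parse_stat_state(stat_line):
    """Return the one-letter state from the contents of a /proc stat file."""
    # The command name is wrapped in parentheses and may itself contain
    # spaces or parentheses, so split on the last ')' instead of on spaces.
    after_comm = stat_line.rsplit(")", 1)[1]
    return after_comm.split()[0]


def thread_states(pid):
    """Map each thread ID of `pid` to its state letter."""
    states = {}
    task_dir = "/proc/%d/task" % pid
    for tid in os.listdir(task_dir):
        with open("%s/%s/stat" % (task_dir, tid)) as f:
            states[int(tid)] = parse_stat_state(f.read())
    return states
```

On the board, `thread_states(204)` would list the threads of the stuck process from the example above.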

       

      After a while, the kernel prints out the following on the serial console:

       

      [  240.630970] INFO: task kworker/u4:2:74 blocked for more than 120 seconds.
      [  240.631063] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
      [  240.632216] INFO: task python2:290 blocked for more than 120 seconds.
      [  240.632277] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
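For longer logs, the blocked tasks can be pulled out of the kernel output mechanically. This is a small sketch that assumes the message format shown in the lines above; the function name is illustrative.

```python
import re

# Matches kernel hung-task reports such as:
# [  240.632216] INFO: task python2:290 blocked for more than 120 seconds.
HUNG_TASK_RE = re.compile(
    r"\[\s*\d+\.\d+\] INFO: task (?P<name>.+):(?P<pid>\d+) "
    r"blocked for more than (?P<secs>\d+) seconds\."
)


def find_hung_tasks(log_text):
    """Return a list of (task name, task ID, timeout seconds) tuples."""
    hung = []
    for line in log_text.splitlines():
        match = HUNG_TASK_RE.search(line)
        if match:
            hung.append((match.group("name"),
                         int(match.group("pid")),
                         int(match.group("secs"))))
    return hung
```

The task name may itself contain colons (as in `kworker/u4:2`), which is why the pattern takes the last colon-separated number before "blocked" as the task ID.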

       

      Here is the complete log showing this issue: http://pastebin.com/Wv3cd28Q - this includes startup messages, the dmesg output from the errors that occur, and some more attempts at poking the process and understanding why it isn't working. The program never appears to recover, so a hard reboot is necessary. A soft reboot would also work, except that it first waits a couple of minutes while trying to kill the program.

       

      I resolved this with the workaround of delaying the program's start until later. I changed the After line of the systemd unit to the following:

       

      After=bluetooth.target bluetooth.service multi-user.target
      

       

      This works around the issue, because the program will wait until the rest of the system has started up, at which point the issue does not appear to occur.
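An untested variation on this workaround would be to let the service start early and defer only the SPI initialization. The sketch below waits until the system uptime passes a threshold before returning. The 60-second value and all helper names are assumptions for illustration, and nothing in this thread confirms that a late SPI init inside an already-running process avoids the hang.

```python
import time

# Hypothetical threshold; the thread does not say how late is late enough.
MIN_UPTIME_S = 60.0


def read_uptime_seconds(path="/proc/uptime"):
    """Return seconds since boot, taken from the first field of /proc/uptime."""
    with open(path) as f:
        return float(f.read().split()[0])


def seconds_until(min_uptime, uptime):
    """Return how long to wait before uptime reaches min_uptime (never negative)."""
    return max(0.0, min_uptime - uptime)


def wait_for_uptime(min_uptime, sleep=time.sleep, read_uptime=read_uptime_seconds):
    """Block until the system has been up for at least min_uptime seconds."""
    remaining = seconds_until(min_uptime, read_uptime())
    if remaining > 0:
        sleep(remaining)
    return remaining
```

Calling `wait_for_uptime(MIN_UPTIME_S)` just before the SPI setup would leave the rest of the program, such as the serial thread, free to start immediately.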

       

      I'm using the release iot-devkit-prof-dev-image-edison-20160606.zip, which contains kernel 3.10.98-poky-edison+. That seems recent enough that it should include any recent SPI fixes.

       

      Is there any way to actually resolve this problem? I would rather have the program start up as soon as possible, rather than waiting a bunch of extra time.

        • 1. Re: SPI bug (program freezes and becomes unkillable) during Edison startup
          Intel Corporation
          This message was posted on behalf of Intel Corporation

          Hi celskeggs,

          Could you share your original code so that we can check this behavior? We would like to run some tests to see exactly what the problem is. Any other details that you can provide would be very helpful.

          Regards,
          -Pablo

          • 2. Re: SPI bug (program freezes and becomes unkillable) during Edison startup
            celskeggs

            Hi!

             

            Unfortunately, my original code is large and not something I'm allowed to share. I've put together a short example that demonstrates the issue:

             

            import os
            import threading
            import time
            
            import mraa
            
            LIGHT_STRIP_LENGTH = 96
            FRAME_PERIOD = (10.0 / 1000)  # 100 Hz
            
            
            class LightExample:
                def __init__(self):
                    # fix power-on SPI glitches
                    os.system("echo on >/sys/devices/pci0000\:00/0000\:00\:07.1/power/control")
                    self.spi = mraa.Spi(0)
                    self.spi.mode(mraa.SPI_MODE3)
                    self.spi.frequency(8000000)
            
                    threading.Thread(target=self._loop).start()
            
                def _loop(self):
                    while True:
                        self.write_update([(0.9, 255, 0, 0)] * LIGHT_STRIP_LENGTH)
                        time.sleep(FRAME_PERIOD)
            
                def write_words(self, words):
                    ba = bytearray(4 * len(words))
                    for i, word in enumerate(words):
                        ba[i * 4:i * 4 + 4] = ((word >> 24) & 0xFF, (word >> 16) & 0xFF, (word >> 8) & 0xFF, (word >> 0) & 0xFF)
                    self.spi.write(ba)
            
                def write_update(self, colors):
                    words = [0x00000000]
                    for bright, r, g, b in colors:
                        assert 0.0 <= bright <= 1.0 and 0 <= r < 256 and 0 <= g < 256 and 0 <= b < 256
                        words += [(0b111 << 29 | int(bright * 31) << 24 | b << 16 | g << 8 | r << 0)]
                    words += [0xFFFFFFFF] * int((len(colors) + 63) / 64)
                    self.write_words(words)
            
            
            if __name__ == "__main__":
                e = LightExample()
                time.sleep(60)
            
            

             

            Put that in /home/root/example.py.

             

            [Unit]
            Description=Example Application
            Requires=bluetooth.target bluetooth.service pulseaudio.service
            After=bluetooth.target bluetooth.service
            
            [Service]
            ExecStart=/usr/bin/python2 /home/root/example.py
            Restart=always
            
            [Install]
            WantedBy=multi-user.target
            
            
            

             

            Put that in /usr/lib/systemd/system/example.service. (You may need to create a folder first.) Enable the service with "systemctl enable example".

             

            Reboot the system, and try stopping the service with "systemctl stop example".
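The size of each SPI write produced by the example can be checked without hardware. The sketch below repeats the packing arithmetic from `write_update` in standalone helpers (the helper names are illustrative). For a 96-LED strip it gives 99 words, or 396 bytes, which matches the `bytearray(396)` in the first post.

```python
def pack_led(bright, r, g, b):
    """Pack one LED into a 32-bit word the same way write_update does."""
    return 0b111 << 29 | int(bright * 31) << 24 | b << 16 | g << 8 | r


def frame_word_count(n_leds):
    """One start word, one word per LED, one end word per 64 LEDs (rounded up)."""
    return 1 + n_leds + (n_leds + 63) // 64


def frame_byte_count(n_leds):
    """Each word is written as four bytes, most significant byte first."""
    return 4 * frame_word_count(n_leds)


print(hex(pack_led(0.9, 255, 0, 0)))  # word sent for each LED in the example
print(frame_byte_count(96))           # bytes per spi.write() call
```

At one write every 10 ms, that is roughly 39.6 kB/s of payload on the bus.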

            • 3. Re: SPI bug (program freezes and becomes unkillable) during Edison startup
              Intel Corporation
              This message was posted on behalf of Intel Corporation

              Hi celskeggs,

              We are still investigating your case and will let you know once we have an update. Thank you for your patience.

              Regards,
              -Pablo

              • 4. Re: SPI bug (program freezes and becomes unkillable) during Edison startup
                Intel Corporation
                This message was posted on behalf of Intel Corporation

                Hi celskeggs,

                We haven’t been able to reproduce the issue. We followed the steps exactly as you described, but the service is not able to start and enters a fail state. Are there any other details that you can provide, such as your image version, Python version, board used, any external hardware connected, or any specific external power supply? Thanks in advance.

                Regards,
                -Pablo

                • 5. Re: SPI bug (program freezes and becomes unkillable) during Edison startup
                  celskeggs

                  The system service should not enter a fail state, even after being stopped - that's probably a mismatch in your replication of my environment, not the bug being resolved.

                   

                  As I stated earlier, I'm using the release iot-devkit-prof-dev-image-edison-20160606.zip, which contains kernel 3.10.98-poky-edison+. External hardware included, at various times: unpowered USB hub, USB soundcard, FTDI serial cable, device connected to UART, SPI (APA102) lightstrip connected. I used a 12VDC 1A power supply (model SM-333B). I used python 2.7.3. I removed the following packages from my device: clloader xdk-daemon ofono wyliodrin-server redis mosquitto-dev iotkit-comm-c-dev iotkit-comm-c iotkit-comm-js mosquitto tinyb-dev tinyb connman bluez5-dev bluez5 bluez5-obex (with the opkg remove command) and compiled and installed bluez-5.40 on the device itself. I also installed pyserial 3.1.1. I used the Edison Arduino Breakout board.

                   

                  My development environment has changed significantly in the past few weeks. I tried to reproduce this issue again on a fresh device, but wasn't easily able to do so - the service stopped successfully, but did not enter a fail state. I've moved on to other work and can't spend the time necessary to make the issue occur again.

                   

                  The issue isn't resolved for us, but the workaround works well enough and we don't have the time to assist with finding the real cause.

                   

                  Thank you for your help!

                  • 6. Re: SPI bug (program freezes and becomes unkillable) during Edison startup
                    Intel Corporation
                    This message was posted on behalf of Intel Corporation

                    Hi celskeggs,

                    It’s good to know that you were able to continue with a workaround. Please let us know if you go back to investigate this issue at some point, and we’ll be more than glad to help you. Hopefully we’ll have better luck next time.

                    Regards,
                    -Pablo
