TL;DR: I ran into an SPI issue on the Edison: my program hangs and becomes unkillable if it starts early enough in the boot process. The workaround is to force it to start later.
I'm using libmraa through its Python wrapper. My code looks approximately like this (these are the relevant snippets from a larger project):
import os
import mraa

# to init SPI
# fix power-on SPI glitches
os.system("echo on >/sys/devices/pci0000\:00/0000\:00\:07.1/power/control")
spi = mraa.Spi(0)
spi.mode(mraa.SPI_MODE3)
spi.frequency(8000000)

# to write SPI, which occurs about once every 10 milliseconds
byte_array = bytearray(396)
# ... populate byte array ...
spi.write(byte_array)
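As an aside, the `os.system` call above discards the shell's exit status, so a failed write to the power control file would go unnoticed. A minimal sketch of writing the same sysfs attribute directly from Python follows; the helper name `set_power_control` is hypothetical, the path is the one from my snippet, and I have not checked whether this changes the hang in any way.

```python
# Path copied from the snippet above (without the shell escapes).
POWER_CONTROL = "/sys/devices/pci0000:00/0000:00:07.1/power/control"


def set_power_control(value="on", path=POWER_CONTROL):
    """Write `value` to a sysfs power/control file.

    Hypothetical helper: returns True on success, False (after printing
    the error) if the file could not be opened or written.
    """
    try:
        with open(path, "w") as f:
            f.write(value)
        return True
    except (IOError, OSError) as exc:
        print("could not write %r to %s: %s" % (value, path, exc))
        return False
```

Calling `set_power_control()` before `mraa.Spi(0)` would then at least report a failure instead of silently continuing.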
I manage this code with (more or less) the following systemd unit:
[Unit]
Description=CFRS Main Application
Requires=bluetooth.target bluetooth.service pulseaudio.service
After=bluetooth.target bluetooth.service

[Service]
ExecStart=/usr/bin/python2 /home/root/cfrs/main.py
Restart=always

[Install]
WantedBy=multi-user.target
When the unit starts during boot, the program immediately becomes unresponsive to other devices attempting to contact it over its serial interface, which runs in a different thread from the SPI handler. When I run "systemctl stop" on this unit, the command waits indefinitely, and I have to interrupt it with ^C. At this point, running "systemctl status" on this unit displays a result similar to the following:
root@cfrs-edison-alpha:~# systemctl status cfrs
==> cfrs.service - CFRS Main Application
   Loaded: loaded (/usr/lib/systemd/system/cfrs.service; enabled)
   Active: deactivating (stop-sigterm) since Fri 2016-07-01 21:37:42 UTC; 1min 3s ago
 Main PID: 204 (python2)
   CGroup: /system.slice/cfrs.service
           ==> 204 [python2]
This means that systemd is trying to stop the process, but the signal is not taking effect. I can then try "killall -9 python2", but the process stays there:
root@cfrs-edison-alpha:~# ps | grep python
  204 root         0 Z    [python2]
After a while, the kernel prints out the following on the serial console:
[ 240.630970] INFO: task kworker/u4:2:74 blocked for more than 120 seconds.
[ 240.631063] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
[ 240.632216] INFO: task python2:290 blocked for more than 120 seconds.
[ 240.632277] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
Here is the complete log showing this issue: http://pastebin.com/Wv3cd28Q - this includes startup messages, the dmesg output from the errors that occur, and some more attempts at poking the process and understanding why it isn't working. The program does not appear to ever recover, and a hard reboot is necessary - a soft reboot would also work, except that it waits a couple of minutes to try to kill the program first.
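For anyone trying to reproduce this: the kernel's hung-task warning is emitted for tasks stuck in uninterruptible sleep (state D), while ps shows the main process as Z. One way to see which threads are actually stuck is to read the per-thread state from procfs. The following is only a sketch; the helper names are hypothetical, and whether the `python2:290` task in the kernel log is a thread of PID 204 is an assumption I have not confirmed.

```python
import os


def parse_state(status_text):
    """Return the one-letter state code from /proc/<pid>/status text, or None."""
    for line in status_text.splitlines():
        if line.startswith("State:"):
            fields = line.split()
            return fields[1] if len(fields) > 1 else None
    return None


def thread_states(pid):
    """Map each thread ID of `pid` to its state letter (Linux procfs only)."""
    states = {}
    task_dir = "/proc/%d/task" % pid
    for tid in os.listdir(task_dir):
        try:
            with open(os.path.join(task_dir, tid, "status")) as f:
                states[int(tid)] = parse_state(f.read())
        except (IOError, OSError):
            pass  # thread exited between listdir() and open()
    return states
```

Running `thread_states(204)` on the affected system should list each thread with a letter such as `S`, `D`, or `Z`; any thread reporting `D` would be a candidate for the one blocked inside the SPI driver.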
I resolved this with the workaround of delaying the program's start until later. I changed the After line of the systemd unit to the following:
After=bluetooth.target bluetooth.service multi-user.target
This works around the issue, because the program will wait until the rest of the system has started up, at which point the issue does not appear to occur.
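An alternative I have not tested would be to let the unit start early and delay only the SPI initialization inside the program, so that the serial thread can come up right away. The sketch below polls `/proc/uptime` until the system has been up for a chosen number of seconds. The function names and the threshold are hypothetical, and I do not know whether a purely time-based delay avoids the hang; the only workaround I have confirmed is the `After=` change above.

```python
import time


def read_uptime(path="/proc/uptime"):
    """Return system uptime in seconds (first field of /proc/uptime)."""
    with open(path) as f:
        return float(f.read().split()[0])


def wait_for_uptime(min_uptime, poll=0.5, read=read_uptime, sleep=time.sleep):
    """Block until uptime >= min_uptime; return the seconds spent sleeping.

    `read` and `sleep` are injectable so the loop can be exercised
    without touching /proc or actually sleeping.
    """
    waited = 0.0
    while read() < min_uptime:
        sleep(poll)
        waited += poll
    return waited
```

In the SPI thread this would be called as, for example, `wait_for_uptime(30)` just before `mraa.Spi(0)`; the value 30 is a placeholder that would need to be found by experiment.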
I'm using the release iot-devkit-prof-dev-image-edison-20160606.zip, which contains kernel 3.10.98-poky-edison+. That seems recent enough that it should include any recent SPI fixes.
Is there any way to actually resolve this problem? I would rather have the program start up as soon as possible, rather than waiting a bunch of extra time.