Aside from BSP 1.0.1, which was developed with the Galileo board in mind, the later BSPs are designed to be used in a Quark environment, but not specifically for the Galileo platform. So the BSP that you’re using now may present this kind of issue when used in this environment. We’ll investigate your issue and let you know when we have more updates.
Could you please tell us what processes you are running? We would like to know under what conditions you get these "overrun" messages.
The faults are happening on our own board, not on a Galileo board. Our board's design leverages the Galileo design.
The faults happen when the processor is busy. During normal operation our board has a number of things running almost all the time.
We have a sensor that streams data in bursts over the ttyS0 port. The bursts can be many per second.
The software on the Quark takes the sensor data and does limited processing before sending it over the Ethernet connection.
The board also has a USB webcam which enumerates as /dev/video0. A low frame rate video stream is sent over the Ethernet connection.
I think I understand your problem now. Have you already checked the thread "Problem:configuration of 8250/16550 uart driver and its effect on ethernet"? Another user had a similar request: he needed to configure the UART driver, and xbolshe provided a possible solution. I suggest you check the thread; you might find some useful information.
I am also seeing this same issue, with exactly the same symptoms as dfwJones. I reviewed the link you suggested, but I don't see the relevance to the ttyS0 overrun issue. In our application, we need both the serial and Ethernet interfaces active. The behavior I'm seeing is exactly what you might see if the serial buffers were too shallow or if interrupts were being disabled for too long by some other process.
The Intel Quark processor has only one core and one thread (Intel® Quark™ SoC X1000 (16K Cache, 400 MHz) Specifications).
If a heavy task prevents the system from switching to the Linux serial driver in time, tty overruns are expected.
The Intel Quark UART has a hardware FIFO buffer that is only 16 bytes long, and there is no way to increase it.
For now, the internal buffer length is 4095 bytes. It is possible to increase it.
But I guess a larger internal buffer will not remove the requirement to read the data out of the FIFO buffer in time.
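To put a number on "in time", here is a rough sketch of how long a continuous stream needs to fill a 16-byte FIFO. The baud rates are illustrative only (the actual port speed has not been stated in this thread), and the 10 bits per byte figure assumes 8N1 framing. The real deadline also depends on the FIFO trigger level the driver configures, which is not covered here.

```python
def fifo_fill_time_us(baud_rate, fifo_bytes=16, bits_per_byte=10):
    """Microseconds for a back-to-back byte stream to fill the RX FIFO.

    bits_per_byte=10 assumes 8N1 framing (1 start + 8 data + 1 stop bit).
    """
    bytes_per_second = baud_rate / bits_per_byte
    return fifo_bytes / bytes_per_second * 1_000_000


# Illustrative baud rates, not values taken from this thread.
for baud in (9600, 115200, 921600):
    print(f"{baud:>7} baud: FIFO fills in about {fifo_fill_time_us(baud):.0f} us")
```

At 115200 baud, for example, this gives roughly 1.4 ms: if the serial interrupt is not serviced within about that window during a burst, bytes are lost regardless of how large the internal buffer is.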
By the way, could you provide several snapshots of the output of the command below, taken over time while a heavy task is running, for both the 3.8.7 and 3.14.28 kernels?
Command: cat /proc/interrupts
Could you also provide more information about the serial port speed and the actual data rate?
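If it helps with collecting those snapshots, below is a minimal sketch that compares two saved copies of /proc/interrupts and prints only the counters that changed. The function names and the idea of saving the output to two files first are my own assumptions, not part of any BSP tooling.

```python
import sys


def parse_interrupts(text):
    """Map each IRQ label (e.g. '17', 'LOC') to its count summed over CPUs."""
    counts = {}
    for line in text.splitlines():
        label, sep, rest = line.strip().partition(":")
        if not sep:
            continue  # header line such as "CPU0"
        total, seen = 0, False
        for token in rest.split():
            if not token.isdigit():
                break  # first non-numeric token starts the description
            total += int(token)
            seen = True
        if seen:
            counts[label.strip()] = total
    return counts


def interrupt_deltas(before_text, after_text):
    """Return {label: increase} for every counter that changed."""
    before = parse_interrupts(before_text)
    after = parse_interrupts(after_text)
    return {
        irq: after[irq] - before.get(irq, 0)
        for irq in after
        if after[irq] != before.get(irq, 0)
    }


# Hypothetical usage: python irqdiff.py before.txt after.txt
if __name__ == "__main__" and len(sys.argv) == 3:
    with open(sys.argv[1]) as f1, open(sys.argv[2]) as f2:
        deltas = interrupt_deltas(f1.read(), f2.read())
    for irq, delta in sorted(deltas.items()):
        print(f"{irq}: {delta:+d}")
```

The two input files would be produced with something like `cat /proc/interrupts > before.txt` before the heavy task and again into `after.txt` while it is running.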
I've spent a lot of time digging deeper into the problem. I have three separate versions of the kernel: 3.8.7, 3.14.28, and 3.19.8. The problem happens on both 3.14.28 and 3.19.8, but does not happen on 3.8.7.
When we see the problem, we get system messages like this:
[ 334.896442] ttyS0: 10 input overrun(s)
[ 336.293599] ttyS0: 13 input overrun(s)
[ 337.328057] ttyS0: 16 input overrun(s)
[ 338.591951] ttyS0: 11 input overrun(s)
[ 340.215313] ttyS0: 9 input overrun(s)
[ 341.360737] ttyS0: 14 input overrun(s)
[ 342.553417] ttyS0: 15 input overrun(s)
[ 343.600646] ttyS0: 6 input overrun(s)
As you can see, we are seeing very large numbers of overruns every second.
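For anyone who wants to quantify this from their own dmesg output, here is a small sketch that totals the reported overruns and the time span they cover. The sample text is the eight messages quoted above; the regular expression and function name are my own and assume the message format stays exactly as shown.

```python
import re

LOG = """\
[ 334.896442] ttyS0: 10 input overrun(s)
[ 336.293599] ttyS0: 13 input overrun(s)
[ 337.328057] ttyS0: 16 input overrun(s)
[ 338.591951] ttyS0: 11 input overrun(s)
[ 340.215313] ttyS0: 9 input overrun(s)
[ 341.360737] ttyS0: 14 input overrun(s)
[ 342.553417] ttyS0: 15 input overrun(s)
[ 343.600646] ttyS0: 6 input overrun(s)
"""

# Timestamp, port name, and overrun count from each kernel message.
PATTERN = re.compile(r"\[\s*(\d+\.\d+)\]\s+(\w+): (\d+) input overrun\(s\)")


def summarize_overruns(log_text):
    """Return (total overruns, seconds between first and last message)."""
    events = [(float(t), int(n)) for t, _port, n in PATTERN.findall(log_text)]
    total = sum(n for _, n in events)
    span = events[-1][0] - events[0][0] if len(events) > 1 else 0.0
    return total, span


total, span = summarize_overruns(LOG)
print(f"{total} overruns reported over {span:.1f} s")
```

For the messages above this adds up to 94 overruns in about 8.7 seconds, or roughly 11 per second.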
I dumped the contents of /proc/interrupts before and after running the tests. We are seeing very large increases in the counts in all cases. Assuming I'm reading the output correctly, it looks like the serial port is set to the same interrupt in all 3 kernels, but in the case of 3.8.7, it doesn't share the interrupt with anything else. The other two kernels appear to have multiple peripherals sharing the same interrupt.
17: 2255 IO-APIC-fasteoi serial
17: 162316 IO-APIC-fasteoi dw_dmac, dw_dmac, pxa2xx-spi.1, serial
17: 795 IO-APIC 17-fasteoi INTEL_MID_DMAC2, intel_quark_uart, INTEL_MID_DMAC2, intel_quark_uart, pxa2xx-spi.1
Assuming that the sharing is taking place, how do we move those other peripherals to other interrupts?
If you need the full output of /proc/interrupts, let me know and I can post it.
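To check for sharing without reading the dump by eye, a rough sketch like the following lists what else sits on the UART's interrupt line. It is only a heuristic for the single-CPU layout shown in this thread: it treats the last word before the first comma as the first device name, so it can misreport a line that has a multi-word chip name and no devices. The UART driver names are the two that appear in the lines I posted; the function names are made up for illustration.

```python
# Driver names the UART appears under in the lines quoted in this thread.
UART_NAMES = {"serial", "intel_quark_uart"}


def devices_on_irq(line):
    """Return (irq, [device names]) for one single-CPU /proc/interrupts line."""
    irq, _, rest = line.partition(":")
    parts = rest.split(None, 1)  # [count, "chip-name dev1, dev2, ..."]
    remainder = parts[1] if len(parts) > 1 else ""
    chunks = [c.strip() for c in remainder.split(",")]
    first = chunks[0].split()
    if len(first) < 2:
        return irq.strip(), []  # only a chip name, no devices attached
    return irq.strip(), [first[-1]] + chunks[1:]


def uart_neighbours(line):
    """Other devices on the same line as the UART, or [] if there are none."""
    _irq, devices = devices_on_irq(line)
    if not UART_NAMES.intersection(devices):
        return []
    return sorted(set(devices) - UART_NAMES)


SAMPLE_LINES = [
    "17: 2255 IO-APIC-fasteoi serial",
    "17: 162316 IO-APIC-fasteoi dw_dmac, dw_dmac, pxa2xx-spi.1, serial",
    "17: 795 IO-APIC 17-fasteoi INTEL_MID_DMAC2, intel_quark_uart, "
    "INTEL_MID_DMAC2, intel_quark_uart, pxa2xx-spi.1",
]

for sample in SAMPLE_LINES:
    print(devices_on_irq(sample)[0], uart_neighbours(sample) or "UART only")
```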
May I ask you to test how it works with this image?
It has UARTs on different interrupts:
24: 72 PCI-MSI-edge INTEL_MID_DMAC2, intel_quark_uart
25: 2319 PCI-MSI-edge INTEL_MID_DMAC2, intel_quark_uart
And please post all output of /proc/interrupts after a heavy load.
Thank you for producing a new kernel. The kernel as you packaged it boots, but it lacks our product's environment. I tried merging the kernel with all of our environment, but it didn't go well. It looks like there are a number of devices (/sys/proc/gpio, eth0, etc.) that aren't loading which prevent our stuff from running.
Is it possible for you to tell us how you managed to move the other devices away from the interrupt that the serial port is using? That way I can make the change and rebuild the kernel here. At the moment, I think we'd prefer to try and continue using the 3.19 we are building from here:
The repository you mentioned above now has an update.
It is related to separating the UART interrupts.
I guess you may try to use it.
Now it looks like this:
root@quark:~# cat /proc/interrupts
CPU0
0: 29 IO-APIC-edge timer
7: 2 IO-APIC-edge
8: 1 IO-APIC-edge rtc0
9: 2 IO-APIC-fasteoi acpi, gpio_sch
16: 91 IO-APIC 16-fasteoi pxa2xx-spi.0, ohci_hcd:usb2
17: 0 IO-APIC 17-fasteoi pxa2xx-spi.1
19: 4 IO-APIC 19-fasteoi ehci_hcd:usb1
24: 0 PCI-MSI-edge INTEL_MID_DMAC2, intel_quark_uart
25: 9098 PCI-MSI-edge INTEL_MID_DMAC2, intel_quark_uart
26: 3948 PCI-MSI-edge mmc0
35: 287 PCI-MSI-edge intel_qrk_gip
36: 1 PCI-MSI-edge pch_udc
37: 4157 PCI-MSI-edge enp0s20f6
40: 2 gsi-sch_gpio_irq 0-0020
46: 29 PCI-MSI-edge iwlwifi
100: 2 cy8c9540a-irq gpiolib
NMI: 0 Non-maskable interrupts
LOC: 16500 Local timer interrupts
SPU: 0 Spurious interrupts
PMI: 0 Performance monitoring interrupts
IWI: 1 IRQ work interrupts
RTR: 0 APIC ICR read retries
TRM: 0 Thermal event interrupts
THR: 0 Threshold APIC interrupts
MCE: 0 Machine check exceptions
MCP: 0 Machine check polls
ERR: 2
MIS: 0
Sorry for the delays, we've encountered a few other issues with 3.19. I may start separate threads for them.
We can't yet fully test the new build. For some reason we aren't getting the /dev/video0 device to show up like it used to with the 3.14 in the official BSP. I've tried everything I can think of to enable it with menuconfig.
Without the streaming video, we were still seeing the overrun errors under the original 3.19. With the new version using the interrupt fixes, we haven't yet seen any overruns. This is a very good sign so far. We will keep testing as soon as we can figure out the video0 problem.
I can't yet call it fixed, but it is looking good.
What kind of interrupt fixes are you talking about?
To understand the difference, just compare the interrupt list for kernel 3.19.8 shown above with the one from the original Intel BSP 1.2.0 below:
root@quark:~# cat /proc/interrupts
CPU0
0: 46 IO-APIC-edge timer
7: 1 IO-APIC-edge
8: 1 IO-APIC-edge rtc0
9: 1 IO-APIC-fasteoi acpi, gpio_sch
16: 3554 IO-APIC-fasteoi mmc0, pxa2xx-spi.0, ohci_hcd:usb2
17: 887 IO-APIC-fasteoi dw_dmac, dw_dmac, pxa2xx-spi.1, serial
19: 79 IO-APIC-fasteoi ehci_hcd:usb1
32: 1 --sch_gpio_irq_chip 0-0020
40: 7373 PCI-MSI-edge intel_qrk_gip
41: 1 PCI-MSI-edge pch_udc
42: 0 PCI-MSI-edge enp0s20f6
100: 1 cy8c9540a-irq gpiolib
NMI: 0 Non-maskable interrupts
LOC: 3370 Local timer interrupts
SPU: 0 Spurious interrupts
PMI: 0 Performance monitoring interrupts
IWI: 0 IRQ work interrupts
RTR: 0 APIC ICR read retries
TRM: 0 Thermal event interrupts
THR: 0 Threshold APIC interrupts
MCE: 0 Machine check exceptions
MCP: 0 Machine check polls
ERR: 1
MIS: 0
As you can see, several devices are located on the same shared interrupt:
17: 887 IO-APIC-fasteoi dw_dmac, dw_dmac, pxa2xx-spi.1, serial
The interrupt fixes make it possible to separate them.