17 Replies Latest reply on Apr 11, 2016 2:14 PM by M3LiNdRu

    Program ceases to run after many hours (Galileo Gen2)

    CHerbst

      I've been working on a project for my company and have hit a roadblock. For context, I am working on the Galileo Gen 2 with an Intel Centrino Advanced-N 6025, and I am using the Intel XDK IoT Edition to write programs in Node.js, along with some Python scripts. The program is supposed to read data from a sensor, write it to a file, and then send that file to a server. It also retrieves a file from a server. The sensor read and file upload happen every minute, and the file retrieval happens every three seconds.

       

      Now down to the actual issue. I would like this program to run essentially indefinitely, but as this is mostly a proof-of-concept project, even a few days would suffice. As of now I can't get it to run for 24 hours. Every morning when I get to work, the program is no longer running. Since my work computer goes to sleep, I no longer have the console output to show whether an exception was thrown. I do still have the serial terminal connection open, but it doesn't show any change.

       

      At first I believed an exception was thrown so I added the following to my code:

      process.on('uncaughtException', function (err) {
        // **** Code to create a file recording the error along with a timestamp ****
      });
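
A minimal sketch of what such a handler might look like, assuming the log entry is appended synchronously so that it reaches the file before the process exits. The path `/tmp/crash.log` and the helper `formatCrash` are hypothetical names, not taken from the original program.

```javascript
var fs = require('fs');

// Hypothetical log location; any path on writable storage would do.
var CRASH_LOG = '/tmp/crash.log';

// Build one log line from an error and a timestamp.
function formatCrash(err, when) {
  var detail = err && err.stack ? err.stack : String(err);
  return when.toISOString() + ' uncaughtException: ' + detail + '\n';
}

process.on('uncaughtException', function (err) {
  try {
    // Synchronous append so the line is written before the process exits.
    fs.appendFileSync(CRASH_LOG, formatCrash(err, new Date()));
  } finally {
    // Program state is unreliable after an uncaught exception, so exit non-zero.
    process.exit(1);
  }
});
```

A handler like this only sees errors thrown from JavaScript. If the node process itself is killed by the operating system (for example by a signal), the handler never runs and no file is written.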

       

      The next day I still had nothing. No file was generated, there was no output in the serial terminal, and the Galileo appeared to be working just fine, except that the JavaScript code was no longer running. This happens every time. I did finally see something in the last day, though it didn't tell me much: the last file sent to the server when the program crashed was completely empty. This only happened once. The other times the last file sent was unaffected, so it only tells me that the crash does not always happen at the same place in my code.

       

      I do know that a simple reboot of the Galileo is all I need to get it working again, so a simple workaround is to check the uptime and reboot the device once it has been up for some number of hours. I worry that this only covers up a bigger problem, so I wonder what other people's thoughts are on this.
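
The uptime workaround described above could be sketched as follows. This is only an illustration: the function names and the threshold passed in are hypothetical, and it assumes the script runs as root with a `reboot` command available on the PATH.

```javascript
var os = require('os');
var childProcess = require('child_process');

// True once the system has been up for at least maxHours.
function uptimeExceeded(uptimeSeconds, maxHours) {
  return uptimeSeconds >= maxHours * 3600;
}

// Reboot the board if it has been up too long; returns whether a reboot was requested.
function rebootIfDue(maxHours) {
  if (!uptimeExceeded(os.uptime(), maxHours)) {
    return false;
  }
  // Assumption: running as root, with `reboot` on the PATH.
  childProcess.exec('reboot', function (err) {
    if (err) {
      console.error('reboot failed: ' + err.message);
    }
  });
  return true;
}
```

Calling `rebootIfDue(12)` from the program's existing once-a-minute cycle would be one way to wire it in; the value 12 is an arbitrary placeholder.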

        • 1. Re: Program ceases to run after many hours (Galileo Gen2)
          Intel_Alvarado

          Hi,

           

          I’ve never seen this behavior, though I’ve never left the Galileo running overnight. When you returned to your program after it stopped working, had the process been terminated or was the program just hung?

          I don’t think I’ve seen this issue in the Galileo community, so I searched around in various forums and found a couple of suggestions.

           

          The first suggestion was to use forever. Forever is a tool that allows you to run a Node script continuously.
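
The sketch below is not forever itself; it only illustrates the same restart-on-exit idea using Node's built-in `child_process` module. The names `supervise` and `describeExit` are made up for this example.

```javascript
var spawn = require('child_process').spawn;

// Describe why the child stopped; a crash at the native level shows up as a signal.
function describeExit(code, signal) {
  return signal ? 'killed by ' + signal : 'exited with code ' + code;
}

// Start `script` under node and start it again whenever it stops.
function supervise(script, restartDelayMs) {
  var child = spawn(process.execPath, [script], { stdio: 'inherit' });

  child.on('error', function (err) {
    console.error('could not start ' + script + ': ' + err.message);
  });

  child.on('exit', function (code, signal) {
    console.error(new Date().toISOString() + ' ' + script + ' ' +
      describeExit(code, signal) + ', restarting');
    setTimeout(function () {
      supervise(script, restartDelayMs);
    }, restartDelayMs);
  });

  return child;
}
```

A call such as `supervise('main.js', 5000)` would keep a hypothetical `main.js` running and log the reason each time it stops. The logged reason is useful in its own right, since it distinguishes a normal exit from a process killed by a signal.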

          Other users have also found it useful to set up a timer that sends ping requests just to keep the connection alive: http://stackoverflow.com/questions/4569956/ping-a-mysql-server

          Also, are you behind a firewall? Some firewalls close idle TCP connections. Take a look at the TCP keepalive overview.

           

          Again, I’ve not tested these suggestions because I’ve never had the issue of the program ending due to inactivity for that long, but hopefully they’ll provide some clues as to why this is happening.

           

          Sergio

          • 2. Re: Program ceases to run after many hours (Galileo Gen2)
            CHerbst

            Sergio

             

            My code uses setTimeout() and calls itself, so the code is effectively an infinite loop, which is why I suspected an exception before. I am dealing with an FTP server, and I know the server is up during all those times (in fact my code gives me feedback when the server is down, so I usually know it's down before I get an email warning me that it will be down for maintenance). If a firewall were blocking my connections, my program would most likely be generating error reports (I have a lot of exception catching that generates files, which I can read with a cat command in the serial terminal).
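
For readers unfamiliar with the pattern, a self-rescheduling setTimeout() loop might look like the sketch below. It is not the original program: `startLoop`, `task`, and `onError` are hypothetical names, and the task is assumed to signal completion through a Node-style `done(err)` callback.

```javascript
// Run `task` repeatedly; the next run is scheduled only after the current one finishes.
function startLoop(task, intervalMs, onError) {
  var stopped = false;
  var timer = null;

  function schedule() {
    if (!stopped) {
      timer = setTimeout(run, intervalMs);
    }
  }

  function run() {
    var finished = false;

    function done(err) {
      if (finished) { return; }  // ignore a second callback from the same run
      finished = true;
      if (err) { onError(err); } // report the error and keep looping
      schedule();
    }

    try {
      task(done);
    } catch (err) {
      done(err);                 // a synchronous throw must not end the loop
    }
  }

  run();

  // The returned function stops the loop.
  return function stop() {
    stopped = true;
    clearTimeout(timer);
  };
}
```

Because each run is scheduled only after the previous one reports back, slow FTP transfers cannot pile up on top of each other. The flip side is that a task which never calls `done` silently ends the loop, which is one way a program like this can stop without any error report.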

             

            I've seen the Wi-Fi go out on my board now and then, but the serial window usually reports that, and my program generates a file noting that it can't connect to the server, along with a timestamp. The weird part about my current issue is that it just appears to stop: no reports, no code appears to execute, nothing. Are there Linux commands I can use to see if the process is still running? That would allow me to answer your question better when I get into work tomorrow.

             

            I've been wondering whether, because I am technically running an infinite loop, something is happening to resources and I just run out of memory or something. Retrieving a file from an FTP server every 3 seconds for many hours is a lot of iterations, and if resources are not being properly disposed of after every cycle, there would be a buildup.
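
One way to test the resource build-up theory is to log the process's memory use at intervals and see whether it climbs over the hours. The sketch below uses Node's `process.memoryUsage()`; the helper names and the log path in the usage note are hypothetical.

```javascript
var fs = require('fs');

// Convert a byte count to megabytes with one decimal place.
function toMb(bytes) {
  return (bytes / 1048576).toFixed(1);
}

// Format one timestamped line from a process.memoryUsage() result.
function formatMemory(usage, when) {
  return when.toISOString() +
    ' rss=' + toMb(usage.rss) + 'MB' +
    ' heapUsed=' + toMb(usage.heapUsed) + 'MB' +
    ' heapTotal=' + toMb(usage.heapTotal) + 'MB\n';
}

// Append the current memory figures to a log file.
function logMemory(path) {
  fs.appendFileSync(path, formatMemory(process.memoryUsage(), new Date()));
}
```

Calling `logMemory('/tmp/memory.log')` once a minute from the existing loop would leave a record that survives the crash. A steadily rising `rss` or `heapUsed` would support the leak theory, while flat numbers right up to the failure would point elsewhere.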

             

            CHerbst

            • 3. Re: Program ceases to run after many hours (Galileo Gen2)
              CHerbst

              Well, after posting this the board of course decided to keep running for over 24 hours. *facepalm* Yet it began to display a new set of symptoms, which required a reboot when I got back to work today. For now I'm going to just code in a reboot every 9 hours and see how well that stabilizes my system. As this is a pretty sloppy fix, anyone can feel free to bring suggestions to the table. I may try Forever in the future, but as of now a sloppy patch job will do.

              • 4. Re: Program ceases to run after many hours (Galileo Gen2)
                CHerbst

                I am now kicking myself, since the board finally had its issue while I was at work and I didn't copy what I saw in the serial terminal, so now it is lost. If I manage to see it again I will post the error that I am getting in the Linux window.

                 

                To clarify what I saw: it finally happened while I was at work yesterday, just minutes before I left. I was connected to the device serially and keeping an eye on the system by using the cat command on any text files generated from issues. A single line of output was suddenly displayed. I didn't think much of it since most of the line looked like random numbers, but it did have the word error followed by a hex number. I thought it was junk, so I cleared the terminal window. About a minute later I noticed something seemed wrong and observed the usual problem where nothing appeared to be executing. As I write this I am also kicking myself for not running the top command to confirm that Linux was no longer running my code.

                 

                As soon as I manage to get the thing to crap out on me again, I'm going to post the error line here and do a Google search on it.

                • 5. Re: Program ceases to run after many hours (Galileo Gen2)
                  Intel_Alvarado

                  Hi,

                   

                  Yes, if you ever get that error message again please share it in this post. In the meantime I’ll do some more research on this and hopefully find something you’ll find useful.

                   

                  Sergio

                  • 6. Re: Program ceases to run after many hours (Galileo Gen2)
                    Intel_Alvarado

                    Hi,

                     

                    Just to check, what kind of sensor are you using? How are you communicating, through SPI, I2C or other?

                    Which image are you using? Are you using the standard Yocto image or the IoT version?

                     

                    Sergio

                    • 7. Re: Program ceases to run after many hours (Galileo Gen2)
                      CHerbst

                      Sergio

                       

                      I have the Grove Starter Kit Plus Intel IoT Edition for Gen 2, and am using its light sensor and potentiometer, which are read as analog inputs. For outputs I have its LED (so digital) and the Grove LCD RGB Backlight, which is driven over I2C. As for the image, I'm not a hundred percent sure, as I was not the one who installed it; the person who had this project before me did that. However, following instructions on here I can see that my version number is 201502260041, so I would think that yes, I have the IoT version. I was also running mraa version 0.7.3 (but also saw the issue on 0.7.2).

                       

                      Hope that helps,

                      CHerbst

                      • 8. Re: Program ceases to run after many hours (Galileo Gen2)
                        Intel_Alvarado

                        Hi,

                         

                        We’d like to replicate this issue. Can you please provide code?

                         

                        Sergio

                        • 9. Re: Program ceases to run after many hours (Galileo Gen2)
                          CHerbst

                          Sergio

                           

                          Is there a way I can send you the code privately? I am working for a company and am not sure I should be posting code on an open forum. I have modified the code to remove some of our confidential information (such as the IP and login for our FTP server), but would still like to send you what code I can.

                           

                          The other thing is that I managed to see the error code again. I had to go into the power settings on my computer and make sure it never went to sleep, so that the connection to the serial terminal was never lost.

                           

                          root@galileo:~# [28072.390527] node[206]: segfault at c12e80a2 ip c12e80a2 sp cb855fe4 error ffff0015 

                           

                          I also ran the top command and can show you the output, along with the comparison output for when the program is running. Either way, it does appear that the program ceases to run.

                           

                          CHerbst

                          • 10. Re: Program ceases to run after many hours (Galileo Gen2)
                            Intel_Alvarado

                            Hi,

                             

                            There are two options:

                            1. You can go to my profile and send it to me as a private message. On the right side of the screen there is a message option next to a follow button. Click on the message option and send it as a private message.
                            2. Go to Intel Support and submit a service ticket. In the description explain that you have a community case with me and I’ll take your case.

                             

                            Sergio

                            • 11. Re: Program ceases to run after many hours (Galileo Gen2)
                              CHerbst

                              Sergio

                               

                              My follow-up question is: what would be the best way to send you a zip folder with all the code in it? I could copy and paste it into the private message, but I'm not sure if that is desired. Since I'm working with the Intel XDK IoT environment, I already saved a copy of my project to a zip folder, and I don't see a means of attaching the folder to a private message. My guess is that I will need to fill out a service ticket, but I thought I should run what I'm trying to send by you first.

                               

                              CHerbst

                              • 12. Re: Program ceases to run after many hours (Galileo Gen2)
                                Intel_Alvarado

                                Hi CHerbst,

                                 

                                The preferred way would be to file a new service ticket, which will allow you to attach a .zip file. I will be waiting for you to create it.

                                 

                                Sergio

                                • 13. Re: Program ceases to run after many hours (Galileo Gen2)
                                  CHerbst

                                  An update for those who may happen to be looking at this thread: I have sent my code to Sergio and don't expect to hear back from him for some time. With the code I sent him the program reboots every 20 hours, and it rebooted 5 times before I saw the issue again, which means at least 100 hours (about four days) passed before the issue reappeared. I am not sure how much more feedback I can provide. A more definitive answer may come in the future, but debugging this one is seriously going to take a long time.

                                   

                                  If anyone could shed light on the output I saw when it crashed, that would be nice:

                                  root@galileo:~# [28072.390527] node[206]: segfault at c12e80a2 ip c12e80a2 sp cb855fe4 error ffff0015

                                  I tried some Google searching but got a lot of nothing, and I'm not savvy enough in Linux to know the best way to go about my search.
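
As a starting point for reading that line, the sketch below splits a Linux kernel "segfault at" message into its fields: the process name and PID, the faulting address, the instruction pointer (ip), the stack pointer (sp), and the error code. The function name `parseSegfault` and the field names are made up for this example. The low bits of the error code are decoded using the x86 page-fault layout (bit 0 protection violation, bit 1 write, bit 2 user mode, bit 4 instruction fetch). The upper `ffff` half is not explained by that layout, so the decoding should be treated as tentative.

```javascript
// Parse a Linux kernel "segfault at ..." line into its fields; returns null if it does not match.
function parseSegfault(line) {
  var pattern = /(\S+)\[(\d+)\]: segfault at ([0-9a-f]+) ip ([0-9a-f]+) sp ([0-9a-f]+) error ([0-9a-f]+)/i;
  var m = pattern.exec(line);
  if (!m) { return null; }

  var errorCode = parseInt(m[6], 16);
  return {
    process: m[1],
    pid: Number(m[2]),
    faultAddress: m[3],
    instructionPointer: m[4],
    stackPointer: m[5],
    // Fault address equal to ip suggests the fault happened while fetching an instruction.
    faultOnInstructionFetch: m[3].toLowerCase() === m[4].toLowerCase(),
    // Low bits of the x86 page-fault error code (tentative, see the note above).
    protectionViolation: (errorCode & 0x1) !== 0,
    writeAccess: (errorCode & 0x2) !== 0,
    userMode: (errorCode & 0x4) !== 0,
    instructionFetch: (errorCode & 0x10) !== 0
  };
}
```

Applied to the line above, this reports process `node` with PID 206 and a fault address equal to the instruction pointer. That pattern is consistent with the process jumping to an address it was not allowed to execute, though it does not identify the cause. A segfault is a crash at the native level rather than a JavaScript exception, which would explain why the uncaughtException handler never produced a file.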

                                  • 14. Re: Program ceases to run after many hours (Galileo Gen2)
                                    Intel_Alvarado

                                    Hi,

                                     

                                    After performing some extensive testing we were unable to replicate this issue. I’d suggest that you try to debug the code and look for memory leaks. One approach would be to break the code down into different sections to see under what conditions a memory leak occurs.

                                     

                                    Sergio
