14 Replies Latest reply on Jul 30, 2015 7:04 PM by allene

    Files partially lost on Edison probably on power off.

    allene

      I have a program that collects NMEA data and writes it to a file.  It is also run with its output redirected (#program > file) so that all the printf output goes to another file.  Most of this data is also sent via Bluetooth to a phone, where it is logged as well.  In the last race, both the redirected file and the file written by the program on the Edison ended about 2 hours into the 4-hour race.  All the Bluetooth data was fine and the phone captured all 4 hours, so I know the program was running.

       

      The only explanation I can come up with is that the data was not actually written to the file yet when I turned off the power.  This is a headless unit so issuing "shutdown now" is not possible unless I somehow create a shutdown command from the phone.

       

      What is the best way to deal with this?  I have several projects going that use this code, and they will be capturing data that cannot be repeated.  My race is nothing, and my data was also on my phone, but if I send this out to capture a day's practice for the Olympics, or Jimmy Spithill takes his Moth out, and that data is lost, that becomes unacceptable.

        • 1. Re: Files partially lost on Edison probably on power off.
          Steven Moy

          Hello,

           

          Writing files reliably is quite an art on an embedded system, because unexpected situations happen more often than on a desktop system. You need a good strategy for handling the errors that can occur in your file system calls. I went to a meetup once where the presenter mentioned that fsync() is pretty much your only atomic operation for a file system. You have to use fsync well to get durability.

           

          This blog entry explains fsync better than I can in a reply: Everything You Always Wanted To Know About fsync() - xavier roche's homework

           

          Think about how much data you are willing to lose if an fsync does not complete. You may want to split a 4-hour recording into multiple files so that one failed file system call cannot cost you the whole recording.

           

          Good luck.
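To make the splitting idea concrete, here is a minimal sketch in C (the language of the code posted later in this thread). The helper names `segment_index` and `segment_path`, the `_NNN.log` naming scheme, and the segment length are all made up for illustration; none of them come from the original program. The only point is that the log file name changes every so many seconds, so a damaged file costs at most one segment.

```c
#include <stdio.h>

/* Hypothetical helper: which segment a sample belongs to, given the seconds
 * elapsed since logging started and the length of one segment in seconds.
 * Returns -1 for invalid input. */
long segment_index(long elapsed_s, long segment_s)
{
    if (elapsed_s < 0 || segment_s <= 0)
        return -1;
    return elapsed_s / segment_s;
}

/* Hypothetical helper: build "<base>_<NNN>.log" for that segment.
 * Returns 0 on success, -1 if the input is invalid or buf is too small. */
int segment_path(char *buf, size_t size, const char *base,
                 long elapsed_s, long segment_s)
{
    long idx = segment_index(elapsed_s, segment_s);
    if (idx < 0)
        return -1;
    int n = snprintf(buf, size, "%s_%03ld.log", base, idx);
    return (n >= 0 && (size_t)n < size) ? 0 : -1;
}
```

With a 600-second segment, a 4-hour race ends up in 24 files, and the program would call `segment_path` to pick the file name before each `fopen`.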

          • 2. Re: Files partially lost on Edison probably on power off.
            allene

            Is it possible that 2 hours' worth of data, which would be 9 MBytes, is being cached?  Would fsync() be the answer for that?  I did a test: I ran 10 hours of data and then powered down.  I lost 6 seconds, which is trivial for this application.

            • 3. Re: Files partially lost on Edison probably on power off.
              Steven Moy

              Can you post the code that does the file system operations? I think it will attract more people to analyze it. How much data can be lost depends greatly on the flash controller, the block size, and the file system's write cache behavior. Until fsync returns without error, there is not much of a guarantee about the durability of data written since the previous successful fsync.

               

              Below is a quote from an ext4 developer:

               

              "OK, so enter ext4 and delayed allocation. With delayed allocation, we don't allocate a location on disk for the data block right away. Since there is no location on disk, there is no place to write the data on a commit; but it also means that there is no security problem. It also results in a massive performance improvements; for example, if you create a scratch file, and then delete it 20 seconds later, it will probably never hit the disk. Unfortunately, the default VM tuning parameters, which can be controlled by /proc/sys/vm/dirty_expire_centiseconds and /proc/sys/vm/dirty_writeback_centiseconds, means that in practice, a newly created file won't hit disk until about 45-150 seconds later, depending on how many dirty pages are in the page cache at the time. (This isn't unique to ext4, by the way --- any advanced filesystem which does delayed allocation, which includes xfs and the in the future, btrfs, will have the same issue.)"

               

              Comment #45 : Bug #317781 : Bugs : linux package : Ubuntu
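To check what these tunables are actually set to on a given board, a small C sketch like the one below can read them. One caveat: on current Linux kernels the files are named `dirty_expire_centisecs` and `dirty_writeback_centisecs`, slightly different from the spelling in the quote. `read_centisecs` is a made-up helper name.

```c
#include <stdio.h>

/* Hypothetical helper: read a single integer (in centiseconds) from a
 * /proc/sys/vm style file. Returns the value, or -1 if the file cannot
 * be opened or does not start with a number. */
long read_centisecs(const char *path)
{
    FILE *f = fopen(path, "r");
    if (f == NULL)
        return -1;
    long value = -1;
    if (fscanf(f, "%ld", &value) != 1)
        value = -1;
    fclose(f);
    return value;
}
```

For example, `read_centisecs("/proc/sys/vm/dirty_expire_centisecs")` divided by 100 gives the expiry age in seconds.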

              • 4. Re: Files partially lost on Edison probably on power off.
                allene

                I am thinking that using fsync() is not going to prevent whatever trashed 2 hours of data from happening again.  My testing shows that fsync() prevents 6 seconds of data from being lost.  That is inconsequential.  I either need some sort of fancy circuit, or a button that issues a shutdown command like you have on a desktop.

                 

                The easiest thing for me is to have that command come from the tablet and that is what I am planning on doing.

                 

                Allen

                • 5. Re: Files partially lost on Edison probably on power off.
                  allene

                  I have read all the links posted and other links on this issue and find nothing that can explain the loss of 2 hours of data.  The comment #45 above talks about 150 seconds of data.  If I lost 3 minutes of data I would not care.  My application does not need the last few minutes of data.

                   

                  I am debating what to do about it.  Perhaps a shutdown button on the box, tied to a GPIO pin that triggers an interrupt.  Perhaps just bringing pin 3 of J18 out, but that is not easy to use blind without some way to know whether the shutdown happened.  Or perhaps the Bluetooth instruction I implemented is good enough.

                   

                  The command I issued is

                   

                  system("exec shutdown now");
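One thing to keep in mind with this approach: anything still sitting in the program's own stdio buffers (for example printf output that is redirected to a file) has not reached the kernel yet when the shutdown starts. A minimal sketch of flushing first, assuming a made-up helper name `flush_then_run`; `fflush(NULL)`, `sync()` and `system()` are standard calls.

```c
#define _XOPEN_SOURCE 700   /* for sync() */
#include <stdio.h>
#include <stdlib.h>
#include <unistd.h>

/* Hypothetical helper: flush every open stdio output stream, ask the kernel
 * to write out dirty data, then run the given shell command.
 * Returns whatever system() returns. */
int flush_then_run(const char *command)
{
    fflush(NULL);   /* stdio buffers (including a redirected stdout) -> kernel */
    sync();         /* ask the kernel to write all dirty data to storage */
    return system(command);
}
```

In the program above this would be called as `flush_then_run("exec shutdown now");`.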

                  • 6. Re: Files partially lost on Edison probably on power off.
                    Steven Moy

                    If you can implement a graceful shutdown and it's sufficient for your use case, that's awesome. However, if power is lost unexpectedly and you lose two hours' worth of data because of it, that seems to indicate a bigger problem.

                     

                    Can you explain how you are saving your data to non-volatile memory? The entire data stream in a single file? If so, how often do you fsync?

                    • 7. Re: Files partially lost on Edison probably on power off.
                      allene

                      I am saving the data two ways creating two files.  One is that the program is called with a redirect for the output to go to a file. 

                       

                      #program > file

                       

                       

                      The second is that a file is created and data is written to it once a second after which the file is closed.  I do not fsync and as far as I can tell from reading, I don't need to as I do not care if I lose a few minutes of data.  It takes a long time after a race to get back to the slip and none of that data is interesting. 

                       

                      Both files quit at the same time.  I found that particularly strange.  When I say the same time, both files had the last entry with a GPS time of 173849 so it was within one second.

                       

                      Here is the write to the log file

                       

                      void write_log_row(){
                           fp=fopen(filename, "a");
                           static int j = 1;
                           int i = 0;
                           fprintf(fp,"%d\t",j++);
                           pthread_mutex_lock(&log_lock);
                                for (i = 0 ; i < LOG_ARRAY_MAX ; i++){
                                       fprintf(fp,"%0.6f\t",log_array[i]);
                                }
                                fprintf(fp,"%s\n",mark);
                           pthread_mutex_unlock(&log_lock);
                           fclose(fp);
                      }
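For reference, a sketch of the same function with error checks and an explicit flush to storage added. This is not the code from the post: `write_log_row_synced` is a made-up name, and the globals (`log_array`, `mark`, `log_lock`, `LOG_ARRAY_MAX`) are stand-ins with guessed types and sizes so that the snippet compiles on its own.

```c
#define _POSIX_C_SOURCE 200809L   /* for fileno() and fsync() */
#include <stdio.h>
#include <unistd.h>
#include <pthread.h>

/* Stand-ins for the program's globals; types and sizes are assumptions. */
#define LOG_ARRAY_MAX 4
static double log_array[LOG_ARRAY_MAX];
static const char *mark = "";
static pthread_mutex_t log_lock = PTHREAD_MUTEX_INITIALIZER;

/* Append one row, then push it through stdio and the page cache to storage.
 * Returns 0 if every step succeeded, -1 otherwise. */
int write_log_row_synced(const char *filename)
{
    static int j = 1;
    int ok = 1;

    FILE *fp = fopen(filename, "a");
    if (fp == NULL)
        return -1;

    if (fprintf(fp, "%d\t", j++) < 0)
        ok = 0;
    pthread_mutex_lock(&log_lock);
    for (int i = 0; i < LOG_ARRAY_MAX; i++) {
        if (fprintf(fp, "%0.6f\t", log_array[i]) < 0)
            ok = 0;
    }
    if (fprintf(fp, "%s\n", mark) < 0)
        ok = 0;
    pthread_mutex_unlock(&log_lock);

    if (fflush(fp) != 0)            /* stdio buffer -> kernel */
        ok = 0;
    if (fsync(fileno(fp)) != 0)     /* kernel page cache -> storage */
        ok = 0;
    if (fclose(fp) != 0)
        ok = 0;
    return ok ? 0 : -1;
}
```

Calling fsync once a second has a cost in latency and flash wear. Since losing a few seconds is acceptable here, syncing every N rows instead of every row would be a reasonable compromise.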

                       

                       

                      Allen

                      • 8. Re: Files partially lost on Edison probably on power off.
                        Steven Moy

                        I really think you should fsync at least some of the time. The email thread that I linked earlier includes the developer's point of view that it's totally reasonable for ext4 never to hit the physical media if one never calls fsync. That's because POSIX never defined what happens when one writes data to a file descriptor but never closes it or calls fsync.

                         

                        Based on the above snippet, until fclose returns, POSIX guarantees nothing. If any bits land on the physical media, that is an implementation detail of ext4, not something the code instructs it to do.

                         

                        I understand your concern is not about the last bits of data written to the file, but without fsync or fclose there is really no guarantee about what happens to data that was written earlier.

                        • 9. Re: Files partially lost on Edison probably on power off.
                          allene

                          There is an fclose(fp) in the code.  Isn't that enough?  I can certainly add fsync, but I can't see how that explains the problem observed, and I don't want to think I solved it unless I know that what I had could have caused the issue.  In any event, how do I fsync the redirect to file?

                           

                          I really would like some explanation as to how two completely different files could quit at the exact same time.  One being a Linux level file and the other opened by the program.

                           

                          Allen

                          • 10. Re: Files partially lost on Edison probably on power off.
                            Steven Moy

                            Sorry, turns out I was wrong about fclose.

                             

                            From "man 2 close":

                             

                            A successful close does not guarantee that the data has been successfully saved to disk, as the kernel defers writes.

                            The man page says that if you want to be sure that your data are on disk, you have to use fsync() yourself.

                             

                            http://stackoverflow.com/questions/705454/does-linux-guarantee-the-contents-of-a-file-is-flushed-to-disc-after-close

                            • 11. Re: Files partially lost on Edison probably on power off.
                              Steven Moy

                              Re: the two files quitting at approximately the same time

                               

                              #program > file2

                               

                              As soon as the program dies, the shell process that is responsible for writing to "file2" dies as well. If you want fine control over file2, you should handle it inside the program and have the program take an argument that says where to write file2.
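On the earlier question of how to fsync the redirected output: with `program > file` the file is the program's standard output, so it can be flushed and synced from inside the program. A minimal sketch, with `sync_stream` as a made-up helper name:

```c
#define _POSIX_C_SOURCE 200809L   /* for fileno() and fsync() */
#include <stdio.h>
#include <unistd.h>

/* Hypothetical helper: flush a stdio stream and ask the kernel to write the
 * underlying file to storage. Returns 0 on success, -1 on error (for example
 * when the stream is a pipe or socket, which fsync() does not support). */
int sync_stream(FILE *stream)
{
    if (fflush(stream) != 0)
        return -1;
    return fsync(fileno(stream)) == 0 ? 0 : -1;
}
```

Calling `sync_stream(stdout)` every so often after the printf calls covers the redirected file. If the program opens the second file itself, as suggested above, the same helper works on that `FILE *`.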

                              • 12. Re: Files partially lost on Edison probably on power off.
                                allene

                                Clearly this is not an issue of the data not being written, as no buffer is going to hold 2 hours of data.  It was either a faulty Edison or something was interrupted while the directory was being rearranged.  There is a reference to this in the fsync() man page:

                                 

                                Calling fsync() does not necessarily ensure that the entry in the directory containing the file has also reached disk. For that an  explicit fsync() on a file descriptor for the directory is also  needed.

                                 

                                And there is a lot of discussion on the net about this statement but not a lot of talk about how to actually do it.  Not clear it always works.  If this can be done on an Edison, how is it done?
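As far as the man page goes, this is ordinary POSIX and not specific to the Edison: open the directory read-only, call fsync() on that descriptor, and close it. A minimal sketch; `fsync_dir` is a made-up helper name, and the path in the usage note is only a placeholder.

```c
#define _POSIX_C_SOURCE 200809L   /* for O_DIRECTORY */
#include <fcntl.h>
#include <unistd.h>

/* Hypothetical helper: fsync the directory itself so that a newly created
 * or renamed entry in it is also written to storage.
 * Returns 0 on success, -1 on error. */
int fsync_dir(const char *dir_path)
{
    int fd = open(dir_path, O_RDONLY | O_DIRECTORY);
    if (fd < 0)
        return -1;
    int rc = fsync(fd);
    if (close(fd) != 0)
        rc = -1;
    return rc == 0 ? 0 : -1;
}
```

The usual order is to fsync the file after writing it, then call something like `fsync_dir("/path/to/logs")` once after the file has been created. It only needs repeating when entries in the directory are added, removed, or renamed.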

                                • 13. Re: Files partially lost on Edison probably on power off.
                                  Steven Moy

                                  I think I have a theory on why you lost 2 hours' worth of data.

                                   

                                  In your write_log_row function, you open/close the same file without fsync. Do you call write_log_row function more than once in your code? If so, can this happen?

                                   

                                  Consider the following events sequentially:

                                   

                                  fopen file in append mode (let's say fp points to some block number 123)

                                  write some data

                                  fclose file (we don't know whether dirty blocks have landed in the physical media since we don't fsync)

                                   

                                  fopen file in append mode (do we know if fp points to block number 123, or to some block that is the end of the file? The kernel can still have your dirty pages in memory)

                                  write some data (well, if it's pointed at block number 123, your previous write will be gone)

                                  fclose file

                                   

                                  You can test this theory by calling fopen only at the start of your program and fclose at the end, and storing the fp as a global in the code.
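A sketch of that test layout, with a periodic sync added so the result does not depend on what happens at close time. All names here (`log_open`, `log_row`, `log_close`, `SYNC_EVERY`) are made up. For what it's worth, POSIX append mode places each write at the current end of the file as the kernel sees it, whether or not earlier data has reached the flash, so the main gains from this layout are fewer open/close cycles and a defined sync interval.

```c
#define _POSIX_C_SOURCE 200809L   /* for fileno() and fsync() */
#include <stdio.h>
#include <unistd.h>

#define SYNC_EVERY 10             /* assumed: sync after every 10 rows */

static FILE *log_fp = NULL;       /* opened once, kept for the whole run */
static unsigned rows_since_sync = 0;

/* Open the log in append mode. Returns 0 on success, -1 on failure. */
int log_open(const char *path)
{
    log_fp = fopen(path, "a");
    rows_since_sync = 0;
    return log_fp != NULL ? 0 : -1;
}

/* Write one already-formatted row; every SYNC_EVERY rows, flush and fsync. */
int log_row(const char *row)
{
    if (log_fp == NULL || fprintf(log_fp, "%s\n", row) < 0)
        return -1;
    if (++rows_since_sync >= SYNC_EVERY) {
        rows_since_sync = 0;
        if (fflush(log_fp) != 0 || fsync(fileno(log_fp)) != 0)
            return -1;
    }
    return 0;
}

/* Final flush, sync and close. Returns 0 on success, -1 on any error. */
int log_close(void)
{
    if (log_fp == NULL)
        return -1;
    int rc = 0;
    if (fflush(log_fp) != 0 || fsync(fileno(log_fp)) != 0)
        rc = -1;
    if (fclose(log_fp) != 0)
        rc = -1;
    log_fp = NULL;
    return rc;
}
```

At one row per second, a `SYNC_EVERY` of 10 means an abrupt power-off should cost roughly the last ten rows at most, which fits the stated tolerance of losing a few minutes.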

                                  • 14. Re: Files partially lost on Edison probably on power off.
                                    allene

                                    I write once a second.  Whatever happened, it is not very likely.  I have run this program perhaps a hundred times and only lost data once.  That is what makes testing impossible.

                                     

                                    Allen