5 Replies Latest reply on May 6, 2018 10:31 PM by Intel Corporation

    Training process

    virtualdvid

      Hey, it's me again! I am training a model right now and it seems to be going well, but I'm wondering if I can watch the training progress like this:

       

      Example:

      50/50 [==============================] - 190s 4s/step - loss: 2.6233 - predictions_loss: 0.9289 - aux_predictions_loss: 0.7987 - predictions_acc: 0.7420 - aux_predictions_acc: 0.7720 - val_loss: 3.2657 - val_predictions_loss: 1.3838 - val_aux_predictions_loss: 1.2556 - val_predictions_acc: 0.6370 - val_aux_predictions_acc: 0.6360
      Epoch 153/189
      50/50 [==============================] - 190s 4s/step - loss: 3.1491 - predictions_loss: 1.2833 - aux_predictions_loss: 1.1981 - predictions_acc: 0.6420 - aux_predictions_acc: 0.6660 - val_loss: 3.6265 - val_predictions_loss: 1.6375 - val_aux_predictions_loss: 1.4866 - val_predictions_acc: 0.5830 - val_aux_predictions_acc: 0.6120
      Epoch 154/189
      50/50 [==============================] - 190s 4s/step - loss: 3.0816 - predictions_loss: 1.2146 - aux_predictions_loss: 1.1703 - predictions_acc: 0.6660 - aux_predictions_acc: 0.6850 - val_loss: 3.3087 - val_predictions_loss: 1.3965 - val_aux_predictions_loss: 1.2729 - val_predictions_acc: 0.6550 - val_aux_predictions_acc: 0.6580
      Epoch 155/189
      16/50 [========>.....................] - ETA: 1:40 - loss: 2.9582 - predictions_loss: 1.1257 - aux_predictions_loss: 1.0727 - predictions_acc: 0.7344 - aux_predictions_acc: 0.7281

       

      I am using a terminal from hub.colfaxresearch.com (Jupyter). Is there a command or some other way to check this?

      Thank you!

        • 1. Re: Training process
          Intel Corporation
          This message was posted on behalf of Intel Corporation

          Hi David,

          Please confirm whether you want to see the real-time logs of the running job in the Jupyter Notebook hub.

          Also, could you let us know whether you run the Python code directly or use the qsub utility from Jupyter Notebook?
          Please share more details on how you submit the job in Jupyter Notebook.

          Regards,
          Anju

          • 2. Re: Training process
            virtualdvid

            Yes, I want to see real-time logs, either from the terminal or from a Jupyter notebook.

            I used the `qsub` utility from the terminal.

            Here are my steps:

            1. I followed these instructions to open the terminal: Using Jupyter Notebook* Terminal Console | Intel® Software

            2. There, I:

            • Created a conda environment.
            • Activated the environment.
            • Installed some libraries.
            • Created a file "myjob" with these lines:

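                          # Request one compute node for the job
                          #PBS -l nodes=1
                          # Run from the directory where qsub was invoked
                          cd "$PBS_O_WORKDIR"
                          echo Starting calculation
                          # Activate the conda environment, then start training
                          source activate iMaterialist
                          python NASNet.py
                          echo End of calculation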

            • Executed `qsub myjob` from the terminal.
            • The training job started with number xxxxx.c00x.
            • It ran for several hours.

            3. The job stopped suddenly and I didn't get any information about what happened.

            4. My model saves a basic log, which showed that it had reached epoch 14/189, far from the end.
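
When a job stops without warning, a per-epoch log that is flushed to disk after every write shows exactly how far training got. The following is a minimal sketch in plain Python, not the logging code used in NASNet.py: the `EpochLogger` class and the `training_log.csv` file name are hypothetical. In a Keras script, the built-in `CSVLogger` callback serves a similar purpose.

```python
import csv
import os
import time


class EpochLogger:
    """Append one CSV row per epoch and flush it to disk immediately."""

    def __init__(self, path, metric_names):
        self.fieldnames = ["epoch", "timestamp"] + list(metric_names)
        # Write the header only when the file is new or empty,
        # so a restarted job keeps appending to the same log.
        write_header = not os.path.exists(path) or os.path.getsize(path) == 0
        self._file = open(path, "a", newline="")
        self._writer = csv.DictWriter(self._file, fieldnames=self.fieldnames)
        if write_header:
            self._writer.writeheader()
            self._flush()

    def _flush(self):
        # Push buffered data to the OS and then to disk, so the row
        # survives even if the job is killed right afterwards.
        self._file.flush()
        os.fsync(self._file.fileno())

    def log(self, epoch, metrics):
        row = {"epoch": epoch, "timestamp": time.strftime("%Y-%m-%d %H:%M:%S")}
        row.update(metrics)
        self._writer.writerow(row)
        self._flush()

    def close(self):
        self._file.close()
```

A training loop would call `logger.log(epoch, {"loss": ..., "val_loss": ...})` once per epoch with whatever metrics it tracks, and `logger.close()` at the end.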

             

            Sorry, I think I posted a related question here:

            Missing job process

             

            Thank you!

            • 3. Re: Training process
              Intel Corporation

              Hi David,
              To view the real-time logs of a running job, use `qpeek <JOB_ID>` and `qpeek -e <JOB_ID>`.
              Unfortunately, the qpeek command currently does not work in the terminal of the Jupyter Notebook hub.
              We therefore suggest three options:

              1. The terminal you get through the Jupyter Notebook hub actually runs on a compute node, so you can run your commands there directly, as you would on a local machine, without using the qsub utility at all. The logs then appear just as they would locally. The drawbacks are that an inactive terminal can lose its connection, after which you stop seeing the logs, and that the Jupyter Notebook hub has a 4-hour session limit, so you can view the logs only for that long.

              2. Submit the job from the Jupyter Notebook terminal and save the job ID. Log in over ssh using PuTTY or a Linux terminal (the steps are in the link provided in the DevCloud welcome mail). Run `qpeek <JOB_ID>` and `qpeek -e <JOB_ID>` in the ssh terminal.

              3. Since you are already familiar with basic Linux commands and the qsub utility, you can do everything you currently do in the Jupyter Notebook terminal in the ssh terminal instead. This avoids switching between the Jupyter notebook and the ssh terminal.
              Regards,
              Anju
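
As a complement to `qpeek`, if the training script writes its own log file, that file can be followed from any terminal that sees the same filesystem. The sketch below is an assumption-laden illustration rather than part of the DevCloud tooling: the `follow` function and its parameters are hypothetical names, and it simply polls the file the way `tail -f` does.

```python
import time


def follow(path, poll_interval=1.0, max_idle_polls=None):
    """Yield lines from `path` as they are appended, similar to `tail -f`.

    Starts at the beginning of the file. Stops after `max_idle_polls`
    consecutive polls with no new data; with the default None it keeps
    waiting until interrupted.
    """
    idle = 0
    buffer = ""
    with open(path, "r") as f:
        while True:
            chunk = f.readline()
            if chunk:
                idle = 0
                buffer += chunk
                # Only emit complete lines; a partial line may still
                # be in the middle of being written.
                if buffer.endswith("\n"):
                    yield buffer.rstrip("\n")
                    buffer = ""
                continue
            idle += 1
            if max_idle_polls is not None and idle >= max_idle_polls:
                if buffer:
                    yield buffer  # emit any trailing partial line
                return
            time.sleep(poll_interval)
```

For example, `for line in follow("training_log.csv"): print(line)` prints the existing log and then each new line as it arrives; the file name here is only a placeholder.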
              • 4. Re: Training process
                virtualdvid

                I installed PuTTY and it is working perfectly! Thank you!

                • 5. Re: Training process
                  Intel Corporation

                  Hi David,

                  Thanks for the confirmation. We will go ahead and close this case.
                  Feel free to open another thread for any further queries.

                  Regards,
                  Anju