6 Replies Latest reply on Apr 30, 2018 3:46 PM by afshin67

    walltime limit

    afshin67

      Hi,

      Each of my jobs runs for about 60 hours, but there seems to be a limit of 24 hours per job. Is there any way to increase this limit?

      Thanks,

      Afshin    

        • 1. Re: walltime limit
          Intel Corporation
          This message was posted on behalf of Intel Corporation

          Hi Afshin,

          Thanks for reaching out to us.

          Since DevCloud is a shared resource used by a large number of students, we want to ensure fair utilization for all users. This is the reason for the 24-hour limit.

          It would be great if you could share the kind of job that you are running. If you are using a framework such as Caffe or TensorFlow, you can save intermediate checkpoints and resume from them once the 24-hour limit is reached.
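
          A minimal sketch of the checkpoint-and-resume pattern, written in plain Python 3 so that it does not depend on any framework. The function name, the JSON checkpoint format, and the `step_fn` callback are hypothetical stand-ins; in TensorFlow 1.x the state would normally be saved and restored with `tf.train.Saver` instead. Opening the log in append mode is one way to keep a single continuous log across resubmitted jobs.

```python
import json
import os
import tempfile
import time


def train_with_checkpoints(total_steps, ckpt_path, log_path,
                           budget_seconds, step_fn):
    """Run step_fn until finished or the time budget is spent.

    Returns True when all steps are done, False when the job should
    be resubmitted to continue from the saved checkpoint.
    """
    # Resume from the last checkpoint if one exists (hypothetical format).
    if os.path.exists(ckpt_path):
        with open(ckpt_path) as f:
            state = json.load(f)
    else:
        state = {"step": 0, "value": 0.0}

    start = time.time()
    # Append mode: every resumed job adds to the same log file.
    with open(log_path, "a") as log:
        while state["step"] < total_steps:
            if time.time() - start >= budget_seconds:
                break  # stop cleanly before the scheduler kills the job
            state["value"] = step_fn(state["value"])
            state["step"] += 1
            log.write("step=%d value=%.3f\n" % (state["step"], state["value"]))
            # Checkpoint after every step for simplicity; a real job would
            # do this every N steps. Write to a temp file and rename it so
            # an interrupted write cannot corrupt the previous checkpoint.
            tmp = ckpt_path + ".tmp"
            with open(tmp, "w") as f:
                json.dump(state, f)
            os.replace(tmp, ckpt_path)
    return state["step"] >= total_steps


if __name__ == "__main__":
    workdir = tempfile.mkdtemp()
    done = train_with_checkpoints(
        total_steps=5,
        ckpt_path=os.path.join(workdir, "state.json"),
        log_path=os.path.join(workdir, "train.log"),
        budget_seconds=23 * 3600,  # leave a margin below the 24-hour limit
        step_fn=lambda v: v + 1.0,
    )
    print("finished" if done else "resubmit to continue")
```

          With a budget slightly under 24 hours per job, a 60-hour run would take three submissions, each picking up where the previous one stopped.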

          Do reach out to us for any further concerns.

          Regards,
          Sandhiya

          • 2. Re: walltime limit
            afshin67

            Hi Sandhiya,

             

            Thanks for the explanation. The jobs that I am running are based on an extension of this work:

            https://arxiv.org/abs/1708.05924

            which is basically an extension of the DQN algorithm to a real supply chain problem. The decision is how much each agent in a supply chain (e.g. the supply chain of the Target stores) needs to order in each period to minimize the total cost of the system, under the assumption that agents do not know anything about the states and actions of other agents. In effect it is a partially observable MDP, and I am trying to develop a DQN-type algorithm for it. The code is written in Python 2.7 and uses TensorFlow. I can explain the work in more detail if you are interested.

            As for saving and reloading the network, you are right that it is a possibility, but it makes extracting the output logs and generating the reports much harder. It would be great if there were a way to avoid that procedure.

             

            Thanks,

            Afshin

             

             

            • 3. Re: walltime limit
              Intel Corporation
              This message was posted on behalf of Intel Corporation

              Hi Afshin,
              The maximum wall time on DevCloud is 24 hours. As of now, we do not have an option to extend it.

               

              Another option is to try the optimization techniques below to reduce your run time.
              Steps involved:
              1. Set the inter-op and intra-op thread counts. The following code can be used for that:

               

              import tensorflow as tf
              config = tf.ConfigProto(
                  intra_op_parallelism_threads=24,
                  inter_op_parallelism_threads=2,
                  allow_soft_placement=True,
                  device_count={'CPU': 24})
              session = tf.Session(config=config)

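              2. Set the OpenMP* environment variables (OMP_) and the Intel extensions (KMP_). Setting them before TensorFlow is imported is the safest way to make sure the OpenMP runtime picks them up.
              import os
              os.environ["OMP_NUM_THREADS"] = "24"
              os.environ["KMP_BLOCKTIME"] = "30"
              os.environ["KMP_SETTINGS"] = "1"
              os.environ["KMP_AFFINITY"] = "granularity=fine,verbose,compact,1,0"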

               

              The best values vary by use case, so adjust these parameters for your workload as needed.
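
              One way to compare settings is to time a short benchmark run under each candidate. The sketch below launches a command in a child process with different environment variables and reports the fastest; the helper names, the candidate values, and the placeholder command are all hypothetical, and the command should be replaced with a brief run of the real training script.

```python
import os
import subprocess
import sys
import time


def time_command(cmd, env_overrides):
    """Run cmd in a child process with extra environment variables.

    Returns the elapsed wall time in seconds.
    """
    env = dict(os.environ)
    env.update(env_overrides)
    start = time.time()
    subprocess.check_call(cmd, env=env)
    return time.time() - start


def sweep(cmd, candidates):
    """Time cmd under each candidate setting; return (best, timings)."""
    timings = []
    for overrides in candidates:
        timings.append((time_command(cmd, overrides), overrides))
    # Compare on elapsed time only, so ties never compare the dicts.
    best = min(timings, key=lambda t: t[0])[1]
    return best, timings


if __name__ == "__main__":
    # Placeholder command: substitute a short run of the training script.
    cmd = [sys.executable, "-c",
           "import os; print(os.environ['OMP_NUM_THREADS'])"]
    candidates = [
        {"OMP_NUM_THREADS": "12", "KMP_BLOCKTIME": "0"},
        {"OMP_NUM_THREADS": "24", "KMP_BLOCKTIME": "30"},
    ]
    best, timings = sweep(cmd, candidates)
    print("fastest setting:", best)
```

              Each candidate runs in a fresh process, so the variables are already in place when the OpenMP runtime starts. A single timing per setting can be noisy on a shared node, so repeating each run a few times is advisable.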

               

              Regards,
              Dona

              • 4. Re: walltime limit
                Intel Corporation
                This message was posted on behalf of Intel Corporation

                Hi Afshin,
                I hope the above information answers your query. I am closing the case for now. Please let us know if you have any further queries.

                Regards,
                Dona

                • 5. Re: walltime limit
                  afshin67

                  Hi Dona,

                   

                  Thanks for letting me know. I am testing the proposed settings to see if they work. I'll update the ticket later with the improvement obtained by changing the settings.

                   

                  Best,

                  Afshin    

                  • 6. Re: walltime limit
                    afshin67

                    Hi Dona,

                    It worked well and decreased the walltime to ~20 hours.    

                    Thanks.