10 Replies Latest reply on Feb 15, 2018 10:42 PM by Intel Corporation

    Problem in operating the cluster

    imAArora

      I am having issues operating the cluster.

      I was working with Keras on the TensorFlow as well as the Theano backend on the cluster. I have already set up a virtualenv with TensorFlow optimized for Intel, and I already know how qstat and qsub work. I am still unable to run a simple CNN within seconds: it takes an hour for a single epoch. I am working on the MNIST dataset.

      I have already configured the cluster and my code runs without errors. After going through a lot of forums, I learned that we need optimized versions of the libraries for them to run fast enough on the cluster. I have installed all the libraries using the -c intel flag, but I am unable to find one for Keras. It would be really nice to know whether there is a certain way to install it.

      It would be really helpful if anybody could point out what I am doing wrong.

      Thank you

        • 1. Re: Problem in operating the cluster
          Intel Corporation
          This message was posted on behalf of Intel Corporation

          Thank you for reaching out to Intel AI Academy.

          We are looking into it and will get back to you soon with updates. Meanwhile, could you please set the following parameters:

          os.environ["OMP_NUM_THREADS"]
          os.environ["KMP_BLOCKTIME"]
          os.environ["KMP_SETTINGS"]
          os.environ["KMP_AFFINITY"] = "granularity=fine,1,0"

          and confirm if you observe any improvements in speed.

          Regards,
          Astha
           

          • 2. Re: Problem in operating the cluster
            imAArora

            I have tried changing these parameters in my application.py file as

             

            import os
            os.environ["OMP_NUM_THREADS"] = '1'
            os.environ["KMP_BLOCKTIME"] = 'active'
            os.environ["KMP_SETTINGS"] = '1'
            os.environ["KMP_AFFINITY"] = "granularity=fine,1,0"

            However, using these values increases the runtime by 10x or more, so a 3 sec problem now takes 400 sec.

            Could you tell me what values I should pass?

            Thank you

            • 3. Re: Problem in operating the cluster
              Intel Corporation
              This message was posted on behalf of Intel Corporation

              Could you try setting only the following two parameters, with the values provided? If this doesn't improve the performance, try changing OMP_NUM_THREADS to a value in the range of 30-40.
              os.environ["OMP_NUM_THREADS"]='40'
              os.environ["KMP_AFFINITY"] = "granularity=fine,1,0"
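The suggestion to try OMP_NUM_THREADS values in the 30-40 range can be checked systematically rather than by editing the script each time. The following is only a minimal sketch under some assumptions: the helper names `time_command` and `sweep_thread_counts` are hypothetical, and the workload shown is a placeholder command, not the actual model script. Setting the variable in the environment of a child process also means it is in place before TensorFlow is loaded.

```python
import os
import subprocess
import sys
import time


def time_command(command, omp_num_threads):
    """Run `command` with OMP_NUM_THREADS set and return elapsed seconds."""
    env = dict(os.environ)
    env["OMP_NUM_THREADS"] = str(omp_num_threads)
    # KMP_AFFINITY value copied from the reply above.
    env["KMP_AFFINITY"] = "granularity=fine,1,0"
    start = time.time()
    subprocess.check_call(command, env=env)
    return time.time() - start


def sweep_thread_counts(command, thread_counts):
    """Time `command` once per thread count and return {count: seconds}."""
    return {n: time_command(command, n) for n in thread_counts}


if __name__ == "__main__":
    # Placeholder workload (assumption); replace it with the real script,
    # for example [sys.executable, "MyModel.py"].
    workload = [sys.executable, "-c", "print('training stand-in')"]
    timings = sweep_thread_counts(workload, [30, 35, 40])
    for count in sorted(timings):
        print("OMP_NUM_THREADS=%d: %.2f s" % (count, timings[count]))
```

A single run per value is noisy, so repeating each measurement a few times before comparing would give a more reliable picture.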


              Regards,

              Astha

              • 4. Re: Problem in operating the cluster
                imAArora

                Thank you

                But, the problem is still not resolved.

                Changing the value between 30 and 40 seems to have no effect on the runtime. It still takes 40 sec, compared with 3 sec on a local system.

                 

                My shell file is:

                #PBS -l nodes=1:knl
                echo "hello-parallel starts"
                source activate tensorflow-27
                echo "using tensorflow-27"
                python2 MyModel.py
                echo "hello-parallel ends"

                 

                • 5. Re: Problem in operating the cluster
                  Intel Corporation
                  This message was posted on behalf of Intel Corporation

                  Could you provide the scripts that you are running? We will try to reproduce the issue at our end, analyse it, and get back to you.

                  Regards,

                  Astha

                  • 6. Re: Problem in operating the cluster
                    imAArora

                    I am running the sample cifar10_CNN.py present in opt/keras/examples.

                    It takes 200 sec on my local system and 500 sec on the cluster.

                    I have made a virtualenv and installed all dependencies using the -c intel flag. I'm using:

                    keras                     2.0.5                    py27_0

                    tensorflow                1.3.1               np113py27_1    intel

                    Python 2.7.14 :: Intel Corporation

                    • 7. Re: Problem in operating the cluster
                      Intel Corporation
                      This message was posted on behalf of Intel Corporation

                      Try the following optimization tips in the cifar10_CNN.py file:
                      1. Set the inter-op and intra-op thread counts for the TensorFlow session that Keras uses. The following code can be used for that:
                      from keras import backend as K
                      import tensorflow as tf
                      config = tf.ConfigProto(intra_op_parallelism_threads=64, inter_op_parallelism_threads=2, allow_soft_placement=True, device_count={'CPU': 64})
                      session = tf.Session(config=config)
                      K.set_session(session)

                       

                      2. Set the OpenMP* environment variables (OMP_) and extensions (KMP_):
                      import os
                      os.environ["OMP_NUM_THREADS"] = "64"
                      os.environ["KMP_BLOCKTIME"] = "30"
                      os.environ["KMP_SETTINGS"] = "1"
                      os.environ["KMP_AFFINITY"] = "granularity=fine,verbose,compact,1,0"

                       

                      This should give you better performance than what you've observed.
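One detail that may matter when applying step 2, offered as an assumption rather than something stated in this thread: the OpenMP runtime typically reads these variables when it is first loaded, so it is safest to set them before `import tensorflow` or `import keras` runs. A minimal sketch that wraps the values from step 2 in a helper (the name `apply_openmp_settings` is hypothetical):

```python
import os


def apply_openmp_settings(num_threads=64, blocktime_ms=30):
    """Export the OpenMP/KMP variables from step 2 and return them."""
    settings = {
        "OMP_NUM_THREADS": str(num_threads),
        "KMP_BLOCKTIME": str(blocktime_ms),
        "KMP_SETTINGS": "1",
        "KMP_AFFINITY": "granularity=fine,verbose,compact,1,0",
    }
    os.environ.update(settings)
    return settings


# Call this at the top of the script, before importing tensorflow or keras.
applied = apply_openmp_settings()
for name in sorted(applied):
    print("%s=%s" % (name, applied[name]))
```

Keeping the values in one function makes it easier to try other thread counts without editing several lines.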

                       

                      For much better performance, try the DevCloud with Skylake Xeon processors.

                       

                      Regards,

                       

                      Astha

                      • 8. Re: Problem in operating the cluster
                        imAArora

                        Thanks a lot

                        This makes the code run at the same speed as that on a local system.

                        I wanted to know whether there are any other specific lines that could be added to make the code execute even faster. Also, these lines have no effect when used with other projects, so they do not seem to work for every kind of code. I have added these lines to

                        GitHub - qiuqiangkong/DCASE2016_Task1: DCASE2016 TASK1 Scene Classification

                        The execution time is almost the same as without these lines, which is 10x that of the local system.

                        It would be nice if you could help me with this as well.

                        Thank you

                        • 9. Re: Problem in operating the cluster
                          Intel Corporation
                          This message was posted on behalf of Intel Corporation

                          Hi,

                           

                          Please consider the below mentioned points for performance improvement in general -

                           

                          1. We suggest that you move to the DevCloud with Skylake Xeon processors. We tested the cifar10_CNN.py example there and observed a 4x reduction in execution time compared with what you've observed.

                           

                          2. You can experiment with the configuration parameters provided for the optimization. All of the parameters can be modified, but in particular try different values for intra_op_parallelism_threads and device_count.

                           

                          3. These optimizations are code dependent. Some programs are implicitly single threaded, and setting a greater number for, say, intra_op_parallelism_threads would not benefit them much. The time lag could also be due to an I/O bottleneck. If you observe a time lag, the best approach is to use a profiling tool to determine where the code spends the most time and then modify your code accordingly.
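As a rough illustration of the profiling suggestion in point 3, the sketch below uses Python's built-in cProfile module. It is written for Python 3, and the functions `load_batch`, `train_step`, and `run_epoch` are hypothetical stand-ins for a real training script, with a sleep simulating an I/O wait.

```python
import cProfile
import io
import pstats
import time


def load_batch():
    """Stand-in for data loading (hypothetical; simulates an I/O wait)."""
    time.sleep(0.05)
    return list(range(1000))


def train_step(batch):
    """Stand-in for the compute part of one training step (hypothetical)."""
    return sum(x * x for x in batch)


def run_epoch(steps=5):
    """Run a few load-and-compute iterations."""
    total = 0
    for _ in range(steps):
        total += train_step(load_batch())
    return total


def profile_report(func, top=10):
    """Profile `func` and return the `top` entries sorted by cumulative time."""
    profiler = cProfile.Profile()
    profiler.enable()
    func()
    profiler.disable()
    stream = io.StringIO()
    stats = pstats.Stats(profiler, stream=stream)
    stats.sort_stats("cumulative").print_stats(top)
    return stream.getvalue()


if __name__ == "__main__":
    print(profile_report(run_epoch))
```

If most of the cumulative time lands in the data-loading function rather than the compute step, more threads are unlikely to help. An existing script can also be profiled without changes by running `python -m cProfile -s cumulative MyModel.py`.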

                           

                          Thanks & Regards,
                          Astha

                          • 10. Re: Problem in operating the cluster
                            Intel Corporation
                            This message was posted on behalf of Intel Corporation

                            Hi,

                            We hope the above suggestions were helpful. Can we go ahead and close the case?