9 Replies Latest reply on Feb 11, 2018 10:48 PM by Intel Corporation

    How to install Intel Optimized Chainer on Intel AI Dev Cloud?

    jxpxxzj

      https://github.com/intel/chainer

       

      Following its installation guide, I cloned the source into my user directory:

       

      [u9851@c009 ~]$ git clone https://github.com/intel/chainer.git

      [u9851@c009 ~]$ cd chainer

      [u9851@c009 chainer]$ python setup.py install

      running install

      Intel mkl-dnn preparing ...

      Intel mkl-dnn prepared !

      running build_ext

      building 'ideep4py._ideep4py' extension

      swigging ideep4py/py/ideep4py.i to ideep4py/py/ideep4py_wrap.cpp

      swig -python -c++ -builtin -modern -modernargs -Iideep4py/py/mm -Iideep4py/py/primitives -Iideep4py/py/swig_utils -Iideep4py/include/primitives/ -Iideep4py/include/mm/ -o ideep4py/py/ideep4py_wrap.cpp ideep4py/py/ideep4py.i

      unable to execute 'swig': No such file or directory

      error: command 'swig' failed with exit status 1
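      A quick way to confirm this diagnosis is to check whether a swig executable is visible on PATH. The sketch below is a minimal illustration; find_tool is a hypothetical helper and not part of the Chainer build.

```python
import shutil


def find_tool(name):
    """Return the full path of an executable found on PATH, or None if absent."""
    return shutil.which(name)


if __name__ == "__main__":
    path = find_tool("swig")
    if path is None:
        print("swig not found on PATH; the ideep4py build step cannot run it")
    else:
        print("swig found at", path)
```

      If this reports that swig is not found, the build fails at the "swigging" step exactly as in the log above.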

       

      I think this means swig is missing, so I tried to install setuptools:

       

      [u9851@c009 chainer]$ pip install setuptools --user

      Requirement already satisfied: setuptools in /glob/intel-python/versions/2018u1/intelpython3/lib/python3.6/site-packages

      [u9851@c009 chainer]$ pip install -U setuptools

      Collecting setuptools

        Using cached setuptools-38.4.0-py2.py3-none-any.whl

      Installing collected packages: setuptools

        Found existing installation: setuptools 27.2.0

          Uninstalling setuptools-27.2.0:

      Exception:

      Traceback (most recent call last):

        File "/glob/intel-python/versions/2018u1/intelpython3/lib/python3.6/shutil.py", line 544, in move

          os.rename(src, real_dst)

      OSError: [Errno 18] Invalid cross-device link: '/glob/intel-python/versions/2018u1/intelpython3/bin/easy_install' -> '/tmp/pip-31za3ler-uninstall/glob/intel-python/versions/2018u1/intelpython3/bin/easy_install'

      During handling of the above exception, another exception occurred:

      Traceback (most recent call last):

        File "/glob/intel-python/versions/2018u1/intelpython3/lib/python3.6/site-packages/pip/basecommand.py", line 215, in main

          status = self.run(options, args)

        File "/glob/intel-python/versions/2018u1/intelpython3/lib/python3.6/site-packages/pip/commands/install.py", line 342, in run

          prefix=options.prefix_path,

        File "/glob/intel-python/versions/2018u1/intelpython3/lib/python3.6/site-packages/pip/req/req_set.py", line 778, in install

          requirement.uninstall(auto_confirm=True)

        File "/glob/intel-python/versions/2018u1/intelpython3/lib/python3.6/site-packages/pip/req/req_install.py", line 754, in uninstall

          paths_to_remove.remove(auto_confirm)

        File "/glob/intel-python/versions/2018u1/intelpython3/lib/python3.6/site-packages/pip/req/req_uninstall.py", line 115, in remove

          renames(path, new_path)

        File "/glob/intel-python/versions/2018u1/intelpython3/lib/python3.6/site-packages/pip/utils/__init__.py", line 267, in renames

          shutil.move(old, new)

        File "/glob/intel-python/versions/2018u1/intelpython3/lib/python3.6/shutil.py", line 559, in move

          os.unlink(src)

      OSError: [Errno 30] Read-only file system: '/glob/intel-python/versions/2018u1/intelpython3/bin/easy_install'
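      The second traceback shows pip trying to remove files from the shared Intel Python tree under /glob, which the log reports as a read-only file system. A minimal sketch for checking which site-packages directories the current user can write to follows; is_writable is a hypothetical helper, and the printed paths depend on the environment.

```python
import os
import site
import sysconfig


def is_writable(path):
    """Return True if the current user may write to path (False if it is missing)."""
    return os.access(path, os.W_OK)


# Directory used by the interpreter's own installation.
system_site = sysconfig.get_paths()["purelib"]
# Per-user directory targeted by "pip install --user".
user_site = site.getusersitepackages()

print("system site-packages:", system_site, "writable:", is_writable(system_site))
print("user site-packages:  ", user_site)
```

      If the system directory is reported as not writable, an upgrade that has to uninstall files from it can be expected to fail in the way shown above.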

       

      I also tried to install swig from source, but that failed:

      [u9851@c009 swig]$ ./configure.ac

      -bash: ./configure.ac: Permission denied

       

      So what is the right way to install intel/chainer on the Intel AI Dev Cloud?

      It really needs swig, so is it possible to install swig on that machine?

      And where should I install intel/chainer: on the login node, or on a compute node via qsub?

       

      In addition, intel/chainer was successfully installed on my own computer, but when I import it in Python, it throws an exception:

      >>> import chainer

      terminate called after throwing an instance of 'mkldnn::error'

        what():  std::exception

      Aborted

      It may be caused by Intel MKL-DNN, but I have no idea how to solve it.

        • 1. Re: How to install Intel Optimized Chainer on Intel AI Dev Cloud?
          Intel Corporation
          This message was posted on behalf of Intel Corporation

          Hi, I am looking into this and will get back to you.

          • 2. Re: How to install Intel Optimized Chainer on Intel AI Dev Cloud?
            Intel Corporation

            Hi,
            Please follow the steps below to install Chainer:
            1) Create a conda environment:
                conda create --name chainer_env python=3.6
            2) Activate the conda environment:
                source activate chainer_env
            3) git clone -b master_v3 https://github.com/intel/chainer
            4) cd chainer
            5) pip install setuptools
            6) pip install numpy
            7) conda install -c intel swig
            8) python setup.py build
            9) python setup.py install
            10) pip install six
            11) pip install filelock
            Now try importing chainer. If you still get the mkl-dnn error, install mkl-dnn as follows:
                conda install -c intel mkl-dnn
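            After steps like these, one way to check the result is to try importing each package and report any failures. This is a sketch only: check_imports is a hypothetical helper, and the module list is taken from the steps above. It catches an ordinary ImportError, but a native abort such as the mkldnn::error shown earlier terminates the interpreter and would not be caught.

```python
import importlib


def check_imports(names):
    """Map each module name to None on success, or to the error text on failure."""
    results = {}
    for name in names:
        try:
            importlib.import_module(name)
            results[name] = None
        except ImportError as exc:
            results[name] = str(exc)
    return results


if __name__ == "__main__":
    # Module names taken from the installation steps; adjust as needed.
    for name, error in check_imports(["numpy", "six", "filelock", "chainer"]).items():
        print(name, "OK" if error is None else "FAILED: " + error)
```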

            • 3. Re: How to install Intel Optimized Chainer on Intel AI Dev Cloud?
              jxpxxzj

              Thanks for your solution, but I got a failure when running the first step:

              [u9851@c009 ~]$ conda create --name chainer_env python=3.6

              Solving environment: failed

              libgcc_s.so.1 must be installed for pthread_cancel to work

              Aborted

               

              Before that, I had successfully installed swig with these commands:

              ./configure  --prefix=/home/u9851

              make

              make install
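              With --prefix=/home/u9851, make install conventionally places the swig executable under /home/u9851/bin, so a later build only finds it if that directory is on PATH. A minimal sketch of that check, assuming the conventional bin layout; prefix_bin_on_path is a hypothetical helper:

```python
import os


def prefix_bin_on_path(prefix, path_value):
    """Return True if <prefix>/bin is one of the entries in a PATH-style string."""
    bin_dir = os.path.normpath(os.path.join(prefix, "bin"))
    entries = [os.path.normpath(p) for p in path_value.split(os.pathsep) if p]
    return bin_dir in entries


if __name__ == "__main__":
    # The prefix is the one used in the configure command above.
    print(prefix_bin_on_path("/home/u9851", os.environ.get("PATH", "")))
```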

              Then I installed chainer on the login node by following its instructions, using python setup.py install.

              However, neither a qsub job nor Jupyter Notebook could import chainer successfully:

              Traceback (most recent call last):

                File "./train_mnist.py", line 8, in <module>

                  import chainer.functions as F

              ModuleNotFoundError: No module named 'chainer.functions'

               

              Did these installation commands affect the conda environment?

              • 4. Re: How to install Intel Optimized Chainer on Intel AI Dev Cloud?
                Intel Corporation

                Hi,
                Can you please run the following command and share the output?
                 find /usr/ -name libgcc_s*

                • 5. Re: How to install Intel Optimized Chainer on Intel AI Dev Cloud?
                  Intel Corporation

                  Hi,
                  May I know where you are running your code from?
                  I am able to import chainer.functions.
                  Please see the output below; I am running it from the chainer directory:
                  (chainer_noon) [u5656@c009 ~]$ cd chainer
                  (chainer_noon) [u5656@c009 chainer]$ python
                  Python 3.6.3 |Intel Corporation| (default, Oct 16 2017, 15:28:36)
                  [GCC 4.8.2 20140120 (Red Hat 4.8.2-15)] on linux
                  Type "help", "copyright", "credits" or "license" for more information.
                  Intel(R) Distribution for Python is brought to you by Intel Corporation.
                  Please check out: https://software.intel.com/en-us/python-distribution
                  >>> import chainer.functions as F
                  >>>
                   

                  • 6. Re: How to install Intel Optimized Chainer on Intel AI Dev Cloud?
                    jxpxxzj

                    Here's the output:

                    [u9851@c009 ~]$  find /usr/ -name libgcc_s*

                    find: ‘/usr/lib/firewalld’: Permission denied

                    /usr/lib/gcc/x86_64-redhat-linux/4.8.2/32/libgcc_s.so

                    /usr/lib/gcc/x86_64-redhat-linux/4.8.2/libgcc_s.so

                    /usr/lib64/libgcc_s-4.8.5-20150702.so.1

                    /usr/lib64/libgcc_s.so.1

                    find: ‘/usr/share/polkit-1/rules.d’: Permission denied

                    find: ‘/usr/libexec/initscripts/legacy-actions/auditd’: Permission denied

                    • 7. Re: How to install Intel Optimized Chainer on Intel AI Dev Cloud?
                      jxpxxzj

                      Thank you for your reply. I tried it in two ways:

                       

                      First, I ran it on the login node, which didn't throw an exception:

                      [u9851@c009 ~]$ cd chainer

                      [u9851@c009 chainer]$ python

                      Python 3.6.3 |Intel Corporation| (default, Oct 16 2017, 15:28:36)

                      [GCC 4.8.2 20140120 (Red Hat 4.8.2-15)] on linux

                      Type "help", "copyright", "credits" or "license" for more information.

                      Intel(R) Distribution for Python is brought to you by Intel Corporation.

                      Please check out: https://software.intel.com/en-us/python-distribution

                      >>> import chainer.functions as F

                      >>>

                       

                      Then I created a .py file and a .sh script and submitted them with qsub:

                       

                      Here's the Python file, test_chainer.py:

                      import chainer.functions as F

                      print(dir(F))

                       

                      And the shell script, test.sh:

                      cd chainer

                      python test_chainer.py

                       

                      I got the correct output, which lists the functions in chainer.functions:

                      ['AbsoluteError',  ... ] # It is too long to include here, but it is the correct output

                       

                      Finally, I ran its MNIST example, and everything seems to work fine:

                      [u9851@c009 ~]$ cat test.sh.o38633

                       

                      ########################################################################

                      # Colfax Cluster - https://colfaxresearch.com/

                      #      Date:           Tue Jan 30 02:43:34 PST 2018

                      #    Job ID:           38633.c009

                      #      User:           u9851

                      # Resources:           neednodes=1:ppn=2,nodes=1:ppn=2,walltime=06:00:00

                      ########################################################################

                       

                      GPU: -1

                      # unit: 1000

                      # Minibatch-size: 100

                      # epoch: 20

                       

                      epoch       main/loss   validation/main/loss  main/accuracy  validation/main/accuracy  elapsed_time

                      1           0.193224    0.140609              0.94215        0.9526                    7.7509

                      ......

                      20          0.00932911  0.106015              0.997183       0.9826                    137.234

                       

                      ########################################################################

                      # Colfax Cluster

                      # End of output for job 38633.c009

                      # Date: Tue Jan 30 02:45:58 PST 2018

                      ########################################################################

                       

                      But if I put the .py file outside the chainer source directory, I get a ModuleNotFoundError:

                      Traceback (most recent call last):

                        File "test_chainer_outside.py", line 1, in <module>

                          import chainer.functions as F

                      ModuleNotFoundError: No module named 'chainer.functions'

                      This happens on both the login node and the compute node.
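                      When an import behaves differently depending on the working directory, it helps to see where Python would load the top-level package from. The sketch below uses importlib.util.find_spec; describe_module is a hypothetical helper. One possible explanation, which the thread does not confirm, is that from the home directory the clone directory ~/chainer itself is picked up as a package without the real submodules, while inside the clone the actual chainer package directory is found.

```python
import importlib.util


def describe_module(name):
    """Report where the import system would load a top-level module from."""
    spec = importlib.util.find_spec(name)
    if spec is None:
        return {"found": False, "origin": None, "locations": []}
    # Packages expose the directories searched for their submodules.
    locations = list(spec.submodule_search_locations or [])
    return {"found": True, "origin": spec.origin, "locations": locations}


if __name__ == "__main__":
    # Run this once inside and once outside the source directory and compare.
    print(describe_module("chainer"))
```

                      If the reported locations point at the git clone rather than at a site-packages directory, the installed copy is not the one being imported.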

                       

                      Maybe that last problem can be fixed with a properly configured conda environment, but in any case chainer works, and I really appreciate the hints you gave me.

                       

                      In addition, I am a little curious about this simple benchmark result. The run time was similar to that of chainer/chainer, which I installed from pip:

                      epoch       main/loss   validation/main/loss  main/accuracy  validation/main/accuracy  elapsed_time

                      1           0.193959    0.0864066             0.941633       0.972                     7.18748

                      ......

                      20          0.010036    0.100875              0.99715        0.9834                    134.877
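                      Using the epoch-20 elapsed times quoted in this post (137.234 for intel/chainer and 134.877 for the pip-installed chainer), the gap can be expressed as a simple relative difference. A sketch; relative_difference is a hypothetical helper:

```python
def relative_difference(value, baseline):
    """Return (value - baseline) / baseline as a fraction of the baseline."""
    return (value - baseline) / baseline


# Elapsed times at epoch 20, copied from the two benchmark tables in this post.
intel_chainer = 137.234
pip_chainer = 134.877

print(f"{relative_difference(intel_chainer, pip_chainer):.2%}")
```

                      The two runs differ by less than 2 percent of the total run time.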

                       

                      So is Intel AI Dev Cloud suitable for Intel Optimized Chainer, or is it a problem with intel/mkl-dnn?

                      • 8. Re: How to install Intel Optimized Chainer on Intel AI Dev Cloud?
                        Intel Corporation

                        Hi,
                        Since that is a new topic, please start a new thread.
                        As this issue is resolved, can I close this ticket?