9 Replies Latest reply on Apr 25, 2017 12:00 AM by Intel Corporation

    Intel Omni-Path implementation in an opensource HPC stack

    Oxedions

      Hi,

       

      I am currently building an open source HPC stack: oxedions/banquise on GitHub, an HPC stack based on Salt.

      This stack is made to deploy and maintain an HPC cluster, but it can also be used as a base for other tasks.

       

      However, I lack the interconnect hardware needed to create networks other than basic Ethernet. I would like to add Intel Omni-Path compatibility to the stack.

      Is there specific documentation describing how to set up the Intel interconnect on RHEL 7, and also on Debian/SLES? Also, is there a specific place to download the packages, which I suppose contain libraries, kernel modules, and tools?

       

      With my best regards

       

      Ox

        • 1. Re: Intel Omni-Path implementation in an opensource HPC stack
          Intel Corporation
          This message was posted on behalf of Intel Corporation

          Hello Ox,
           
          Regarding your question, “I would like to add the Intel OmniPath compatibility in the stack. And documentation describing how to setup Intel interconnect on RHEL 7, and also Debian/SLES?”:
           
          Here is some information I found: http://www.intel.com/content/dam/support/us/en/documents/network-and-i-o/fabric-products/Intel_OP_Fabric_Software_IG_H76467_v5_0.pdf. You could start on page 21.
           
          I am going to research this further and will get back to you as soon as possible.
           
          If there is anything else we can help with, please feel free to ask.

          Best regards,
           
          Henry A.

          • 2. Re: Intel Omni-Path implementation in an opensource HPC stack
            Intel Corporation
            This message was posted on behalf of Intel Corporation

            Hello Ox Oxedions,
             
            Please help us by providing more information so that we can better assist you with your question:
             
            1. Are you trying to build an Intel OPA cluster?
             
            2. Are you looking for documents on how to set up Intel’s OPA fabric?
             
            3. Are you looking for OPA software?
             
            4. Please let us know what you mean by “Interconnect”.
             
            5. Please provide us with more detailed information.
             
            We look forward to your response.

            Best regards
             

            Sergio S.

            • 3. Re: Intel Omni-Path implementation in an opensource HPC stack
              Oxedions

              Hi Henry, Hi Sergio,

               

              Thank you very much for your answers.

               

              Henry: This is exactly what I was looking for. I haven't had the time to look deeper, but it seems very similar to the Mellanox OFED setup. I just have to extract what I need from the CLI install script (the RPMs installed and maybe other things done by this script).

               

              I just fail to understand where the subnet manager (if needed) should be running, and whether opensm can be used as the subnet manager. Maybe Intel Omni-Path assumes the fabric switches are always in charge of this task, and that no SM should be running on a Linux server. It must be described somewhere in the PDF; I will find it.

               

              Sergio:

              1. Are you trying to build an Intel OPA cluster?

              No, I don't have this hardware at my disposal. However, Intel OPA clusters are now common in HPC, and this is why I want to add OPA support to my open source tool (see part 5).

               

              2. Are you looking for documents on how to set up Intel’s OPA fabric?

              Yes.

               

              3. Are you looking for OPA software?

              Yes. I found the download procedure in the document provided by Henry, and I will get the software that way.

               

              4. Please let us know what you mean by “Interconnect”.

              Of course. In common HPC clusters, there are two kinds of networks: an Ethernet network for administration purposes, and an interconnect. Interconnects are used for parallel computations (data exchanges between processes) and for I/O (parallel file system or NFS), because they provide very low latency and high bandwidth. Well-known interconnects are QLogic (now Intel) InfiniBand, Mellanox InfiniBand, Cray Aries and Intel Omni-Path.

              Another difference from standard Ethernet is that these interconnect networks rely on a subnet manager, which scans the network to build a map with the shortest path to all nodes and to discover new elements. Most of the time, interconnect networks use a specific topology to provide better performance (ftree, hypercube, alltoall, etc.).
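
              To make the idea of a "map with the shortest path to all nodes" concrete, here is a minimal sketch in Python. It is only an illustration: the toy topology and all device names are made up, and a real subnet manager uses far more elaborate routing engines than a plain breadth-first search over hop counts.

```python
from collections import deque

# Hypothetical toy fabric: one spine switch, two leaf switches, four nodes.
# Every name here is invented for illustration.
FABRIC = {
    "spine1": ["leaf1", "leaf2"],
    "leaf1": ["spine1", "node1", "node2"],
    "leaf2": ["spine1", "node3", "node4"],
    "node1": ["leaf1"],
    "node2": ["leaf1"],
    "node3": ["leaf2"],
    "node4": ["leaf2"],
}


def shortest_paths(fabric, source):
    """Breadth-first search from `source`; returns {destination: path}."""
    paths = {source: [source]}
    queue = deque([source])
    while queue:
        current = queue.popleft()
        for neighbour in fabric[current]:
            if neighbour not in paths:  # first visit is the fewest hops
                paths[neighbour] = paths[current] + [neighbour]
                queue.append(neighbour)
    return paths


# Print the hop-by-hop route from node1 to every other element.
for destination, path in sorted(shortest_paths(FABRIC, "node1").items()):
    print(destination, "via", " -> ".join(path))
```

              Adding a new entry to the dictionary and running the search again mimics, very roughly, the discovery of a new element followed by a recomputation of the routes.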

               

              5. Please provide us with more detailed information.

              I am working in my free time on an open source project called Banquise. It is the result of what I think the next-generation stack for HPC should be: no scripts, one tool, and the ability to replay the "apply" indefinitely to check that everything is OK or to update things.

              The aim of this project is to allow small universities and companies to easily deploy and maintain an HPC cluster, so I need to provide a simple way to set up many things, and interconnects in particular, if I want my project to succeed. I don't have the hardware, so everything I develop for specific hardware is done "virtually" and will be tested later.

               

              My aim here is to do something similar to this for Intel Omnipath (assuming it works like Mellanox IB):

              - a general "client side" state that installs the basic RPMs, starts the needed services and loads the needed kernel modules, then sets up IPoIB.

              - a specific "server side" state that installs what is needed on the management server (subnet manager, monitoring tools, etc.).

              - some probes for monitoring (like the perfquery tool for InfiniBand) to be added to Shinken.
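
              As an illustration of the monitoring probe in the last item, here is a hedged sketch of a Nagios-style check in Python (Shinken runs Nagios-compatible plugins). The sample text is an assumed "name:....value" counter format in the style of perfquery, not output captured from real hardware, and the thresholds are arbitrary. The equivalent tool and output format for Omni-Path would have to be taken from the Intel documentation.

```python
import re

# Nagios-style plugin exit codes.
OK, WARNING, CRITICAL, UNKNOWN = 0, 1, 2, 3

# Assumed sample of counter output; not captured from real hardware.
SAMPLE_OUTPUT = """\
SymbolErrorCounter:..............0
LinkDownedCounter:...............3
PortRcvErrors:...................12
"""

# A counter name, a colon, filler dots, then a decimal value.
COUNTER_LINE = re.compile(r"^(\w+):\.*(\d+)$")


def parse_counters(text):
    """Extract {counter_name: value} from 'name:....value' lines."""
    counters = {}
    for line in text.splitlines():
        match = COUNTER_LINE.match(line.strip())
        if match:
            counters[match.group(1)] = int(match.group(2))
    return counters


def check_counters(counters, warning=1, critical=10):
    """Return (exit_code, message) based on the highest counter value."""
    if not counters:
        return UNKNOWN, "UNKNOWN - no counters found"
    worst = max(counters.values())
    details = ", ".join(f"{name}={value}" for name, value in sorted(counters.items()))
    if worst >= critical:
        return CRITICAL, "CRITICAL - " + details
    if worst >= warning:
        return WARNING, "WARNING - " + details
    return OK, "OK - " + details


# A real plugin would print the message and exit with the returned code.
code, message = check_counters(parse_counters(SAMPLE_OUTPUT))
print(code, message)
```

              In a real probe, the sample text would be replaced by the output of the fabric's own counter tool, run on the node or from the management server.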

              I do not plan to provide a way to configure switches; this is the task of the Intel tools. I simply assume that the switches are already up and configured, or that the models used are passive.
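
              The split between a "client side" and a "server side" state described above could be sketched as follows. This is only a rough outline in Python that builds Salt-style state data (using the `pkg.installed`, `kmod.present` and `service.running` state functions) and prints it as JSON. Every package, module and service name in it is an assumption made for illustration and must be checked against the Intel software installation documentation; the IPoIB interface configuration is left out.

```python
import json

# All names below are assumptions for illustration only; the real lists
# must come from the Intel Omni-Path software installation documentation.
CLIENT_PACKAGES = ["opa-basic-tools"]   # assumed package name
CLIENT_MODULES = ["hfi1", "ib_ipoib"]   # assumed host driver and IPoIB modules
SERVER_PACKAGES = ["opa-fm"]            # assumed fabric manager package
SERVER_SERVICES = ["opafm"]             # assumed subnet manager service


def build_states(role):
    """Return Salt-style state data for a 'client' or 'server' node."""
    if role not in ("client", "server"):
        raise ValueError(f"unknown role: {role}")
    # Every node gets the client part: packages and kernel modules.
    states = {
        "opa_client_packages": {"pkg.installed": [{"pkgs": CLIENT_PACKAGES}]},
    }
    for module in CLIENT_MODULES:
        states[f"opa_module_{module}"] = {"kmod.present": [{"name": module}]}
    # The management server additionally runs the subnet manager.
    if role == "server":
        states["opa_server_packages"] = {"pkg.installed": [{"pkgs": SERVER_PACKAGES}]}
        for service in SERVER_SERVICES:
            states[f"opa_service_{service}"] = {
                "service.running": [{"name": service}, {"enable": True}]
            }
    return states


print(json.dumps(build_states("server"), indent=2))
```

              Because the states only declare what should be present, they can be re-applied as often as needed, which matches the "replay the apply" goal of the tool.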

               

              I hope this information will help you understand what my goal is and what I am looking for.

               

              With my best regards

               

              Ox

              • 4. Re: Intel Omni-Path implementation in an opensource HPC stack
                Intel Corporation
                This message was posted on behalf of Intel Corporation

                Hello Ox Oxedions;

                 

                Thank you for taking the time to provide the additional information. We are going to look into it and will let you know as soon as we have a response for you.

                 

                Thank you for your patience on this matter.

                 

                Regards.

                 

                Sergio S.

                • 5. Re: Intel Omni-Path implementation in an opensource HPC stack
                  Oxedions

                  Hi Sergio,

                   

                  Thank you very much.

                   

                  I am the one asking, so I will be patient.

                  And I still have a few things to code before I can add the interconnect states :-)

                   

                  With my best regards

                   

                  Ox

                  • 6. Re: Intel Omni-Path implementation in an opensource HPC stack
                    Intel Corporation
                    This message was posted on behalf of Intel Corporation

                    Hello Ox,

                    Thanks for the update.

                    We are still working on your case, and as soon as we have an update we will reply to you.

                    Best regards,
                    Caesar B.
                     

                    • 7. Re: Intel Omni-Path implementation in an opensource HPC stack
                      Intel Corporation
                      This message was posted on behalf of Intel Corporation

                      Hello OX,

                      For documents on how to set up Intel's Omni-Path Fabric, a good starting point is the “Intel Omni-Path Fabric Staging Guide” (10.2 and 10.3 releases; the name may change to something like “Setup Guide” in the 10.4 release).

                      For switch hardware setup, detailed instructions are documented in the “Intel® Omni-Path Fabric Switches Hardware Installation Guide”. Setup of the host interface card is documented in the “Intel® Omni-Path Host Fabric Interface Installation Guide”.

                      For software setup, detailed instructions are documented in the “Intel® Omni-Path Fabric Software Installation Guide”. All documents are available for public download and are updated with each OPA software release (http://www.intel.com/content/www/us/en/support/network-and-i-o/fabric-products/000016242.html).

                      Intel calls these interconnect networks, which rely on a subnet manager that scans the network, builds a map of the shortest paths to all nodes, and discovers new elements, a “Fabric”. They use specific topologies (ftree, hypercube, alltoall, etc.) to provide better performance.

                      Please note that other documents are also published with each Omni-Path (OPA) software release. Some examples are the “Intel® Performance Scaled Messaging 2 (PSM2) Programmer’s Guide” and the “Intel® Omni-Path Fabric Performance Tuning User Guide”.

                      Please let me know if you have any more questions.

                      Best regards,
                      Caesar B.


                      • 8. Re: Intel Omni-Path implementation in an opensource HPC stack
                        Oxedions

                        Hi Caesar,

                         

                        My apologies for the delay in answering.

                         

                        Using this, I will add experimental support for Intel OPA in the next release of my tool, and then find someone with the right hardware to test it.

                        Because I am focusing on the main engine to enable multiple node groups (needed for monitoring, but it will also be useful for other software) and multiple networks, it may not be released before this summer.

                         

                        Thank you very much for all of these files and links!

                         

                        With my best regards

                         

                        Ox

                        • 9. Re: Intel Omni-Path implementation in an opensource HPC stack
                          Intel Corporation
                          This message was posted on behalf of Intel Corporation

                          You are welcome. Do not hesitate to contact us if you need further assistance.

                          Best Regards,

                          Steven V.
