9 Replies Latest reply on Jul 26, 2011 8:26 AM by wbrozas

    cannot ping cores sccKit 1.4.0 on new hardware

    wbrozas

      We received a new SCC unit because our old unit didn't have any usable EMAC ports. We are currently trying to install sccKit 1.4.0. EMAC ports A, B, and D are enabled according to the Rocky Lake board login screen, whose output is provided below:

       

        Copyright 2010 by Intel Corporation
        Intel Labs - Germany Microprocessor Lab

       

        Board Serial# 01095100007
        Usable GB ETH 1101
        Software:     1.10  Build: 1228  Oct 12 2010  18:18:01
        CPLD:         1.07
        HW-ID:        0x00
        POWR1220:     0xC0000001 (master), 0x40000001 (slave)
        DDR3 modules: Present: 0 1 2 3 4 5 6 7
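
For readers unsure how "Usable GB ETH 1101" maps to ports A, B, and D, here is a minimal sketch. It assumes the four digits are read left to right as ports A, B, C, D; that ordering is inferred from the statement above that A, B, and D are enabled, not from any board documentation. The function name `usable_emac_ports` is made up for illustration.

```python
def usable_emac_ports(mask, names="ABCD"):
    """Return the port letters whose digit is '1'.

    Assumption: digits are ordered left to right as A, B, C, D.
    """
    if len(mask) != len(names) or set(mask) - {"0", "1"}:
        raise ValueError(f"expected a {len(names)}-digit binary string, got {mask!r}")
    return [name for name, bit in zip(names, mask) if bit == "1"]


# The value shown on the login screen above
print(usable_emac_ports("1101"))  # ['A', 'B', 'D']
```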

       

      We have gone through the sccKit 1.4.0 instructions repeatedly using EMAC port A but have not been able to ping the cores (we have tried the 1.4.1.1 tarball as well). We can, however, initialize, reset, and boot the cores (sccBmc, sccReset, sccBoot), so we believe the PCI Express interface is working correctly. We have tried to ping the cores without DNS (e.g. ping 192.168.x.1) and have received no response.

       

      We get green connectivity lights on EMAC port A and flashing activity lights, but we are still unable to ping the cores. We are bypassing DNS and pinging the core IP addresses directly. Our /etc/network/interfaces and /opt/sccKit/systemSettings.ini are provided below. According to sccGui the cores are booting correctly, so we think it is some sort of networking problem. We are using a switch in the same manner as the instructions and have no trouble connecting to the BMC.

       

       

      /etc/network/interfaces

       

      auto lo
      iface lo inet loopback

       

      auto eth0
      iface eth0 inet static
      address 172.16.72.147
      netmask 255.255.255.192
      gateway 172.16.72.129

       

      auto eth1
      iface eth1 inet static
      address 192.168.26.254
      netmask 255.255.255.0

       

      auto eth1:1
      iface eth1:1 inet static
      address 192.168.2.128
      netmask 255.255.255.0

       

      /opt/sccKit/systemSettings.ini

       

      [General]
      CRBServer=192.168.2.127:5010
      memorySize=8
      platform=RockyLake
      maxTransId=64
      sccFirstMac=00:11:22:33:44:55
      sccHostIp=192.168.26.254
      sccFirstIp=192.168.26.1
      sccMacEnable=a
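
One thing that can be ruled out mechanically is an addressing mismatch between the two files. The sketch below uses Python's standard `ipaddress` module to check that `sccHostIp` and `sccFirstIp` fall inside the eth1 subnet, and that the `CRBServer` address falls inside the eth1:1 subnet. The helper name `same_subnet` is hypothetical, and the check only covers address consistency; it says nothing about whether the eMAC ports themselves are passing traffic.

```python
import ipaddress


def same_subnet(host_ip, netmask, *others):
    """Map each address in `others` to True if it lies in host_ip's subnet."""
    # strict=False lets us pass a host address rather than a network address
    net = ipaddress.ip_network(f"{host_ip}/{netmask}", strict=False)
    return {ip: ipaddress.ip_address(ip) in net for ip in others}


# eth1 versus sccHostIp and sccFirstIp from systemSettings.ini
print(same_subnet("192.168.26.254", "255.255.255.0",
                  "192.168.26.254", "192.168.26.1"))

# eth1:1 versus the CRBServer address (port stripped)
print(same_subnet("192.168.2.128", "255.255.255.0", "192.168.2.127"))
```

With the values posted above every entry comes back True, so the addresses themselves are consistent.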

       

      Is there anything obvious we are doing wrong? Is there anything we can do to better debug this problem?

       

      Thanks

        • 1. Re: cannot ping cores sccKit 1.4.0 on new hardware
          wbrozas

          I have redone the instructions using eMAC ports B and D. No activity lights come on, and it still does not work.

          • 2. Re: cannot ping cores sccKit 1.4.0 on new hardware
            tedk

            Unfortunately, we are experiencing some difficulty with 1.4.0 with eMAC enabled, but I think we are only days away from a fix. The problem is described in http://marcbug.scc-dc.com/bugzilla3/show_bug.cgi?id=264

             

            What I found worked with another group today (Brown Univ) was to install 1.4.0 (not 1.4.1 or 1.4.1.1) and disable the eMAC interface. The link to the bug has an attachment called How to disable eMAC and that should take you through the steps.

             

            When I installed 1.4.1.1 and disabled the eMAC interface, I saw the MCPC crash when I tried to boot SCC Linux. Brown saw this same problem with their system. But when they dropped back to 1.4.0 and disabled eMAC, they were once again able to work. The bug also has attached files in.rck.zone and ex.rck.zone.

             

            Please try 1.4.0 and eMAC disabled. If this does not work, please file a Bugzilla bug.

            • 3. Re: cannot ping cores sccKit 1.4.0 on new hardware
              wbrozas

              That's unfortunate, because on 1.3 my MPI program hangs (mpirun -np 1 works, though that is not too exciting), I believe due to too much input and output. Correct me if I'm wrong, but isn't the new routing of traffic over the eMAC ports supposed to fix that bug? What's new in 1.4 if the eMAC ports are disabled?

              • 4. Re: cannot ping cores sccKit 1.4.0 on new hardware
                tedk

                There's a doc that lists new 1.4.0 features on this site, but you are correct ... eMAC is the major new thing. However, I honestly believe that the fix is very close. It's at high priority.

                • 5. Re: cannot ping cores sccKit 1.4.0 on new hardware
                  wbrozas

                  I'm a little confused about which firmware should be flashed when the eMAC ports are disabled. What should be flashed, and what should my systemSettings.ini look like? Also, what does your /etc/network/interfaces file look like?

                  • 6. Re: cannot ping cores sccKit 1.4.0 on new hardware
                    tedk

                    Use the latest firmware. You'll notice that the bitstreams have a _ab or a _cd in their name. When eMAC is enabled, you want the bitstream that is for the eMAC ports that you are using. But when eMAC is disabled you're using the non-eMAC portion of the bitstream and so it shouldn't matter which you use.

                     

                    On one of our internal marc systems we have rl_20110627_cd.bit. You can see what bitstream you are using with the "sccBmc -c set" command. Or just telnet into the BMC and issue set.

                     

                    When you install the bitstream with install.csh, you want the serial number in update.txt to be larger than the serial number in the update.txt in /flash on the BMC. If install.csh does not work, issue (as root) "apt-get remove crbif-dkms" before running it.

                     

                    You want the number in /opt/sccKit/current/firmware/RockyLake/update/update.txt to be larger than that in /flash/update.txt on the BMC. Since you have your own MCPC/SCC you can log into the BMC as root@<the BMC IP address>. The root password is in install.csh, and if you can become root on your MCPC you can read that.

                     

                    With /etc/network/interfaces ... comment out the eth1:1. You don't have to actually disconnect the switch, but you can if you want. You do want eth1 to be on the same subnet as the BMC. eth0 is your outside ethernet connection. This interfaces file has a quirk ... it will not accept spaces at the end of lines. We discovered this painfully.

                     

                    auto lo
                    iface lo inet loopback

                    auto eth0
                    # we use dhcp in our Data Center but users often have a static eth0.
                    iface eth0 inet dhcp
                    up service bind9 restart

                    auto eth1
                    iface eth1 inet static
                    # our BMC IP for this machine is 10.3.16.125
                    address 10.3.16.25
                    netmask 255.255.255.0

                    auto crb0
                    iface crb0 inet static
                    address 192.168.1.254
                    netmask 255.255.255.0

                    up route add -net 192.168.0.0 netmask 255.255.255.0 gw 192.168.1.1 dev crb0
                    down route del -net 192.168.0.0 netmask 255.255.255.0 dev crb0
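
Since the trailing-space quirk mentioned above is easy to miss by eye, a small check can help before restarting networking. This is a minimal sketch; the function name `lines_with_trailing_whitespace` and the sample text are made up for illustration, and it only reports offending lines without modifying anything.

```python
def lines_with_trailing_whitespace(text):
    """Return (line_number, line) pairs for lines ending in spaces or tabs."""
    hits = []
    for number, line in enumerate(text.splitlines(), start=1):
        if line != line.rstrip(" \t"):
            hits.append((number, line))
    return hits


# Illustrative sample: lines 1 and 3 end in a space and a tab respectively
sample = "auto eth1 \niface eth1 inet static\naddress 10.3.16.25\t\n"
for number, line in lines_with_trailing_whitespace(sample):
    print(f"line {number}: {line!r}")
```

To check the real file, read it with `open("/etc/network/interfaces").read()` and pass the result to the same function.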

                    • 7. Re: cannot ping cores sccKit 1.4.0 on new hardware
                      wbrozas

                      Thanks, it's working now.

                      • 8. Re: cannot ping cores sccKit 1.4.0 on new hardware
                        tedk

                        We have a sccKit_1.4.1.2.tar.bz2 on our SVN at http://marcbug.scc-dc.com/svn/repository/tarballs/

                        If you had to disable eMAC on your 1.4.0, you should now be able to install 1.4.1 with patches 1 and 2 (called 1.4.1.2) and enable eMAC. This is a fix for bug 264 http://marcbug.scc-dc.com/bugzilla3/show_bug.cgi?id=264

                        • 9. Re: cannot ping cores sccKit 1.4.0 on new hardware
                          wbrozas

                          We have tried to install 1.4.1.2. sccBmc, sccReset, and sccBoot appear to work, and soon afterwards the performance meters show that the cores are booted. After the cores booted we ran tcpdump and found that no packets are leaving the eMAC ports (note: we put a hub between the eMAC port and the switch to monitor network traffic). We can see the management PC sending ARP requests to the eMAC port, but there is still no response from the eMAC port.

                           

                          Also, we have tried every usable eMAC port (Usable GB ETH 1101).