11 Replies Latest reply on Apr 30, 2011 8:14 AM by saibbot

    sccKit 1.4.0

    tedk

      sccKit 1.4.0 is now available. For details, see the sccKit 1.4.0 post and What's New in sccKit 1.4.0.

       

      If you are using a Data Center system, your PI (Principal Investigator) must specifically request the upgrade by sending email to scc_research_proposals@intel.com. We require this approval because we do not want to affect ongoing research measurements. The upgrade itself takes about two hours, and we won't upgrade without PI approval.

       

      If you have your own MCPC/SCC system and have upgraded it to sccKit 1.4.0, please don't hesitate to share your experiences on this forum.

        • 1. Re: sccKit 1.4.0
          roybakker

          Some experiences:

           

          We did not experience any blocking problems with the new sccKit, otherwise we would have posted that ;-)

           

          Here at the University of Amsterdam, I installed the upgrade and sccKit 1.4.0 a few weeks ago (two days after the first release). For the new hardware configuration, we didn't use a gigabit switch but instead inserted a third (gigabit) network card into our MCPC. Since then, we haven't had the high-I/O-load crashes anymore. Just curious: how do you solve this in the data center? Are you putting switches between all the machines?

           

          We noticed that the Intel sources are not always compatible with the new configuration, although some of this was solved last week when more Linux images were provided. There are a lot of new possibilities, and it takes a while to investigate them all alongside our ongoing research.

           

          We might want to experiment with different Linux configurations, including POP-SHM. It would be great if that source became available, so that we can adapt it, make some changes, and compile our own kernel as we could before the update.

           

          A quick overview of our current MCPC system (network) layout:

           

          System load:    0.51               Users logged in:     0
          Usage of /home: 4.1% of 171.48GB   IP address for eth2: 192.168.3.254
          Memory usage:   41%                IP address for eth0: 146.50.56.56
          Swap usage:     0%                 IP address for eth1: 192.168.2.1
          Processes:      189                IP address for crb0: 192.168.1.254

           

          If anyone has questions or needs help, feel free to contact me.

           

          Roy.

          bakkerr AT science DOT uva DOT nl

           

          P.S. In our spare time we enjoyed playing Doom on the SCC, but the keyboard and mouse input latency is too high to make it really playable ;-) I didn't manage to get mplayer working; it seems to be missing its codec configuration (it expects to find some files in mbrummer's home directory). We have had neither the time nor a pressing need to figure out what the problem really is and solve it.

          • 2. Re: sccKit 1.4.0
            tedk

            The new Linux source will hopefully be available today. The delay was because building it required an Intel environment, and that has now been fixed. In addition, a test build left behind lots of temporary files that remained after cleaning, so it was a matter of working out exactly which files were the right ones to check in and modifying the build scripts. This is more bookkeeping than engineering, but it had to be done.

             

            In the data center we have one big switch, or rather several switches connected to a master switch that sits behind our router. We didn't install an extra network card, but that method works as well.

            • 3. Re: sccKit 1.4.0
              merritt.alex

              Hi,

               

              We're trying to get our SCC working with the new kit but are running into training failures. We built it and can run sccGui as well as sccBmc, but sccBmc -i repeatedly reports that every training step in the process fails.

               

              To give some background: with sccKit 1.3.0, training failed only occasionally, and the most successful configuration was with the adapter in the x4 slot instead of the recommended x16 slot. With 1.4.0, we have been unable to get any of the configurations to train properly.

               

              We have the recommended DX58SO motherboard for the MCPC. We have tried the following BIOS versions for our motherboard: Dec 15 2010, Nov 09 2010, and June 14 2010.

               

              Any suggestions on how to proceed? We can provide detailed logs if necessary.

               

              Thanks!

              Alex

              • 4. Re: sccKit 1.4.0
                tedk

                I checked with Augie (our data center administrator) about this. Normally, when something like this happens, it's a bad CPU. But in your case it happened just after installing 1.4.0, so we shouldn't jump to conclusions.

                 

                I suspect you have already shut everything down and started up from scratch. Basically, you want the SCC unit to be running when you reboot the MCPC. During the reboot, the MCPC detects that the SCC is there and loads the crbif driver.
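One quick way to confirm that the driver actually came up after the reboot is to look for it in the kernel's module list. The sketch below assumes the driver shows up in /proc/modules under the name crbif, taken from the post above; check the exact module name on your own MCPC before relying on it.

```python
# Sketch: is the (assumed) "crbif" module listed in /proc/modules?
# The module name is an assumption based on the forum post.
from pathlib import Path


def loaded_modules(proc_modules_text: str) -> set:
    """Return the module names from /proc/modules-formatted text."""
    names = set()
    for line in proc_modules_text.splitlines():
        fields = line.split()
        if fields:  # the first field of each line is the module name
            names.add(fields[0])
    return names


def crbif_loaded(proc_path: str = "/proc/modules", name: str = "crbif") -> bool:
    """True if `name` appears in the module list at `proc_path`."""
    path = Path(proc_path)
    if not path.exists():
        return False
    return name in loaded_modules(path.read_text())
```

On the MCPC, `crbif_loaded()` returning False right after a reboot would suggest the SCC was not detected at boot time.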

                 

                Faced with that problem here, I would revert to 1.3.0 to verify that the hardware is working. If the problem persists, please file a Bugzilla bug and we'll track it there.

                • 5. Re: sccKit 1.4.0
                  florian.thoma

                  I just upgraded from 1.3.0 to 1.4.0 and can't get the eMAC working. The MCPC (eth1), the BMC, and the SCC (port A, "Usable GB ETH 1101") are connected to a gigabit Ethernet switch. Accessing the BMC and the cores with the scc* tools works; only TCP/IP access to the cores fails. My guess is that the network setup on the MCPC is broken, but I found no obvious error. Maybe you can see the problem.

                   

                  ~$ ifconfig -a

                  crb0      Link encap:Ethernet  HWaddr 00:00:00:00:00:00 
                            inet addr:192.168.1.254  Mask:255.255.255.0
                            UP RUNNING NOARP  MTU:1500  Metric:1
                            RX packets:0 errors:0 dropped:0 overruns:0 frame:0
                            TX packets:0 errors:0 dropped:0 overruns:0 carrier:0
                            collisions:0 txqueuelen:1000
                            RX bytes:0 (0.0 B)  TX bytes:0 (0.0 B)

                   

                  eth0      Link encap:Ethernet  HWaddr 70:71:bc:bc:88:1d 
                            inet addr:172.22.141.161  Bcast:172.22.141.255  Mask:255.255.255.0
                            inet6 addr: fe80::7271:bcff:febc:881d/64 Scope:Link
                            UP BROADCAST RUNNING MULTICAST  MTU:1500  Metric:1
                            RX packets:8215 errors:0 dropped:0 overruns:0 frame:0
                            TX packets:8953 errors:0 dropped:0 overruns:0 carrier:0
                            collisions:0 txqueuelen:1000
                            RX bytes:1042093 (1.0 MB)  TX bytes:9402245 (9.4 MB)
                            Memory:d3300000-d3320000

                   

                  eth1      Link encap:Ethernet  HWaddr 00:0a:5e:56:fd:16 
                            inet addr:192.168.3.254  Bcast:192.168.3.255  Mask:255.255.255.0
                            inet6 addr: fe80::20a:5eff:fe56:fd16/64 Scope:Link
                            UP BROADCAST RUNNING MULTICAST  MTU:1500  Metric:1
                            RX packets:0 errors:0 dropped:0 overruns:0 frame:0
                            TX packets:23 errors:0 dropped:0 overruns:0 carrier:0
                            collisions:0 txqueuelen:1000
                            RX bytes:0 (0.0 B)  TX bytes:3831 (3.8 KB)
                            Interrupt:18 Base address:0xa000

                   

                  eth1:1    Link encap:Ethernet  HWaddr 00:0a:5e:56:fd:16 
                            inet addr:192.168.2.254  Bcast:192.168.2.255  Mask:255.255.255.0
                            UP BROADCAST RUNNING MULTICAST  MTU:1500  Metric:1
                            Interrupt:18 Base address:0xa000

                   

                  lo        Link encap:Local Loopback 
                            inet addr:127.0.0.1  Mask:255.0.0.0
                            inet6 addr: ::1/128 Scope:Host
                            UP LOOPBACK RUNNING  MTU:16436  Metric:1
                            RX packets:5889 errors:0 dropped:0 overruns:0 frame:0
                            TX packets:5889 errors:0 dropped:0 overruns:0 carrier:0
                            collisions:0 txqueuelen:0
                            RX bytes:9496637 (9.4 MB)  TX bytes:9496637 (9.4 MB)

                   

                  ~$ route
                  Kernel IP routing table
                  Destination     Gateway         Genmask         Flags Metric Ref    Use Iface
                  192.168.3.0     *               255.255.255.0   U     0      0        0 eth1
                  192.168.2.0     *               255.255.255.0   U     0      0        0 eth1
                  192.168.1.0     *               255.255.255.0   U     0      0        0 crb0
                  192.168.0.0     192.168.1.1     255.255.255.0   UG    0      0        0 crb0
                  172.22.141.0    *               255.255.255.0   U     0      0        0 eth0
                  link-local      *               255.255.0.0     U     1000   0        0 crb0
                  default         172.22.141.254  0.0.0.0         UG    100    0        0 eth0
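When reading a table like this, the kernel picks the most specific (longest-prefix) matching entry. The sketch below replays that lookup on the table exactly as printed, to show which interface traffic for a core address such as 192.168.3.1 would use. It ignores metrics because no two entries here cover the same network, and it reads "link-local" as 169.254.0.0, which is how `route` names that network. If the lookup lands on eth1, the "No route to host" and "Destination Host Unreachable" messages that follow more likely mean nothing on that segment answered, rather than a missing route; that is a common cause, not a confirmed diagnosis.

```python
# Sketch: longest-prefix-match over the routing table copied from the post.
import ipaddress

# (destination, genmask, gateway or None for "*", interface)
ROUTES = [
    ("192.168.3.0", "255.255.255.0", None, "eth1"),
    ("192.168.2.0", "255.255.255.0", None, "eth1"),
    ("192.168.1.0", "255.255.255.0", None, "crb0"),
    ("192.168.0.0", "255.255.255.0", "192.168.1.1", "crb0"),
    ("172.22.141.0", "255.255.255.0", None, "eth0"),
    ("169.254.0.0", "255.255.0.0", None, "crb0"),       # "link-local"
    ("0.0.0.0", "0.0.0.0", "172.22.141.254", "eth0"),   # "default"
]


def pick_route(dest: str):
    """Return (interface, gateway) of the most specific matching route."""
    addr = ipaddress.IPv4Address(dest)
    best = None
    for dst, mask, gateway, iface in ROUTES:
        net = ipaddress.IPv4Network(f"{dst}/{mask}")
        if addr in net and (best is None or net.prefixlen > best[0].prefixlen):
            best = (net, gateway, iface)
    if best is None:
        return None
    return best[2], best[1]
```

For example, `pick_route("192.168.3.1")` yields `("eth1", None)`: a direct route out of eth1.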

                   

                  ~$ cat /opt/sccKit/systemSettings.ini
                  [General]
                  CRBServer=192.168.2.127:5010
                  memorySize=8
                  platform=RockyLake
                  maxTransId=64
                  sccFirstMac=00:45:4D:41:44:31
                  sccHostIp=192.168.3.254
                  sccFirstIp=192.168.3.1
                  sccMacEnable=a
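A quick consistency check on the network keys of this file can rule out simple addressing mistakes. The sketch below uses the key names shown above. It assumes a /24 netmask on the host side and 48 cores at consecutive addresses starting at sccFirstIp; those checks are this sketch's assumptions, not rules documented by sccKit.

```python
# Sketch: sanity-check sccHostIp / sccFirstIp from systemSettings.ini.
# Assumptions: /24 host network, 48 cores numbered consecutively from sccFirstIp.
import configparser
import ipaddress


def check_settings(ini_text: str, prefix: int = 24, cores: int = 48) -> list:
    """Return a list of problems; an empty list means none were found."""
    cfg = configparser.ConfigParser(interpolation=None)
    cfg.optionxform = str  # keep key case such as "sccHostIp"
    cfg.read_string(ini_text)
    general = cfg["General"]
    host = ipaddress.IPv4Interface(f"{general['sccHostIp']}/{prefix}")
    first = ipaddress.IPv4Address(general["sccFirstIp"])
    last = first + (cores - 1)  # address of the last core under the assumption above
    problems = []
    for addr in (first, last):
        if addr not in host.network:
            problems.append(f"core address {addr} is outside {host.network}")
    if first <= host.ip <= last:
        problems.append(f"host address {host.ip} overlaps the core range")
    return problems
```

Run against the file above, it reports no problems: 192.168.3.1 through 192.168.3.48 sit in 192.168.3.0/24, and the host address 192.168.3.254 lies outside that range.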

                   

                  ~$ sccBmc -i

                  ...

                  INFO: [line 3601]  ---- Rock Creek setup DONE ---------------------------
                  INFO: (Re-)configuring GRB registers...

                   

                  ~$ sccReset -g

                  ...

                   

                  ~$ sccBoot -l

                  INFO: Linux has been started successfully. Cores should be reachable via TCP/IP shortly...

                   

                  ~$ ssh root@rck00
                  ssh: connect to host rck00 port 22: No route to host

                   

                  ~$ ping rck00
                  PING rck00.in.rck.net (192.168.3.1) 56(84) bytes of data.
                  From rckhost.ex.rck.net (192.168.3.254) icmp_seq=1 Destination Host Unreachable

                   

                  ~$ nmap -v 192.168.3.1-48

                   

                  Starting Nmap 5.00 ( http://nmap.org ) at 2011-04-27 16:17 CEST
                  NSE: Loaded 0 scripts for scanning.
                  Initiating Ping Scan at 16:17
                  Scanning 48 hosts [2 ports/host]
                  Completed Ping Scan at 16:17, 8.22s elapsed (48 total hosts)
                  Read data files from: /usr/share/nmap
                  Nmap done: 48 IP addresses (0 hosts up) scanned in 8.30 seconds

                   

                  Any suggestions?
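The nmap sweep in the post above can also be reproduced without nmap, which helps separate "the name resolves but nothing answers" from other failures. The mapping rckNN to sccFirstIp + NN used below is inferred from this thread (rck00 resolves to 192.168.3.1 here, and rck05 to 192.168.46.6 in a later post); treat it as an assumption. The sketch tests only whether TCP port 22 accepts a connection.

```python
# Sketch: list the core addresses and probe SSH (TCP port 22) on each.
# Assumption: rckNN lives at sccFirstIp + NN, for 48 cores.
import ipaddress
import socket


def core_addresses(first_ip: str, cores: int = 48) -> dict:
    """Map core names rck00.. to their (assumed) IPv4 addresses."""
    first = ipaddress.IPv4Address(first_ip)
    return {f"rck{n:02d}": str(first + n) for n in range(cores)}


def ssh_port_open(ip: str, timeout: float = 0.5) -> bool:
    """True if a TCP connection to port 22 succeeds within `timeout` seconds."""
    try:
        with socket.create_connection((ip, 22), timeout=timeout):
            return True
    except OSError:  # refused, unreachable, or timed out
        return False
```

On the MCPC, looping over `core_addresses("192.168.3.1").items()` and calling `ssh_port_open(ip)` for each entry gives the same picture as the `nmap -v 192.168.3.1-48` run.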

                  • 6. Re: sccKit 1.4.0
                    tedk

                    I put this in as a P2/blocker bug in our bugzilla.

                    http://marcbug.scc-dc.com/bugzilla3/show_bug.cgi?id=209

                     

                    I didn't see you listed as a Bugzilla user, so I couldn't put you on the CC list. Please create a Bugzilla account for yourself. A problem this detailed and technical should be discussed in Bugzilla, where we can prioritize and track it and get more eyes on it.

                     

                    I haven't seen this problem before. 1.4.0 installed cleanly for us once we figured out the process. Sometimes we would inadvertently skip a step or have the wrong permissions, but nothing serious that wasn't easily fixed.

                     

                    Can you ping rck00 with its IP address? Like "ping 192.168.3.1"

                     

                    I'd also recommend looking at the network config files in /etc/bind, specifically the *.zone files. If they look OK (consult our 1.4.0 installation PDF, which I suspect you already have), update the serial value and then restart bind.
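Updating the serial is easy to forget after hand-editing a zone file. The sketch below increments it programmatically. It assumes the common layout in which the serial is the first number after the "(" that follows the SOA keyword; check your own *.zone files before using it. It does not handle the serial wrap-around rules of RFC 1982.

```python
# Sketch: increment the SOA serial of a BIND zone file.
# Assumption: the serial is the first number after "SOA ... (".
import re
from pathlib import Path

SOA_SERIAL = re.compile(r"(\bSOA\b[^(]*\(\s*)(\d+)", re.IGNORECASE)


def bump_serial(zone_text: str) -> str:
    """Return zone_text with its SOA serial incremented by one."""
    match = SOA_SERIAL.search(zone_text)
    if match is None:
        raise ValueError("no SOA serial found")
    new_serial = int(match.group(2)) + 1  # no RFC 1982 wrap-around handling
    return zone_text[:match.start(2)] + str(new_serial) + zone_text[match.end(2):]


def bump_zone_file(path: str) -> None:
    """Rewrite a zone file in place with the incremented serial."""
    p = Path(path)
    p.write_text(bump_serial(p.read_text()))
```

After bumping the serial in each edited zone file, bind still has to be restarted for the change to take effect.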

                     

                    Does /opt/sccKit have a .ssh2 directory?

                    • 7. Re: sccKit 1.4.0
                      florian.thoma

                      Thanks for your quick reply.

                      I didn't know about Bugzilla, but I will continue there once I have checked everything against the newest version of the installation guide.

                      • 8. Re: sccKit 1.4.0
                        tedk

                        We noticed this in the data center ...

                         

                        A system reports two good ports, A and C. We plugged the cable into port A. The wink lights didn't behave properly (asynchronous blinking). Installation went fine, but the cores would not respond to a ping. We moved the cable to port C. It seems the GB ETH XXXX results do not always prove that a port is good.

                        • 9. Re: sccKit 1.4.0
                          saibbot

                          Hello,

                           

                          After the upgrade to sccKit 1.4.0, I wanted to use the Linux image from sccKit 1.3.0 to check something (I noticed some odd behavior in programs I had already tested). So I loaded the Linux image from sccKit 1.3.0 (from /opt/sccKit/1.3.0/resources), but the cores were unreachable (with the Linux image from 1.4.0 they are reachable). I tried the commands from both 1.3.0 and 1.4.0.

                           

                          ssh -F /opt/sccKit/.ssh2/openssh_config root@rck05
                          ssh: connect to host rck05 port 22: No route to host
                          
                          PING rck05.in.rck.net (192.168.46.6) 56(84) bytes of data.
                          From rckhost.ex.rck.net (192.168.46.254) icmp_seq=2 Destination Host Unreachable
                          

                           

                           

                          Do I need to take any extra steps to make it work, or is there any other problem?

                           

                          Vasileios (marc026).

                          • 10. Re: sccKit 1.4.0
                            tedk

                            I wouldn't expect a 1.4.0 installation to work with the linux.obj that came with 1.3.0. Have you enabled the eMAC interface in systemSettings.ini? The 1.3.0 linux does not contain eMAC drivers.

                             

                            What you might have to do is edit systemSettings.ini to look like its 1.3.0 version (although I'd keep the maxTransId assignment) and return the zone files to what they were previously (then restart bind). named.conf.local also has some eMAC settings in it.
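If a copy of the 1.3.0 systemSettings.ini was kept, the merge described above (1.3.0 contents, but the 1.4.0 maxTransId line kept) can be done mechanically. This is a sketch of that edit only. The function name is hypothetical, nothing here is an sccKit tool, and it does not touch the zone files or named.conf.local.

```python
# Sketch: rebuild systemSettings.ini from a saved 1.3.0 copy while keeping
# selected keys (by default maxTransId) from the current 1.4.0 file.
import configparser
import io


def _load(text: str) -> configparser.ConfigParser:
    cfg = configparser.ConfigParser(interpolation=None)
    cfg.optionxform = str  # preserve key case such as "maxTransId"
    cfg.read_string(text)
    return cfg


def revert_settings(old_text: str, new_text: str, keep=("maxTransId",)) -> str:
    """Return the 1.3.0 settings, overriding the `keep` keys with 1.4.0 values."""
    old, new = _load(old_text), _load(new_text)
    for key in keep:
        if new.has_option("General", key):
            old.set("General", key, new.get("General", key))
    out = io.StringIO()
    old.write(out, space_around_delimiters=False)  # "key=value", as in the original file
    return out.getvalue()
```

Keys that exist only in the 1.4.0 file, such as sccMacEnable, are dropped, which matches the advice to make the file look like its 1.3.0 version.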

                             

                            What we do here is save all the 1.3.0 config files. Then we can return to 1.3.0 by remaking links and copying over some files; going from 1.4.0 to 1.3.0 that way takes a few minutes. But we haven't tried running the older 1.3.0 Linux under 1.4.0.
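Saving the config files before an upgrade and copying them back later can be scripted along these lines. The file list and snapshot directory are hypothetical examples (add the *.zone files your bind setup uses), and remaking the links mentioned above is not covered.

```python
# Sketch: save configuration files into a snapshot directory and restore them.
# The file list is a hypothetical example; extend it for your installation.
import shutil
from pathlib import Path

CONFIG_FILES = [
    "/opt/sccKit/systemSettings.ini",
    "/etc/bind/named.conf.local",
]


def snapshot_path(original: str, snapshot_dir: str) -> Path:
    """Location of `original` inside `snapshot_dir`, mirroring its absolute path."""
    p = Path(original)
    return Path(snapshot_dir) / p.relative_to(p.anchor)


def save(files, snapshot_dir: str) -> None:
    """Copy each file into the snapshot, keeping timestamps and permissions."""
    for f in files:
        target = snapshot_path(f, snapshot_dir)
        target.parent.mkdir(parents=True, exist_ok=True)
        shutil.copy2(f, target)


def restore(files, snapshot_dir: str) -> None:
    """Copy the saved versions back over the live files."""
    for f in files:
        shutil.copy2(snapshot_path(f, snapshot_dir), f)
```

For example, `save(CONFIG_FILES, "/root/scc-config-1.3.0")` before upgrading and `restore(CONFIG_FILES, "/root/scc-config-1.3.0")` when going back (run with sufficient privileges).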

                            • 11. Re: sccKit 1.4.0
                              saibbot

                              Yes, the eMAC is enabled. Right, that could not work.

                               

                              I will try your suggestions whenever I have time.

                               

                              Thanks a lot.