3 Replies Latest reply on Jul 16, 2018 6:40 AM by AswathyC

    Caffe Solver train error

    JosephT

      I tried to run SGD training as described on the Intel Caffe SGD wiki. I created the LMDB and mean files and ran the caffe train command, but I get the error below during the very first iteration.

      The network was initialized successfully.

       

      There is an existing issue related to this error: https://github.com/intel/caffe/issues/135


       

      I0711 23:41:01.134186 226670 net.cpp:521] Network initialization done.

      I0711 23:41:01.134258 226670 solver.cpp:121] Solver scaffolding done.

      I0711 23:41:01.134276 226670 caffe.cpp:342] Starting Optimization

      I0711 23:41:01.134280 226670 solver.cpp:397] Solving AlexNet

      I0711 23:41:01.134284 226670 solver.cpp:398] Learning Rate Policy: step

      I0711 23:41:01.134289 226670 solver.cpp:474] Iteration 0, Testing net (#0)

      I0711 23:41:01.134357 226670 net.cpp:1560] Ignoring source layer relu1

      I0711 23:41:01.134362 226670 net.cpp:1560] Ignoring source layer relu2

      I0711 23:41:02.007095 226670 solver.cpp:563]     Test net output #0: accuracy = 0.24

      I0711 23:41:02.007171 226670 solver.cpp:563]     Test net output #1: loss = 9.35955 (* 1 = 9.35955 loss)

      I0711 23:41:02.867419 226670 solver.cpp:312] Iteration 0, loss = 13.9127

      I0711 23:41:02.867477 226670 solver.cpp:333]     Train net output #0: loss = 13.9127 (* 1 = 13.9127 loss)

      I0711 23:41:02.867499 226670 sgd_solver.cpp:215] Iteration 0, lr = 0.001

      F0711 23:41:02.867722 226670 sgd_solver.cpp:431] Check failed: true == net_params[param_id]->get_prv_data_descriptor()->layout_compare( net_params[param_id]->get_prv_diff_descriptor()) (1 vs. 0)

      *** Check failure stack trace: ***

          @     0x2adbf39c047d  google::LogMessage::Fail()

          @     0x2adbf39c3fc5  google::LogMessage::SendToLog()

          @     0x2adbf39bff93  google::LogMessage::Flush()

          @     0x2adbf39c54de  google::LogMessageFatal::~LogMessageFatal()

          @     0x2adbf2ce68fa  _ZN5caffe9SGDSolverIfE9SGDFusionEif.h

          @     0x2adbf2ce5346  _ZN5caffe9SGDSolverIfE11ApplyUpdateEi.h

          @     0x2adbf2cf7692  caffe::SGDSolver<>::ApplyUpdate()

          @     0x2adbf2b99059  _ZN5caffe6SolverIfE4StepEi.h

          @     0x2adbf2b97887  _ZN5caffe6SolverIfE5SolveEPKc.h

          @     0x5638504d603c  train()

          @     0x5638504d44f2  main

          @     0x2adbf79d1c05  __libc_start_main

          @     0x5638504d42e9  (unknown)

      /var/spool/torque/mom_priv/jobs/2801.c002.SC: line 4: 226670 Aborted                 caffe train -solver solver.prototxt
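
      In case it helps anyone triaging a long log like this one, here is a minimal Python sketch that pulls out only the fatal (F) glog entries. It assumes the standard glog line prefix (severity letter, mmdd, time, thread id, file:line]); the names GLOG_LINE and fatal_entries are my own and are not part of Caffe or glog.

```python
import re

# Standard glog prefix: severity letter, mmdd, hh:mm:ss.uuuuuu, thread id, file:line]
GLOG_LINE = re.compile(
    r"^\s*(?P<sev>[IWEF])(?P<date>\d{4}) (?P<time>\d{2}:\d{2}:\d{2}\.\d{6})\s+"
    r"(?P<tid>\d+) (?P<file>[\w.]+):(?P<line>\d+)\] (?P<msg>.*)$"
)


def fatal_entries(log_text):
    """Return (source_file, line_number, message) for each fatal (F) glog entry."""
    found = []
    for raw in log_text.splitlines():
        m = GLOG_LINE.match(raw)
        if m and m.group("sev") == "F":
            found.append((m.group("file"), int(m.group("line")), m.group("msg")))
    return found


# Two lines copied from the log above: one informational, one fatal.
sample = (
    "I0711 23:41:02.867499 226670 sgd_solver.cpp:215] Iteration 0, lr = 0.001\n"
    "F0711 23:41:02.867722 226670 sgd_solver.cpp:431] Check failed: true == "
    "net_params[param_id]->get_prv_data_descriptor()->layout_compare( "
    "net_params[param_id]->get_prv_diff_descriptor()) (1 vs. 0)\n"
)

for source_file, line_number, message in fatal_entries(sample):
    print(f"{source_file}:{line_number}: {message}")
```

      Run on the full log, this should point straight at the failed check in sgd_solver.cpp rather than the surrounding informational lines.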