I am attempting to run an application that uses MPICH (mpiexec) to distribute its processes across cores. I have run the same application extensively without trouble on similar hardware (a dual Intel Xeon E5-2680 v3 system) under the same version of Fedora (23), which suggests to me that the problem is memory- or hardware-related.
I note that the E5-2680 v3 system (which runs the application without issue) does not have TSX-NI, whereas the E5-2697A v4 system does. Could this be the issue? Is it possible to disable TSX-NI on my E5-2697A v4 system to diagnose this? Otherwise, would updating the CPU microcode help?
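For anyone comparing the two machines, here is a minimal sketch of how the TSX question could be checked from the shell. It assumes a Linux system where /proc/cpuinfo lists the `hle` and `rtm` flags when TSX is exposed and reports a `microcode` line. The helper names `has_flag` and `report_tsx` are made up for illustration and are not part of any tool mentioned here.

```shell
#!/bin/sh
# Sketch: report whether the TSX-related CPU flags (hle, rtm) appear in a
# cpuinfo-style file, and print the first reported microcode revision.

# has_flag FLAG [FILE]: succeed if FLAG is listed on the first "flags" line.
has_flag() {
    grep -m1 '^flags' "${2:-/proc/cpuinfo}" 2>/dev/null | grep -qw "$1"
}

# report_tsx [FILE]: print one line per TSX flag, then the microcode line.
report_tsx() {
    file="${1:-/proc/cpuinfo}"
    for flag in hle rtm; do
        if has_flag "$flag" "$file"; then
            echo "$flag: present"
        else
            echo "$flag: absent"
        fi
    done
    grep -m1 '^microcode' "$file" 2>/dev/null || echo "microcode: not reported"
}

# Inspect the running machine.
report_tsx
```

Running this on both systems would show whether the flags actually differ and which microcode revision each one has loaded. It does not show whether TSX has anything to do with the crash.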
Very little debugging information is given when the application fails, but given how quickly it fails after launch, it is clear that something is very wrong:
[wri@wrimodels12 runs]$ ems_domain --localize midatl
Starting UEMS Program ems_domain (V15.99.8) on wrimodels12 at Sat Dec 2 20:18:28 2017 UTC
* Localizing "midatl" domain - /home/wri/wrfems/uems/runs/midatl
Projection : lat-lon
Standard Longitude : -41 Degrees
Reference Latitude : 42 Degrees
Reference Longitude : -41 Degrees
Grid NX x NY : 495 x 165
Grid Spacing : 0.170 Degrees
Geog Dset Res : modis_lakes+modis_30s+modis_15s+10m
* Burn'n up 32 processors to localize your domain. Please ignore the smoke - Failed (11)
! Error running GEOGRID - System Signal Code (SN) : 11 (Invalid Memory Reference - Seg Fault)
While perusing the log/domain_geogrid_stdout.log file I saw the following:
> YOUR APPLICATION TERMINATED WITH THE EXIT STRING: Segmentation fault (signal 11)
Also use the --nogeogrid flag for debugging.
[wri@wrimodels12 static]$ /home/wri/wrfems/uems/util/mpich2/bin/mpiexec -n 32 /home/wri/wrfems/uems/bin/geogrid
= BAD TERMINATION OF ONE OF YOUR APPLICATION PROCESSES
= PID 47823 RUNNING AT wrimodels12
= EXIT CODE: 11
= CLEANING UP REMAINING PROCESSES
= YOU CAN IGNORE THE BELOW CLEANUP MESSAGES
YOUR APPLICATION TERMINATED WITH THE EXIT STRING: Segmentation fault (signal 11)
This typically refers to a problem with your application.
Please see the FAQ page for debugging suggestions
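Since the output above gives little to go on, one inexpensive check is to record the shell's resource limits on both machines before launching mpiexec. The sketch below prints the soft stack and core-file limits. It rests on an assumption the log does not confirm, namely that a low stack limit or disabled core dumps might be relevant here. The function name `show_limits` is hypothetical.

```shell
#!/bin/sh
# Sketch: print the soft limits that are commonly inspected when a program
# dies with signal 11. This only reports values; it changes nothing.

show_limits() {
    echo "stack size: $(ulimit -s)"
    echo "core file size: $(ulimit -c)"
}

show_limits
```

If the two systems report different values, raising the limit in the same shell before rerunning (for example with `ulimit -s unlimited`, if the hard limit permits it) would be one way to test whether that matters.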