If you use an Intel Many Integrated Core (MIC) architecture-based product (e.g., Intel Xeon Phi™) in your cluster for parallel processing, you must configure TORQUE to detect it.
Prerequisites
Setup Options
There are two ways to configure MIC-based devices with TORQUE: (1) manually and (2) by auto-detection.
Manual configuration
napali np=12 mics=2
Auto-detect
When you use auto-detection, pbs_mom discovers the MIC-based devices and reports them to pbs_server.
./configure --enable-mics <other configure options>
TORQUE
pbsnodes
Example 20-2: pbsnodes output
slesmic
    state = free
    np = 100
    ntype = cluster
    status = rectime=1347634381,varattr=,jobs=,state=free,netload=7442004852,gres=,loadave=0.00,ncpus=32,physmem=65925692kb,availmem=66531344kb,totmem=68028984kb,idletime=59059,nusers=2,nsessions=8,sessions=4387 4391 4392 4436 4439 4443 4459 100395,uname=Linux slesmic 3.0.13-0.27-default #1 SMP Wed Feb 15 13:33:49 UTC 2012 (d73692b) x86_64,opsys=linux
    mom_service_port = 15002
    mom_manager_port = 15003
    mics = 2
    mic_status = mic[1]=mic_id=8796;num_cores=61;num_threads=244;physmem=8065748992;free_physmem=7854972928;swap=0;free_swap=0;max_frequency=1090;isa=COI_ISA_KNC;load=0.000000;normalized_load=0.000000;,mic[0]=mic_id=8796;num_cores=61;num_threads=244;physmem=8065748992;free_physmem=7872712704;swap=0;free_swap=0;max_frequency=1090;isa=COI_ISA_KNC;load=0.540000;normalized_load=0.008852;

rhmic.ac
    state = free
    np = 100
    ntype = cluster
    status = rectime=1347634381,varattr=,jobs=,state=free,netload=3006171583,gres=,loadave=0.00,ncpus=32,physmem=65918268kb,availmem=66901588kb,totmem=67982644kb,idletime=59477,nusers=2,nsessions=2,sessions=3401 29320,uname=Linux rhmic.ac 2.6.32-220.el6.x86_64 #1 SMP Tue Dec 6 19:48:22 GMT 2011 x86_64,opsys=linux
    mom_service_port = 15002
    mom_manager_port = 15003
    mics = 1
    mic_status = mic[0]=mic_id=8796;num_cores=61;num_threads=244;physmem=8065748992;free_physmem=7872032768;swap=0;free_swap=0;max_frequency=1090;isa=COI_ISA_KNC;load=0.540000;normalized_load=0.008852;<mic_status>;
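Scripts that consume this output may want one record per device. The following is a minimal Python sketch, assuming the mic_status layout shown in the example above (device entries separated by commas, attributes separated by semicolons). The function name parse_mic_status is hypothetical and is not part of TORQUE, and the sample string is abbreviated from the example output.

```python
import re


def parse_mic_status(value):
    """Split a mic_status value into {device_index: {attribute: text}}.

    Assumed layout (taken from the example output, not from a formal
    specification): device entries are separated by commas, each entry
    starts with "mic[N]=", and attributes are "key=value" pairs
    separated by semicolons.
    """
    devices = {}
    for entry in value.split(","):
        match = re.match(r"\s*mic\[(\d+)\]=(.*)", entry)
        if match is None:
            continue  # not a mic[N] entry; skip it
        index, body = int(match.group(1)), match.group(2)
        attrs = {}
        for pair in body.split(";"):
            key, sep, val = pair.partition("=")
            if sep:  # ignore empty or malformed fragments
                attrs[key.strip()] = val.strip()
        devices[index] = attrs
    return devices


# Abbreviated from the example output; only a few attributes are kept.
sample = (
    "mic[1]=mic_id=8796;num_cores=61;num_threads=244;load=0.000000;,"
    "mic[0]=mic_id=8796;num_cores=61;num_threads=244;load=0.540000;"
)
mics = parse_mic_status(sample)
print(sorted(mics))          # device indexes found in the string
print(mics[0]["num_cores"])  # attribute values are kept as text
```

Values are left as strings because the attributes mix integers, floating-point numbers, and identifiers such as the isa field; convert them as needed.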
Moab
mdiag -n -v
Example 20-3: mdiag -n -v output
$ mdiag -n -v
compute node summary
Name   State  Procs  Memory     Disk  Swap         Speed  Opsys  Arch  Par  Load  Classes  Features
hola   Idle   4:4    8002:8002  1:1   10236:13723  1.00   linux  -     hol  0.24  [batch]  -
  GRES=MICS:2,
-----  ---    4:4    8002:8002  1:1   10236:13723
Total Nodes: 1  (Active: 0  Idle: 1  Down: 0)
checknode -v
Example 20-4: checknode output
$ checknode slesmic
node slesmic

State:      Idle  (in current state for 00:00:16)
Configured Resources: PROCS: 100  MEM: 62G  SWAP: 64G  DISK: 1M  MICS: 2
Utilized   Resources: SWAP: 1581M
Dedicated  Resources: ---
Generic Metrics: mic1_mic_id=8796.00,mic1_num_cores=61.00,mic1_num_threads=244.00,mic1_physmem=8065748992.00,mic1_free_physmem=7854972928.00,mic1_swap=0.00,mic1_free_swap=0.00,mic1_max_frequency=1090.00,mic1_load=0.12,mic1_normalized_load=0.00,mic0_mic_id=8796.00,mic0_num_cores=61.00,mic0_num_threads=244.00,mic0_physmem=8065748992.00,mic0_free_physmem=7872679936.00,mic0_swap=0.00,mic0_free_swap=0.00,mic0_max_frequency=1090.00
MTBF(longterm): INFINITY  MTBF(24h): INFINITY
Opsys:      linux     Arch:      ---
Speed:      1.00      CPULoad:   0.000
Classes:    [batch]
RM[napali]* TYPE=PBS
EffNodeAccessPolicy: SHARED

Total Time: 3:45:43  Up: 3:45:43 (100.00%)  Active: 00:00:00 (0.00%)

Reservations:
  ---
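In the checknode output, each MIC attribute appears as a generic metric named micN_attribute. As a rough illustration, the Python sketch below groups such metrics by device index. It assumes the comma-separated name=value layout shown in the example above; group_mic_metrics is a hypothetical helper, not a Moab command or API, and the sample string is abbreviated from the example output.

```python
import re


def group_mic_metrics(metrics):
    """Group "micN_name=value" pairs into {N: {name: float}}.

    Assumes the Generic Metrics layout from the example output:
    comma-separated pairs whose names start with "mic<index>_".
    Pairs that do not fit that pattern are skipped.
    """
    grouped = {}
    for pair in metrics.split(","):
        key, sep, raw = pair.strip().partition("=")
        match = re.fullmatch(r"mic(\d+)_(\w+)", key)
        if not sep or match is None:
            continue  # not a per-MIC metric
        try:
            value = float(raw)
        except ValueError:
            continue  # value is not numeric; ignore it
        grouped.setdefault(int(match.group(1)), {})[match.group(2)] = value
    return grouped


# Abbreviated from the example output.
sample = (
    "mic1_num_cores=61.00,mic1_load=0.12,"
    "mic0_num_cores=61.00,mic0_max_frequency=1090.00"
)
grouped = group_mic_metrics(sample)
print(grouped[1]["load"])
print(grouped[0]["max_frequency"])
```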
Syntax
Example 20-5: Request MIC-based device(s) in qsub
qsub .... -l nodes=X:mics=Y
Because these resources are delimited with a colon, this command requests a job with X nodes and Y mics per task. If you run the same command and delimit the resources with a comma (qsub .... -l nodes=X,mics=Y), you request a job with X nodes and Y mics per job.
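The difference between the two delimiters is easy to get wrong when submission scripts are generated programmatically. The Python sketch below builds the value passed to -l for either form. The helper mic_resource_request and its per_task parameter are hypothetical names introduced for illustration; only the nodes=X:mics=Y and nodes=X,mics=Y strings come from the syntax described above.

```python
def mic_resource_request(nodes, mics, per_task=True):
    """Return the value for qsub's -l option.

    per_task=True  -> "nodes=X:mics=Y" (colon: Y MICs for each task)
    per_task=False -> "nodes=X,mics=Y" (comma: Y MICs for the whole job)
    """
    if nodes < 1 or mics < 1:
        raise ValueError("nodes and mics must be positive integers")
    delimiter = ":" if per_task else ","
    return f"nodes={nodes}{delimiter}mics={mics}"


print(mic_resource_request(2, 1))                  # per-task form
print(mic_resource_request(2, 1, per_task=False))  # per-job form
```

The returned string can then be passed as the argument that follows -l on the qsub command line.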
qstat -f
Example 20-6: qstat -f output
Job Id: 5271.napali
    Job_Name = STDIN
    Job_Owner = dbeer@napali
    job_state = Q
    queue = batch
    server = napali
    Checkpoint = u
    ctime = Fri Sep 14 08:56:33 2012
    Error_Path = napali:/home/dbeer/dev/private-torque/trunk/STDIN.e5271
    Hold_Types = n
    Join_Path = oe
    Keep_Files = n
    Mail_Points = a
    mtime = Fri Sep 14 08:56:33 2012
    Output_Path = napali:/home/dbeer/dev/private-torque/trunk/STDIN.o5271
    Priority = 0
    qtime = Fri Sep 14 08:56:33 2012
    Rerunable = True
    Resource_List.neednodes = 1:mics=1
    Resource_List.nodect = 1
    Resource_List.nodes = 1:mics=1
    substate = 10
    Variable_List = PBS_O_QUEUE=batch,PBS_O_HOME=/home/dbeer,
        PBS_O_LOGNAME=dbeer,
        PBS_O_PATH=/usr/local/sbin:/usr/local/bin:/usr/sbin:/usr/bin:/sbin:/bin:/usr/games,PBS_O_MAIL=/var/mail/dbeer,PBS_O_SHELL=/bin/bash,
        PBS_O_LANG=en_US.UTF-8,
        PBS_O_SUBMIT_FILTER=/usr/local/sbin/torque_submitfilter,
        PBS_O_WORKDIR=/home/dbeer/dev/private-torque/trunk,PBS_O_HOST=napali,
        PBS_O_SERVER=napali
    euser = dbeer
    egroup = company
    queue_rank = 3
    queue_type = E
    etime = Fri Sep 14 08:56:33 2012
    submit_args = -l nodes=1:mics=1
    fault_tolerant = False
    job_radix = 0
    submit_host = napali
checkjob -v
Example 20-7: checkjob -v output
dthompson@mahalo:~/dev/moab-test/trunk$ checkjob -v 2
job 2 (RM job '2.mahalo')

AName: STDIN
State: Idle
Creds:  user:dthompson  group:dthompson  class:batch
WallTime:   00:00:00 of 1:00:00
SubmitTime: Thu Sep 13 17:06:06
  (Time Queued  Total: 00:00:24  Eligible: 00:00:02)

TemplateSets:  DEFAULT
Total Requested Tasks: 1

Req[0]  TaskCount: 1  Partition: ALL
Dedicated Resources Per Task: PROCS: 1  MICS: 1
...