[[Category:Scheduler]][[Category:Infrastructure]]
HiPerGator users may finely control the selection of compute hardware for a SLURM job, such as specific processor families or processor models, by using the <code>--constraint</code> directive to specify HiPerGator server ''features''.
;Example:

Use one of the following directives to choose between the Rome and Milan microarchitectures:
 #SBATCH --constraint=rome
 #SBATCH --constraint=milan
Basic boolean logic can be used to request combinations of features. For example, to request nodes that have Intel processors '''AND''' an InfiniBand interconnect, use
 #SBATCH --constraint='intel&infiniband'
To request processors from either the AMD Rome '''OR''' Milan CPU family, use
 #SBATCH --constraint='rome|milan'
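To show how a feature constraint fits into a full submission, here is a minimal sketch of a batch script that restricts a job to Milan nodes. The account, QOS, and resource values are placeholders for illustration only; substitute the values appropriate for your group and workload.

 #!/bin/bash
 #SBATCH --job-name=constraint_demo
 #SBATCH --account=your_group
 #SBATCH --qos=your_group
 #SBATCH --ntasks=1
 #SBATCH --cpus-per-task=4
 #SBATCH --mem=8gb
 #SBATCH --time=01:00:00
 #SBATCH --constraint=milan
 # Print the node name and CPU model to confirm the constraint took effect
 hostname
 lscpu | grep 'Model name'

The same constraint expressions may also be passed on the <code>sbatch</code> command line, e.g. <code>sbatch --constraint='rome|milan' job.sh</code>; command-line options take precedence over the corresponding <code>#SBATCH</code> directives in the script.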
==All Node Features==
Run the <code>nodeInfo</code> command from the <code>ufrc</code> environment module to list all available SLURM features. In addition, the table below shows automatically updated nodeInfo output as well as the corresponding CPU models.
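For example, from a terminal on a login node:

 $ module load ufrc
 $ nodeInfo

Standard SLURM tooling can report similar information; for instance, <code>sinfo -o "%P %f"</code> lists the feature sets advertised by the nodes in each partition.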
{{#get_web_data:url=https://data.rc.ufl.edu/pub/ufrc/data/node_data.csv
|format=CSV with header
|data=partition=Partition,ncores=NodeCores,sockets=Sockets,ht=HT,socketcores=SocketCores,memory=Memory,features=Features,cpumodel=CPU
|cache seconds=7200
}}
{| class="wikitable sortable" border="1" cellspacing="0" cellpadding="2" align="center" style="border-collapse: collapse; margin: 1em 1em 1em 0; border-top: none; border-right: none;"
! Partition
! Cores per node
! Sockets
! Socket Cores
! Threads/Core
! Memory (GB)
! Features
! CPU Model
{{#for_external_table:<nowiki/>
{{!}}-
{{!}} {{{partition}}}
{{!}} {{{ncores}}}
{{!}} {{{sockets}}}
{{!}} {{{socketcores}}}
{{!}} {{{ht}}}
{{!}} {{{memory}}}
{{!}} {{{features}}}
{{!}} {{{cpumodel}}}
}}
|}
'''Note''': the <code>bigmem</code> partition is maintained for calculations requiring large amounts of memory. To submit jobs to this partition, add the following directive to your job submission script:
 #SBATCH --partition=bigmem
Since regular compute nodes have 1 TB of available memory, we do not recommend using <code>bigmem</code> nodes for jobs with memory requests below that.
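As a sketch only, a large-memory job would combine the partition directive with an explicit memory request; the figure below is a placeholder, so request what your job actually needs:

 #SBATCH --partition=bigmem
 #SBATCH --mem=1500gb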
'''Note''': See [[GPU_Access]] for more details on GPUs, such as available GPU memory. The CPU model features above are listed from oldest (HPG2) to newest (HPG3): haswell, rome, milan.