Submitting Jobs to the Queue

The SCC is a shared system: jobs to be run on it are submitted to a queue, and the scheduler orders them to make the best use of the machine, launching them when resources become available. Because of this scheduling, jobs are not necessarily run in first-in, first-out order.


The maximum wall-clock time for a job in the queue is 4000 hours. The cluster places your job in a different queue depending on the wall time you request; shorter jobs are given access to more resources and start sooner. If your job needs less time, specify that in your script and it will start sooner, since it is easier for the scheduler to fit in a short job than a long one. On the downside, the queue manager automatically kills the job at the end of the specified wall-clock time, so if you guess wrong you might lose some work. The standard procedure is therefore to estimate how long your job will take and add roughly 10%.
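For example, if previous runs of a similar job have taken about 10 hours, a reasonable request is around 11 hours (the exact time here is only illustrative):

#SBATCH --time=11:00:00   # estimated 10 hours of runtime plus ~10% margin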

Because of the group-based allocation, it is conceivable that your jobs won't run if your colleagues have already exhausted your group's limits.

Note that scheduling very large jobs strongly affects the queue and other users, so please talk to us before running massively parallel jobs. We will help make sure that your jobs start and run efficiently.

Batch Submission Script

You interact with the queuing system through the Slurm queue/resource manager. To submit a job, you write a script that describes the job and how it is to be run, and submit it to the queue with the sbatch command. A sample submission script is shown below, with the #SBATCH directives at the top and the rest being what will be executed on the compute node.

#!/bin/bash
#SBATCH --nodes=1                     # number of compute nodes
#SBATCH --cpus-per-task=8             # CPU cores per task
#SBATCH --time=02:00:00               # wall-clock time limit (hh:mm:ss)
#SBATCH --mem=128G                    # memory per node
#SBATCH --mail-user=netid@gmail.com   # address for job notifications
#SBATCH --mail-type=begin             # email when the job starts
#SBATCH --mail-type=end               # email when the job ends
#SBATCH --error=JobName.%J.err        # stderr file (%J = job ID)
#SBATCH --output=JobName.%J.out       # stdout file (%J = job ID)

# Run from the directory the job was submitted from
cd $SLURM_SUBMIT_DIR

# Load any software modules your job needs
module load modulename

# Replace with the commands you want to run
your_commands_go_here
yourscript data_1 > output

The lines that begin with #SBATCH are directives that are parsed and interpreted by sbatch at submission time, and they control administrative aspects of your job. In this example, the script requests one node with 8 CPU cores per task, for a wall-clock time of two hours.

Not all of the lines above are required in a submission script; some resources have default values, such as 1 core for CPUs and 8G for memory. If you do not need mail notifications, you can omit the --mail lines. However, we strongly recommend including the CPU, time, and memory parameters in your scripts.
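As a minimal sketch (the module and script names are placeholders), a submission script that states only the recommended resource requests might look like this:

#!/bin/bash
#SBATCH --cpus-per-task=4   # CPU cores (the default would be 1)
#SBATCH --time=01:00:00     # one hour of wall-clock time
#SBATCH --mem=16G           # memory per node

cd $SLURM_SUBMIT_DIR
module load modulename
yourscript data_1 > output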

Slurm Directives

Resource | Flag Syntax | Description | Notes
partition | --partition=general-compute | Partition is a queue for jobs. | default on ub-hpc is general-compute
time | --time=01:00:00 | Time limit for the job. | 1 hour
nodes | --nodes=2 | Number of compute nodes for the job. | default is 1
cpus/cores | --ntasks-per-node=8 | Corresponds to number of cores on the compute node. | default is 1
resource feature | --gres=gpu:2 | Request use of GPUs on compute nodes. | default is no feature specified
memory | --mem=24000 | Memory limit per compute node for the job. Do not use with the --mem-per-cpu flag. | memory in MB; default limit is 3000MB per core
memory | --mem-per-cpu=4000 | Per-core memory limit. Do not use with the --mem flag. | memory in MB; default limit is 3000MB per core
job name | --job-name="hello_test" | Name of the job. | default is the JobID
output file | --output=test.out | Name of file for stdout. | default is the JobID
email address | --mail-user=username@buffalo.edu | User's email address. | required
email notification | --mail-type=ALL or --mail-type=END | When email is sent to user. | omit for no email
access | --exclusive | Exclusive access to compute nodes. | default is sharing nodes
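For instance, the directives below sketch a two-node GPU job built from the flags in this table (the partition name and resource amounts are illustrative and should match what your cluster actually provides):

#SBATCH --partition=general-compute   # queue to submit to
#SBATCH --nodes=2                     # two compute nodes
#SBATCH --ntasks-per-node=8           # eight cores on each node
#SBATCH --gres=gpu:2                  # two GPUs per node
#SBATCH --mem=24000                   # 24000 MB of memory per node
#SBATCH --job-name="hello_test"       # job name shown in the queue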

Slurm environment variables

The Slurm controller sets a number of variables in the environment of the batch script. Below is a list of them, with the corresponding Torque/MOAB environment variables for comparison.

SLURM Variable | Torque/MOAB | Description
SLURM_ARRAY_JOB_ID | PBS_JOBID | Job array's master job ID number
SLURM_ARRAY_TASK_COUNT | | Total number of tasks in a job array
SLURM_ARRAY_TASK_ID | PBS_ARRAYID | Job array ID (index) number
SLURM_ARRAY_TASK_MAX | | Job array's maximum ID (index) number
SLURM_ARRAY_TASK_MIN | | Job array's minimum ID (index) number
SLURM_ARRAY_TASK_STEP | | Job array's index step size
SLURM_CLUSTER_NAME | | Name of the cluster on which the job is executing
SLURM_CPUS_ON_NODE | | Number of CPUs on the allocated node
SLURM_CPUS_PER_TASK | PBS_VNODENUM | Number of CPUs requested per task; only set if the --cpus-per-task option is specified
SLURM_JOB_ACCOUNT | | Account name associated with the job allocation
SLURM_JOB_CPUS_PER_NODE | PBS_NUM_PPN | Count of processors available to the job on this node
SLURM_JOB_DEPENDENCY | | Set to the value of the --dependency option
SLURM_JOB_NAME | PBS_JOBNAME | Name of the job
SLURM_JOBID, SLURM_JOB_ID | PBS_JOBID | The ID of the job allocation
SLURM_MEM_PER_CPU | | Same as --mem-per-cpu
SLURM_MEM_PER_NODE | | Same as --mem
SLURM_NNODES, SLURM_JOB_NUM_NODES | | Total number of different nodes in the job's resource allocation
SLURM_NODELIST, SLURM_JOB_NODELIST | PBS_NODEFILE | List of nodes allocated to the job
SLURM_NTASKS_PER_NODE | | Number of tasks requested per node; only set if the --ntasks-per-node option is specified
SLURM_NTASKS_PER_SOCKET | | Number of tasks requested per socket; only set if the --ntasks-per-socket option is specified
SLURM_NTASKS, SLURM_NPROCS | PBS_NUM_NODES | Same as -n, --ntasks
SLURM_SUBMIT_DIR | PBS_O_WORKDIR | The directory from which sbatch was invoked
SLURM_SUBMIT_HOST | PBS_O_HOST | The hostname of the computer from which sbatch was invoked
SLURM_TASK_PID | | The process ID of the task being started
SLURMD_NODENAME | | Name of the node running the job script
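As a sketch of how these variables can be used inside a batch script (the input and output file naming scheme is only an assumption for illustration):

# Work in the directory the job was submitted from
cd $SLURM_SUBMIT_DIR

# Record some job information in the output for later reference
echo "Job $SLURM_JOB_ID running on $SLURMD_NODENAME with $SLURM_CPUS_ON_NODE CPUs"

# In a job array, pick an input file based on the array index
yourscript data_${SLURM_ARRAY_TASK_ID} > output_${SLURM_ARRAY_TASK_ID}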


Job Submission

$ sbatch [SCRIPT-FILE-NAME]

where you will replace [SCRIPT-FILE-NAME] with the file containing the submission script. This will return a job ID, for example 51923, which is used to identify the job. Information about a queued job can be found using