SCC uses a queueing system called Slurm to manage compute resources and to schedule the jobs that use them. Users submit batch and interactive jobs with Slurm commands and use those commands to monitor job progress during execution.

Log in

First, log in to an SCC login node. This step depends on your local environment, but on OS X or Linux you should be able to use the standard OpenSSH command-line client.

$ ssh -X -l $username scclogin.camhres.ca


DO NOT RUN ANY PROGRAMS ON THIS NODE. Instead, connect to one of our development nodes to prepare your jobs for submission to the cluster:

ssh dev01

ssh dev02


Transitioning from PBS to Slurm

The original SCC cluster ran the PBS batch system. We are planning to migrate the batch system to Slurm.

In general, a PBS batch script is a bash or csh script, and Slurm will attempt to convert its PBS directives appropriately. In many cases, you may not need to change your existing PBS batch scripts at all to work with Slurm. This works well for scripts with simple PBS directives, e.g. #PBS -m be.
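For example, a simple PBS-style script like the one below can typically be submitted to Slurm unchanged. The directive lines are ordinary shell comments, so the script also runs as plain bash; the job name and mail settings here are illustrative:

```shell
#!/bin/bash
#PBS -N example-job    # job name; Slurm translates this to --job-name
#PBS -m be             # mail at begin and end; translated to --mail-type
echo "job body runs here"
```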

The following PBS environment variables are converted to their Slurm equivalents:

  • PBS_JOBID
  • PBS_JOBNAME
  • PBS_O_WORKDIR
  • PBS_O_HOST
  • PBS_NUM_NODES
  • PBS_NUM_PPN
  • PBS_NP
  • PBS_O_NODENUM
  • PBS_O_VNODENUM
  • PBS_O_TASKNUM
  • PBS_ARRAYID

Note that, apart from the variables above, PBS environment variables will not be converted by Slurm. For anything more complicated, you should rewrite your batch scripts in Slurm syntax. Batch scripts for parallel jobs in particular should be rewritten for Slurm.
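Since only the variables above are converted, a script meant to run under either scheduler can fall back explicitly instead of relying on the conversion. A minimal sketch (the fallback values are illustrative):

```shell
#!/bin/bash
# Resolve the submission directory under either scheduler:
# Slurm sets $SLURM_SUBMIT_DIR, PBS sets $PBS_O_WORKDIR;
# fall back to the current directory if neither is set.
workdir=${SLURM_SUBMIT_DIR:-${PBS_O_WORKDIR:-$PWD}}
cd "$workdir" || exit 1

# Likewise for the job ID, useful when naming output files.
jobid=${SLURM_JOBID:-${PBS_JOBID:-local}}
echo "running job $jobid in $workdir"
```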

Equivalent commands in PBS and Slurm

Purpose                             | PBS                                  | Slurm
------------------------------------|--------------------------------------|--------------------------------------
Submit a job                        | qsub jobscript                       | sbatch jobscript
Delete a job                        | qdel job_id                          | scancel job_id
Delete all jobs belonging to a user | qdel `qselect -u user`               | scancel -u user
Job status                          | qstat -u user                        | squeue -u user
Show all jobs                       | qstat -a                             | squeue

Environment variables

Purpose                             | PBS                                  | Slurm
------------------------------------|--------------------------------------|--------------------------------------
Job ID                              | $PBS_JOBID                           | $SLURM_JOBID
Submit directory                    | $PBS_O_WORKDIR                       | $SLURM_SUBMIT_DIR
Allocated node list                 | $PBS_NODEFILE                        | $SLURM_JOB_NODELIST
Job array index                     | $PBS_ARRAY_INDEX                     | $SLURM_ARRAY_TASK_ID
Number of cores/processes           |                                      | $SLURM_CPUS_PER_TASK, $SLURM_NTASKS

Job specifications

Purpose                              | PBS                                             | Slurm
-------------------------------------|-------------------------------------------------|--------------------------------------
Set a wallclock limit                | qsub -l nodes=1,walltime=HH:MM:SS               | sbatch -t [min] or -t [days-hh:mm:ss]
Standard output file                 | qsub -o filename / #PBS -o filename             | sbatch -o filename / #SBATCH --output=filename
Standard error file                  | qsub -e filename / #PBS -e filename             | sbatch -e filename / #SBATCH --error=filename
Combine stdout/stderr                | qsub -j oe / #PBS -j oe                         | not needed; this is the Slurm default
Location of out/err files            | qsub -k oe / #PBS -k oe                         | not needed; by default, Slurm writes stdout/stderr files to the directory from which the job is submitted
Export environment to allocated node | qsub -V                                         | sbatch --export=ALL (default)
Export a single variable             | qsub -v np=12                                   | sbatch --export=np
Email notifications                  | qsub -m be / #PBS -m be                         | sbatch --mail-type=BEGIN,END,FAIL,ALL / #SBATCH --mail-type=ALL
Job name                             | qsub -N jobname jobscript / #PBS -N JobName     | sbatch --job-name=name jobscript / #SBATCH --job-name=JobName
Job restart                          | qsub -r [y/n]                                   | sbatch --requeue or --no-requeue
Working directory                    | -                                               | sbatch --workdir=[dirname]
Memory requirement                   | qsub -l nodes=1:g8 / qsub -l nodes=1,mem=256gb  | sbatch --mem=8g / sbatch --mem=256g
Job dependency                       | qsub -W depend=afterany:jobid                   | sbatch --depend=afterany:jobid
Job blocking                         | qsub -W block=true                              | no equivalent
Job arrays                           | qsub -J 1-100 jobscript                         | sbatch --array=1-100 jobscript
Licenses                             | qsub -l nodes=1,matlab=1                        | sbatch --licenses=matlab

Converting a PBS batch script to a Slurm batch script

Defaults:

Slurm will, by default, attempt to understand all PBS options in the batch script. For example, a batch script containing

#PBS -N JobName

will be internally translated by Slurm into

#SBATCH --job-name=JobName

and the job will show up in the squeue output with the job name 'JobName'.

Thus, most of your old PBS batch scripts should work in Slurm without problems. For new batch scripts, we recommend that you start using the SLURM options.

Ignore PBS directives:

If you do not want the PBS directives in your batch script to be internally translated by Slurm, use the --ignore-pbs option to Slurm. For example, submitting with:

$ sbatch --ignore-pbs jobscript

will cause Slurm to ignore all #PBS directives in the batch script.

pbs2slurm:

A script called pbs2slurm.py can be used to convert your existing PBS batch scripts to Slurm scripts.

Sample session.

$ pbs2slurm.py < run1.pbs > run1.slurm

run1.pbs

#!/bin/bash -l
#PBS -N germline
#PBS -m be
#PBS -k oe
cd $PBS_O_WORKDIR
germline -bits 50 -min_m 1 -err_hom 2 <<EOF
1
CEU.22.map
CEU.22.ped
generated
EOF

run1.slurm

#!/bin/bash -l
#SBATCH --job-name="germline"
#SBATCH --mail-type=BEGIN,END
#PBS -k oe
cd $SLURM_SUBMIT_DIR
germline -bits 50 -min_m 1 -err_hom 2 <<EOF
1
CEU.22.map
CEU.22.ped
generated
EOF

Note that the directive #PBS -k oe is not translated. This directive is unnecessary in Slurm, so there is no equivalent. Slurm defaults to writing a single stderr/stdout file to the directory from which the job was submitted. (This Slurm behaviour can be changed with the #SBATCH -o filename and #SBATCH -e filename flags).
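The kind of rewriting pbs2slurm.py performs can be sketched with sed for the simplest directives. This is only an illustration of the mapping, not the actual converter; real scripts should go through pbs2slurm.py or be rewritten by hand:

```shell
# Translate two common PBS directives to their Slurm equivalents and
# drop #PBS -k oe, which has no Slurm counterpart.
pbs2slurm_sketch() {
    sed -e 's/^#PBS -N \(.*\)/#SBATCH --job-name=\1/' \
        -e 's/^#PBS -m be$/#SBATCH --mail-type=BEGIN,END/' \
        -e '/^#PBS -k oe$/d'
}

# Example: translate a directive on stdin.
echo '#PBS -N germline' | pbs2slurm_sketch   # prints: #SBATCH --job-name=germline
```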

Batch jobs

Submitting jobs

Slurm is primarily a resource manager for batch jobs: a user writes a job script that Slurm schedules to run non-interactively when resources are available. Users primarily submit computational jobs to the Slurm queue using the sbatch command.

$ sbatch job-script.sh

sbatch takes a number of command-line arguments. These arguments can be supplied on the command-line:

$ sbatch --ntasks 16 job-script.sh

or embedded in the header of the job script itself using #SBATCH directives:

#!/bin/bash
#SBATCH --ntasks 16

You can use the scancel command to cancel a job that has been queued, whether the job is pending or currently running. Jobs are cancelled by specifying the job id that is assigned to the job during submission.
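The job ID can be captured at submission time for later use with scancel or with job dependencies. sbatch --parsable prints only the ID; otherwise the ID can be extracted from the default "Submitted batch job N" message. A sketch (the job-script names are made up for illustration):

```shell
# Extract the numeric job ID from sbatch's default output message.
submit_and_get_id() {
    sbatch "$1" | awk '{print $4}'
}

# Usage on the cluster (first-step.sh / second-step.sh are illustrative):
#   jobid=$(submit_and_get_id first-step.sh)
#   scancel "$jobid"                              # cancel it later, or
#   sbatch --depend=afterany:"$jobid" second-step.sh
# Alternatively: jobid=$(sbatch --parsable first-step.sh)

# The parsing itself, demonstrated on a sample message:
echo "Submitted batch job 12345" | awk '{print $4}'   # prints 12345
```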

Example batch job script: hello-world.sh

#!/bin/bash --login
#SBATCH --ntasks 1
#SBATCH --tasks-per-node=1
#SBATCH --output hello-world.out
#SBATCH --qos debug
#SBATCH --time=00:05:00

echo Running on $(hostname --fqdn): 'Hello, world!'

This minimal example job script, hello-world.sh, when submitted with sbatch, writes the name of the cluster node on which the job ran, along with the standard programmer's greeting, "Hello, world!", into the output file hello-world.out.

$ sbatch hello-world.sh

Note that any Slurm arguments must precede the name of the job script.

Job requirements

Slurm uses the requirements declared by job scripts and submission arguments to schedule and execute jobs as efficiently as possible. To minimize the time your jobs spend waiting to run, define your job's resource requirements as accurately as possible.

--nodes

The number of nodes your job requires to run.

--mem

The amount of memory required on each node.

--ntasks

The number of simultaneous tasks your job requires. (These tasks are analogous to MPI ranks.)

--ntasks-per-node

The number of tasks (or cores) your job will use on each node.

--time

The amount of time your job needs to run.

The --time requirement (also referred to as "walltime") deserves special mention. Job execution time can be somewhat variable, leading some users to overestimate (or even maximize) the defined time limit to prevent premature job termination; but an unnecessarily long time limit may delay the start of the job and allow undetected stuck jobs to waste more resources before they are terminated.

If the --mem requirement is not defined in your script, it defaults to 16 GB per core.

For all resources, --time included, smaller resource requirements generally lead to shorter wait times.
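Putting these requirements together, a job-script header might look like the following; the values are illustrative, not recommendations:

```shell
#!/bin/bash
#SBATCH --nodes=1              # run on a single node
#SBATCH --ntasks=4             # four tasks (analogous to MPI ranks)
#SBATCH --ntasks-per-node=4    # place all four tasks on that node
#SBATCH --mem=8g               # 8 GB of memory on the node
#SBATCH --time=02:00:00        # two hours of walltime

echo "job body starts here"
```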

SCC nodes can be shared, meaning each such node may execute multiple jobs simultaneously, even from different users.

Additional job parameters are described in sbatch --help and man sbatch.

SCC Partitions

On SCC, partitions are defined as in the following table.

Partition name | Description     | Max nodes | Max walltime
---------------|-----------------|-----------|-------------
short          | Short (default) | 24        | 4H
medium         | Medium time     | 16        | 8H
long           | Long time       | 8         | 4000H
debug          | Debug           | 2         | 20H/core
gpu            | GPU-enabled     | 1         | n/a

For non-GPU jobs, you do not need to define a partition in your script; jobs will be assigned to the correct partition according to their requested time.

Quality of service (QOS)


On SCC, QoSes are used to constrain or modify the characteristics that a job can have. For example, by selecting the "debug" QoS, a user can obtain higher queue priority for a job with the tradeoff that the maximum allowed wall time is reduced from what would otherwise be allowed on that partition.

The currently available SCC QoSes are:

QOS name | Description                         | Max walltime           | Max jobs/user | Node limits | Priority boost
---------|-------------------------------------|------------------------|---------------|-------------|-------------------------------
normal   | default                             | Derived from partition | n/a           | n/a         | 0
debug    | For quicker turnaround when testing | 20h for 1 core         | 2             | 2/job       | Equiv. of 1-day queue wait time

Shell variables and environment

Jobs submitted to the SCC cluster are not automatically set up with the same environment variables as the shell from which they were submitted. You must therefore load any necessary modules and set any environment variables your job needs within the job script itself. These settings should appear after any #SBATCH directives in the job script.

Job arrays

Job arrays provide a mechanism for running several instances of the same job with minor variations.

Job arrays are submitted using sbatch, similar to standard batch jobs.

$ sbatch --array=[0-9] job-script.sh

Each job in the array will have the environment variable $SLURM_ARRAY_TASK_ID set to the value that represents that job's position in the array. By consulting this variable, the running job can perform the appropriate variant task.
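A common pattern is to map the array index onto a list of input files. A minimal sketch (the file names are illustrative; the fallback index lets the script be tested outside Slurm):

```shell
#!/bin/bash
#SBATCH --array=0-2

# Select this task's input file from a list using the array index.
inputs=(sample-a.dat sample-b.dat sample-c.dat)   # illustrative names
idx=${SLURM_ARRAY_TASK_ID:-0}                     # fall back to 0 for local testing
echo "task $idx processes ${inputs[$idx]}"
```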

Example array job script: array-job.sh

#!/bin/bash
#SBATCH --array 0-9
#SBATCH --ntasks 1
#SBATCH --output array-job.out
#SBATCH --open-mode append
#SBATCH --qos debug
#SBATCH --time=00:05:00

echo "$(hostname --fqdn): index ${SLURM_ARRAY_TASK_ID}"

This minimal example job script, array-job.sh, when submitted with sbatch, submits ten jobs with indexes 0 through 9. Each job appends the name of the cluster node on which it ran, along with the job's array index, to the output file array-job.out.

$ sbatch array-job.sh

Allocations

Access to computational resources is allocated via shares of CPU time assigned to Slurm allocation accounts. You can determine your default allocation account using the sacctmgr command.

$ sacctmgr list Users Users=$USER format=DefaultAccount

Use the --account argument to submit a job for an account other than your default.

#SBATCH --account=crcsupport

You can use the sacctmgr command to list your available accounts.

$ sacctmgr list Associations Users=$USER format=Account

Job mail

Slurm can be configured to send email notifications at different points in a job's lifetime. This is configured using the --mail-type and --mail-user arguments.

#SBATCH --mail-type=END
#SBATCH --mail-user=user@example.com

The --mail-type argument configures which points during job execution generate notifications. Valid values include BEGIN, END, FAIL, and ALL.

Resource accounting

Resources used by Slurm jobs are recorded in the Slurm accounting database. This accounting data is used to track allocation usage.

The sacct command displays accounting data from the Slurm accounting database. To query the accounting data for a single job, use the --job argument.

$ sacct --job $jobid

sacct queries can take some time to complete. Please be patient.

You can change the fields that are printed with the --format option, and the fields available can be listed using the --helpformat option.

$ sacct --job=200 --format=jobid,jobname,qos,user,nodelist,state,start,maxrss,end

If you don't have a record of your job IDs, you can use date-range queries in sacct to find your job.

$ sacct --user=$USER --starttime=2017-01-01 --endtime=2017-01-03

To query the resources being used by a running job, use sstat instead:

$ sstat -a -j JobID.batch

where you should replace JobID with the actual ID of your running job. sstat is especially useful for determining how much memory your job is using; see the MaxRSS field.

Monitoring job progress

The squeue command can be used to inspect the Slurm job queue and a job's progress through it.

By default, squeue will list all jobs currently queued by all users. This is useful for inspecting the full queue; but, more often, users simply want to inspect the current state of their own jobs.

$ squeue --user=$USER

Slurm can provide an estimate of when your jobs will start, along with what resources it expects to dispatch your jobs to. Please keep in mind that this is only an estimate!

$ squeue --user=$USER --start

More detailed information about a specific job can be accessed using the scontrol command.

$ scontrol show job $SLURM_JOB_ID

Interactive jobs

Interactive jobs allow users to log in to a compute node and run commands interactively on the command line. They are commonly run with the debug QoS as part of an interactive programming and debugging workflow. The simplest way to establish an interactive session is to use the srun command:

$ srun -p debug --time=01:00:00 --pty /bin/bash

This will open a login shell using one core on one node for one hour. If you prefer to submit an existing job script or other executable as an interactive job, use the salloc command.

$ salloc -p debug --time=01:00:00 job-script.sh

If you do not provide a command to execute, salloc starts up a Slurm job that nodes will be assigned to, but it does not log you in to the allocated node(s).

The srun and salloc commands each support the same parameters as sbatch and can override any default configuration. Note that any #SBATCH directives in your job script will not be interpreted by salloc when it is executed this way; you must specify all arguments directly on the command line.

Interactive jobs are only allowed in the debug and gpu partitions on the SCC cluster. The maximum walltime allowed is 2 hours.

Temporary Directories

When a Slurm job starts, the scheduler creates a temporary directory for the job on the compute node's local storage. This $SLURM_TMPDIR directory is very useful for jobs that need to use or generate a large number of small files, as the /export/ramdisk ramdisk filesystem is optimized for small files. The default maximum size of this local ramdisk is around half of the total memory of the node.

The directory is owned by the user running the job. The path to the temporary directory is made available as the $SLURM_TMPDIR variable. At the end of the job, the temporary directory is automatically removed.

You can use the ${SLURM_TMPDIR} variable in job scripts to copy temporary data to the temporary job directory. If necessary, it can also be used as an argument for applications that accept a temporary directory argument.
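A typical staging pattern is sketched below. $SLURM_TMPDIR is assumed to be set by the scheduler inside a job; the mktemp fallback exists only so the sketch can be tried outside Slurm, and the file names are illustrative:

```shell
#!/bin/bash
# Stage work through fast node-local storage, then copy results back.
scratch=${SLURM_TMPDIR:-$(mktemp -d)}     # fallback for testing outside a job
workdir=${SLURM_SUBMIT_DIR:-$PWD}

cd "$scratch" || exit 1
# cp "$workdir/input.dat" .               # illustrative: stage input in
echo "result" > output.dat                # compute on local scratch
# cp output.dat "$workdir/"               # illustrative: copy results out before the job ends
echo "staged through $scratch"
```

Remember that the directory is removed when the job ends, so results must be copied back before the script exits.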

Note - Default Paths

Many applications and programming languages use the $TMPDIR environment variable, if set, as the default temporary directory path. If this variable is not set, applications default to using the /tmp directory, which is not desirable. SLURM sets $TMPDIR to the same value as $SLURM_TMPDIR unless $TMPDIR has already been set, in which case the existing value is left unchanged.

If you run a large-memory program that will use more than half of the node's memory, check your job script(s) and shell initialization files (such as .bashrc and .bash_profile) to make sure $TMPDIR points somewhere other than the ramdisk, because the ramdisk consumes usable memory.

If a personal Singularity container is used, make sure that the $SINGULARITYENV_TMPDIR variable is set within the job to export the local scratch location into the Singularity container.

CIFS Directories

Users with a Windows AD account can access the X and Y drives directly on SCC with read-only permission.

Before submitting a job, do the following:

  1. Check your Kerberos ticket

[andytest@camh.ca@scclogin01 ~]$ klist

You will usually see output like the following:

Ticket cache: KEYRING:persistent:1861908071:1861908071

Default principal: andytest@CAMH.CA

Valid starting Expires Service principal

04/04/2018 14:58:31 04/05/2018 00:58:31 krbtgt/CAMH.CA@CAMH.CA

renew until 04/11/2018 14:58:29

If instead you see something like:

klist: Credentials cache keyring 'persistent:1861908071:1861908071' not found

you need to reinitialize your Kerberos ticket.

  2. Initialize your Kerberos ticket

    If you have been logged in for a long time, your Kerberos ticket may have expired; use kinit to reinitialize it:

    [andytest@camh.ca@scclogin01 ~]$ kinit

  3. Submit the job with the auks parameter

    When submitting a job, add the --auks=yes parameter to every Slurm command, for example:

Example batch job script: test_cifs.sh

#!/bin/bash --login
#SBATCH --ntasks 1
#SBATCH --tasks-per-node=1
#SBATCH --output test_cifs.out
#SBATCH --qos debug
#SBATCH --time=00:05:00

echo Running on $(hostname --fqdn): 'Hello, world!'
ls /cifs/X
ls /cifs/Y

sbatch --auks=yes test_cifs.sh

Check the file test_cifs.out; you should see something like:

[andytest@camh.ca@scclogin01 ~]$ cat test_cifs.out

Running on node20.camhres.ca: Hello, world!

ReseachIT_Agreements

Andytest_CS

ReseachIT_Agreements
