The two most important commands for monitoring your jobs are squeue and scontrol show job.
  • squeue -l -u <username>. Shows all of your jobs in the SLURM queue (-l selects the long output format).

  • squeue -l -u <username> -p <partition name>. Shows only your jobs in the given partition (useful if you submitted to several partitions).

  • scontrol -dd show job <job_id>. Shows detailed information about a specific SLURM job. Pay particular attention to the following fields:

    • Requeue and Restarts. Requeue=1 means the job is allowed to be requeued; Restarts shows how many times it has actually been restarted (requeued). Some jobs have higher priority and may preempt (i.e. cancel) your running jobs and put them back in the queue. If your job is taking much longer than expected and Restarts is greater than 0, the most likely reason is that it was preempted and requeued one or more times.
    • TimeLimit. Shows the time limit of your job.
    • Command. The SLURM script that was executed (only for sbatch script.sh).
    • StdErr. File where STDERR is written.
    • StdOut. File where STDOUT is written.
    • BatchScript. The command that was executed (only for sbatch --wrap="script.sh args...").
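A minimal sketch of pulling a single field out of the scontrol output, for example to check the Restarts counter (printed next to Requeue) or to find where output is written. The helper name job_field is hypothetical, not an SCC or SLURM tool, and it assumes the field value contains no spaces (true for fields such as Restarts, TimeLimit, StdOut and StdErr in typical output, but not for every field).

```shell
#!/usr/bin/env bash
# job_field FIELD (hypothetical helper): read `scontrol -dd show job` output
# on stdin and print the value of the first FIELD=value pair found.
# Assumption: the value contains no whitespace.
job_field() {
  local field="$1"
  # Put each whitespace-separated token on its own line, then match the key
  # exactly (so "TimeLimit" does not match "TimeMin") and print the first hit.
  tr -s '[:space:]' '\n' |
    awk -F= -v key="$field" '$1 == key && !found { sub(/^[^=]*=/, ""); print; found = 1 }'
}

# Example use on a SLURM cluster (12345 is a placeholder job ID):
#   scontrol -dd show job 12345 | job_field Restarts
#   scontrol -dd show job 12345 | job_field StdOut
```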

To see all the jobs in the queue, use:

...

Individual job status can be queried using the squeue command with the -j option, followed by the JobID:

$ squeue -j [JOB-ID]
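If a script needs to wait until a job has finished, squeue -j can be polled in a loop. This is one possible sketch, not an SCC-provided tool: the function name wait_for_job, the default 60-second interval and the list of states treated as "still active" are assumptions. It uses squeue's -h (no header) and -o %T (print only the job state) options, and stops once the job reports any other state or has dropped out of the queue.

```shell
#!/usr/bin/env bash
# wait_for_job JOBID [INTERVAL] (hypothetical helper): poll squeue until the
# job is no longer in an active state, printing each state change.
wait_for_job() {
  local id="$1" interval="${2:-60}" state last=""
  while :; do
    # -h: no header line; -o %T: print only the job state.
    # An empty result means the job is no longer listed in the queue.
    state=$(squeue -h -j "$id" -o %T 2>/dev/null)
    if [ "$state" != "$last" ]; then
      echo "${state:-<not in queue>}"
      last="$state"
    fi
    case "$state" in
      # Assumed "still active" states; adjust this list if needed.
      PENDING|RUNNING|CONFIGURING|COMPLETING|SUSPENDED|REQUEUED) sleep "$interval" ;;
      # Anything else (e.g. COMPLETED, FAILED, CANCELLED, TIMEOUT) or empty.
      *) return 0 ;;
    esac
  done
}

# Example (12345 is a placeholder job ID, polling every 30 seconds):
#   wait_for_job 12345 30 && echo "job 12345 has left the active states"
```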


On the SCC, you can also see all the jobs in the queue by using:

$ showq

Jobs can be cancelled with the scancel command, followed by the JobID:

$ scancel [JOB-ID]
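scancel can also select jobs by user and state rather than by ID. As a sketch (the function name cancel_my_pending is hypothetical), the following cancels only your jobs that are still waiting in the queue and leaves running jobs alone, using scancel's --user and --state options.

```shell
#!/usr/bin/env bash
# cancel_my_pending (hypothetical helper): cancel every job owned by the
# current user that is still PENDING; running jobs are not touched.
cancel_my_pending() {
  # ${USER:?...} aborts with an error message if USER is unset or empty,
  # so scancel is never called without a user filter.
  scancel --user="${USER:?USER is not set}" --state=PENDING
}

# Cancel one specific job instead (12345 is a placeholder job ID):
#   scancel 12345
```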

These commands have many more options, which are documented on their man pages (for example, man squeue).


...