nntool.slurm.config

Classes

SlurmArgs

alias of SlurmConfig

SlurmConfig([mode, job_name, partition, ...])

Configuration class for SLURM job submission and execution.

class nntool.slurm.config.SlurmConfig(mode='run', job_name='Job', partition='', output_parent_path='./', output_folder='slurm', node_list='', node_list_exclude='', num_of_node=1, tasks_per_node=1, gpus_per_task=0, cpus_per_task=1, gpus_per_node=None, mem='', timeout_min=9223372036854775807, stderr_to_stdout=False, setup=<factory>, pack_code=False, use_packed_code=False, code_root='.', code_file_suffixes=<factory>, exclude_code_folders=<factory>, use_distributed_env=False, distributed_env_task='torch', processes_per_task=1, distributed_launch_command='', extra_params_kwargs=<factory>, extra_submit_kwargs=<factory>, extra_task_kwargs=<factory>)[source]

Configuration class for SLURM job submission and execution.

Parameters:
  • mode (Literal["run", "debug", "local", "slurm"]) – Running mode for the job. Options are: “run” (default; run the function directly), “debug” (run in debugging mode, which drops into pdb when a breakpoint is reached), “local” (run the job locally in a subprocess, without GPU allocation; CUDA_VISIBLE_DEVICES cannot be set), or “slurm” (run the job on a SLURM cluster).

  • job_name (str) – The name of the SLURM job. Default is ‘Job’.

  • partition (str) – The name of the SLURM partition to use. Default is ‘’.

  • output_parent_path (str) – The parent directory path for saving the slurm folder. Default is ‘./’.

  • output_folder (str) – The folder name where SLURM output files will be stored. Default is ‘slurm’.

  • node_list (str) – A string specifying the nodes to use. Leave blank to use all available nodes. Default is an empty string.

  • node_list_exclude (str) – A string specifying the nodes to exclude. Leave blank to use all nodes in the node list. Default is an empty string.

  • num_of_node (int) – The number of nodes to request. Default is 1.

  • tasks_per_node (int) – The number of tasks to run per node. Default is 1.

  • gpus_per_task (int) – The number of GPUs to request per task. Default is 0.

  • cpus_per_task (int) – The number of CPUs to request per task. Default is 1.

  • gpus_per_node (int, optional) – The number of GPUs to request per node. If this is set, gpus_per_task is ignored. Default is None.

  • mem (str) – The amount of memory (GB) to request. Leave blank to use the default memory configuration of the node. Default is an empty string.

  • timeout_min (int) – The time limit for the job in minutes. Default is sys.maxsize (9223372036854775807 on 64-bit platforms), which is effectively no limit.

  • stderr_to_stdout (bool) – Whether to redirect stderr to stdout. Default is False.

  • setup (List[str]) – A list of environment variable setup commands. Default is an empty list.

  • pack_code (bool) – Whether to pack the codebase before submission. Default is False.

  • use_packed_code (bool) – Whether to use the packed code for execution. Default is False.

  • code_root (str) – The root directory of the codebase, used when packing code. Default is the current directory (.).

  • code_file_suffixes (List[str]) – A list of file extensions for code files to be included when packing. Default includes .py, .sh, .yaml, and .toml.

  • exclude_code_folders (List[str]) – A list of folder names relative to code_root that will be excluded from packing. Default excludes ‘wandb’, ‘outputs’, and ‘datasets’.

  • use_distributed_env (bool) – Whether to use a distributed environment for the job. Default is False.

  • distributed_env_task (Literal["torch"]) – The type of distributed environment task to use. Currently, only “torch” is supported. Default is “torch”.

  • processes_per_task (int) – The number of processes to run per task. This value is not used by SLURM itself, but it is needed to set up the distributed environment correctly. Default is 1.

  • distributed_launch_command (str) – The command used to launch the distributed environment setup. It can reference variables such as {num_processes}, {num_machines}, {machine_rank}, {main_process_ip}, and {main_process_port}. Default is an empty string.

  • extra_params_kwargs (Dict[str, str]) – Additional parameters for the SLURM job as a dictionary of key-value pairs. Default is an empty dictionary.

  • extra_submit_kwargs (Dict[str, str]) – Additional submit parameters for the SLURM job as a dictionary of key-value pairs. Default is an empty dictionary.

  • extra_task_kwargs (Dict[str, str]) – Additional task parameters for the SLURM job as a dictionary of key-value pairs. Default is an empty dictionary.
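
The parameters above can be sketched with a small stand-in dataclass. This is not the nntool implementation: SlurmConfigSketch is a hypothetical class that mirrors only a subset of the documented fields and defaults, so the example runs without nntool installed. The partition name, the IP address, the port, and the accelerate command line are illustrative assumptions, and the sketch further assumes that the {…} variables in distributed_launch_command are filled by str.format-style substitution, which this page does not confirm.

```python
import sys
from dataclasses import dataclass, field
from typing import Dict, List, Optional


@dataclass
class SlurmConfigSketch:
    """Hypothetical stand-in mirroring a subset of the documented fields."""

    mode: str = "run"
    job_name: str = "Job"
    partition: str = ""
    num_of_node: int = 1
    tasks_per_node: int = 1
    gpus_per_task: int = 0
    cpus_per_task: int = 1
    gpus_per_node: Optional[int] = None
    timeout_min: int = sys.maxsize  # documented default: effectively no limit
    # "<factory>" in the signature corresponds to a per-instance default.
    setup: List[str] = field(default_factory=list)
    use_distributed_env: bool = False
    processes_per_task: int = 1
    distributed_launch_command: str = ""
    extra_submit_kwargs: Dict[str, str] = field(default_factory=dict)


config = SlurmConfigSketch(
    mode="slurm",
    job_name="train",
    partition="gpu",  # hypothetical partition name
    num_of_node=2,
    gpus_per_node=4,
    use_distributed_env=True,
    processes_per_task=4,
    # Illustrative template only; any launcher command could be used here.
    distributed_launch_command=(
        "accelerate launch --num_processes {num_processes} "
        "--num_machines {num_machines} --machine_rank {machine_rank} "
        "--main_process_ip {main_process_ip} "
        "--main_process_port {main_process_port}"
    ),
)

# Assumption: the variables are substituted str.format-style.
# The values below are made up for illustration.
rendered = config.distributed_launch_command.format(
    num_processes=8,
    num_machines=2,
    machine_rank=0,
    main_process_ip="10.0.0.1",
    main_process_port=29500,
)
print(rendered)
```

With the real class, the same keyword arguments would be passed to nntool.slurm.config.SlurmConfig; any field left out keeps the default listed above.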

set_output_path(output_parent_path)[source]

Set the output path and date for the SLURM job.

Parameters:

output_parent_path (str) – The parent path for the output.

Returns:

The updated SlurmConfig instance.

Return type:

SlurmConfig
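
Because set_output_path returns a SlurmConfig, callers can keep working with the returned value. The sketch below shows that calling pattern with a hypothetical stand-in class, OutputPathSketch. It assumes the method returns an updated copy; the real method may instead modify the instance in place, and its date handling is omitted because this page does not describe it. The path is an example value.

```python
from dataclasses import dataclass, replace


@dataclass
class OutputPathSketch:
    """Hypothetical stand-in with only the two output-related fields."""

    output_parent_path: str = "./"
    output_folder: str = "slurm"

    def set_output_path(self, output_parent_path: str) -> "OutputPathSketch":
        # Assumption: return an updated copy rather than mutating self.
        return replace(self, output_parent_path=output_parent_path)


base = OutputPathSketch()
# Use the returned instance; this is correct whether or not the real
# method also mutates the original object.
updated = base.set_output_path("/tmp/experiments")
print(updated.output_parent_path, updated.output_folder)
```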

nntool.slurm.config.SlurmArgs[source]

alias of SlurmConfig