nntool.slurm.config
Classes
SlurmArgs | alias of SlurmConfig
SlurmConfig | Configuration class for SLURM job submission and execution.
- class nntool.slurm.config.SlurmConfig(mode='run', job_name='Job', partition='', output_parent_path='./', output_folder='slurm', node_list='', node_list_exclude='', num_of_node=1, tasks_per_node=1, gpus_per_task=0, cpus_per_task=1, gpus_per_node=None, mem='', timeout_min=9223372036854775807, stderr_to_stdout=False, setup=<factory>, pack_code=False, use_packed_code=False, code_root='.', code_file_suffixes=<factory>, exclude_code_folders=<factory>, use_distributed_env=False, distributed_env_task='torch', processes_per_task=1, distributed_launch_command='', extra_params_kwargs=<factory>, extra_submit_kwargs=<factory>, extra_task_kwargs=<factory>)
Configuration class for SLURM job submission and execution.
- Parameters:
mode (Literal["run", "debug", "local", "slurm"]) – Running mode for the job. Options are: “run” (default; run the function directly), “debug” (run under debugging, dropping into pdb when a breakpoint is reached), “local” (run the job locally in a subprocess, without GPU allocation; CUDA_VISIBLE_DEVICES cannot be set), or “slurm” (run the job on a SLURM cluster).
job_name (str) – The name of the SLURM job. Default is ‘Job’.
partition (str) – The name of the SLURM partition to use. Default is ‘’.
output_parent_path (str) – The parent directory path for saving the slurm folder. Default is ‘./’.
output_folder (str) – The folder name where SLURM output files will be stored. Default is ‘slurm’.
node_list (str) – A string specifying the nodes to use. Leave blank to use all available nodes. Default is an empty string.
node_list_exclude (str) – A string specifying the nodes to exclude. Leave blank to use all nodes in the node list. Default is an empty string.
num_of_node (int) – The number of nodes to request. Default is 1.
tasks_per_node (int) – The number of tasks to run per node. Default is 1.
gpus_per_task (int) – The number of GPUs to request per task. Default is 0.
cpus_per_task (int) – The number of CPUs to request per task. Default is 1.
gpus_per_node (int) – The number of GPUs to request per node. If this is set, gpus_per_task will be ignored. Default is None.
mem (str) – The amount of memory (GB) to request. Leave blank to use the default memory configuration of the node. Default is an empty string.
timeout_min (int) – The time limit for the job in minutes. Default is sys.maxsize, which effectively means no limit.
stderr_to_stdout (bool) – Whether to redirect stderr to stdout. Default is False.
setup (List[str]) – A list of environment variable setup commands. Default is an empty list.
pack_code (bool) – Whether to pack the codebase before submission. Default is False.
use_packed_code (bool) – Whether to use the packed code for execution. Default is False.
code_root (str) – The root directory of the codebase, which will be used by the code packing. Default is the current directory (.).
code_file_suffixes (List[str]) – A list of file extensions for code files to be included when packing. Default includes .py, .sh, .yaml, and .toml.
exclude_code_folders (List[str]) – A list of folder names relative to code_root that will be excluded from packing. Default excludes ‘wandb’, ‘outputs’, and ‘datasets’.
use_distributed_env (bool) – Whether to use a distributed environment for the job. Default is False.
distributed_env_task (Literal["torch"]) – The type of distributed environment task to use. Currently, only “torch” is supported. Default is “torch”.
processes_per_task (int) – The number of processes to run per task. This value is not used by SLURM but is needed to set up the distributed environment correctly. Default is 1.
distributed_launch_command (str) – The command to launch distributed environment setup, using environment variables like {num_processes}, {num_machines}, {machine_rank}, {main_process_ip}, {main_process_port}. Default is an empty string.
extra_params_kwargs (Dict[str, str]) – Additional parameters for the SLURM job as a dictionary of key-value pairs. Default is an empty dictionary.
extra_submit_kwargs (Dict[str, str]) – Additional submit parameters for the SLURM job as a dictionary of key-value pairs. Default is an empty dictionary.
extra_task_kwargs (Dict[str, str]) – Additional task parameters for the SLURM job as a dictionary of key-value pairs. Default is an empty dictionary.
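The placeholders listed for distributed_launch_command can be illustrated with plain Python string formatting. This is only a sketch: whether nntool substitutes them with str.format-style semantics is an assumption, and the accelerate command line, the script name train.py, the IP address, and the port below are hypothetical values chosen for illustration.

```python
# Hypothetical launch template using the placeholder names from the docs.
template = (
    "accelerate launch "
    "--num_processes {num_processes} "
    "--num_machines {num_machines} "
    "--machine_rank {machine_rank} "
    "--main_process_ip {main_process_ip} "
    "--main_process_port {main_process_port} "
    "train.py"
)

# Illustrative values only; in a real job they would be derived
# from the SLURM allocation rather than written by hand.
values = {
    "num_processes": 8,
    "num_machines": 2,
    "machine_rank": 0,
    "main_process_ip": "10.0.0.1",
    "main_process_port": 29500,
}

# Assumption: the placeholders are filled like str.format fields.
command = template.format(**values)
print(command)
```

A template of this shape would be passed as the distributed_launch_command string, with use_distributed_env presumably enabled alongside it.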
- nntool.slurm.config.SlurmArgs
alias of SlurmConfig
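To make the signature easier to read, here is a minimal stand-in dataclass that mirrors a handful of the documented fields and defaults. It is not the nntool class: SlurmConfigSketch, SlurmArgsSketch, and gpus_requested_per_node are hypothetical names, and the per-node GPU arithmetic is an assumption based only on the note that gpus_per_node, when set, causes gpus_per_task to be ignored.

```python
import sys
from dataclasses import dataclass, field
from typing import Dict, List, Optional


@dataclass
class SlurmConfigSketch:
    """Stand-in mirroring a few documented fields; NOT the nntool class."""

    mode: str = "run"
    job_name: str = "Job"
    partition: str = ""
    num_of_node: int = 1
    tasks_per_node: int = 1
    gpus_per_task: int = 0
    gpus_per_node: Optional[int] = None
    timeout_min: int = sys.maxsize  # documented as "effectively no limit"
    # "<factory>" in the signature means each instance gets a fresh container.
    setup: List[str] = field(default_factory=list)
    extra_submit_kwargs: Dict[str, str] = field(default_factory=dict)


# An alias is simply a second name bound to the same class object.
SlurmArgsSketch = SlurmConfigSketch


def gpus_requested_per_node(cfg: SlurmConfigSketch) -> int:
    """Hypothetical helper: gpus_per_node, when set, overrides gpus_per_task."""
    if cfg.gpus_per_node is not None:
        return cfg.gpus_per_node
    # Assumption: otherwise each task on the node gets gpus_per_task GPUs.
    return cfg.gpus_per_task * cfg.tasks_per_node


cfg = SlurmArgsSketch(mode="slurm", tasks_per_node=4, gpus_per_task=1)
print(gpus_requested_per_node(cfg))
```

With the real library, the same keyword arguments would be passed to SlurmConfig (or its alias SlurmArgs) imported from nntool.slurm.config.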