API Reference

Data

Image

dl_utils.data.image.byte_imread(data: bytes) ndarray[source]

Decode an in-memory encoded image (e.g. the raw contents of a PNG or JPEG file) into a NumPy array.

dl_utils.data.image.byte_imwrite(image: ndarray, format='PNG', **kwargs) bytes[source]

Encode an image array into the given format and return the encoded bytes.
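
Example (an illustrative round-trip sketch; the exact decode behavior is assumed from the imread/imwrite naming):

>>> import numpy as np
>>> from dl_utils.data.image import byte_imread, byte_imwrite
>>> img = np.zeros((4, 4, 3), dtype=np.uint8)
>>> data = byte_imwrite(img, format='PNG')  # encode the array to PNG bytes
>>> byte_imread(data).shape                 # decode the bytes back to an array
(4, 4, 3)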

Video

dl_utils.data.video.save_video(frames: np.ndarray | torch.Tensor, save_path: str | PathLike[str] | Path, fps: int | float = 30, codec: str = 'avc1')[source]

Save video frames to a video file.

Parameters:
  • frames – Video frames in shape (F, H, W, C). The pixel values should be in range [0, 255].

  • save_path – Path to save video.

  • fps – FPS of video, default 30.

  • codec – Codec of video, default avc1.
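
Example (a usage sketch; the frame data and output path are illustrative):

>>> import numpy as np
>>> from dl_utils.data.video import save_video
>>> frames = np.random.randint(0, 256, size=(30, 64, 64, 3), dtype=np.uint8)  # (F, H, W, C) in [0, 255]
>>> save_video(frames, 'out.mp4', fps=30, codec='avc1')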

dl_utils.data.video.load_video(video_path: str | PathLike[str] | Path, resize: Tuple[int, int] | int = None, center_crop: Tuple[int, int] | int = None, max_frames: int = None) ndarray[source]

Load a video file.

Parameters:
  • video_path – Path to the video file.

  • resize – Resize frames to the specified size. If None, no resizing. Accepts (width, height) or int.

  • center_crop – Center crop frames to the specified size. If None, no cropping. Accepts (width, height) or int.

  • max_frames – Maximum number of frames to load. If None, load all frames.

Returns:

Frames as a NumPy array with shape (F, H, W, C). Pixel values are in [0, 255], color order is RGB.

Note

  • If the video is grayscale, the color channel will be replicated to 3.
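
Example (a usage sketch; the path is illustrative, and the shape shown assumes an int center_crop yields a square crop and the clip has at least 16 frames):

>>> from dl_utils.data.video import load_video
>>> frames = load_video('clip.mp4', center_crop=224, max_frames=16)
>>> frames.shape
(16, 224, 224, 3)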

dl_utils.data.video.get_video_fps(video_path: str | PathLike[str] | Path) float[source]

Retrieve the FPS of a video.

Parameters:

video_path – Path to the video file.

Returns:

The FPS of the video.

dl_utils.data.video.get_video_frame_count(video_path: str | PathLike[str] | Path) int[source]

Retrieve the total number of frames in a video.

Parameters:

video_path – Path to the video file.

Returns:

The number of frames in the video.

dl_utils.data.video.get_video_duration(video_path: str | PathLike[str] | Path) Tuple[float, int, float][source]

Retrieve the FPS, frame count, and duration (in seconds) of a video.

Parameters:

video_path – Path to the video file.

Returns:

A tuple containing FPS, frame count, and duration in seconds.
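
Example (an illustrative sketch combining the three metadata helpers; the path and printed values are hypothetical):

>>> from dl_utils.data.video import get_video_fps, get_video_frame_count, get_video_duration
>>> get_video_fps('clip.mp4')
30.0
>>> get_video_frame_count('clip.mp4')
90
>>> get_video_duration('clip.mp4')  # 90 frames / 30 fps = 3.0 s
(30.0, 90, 3.0)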

dl_utils.data.video.get_video_duration_batch(video_paths: List[str | PathLike[str] | Path]) List[Tuple[float, int, float]][source]

Get the FPS, frame count, and duration (in seconds) of multiple videos in batch.

Parameters:

video_paths – List of paths to videos.

Returns:

A list of tuples, each containing FPS, frame count, and duration in seconds.

dl_utils.data.video.convert_to_h265(input_file: AnyStr, output_file: AnyStr, ffmpeg_exec: AnyStr = '/usr/bin/ffmpeg', keyint: int = None, overwrite: bool = False, verbose: bool = False) None[source]

Convert a video to H.265 format using ffmpeg.

Parameters:
  • input_file – Input video path.

  • output_file – Output video path.

  • ffmpeg_exec – Path to the ffmpeg executable.

  • keyint – Keyframe interval. If None, the encoder default is used.

  • overwrite – Overwrite an existing output file.

  • verbose – Show ffmpeg output.

dl_utils.data.video.convert_to_h264(input_file: AnyStr, output_file: AnyStr, ffmpeg_exec: AnyStr = '/usr/bin/ffmpeg', keyint: int = None, overwrite: bool = False, verbose: bool = False) None[source]

Convert a video to H.264 format using ffmpeg. Parameters are the same as convert_to_h265.
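
Example (a usage sketch; assumes ffmpeg is installed at the default executable path, and the file names are illustrative):

>>> from dl_utils.data.video import convert_to_h264, convert_to_h265
>>> convert_to_h265('in.mp4', 'out_h265.mp4', overwrite=True)
>>> convert_to_h264('in.mp4', 'out_h264.mp4', keyint=30, verbose=True)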

Array

dl_utils.data.array.to_numpy(array: ndarray | Tensor) ndarray[source]

Convert an array-like object to a NumPy array.

dl_utils.data.array.to_tensor(array: ndarray | Tensor) Tensor[source]

Convert an array-like object to a PyTorch tensor.

dl_utils.data.array.to_original(array: ndarray | Tensor, ori_dtype) ndarray | Tensor[source]

Convert an array-like object back to its original type, given by ori_dtype.
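
Example (a round-trip sketch; passing the original class as ori_dtype is an assumption):

>>> import numpy as np
>>> import torch
>>> from dl_utils.data.array import to_numpy, to_tensor, to_original
>>> arr = np.arange(3, dtype=np.float32)
>>> t = to_tensor(arr)   # NumPy -> torch
>>> a = to_numpy(t)      # torch -> NumPy
>>> isinstance(to_original(a, torch.Tensor), torch.Tensor)
True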

Normalization

dl_utils.data.normalize.normalize(data: ndarray | Tensor, mean: float | int | ndarray | Tensor, std: float | int | ndarray | Tensor, dim=-1) ndarray | Tensor[source]

Normalize the input array (usually image or video).

Parameters:
  • data – Input array, can be a NumPy array or a PyTorch tensor.

  • mean – Scalar or vector of means for each channel.

  • std – Scalar or vector of standard deviations for each channel.

  • dim – The channel dimension to normalize along. Default is -1 (last dimension).

Returns:

Normalized image or video in the same type as input (NumPy array or PyTorch tensor).

Examples

>>> import numpy as np
>>> from dl_utils import normalize
>>> img = np.array([[[0, 128, 255]]], dtype=np.float32)
>>> normalize(img, mean=128, std=64)
array([[[-2.      ,  0.      ,  1.984375]]])
>>> import torch
>>> from dl_utils import normalize
>>> img_t = torch.tensor([[[0, 128, 255]]], dtype=torch.float32)
>>> normalize(img_t, mean=torch.tensor([0, 128, 255]), std=torch.tensor([1, 64, 255]))
tensor([[[0., 0., 0.]]], dtype=torch.float64)

dl_utils.data.normalize.inv_normalize(data: ndarray | Tensor, mean: float | int | ndarray | Tensor, std: float | int | ndarray | Tensor, dim=-1) ndarray | Tensor[source]

Inverse normalize the input array (usually image or video).

Parameters:
  • data – Input array, can be a NumPy array or a PyTorch tensor, which has been previously normalized.

  • mean – Scalar or vector of means used in the original normalization.

  • std – Scalar or vector of standard deviations used in the original normalization.

  • dim – The channel dimension along which normalization was applied. Default is -1 (last dimension).

Returns:

Denormalized image or video in the same type as input (NumPy array or PyTorch tensor).
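
Example (a round-trip sketch pairing inv_normalize with normalize):

>>> import numpy as np
>>> from dl_utils.data.normalize import inv_normalize, normalize
>>> img = np.array([[[0., 128., 255.]]], dtype=np.float32)
>>> normed = normalize(img, mean=128, std=64)
>>> np.allclose(inv_normalize(normed, mean=128, std=64), img)
True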

Json

dl_utils.data.json.load_json(file)[source]
dl_utils.data.json.save_json(data, file, save_pretty=False, sort_keys=False)[source]
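
Example (a round-trip sketch; the file name is illustrative):

>>> from dl_utils.data.json import load_json, save_json
>>> save_json({'a': 1, 'b': [2, 3]}, 'config.json', save_pretty=True)
>>> load_json('config.json')
{'a': 1, 'b': [2, 3]}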

LMDB

class dl_utils.data.lmdb.JsonLmdb(env: Environment, autogrow: bool)[source]

Bases: Lmdb

An Lmdb subclass (from the lmdbm package) that stores values serialized as JSON. See https://pypi.org/project/lmdbm/.
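
Example (a usage sketch; assumes JsonLmdb follows the lmdbm open/close interface, with file name and flags illustrative):

>>> from dl_utils.data.lmdb import JsonLmdb
>>> with JsonLmdb.open('store.lmdb', 'c') as db:  # 'c' creates the database if missing
...     db['sample'] = {'label': 1}
>>> with JsonLmdb.open('store.lmdb', 'r') as db:
...     value = db['sample']
>>> value
{'label': 1}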

Pickle

dl_utils.data.pickle.save_pickle(obj, file)[source]
dl_utils.data.pickle.load_pickle(file)[source]
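
Example (a round-trip sketch; the file name is illustrative):

>>> from dl_utils.data.pickle import load_pickle, save_pickle
>>> save_pickle({'weights': [0.1, 0.2]}, 'ckpt.pkl')
>>> load_pickle('ckpt.pkl')
{'weights': [0.1, 0.2]}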

Sampling

dl_utils.data.sample.sample_evenly(input_data: List[Any] | ndarray | Tensor | Sequence[Any], n: int) ndarray | List[Any] | Tensor[source]

Evenly sample n elements from input_data. Supports list, NumPy array, or torch tensor. If input_data is empty or n <= 0, an empty result of the same type is returned.

Parameters:
  • input_data – List, numpy array, or torch tensor to sample from.

  • n – Number of elements to sample.

Returns:

Sampled data in the same type as input_data.
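
Example (an illustrative sketch; which elements are chosen depends on the even-spacing scheme):

>>> from dl_utils.data.sample import sample_evenly
>>> len(sample_evenly(list(range(10)), 3))
3
>>> sample_evenly([], 5)  # empty input yields an empty result
[]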

dl_utils.data.sample.sample_randomly(input_data: List[Any] | ndarray | Tensor | Sequence[Any], n: int, ordered: bool = False, seed: int = None, put_back: bool = False) ndarray | List[Any] | Tensor[source]

Randomly sample N elements from input_data. Supports list, numpy array, or torch tensor.

Parameters:
  • input_data – List, numpy array, or torch tensor to sample from.

  • n – Number of elements to sample.

  • ordered – Whether to return sampled elements in the original order.

  • seed – Random seed for reproducibility.

  • put_back – If True, sample with replacement.

Returns:

Sampled data in the same type as input_data.
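
Example (an illustrative sketch; assumes ordered=True returns elements in their original order):

>>> from dl_utils.data.sample import sample_randomly
>>> out = sample_randomly(list(range(10)), 4, ordered=True, seed=0)
>>> len(out)
4
>>> out == sorted(out)  # original (ascending) order is preserved
True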

Text

dl_utils.data.text.save_text(text: str, file)[source]
dl_utils.data.text.load_text(file) str[source]
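
Example (a round-trip sketch; the file name is illustrative):

>>> from dl_utils.data.text import load_text, save_text
>>> save_text('hello world', 'note.txt')
>>> load_text('note.txt')
'hello world'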

Distributed

dl_utils.distributed.gather_objects(list_object: List[Any]) List[Any][source]

Gather a list of objects from all ranks into a single list.

dl_utils.distributed.rank0_wrapper(fn)[source]

Wrap any function to only run on rank 0.
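
Example (a sketch; assumes a single-process, non-distributed run behaves as rank 0):

>>> from dl_utils.distributed import rank0_wrapper
>>> @rank0_wrapper
... def log_metrics(step, loss):
...     print(f'step {step}: loss={loss:.4f}')
>>> log_metrics(10, 0.1234)  # executes only on rank 0
step 10: loss=0.1234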

dl_utils.distributed.get_master_addr() str | None[source]

Get the master address of the distributed job, or None if it is not set.

dl_utils.distributed.dist_breakpoint(rank: int = 0)[source]

Breakpoint for distributed training. Enter the breakpoint only on the process whose rank equals rank; all other processes block at a distributed barrier.

dl_utils.distributed.get_local_rank() int[source]

Get the local rank, i.e. the index of the GPU within the current node.

dl_utils.distributed.get_device() device[source]

Get the device of the current rank.

dl_utils.distributed.rank0_log(fn, *args, **kwargs)[source]

Log only on rank 0.

dl_utils.distributed.dist_info(print_fn: ~typing.Callable[[str], ~typing.Any] = <built-in function print>, prefix: str = '')[source]

Print torch distributed information for debugging.

dl_utils.distributed.rank0() bool[source]

Return True if the current process has global rank 0.

dl_utils.distributed.barrier_if_distributed(*args, **kwargs)[source]

Synchronizes all processes if under distributed context.

dl_utils.distributed.get_global_rank() int[source]

Get the global rank, i.e. the index of the GPU across all nodes.

dl_utils.distributed.local_rank0() bool[source]

Return True if the current process has local rank 0 on its node.

dl_utils.distributed.get_master_port() int | None[source]

Get the master port of the distributed job, or None if it is not set.

dl_utils.distributed.get_world_size() int[source]

Get the (global) world size, i.e. the total number of GPUs.

dl_utils.distributed.rank0_print(*args, **kwargs)[source]

Print only on rank 0.

dl_utils.distributed.recursive_to(obj: Any, device: str | device = None) Any[source]

Recursively move all torch.Tensor in obj to the given device. Supports: Tensor, list, tuple, dict, set. Leaves other objects intact.

Parameters:
  • obj – The object to move.

  • device – The device to move to. If None, uses the current device if a GPU is available, otherwise “cpu”.

Returns:

The object with all torch.Tensor moved to the given device.
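
Example (a sketch on CPU; the batch structure is illustrative):

>>> import torch
>>> from dl_utils.distributed import recursive_to
>>> batch = {'x': torch.zeros(2, 3), 'meta': {'ids': [1, 2]}}
>>> moved = recursive_to(batch, 'cpu')
>>> moved['x'].device
device(type='cpu')
>>> moved['meta']['ids']  # non-tensor leaves are left intact
[1, 2]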

File System

dl_utils.fs.list_files_multithread(directory, n_jobs=16, depth: int | None = None)[source]

List all files in a directory recursively using multiple threads. Useful for listing files on NFS.

Parameters:
  • directory – The directory to search.

  • n_jobs – Number of parallel jobs (threads) to use.

  • depth – Maximum recursion depth. If None, no depth limit.

Returns:

A list of all file paths found under the directory.

dl_utils.fs.list_files(path: str, depth: int | None = None) List[str][source]

List all files in a folder recursively.

Parameters:
  • path – Root path to start the search.

  • depth – Maximum depth to search. If None, there is no depth limit. If 0 or less, stop searching deeper.

Returns:

A list of file paths found under the given path.
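
Example (a usage sketch; the directories are illustrative):

>>> from dl_utils.fs import list_files, list_files_multithread
>>> files = list_files('/data/images', depth=2)              # recurse at most two levels
>>> nfs_files = list_files_multithread('/mnt/nfs/data', n_jobs=32)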

dl_utils.fs.make_parent_dirs(path: PathLike)[source]

Create any missing parent directories of path.

Visualize

dl_utils.visualize.plot_distribution(data, remove_outlier=False, percent_range=(0.1, 99.9))[source]

Plot the distribution of data. If remove_outlier is True, values outside the percent_range percentiles are excluded before plotting.