API Reference
Data
Image
Video
- dl_utils.data.video.save_video(frames: np.ndarray | torch.Tensor, save_path: str | PathLike[str] | Path, fps: int | float = 30, codec: str = 'avc1')
- Parameters:
frames – Video frames in shape (F, H, W, C). The pixel values should be in range [0, 255].
save_path – Path to save video.
fps – FPS of video, default 30.
codec – Codec of video, default avc1.
- dl_utils.data.video.load_video(video_path: str | PathLike[str] | Path, resize: Tuple[int, int] | int = None, center_crop: Tuple[int, int] | int = None, max_frames: int = None) -> ndarray
Load a video file.
- Parameters:
video_path – Path to the video file.
resize – Resize frames to the specified size. If None, no resizing. Accepts (width, height) or int.
center_crop – Center crop frames to the specified size. If None, no cropping. Accepts (width, height) or int.
max_frames – Maximum number of frames to load. If None, load all frames.
- Returns:
Frames as a NumPy array with shape (F, H, W, C). Pixel values are in [0, 255], color order is RGB.
Note
If the video is grayscale, the color channel will be replicated to 3.
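The center_crop option can be pictured as plain slicing on an (F, H, W, C) array. The sketch below is an illustration only: center_crop_frames is a hypothetical helper, not the library's implementation, and it assumes that an int requests a square crop and that a tuple is (width, height) as documented above.

```python
import numpy as np


def center_crop_frames(frames: np.ndarray, size) -> np.ndarray:
    """Center crop an (F, H, W, C) array; hypothetical helper for illustration."""
    # Assumption: an int means a square crop, a tuple is (width, height).
    width, height = (size, size) if isinstance(size, int) else size
    _, h, w, _ = frames.shape
    if width > w or height > h:
        raise ValueError("crop size exceeds frame size")
    top = (h - height) // 2
    left = (w - width) // 2
    return frames[:, top:top + height, left:left + width, :]


frames = np.zeros((4, 120, 160, 3), dtype=np.uint8)
cropped = center_crop_frames(frames, (100, 80))
print(cropped.shape)  # (4, 80, 100, 3)
```

Note that the array layout puts height before width, while the size tuple is given as (width, height).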
- dl_utils.data.video.get_video_fps(video_path: str | PathLike[str] | Path) -> float
Retrieve the FPS of a video.
- Parameters:
video_path – Path to the video file.
- Returns:
The FPS of the video.
- dl_utils.data.video.get_video_frame_count(video_path: str | PathLike[str] | Path) -> int
Retrieve the total number of frames in a video.
- Parameters:
video_path – Path to the video file.
- Returns:
The number of frames in the video.
- dl_utils.data.video.get_video_duration(video_path: str | PathLike[str] | Path) -> Tuple[float, int, float]
Retrieve the FPS, frame count, and duration (in seconds) of a video.
- Parameters:
video_path – Path to the video file.
- Returns:
A tuple containing FPS, frame count, and duration in seconds.
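The three values in the returned tuple are related by simple arithmetic: duration is the frame count divided by the FPS. A minimal sketch of that relationship, assuming a hypothetical helper duration_seconds (how the library reads FPS and frame count from the file is not shown here):

```python
def duration_seconds(fps: float, frame_count: int) -> float:
    """Hypothetical helper: duration implied by FPS and frame count."""
    if fps <= 0:
        # Assumption: a non-positive FPS is treated as an unknown duration (0.0).
        return 0.0
    return frame_count / fps


fps, frame_count = 30.0, 450
info = (fps, frame_count, duration_seconds(fps, frame_count))
print(info)  # (30.0, 450, 15.0)
```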
- dl_utils.data.video.get_video_duration_batch(video_paths: List[str | PathLike[str] | Path]) -> List[float]
Get duration of videos in batch.
- Parameters:
video_paths – List of paths to videos.
- Returns:
A list of tuples, each containing FPS, frame count, and duration in seconds.
- dl_utils.data.video.convert_to_h265(input_file: AnyStr, output_file: AnyStr, ffmpeg_exec: AnyStr = '/usr/bin/ffmpeg', keyint: int = None, overwrite: bool = False, verbose: bool = False) -> None
Convert a video to H.265 format using ffmpeg.
- Parameters:
input_file – Path to the input video.
output_file – Path to the output video.
ffmpeg_exec – Path to the ffmpeg executable, default /usr/bin/ffmpeg.
keyint – Keyframe interval, default None.
overwrite – Whether to overwrite an existing output file.
verbose – Whether to show ffmpeg output.
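To make the parameters concrete, here is a sketch of how such a conversion could be expressed as an ffmpeg command line. build_h265_command is a hypothetical helper, and the choice of flags (libx265 encoder, -x265-params for the keyframe interval, -loglevel error when not verbose) is an assumption; the library may invoke ffmpeg differently.

```python
from typing import List, Optional


def build_h265_command(
    input_file: str,
    output_file: str,
    ffmpeg_exec: str = "/usr/bin/ffmpeg",
    keyint: Optional[int] = None,
    overwrite: bool = False,
    verbose: bool = False,
) -> List[str]:
    """Hypothetical helper: assemble an ffmpeg argument list for H.265 encoding."""
    cmd = [ffmpeg_exec]
    # -y overwrites an existing output file, -n refuses to overwrite.
    cmd.append("-y" if overwrite else "-n")
    if not verbose:
        # Only print errors when verbose output is not requested.
        cmd += ["-loglevel", "error"]
    cmd += ["-i", input_file, "-c:v", "libx265"]
    if keyint is not None:
        # Pass the keyframe interval through to the x265 encoder.
        cmd += ["-x265-params", f"keyint={keyint}"]
    cmd.append(output_file)
    return cmd


cmd = build_h265_command("in.mp4", "out.mp4", keyint=48, overwrite=True)
print(cmd)
```

The resulting list can be passed to subprocess.run(cmd, check=True) to run the conversion.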
Array
- dl_utils.data.array.to_numpy(array: ndarray | Tensor) -> ndarray
Convert an array-like object to a NumPy array.
Normalization
- dl_utils.data.normalize.normalize(data: ndarray | Tensor, mean: float | int | ndarray | Tensor, std: float | int | ndarray | Tensor, dim=-1) -> ndarray | Tensor
Normalize the input array (usually image or video).
- Parameters:
data – Input array, can be a NumPy array or a PyTorch tensor.
mean – Scalar or vector of means for each channel.
std – Scalar or vector of standard deviations for each channel.
dim – The channel dimension to normalize along. Default is -1 (last dimension).
- Returns:
Normalized image or video in the same type as input (NumPy array or PyTorch tensor).
Examples
>>> import numpy as np
>>> from dl_utils import normalize
>>> img = np.array([[[0, 128, 255]]], dtype=np.float32)
>>> normalize(img, mean=128, std=64)
array([[[-2. , 0. , 1.984375]]])

>>> import torch
>>> from dl_utils import normalize
>>> img_t = torch.tensor([[[0, 128, 255]]], dtype=torch.float32)
>>> normalize(img_t, mean=torch.tensor([0, 128, 255]), std=torch.tensor([1, 64, 255]))
tensor([[[0., 0., 0.]]], dtype=torch.float64)
- dl_utils.data.normalize.inv_normalize(data: ndarray | Tensor, mean: float | int | ndarray | Tensor, std: float | int | ndarray | Tensor, dim=-1) -> ndarray | Tensor
Inverse normalize the input array (usually image or video).
- Parameters:
data – Input array, can be a NumPy array or a PyTorch tensor, which has been previously normalized.
mean – Scalar or vector of means used in the original normalization.
std – Scalar or vector of standard deviations used in the original normalization.
dim – The channel dimension along which normalization was applied. Default is -1 (last dimension).
- Returns:
Denormalized image or video in the same type as input (NumPy array or PyTorch tensor).
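Normalization and its inverse are expected to round-trip: (data - mean) / std followed by data * std + mean. The NumPy-only sketch below illustrates this with a channel-first image and dim=0. The functions normalize_sketch, inv_normalize_sketch, and _broadcast are hypothetical stand-ins written for this example, not the library's code.

```python
import numpy as np


def _broadcast(stat, ndim: int, dim: int) -> np.ndarray:
    """Reshape a scalar or per-channel vector so it broadcasts along `dim`."""
    stat = np.asarray(stat, dtype=np.float64)
    if stat.ndim == 0:
        return stat  # scalars broadcast as-is
    shape = [1] * ndim
    shape[dim] = stat.shape[0]
    return stat.reshape(shape)


def normalize_sketch(data, mean, std, dim=-1):
    return (data - _broadcast(mean, data.ndim, dim)) / _broadcast(std, data.ndim, dim)


def inv_normalize_sketch(data, mean, std, dim=-1):
    return data * _broadcast(std, data.ndim, dim) + _broadcast(mean, data.ndim, dim)


# Channel-first image (C, H, W), so the channel dimension is 0.
img = np.random.default_rng(0).uniform(0, 255, size=(3, 4, 4))
mean, std = [123.0, 117.0, 104.0], [58.0, 57.0, 57.0]
normalized = normalize_sketch(img, mean, std, dim=0)
restored = inv_normalize_sketch(normalized, mean, std, dim=0)
print(np.allclose(restored, img))  # True
```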
Json
LMDB
Pickle
Sampling
- dl_utils.data.sample.sample_evenly(input_data: List[Any] | ndarray | Tensor | Sequence[Any], n: int) -> ndarray | List[Any] | Tensor
Evenly sample n elements from input_data. Supports lists, NumPy arrays, and torch tensors. If input_data is empty or n is less than or equal to 0, empty data is returned.
- Parameters:
input_data – List, numpy array, or torch tensor to sample from.
n – Number of elements to sample.
- Returns:
Sampled data in the same type as input_data.
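One way to pick evenly spaced elements is to compute n indices spread from the first to the last position. The sketch below works on plain sequences and returns a list. sample_evenly_sketch is a hypothetical helper, and its handling of n == 1 and of n larger than the input is an assumption, since the reference above does not specify those cases.

```python
from typing import List, Sequence, TypeVar

T = TypeVar("T")


def sample_evenly_sketch(items: Sequence[T], n: int) -> List[T]:
    """Pick n elements at evenly spaced positions; illustrative only."""
    if n <= 0 or len(items) == 0:
        return []
    if n >= len(items):
        # Assumption: asking for more than available returns everything.
        return list(items)
    if n == 1:
        # Assumption: a single sample is taken from the middle.
        return [items[len(items) // 2]]
    step = (len(items) - 1) / (n - 1)
    return [items[round(i * step)] for i in range(n)]


print(sample_evenly_sketch(list(range(10)), 4))  # [0, 3, 6, 9]
```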
- dl_utils.data.sample.sample_randomly(input_data: List[Any] | ndarray | Tensor | Sequence[Any], n: int, ordered: bool = False, seed: int = None, put_back: bool = False) -> ndarray | List[Any] | Tensor
Randomly sample N elements from input_data. Supports list, numpy array, or torch tensor.
- Parameters:
input_data – List, numpy array, or torch tensor to sample from.
n – Number of elements to sample.
ordered – Whether to return sampled elements in the original order.
seed – Random seed for reproducibility.
put_back – If True, sample with replacement.
- Returns:
Sampled data in the same type as input_data.
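The interaction of ordered, seed, and put_back is easiest to see by sampling indices rather than elements. The following standard-library sketch is an assumption about one reasonable design, with a hypothetical helper sample_randomly_sketch that returns a list. It is not the library's implementation.

```python
import random
from typing import List, Optional, Sequence, TypeVar

T = TypeVar("T")


def sample_randomly_sketch(
    items: Sequence[T],
    n: int,
    ordered: bool = False,
    seed: Optional[int] = None,
    put_back: bool = False,
) -> List[T]:
    """Randomly pick n elements by sampling indices; illustrative only."""
    if n <= 0 or len(items) == 0:
        return []
    rng = random.Random(seed)  # local generator, global random state untouched
    if put_back:
        indices = rng.choices(range(len(items)), k=n)  # with replacement
    else:
        # Assumption: without replacement, n is capped at the input length.
        indices = rng.sample(range(len(items)), k=min(n, len(items)))
    if ordered:
        indices.sort()  # restore the original relative order
    return [items[i] for i in indices]


letters = list("abcdefgh")
first = sample_randomly_sketch(letters, 3, ordered=True, seed=0)
second = sample_randomly_sketch(letters, 3, ordered=True, seed=0)
print(first == second)  # True
```

Sorting the sampled indices, rather than the sampled values, is what keeps the original order for inputs that are not themselves sorted.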
Text
Distributed
- dl_utils.distributed.gather_objects(list_object: List[Any]) -> List[Any]
Gather a list of objects from multiple GPUs.
- dl_utils.distributed.dist_breakpoint(rank: int = 0)
Breakpoint for distributed training. Enter the breakpoint only if the current rank is rank, and block all other processes using distributed barrier.
- dl_utils.distributed.dist_info(print_fn: Callable[[str], Any] = print, prefix: str = '')
Print torch distributed information for debugging.
- dl_utils.distributed.barrier_if_distributed(*args, **kwargs)
Synchronizes all processes if under distributed context.
- dl_utils.distributed.get_global_rank() -> int
Get the global rank, the global index of the GPU.
- dl_utils.distributed.get_world_size() -> int
Get the (global) world size, the total number of GPUs.
- dl_utils.distributed.recursive_to(obj: Any, device: str | device = None) -> Any
Recursively move all torch.Tensor in obj to the given device. Supports: Tensor, list, tuple, dict, set. Leaves other objects intact.
- Parameters:
obj – The object to move.
device – The device to move to. If None, uses the current device if a GPU is available, else "cpu".
- Returns:
The object with all torch.Tensor moved to the given device.
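The traversal described above can be sketched without PyTorch by separating the recursion from the per-leaf action. In this illustration, recursive_apply is a hypothetical helper and floats stand in for tensors. With PyTorch, the leaf test would be isinstance(o, torch.Tensor) and the action o.to(device).

```python
from typing import Any, Callable


def recursive_apply(obj: Any, is_leaf: Callable[[Any], bool], fn: Callable[[Any], Any]) -> Any:
    """Apply fn to every leaf matched by is_leaf; rebuild containers, leave the rest intact."""
    if is_leaf(obj):
        return fn(obj)
    if isinstance(obj, dict):
        return {k: recursive_apply(v, is_leaf, fn) for k, v in obj.items()}
    if isinstance(obj, (list, tuple, set)):
        # Rebuild the same container type from the transformed elements.
        return type(obj)(recursive_apply(v, is_leaf, fn) for v in obj)
    return obj


# Floats stand in for tensors here so the example runs without PyTorch.
batch = {"x": [1.0, 2.0], "meta": ("id-7", 3.0), "n": 5}
moved = recursive_apply(batch, lambda o: isinstance(o, float), lambda o: o * 10)
print(moved)  # {'x': [10.0, 20.0], 'meta': ('id-7', 30.0), 'n': 5}
```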
File System
- dl_utils.fs.list_files_multithread(directory, n_jobs=16, depth: int | None = None)
List all files in a directory recursively using multiple threads. Useful for listing files on NFS.
- Parameters:
directory – The directory to search.
n_jobs – Number of parallel jobs (threads) to use.
depth – Maximum recursion depth. If None, no depth limit.
- Returns:
List of all file paths found under the directory.
- dl_utils.fs.list_files(path: str, depth: int | None = None) -> List[str]
List all files in a folder recursively.
- Parameters:
path – Root path to start the search.
depth – Maximum depth to search. If None, there is no depth limit. If 0 or less, stop searching deeper.
- Returns:
A list of file paths found under the given path.
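The depth semantics can be illustrated with a small recursive walk built on os.scandir. This sketch uses a hypothetical helper list_files_sketch and assumes that a depth of 0 or less still lists files directly under path but does not descend into subfolders. The library's exact behavior may differ.

```python
import os
import tempfile
from typing import List, Optional


def list_files_sketch(path: str, depth: Optional[int] = None) -> List[str]:
    """Recursively list files; depth <= 0 means do not descend into subfolders."""
    files: List[str] = []
    with os.scandir(path) as entries:
        for entry in entries:
            if entry.is_file():
                files.append(entry.path)
            elif entry.is_dir() and (depth is None or depth > 0):
                next_depth = None if depth is None else depth - 1
                files.extend(list_files_sketch(entry.path, next_depth))
    return files


# Build a small tree: top.txt, a/mid.txt, a/b/deep.txt
with tempfile.TemporaryDirectory() as root:
    os.makedirs(os.path.join(root, "a", "b"))
    for rel in ("top.txt", os.path.join("a", "mid.txt"), os.path.join("a", "b", "deep.txt")):
        open(os.path.join(root, rel), "w").close()
    print(len(list_files_sketch(root)))           # 3
    print(len(list_files_sketch(root, depth=0)))  # 1
    print(len(list_files_sketch(root, depth=1)))  # 2
```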