Create Dataset

In this section, we will prepare the dataset and create a dataloader.

Overall, the dataloader can be created by:

from yolo import create_dataloader
dataloader = create_dataloader(cfg.task.data, cfg.dataset, cfg.task.task, use_ddp)

For inference, the dataset will be handled by StreamDataLoader, while for training and validation, it will be handled by YoloDataLoader.

The input arguments are:

DataConfig: DataConfig, the relevant configuration for the dataloader.
DatasetConfig: DatasetConfig, the relevant configuration for the dataset.
task_name: str, the task name, which can be inference, validation, or train.
use_ddp: bool, whether to use DDP (Distributed Data Parallel). Default is False.

Train and Validation

Dataloader Return Type

For each iteration, the return type includes:

batch_size: the size of each batch, used to calculate batch average loss.
images: the input images.
targets: the ground truth of the images according to the task.
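
As a minimal sketch of why batch_size is returned alongside the data: when the last batch is smaller than the others, weighting each batch's mean loss by its batch_size gives the true per-sample average. The helper names (epoch_average_loss, mean_abs_error) and the toy batches below are hypothetical stand-ins, not part of the yolo package; plain Python lists take the place of tensors.

```python
from typing import Callable, Iterable, List, Tuple

# Hypothetical stand-in for one dataloader item: (batch_size, images, targets).
Batch = Tuple[int, List[float], List[float]]


def epoch_average_loss(
    batches: Iterable[Batch],
    loss_fn: Callable[[List[float], List[float]], float],
) -> float:
    """Return the per-sample average loss over all batches."""
    total_loss = 0.0
    total_samples = 0
    for batch_size, images, targets in batches:
        batch_mean = loss_fn(images, targets)  # mean loss over this batch
        total_loss += batch_mean * batch_size  # undo the per-batch averaging
        total_samples += batch_size
    return total_loss / max(total_samples, 1)


def mean_abs_error(predictions: List[float], targets: List[float]) -> float:
    """Toy loss: mean absolute difference between two equal-length lists."""
    return sum(abs(p - t) for p, t in zip(predictions, targets)) / len(targets)


# Two toy batches; the second one is smaller, as a final batch often is.
toy_batches = [
    (4, [1.0, 1.0, 1.0, 1.0], [0.0, 0.0, 0.0, 0.0]),  # batch mean 1.0
    (2, [4.0, 4.0], [0.0, 0.0]),                      # batch mean 4.0
]
avg = epoch_average_loss(toy_batches, mean_abs_error)
```

Averaging the two batch means directly would give 2.5, whereas the batch_size-weighted result is (1.0 * 4 + 4.0 * 2) / 6.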

Auto Download Dataset

The dataset will be auto-downloaded if the user provides the auto_download configuration. For example, if the configuration is as follows:

path: tests/data
train: train
validation: val

class_num: 80
class_list: ['Person', 'Bicycle', 'Car', 'Motorcycle', 'Airplane', 'Bus', 'Train', 'Truck', 'Boat', 'Traffic light', 'Fire hydrant', 'Stop sign', 'Parking meter', 'Bench', 'Bird', 'Cat', 'Dog', 'Horse', 'Sheep', 'Cow', 'Elephant', 'Bear', 'Zebra', 'Giraffe', 'Backpack', 'Umbrella', 'Handbag', 'Tie', 'Suitcase', 'Frisbee', 'Skis', 'Snowboard', 'Sports ball', 'Kite', 'Baseball bat', 'Baseball glove', 'Skateboard', 'Surfboard', 'Tennis racket', 'Bottle', 'Wine glass', 'Cup', 'Fork', 'Knife', 'Spoon', 'Bowl', 'Banana', 'Apple', 'Sandwich', 'Orange', 'Broccoli', 'Carrot', 'Hot dog', 'Pizza', 'Donut', 'Cake', 'Chair', 'Couch', 'Potted plant', 'Bed', 'Dining table', 'Toilet', 'Tv', 'Laptop', 'Mouse', 'Remote', 'Keyboard', 'Cell phone', 'Microwave', 'Oven', 'Toaster', 'Sink', 'Refrigerator', 'Book', 'Clock', 'Vase', 'Scissors', 'Teddy bear', 'Hair drier', 'Toothbrush']

auto_download:
  images:
    base_url: https://github.com/WongKinYiu/yolov9mit/releases/download/v1.0-alpha/
    train:
      file_name: mock_train
      file_num: 5
    val:
      file_name: mock_val
      file_num: 5
  annotations:
    base_url: https://github.com/WongKinYiu/yolov9mit/releases/download/v1.0-alpha/
    annotations:
      file_name: mock_annotations

First, it downloads and unzips each archive from {prefix}/{postfix} (built from base_url and file_name), then verifies that the extracted dataset contains {file_num} files.

Once the dataset is verified, it generates {train, validation}.cache in Tensor format, which speeds up dataset preparation on subsequent runs.
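
The download-and-verify step can be sketched roughly as follows. This is an illustration only: build_download_url and has_expected_files are hypothetical helpers, and the ".zip" suffix is an assumption based on the "download and unzip" wording, not a confirmed detail of the library. No network access is performed; the file-count check runs against a temporary folder.

```python
import tempfile
from pathlib import Path


def build_download_url(base_url: str, file_name: str, suffix: str = ".zip") -> str:
    """Join base_url and file_name; the '.zip' suffix is an assumption."""
    return f"{base_url.rstrip('/')}/{file_name}{suffix}"


def has_expected_files(folder: Path, file_num: int) -> bool:
    """Check that the folder holds exactly file_num regular files."""
    count = sum(1 for entry in folder.iterdir() if entry.is_file())
    return count == file_num


# Values taken from the auto_download configuration shown above.
url = build_download_url(
    "https://github.com/WongKinYiu/yolov9mit/releases/download/v1.0-alpha/",
    "mock_train",
)

# Simulate an unzipped dataset folder with five image files.
with tempfile.TemporaryDirectory() as tmp:
    folder = Path(tmp)
    for index in range(5):
        (folder / f"{index}.jpg").touch()
    verified = has_expected_files(folder, file_num=5)
    incomplete = has_expected_files(folder, file_num=6)
```

Checking the count before building the cache means a partial or interrupted download is caught early rather than baked into the cache file.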

Inference

In streaming mode, the model runs inference on the most recent frame and draws the bounding boxes by default; pass the save flag to also save the image. In other modes, predictions are saved to runs/inference/{exp_name}/outputs/ by default.

Dataloader Return Type

For each iteration, the return type of StreamDataLoader includes:

images: tensor, the input images.
rev_tensor: tensor, reverse tensor for reverting the bounding boxes and images to the input shape.
origin_frame: tensor, the original input image.
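
To illustrate what reverting a bounding box to the input shape can look like, here is a minimal sketch. It assumes the reverse information is a (scale, pad_x, pad_y) triple from a letterbox-style resize; the actual layout of rev_tensor is not specified here, so this layout and the revert_box helper are hypothetical.

```python
from typing import Tuple

Box = Tuple[float, float, float, float]  # (x1, y1, x2, y2)


def revert_box(box: Box, rev: Tuple[float, float, float]) -> Box:
    """Map a box from model-input coordinates back to the original image.

    Assumed (hypothetical) layout: rev = (scale, pad_x, pad_y), where the
    original image was multiplied by scale and then shifted by the padding.
    """
    scale, pad_x, pad_y = rev
    x1, y1, x2, y2 = box
    return (
        (x1 - pad_x) / scale,
        (y1 - pad_y) / scale,
        (x2 - pad_x) / scale,
        (y2 - pad_y) / scale,
    )


# A 1280x720 frame letterboxed into 640x640: scale 0.5 gives 640x360,
# so 140 pixels of padding are added above and below.
rev = (0.5, 0.0, 140.0)
box_in_input = (100.0, 240.0, 300.0, 440.0)
box_in_original = revert_box(box_in_input, rev)
```

Here the box lands at (200, 200, 600, 600) in the original 1280x720 frame.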

Input Type

Stream Input:
- webcam: int, ID of the webcam, for example, 0, 1.
- rtmp: str, RTMP address.
Single Source:
- image: Path, path to image files (jpeg, jpg, png, tiff).
- video: Path, path to video files (mp4).
Folder:
- folder of images: Path, the relative or absolute path to the folder containing images.
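
The input types listed above could be told apart with a small dispatch function like the one below. This is a sketch under assumptions: classify_source is a hypothetical helper, not the loader's actual logic, and it decides by Python type, URL scheme, and file suffix only.

```python
from pathlib import Path
from typing import Union

# Suffixes taken from the list of supported single-source files.
IMAGE_SUFFIXES = {".jpeg", ".jpg", ".png", ".tiff"}
VIDEO_SUFFIXES = {".mp4"}


def classify_source(source: Union[int, str, Path]) -> str:
    """Return 'webcam', 'rtmp', 'image', 'video', or 'folder'."""
    if isinstance(source, int):
        return "webcam"  # integer webcam ID such as 0 or 1
    text = str(source)
    if text.lower().startswith("rtmp://"):
        return "rtmp"  # checked on the raw string, before any Path handling
    path = Path(text)
    suffix = path.suffix.lower()
    if suffix in IMAGE_SUFFIXES:
        return "image"
    if suffix in VIDEO_SUFFIXES:
        return "video"
    if path.is_dir():
        return "folder"
    raise ValueError(f"Unsupported source: {source!r}")


kinds = [
    classify_source(0),
    classify_source("rtmp://example.com/live"),
    classify_source("demo/cat.JPG"),
    classify_source(Path("clip.mp4")),
    classify_source("."),
]
```

The RTMP check runs on the string form first because converting a URL to a Path collapses the double slash.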