Open Source Datasets

This article explains how to create your own dataset and where to find open-source datasets that are free to download and use.

Creating a Custom Dataset

Capture your own images with a camera, then create a label for each image that gives the bounding box and class ID of every object captured.

Option 1: Create labels for all of the images using Yolo_mark [1]. The repository and usage instructions are available at the URL given in the reference. The labels are saved in the Darknet format.

Option 2: Use Innotescus, a Pittsburgh startup working on high-performance image annotation. They offer free academic accounts to CMU students. You can upload datasets and have multiple people work on annotations. Task metrics track how many instances of each class have been annotated and show heat maps of their relative locations within an image, so you can check that the data is properly distributed.

Create a free beta account here
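
As a rough illustration of the Darknet label format mentioned in Option 1: each image gets a text file with one line per object, written as `<class_id> <x_center> <y_center> <width> <height>`, with the coordinates normalized by the image width and height. The Python sketch below converts a pixel-space bounding box into such a line and tallies class IDs across label lines, a simple stand-in for the class-distribution check described in Option 2. The function names `to_darknet_line` and `count_classes` are hypothetical helpers for this example and are not part of Yolo_mark or Innotescus.

```python
from collections import Counter


def to_darknet_line(class_id, x_min, y_min, x_max, y_max, img_w, img_h):
    """Convert a pixel-space box (corner coordinates) to one Darknet label line.

    Hypothetical helper: Darknet expects the box center and size,
    each divided by the image width or height so values fall in [0, 1].
    """
    x_center = (x_min + x_max) / 2.0 / img_w
    y_center = (y_min + y_max) / 2.0 / img_h
    width = (x_max - x_min) / img_w
    height = (y_max - y_min) / img_h
    return f"{class_id} {x_center:.6f} {y_center:.6f} {width:.6f} {height:.6f}"


def count_classes(label_lines):
    """Count how many boxes of each class ID appear in Darknet label lines.

    Hypothetical helper: blank lines are skipped, and the first field of
    every other line is read as the integer class ID.
    """
    counts = Counter()
    for line in label_lines:
        fields = line.split()
        if fields:
            counts[int(fields[0])] += 1
    return counts


if __name__ == "__main__":
    # A 200x200 pixel box of class 0 in a 640x480 image.
    line = to_darknet_line(0, 100, 200, 300, 400, 640, 480)
    print(line)
    print(count_classes([line, "1 0.5 0.5 0.2 0.2"]))
```

To check a whole dataset, the lines from every label file can be passed to `count_classes`; a heavily skewed count suggests capturing or annotating more images of the under-represented classes.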

Open-Source Datasets:

General Datasets

OpenImages

MS COCO

Labelme

ImageNet

COIL100

Image to Language:
Visual Genome
Visual QA

CIFAR-10

Specific Application Datasets:

Chess Pieces

BCCD

Mountain Dew

Pistols

Packages

6-sided dice

Boggle board

Uno Cards

Lego Bricks

YouTube

Synthetic Fruit

Fruit

Flowers:
Flower Classification 1
Flower Classification 2
Flower Classification 3

Plants:
Plant Doc
Plant Analysis

Wildfire smoke

Aerial Maritime Drone

Anki Vector Robot

Home Objects

Indoor Room Scenes:
Princeton LSUN
MIT Torralba

Places

Parking Lot

Car Models

Improved Udacity Self Driving Car

Pothole

Hard Hat

Masks

People and Animals:

Aquarium

Brackish Underwater

Raccoon

Thermal Cheetah

ASL

RPS

Human Hands

Human Faces

Celebrity Faces

Thermal Dogs and People

Dogs

Dogs and Cats

Summary

We reviewed how to create labels for custom images to build a dataset. We also reviewed where to access specific and general open-source datasets depending on your application.

References

[1] AlexeyAB (2019) Yolo_mark (Version ea049f3). https://github.com/AlexeyAB/Yolo_mark.