Open Source Datasets
This article explains how to build your own dataset and where to find open-source datasets that are free to download and use.
Creating a Custom Dataset
Capture your own images with a camera, then create a label for each image that records the bounding box and class ID of every object in it.
Option 1: Label all of the images using Yolo_mark [1]. The repository and its usage instructions are linked in the reference. The labels are written in the darknet format.
Option 2: Use Innotescus, a Pittsburgh startup working on high-performance image annotation. They offer free academic accounts to CMU students. You can upload datasets and have multiple people work on the annotations. Task metrics track how many instances of each class have been annotated and show heat maps of their relative locations within an image, so you can check that the data is properly distributed.
Create a free beta account here
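To make the darknet label format from Option 1 concrete, here is a minimal Python sketch. It assumes the usual darknet convention of one text line per object, `class_id x_center y_center width height`, with the four box values normalized by the image width and height. The function names `to_darknet_line` and `count_classes` are hypothetical helpers written for this illustration and are not part of Yolo_mark or Innotescus. The second helper tallies annotations per class, which is a rough manual version of the class-balance check described in Option 2.

```python
from collections import Counter


def to_darknet_line(class_id, x_min, y_min, x_max, y_max, img_w, img_h):
    """Convert a pixel-space bounding box to one darknet label line.

    Hypothetical helper: assumes the box is given by its top-left and
    bottom-right corners in pixels, and that img_w and img_h are the
    image dimensions in pixels.
    """
    x_center = (x_min + x_max) / 2.0 / img_w
    y_center = (y_min + y_max) / 2.0 / img_h
    width = (x_max - x_min) / img_w
    height = (y_max - y_min) / img_h
    return f"{class_id} {x_center:.6f} {y_center:.6f} {width:.6f} {height:.6f}"


def count_classes(label_lines):
    """Count how many annotated objects belong to each class ID."""
    counts = Counter()
    for line in label_lines:
        parts = line.split()
        if parts:  # skip blank lines
            counts[int(parts[0])] += 1
    return counts


if __name__ == "__main__":
    # Two objects in a 640x480 image: class 0 and class 1.
    lines = [
        to_darknet_line(0, 100, 200, 300, 400, 640, 480),
        to_darknet_line(1, 0, 0, 320, 240, 640, 480),
    ]
    for line in lines:
        print(line)
    print(dict(count_classes(lines)))
```

Each image normally gets its own label file containing one such line per object, so running `count_classes` over the lines of every label file gives a quick view of how balanced the classes are before training.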
Open-Source Datasets:
General Datasets
Image to Language:
Visual Genome
Visual QA
Specific Application Datasets:
Flowers:
Flower Classification 1
Flower Classification 2
Flower Classification 3
Plants:
Plant Doc
Plant Analysis
Indoor Room Scenes:
Princeton LSUN
MIT Torralba
Improved Udacity Self Driving Car
People and Animals:
Summary
We reviewed how to create labels for custom images to build a dataset. We also reviewed where to access specific and general open-source datasets depending on your application.
See Also:
References
[1] AlexeyAB (2019) Yolo_mark (Version ea049f3). https://github.com/AlexeyAB/Yolo_mark.