Image Annotation: Techniques, Tools & Best Practices (2026)

Q: What are the different types of image annotation?

The main types are: bounding box annotation (rectangles around objects), polygon annotation (irregular outline tracing), semantic segmentation (every pixel labelled), instance segmentation (individual object masks), keypoint annotation (landmark points), and polyline annotation (open paths for lines like lane markings).

Q: What is the difference between semantic segmentation and instance segmentation?

Semantic segmentation assigns a class label to every pixel but does not distinguish between separate objects of the same class. Instance segmentation goes further: it gives each individual object its own mask, so two cars in the same frame are labelled as separate instances.

Q: When should I use bounding box annotation?

Bounding box annotation is the right choice for object detection tasks where you need to know an object exists and roughly where it is. It is the fastest and most cost-effective annotation method.

Q: What is AI-assisted image annotation?

AI-assisted annotation uses a model like SAM 3 to automatically generate initial label suggestions that human annotators then review and correct. It reduces per-label time while keeping human judgment in the loop.

Q: What is inter-annotator agreement (IAA) and what score is good?

Inter-annotator agreement measures how consistently different annotators label the same images. An IAA of 85% or above (measured by IoU for bounding boxes) is the industry threshold before scaling up a labelling project.

Q: What image annotation export formats should my tool support?

At minimum: COCO JSON, YOLO, Pascal VOC, and Mask PNG. These cover the most widely used model training frameworks.

A practical guide to choosing the right annotation method, building an efficient workflow, and getting training data your model can actually learn from.

Quick Answer: What Is Image Annotation?

Image annotation is the process of adding structured labels to raw image data (bounding boxes, polygon outlines, segmentation masks, keypoints, or polylines) so that a machine learning model can learn to recognise, locate, or understand objects within images. The annotation technique you choose depends on the computer vision task the model is being trained to perform. Annotation quality directly determines how well the model performs in production.

Every computer vision model, from the face unlock on your phone to the warehouse robot picking the right package, was trained on annotated images. Somebody, or some combination of human annotators and AI assistance, drew bounding boxes around objects, outlined shapes with polygon tools, or labelled every pixel using segmentation techniques. That process is image annotation, and the quality of that labelled training data determines how well your model performs once it hits production.

This guide walks through every major image labelling technique, explains when to use each one, and lays out the workflow steps that consistently produce high-quality training datasets for machine learning. Whether you are setting up your first computer vision annotation project or diagnosing why a current dataset is underperforming, this is the reference to keep open.

Part of Scematics' Data Annotation Series

This guide is a cluster page within Scematics' Complete Data Annotation Guide. If you are also working with video or documents, see the Video Annotation Guide and the Text and Document Annotation Guide for modality-specific guidance.

In This Guide

What Is Image Annotation?

Image Annotation Techniques: All Six Methods Explained

Quick-Reference Technique Selector

How to Build an Image Annotation Workflow

Common Image Annotation Mistakes (and How to Avoid Them)

What to Look for in Image Annotation Software

In-House vs. Managed Annotation Service

Frequently Asked Questions (FAQ)

1. What Is Image Annotation?

Image annotation is the process of adding structured labels to raw image data so that a machine learning model can learn to recognise, locate, or understand objects and patterns within images. It is also called image labelling or image tagging depending on context, and the technique used (bounding box, polygon, segmentation, keypoint) depends on the computer vision task the model is being trained for.

Raw images carry no inherent meaning for a neural network. A JPEG of a busy street contains thousands of pixels arranged in a grid. The model cannot tell a pedestrian from a traffic light unless it has been shown thousands of labelled training examples during the supervised learning process. Every correctly labelled image you add moves the model closer to being reliable in the real world.

Core Principle: The model is only as good as the annotated data it learned from. Image annotation is where training data gets its meaning and where the ceiling of your model's real-world accuracy is set.

2. Image Annotation Techniques: All Six Methods Explained

The image annotation technique you choose should match the task your computer vision model needs to perform. Using a more complex annotation method than necessary wastes labelling time and budget. Using a simpler one than required limits what the model can learn. Here is a breakdown of every major technique in current use.

Bounding Box Annotation

A rectangle drawn around an object of interest. The output is four coordinates: x/y position of the top-left corner plus width and height. Bounding box annotation is the most widely used image labelling technique in computer vision and the foundation of most object detection datasets.

Best for: Object detection tasks such as retail shelf monitoring, vehicle detection, face detection, and package identification in logistics.

Not ideal when: Objects have irregular shapes and the bounding box would include significant background area that harms model learning.

Tip: Bounding box is the fastest and most cost-effective image labelling technique. It is the right default choice unless your use case specifically requires more precise shape definition.
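As a minimal sketch of the four-coordinate output described above, a box can be stored as its top-left corner plus width and height. The `BoundingBox` class and its method names are illustrative assumptions, not the API of any particular annotation tool.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class BoundingBox:
    """Axis-aligned box stored as top-left corner plus size, in pixels."""
    x: float       # left edge
    y: float       # top edge
    width: float
    height: float

    def corners(self) -> tuple[float, float, float, float]:
        """Return (x_min, y_min, x_max, y_max)."""
        return (self.x, self.y, self.x + self.width, self.y + self.height)

    def area(self) -> float:
        return self.width * self.height


box = BoundingBox(x=40, y=30, width=120, height=80)
print(box.corners())  # (40, 30, 160, 110)
print(box.area())     # 9600
```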

Polygon Annotation

The annotator traces the outline of an object by placing a series of connected points, producing an irregular shape that follows the actual contour much more closely than a rectangle. Preferred whenever object boundaries are irregular and precision matters for model training.

Best for: Objects with distinct edges that do not conform to a rectangular shape, such as crop plants, disease patches, construction site equipment, and manufacturing defect marking.

Trade-off: Two to four times slower per object than bounding box annotation. AI-assisted tools that snap polygon vertices to object edges reduce this gap significantly in production workflows.
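One way to see why a polygon can be worth the extra time is to compare its area with the rectangle that would enclose it. The sketch below uses the shoelace formula; the function names and the triangle are made up for illustration.

```python
def polygon_area(points):
    """Area of a simple (non-self-intersecting) polygon via the shoelace formula."""
    total = 0.0
    n = len(points)
    for i in range(n):
        x1, y1 = points[i]
        x2, y2 = points[(i + 1) % n]  # wrap around to close the shape
        total += x1 * y2 - x2 * y1
    return abs(total) / 2.0


def enclosing_box_area(points):
    """Area of the tightest axis-aligned rectangle around the polygon."""
    xs = [p[0] for p in points]
    ys = [p[1] for p in points]
    return (max(xs) - min(xs)) * (max(ys) - min(ys))


# A right triangle, with vertices traced in order around the outline.
triangle = [(0, 0), (100, 0), (0, 100)]
fill = polygon_area(triangle) / enclosing_box_area(triangle)
print(fill)  # 0.5, i.e. half of the enclosing rectangle would be background
```

A low fill ratio suggests a bounding box would hand the model a lot of background pixels for that object.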

Semantic Segmentation Annotation

Every pixel in the image is assigned a class label. If a training dataset has three classes (road, vehicle, pedestrian), every single pixel is tagged as one of those three. The model learns complete scene composition, not just object locations.

Best for: Scene understanding tasks such as autonomous driving scene parsing, satellite and aerial image analysis, and medical image annotation where tissue type throughout the image matters.

Trade-off: Most time-intensive annotation type. A single complex outdoor scene can take 30–90 minutes to segment fully. AI-assisted segmentation with SAM 3 reduces time substantially, but pixel-level labelling still carries higher cost per image than bounding box or polygon annotation.
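A semantic segmentation label can be pictured as a grid the same size as the image, where each cell holds a class index. The toy mask below is a hand-written illustration using the three example classes; real masks are usually stored as single-channel images.

```python
from collections import Counter

CLASSES = {0: "road", 1: "vehicle", 2: "pedestrian"}

# A tiny 4x6 "image": each cell holds the class index of one pixel.
mask = [
    [0, 0, 0, 0, 0, 0],
    [0, 1, 1, 0, 2, 0],
    [0, 1, 1, 0, 2, 0],
    [0, 0, 0, 0, 0, 0],
]


def class_pixel_counts(mask):
    """Count how many pixels carry each class label."""
    counts = Counter(value for row in mask for value in row)
    return {CLASSES[idx]: n for idx, n in sorted(counts.items())}


print(class_pixel_counts(mask))
# {'road': 18, 'vehicle': 4, 'pedestrian': 2}
```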

Instance Segmentation Annotation

Semantic segmentation with one important addition: it distinguishes between separate instances of the same class. In semantic segmentation, two cars in a frame are both labelled 'car.' In instance segmentation, they are labelled 'car 1' and 'car 2' with separate pixel masks, enabling the model to count, track, and individually identify objects.

Best for: Counting tasks, object tracking, and any application where individual object identity matters, such as cell counting in pharmaceutical research, crowd monitoring for security AI, and automated inventory counting in retail.
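The difference between the two segmentation types can be sketched by taking a semantic mask and giving each connected region of one class its own id. This is only an illustration of the data structure: connected-component labelling merges objects that touch or overlap, which is exactly where human annotators or a dedicated model are needed. The function name and the toy mask are assumptions.

```python
from collections import deque


def label_instances(mask, target_class):
    """Give each 4-connected region of target_class its own instance id."""
    height, width = len(mask), len(mask[0])
    instance_ids = [[0] * width for _ in range(height)]  # 0 = not an instance
    next_id = 0
    for r in range(height):
        for c in range(width):
            if mask[r][c] != target_class or instance_ids[r][c]:
                continue
            next_id += 1
            instance_ids[r][c] = next_id
            queue = deque([(r, c)])
            while queue:  # breadth-first flood fill of one region
                cr, cc = queue.popleft()
                for nr, nc in ((cr + 1, cc), (cr - 1, cc), (cr, cc + 1), (cr, cc - 1)):
                    if (0 <= nr < height and 0 <= nc < width
                            and mask[nr][nc] == target_class
                            and not instance_ids[nr][nc]):
                        instance_ids[nr][nc] = next_id
                        queue.append((nr, nc))
    return instance_ids, next_id


CAR = 1
semantic = [
    [0, 1, 1, 0, 0, 0],
    [0, 1, 1, 0, 1, 1],
    [0, 0, 0, 0, 1, 1],
]
ids, count = label_instances(semantic, CAR)
print(count)  # 2: the two separate 'car' regions become 'car 1' and 'car 2'
```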

Keypoint Annotation

Specific landmark points on an object are marked rather than its outline or bounding area. A human body keypoint dataset might mark shoulders, elbows, wrists, hips, knees, and ankles. A facial keypoint dataset marks eye corners, nose tip, and mouth corners. Keypoint labelling produces skeletal representations capturing structure and proportion.

Best for: Pose estimation, gesture recognition, facial expression analysis, and sports performance analytics; in general, any application where the relative positions of specific landmark points matter more than the overall object shape.
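Keypoints are often stored as flat (x, y, visibility) triplets, as in the COCO keypoint format, where visibility 0 means not labelled, 1 means labelled but not visible, and 2 means labelled and visible. The sketch below assumes a hypothetical four-point skeleton and invented coordinates.

```python
KEYPOINT_NAMES = ["left_shoulder", "right_shoulder", "left_elbow", "right_elbow"]

# name -> (x, y, visibility); 0 = not labelled, 1 = labelled but occluded, 2 = visible
person = {
    "left_shoulder": (210, 140, 2),
    "right_shoulder": (270, 142, 2),
    "left_elbow": (195, 200, 1),
    "right_elbow": (0, 0, 0),
}


def to_flat_keypoints(annotation, names):
    """Flatten keypoints into [x1, y1, v1, x2, y2, v2, ...] in a fixed order."""
    flat = []
    for name in names:
        flat.extend(annotation[name])
    return flat


flat = to_flat_keypoints(person, KEYPOINT_NAMES)
num_labelled = sum(1 for v in flat[2::3] if v > 0)
print(flat)          # [210, 140, 2, 270, 142, 2, 195, 200, 1, 0, 0, 0]
print(num_labelled)  # 3
```

Keeping the order of `KEYPOINT_NAMES` fixed across the dataset is what lets a model treat position in the list as identity of the landmark.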

Polyline Annotation

Polylines are open paths, unlike the closed shapes that polygon annotation produces. The annotator traces a line rather than enclosing an area, capturing the direction and position of linear structures.

Best for: Lane marking detection for autonomous vehicle training data, road centreline annotation for HD map creation, cable and pipe detection in utility inspection, crack detection in structural and civil engineering inspection.
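The open-path property shows up directly in code: a polyline is an ordered list of points where the last point is not joined back to the first. A minimal sketch, with invented coordinates:

```python
import math


def polyline_length(points):
    """Total length of an open path: the last point is not joined to the first."""
    return sum(math.dist(a, b) for a, b in zip(points, points[1:]))


lane_marking = [(0, 0), (3, 4), (3, 10)]
print(polyline_length(lane_marking))  # 11.0
```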

3. Quick-Reference: Choosing the Right Image Annotation Technique

Use this table when scoping a new computer vision annotation project. Match the model task to the technique, then size the workflow for the complexity that technique requires.

Technique	Use When	Example Use Case
Bounding Box	Object location, roughly rectangular	Vehicle detection, retail product detection
Polygon	Irregular shapes, precise outlines needed	Crop disease zones, construction equipment
Semantic Segmentation	Full scene understanding, every pixel matters	Autonomous driving, medical imaging
Instance Segmentation	Count or track individual objects separately	Cell counting, crowd monitoring
Keypoint	Landmark positions matter more than shape	Pose estimation, facial landmark detection
Polyline	Path or line structure, not a closed shape	Lane detection, structural crack detection

4. How to Build an Image Annotation Workflow That Produces Quality Training Data

Picking the right image annotation technique is step one. Building a workflow around it that consistently produces high-quality labelled training data is what determines whether your dataset is actually useful for machine learning. Here are the five steps that matter, in the order they need to happen.

Step 1: Define Your Class Schema Before Anything Else

The class schema is the list of annotation labels your annotators will apply. Every label needs a precise definition. The test for a good definition: two annotators working independently should reach the same conclusion on any ambiguous case. Document edge cases explicitly: 'does a forklift count as a vehicle?' should not be answered differently by different annotators.

Complete these four steps before annotation begins:

List every class the model needs to detect or classify.

Write a one- to two-sentence definition for each annotation label.

Collect five to ten real examples of edge cases per class and document exactly how they should be labelled.

Review the labelling schema with whoever will use the model output. Misalignments between annotator labels and downstream team expectations are common and expensive to fix after a large dataset has been produced.
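The checklist above can be partly automated. The sketch below is one possible way to keep a class schema in code and check it for gaps before annotation starts; `ClassDefinition`, `schema_problems`, and the example answer to the forklift question are all hypothetical.

```python
from dataclasses import dataclass, field


@dataclass
class ClassDefinition:
    name: str
    definition: str
    edge_cases: list[str] = field(default_factory=list)


def schema_problems(schema, min_edge_cases=5):
    """Return human-readable problems; an empty list means the schema passes."""
    problems = []
    names = [c.name for c in schema]
    for name in sorted(set(names)):
        if names.count(name) > 1:
            problems.append(f"duplicate class: {name}")
    for c in schema:
        if not c.definition.strip():
            problems.append(f"{c.name}: missing definition")
        if len(c.edge_cases) < min_edge_cases:
            problems.append(f"{c.name}: only {len(c.edge_cases)} edge cases documented")
    return problems


schema = [
    ClassDefinition("vehicle", "Any motorised road vehicle, including forklifts.",
                    ["forklift: label as vehicle"]),  # illustrative ruling only
    ClassDefinition("pedestrian", ""),
]
for problem in schema_problems(schema):
    print(problem)
```

The human review with the downstream team still has to happen; a check like this only catches missing definitions and thin edge-case documentation.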

Step 2: Run a Calibration Batch Before Full-Scale Annotation

Before assigning your full image dataset to annotators, run a small calibration batch of 50–100 images through the full annotation workflow. Have multiple annotators label the same images independently, then compare results to measure inter-annotator agreement (IAA).

For bounding box tasks, use IoU (intersection over union) to measure annotator agreement. For classification tasks, use percentage agreement. A target of 85% or above is the industry threshold before scaling up a labelling project. Below 85% is a signal to revisit your annotation guidelines.
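Both agreement measures are short to compute. The sketch below assumes boxes are given as (x, y, width, height) and that the two annotators' labels are aligned item by item; the function names and sample values are illustrative.

```python
def iou(box_a, box_b):
    """Intersection over union for two (x, y, width, height) boxes."""
    ax, ay, aw, ah = box_a
    bx, by, bw, bh = box_b
    inter_w = max(0.0, min(ax + aw, bx + bw) - max(ax, bx))
    inter_h = max(0.0, min(ay + ah, by + bh) - max(ay, by))
    intersection = inter_w * inter_h
    union = aw * ah + bw * bh - intersection
    return intersection / union if union > 0 else 0.0


def percent_agreement(labels_a, labels_b):
    """Share of items on which two annotators chose the same class."""
    matches = sum(a == b for a, b in zip(labels_a, labels_b))
    return matches / len(labels_a)


annotator_1 = (10, 10, 100, 100)
annotator_2 = (20, 10, 100, 100)  # same box, shifted 10 px to the right
print(round(iou(annotator_1, annotator_2), 3))  # 0.818
print(percent_agreement(["car", "van", "car", "bus"],
                        ["car", "truck", "car", "bus"]))  # 0.75
```

In this example a 10-pixel shift on a 100-pixel box already drops IoU below 0.85, which is why padding rules in the guidelines matter.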

Why This Matters: A workflow problem discovered at image 50 costs a fraction of one discovered at image 5,000. The calibration batch is your quality insurance before full-scale labelling begins.

Step 3: Use AI-Assisted Annotation for Speed Without Sacrificing Control

AI-assisted image annotation tools generate initial labels that human annotators then review, adjust, and approve. The AI handles mechanical drawing tasks. The human annotator handles judgment calls that define label quality.

Scematics' image annotation platform uses SAM 3 (Meta's Segment Anything Model, third generation) as its AI-assisted annotation engine. An annotator clicks a point inside an object, or draws a rough bounding box around it, and SAM 3 generates a precise segmentation mask covering the object boundary.

For teams that have already fine-tuned their own model on domain-specific image data, Scematics also supports bring-your-own-model (BYOM) workflows, where a custom model serves as the pre-annotation engine instead of or alongside SAM 3. This is especially useful in specialist domains like pharmaceutical imaging or industrial defect inspection.

Important: AI-assisted annotation reduces per-label time significantly, but requires human review on every output. The annotator's role shifts from drawing to verifying, which is faster but still essential for training data quality.
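The "human review on every output" rule can be enforced in the data model itself. The sketch below is a hypothetical review queue, not the Scematics or SAM 3 API: machine-generated labels start as pending, and only those a human has approved or corrected are exported.

```python
from dataclasses import dataclass


@dataclass
class PreAnnotation:
    image_id: str
    label: str
    source: str = "model"    # produced by the pre-annotation engine
    status: str = "pending"  # pending -> approved / corrected / rejected


def review(item, decision, corrected_label=None):
    """Record a human decision on one machine-generated label."""
    if decision == "correct":
        item.label = corrected_label
        item.status = "corrected"
    elif decision == "approve":
        item.status = "approved"
    else:
        item.status = "rejected"
    return item


def exportable(items):
    """Only labels a human has approved or corrected reach the dataset."""
    return [i for i in items if i.status in ("approved", "corrected")]


queue = [PreAnnotation("img_001", "van"), PreAnnotation("img_002", "car"),
         PreAnnotation("img_003", "car")]
review(queue[0], "correct", corrected_label="truck")
review(queue[1], "approve")
# img_003 has not been reviewed, so it stays out of the export.
print([(i.image_id, i.label) for i in exportable(queue)])
# [('img_001', 'truck'), ('img_002', 'car')]
```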

Step 4: Build a Two-Stage Review Process

A single annotation pass is not enough for production-quality training datasets. Build at least two review stages into your image labelling workflow.

Stage 1 (annotator self-review): Annotators check their own work before submitting. This catches simple errors like missed objects or labels applied to the wrong class.

Stage 2 (independent QA review): A separate reviewer checks a sample of completed annotations. This is where systematic errors are caught, the kind where a labeller is consistently doing something wrong across many images. Catching a systematic error at stage two is far cheaper than discovering it when your model fails to generalise on validation data.
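For the independent QA stage, one reasonable approach is to draw the review sample per annotator, so that a systematic error by any single labeller has a chance of being seen. The 10% rate, the function name, and the annotator names below are assumptions for illustration.

```python
import random
from collections import defaultdict


def qa_sample(annotations, rate=0.1, seed=0):
    """Pick a per-annotator random sample so every labeller gets reviewed."""
    rng = random.Random(seed)  # fixed seed keeps the sample reproducible
    by_annotator = defaultdict(list)
    for image_id, annotator in annotations:
        by_annotator[annotator].append(image_id)
    sample = {}
    for annotator, image_ids in by_annotator.items():
        k = max(1, round(len(image_ids) * rate))  # at least one image each
        sample[annotator] = sorted(rng.sample(image_ids, k))
    return sample


completed = [(f"img_{i:03d}", "alice" if i % 2 else "bob") for i in range(40)]
sample = qa_sample(completed, rate=0.1)
print({name: len(ids) for name, ids in sample.items()})
```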

Step 5: Export in the Format Your Training Pipeline Expects

Image annotation files need to be in the export format your model training framework accepts. Choosing the wrong format and converting manually is a common source of label corruption in computer vision pipelines.

Format	Used For
COCO JSON	Segmentation and detection labelling; the most widely supported format across frameworks
YOLO	Text files with normalised bounding box coordinates; common for YOLO model variants
Pascal VOC	XML format used in classic object detection benchmarks (Faster R-CNN, SSD)
COCO Segmentation	Instance segmentation tasks with polygon masks
Mask PNG	Pixel-level semantic segmentation maps
CreateML JSON	Apple's CreateML framework
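The three bounding box formats in the table differ mainly in how they encode the same rectangle: COCO uses pixel [x_min, y_min, width, height], YOLO uses centre coordinates and size normalised by image dimensions, and Pascal VOC uses pixel corner coordinates. A minimal conversion sketch follows, with illustrative function names; check your framework's expectations (for example around 0-based versus 1-based pixel indexing in VOC files) before relying on it.

```python
def coco_to_yolo(bbox, image_width, image_height):
    """COCO [x_min, y_min, width, height] in pixels -> YOLO normalised centre format."""
    x, y, w, h = bbox
    return (
        (x + w / 2) / image_width,
        (y + h / 2) / image_height,
        w / image_width,
        h / image_height,
    )


def coco_to_voc(bbox):
    """COCO [x_min, y_min, width, height] -> Pascal VOC (xmin, ymin, xmax, ymax)."""
    x, y, w, h = bbox
    return (x, y, x + w, y + h)


bbox = [160, 120, 320, 240]          # on a 640x480 image
print(coco_to_yolo(bbox, 640, 480))  # (0.5, 0.5, 0.5, 0.5)
print(coco_to_voc(bbox))             # (160, 120, 480, 360)

# One line of a YOLO label file: class index followed by the four values.
class_index = 0
yolo_line = f"{class_index} " + " ".join(f"{v:.6f}" for v in coco_to_yolo(bbox, 640, 480))
print(yolo_line)  # 0 0.500000 0.500000 0.500000 0.500000
```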

5. Common Image Annotation Mistakes and How to Avoid Them

Most image annotation quality problems trace back to a small set of recurring mistakes. Knowing them before starting a labelling project saves time and avoids producing a training dataset you will need to re-annotate later.

Mistake	Why It Happens	How to Fix It
Inconsistent label boundaries	Vague guidelines allow different annotators to use different amounts of bounding-box padding	Specify exact padding rules in annotation guidelines, e.g. 'boxes should be tight with 2px margin'
Ignoring occluded objects	Annotators skip objects partially hidden behind other objects	Guidelines must specify minimum visible percentage before labelling is required
Labels drift over time	Annotator interpretation gradually shifts on long projects	Run regular calibration checks against the original calibration batch throughout the project
Skipping small objects	Small objects are easy to miss in busy scenes	Call out small object classes explicitly; add a QA review pass focused on dense scenes
Class confusion on similar categories	Visually similar classes (van vs. truck, bruise vs. lesion) are high-confusion pairs	Provide visual examples of each class and a higher QA rate for images where those classes appear together
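Two of the fixes in the table, a focused pass on small objects and a higher QA rate where confusable classes appear together, can be driven by simple automatic flags. The sketch below is one hypothetical way to do that; the 32x32 pixel cut-off and the flag names are assumptions.

```python
CONFUSION_PAIRS = [{"van", "truck"}, {"bruise", "lesion"}]
MIN_BOX_AREA = 32 * 32  # illustrative cut-off for a "small" object, in pixels


def review_flags(image_annotations):
    """image_annotations: list of (label, (x, y, width, height)) for one image."""
    labels = {label for label, _ in image_annotations}
    flags = []
    # Flag images where both members of a high-confusion pair are present.
    if any(pair <= labels for pair in CONFUSION_PAIRS):
        flags.append("confusion_pair")
    # Flag images containing at least one very small box.
    if any(w * h < MIN_BOX_AREA for _, (_, _, w, h) in image_annotations):
        flags.append("small_object")
    return flags


image = [("van", (10, 10, 200, 120)), ("truck", (300, 40, 260, 180)),
         ("van", (600, 20, 20, 14))]
print(review_flags(image))  # ['confusion_pair', 'small_object']
```

Note that a flag like this cannot find small objects an annotator skipped entirely; it only routes already-labelled images to a closer review.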

6. What to Look for in Image Annotation Software

The image annotation tool you use shapes the speed, quality, and cost of your labelling work. These are the features that matter most when evaluating image annotation software for a production computer vision pipeline:

Support for all annotation types: You should not need different tools for bounding boxes, polygons, segmentation, and keypoints. A single platform that handles all of them reduces onboarding friction and keeps your labelled dataset management in one place.

AI-assisted pre-annotation: Look for native SAM 3 integration or the ability to connect your own fine-tuned model as the pre-annotation engine. Both matter depending on your domain specificity.

Configurable review workflows: The platform should let you define how many review stages you need and route labelled images through them automatically, without manual handoffs.

Real-time quality metrics: You should be able to see inter-annotator agreement rates, annotation task completion times, and labelling error rates without exporting to a separate analytics tool.

Format-flexible dataset export: COCO, YOLO, Pascal VOC, and Mask PNG at minimum. Proprietary export-only formats create migration problems when you need to move labelled training data to a different framework.

Access control and data security: Role-based access controls and the ability to restrict which annotators see which images within a labelling project. Critical for sensitive image data.

Scematics Image Annotation Software: Scematics covers all of the above: all annotation types, native SAM 3 AI assistance with BYOM support, configurable review workflows, real-time quality metrics, six export formats (COCO, YOLO, Pascal VOC, YOLO Darknet, CreateML JSON, Mask PNG), and enterprise-grade access controls.

7. In-House vs. Managed Annotation Service

Running your own image annotation team makes sense when your image data is highly sensitive, when the labelling task requires specialised domain knowledge you have in-house, or when image annotation is a continuous and central part of your product workflow.

Outsourcing to a managed data annotation service makes sense when you need to scale quickly, when your in-house team does not have annotation expertise, or when you have a time-limited computer vision project with a data volume your team cannot absorb.

Factor	In-House Team	Managed Service
Data sensitivity	Full control; data never leaves your environment	Requires vendor data handling agreement; evaluate carefully
Speed to scale	Limited by hiring and onboarding time	Immediate scale-up; vendor provides annotators on demand
Domain expertise	Strong if in-house team has the domain knowledge	Specialist services (e.g. Scematics) have CGI + domain-trained annotators
Cost model	Lower per-annotation; higher fixed overhead	Higher per-annotation; zero fixed overhead
Quality control	Dependent on internal QA processes	Vendor QA frameworks handle consistency across large projects

Most computer vision teams end up using a combination depending on project type, data sensitivity, and annotation volume. Scematics offers both: the self-serve platform for in-house teams, and a managed labelling service backed by annotators with 15+ years of CGI experience.

8. Frequently Asked Questions (FAQ)

What is image annotation?

Image annotation is the process of adding structured labels to raw image data, using techniques like bounding boxes, polygon outlines, segmentation masks, or keypoints, so that a machine learning model can learn to recognise, locate, or understand objects within images. It is also called image labelling or image tagging. The annotation technique used depends on the computer vision task the model is being trained to perform.

What are the different types of image annotation?

The six main image annotation types are: (1) bounding box (rectangles drawn around objects for object detection); (2) polygon annotation (irregular outlines for objects with complex shapes); (3) semantic segmentation (every pixel assigned a class label for scene understanding); (4) instance segmentation (individual object masks that distinguish between separate instances of the same class); (5) keypoint annotation (landmark points for pose estimation and facial recognition); (6) polyline annotation (open paths for lane markings and linear structures).

What is the difference between semantic segmentation and instance segmentation?

Semantic segmentation assigns a class label to every pixel in an image but treats all objects of the same class identically: two cars are both labelled 'car' with no distinction between them. Instance segmentation goes further: it assigns each individual object its own pixel mask, so two cars in the same frame are labelled 'car 1' and 'car 2' separately. Use semantic segmentation for scene understanding; use instance segmentation when counting or tracking individual objects.

When should I use bounding box annotation vs. polygon annotation?

Use bounding box annotation when objects are roughly rectangular and precise shape boundaries are not critical for model performance; it is the fastest and most cost-effective annotation method. Use polygon annotation when objects have irregular outlines (crop plants, medical lesions, construction equipment) and a bounding box would include significant background area that would harm model learning. Polygon annotation is two to four times slower per object but significantly more precise.

What is AI-assisted image annotation?

AI-assisted image annotation uses a foundation model such as Meta's SAM 3 to automatically generate initial label suggestions (segmentation masks or bounding boxes) that human annotators then review, correct, and approve. It shifts the annotator's role from drawing to verifying, which is significantly faster while keeping human judgment in the loop for every final label decision. AI assistance does not replace human review; it reduces the time required per label.

What is inter-annotator agreement (IAA) and what score is good?

Inter-annotator agreement (IAA) measures how consistently different human annotators label the same images. For bounding box annotation, it is typically measured using IoU (intersection over union). For classification labelling, percentage agreement is used. An IAA of 85% or above is the industry threshold before scaling up a labelling project. Below 85% is a signal to revisit your annotation guidelines and run additional calibration before producing a large dataset.

What image annotation export formats should my tool support?

At minimum, your image annotation tool should support COCO JSON (the most widely used format for segmentation and detection), YOLO (for YOLO model variants), Pascal VOC (XML for classic object detection benchmarks), and Mask PNG (for semantic segmentation). Proprietary or export-only formats create migration problems when you need to move labelled training data to a different framework.

Start Annotating with Scematics

Scematics gives your team bounding boxes, polygons, semantic segmentation, instance segmentation, keypoints, polylines, and AI-assisted labelling with SAM 3, all in a single platform. Both self-serve and managed labelling options are available.

Scematics Copyrights Reserved