Step 2: Track Regions of Interest

The 2. Tracking ROIs tab is the most complex part of CASTLE. It contains two major sections: Single Video Tracking (for building and refining your ROI prompts) and Batch Videos Tracking (for scaling to multiple videos).

Overview

The tracking workflow has two phases:

Phase 1: Single Video Tracking (Build Your ROI Prompts)

1. Label ROIs on reference frames using SAM
2. Track across all frames using DeAOT
3. Review results and fix errors iteratively

Phase 2: Batch Video Tracking (Scale Up)

Once prompts work well on one video, apply them to all videos in the project.

Getting Started

1. Switch to the 2. Tracking ROIs tab
2. Select a video from the Select Video dropdown
3. Click Edit to load the video

[Figure: Select video for tracking]

Single Video Tracking

Label ROI

The Label ROI sub-tab is where you define what to track using point-and-click segmentation.

1. Navigate to the frame you want to label using the frame slider
2. Click on the animal (or body part) to create a segmentation mask
   - SAM generates a mask from your click
   - Click mode defaults to Add, so each click refines the mask
   - Use Change Mode to switch to Remove mode for excluding regions
3. To track multiple ROIs (e.g., body + head), click Label Next ROI to start a new ROI
4. Click Save ROIs when satisfied

[Figure: Label ROI with SAM]

What is an ROI?

An ROI (Region of Interest) is a segmented area you want to track. Common examples:

- Body: the entire animal body, e.g., for computing the body centroid
- Head: for head direction analysis
- Tail: used as orientation reference for video alignment
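
To make these examples concrete, here is a minimal sketch of how a body centroid and a tail-based orientation could be derived from ROI masks. It assumes each mask is a 2D boolean NumPy array; `mask_centroid` and `orientation_deg` are hypothetical helper names for illustration, not part of CASTLE.

```python
import math
import numpy as np

def mask_centroid(mask):
    """Return the (x, y) centroid of a binary mask, or None if the mask is empty."""
    ys, xs = np.nonzero(mask)
    if xs.size == 0:
        return None
    return float(xs.mean()), float(ys.mean())

def orientation_deg(tail_xy, body_xy):
    """Angle in degrees of the tail-to-body vector (image coordinates, y points down)."""
    dx = body_xy[0] - tail_xy[0]
    dy = body_xy[1] - tail_xy[1]
    return math.degrees(math.atan2(dy, dx))

# Toy 6x8 frame (assumed data): body on the right, tail on the left edge.
body = np.zeros((6, 8), dtype=bool)
body[2:4, 4:8] = True
tail = np.zeros((6, 8), dtype=bool)
tail[2:4, 0:2] = True

body_xy = mask_centroid(body)
tail_xy = mask_centroid(tail)
angle = orientation_deg(tail_xy, body_xy)
```

In this toy frame the tail lies directly to the left of the body, so the tail-to-body vector points along the positive x axis.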

Label on Multiple Frames

Label ROIs on multiple frames, especially frames where the animal's appearance changes significantly (different postures, partial occlusion, etc.). The more diverse your labels, the more robust the tracking.

ROI Prompts

The ROI Prompts sub-tab displays a gallery of all saved labels across the project. This is where you can review the diversity of your prompt set.

- Labels are loaded from .npz files in the project's label/ directory
- Each entry shows the frame with the segmentation mask overlaid
- Labels from all videos in the project are shown together
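
Because the labels are ordinary .npz archives, they can also be inspected outside the GUI. The sketch below lists every .npz file under a project's label/ directory together with the names and shapes of the arrays it contains. The array names inside CASTLE's files are not documented here, so the code discovers them rather than assuming them; `summarize_labels` is a hypothetical helper name.

```python
from pathlib import Path
import numpy as np

def summarize_labels(project_dir):
    """Map each .npz file in <project_dir>/label/ to {array name: array shape}."""
    summary = {}
    for path in sorted(Path(project_dir, "label").glob("*.npz")):
        # np.load on an .npz returns a lazy archive; .files lists its array names.
        with np.load(path) as data:
            summary[path.name] = {name: data[name].shape for name in data.files}
    return summary

# Example usage (the project path is a placeholder):
# for filename, arrays in summarize_labels("path/to/project").items():
#     print(filename, arrays)
```

A quick listing like this can help confirm how many prompts a project holds before running tracking.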

Prompt Guidelines

- Diversity = Stability: varied examples improve tracking robustness
- Fewer = Faster: fewer prompts mean faster execution
- Sweet spot: 5–30 prompts usually balance robustness and speed

Tracking

The Tracking sub-tab runs DeAOT to propagate masks across frames.

1. Select a tracking model:
   - R50 (ResNet-50): faster, good for most cases
   - SwinB (Swin Transformer-B): more accurate, better for challenging videos
2. Set Start Frame and Stop Frame (defaults to the full video)
3. Optionally check Skip existing to avoid re-processing
4. Click to start tracking
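
As a rough mental model of how the frame range and Skip existing interact, the sketch below computes which frames a run would visit. This is an illustration under stated assumptions, not CASTLE's implementation: frames are taken to be 0-indexed, the stop frame is treated as exclusive, and "existing" means frames that already have a saved mask. `frames_to_track` is a hypothetical helper.

```python
def frames_to_track(total_frames, start=0, stop=None, existing=(), skip_existing=False):
    """Return the frame indices a tracking run would visit.

    Assumptions (not confirmed by CASTLE): frames are 0-indexed, `stop` is
    exclusive, and `existing` holds indices that already have a saved mask.
    """
    # Default to the full video, and never run past the last frame.
    stop = total_frames if stop is None else min(stop, total_frames)
    if not 0 <= start <= stop:
        raise ValueError("start must satisfy 0 <= start <= stop")
    done = set(existing) if skip_existing else set()
    return [i for i in range(start, stop) if i not in done]

full_run = frames_to_track(10)
partial = frames_to_track(10, start=4, stop=8, existing=[4, 5], skip_existing=True)
```

Here `full_run` covers the whole 10-frame video, while `partial` covers only the frames in the 4 to 8 range that have no saved result yet.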

[Figure: Run tracking]

Monitor in Real-Time

Watch the tracking progress carefully. If you see errors (mask drifting, losing the animal), cancel immediately — don't waste time on bad tracking. Instead:

1. Go back to Label ROI
2. Label the frame where tracking failed
3. Adjust the Start Frame in Tracking settings
4. Re-run tracking from that point
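
If per-frame mask sizes are available, the frame to relabel and restart from can also be located programmatically. The sketch below returns the first frame whose mask area drops below a threshold, which is one simple proxy for "losing the animal". The area list, the threshold, and the helper name `first_failure_frame` are all assumptions for illustration.

```python
def first_failure_frame(mask_areas, start_frame=0, min_area=1):
    """Return the first frame index whose mask area is below min_area, else None.

    mask_areas[i] is the pixel count of the tracked mask at frame start_frame + i.
    """
    for offset, area in enumerate(mask_areas):
        if area < min_area:
            return start_frame + offset
    return None

# Toy per-frame pixel counts for a run that started at frame 100 (assumed data).
areas = [820, 815, 790, 12, 0, 0]
restart_at = first_failure_frame(areas, start_frame=100, min_area=50)
```

The returned index is a candidate for both the new label and the new Start Frame; it is worth checking the frames just before it by eye, since drift often begins earlier than the point where the mask collapses.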

Display Timing

During real-time monitoring, the displayed frame and mask may appear off by one frame. This is normal due to processing timing. Verify final results in the View tab after tracking completes.

View

The View sub-tab lets you browse tracking results frame by frame. Use the slider to scrub through frames and verify that masks are accurate.
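
For long videos, a scripted sanity check can narrow down which frames deserve a manual look. The sketch below flags frames where the mask is empty or where the mask centroid jumps by more than a chosen number of pixels between consecutive frames. It assumes the masks are available as 2D boolean NumPy arrays; `flag_suspicious_frames`, `square_mask`, and the 20-pixel threshold are hypothetical choices, not CASTLE features.

```python
import numpy as np

def flag_suspicious_frames(masks, max_jump=20.0):
    """Return indices of frames with an empty mask or a centroid jump > max_jump pixels."""
    flagged = []
    prev = None
    for i, mask in enumerate(masks):
        ys, xs = np.nonzero(mask)
        if xs.size == 0:
            flagged.append(i)
            prev = None  # no reference position after a lost frame
            continue
        cur = np.array([xs.mean(), ys.mean()])
        if prev is not None and np.linalg.norm(cur - prev) > max_jump:
            flagged.append(i)
        prev = cur
    return flagged

def square_mask(x, size=3, shape=(20, 60)):
    """Toy mask: a small square whose left edge is at column x."""
    m = np.zeros(shape, dtype=bool)
    m[5:5 + size, x:x + size] = True
    return m

# Toy sequence: smooth motion, a sudden jump, a lost frame, then recovery.
masks = [square_mask(5), square_mask(6), square_mask(40),
         np.zeros((20, 60), dtype=bool), square_mask(41)]
flagged = flag_suspicious_frames(masks)
```

In the toy sequence, frame 2 is flagged for the jump and frame 3 for the empty mask; frame 4 is not flagged because there is no valid previous position to compare against.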

Analysis (Post-Processing)

The Analysis sub-tab provides tools for reviewing and analyzing tracking quality after tracking is complete.

Batch Videos Tracking

Once your ROI prompts successfully track one video, use Batch Videos Tracking to process all videos in the project.

Switch to the Batch Videos Tracking sub-tab. During a batch run:

- The system uses all saved ROI prompts from the project
- Each video is tracked independently using the shared prompt set
- Progress is displayed for each video

[Figure: Batch tracking]

Iterative Refinement

If batch tracking fails on some videos, add labels from those failure frames and re-run. Your prompt set becomes more robust over time.
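
The batch behavior and the refinement loop described above can be sketched as a simple loop: every video is processed independently with the same prompt set, and failures are collected so they can guide the next round of labeling. This is a conceptual sketch only; `track_batch` and `fake_track` are hypothetical names, and the tracker is passed in as a plain callable rather than calling any real CASTLE API.

```python
def track_batch(videos, prompts, track_fn):
    """Run track_fn(video, prompts) on each video; return (results, failures)."""
    results, failures = {}, {}
    for n, video in enumerate(videos, start=1):
        print(f"[{n}/{len(videos)}] tracking {video}")
        try:
            results[video] = track_fn(video, prompts)
        except Exception as exc:  # one bad video should not stop the batch
            failures[video] = str(exc)
    return results, failures

def fake_track(video, prompts):
    """Stand-in tracker for demonstration only."""
    if "bad" in video:
        raise RuntimeError("lost the animal")
    return f"{len(prompts)} prompts applied"

results, failures = track_batch(["a.mp4", "bad.mp4"], ["p1", "p2"], fake_track)
```

The `failures` mapping lists the videos that need additional labels before the next batch run.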

Tips and Best Practices

- Start with one video: build a solid prompt set before scaling to batch
- Quality over quantity: a few well-chosen prompts across diverse frames work better than many similar ones
- R50 vs SwinB: start with R50 for speed; switch to SwinB if tracking quality is insufficient
- Frame selection: label frames with different postures, lighting conditions, and occlusion scenarios
- Cancel early: if tracking starts failing, cancel and add more labels rather than waiting for it to finish

Next Step

Once all videos are tracked, proceed to Step 3: Extract Features.