Step 3: Extract Latent Features

The "3. Extract Latent" tab uses visual foundation models to extract feature representations from tracked ROIs. For each frame, these features encode the animal's posture and movement as a single high-dimensional vector.

Overview

Feature extraction transforms your tracked video into numerical data suitable for clustering:

Tracked Video (frames + masks) → Preprocessing → Visual Model → Latent Vectors (.npz)

Configuration

When you switch to the Extract Latent tab, the interface shows three columns:

Model & Target Settings (Left Column)

Parameter	Description	Default
Select Visual Model	Feature extraction backbone	`dinov2_vitb14_reg4_pretrain`
Enter ROI ID	Which tracked ROI to extract features from	`1`
Batch size	Frames processed per batch (increase if VRAM allows)	`32`
Select Target Video	Specific video or "All"	`All`
Skip existing files	Don't re-extract if output already exists	✅ Enabled

Available models:

dinov2_vitb14_reg4_pretrain — Meta's DINOv2 ViT-B/14 with registers (default, well-tested)
dinov3_vitb16 — DINOv3 ViT-B/16 (newer, potentially better representations)
dinov3_vitl16 — DINOv3 ViT-L/16 (larger model, higher quality but slower)
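
To make the batch size setting concrete, here is a minimal sketch of how a video's frames could be split into batches before being passed to the model. The helper `iter_batches` is hypothetical and is not taken from CASTLE's code; it only illustrates why a larger batch size means fewer model calls and more VRAM used per call.

```python
from typing import Iterator, Tuple


def iter_batches(n_frames: int, batch_size: int = 32) -> Iterator[Tuple[int, int]]:
    """Yield (start, stop) frame index ranges that cover n_frames in order.

    Hypothetical helper for illustration; CASTLE's internal batching may differ.
    """
    if batch_size <= 0:
        raise ValueError("batch_size must be positive")
    for start in range(0, n_frames, batch_size):
        # The last batch may be shorter than batch_size.
        yield start, min(start + batch_size, n_frames)


batches = list(iter_batches(100, batch_size=32))
print(batches)  # [(0, 32), (32, 64), (64, 96), (96, 100)]
```

With the default of 32, a 100-frame clip needs four forward passes; raising the batch size to 128 would reduce that to one, provided the GPU has enough memory.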

Preprocessing Settings (Middle Column)

These settings control how frames are preprocessed before feature extraction:

Parameter	Description	Default
Center ROI	Crop frames centered on a reference ROI	`False`
Center ROI ID	Which ROI to center on	`1`
Width / Height	Crop dimensions in pixels	`300 × 300`
Rotate based on Tail	Normalize orientation using a tail ROI	`False`
Tail ROI ID	Which ROI defines the tail direction	`2`
Remove Background	Mask out pixels outside the ROI	`False`
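
The settings above can be pictured with a small NumPy sketch. It assumes each ROI is available as a boolean mask per frame; the function names (`mask_centroid`, `center_crop`, `remove_background`, `heading_angle`) are hypothetical and only illustrate the idea, not CASTLE's actual implementation. The rotation step is reduced to computing the tail-to-body angle, which an image rotation routine would then use to normalize orientation.

```python
import numpy as np


def mask_centroid(mask):
    """Return the (row, col) centroid of a boolean mask, or None if it is empty."""
    rows, cols = np.nonzero(mask)
    if rows.size == 0:
        return None
    return float(rows.mean()), float(cols.mean())


def center_crop(frame, center, width=300, height=300):
    """Crop a frame to height x width around center, zero-padding past the borders."""
    cy, cx = int(round(center[0])), int(round(center[1]))
    top, left = cy - height // 2, cx - width // 2
    out = np.zeros((height, width) + frame.shape[2:], dtype=frame.dtype)
    # Clip the requested window to the part that lies inside the frame.
    y0, y1 = max(top, 0), min(top + height, frame.shape[0])
    x0, x1 = max(left, 0), min(left + width, frame.shape[1])
    if y0 < y1 and x0 < x1:
        out[y0 - top:y1 - top, x0 - left:x1 - left] = frame[y0:y1, x0:x1]
    return out


def remove_background(frame, mask):
    """Zero out every pixel of an (H, W, C) frame that lies outside the ROI mask."""
    return np.where(mask[..., None], frame, 0).astype(frame.dtype)


def heading_angle(body_center, tail_center):
    """Angle in degrees of the tail-to-body vector, in image coordinates."""
    dy = body_center[0] - tail_center[0]
    dx = body_center[1] - tail_center[1]
    return float(np.degrees(np.arctan2(dy, dx)))


# Synthetic frame with a body ROI and a tail ROI directly to its left.
frame = np.full((100, 120, 3), 200, dtype=np.uint8)
body = np.zeros((100, 120), dtype=bool)
body[40:60, 50:70] = True
tail = np.zeros((100, 120), dtype=bool)
tail[45:55, 20:30] = True

center = mask_centroid(body)
crop = center_crop(remove_background(frame, body), center, width=64, height=64)
angle = heading_angle(center, mask_centroid(tail))
print(crop.shape, angle)  # (64, 64, 3) 0.0
```

In this sketch an empty mask yields `None` rather than a centroid, so a caller would have to decide how to handle frames where the ROI is missing.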

Click Apply First

After changing preprocessing settings, you must click the Apply button before extracting. The preview image shows the result of your preprocessing configuration on the first frame.

[Figure: Preprocessing preview]

Preprocessing Recommendations

Scenario	Center ROI	Rotate	Remove BG
General behavior	✅ On	❌ Off	❌ Off
Posture analysis	✅ On	✅ On	✅ On
Locomotion patterns	✅ On	✅ On	❌ Off
No preprocessing	❌ Off	❌ Off	❌ Off

Extraction Types

CASTLE offers three extraction modes, each triggered by a different button:

Extract (Standard Latent Extraction)

The primary extraction mode. Runs the selected visual model on preprocessed frames and saves latent vectors.

Output: .npz file in project/latent/model-name/
Filename pattern: {video}_ROI_{id}_{model}_{tags}.npz
Tags: ctr (centered), rmbg (background removed)
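
As a rough illustration of the naming scheme, the sketch below builds an output path from the pattern above. It rests on several assumptions that the documentation does not confirm: that tags are joined with underscores in the order `ctr`, `rmbg`, that the tag segment is dropped entirely when no tag applies, and that the `model-name` directory is the model identifier. The function `latent_output_path` is hypothetical.

```python
from pathlib import Path


def latent_output_path(project_dir, video, roi_id, model, centered=False, remove_bg=False):
    """Build an output path following {video}_ROI_{id}_{model}_{tags}.npz.

    Hypothetical helper; tag order and separators are assumptions.
    """
    tags = []
    if centered:
        tags.append("ctr")
    if remove_bg:
        tags.append("rmbg")
    stem = "_".join([video, "ROI", str(roi_id), model] + tags)
    return Path(project_dir) / "latent" / model / f"{stem}.npz"


path = latent_output_path(
    "project", "video", 1, "dinov2_vitb14_reg4_pretrain",
    centered=True, remove_bg=True,
)
print(path.name)  # video_ROI_1_dinov2_vitb14_reg4_pretrain_ctr_rmbg.npz
```

Because the tags are part of the filename, outputs produced with different preprocessing settings would not overwrite each other under this scheme.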

Extract Crop Video

Exports the preprocessed (centered, rotated, cropped) video as an MP4 file. Useful for:

Visual verification of preprocessing
Sharing aligned videos with collaborators
Input to external analysis tools

Output: .mp4 file in project/crop/video-name/

Extract Rotation Latent

Extracts features specifically capturing rotational information. Used when orientation is a key behavioral variable.

Output: .npz file with rotation-specific features

Running Extraction

Configure model, ROI, and preprocessing settings
Click Apply to confirm preprocessing
Click Extract (or the appropriate extraction button)
Monitor progress in the log output area

The log shows:

Pre-flight check (which videos need processing)
Per-video progress
Final summary with success/failure counts
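
The pre-flight check combined with Skip existing files can be thought of as a simple filter over the selected videos. The sketch below is an assumption about the general logic, not CASTLE's code: `preflight` is a hypothetical function, and `expected_output` stands for whatever maps a video name to its output file.

```python
from pathlib import Path
import tempfile


def preflight(videos, expected_output, skip_existing=True):
    """Split videos into (to_process, skipped) depending on whether outputs exist.

    Hypothetical helper for illustration only.
    """
    to_process, skipped = [], []
    for video in videos:
        out = Path(expected_output(video))
        if skip_existing and out.exists():
            skipped.append(video)
        else:
            to_process.append(video)
    return to_process, skipped


with tempfile.TemporaryDirectory() as tmp:
    out_dir = Path(tmp)
    # Pretend mouse_a was already extracted in an earlier run.
    (out_dir / "mouse_a.npz").touch()
    names = ["mouse_a", "mouse_b"]
    todo, skipped = preflight(names, lambda v: out_dir / f"{v}.npz")
    todo_all, _ = preflight(names, lambda v: out_dir / f"{v}.npz", skip_existing=False)

print(todo, skipped)  # ['mouse_b'] ['mouse_a']
```

With skipping disabled, both videos end up in the list to process, which corresponds to re-extracting everything.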

[Figure: Extraction progress]

Output Format

Standard latent extraction produces .npz files that store the latent vectors under the `latent` key. They can be loaded with NumPy:

import numpy as np

data = np.load('video_ROI_1_dinov2_vitb14_reg4_pretrain.npz')
latent_vectors = data['latent']  # Shape: (n_frames, feature_dim)

Feature dimension: depends on the model (768 for ViT-B, 1024 for ViT-L)
NaN values: frames where the ROI mask was empty produce NaN vectors
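
Since frames with an empty ROI mask are stored as NaN vectors, downstream code usually needs to filter them out while remembering which frames remain. The following sketch uses a small synthetic file in place of a real extraction output; the file name `example.npz` and the variable names are illustrative, and only the `latent` key comes from the format described above.

```python
import os
import tempfile

import numpy as np

# Synthetic stand-in for an extracted file: 5 frames, 768 dimensions,
# with frame 2 simulating an empty ROI mask.
latent = np.random.default_rng(0).normal(size=(5, 768)).astype(np.float32)
latent[2] = np.nan

with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "example.npz")
    np.savez(path, latent=latent)
    with np.load(path) as data:
        vectors = data["latent"]  # Shape: (n_frames, feature_dim)

# A frame is valid if none of its feature values is NaN.
valid = ~np.isnan(vectors).any(axis=1)
frame_idx = np.flatnonzero(valid)  # original frame numbers of the kept rows
clean = vectors[valid]

print(clean.shape, frame_idx.tolist())  # (4, 768) [0, 1, 3, 4]
```

Keeping `frame_idx` alongside `clean` makes it possible to map cluster labels back to the original frame numbers later.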

Processing Time

Processing time depends on video length, GPU, and model size.

Speed Tips

Increase batch size if you have spare VRAM (e.g., 64 or 128)
Use Skip existing when re-running after adding new videos
dinov2_vitb14_reg4_pretrain is fastest; dinov3_vitl16 is slowest but potentially highest quality

Next Step

Once features are extracted for all videos, proceed to Step 4: Behavior Analysis.