Architecture
System Overview
┌─────────────────────────────────────────────────────────┐
│                     app.py (Entry)                      │
│                   Gradio Application                    │
└──────────────────────────┬──────────────────────────────┘
                           │
┌──────────────────────────▼──────────────────────────────┐
│                       castle/ui/                        │
│  main_ui ─┬─ project_ui       (0. Project)              │
│           ├─ source_ui        (1. Upload Videos)        │
│           ├─ edit_ui          (2. Tracking ROIs)        │
│           │  ├─ view_ui         View frames             │
│           │  ├─ label_ui        Label ROIs (SAM)        │
│           │  ├─ knowledge_ui    ROI prompt gallery      │
│           │  ├─ track_ui        Run tracking (DeAOT)    │
│           │  ├─ post_track_ui   Post-process results    │
│           │  └─ batch_track_ui  Batch processing        │
│           ├─ extract_ui       (3. Extract Latent)       │
│           └─ cluster_page_ui  (4. Behavior Microscope)  │
└──────────────────────────┬──────────────────────────────┘
                           │ calls
┌──────────────────────────▼──────────────────────────────┐
│                      castle/core/                       │
│  extractor.py ─ Feature extraction engine               │
│  cluster.py   ─ LatentAggregator, clustering logic      │
│  data.py      ─ Preprocess, VideoDataset                │
│  models.py    ─ Visual encoder abstraction              │
│  config.py    ─ Constants, model paths                  │
│  project.py   ─ Project config I/O                      │
└──────────────────────────┬──────────────────────────────┘
                           │ uses
┌──────────────────────────▼──────────────────────────────┐
│                      castle/utils/                      │
│ project_manager.py       Project CRUD operations        │
│ video_manager.py         Video import/scan              │
│ video_io.py              Video read/write (PyAV)        │
│ video_align.py           Center, rotate, crop frames    │
│ image_segment.py         SAM wrapper (Segmentor)        │
│ video_object_segment.py  DeAOT wrapper                  │
│ tracking_manager.py      ROI tracking orchestration     │
│ visual_latent_extract.py DINOv2/v3 wrapper              │
│ latent_explorer.py       Latent class, UMAP, clustering │
│ myumap.py                Custom UMAP (cuml + spectral)  │
│ h5_io.py                 HDF5 mask storage              │
│ plot.py                  Visualization helpers          │
│ roi_manager.py           ROI utilities                  │
│ download.py              Checkpoint download (gdown)    │
└──────────────────────────┬──────────────────────────────┘
                           │ wraps
┌──────────────────────────▼──────────────────────────────┐
│               Vendored / External Models                │
│  castle/sam/     ─ Segment Anything Model (Meta)        │
│  castle/aot/     ─ DeAOT video object segmentation      │
│  castle/configs/ ─ model_config.json                    │
└─────────────────────────────────────────────────────────┘
Module Map
castle/ui/ — User Interface
Built on Gradio. Each tab has its own module:
| Module | Tab | Purpose |
|---|---|---|
| main_ui.py | (none) | Creates the top-level app with all tabs |
| project_ui.py | 0. Project | Create, open, delete projects |
| source_ui.py | 1. Upload Videos | Upload local files or scan server directories |
| edit_ui.py | 2. Tracking ROIs | Container for all tracking sub-UIs |
| view_ui.py | └─ View | Browse frames with slider |
| label_ui.py | └─ Label ROI | Point-and-click segmentation with SAM |
| knowledge_ui.py | └─ ROI Prompts | Gallery of all saved ROI labels |
| track_ui.py | └─ Tracking | Run DeAOT tracking with progress |
| post_track_ui.py | └─ Analysis | Post-process and review tracking |
| batch_track_ui.py | └─ Batch | Process multiple videos |
| extract_ui.py | 3. Extract Latent | Configure and run feature extraction |
| cluster_page_ui.py | 4. Behavior Microscope | UMAP + DBSCAN analysis |
castle/core/ — Core Business Logic
| Module | Purpose |
|---|---|
| extractor.py | Feature extraction execution engine |
| cluster.py | LatentAggregator: multi-video latent loading and frame retrieval |
| data.py | Preprocess dataclass, VideoDataset for batched extraction |
| models.py | VisualEncoder abstraction, model registry |
| config.py | Constants: checkpoint paths, model IDs, supported models |
| project.py | Project config read/write |
castle/utils/ — Utility Layer
| Module | Purpose |
|---|---|
| project_manager.py | Project CRUD (create, list, delete) |
| video_manager.py | Video import, directory scanning, format detection |
| video_io.py | Video read/write using PyAV, subtitle generation |
| video_align.py | Frame alignment: center, rotate, crop |
| image_segment.py | SAM wrapper (Segmentor class) |
| video_object_segment.py | DeAOT wrapper (model loading) |
| tracking_manager.py | ROITracker: orchestrates multi-frame tracking |
| visual_latent_extract.py | DINOv2/v3 wrapper functions |
| latent_explorer.py | Latent class: embedding, clustering, visualization |
| myumap.py | Custom UMAP using cuml + spectral layout |
| h5_io.py | H5IO: HDF5 file I/O for mask storage |
| plot.py | Visualization: frame+mask overlay, dot annotations |
| roi_manager.py | ROI color management and utilities |
| download.py | Checkpoint download via gdown |
castle/sam/ — SAM (Vendored)
Segment Anything Model from Meta AI. Forked from facebookresearch/segment-anything.
castle/aot/ — DeAOT (Vendored)
DeAOT (Decoupling Features in Hierarchical Propagation for Video Object Segmentation), used here to propagate ROI masks across frames. Forked from yoxu515/aot-benchmark.
castle/configs/ — Configuration
Contains model_config.json with paths and settings for all models.
Data Flow
Video File (.mp4)
    │
    ▼
[1. SAM]        Point clicks → segmentation masks (.npz labels)
    │
    ▼
[2. DeAOT]      Propagate masks → tracked masks (mask_list.h5)
    │
    ▼
[3. Align]      Center + rotate + crop → normalized frames
    │
    ▼
[4. DINOv2/v3]  Extract features → latent vectors (.npz)
    │
    ▼
[5. UMAP]       Dimensionality reduction → 2D embedding
    │
    ▼
[6. DBSCAN]     Clustering → behavioral syllables
    │
    ▼
[Output]        CSV labels, SRT subtitles, embedding NPZ
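The output step turns frame-by-frame cluster assignments into SRT subtitles. The sketch below shows one way that conversion could work, assuming a constant frame rate and a plain list of per-frame label names. `format_timestamp` and `labels_to_srt` are hypothetical helpers written for illustration; they are not CASTLE's API, whose own subtitle code lives in `video_io.py`.

```python
from itertools import groupby


def format_timestamp(seconds: float) -> str:
    """Format a time in seconds as an SRT timestamp (HH:MM:SS,mmm)."""
    total_ms = int(round(seconds * 1000))
    hours, rem = divmod(total_ms, 3_600_000)
    minutes, rem = divmod(rem, 60_000)
    secs, millis = divmod(rem, 1000)
    return f"{hours:02d}:{minutes:02d}:{secs:02d},{millis:03d}"


def labels_to_srt(labels, fps):
    """Merge runs of identical per-frame labels into numbered SRT cues.

    Assumption: `labels` holds one label per frame and `fps` is constant.
    """
    cues = []
    frame = 0
    for label, run in groupby(labels):
        length = sum(1 for _ in run)  # number of consecutive frames in this run
        start = frame / fps
        end = (frame + length) / fps
        cues.append(
            f"{len(cues) + 1}\n"
            f"{format_timestamp(start)} --> {format_timestamp(end)}\n"
            f"{label}\n"
        )
        frame += length
    # SRT cues are separated by a blank line.
    return "\n".join(cues)


if __name__ == "__main__":
    # 30 frames of one behavior followed by 15 of another, at 30 fps.
    print(labels_to_srt(["grooming"] * 30 + ["rearing"] * 15, fps=30))
```

Collapsing consecutive identical labels into a single cue keeps the subtitle file short and leaves one cue per behavioral bout rather than one per frame.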
Project Directory Structure
After a complete analysis run:
projects/my-project/
├── config.json                 # Project metadata
├── sources/                    # Video files
│   ├── video1.mp4
│   └── video2.mp4
├── label/                      # ROI labels (SAM output)
│   └── video1.mp4/
│       ├── 0.npz               # Label at frame 0
│       └── 247.npz             # Label at frame 247
├── track/                      # Tracking results (DeAOT output)
│   └── video1.mp4/
│       └── mask_list.h5        # HDF5 with per-frame masks
├── crop/                       # Cropped/aligned videos
│   └── video1.mp4/
│       └── video1_ROI_1_crop.mp4
├── latent/                     # Extracted features
│   └── dinov2_vitb14_reg4_pretrain/
│       └── video1_ROI_1_dinov2_vitb14_reg4_pretrain.npz
└── cluster/                    # Analysis results
    ├── id.csv                  # Cluster ID → name mapping
    ├── time_series.csv         # Frame-by-frame assignments
    └── cluster_grooming_rearing_.npz  # Embedding + labels
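For scripting against a finished project, it can help to compute where each artifact should live. The sketch below builds those paths with `pathlib`. The naming pattern (`<stem>_ROI_<id>_crop.mp4`, `<stem>_ROI_<id>_<model>.npz`) is inferred from the example tree above and is an assumption, not a documented contract; `artifact_paths` is a hypothetical helper, not part of CASTLE.

```python
from pathlib import Path


def artifact_paths(project_dir, video_name, roi_id, model_name):
    """Return the expected artifact locations for one video and one ROI.

    Assumption: file names follow the pattern seen in the example tree.
    """
    root = Path(project_dir)
    stem = Path(video_name).stem  # "video1.mp4" -> "video1"
    return {
        "source": root / "sources" / video_name,
        "labels": root / "label" / video_name,
        "masks": root / "track" / video_name / "mask_list.h5",
        "crop": root / "crop" / video_name / f"{stem}_ROI_{roi_id}_crop.mp4",
        "latent": root / "latent" / model_name
        / f"{stem}_ROI_{roi_id}_{model_name}.npz",
    }


if __name__ == "__main__":
    paths = artifact_paths(
        "projects/my-project", "video1.mp4", 1, "dinov2_vitb14_reg4_pretrain"
    )
    for name, path in paths.items():
        # Report which pipeline stages have already produced output.
        print(f"{name:7s} {'ok' if path.exists() else 'missing':8s} {path}")
```

Checking `path.exists()` for each entry gives a quick view of how far a given video has progressed through the pipeline.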