Step 5: Export and Use Results¶
After completing the behavior analysis in Step 4, CASTLE automatically generates several output files. This page explains the available outputs, their formats, and how to use them in your research.
Available Outputs¶
All outputs are saved in the cluster/ directory within your project:
```
projects/my-project/
└── cluster/
    ├── id.csv                            # Cluster ID → name mapping
    ├── time_series.csv                   # Frame-by-frame assignments
    ├── cluster_behavior1_behavior2_.npz  # Embedding + cluster data
    └── (SRT subtitle files)
```
Behavior ID CSV (id.csv)¶
Maps cluster IDs to their human-assigned names.
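For reference, `id.csv` is a two-column table. A hypothetical example (the behavior names are whatever you assigned in Step 4):

```
Id,Name
0,grooming
1,rearing
2,walking
```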
Time Series CSV (time_series.csv)¶
Frame-by-frame behavioral state assignments. Each row corresponds to one frame.
- Values correspond to cluster IDs from `id.csv`
- `-1` indicates unclassified frames (e.g., frames with missing tracking data)
- When using a time window > 1, values are repeated for each frame in the window
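As a quick sanity check on a run, you can count how many frames were left unclassified. A minimal sketch, using a hypothetical assignment array in place of the real CSV column:

```python
import numpy as np

# Hypothetical frame-by-frame assignments (cluster IDs; -1 = unclassified)
behaviors = np.array([0, 0, 1, -1, -1, 1, 1, 0])

# Fraction of frames without a cluster assignment
unclassified_frac = np.mean(behaviors == -1)
print(f"{unclassified_frac:.1%} of frames unclassified")  # → 25.0%
```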
SRT Subtitles¶
Standard subtitle files (.srt format) that can be overlaid on the original videos. Each subtitle entry shows the behavioral label for that time segment. Useful for:
- Quick visual verification of results
- Presentations and lab meetings
- Sharing with collaborators who don't use CASTLE
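The files follow the standard SubRip layout, so any player with subtitle support (e.g., VLC) can display them alongside the video. A hypothetical excerpt (timestamps and labels invented for illustration):

```
1
00:00:00,000 --> 00:00:02,500
grooming

2
00:00:02,500 --> 00:00:04,000
rearing
```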
Embedding NPZ¶
Contains the UMAP coordinates and cluster assignments:
```python
import numpy as np

data = np.load('cluster_grooming_rearing_.npz')
embeddings = data['emb']  # Shape: (n_samples, 2) — UMAP coordinates
clusters = data['cls']    # Shape: (n_samples,) — cluster assignments
config = data['config']   # UMAP configuration used
```
- NaN values in embeddings indicate frames that were excluded from analysis
- A cluster value of `-1` indicates unclassified frames
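Putting the two arrays together, you can, for example, count cluster sizes over the frames that were actually embedded. A sketch with hypothetical data standing in for the NPZ contents:

```python
import numpy as np

# Hypothetical embedding/cluster arrays mimicking the NPZ contents
embeddings = np.array([[0.1, 0.2], [np.nan, np.nan], [1.5, -0.3], [1.4, -0.2]])
clusters = np.array([0, -1, 1, 1])

# Keep only frames that were actually embedded
valid = ~np.isnan(embeddings).any(axis=1)
ids, counts = np.unique(clusters[valid], return_counts=True)
print(dict(zip(ids.tolist(), counts.tolist())))  # → {0: 1, 1: 2}
```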
Using Results in Your Research¶
Loading Data in Python¶
```python
import pandas as pd
import numpy as np
from collections import Counter

# Load behavioral time series
ts = pd.read_csv('projects/my-project/cluster/time_series.csv', index_col=0)
behaviors = ts['behavior'].values

# Load cluster names
ids = pd.read_csv('projects/my-project/cluster/id.csv')
id_to_name = dict(zip(ids['Id'], ids['Name']))

# Convert to named behaviors
named_behaviors = [id_to_name.get(b, 'unknown') for b in behaviors]

# Basic statistics
print(Counter(named_behaviors))
```
Loading Data in R¶
```r
# Load behavioral time series
ts <- read.csv("projects/my-project/cluster/time_series.csv")
behaviors <- ts$behavior

# Load cluster names
ids <- read.csv("projects/my-project/cluster/id.csv")

# Merge cluster names onto the time series
ts$name <- ids$Name[match(ts$behavior, ids$Id)]

# Summary
table(ts$name)
```
Common Analyses¶
Behavior Duration Distribution¶
```python
import numpy as np

def get_bout_durations(behaviors, target_cluster, fps=30):
    """Get durations (in seconds) of consecutive bouts of a behavior."""
    behaviors = np.asarray(behaviors)
    is_target = (behaviors == target_cluster)
    changes = np.diff(is_target.astype(int))
    starts = np.where(changes == 1)[0] + 1
    ends = np.where(changes == -1)[0] + 1
    if is_target[0]:
        starts = np.concatenate([[0], starts])
    if is_target[-1]:
        ends = np.concatenate([ends, [len(behaviors)]])
    durations = (ends - starts) / fps  # Convert frames to seconds
    return durations
```
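Given the bout durations returned by the function above, a quick summary might look like this (the duration values are hypothetical):

```python
import numpy as np

# Hypothetical bout durations in seconds (e.g., output of get_bout_durations)
durations = np.array([0.5, 1.2, 0.8, 2.0])

print(f"n bouts: {durations.size}")
print(f"mean: {durations.mean():.2f} s, median: {np.median(durations):.2f} s")
print(f"total time in behavior: {durations.sum():.1f} s")
```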
Comparing Groups¶
```python
import numpy as np

# Compare behavior proportions between two conditions
group_a_time = behaviors_a  # Frame-wise cluster IDs from condition A
group_b_time = behaviors_b  # Frame-wise cluster IDs from condition B

for cluster_id, name in id_to_name.items():
    prop_a = np.mean(group_a_time == cluster_id)
    prop_b = np.mean(group_b_time == cluster_id)
    print(f"{name}: Group A = {prop_a:.3f}, Group B = {prop_b:.3f}")
```
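If you also want a significance test on such proportions, one dependency-light option is a permutation test on per-animal values. A sketch with invented data (`scipy.stats.mannwhitneyu` is a common alternative):

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical per-animal proportions of time spent in one behavior
props_a = np.array([0.30, 0.25, 0.35, 0.28])
props_b = np.array([0.15, 0.20, 0.12, 0.18])

observed = props_a.mean() - props_b.mean()
pooled = np.concatenate([props_a, props_b])

# Permutation test: shuffle group labels, recompute the group difference
n_perm = 10_000
diffs = np.empty(n_perm)
for i in range(n_perm):
    perm = rng.permutation(pooled)
    diffs[i] = perm[:4].mean() - perm[4:].mean()

# Two-sided p-value: fraction of shuffles at least as extreme as observed
p_value = np.mean(np.abs(diffs) >= abs(observed))
print(f"observed diff = {observed:.3f}, p ≈ {p_value:.3f}")
```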
For Publication¶
Citing CASTLE
See the Citation page for BibTeX entries.
Recommended figures for publications:
- UMAP embedding colored by cluster — shows behavioral space structure
- Ethogram — timeline visualization of behavioral states
- Cluster representative frames — example frames from each behavioral category
- Duration/proportion bar charts — quantitative comparison between groups
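For the ethogram in particular, the frame-wise labels first need to be collapsed into contiguous segments. A minimal stdlib sketch (label names are hypothetical; the resulting `(start, end, label)` tuples can feed e.g. matplotlib's `broken_barh`):

```python
from itertools import groupby

# Hypothetical frame-wise behavior labels (one per frame)
labels = ["groom", "groom", "rear", "rear", "rear", "groom"]

# Collapse runs of identical labels into (start_frame, end_frame, label)
segments = []
frame = 0
for label, run in groupby(labels):
    n = len(list(run))
    segments.append((frame, frame + n, label))
    frame += n

print(segments)  # → [(0, 2, 'groom'), (2, 5, 'rear'), (5, 6, 'groom')]
```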
Loading Latent Features¶
For advanced analysis, you can also work directly with the latent features extracted in Step 3:
```python
import numpy as np
from sklearn.decomposition import PCA

# Load latent features
data = np.load('projects/my-project/latent/dinov2_vitb14_reg4_pretrain/video_ROI_1_dinov2_vitb14_reg4_pretrain.npz')
latent = data['latent']  # Shape: (n_frames, feature_dim)

# Use with your own dimensionality reduction or clustering
pca = PCA(n_components=50)
reduced = pca.fit_transform(latent[~np.isnan(latent).any(axis=1)])
```
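Because the reduction above drops NaN rows, you may later need to map per-row results back to the original frame indexing. A sketch with hypothetical data:

```python
import numpy as np

# Hypothetical latent features where one frame was dropped (NaN)
latent = np.array([[1.0, 2.0], [np.nan, np.nan], [3.0, 4.0]])
valid = ~np.isnan(latent).any(axis=1)

# Suppose `labels_valid` holds cluster labels for the valid rows only
labels_valid = np.array([0, 1])

# Re-expand to full frame count, marking dropped frames as -1
labels_full = np.full(latent.shape[0], -1)
labels_full[valid] = labels_valid
print(labels_full)  # → [ 0 -1  1]
```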
Integration with Other Tools¶
CASTLE's outputs are in standard formats (CSV, NPZ) compatible with most analysis pipelines:
- DeepLabCut / SLEAP: compare pose estimation with CASTLE's training-free approach
- SimBA / B-SOiD: use CASTLE's behavioral labels as input
- Custom pipelines: load CSV time series directly
Summary¶
You've completed the full CASTLE workflow:
- ✅ Created a project and uploaded videos
- ✅ Tracked ROIs with SAM + DeAOT
- ✅ Extracted latent features with DINOv2/v3
- ✅ Discovered behavioral clusters with UMAP + DBSCAN
- ✅ Exported results for analysis and publication
For questions or issues, check the FAQ or open an issue on GitHub.