Step 5: Export and Use Results

After completing the behavior analysis in Step 4, CASTLE automatically generates several output files. This page explains the available outputs, their formats, and how to use them in your research.


Available Outputs

All outputs are saved in the cluster/ directory within your project:

projects/my-project/
└── cluster/
    ├── id.csv                          # Cluster ID → name mapping
    ├── time_series.csv                 # Frame-by-frame assignments
    ├── cluster_behavior1_behavior2_.npz # Embedding + cluster data
    └── (SRT subtitle files)

Behavior ID CSV (id.csv)

Maps cluster IDs to their human-assigned names.

Id,Name
0,init
1,grooming
2,rearing
3,locomotion

Time Series CSV (time_series.csv)

Frame-by-frame behavioral state assignments. Each row corresponds to one frame.

,behavior
0,1
1,1
2,1
3,3
4,3
...
  • Values correspond to cluster IDs from id.csv
  • -1 indicates unclassified frames (e.g., frames with missing tracking data)
  • When using a time window > 1, values are repeated for each frame in the window
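Because unclassified frames are stored as -1, it is usually worth masking them before computing per-behavior proportions. A minimal sketch with a synthetic stand-in for the `behavior` column (in practice, load it from time_series.csv as shown later on this page):

```python
import numpy as np

# Synthetic stand-in for the `behavior` column of time_series.csv
behaviors = np.array([1, 1, -1, 3, 3, -1, 1])

valid = behaviors != -1            # mask out unclassified frames
frac_classified = valid.mean()     # fraction of frames with a label

# Proportion of each cluster among classified frames only
ids, counts = np.unique(behaviors[valid], return_counts=True)
proportions = dict(zip(ids.tolist(), (counts / valid.sum()).tolist()))

print(frac_classified)   # 5 of 7 frames are classified
print(proportions)       # {1: 0.6, 3: 0.4}
```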

SRT Subtitles

Standard subtitle files (.srt format) that can be overlaid on the original videos. Each subtitle entry shows the behavioral label for that time segment. Useful for:

  • Quick visual verification of results
  • Presentations and lab meetings
  • Sharing with collaborators who don't use CASTLE
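For reference, SRT is a plain-text format: each entry is a sequential index, a start/end timecode, and the label text. An entry might look like this (times and label illustrative):

```
1
00:00:00,000 --> 00:00:02,500
grooming
```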

Embedding NPZ

Contains the UMAP coordinates and cluster assignments:

import numpy as np

data = np.load('cluster_grooming_rearing_.npz')
embeddings = data['emb']    # Shape: (n_samples, 2) — UMAP coordinates
clusters = data['cls']      # Shape: (n_samples,) — cluster assignments
config = data['config']     # UMAP configuration used
  • NaN values in embeddings indicate frames that were excluded from analysis
  • Cluster value of -1 indicates unclassified frames
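A common first step is to keep only samples that were both embedded and classified. The sketch below builds synthetic stand-ins with the same key names (`emb`, `cls`) and filters them; the filename `demo_cluster.npz` is illustrative:

```python
import numpy as np

# Synthetic stand-ins shaped like CASTLE's NPZ contents
emb = np.array([[0.1, 0.2], [np.nan, np.nan], [1.5, -0.3]])  # UMAP coordinates
cls = np.array([0, -1, 2])                                   # cluster per sample

np.savez('demo_cluster.npz', emb=emb, cls=cls)
data = np.load('demo_cluster.npz')

# Keep samples with valid embeddings AND a cluster assignment
ok = ~np.isnan(data['emb']).any(axis=1) & (data['cls'] != -1)
print(data['emb'][ok])
print(data['cls'][ok])
```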

Using Results in Your Research

Loading Data in Python

import pandas as pd
import numpy as np

# Load behavioral time series
ts = pd.read_csv('projects/my-project/cluster/time_series.csv', index_col=0)
behaviors = ts['behavior'].values

# Load cluster names
ids = pd.read_csv('projects/my-project/cluster/id.csv')
id_to_name = dict(zip(ids['Id'], ids['Name']))

# Convert to named behaviors
named_behaviors = [id_to_name.get(b, 'unknown') for b in behaviors]

# Basic statistics
from collections import Counter
print(Counter(named_behaviors))

Loading Data in R

# Load behavioral time series
ts <- read.csv("projects/my-project/cluster/time_series.csv")
behaviors <- ts$behavior

# Load cluster names
ids <- read.csv("projects/my-project/cluster/id.csv")

# Merge
ts$name <- ids$Name[match(ts$behavior, ids$Id)]

# Summary
table(ts$name)

Common Analyses

Behavior Duration Distribution

import numpy as np

def get_bout_durations(behaviors, target_cluster, fps=30):
    """Get durations of consecutive bouts of a behavior."""
    is_target = (behaviors == target_cluster)
    changes = np.diff(is_target.astype(int))
    starts = np.where(changes == 1)[0] + 1
    ends = np.where(changes == -1)[0] + 1

    if is_target[0]:
        starts = np.concatenate([[0], starts])
    if is_target[-1]:
        ends = np.concatenate([ends, [len(behaviors)]])

    durations = (ends - starts) / fps  # Convert to seconds
    return durations
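Beyond bout durations, another common summary is the matrix of transitions between behavioral states. A minimal sketch with a synthetic sequence (in practice, pass the `behavior` column from time_series.csv; `transition_matrix` is a helper defined here, not part of CASTLE):

```python
import numpy as np

def transition_matrix(behaviors, n_clusters):
    """Count transitions between consecutive, distinct behavioral states."""
    mat = np.zeros((n_clusters, n_clusters), dtype=int)
    for a, b in zip(behaviors[:-1], behaviors[1:]):
        if a != b and a >= 0 and b >= 0:   # skip self-transitions and -1 frames
            mat[a, b] += 1
    return mat

# Synthetic sequence: cluster 0 -> 1 -> 1 -> 2 -> 0
seq = np.array([0, 1, 1, 2, 0])
print(transition_matrix(seq, 3))
```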

Comparing Groups

# Compare behavior proportions between two conditions
group_a_time = behaviors_a  # Behavior IDs from condition A
group_b_time = behaviors_b  # Behavior IDs from condition B

for cluster_id, name in id_to_name.items():
    prop_a = np.mean(group_a_time == cluster_id)
    prop_b = np.mean(group_b_time == cluster_id)
    print(f"{name}: Group A = {prop_a:.3f}, Group B = {prop_b:.3f}")

For Publication

Citing CASTLE

See the Citation page for BibTeX entries.

Recommended figures for publications:

  • UMAP embedding colored by cluster — shows behavioral space structure
  • Ethogram — timeline visualization of behavioral states
  • Cluster representative frames — example frames from each behavioral category
  • Duration/proportion bar charts — quantitative comparison between groups
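An ethogram is essentially a run-length encoding of the time series. A minimal sketch that turns frame-wise labels into (start_s, end_s, label) segments ready to plot (e.g. as horizontal bars); `to_segments` is a helper written for this example:

```python
import numpy as np

def to_segments(behaviors, fps=30):
    """Run-length encode a label sequence into (start_s, end_s, label) tuples."""
    behaviors = np.asarray(behaviors)
    change = np.where(np.diff(behaviors) != 0)[0] + 1
    starts = np.concatenate([[0], change])
    ends = np.concatenate([change, [len(behaviors)]])
    return [(s / fps, e / fps, int(behaviors[s])) for s, e in zip(starts, ends)]

print(to_segments([1, 1, 3, 3, 3, 2], fps=1))
# → [(0.0, 2.0, 1), (2.0, 5.0, 3), (5.0, 6.0, 2)]
```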

Loading Latent Features

For advanced analysis, you can also work directly with the latent features extracted in Step 3:

import numpy as np

# Load latent features
data = np.load('projects/my-project/latent/dinov2_vitb14_reg4_pretrain/video_ROI_1_dinov2_vitb14_reg4_pretrain.npz')
latent = data['latent']  # Shape: (n_frames, feature_dim)

# Use with your own dimensionality reduction or clustering
from sklearn.decomposition import PCA
pca = PCA(n_components=50)
reduced = pca.fit_transform(latent[~np.isnan(latent).any(axis=1)])

Integration with Other Tools

CASTLE's outputs are in standard formats (CSV, NPZ) compatible with most analysis pipelines:

  • DeepLabCut / SLEAP: compare pose estimation with CASTLE's training-free approach
  • SimBA / B-SOiD: use CASTLE's behavioral labels as input
  • Custom pipelines: load CSV time series directly

Summary

You've completed the full CASTLE workflow:

  1. ✅ Created a project and uploaded videos
  2. ✅ Tracked ROIs with SAM + DeAOT
  3. ✅ Extracted latent features with DINOv2/v3
  4. ✅ Discovered behavioral clusters with UMAP + DBSCAN
  5. ✅ Exported results for analysis and publication

For questions or issues, check the FAQ or open an issue on GitHub.