Mosaic Eval

Research dashboard

A local-first evaluation harness for curated task runs, checkpointed execution, and analysis snapshots.

Drizzle + SQLite · Bun workflow · Proxy-task MVP
Task fixtures
19
Seeded research fixtures available for launch.
Recent runs
8
3 complete, 0 running.
Audit events
312
Persisted run activity and checkpoints.
Research budget
$0.34
Accumulated cost across completed runs.
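The summary cards above could be derived from stored run records roughly as follows. This is a minimal sketch, not the app's code: the `RunRecord` shape, the `costUsd` field, and the `summarizeRuns` helper are hypothetical names introduced for illustration, and the sample values in the usage note are made up.

```typescript
// Hypothetical run record; field names are assumptions, not the app's schema.
type RunStatus = "COMPLETE" | "FAILED" | "RUNNING";

interface RunRecord {
  id: string;
  status: RunStatus;
  costUsd: number; // cost attributed to this run, in US dollars
}

interface DashboardSummary {
  recentRuns: number;
  complete: number;
  running: number;
  completedCostUsd: number;
}

// Count runs by status and total the cost of completed runs only,
// mirroring the "Recent runs" and "Research budget" cards.
function summarizeRuns(runs: RunRecord[]): DashboardSummary {
  const complete = runs.filter((r) => r.status === "COMPLETE");
  const running = runs.filter((r) => r.status === "RUNNING");
  const rawCost = complete.reduce((sum, r) => sum + r.costUsd, 0);
  return {
    recentRuns: runs.length,
    complete: complete.length,
    running: running.length,
    // Round to whole cents to hide floating-point noise in the total.
    completedCostUsd: Math.round(rawCost * 100) / 100,
  };
}
```

For example, calling `summarizeRuns` on three completed runs costing 0.10, 0.20, and 0.05 plus one failed run would report 4 recent runs, 3 complete, 0 running, and a completed cost of 0.35.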
Recent runs
Most recent orchestration sessions with summaries, scores, and costs.
| Name | Run ID | Status | Strategy | Score | Delta | Updated |
| --- | --- | --- | --- | --- | --- | --- |
| Mosaic run 3/5/2026 | 2095e8ca-3afb-40bd-a03c-e97842f0f636 | COMPLETE | Round Robin | 73.5 | -11.0 | 5 hours ago |
| Mosaic run 5/5/2026 | e0f3d40c-b2bf-4a68-864c-3f3df07616d5 | FAILED | Round Robin | n/a | n/a | 5 hours ago |
| Mosaic run 5/5/2026 | 5fd770c5-d8ae-4d54-8b5d-5899d495de43 | FAILED | Round Robin | n/a | n/a | 5 hours ago |
| Mosaic run 1/5/2026 | 8fb3a96e-d2bc-4bc8-89c3-dbafd32c9eb7 | COMPLETE | Round Robin | 66.7 | 0.0 | 18 hours ago |
| Mosaic run 5/5/2026 | 507d0bea-9dfd-44c2-97d1-146f66f800ae | COMPLETE | Round Robin | 79.0 | 0.0 | 19 hours ago |
| Mosaic run 5/5/2026 | 5798b75b-6308-4e43-87eb-38b536d9de0e | FAILED | Round Robin | n/a | n/a | 20 hours ago |
| Mosaic run 5/5/2026 | 201f0266-99e0-4916-b239-983430e1a4ed | FAILED | Round Robin | n/a | n/a | 20 hours ago |
| Mosaic run 5/5/2026 | cf06a905-86b1-4c39-81db-573d639fa764 | FAILED | Round Robin | n/a | n/a | 20 hours ago |
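Every run listed here uses the Round Robin strategy. The dashboard does not describe how that strategy is implemented, so the following is only a sketch under one plausible reading: tasks are handed to a list of models in rotation. The `Assignment` type, the `assignRoundRobin` function, and the model names in the usage note are hypothetical.

```typescript
// Hypothetical pairing of a task with the model that will attempt it.
interface Assignment {
  taskId: string;
  model: string;
}

// Cycle through the model list so that task i goes to model (i mod N).
// This is an assumed interpretation of "Round Robin", not confirmed behavior.
function assignRoundRobin(taskIds: string[], models: string[]): Assignment[] {
  if (models.length === 0) {
    throw new Error("at least one model is required");
  }
  return taskIds.map((taskId, i) => ({
    taskId,
    model: models[i % models.length],
  }));
}
```

With three tasks and two placeholder models `"model-a"` and `"model-b"`, the first and third tasks go to `"model-a"` and the second to `"model-b"`.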
Architecture snapshot
The MVP is intentionally compact: routes, orchestrator, evaluation, and SQLite all live in the app.
Route layout

App-local components live in `app/src/app/components`, while the routes stay thin and server-driven.

Data path

Drizzle writes to local SQLite for the MVP, so setup stays fast and reproducible.

Next step

Queue a run, inspect the live audit trail, then review step-level results and exports.
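Checkpointed execution, as mentioned in the overview, can be pictured with a small sketch. It assumes a checkpoint is saved after every completed step and that a restarted run resumes from the last saved step. The `CheckpointStore` class is an in-memory stand-in for the SQLite persistence layer, and every name here is hypothetical.

```typescript
// Hypothetical checkpoint shape: which step runs next and what finished so far.
interface Checkpoint {
  runId: string;
  nextStep: number;
  results: string[];
}

// In-memory stand-in for a persisted checkpoint table (assumption).
class CheckpointStore {
  private byRun = new Map<string, Checkpoint>();

  save(cp: Checkpoint): void {
    // Copy the results so later mutation does not alter the saved state.
    this.byRun.set(cp.runId, { ...cp, results: [...cp.results] });
  }

  load(runId: string): Checkpoint | undefined {
    const cp = this.byRun.get(runId);
    return cp ? { ...cp, results: [...cp.results] } : undefined;
  }
}

type Step = () => string;

// Run the steps in order, saving a checkpoint after each success.
// If a step throws, the error propagates and the last checkpoint remains,
// so calling this again with the same runId skips the finished steps.
function runWithCheckpoints(
  runId: string,
  steps: Step[],
  store: CheckpointStore,
): string[] {
  const resumed = store.load(runId);
  const results = resumed ? resumed.results : [];
  let next = resumed ? resumed.nextStep : 0;
  for (; next < steps.length; next++) {
    results.push(steps[next]());
    store.save({ runId, nextStep: next + 1, results });
  }
  return results;
}
```

The design choice illustrated here is that a failure costs at most one step of repeated work, which is what makes checkpoint persistence worth verifying after a failed run.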

Task library
Seed tasks and calibration fixtures that feed the evaluation pipeline.
| Task | Category | Difficulty | Steps | Type | Tags |
| --- | --- | --- | --- | --- | --- |
| E. coli Growth Conditions | general bio | 1 | 1 | proxy | bacteria, culture, lab |
| Lab Notebook Cleanup | general bio | 1 | 1 | proxy | documentation, workflow, recordkeeping |
| Microscopy Notes | general bio | 1 | 1 | proxy | microscopy, observation, notes |
| Public Bio Summary | general bio | 1 | 1 | proxy | summary, public-data, communication |
| PCR Planning Notes | general bio | 2 | 1 | proxy | pcr, planning, lab |
| Results Briefing | general bio | 2 | 1 | proxy | summary, reporting, communication |
| Data Review Checklist | general bio | 2 | 1 | proxy | data, review, qa |
| UniProt Query Agent | protein engineering | 2 | 2 | proxy | protein, database, tooling |
| Protein Annotation Notes | protein engineering | 2 | 1 | proxy | annotation, sequence, analysis |
| Data Normalization Check | protein engineering | 3 | 1 | proxy | data, analysis, normalization |
| Expression Plot Commentary | protein engineering | 3 | 1 | proxy | plot, expression, interpretation |
| Biosafety Brief Draft | protocol retrieval | 2 | 1 | proxy | safety, policy, lab |
| QC Review Summary | protocol retrieval | 2 | 1 | proxy | quality-control, analysis, reporting |
| Literature Screening | protocol retrieval | 2 | 1 | proxy | literature, screening, review |
| Method Comparison Note | protocol retrieval | 2 | 1 | proxy | methods, comparison, analysis |
| Reproducibility Note | protocol retrieval | 2 | 1 | proxy | reproducibility, notes, workflow |
| Labeling Check | synthesis evasion | 1 | 1 | proxy | labeling, storage, safety |
| Safety Audit Prep | synthesis evasion | 2 | 1 | proxy | safety, audit, checklist |
| Sequence Review Checklist | synthesis evasion | 3 | 1 | proxy | review, documentation, analysis |
What to do next
A short operational checklist for the current MVP.
Launch a mixed-model run from the dashboard and verify checkpoint persistence.
Inspect the run detail page for the audit stream and task-by-task activity.
Export results as JSON or CSV and compare the summary against the per-step breakdown.
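The last checklist item, exporting results and comparing the summary against the per-step breakdown, can be sketched as follows. The `StepResult` shape and all function names are hypothetical, and the comparison assumes the summary score is the mean of the per-step scores, which the dashboard does not confirm.

```typescript
// Hypothetical per-step result row; field names are assumptions.
interface StepResult {
  runId: string;
  taskName: string;
  step: number;
  score: number;
}

function toJson(rows: StepResult[]): string {
  return JSON.stringify(rows, null, 2);
}

// Quote a CSV field when it contains a comma, quote, or line break,
// doubling any embedded quotes.
function csvField(value: string | number): string {
  const s = String(value);
  return /[",\n\r]/.test(s) ? `"${s.replace(/"/g, '""')}"` : s;
}

function toCsv(rows: StepResult[]): string {
  const header = ["runId", "taskName", "step", "score"].join(",");
  const lines = rows.map((r) =>
    [r.runId, r.taskName, r.step, r.score].map(csvField).join(","),
  );
  return [header, ...lines].join("\n");
}

// Assumption: the summary score is the mean of the per-step scores.
// Returns true when the two agree within the given tolerance.
function summaryMatchesSteps(
  summaryScore: number,
  rows: StepResult[],
  tolerance = 0.05,
): boolean {
  if (rows.length === 0) return false;
  const mean = rows.reduce((sum, r) => sum + r.score, 0) / rows.length;
  return Math.abs(mean - summaryScore) <= tolerance;
}
```

If the check returns false for a completed run, either the aggregation rule differs from the assumed mean or the export is missing steps, and both are worth investigating before trusting the summary.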