Computer vision benchmark
ShapeCodeBench
A synthetic benchmark for evaluating whether multimodal models can reconstruct executable drawing programs from rendered geometric scenes.
- Designed a deterministic image-to-program task with a small Python-like drawing DSL and render-based scoring.
- Released a frozen eval_v1 split, baseline model results, paper source, and reproducibility artifacts.
- Published the project as an arXiv preprint with a permanent Zenodo archive.
Python, Computer vision, Program synthesis, Benchmarking, arXiv
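To illustrate what render-based scoring means in practice, here is a minimal sketch. It assumes a made-up two-command DSL (`rect`, `point`), a binary pixel grid, and intersection-over-union as the metric. None of these are taken from ShapeCodeBench itself, and the names `render` and `render_iou` are hypothetical.

```python
import re

# Hypothetical mini-DSL for illustration only (not the ShapeCodeBench DSL):
# one command per line, e.g. "rect(1, 1, 3, 2)" or "point(4, 4)".
_COMMAND = re.compile(r"^\s*(rect|point)\(([^)]*)\)\s*$")


def render(program: str, size: int = 8) -> list[list[int]]:
    """Rasterize a program onto a size x size binary grid."""
    grid = [[0] * size for _ in range(size)]
    for line in program.strip().splitlines():
        match = _COMMAND.match(line)
        if match is None:
            raise ValueError(f"unparseable command: {line!r}")
        name, raw_args = match.groups()
        args = [int(arg) for arg in raw_args.split(",")]
        if name == "rect":
            x, y, width, height = args
        else:  # point: a 1x1 rectangle
            x, y = args
            width = height = 1
        # Clip to the canvas so out-of-bounds shapes do not raise.
        for row in range(max(y, 0), min(y + height, size)):
            for col in range(max(x, 0), min(x + width, size)):
                grid[row][col] = 1
    return grid


def render_iou(predicted: str, reference: str, size: int = 8) -> float:
    """Score a predicted program by comparing rendered pixels, not source text."""
    try:
        predicted_grid = render(predicted, size)
    except ValueError:
        return 0.0  # assumption: a program that fails to parse scores zero
    reference_grid = render(reference, size)
    intersection = union = 0
    for predicted_row, reference_row in zip(predicted_grid, reference_grid):
        for p, r in zip(predicted_row, reference_row):
            intersection += p & r
            union += p | r
    return 1.0 if union == 0 else intersection / union
```

Because the comparison happens on rendered pixels, two differently written programs that draw the same scene receive the same score: in this sketch, two stacked 2x1 rectangles score 1.0 against a single 2x2 rectangle.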
Multimodal benchmarking
VLMBench / multimodal model evaluation
An evaluation tool for comparing multimodal language models with a consistent prediction-to-ground-truth methodology.
- Evaluated model outputs across image and text tasks using repeatable scoring flows.
- Built around provider APIs, cloud storage, structured samples, and result analysis.
- Focused on making model comparisons auditable and easier for teams to reason about.
Python, GCP, OpenAI API, Gemini API, Evaluation design
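A prediction-to-ground-truth flow over structured samples might look roughly like the sketch below. The `Sample` fields, the whitespace-and-case normalization, and the exact-match criterion are assumptions chosen for illustration, not VLMBench's actual schema or scoring rules.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Sample:
    """Hypothetical structured sample: an ID plus its expected answer."""
    sample_id: str
    ground_truth: str


def normalize(text: str) -> str:
    """Lowercase and collapse whitespace so trivial formatting does not affect the score."""
    return " ".join(text.lower().split())


def score_model(model: str, samples: list[Sample], predictions: dict[str, str]) -> list[dict]:
    """Return one auditable record per sample for a single model."""
    records = []
    for sample in samples:
        prediction = predictions.get(sample.sample_id)  # None if the model gave no output
        correct = (
            prediction is not None
            and normalize(prediction) == normalize(sample.ground_truth)
        )
        records.append(
            {
                "model": model,
                "sample_id": sample.sample_id,
                "prediction": prediction,
                "ground_truth": sample.ground_truth,
                "correct": correct,
            }
        )
    return records


def accuracy(records: list[dict]) -> float:
    """Fraction of records marked correct; 0.0 for an empty run."""
    if not records:
        return 0.0
    return sum(record["correct"] for record in records) / len(records)
```

Keeping the per-sample records, rather than only the aggregate, is what makes a comparison auditable: every score can be traced back to the prediction and ground truth that produced it.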
Computer vision and applied ML
DSM reconstruction ML system
A patented machine-learning system for generating Digital Surface Models from imagery, reducing reliance on external elevation sources.
- Built data collection and preprocessing workflows for large-scale model training.
- Worked across computer vision modeling, evaluation, and production integration.
- Extended operational coverage for geospatial workflows where elevation data was limited.
Python, PyTorch, OpenCV, GCP, Computer vision