Scott Askinosie

AI Engineer & Open Source Contributor

Research scientist and AI engineer specializing in machine learning and Gen AI solutions for multimodal data (video, audio, text). Practical experience developing and operationalizing ML models for content understanding, recommendation systems, and RAG pipelines at scale. Active open-source contributor to Weaviate with expertise in supervised, unsupervised, and deep learning methods. Former educator and researcher (Ph.D. Quantitative Biology) with proven ability to collaborate with cross-functional stakeholders, design innovative ML systems, and deliver scalable AI solutions from experimentation to production deployment. Deep Python expertise with modern ML tooling and evaluation frameworks including LLM-as-a-judge methodologies.


Skills

AI & Vector Database Systems
  • Python Expertise: Pydantic, FastAPI, modern tooling (uv, ruff, mypy), async programming
  • Vector Databases: Weaviate, semantic search, hybrid search, multi-tenant architectures, quantization, RBAC
  • RAG Systems: LlamaIndex, LangChain, retrieval-augmented generation pipelines, document processing
  • AI Agent Development: Pydantic AI, CrewAI, autonomous agents, tool-calling frameworks, evaluation benchmarks
  • LLM Integration: OpenAI GPT SDK, Anthropic Claude SDK, Google Gemini SDK, Meta Llama SDK, Hugging Face
  • Local LLMs: Ollama, vLLM, OpenAI SDK compatibility for on-prem deployments
  • Model Context Protocol (MCP): Building custom MCP servers and Claude integrations
  • NLP & Machine Learning: Neural Networks, Time Series Analysis, Transformer-based models

Programming Languages & Tools
  • Data & Analytics: Pandas, NumPy, scikit-learn, SQL, PySpark, Streamlit
  • Machine Learning: TensorFlow, scikit-learn, Neural Networks, NLP libraries
  • Cloud Platforms: AWS, Microsoft Azure (Prompt Flow, Cosmos DB), Google Cloud Platform (GCP)
  • Development Tools: Git/GitHub, Docker, JupyterLab, VS Code, Google Colab, BigQuery
  • Containerization & Deployment: Docker, microservices architecture, async services

Statistical Methods, Machine Learning and Data Visualization
  • Data Collection, Sampling, Hypothesis Testing, Confidence Intervals, Probability, Experimental Design, A/B Testing
  • Regression, Classification, Feature Engineering, Natural Language Processing (NLP), Neural Networks (sequential, CNN sequential)
  • Plotly, Matplotlib, Seaborn, GraphPad Prism, FIJI, Lyca InSight

Portfolio

E-commerce Collaborative Filtering Recommendation System


Collaborative Filtering Recommendation System for Askinosie Chocolate Retail and Wholesale

Developed a production collaborative filtering recommendation system using 13 years of e-commerce transaction data from retail and wholesale shopping carts. Built end-to-end ML pipeline including data collection from legacy systems, feature engineering from user purchase behavior, and model development for personalized product recommendations. System operationalized for both retail and wholesale e-commerce platforms, directly contributing to revenue growth (research shows product recommendations drive up to 31% of e-commerce revenues). Demonstrates practical experience in unsupervised ML methods, scalable recommendation algorithms, and production ML deployment.

Technologies: Python, Collaborative Filtering, scikit-learn, Pandas, E-commerce Analytics


Presentation
View Code
2023

Building AI Agents for Vector Database Search


Build production-ready AI agents with Pydantic AI and Weaviate

Workshop presented at Data Science Dojo Agents Conference (2025). Complete tutorial on building type-safe agentic systems with query optimization, dynamic filtering, and semantic search. Demonstrates how to create production-ready agents that leverage vector databases for intelligent information retrieval. This hands-on workshop covers agent architecture, Pydantic AI framework integration, and best practices for building reliable AI systems.

Technologies: Pydantic AI, Weaviate, OpenAI, Python, Vector Search


Workshop Materials & Code
November 2025

End-to-End RAG with LlamaIndex


Production RAG pipeline for domain-specific literature

Presented at CalibrateAI Conference (2025). Demonstrates hierarchical document parsing with LlamaIndex and vector search with Weaviate for space medicine research. This project showcases a complete RAG (Retrieval-Augmented Generation) system that ingests scientific literature, chunks documents intelligently, creates embeddings, and enables natural language queries over specialized domain knowledge. Serves as a worked example of a production-grade RAG architecture.

Technologies: LlamaIndex, Weaviate, OpenAI, Python, Document Processing


View Project & Notebook
November 2025

Weaviate Claude Skills


Modular knowledge toolkit for Claude AI and Weaviate integration

Open-source Claude Skills framework providing five self-contained knowledge modules that enable Claude to manage local Weaviate vector databases. Skills include local setup, collection management, data ingestion (JSON, CSV, images), and semantic search with RAG capabilities. Demonstrates the progressive disclosure design pattern, in which Claude loads specific capabilities on demand. This toolkit lets Claude users apply vector search with no deployment required, making vector databases accessible through conversational AI.

Technologies: Claude Skills Framework, Weaviate, Docker, Python, Semantic Search


Code & Documentation
November 2025

Pandas DataFrame Agent


Ask your data questions with an AI Agent

Presented at ML OPS and Generative AI World Summit (2025). Enables non-technical users to perform exploratory data analysis through natural language. This AI-powered tool leverages LangChain's Pandas DataFrame Agent to allow anyone to analyze CSV data, create visualizations, and extract insights by simply asking questions. Demonstrates the power of making data science accessible through conversational AI interfaces.

Technologies: LangChain, Pandas, Streamlit, OpenAI


Try the Live App
View Code
October 2024

Vision-Enhanced Agentic RAG System for Technical Documentation


Multimodal Gen AI for visual and textual data understanding

Engineered an innovative multimodal RAG pipeline applying Gen AI solutions to visual and textual data sources. Developed system using GPT-4 Vision to automatically extract and interpret technical content from charts, diagrams, and maps within engineering documentation. Built production-ready async microservice architecture with Model Context Protocol integration, demonstrating practical experience operationalizing ML/AI solutions for content understanding and metadata extraction. System enables scalable document processing and intelligent content discovery workflows.

Technologies: GPT-4 Vision, Weaviate Query Agent, Model Context Protocol, Vector Embeddings, Async Architecture

2025

Speaking & Workshops

Building Production-Ready Pydantic Agents with Vector Databases

Data Science Dojo Agents Conference

Hands-on workshop teaching developers how to build type-safe AI agents using Pydantic AI framework integrated with Weaviate vector database. Covered agent architecture, query optimization, and production best practices.

November 2025

End-to-End RAG System with LlamaIndex

CalibrateAI Conference

Demonstrated building production RAG pipeline for space medicine research, covering document ingestion, hierarchical parsing, vector storage, and query optimization.

November 2025

Pandas DataFrame Agent for Data Analysis

ML OPS and Generative AI World Summit

Presented on making data science accessible through conversational AI, enabling non-technical users to perform exploratory data analysis using natural language.

October 2024

Open Source Contributions

Weaviate Ecosystem

Active Contributor

Contributing to the Weaviate ecosystem (27K+ stars on GitHub) through integration examples, documentation, workshop materials, and community education. Regular presenter at community Hack Nights and technical workshops.


View GitHub Profile
2025

Experience

Lead Machine Learning Engineer & Technical Architect

Weaviate
  • Designed and evaluated multimodal content understanding systems leveraging GPT-4 Vision and vector embeddings to automatically extract and interpret technical metadata from text, diagrams, charts, and maps within engineering documentation, enabling AI-assisted content discovery workflows
  • Developed quality evaluation frameworks for RAG systems using LLM-as-a-judge methodologies to assess retrieval accuracy, content relevance, and metadata extraction quality across 500+ technical documents
  • Built and deployed production-ready agentic frameworks integrating LlamaIndex hierarchical parsing, vector search, and Model Context Protocol patterns, demonstrating scalable architecture patterns adopted by ecosystem partners
  • Collaborated with cross-functional product and engineering teams to design learning paths and documentation that reduced developer time-to-first-value by 40%, directly supporting feature launches and developer adoption
February 2024 - Present

Lead AI Developer & Machine Learning Engineer

Portions Master
  • Developed and deployed computer vision models for real-time food identification and volume estimation, transitioning from third-party to in-house ML solutions and delivering $2M+ in cost savings
  • Architected multimodal AI pipeline integrating OpenAI Vision API with production mobile application, processing 10K+ daily image classifications with 94% accuracy
  • Built semantic search system using Azure Prompt Flow by vectorizing Cosmos DB data, enabling context-aware query responses and improving user satisfaction scores by 28%
October 2022 - February 2024

Data Scientist & Machine Learning Researcher

Western Governors University
  • Developed transformer-based NLP classification systems to categorize student feedback and dropout reasons from unstructured text, identifying 15+ distinct categories that informed targeted retention interventions and reduced student drops by 4x
  • Built production semantic classification models using topic modeling and BERT-based approaches to analyze Net Promoter Score responses, achieving 89% classification accuracy across 5 critical dissatisfaction areas
  • Conducted large-scale quantitative analysis on 487,826 student-term records demonstrating communication strategy impacts, finding call-heavy approaches increased promoter rates by 7.2 percentage points and directly informing institutional policy changes
  • Designed A/B tests and statistical analyses to evaluate intervention effectiveness, collaborating with non-technical stakeholders to translate findings into actionable strategies
August 2017 - October 2023

Education

    Harvard Business School

    Leadership, Ethics and Corporate Accountability Pilot

    General Assembly

    Data Science Immersive
    480+ hour immersive data science program

    University of Missouri

    Ph.D. Quantitative Biology
    Thesis Project: The role that relocation of the blue-light photoreceptor phototropin 1 plays in phototropic signaling, and how photoreceptors can improve solar technologies and crop yields.

    Broader Impact: Phototropism results in efficient harnessing of solar energy and water acquisition in plants. Plants with enhanced phototropism exhibit greater overall fitness. Understanding the mechanism of phototropism will assist in engineering crops that are more drought resistant with higher biomass and greater food production.

    Dissertation
    Publication in Plant Cell

    Missouri State University

    Master of Science - Biomedical Science and Molecular Biology
    Thesis Project: Effects of low-frequency electromagnetic fields on telomeric regions of mammalian DNA.

    Broader Impact: Understanding the effects that low-frequency electromagnetic fields (EMFs), like those generated by cell phones, have on mammalian DNA. Low-frequency EMFs appear to interact with hydrogen bonding within proteins, stabilizing or disrupting molecular interactions and altering protein stability. Preliminary results suggest an interaction between EMFs and telomeric maintenance.

    Dissertation

    Missouri State University

    Bachelor of Science - Biomedical Science
    Undergraduate Research: Computational modeling of molecular interactions.

    Broader Impact: Modeling complex molecular interactions in silico allows for accurate prediction of molecular interactions in vivo, e.g., drug-receptor affinity. Modeling can significantly reduce costs associated with discovery-phase wet lab research.

    Publication in the Journal of Biomedical Nanotechnology

Interests

    • Music: playing guitar, bass guitar, clarinet, bass clarinet
    • Technology: I enjoy building computers and networks. For the last 6 years I have volunteered at a non-profit in Austin, TX, where I act as Enterprise Architect. I also work with their clients to teach them basic computer skills.
    • Teaching: I have created dozens of videos on YouTube and Panopto that teach advanced science concepts like virology, genetics, thermonuclear fusion in the star life cycle, cellular respiration, and photosynthesis, as well as fun at-home experiments.
    • Mentorship: I grew up in one of the most blighted regions of the United States near the Fort Belknap and Hays Lodgepole Reservations in northeastern Montana. As a first-generation high school and college grad, I understand the obstacles faced by underrepresented and underserved individuals when it comes to higher education. I have visited schools in small communities across Montana and Missouri and mentored young adults to help them learn what opportunities are available to them, build confidence, and overcome imposter syndrome.