nvidia · may 2026 · seattle, wa
Javon Kitson
  now: ai/ml hpc cluster engineer @ nvidia
  researching: diffusion models for topology optimization
  studying: financial engineering & international relations
  building: deep rl trading systems
  optimizing: distributed training pipelines at scale
  growing: vegetables & oyster mushrooms
  listening: snarky puppy, on repeat
§ 01 about

An engineer building at the intersection of ML infrastructure and research.

./about.md

Hi, I'm Javon. I'm an AI/ML infrastructure engineer who recently relocated from the Washington, DC metro area to Seattle. I studied Computer Science with a Biomedical Physics minor at Loyola University Maryland (2020) and earned my Master's in Applied Artificial Intelligence from the University of San Diego (2024).

My work sits at the intersection of machine learning and high-performance infrastructure. For the past five years at BlueHalo, AeroVironment, and now NVIDIA, I've architected GPU clusters for 100+ researchers, tuning SLURM schedulers, scaling distributed training, and building systems that enable large-scale ML experimentation. I also pursue independent research projects in generative models and deep reinforcement learning.

Outside of work, I research value investing strategies, grow vegetables and oyster mushrooms, and I'm a huge fan of Snarky Puppy.

§ 02 experience

Five years of GPU clusters, distributed training, and the plumbing in between.

./career.log
May 2026 — Present

NVIDIA (incoming)

AI/ML HPC Cluster Engineer
    Seattle, WA
    May 2025 — May 2026

    AeroVironment

    Research Engineer III
    • Architected and led administration of an AI/ML GPU cluster serving 100+ researchers; tuned SLURM scheduling and quotas to sustain 95%+ utilization across distributed workloads.
    • Engineered multi-node training pipelines and a reproducible YOLOX benchmarking suite across 48 A6000 GPUs, enabling data-driven capacity planning.
    • Built an enterprise dataset/model distribution system managing 500+ TB of training data, cutting onboarding from weeks to days.
    Germantown, MD
    Feb 2023 — May 2025

    BlueHalo

    Research Engineer III
    • Administered a 48-GPU AI/ML cluster for 100+ researchers; deployed Prometheus/Grafana observability and tuned SLURM to maintain 30–100% utilization based on workload demand.
    • Developed standardized distributed-training templates adopted by 10+ teams, cutting experiment setup time from days to hours.
    • Fine-tuned and evaluated deep-learning models with MLflow tracking, then containerized and shipped to clients.
    Germantown, MD
    Jul 2020 — Feb 2023

    Intelligent Automation Inc.

    Software Engineer I/II
    • Delivered full-stack features in React + Python (FastAPI) — UI components, REST endpoints, and data contracts used across multiple internal teams.
    • Built GitLab CI pipelines with unit/integration/e2e tests, linters, and static security scans to improve release reliability.
    Rockville, MD
§ 03 selected work

    Things I've built — six recent.

    title / description · stack · year
    01

    TopoDiff

    Conditional latent diffusion model for topology optimization — generates optimal 3D material distributions in ~500ms, replacing 50-200 iterative FEM solves with a single forward pass. [coming soon]
    stack: Python · PyTorch · Diffusers · VAE
    year: 2025
    readme: a more thorough writeup lives on github.
    02

    Simple Neural Architecture Search

    Framework for automated neural network design — searches the architecture space to find performant topologies without hand-tuning.
    stack: Python · TensorFlow · PyTorch · Streamlit
    year: 2024
    readme: a more thorough writeup lives on github.
    03

    Deep RL Stock Trading

    Ensemble trading system combining PPO/SAC agents, genetic algorithm portfolio selection, and FNN price prediction — achieved +89% ROI in backtests.
    stack: Python · PyTorch · QuantConnect · Pandas
    year: 2024
    readme: a more thorough writeup lives on github.
    04

    Temporal Hierarchical Clustering

    Financial-instrument relationship analysis over time — clusters tickers by behavior and tracks how groupings evolve through market regimes.
    stack: Python · Scikit-learn · Pandas · Matplotlib
    year: 2023
    readme: a more thorough writeup lives on github.
    05

    Geoestimation

    Satellite-imagery building-footprint extraction — semantic segmentation pipeline that turns orthophotos into clean polygon outputs.
    stack: Python · TensorFlow · EfficientNet · UNet
    year: 2022
    readme: a more thorough writeup lives on github.
    06

    javonkitson.com

    The site you're currently looking at.
    stack: TypeScript · React · Next.js · CSS
    year: 2026
    readme: a more thorough writeup lives on github.