Compression-as-clinical-context (MedCompress)
Document compression · structured summarization · LLM-adjacent decision support
Long medical documents are expensive to read and harder to reason over inside LLM pipelines. MedCompress explores structured, lossy-but-faithful compression of clinical text into representations that downstream decision-support tools can actually use — without pretending to replace clinician judgment.
Independent project, 2026.
Unsupervised + supervised pipelines for atmospheric data
Self-Organizing Maps · ANN · CNN · global classification
Combines Self-Organizing Maps, which cluster surface temperature, precipitation, and pressure readings into regions without supervision, with ANN and CNN classifiers that highlight the regions showing the strongest climate-change signal. Evaluated with accuracy, F1, and confusion matrices, with results visualized on global maps.
Beloit College coursework, Jan–May 2023.
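As a rough illustration of the unsupervised half, here is a minimal Self-Organizing Map in plain Python. It is a sketch under assumptions, not the project's code: the grid size, the linear decay schedules, the Gaussian neighbourhood, and the helper names `train_som` and `best_matching_unit` are all hypothetical choices. Each input vector stands in for one location's (temperature, precipitation, pressure) features, which would normally be standardized first.

```python
import math
import random


def best_matching_unit(weights, x):
    """Return the grid node whose weight vector is closest (squared Euclidean) to x."""
    return min(weights, key=lambda node: sum((w - v) ** 2 for w, v in zip(weights[node], x)))


def train_som(data, rows, cols, epochs=50, lr0=0.5, seed=0):
    """Train a rows x cols Self-Organizing Map with a Gaussian neighbourhood (hypothetical settings)."""
    rng = random.Random(seed)
    sigma0 = max(rows, cols) / 2.0
    # Start every node at a randomly chosen sample (copied so updates don't alias the data).
    weights = {(r, c): list(rng.choice(data)) for r in range(rows) for c in range(cols)}
    total_steps = epochs * len(data)
    step = 0
    for _ in range(epochs):
        order = list(data)
        rng.shuffle(order)
        for x in order:
            frac = step / total_steps
            lr = lr0 * (1.0 - frac)               # learning rate decays linearly toward 0
            sigma = sigma0 * (1.0 - frac) + 1e-3  # neighbourhood radius shrinks as well
            br, bc = best_matching_unit(weights, x)
            for (r, c), w in weights.items():
                # Nodes near the winner on the grid are pulled toward x more strongly.
                grid_dist2 = (r - br) ** 2 + (c - bc) ** 2
                h = math.exp(-grid_dist2 / (2.0 * sigma ** 2))
                for i, v in enumerate(x):
                    w[i] += lr * h * (v - w[i])
            step += 1
    return weights
```

After training, each location is assigned to the region given by `best_matching_unit(som, x)`; those region labels are what a downstream classifier or a global map would consume.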
RNN vs. CNN on clinical tabular data
UCI Heart Failure Clinical Records · binary classification · K-Means · SOM
Comparing recurrent and convolutional architectures on a clinical tabular dataset, with K-Means and SOM clustering used to surface natural patient groupings before supervised training. Reported with confusion matrices, precision, recall, and AUC.
Independent project, 2023.
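The reported metrics can be sketched without any library, assuming binary labels where 1 marks the positive class and the model emits a score per patient. The function names are hypothetical. AUC is computed here through its rank interpretation: the fraction of (positive, negative) pairs in which the positive example receives the higher score, with ties counted as one half.

```python
def confusion_counts(y_true, y_pred):
    """Return (tp, fp, fn, tn) for binary labels where 1 is the positive class."""
    tp = fp = fn = tn = 0
    for t, p in zip(y_true, y_pred):
        if t == 1 and p == 1:
            tp += 1
        elif t == 0 and p == 1:
            fp += 1
        elif t == 1 and p == 0:
            fn += 1
        else:
            tn += 1
    return tp, fp, fn, tn


def precision_recall(y_true, y_pred):
    """Precision = tp / (tp + fp), recall = tp / (tp + fn); 0.0 when undefined."""
    tp, fp, fn, _ = confusion_counts(y_true, y_pred)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    return precision, recall


def roc_auc(y_true, scores):
    """Probability that a random positive outranks a random negative (ties count 1/2)."""
    pos = [s for t, s in zip(y_true, scores) if t == 1]
    neg = [s for t, s in zip(y_true, scores) if t == 0]
    if not pos or not neg:
        raise ValueError("AUC needs at least one positive and one negative example")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


# Usage on toy scores, thresholded at 0.5 (an assumed cut-off).
y_true = [1, 0, 1, 1, 0, 0]
scores = [0.9, 0.4, 0.35, 0.8, 0.1, 0.6]
y_pred = [1 if s >= 0.5 else 0 for s in scores]
print(confusion_counts(y_true, y_pred))  # (2, 1, 1, 2)
print(roc_auc(y_true, scores))           # 7 of 9 pairs ranked correctly
```

The pairwise AUC is quadratic in the number of examples, which is acceptable for a small clinical table; a sort-based rank computation gives the same value faster on larger data.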
Image captioning with CNN encoders + LSTM decoders
VGG16 · LSTM · CUDA tuning · BLEU
End-to-end pipeline on Flickr30k: VGG16 feature extraction, LSTM-based caption generation, BLEU evaluation, and CUDA-level tuning (thread block size, shared memory, kernel configuration) for efficient batch training. Presented at the 47th Annual Beloit Student Symposium.
Beloit College, Aug–Dec 2023.
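For readers unfamiliar with BLEU, the sketch below computes an unsmoothed, sentence-level score for one tokenized caption against several reference captions: clipped n-gram precisions up to 4-grams, combined by a geometric mean and scaled by a brevity penalty. It follows the standard definition, but the function names are hypothetical and the project's exact BLEU variant (corpus-level aggregation, smoothing) is not specified in the source.

```python
import math
from collections import Counter


def ngrams(tokens, n):
    """Count the n-grams (as tuples) in a token list."""
    return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))


def sentence_bleu(candidate, references, max_n=4):
    """Unsmoothed BLEU for one tokenized candidate against tokenized references."""
    log_precision_sum = 0.0
    for n in range(1, max_n + 1):
        cand_counts = ngrams(candidate, n)
        total = sum(cand_counts.values())
        if total == 0:
            return 0.0
        # Clip each candidate n-gram by its maximum count in any single reference.
        max_ref = Counter()
        for ref in references:
            max_ref = max_ref | ngrams(ref, n)  # Counter union keeps the max count
        clipped = sum(min(count, max_ref[gram]) for gram, count in cand_counts.items())
        if clipped == 0:
            return 0.0  # without smoothing, any zero precision zeroes the score
        log_precision_sum += math.log(clipped / total)
    # Brevity penalty against the reference length closest to the candidate's.
    c = len(candidate)
    r = min((len(ref) for ref in references), key=lambda length: (abs(length - c), length))
    bp = 1.0 if c > r else math.exp(1.0 - r / c)
    return bp * math.exp(log_precision_sum / max_n)


refs = ["a dog runs on the grass".split(), "a brown dog plays outside".split()]
print(sentence_bleu("a dog runs on the".split(), refs))  # all n-grams match; only the brevity penalty applies
```

Sentence-level BLEU without smoothing is harsh on short captions, which is one reason evaluations often report corpus-level BLEU that pools n-gram counts across all captions before combining them.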
Fake news detection with classical NLP baselines
TF-IDF · Logistic Regression · Naive Bayes · Random Forest
A text classification pipeline on the Kaggle Fake News dataset: tokenization, stopword removal, stemming, and TF-IDF vectorization feed three classical supervised classifiers (Logistic Regression, Naive Bayes, Random Forest), compared on accuracy, precision, recall, and F1 over a held-out test set. The goal is to establish strong non-neural baselines before reaching for transformer-scale models.
Independent project, 2023.
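A minimal sketch of the preprocessing and TF-IDF step, assuming a toy stopword list and a simplified tokenizer (both hypothetical). Weights use raw term counts times a smoothed IDF, idf(t) = ln((1 + N) / (1 + df(t))) + 1, followed by L2 normalization; this mirrors the default weighting documented for scikit-learn's `TfidfVectorizer`, though its default tokenizer differs from the one here. Stemming is omitted; a real pipeline would add a stemmer such as NLTK's `PorterStemmer` after tokenization.

```python
import math
import re
from collections import Counter

# Tiny illustrative stopword list; a real pipeline would use a fuller one.
STOPWORDS = {"the", "a", "an", "of", "to", "and", "in", "is"}


def tokenize(text):
    """Lowercase, keep alphabetic runs, and drop stopwords."""
    return [t for t in re.findall(r"[a-z]+", text.lower()) if t not in STOPWORDS]


def tfidf(docs):
    """Return one {term: weight} dict per document, L2-normalized, with smoothed IDF."""
    tokenized = [tokenize(d) for d in docs]
    n_docs = len(tokenized)
    # Document frequency: in how many documents each term appears at least once.
    df = Counter(term for toks in tokenized for term in set(toks))
    idf = {t: math.log((1 + n_docs) / (1 + d)) + 1.0 for t, d in df.items()}
    vectors = []
    for toks in tokenized:
        counts = Counter(toks)
        vec = {t: c * idf[t] for t, c in counts.items()}
        norm = math.sqrt(sum(w * w for w in vec.values()))
        vectors.append({t: w / norm for t, w in vec.items()} if norm else vec)
    return vectors


docs = [
    "The senator denies the claim",
    "Aliens endorse the senator",
    "Local team wins the final",
]
for vec in tfidf(docs):
    print(sorted(vec.items(), key=lambda kv: -kv[1]))
```

Terms shared across many articles (here "senator") receive lower weight than terms specific to one article, which is what lets linear models such as Logistic Regression or Naive Bayes pick up on distinctive vocabulary.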