Research — Abhishek Shekhar

Compression-as-clinical-context (MedCompress)

Document compression · structured summarization · LLM-adjacent decision support

Long medical documents are expensive to read and even harder to reason over inside LLM pipelines. MedCompress explores structured, lossy-but-faithful compression of clinical text into representations that downstream decision-support tools can actually use, without pretending to replace clinician judgment.

Independent project, 2026.

Unsupervised + supervised pipelines for atmospheric data

Self-Organizing Maps · ANN · CNN · global classification

Combines Self-Organizing Maps, used for unsupervised regional clustering of surface temperature, precipitation, and pressure readings, with ANN and CNN classifiers to highlight the regions with the strongest climate-change signal. Evaluated with accuracy, F1, and confusion matrices; results visualized on global maps.

Beloit College coursework, Jan–May 2023.
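
A minimal sketch of the SOM clustering step, in plain Python. The readings are made-up, pre-normalised values standing in for (temperature, precipitation, pressure); the grid size, decay schedule, and function names are illustrative assumptions, not the project's actual code.

```python
import math
import random


def best_matching_unit(weights, x):
    """Return the grid coordinate whose weight vector is closest to x."""
    return min(
        weights,
        key=lambda node: sum((w - v) ** 2 for w, v in zip(weights[node], x)),
    )


def train_som(data, rows, cols, epochs=50, lr0=0.5, sigma0=1.0, seed=0):
    """Fit a rows x cols Self-Organizing Map to equal-length feature vectors."""
    rng = random.Random(seed)
    dim = len(data[0])
    # One weight vector per grid node, initialised from random data points.
    weights = {
        (r, c): list(rng.choice(data)) for r in range(rows) for c in range(cols)
    }
    total_steps = epochs * len(data)
    step = 0
    for _ in range(epochs):
        for x in rng.sample(data, len(data)):  # shuffled pass over the data
            frac = step / total_steps
            lr = lr0 * (1 - frac)                  # learning rate decays linearly
            sigma = sigma0 * (1 - frac) + 1e-3     # neighbourhood radius shrinks
            bmu = best_matching_unit(weights, x)
            for node, w in weights.items():
                grid_dist2 = (node[0] - bmu[0]) ** 2 + (node[1] - bmu[1]) ** 2
                h = math.exp(-grid_dist2 / (2 * sigma ** 2))
                for i in range(dim):
                    # Pull the node toward x, scaled by grid closeness to the BMU.
                    w[i] += lr * h * (x[i] - w[i])
            step += 1
    return weights


# Made-up normalised readings: (temperature, precipitation, pressure).
readings = [
    [0.90, 0.10, 0.40], [0.85, 0.15, 0.45], [0.80, 0.20, 0.50],
    [0.20, 0.80, 0.60], [0.15, 0.90, 0.55], [0.10, 0.85, 0.65],
]
weights = train_som(readings, rows=2, cols=2)
clusters = [best_matching_unit(weights, x) for x in readings]
```

Each entry of `clusters` is the grid cell a reading maps to, which is the label a downstream classifier or map visualisation would consume.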

RNN vs. CNN on clinical tabular data

UCI Heart Failure Clinical Records · binary classification · K-Means · SOM

Comparing recurrent and convolutional architectures on a clinical tabular dataset, with K-Means and SOM clustering used to surface natural patient groupings before supervised training. Reported with confusion matrices, precision, recall, and AUC.

Independent project, 2023.
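
For reference, the reported metrics can be computed from labels and scores roughly as below. This is a plain-Python sketch on made-up values; the 0.5 threshold and the function names are assumptions for illustration, and the AUC uses the pairwise-ranking definition rather than any particular library routine.

```python
def confusion_counts(y_true, y_pred):
    """Return (tp, fp, fn, tn) for binary labels where 1 is the positive class."""
    tp = fp = fn = tn = 0
    for truth, pred in zip(y_true, y_pred):
        if pred == 1 and truth == 1:
            tp += 1
        elif pred == 1:
            fp += 1
        elif truth == 1:
            fn += 1
        else:
            tn += 1
    return tp, fp, fn, tn


def roc_auc(y_true, scores):
    """AUC as the chance a random positive outscores a random negative.

    Ties count as half a win.
    """
    pos = [s for t, s in zip(y_true, scores) if t == 1]
    neg = [s for t, s in zip(y_true, scores) if t == 0]
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


# Made-up labels and model scores, for illustration only.
y_true = [1, 1, 1, 0, 0, 0, 0, 1]
scores = [0.9, 0.8, 0.4, 0.3, 0.6, 0.2, 0.1, 0.7]
y_pred = [1 if s >= 0.5 else 0 for s in scores]

tp, fp, fn, tn = confusion_counts(y_true, y_pred)
precision = tp / (tp + fp)
recall = tp / (tp + fn)
auc = roc_auc(y_true, scores)
print(f"precision={precision:.2f} recall={recall:.2f} auc={auc:.4f}")
```

On these toy values the script prints `precision=0.75 recall=0.75 auc=0.9375`.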

Image captioning with CNN encoders + LSTM decoders

VGG16 · LSTM · CUDA tuning · BLEU

End-to-end pipeline on Flickr30k: VGG16 feature extraction, LSTM-based caption generation, BLEU evaluation, and CUDA-level tuning (thread block size, shared memory, kernel configuration) for efficient batch training. Presented at the 47th Annual Beloit Student Symposium.

Beloit College, Aug–Dec 2023.
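
A rough sketch of how BLEU scores a generated caption, in plain Python. It is a simplified sentence-level variant (one reference, uniform weights, no smoothing) written for illustration; the captions are invented and the function names are assumptions, not the evaluation code used in the project.

```python
import math
from collections import Counter


def ngrams(tokens, n):
    """Count every contiguous n-gram in a token list."""
    return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))


def bleu(candidate, reference, max_n=4):
    """Sentence-level BLEU against one reference, uniform weights, no smoothing."""
    log_precisions = []
    for n in range(1, max_n + 1):
        cand = ngrams(candidate, n)
        ref = ngrams(reference, n)
        total = sum(cand.values())
        # Clip each candidate n-gram count by its count in the reference.
        clipped = sum(min(count, ref[gram]) for gram, count in cand.items())
        if total == 0 or clipped == 0:
            return 0.0  # unsmoothed BLEU is zero if any order has no match
        log_precisions.append(math.log(clipped / total))
    # Brevity penalty: candidates shorter than the reference are scaled down.
    c, r = len(candidate), len(reference)
    brevity = 1.0 if c > r else math.exp(1 - r / c)
    return brevity * math.exp(sum(log_precisions) / max_n)


# Invented captions, for illustration only.
reference = "a dog runs across the grass".split()
candidate = "a dog runs on the grass".split()

bleu_2 = bleu(candidate, reference, max_n=2)
bleu_4 = bleu(candidate, reference)
```

Here `bleu_2` is about 0.707, while `bleu_4` is 0 because the candidate shares no 4-gram with the reference. That brittleness on short captions is why smoothed or corpus-level BLEU is usually what gets reported.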

Fake news detection with classical NLP baselines

TF-IDF · Logistic Regression · Naive Bayes · Random Forest

A text-classification pipeline on the Kaggle Fake News dataset using tokenization, stopword removal, stemming, and TF-IDF vectorization. Trained and compared three classical supervised classifiers (Logistic Regression, Naive Bayes, and Random Forest), evaluated with accuracy, precision, recall, and F1 on a held-out test set, with the goal of establishing strong non-neural baselines before reaching for transformer-scale models.

Independent project, 2023.
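
A minimal sketch of the preprocessing and TF-IDF steps, in plain Python. The stopword list and headlines are made up for illustration, stemming is left out for brevity, and the function names are assumptions. The IDF uses the smoothed form ln((1 + N) / (1 + df)) + 1, without any vector normalisation.

```python
import math
import re
from collections import Counter

# Tiny illustrative stopword list (an assumption; real lists are much longer).
STOPWORDS = {"the", "a", "an", "is", "are", "of", "to", "in", "and", "on"}


def tokenize(text):
    """Lowercase, split on non-letters, and drop stopwords."""
    return [t for t in re.findall(r"[a-z]+", text.lower()) if t not in STOPWORDS]


def tfidf(docs):
    """Return one {term: weight} dict per document.

    Weight = raw term count * (ln((1 + N) / (1 + df)) + 1).
    """
    tokenized = [tokenize(d) for d in docs]
    n_docs = len(tokenized)
    # Document frequency: in how many documents each term appears.
    doc_freq = Counter(term for tokens in tokenized for term in set(tokens))
    vectors = []
    for tokens in tokenized:
        counts = Counter(tokens)
        vectors.append({
            term: count * (math.log((1 + n_docs) / (1 + doc_freq[term])) + 1)
            for term, count in counts.items()
        })
    return vectors


# Made-up headlines, for illustration only.
docs = [
    "Scientists confirm the moon is made of cheese",
    "Scientists publish a study on the moon",
    "Local council approves budget",
]
vectors = tfidf(docs)
```

In the first headline, "cheese" gets a higher weight than "moon" because it appears in only one document. Sparse vectors like these are what a classical classifier would train on.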

Research interests

What I'm actively exploring

Interested in collaborating?