Umid Suleymanov

PhD Student in Computer Science, Virginia Tech

I am a second-year PhD student in Computer Science at Virginia Tech. My research focuses on Trustworthy Machine Learning, specifically investigating LLM Security (jailbreak attacks/defenses), Privacy (membership inference, unlearning), and Few-Shot Learning for Network Intrusion Detection.

Previously, I was a Data Science Intern at Amazon (AWS), working on proactive security. I was also a two-time finalist at the International Data Analysis Olympiads, placing 16th among 2,187 teams.


News


Selected Publications

Ali Asgarov, Umid Suleymanov, Aadyant Khatri
AAAI Conference on Artificial Intelligence (AAAI) Bridge, 2026
Umid Suleymanov, et al.
IEEE Military Communications Conference (MILCOM), 2025
L. Aliyeva, N. Abdullayev, Umid Suleymanov, et al.
IEEE International Conference on Application of Information and Communication Technologies (AICT), 2024
T. Alizada, Umid Suleymanov, Z. Rustamov
IEEE International Conference on Application of Information and Communication Technologies (AICT), 2024
Umid Suleymanov, V. Huseynov, et al.
International Conference on Artificial Intelligence and Applied Mathematics in Engineering, 2022

View all publications on Google Scholar →


Experience

Amazon (AWS Proactive Security)

May 2025 – Aug 2025
Data Science Intern
  • Developed statistical and ML models and performed analysis to process large-scale security data.
  • Built ETL pipelines, dashboards, and metrics to deliver actionable insights using Python, SQL, and AWS tools.

Virginia Tech

Aug 2024 – Present
Graduate Research Assistant
  • Published research on few-shot learning for Network Intrusion Detection, outperforming current models by 3.5%.
  • Developed policy-grounded RAG-based agentic defenses for detecting and mitigating LLM jailbreak attacks, surpassing state-of-the-art detection methods by 4%.
  • Conducting research on LLM privacy leakage, including membership inference and memorization analysis; designed agent-based reflective defenses that reduced privacy leakage by 35%.

ADA University

Jan 2023 – Aug 2024
Instructor of Computer and Information Sciences
  • Instructed courses on Intro to Big Data Engineering, Deep Learning, and Data & Information Engineering; designed comprehensive syllabi, instructional materials, and assessments using Blackboard LMS.
  • Conducted research at the Center for Data Analytics and Research, focusing on pre-training and efficient fine-tuning of Large Language Models (LLMs) for low-resource languages and Explainable AI (XAI).

E-Gov Development Center

Jun 2022 – Jan 2023
Leading Data Scientist
  • Led the design and deployment of end-to-end ML pipelines, integrating predictive models into production systems.
  • Developed pretrained language models and word embeddings for the Azerbaijani language, achieving a 4.3% performance improvement on downstream NLP tasks.
  • Built custom Elasticsearch text analyzers to enhance search relevance, which increased click-through rates by 6%.

E-Gov Development Center

Nov 2019 – Jun 2022
Senior Data Scientist
  • Built an automated document anonymization pipeline using custom Named Entity Recognition (NER), speeding up previously manual processing by 32%.
  • Designed predictive models to forecast customer wait times and optimize queue management, reducing related complaints by 16%.

E-Gov Development Center

Jul 2018 – Nov 2019
Data Engineer
  • Optimized data reliability and quality for large-scale queue and NLP datasets; implemented robust data acquisition, preprocessing, transformation, and storage routines.
  • Gathered complex business requirements and successfully translated them into actionable data science strategies.

Center for Data Analytics Research

Dec 2017 – May 2018
Research Assistant
  • Developed automated text classification systems for news article analysis using machine learning and deep learning algorithms; published the findings on automated news classification at the IEEE AICT conference.

Teaching

I have served as an instructor for the following courses:


Invited Talks & Academic Service


Honors & Achievements


Projects

SIGMA Framework
LLM Agents Reasoning

SIGMA: Agentic Math Reasoning

Designed a multi-agent framework that orchestrates specialized agents to perform targeted searches and synthesize findings. It outperforms open- and closed-source models on the MATH500 and GPQA benchmarks by 7.4%.

SSPNet
Network Security Few-Shot Learning

SSPNet: Few-Shot NIDS

Proposed a semi-supervised prototypical network for Network Intrusion Detection (NIDS). The model leverages pseudo-labels from high-confidence samples to improve detection of novel attacks with limited labeled data.
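
The pseudo-labeling idea described above can be illustrated with a minimal sketch. This is not the SSPNet implementation: the function names, the 0.9 confidence threshold, the softmax over negative squared Euclidean distance, and the use of raw vectors in place of learned embeddings are all assumptions made for illustration.

```python
import math

def mean(vectors):
    """Element-wise mean of a non-empty list of equal-length vectors."""
    n = len(vectors)
    return [sum(col) / n for col in zip(*vectors)]

def prototypes(support):
    """support: dict mapping label -> list of embedding vectors."""
    return {label: mean(vecs) for label, vecs in support.items()}

def class_probs(x, protos):
    """Softmax over negative squared Euclidean distance to each prototype."""
    logits = {c: -sum((a - b) ** 2 for a, b in zip(x, p)) for c, p in protos.items()}
    top = max(logits.values())
    exps = {c: math.exp(v - top) for c, v in logits.items()}
    total = sum(exps.values())
    return {c: e / total for c, e in exps.items()}

def refine_with_pseudo_labels(support, unlabeled, threshold=0.9):
    """Add confidently classified unlabeled samples, then recompute prototypes."""
    protos = prototypes(support)
    augmented = {c: list(vecs) for c, vecs in support.items()}
    for x in unlabeled:
        probs = class_probs(x, protos)
        label = max(probs, key=probs.get)
        if probs[label] >= threshold:  # low-confidence samples are skipped
            augmented[label].append(x)
    return prototypes(augmented)

# Hypothetical 2-D "embeddings" for two traffic classes.
support = {
    "benign": [[0.0, 0.0], [0.2, 0.0]],
    "attack": [[4.0, 4.0], [4.0, 3.8]],
}
unlabeled = [[0.1, 0.4], [3.6, 4.2], [2.0, 2.0]]
refined = refine_with_pseudo_labels(support, unlabeled)
```

In this toy run the first two unlabeled points are absorbed into the nearest class, while the third lies about halfway between the prototypes and is left out because its confidence is near 0.5.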

Ducky Agent
Agentic AI DevTools

Ducky: Agentic AI Code Assistant

Ducky is an interactive, autonomous agent interface that bridges the gap between conversational LLMs and an IDE. It can autonomously read, write, navigate, and modify files within a secure sandbox environment.

Buffer of Thoughts
Reasoning RAG

Search-Augmented Buffer of Thoughts

Search-Augmented Buffer of Thoughts (SA-BoT) is an agentic AI framework that merges dynamic web search with a "Buffer of Thoughts." It autonomously retrieves external information to solve problems and distills successful reasoning into reusable templates stored in a meta-memory.

LLM Jailbreak Defense
LLM Security RAG

Policy-Grounded RAG Defense

Developing policy-grounded RAG-based agentic defenses for detecting and mitigating LLM jailbreak attacks. The system retrieves safety policies dynamically to ground model responses against adversarial inputs.

LLM Defense
GenAI Security Privacy

Agentic Defenses for LLM Privacy

Developing an agentic feedback loop to identify private information leakage in Large Language Model outputs and creating privacy-preserving defenses to mitigate these risks.

Document Segmentation
Computer Vision Segmentation

Historical Document Segmentation

Developed instance segmentation pipelines (U-Net and Mask R-CNN) to digitize handwritten historical archives. U-Net with Watershed transformation outperformed two-stage architectures.

Medical Image Unlearning
Healthcare AI Unlearning

Deep Unlearning for Histopathology

Applied influence-based unlearning techniques to DenseNet models for breast cancer classification (BreaKHis dataset), enabling data removal compliance while maintaining model utility.

Fairness Privacy Audit
Privacy Audit Fairness

Auditing Fairness-Privacy Trade-offs

Developed a unified empirical framework to audit how fairness-enhancing algorithms influence membership inference privacy risks at the subpopulation level, revealing disparate trade-offs between privacy, utility, and fairness.
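
As a rough illustration of a subpopulation-level audit, the sketch below computes a per-group membership inference advantage (true positive rate minus false positive rate). The loss-threshold attack, the record layout, and the function name are assumptions for illustration; the attacks and metrics used in the actual framework are not specified here.

```python
from collections import defaultdict

def subgroup_mia_advantage(records, threshold):
    """records: iterable of (group, loss, is_member) tuples.

    The attack guesses "member" when the model's loss on a sample is below
    `threshold`. Returns {group: TPR - FPR} for that attack.
    """
    counts = defaultdict(lambda: {"tp": 0, "fn": 0, "fp": 0, "tn": 0})
    for group, loss, is_member in records:
        guess = loss < threshold
        if is_member:
            counts[group]["tp" if guess else "fn"] += 1
        else:
            counts[group]["fp" if guess else "tn"] += 1
    advantage = {}
    for group, c in counts.items():
        members = c["tp"] + c["fn"]
        non_members = c["fp"] + c["tn"]
        tpr = c["tp"] / members if members else 0.0
        fpr = c["fp"] / non_members if non_members else 0.0
        advantage[group] = tpr - fpr
    return advantage

# Hypothetical per-sample losses for two subgroups "a" and "b".
records = [
    ("a", 0.1, True), ("a", 0.2, True), ("a", 0.9, False), ("a", 0.8, False),
    ("b", 0.1, True), ("b", 0.7, True), ("b", 0.3, False), ("b", 0.9, False),
]
adv = subgroup_mia_advantage(records, threshold=0.5)
```

Comparing these per-group values before and after applying a fairness intervention is one simple way to surface a disparate privacy cost.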

Unlearning Quality Score
Machine Unlearning Metrics

Machine Unlearning Quality Score

A systematic metric that quantifies how closely an unlearning method mirrors a fully retrained model without requiring actual retraining.

Azerbaijani NLP
NLP Low-Resource Lang

Contextualized Embeddings (Az)

Trained RoBERTa and GPT-2 models on a 164M-token Azerbaijani corpus. This work provided the first high-quality contextual embeddings for the language, significantly improving downstream NLP tasks.

Azerbaijani News Classification
NLP Text Classification

Online News Classification (AZ)

Developed an automated classification system for Azerbaijani news articles using Naive Bayes, SVM, and Neural Networks. Implemented stemming, stop-word removal, and feature reduction to enhance accuracy and performance.

Azerbaijani Sentiment Analysis
NLP Sentiment Analysis

Sentiment Polarity Detection (AZ News)

Built sentiment analysis models for Azerbaijani social news articles using SVM, Random Forest, and Naive Bayes with TF-IDF and frequency-based BOW. Evaluated on a dataset of 30k manually labeled articles; SVM achieved top performance.
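
For readers unfamiliar with the two feature representations mentioned above, here is a minimal sketch of frequency-based bag-of-words and textbook TF-IDF weighting. The exact tokenization, weighting variant, and library used in the project are not stated, so the formula (term frequency times log of inverse document frequency) and the toy English tokens are assumptions.

```python
import math
from collections import Counter

def bag_of_words(doc):
    """Frequency-based BOW: raw term counts for one tokenized document."""
    return Counter(doc)

def tfidf(docs):
    """docs: list of token lists. Returns one {term: weight} dict per document."""
    n_docs = len(docs)
    # Document frequency: in how many documents each term appears.
    df = Counter(term for doc in docs for term in set(doc))
    weights = []
    for doc in docs:
        tf = bag_of_words(doc)
        weights.append(
            {t: (c / len(doc)) * math.log(n_docs / df[t]) for t, c in tf.items()}
        )
    return weights

# Hypothetical, already tokenized toy documents.
docs = [["good", "news"], ["bad", "news"]]
w = tfidf(docs)
```

A term that occurs in every document ("news" here) gets weight zero, while class-bearing terms such as "good" and "bad" keep a positive weight, which is why TF-IDF is often preferred over raw counts as input to a linear classifier.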

MLOps
MLOps Docker

End-to-End MLOps Pipeline

Implemented a full ML lifecycle project using MLflow for experiment tracking and model registry and Docker for containerization, following industry-standard data science processes.

Spark
Big Data Spark

Song Recommendation System

Built a scalable recommendation engine using Apache Spark and MLlib. The system utilizes implicit feedback from user history logs to predict and recommend new songs.

Wiki Parser
Data Mining Analytics

Wikipedia Parser & Analyzer

Developed a heuristic-based parser to clean and analyze Azerbaijani Wikipedia data. Analyzed property distributions in templates and external reference usage.