Umid Suleymanov

PhD Student in Computer Science, Virginia Tech

I am a second-year PhD student in Computer Science at Virginia Tech. My research focuses on Trustworthy Machine Learning, specifically investigating LLM Security (jailbreak attacks/defenses), Privacy (membership inference, unlearning), and Few-Shot Learning for Network Intrusion Detection.

Previously, I was a Data Science Intern at Amazon (AWS) working on proactive security. I was also a two-time finalist at the International Data Analysis Olympiad, placing 16th among 2,187 teams.

News

Mar 2026
[Paper] Our paper "Auditing Fairness–Privacy Trade-offs: Subpopulation-Level Effects of Fairness-Enhancing Algorithms" was accepted at EuroS&P 2026.
Feb 2026
Invited as a Reviewer for ICML 2026!
Nov 2025
[Paper] Our paper "SIGMA: Search-Augmented On-Demand Knowledge Integration for Agentic Mathematical Reasoning" was accepted at AAAI 2026.
Aug 2025
[Paper] Our paper "SSPNet: Semi-Supervised Prototypical Networks for Few-Shot Network Intrusion Detection" was accepted at IEEE MILCOM 2025.
May 2025
Joined Amazon (AWS Proactive Security) as a Data Science Intern.
Apr 2025
Presented at the annual Commonwealth Cyber Initiative (CCI) Student Researcher Showcase.
Dec 2024
Presented at the Commonwealth Cyber Initiative (CCI) Southwest Virginia (SWVA) Graduate Student Summit.
Sep 2024
[Paper] Published two papers, "Deep Unlearning of Breast Cancer Histopathological Images" and "Contextualized Word Embeddings for Azerbaijani", at IEEE AICT 2024.
Aug 2024
Started my PhD in Computer Science at Virginia Tech.
Jun 2022
Promoted to Leading Data Scientist at E-Gov Development Center.
Nov 2019
Joined E-Gov Development Center as a Senior Data Scientist.
Oct 2018
Won 1st Place at the AzInTelecom Hackathon with a solution involving image processing and OCR.
Jul 2018
Started as a Data Engineer at E-Gov Development Center.

Selected Publications

SIGMA: Search-Augmented On-Demand Knowledge Integration for Agentic Mathematical Reasoning

Ali Asgarov, Umid Suleymanov, Aadyant Khatri

AAAI Conference on Artificial Intelligence (AAAI) Bridge, 2026

SSPNet: Semi-Supervised Prototypical Networks for Few-Shot Network Intrusion Detection

Umid Suleymanov, et al.

IEEE Military Communications Conference (MILCOM), 2025

Deep Unlearning of Breast Cancer Histopathological Images

L. Aliyeva, N. Abdullayev, Umid Suleymanov, et al.

IEEE International Conference on Application of Information and Communication Technologies (AICT), 2024

Contextualized Word Embeddings for Azerbaijani

T. Alizada, Umid Suleymanov, Z. Rustamov

IEEE International Conference on Application of Information and Communication Technologies (AICT), 2024

Instance Segmentation of Handwritten Text on Historical Document Images Using Deep Learning Approaches

Umid Suleymanov, V. Huseynov, et al.

International Conference on Artificial Intelligence and Applied Mathematics in Engineering, 2022

View all publications on Google Scholar →

Experience

Amazon (AWS Proactive Security)

May 2025 – Aug 2025

Data Science Intern

Developed statistical and ML models and performed analysis to process large-scale security data.
Built ETL pipelines, dashboards, and metrics to deliver actionable insights using Python, SQL, and AWS tools.

Virginia Tech

Aug 2024 – Present

Graduate Research Assistant

Published research on few-shot learning for Network Intrusion Detection, outperforming current models by 3.5%.
Developed policy-grounded RAG-based agentic defenses for detecting and mitigating LLM jailbreak attacks, surpassing state-of-the-art detection methods by 4%.
Conducting research on LLM privacy leakage, including membership inference and memorization analysis; designed agent-based reflective defenses that reduced privacy leakage by 35%.

ADA University

Jan 2023 – Aug 2024

Instructor of Computer and Information Sciences

Taught Intro to Big Data Engineering, Deep Learning, and Data & Information Engineering; designed syllabi, instructional materials, and assessments using Blackboard LMS.
Conducted research at the Center for Data Analytics Research, focusing on Explainable AI (XAI) and on pre-training and efficient fine-tuning of Large Language Models (LLMs) for low-resource languages.

E-Gov Development Center

Jun 2022 – Jan 2023

Leading Data Scientist

Led the design and deployment of end-to-end ML pipelines, integrating predictive models into production systems.
Developed pretrained language models and word embeddings for the Azerbaijani language, achieving a 4.3% performance improvement on downstream NLP tasks.
Built custom Elasticsearch text analyzers to enhance search relevance, which increased click-through rates by 6%.

E-Gov Development Center

Nov 2019 – Jun 2022

Senior Data Scientist

Built an automated document anonymization pipeline using custom Named Entity Recognition (NER), speeding up the previously manual process by 32%.
Designed predictive models to forecast customer wait times and optimize queue management, reducing related complaints by 16%.

E-Gov Development Center

Jul 2018 – Nov 2019

Data Engineer

Optimized data reliability and quality for large-scale queue and NLP datasets; implemented robust data acquisition, preprocessing, transformation, and storage routines.
Gathered business requirements and translated them into actionable data science strategies.

Center for Data Analytics Research

Dec 2017 – May 2018

Research Assistant

Developed automated text classification systems for news articles using machine learning and deep learning algorithms.
Published the findings on automated news classification at the IEEE AICT conference.

Teaching

I have served as an instructor for the following courses:

Artificial Intelligence | University Instructor
Deep Learning | University Instructor
Intro to Big Data and Analytics | University Instructor
Data and Information Engineering | University Instructor
30 Days of ML (organized by Google Developer Groups Baku) | Instructor

Invited Talks & Academic Service

ACM Transactions on Privacy and Security; AICT 2025; AICT 2024 | Reviewer
Data, ERP and AI Summit | Speaker: "Large Language Models: Reasoning and Security" (Oct 2025)
Elevate & Innovate, Azercell | Speaker: "Artificial Intelligence and its Implications for Accessibility" (May 2023)
Azerbaijan Engineers Union | Speaker: "Machine Learning and its Applications" (Mar 2023)
Intel International Science and Engineering Fair (Local Selection) | Judge (CS Domain) (Feb 2023)
Bilim Baku Science Fair | Judge (Sep 2022)
Google Developer Groups Baku (Kaggle 30 Days of ML) | Tutorial: "Intermediate Machine Learning" (Aug 2021)
Intl. Conf. on AI for Digital Governance (AI4DIGIGOV) | Tutorial: "Applications of ML in Public Services" (Apr 2021)

Honors & Achievements

Two-time Finalist, International Data Analysis Olympiad
TensorFlow Developer Certificate
First Place, AzInTelecom Hackathon
Instructor at 30 Days of ML | Kaggle, Google (2021)
8+ peer-reviewed papers in AI/ML
Honorable Mention, ACM ICPC Contest

Projects

LLM Agents Reasoning

SIGMA: Agentic Math Reasoning

Designed a multi-agent framework that orchestrates specialized agents to perform targeted searches and synthesize findings. Outperforms both open- and closed-source models on the MATH500 and GPQA benchmarks by 7.4%.

Paper

Network Security Few-Shot Learning

SSPNet: Few-Shot NIDS

Proposed a semi-supervised prototypical network for Network Intrusion Detection Systems (NIDS). The model leverages pseudo-labels from high-confidence samples to improve detection of novel attacks with limited labeled data.

Paper
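
The pseudo-labeling idea described above can be illustrated with a minimal sketch. This is not the SSPNet implementation; it assumes embeddings are already computed, uses squared Euclidean distance with a softmax over negative distances, and the function names and the 0.9 confidence threshold are hypothetical choices made for illustration.

```python
import math


def mean(vectors):
    """Component-wise mean of a list of equal-length vectors."""
    n = len(vectors)
    return [sum(col) / n for col in zip(*vectors)]


def sq_dist(a, b):
    """Squared Euclidean distance between two vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def class_probs(x, prototypes):
    """Softmax over negative squared distances to each class prototype."""
    labels = list(prototypes)
    logits = [-sq_dist(x, prototypes[c]) for c in labels]
    top = max(logits)  # subtract the max for numerical stability
    exps = [math.exp(v - top) for v in logits]
    total = sum(exps)
    return {c: e / total for c, e in zip(labels, exps)}


def refine_prototypes(support, unlabeled, threshold=0.9):
    """Recompute prototypes after adding confidently pseudo-labeled samples.

    support:   dict mapping class label -> list of labeled embeddings
    unlabeled: list of unlabeled embeddings
    threshold: assumed confidence cutoff (hypothetical value)
    """
    prototypes = {c: mean(v) for c, v in support.items()}
    augmented = {c: list(v) for c, v in support.items()}
    for x in unlabeled:
        probs = class_probs(x, prototypes)
        label = max(probs, key=probs.get)
        if probs[label] >= threshold:  # keep only high-confidence pseudo-labels
            augmented[label].append(x)
    return {c: mean(v) for c, v in augmented.items()}


def predict(x, prototypes):
    """Assign x to the class whose prototype is most probable."""
    probs = class_probs(x, prototypes)
    return max(probs, key=probs.get)
```

In this sketch, an unlabeled sample that lies roughly midway between two prototypes falls below the threshold and is left out, so only unambiguous samples shift the class means.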

Agentic AI DevTools

Ducky: Agentic AI Code Assistant

Ducky is an interactive, autonomous agent interface that bridges the gap between conversational LLMs and an IDE. It can autonomously read, write, navigate, and modify files within a secure sandbox environment.

GitHub

Reasoning RAG

Search-Augmented Buffer of Thoughts

Search-Augmented Buffer of Thoughts (SA-BoT) is an agentic AI framework that merges dynamic web search with a "Buffer of Thoughts." It autonomously retrieves external information to solve problems and distills successful reasoning into reusable templates stored in a meta-memory.

GitHub

LLM Security RAG

Policy-Grounded RAG Defense

Developing policy-grounded RAG-based agentic defenses for detecting and mitigating LLM jailbreak attacks. The system retrieves safety policies dynamically to ground model responses against adversarial inputs.

(In Progress)

GenAI Security Privacy

Agentic Defenses for LLM Privacy

Developing an agentic feedback loop to identify private information leakage in Large Language Model outputs and creating privacy-preserving defenses to mitigate these risks.

(In Progress)

Computer Vision Segmentation

Historical Document Segmentation

Developed instance segmentation pipelines (U-Net and Mask R-CNN) to digitize handwritten historical archives. U-Net combined with the watershed transform outperformed the two-stage Mask R-CNN architecture.

Paper

Healthcare AI Unlearning

Deep Unlearning for Histopathology

Applied influence-based unlearning techniques to DenseNet models for breast cancer classification (BreaKHis dataset), enabling compliance with data-removal requests while maintaining model utility.

Paper

Privacy Audit Fairness

Auditing Fairness-Privacy Trade-offs

Developed a unified empirical framework to audit how fairness-enhancing algorithms influence membership inference privacy risks at the subpopulation level, revealing disparate trade-offs between privacy, utility, and fairness.

(Accepted at EuroS&P 2026)

Machine Unlearning Metrics

Machine Unlearning Quality Score

A systematic metric that quantifies how closely an unlearning method mirrors a fully retrained model without requiring actual retraining.

(Under Review)

NLP Low-Resource Lang

Contextualized Embeddings (AZ)

Trained RoBERTa and GPT-2 models on a 164M-token Azerbaijani corpus. This work provided the first high-quality contextual embeddings for the language, improving performance on downstream NLP tasks.

Paper

NLP Text Classification

Online News Classification (AZ)

Developed an automated classification system for Azerbaijani news articles using Naive Bayes, SVM, and Neural Networks. Implemented stemming, stop-word removal, and feature reduction to enhance accuracy and performance.

Paper

NLP Sentiment Analysis

Sentiment Polarity Detection (AZ News)

Built sentiment analysis models for Azerbaijani social news articles using SVM, Random Forest, and Naive Bayes with TF-IDF and frequency-based bag-of-words (BoW) features. Evaluated on a dataset of 30k manually labeled articles; SVM achieved the best performance.

Paper

MLOps Docker

End-to-End MLOps Pipeline

Implemented a full ML lifecycle project using MLflow for experiment tracking and model registry, and Docker for containerization, following industry-standard data science processes.

GitHub

Big Data Spark

Song Recommendation System

Built a scalable recommendation engine using Apache Spark and MLlib. The system utilizes implicit feedback from user history logs to predict and recommend new songs.

GitHub

Data Mining Analytics

Wikipedia Parser & Analyzer

Developed a heuristic-based parser to clean and analyze Azerbaijani Wikipedia data. Analyzed property distributions in templates and external reference usage.

GitHub