About Me
Hi, I'm Junhoo Lee. I am a Ph.D. candidate at Seoul National University (MIPAL), advised by Prof. Nojun Kwak.
My research studies how foundation models acquire reusable structure, and how that structure can be controlled, adapted, and diagnosed across language, vision, generative modeling, and embodied interfaces.
I work on inference-time planning for discrete diffusion language models, fast adaptation and task generalization in meta-learning, and model diagnosis for pretrained deep networks. Recently, I have been extending this line of work toward vision-language-action and embodied foundation models, where pretrained semantic structure must remain reliable under changing instructions, visual observations, and action interfaces.
I am always open to discussing new ideas and potential collaborations. Feel free to reach out to me via email at mrjunoo@snu.ac.kr.
Education
News
- [Jul 5, 2026] I will be at ACL 2026 in San Diego, presenting our long paper "Unlocking the Potential of Diffusion Language Models through Template Infilling" as an oral presentation!
- [Jun 5, 2026] I will be at CVPR 2026 in Denver, presenting our "CSF: Black-box Fingerprinting via Compositional Semantics for Text-to-Image Models" paper!
- [May 2026] I was selected as an ICML 2026 Gold Reviewer, an honor recognizing the top reviewers for this year's conference.
- [Apr 4, 2026] Our paper "Unlocking the Potential of Diffusion Language Models through Template Infilling" is accepted to ACL 2026 as a long paper (oral presentation)!
- [Feb 20, 2026] Our paper "CSF: Black-box Fingerprinting via Compositional Semantics for Text-to-Image Models" is accepted to CVPR 2026!
- [Dec 2025] I will be at NeurIPS 2025 in San Diego, presenting our "Deep Edge Filter" paper!
- [Oct 2025] New preprint "Unlocking the Potential of Diffusion Language Models through Template Infilling" is now on arXiv!
- [Sep 2025] Our paper "Deep Edge Filter" is accepted to NeurIPS 2025! (co-first author)
- [Jul 2025] Our paper "What's Making That Sound Right Now? Video-centric Audio-Visual Localization" is accepted to ICCV 2025!
- [Jun 2025] Our paper "The Role of Teacher Calibration in Knowledge Distillation" is published in IEEE Access!
- [Dec 2024] I will be at NeurIPS 2024 in Vancouver, presenting our "Deep Support Vectors" paper!
- [Sep 2024] Our paper "Deep Support Vectors" is accepted to NeurIPS 2024!
- [Jul 2024] Our paper "Practical Dataset Distillation Based on Deep Support Vectors" is presented at ECCV 2024 Workshop!
- [Apr 2024] Two workshop papers are accepted to CVPR 2024: "Do Not Think About Pink Elephant!" (co-first author) and "Coreset Selection for Object Detection"!
- [Feb 2024] I will be at AAAI 2024 in Vancouver, presenting our "Any-Way Meta Learning" paper!
- [Dec 2023] Our paper "Any-Way Meta Learning" is accepted to AAAI 2024!
- [Dec 2023] I will be at NeurIPS 2023 in New Orleans, presenting our SHOT paper!
- [Sep 2023] Our paper "SHOT: Suppressing the Hessian along the Optimization Trajectory" is accepted to NeurIPS 2023!
Publications
Main Conference
Unlocking the Potential of Diffusion Language Models through Template Infilling
Annual Meeting of the Association for Computational Linguistics
Unlike autoregressive LMs, diffusion LMs work better with template-then-fill prompting than with sequential prompting.
CSF: Black-box Fingerprinting via Compositional Semantics for Text-to-Image Models
The IEEE/CVF Conference on Computer Vision and Pattern Recognition
Black-box lineage attribution of fine-tuned text-to-image APIs using compositional semantic fingerprints.
Deep Edge Filter †
The Conference on Neural Information Processing Systems
Just as humans perceive edges (high-frequency components) as the core of what they see, deep features in neural networks show the same tendency.
What's Making That Sound Right Now? Video-centric Audio-Visual Localization
The IEEE/CVF International Conference on Computer Vision
AVATAR, a video-centric audio-visual localization benchmark with temporal dynamics.
Deep Support Vectors
The Conference on Neural Information Processing Systems
Deep learning has support vectors just like SVMs.
Any-Way Meta Learning
The AAAI Conference on Artificial Intelligence
Breaking the fixed N-way constraint in meta-learning by exploiting label equivalence from episodic task sampling.
SHOT: Suppressing the Hessian along the Optimization Trajectory
The Conference on Neural Information Processing Systems
The key to meta-learning adaptation is flattening the learning trajectory.
Workshop
Do Not Think About Pink Elephant! †
The IEEE/CVF Conference on Computer Vision and Pattern Recognition Workshops
First discovery that negation doesn't work in large models — telling them not to generate something makes them generate it.
Coreset Selection for Object Detection
The IEEE/CVF Conference on Computer Vision and Pattern Recognition Workshops
An efficient coreset selection method designed specifically for object detection.
Practical Dataset Distillation Based on Deep Support Vectors
The European Conference on Computer Vision Workshops
Applying the DeepKKT loss to dataset distillation when only part of the data is accessible.
Journal
The Role of Teacher Calibration in Knowledge Distillation
A teacher's calibration error strongly correlates with student accuracy: well-calibrated teachers transfer knowledge better.