MIIL - Publications

Publications

2026

All-St3R SLAM: Real-time RGB Dynamic SLAM with Pixel-wise Motion Masking
Yewon Kim, Paul Hongsuck Seo, Sang Chul Ahn
In IROS 2026

Direct Diffusion Score Preference Optimization via Stepwise Contrastive Policy-Pair Supervision
Dohyun Kim, Seungwoo Lyu, Seung Wook Kim, Paul Hongsuck Seo
In ECCV 2026

GOAT: A Training Framework for Goal-Oriented Agent with Tools
Hyunji Min, Sangwon Jung, Junyoung Sung, Dosung Lee, Leekyeung Han, Paul Hongsuck Seo
In ACL 2026 (Findings)

CRIT: Graph-Based Automatic Data Synthesis to Enhance Cross-Modal Multi-Hop Reasoning
Junyoung Sung, Seungwoo Lyu, Minjun Kim, Sumin An, Arsha Nagrani, Paul Hongsuck Seo
In CVPR 2026

Robust Image Self-Recovery against Tampering using Watermark Generation with Pixel Shuffling
Minyoung Kim, Paul Hongsuck Seo
In CVPR 2026 (Findings)

2025

Seg4Diff: Unveiling Open-Vocabulary Segmentation in Text-to-Image Diffusion Transformers
Chaehyun Kim, Heeseong Shin, Heeji Yoon, Eunbeen Hong, Anurag Arnab, Paul Hongsuck Seo, +Sunghwan Hong, +Seungryong Kim (+ corresponding authors)
In NeurIPS 2025

ReTAG: Retrieval-Enhanced, Topic-Augmented Graph-Based Global Sensemaking
Boyoung Kim, Dosung Lee, Sumin An, Jinseong Jeong, Paul Hongsuck Seo
In EMNLP 2025 (Findings)

DialNav: Multi-turn Dialog Navigation with a Remote Guide
Leekyeung Han, Hyunji Min, Gyeom Hwangbo, Jonghyun Choi, Paul Hongsuck Seo
In ICCV 2025

Cross-Modal Watermarking for Authentic Audio Recovery and Tamper Localization in Synthesized Audiovisual Forgeries
Minyoung Kim, Sehwan Park, Sungmin Cha, Paul Hongsuck Seo
In Interspeech 2025

DGMO: Training-Free Audio Source Separation through Diffusion-Guided Mask Optimization
*Geonyoung Lee, *Geonhee Han, Paul Hongsuck Seo (* equal contribution)
In Interspeech 2025 (oral)

Bridging Audio and Vision: Zero-Shot Audiovisual Segmentation by Connecting Pretrained Models
Seung-jae Lee, Paul Hongsuck Seo
In Interspeech 2025

ReSCORE: Label-free Iterative Retriever Training for Multi-hop Question Answering with Relevance-Consistency Supervision
*Dosung Lee, *Wonjun Oh, Boyoung Kim, Minyoung Kim, +Joonsuk Park, +Paul Hongsuck Seo (* equal contribution, + corresponding authors)
In ACL 2025

Random Conditioning with Distillation for Data-Efficient Diffusion Model Compression
*Dohyun Kim, *Sehwan Park, Geonhee Han, Seung Wook Kim, Paul Hongsuck Seo (* equal contribution)
In CVPR 2025

LCIRC: A Recurrent Compression Approach for Efficient Long-form Context and Query Dependent Modeling in LLMs
Sumin An, Junyoung Sung, Wonpyo Park, +Chanjun Park, +Paul Hongsuck Seo (+ corresponding authors)
In NAACL 2025 (oral)

Multi-Granularity Video Object Segmentation
*Sangbeom Lim, *Seongchan Kim, *Seungjun An, Seokju Cho, +Paul Hongsuck Seo, +Seungryong Kim (* equal contribution, + corresponding authors)
In AAAI 2025

2024

TrackIME: Enhanced Video Point Tracking via Instance Motion Estimation
Seong Hyeon Park, Huiwon Jang, Byungwoo Jeon, Sukmin Yun, Paul Hongsuck Seo, Jinwoo Shin
In NeurIPS 2024 (spotlight)

Towards Open-Vocabulary Semantic Segmentation Without Semantic Labels
Heeseong Shin, Chaehyun Kim, Sunghwan Hong, Seokju Cho, Anurag Arnab, +Paul Hongsuck Seo, +Seungryong Kim (+ corresponding authors)
In NeurIPS 2024

Pseudo-RIS: Distinctive Pseudo-supervision Generation for Referring Image Segmentation
Seonghoon Yu, +Paul Hongsuck Seo, +Jeany Son (+ corresponding authors)
In ECCV 2024

CAT-Seg: Cost Aggregation for Open-vocabulary Semantic Segmentation
*Seokju Cho, *Heeseong Shin, Sunghwan Hong, Anurag Arnab, +Paul Hongsuck Seo, +Seungryong Kim (* equal contribution, + corresponding authors)
In CVPR 2024 (highlight)

Learning Correlation Structures for Vision Transformers
Manjin Kim, +Paul Hongsuck Seo, Cordelia Schmid, +Minsu Cho (+ corresponding authors)
In CVPR 2024

2023

AVFormer: Injecting Vision into Frozen Speech Models for Zero-Shot AV-ASR
Paul Hongsuck Seo, Arsha Nagrani, Cordelia Schmid
In CVPR 2023

IFSeg: Image-free Semantic Segmentation via Vision-Language Model
Sukmin Yun, Seong Hyeon Park, Paul Hongsuck Seo, Jinwoo Shin
In CVPR 2023

Vid2Seq: Large-Scale Pretraining of a Visual Language Model for Dense Video Captioning
Antoine Yang, Arsha Nagrani, Paul Hongsuck Seo, Antoine Miech, Jordi Pont-Tuset, Ivan Laptev, Josef Sivic, Cordelia Schmid
In CVPR 2023

Zero-shot Referring Image Segmentation with Global-Local Context Features
Seonghoon Yu, Paul Hongsuck Seo, Jeany Son
In CVPR 2023

2022

Learning Audio-Video Modalities from Image Captions
Arsha Nagrani, Paul Hongsuck Seo, Bryan Seybold, Anja Hauth, Santiago Manen, Chen Sun, Cordelia Schmid
In ECCV 2022

AVATAR: Unconstrained Audiovisual Speech Recognition
*Valentin Gabeur, *Paul Hongsuck Seo, *Arsha Nagrani, Chen Sun, Karteek Alahari, Cordelia Schmid (* equal contribution)
In Interspeech 2022 (oral)

End-to-end Generative Pretraining for Multimodal Video Captioning
Paul Hongsuck Seo, Arsha Nagrani, Anurag Arnab, Cordelia Schmid
In CVPR 2022

2021

Look Before you Speak: Visually Contextualized Utterances
Paul Hongsuck Seo, Arsha Nagrani, Cordelia Schmid
In CVPR 2021

2020

Reinforcing an Image Caption Generator by Off-line Human Feedback
Paul Hongsuck Seo, Piyush Sharma, Tomer Levinboim, Bohyung Han, Radu Soricut
In AAAI 2020 (oral)

2019

Combinatorial Inference against Label Noise
Paul Hongsuck Seo, Geeho Kim, Bohyung Han
In NeurIPS 2019

Regularizing Neural Networks via Stochastic Branch Layers
*Wonpyo Park, *Paul Hongsuck Seo, Bohyung Han, Minsu Cho (* equal contribution)
In ACML 2019 (oral)

Learning for Single-Shot Confidence Calibration in Deep Neural Networks through Stochastic Inferences
*Seonguk Seo, *Paul Hongsuck Seo, Bohyung Han (* equal contribution)
In CVPR 2019

2018

CPlaNet: Enhancing Image Geolocalization by Combinatorial Partitioning of Maps
Paul Hongsuck Seo, Tobias Weyand, Jack Sim, Bohyung Han
In ECCV 2018

Attentive Semantic Alignment with Offset-Aware Correlation Kernels
Paul Hongsuck Seo, Jongmin Lee, Deunsol Jung, Bohyung Han, Minsu Cho
In ECCV 2018

Progressive Attention Networks for Visual Attribute Prediction
Paul Hongsuck Seo, Zhe Lin, Scott Cohen, Xiaohui Shen, Bohyung Han
In BMVC 2018

2017

Visual Reference Resolution using Attention Memory for Visual Dialog
Paul Hongsuck Seo, Andreas Lehrmann, Bohyung Han, Leonid Sigal
In NIPS 2017

MarioQA: Answering Questions by Watching Gameplay Videos
*Jonghwan Mun, *Paul Hongsuck Seo, Ilchae Jung, Bohyung Han (* equal contribution)
In ICCV 2017

2016

Image Question Answering using Convolutional Neural Network with Dynamic Parameter Prediction
Hyeonwoo Noh, Paul Hongsuck Seo, Bohyung Han
In CVPR 2016 (oral)

A Corpus for a Multimodal Dialog System for Presentation Controls
Paul Hongsuck Seo, Gary Geunbae Lee
In Proceedings of the International Workshop Series on Multimodal Corpora (MMC 2016)

2015

Conversational Knowledge Teaching Agent that Uses a Knowledge Base
Kyusong Lee, Paul Hongsuck Seo, Junhwi Choi, Sangjun Koo, Gary Geunbae Lee
In SIGDIAL 2015

2014

Grammatical Error Correction based on Learner Comprehension Model in Oral Conversation
Kyusong Lee, Seonghan Ryu, Paul Hongsuck Seo, Seokhwan Kim, Gary Geunbae Lee
In Proceedings of the IEEE Workshop on Spoken Language Technology (SLT 2014)

2012

Generating Grammar Questions using Corpus Data in L2 Learning
Kyusong Lee, Soo-ok Kweon, Hongsuck Seo, Gary Geunbae Lee
In Proceedings of the IEEE Workshop on Spoken Language Technology (SLT 2012)

A Meta-Learning Approach to Grammatical Error Correction
Hongsuck Seo, Jonghoon Lee, Seokhwan Kim, Kyusong Lee, Sechun Kang, Gary Geunbae Lee
In ACL 2012

Grammatical Error Annotation for Korean Learners of Spoken English
Hongsuck Seo, Kyusong Lee, Gary Geunbae Lee, Soo-ok Kweon
In LREC 2012