Shao-Hua Sun

Mitigating Forgetting in LLM Fine-Tuning via Low-Perplexity Token Learning

Chao-Chung Wu , Zhi Rui Tam , Chieh-Yen Lin , Yun-Nung Chen , Shao-Hua Sun , Hung-Yi Lee

in Neural Information Processing Systems (NeurIPS) 2025

This paper presents a systematic analysis revealing that fine-tuning with LLM-generated data not only improves target task performance but also reduces non-target task degradation compared to fine-tuning with ground truth data. Through analyzing the data sequence in tasks of various domains, we demonstrate that this enhancement of non-target task robustness stems from the reduction of high perplexity tokens found in LLM-generated sequences. To the best of our knowledge, this is the first work to provide an empirical explanation based on token perplexity reduction to mitigate catastrophic forgetting in LLMs after fine-tuning, offering valuable insights for developing more robust fine-tuning strategies.

Implicit State Estimation via Video Replanning

Po-Chen Ko , Jiayuan Mao , Yu-Hsiang Fu , Hsien-Jeng Yeh , Chu-Rong Chen , Wei-Chiu Ma , Yilun Du , Shao-Hua Sun

in Building Physically Plausible World Models workshop at International Conference on Machine Learning (ICML) 2025 (Best paper)

We introduce a novel framework that integrates interaction-time data into the video planning process. Our approach updates model parameters online and filters out previously failed plans during generation. This enables implicit state estimation, allowing the system to adapt dynamically without explicitly modeling unknown state variables. We evaluate our framework through extensive experiments on a new simulated manipulation benchmark, demonstrating its ability to improve replanning performance and advance the field of video-based decision-making.

[ Paper ] [ Bibtex ]

Action-Constrained Imitation Learning

Chia-Han Yeh * , Tse-Sheng Nan * , Risto Vuorio , Wei Hung , Hung Yen Wu , Shao-Hua Sun , Ping-Chun Hsieh

in International Conference on Machine Learning (ICML) 2025

We study a new problem setting termed Action-Constrained Imitation Learning (ACIL), where an action-constrained imitator aims to learn from a demonstrative expert with larger action space. We tackle the mismatch of occupancy measure between the expert and the imitator through trajectory alignment and propose DTWIL, which replaces the original expert demonstrations with a surrogate dataset that follows similar state trajectories while adhering to the action constraints. Specifically, we recast trajectory alignment as a planning problem and solve it via model predictive control, which aligns the surrogate trajectories with the expert trajectories based on the dynamic time warping distance.

[ Paper ] [ Bibtex ]

SimTube: Generating Simulated Video Comments through Multimodal AI and User Personas

Yu-Kai Hung , Yun-Chien Huang , Ting-Yu Su , Yen-Ting Lin , Lung-Pan Cheng , Bryan Wang † , Shao-Hua Sun †

in International Conference on Intelligent User Interfaces (IUI) 2025

We introduce SimTube, a generative AI system designed to simulate audience feedback in the form of video comments before a video’s release. SimTube features a computational pipeline that integrates multimodal data from the video—such as visuals, audio, and metadata—with user personas derived from a broad and diverse corpus of audience demographics, generating varied and contextually relevant feedback. Furthermore, the system’s UI allows creators to explore and customize the simulated comments.

[ Paper ] [ Project Page ] [ Bibtex ]

Synthesizing Programmatic Reinforcement Learning Policies with Large Language Model Guided Search

Max Liu * , Chan-Hung Yu * , Wei-Hsu Lee , Cheng-Wei Hung , Yen-Chun Chen , Shao-Hua Sun

in International Conference on Learning Representations (ICLR) 2025

We address the challenge of LLMs' inability to generate precise and grammatically correct programs in domain-specific languages (DSLs) by proposing a Pythonic-DSL strategy — an LLM is instructed to initially generate Python codes and then convert them into DSL programs. To further optimize the LLM-generated programs, we develop a search algorithm named Scheduled Hill Climbing, designed to efficiently explore the programmatic search space to improve the programs consistently.

[ Paper ] [ Bibtex ]

HERO: Human-Feedback Efficient Reinforcement Learning for Online Diffusion Model Finetuning

Ayano Hiranaka * , Shang-Fu Chen * , Chieh-Hsin Lai * , Dongjun Kim , Naoki Murata , Takashi Shibuya , Wei-Hsiang Liao , Shao-Hua Sun † , Yuki Mitsufuji †

in International Conference on Learning Representations (ICLR) 2025

To effectively and efficiently utilize human feedback, we develop a framework, HERO, which leverages online human feedback collected on the fly during model learning. Specifically, HERO features two key mechanisms: (1) an online training method that captures human feedback and provides informative learning signals for fine-tuning, and (2) generating images from SD's refined initialization samples, enabling faster convergence towards the evaluator's intent.

[ Paper ] [ Bibtex ]

Efficient Action-Constrained Reinforcement Learning via Acceptance-Rejection Method and Augmented MDPs

Wei Hung , Shao-Hua Sun , Ping-Chun Hsieh

in International Conference on Learning Representations (ICLR) 2025

We propose a generic and computationally efficient framework that can adapt a standard unconstrained RL method to action-constrained reinforcement learning. To enforce the action constraints, we leverage the classic acceptance-rejection method, where we treat the unconstrained policy as the proposal distribution and derive a modified policy with feasible actions. To improve the acceptance rate of the proposal distribution, we construct an augmented two-objective Markov decision process, which include additional self-loop state transitions and a penalty signal for the rejected actions.

[ Paper ] [ Bibtex ]

QMP: Q-switch Mixture of Policies for Multi-Task Behavior Sharing

Grace Zhang , Ayush Jain , Injune Hwang , Shao-Hua Sun , Joseph J. Lim

in International Conference on Learning Representations (ICLR) 2025

We propose a multi-task reinforcement learning method, Q-switch Mixture of policies (QMP), that can share exploratory behavior, which can be helpful even when the optimal behaviors differ. Furthermore, as we learn each task, we can guide the exploration by sharing behaviors in a task and state dependent way. QMP learns to selectively share exploratory behavior between tasks by using a mixture of policies based on estimated discounted returns to gather training data.

[ Paper ] [ Project Page ] [ Bibtex ]

Hierarchical Programmatic Option Framework

Yu-An Lin * , Chen-Tao Lee * , Chih-Han Yang * , Guan-Ting Liu * , Shao-Hua Sun

in Neural Information Processing Systems (NeurIPS) 2024

We propose the Hierarchical Programmatic Option framework (HIPO), which aims to solve long and repetitive RL problems with human-readable programs as options (low-level policies). Specifically, we proposed a method that retrieves a set of effective, diverse, and compatible programs as options (programmatic options). Then, we learn a high-level policy to effectively reuse these programmatic options to solve reoccurring subtasks.

[ Paper ] [ Bibtex ]

Diffusion Imitation from Observation

Bo-Ruei Huang , Chun-Kai Yang , Chun-Mao Lai , Dai-Jie Wu , Shao-Hua Sun

in Neural Information Processing Systems (NeurIPS) 2024

Learning from Observation (LfO) aims to imitate experts by learning from state-only demonstrations without requiring action labels. We propose to integrate a diffusion model into the adversarial imitation learning from observation framework. Specifically, we employ a diffusion model to capture expert and agent transitions by generating the next state, given the current state. Then, we reformulate the learning objective to train the diffusion model as a binary classifier and use it to provide "realness" rewards for policy learning.

[ Paper ] [ Project Page ] [ Bibtex ]

Diffusion-Reward Adversarial Imitation Learning

Chun-Mao Lai * , Hsiang-Chun Wang * , Ping-Chun Hsieh , Yu-Chiang Frank Wang , Min-Hung Chen , Shao-Hua Sun

in Neural Information Processing Systems (NeurIPS) 2024

This work proposes Diffusion-Reward Adversarial Imitation Learning (DRAIL), which integrates a diffusion model into GAIL, aiming to yield more precise and smoother rewards for policy learning. Specifically, we propose a diffusion discriminative classifier to construct an enhanced discriminator; then, we design diffusion rewards based on the classifier's output for policy learning.

[ Paper ] [ Project Page ] [ Code ] [ Bibtex ]

REBORN: Reinforcement-Learned Boundary Segmentation with Iterative Training for Unsupervised ASR

Liang-Hsuan Tseng * , En-Pei Hu * , Cheng-Han Chiang , Yuan Tseng , Hung-Yi Lee , Lin-shan Lee , Shao-Hua Sun

in Neural Information Processing Systems (NeurIPS) 2024

We propose REBORN, which alternates between (1) training a segmentation model that predicts the boundaries of the segmental structures in speech signals and (2) training the phoneme prediction model, whose input is a segmental structure segmented by the segmentation model, to predict a phoneme transcription. Since supervised data for training the segmentation model is not available, we use reinforcement learning to train the segmentation model to favor segmentations that yield phoneme sequence predictions with a lower perplexity.

[ Paper ] [ Bibtex ]

LLM Discussion: Enhancing the Creativity of Large Language Models via Discussion Framework and Role-Play

Li-Chun Lu * , Shou-Jen Chen * , Tsung-Min Pai , Chan-Hung Yu , Hung-Yi Lee , Shao-Hua Sun

in Conference on Language Modeling (COLM) 2024

Large language models (LLMs) have shown exceptional proficiency in natural language processing but often fall short of generating creative and original responses to open-ended questions. To enhance LLM creativity, our key insight is to emulate the human process of inducing collective creativity through engaging discussions with participants from diverse backgrounds and perspectives. To this end, we propose LLM Discussion, a three-phase discussion framework that facilitates vigorous and diverging idea exchanges and ensures convergence to creative answers. Moreover, we adopt a role-playing technique by assigning distinct roles to LLMs to combat the homogeneity of LLMs.

[ Paper ] [ Bibtex ]

Diffusion Model-Augmented Behavioral Cloning

Shang-Fu Chen * , Hsiang-Chun Wang * , Ming-Hao Hsu , Chun-Mao Lai , Shao-Hua Sun

in International Conference on Machine Learning (ICML) 2024

This work aims to augment BC by employing diffusion models for modeling expert behaviors, and designing a learning objective that leverages learned diffusion models to guide policy learning. To this end, we propose diffusion model-augmented behavioral cloning (Diffusion-BC) that combines our proposed diffusion model guided learning objective with the BC objective, which complements each other. Our proposed method outperforms baselines or achieves competitive performance in various continuous control domains, including navigation, robot arm manipulation, and locomotion.

[ Paper ] [ Project Page ] [ Code ] [ Bibtex ]

Learning to Act from Actionless Videos through Dense Correspondences

Po-Chen Ko , Jiayuan Mao , Yilun Du , Shao-Hua Sun , Joshua B. Tenenbaum

in International Conference on Learning Representations (ICLR) 2024 (Spotlight)

Our method leverages images as a task-agnostic representation, encoding both the state and action information, and text as a general representation for specifying robot goals. By synthesizing videos that “hallucinate” robots executing actions and in combination with dense correspondences between frames, our approach can infer the closed-formed action to execute in an environment without requiring any explicit action labels, allowing us to learn from RGB videos and acquire various robotic tasks.

[ Paper ] [ Project Page ] [ Bibtex ]

Integrating Planning and Deep Reinforcement Learning via Automatic Induction of Task Substructures

Jung-Chun Liu , Chi-Hsien Chang , Shao-Hua Sun , Tian-Li Yu

in International Conference on Learning Representations (ICLR) 2024

We propose a framework that integrates deep reinforcement learning with classical planning by automatically inducing task structures and substructures from a few demonstrations. Specifically, we adopt abstraction mapping formulation and define critical actions that lead to the transition at the abstraction level. Then, we propose to induce critical action schemata regarded as subtasks by employing genetic programming where the program model reflects prior domain knowledge of effect rules.

[ Paper ] [ Bibtex ]

Addressing Long-Horizon Tasks by Integrating Program Synthesis and State Machines

Yu-An Lin * , Chen-Tao Lee * , Guan-Ting Liu * , Pu-Jen Cheng , Shao-Hua Sun

in Generalization in Planning Workshop at Neural Information Processing Systems (NeurIPS) 2023 (Contributed talk)

This work proposes Program Machine Policies (POMPs), which bridge the advantages of programmatic RL and state machine policies, allowing for the representation of complex behaviors and the address of long-term tasks. Specifically, we introduce a method that can retrieve a set of effective, diverse, compatible programs. Then, we use these programs as modes of a state machine and learn a transition function to transition among mode programs, allowing for capturing long-horizon repetitive behaviors.

[ Paper ] [ Slide ] [ Bibtex ]

Bootstrap Your Own Skills: Learning to Solve New Tasks with Large Language Model Guidance

Jesse Zhang , Jiahui Zhang , Karl Pertsch , Ziyi Liu , Xiang Ren , Minsuk Chang , Shao-Hua Sun , Joseph J. Lim

in Conference on Robot Learning (CoRL) 2023 (Oral)

Our approach BOSS (BOotStrapping your own Skills) learns to accomplish new tasks by performing "skill bootstrapping," where an agent with a set of primitive skills interacts with the environment to practice new skills without receiving reward feedback for tasks outside of the initial skill set. This bootstrapping phase is guided by large language models that inform the agent of meaningful skills to chain together. Through this process, BOSS builds a wide range of complex and useful behaviors from a basic set of primitive skills.

[ Paper ] [ Project Page ] [ Bibtex ]

Hierarchical Programmatic Reinforcement Learning via Learning to Compose Programs

Guan-Ting Liu * , En-Pei Hu * , Pu-Jen Cheng , Hung-Yi Lee , Shao-Hua Sun

in International Conference on Machine Learning (ICML) 2023

We re-formulate solving a reinforcement learning task as synthesizing a task-solving program that can be executed to interact with the environment and maximize the return. We first learn a program embedding space that continuously parameterizes a diverse set of programs sampled from a program dataset. Then, we train a meta-policy, whose action space is the learned program embedding space, to produce a series of programs (i.e., predict a series of actions) to yield a composed task-solving program.

[ Paper ] [ Project Page ] [ Slide ] [ Poster ] [ Bibtex ]

Location-Aware Visual Question Generation with Lightweight Models

Nicholas Collin Suwono , Justin Chih-Yao Chen , Tun Min Hung , Ting-Hao Kenneth Huang , I-Bin Liao , Yung-Hui Li , Lun-Wei Ku , Shao-Hua Sun

in Empirical Methods in Natural Language Processing (EMNLP) 2023

This work introduces a novel task, location-aware visual question generation (LocaVQG), which aims to generate engaging questions from data relevant to a particular geographical location (e.g.,surrounding images and its GPS coordinate). To tackle this task, we present a dataset generation pipeline that leverages GPT-4 to produce diverse and sophisticated questions. We propose methods to train lightweight models which can reliably generate engaging questions from location-aware information.

[ Paper ] [ Code ] [ Bibtex ]

Hierarchical Neural Program Synthesis

Linghan Zhong , Ryan Lindeborg , Jesse Zhang , Joseph J. Lim , Shao-Hua Sun

in arXiv preprint

Recent works in program synthesis have demonstrated encouraging results in a variety of domains such as string transformation, tensor manipulation, and describing behaviors of embodied agents. Most existing program synthesis methods are designed to synthesize programs from scratch, generating a program token by token, line by line. This fundamentally prevents these methods from scaling up to synthesize programs that are longer or more complex. In this work, we present a scalable program synthesis framework that instead synthesizes a program by hierarchically composing programs.

[ Paper ] [ Bibtex ] [ Project Page ]

Skill-based Meta-Reinforcement Learning

Taewook Nam , Shao-Hua Sun , Karl Pertsch , Sung Ju Hwang , Joseph J. Lim

in International Conference on Learning Representations (ICLR) 2022

We devise a method that enables meta-learning on long-horizon, sparse-reward tasks, allowing us to solve unseen target tasks with orders of magnitude fewer environment interactions. Specifically, we propose to (1) extract reusable skills and a skill prior from offline datasets, (2) meta-train a high-level policy that learns to efficiently compose learned skills into long-horizon behaviors, and (3) rapidly adapt the meta-trained policy to solve an unseen target task.

[ Paper ] [ Project Page ] [ Code ] [ Bibtex ]

Learning to Synthesize Programs as Interpretable and Generalizable Policies

Dweep Trivedi * , Jesse Zhang * , Shao-Hua Sun , Joseph J. Lim

in Neural Information Processing Systems (NeurIPS) 2021

We present a framework that learns to synthesize a program, detailing the procedure to solve a task in a flexible and expressive manner, solely from reward signals. To alleviate the difficulty of learning to compose programs to induce the desired agent behavior from scratch, we propose to learn a program embedding space that continuously parameterizes diverse behaviors in an unsupervised manner and then search over the learned program embedding space to yield a program that maximizes the return for a given task.

[ Paper ] [ Project Page ] [ Code ] [ Slide ] [ Bibtex ]

Generalizable Imitation Learning from Observation via Inferring Goal Proximity

Youngwoon Lee * , Andrew Szot * , Shao-Hua Sun , Joseph J. Lim

in Neural Information Processing Systems (NeurIPS) 2021

Task progress is intuitive and readily available task information that can guide an agent closer to the desired goal. Furthermore, a progress estimator can generalize to new situations. From this intuition, we propose a simple yet effective imitation learning from observation method for a goal-directed task using a learned goal proximity function as a task progress estimator, for better generalization to unseen states and goals. We obtain this goal proximity function from expert demonstrations and online agent experience, and then use the learned goal proximity as a dense reward for policy training.

[ Paper ] [ Project Page ] [ Code ] [ Slide ] [ Bibtex ]

Program Guided Agent

Shao-Hua Sun , Te-Lin Wu , Joseph J. Lim

in International Conference on Learning Representations (ICLR) 2020 (Spotlight)

We propose to utilize programs, structured in a formal language, as a precise and expressive way to specify tasks, instead of natural languages which can often be ambiguous. We then devise a modular framework that learns to perform a task specified by a program – as different circumstances give rise to diverse ways to accomplish the task, our framework can perceive which circumstance it is currently under, and instruct a multitask policy accordingly to fulfill each subtask of the overall task.

[ Paper ] [ Project Page ] [ Slide ] [ Bibtex ]

Multimodal Model-Agnostic Meta-Learning via Task-Aware Modulation

Risto Vuorio * , Shao-Hua Sun * , Hexiang Hu , Joseph J. Lim

in Neural Information Processing Systems (NeurIPS) 2019 (Spotlight)

Model-agnostic meta-learners aim to acquire meta-prior parameters from a distribution of tasks and adapt to novel tasks with few gradient updates. Yet, seeking a common initialization shared across the entire task distribution substantially limits the diversity of the task distributions that they are able to learn from. We propose a multimodal MAML (MMAML) framework, which is able to modulate its meta-learned prior according to the identified mode, allowing more efficient fast adaptation.

[ Paper ] [ Project Page ] [ Code ] [ Poster ] [ Spotlight Slide ] [ Spotlight Talk ] [ Bibtex ]

Feedback Adversarial Learning: Spatial Feedback for Improving Generative Adversarial Networks

Minyoung Huh * , Shao-Hua Sun * , Ning Zhang

in IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR) 2019

We propose feedback adversarial learning (FAL) framework that can improve existing generative adversarial networks by leveraging spatial feedback from the discriminator. We formulate the generation task as a recurrent framework, in which the generator conditions on the discriminator spatial output response and its previous generation to improve generation quality over time - allowing the generator to attend and fix its previous mistakes. To effectively utilize the feedback, we propose an adaptive spatial transform (AST) layer, which learns to spatially modulate feature maps from its previous generation and the feedback signal from the discriminator.

[ Paper ] [ Poster ] [ Bibtex ]

Composing Complex Skills by Learning Transition Policies

Youngwoon Lee * , Shao-Hua Sun * , Sriram Somasundaram , Edward Hu , Joseph J. Lim

in International Conference on Learning Representations (ICLR) 2019

Humans acquire complex skills by exploiting previously learned skills and making transitions between them. To empower machines with this ability, we propose a method that can learn transition policies which effectively connect primitive skills to perform sequential tasks without handcrafted rewards. To efficiently train our transition policies, we introduce proximity predictors which induce rewards gauging proximity to suitable initial states for the next skill. The proposed method is evaluated on a set of complex continuous control tasks in bipedal locomotion and robotic arm manipulation which traditional methods struggle at.

[ Paper ] [ Project Page ] [ Code ] [ Slide ] [ Poster ] [ Bibtex ]

Toward Multimodal Model-Agnostic Meta-Learning

Risto Vuorio , Shao-Hua Sun , Hexiang Hu , Joseph J. Lim

in Meta-Learning Workshop at Neural Information Processing Systems (NeurIPS) 2018

Model-agnostic meta-learners aim to acquire meta-prior parameters from a distribution of tasks and adapt to novel tasks with few gradient updates. Yet, seeking a common initialization shared across the entire task distribution substantially limits the diversity of the task distributions that they are able to learn from. We propose a multimodal MAML (MMAML) framework, which is able to modulate its meta-learned prior according to the identified mode, allowing more efficient fast adaptation.

[ Paper ] [ Code ] [ Bibtex ]

Multi-view to Novel View: Synthesizing Novel Views with Self-Learned Confidence

Shao-Hua Sun , Minyoung Huh , Yuan-Hong Liao , Ning Zhang , Joseph J. Lim

in European Conference on Computer Vision (ECCV) 2018

We aim to synthesize a target image with an arbitrary camera pose from multipple given source images. We propose an end-to-end trainable framework which consists of a flow prediction module and a pixel generation module to directly leverage information presented in source views as well as hallucinate missing pixels from statistical priors. We introduce a self-learned confidence aggregation mechanism to merge the predictions produced by the two modules given multi-view source images.

[ Paper ] [ Project Page ] [ Code ] [ Poster ] [ Bibtex ]

Neural Program Synthesis from Diverse Demonstration Videos

Shao-Hua Sun * , Hyeonwoo Noh * , Sriram Somasundaram , Joseph J. Lim

in International Conference on Machine Learning (ICML) 2018

Interpreting decision making logic in demonstration videos is key to collaborating with and mimicking humans. To empower machines with this ability, we propose a framework that is able to explicitly synthesize underlying programs from behaviorally diverse and visually complicated demonstration videos. We introduce a summarizer module to improve the network’s ability to integrate multiple demonstrations and employ a multi-task objective to encourage the model to learn meaningful intermediate representations.

[ Paper ] [ Project Page ] [ Code ] [ Slide ] [ Poster ] [ Bibtex ]

Exploiting Image Structural Similarity for Single Image Rain Removal

Shao-Hua Sun , Shang-Pu Fan , Yu-Chiang Frank Wang

in International Conference on Image Processing (ICIP) 2014

Without any prior knowledge or user interaction, single image rain removal has been a challenging task. By observing the limitations of standard batch-mode learning-based methods, we propose to exploit the structural similarity of the image bases for solving this task. By formulating the basis selection as an optimization problem, we are able to disregard those associated with rain patterns while the detailed image information can be preserved. Experiments on both synthetic and real-world images will verify the effectiveness of our proposed method.

[ Paper ] [ Bibtex ]

Full Publication List

Back