Shao-Hua Sun (孫紹華)
Assistant Professor
at National Taiwan University
My research interests span Robot Learning, Reinforcement Learning, Machine Learning, and Program Synthesis.


I am an Assistant Professor at National Taiwan University (NTU) with a joint appointment in the Department of Electrical Engineering and the Graduate Institute of Communication Engineering. Prior to joining NTU, I completed my Ph.D. in Computer Science at the University of Southern California, where I worked in the Cognitive Learning for Vision and Robotics Lab (CLVR). Before that, I received my B.S. degree in Electrical Engineering from NTU. My research interests span Robot Learning, Reinforcement Learning, Program Synthesis, and Machine Learning.

Prospective students: I am looking for students interested in machine learning, robot learning, reinforcement learning, and program synthesis. Specifically, I am hiring M.S. and Ph.D. students admitted to the Data Science and Smart Networking Group at the Graduate Institute of Communication Engineering (電信所丙組/資料科學與智慧網路組) or the Data Science Degree Program (資料科學學位學程) at NTU. Also, I am seeking undergraduate students, research assistants, and visitors with different experience levels. If you are interested in joining my group, please check out this slide and fill in the Google form.

Industrial Outreach and Collaboration

TSMC (台積電), Nvidia (輝達), ASUS (華碩), 欣坊/神農氏
Adjunct Research Scientist
Appier (沛星互動)
Please reach out to me if you are interested in setting up collaborations with us.


May 2024
Our paper Diffusion Model-Augmented Behavioral Cloning is accepted by ICML 2024.
Oct 2023
Our papers Learning to Act from Actionless Videos through Dense Correspondences and Integrating Planning and Deep Reinforcement Learning via Automatic Induction of Task Substructures are accepted by ICLR 2024.
Aug 2023
Our paper Bootstrap Your Own Skills: Learning to Solve New Tasks with Large Language Model Guidance is accepted by CoRL 2023 for an oral presentation.

Research Highlight - Program-Guided Robot Learning

My research focuses on developing a robot learning framework that enables robots to acquire long-horizon and complex skills with hierarchical structures, such as furniture assembly and cooking. Specifically, I present an interpretable and generalizable program-guided robot learning framework, which represents desired behaviors as programs and acquires and utilizes primitive skills to learn to execute those behaviors. Instead of learning in an end-to-end manner, I propose to design specialized learning modules that aim to (1) perform program inference to explicitly infer underlying programs that describe the skills of interest, (2) acquire primitive skills that can be used to compose more complex and longer-horizon skills, and (3) perform task execution by following the inferred program and utilizing acquired primitive skills to replicate the desired skills. This slide gives an overview of my research.


Diffusion Model-Augmented Behavioral Cloning
International Conference on Machine Learning (ICML) 2024

This work aims to augment behavioral cloning (BC) by employing diffusion models for modeling expert behaviors and designing a learning objective that leverages learned diffusion models to guide policy learning. To this end, we propose diffusion model-augmented behavioral cloning (Diffusion-BC), which combines our proposed diffusion model-guided learning objective with the BC objective so that the two complement each other. Our proposed method outperforms baselines or achieves competitive performance in various continuous control domains, including navigation, robot arm manipulation, and locomotion.

Learning to Act from Actionless Videos through Dense Correspondences
International Conference on Learning Representations (ICLR) 2024   (Spotlight)

Our method leverages images as a task-agnostic representation, encoding both the state and action information, and text as a general representation for specifying robot goals. By synthesizing videos that “hallucinate” robots executing actions and combining them with dense correspondences between frames, our approach can infer closed-form actions to execute in an environment without requiring any explicit action labels, allowing us to learn from RGB videos and acquire various robotic tasks.

Integrating Planning and Deep Reinforcement Learning via Automatic Induction of Task Substructures
International Conference on Learning Representations (ICLR) 2024

We propose a framework that integrates deep reinforcement learning with classical planning by automatically inducing task structures and substructures from a few demonstrations. Specifically, we adopt an abstraction mapping formulation and define critical actions that lead to transitions at the abstraction level. Then, we propose to induce critical action schemata, regarded as subtasks, by employing genetic programming where the program model reflects prior domain knowledge of effect rules.

Addressing Long-Horizon Tasks by Integrating Program Synthesis and State Machines
Generalization in Planning Workshop at Neural Information Processing Systems (NeurIPS) 2023   (Contributed talk)

This work proposes Program Machine Policies (POMPs), which bridge the advantages of programmatic RL and state machine policies, allowing them to represent complex behaviors and address long-horizon tasks. Specifically, we introduce a method that can retrieve a set of effective, diverse, and compatible programs. Then, we use these programs as modes of a state machine and learn a transition function to transition among mode programs, allowing for capturing long-horizon repetitive behaviors.

Bootstrap Your Own Skills: Learning to Solve New Tasks with Large Language Model Guidance
Conference on Robot Learning (CoRL) 2023   (Oral)

Our approach BOSS (BOotStrapping your own Skills) learns to accomplish new tasks by performing "skill bootstrapping," where an agent with a set of primitive skills interacts with the environment to practice new skills without receiving reward feedback for tasks outside of the initial skill set. This bootstrapping phase is guided by large language models that inform the agent of meaningful skills to chain together. Through this process, BOSS builds a wide range of complex and useful behaviors from a basic set of primitive skills.

Hierarchical Programmatic Reinforcement Learning via Learning to Compose Programs
International Conference on Machine Learning (ICML) 2023

We re-formulate solving a reinforcement learning task as synthesizing a task-solving program that can be executed to interact with the environment and maximize the return. We first learn a program embedding space that continuously parameterizes a diverse set of programs sampled from a program dataset. Then, we train a meta-policy, whose action space is the learned program embedding space, to produce a series of programs (i.e., predict a series of actions) to yield a composed task-solving program.

Location-Aware Visual Question Generation with Lightweight Models
Empirical Methods in Natural Language Processing (EMNLP) 2023

This work introduces a novel task, location-aware visual question generation (LocaVQG), which aims to generate engaging questions from data relevant to a particular geographical location (e.g., surrounding images and GPS coordinates). To tackle this task, we present a dataset generation pipeline that leverages GPT-4 to produce diverse and sophisticated questions. We propose methods to train lightweight models that can reliably generate engaging questions from location-aware information.

Efficient Multi-Task Reinforcement Learning via Selective Behavior Sharing
Deep RL Workshop at Neural Information Processing Systems (NeurIPS) 2022

We propose a multi-task reinforcement learning method, Q-switch Mixture of Policies (QMP), that can share exploratory behavior, which can be helpful even when the optimal behaviors differ. Furthermore, as we learn each task, we can guide the exploration by sharing behaviors in a task- and state-dependent way. QMP learns to selectively share exploratory behavior between tasks by using a mixture of policies based on estimated discounted returns to gather training data.

Skill-based Meta-Reinforcement Learning
International Conference on Learning Representations (ICLR) 2022

We devise a method that enables meta-learning on long-horizon, sparse-reward tasks, allowing us to solve unseen target tasks with orders of magnitude fewer environment interactions. Specifically, we propose to (1) extract reusable skills and a skill prior from offline datasets, (2) meta-train a high-level policy that learns to efficiently compose learned skills into long-horizon behaviors, and (3) rapidly adapt the meta-trained policy to solve an unseen target task.

Learning to Synthesize Programs as Interpretable and Generalizable Policies
Neural Information Processing Systems (NeurIPS) 2021

We present a framework that learns to synthesize a program, detailing the procedure to solve a task in a flexible and expressive manner, solely from reward signals. To alleviate the difficulty of learning to compose programs to induce the desired agent behavior from scratch, we propose to learn a program embedding space that continuously parameterizes diverse behaviors in an unsupervised manner and then search over the learned program embedding space to yield a program that maximizes the return for a given task.

Generalizable Imitation Learning from Observation via Inferring Goal Proximity
Neural Information Processing Systems (NeurIPS) 2021

Task progress is intuitive and readily available task information that can guide an agent closer to the desired goal. Furthermore, a progress estimator can generalize to new situations. From this intuition, we propose a simple yet effective imitation learning from observation method for a goal-directed task using a learned goal proximity function as a task progress estimator, for better generalization to unseen states and goals. We obtain this goal proximity function from expert demonstrations and online agent experience, and then use the learned goal proximity as a dense reward for policy training.

Program Guided Agent
International Conference on Learning Representations (ICLR) 2020   (Spotlight)

We propose to utilize programs, structured in a formal language, as a precise and expressive way to specify tasks, instead of natural language, which can often be ambiguous. We then devise a modular framework that learns to perform a task specified by a program. Because different circumstances give rise to diverse ways to accomplish the task, our framework can perceive which circumstance it is currently under and instruct a multitask policy accordingly to fulfill each subtask of the overall task.

Multimodal Model-Agnostic Meta-Learning via Task-Aware Modulation
Neural Information Processing Systems (NeurIPS) 2019   (Spotlight)

Model-agnostic meta-learners aim to acquire meta-prior parameters from a distribution of tasks and adapt to novel tasks with few gradient updates. Yet, seeking a common initialization shared across the entire task distribution substantially limits the diversity of the task distributions that they are able to learn from. We propose a multimodal MAML (MMAML) framework, which is able to modulate its meta-learned prior according to the identified mode, allowing more efficient fast adaptation.

Feedback Adversarial Learning: Spatial Feedback for Improving Generative Adversarial Networks
IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR) 2019

We propose a feedback adversarial learning (FAL) framework that can improve existing generative adversarial networks by leveraging spatial feedback from the discriminator. We formulate the generation task as a recurrent framework, in which the generator conditions on the discriminator's spatial output response and its previous generation to improve generation quality over time, allowing the generator to attend to and fix its previous mistakes.

Composing Complex Skills by Learning Transition Policies
International Conference on Learning Representations (ICLR) 2019

Humans acquire complex skills by exploiting previously learned skills and making transitions between them. To empower machines with this ability, we propose a method that can learn transition policies which effectively connect primitive skills to perform sequential tasks without handcrafted rewards. To efficiently train our transition policies, we introduce proximity predictors which induce rewards gauging proximity to suitable initial states for the next skill.

Multi-view to Novel View: Synthesizing Novel Views with Self-Learned Confidence
European Conference on Computer Vision (ECCV) 2018

We aim to synthesize a target image with an arbitrary camera pose from multiple given source images. We propose an end-to-end trainable framework which consists of a flow prediction module and a pixel generation module to directly leverage information presented in source views as well as hallucinate missing pixels from statistical priors. We introduce a self-learned confidence aggregation mechanism to merge the predictions produced by the two modules given multi-view source images.

Neural Program Synthesis from Diverse Demonstration Videos
International Conference on Machine Learning (ICML) 2018

Interpreting decision-making logic in demonstration videos is key to collaborating with and mimicking humans. To empower machines with this ability, we propose a framework that is able to explicitly synthesize underlying programs from behaviorally diverse and visually complicated demonstration videos. We introduce a summarizer module to improve the network’s ability to integrate multiple demonstrations and employ a multi-task objective to encourage the model to learn meaningful intermediate representations.

Professional Activity

Journal reviewer: Transactions on Machine Learning Research, IEEE Transactions on Pattern Analysis and Machine Intelligence, Machine Learning, IEEE Transactions on Image Processing, IEEE Transactions on Industrial Informatics, IEEE Transactions on Multimedia, ACM Transactions on Multimedia Computing Communications and Applications
Area chair: ACML