I am a second-year Ph.D. student in Computer Science at the University of Southern California (USC) and an Annenberg Fellow in the Cognitive Learning for Vision and Robotics Lab (CLVR), working with Professor Joseph J. Lim. Before joining USC, I received my B.S. degree from the Department of Electrical Engineering at National Taiwan University (NTU), Taipei, Taiwan. I open-source my research projects as well as implementations of state-of-the-art papers on my GitHub and tweet exciting stuff on my Twitter.
My research interests span Deep Learning, Computer Vision, Reinforcement Learning, Meta-learning, and Robot Learning. In particular, I am interested in developing learning algorithms that empower machines to efficiently master complex tasks and to quickly adapt to novel tasks and environments using prior knowledge.
We propose a feedback adversarial learning (FAL) framework that can improve existing generative adversarial networks by leveraging spatial feedback from the discriminator. We formulate the generation task as a recurrent framework, in which the discriminator's feedback is integrated into the feedforward path of the generation process. Specifically, the generator conditions on the discriminator's spatial output response and its own previous generation to improve generation quality over time, allowing the generator to attend to and fix its previous mistakes. To effectively utilize the feedback, we propose an adaptive spatial transform (AST) layer, which learns to spatially modulate feature maps using the generator's previous generation and the feedback signal from the discriminator.
Humans acquire complex skills by exploiting previously learned skills and making transitions between them. To empower machines with this ability, we propose a method that learns transition policies that effectively connect primitive skills to perform sequential tasks without handcrafted rewards. To efficiently train our transition policies, we introduce proximity predictors, which induce rewards gauging proximity to suitable initial states for the next skill. The proposed method is evaluated on a set of complex continuous control tasks in bipedal locomotion and robotic arm manipulation on which traditional methods struggle.
Gradient-based meta-learners such as MAML are able to learn a meta-prior from similar tasks to adapt to novel tasks from the same distribution with few gradient updates. One important limitation of such frameworks is that they seek a common initialization shared across the entire task distribution, substantially limiting the diversity of the task distributions that they are able to learn from. In this paper, we augment MAML with the capability to identify tasks sampled from a multimodal task distribution and adapt quickly through gradient updates.
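As a rough illustration of the shared-initialization setting described above, the sketch below runs a few inner-loop gradient updates from one common starting weight on two toy regression tasks. It is not the method proposed in the paper: the 1-D linear model, the function names (`mse_loss`, `mse_grad`, `adapt`), the learning rate, and the step count are all assumptions chosen for illustration, and the outer meta-update is omitted.

```python
def mse_loss(w, xs, ys):
    """Mean squared error of the 1-D linear model y = w * x."""
    return sum((w * x - y) ** 2 for x, y in zip(xs, ys)) / len(xs)


def mse_grad(w, xs, ys):
    """Gradient of mse_loss with respect to w."""
    return sum(2.0 * (w * x - y) * x for x, y in zip(xs, ys)) / len(xs)


def adapt(w_init, xs, ys, inner_lr=0.1, steps=3):
    """Take a few gradient steps from a shared initialization (inner loop only)."""
    w = w_init
    for _ in range(steps):
        w = w - inner_lr * mse_grad(w, xs, ys)
    return w


# Two toy tasks with very different target slopes (hypothetical data).
xs = [0.5, 1.0, 1.5]
task_a = [2.0 * x for x in xs]
task_b = [-3.0 * x for x in xs]

# A single initialization shared by both tasks.
w_shared = 0.0
w_a = adapt(w_shared, xs, task_a)
w_b = adapt(w_shared, xs, task_b)
```

Both tasks start from the same `w_shared`, so with only a few updates each adapted weight moves toward its task's slope without reaching it. Tasks drawn from very different modes pull a single initialization in opposite directions, which is the limitation the abstract points to.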
We address the task of multi-view novel view synthesis, where we are interested in synthesizing a target image with an arbitrary camera pose from given source images. We propose an end-to-end trainable framework that consists of a flow prediction module and a pixel generation module, which together directly leverage information present in the source views and hallucinate missing pixels from statistical priors. We introduce a self-learned confidence aggregation mechanism to merge the predictions produced by the two modules given multi-view source images.
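One simple way to picture confidence-based merging is a per-pixel weighted average of several candidate predictions. The sketch below is only an assumption about how such a step could look, not the mechanism from the paper: the function name `aggregate`, the flat-list image representation, and the use of a softmax to normalize confidences are all hypothetical.

```python
import math


def aggregate(predictions, confidences):
    """Merge K candidate images into one using per-pixel confidence weights.

    predictions: list of K flat lists of pixel values (hypothetical format).
    confidences: list of K flat lists of unnormalized confidence scores.
    The softmax normalization across candidates is an assumption.
    """
    num_pixels = len(predictions[0])
    merged = []
    for p in range(num_pixels):
        scores = [conf[p] for conf in confidences]
        max_score = max(scores)  # subtract the max for numerical stability
        weights = [math.exp(s - max_score) for s in scores]
        total = sum(weights)
        merged.append(
            sum(w / total * pred[p] for w, pred in zip(weights, predictions))
        )
    return merged


# Two candidate predictions of a two-pixel image (toy values).
predictions = [[0.0, 1.0], [1.0, 1.0]]
confidences = [[0.0, 5.0], [0.0, -5.0]]
merged = aggregate(predictions, confidences)
```

With equal confidences at the first pixel, the two candidates are averaged. At the second pixel both candidates agree, so the merged value equals their common value regardless of the weights.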
Interpreting decision-making logic in demonstration videos is key to collaborating with and mimicking humans. To empower machines with this ability, we propose a neural program synthesizer that is able to explicitly synthesize underlying programs from behaviorally diverse and visually complicated demonstration videos. We introduce a summarizer module as part of our model to improve the network’s ability to integrate multiple demonstrations that vary in behavior. We also employ a multi-task objective to encourage the model to learn meaningful intermediate representations for end-to-end training.