About me
I am an applied scientist at Amazon working on reinforcement learning. My research goal is to build AI agents that can make decisions in unknown and complex environments, going beyond perception and prediction. Such problems are often formalized under the reinforcement learning (RL) framework. I am interested in understanding and developing RL algorithms from first principles, with a focus on causality [paper], scalability, and value function learning dynamics [paper] in RL. Beyond fundamental research in RL, I work on user alignment of foundation models (Amazon Bedrock) and on pretrained/foundation models for efficient decision making [paper].
Previously, I was a research scientist at ByteDance from 2021 to 2022, where I worked on bandit-based retrieval algorithms and drove several launches in the TikTok and Douyin recommendation systems. I obtained my Ph.D. in computer science from Stanford in 2021, advised by Emma Brunskill. My dissertation was on batch reinforcement learning. With my collaborators, I proposed the first finite-sample error bound for batch RL without a full coverage assumption [paper] and the first convergent batch policy gradient method with function approximation [paper]. I also worked on real-world applications of batch RL: helping patients with chronic conditions, evaluating treatment policies on clinical data with practicing intensivists [paper], and teaching kids math [paper]. I completed my B.S. in machine intelligence from Peking University in 2016.
Preprints and Publications
- TAIL: Task-specific Adapters for Imitation Learning with Large Pretrained Models. Preprint.
- Budgeting Counterfactual for Offline RL. NeurIPS 2023.
- TD Convergence: An Optimization Perspective. NeurIPS 2023.
- Reinforcement Learning Tutor Better Supported Lower Performers in a Math Task. Machine Learning Journal.
- Provably Sample-Efficient RL with Side Information about Latent Dynamics. NeurIPS 2022.
- Offline Policy Optimization with Eligible Actions. UAI 2022.
- Provably Good Batch Reinforcement Learning Without Great Exploration. NeurIPS 2020.
- Understanding the Curse of Horizon in Off-Policy Evaluation via Conditional Importance Sampling. ICML 2020.
- Interpretable Off-Policy Evaluation in Reinforcement Learning by Highlighting Influential Transitions. ICML 2020.
- All-Action Policy Gradient Methods: A Numerical Integration Approach.
- Off-Policy Policy Gradient with State Distribution Correction. UAI 2019 (Oral).
- Combining Parametric and Nonparametric Models for Off-Policy Evaluation. ICML 2019 (Oral).
- Representation Balancing MDPs for Off-Policy Policy Evaluation. NeurIPS 2018.
- When Simple Exploration is Sample Efficient: Identifying Sufficient Conditions for Random Exploration to Yield PAC RL Algorithms. EWRL 2018.
- Behaviour Policy Estimation in Off-Policy Evaluation: Calibration Matters. ICML 2018 Workshops.
- Switched Trajectories for Off-Policy Learning. ICML 2018 Workshops.
- Model Selection for Off-Policy Policy Evaluation. RLDM 2017, Extended Abstract.
- PAC Continuous State Online Multitask Reinforcement Learning with Identification. AAMAS 2016.
- Local Orthogonality Preserving Alignment for Nonlinear Dimensionality Reduction. Journal of Computer Science and Technology, 31(3): 512-524, 2016.
Professional Service
Journal Reviewing: JMLR, IEEE TPAMI, MLJ, AIJ, Biometrika
Conference Reviewing: NeurIPS, ICLR, ICML, AISTATS, UAI, AAAI