
Recent News

  • I joined Nvidia to work on autonomous vehicles and robotics!

  • We will be hosting a tutorial on Object-centric Representations in Computer Vision at CVPR 2024. Stay tuned and see you in Seattle!

  • 🚀 Exciting News! 📘 Our latest survey paper is now released, presenting a comprehensive analysis of hallucination phenomena in multimodal large language models (MLLMs), also known as Large Vision-Language Models (LVLMs). (Paper, Resource Repo).

  • Two papers were accepted to CVPR 2024: Adaptive Slot Attention (Paper, Project Page), Learning for Transductive Threshold Calibration in Open-World Recognition (Paper).

  • Introducing our ICLR 2024 work, 🔥Instruct Video-to-Video🔥, an efficient approach to video editing that eliminates the need for per-video-per-model finetuning by constructing a synthetic paired video dataset. (Paper, Code)

  • Four papers were accepted to ICCV 2023: OC-MOT (Paper, Code), Slot-Naming (Paper), C2F-Seg (Paper, Project Page), EoRaS (Paper).

  • One paper is accepted to ICLR 2023: Bridging the Gap to Real-World Object-Centric Learning. Paper link and code link.

  • One paper is accepted to NeurIPS 2022 (Spotlight): Self-supervised Amodal Video Object Segmentation. Paper link and code link.

About Me

  • I recently came back to the frontline of autonomous vehicles and robotics, working as a principal engineer at Nvidia.
  • I was an Applied Science Manager at the Amazon Web Services AI Shanghai Lablet, leading computer vision efforts. I play a lot with objects. During this period, I focused on object-centric learning, vision-language models, graph neural networks and causal representation learning, exploring and exploiting their use in applications like video analysis, 3D vision, autonomous driving and robotics. I also contributed to the Graph Neural Network framework DGL and the Object-centric Learning Framework OCLF.
  • Before joining Amazon, I was a Staff Machine Learning Scientist on the Tesla Autopilot AI/Vision team, working with Dr. Andrej Karpathy. I was one of the major contributors to the Autopilot vision neural network stack and the task owner of Autopilot (Dynamic and Static) Object Detection from 2017 to 2020. My work has shipped in hundreds of thousands of Tesla cars worldwide through major Autopilot releases, contributing to Autopilot functionalities like Traffic-Aware Cruise Control, Auto Lane Change, Automatic Emergency Braking, Navigation on Autopilot, Smart Summon, etc.
  • Prior to Tesla, I spent 3.25 years at Microsoft. I was a Software Engineer 2 on the Microsoft Bing Multimedia team (now under the Microsoft AI & Research Org), working with Dr. Linjun Yang on Image-Text Semantic Embedding, which contributed to functionalities like Image Annotation and Image Search in the Bing Search Engine. During my graduate years, I interned at Microsoft Research Asia, advised by Prof. Zheng Zhang and Dr. Kuiyuan Yang, where I worked on both the training platform and vision applications of deep learning. I was a major contributor to the open-source deep learning training framework Minerva and also contributed to the machine learning library MXNet.
  • I received an M.S. degree in Computer Science from the Wangxuan Institute of Computer Technology, Peking University, advised by Prof. Yuxin Peng, and a B.S. degree in Computer Science from Nankai University.
  • I am passionate about applying machine learning to large-scale, life-changing technologies, currently with a focus on computer vision applications.