Recent News
I joined Nvidia to work on autonomous vehicles and robotics!
We will be hosting a tutorial on Object-centric Representations in Computer Vision at CVPR 2024. Stay tuned and see you in Seattle!
🚀 Exciting News! 📘 Our latest survey paper is now released, presenting a comprehensive analysis of hallucination phenomena in multimodal large language models (MLLMs), also known as Large Vision-Language Models (LVLMs). (Paper, Resource Repo).
Two papers were accepted to CVPR 2024: Adaptive Slot Attention (Paper, Project Page) and Learning for Transductive Threshold Calibration in Open-World Recognition (Paper).
Introducing our ICLR 2024 work, 🔥Instruct Video-to-Video🔥, an efficient approach to video editing that eliminates the need for per-video-per-model finetuning by constructing a synthetic paired video dataset. (Paper, Code)
Four papers were accepted to ICCV 2023: OC-MOT (Paper, Code), Slot-Naming (Paper), C2F-Seg (Paper, Project Page), EoRaS (Paper).
One paper was accepted to ICLR 2023: Bridging the Gap to Real-World Object-Centric Learning. Paper link and code link.
One paper was accepted to NeurIPS 2022 (Spotlight): Self-supervised Amodal Video Object Segmentation. Paper link and code link.
About Me
- I recently returned to the frontline of autonomous vehicles and robotics, working as a principal engineer at Nvidia.
- I was an Applied Science Manager at the Amazon Web Services AI Shanghai Lablet, leading computer vision efforts. I play a lot with objects. During this period, I focused on object-centric learning, vision-language models, graph neural networks, and causal representation learning, exploring and exploiting their use in applications such as video analysis, 3D vision, autonomous driving, and robotics. I also contributed to the graph neural network framework DGL and the object-centric learning framework OCLF.
- Before joining Amazon, I was a Staff Machine Learning Scientist on the Tesla Autopilot AI/Vision team, working with Dr. Andrej Karpathy. I was one of the major contributors to the Autopilot vision neural network stack and the task owner of Autopilot (Dynamic and Static) Object Detection from 2017 to 2020. My work has shipped in hundreds of thousands of Tesla cars worldwide through major Autopilot releases, contributing to Autopilot functionalities such as Traffic-Aware Cruise Control, Auto Lane Change, Automatic Emergency Braking, Navigation on Autopilot, and Smart Summon.
- Prior to Tesla, I spent 3.25 years at Microsoft. I was a Software Engineer 2 on the Microsoft Bing Multimedia team (now under the Microsoft AI & Research Org), working with Dr. Linjun Yang on Image-Text Semantic Embedding, which supports functionalities such as Image Annotation and Image Search in the Bing search engine. During my graduate years, I interned at Microsoft Research Asia, advised by Prof. Zheng Zhang and Dr. Kuiyuan Yang, where I worked on both the training platform and vision applications of deep learning. I was a major contributor to the open-source deep learning training framework Minerva and also contributed to the machine learning library MXNet.
- I received an M.S. degree in Computer Science from the Wangxuan Institute of Computer Technology, Peking University, advised by Prof. Yuxin Peng, and a B.S. degree in Computer Science from Nankai University.
- I am passionate about applying machine learning to large-scale, life-changing technologies, currently with a focus on computer vision applications.