Jia Deng
Princeton University
Machine learning · Pattern recognition · Object detection · Computer vision · Computer science
Publications 86
#1 Alejandro Newell (Princeton University), H-Index: 1
#2 Jia Deng (Princeton University), H-Index: 31
Recent advances have spurred incredible progress in self-supervised pretraining for vision. We investigate what factors may play a role in the utility of these pretraining methods for practitioners. To do this, we evaluate various self-supervised algorithms across a comprehensive array of synthetic datasets and downstream tasks. We prepare a suite of synthetic data that enables an endless supply of annotated images as well as full control over dataset difficulty. Our experiments offer insights i...
#1 Zachary Teed (Princeton University)
#2 Jia Deng (Princeton University), H-Index: 31
We introduce Recurrent All-Pairs Field Transforms (RAFT), a new deep network architecture for optical flow. RAFT extracts per-pixel features, builds multi-scale 4D correlation volumes for all pairs of pixels, and iteratively updates a flow field through a recurrent unit that performs lookups on the correlation volumes. RAFT achieves state-of-the-art performance, with strong cross-dataset generalization and high efficiency in inference time, training speed, and parameter count. Code is available ...
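The all-pairs correlation volume at the heart of RAFT can be illustrated in a few lines. The sketch below shows only that one construction, in NumPy for brevity (the released code is a full PyTorch network that additionally pools the volume into a multi-scale pyramid); the shapes and names here are illustrative, not the paper's.

```python
import numpy as np

def all_pairs_correlation(f1, f2):
    """Dot-product correlation between every pixel of f1 and every pixel of f2.

    f1, f2: feature maps of shape (H, W, D).
    Returns a 4D volume of shape (H, W, H, W) where
    corr[i, j, k, l] = <f1[i, j], f2[k, l]>.
    """
    return np.einsum('ijd,kld->ijkl', f1, f2)

# Toy example: 2x2 feature maps with 3-dim features.
rng = np.random.default_rng(0)
f1 = rng.standard_normal((2, 2, 3))
f2 = rng.standard_normal((2, 2, 3))
corr = all_pairs_correlation(f1, f2)
assert corr.shape == (2, 2, 2, 2)
# Each entry is a plain dot product between two feature vectors.
assert np.isclose(corr[0, 1, 1, 0], f1[0, 1] @ f2[1, 0])
```

During iterative refinement, the recurrent unit looks up local windows of this volume around the current flow estimate rather than recomputing features.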
#1 Hei Law (UM: University of Michigan), H-Index: 3
#2 Jia Deng (UM: University of Michigan), H-Index: 31
We propose CornerNet, a new approach to object detection where we detect an object bounding box as a pair of keypoints, the top-left corner and the bottom-right corner, using a single convolutional neural network. By detecting objects as paired keypoints, we eliminate the need for designing a set of anchor boxes commonly used in prior single-stage detectors. In addition to our novel formulation, we introduce corner pooling, a new type of pooling layer that helps the network better localize corners...
98 Citations
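Corner pooling can be sketched directly from the description above: for a top-left corner at (i, j), the layer max-pools horizontally over everything to the right and vertically over everything below, then sums the two pooled maps so a corner location can "see" the object extending to its bottom-right. A minimal single-channel NumPy sketch (the paper's layer runs per channel inside a CNN; this is an illustration, not the released implementation):

```python
import numpy as np

def top_left_corner_pool(x):
    """Top-left corner pooling on a 2D feature map (one channel).

    For each location (i, j): horizontal component = max of x[i, j:],
    vertical component = max of x[i:, j]; the two maps are summed.
    """
    # Reversed running maximum along each row (max over columns j..W-1).
    h = np.flip(np.maximum.accumulate(np.flip(x, axis=1), axis=1), axis=1)
    # Reversed running maximum along each column (max over rows i..H-1).
    v = np.flip(np.maximum.accumulate(np.flip(x, axis=0), axis=0), axis=0)
    return h + v

x = np.array([[1.0, 0.0],
              [0.0, 3.0]])
pooled = top_left_corner_pool(x)
# At (0, 0): max of row 0 to the right (=1) + max of column 0 below (=1) -> 2
assert pooled[0, 0] == 2.0
# At (0, 1): max of x[0, 1:] (=0) + max of x[0:, 1] (=3) -> 3
assert pooled[0, 1] == 3.0
```

Bottom-right corner pooling is the mirror image: running maxima taken toward the top-left instead.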
We consider the task of automated theorem proving, a key AI task. Deep learning has shown promise for training theorem provers, but there are limited human-written theorems and proofs available for supervised learning. To address this limitation, we propose to learn a neural generator that automatically synthesizes theorems and proofs for the purpose of training a theorem prover. Experiments on real-world tasks demonstrate that synthetic data from our approach improves the theorem prover and adv...
#1 Kristian M. Black (UM: University of Michigan), H-Index: 1
#2 Hei Law (Princeton University), H-Index: 3
Last: Khurshid R. Ghani (UM: University of Michigan), H-Index: 25
(5 authors)
OBJECTIVES: To assess the recall of a deep learning (DL) method to automatically detect kidney stone composition from digital photographs of stones. MATERIALS AND METHODS: A total of 63 human kidney stones of varied compositions were obtained from a stone laboratory, including calcium oxalate monohydrate (COM), uric acid (UA), magnesium ammonium phosphate hexahydrate (MAPH/struvite), calcium hydrogen phosphate dihydrate (CHPD/brushite), and cystine stones. At least two images of the stones, both...
3 Citations
#1 Kaiyu Yang (Princeton University), H-Index: 1
#2 Klint Qinami (Princeton University), H-Index: 2
Last: Olga Russakovsky (Princeton University), H-Index: 1
(5 authors)
Computer vision technology is being used by many but remains representative of only a few. People have reported misbehavior of computer vision models, including offensive prediction results and lower performance for underrepresented groups. Current computer vision models are typically developed using datasets consisting of manually annotated images or videos; the data and label distributions in these datasets are critical to the models' behavior. In this paper, we examine ImageNet, a large-scale...
5 Citations
#1 Jonathan C. Stroud (UM: University of Michigan), H-Index: 4
Last: Olga Russakovsky, H-Index: 17
(5 authors)
Temporal grounding entails establishing a correspondence between natural language event descriptions and their visual depictions. Compositional modeling becomes central: we first ground atomic descriptions such as "girl eating an apple" or "batter hitting the ball" to short video segments, and then establish the temporal relationships between the segments. This compositional structure enables models to recognize a wider variety of events not seen during training through recognizing their atomic sub-events...
Oct 1, 2019 in ICCV (International Conference on Computer Vision)
#1 Lanlan Liu (UM: University of Michigan), H-Index: 2
Last: Li-Jia Li (Google), H-Index: 31
(5 authors)
This paper explores object detection in the small data regime, where only a limited number of annotated bounding boxes are available due to data rarity and annotation expense. This is a common challenge today as machine learning is applied to many new tasks where obtaining training data is difficult, e.g. medical images of rare diseases that doctors may see only once in their lifetime. In this work we explore this problem from a generative modeling perspective by learning...
2 Citations
#1 Alejandro Newell (Princeton University), H-Index: 1
#2 Lu Jiang, H-Index: 1
Last: Jia Deng, H-Index: 31
(5 authors)
Multi-task learning holds the promise of requiring less data, fewer parameters, and less time than training separate models. We propose a method to automatically search over multi-task architectures while taking resource constraints into consideration. Our search space compactly represents different parameter-sharing strategies, providing more effective coverage and sampling of the space of multi-task architectures. We also present a method for quick evaluation of different architectures by usin...
1 Citation
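The idea of compactly encoding parameter-sharing strategies and filtering them against a resource budget can be illustrated with a deliberately tiny toy search space. The encoding below (each layer either fully shared across tasks or duplicated per task) is a hypothetical simplification for illustration, not the paper's actual search space or evaluation method.

```python
import itertools

# Toy multi-task sharing search space: T tasks, L layers; each layer is
# either fully shared across tasks (bit 0) or private per task (bit 1).
T, L = 3, 4
PARAMS_PER_LAYER = 1000

def param_count(strategy):
    """Total parameter count for one sharing strategy (tuple of L bits)."""
    return sum(PARAMS_PER_LAYER * (T if private else 1) for private in strategy)

# Enumerate the 2**L strategies and keep those under a resource budget.
budget = 6000
feasible = [s for s in itertools.product([0, 1], repeat=L)
            if param_count(s) <= budget]

# Fully shared network is the cheapest point: 4 * 1000 = 4000 parameters.
assert (0, 0, 0, 0) in feasible
# Fully private: every layer copied for each of the 3 tasks.
assert param_count((1, 1, 1, 1)) == 12000
```

In this toy budget, only the fully shared strategy and the four one-private-layer strategies survive; a real search would then rank the feasible strategies by estimated task performance rather than by cost alone.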
#1 Lanlan Liu, H-Index: 1
#2 Mingzhe Wang, H-Index: 1
Last: Jia Deng, H-Index: 31
(3 authors)
1 Citation