VECTOR RESEARCHERS PRESENT “FIRST OF ITS KIND” AI-POWERED SEARCH ENGINE FOR AI AT CVPR 2020

By Ian Gormely

The ability to build and train neural networks remains out of reach for many individuals and organizations because of the large amount of data, time and compute power needed to train new AI models.

Now a “first of its kind” search engine is helping to bring the power of neural networks within reach of the average coder. Neural Data Server (NDS) is a search engine for AI data built by Vector Faculty Member Sanja Fidler, Vector grad student David Acuna, and University of Toronto undergraduate student Xi Yan. It uses machine learning (ML) to find the most relevant data for pre-training, an important step in building high-performance deep learning models. This can save precious time, compute power, and ultimately money. The team will present NDS and its accompanying paper, “Neural Data Server: A Large-Scale Search Engine for Transfer Learning Data” in an oral presentation at CVPR this month.

As its name implies, pre-training teaches or conditions a deep neural network before it is trained for a particular task. It can significantly boost a deep neural network’s performance, particularly when labeled data is scarce. “If you don’t pre-train a network, there is a noticeable gap in performance,” says Fidler, who calls data the lifeblood of machine learning. “You need a lot of data to train really high-performance models.” But the largest data sets contain tens of millions of data points, and running through all that data can take weeks and demands substantial computational resources. “Particularly in academia, we have limited resources,” she says. “We just can’t afford pre-training on 10 or 20 million images, and across multiple experiments.”

NDS helps cut down on the time and compute power needed in pre-training by eliminating irrelevant data points. If a user is building a fashion-related application, for instance, there is little value in pre-training on a large data set of vehicle images; NDS will remove anything that doesn’t include a close-up of people and the clothes they are wearing. This leads to smaller but more relevant data sets. Notably, NDS does not hold data. Rather, like any other search engine, it indexes publicly available machine learning datasets, making the data searchable.

Unlike Google’s Dataset Search, NDS also recommends the data that will be most relevant to the user’s unique model. To run a search, users download and run a set of “experts,” tiny ML models, on their dataset. The results (a set of statistics, not the actual data) are sent back to NDS, which identifies the best subset of data for the model. The whole process takes a matter of minutes.
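To make the client/server split concrete, here is a minimal sketch of what the server-side step could look like. It assumes (hypothetically) that each expert corresponds to one cluster of the indexed source data, that the user's machine reports a single relevance score per expert, and that the server turns those scores into a pre-training budget with a softmax. The function names, the temperature value and the scores are illustrative assumptions, not NDS's actual API or exact algorithm.

```python
import numpy as np

def expert_scores_to_weights(scores, temperature=0.1):
    """Softmax over per-expert relevance scores (higher score = more relevant cluster)."""
    s = np.asarray(scores, dtype=float) / temperature
    s -= s.max()  # subtract the max for numerical stability
    w = np.exp(s)
    return w / w.sum()

def allocate_budget(weights, budget):
    """Split a pre-training budget (number of images) across source clusters by weight."""
    raw = weights * budget
    counts = np.floor(raw).astype(int)
    # Give any leftover images to the clusters with the largest fractional parts.
    leftover = int(budget - counts.sum())
    order = np.argsort(raw - counts)[::-1]
    counts[order[:leftover]] += 1
    return counts

# Hypothetical scores the client reports after running four experts on its own images.
scores = [0.62, 0.35, 0.30, 0.58]
weights = expert_scores_to_weights(scores)
counts = allocate_budget(weights, budget=10_000)
print(weights.round(3), counts)
```

A lower temperature concentrates the budget on the best-matching clusters; a higher one spreads it out. In this sketch only the four numbers in `scores` leave the user's machine, which is the privacy property the article describes.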

Free to use while protecting the privacy of users’ data, NDS opens the door for researchers outside of higher education or large companies to build their own neural networks. Fidler points to startups (companies that have the skill set and innovative ideas to build new models but lack the computing infrastructure) as being particularly poised to benefit from NDS.

“It’s something that we see that the community needs,” says Acuna. “Pre-training is key to achieve state-of-the-art results, and we want to make it easy and feasible for everyone.”

“Machine learning systems can only be as good as their data,” says Yan. “We hope our NDS opens a more effective way to leverage the massive amount of data that’s available today.”

Also at CVPR 2020

Vector-affiliated researchers have a number of papers at CVPR this year.

Auto-Tuning Structured Light by Optical Stochastic Gradient Descent

Wenzheng Chen, Parsa Mirdehghan, Sanja Fidler (Vector), Kiriakos N. Kutulakos (Vector)

Discovering optimal coding and decoding schemes is key to 3D reconstruction with active depth sensing. Most previous works adopt heuristic rules to design codes without considering the actual device’s characteristics; in contrast, we put the real device in the loop by jointly optimizing the neural network parameters with the specific hardware parameters. Our method, which we call Optical SGD, allows the chosen active depth imaging system to automatically discover the optimal illuminations and decoding algorithms it should use. One can simply put their favourite device in front of a textured board, select the evaluation metric they like, let our algorithm run, and obtain the code and decoder that best match the device.
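The summary does not say how Optical SGD computes its gradients, so the sketch below only illustrates the general "device in the loop" idea with a different, well-known technique: SPSA (simultaneous perturbation stochastic approximation), which estimates a gradient from two noisy loss measurements per step and needs no differentiable model of the hardware. The "device" is a hypothetical simulation (a quadratic loss plus measurement noise), and all names and parameter values are assumptions for illustration.

```python
import numpy as np

rng = np.random.default_rng(0)
TARGET = np.array([0.3, -0.7, 1.2, 0.5])  # hypothetical optimum, unknown to the optimizer

def measure_device_loss(theta):
    """Stand-in for capturing images with a real device and scoring the reconstruction.
    Here it is simply a quadratic plus measurement noise (purely hypothetical)."""
    return float(np.sum((theta - TARGET) ** 2) + rng.normal(scale=0.01))

def spsa_optimize(loss_fn, theta0, steps=500, lr=0.05, c=0.1, seed=1):
    """SPSA: perturb all parameters at once by a random +/-1 vector, measure the loss on
    both sides, and use the difference as a stochastic gradient estimate."""
    local_rng = np.random.default_rng(seed)
    theta = np.array(theta0, dtype=float)
    for _ in range(steps):
        delta = local_rng.choice([-1.0, 1.0], size=theta.shape)  # random +/-1 perturbation
        diff = loss_fn(theta + c * delta) - loss_fn(theta - c * delta)
        grad_est = diff / (2 * c) * delta  # 1/delta_i equals delta_i for +/-1 entries
        theta -= lr * grad_est
    return theta

theta = spsa_optimize(measure_device_loss, np.zeros(4))
print(theta.round(3))
```

Each step costs only two "captures", independent of how many parameters are being tuned, which is why perturbation-based estimators are a common choice when the loss can only be measured, not differentiated.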

Front2Back: Single View 3D Shape Reconstruction via Front to Back Prediction

Yuan Yao, Nico Schertler, Enrique Rosales, Helge Rhodin, Leonid Sigal (Vector), Alla Sheffer

Reconstruction of 3D shape from a 2D image is a classical computer vision problem that has wide practical applicability ranging from navigation and object manipulation to spatial reasoning and understanding. Despite significant progress in the field, it remains a challenge due to the inherent ambiguity of recovering occluded and only partially observed surfaces. In this work, we leverage a set of perceptually guided geometric constraints to help improve such reconstructions. Specifically, we observe that most everyday objects (including man-made objects) are symmetric. We are able to estimate these symmetries and utilize them to predict occluded (back) views of objects from the observable front, using a neural network architecture. The observed front and predicted back views almost entirely expose the outer surface of the object. Hence, by fusing information from these views, we can reconstruct a complete and accurate surface. Our experiments demonstrate that our approach outperforms state-of-the-art 3D shape reconstruction from 2D and 2.5D data in terms of geometric fidelity and detail preservation.
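The geometric fact this summary relies on is that a reflective symmetry lets visible geometry stand in for hidden geometry. A minimal sketch of just that step: mirroring observed 3D points across an already-estimated symmetry plane. The paper predicts back views with a neural network, which this does not attempt; the function name, the plane parameterization `normal . x = offset` and the sample points are illustrative assumptions.

```python
import numpy as np

def reflect_across_plane(points, normal, offset=0.0):
    """Mirror 3D points across the plane {x : normal . x = offset}."""
    n = np.asarray(normal, dtype=float)
    length = np.linalg.norm(n)
    n, offset = n / length, offset / length  # rescale so n has unit length
    p = np.asarray(points, dtype=float)
    signed_dist = p @ n - offset  # signed distance of each point from the plane
    return p - 2.0 * signed_dist[:, None] * n

# Hypothetical visible "front" points of an object symmetric about the plane x = 1.
front = np.array([[1.5, 0.0, 0.2], [2.0, 1.0, 0.4], [3.0, -0.5, 0.1]])
back = reflect_across_plane(front, normal=[1.0, 0.0, 0.0], offset=1.0)
print(back)
```

Reflecting twice returns the original points, a quick sanity check that the plane has been applied consistently.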

Improved Few-shot Visual Classification

Peyman Bateni, Raghav Goyal, Vaden Masrani, Frank Wood, Leonid Sigal (Vector)

Learning from a limited amount of data is a fundamental task in computer vision that carries the promise of alleviating the need for exhaustively labeled datasets. Most approaches to date have focused on progressively more complex ways to extract features from images, or on strategies for structuring the data for efficient learning. In this work, we consider a simpler approach: learning a distribution-aware distance metric, which can significantly improve the performance of an existing state-of-the-art model (CNAPS). We find that the model can be structured so that this metric can be learned from even a few samples. The resulting approach, which we call Simple CNAPS, has nearly 10% fewer parameters, yet performs 6% better than the original.
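One standard way to make a distance "distribution-aware" is the squared Mahalanobis distance, which scales differences by a covariance estimate instead of treating all feature directions equally. The sketch below assumes features have already been extracted (there is no CNAPS feature extractor here) and classifies queries by the nearest class under that distance. Blending each class covariance with a task-level covariance, the `n_k / (n_k + 1)` weighting and the `beta` regularizer are assumptions chosen so the estimate stays invertible with very few samples; treat this as an illustration, not a reproduction of the paper's model.

```python
import numpy as np

def fit_class_metrics(features, labels, beta=1.0):
    """For each class, store its mean and an inverse covariance that blends the class
    covariance with the whole-task covariance, plus beta * I for stability."""
    classes = np.unique(labels)
    d = features.shape[1]
    task_cov = np.cov(features, rowvar=False)
    params = {}
    for k in classes:
        fk = features[labels == k]
        n_k = len(fk)
        class_cov = np.cov(fk, rowvar=False) if n_k > 1 else np.zeros((d, d))
        lam = n_k / (n_k + 1.0)  # trust the class covariance more when it has more shots
        q = lam * class_cov + (1 - lam) * task_cov + beta * np.eye(d)
        params[k] = (fk.mean(axis=0), np.linalg.inv(q))
    return params

def predict(params, x):
    """Assign each query to the class with the smallest squared Mahalanobis distance."""
    classes = list(params)
    dists = []
    for k in classes:
        mu, q_inv = params[k]
        diff = x - mu
        dists.append(np.einsum("ij,jk,ik->i", diff, q_inv, diff))
    return np.array(classes)[np.argmin(np.stack(dists, axis=1), axis=1)]

# Hypothetical 2D features for a 2-way, 3-shot task.
support = np.array([[0, 0], [1, 0], [0, 1], [5, 5], [6, 5], [5, 6]], dtype=float)
support_labels = np.array([0, 0, 0, 1, 1, 1])
params = fit_class_metrics(support, support_labels)
queries = np.array([[0.5, 0.5], [5.5, 5.5]])
print(predict(params, queries))
```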

Learning to Evaluate Perception Models Using Planner-Centric Metrics

Jonah Philion, Amlan Kar, Sanja Fidler (NVIDIA)

Evaluation metrics are important in self-driving because they determine what algorithms ultimately get deployed in cars. The issue with current evaluation metrics for self-driving perception is that they consist of long lists of heuristics that researchers hand-design in the hopes that the final output roughly correlates with driving performance. In this paper, we propose a more principled metric for 3D object detection specifically for the task of self-driving. The core idea behind our metric is to isolate the task of object detection and measure the impact the produced detections would induce on the downstream task of driving. Although our metric is not hand-designed to do so, we find that it penalizes many of the mistakes that other metrics penalize by design.
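The core idea, scoring detections by how much they change what a downstream planner would do, can be illustrated with a toy. This minimal sketch assumes a hypothetical hand-written planner that puts a softmax distribution over three fixed candidate trajectories, and compares the planner's distribution when it sees ground-truth objects versus detections using KL divergence. The candidate set, cost function and names are illustrative assumptions, not the planner used in the paper.

```python
import numpy as np

# Hypothetical candidate ego trajectories, each a sequence of (x, y) waypoints.
CANDIDATES = np.array([
    [[0, 0], [0, 5], [0, 10]],    # straight
    [[0, 0], [-2, 5], [-4, 10]],  # veer left
    [[0, 0], [2, 5], [4, 10]],    # veer right
], dtype=float)

def planner_distribution(obstacles, temperature=1.0):
    """Toy planner: cost grows when waypoints pass near obstacles; softmax over -cost."""
    costs = []
    for traj in CANDIDATES:
        if len(obstacles) == 0:
            costs.append(0.0)
            continue
        d = np.linalg.norm(traj[:, None, :] - obstacles[None, :, :], axis=-1)
        costs.append(np.sum(np.exp(-d)))
    logits = -np.array(costs) / temperature
    p = np.exp(logits - logits.max())
    return p / p.sum()

def planner_kl(gt_objects, detections):
    """KL(planner given ground truth || planner given detections); 0 if nothing changes."""
    p = planner_distribution(np.asarray(gt_objects, dtype=float).reshape(-1, 2))
    q = planner_distribution(np.asarray(detections, dtype=float).reshape(-1, 2))
    return float(np.sum(p * (np.log(p) - np.log(q))))

gt_objects = [[0.0, 5.0]]  # an object directly ahead of the ego car
print(planner_kl(gt_objects, gt_objects))  # 0.0: perfect detections
print(planner_kl(gt_objects, []))  # missed the object ahead: positive score
print(planner_kl([[0.0, 5.0], [50.0, 50.0]], [[0.0, 5.0]]))  # missed a distant object
```

In this toy, missing the object ahead shifts the plan and earns a clearly positive score, while missing a far-away object barely registers, even though no rule about distance or relevance was written into the metric itself.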

Learning to Simulate Dynamic Environments With GameGAN

Seung Wook Kim, Yuhao Zhou, Jonah Philion, Antonio Torralba, Sanja Fidler (NVIDIA)

GameGAN is a neural network-based AI model that learns to mimic game engines. It is trained by ingesting recorded gameplay screens along with the user’s keyboard actions from a game. Once trained, it can render the next screen given key-presses, so that people can play the game without a game engine, using only the AI model. GameGAN also has a memory module that remembers what it has generated and can separate backgrounds from dynamically moving objects.
