My Visiting Researcher Term at Vector Institute

December 13, 2024


By Baoxiang Wang

Introduction

Earlier this year, I took a leave from CUHK Shenzhen to join Vector Institute as a visiting researcher. The Visiting Researcher Program is intended for both early-career and established scholars working in machine learning and deep learning research and their applications. It gives them up to one year to come to Vector, gain access to its resources, and collaborate with its community. From January to August, I worked with some of the greatest minds in AI and attempted to push the boundaries of what AI can achieve.

My research interests lie in learning algorithms amid strategic agents. Instead of dealing with static agents and static data, we investigate scenarios where the learning agent cannot dictate the behavior of other “components” of the learning process. For example, when collecting data to train a predictor, how could the agent ensure that the data source truthfully reports the data? Similar problems arise during training, when other agents have objectives that are not aligned with the ego agent's. These strategic agents can adapt their actions to our learning process, resulting in highly non-stationary learning environments. Michael I. Jordan, Professor of Electrical Engineering and Computer Sciences and Professor of Statistics at UC Berkeley, has previously discussed here how machine learning blends with game theory for better alignment with practical problems.

My motivations for working on game-theoretic settings of machine learning are twofold. The first is to better understand how learning algorithms perform in strategic environments: we aim to extend results from analyzing algorithms in the single-agent setting to multi-agent settings. The second is to use the characterization of the games to understand the interactions among strategic agents (some open problems are described in detail in this preprint). This allows us to design better mechanisms for promoting cooperation, improving social welfare, reducing exploitation, and enhancing equality. During my visit, I was fortunate to work with Vector researchers Yaoliang Yu (my host), Pascal Poupart, and William Cunningham on several problems related to this topic.

No-regret learning dynamics in games

One of the most popular solution concepts in multi-agent settings is the Nash equilibrium, which originates from game theory and describes the behaviors of rational, selfish players (see this article on algorithmic game theory for more context). It characterizes a stable state among the players, where no individual has an incentive to unilaterally deviate from their chosen strategy. We focus on the decentralized dynamics among players in repeated games with bandit feedback. In this scenario, over a time horizon, each player independently chooses a strategy at each step and receives cost feedback only for the strategy played. Beyond the objective of arriving at a Nash equilibrium, a player should try to minimize the cost it experiences while assuming the other players are adversarial. One notion that evaluates the performance of an algorithm under possible adversaries is regret, which originates from the online learning community, and algorithms that achieve sublinear regret are termed no-regret algorithms.
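
To make the notion concrete, here is the standard definition of external regret for a single player; the notation below is mine and is only meant to illustrate the quantity that the algorithms discussed next keep sublinear.

```latex
% External regret of player i over a horizon of T rounds (illustrative notation).
% c_i^t(a_i, a_{-i}^t) is player i's cost at round t when playing a_i against
% the other players' realized strategies a_{-i}^t.
\[
  \mathrm{Reg}_i(T) \;=\; \sum_{t=1}^{T} c_i^t\bigl(a_i^t, a_{-i}^t\bigr)
  \;-\; \min_{a_i \in \mathcal{A}_i} \sum_{t=1}^{T} c_i^t\bigl(a_i, a_{-i}^t\bigr).
\]
% An algorithm is no-regret if Reg_i(T) = o(T) against every possible
% (adversarial) sequence of opponent strategies.
```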

For general games, however, computing a Nash equilibrium is known to be PPAD-hard. It has been established that computation becomes more tractable in specific game classes such as potential games, where a potential function quantifies how individual strategy changes affect collective utility. An important result connecting no-regret learning and Nash equilibria is that the empirical frequency of the strategies played by a no-regret learning algorithm forms a correlated equilibrium. Yet even when the empirical frequency converges to an equilibrium, there is no guarantee that the dynamics themselves stabilize at any equilibrium point.
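
As a concrete illustration of the potential-game structure, the sketch below (a toy example of mine, not taken from our papers) builds a two-player congestion game and verifies Rosenthal's exact potential: every unilateral deviation changes the deviator's cost by exactly the change in the potential.

```python
# Minimal sketch: a two-player congestion game is an exact potential game.
# Each player picks one of two roads; the cost of a road equals the number of
# players using it. Rosenthal's potential sums, for each road, 1 + 2 + ... + (load).

from itertools import product

ROADS = ["A", "B"]

def load(profile, road):
    return sum(1 for choice in profile if choice == road)

def player_cost(profile, i):
    return load(profile, profile[i])          # linear congestion cost c_r(k) = k

def potential(profile):
    return sum(sum(range(1, load(profile, r) + 1)) for r in ROADS)

# Exact potential property: a unilateral deviation changes the deviator's cost
# by exactly the same amount as it changes the potential.
for profile in product(ROADS, repeat=2):
    for i in (0, 1):
        for new_road in ROADS:
            deviated = list(profile)
            deviated[i] = new_road
            deviated = tuple(deviated)
            d_cost = player_cost(deviated, i) - player_cost(profile, i)
            d_phi = potential(deviated) - potential(profile)
            assert d_cost == d_phi, (profile, i, new_road)

print("Rosenthal's potential matches every unilateral cost change.")
```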

During my time at Vector, we worked on two types of games. The first is potential games and their extension to Markov potential games. In this work, we introduced a variant of the Frank-Wolfe algorithm with exploration and recursive gradient estimation, which converges quickly to a Nash equilibrium (in the last-iterate sense) and ensures each player has sublinear regret. Our algorithm takes advantage of the ability of the Frank-Wolfe algorithm to control the duality gap of the game, which can be shown to be an upper bound on the gap to a Nash equilibrium. This allows us to directly analyze the ability of Frank-Wolfe to find Nash equilibria, without relying on the previously mentioned connection between no-regret learning and equilibria. To ensure efficient convergence to a Nash equilibrium, we use a recursive gradient estimator to reduce the estimation error. The result we obtain is an upper bound of O(T^{4/5}) on both the Nash regret and the regret, where T is the length of the time horizon (which implies an O(T^{-1/5}) convergence rate). This result extends to Markov potential games, which have important applications such as routing under traffic congestion.
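
For intuition on the duality gap that drives the analysis, here is a minimal Frank-Wolfe step for one player on the probability simplex, assuming exact gradient feedback. It omits the exploration and recursive gradient estimation that the bandit-feedback setting requires, so it should be read as a sketch of the core update, not as our algorithm.

```python
import numpy as np

def fw_step(x, grad, step_size):
    """One Frank-Wolfe update of a mixed strategy x, given the player's cost gradient."""
    # Linear minimization oracle over the probability simplex: the cheapest pure strategy.
    s = np.zeros_like(x)
    s[np.argmin(grad)] = 1.0
    # Frank-Wolfe (duality) gap; aggregated over players, it upper-bounds the
    # gap to a Nash equilibrium in the potential games discussed above.
    gap = float(grad @ (x - s))
    x_new = (1.0 - step_size) * x + step_size * s
    return x_new, gap

# Toy usage: one player with three actions and a fixed cost gradient.
x = np.full(3, 1.0 / 3.0)
grad = np.array([0.9, 0.2, 0.5])
for t in range(1, 51):
    x, gap = fw_step(x, grad, step_size=2.0 / (t + 2))
print(np.round(x, 3), round(gap, 4))  # probability mass concentrates on the cheapest action
```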

Beyond potential games, we extend our investigation to monotone and smooth games. This class encapsulates a wider array of common games that are not captured by potential games, such as two-player zero-sum games, convex-concave games, and zero-sum polymatrix games. Yet convergence of no-regret algorithms to Nash equilibria in monotone and smooth games had not been established unless one assumes exact gradient feedback and coordination among the players. In our work, we present an uncoupled mirror-descent-based algorithm designed to converge to the Nash equilibrium in general monotone and smooth games. To extend the classic mirror descent algorithm to monotone games, we make use of two regularizers: a self-concordant barrier regularizer to build an efficient ellipsoidal gradient estimator and cope with the bandit feedback, and a regularizer to accommodate utility functions that are monotone but not strongly monotone. In general monotone and smooth games, our algorithm achieves a last-iterate convergence rate of O(T^{-1/4}). In cases where the game exhibits strong monotonicity, our result improves to O(T^{-1/2}), matching the best available convergence rates for strongly monotone games.
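
The sketch below illustrates, for a single player on a box-shaped action set, the first ingredient just described: a one-point gradient estimator whose sampling ellipsoid is shaped by the Hessian of a self-concordant barrier, plugged into a simple descent update. The domain, cost function, and step sizes are illustrative assumptions of mine; the uncoupled multi-player machinery and the second regularizer for non-strongly-monotone games are not reproduced.

```python
import numpy as np

rng = np.random.default_rng(0)
d = 3                                    # action dimension; box-shaped domain [-1, 1]^d (assumed)

def barrier_hessian(x):
    # Hessian of the self-concordant log-barrier for the box:
    # R(x) = -sum_i [log(1 - x_i) + log(1 + x_i)].
    return np.diag(1.0 / (1.0 - x) ** 2 + 1.0 / (1.0 + x) ** 2)

def cost(a):
    # Stand-in bandit feedback: only this scalar is observed, never a gradient.
    return float(np.sum((a - 0.3) ** 2))

x = np.zeros(d)
eta, delta = 0.05, 0.2                   # learning rate and exploration radius (illustrative)
for t in range(2000):
    A = np.linalg.inv(np.linalg.cholesky(barrier_hessian(x)))  # shapes the sampling ellipsoid
    z = rng.standard_normal(d)
    z /= np.linalg.norm(z)                                     # uniform direction on the sphere
    a = x + delta * A @ z                                      # play a perturbed action
    g_hat = (d / delta) * cost(a) * np.linalg.solve(A.T, z)    # one-point gradient estimate
    x = np.clip(x - eta * g_hat, -0.9, 0.9)                    # simplistic stand-in for the mirror step
print(np.round(x, 2))                    # noisy, but tends toward the cost minimizer at 0.3
```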

Information mechanism design among strategic agents

Real-world scenarios are often mixed-motive: agents aim to advance their own interests by shaping the behavior of others. Approaches typically achieve this through incentive mechanisms (modifying payoffs), information mechanisms (modifying observations), or indirect methods (such as reputation systems and institutions). The problem we investigate is information mechanism design, which plays a significant role in modern economies, with estimates suggesting it accounts for 25% to 30% of GDP.

On the one hand, information design has great economic and societal value, spanning daily life (financing, insurance, price discrimination, routing, entertainment), workplace issues (grading, research procurement, employee feedback, contests), and societal topics (law enforcement deployment, financial sector stress tests, voter coalition formation). On the other hand, these topics are under-investigated in existing studies, which are largely limited to specific models of the problems, such as conversions to matrix games.

With the rise of LLMs, it is now possible to use natural language to describe and interpret the task and to interact with other agents. This motivates us to verbalize Bayesian persuasion (BP), a canonical formulation in information design, and to propose general-purpose solvers built on LLMs. Specifically, we map classic BP to a verbalized mediator-augmented game and propose a generalized equilibrium-finding algorithm in the prompt space. This is the first time BP has been extended to real-world games involving human dialogues. A classic matrix game in BP called “recommendation letter” describes how a professor could use recommendation letters to persuade HR to hire more of their students. Now, the professor in our framework writes an actual letter (see this letter on page 30 of our manuscript).
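
To see what the underlying matrix-game version of the “recommendation letter” example looks like before it is verbalized, here is a small numerical sketch. The numbers (a one-third prior of strong candidates and a hiring threshold of one half) are illustrative assumptions of mine, not taken from the manuscript: the professor commits to a signaling scheme, and HR hires only when the posterior probability of a strong candidate reaches the threshold.

```python
# Minimal sketch of the classic "recommendation letter" Bayesian persuasion game.
# Sender (professor) commits to a signaling scheme; receiver (HR) hires only if
# the posterior probability that the candidate is strong is at least 1/2.
# All numbers are illustrative assumptions.

p_strong = 1.0 / 3.0          # prior probability a candidate is strong

def hire_rate(q_recommend_weak):
    """Probability of a hire when all strong and a q-fraction of weak
    candidates receive a positive letter, assuming HR best-responds."""
    p_positive = p_strong + q_recommend_weak * (1.0 - p_strong)
    posterior_strong = p_strong / p_positive
    return p_positive if posterior_strong >= 0.5 else 0.0

# Sender-optimal commitment: recommend every strong candidate plus just enough
# weak candidates to keep the posterior exactly at HR's hiring threshold.
q_star = p_strong / (1.0 - p_strong)          # = 0.5 under these numbers
print(hire_rate(0.0))     # truthful letters: only the strong third gets hired (1/3)
print(hire_rate(q_star))  # optimal persuasion: the hire rate rises to 2/3
print(hire_rate(1.0))     # recommending everyone: HR ignores the letters, no hires
```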

Powered by LLMs, we also examined how BP's equilibrium outcomes differ from real-life outcomes. As one might expect, persuasion (and cheap talk as well) is much more honest in real life. The first reason for this, which we prove in our manuscript, is that BP is reducible to a bargaining game. The equilibrium of that game, which corresponds to a subgame perfect equilibrium in bargaining, is commonly rejected by the receiver (the one who listens to the persuasion) through a retaliatory strategy. Such retaliation is possible as long as the receiver is aware of the game structure, which, in practice, is usually the case. The second reason is alignment, where accurately providing information is related to merits such as honesty and integrity. The third is institutions, where receivers in practice are correlated through mechanisms like reputation systems. Interestingly, these are observed in our LLM experiments, where the persuasion behavior of LLMs showcases the emergence of bargaining in the process. The insights from our results suggest potential implications for reducing exploitation and enhancing equality for agents with an informational disadvantage. We are also excited to continue working on the persuasiveness of LLMs and their alignment.

Conclusion

During my time as a visiting researcher at Vector, I had the privilege of collaborating with a vibrant community of researchers. Whether it was regular in-person meetings, casual coffee chats, or focused discussions, the experience was both enjoyable and highly productive. I am grateful for the wonderful journey Vector enabled and for the amazing accommodations made by the Vector staff and community. Although my visit has concluded, our joint efforts persist. I am particularly excited to continue exploring game-theoretic settings of learning algorithms, aiming to deepen our understanding of how learning agents behave and interact. By better understanding learning agents in strategic environments, we can gain new insights into designing better algorithms and mechanisms, and into their societal and economic implications.

Collaborate with Vector researchers

Join a thriving community of over 860 researchers at the Vector Institute in Toronto. Our visiting researcher program offers state-of-the-art computing resources, dedicated AI engineering support, and rich collaborative opportunities for scholars on sabbatical or taking a gap year between academic milestones.
