Gerald Tesauro
Gerald Tesauro | |
---|---|
Nationality | American |
Alma mater | University of Maryland, College Park (B.S. Physics) Princeton University (Ph.D. Physics, 1986) |
Known for | TD-Gammon, IBM Watson |
Awards | Hertz Foundation Fellow (1980) Fellow of the AAAI (2013) Fellow of the ACM (2018) |
Scientific career | |
Fields | Artificial neural network, Reinforcement learning, Autonomic computing |
Institutions | IBM Research, University of Illinois Urbana-Champaign (postdoc) |
Thesis | Steady-State Dynamics and Selection Principles in Nonequilibrium Pattern-Forming Systems (1986) |
Doctoral advisor | Philip W. Anderson, Michael C. Cross |
Gerald J. "Gerry" Tesauro is an American computer scientist and a researcher at IBM, known for developing TD-Gammon, a backgammon program that taught itself to play at world-championship level through self-play and temporal-difference learning, an early success for reinforcement learning and neural networks. He subsequently worked on autonomic computing and multi-agent systems for e-commerce, and contributed to the game-strategy algorithms of IBM Watson.
Career
Education

Tesauro earned a B.S. in physics from the University of Maryland, College Park. He then pursued graduate studies in plasma physics at Princeton University, supported by a Hertz Foundation Fellowship starting in 1980.[1] He completed his Ph.D. in theoretical physics in 1986 under the supervision of Nobel laureate Philip W. Anderson.[2]
Backgammon
After completing his Ph.D., he undertook postdoctoral research at the Center for Complex Systems Research, University of Illinois at Urbana-Champaign.[3][4] During this period, he began applying neural networks to games, co-authoring a NeurIPS paper in 1987 with Terrence Sejnowski on a neural network that learned to play backgammon.[5] By the late 1980s, Tesauro joined IBM's Thomas J. Watson Research Center (IBM Research) as a research scientist, where he would spend several decades, eventually rising to the position of Principal Research Staff Member in AI Science.[1]
During the late 1980s, he developed Neurogammon, a backgammon program trained on expert human games using supervised learning. Neurogammon won the backgammon tournament at the 1st Computer Olympiad in 1989, demonstrating the potential of neural networks in game AI.[3]
He developed TD-Gammon between 1990 and 1998, using reinforcement learning, specifically temporal-difference (TD) learning. TD-Gammon learned through self-play, using a neural network to evaluate board positions and improving its strategy over millions of games. The program achieved world-championship-level play and could challenge top human players.[6] It is widely regarded as an early success of neural networks, machine learning, and reinforcement learning, and is often cited as a precursor in publications on later game-playing systems such as AlphaZero.[7]
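The learning rule at the heart of TD-Gammon can be sketched as follows. This is a toy illustration, not the original code: a linear evaluator over assumed features stands in for TD-Gammon's neural network, and the learning constants are arbitrary. After each move of a self-play game, the evaluator's prediction is nudged toward the value of the next position (or the final outcome), with eligibility traces spreading credit back over earlier positions.

```python
ALPHA, LAM, GAMMA = 0.1, 0.7, 1.0   # learning rate, trace decay, discount (assumed values)

def td_lambda_episode(weights, features_seq, final_reward):
    """Apply TD(lambda) updates to `weights` from one self-play game.

    features_seq: list of feature vectors, one per position visited.
    final_reward: game outcome (e.g. 1.0 for a win, 0.0 for a loss).
    """
    trace = [0.0] * len(weights)   # eligibility trace per weight

    def value(f):
        # Linear evaluator; TD-Gammon used a neural network here.
        return sum(w * x for w, x in zip(weights, f))

    for t, f in enumerate(features_seq):
        # Target is the next position's value, or the true outcome at game end.
        if t == len(features_seq) - 1:
            v_next = final_reward
        else:
            v_next = value(features_seq[t + 1])
        delta = GAMMA * v_next - value(f)          # TD error
        for i, x in enumerate(f):
            trace[i] = GAMMA * LAM * trace[i] + x  # decay trace, add gradient
            weights[i] += ALPHA * delta * trace[i]
    return weights
```

In actual self-play training, both sides' moves would be chosen by the same evaluator, and millions of such episodes gradually shape the position values.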
During this period, Tesauro also contributed to computer chess research at IBM, exploring machine learning methods for training evaluation functions, although the main Deep Blue project was led by others. Specifically, some linear evaluation-function weights were trained by discretized comparison training.[8] These weights primarily evaluated king safety.[9] Since 2010, he has also contributed to computer Go through work on the program Fuego.[3]
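The idea of comparison training can be illustrated with a small sketch (a simplified, perceptron-style version written for this article; the features and update rule are illustrative assumptions, not the exact published method): evaluation weights are adjusted so that the position an expert chose scores at least as high as each rejected alternative.

```python
def comparison_train(weights, pairs, lr=0.1, epochs=10):
    """Train a linear evaluator from expert preferences.

    pairs: list of (preferred_features, other_features) tuples, where the
    first position of each pair is the one the expert actually chose.
    """
    def score(f):
        return sum(w * x for w, x in zip(weights, f))

    for _ in range(epochs):
        for good, bad in pairs:
            if score(good) <= score(bad):   # expert's choice not ranked first
                # Perceptron-style correction toward the preferred position.
                for i in range(len(weights)):
                    weights[i] += lr * (good[i] - bad[i])
    return weights
```

The appeal of the approach is that it needs only relative judgments ("this move is better than that one") from game records, never absolute position values.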
E-commerce
In the late 1990s, Tesauro shifted his focus to multi-agent systems and their application in e-commerce, such as autonomous "pricebots": software agents designed to learn optimal pricing and bidding strategies in electronic marketplaces.[10] His methods included Q-learning of dynamic pricing strategies (e.g., cooperation or undercutting) in competitive environments.[11][12] This was an early application of multi-agent reinforcement learning to economic modeling and automated trading. He also explored applying neural networks to computer virus detection.[13]
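A minimal sketch of the pricebot idea (a toy model invented for illustration, not Tesauro and Kephart's actual setup): a Q-learning agent repeatedly posts a price against a rival with a fixed price, earns a profit only when it undercuts, and learns which price level maximizes its profit.

```python
import random

PRICES = [1, 2, 3, 4, 5]   # discrete price levels (assumed)
RIVAL_PRICE = 3            # the competing seller's fixed price (assumed)

def learn_pricing(episodes=5000, alpha=0.1, epsilon=0.1, seed=0):
    """Stateless (bandit-style) Q-learning over a discrete price menu."""
    rng = random.Random(seed)
    q = {p: 0.0 for p in PRICES}
    for _ in range(episodes):
        # Epsilon-greedy price choice: mostly exploit, sometimes explore.
        if rng.random() < epsilon:
            p = rng.choice(PRICES)
        else:
            p = max(q, key=q.get)
        # Buyers purchase from the cheaper seller; profit equals our price.
        profit = float(p) if p < RIVAL_PRICE else 0.0
        q[p] += alpha * (profit - q[p])   # incremental Q-update
    return q
```

The learned policy undercuts the rival by the smallest possible amount, the kind of equilibrium behavior (and, in full multi-agent versions, price-war cycles) studied in this line of work.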
Autonomic computing
From the early 2000s, Tesauro became a key contributor to IBM's autonomic computing initiative, which aimed to create self-managing IT systems. He applied reinforcement learning to automate tasks like resource allocation, performance tuning, and power management in data centers and distributed systems. Examples include multiple cooperating RL agents that learned to optimize server resources (CPU, memory, power) to meet performance goals or minimize energy consumption.[14][15][16][17]
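The flavor of utility-function-driven allocation in this work can be sketched as follows (a deliberately simplified illustration with made-up utility functions, not IBM's actual system): servers are handed out one at a time to whichever application gains the most utility from the next server.

```python
def allocate_servers(total, utilities):
    """Greedy utility-driven allocation of discrete servers.

    utilities: dict mapping application name to a function that returns
    the application's utility for a given server count.
    """
    alloc = {app: 0 for app in utilities}
    for _ in range(total):
        # Give the next server to the app with the largest marginal gain.
        best = max(utilities,
                   key=lambda a: utilities[a](alloc[a] + 1) - utilities[a](alloc[a]))
        alloc[best] += 1
    return alloc
```

With concave (diminishing-returns) utilities this greedy rule is optimal; the published systems went further, learning the value functions online with reinforcement learning rather than assuming them.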
Tesauro is listed as an inventor on numerous U.S. patents, largely focused on autonomic computing and AI applications for systems management, filed primarily between 2004 and 2007. These include methods for reward-based learning of system policies, utility-based dynamic resource allocation, and autonomic model transfer in computing systems.[18]
IBM Watson
Around 2009, Tesauro joined the IBM Research team, led by David Ferrucci,[3] that developed IBM Watson, the question-answering system famous for defeating human champions Ken Jennings and Brad Rutter on the quiz show Jeopardy! in 2011.
Tesauro focused on Watson's game strategy components, including algorithms for buzzer timing, clue selection, and wagering decisions (especially for Daily Doubles and Final Jeopardy!). He and colleagues developed a Game State Evaluator and used simulation-based optimization, employing techniques from Bayesian inference, game theory, dynamic programming, and reinforcement learning to refine Watson's strategic play. These strategic algorithms contributed significantly to Watson's success, enabling it to manage risk effectively and make near-optimal wagering decisions.[19][20][21]
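The wagering problem can be sketched in miniature (a toy model written for this article, not Watson's actual algorithm): given an estimate of answering the clue correctly and a game-state evaluator mapping scores to win probability, search the legal wagers for the one maximizing expected win probability.

```python
def best_wager(score, p_correct, win_prob, min_bonus=1000, step=100):
    """Pick the Daily Double wager maximizing expected win probability.

    win_prob: function mapping a (post-wager) score to estimated P(win),
    standing in for a learned game-state evaluator.
    """
    # A player may wager up to their score (a "true Daily Double"),
    # or up to a small fixed amount if their score is lower.
    limit = max(score, min_bonus)
    candidates = range(0, limit + 1, step)

    def expected(w):
        return (p_correct * win_prob(score + w)
                + (1 - p_correct) * win_prob(score - w))

    return max(candidates, key=expected)
```

With a confident evaluator and high answer accuracy this prefers large wagers; with low accuracy or a risk-sensitive evaluator it bets small, which matches the qualitative behavior described in the published analysis.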
During this time, Tesauro also continued research in core AI algorithms, co-authoring a paper on Monte Carlo Simulation Balancing with David Silver (later of DeepMind) at ICML 2009.[22] After Watson, Tesauro continued research at IBM, on areas such as deep reinforcement learning,[23] hierarchical RL, multi-agent systems,[24] and continual learning.[25]
Honors and awards
- Hertz Foundation Fellow (Class of 1980)[1]
- Fellow of the Association for the Advancement of Artificial Intelligence (AAAI), elected 2013, "For significant contributions to neural computation, game-playing (Backgammon, Chess and Jeopardy!), autonomic computing, and economic agents."[26]
- Fellow of the Association for Computing Machinery (ACM), elected 2018, "for contributions to reinforcement learning, neural networks, and intelligent autonomous agents."[27]
References
- ^ a b c "Gerald Tesauro". Hertz Foundation. Retrieved 2025-05-12.
- ^ "Gerald Tesauro". Mathematics Genealogy Project. https://mail.mathgenealogy.org/id.php?id=268642
- ^ a b c d "Gerald Tesauro - Chess Programming Wiki". www.chessprogramming.org. Retrieved 2025-05-12.
- ^ Tesauro, Gerald. Neural Network Defeats Creator in Backgammon Match. Center for Complex Systems Research, University of Illinois, 1988.
- ^ Tesauro, Gerald; Sejnowski, Terrence J. (1987). "A 'neural' network that learns to play backgammon". In Anderson, Dana Z. (ed.). Neural information processing systems. Neural Information Processing Systems. pp. 794–803. ISBN 0-88318-569-5. Archived from the original (PDF) on 2003-07-02.
- ^ Tesauro, Gerald (2002). "Programming backgammon using self-teaching neural nets". Artificial Intelligence. 134 (1–2): 181–199. doi:10.1016/S0004-3702(01)00110-2.
- ^ Silver, David; Schrittwieser, Julian; Simonyan, Karen; Antonoglou, Ioannis; Huang, Aja; Guez, Arthur; Hubert, Thomas; Baker, Lucas; Lai, Matthew; Bolton, Adrian; Chen, Yutian; Lillicrap, Timothy; Hui, Fan; Sifre, Laurent; van den Driessche, George (October 2017). "Mastering the game of Go without human knowledge". Nature. 550 (7676): 354–359. Bibcode:2017Natur.550..354S. doi:10.1038/nature24270. ISSN 1476-4687. PMID 29052630.
- ^ Tesauro, Gerald (1988). "Connectionist Learning of Expert Preferences by Comparison Training". Advances in Neural Information Processing Systems. 1. Morgan-Kaufmann.
- ^ Tesauro, Gerald (2001-01-01), "Comparison training of chess evaluation functions", Machines that learn to play games, USA: Nova Science Publishers, Inc., pp. 117–130, ISBN 978-1-59033-021-0
- ^ Greenwald, Amy R.; Kephart, Jeffrey O.; Tesauro, Gerald J. (November 1999). "Strategic pricebot dynamics". Proceedings of the 1st ACM conference on Electronic commerce. ACM. pp. 58–67. doi:10.1145/336992.337008. ISBN 978-1-58113-176-5.
- ^ Tesauro, Gerald; Kephart, Jeffrey O. (2002-09-01). "Pricing in Agent Economies Using Multi-Agent Q-Learning". Autonomous Agents and Multi-Agent Systems. 5 (3): 289–304. doi:10.1023/A:1015504423309. ISSN 1573-7454.
- ^ Tesauro, Gerald J.; Kephart, Jeffrey O. (March 2000). "Foresight-based pricing algorithms in agent economies". Decision Support Systems. 28 (1–2): 49–60. doi:10.1016/S0167-9236(99)00074-3.
- ^ Tesauro, Gerald; Kephart, Jeffrey O.; Sorkin, Gregory B. (1997). "Neural networks for computer virus recognition". IEEE Expert: Intelligent Systems and Their Applications. 11 (4): 5–6. doi:10.1109/64.511768.
- ^ Kephart, Jeffrey O.; Chan, Hoi; Das, Rajarshi; Levine, David W.; Tesauro, Gerald; Rawson, Freeman; Lefurgy, Charles (June 2007). "Coordinating Multiple Autonomic Managers to Achieve Specified Power-Performance Tradeoffs". Fourth International Conference on Autonomic Computing (ICAC'07). p. 24. doi:10.1109/ICAC.2007.12. ISBN 978-0-7695-2779-6.
- ^ Tesauro, Gerald (2007). "Reinforcement Learning in Autonomic Computing: A Manifesto and Case Studies". IEEE Internet Computing. 11 (1): 22–30. doi:10.1109/MIC.2007.21. ISSN 1089-7801.
- ^ Tesauro, G.; Jong, N.K.; Das, R.; Bennani, M.N. (2006). "A Hybrid Reinforcement Learning Approach to Autonomic Resource Allocation". 2006 IEEE International Conference on Autonomic Computing. IEEE. pp. 65–73. doi:10.1109/ICAC.2006.1662383. ISBN 978-1-4244-0175-8.
- ^ Tesauro, G.; Das, R.; Walsh, W.E.; Kephart, J.O. (2005). "Utility-Function-Driven Resource Allocation in Autonomic Systems". Second International Conference on Autonomic Computing (ICAC'05). IEEE. pp. 342–343. doi:10.1109/ICAC.2005.65. ISBN 0-7695-2276-9.
- ^ "Patents by Inventor Gerald Tesauro". Justia Patents. Retrieved 2025-05-12.
- ^ "ISR Distinguished Lecture: How Watson Learns Superhuman Jeopardy! Strategies". Institute for Systems Research, University of Maryland. 2011-11-07. Retrieved 2025-05-12.
- ^ Tesauro, G.; Gondek, D. C.; Lenchner, J.; Fan, J.; Prager, J. M. (2013-05-31). "Analysis of Watson's Strategies for Playing Jeopardy!". Journal of Artificial Intelligence Research. 47: 205–251. arXiv:1402.0571. doi:10.1613/jair.3834. ISSN 1076-9757.
- ^ "Day 2 of the Watson Challenge… and the Daily Double Controversy?". Howell's Gameshow Gallery. 2011-02-16. Retrieved 2025-05-12.
- ^ Silver, David; Tesauro, Gerald (2009-06-14). "Monte-Carlo simulation balancing". Proceedings of the 26th Annual International Conference on Machine Learning. ACM. pp. 945–952. doi:10.1145/1553374.1553495. ISBN 978-1-60558-516-1.
- ^ Machado, Marlos C.; Rosenbaum, Clemens; Guo, Xiaoxiao; Riemer, Matthew; Tesauro, Gerald; Campbell, Murray (2018). "Eigenoption Discovery through the Deep Successor Representation". International Conference on Learning Representations (ICLR) 2018.
- ^ Kim, Dong-Ki; Liu, Miao; Riemer, Matthew; Sun, Chuangchuang; Abdulhai, Marutaro; Habibi, Golnaz; Srinivasan, Vikram; Tesauro, Gerald; How, Jonathan P. (2022). "Influencing Long-Term Behavior in Multiagent RL". In Oh, Alice H. (ed.). Advances in Neural Information Processing Systems (NeurIPS) 35. pp. 31914–31927.
- ^ Riemer, Matthew; Cases, Ignacio; Ajemian, Robert; Liu, Miao; Rish, Irina; Tu, Yuhai; Tesauro, Gerald (2019). "Learning to Learn without Forgetting by Maximizing Transfer and Minimizing Interference". International Conference on Learning Representations (ICLR) 2019.
- ^ "Elected AAAI Fellows". AAAI. Retrieved 2025-05-12.
- ^ "Dr. Gerald Tesauro". awards.acm.org. Retrieved 2025-05-12.
External links
- Gerald Tesauro page on Chess Programming Wiki.
- Gerald Tesauro bibliography at DBLP.