
Summaries
The prevailing methods to make large language models more powerful and amenable have been based on continuous scaling up (that is, increasing their size, data volume and computational resources [1]) and bespoke shaping up (including post-filtering [2,3], fine-tuning or use of human feedback [4,5]). However, larger and more instructable large language models may have become less reliable. By studying the relationship between difficulty concordance, task avoidance and prompting stability of several language model families, here we show that easy instances for human participants are also easy for the models, but scaled-up, shaped-up models do not secure areas of low difficulty in which either the model does not err or human supervision can spot the errors. We also find that early models often avoid user questions but scaled-up, shaped-up models tend to give an apparently sensible yet wrong answer much more often, including errors on difficult questions that human supervisors frequently overlook. Moreover, we observe that stability to different natural phrasings of the same question is improved by scaling-up and shaping-up interventions, but pockets of variability persist across difficulty levels. These findings highlight the need for a fundamental shift in the design and development of general-purpose artificial intelligence, particularly in high-stakes areas for which a predictable distribution of errors is paramount.
By Lexin Zhou, Wout Schellaert, Fernando Martínez-Plumed, Yael Moros-Daval, Cèsar Ferri & José Hernández-Orallo in the text Larger and more instructable language models become less reliable (2024)
This scientific journal article mentions ...
Persons: Sandhini Agarwal, Dario Amodei, Amanda Askell, Maria Bannert, Christopher Berner, Tamay Besiroglu, Tom B. Brown, Mark Chen, Szu Yu Chen, Benjamin Chess, Rewon Child, Jack Clark, M. J. Crockett, Daryna Dementieva, Kewal Dhariwal, Prafulla Dhariwal, Frank Fischer, Urs Gasser, Scott Gray, Georg Groh, Stephan Günnemann, Lennart Heim, Tom Henighan, Ariel Herbert-Voss, Christopher Hesse, Anson Ho, Marius Hobbhahn, Eyke Hüllermeier, Jared Kaplan, Gjergji Kasneci, Enkelejda Kasneci, Gretchen Krueger, Stephan Krusche, Stefan Küchemann, Jochen Kuhn, Gitta Kutyniok, Mateusz Litwin, Benjamin Mann, Sam McCandlish, Lisa Messer, Tilman Michaeli, Arvind Neelakantan, Claudia Nerdel, OpenAI, Jürgen Pfeffer, Oleksandra Poquet, Alec Radford, Aditya Ramesh, Nick Ryder, Michael Sailer, Girish Sastry, Kevin Schaul, Albrecht Schmidt, Tina Seidel, Kathrin Sessler, Jaime Sevilla, Pranav Shyam, Eric Sigler, Matthias Stadler, Melanie Subbiah, Ilya Sutskever, Nitasha Tiku, Pablo Villalobos, Jochen Weller, Clemens Winter, Jeffrey Wu, Daniel M. Ziegler
This scientific journal article presumably does not mention ...
Terms not mentioned: Chat-GPT, Generative Pretrained Transformer 3 (GPT-3), GMLS & Bildung
Full text of this document
Beat and this scientific journal article
Beat added this scientific journal article to Biblionetz within the last six months. He recorded it once and has not edited it since. Beat owns no physical copy, but a digital one. A digital version is available on the internet (see above). So far, only a few objects in Biblionetz cite this work.