Data, data, everywhere - enough for AGI?

"The Cognitive Revolution" | AI Builders, Researchers, and Live Player Analysis

Content provided by Turpentine, Erik Torenberg, and Nathan Labenz. All podcast content, including episodes, graphics, and podcast descriptions, is uploaded and provided directly by Turpentine, Erik Torenberg, and Nathan Labenz or their podcast platform partner. If you believe someone is using your copyrighted work without your permission, you can follow the process outlined here: https://hu.player.fm/legal.

1 month ago · 1:01:40

In this episode, Nathan and Nick dive deep into the data requirements for achieving Artificial General Intelligence. They explore the current paradigms, the role of data in approximating intelligence, and the scaling trends for GPT models. The discussion covers various datasets, from email and Twitter to YouTube and genomic data, as they analyze the feasibility of reaching the target of 100 trillion high-quality tokens. The bull case suggests an abundance of data, while the bear case highlights the limits on high-quality data. That tension prompts an exploration of what makes data good for AI and the potential for AI to generate its own data.

Chapters

(00:00) Introduction

(05:04) Scaling Hypothesis of Intelligence

(07:32) Is There Enough High Quality Data?

(10:19) Algorithms Impacting Data Requirements

(17:42) Sponsor: Omneky

(18:04) Estimating High Quality Token Requirements

(24:07) Astronomy and YouTube Data Scale

(29:42) Genomics Data

(37:58) Sponsors: Brave / Plumb / Squad

(41:16) Code Datasets and Synthetic Data

(45:48) The Bear Case: Quality and Usability of Data

(50:54) Investment Trends and Compute Efficiency

(54:19) Training Run

(57:21) Synthetic Data Generation and Self-Play

135 episodes

#Tech #Society #Entrepreneur #Business #Turpentine #Erik Torenberg #Nathan Labenz #AI #Artificial Intelligence #Founders