A group of prominent intellectual property law professors has weighed in on the high-stakes AI copyright battle between several authors and Meta. In an amicus brief, the scholars argue that using copyrighted content as training data can be considered fair use under U.S. copyright law, if the goal is to create a new and ‘transformative’ tool. This suggests that fair use could potentially apply to Meta’s training process, even if the underlying data was obtained without permission.
This case has a clear piracy angle, as Meta used BitTorrent to download archives of pirated books to use as training material. Notably, the authors argue that Meta not only copied pirated books from Anna’s Archive and Z-Library, but that, through the same BitTorrent process, it also uploaded pirated books to third parties.
This week, a group of IP Law Professors submitted a “friend of the court” or amicus brief, backing Meta’s fair use defense. The professors, including scholars from Harvard, Emory, Boston University, and Santa Clara University, have different views on the impact of AI but are united in their copyright stance.
The brief stresses that Meta’s alleged use of pirated books as training data can be considered fair use. The source of the training data is not determinative, as long as it’s used to create a new and transformative product, they argue.
“The case law, including binding circuit precedent, holds that internal copying, made in the course of creating new knowledge, is a transformative use that is heavily favored by fair use doctrine,” the professors write.
The professors’ argument is centered around the concept of “transformative use.” They note that using books outside their original ‘reading’ purpose to create an AI model transforms the purpose of the use. This internal copying, they argue, falls into a category courts have consistently recognized as fair use, also known as “non-expressive use”.
The amicus brief cites several cases to back up their line of reasoning. This includes the Perfect 10 v. Amazon lawsuit, where the Ninth Circuit found that it was fair use when Google created thumbnails using images copied from unauthorized “pirate” sites, because the resulting image search tool was transformative.
The authors cited conflicting cases, but the professors note that the cases where fair use was denied typically involved infringement for personal consumption, rather than the use of content to create something new.
The brief distinguishes this case from those cited by the plaintiffs, which involved unauthorized copying for direct consumptive use (e.g., downloading for personal enjoyment). In contrast, Meta’s internal copies were allegedly not perceived by humans but used to build a new tool.
“Fair use, like copyright as a whole, ‘is not a privilege reserved for the well behaved’,” the brief notes. “Fair use doctrine should focus on the consequences of a ruling for knowledge and expression. Other considerations should be left for other legal regimes.”
Other countries, including Japan, have reportedly crafted exceptions in their law to allow tech companies to train LLMs on copyrighted material, without permission.
The U.S. has no such exceptions, but the professors urge the court to consider fair use. As the VCR and other innovations showed, copyright shouldn’t stand in the way of new tools and developing technologies.
LLMs don’t ‘remember the entire contents of each book they read’. The data are used to train the LLM’s predictive capabilities for sequences of words (or, more accurately, tokens). In a sense, the model develops a lossy statistical model of its training data, not a literal database. LLMs also use a stochastic sampling process, which means you’ll get different results each time you ask any given question, not a deterministic regurgitation of ‘read texts’. This is why it’s a transformative process, and also why LLMs can hallucinate nonsense.
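To make the idea concrete, here is a minimal toy sketch of the principle (nothing like a real LLM, and the tiny bigram table and function names are invented for illustration): the “model” only stores learned probabilities for what token tends to follow another, and generation samples from those probabilities, so repeated runs can produce different continuations.

```python
import random

# Hypothetical toy "model": probabilities of the next token given the current
# one, as might be estimated from training text. This is a lossy statistical
# summary of the text, not a stored copy of it.
bigram_probs = {
    "the": {"cat": 0.5, "dog": 0.3, "mat": 0.2},
    "cat": {"sat": 0.6, "ran": 0.4},
    "dog": {"ran": 0.7, "sat": 0.3},
}

def sample_next(context: str, rng: random.Random) -> str:
    """Stochastically pick the next token from the learned distribution."""
    dist = bigram_probs[context]
    tokens = list(dist)
    weights = [dist[t] for t in tokens]
    # random.choices does weighted sampling: different seeds/runs can
    # yield different tokens, unlike a deterministic database lookup.
    return rng.choices(tokens, weights=weights, k=1)[0]

# Two differently seeded runs may continue "the" differently.
print(sample_next("the", random.Random(1)))
print(sample_next("the", random.Random(7)))
```

Real LLMs do the same thing at vastly larger scale, predicting one token at a time from context with a neural network instead of a lookup table, but the sampling step is why outputs vary between runs.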
This stuff is counter-intuitive. Below is a very good, in-depth explanation that really helped me get a sense of how these things work. Highly recommended if you can spare the 3 hours (!):
https://www.youtube.com/watch?v=7xTGNNLPyMI&list=PLMtPKpcZqZMzfmi6lOtY6dgKXrapOYLlN