Discover more from Latent Space
The AI Engineer newsletter + Top 10 US Tech podcast. Exploring AI UX, Agents, Devtools, Infra, Open Source Models. See https://latent.space/about for highlights from Chris Lattner, Andrej Karpathy, George Hotz, Simon Willison, Emad Mostaque, et al!
The Latent Space crew will be at NeurIPS on Tuesday! Reach out with any parties and papers of interest. We have also been incubating a smol daily AI Newsletter and Latent Space University is making progress.
If ChatGPT was introduced as a low-key research preview in Nov 2022, its 1-year anniversary was as high-key as it gets, starting with the first Dev Day (our coverage here) and ending with leadership drama that amounted to “agi delayed four days”.
We’re already tired of the nonstop coverage so we’ll refrain, but if you want good 1-year ChatGPT retrospectives, we recommend those from our friends at ThursdAI and the AI Breakdown, as well as traditional media like the Verge.
A nice paper from NTU Singapore recaps the evolution in open source LLMs:
However, one underrated perspective on ChatGPT’s impact that we like comes from the slowly rising AI background noise we all have to contend with on a day-to-day basis, which is going very rapidly from amusing to alarming.
It is likely that history will look back at Nov 2022 as the last time we were able to easily obtain low-background tokens:
After returning from the Manhattan Project, Willard Libby continued exploring the implications of radioactivity in the atmosphere, inventing radiocarbon dating in 1946, for which he received the 1960 Nobel Prize in Chemistry.
Libby immediately recognized that tools which measure very low levels of radioactivity must themselves be low-radioactivity, and that since the Trinity nuclear bomb in 1945, our atmosphere has had elevated levels of background radiation, causing high demand for pre-WW2 “low-background” steel for everything from Geiger counters to photonics equipment.
We can make a similar analogy for “low-background tokens”.
As LLMs became both viable and easily available, the amount of background AI-generated noise rose noticeably in the past year:
Elon Musk’s Grok AI, announced Nov 3rd, is among the latest of many LLMs trained post-ChatGPT that now exhibit influence from OpenAI output.
The solution cannot simply be a “search and delete of all mentions of OpenAI”, because then you’d end up with a model that doesn’t know what OpenAI is.
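To make the trade-off concrete, here is a minimal sketch of the two blunt options: the naive "search and delete", and a date cutoff that keeps only "low-background" documents. The `Doc` type, both function names, and the three sample documents are hypothetical, and the cutoff uses ChatGPT's public release date (Nov 30, 2022) as an assumption about where the "background" starts rising.

```python
from dataclasses import dataclass
from datetime import date

# Assumption: ChatGPT's public release date serves as the "low-background" cutoff.
CUTOFF = date(2022, 11, 30)

@dataclass
class Doc:
    text: str
    published: date

def naive_scrub(docs):
    """Drop every document that mentions OpenAI at all."""
    return [d for d in docs if "openai" not in d.text.lower()]

def low_background(docs, cutoff=CUTOFF):
    """Keep only documents published before the cutoff."""
    return [d for d in docs if d.published < cutoff]

# Hypothetical corpus: one factual page, one AI-generated reply, one unrelated post.
docs = [
    Doc("OpenAI is an AI research lab founded in 2015.", date(2021, 5, 1)),
    Doc("As a large language model trained by OpenAI, I cannot...", date(2023, 6, 1)),
    Doc("A recipe for sourdough bread.", date(2023, 7, 1)),
]

scrubbed = naive_scrub(docs)   # loses the factual page about OpenAI
clean = low_background(docs)   # loses the human-written 2023 recipe
print([d.text for d in scrubbed])
print([d.text for d in clean])
```

Neither filter is satisfying: the first removes knowledge along with contamination, and the second throws away everything written after the cutoff, human or not.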
Nov 2023 was also the month we started putting LLMs online – not merely adding Web Search (as was done in Sept), nor updating the knowledge cutoff (first to Jan 2022 then to Apr 2023) – but making live online search an integral part of the experience, with Grok boasting this feature and Perplexity releasing 7b and 70b online LLMs, referencing the FreshLLMs paper, closing the loop for AI consuming AI-created content in ever-faster fashion.
While rising AI concentrations in our content atmosphere are a concern, we should bear in mind that humans are perfectly capable of spouting industrial quantities of “complete nonsense” as well – this was also the month that Q* mass hysteria swept the AI-content-creator industrial complex, hallucinating days’ and books’ worth of video and essays based on a “leak” of a single letter codename, gleefully egged on by 4chan (We do take some pride in calling out the frontier of Long Inference a few months before it was cool, but not much).
While it may be impossible to screen out AI-generated content in massive datasets, many of us believe that we can still spot AI content in isolated, individual cases, particularly in 1:1 communication. It might be tempting to conclude that higher-bandwidth content, like voice and video, should trip our AI spidey senses even more easily. However, it might counterintuitively be easier to be convincing in those form factors, because there is more data to learn from:
A video example from Pika Labs is less alarming, but it does show editing of a real video.
We are not far off from having to worry about background “radiation” in our video and audio content as well.
The last two generative AI trends this month were great illustrations of multimodal AI:
Consistency models: Realtime Stable Diffusion was teased in Jan/Feb of this year, but OpenAI only published the Consistency Models paper in March, quickly followed by LCM and LCM-LoRA (discussed in the Latent Space Discord Paper Club). This was the final breakthrough that let both Krea and TLDraw put realtime generation in their products, powered by fal.ai, which is now generating images at 10fps with (upcoming guest!) TLDraw.
Not to be outdone, Stability AI also released SDXL Turbo, which uses a different “Adversarial Diffusion Distillation” technique (paper here, claims to outperform LCMs) to do realtime images, in the same month.
GPT4V Coding: TLDraw was also at the heart of the other big trend of the month, TLDraw’s Make It Real. We cover this in further detail in our upcoming podcast, but you can read the synopsis on their Substack.
It was a month packed with drama and we are still recovering from the incredible pace of news. Personally, the conversations about the LLM OS and the June to November evolution from human to systems analogies (aka the Sour Lesson) have been the biggest perspective shift.
Nov 3 — Beating GPT-4 with Open Source LLMs — with Michael Royzen of Phind
Nov 7/8 — AGI is Being Achieved Incrementally (OpenAI DevDay Recap)
Full Duration (Live audio, a bit noisy)
Cleaned up Audio (Part 2 only: h/t to Klaus Breyer)
Nov 17 — The State of Silicon and the GPU Poors – with Dylan Patel of SemiAnalysis
Nov 18 — The End of OpenAI Hegemony
Nov 29 — Notebooks=Chat++ and RAG=RecSys! — with Bryan Bischof of Hex Magic
And last month’s recap if you missed it: The New Kings of Open Source AI (Oct 2023 Recap)
Below are the raw notes from which I draw everything above. You can always see the raw material on GitHub, and of course I can’t see everything, so check the other newsletters and podcasters that do.
As always, not everything that we’ve taken note of will make it into this post. To see the entire list of notes for November, check out the ‘AI Notes’ GitHub repo.
We won’t be covering the sama drama in this post. You all know what happened.
DevDay (Nov 6) – Opening Keynote is a must watch if you missed it
Announced 100m MAU, later revealed to be 14m DAU
The notable highlights include the release of Whisper v3, a new-generation open-source ASR model, and the progress made in integrating OpenInterpreter. There was buzz around GPT-4 Turbo due to its 128k context, but with mixed reviews (1, 2) regarding its claimed superiority over GPT-4.
GPTs: Custom versions of ChatGPT are emerging. These were some notable ones we saw following the announcement. Greg Brockman’s highlight.
Typist for enhanced typing assistance
and DesignerGPT for web design
Simon Willison’s Dejargonizer and “ChatGPT Classic”
While they are exciting, jailbreaking them is easy. Files uploaded to a GPT can also be downloaded by its users, which caused a Levels.fyi data leak (response from founder).
Here are prompts for many many other GPTs.
JSON Mode
JSON mode integration has been a topic of interest, with the goal of improving (but not guaranteeing) schema matching.
‘functions’ parameter deprecated without much notice.
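Since JSON mode is described above as improving but not guaranteeing schema matching, a caller would still want to validate replies. The following is a minimal sketch using only the standard library; `parse_and_check`, `SCHEMA`, and the sample replies are hypothetical names and data for illustration, not part of any OpenAI SDK.

```python
import json

def parse_and_check(raw: str, required: dict) -> dict:
    """Parse a model reply and check required keys and their types.

    `required` maps key name -> expected Python type (a hypothetical mini-schema).
    """
    try:
        data = json.loads(raw)
    except json.JSONDecodeError as exc:
        raise ValueError(f"reply is not valid JSON: {exc}") from exc
    if not isinstance(data, dict):
        raise ValueError("expected a JSON object at the top level")
    for key, expected in required.items():
        if key not in data:
            raise ValueError(f"missing key: {key}")
        if not isinstance(data[key], expected):
            raise ValueError(f"{key} should be {expected.__name__}")
    return data

SCHEMA = {"name": str, "stars": int}
good = '{"name": "Typist", "stars": 5}'
bad = '{"name": "Typist", "rating": "five"}'  # valid JSON, wrong shape

print(parse_and_check(good, SCHEMA))
try:
    parse_and_check(bad, SCHEMA)
except ValueError as err:
    print("rejected:", err)
```

The second reply parses fine and still fails the check, which is the gap between "valid JSON" and "matches my schema". In practice a rejected reply would typically trigger a retry.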
Assistants API – open source clone here
Nov 8: Major outages across ChatGPT and API
also
discussing GPT3.5 issues
Occasional reports of GPT4 nerfing from biased individuals and on the OpenAI discord
Nov 14: Sama paused ChatGPT+ signups due to demand – being sold on Ebay
Nov 30: Deno SDK
2 interesting factoids of upcoming capabilities:
improved memory in ChatGPT
will be releasing usage tracking based on API key for the OpenAI API
Claude 2.1: offers an industry-leading 200K token context window (over 500 pages), a 2x decrease in hallucination rates, system prompts, tool use, and updated pricing.
https://www.anthropic.com/index/claude-2-1
model card
$1k test showing declining utilization of the 200k context (“skill issue”)
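As a rough sanity check on these context sizes, the "200K tokens, over 500 pages" figure implies at most about 400 tokens per page. The sketch below reuses that ratio to convert other context windows into pages; `pages_that_fit` is a hypothetical helper, and real token counts per page vary with the tokenizer and the text.

```python
# Ratio implied by the "200K tokens, over 500 pages" figure quoted above.
TOKENS_PER_PAGE = 200_000 / 500  # 400.0

def pages_that_fit(context_tokens: int, tokens_per_page: float = TOKENS_PER_PAGE) -> int:
    """Estimate how many whole pages fit in a context window."""
    return int(context_tokens // tokens_per_page)

print(pages_that_fit(128_000))  # 320
print(pages_that_fit(200_000))  # 500
```

This only says how much text fits, not how well the model uses it, which is what the utilization test above is probing.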
Inflection 2
https://news.ycombinator.com/item?id=38380377
5,000 NVIDIA H100 GPUs in fp8 mixed precision for ~10²⁵ FLOPs. guess is 300b model on 5T tokens
slightly better than llama-2
1% of rumored GPT5 compute
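The guessed model size can be checked against the compute figure with the common rule of thumb that training takes roughly 6 FLOPs per parameter per token. That rule is an assumption brought in here, not something stated in the notes, and `training_flops` is a hypothetical helper name.

```python
def training_flops(params: float, tokens: float) -> float:
    """Approximate training compute with the 6 * N * D rule of thumb."""
    return 6.0 * params * tokens

# The guess from the notes: a 300b-parameter model trained on 5T tokens.
estimate = training_flops(300e9, 5e12)
print(f"{estimate:.1e}")  # 9.0e+24

# If ~1e25 FLOPs is 1% of the rumored budget, the rumor implies about 1e27 FLOPs.
rumored_total = 1e25 / 0.01
print(f"{rumored_total:.0e}")
```

Under that assumption the guess lands at about 9 x 10^24 FLOPs, consistent with the ~10²⁵ figure.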
Yarn-Mistral-7b-128k:
4x more context than GPT-4. Open-source is the new long-context king! This thing can easily fit entire books in a prompt.
tweet
Yi-34B from 01.AI released
with 100B rumored soon
Orca 2 https://arxiv.org/abs/2311.11045
Orca 1 learns from rich signals, such as explanation traces, allowing it to outperform conventional instruction-tuned models on benchmarks like BigBench Hard and AGIEval.
In Orca 2, we continue exploring how improved training signals can enhance smaller LMs’ reasoning abilities.
In Orca 2, we teach the model various reasoning techniques (step-by-step, recall then generate, recall-reason-generate, direct answer, etc.).
Orca 2 significantly surpasses models of similar size and attains performance levels similar to or better than those of models 5-10x larger, as assessed on complex tasks that test advanced reasoning abilities in zero-shot settings.
Amazon Mistral – 32k
Qwen-72B and Qwen-1.8B release: 32K context, trained on 3T tokens,
Copyright for syndicated content belongs to the linked Source : Hacker News – https://www.latent.space/i/139368545/the-concept-of-low-background-tokens