AI Daily — 2026-05-20



Covering 33 AI news items

🔥 Top Stories

1. General-Purpose AI Solves Major Open Mathematical Problem

A general-purpose AI model reportedly solved a major open problem in mathematics, marking a milestone for AI-driven discovery. The post welcomes AI expanding human understanding while acknowledging mixed feelings about it. Timothy Gowers cautions mathematicians to brace themselves before reading further. Source-twitter

2. Gemini 3.5 Flash Claims Superior Coding and Speed

Google markets Gemini 3.5 Flash as a significant step up from Gemini 3.1 Pro in coding and agentic tasks. Google claims it is up to 4x faster than frontier models and 12x faster in Antigravity, running at 800 tokens/sec at a lower cost. The post encourages users to try it via Antigravity, GeminiApp, and related channels. Source-twitter

3. AI Solves Erdős Unit Distance Problem, Disproves Conjecture

An OpenAI model is claimed to have solved the long-standing unit distance problem, one of Erdős’s best-known questions in discrete geometry. The result purportedly disproves a central conjecture in the field, which would mark a notable milestone for AI-assisted mathematics. The claim circulated on social media. Source-twitter

📰 Featured

AI Tools

Google’s AI Studio Mobile coming to app stores — Google has announced AI Studio Mobile, a mobile version of its AI Studio platform that lets users build their ideas on the go. It is coming soon to app stores, and the announcement mentions features such as HLS playback. Source-twitter

LLM

Anthropic reportedly pays SpaceX $1.25B monthly for compute — Anthropic is reportedly paying SpaceX about $1.25 billion per month for access to compute resources. The arrangement underscores how much hardware and cloud compute large AI models require, and it highlights SpaceX’s role in AI infrastructure. The deal fits the ongoing trend of enormous compute partnerships behind cutting-edge AI development. Source-twitter
Qwen3.7 Max ranks 5th on AI benchmark; 27B/35B still awaited — Qwen3.7 Max ranks 5th on Artificial Analysis’s benchmark, roughly matching GPT-5.4 (xhigh) and edging out Gemini 3.5 Flash. DSV4 Flash and Qwen3.6 27B sit about six points behind the Max version. The post also notes that 27B/35B models have not been released yet. Source-reddit
obra/superpowers: Open-source agentic coding framework — obra/superpowers is an open-source software development methodology for coding agents, built on composable skills and starter instructions. It guides users to define their goal, tease out a design spec, and produce an implementation plan that a junior engineer could follow. The quickstart lists integrations with Claude Code, Codex CLI, Gemini CLI, and other AI tools. Source-github
Free Claude Code proxy enables multi-backend routing — A drop-in proxy for Claude Code’s Anthropic API calls now supports ten provider backends, with per-model routing and native model discovery via /v1/models. It routes traffic to providers such as NVIDIA NIM, Kimi, Wafer, OpenRouter, DeepSeek, LM Studio, llama.cpp, Ollama, OpenCode Zen, and Z.ai while preserving Claude Code’s client protocol. The tool targets developers using the Claude Code CLI, VS Code, JetBrains ACP, or chatbots that work with Anthropic-compatible proxies, and it lets them use free, paid, or local models. Source-github
Cohere Unveils Command A+ Open-Weights MoE Model — Cohere announced Command A+, the first mixture-of-experts (MoE) model in its Command series, released with open weights under Apache 2.0. The model emphasizes efficiency and fast responses. Quantization allows deployment on just 1-2 GPUs, and the model targets small teams and developers building Cohere-powered agents. Source-reddit
Qwen 3.6 35B GGUF: NTP vs MTP Quantization Across GPUs and CPUs — ByteShape released Qwen 3.6 35B GGUF quantizations in two families: NTP (next-token prediction) and MTP (multi-token prediction). For NTP, the largest quant that fit performed well, and a lower bpw (bits per weight) was not automatically better. MTP delivered GPU speedups of roughly 20-40% but increased the memory footprint, which limited the quants that could fit. On CPU, MTP was not attractive. MMLU was excluded because of answer-format issues, and the release puts the emphasis on hardware benchmarking rather than on a simple model drop. Source-reddit
HalBench Benchmarks 4 Frontier LLMs on Sycophancy and Hallucination — HalBench is an open benchmark that measures how prone LLMs are to sycophancy and hallucination. It evaluates four frontier models (Sonnet 4.6, Grok 4.3, GPT-5.4, and Gemini 3.1 Pro) on 3,200 false-premise prompts, for 12,800 responses in total. Of those items, 100 were human-validated. The dataset, space, and code are open, and the author is asking which OSS models to run next. Sonnet leads, while Gemini and GPT-5.4 trail. Source-reddit
ik_llama.cpp with MTP beats llama.cpp on limited VRAM — A Reddit user reports that llama.cpp’s MTP performance regressed after a merged PR and shows that ik_llama.cpp with MTP delivers substantially higher throughput on an RTX 4070 12GB. Benchmarks with Qwen3.6-35B-A3B-IQ4_XS-4.19bpw.gguf show roughly 100–122 tok/s across tasks. Source-reddit
LM Studio adds MTP Speculative Decoding support — LM Studio’s 0.4.14 Build 2 (Beta) update enables MTP speculative decoding for compatible models. MTP is off by default. To turn it on, users must run the llama.cpp engine 2.15.0 and select ‘Manually choose model load parameters’ before loading the model. Reddit user /u/pigeon57434 shared the announcement in the LocalLLaMA community. Source-reddit
llama.cpp PR brings backend sampling to MTP draft path — A pull request to the ggml-org/llama.cpp repository introduces backend sampling for the MTP draft path, with the goal of improving performance. PR #23287 was submitted by user jacek2023 and discussed on Reddit by gaugarg-nv. Source-reddit
CohereLabs Command-A+ 05-2026 bf16 on Hugging Face — A Reddit post highlights CohereLabs’ command-a-plus-05-2026-bf16 model available on Hugging Face. The post, submitted by user /u/coder543, links to the model page and discussion, noting the bf16 variant. Source-reddit
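
Several items above involve MTP speculative decoding: ByteShape’s quants, ik_llama.cpp, LM Studio, and the llama.cpp PR. In the general technique, a cheap draft proposes a few tokens and the full model checks them. The longest prefix the full model agrees with is kept, so the output matches plain greedy decoding while each verification pass can yield several tokens. With MTP, the draft typically comes from the model’s own multi-token-prediction layers.

The sketch below shows only the greedy acceptance logic. `draft_next` and `target_next` are hypothetical toy stand-ins, not llama.cpp or LM Studio APIs. Real engines verify all drafted tokens in a single batched forward pass and use a probabilistic acceptance rule when sampling.

```python
from typing import Callable, List

Token = int
NextFn = Callable[[List[Token]], Token]  # greedy next-token choice for a context


def speculative_step(context: List[Token], draft_next: NextFn,
                     target_next: NextFn, k: int = 4) -> List[Token]:
    """One step of greedy speculative decoding.

    The draft proposes k tokens; the target checks them in order. We keep the
    longest agreeing prefix plus one token from the target: its correction at
    the first mismatch, or a bonus token if every draft token was accepted.
    """
    # 1. Draft k tokens autoregressively with the cheap model.
    proposal: List[Token] = []
    ctx = list(context)
    for _ in range(k):
        tok = draft_next(ctx)
        proposal.append(tok)
        ctx.append(tok)

    # 2. Verify with the target (real engines batch this into one forward pass).
    accepted: List[Token] = []
    ctx = list(context)
    for tok in proposal:
        expected = target_next(ctx)
        if tok != expected:
            accepted.append(expected)  # the target's token replaces the bad draft
            return accepted
        accepted.append(tok)
        ctx.append(tok)

    # 3. Every draft token matched: the target adds one more token.
    accepted.append(target_next(ctx))
    return accepted


# Toy models (hypothetical): the target counts upward; the draft agrees
# only while the context is shorter than 3 tokens.
def target_next(ctx: List[Token]) -> Token:
    return len(ctx)


def draft_next(ctx: List[Token]) -> Token:
    return len(ctx) if len(ctx) < 3 else -1


print(speculative_step([0], draft_next, target_next, k=4))  # -> [1, 2, 3]
```

If the draft disagrees early, a step yields only one or two tokens. The speedup therefore depends heavily on how often draft tokens are accepted.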

AGI

Sam Altman: AGI accelerates research, companies, and personal AI — Sam Altman frames AGI as accelerating three areas: research, companies, and personal AGI. He cites the ‘unit distance’ result and announces a plan to invest $2M in OpenAI credits in every YC company, signaling strong support for AI-enabled startups. He also calls for more focus on personal AGI that helps individuals achieve their goals. Source-twitter

AI

rtk-ai/rtk CLI proxy cuts LLM token usage 60-90% — rtk is a Rust-based CLI proxy that compresses command output before it reaches the LLM context. According to the project, this cuts token usage by 60-90% on common development commands. It ships as a single binary that supports 100+ commands, includes cross-language documentation, and aims to speed up LLM interactions. The project is open source on GitHub at rtk-ai/rtk. Source-github
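
The item does not say how rtk filters output. The sketch below is therefore only a hypothetical illustration of the kind of reduction it claims, not rtk’s algorithm. It collapses runs of identical lines and keeps only the head and tail of long output. It then estimates savings by counting whitespace-separated words as a crude stand-in for tokens; real tokenizers count differently.

```python
from typing import List, Optional


def compress_output(text: str, head: int = 5, tail: int = 5) -> str:
    """Hypothetical compressor (NOT rtk's method): dedupe runs, keep head/tail."""
    out: List[str] = []
    prev: Optional[str] = None
    repeats = 0
    for line in text.splitlines():
        if line == prev:
            repeats += 1  # count consecutive duplicates instead of emitting them
            continue
        if repeats:
            out.append(f"  [previous line repeated {repeats} more times]")
        out.append(line)
        prev, repeats = line, 0
    if repeats:
        out.append(f"  [previous line repeated {repeats} more times]")

    # Keep only the first `head` and last `tail` lines of long output.
    if len(out) > head + tail:
        omitted = len(out) - head - tail
        out = out[:head] + [f"  [... {omitted} lines omitted ...]"] + out[-tail:]
    return "\n".join(out)


def approx_tokens(text: str) -> int:
    # Crude proxy: whitespace-separated words, not real tokenizer tokens.
    return len(text.split())


def savings(before: str, after: str) -> float:
    """Fractional reduction in the approximate token count."""
    b = approx_tokens(before)
    return 0.0 if b == 0 else 1 - approx_tokens(after) / b


raw = "\n".join(["running tests"] + ["test passed"] * 200 + ["200 passed"])
compact = compress_output(raw)
print(compact)
print(f"approximate savings: {savings(raw, compact):.0%}")  # -> approximate savings: 97%
```

Highly repetitive output like test logs compresses far better than varied output. A single headline percentage therefore depends heavily on the command mix.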

Google Gemini

Google Gemini Tool Lineup and Rebranding Roundup — A tongue-in-cheek thread maps Google Gemini products to various AI tools, highlighting deprecations and renamed offerings (e.g., Gemini Pro/Ultra, AI Studio, Antigravity CLI) and related IDEs, agents, and notebooks. It catalogs what to use for coding, video, images, search, and research as Gemini evolves. Source-twitter

Industry

DeepSeek to Build Code Harness, Hiring in Beijing — DeepSeek announced the formation of a new Harness team to build Code Harness, a ground-up tooling effort for AI research and product integration. The company posted two open roles—Harness Product Manager and Harness R&D Engineer—based in Beijing. The post ties the project to broader debates about active learning and interactive prompting. Source-twitter

Multimodal

Vision Speaks for Sound: Multimodal AI’s Audio Misread — Video-capable multimodal LLMs have progressed rapidly, but researchers find that these models’ apparent understanding of audio in videos is often inferred from visual cues rather than grounded in the audio itself. The problem affects both open-source omni models and closed-source offerings from Google and OpenAI. The authors describe it as an audio-visual Clever Hans effect, in which models appear to ‘hear’ without actually checking the audio. This points to a fundamental misalignment between audio and visual processing in current AI systems. Source-huggingface

RL

GoLongRL Unveils Open-Source Long-Context RL with RLVR — GoLongRL introduces a fully open-source, capability-oriented approach to long-context reinforcement learning with verifiable rewards (RLVR). It argues that prior long-context RL relies on complex retrieval paths, which leads to uneven task coverage and misaligned rewards. The authors outline two contributions: capability-oriented data construction and an open release for reproducibility. Source-huggingface
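
In RLVR, the reward comes from a programmatic check of the model’s answer rather than from a learned reward model. This item does not describe GoLongRL’s own verifiers. The sketch below assumes a QA-style task and computes a generic exact-match reward after light normalization. The `Answer:` output convention is a hypothetical assumption.

```python
import re
import string
from typing import List


def normalize(answer: str) -> str:
    """Lowercase, drop punctuation and English articles, collapse whitespace."""
    answer = answer.lower()
    answer = "".join(ch for ch in answer if ch not in string.punctuation)
    answer = re.sub(r"\b(a|an|the)\b", " ", answer)
    return " ".join(answer.split())


def verifiable_reward(model_output: str, gold_answers: List[str]) -> float:
    """Binary reward: 1.0 if the final answer matches any gold answer, else 0.0.

    Assumes (hypothetically) the output ends with a line 'Answer: <text>'.
    """
    match = re.search(r"Answer:\s*(.+)$", model_output.strip())
    if match is None:
        return 0.0  # unparseable output earns no reward
    predicted = normalize(match.group(1))
    return 1.0 if any(predicted == normalize(g) for g in gold_answers) else 0.0


print(verifiable_reward("The clue is in chapter 12.\nAnswer: The Eiffel Tower.",
                        ["Eiffel Tower"]))  # -> 1.0
```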

AI Safety

OpenComputer Enables Verifiable Worlds for Computer-Use Agents — OpenComputer is a verifier-grounded framework for constructing verifiable software worlds for computer-use agents. It integrates four components: app-specific state verifiers, a self-evolving verification layer, a task-generation pipeline for realistic desktop tasks, and an evaluation harness. Source-huggingface
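
A state verifier decides success by inspecting the application’s end state, not the agent’s sequence of clicks. This item does not detail OpenComputer’s app-specific verifiers. The hypothetical sketch below uses only the Python standard library to check a file-system end state for a made-up “save a note” task. `SaveNoteTask` and `verify_save_note` are illustrative names, not part of OpenComputer.

```python
import tempfile
from dataclasses import dataclass
from pathlib import Path
from typing import Tuple


@dataclass
class SaveNoteTask:
    """Hypothetical desktop task: save a note containing `required_text`."""
    filename: str
    required_text: str


def verify_save_note(task: SaveNoteTask, workdir: Path) -> Tuple[bool, str]:
    """Check the end state of the file system, not the agent's actions."""
    target = workdir / task.filename
    if not target.is_file():
        return False, f"{task.filename} was not created"
    content = target.read_text(encoding="utf-8")
    if task.required_text not in content:
        return False, "file exists but lacks the required text"
    return True, "ok"


if __name__ == "__main__":
    task = SaveNoteTask("todo.txt", "buy milk")
    with tempfile.TemporaryDirectory() as tmp:
        workdir = Path(tmp)
        print(verify_save_note(task, workdir))  # -> (False, 'todo.txt was not created')
        (workdir / "todo.txt").write_text("- buy milk\n", encoding="utf-8")
        print(verify_save_note(task, workdir))  # -> (True, 'ok')
```

Because the check reads only the final state, an agent can reach the goal by any route (menus, shortcuts, a terminal) and still pass.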

Open Source

ViMax: All-in-One AI Video Generator — ViMax is an open-source project from HKUDS that positions itself as an end-to-end AI video solution. It aims to automate scriptwriting, storyboarding, character creation, and final video generation to overcome common issues like short clips, inconsistent continuity, and lack of narrative depth in current tools. Detailed demos and architecture are available on its GitHub page. Source-github

⚡ Quick Bites

Anthropic Board Seat Declined; Ideology Won’t Survive AI — The author recalls Anthropic discussing a potential board seat with him, which he declined because he didn’t think he’d be a good fit. He mentions sending Aristotle’s Politics to Dario and Daniela Amodei, though they may not have read it. The post also cites Dario Amodei’s view that ideology won’t survive the realities of AI. Source-twitter
Active Learners Improve PRP Reranking Efficiency — Pairwise Ranking Prompting (PRP) collects pairwise preferences from LLMs to form a ranking. These judgments are noisy, order-sensitive, and sometimes intransitive, so standard sorting can fail to recover the correct top-K. The authors propose reframing PRP reranking as active learning from noisy pairwise comparisons, aiming to improve efficiency and reliability under budget constraints. Source-huggingface
Anti-Self-Distillation for Reasoning RL via PMI — The article discusses on-policy self-distillation, in which a student is guided by a copy of itself conditioned on privileged context, as a way to improve reasoning without a stronger external teacher. The approach shows promise, but its gains in math reasoning are inconsistent. A pointwise mutual information (PMI) analysis traces the failure to the privileged context, which inflates the teacher’s confidence on tokens the context has already revealed. Source-huggingface
Multica Unveils Karpathy-Inspired Claude Code Guidelines — Multica AI released a CLAUDE.md file that distills Andrej Karpathy-inspired guidelines to improve Claude Code behavior. The guidance highlights common LLM coding pitfalls—overcomplicated code, missed clarifications, and failure to surface inconsistencies—and promotes cleaner, more robust code. The project is part of Multica’s open-source platform for running and managing coding agents with reusable skills. Source-github
Hugging Face datasets now filterable by model size — Hugging Face’s benchmark datasets now support filtering by model size, which makes it easier to compare models under 32B on benchmarks such as SWE-bench Verified. A Reddit post highlighted the feature and linked to the official dataset page sorted by trending. The update should help researchers who evaluate small models on benchmark datasets. Source-reddit
Codex hailed as team-driven success; ajambrosino named driving force — A tweet credits Codex to a collaborative team effort and singles out @ajambrosino as the driving force behind the Codex app. The author praises the team’s collaboration and suggests Codex is an example others want to emulate, noting that they’re barely getting started. Source-twitter
Gemma 4 MTP: Work in Progress Release — Gemma 4 MTP is a work-in-progress project shared by Reddit user u/am17an. The post notes that it must be compiled by the user and may not function reliably yet. It was submitted to the r/LocalLLaMA community by user u/jacek2023. Source-reddit
AMD Ryzen AI Halo PC Reportedly Priced at $3999 with 128GB RAM — A Reddit post claims AMD’s Ryzen AI Halo PC will cost $3999 and ship with 128GB of onboard memory. User /u/Mochila-Mochila posted the report on the LocalLLaMA subreddit, which highlights a high-end, AI-focused PC configuration. Source-reddit
Critics: AI Isn’t General Intelligence; Just Brute-Force Reading — A Twitter user argues that AI isn’t generally intelligent, claiming it merely reads every book and paper and makes connections between them. The post describes AI as thinking for twenty hours and using brute-force reasoning, citing Erdős problems as an example. It concludes that AI could never be an accountant. Source-twitter
Waiting for Qwen 3.7 models: hoping for 27B/122B — Reddit user /u/Porespellar posts about awaiting Qwen’s 3.7 model drop, expressing hope for new sizes. They specifically hope for 27B and 122B variants and mention a Capybara meme to capture the sentiment. The post highlights community interest in the Qwen ecosystem and the anticipation around upcoming models from the east. Source-reddit
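
The anti-self-distillation item above traces its failure mode to pointwise mutual information (PMI) between response tokens and the privileged context. Under the standard PMI definition, a token’s score is the log-probability the model assigns it with the privileged context c, minus the log-probability it assigns without c. A large positive value flags a token that the context effectively gives away. The paper’s aggregation of these scores is not described here, so the function and the made-up probabilities below are illustrative assumptions.

```python
import math
from typing import List


def token_pmi(logp_with_context: List[float],
              logp_without_context: List[float]) -> List[float]:
    """Per-token PMI between each response token y_t and privileged context c.

    PMI_t = log p(y_t | x, c, y_<t) - log p(y_t | x, y_<t)
    Inputs are natural-log probabilities of the same response tokens, scored
    once with c in the prompt (teacher view) and once without (student view).
    """
    if len(logp_with_context) != len(logp_without_context):
        raise ValueError("both scorings must cover the same tokens")
    return [a - b for a, b in zip(logp_with_context, logp_without_context)]


# Made-up probabilities: the second token is one the context gives away,
# so the teacher is 10x more confident on it than the student.
with_c = [math.log(0.5), math.log(0.9)]
without_c = [math.log(0.5), math.log(0.09)]
print([round(v, 3) for v in token_pmi(with_c, without_c)])  # -> [0.0, 2.303]
```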

Generated by AI News Agent | 2026-05-20