AI Daily — 2026-04-14

Covering 29 AI news items

🔥 Top Stories

1. Gemini Robotics-ER 1.6 Upgrades Robot Reasoning in the Physical World

Google DeepMind’s Gemini Robotics-ER 1.6 upgrade improves robots’ visual and spatial understanding to better plan and complete tasks in the real world. The update enhances embodied reasoning to enable more useful and reliable robotic behavior. Source-twitter

2. Waymo Begins London Autonomous Driving With Safety Driver

Waymo announced the start of autonomous driving in London with a trained specialist behind the wheel. The service aims to offer a quiet, convenient link to the Tube, bus, or final destination later this year, signaling a broader deployment of autonomous transport in the city. Source-twitter

3. Claude Opus 4.7 and design tool expected this week

Anthropic is set to launch Claude Opus 4.7 and a new prompt-based design tool for websites and presentations this week. The more advanced Claude Mythos is already being tested for cybersecurity use cases. Source-twitter

📰 Featured

LLM

GPT-5.4 Pro Solves Erdős Problem #1196 — GPT-5.4 Pro is claimed to have solved Erdős Problem #1196, a long-standing open problem in mathematics. The post describes the result as meaningful, says formalisation is underway, and notes comments from Lichtman. The claim was first reported on X (Twitter) by user Liam06972452. Source-twitter
MiniMax M2.7 GGUF NaN Fixes and Benchmarks — An investigation found that 21%-38% of the MiniMax-M2.7 GGUFs on Hugging Face produce NaNs during perplexity evaluation. The root cause appears to be an overflow in llama.cpp, with block 32, block 311, and the blk.61.ffn_down_exps tensor implicated; lower-bit quant types such as IQ4_XS and IQ3_XXS avoid the NaNs. The reporters fixed their own GGUFs and observed that 99.9% KLD benchmarks remain stable, suggesting the issue is specific to perplexity evaluations rather than overall metrics. Source-reddit
Distilling 100B+ LLMs to 4B Models — A Reddit post outlines techniques for distilling very large language models (around 100B parameters) down to much smaller 4B-parameter models. It discusses practical methods and trade-offs for making capable LLMs available in smaller footprints, reflecting ongoing interest in the LocalLLaMA community. Source-reddit
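
As a rough illustration of the MiniMax item above, the sketch below shows one common way an overflow can surface as a NaN perplexity. The source says only that the root cause appears to be an overflow in llama.cpp; the half-precision limit, the helper names, and the toy logits are assumptions made for illustration and are not llama.cpp code.

```python
import math

FP16_MAX = 65504.0  # largest finite half-precision value (assumed precision)


def store_like_fp16(x: float) -> float:
    """Crude stand-in for a half-precision store: out-of-range values become +/-inf."""
    if abs(x) > FP16_MAX:
        return math.copysign(math.inf, x)
    return x


def log_softmax(logits):
    """Numerically shifted log-softmax; an inf logit yields inf - inf = nan."""
    m = max(logits)
    shifted = [v - m for v in logits]
    log_z = math.log(sum(math.exp(s) for s in shifted))
    return [s - log_z for s in shifted]


def perplexity(per_token_logits, targets):
    """exp(mean negative log-likelihood) over all evaluated tokens."""
    nlls = [-log_softmax(logits)[t] for logits, t in zip(per_token_logits, targets)]
    return math.exp(sum(nlls) / len(nlls))


# Hypothetical activations: the second token contains one value beyond the fp16 range.
healthy = [[2.0, 1.0, 0.0], [0.5, 3.0, 1.0]]
overflowing = [[2.0, 1.0, 0.0], [0.5, 70000.0, 1.0]]
targets = [0, 1]

ppl_ok = perplexity([[store_like_fp16(v) for v in row] for row in healthy], targets)
ppl_bad = perplexity([[store_like_fp16(v) for v in row] for row in overflowing], targets)

print(ppl_ok)               # a finite value
print(math.isnan(ppl_bad))  # the single overflowing token poisons the whole average
```

Because perplexity averages over every token, a single non-finite value is enough to turn the whole result into NaN.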

Open Source

OpenMed 1.0 Ships: Medical AI On iPhone, On-device — OpenMed 1.0.0 enables medical AI models to run entirely on-device on iPhone and Apple Silicon, with no cloud or API required. The open-source release includes an MLX backend, a Swift package for macOS/iOS, and 200+ PII-detection models across 8 languages under the Apache 2.0 license. Source-twitter
Genie3 and Tencent Preview 3D Worlds from One Image — Genie3 teams with Tencent to unveil HYWorld 2.0, an engine-ready World Model that can generate and edit full 3D scenes from a single image. The project emphasizes real 3D worlds rather than videos and will open-source tomorrow with HLS playback support. Source-twitter
Open-source Voicebox Enables Local Voice Cloning and 23-language TTS — Voicebox is a local-first, open-source voice synthesis studio that clones voices from seconds of audio and generates speech in 23 languages using five TTS engines. It runs entirely on the user’s machine, preserving privacy, and offers post-processing effects, a timeline for multi-voice projects, and paralinguistic tags. It positions itself as a free alternative to ElevenLabs with engines including Qwen3-TTS, LuxTTS, Chatterbox Multilingual, Chatterbox Turbo, and HumeAI TADA. Source-github

Multimodal

Gemma 4 Enables On-Device Vision Segmentation on Laptop — Gemma 4 demonstrates local AI orchestration by evaluating a scene, reasoning about which questions to ask, and invoking a segmentation model to perform vision tasks. The workflow runs entirely offline on a laptop, showcasing edge AI capabilities. In the demo, it segments all vehicles (64 found) and then filters to white vehicles (23 found), with results delivered via HLS playback. Source-twitter
OmniShow Unifies Multimodal Conditions for HOIVG — OmniShow presents a unified framework for Human-Object Interaction Video Generation (HOIVG) that synthesizes high-quality videos conditioned on text, reference images, audio, and pose. The approach addresses prior methods that fail to support all required modalities, enabling richer and more controllable HOI video generation. Targeted applications include e-commerce demonstrations, short video production, and interactive entertainment. Source-huggingface
Baidu ERNIE-Image Debuts on Hugging Face — A Reddit submission highlights Baidu’s ERNIE-Image multimodal model now available on Hugging Face, linking to the model page. The post from the LocalLLaMA community includes discussion and a link to the official page. Source-reddit

AI

QuanBench+: Unified LLM Quantum Code Generation Across Qiskit, PennyLane, Cirq — QuanBench+ introduces a unified benchmark spanning Qiskit, PennyLane, and Cirq to evaluate LLM-based quantum code generation. It covers 42 aligned tasks across quantum algorithms, gate decomposition, and state preparation, assessed with executable tests and Pass@1/Pass@5. Source-huggingface
Multi-Agent AI Accelerates CUDA Kernels by 38% with NVIDIA — A multi-agent system that autonomously builds and maintains complex software was partnered with NVIDIA to optimize CUDA kernels. In three weeks, it delivered a 38% geometric-mean speedup across 235 problems. Source-twitter
LLM Auto-Tunes llama.cpp Flags, Boosts Qwen3.5-27B Tokens/s — A Reddit post introduces --ai-tune, a feature that lets a model iteratively tune its own llama.cpp flags and cache the fastest configuration. Tested on a mixed GPU rig (GeForce RTX 3090 Ti, 4070, 3060) with 128 GB RAM, it shows substantial tok/s gains across models, notably reaching 40.05 tok/s on Qwen3.5-27B with Q4_K_M. The tuner picks up updates to llama.cpp/ik_llama.cpp automatically by feeding the output of llama-server --help into its tuning loop, and a new UI is available via llm-server-gui. Source-reddit
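
Two metrics named in the items above, Pass@1/Pass@5 and geometric-mean speedup, can be sketched in a few lines. This is a minimal sketch, assuming the widely used unbiased pass@k estimator and per-problem runtime ratios; neither source states exactly how its numbers were computed, and the function names and sample timings are hypothetical.

```python
import math
from statistics import geometric_mean


def pass_at_k(n: int, c: int, k: int) -> float:
    """Unbiased pass@k: probability that at least one of k samples, drawn
    without replacement from n generations (c of them correct), passes."""
    if not (0 <= c <= n and 1 <= k <= n):
        raise ValueError("require 0 <= c <= n and 1 <= k <= n")
    if n - c < k:
        return 1.0  # too few failing samples to fill all k draws
    return 1.0 - math.comb(n - c, k) / math.comb(n, k)


def geomean_speedup(baseline_ms, optimized_ms) -> float:
    """Geometric mean of per-problem speedup ratios (baseline / optimized)."""
    ratios = [b / o for b, o in zip(baseline_ms, optimized_ms)]
    return geometric_mean(ratios)


# Hypothetical task: 10 generations, 3 of which pass the executable tests.
p1 = pass_at_k(n=10, c=3, k=1)
p5 = pass_at_k(n=10, c=3, k=5)

# Hypothetical kernel timings: one 2x win, one tie, one 2x regression.
speedup = geomean_speedup([10.0, 20.0, 40.0], [5.0, 20.0, 80.0])

print(p1, p5, speedup)
```

The geometric mean is the usual choice for averaging speedups because a 2x win and a 2x regression cancel out, whereas an arithmetic mean of the same ratios would report a net gain.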

Hardware

Kernels on Hugging Face Hub Enable GPU-Optimized Kernels — Hugging Face announces Kernels on the Hub: kernels pre-compiled for the exact GPU, PyTorch version, and OS, with support for running multiple kernel versions in one process. The feature is compatible with torch.compile and reportedly delivers 1.7x–2.5x speedups over PyTorch baselines. It aims to simplify shipping GPU kernels alongside models on the Hub. Source-twitter

⚡ Quick Bites

Nathan Lambert launches free RLHF course for his book — Nathan Lambert announced a free RLHF course to accompany his book, starting with a welcome video and four lectures covering an RLHF overview, IFT, reward models, rejection sampling, RL math, and implementation. He plans 10-15 videos over the coming months, with Q&A sessions to go deeper on topics and address recent developments, while work continues on the book’s post-training code. A YouTube playlist and course landing page were provided. Source-twitter
Anthropic rolls out mid-chat model switching — Anthropic is rolling out the ability to switch between models during an ongoing chat. This enables users to adjust capabilities or safety profiles without restarting the conversation. The feature marks a meaningful improvement in AI UX and flexibility. Source-twitter
Claude Desktop App Freezes on First Prompt — A user posted on X that the Claude desktop app froze on the first prompt. The report raises possible stability concerns about the latest release, but it is a single user anecdote with no official statement yet. Source-twitter
Memory-Enhanced Dynamic Reward Shaping for LLM RL — A research paper proposes MEDS, a Memory-Enhanced Dynamic Reward Shaping framework that uses historical behavioral signals to shape rewards in reinforcement learning for large language models. Unlike standard entropy regularization, MEDS aims to reduce recurrent failure patterns across rollouts by storing past behaviors to guide future policy updates. Source-huggingface
Survey on Attention Sink in Transformers and Mitigation — Attention Sink (AS) occurs when Transformer models overly attend to uninformative tokens, hindering interpretability and impacting training, inference, and hallucinations. The survey reviews AS usage, interpretation, and mitigation strategies across Transformer architectures. Source-huggingface
Strips as Tokens Enables Artist-Quality Mesh Generation — A recent paper critiques token-ordering in autoregressive transformer-based mesh generation, arguing that coordinate-based sorting and patch heuristics hinder professional-quality modeling. It introduces Strips as Tokens, a UV-segmented representation that preserves edge flow and geometric regularity to improve artist-quality meshes. Source-huggingface
Anthropic Releases Claude Cookbooks With Copyable Claude Recipes — Anthropic’s Claude Cookbooks on GitHub offer notebooks and recipes showcasing practical ways to interact with Claude via its API. The repo provides copyable Python code and guides to integrate Claude into projects, with prerequisites like a Claude API key and a reference to the Claude API Fundamentals course. Source-github
Claude-4.6-Opus Fine-Tunes Often Downgrade Local LLMs — A Reddit post argues that Claude-4.6-Opus fine-tunes on local Llama-based models frequently reduce intelligence and reasoning. The author cites anecdotal evidence from a single prompt, uses llama.cpp in WSL2, and notes diminished performance regardless of model size; they ask if any such fine-tunes ever beat the base models. Source-reddit
Two Asus GX10s Struggle to Run Opus 4.5 Locally for AI — An experienced developer attempts to run Opus 4.5 locally for agentic coding on two Asus Ascent GX10 machines, testing models like Qwen 3.5 (multiple variants) and MiniMax (M2.5/M2.7). RAM is insufficient at 128GB, and licensing for M2.7 adds complexity, though M2.7 is praised as a capable agentic workhorse. The author seeks local AI deployment without cloud providers, despite high hardware costs. Source-reddit
Local GLM 5.1 Parkour Prompt Sparks Discussion — A Reddit post discusses Local GLM 5.1 and a prompt to build a city-based parkour game in a single web page, detailing WASD controls, camera-relative movement, ledge grabbing, and other mechanics. The author notes the model’s long internal reasoning (reportedly 32k tokens) and a feedback loop about arms placement, illustrating quirks of heavily quantized models and code-generation behavior. Source-reddit
ZAI Might Stop Open-Weighting Their Models — A Reddit post alleges that ZAI is moving away from open-weighted models, prioritizing profit over users. It cites omitting GLM-5 from the Lite plan, unexplained price hikes, backtracked policy on coding-tool usage, and the absence of base-model releases for GLM-4.7-Flash and GLM-5, with speculation that top models may stop being released openly. Source-reddit
Fairl Claims 1000 Tokens/s, Blaze-Fast Inference — A Reddit post on r/LocalLLaMA highlights a claim that Fairl can generate or process 1000 tokens per second, described as blazing fast. The post lacks independent verification and stems from user-submitted content. It underscores ongoing efforts to accelerate open-source LLaMA-based inference. Source-reddit
Over a Month Since Big AI Model Dropped — The post notes that more than a month has passed since a major AI model release, expressing surprise. It provides no specifics about the model or its origin. Source-twitter

Generated by AI News Agent | 2026-04-14