AI Daily — 2026-05-19

Covering 32 AI news items

🔥 Top Stories

1. Gemini Omni Debuts: Create Anything From Anything, Starting with Video

Google DeepMind unveiled Gemini Omni, a model designed to create anything from anything, starting with video. It merges Gemini’s intelligence with DeepMind’s generative media systems, signaling advances in world understanding, multimodality, and editing. The announcement marks a major step toward flexible, cross-domain AI capabilities. Source-twitter

2. Google Unveils Gemini Spark: 24/7 Personal AI Agent

Google announced Gemini Spark, a 24/7 personal AI agent built on Gemini 3.5 that acts on user direction and handles long-running tasks in the background. It runs on dedicated Google Cloud VMs, so users don’t need to keep a laptop open, and it will integrate with Google tools, with third-party support via MCP to follow. An animated UI demo shows Spark managing tasks like a morning priorities digest and trip planning, highlighting a new Spark (BETA) navigation tab. Source-twitter

3. OpenAI adds SynthID watermark and verification for AI images

OpenAI announced new methods to identify AI-generated images and trace their origin. In addition to existing C2PA Content Credentials, images will now include a SynthID watermark, and a public verification tool will let users check whether an image was created by OpenAI products. The update aims to improve provenance and user trust in AI-generated media. Source-twitter

📰 Featured

LLM

Karpathy Joins Anthropic to Advance LLM Frontiers — Andrej Karpathy announced his move to Anthropic, expressing excitement about returning to R&D and the formative potential of the coming years at the frontier of large language models. He also stated his enduring commitment to education and plans to resume work in that area in time. Source-twitter
OpenAI Unveils Guaranteed Capacity for Long-Term Compute Access — OpenAI has introduced Guaranteed Capacity, a program that secures long-term access to its compute resources. It offers discounted tokens for 1-3 year commitments and reflects the company’s investments in infrastructure and capacity planning to help customers scale reliably in a compute-constrained world. Source-twitter
HRM-Text 1B Claims SOTA Benchmarks — A Reddit post discusses HRM-Text 1B, linking to its GitHub and Hugging Face pages and questioning the sensational benchmarks. The poster seeks explanations of what’s happening, the model’s pros and cons, and whether the claimed SOTA is credible, inviting discussion. Source-reddit
Nemotron-Labs-Diffusion Enables AR, Diffusion, Self-Speculation — Nemotron-Labs-Diffusion is a tri-mode language model from NVIDIA that switches between autoregressive decoding and diffusion-based parallel decoding by altering the attention pattern during inference. It enables a third mode, self-speculation, using a shared KV cache to combine diffusion drafting with AR verification for improved long-context handling and decoding efficiency. The 3B, 8B, and 14B dense LM family covers base, instruct, and vision-language variants, highlighting a shift toward compute-bound generation. Source-reddit
AI Agents Build OS in 12 Hours With Antigravity 2.0 and Gemini — AI agents orchestrated a project to build a functioning operating system from scratch using Antigravity 2.0 and Gemini 3.5 Flash. The effort ran 93 parallel sub-agents, logged 15k+ model requests, and processed 2.6B tokens in 12 hours at under $1K in API credits. The demonstration highlights rapid, cost-efficient AI-assisted software development. Source-twitter
SkillsVote: Lifecycle Governance for Agent Skills — Long-horizon LLM agents leave traces that could become reusable experience, but raw trajectories are noisy and hard to govern. The authors propose SkillsVote, a lifecycle-governance framework for Agent Skills that couples executable scripts with non-executable guidance, aimed at reducing redundancy and pollution in open skill ecosystems. It covers collection, recommendation, and evolution to maintain usable context across updates. Source-huggingface
AI Auto-Research Advances: Papers Generated, Integrity Risks Persist — AI-assisted research now enables fully automated paper generation for around $15, and long-horizon agents can run experiments, draft manuscripts, and critique with minimal human input. However, these systems still fabricate results, overlook errors, and struggle to assess novelty under scientific pressure. The survey across developments up to April 2026 highlights both productivity gains and persistent integrity challenges in AI-driven research. Source-huggingface
CLI-Anything Makes All Software Agent-Native — CLI-Anything is a CLI-Hub where AI agents can browse, install, and manage community-built CLIs and use them to interact with software. It showcases demos with generated CLIs, live previews, and trajectory loops that produce artifacts like CAD builds and 3D scenes. The project invites public contributions and wishlist requests to broaden agent compatibility. Source-github
LLM-Powered Stock Analysis for A/H/US Markets — An open-source project on GitHub presents an LLM-driven stock analysis system covering A-share, Hong Kong, and US markets. It combines multi-market data, real-time news, and an AI decision dashboard with multi-channel push and zero-cost automated operation, and it supports multiple AI models, data sources, and deployment options. Source-github
LLM-driven tool builds 3D objects with articulated parts — A Reddit post describes a pipeline that uses an LLM as a structured code compiler to generate Blender Python blocks that build a multi-part, articulated 3D object (e.g., a washing machine). Unlike diffusion-based workflows that produce amorphous blobs, the pipeline exports a clean GLB with separate parts and joints, enabling true internal articulation; it’s open-source and LLM-agnostic on GitHub. Source-reddit
KV cache quantization benchmarks favor TCQ; q8 may waste VRAM — An RTX 3090-based test using BeeLlama v0.1.2 evaluates Qwen 3.6 27B (Q5_K_S and IQ4_XS quantizations) at 64k/128k context, comparing TurboQuant, TCQ, q5, and q8 KV cache settings. The author finds that perplexity (PPL) hides the difference between bf16 and q4_0 while KL divergence (KLD) reveals it, and concludes that TurboQuant is overrated, TCQ helps, q5 deserves more attention, and symmetric q8 may waste VRAM. Source-reddit
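The KV cache benchmark item above contrasts perplexity (PPL) with KL divergence (KLD). The sketch below illustrates the general reason the two metrics can disagree; it is not the author's benchmark. The four-token vocabulary and both probability distributions are made-up values for illustration. PPL only looks at the probability assigned to the reference token, while KLD compares the whole output distribution against the baseline.

```python
import math

def perplexity(true_token_probs):
    """exp of the mean negative log-likelihood of the reference tokens."""
    nll = -sum(math.log(p) for p in true_token_probs) / len(true_token_probs)
    return math.exp(nll)

def kl_divergence(p, q):
    """KL(p || q) over one next-token distribution, in nats."""
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0)

# Hypothetical next-token distributions over a 4-token vocabulary.
# Index 0 is the reference ("true") token in both cases.
baseline = [0.50, 0.30, 0.15, 0.05]   # stand-in for an unquantized cache
quantized = [0.50, 0.05, 0.15, 0.30]  # same mass on the true token, rest shifted

ppl_base = perplexity([baseline[0]])
ppl_quant = perplexity([quantized[0]])
kld = kl_divergence(baseline, quantized)

print(f"PPL baseline:  {ppl_base:.3f}")
print(f"PPL quantized: {ppl_quant:.3f}")
print(f"KLD(baseline || quantized): {kld:.3f}")
```

In this toy case both perplexities match because the true token keeps the same probability, yet the KLD is clearly nonzero because the remaining probability mass moved between the other tokens.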

Open Source

SAM3 Open Source Delivers Strong Object Tracking in Complex Scenes — An open-source SAM3 model is praised for its strong object-tracking capabilities, even in complex scenes like basketball. The author suggests SAM3 may be their favorite computer vision model and questions why Meta’s SAM models aren’t being used to build obvious, powerful products. Source-twitter
SANA-WM 2.6B World Model Released for 720p Video — The NVlabs SANA project provides an efficiency-oriented codebase for high-resolution image and video synthesis, with end-to-end training and inference pipelines across SANA variants. The 2.6B SANA-WM world model enables 720p video generation with 6-DoF camera control, marking a significant milestone for controllable world modeling and embodied AI. The project is open-source on GitHub and actively developed with documentation and community channels. Source-github
ByteDance Releases Lance: a 3B Multimodal Open-Source Model — ByteDance released Lance, a lightweight open-source multimodal model with 3B parameters designed for image and video understanding, generation, and editing. It is trained from scratch using a staged multi-task approach within a 128-A100-GPU budget, aiming to perform well across image and video tasks in a unified framework. Source-reddit

Hardware

Google Partners With Samsung, Gentle Monster, Warby Parker on Intelligent Eyewear — Google announced a collaboration with Samsung, Gentle Monster, and Warby Parker to develop new intelligent eyewear. The tweet includes sneak peeks of two designs from fall collections and references Google IO, highlighting an AI-forward wearable initiative. Source-twitter
LongLive-2.0: NVFP4 Parallel Video Generation — LongLive-2.0 introduces an NVFP4-based parallel infrastructure for training and inference of long video generation, addressing speed and memory bottlenecks. It features a sequence-parallel autoregressive training scheme called Balanced SP, co-designing a teacher-forcing layout with SP execution and SP-aware, chunked VAE to improve efficiency. Source-huggingface

⚡ Quick Bites

Code as Agent Harness: LLMs Use Code for Reasoning — Recent LLMs demonstrate strong capabilities in understanding and generating code across tasks from competitive programming to software engineering. The piece argues that code is increasingly used as an operational substrate for agent reasoning, action, environment modeling, and execution-based verification, framing this shift as ‘code as agent harness’. Source-huggingface
Claude Code plugin for academic research skills released — A GitHub project by Imbad0202 introduces the academic-research-skills plugin for Claude Code, guiding researchers through the full workflow from research to publication. It supports Claude Code CLI, VS Code, and JetBrains (v3.7.0+), offering features like Socratic planning via /ars-plan and automated tasks such as reference gathering, citation formatting, data verification, and logical consistency checks. The tool acts as a copilot to handle grunt work, while users focus on defining questions, choosing methods, interpreting results, and writing key arguments. Source-github
12-Factor Agents outlines production-ready LLM principles — Dex introduces 12-Factor Agents, a GitHub-hosted set of principles for building production-grade LLM-powered software inspired by the 12-Factor Apps (public at https://github.com/humanlayer/12-factor-agents). The project invites feedback and contributions, compares agent frameworks from LangChain to minimalist options, and promotes context engineering and tooling, including bootstrap via npx/uvx create-12-factor-agent and related talks. Source-github
Intel Crescent Island PCB Leaks Reveal 160GB LPDDR5X Xe3P GPU — A leaked Crescent Island PCB reportedly reveals Intel’s Xe3P data-center GPU with 160GB of LPDDR5X memory (20×8GB modules), sidestepping HBM shortages. The memory runs at 8800–9500 MT/s over a 640-bit (80-byte) interface, which works out to 704–760 GB/s of bandwidth (for example, 8800 MT/s × 80 bytes = 704 GB/s). In desktop terms, 640 bits corresponds to ten 64-bit channels. Source-reddit
Agent tests rm -rf; bash whitelist and bubblewrap sandbox deployed — An agent tested a safety block by attempting to execute a dangerous command (rm -rf /); the test succeeded with only a mild scare. The author then implemented a sandbox using a bash command whitelist and Bubblewrap for isolation, first completing the whitelist and then the isolation layer. Source-reddit
Llama.cpp PR Targets MTP Improvements — A Reddit post highlights a GitHub pull request (ggml-org/llama.cpp #23269) by PixelatedCaffeine aiming to add MTP improvements. The post links to the PR and discusses it in the comments. It signals an upstream performance-focused update for llama.cpp. Source-reddit
Google AI Edge Gallery Updates: Gemma 4, Pixel TPU Support — Google released updates to the AI Edge Gallery in v1.0.13 and v1.0.14, adding Gemma 4 Multi-Token Prediction and Pixel TPU support. The release also introduces experimental MCP, new skills, and chat history saving. Source-reddit
Gemini Enables HLS Playback — Gemini now supports HTTP Live Streaming (HLS) playback, enabling standard, browser-friendly video delivery for Gemini-powered content. This feature update improves streaming compatibility and viewer experience for Gemini users. Source-twitter
Newbie coder shifts from Claude Sonnet 4.6 to Qwen3.6-35B-A3B-UD-Q6_K — A beginner coder documents their experience using large language models for a Python Pygame project (~30k lines across 55 modules). They compare Claude Opus and Claude Sonnet 4.6—initially helpful but plagued by length limits and debugging delays—before switching to Qwen3.6-35B-A3B-UD-Q6_K. The post highlights cost considerations and performance trade-offs in a real-world coding workflow. Source-reddit
Qwen Development Advances with 122B and 27B Submissions — A Reddit post on r/LocalLLaMA, submitted by user /u/jacek2023, notes that Qwen is awaiting a 122B model and a new 27B model. The post provides links to discussions and comments. This highlights ongoing open-source LLM development and model-scale exploration. Source-reddit
48GB VRAM Users: What Are Your Daily Drivers? — A Reddit post asks 48GB VRAM users to share their daily driver GPUs and what they would run if they had more VRAM. The original poster plans to upgrade from 32GB to 48GB and seeks community recommendations. The thread centers on hardware choices for AI/ML workloads and high-VRAM usage. Source-reddit
Introducing the Ettin Reranker Family — A Reddit post announces the Ettin Reranker Family and links to additional details for discussion in the LocalLLaMA community. The snippet provides no technical specifics, focusing on the announcement of a new family of reranker tools for AI workflows. Source-reddit
Tibo to reset Codex rate limits if tweet earns 1 like — A tweet claims that if this post earns one like, Tibo will reset Codex rate limits. The post centers on Codex, an AI code-generation model, and frames a hypothetical tool-limit change around user engagement. Source-twitter
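The sandboxing item above (the rm -rf test) describes two layers: a bash command whitelist, then Bubblewrap isolation. A minimal sketch of that two-layer idea follows; it is not the author's implementation. The allowlist contents, the function names, and the mount layout are assumptions for illustration. The bwrap options used (--ro-bind, --bind, --tmpfs, --proc, --dev, --chdir, --unshare-all, --die-with-parent) are standard Bubblewrap flags, but the code only builds the argument list and does not launch anything.

```python
import shlex

# Hypothetical allowlist; a real deployment would choose its own commands.
ALLOWED_COMMANDS = {"ls", "cat", "grep", "echo"}

# Characters that could chain or redirect commands if a shell ever saw them.
SHELL_METACHARACTERS = set(";|&`$><\n")

def is_allowed(command_line: str) -> bool:
    """Layer 1: accept only a single allowlisted command without shell tricks."""
    if any(ch in SHELL_METACHARACTERS for ch in command_line):
        return False
    try:
        tokens = shlex.split(command_line)
    except ValueError:  # e.g. an unbalanced quote
        return False
    return bool(tokens) and tokens[0] in ALLOWED_COMMANDS

def bwrap_argv(command_line: str, workdir: str) -> list[str]:
    """Layer 2: wrap an allowed command in a Bubblewrap invocation.

    The host root is mounted read-only; only `workdir` is writable.
    """
    if not is_allowed(command_line):
        raise PermissionError(f"command rejected by allowlist: {command_line!r}")
    return [
        "bwrap",
        "--ro-bind", "/", "/",      # whole filesystem visible but read-only
        "--bind", workdir, "/work",  # the one writable directory
        "--tmpfs", "/tmp",
        "--proc", "/proc",
        "--dev", "/dev",
        "--chdir", "/work",
        "--unshare-all",             # new namespaces, including network
        "--die-with-parent",
        *shlex.split(command_line),
    ]

print(is_allowed("ls -la"))
print(is_allowed("rm -rf /"))
print(is_allowed("ls; rm -rf /"))
```

On a machine with Bubblewrap installed, the returned list could be passed to subprocess.run without a shell. Keeping both layers means a mistake in the allowlist still runs into a read-only root.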

Generated by AI News Agent | 2026-05-19