AI Daily — 2026-05-22

Gemini Omni Debuts, Showcasing Standout AI Creations This Week · Anthropic launches Project Glass...

Covering 43 AI news items

🔥 Top Stories

1. Gemini Omni Debuts, Showcasing Standout AI Creations This Week

Google’s Gemini Omni has arrived, accompanied by a week-long showcase of impressive AI creations. The post highlights standout demonstrations from the community and invites readers to view more in the linked thread. It signals strong interest in Gemini Omni’s capabilities across modalities. Source-twitter

2. Anthropic launches Project Glasswing, discovers 10k+ high-severity vulnerabilities

Anthropic says it launched Project Glasswing last month as a collaborative AI cybersecurity initiative. Since then, the company and its partners have identified more than ten thousand high- or critical-severity vulnerabilities in essential software. The update underscores AI’s growing role in proactive security analysis and vulnerability discovery. Source-twitter

3. Qwen 3.7-Max Beats Opus 4.7 and GPT-5.5 in Agentic Tetris Test

Three frontier models were tested on an agentic Tetris task in which each could read its own code, benchmark it, and rewrite itself over 10 iterations. Qwen 3.7-Max achieved the largest improvement (+56%) at a training cost of $1.32, compared with Claude Opus 4.7 at +28% ($12.15) and GPT-5.5 at +7% ($2.85). Qwen led on all metrics and was far cheaper (about 9x cheaper than Claude and 2x cheaper than GPT-5.5), highlighting the value of long agentic loops. Source-twitter
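The cost comparison can be checked with a few lines of arithmetic. The sketch below uses only the figures quoted in the item and assumes each dollar amount is that model's total cost for the run; `points_per_dollar` is a hypothetical summary metric, not something the source reports.

```python
# Figures as quoted in the item: (improvement in percent, cost in USD).
results = {
    "Qwen 3.7-Max": (56, 1.32),
    "Claude Opus 4.7": (28, 12.15),
    "GPT-5.5": (7, 2.85),
}

def cost_ratio(model, baseline="Qwen 3.7-Max"):
    """How many times more expensive `model` was than `baseline`."""
    return results[model][1] / results[baseline][1]

def points_per_dollar(model):
    """Percentage points of improvement per dollar (hypothetical metric)."""
    gain, cost = results[model]
    return gain / cost

for name in results:
    print(f"{name}: {cost_ratio(name):.1f}x Qwen's cost, "
          f"{points_per_dollar(name):.1f} points per dollar")
```

The ratios come out at roughly 9x and 2x, which matches the rounded multiples in the item.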

📰 Featured

LLM

TransitLM Publishes 13M Transit Route Dataset for Map-Free Planning — TransitLM releases a large-scale dataset of over 13 million transit route planning records from four Chinese cities, covering 120,845 stations and 13,666 lines, released as continual pre-training corpus and benchmark data for three evaluation tasks. Early experiments show LLMs can leverage the dataset for map-free transit route generation. Source-huggingface
Gemini 3.5 Pro Confirmed for June; GPT-5.6 Rumored — Gemini 3.5 Pro is confirmed for a June release, with GPT-5.6 rumored to follow. The item also mentions Claude Sonnet 4.8 and Claude-Code/Source-Map leaks, but official announcements are still pending. Source-twitter
DeepSeek Advances $10.29B Funding, Focuses on Open-Source AGI — DeepSeek is moving forward with a $10.29 billion financing round. Founder Liang Wenfeng commits to continuing development of open-source AI models rather than pursuing short-term commercialization, signaling a long-term AGI-oriented vision. The Bloomberg report frames this as a significant push in the open-source AI ecosystem. Source-reddit
Cursor Composer 2.5 Cheaper, Faster Than Opus 4.7 and GPT-5.5 — Cursor Composer 2.5 is significantly cheaper than Opus 4.7 in Claude Code and GPT-5.5 in Codex based on API pricing. It uses far fewer tokens and achieves faster Time per Task, averaging about 9 minutes (1.3x faster) for Composer 2.5 and 7 minutes for Composer 2.5 Fast (1.8x faster) in Coding Agent Index benchmarks. Full benchmark results are linked in the source. Source-twitter
π-Bench benchmarks proactive personal assistant agents in long-horizon workflows — The piece notes that large language model-powered personal assistant agents, such as OpenClaw, may assist users across daily tasks. A core challenge is proactive assistance when users provide underspecified requests and hidden constraints. Existing benchmarks rarely test whether agents can infer and act on these hidden intents in long-horizon workflows, a gap π-Bench aims to fill. Source-huggingface
Full Attention Becomes Sparse with Few Training Steps — Researchers argue full-attention LLMs are intrinsically sparse and can be transformed into highly sparse models with minimal adaptation. The work questions reliance on native sparse training or heuristic token eviction, aiming to improve efficiency and cost-accuracy trade-offs in long-context inference. Source-huggingface
ChromeDevTools MCP Lets AI Agents Control Live Chrome Browser — ChromeDevTools/chrome-devtools-mcp provides a Model Context Protocol (MCP) server that lets AI coding agents control and inspect a live Chrome browser. It supports performance tracing, advanced debugging (network, screenshots, console with source-mapped traces), and automated actions via Puppeteer, with a CLI option for non-MCP use. A disclaimer notes that it exposes browser content to MCP clients. Source-github
Attention LLMs: Please Read This — This item links to a post about LLMs on Hacker News. It has high visibility (709 points) and a large discussion (399 comments), indicating strong interest in the topic. The linked article is hosted at annas-archive.gl/blog/llms-txt.html. Source-hackernews
Antigravity 2.0 Leads OpenSCAD 3D LLM Benchmark — Antigravity 2.0 topped the OpenSCAD Architectural 3D LLM Benchmark. The report appears on Model Rift and is discussed on Hacker News (339 points, 131 comments), highlighting a niche benchmark at the intersection of 3D CAD and language models. Source-hackernews
BeeLlama v0.2.0 DFlash Update Boosts RTX 3090 Speed — BeeLlama v0.2.0 adds full Gemma 4 31B support with an efficient DFlash implementation and vision. It brings major performance updates for Qwen 3.6 27B, improved prefill handling and safer CUDA execution, and adds DFlash GGUFs support. Benchmarks on a single RTX 3090 show up to 164 tps for Qwen 3.6 27B and 177.8 tps for Gemma 4 31B, with reasoning off for non-chat prompts. Source-reddit
ByteShape Qwen3.6-35B-A3B quant 30% faster on 6GB VRAM — A Reddit post compares ByteShape’s CPU-quantized Qwen3.6-35B-A3B against Unsloth UD-IQ4_XS on a 6GB VRAM laptop, finding ByteShape’s CPU-5 quant ~30% faster on TG but slightly slower on PP when partially offloaded to CPU. The test uses an Asus ROG Zephyrus G14 with Ryzen 7 5800HS and RTX 3060, running Linux Mint 22.2 and llama.cpp v9203. The result highlights modest gains in quantized LLM inference under tight VRAM constraints. Source-reddit
Qwen3.6-35B-A3B: 262k Context on 8GB GPU, +30TPS — A Reddit post shows Qwen3.6-35B-A3B running with 8GB VRAM using a Mixture of Experts design, keeping active layers around 3GB and KV cache near 2.5GB. Context lengths up to 262k are demonstrated, with potential extensions to 320k–1M, though performance slows after ~150k; larger GPUs may require smaller contexts for better TPS. The author discusses quants like APEX-I-Quality and Q4_K_XL and notes engine parameter tweaks can impact throughput. Source-reddit
Qwen-27B IQ4 KS for ik_llama.cpp on 16GB NVIDIA GPUs — A KS/KSS quantization of Qwen-27B tailored for 16GB VRAM NVIDIA GPUs yields a 14.1GB model compatible with ik_llama.cpp. In testing, it runs 1.5x-1.75x faster and closely matches the prior 14.7GB IQ4_XS quantization, with a 105k context window enabled by a Q4_0 Hadamard KV cache. The setup is CUDA/CPU-only and not available on AMD or Apple Silicon. Source-reddit
Qwen3.6 27B Pure Quant: 40 tok/s on 16 GB VRAM — A Reddit post reports that Qwen3.6-27B can be quantized to Q4_K_M GGUF and run on a 16 GB RTX 5060 Ti. The experiment yields ~40 tokens per second (tok/s) for some setups, with two variants (MTP and non-MTP) around 15.1–15.4 GB and instructions to run with llama.cpp. The model is available on Hugging Face with the provided URL. Source-reddit
SupraLabs Releases Supra-50M, a 50M-Parameter LLM — SupraLabs unveiled Supra-50M, a compact 50M-parameter causal language model available in BASE and INSTRUCT variants, built from scratch with a Llama-style architecture and trained on 20 billion tokens. The model delivers competitive results on several benchmarks (BLiMP, SciQ, ARC-Easy, PIQA) despite its small size, and SupraLabs outlines future plans for Supra-124M and Supra-350M variants. Source-reddit
Llama.cpp Fork Adds Experimental MoE Support for 12GB VRAM GPUs — An experimental fork of llama.cpp adds a Mixture-of-Experts (MoE) implementation aimed at GPUs with limited VRAM, such as a 12GB RTX 2060. The author notes that early layers need to stay in VRAM for efficiency and is exploring CPU offload strategies, with a UI to monitor active experts. They report about 22 tokens per second on their setup. Source-reddit
UD-Q5_K_M Wins Qwen3-Coder Quantization Shootout — An enthusiast ran a quantization shootout on Qwen3-Coder-Next using four formats (MXFP4_MOE, Q4_K_M, Q5_K_M, UD-Q5_K_M) on 3×R9700 PRO GPUs with llama.cpp Vulkan and WikiText-2 evaluation. UD-Q5_K_M outperformed all others on top-1 accuracy and KL divergence, though it was about 10 GB larger in file size; it achieved 94.0% top-1 and 0.0217 mean KL, versus 89.4-93.0% and higher KL. The author notes Unsloth’s dynamic precision approach is promising and suggests testing at lower quantizations. Source-reddit
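For readers unfamiliar with the two metrics in the quantization shootout, here is a minimal sketch of how they can be computed. It assumes "top-1" means agreement with the unquantized reference model's most likely token (the post may define it differently). The distributions are made-up toy numbers, and the helper names are hypothetical rather than llama.cpp functions.

```python
import math

def kl_divergence(p, q, eps=1e-12):
    """KL(p || q) in nats for two distributions over the same vocabulary."""
    return sum(pi * math.log(pi / max(qi, eps)) for pi, qi in zip(p, q) if pi > 0)

def argmax(dist):
    """Index of the most likely token."""
    return max(range(len(dist)), key=dist.__getitem__)

def top1_agreement(ref_dists, quant_dists):
    """Fraction of positions where both models rank the same token first."""
    same = sum(argmax(p) == argmax(q) for p, q in zip(ref_dists, quant_dists))
    return same / len(ref_dists)

def mean_kl(ref_dists, quant_dists):
    """Average per-position KL divergence from reference to quantized model."""
    total = sum(kl_divergence(p, q) for p, q in zip(ref_dists, quant_dists))
    return total / len(ref_dists)

# Toy next-token distributions over a 3-token vocabulary (invented values).
reference = [[0.7, 0.2, 0.1], [0.1, 0.8, 0.1], [0.3, 0.3, 0.4]]
quantized = [[0.6, 0.3, 0.1], [0.2, 0.7, 0.1], [0.4, 0.3, 0.3]]

print(f"top-1 agreement: {top1_agreement(reference, quantized):.2f}")
print(f"mean KL: {mean_kl(reference, quantized):.4f}")
```

Lower mean KL and higher top-1 agreement both indicate that the quantized model tracks the reference more closely, which is the sense in which one format can "win" such a comparison.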

MLLM

Grounded Personality Reasoning for Multimodal LLMs — Researchers challenge current MLLM benchmarks that only predict Big Five traits and introduce Grounded Personality Reasoning (GPR), a new task requiring models to infer personality from grounded behavioral cues rather than superficial patterns. The work aims to distinguish genuine behavioral understanding from prejudice, offering three contributions to formalize GPR and advance evaluation of personality perception in multimodal AI. Source-huggingface

RL

DelTA: Discriminative Token Credit Assignment for RLVR — Researchers introduce DelTA, a discriminative perspective on reinforcement learning from verifiable rewards (RLVR) for large language models. They show that policy-gradient updates implicitly act as a linear discriminator over token-gradient vectors, clarifying how response-level rewards influence token probabilities. This framework aims to improve understanding and design of RL-based reasoning enhancements in LLMs. Source-huggingface
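One way to read the "linear discriminator" claim is through a first-order identity: after a policy-gradient step, the change in a token's log-probability is approximately the dot product of its gradient vector with the update direction. The sketch below illustrates only that identity on a toy softmax policy. It is not DelTA's algorithm, and the logits, samples, and advantages are invented for illustration.

```python
import math

def softmax(logits):
    m = max(logits)
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]

def token_gradient(logits, token):
    """Gradient of log p(token) with respect to the logits: one_hot - p."""
    probs = softmax(logits)
    return [(1.0 if i == token else 0.0) - p for i, p in enumerate(probs)]

def dot(u, v):
    return sum(a * b for a, b in zip(u, v))

logits = [0.5, -0.2, 0.1, 0.0]              # toy 4-token policy
samples = [(0, 1.0), (2, -1.0), (0, 1.0)]   # (sampled token, advantage)

# Policy-gradient direction: advantage-weighted sum of token-gradient vectors.
w = [0.0] * len(logits)
for token, adv in samples:
    g = token_gradient(logits, token)
    w = [wi + adv * gi for wi, gi in zip(w, g)]

lr = 1e-3
new_logits = [z + lr * wi for z, wi in zip(logits, w)]

for token in range(len(logits)):
    score = dot(token_gradient(logits, token), w)  # linear score for this token
    actual = math.log(softmax(new_logits)[token]) - math.log(softmax(logits)[token])
    print(f"token {token}: predicted {lr * score:+.6f}, actual {actual:+.6f}")
```

In this toy setting, the sign of the linear score decides whether a token's probability rises or falls after the update, which is the discriminator-like behavior the summary describes.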

Open Source

Models.dev: Open-source AI model specs and pricing database — Models.dev is an open-source database that catalogs AI model specifications, pricing, and capabilities. It aims to help developers compare options across providers by surfacing key details in a centralized resource. Source-hackernews
Cohere Transcribe Fine-Tune Adds Diarization and Timestamps — A Reddit post reports a fine-tuned version of Cohere Transcribe that adds speaker diarization and timestamped transcripts using a standard format. The enhancement is claimed to provide precise timestamps (average ~0.097s, 90% within 0.006s) and supports multiple speakers, with up to four per 30 seconds and up to 32 with diarize_long.py. It is released for free on Hugging Face. Source-reddit
OpenBMB Unveils BitCPM-CANN 1.58 Bit Model — OpenBMB introduces the BitCPM-CANN 1.58 Bit model. The post reports tests run on Huawei Ascend 910B hardware, and evaluation of the release appears to be ongoing. Source-reddit

AI Hardware

CODA Rewrites Transformer Blocks as GEMM-Epilogue Programs — CODA proposes rewriting transformer blocks as GEMM-epilogue programs to map transformer computations onto GEMM backends. The approach aims to improve performance and efficiency of transformer-based models on accelerator hardware by fusing operations and improving data locality. The work is presented as an arXiv preprint, highlighting a compiler-style technique for AI workloads. Source-hackernews
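To make the "GEMM epilogue" idea concrete, here is a minimal pure-Python sketch of fusing a bias add and GELU activation into a matrix multiply. It shows the general fusion pattern only and is not CODA's implementation. The `gemm` helper and its `epilogue` parameter are hypothetical names chosen for illustration.

```python
import math

def gemm(a, b, epilogue=None):
    """Multiply a (m x k) by b (k x n). If given, `epilogue(value, col)` runs on
    each output element right after accumulation, before it is stored."""
    m, k, n = len(a), len(b), len(b[0])
    out = [[0.0] * n for _ in range(m)]
    for i in range(m):
        for j in range(n):
            acc = 0.0
            for p in range(k):
                acc += a[i][p] * b[p][j]
            out[i][j] = epilogue(acc, j) if epilogue else acc
    return out

def gelu(x):
    """Exact GELU using the error function."""
    return 0.5 * x * (1.0 + math.erf(x / math.sqrt(2.0)))

x = [[1.0, -2.0], [0.5, 3.0]]
w = [[0.2, -0.1, 0.4], [0.3, 0.5, -0.6]]
bias = [0.1, 0.0, -0.2]

# Unfused: the GEMM stores its result, then a second pass adds bias and applies GELU.
unfused = [[gelu(v + bias[j]) for j, v in enumerate(row)] for row in gemm(x, w)]

# Fused: bias add and GELU run inside the epilogue, so no second pass is needed.
fused = gemm(x, w, epilogue=lambda acc, j: gelu(acc + bias[j]))

print(fused == unfused)
```

On real accelerators the benefit is that the intermediate matrix never makes a round trip through memory. The Python version only demonstrates that the fused and unfused paths compute the same values.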

AI

Cloudflare CEO on how he chooses which employees to replace with AI — An opinion piece in the Wall Street Journal where Cloudflare’s CEO explains his criteria for automating work with AI. He describes evaluating tasks versus individuals, aiming for productivity gains and ROI while considering the impact on workers and governance. Source-hackernews

⚡ Quick Bites

Codex Enables Mac Control From Phone, Even When Locked — Codex can securely operate apps on a Mac from a phone, even when the Mac is locked and the screen is off. The item cites a tweet and references a Codex page, presenting this as a cross-device capability rather than an official public update. A side remark jokes that OpenAI may have more macOS engineers than Apple. Source-twitter
Cursor SDK 2.5 Enables Python/TypeScript Agents with Composer — Cursor announces Composer 2.5 in the Cursor SDK, adding Python and TypeScript support for building custom agents. A long-weekend promo offers 90% off SDK usage to encourage experimentation, with excitement about what developers create using the new tools. Source-twitter
Transformer Becomes GEMM+Epilogue; CODA Accelerates All Ops — A mathematical rewrite suggests transformer workloads can be expressed as GEMM operations with an epilogue. CODA reparameterizes surrounding ops to hide them within the matmul path and fuse them into the epilogue, boosting on-chip throughput. The piece also claims LLMs can generate fast CODA kernels, approaching Speed-of-Light performance. Source-twitter
Auto mode updates: Now on Pro plan; Sonnet 4.6 supported alongside Opus 4.7 — Two updates to auto mode are announced. Auto mode is now available on the Pro plan, expanding access, and it now supports Sonnet 4.6 alongside Opus 4.7. The post adds a note to use Shift+Tab to run Claude. Source-twitter
Microsoft starts canceling Claude Code licenses — Microsoft has begun canceling licenses for Claude Code, Anthropic’s coding-focused AI. The move affects developers and teams relying on Claude Code for code generation and assistance. It underscores shifting licensing strategies for AI coding tools in enterprise products. Source-hackernews
AI Multiplies Existing Technical Skills — AI tools are described as multiplying the effectiveness of existing technical skills, boosting productivity and speeding up problem solving. The piece discusses how developers can leverage AI to augment coding, automation, and learning, with practical considerations for adoption. Source-hackernews
dotnet/skills: Open-source AI Coding-Agent Skills for .NET — dotnet/skills provides the .NET team’s curated core skills and custom agents to assist AI coding agents working with .NET and C#. It features a dashboard for accuracy and efficiency of contained plugins and includes areas such as data access, performance debugging, MSBuild, NuGet, and project upgrades. Source-github
Tell HN: Frustrated with AI-generated answers across platforms — One user recounts finding GitHub repos spreading malware and asking an AI what to do, only to receive unhelpful answers that later appear verbatim in a GitHub discussion. They describe business owners sending ChatGPT screenshots that have nothing to do with the task, and a Reddit DM exchange where the user realizes they are talking to an AI agent. The post expresses fatigue with AI-generated responses and a desire to interact with real people instead of automated answers. Source-hackernews
AI Is Unauthorized Plagiarism at Scale — The article argues that AI models effectively copy from existing works without proper authorization, expanding plagiarism to a larger scale. It discusses the legal and ethical implications for authors and data rights, and calls for policy reforms and clearer accountability in AI training and outputs. Source-hackernews
Gemini AI randomly dumps its system prompt — Reportedly, Google’s Gemini AI randomly dumped its system prompt, with details shared in a public gist. A Hacker News thread about the gist has 94 points and 42 comments, signaling strong community interest. The incident underscores risks around prompt leakage and security practices for AI systems. Source-hackernews
PopuLoRA: Co-Evolving LLMs for Reasoning Self-Play — PopuLoRA proposes co-evolving populations of large language models to improve reasoning through self-play. By enabling multiple LLMs to interact and learn from one another, the approach aims to develop stronger reasoning strategies and solutions. The item was shared on Hacker News, where it drew modest engagement. Source-hackernews
LLMs Treat Data Center GPUs as Optional DLC — An analysis of how large language model deployments can treat data-center GPUs as optional add-ons rather than essential resources, highlighting potential inefficiencies and elevated costs. The piece discusses implications for AI infrastructure, throughput, and cost-alignment of workloads with hardware. Source-reddit
The Model Alone Is No Longer the Product — An AI industry view circulating on Twitter argues that delivering AI products now requires more than the underlying model. The shift foregrounds the importance of data, infrastructure, safety, and user experience as part of the product, not just the model. This perspective signals a broader shift in how AI solutions are built and monetized. Source-twitter
AI-generated walls of text invade conversations — The piece examines how AI-generated long, dense blocks of text are creeping into online conversations, increasing noise and reducing readability. It discusses potential causes—from chatbots to auto-responses and platform dynamics—and considers moderation and UX implications. Source-hackernews
What problems should AI solve in the future? — A Twitter post asks followers which future AI problems they hope will be solved and suggests potential collaboration or help. It invites ideas and discussion on AI goals and impact. Source-twitter
GDB Reflects on Coding Before Codex — A post by user gdb on Twitter reflects on what coding was like before Codex. The tweet signals nostalgia for pre-AI coding workflows and acknowledges Codex’s impact on modern development. It hints at ongoing debates about AI-assisted coding. Source-twitter

Generated by AI News Agent | 2026-05-22