AI Daily — 2026-04-12

Covering 17 AI news items

🔥 Top Stories

1. MiniMax M2.7 Open Source With Commercial-Use Restrictions

MiniMax M2.7 has been released on Hugging Face with claimed state-of-the-art scores on SWE-Pro (56.22%) and Terminal Benchmark 2 (57.0%). However, the weights are published under a license that blocks commercial use, so the release does not meet the Open Source Initiative’s criteria for open source. The announcement links to a blog post and an API and flags the licensing caveats. Source-twitter

2. Nous Research releases hermes-agent-self-evolution at ICLR 2026

Nous Research has open-sourced hermes-agent-self-evolution, a framework that lets an AI agent evolve its own prompts. Built on the GEPA engine, the project was presented as an ICLR 2026 Oral and claims to use 35x less data than reinforcement learning while delivering a 20-point performance gain. The release pitches auto-evolving prompts as a replacement for hand-written system prompts, with code available on GitHub. Source-twitter

3. CHI Best Paper: what-if tool for RAG analysis

A CHI conference update announces a Best Paper on a what-if analysis tool for Retrieval-Augmented Generation (RAG). The post highlights interests in MLOps/LLMOps, data analysis, and better interfaces for human-AI collaboration, and notes upcoming recruitment of students and postdocs. Source-twitter

📰 Featured

LLM

Speculative Decoding Boosts Gemma 4 31B Throughput by About 50% — Speculative decoding with a smaller E2B draft model significantly increased Gemma 4 31B throughput in a controlled benchmark. On an RTX 5090 running Windows 11, the test paired Gemma 4 31B UD-Q4_K_XL with a Gemma 4 E2B UD-Q4_K_XL draft model (3.0GB) via a llama.cpp fork with TurboQuant KV cache, and reported speedups of around 50% on math and code generation, with strong gains on other tasks as well. Source-reddit
LazyMoE: Run 120B LLMs on 8GB RAM, No GPU — A master’s student in Germany reports running a 120B-parameter LLM on a laptop with 8GB of RAM and no GPU by combining lazy MoE expert loading, TurboQuant KV compression, and SSD streaming. The approach is shared in a Reddit post that links to a GitHub repository and invites community feedback. Source-reddit
Obliteratus: Open-source tool removes LLM censorship — An open-source tool named Obliteratus claims to remove censorship from large language models by identifying and deleting the exact weights that cause refusals. The tool is described as built into Hermes and promoted as 100% open-source, with a post by Guillermo Casaus highlighting its capabilities. The announcement raises AI-safety and jailbreak concerns about bypassing model safeguards. Source-twitter
GLM 5.1 rivals frontier models in social reasoning benchmark — GLM 5.1 is reported to be competitive with frontier models on a social reasoning benchmark built from autonomous Blood on the Clocktower games. The author notes that GLM 5.1 costs $0.92 per game versus $3.69 for Claude Opus 4.6 and claims a 0% tool error rate, though more data is needed before the results can be considered reliable. Source-reddit
FernflowerAI-35B KL-ReLU GGUF with Apple MLX Released — An open-source fix for the Qwen 3.5 35B A3B uncensored HauhauCS model has been released as FernflowerAI-35B-A3B-KL-ReLU-GGUF, using KL-ReLU calibration. The release includes Apple MLX 8-bit versions (V1 is available; V2, the final release, is coming soon) and documents a tensor issue in ssm_conv1d.weight that was fixed during the repair. Downloads and discussions are hosted on Hugging Face, with background context on Reddit. Source-reddit
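
For readers unfamiliar with the speculative decoding item above, the sketch below shows the greedy variant of the idea on toy models: a cheap draft model proposes a few tokens, and the large target model checks them, keeping the longest matching prefix plus one corrected token. This is not the llama.cpp implementation or the benchmark setup from the Reddit post. Both `target_model` and `draft_model` are hypothetical stand-ins, and the verification loop calls the target once per position where a real engine would batch those checks into a single forward pass.

```python
from typing import Callable, List, Tuple

# A "model" here is just a function from a token prefix to its greedy next token.
Model = Callable[[List[int]], int]


def target_model(prefix: List[int]) -> int:
    """Hypothetical large model: a deterministic toy rule, not a real LLM."""
    return (sum(prefix) * 31 + len(prefix)) % 50


def draft_model(prefix: List[int]) -> int:
    """Hypothetical small model: agrees with the target on most prefixes."""
    guess = target_model(prefix)
    return guess if len(prefix) % 4 else (guess + 1) % 50


def greedy_decode(model: Model, prompt: List[int], n_new: int) -> List[int]:
    """Plain autoregressive decoding, one model call per new token."""
    tokens = list(prompt)
    for _ in range(n_new):
        tokens.append(model(tokens))
    return tokens


def speculative_decode(
    target: Model, draft: Model, prompt: List[int], n_new: int, k: int = 4
) -> Tuple[List[int], int]:
    """Greedy speculative decoding; returns (tokens, number of verification rounds)."""
    tokens = list(prompt)
    goal = len(prompt) + n_new
    rounds = 0
    while len(tokens) < goal:
        # 1. The draft model proposes up to k tokens autoregressively.
        proposal: List[int] = []
        for _ in range(min(k, goal - len(tokens))):
            proposal.append(draft(tokens + proposal))

        # 2. The target verifies the proposal (one batched pass in a real engine).
        rounds += 1
        accepted: List[int] = []
        for tok in proposal:
            expected = target(tokens + accepted)
            if tok == expected:
                accepted.append(tok)
            else:
                accepted.append(expected)  # replace the first mismatch and stop
                break
        else:
            # Every drafted token matched: the target adds one more token if room remains.
            if len(tokens) + len(accepted) < goal:
                accepted.append(target(tokens + accepted))
        tokens.extend(accepted)
    return tokens, rounds


if __name__ == "__main__":
    prompt = [1, 2, 3]
    baseline = greedy_decode(target_model, prompt, 16)
    fast, rounds = speculative_decode(target_model, draft_model, prompt, 16)
    print("same output as plain greedy decoding:", fast == baseline)
    print("verification rounds for 16 new tokens:", rounds)
```

Because every kept token is one the target itself would have chosen, the output matches plain greedy decoding from the target. Any speedup comes from needing fewer target rounds than tokens, which depends on how often the draft agrees with the target.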

Open Source

Multica: Open-Source Managed AI Agents Platform — Multica provides an open-source platform that turns coding agents into autonomous teammates. Agents can be assigned work such as issues, then pick up tasks, write code, report blockers, and update statuses without manual prompting, while accumulating reusable skills over time. The platform is vendor-neutral, self-hosted, and designed for human + AI teams, and it is compatible with Claude Code, Codex, OpenClaw, and OpenCode. Source-github
Llama-server adds STT with Gemma-4 models — Llama.cpp’s llama-server now supports speech-to-text using Gemma-4 E2A and E4A models. The update expands audio processing capabilities for open-source LLM deployments, as announced on Reddit by user srigi. Source-reddit
mtmd adds qwen3 audio support for omni and ASR — The mtmd project reports audio support for Qwen3 models. Specifically, qwen3-omni-moe (vision + audio input) is working, and qwen3-asr is functional, according to a submission by /u/jacek2023. Source-reddit

⚡ Quick Bites

AI Progress Threatens Human Actors’ Jobs — A tweet argues that further AI development could leave human actors with few opportunities. The post reflects ongoing debates about AI-generated performances and the potential displacement of talent in the entertainment industry. Source-twitter
What it feels like to discover Hermes — The post highlights the experience of discovering Hermes and invites readers to reflect on it. There are no technical details about Hermes in the tweet. Source-twitter
Is TurboQuant Revolutionary or Overhyped by Big Tech? — A Reddit post asks whether TurboQuant is truly revolutionary or just another mediocre technology hyped by Google and Twitter. The author seeks a reality check on the product’s impact, novelty, and potential overhype. Source-reddit
mtmd adds Gemma 4 audio conformer encoder support — mtmd now supports audio processing for Gemma 4 models by enabling the Gemma 4 audio conformer encoder. The update was submitted by Reddit user /u/jacek2023, with a discussion linked on LocalLLaMA. Source-reddit
People Build Basic LLM Personal Assistants, Not Coding Agents — A Reddit user who has had four strokes since 2016 describes building a basic personal assistant with an LLM memory system to help with life as a disabled, homebound person. They ask whether others are building similar non-coding assistants, what those systems do, and how they are deployed. The post highlights interest in accessible, memory-enabled AI helpers for daily living. Source-reddit
Twitter Reacts to Claude: I Know That Feel — An English-language tweet references Claude, echoing a familiar sentiment with ‘I know that feel’ and ending with a ‘Wow.’ The post is a brief, casual reaction to Claude and does not present any substantive news or technical details. Source-twitter

Generated by AI News Agent | 2026-04-12