daily
May 04, 2026

AI Daily — 2026-05-04

English 中文

AI Finds 100+ Hidden Exoplanets in NASA Data · nanowhale: Tiny 100M-parameter MoE Pretrained by A...


Covering 29 AI news items

🔥 Top Stories

1. AI Finds 100+ Hidden Exoplanets in NASA Data

A team pointed an AI at NASA’s dataset covering 2.2 million stars and uncovered more than 100 previously hidden exoplanets. Some of these worlds are so extreme they challenge current theories. The finding highlights AI’s growing role in accelerating astronomical discovery. Source-twitter

2. nanowhale: Tiny 100M-parameter MoE Pretrained by AI Agent

Nanowhale is a tiny DeepSeek model fully pretrained by an AI agent (ml-intern), inspired by Karpathy’s nanochat. It trains a 100M-parameter MoE end-to-end (pretraining and post-training) and is released as open-source within the Hugging Face ecosystem. The project demonstrates an automated research loop, leveraging citations, OpenScience and NemoTron-CrossThink, and adding seven difficulty-filtered dataset variants from ARC/SciQ/MMLU with 12 SFT rounds. Source-twitter

3. n8n-MCP Enables AI access to 1,650 n8n nodes for Claude

n8n-MCP, a Model Context Protocol server by czlonkowski, bridges n8n’s workflow platform with AI models, enabling Claude and other AI assistants to access 1,650 nodes, docs, and operations. It delivers high-coverage node properties (99%), operations (63.6%), and documentation (87%), plus 265 AI-capable tools and 156 ranked templates, to enhance AI-driven automation workflows. Source-github

LLM

  • Egypt Launches Horus: First From-Scratch Open-Source LLM — Horus is Egypt’s first language model built entirely from scratch and released as open-source. The project, led by Assem Sabry and TokenAI, has its source code and model artifacts on GitHub and Hugging Face, enabling developers to explore and build on it. A forthcoming Horus 1.5 Instruct is expected to be five times better, with a 64K context length (eight times the initial 8K). Source-reddit
  • LLMSearchIndex: Open-Source Local Web Search for RAG — The post introduces LLMSearchIndex, a Python library for fully local internet-scale search intended for RAG workflows. It uses a compressed index derived from FineWeb and Wikipedia, with a ~2GB full index that runs locally on typical hardware, and provides an easy API to retrieve results. A demo and usage example are linked. Source-reddit
  • DeepSeek-TUI: Terminal Coding Agent for V4 Models — DeepSeek-TUI is a terminal-native coding agent built around DeepSeek V4 with a 1M-token context. It ships as a single binary (no Node/Python required) and includes an MCP client, sandbox, and durable task queue, enabling direct workspace access, file editing, shell commands, web search, Git management, and sub-agent orchestration via a keyboard-driven TUI. It also features native thinking-mode streaming to show the model’s reasoning in real time. Source-github
  • llama.cpp MTP support enters beta, narrowing gaps with vLLM — llama.cpp now has MTP support in beta, starting with Qwen3.5 MTP and with more models expected soon. The update suggests the feature could merge soon and, along with improving tensor-parallel support, may erase most token-generation speed gaps between llama.cpp and vLLM. Source-reddit
  • FastDMS Delivers 6.4x KV-Cache Compression, Faster than vLLM — FastDMS is an MIT-licensed implementation of Dynamic Memory Sparsification (DMS) with compact KV storage that reclaims evicted slots. In a rough replication on Llama 3.2 1B with WikiText-2, it achieves 6.4x KV-cache compression with essentially lossless quality and faster performance than a HuggingFace reference implementation; the work was tested on NVIDIA’s Qwen 3 8B. The project aims to offer an open-source drop-in for efficient LLM serving. Source-reddit
  • APEX MoE Quants Add 25+ Models, Introduce I-Nano Tier — APEX’s MoE-aware mixed-precision quantization has expanded to 30+ MoEs across major families, adding 25+ new models since the Qwen 3.5 post and introducing a new ultra-compressed I-Nano tier. Early feedback reports that long context and coherence survive longer than expected, aided by high-precision shared experts and edge layers. The update highlights strong KL99% metrics and notes Qwen 3.6 35B-A3B users are among those adopting the approach. Source-reddit
  • Open-source AI may beat pricey LLMs, says Reddit user — A Reddit user recounts steep costs using proprietary LLMs via Cursor and Claude, noting $10 for two prompts on gpt-5.5 and claude-opus-4.6-thinking, and about $80 in a week with claude-opus-4.7 even with discounts. They argue that continued high pricing will push users toward open-source models that cost five to ten times less, possibly by year’s end. Source-reddit

AI Safety

  • AI development analysis predicts 60% recursive self-improvement by 2028 — Twitter analyst Jack Clark claims that after reviewing hundreds of public data sources on AI development, there is a 60% chance of recursive self-improvement by the end of 2028. If true, AI systems could soon be capable of building themselves. Source-twitter
  • White House Considers Vetting AI Models Before Release — The White House is reportedly considering a vetting process for AI models before they are released. The move aims to improve safety and accountability, though no criteria or timeline are specified. Source-reddit

Multimodal

  • Woodblock Print Gradient Descent Demonstrates AI Image Capabilities — A tweet explores a creative AI prompt blending woodblock print aesthetics with gradient descent. It also references ChatGPT Images 2.0, highlighting playful demonstrations of AI image generation on social media. Source-twitter
  • UniVidX Unifies Multimodal Video Generation via Diffusion Priors — UniVidX introduces a unified multimodal framework that uses video diffusion model priors to enable versatile video generation across tasks. It reframes pixel-aligned tasks as conditional generation in a shared multimodal space to better capture cross-modal correlations. Source-huggingface

RL

  • Co-Evolving Policy Distillation Improves Multi-Expert Models — The paper analyzes RLVR and OPD post-training paradigms for consolidating multiple expert capabilities into a single model. It identifies two failure modes: mixed RLVR causes inter-capability divergence, while the two-stage approach (train experts then perform OPD) avoids divergence but struggles to absorb teacher capabilities due to large behavioral gaps. The work offers a unified view of when each approach is advantageous. Source-huggingface

Hardware

  • Ryzen AI Max+ 495 Leak: 192GB VRAM in Halo APU — A leak suggests AMD’s Ryzen AI Max+ PRO 495, codenamed Gorgon Halo, will feature a Halo APU with 192GB of VRAM. The post also speculates about Medusa Halo potentially reaching 256GB by 2027, highlighting a growing emphasis on Local AI hardware. The source is a Reddit submission, pointing to expensive future hardware amid a storage crisis. Source-reddit

AI

  • TinyMozart v2 85M Released: Open-Source MIDI Piano Generator — An open-source model TinyMozart v2 85M for unconditional MIDI piano generation, adding chords and longer note lengths compared to v1. The release by LH-Tech_AI on HuggingFace invites feedback from the r/LocalLLaMA community. Source-reddit

Open Source

  • Live demo: LocalVQE 1M-param audio model cancels echo in realtime — A Reddit post highlights a live demonstration of LocalVQE, a tiny ~1 million parameter audio model designed to cancel echo and background noise in real time. The demo suggests potential improvements for real-time voice clarity in communications, powered by a compact, open-source architecture. The post was submitted to the LocalLLaMA community by user /u/richiejp with a link to the demo. Source-reddit

⚡ Quick Bites

  • AI prompt adopts all-domain expert persona with provocative tone. — An AI system prompt circulating on X instructs the model to act as a world-class expert in all domains, provide step-by-step reasoning, verify facts, and avoid disclaimers. It also calls for a provocative, non-polite voice and forgoing ethics remarks unless asked. The post illustrates how such prompts shape AI behavior and user expectations. Source-twitter
  • Deep Learning with Python 3rd Edition Free Online with Generative AI Coverage — Francois Chollet’s Deep Learning with Python is now freely available to read online at deeplearningwithpython.io. The third edition adds comprehensive coverage of generative AI and modern deep learning frameworks, continuing the book’s mission to teach deep learning from scratch. The work has historically helped tens of thousands start careers, with 120,000 copies sold and millions downloaded. Source-twitter
  • Copilot Billing Sparks Token Flood, User Claims Exploitative Plan — An X user reports that a single Copilot message allegedly generated tens of millions of tokens, despite a plan reportedly allowing 1,500 messages regardless of token cost. The post criticizes the billing model for unpredictability and potential abuse under a $40 plan. The discussion highlights concerns about token-based usage and cost visibility in Copilot. Source-twitter
  • Hugging Face model visualizer lets you explore models by URL — A new Hugging Face tool lets users plug in a model URL to visualize and inspect models at multiple granularities. It supports interactive exploration and HLS playback for model visualization. Source-twitter
  • Sydney Sweeney open-sources app built entirely with ChatGPT — Actress Sydney Sweeney has open-sourced an app she reportedly built using only ChatGPT. The release highlights AI-assisted development and a celebrity-led open-source project. The snippet provides limited details about where the source is hosted. Source-twitter
  • Perplexity Computer Now Available in Microsoft Teams — Perplexity announces Perplexity Computer is now available as a Microsoft Teams integration, enabling users to run research, analysis, and document creation directly in their Teams workspace. The integration brings the full capabilities of Perplexity Computer to Teams, including HLS playback for media-ready workflows. Source-twitter
  • Intern-Atlas: Evolution Graph for AI Research Infrastructure — The piece proposes Intern-Atlas, a methodological evolution graph intended as research infrastructure for AI scientists. It argues current systems are document-centric and fail to encode explicit relationships showing how research methods emerge and influence one another, a gap that grows as AI-driven research agents rely on structured knowledge. Source-huggingface
  • Qwen 3.6 27b spots critical bug; GPT-5.5 and Claude admit — An LLM named Qwen 3.6 27b reportedly discovers a critical bug that frontier models Codex GPT 5.5 and Claude Opus 4.7 miss. After reviewing evidence, GPT-5.5 and Claude admit the bug, while the author notes Qwen’s deep reasoning helped uncover it. The post also contrasts GPT-5.5’s speed with potential tradeoffs and highlights the value of careful analysis. Source-reddit
  • Roundtable Chat: Talkie-1930 and Gemma 4 31B — A Reddit post features a roundtable chat between Talkie-1930 (13B vintage language model) and Gemma 4 (31B). It provides links to Talkie’s introduction and a hosted chat option for running both models locally. Source-reddit
  • Anthropic: Should Claude be time-aware via dated messages? — A Twitter discussion questions why Anthropic doesn’t inject detailed dates into user messages to give Claude a sense of time progression. The author argues it can’t be a caching issue since every user message is cached and suggests a daily date might be added to the system prompt instead. The exchange highlights potential approaches to adding temporal context to LLMs. Source-twitter
  • Next Wave of Codex Plugins: What Are You Missing? — The post solicits ideas for the next generation of Codex plugins and asks followers what features or integrations are missing today. It signals interest in expanding Codex’s plugin ecosystem and collecting community feedback on gaps. Source-twitter

Generated by AI News Agent | 2026-05-04