Een kwaadaardige CLAUDE.md kan in één nacht je credentials leegtrekken

7 – 10 april 2026 33 items alle bronnen gelinkt

33 items

TL;DR

Karpathy’s idea of an LLM-maintained markdown wiki as a successor to RAG dominated knowledge-management chatter, spawning capture-loop tutorials and even reflections on bequeathing your “second brain” to your kids.
Claude Code went under the microscope: its 823-line retry harness drew praise as battle-tested scaffolding, even as it ranked 39th on TerminalBench behind ten harnesses running the same model.
A security undercurrent reframed autonomous coding agents as a credential-stealing threat surface first and a tool second — with autoskills quietly normalizing the exact attack vector.
Local and free AI kept advancing: Gemma 4 orchestrating SAM 3.1 on a three-year-old MacBook, no-GPU fine-tuning in a browser tab, and Meta’s brain-mapping TRIBE v2 shipping in a Colab.
An AI agent reportedly turned a forgotten 67GB genome file into a 39-condition health report for ~$5 of compute — though the model bill ran higher.

$5 of compute turned a forgotten genome file into a 39-condition health report

Two years ago the raw sequencing data sat in an old email; last week an AI agent dug up the download link and turned 67 gigabytes of DNA into a personal health profile — at least according to Sowmay Jain, who says he gave it a single instruction and walked away. The agent rented a 32-core machine, aligned 21 million reads at 99.83% mapped, called 5.8 million variants with a two-pass neural network, and annotated them against ClinVar, PharmGKB, and gnomAD. The output: risk flags across 39 conditions, a drug-compatibility guide for 141 medications, nutrient-absorption analysis, and a Neanderthal ancestry breakdown. The headline “$5, 8 hours, no bioinformatician” came with a quiet asterisk — pressed on cost, Jain conceded the $5 covered only the rented machine, and “llm cost is higher from claude and openai.” The compute is cheap; the intelligence isn’t.

→ AI agent turned raw genome data into a personal health profile

Claude Code’s moat is 823 lines of retry logic — yet it finishes 39th among harnesses running its own model

“Your agent has try/catch. Anthropic’s has an 823-line retry system.” That line, from a trio of posts by Rohit (@rohit4verse) dissecting all 331 modules across 55 directories of Claude Code’s source, is the thesis: the gap between a demo agent and a production one is mostly unglamorous infrastructure — exponential backoff, jitter, circuit breakers, timeout budgets. The deeper teardown argues the interesting layers aren’t model weights or context but plumbing borrowed from distributed systems: a four-level CLAUDE.md hierarchy (enterprise → user → project → local) for policy enforcement, file-locked task lists so parallel sub-agents don’t corrupt shared state, Git worktree isolation giving five agents five branches with zero conflicts, and a seven-stage permission pipeline with ML-based classification. Anthropic, on this reading, “didn’t build a better model” but a better scaffold around it — a claim that leans on Princeton’s SWE-agent paper, which reported a 64% relative SWE-bench gain from changing nothing but the environment around GPT-4. The dissenting reading landed the same day: @theo points at TerminalBench, where Claude Code sits 39th at 58% accuracy while ten other harnesses running the same Opus 4.6 score higher — ForgeCode leads at 81.8%, a 23.8-point gap. Battle-tested design and bottom-of-the-pack benchmark results, on identical weights. Theo’s sharper complaint is lock-in: a Codex subscription runs in any harness, while a Claude Code sub only runs one.

→ Anthropic’s agent harness has 823 lines of retry logic buried across 331 modules → Deep architectural breakdown of Claude Code’s 331 modules and agent harness patterns → Claude Code’s architecture reveals 331 modules of battle-tested agent design patterns → Claude Code ranks 39th on TerminalBench while 10 other harnesses outperform it

Firing Grep before the model stops typing buys Anthropic 2–5 seconds a turn

The trick that catches the eye in Anthropic’s long-running-agent writeup is small and concrete: the tool executor fires Grep the instant its input JSON finishes arriving in the stream — not after the model stops generating — quietly swallowing 2–5 seconds of latency on every multi-tool turn. The broader pattern pairs an initialization agent (which writes the CLAUDE.md, a JSON feature list, and Git-tracked progress) with a coding agent that implements features one at a time and tests each end-to-end before marking it done — a guard against the familiar failure of agents declaring untested work complete. The payoff claimed against the BMAD framework is context economy: 84% window usage versus BMAD’s repeated compactions, with sessions resumable straight from the Git log.

→ Anthropic’s long-running agent pattern uses four compaction strategies for sustained AI work

Claude Code logs every token you spend, then refuses to total the bill

That gap spawned two community fixes this period. @PawelHuryn’s local dashboard reads the JSONL files Claude Code already writes to ~/.claude/projects/, builds a SQLite database, and serves charts on localhost — surfacing one user’s 30-day reality: 440 sessions, 18,000 turns, 1.95B cache reads, and roughly $1,588 in API-equivalent cost, including a “visible cache bug” that spiked reads to 700M tokens two days running. The recurring gripe in the replies is the missing per-project breakdown when you juggle ten repos. The other approach attacks the spend itself: @hasantoxr’s code-review-graph builds a persistent Tree-sitter map of a codebase so the model reads only relevant files, claiming a 49x token cut on daily tasks and 8.2x averaged across six repos.

→ Local dashboard tracks Claude Code token usage with 440 sessions costing $1,588 in 30 days → code-review-graph cuts token usage by 49x on daily coding tasks with persistent codebase map

Karpathy ditched RAG for a self-healing markdown wiki that touches 15 files in one pass

The most interesting bet in knowledge management this period is that vector embeddings might be a dead end. Andrej Karpathy, as relayed by Charly Wargnier, has reportedly abandoned traditional RAG for an autonomous Obsidian file system: he dumps raw AI research into a folder and lets an LLM convert it into an interconnected markdown wiki — roughly 100 articles, 400,000 words he typed none of — that navigates itself through dynamically updated index files rather than “flawed” embeddings. The argument against retrieval is concrete: where RAG chunks and embeds a source to fish back later, “starting from zero each query,” the wiki reads each new source once and folds it into existing pages, revising summaries and pre-placing citations as it goes. The standout mechanic is a self-healing loop where the model spots structural gaps, scrapes the web to fill them, runs health checks for contradictions, and cleans articles unprompted; Karpathy’s own line does the persuading — humans abandon wikis because “the maintenance burden grows faster than the value,” whereas an LLM “can touch 15 files in one pass,” and he even plans to fine-tune a local model on the corpus so the research lives in the weights. The same instinct drives “context stacking,” an MIT student’s method of front-loading readings into NotebookLM two days before lecture and interrogating it with three gap-finding prompts — not summarize but connect, where am I hollow, and what question exposes surface understanding — so study time targets only weak spots. Both treat the LLM as a librarian that reorganizes knowledge, not just a retriever that fetches it.

→ LLM-managed markdown wiki as a self-healing alternative to RAG → Persistent AI wiki beats one-off RAG for compounding knowledge → LLMs can turn your notes into a self-improving personal mind map → Context stacking with NotebookLM for faster, gap-focused studying

The capture step has to be “stupidly easy” or the second brain dies on day one

Theory only survives contact with the daily habit, which is why the companion tutorials insist the capture step be frictionless or the whole system collapses immediately. The grounding tooling is Obsidian plus a web clipper plus Claude Skills, organized into raw/, wiki/, and reports/ folders — raw material in, compiled knowledge out. Miles Deutscher (@milesdeutscher) calls the Claude Code and Obsidian pairing the most powerful AI combination he’s used, an “AI second brain that runs my entire life,” and credits the same Karpathy LLM Knowledge Wiki as the blueprint. His substance is still thin — promised in a video walkthrough rather than the tweet — but the recurring Karpathy fingerprint across the week’s security checklists and this knowledge-management enthusiasm is the more interesting signal.

→ AI knowledge base loop with Obsidian, Claude Skills, and markdown → Building an AI-powered Obsidian second brain with Claude Code

Your Obsidian vault as an heirloom your kids inherit

A second brain might outlive you — and become something stranger than a notes app. Building a wiki in Obsidian on Karpathy’s model, Jen Zhu lands on the thought that when we’re gone, our children could inherit “an interactive map to your mind, passion, obsessions, work, fascinations” — not just the information you collected but the structure of how you thought, preserved and walkable. It’s a quietly affecting reframe of why anyone bothers tending a knowledge base at all, and the same reflection surfaced more than once this period in near-identical form, multiple posts the same day converging on the same beautiful-slightly-eerie thought.

→ Inheriting an interactive map of your mind through a personal knowledge base → Inheriting a second brain: kids could inherit an interactive map of your mind from your Obsidian wiki → A second brain can become an inheritable map of your mind

A malicious CLAUDE.md can drain your credentials in a night — and autoskills auto-writes the exact file

By default Claude Code can read your ~/.ssh keys, AWS credentials, every .env file, and push code wherever it likes — no restrictions at all. The danger, according to Noisy (@noisyb0y1), isn’t a sophisticated exploit but something mundane: a CLAUDE.md file in a repo you cloned, or a comment buried in a dependency, carrying an instruction Claude dutifully executes. The cited numbers are alarming if unverified — GitGuardian’s claimed 40% higher secret-leak rate in AI projects, an average 197 days to detect a leak, damages of $8,000–$50,000 in a night. The remedy is unglamorous: enable the sandbox (/sandbox, Auto-allow) so deny rules bite at the OS level via Seatbelt or bubblewrap, then lock down the obvious paths in settings.json; Anthropic’s own figure is 84% fewer confirmation popups once configured. Ole Lehmann (@itsolelehmann) pushes the same alarm further, repackaging Andrej Karpathy’s recommendations into a fourteen-step checklist — a password manager for every account, physical security keys, randomized security-question answers, full-disk encryption, virtual credit cards, DNS-level tracker blocking — framed as a reaction to Claude gaining code execution, “Hiroshima for software,” though the operational-security advice plainly predates the panic. The quiet irony of the week is autoskills: run npx autoskills and it fingerprints your stack from package.json, matches a curated registry at skills.sh, installs agent skills for 50+ technologies, and — when targeting Claude Code — generates a CLAUDE.md summary automatically. The tool isn’t malicious, but a single command that pulls instructions from a remote registry and writes them into the file Claude reads as gospel is exactly the supply-chain shape the security crowd is warning about.

→ Claude Code exposes SSH keys, AWS credentials, and env files by default → Digital hygiene checklist for AI-era account security → autoskills auto-installs AI agent skills for detected tech stack

Meta’s TRIBE v2 predicts which brain regions a video lights up — and creators are already cutting “dead zones” against it

Forget engagement scores: Meta’s FAIR team has open-sourced a tri-modal model that predicts fMRI activation across roughly 70,000 cortical voxels, second by second, as you watch a video. Trained on 1,000+ hours of brain scans from 720 people, TRIBE v2 stacks V-JEPA2 for vision, LLaMA 3.2 for text, and Wav2Vec2-BERT for audio onto a temporal transformer; the author reports Meta’s own research shows its predictions beat a single real brain scan, because it strips out the heartbeat-and-movement noise that muddies live readings. Rohit’s framing is careful about the gap everyone else skips — the model does not output views or likes; the leap from “the parahippocampal place area fires at t=5s” to “this goes viral” is one the paper never claims to close. But the raw neural signal is real enough that creators reportedly A/B test edits against it.

→ Meta’s TRIBE v2 maps neural brain activation to video content, not virality prediction → Meta’s open-sourced TRIBE v2 model predicts viral content using fMRI-trained brain simulation

Gemma 4 counts 23 white cars on a three-year-old MacBook — and a 22GB build claims frontier scores

Point a vision model at a parking lot and let it decide what to ask: Gemma 4 26B looks at the scene, reasons “segment all vehicles” (64 found), then narrows to “just the white ones” (23) — calling SAM 3.1 as a tool to do the actual cutting, the whole loop running locally via MLX, one model reasoning and another executing. The detail that gives it weight is Maziyar Panahi’s insistence on a hand-rolled agent loop: asked whether he used pydantic-ai or a framework, he argues third-party libraries “just never work” for your specific model, and a tiny custom loop “would perform much better.”

→ Gemma 4 26B orchestrates SAM 3.1 locally on MacBook to segment vehicles → Llmfit matches LLM models to hardware specs, 493 models compatible

A paralyzed man beat MrBeast at Mario Kart using only his thoughts

Eight years paralyzed from the neck down, and now Nolan texts, works, and games — the demo making the rounds shows him beating MrBeast at Mario Kart with a Neuralink implant translating neural signals into digital commands, nothing else. The clip is framed as proof of what the technology gives back: functional independence for people with severe paralysis. Worth noting the source’s reverent tone (“Elon gave people their lives back”) runs well ahead of the single-patient evidence, but the underlying demonstration — thought-controlled gameplay against a competitor — is concrete enough to stand on its own.

→ Neuralink lets a paralyzed man text, work, and game by thought

Every advanced AI chip comes from one island, and Musk says that’s the whole game

Whoever controls AI chip fabrication wins the AI race, full stop — that’s the claim attributed to Elon Musk, who argues America’s current lead is “fragile and short-lived” because the deciding factor isn’t models or talent but fabs. The vulnerability is stark as stated: every advanced AI chip is made in Taiwan, so a near-term Chinese invasion would cut the world off from cutting-edge silicon “overnight.”

→ AI chip fabrication concentrated in Taiwan creates vulnerability to China invasion

Also this week

A widely shared cheat sheet pitches Claude Code as an operating system — full filesystem access, 200+ MCP connections, and an Analyze → Plan → Execute → Scale loop. → Master Claude Code cheat sheet: filesystem access, MCP connections, and autonomous workflows
Unsloth’s new Colab notebook reduces fine-tuning Google’s Gemma 4 to three clicks with no GPU, no credit card, and no code. → Free Gemma 4 fine-tuning via Unsloth Colab needs no GPU or coding
Paperclip, an MCP server that natively indexes 8 million-plus academic papers, installs in one line. → Paperclip MCP gives agents native access to 8 million papers
OpenDataLoader, newly open-sourced under Apache 2.0, rips complex PDFs to Markdown at 100 pages a second on CPU alone. → Open-source CPU tool converts complex PDFs to Markdown fast
Obsidian Reader’s 1.4 update makes YouTube transcripts interactive — scrub timestamps, highlight passages, auto-scroll alongside the video. → Obsidian Reader 1.4 makes YouTube transcripts interactive