Research Papers
A curated digest of recent prompt-engineering, agentic, and AI-security research. Each paper: a 3-sentence TL;DR, why it matters for practitioners, and how to put it to work.
2 papers in Development
- Development Oct 2023 arXiv: 2310.06770
SWE-bench: Can Language Models Resolve Real-World GitHub Issues?
Benchmark that tests whether an LLM can take a real GitHub issue and a full repository and produce a passing code change. Solve rates climbed from about 2% in 2023 to over 70% by 2025, making it the clearest quantitative record of agentic coding progress we have.
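The benchmark's scoring idea can be sketched in a few lines. This is a hedged illustration, not the real harness: the function name `is_resolved` and the dict-based inputs are assumptions, but the criterion itself mirrors SWE-bench's FAIL_TO_PASS / PASS_TO_PASS test sets.

```python
def is_resolved(fail_to_pass: dict[str, bool], pass_to_pass: dict[str, bool]) -> bool:
    """Sketch of the SWE-bench resolution criterion.

    fail_to_pass: tests that failed before the model's patch; they must
                  all pass afterwards (the issue is actually fixed).
    pass_to_pass: tests that passed before the patch; they must still
                  pass (the patch introduced no regressions).
    """
    return all(fail_to_pass.values()) and all(pass_to_pass.values())

# A patch that fixes the issue but breaks an unrelated test does not count.
resolved = is_resolved({"test_issue": True}, {"test_other": True})      # True
regressed = is_resolved({"test_issue": True}, {"test_other": False})    # False
```

The all-or-nothing criterion is what makes the metric strict: partial fixes and regressions both score zero.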
- Development Jun 2021 arXiv: 2106.09685
LoRA: Low-Rank Adaptation of Large Language Models
Fine-tune a giant model by training tiny adapter matrices alongside it, leaving the base weights frozen. This cuts training memory by 10-100x, lets you serve one base model with many LoRAs for different tasks, and runs on a single consumer GPU.
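The core trick fits in a few lines of NumPy. This is a minimal sketch under assumed shapes (the dimensions and scaling constant here are illustrative, not from the paper's experiments): the frozen weight W is augmented with a trainable low-rank product B @ A, so only r * (d_in + d_out) parameters train instead of d_in * d_out.

```python
import numpy as np

d_out, d_in, r = 512, 512, 8        # illustrative sizes; rank r << d
rng = np.random.default_rng(0)

W = rng.standard_normal((d_out, d_in))        # frozen base weight, never updated
A = rng.standard_normal((r, d_in)) * 0.01     # trainable down-projection
B = np.zeros((d_out, r))                      # trainable up-projection, init zero
alpha = 16                                    # LoRA scaling hyperparameter

x = rng.standard_normal(d_in)
h = W @ x + (alpha / r) * (B @ (A @ x))       # base output plus low-rank delta

# Because B starts at zero, the adapted layer initially matches the
# frozen base exactly -- fine-tuning starts from the pretrained model.
base_params = W.size                           # 262,144
lora_params = A.size + B.size                  # 8,192
```

Swapping tasks means swapping only A and B, which is why one hosted base model can serve many LoRAs.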
