Research Papers
A curated digest of recent prompt-engineering, agentic, and AI-security research. Each paper: a 3-sentence TL;DR, why it matters for practitioners, and how to put it to work.
2 papers in Development
- Development Oct 2023 arXiv: 2310.06770
SWE-bench: Can Language Models Resolve Real-World GitHub Issues?
Benchmark that tests whether an LLM can take a real GitHub issue and a full repository and produce a passing code change. Solve rates climbed from about 2% in 2023 to over 70% by 2025, making it the clearest quantitative record of agentic coding progress we have.
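The benchmark's scoring idea can be sketched in a few lines. This is a hedged illustration, not the real harness: the function name `is_resolved` and the dict-based inputs are assumptions, but the criterion itself mirrors SWE-bench's FAIL_TO_PASS / PASS_TO_PASS test sets.

```python
def is_resolved(fail_to_pass: dict[str, bool], pass_to_pass: dict[str, bool]) -> bool:
    """Sketch of the SWE-bench resolution criterion.

    fail_to_pass: tests that failed before the model's patch; they must
                  all pass afterwards (the issue is actually fixed).
    pass_to_pass: tests that passed before the patch; they must still
                  pass (the patch introduced no regressions).
    """
    return all(fail_to_pass.values()) and all(pass_to_pass.values())

# A patch that fixes the issue but breaks an unrelated test does not count.
resolved = is_resolved({"test_issue": True}, {"test_other": True})      # True
regressed = is_resolved({"test_issue": True}, {"test_other": False})    # False
```

The all-or-nothing criterion is what makes the metric strict: partial fixes and regressions both score zero.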
- Development Jun 2021 arXiv: 2106.09685
LoRA: Low-Rank Adaptation of Large Language Models
Fine-tune a giant model by training tiny adapter matrices alongside it, leaving the base weights frozen. This cuts training memory by 10-100x, lets you serve one base model with many LoRAs for different tasks, and runs on a single consumer GPU.
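The core trick fits in a few lines of NumPy. This is a minimal sketch under assumed shapes (the dimensions and scaling constant here are illustrative, not from the paper's experiments): the frozen weight W is augmented with a trainable low-rank product B @ A, so only r * (d_in + d_out) parameters train instead of d_in * d_out.

```python
import numpy as np

d_out, d_in, r = 512, 512, 8        # illustrative sizes; rank r << d
rng = np.random.default_rng(0)

W = rng.standard_normal((d_out, d_in))        # frozen base weight, never updated
A = rng.standard_normal((r, d_in)) * 0.01     # trainable down-projection
B = np.zeros((d_out, r))                      # trainable up-projection, init zero
alpha = 16                                    # LoRA scaling hyperparameter

x = rng.standard_normal(d_in)
h = W @ x + (alpha / r) * (B @ (A @ x))       # base output plus low-rank delta

# Because B starts at zero, the adapted layer initially matches the
# frozen base exactly -- fine-tuning starts from the pretrained model.
base_params = W.size                           # 262,144
lora_params = A.size + B.size                  # 8,192
```

Swapping tasks means swapping only A and B, which is why one hosted base model can serve many LoRAs.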
