Article-Journal

The Reasoning-Memorization Interplay in Language Models Is Mediated by a Single Direction

Jan 1, 2025

Infinite Mixture Chaining: An Efficiency-Based Framework for the Dynamic Construction of Word Meaning

Jan 1, 2025

Calibrating Verbal Uncertainty as a Linear Feature to Reduce Hallucinations

Jan 1, 2025

Robust LLM safeguarding via refusal feature adversarial training

Jan 1, 2024

Mechanistic understanding and mitigation of language model non-factual hallucinations

Jan 1, 2024

Mechanisms of non-factual hallucinations in language models

Jan 1, 2024

Intrinsic evaluation of unlearning using parametric knowledge traces

Jan 1, 2024

Geometric Signatures of Compositionality Across a Language Model's Lifetime

Jan 1, 2024

Functional faithfulness in the wild: Circuit discovery with differentiable computation graph pruning

Jan 1, 2024

Emergence of a high-dimensional abstraction phase in language transformers

Jan 1, 2024