Understanding and Controlling Repetition Neurons and Induction Heads in In-Context Learning, Nhi Hoai Doan, Tatsuya Hiraoka, Kentaro Inui, AACL2025 Main. 📄 Paper
This study explores how large language models’ ability to recognize repetitive patterns relates to their in-context learning (ICL) performance. Focusing on repetition neurons and induction heads, the authors find that the neurons’ influence depends on layer depth and identify ways to reduce repetitive outputs while preserving strong ICL abilities.
Can Language Models Handle a Non-Gregorian Calendar?, Mutsumi Sasaki, Go Kamoda, Ryosuke Takahashi, Kosuke Sato, Kentaro Inui, Keisuke Sakaguchi, Benjamin Heinzerling, AACL2025 Main. 📄 Paper
This study evaluates how well language models handle the Japanese (non-Gregorian) calendar, a culturally grounded time system. Using four tasks requiring temporal knowledge and reasoning, the authors find that while some models can convert dates, even Japanese-centric LMs struggle with calendar arithmetic and consistency. The study underscores the need for culture-aware temporal reasoning in LMs.
FOCUS: A Benchmark for Targeted Socratic Question Generation via Source-Span Grounding, Surawat Pothong, Machi Shimmei, Naoya Inoue, Paul Reisert, Ana Brassard, Wenzhi Wang, Shoichi Naito, Jungmin Choi, Kentaro Inui, AACL2025 Main.
Unveiling the Influence of Amplifying Language-Specific Neurons, Inaya Rahmanisa, Lyzander Marciano Andrylie, Mahardika Krisna Ihsani, Alfan Farizki Wicaksono, Haryo Akbarianto Wibowo, Alham Fikri Aji, AACL2025 Findings. 📄 Paper
This study shows that amplifying language-specific neurons in large language models effectively steers outputs toward target languages, especially benefiting low-resource ones, but often reduces cross-lingual performance. It highlights both the potential and limits of such neuron-level interventions for multilingual behavior.
Uncovering the Spectral Bias in Diagonal State Space Models. Ruben Solozabal, Velibor Bojkovic, Hilal AlQuabeh, Kentaro Inui, Martin Takáč, NeurIPS2025. 📄 Paper
This paper explores spectral bias in diagonal state-space models, examining how different initialization schemes affect their frequency response. The authors propose S4D-DFouT, a diagonal initialization in the discrete Fourier domain, which mitigates bias and yields better parameterization for modeling temporal dependencies.
SPIRIT: Patching Speech Language Models against Jailbreak Attacks. Amirbek Djanibekov, Nurdaulet Mukhituly, Kentaro Inui, Hanan Aldarmaki, Nils Lukas, EMNLP2025 Main. 📄 Paper
This work shows speech language models (SLMs) are highly vulnerable to jailbreak attacks, with nearly 100% success when adversaries inject imperceptible noise. It introduces SPIRIT, a post-hoc activation patching defense that improves robustness up to 99% with negligible utility loss—and without retraining.
Investigating Neurons and Heads in Transformer-based LLMs for Typographical Errors. Kohei Tsuji, Tatsuya Hiraoka, Yuchang Cheng, Eiji Aramaki, Tomoya Iwakura, EMNLP2025 Main. 📄 Paper
The paper examines how transformer LLMs detect and correct typographical errors via specific neurons and attention heads. Typo neurons in early and late layers use local context to fix simple errors, while middle layers handle the core corrections using global context. Typo heads operate broadly, and both the neurons and the heads also contribute to general context understanding.
Identification of Multiple Logical Interpretations in Counter-Arguments. Wenzhi Wang, Paul Reisert, Shoichi Naito, Naoya Inoue, Machi Shimmei, Surawat Pothong, Jungmin Choi, Kentaro Inui, EMNLP2025 Main. 📄 Paper
The paper presents CALSA+, a dataset of 134 counter-arguments annotated with 13 logical questions (5,226 expert annotations, α=0.46). A model trained with RLVR captures multiple logical interpretations, performing comparably to larger models.
Spelling-out is not Straightforward: LLMs’ Capability of Tokenization from Token to Characters. Tatsuya Hiraoka, Kentaro Inui, EMNLP2025 Findings. 📄 Paper
The paper investigates how LLMs spell out tokens character by character. It shows that only the first characters are encoded in the token embeddings, while deeper layers reconstruct the later ones. A “breakthrough” layer emerges where character knowledge becomes reliably accessible, as confirmed by probing, neuron, and attention analyses.
LLMs Can Compensate for Deficiencies in Visual Representations. Sho Takishita, Jay Gala, Abdelrahman Mohamed, Kentaro Inui, Yova Kementchedjhieva, EMNLP2025 Findings. 📄 Paper
The paper studies how vision-language models (VLMs) using CLIP visual encoders cope with weak visual features. It finds that large language decoders can compensate when encoder self-attention is impaired, recovering object part identifiability via context. Some low-level visual processing is non-recoverable, especially from early encoder layers.
How a Bilingual LM Becomes Bilingual: Tracing Internal Representations with Sparse Autoencoders. Tatsuro Inaba, Go Kamoda, Kentaro Inui, Masaru Isonuma, Yusuke Miyao, Yohei Oseki, Yu Takagi, and Benjamin Heinzerling, EMNLP2025 Findings. 📄 Paper
The authors probe how internal representations in large language models evolve during training by fitting sparse autoencoders at various checkpoints. They find that models first learn language-specific features, then cross-linguistic correspondences, and eventually abstract / conceptual knowledge after mastering token-level patterns.
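For readers unfamiliar with the technique, the following is a toy sparse-autoencoder sketch (the architecture, sparsity penalty, and data are illustrative assumptions, not the authors' setup): an overcomplete ReLU encoder with an L1 penalty is fit to residual-stream activations collected at a given checkpoint, with random data standing in for real activations.

```python
# Toy sparse autoencoder for model activations (illustrative sketch only).
import torch
import torch.nn as nn

class SparseAutoencoder(nn.Module):
    def __init__(self, d_model=768, d_features=4096):
        super().__init__()
        self.encoder = nn.Linear(d_model, d_features)
        self.decoder = nn.Linear(d_features, d_model)

    def forward(self, x):
        features = torch.relu(self.encoder(x))  # sparse feature activations
        return self.decoder(features), features

sae = SparseAutoencoder()
opt = torch.optim.Adam(sae.parameters(), lr=1e-3)
activations = torch.randn(1024, 768)  # placeholder for hidden states at one checkpoint

for step in range(100):
    recon, feats = sae(activations)
    # reconstruction loss plus an L1 penalty that encourages sparse features
    loss = ((recon - activations) ** 2).mean() + 1e-3 * feats.abs().mean()
    opt.zero_grad()
    loss.backward()
    opt.step()
```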
Understanding the Side Effects of Rank-One Knowledge Editing. Ryosuke Takahashi, Go Kamoda, Benjamin Heinzerling, Keisuke Sakaguchi, Kentaro Inui, 8th BlackboxNLP Workshop. 📄 Paper
This study examines side effects in rank-one knowledge editing of language models. Edits to highly connected subjects cause broader impacts, influenced by relation similarity, object density, and representation distortion. The findings inform more reliable knowledge editing methods.
RECALL: Library-Like Behavior In Language Models is Enhanced by Self-Referencing Causal Cycles. Munachiso Nwadike, Zangir Iklassov, Toluwani Aremu, Tatsuya Hiraoka, Velibor Bojkovic, Benjamin Heinzerling, Hilal AlQuabeh, Martin Takáč, Kentaro Inui, ACL2025 Main. 📄 Paper
This paper introduces the self-referencing causal cycle (RECALL), a mechanism that helps large language models mitigate the reversal curse, a failure mode stemming from one-directional (left-to-right) training. RECALL relies on "cycle tokens", token sequences that connect different parts of the training data.
Rectifying Belief Space via Unlearning to Harness LLMs' Reasoning. Ayana Niwa, Masahiro Kaneko, Kentaro Inui, ACL2025 Findings. 📄 Paper
Why do LLMs sometimes generate incorrect answers, even as their capabilities continue to improve? This study hypothesizes that the model's flawed reasoning stems from its reliance on incorrect beliefs, and it demonstrates that correcting the belief space leads to improved performance.
On Entity Identification in Language Models. Masaki Sakata, Sho Yokoi, Benjamin Heinzerling, Takumi Ito, Kentaro Inui, ACL2025 Findings. 📄 Paper
This paper examines how language models identify named entities. It introduces metrics for the ambiguity and variability of entity mentions and evaluates how well model representations cluster mentions by entity. Models reach 0.66–0.90 on precision- and recall-like clustering measures, with entity information encoded in low-dimensional subspaces, especially in early layers.
Annotating Errors in English Learners’ Written Language Production: Advancing Automated Written Feedback Systems, Steven Coyne, Diana Galvan-Sosa, Ryan Spring, Camélia Guerraoui, Michael Zock, Keisuke Sakaguchi, and Kentaro Inui, AIED 2025.
Tell Me Who Your Students Are: GPT Can Generate Valid Multiple-Choice Questions When Students’ (Mis)Understanding Is Hinted, Machi Shimmei, Masaki Uto, Yuichiroh Matsubayashi, Kentaro Inui, Aditi Mallavarapu, Noboru Matsuda, AIED 2025. 📄 Paper
This study proposes AnaQuest, a method for generating MCQs with LLMs. Using student responses, it creates realistic questions whose quality and difficulty closely match human-crafted items, outperforming baseline prompts. [Best LBR Paper Award🎊]
Repetition Neurons: How Do Language Models Produce Repetitions?, Tatsuya Hiraoka, Kentaro Inui, NAACL 2025. 📄 Paper
LLMs sometimes repeat, repeat, repeat, repeat themselves. To uncover the internal mechanism, this paper introduces "repetition neurons", regarded as skill neurons responsible for the repetition problem in text generation tasks. These neurons become progressively more active as the repetition continues.
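To make the idea concrete, here is a minimal, hypothetical sketch (not the authors' code) of how one might look for such neurons: it hooks the MLP activations of GPT-2 and ranks neurons by how much more active they are on a repetitive prompt than on a non-repetitive one. The model choice, prompts, and mean-difference criterion are illustrative assumptions.

```python
# Locate candidate "repetition neurons" in GPT-2 by comparing MLP activations
# on repetitive vs. non-repetitive text (illustrative sketch only).
import torch
from transformers import AutoTokenizer, AutoModelForCausalLM

tok = AutoTokenizer.from_pretrained("gpt2")
model = AutoModelForCausalLM.from_pretrained("gpt2")
model.eval()

acts = {}  # layer index -> post-GELU MLP activations from the last forward pass

def make_hook(layer):
    def hook(module, inputs, output):
        acts[layer] = output.detach()  # (batch, seq_len, 4 * hidden)
    return hook

for i, block in enumerate(model.transformer.h):
    block.mlp.act.register_forward_hook(make_hook(i))

repetitive = "the cat sat on the mat. the cat sat on the mat. the cat sat on the mat."
normal = "the cat sat on the mat while the dog slept quietly near the warm fireplace."

def mean_acts(text):
    ids = tok(text, return_tensors="pt")
    with torch.no_grad():
        model(**ids)
    # average over positions, keeping the (layer, neuron) structure
    return torch.stack([acts[i].mean(dim=1).squeeze(0)
                        for i in range(len(model.transformer.h))])

diff = mean_acts(repetitive) - mean_acts(normal)  # (num_layers, 4 * hidden)
top = torch.topk(diff.flatten(), k=10).indices
for idx in top:
    layer, neuron = divmod(idx.item(), diff.shape[1])
    print(f"layer {layer}, neuron {neuron}: delta activation = {diff[layer, neuron].item():.3f}")
```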
The Geometry of Numerical Reasoning: Language Models Compare Numeric Properties in Linear Subspaces, Ahmed Oumar El-Shangiti, Tatsuya Hiraoka, Hilal AlQuabeh, Benjamin Heinzerling, Kentaro Inui, NAACL 2025. 📄 Paper
Recent work has analyzed how knowledge is represented in activation space. Building on this line of inquiry, this paper examines how LLMs leverage the linear subspace of entity-numerical attributes when answering questions involving numeric comparisons—for example, "Was Cristiano born before Messi?"
Weight-based Analysis of Detokenization in Language Models: Understanding the First Stage of Inference Without Inference, Go Kamoda, Benjamin Heinzerling, Tatsuro Inaba, Keito Kudo, Keisuke Sakaguchi, Kentaro Inui, NAACL 2025 Findings. 📄 Paper
In LLMs, early layers transform subword tokens into more meaningful representations that form the model's inner vocabulary. This paper demonstrates that several important aspects of this detokenization stage can be understood purely by analyzing model weights.
MQM-Chat: Multidimensional Quality Metrics for Chat Translation, Yunmeng Li, Jun Suzuki, Makoto Morishita, Kaori Abe and Kentaro Inui, COLING 2025. 📄 Paper
This paper introduces MQM-Chat, a new multidimensional quality metric designed specifically for chat translations. It encompasses seven error types, including three that are unique to chat translations, allowing for the evaluation of both lexical and semantic accuracy in chat translation tasks.
Beyond Click to Cognition: Effective Interventions for Promoting Examination of False Beliefs in Misinformation, Yuko Tanaka, Hiromi Arai, Miwa Inuzuka, Yoichi Takahashi, Minao Kukita, Ryuta Iseki, Kentaro Inui, CHI 2025. 📄 Paper
An online study with 627 participants tested interventions to counter misinformation. Both metacognitive and ranking strategies increased fact-checking clicks, but only metacognition promoted deeper examination of false beliefs, highlighting the need for cognitive, not just behavioral, interventions.
SubRegWeigh: Effective and Efficient Annotation Weighing with Subword Regularization, Kohei Tsuji, Tatsuya Hiraoka, Yuchang Cheng, Tomoya Iwakura, COLING 2025. 📄 Paper
NLP datasets may still contain annotation errors, even when they are manually annotated. This paper introduces SubRegWeigh, a time-saving method that leverages subword regularization to simulate multiple error detection models for identifying annotation errors in NLP datasets.
ACORN: Aspect-wise Commonsense Reasoning Explanation Evaluation, Ana Brassard, Benjamin Heinzerling, Keito Kudo, Keisuke Sakaguchi, Kentaro Inui, COLM 2025. 📄 Paper
Assessing the quality of free-text explanations remains challenging: the task is multifaceted and subjective, and human evaluation is labor-intensive. This paper introduces ACORN, a dataset of free-text explanations paired with aspect-wise quality ratings.
Monotonic Representation of Numeric Properties in Language Models, Benjamin Heinzerling, Kentaro Inui, ACL 2024. 📄 Paper
How is factual knowledge involving numeric properties such as "Karl Popper was born in 1902" encoded in the model’s internal representations? This paper introduces a method for finding and editing representations of numeric properties by identifying interpretable, monotonically encoded directions.
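As a rough illustration of this kind of analysis (an assumption-laden sketch, not the paper's method), one can fit a linear probe from hidden states to a numeric attribute and check whether projections onto the probe's direction order entities consistently with that attribute. The entity list, layer choice, and ridge probe below are illustrative assumptions.

```python
# Probe GPT-2 hidden states for a direction that tracks birth year (sketch).
import numpy as np
import torch
from transformers import AutoTokenizer, AutoModel
from sklearn.linear_model import Ridge

tok = AutoTokenizer.from_pretrained("gpt2")
model = AutoModel.from_pretrained("gpt2")
model.eval()

people = {"Karl Popper": 1902, "Albert Einstein": 1879, "Marie Curie": 1867,
          "Alan Turing": 1912, "Ada Lovelace": 1815, "Isaac Newton": 1643}

def last_token_state(name, layer=8):
    ids = tok(f"{name} was born in", return_tensors="pt")
    with torch.no_grad():
        out = model(**ids, output_hidden_states=True)
    return out.hidden_states[layer][0, -1].numpy()

X = np.stack([last_token_state(n) for n in people])
y = np.array(list(people.values()), dtype=float)

probe = Ridge(alpha=1.0).fit(X, y)
# The probe's weight vector is a candidate "numeric direction"; projections onto
# it should increase monotonically with birth year if the property is encoded
# monotonically along a linear direction.
proj = X @ probe.coef_
print("entities ordered by projection:", [n for _, n in sorted(zip(proj, people))])
```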
A Large Collection of Model-generated Contradictory Responses for Consistency-aware Dialogue Systems, Shiki Sato, Reina Akama, Jun Suzuki, Kentaro Inui, ACL 2024 Findings. 📄 Paper
During interactions with a model, contradictory responses can undermine user trust and disrupt dialogue coherence. This paper tackles that challenge head-on by constructing the first large-scale dataset of model-generated contradictions.
Analyzing Feed-Forward Blocks in Transformers through the Lens of Attention Maps, Goro Kobayashi, Tatsuki Kuribayashi, Sho Yokoi, Kentaro Inui, ICLR 2024. 📄 Paper
Interpreting the internals of Transformer models remains a pivotal challenge. This paper examines the role of feed-forward (FF) blocks in Transformers, a previously underexplored component, by visualizing their impact on input contextualization through attention maps.
Representational Analysis of Binding in Language Models, Qin Dai, Benjamin Heinzerling, Kentaro Inui, EMNLP 2024. 📄 Paper
This paper delves into the mechanism of in-context entity tracking, exploring how language models bind entities to their corresponding attributes from a given context. It identifies an Ordering ID, captured by entity activations, that directly determines binding behavior.
First Heuristic Then Rational: Dynamic Use of Heuristics in Language Model Reasoning, Yoichi Aoki, Keito Kudo, Tatsuki Kuribayashi, Shusaku Sone, Masaya Taniguchi, Keisuke Sakaguchi, Kentaro Inui, EMNLP 2024. 📄 Paper
This paper reports on the systematic strategy that language models employ in multi-step reasoning processes, revealing that they rely on heuristics such as lexical overlap in the early stages, with this reliance diminishing as they advance toward the final answer.
Flee the Flaw: Annotating the Underlying Logic of Fallacious Arguments Through Templates and Slot-filling, Irfan Robbani, Paul Reisert, Surawat Pothong, Naoya Inoue, Camélia Guerraoui, Wenzhi Wang, Shoichi Naito, Jungmin Choi, Kentaro Inui, EMNLP 2024. 📄 Paper
In education, counterarguments are used to boost critical thinking skills, but giving personalized feedback to students can be challenging for teachers. This paper introduces Counter-Argument Logical Structure Analysis (CALSA), a novel approach that breaks down counterarguments in debates into ten clear logical patterns, offering a promising pathway toward automating effective feedback.
Designing Logic Pattern Templates for Counter-Argument Logical Structure Analysis, Shoichi Naito, Wenzhi Wang, Paul Reisert, Naoya Inoue, Camélia Guerraoui, Kenshi Yamaguchi, Jungmin Choi, Irfan Robbani, Surawat Pothong, Kentaro Inui, EMNLP 2024 Findings. 📄 Paper
The paper introduces CALSA: a task to analyze the logical structure of counterarguments (CAs) relative to initial arguments (IAs). It defines 10 logic-pattern templates, builds an annotated dataset of 778 IA-CA pairs (86.5% coverage, Krippendorff’s α ≈0.50), and evaluates language models on automatic template selection and slot-filling, finding the task challenging.
The Curse of Popularity: Popular Entities Have Catastrophic Side Effects when Deleting Knowledge from Language Models, Ryosuke Takahashi, Go Kamoda, Benjamin Heinzerling, Keisuke Sakaguchi, Kentaro Inui, NAACL 2024 SRW. 📄 Paper
Language models encode world knowledge in their internal parameters through training. This paper investigates the deletion of such encoded knowledge from the model and analyzes the relationship between deletion side effects and the associated entities using a synthetic knowledge graph.
How Well Do Vision Models Encode Diagram Attributes?, Haruto Yoshida, Keito Kudo, Yoichi Aoki, Ryota Tanaka, Itsumi Saito, Keisuke Sakaguchi, Kentaro Inui, ACL 2024 SRW.
Vision models such as CLIP have been used in research on diagram understanding and generation. This study examines an unexplored capability of these models, specifically, whether they can accurately identify diagram attributes including node colors, shapes, edge colors, and connection patterns.
Teach Me How to Argue: A Survey on NLP Feedback Systems in Argumentation, Camelia Guerraoui, Paul Reisert, Naoya Inoue, Farjana Sultana Mim, Keshav Singh, Jungmin Choi, Irfan Robbani, Shoichi Naito, Wenzhi Wang, Kentaro Inui, 10th Workshop on Argument Mining. 📄 Paper
While current models can assess argument quality, they often fail to provide constructive feedback explaining the basis of their evaluations. This survey explores current NLP feedback systems by categorizing them into four key dimensions—Richness, Visualization, Interactivity, and Personalization.
Contrastive Learning-based Sentence Encoders Implicitly Weight Informative Words, Hiroto Kurita, Goro Kobayashi, Sho Yokoi, Kentaro Inui, EMNLP 2023 Findings. 📄 Paper
The performance of sentence encoders can be significantly improved by fine-tuning with contrastive loss. However, what characteristics do models acquire during contrastive learning? This paper reveals these characteristics by shedding light on the inner workings of word weighting.
Investigating the Effectiveness of Multiple Expert Models Collaboration, Ikumi Ito, Takumi Ito, Jun Suzuki, Kentaro Inui, EMNLP 2023 Findings. 📄 Paper
To create a translation system that excels across diverse domains, this study employs a Multiple Expert Models Collaboration strategy that aggregates the specialized knowledge of individual domain-specific experts, and it validates the strategy's effectiveness.
Test-time Augmentation for Factual Probing, Go Kamoda, Benjamin Heinzerling, Keisuke Sakaguchi, Kentaro Inui, EMNLP 2023 Findings. 📄 Paper
Factual probing, a method that uses prompts to assess a model's world knowledge, faces the challenge that slight variations in the prompt can drastically change the results. To tackle this issue, the study introduces test-time augmentation, which augments and ensembles prompts to reduce sensitivity.
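A minimal sketch of the idea, under assumed prompt variants and a simple averaging ensemble (the paper's augmentation and aggregation strategies may differ): each paraphrase is scored separately and the next-token distributions are averaged before picking the prediction.

```python
# Ensemble next-token distributions over paraphrased factual prompts (sketch).
import torch
from transformers import AutoTokenizer, AutoModelForCausalLM

tok = AutoTokenizer.from_pretrained("gpt2")
model = AutoModelForCausalLM.from_pretrained("gpt2")
model.eval()

prompts = [
    "The capital of France is",
    "France's capital city is",
    "The capital city of France is called",
]

probs = []
for p in prompts:
    ids = tok(p, return_tensors="pt")
    with torch.no_grad():
        logits = model(**ids).logits[0, -1]          # next-token logits
    probs.append(torch.softmax(logits, dim=-1))

ensemble = torch.stack(probs).mean(dim=0)            # average prompt-wise distributions
print("ensembled prediction:", tok.decode([ensemble.argmax().item()]))
```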
RealTime QA: What’s the Answer Right Now?, Jungo Kasai, Keisuke Sakaguchi, Yoichi Takahashi, Ronan Le Bras, Akari Asai, Xinyan Velocity Yu, Dragomir Radev, Noah A. Smith, Yejin Choi, Kentaro Inui, NeurIPS 2023. 📄 Paper
While answers to questions can evolve over time, traditional QA datasets have been built on static assumptions. This paper presents REALTIME QA, a dynamic platform that announces questions and evaluates systems on a regular basis.
Take No Shortcuts! Stick to the Rubric: A Method for Building Trustworthy Short Answer Scoring Models, Yuya Asazuma, Hiroaki Funayama, Yuichiroh Matsubayashi, Tomoya Mizumoto, Paul Reisert, Kentaro Inui, HELMeTO 2023. 📄 Paper
This paper introduces a novel strategy to enhance the trustworthiness of "Short Answer Scoring" systems in educational settings by aligning response features with rubric criteria to mitigate shortcut learning based on superficial cues in the training data.
Deterministic Compression of Word Embeddings, Yuki Nakamura, Jun Suzuki, Takumi Ito, Kentaro Inui, IEEE Access (2025). 📄 Paper
Reducing the memory required by word embeddings while still maintaining performance is crucial given their large vocabulary and high dimensionality. This paper presents a new compression method that uses a deterministic convex optimization process to produce stable and reproducible representations.
Cross-prompt Pre-finetuning of Language Models for Short Answer Scoring, Hiroaki Funayama, Yuichiroh Matsubayashi, Yuya Asazuma, Tomoya Mizumoto, Kentaro Inui, International Journal of Artificial Intelligence in Education (2025). 📄 Paper
This work improves automated short answer scoring by pre-finetuning on data from existing prompts and then fine-tuning with key phrases for new prompts. Using cross-prompt data boosts accuracy and helps models generalize from limited training data.
FinchGPT: a Transformer based language model for birdsong analysis. Kosei Kobayashi, Kosuke Matsuzaki, Masaya Taniguchi, Keisuke Sakaguchi, Kentaro Inui, Kentaro Abe. 📄 Paper
This paper asks whether the long-range dependencies that define human language also appear in animal communication. It employs Transformer models to explore this idea in Bengalese finch songs, which are marked by highly variable and complex syllable sequences.
Large Language Models Are Human-Like Internally. Tatsuki Kuribayashi, Yohei Oseki, Souhaib Ben Taieb, Kentaro Inui, Timothy Baldwin. 📄 Paper
This paper challenges recent claims from cognitive modeling studies that LLMs poorly align with human reading behavior, showing that focusing solely on their final layers can be misleading, as revealed by a mechanistic analysis of their internal layers.
Can Language Models Handle a Non-Gregorian Calendar?. Mutsumi Sasaki, Go Kamoda, Ryosuke Takahashi, Kosuke Sato, Kentaro Inui, Keisuke Sakaguchi, Benjamin Heinzerling. 📄 Paper
The paper evaluates how well language models manage non-Gregorian calendar systems, focusing on the Japanese calendar (wareki). Using four tasks that combine temporal knowledge and reasoning, it finds that even Japanese-centric models struggle with calendar conversions, arithmetic, and maintaining consistency across calendar systems.
Understanding and Controlling Repetition Neurons and Induction Heads in In-Context Learning. Nhi Hoai Doan, Tatsuya Hiraoka, Kentaro Inui. 📄 Paper
The paper studies how repetition neurons and induction heads contribute to recognizing repetitive input patterns in language models and how they shape in-context learning (ICL). It shows that the effect of repetition neurons depends on layer depth and proposes methods to reduce undesirable repetition while preserving strong ICL performance.
TopK Language Models. Ryosuke Takahashi, Tatsuro Inaba, Kentaro Inui, Benjamin Heinzerling. 📄 Paper
The paper proposes TopK Language Models, which add a TopK activation to transformer layers. This yields hidden states equivalent to sparse autoencoder features without post-hoc models. TopK LMs preserve performance, enhance interpretability, and enable neuron-level interventions with stable behavior across checkpoints.
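A minimal sketch of a TopK activation as described above (an assumed formulation, not the authors' implementation): only the k largest entries of each hidden state survive, giving sparse, SAE-like features directly inside the model.

```python
# Keep only the k largest activations per position, zeroing the rest (sketch).
import torch

def topk_activation(hidden: torch.Tensor, k: int) -> torch.Tensor:
    """Zero all but the k largest entries along the feature dimension."""
    values, indices = torch.topk(hidden, k, dim=-1)
    return torch.zeros_like(hidden).scatter(-1, indices, values)

h = torch.randn(2, 5, 768)                              # (batch, seq_len, hidden_size)
print(topk_activation(h, k=32).count_nonzero(dim=-1))   # 32 active features per position
```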
Emergence of Primacy and Recency Effect in Mamba: A Mechanistic Point of View. Muhammad Cendekia Airlangga, Hilal AlQuabeh, Munachiso S Nwadike, Kentaro Inui. 📄 Paper
This paper analyzes primacy and recency effects in the Mamba state-space language model using structured recall tasks. It finds that early tokens are preserved via sparse long-term memory channels, while recent tokens benefit from delta-modulated short-term recurrence. Memory fades with distractors, and semantic regularities shift forgetting of intermediate items.
Do LLMs Need to Think in One Language? Correlation between Latent Language and Task Performance. Shintaro Ozaki, Tatsuya Hiraoka, Hiroto Otake, Hiroki Ouchi, Masaru Isonuma, Benjamin Heinzerling, Kentaro Inui, Taro Watanabe, Yusuke Miyao, Yohei Oseki, Yu Takagi. 📄 Paper
The paper investigates whether internal consistency of the “latent language” (the language LLMs think in) affects task performance. By varying the input language of prompts, the experiments show that consistency is not always required: models adapt their internal representations near the final layers to match the output language, reducing adverse effects.
Understanding Fact Recall in Language Models: Why Two-Stage Training Encourages Memorization but Mixed Training Teaches Knowledge. Ying Zhang, Benjamin Heinzerling, Dongyuan Li, Ryoma Ishigaki, Yuta Hitomi, Kentaro Inui. 📄 Paper
The paper compares two training strategies for factual recall in LMs: two-stage training (store facts then recall) vs mixed training (store + recall together). Mixed training produces more “shared parameters” across tasks, concentrated in key attention heads, enabling better generalization across question forms.
Mechanistic Insights into Grokking from the Embedding Layer. H.V.AlquBoj, Hilal AlQuabeh, Velibor Bojkovic, Munachiso Nwadike, Kentaro Inui. 📄 Paper
The paper shows that embedding layers play a key role in grokking, where neural networks delay generalization. In modular arithmetic tasks, MLPs with embeddings exhibit delayed generalization; without embeddings, they generalize immediately. Embeddings are thus central to understanding when and why grokking occurs.
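A toy modular-addition setup in the spirit of this line of work (a simplified, assumed configuration rather than the paper's exact experiments) shows where the embedding layer enters: operands pass through a learned embedding before the MLP, and the embedding-free comparison would replace `nn.Embedding` with one-hot inputs.

```python
# Modular addition (a + b) mod p with a learned embedding layer feeding an MLP.
import torch
import torch.nn as nn

p = 97
pairs = torch.cartesian_prod(torch.arange(p), torch.arange(p))   # all (a, b) pairs
labels = (pairs[:, 0] + pairs[:, 1]) % p
perm = torch.randperm(len(pairs))
train_idx, test_idx = perm[: len(perm) // 2], perm[len(perm) // 2:]

class EmbedMLP(nn.Module):
    def __init__(self, p, d=128):
        super().__init__()
        self.emb = nn.Embedding(p, d)   # the embedding layer under study
        self.mlp = nn.Sequential(nn.Linear(2 * d, 256), nn.ReLU(), nn.Linear(256, p))

    def forward(self, ab):
        x = self.emb(ab).flatten(1)     # concatenate the two operand embeddings
        return self.mlp(x)

model = EmbedMLP(p)
opt = torch.optim.AdamW(model.parameters(), lr=1e-3, weight_decay=1.0)
loss_fn = nn.CrossEntropyLoss()

for step in range(1000):                # grokking runs typically need far more steps
    opt.zero_grad()
    loss = loss_fn(model(pairs[train_idx]), labels[train_idx])
    loss.backward()
    opt.step()

with torch.no_grad():
    acc = (model(pairs[test_idx]).argmax(-1) == labels[test_idx]).float().mean()
print(f"test accuracy after 1000 steps: {acc.item():.2%}")
```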