Posters

ISCOL 2025 • December 18th, 2025

Session 1 (10:15 - 11:15)

LCHAIM - Investigating Long Context Reasoning in Hebrew

Ehud Malul, Oriel Perets, Ziv Mor, Yigal Kassel, Elior Sulem

NER Retriever: Zero-Shot Named Entity Retrieval with Type-Aware Embeddings

Or Shachar, Uri Katz, Yoav Goldberg, Oren Glickman

The Mighty ToRR: A Benchmark for Table Reasoning and Robustness in LLMs

Shir Ashury-Tahan, Yifan Mai, Rajmohan C, Ariel Gera, Yotam Perlitz, Asaf Yehudai, Elron Bandel, Leshem Choshen, Eyal Shnarch, Percy Liang, Michal Shmueli-Scheuer

Using Natural Language Inference and Inferentialist Theory to Assess Meaning Similarity in Text Generation

Reto Gubelmann, Christina Niklaus, Thomas Huber

Not Just a Piece of Cake: Cross-Lingual Fine-Tuning for Idiom Identification

Ofri Hefetz, Kai Golan Hashiloni, Alon Mannor, Kfir Bar

3DLLM-Mem: Long-Term Spatial-Temporal Memory for Embodied 3D Large Language Model

Yonatan Bitton, Wenbo Hu, Yining Hong, Yanjun Wang, Leison Gao, Zibu Wei, Xingcheng Yao, Nanyun Peng, Idan Szpektor, Kai-Wei Chang

Aligning What LLMs Do and Say: Towards Self-Consistent Explanations

Sahar Admoni, Ofra Amir, Assaf Hallak, Yftah Ziser

ErrorMap and ErrorAtlas: Charting the Failure Landscape of Large Language Models

Shir Ashury-Tahan, Yifan Mai, Elron Bandel, Michal Shmueli-Scheuer, Leshem Choshen

Breaking ReAct Agents: Foot-in-the-Door Attack Will Get You In

Itay Nakash, George Kour, Guy Uziel, Ateret Anaby Tavor

Comparing human and language models sentence processing difficulties on complex structures

Samuel Joseph Amouyal, Aya Meltzer-Asscher, Jonathan Berant

Uncovering Measurement Biases in LLM Embedding Spaces: The Anna Karenina Principle and Its Implications for Automated Feedback

Abigail Gurin Schleifer, Beata Beigman Klebanov, Giora Alexandron

Universal NER v2: Towards a Massively Multilingual Named Entity Recognition Benchmark

Hila Gonen, Shachar Mirkin, Yuval Pinter

User-Centric Evidence Ranking for Attribution and Fact Verification

Guy Alt, Eran Hirsch, Serwar Basch, Ido Dagan, Oren Glickman

Are LLMs Better than Reported? Detecting Label Errors and Mitigating Their Effect on Model Performance

Omer Nahum, Nitay Calderon, Orgad Keller, Idan Szpektor, Roi Reichart

Planted in Pretraining, Swayed by Finetuning: A Case Study on the Origins of Cognitive Biases in LLMs

Itay Itzhak, Yonatan Belinkov, Gabriel Stanovsky

Probing Subphonemes in Morphology Models

Gal Astrach, Yuval Pinter

CLEAR: Error Analysis via LLM-as-a-Judge Made Easy

Asaf Yehudai, Lilach Eden, Yotam Perlitz, Roy Bar-Haim, Michal Shmueli-Scheuer

Not Your Typical Sycophant: Evaluating Sycophancy of Large Language Models

Oren Tsur, Shahar Ben Natan

JuStRank: Benchmarking LLM Judges for System Ranking

Ariel Gera, Odellia Boni, Yotam Perlitz, Roy Bar-Haim, Lilach Eden, Asaf Yehudai

Automatic biblical authorship attribution

Shira Faigenbaum-Golovin, Alon Kipnis, Axel Bühler, Eliezer Piasetzky, Thomas Römer, Israel Finkelstein

A Survey on Evaluation of LLM-based Agents

Asaf Yehudai, Lilach Eden, Alan Li, Guy Uziel, Yilun Zhao, Roy Bar-Haim, Arman Cohan, Michal Shmueli-Scheuer

Detecting (Un)answerability in Large Language Models with Linear Directions

Maor Juliet Lavi, Tova Milo, Mor Geva

Don’t lie to your friends: Learning what you know from collaborative self-play

Jonathan Berant, Reza Aghajani, Jacob Eisenstein, Adam Fisch, Dheeru Dua, Fantine Eri Huot, Mirella Lapata, Vicky Zayats

REVS: Unlearning Sensitive Information in Language Models via Rank Editing in the Vocabulary Space

Tomer Ashuach, Martin Tutek, Yonatan Belinkov

Effective Red-Teaming of Policy-Adherent Agents

Itay Nakash, George Kour, Koren Lazar, Matan Vetzler, Guy Uziel, Ateret Anaby Tavor

Integrating Morphological Structure into Word Embedding Representations

Dror Mughaz

Detecting Conspiracies in Hebrew Twitter with LLM-GNN Fusion

Lior Biton, Oren Tsur

Localizing Factual Inconsistencies in Attributable Text Generation

Arie Cattan, Paul Roit, Shiyue Zhang, David Wan, Roee Aharoni, Idan Szpektor, Mohit Bansal, Ido Dagan

MINT: Meaning Integrating Tokenizer

Ibraheem Abo Shakra, Yuval Pinter

d-chi Stencil: A Differential Privacy Mechanism for Interacting with LLMs

Re'em Harel, Yuval Pinter, Niv Gilboa

IQ Test for LLMs: An Evaluation Framework for Uncovering Core Skills in LLMs

Aviya Maimon

CRISP: Persistent Concept Unlearning via Sparse Autoencoders

Tomer Ashuach, Dana Arad, Aaron Mueller, Martin Tutek, Yonatan Belinkov

Grade: Quantifying sample diversity in text-to-image models

Royi Rassin, Aviv Slobodkin, Shauli Ravfogel, Yanai Elazar, Yoav Goldberg

Decoding Reading Goals from Eye Movements

Omer Shubi, Cfir Avraham Hadar, Yevgeni Berzak

Vocab Diet: Reshaping the Vocabulary of LLMs with Vector Arithmetic

Yuval Reif, Guy Kaplan, Roy Schwartz

PrefixNLI: Detecting Factual Inconsistencies as Soon as They Arise

Sapir Harary, Eran Hirsch, Aviv Slobodkin, David Wan, Mohit Bansal, Ido Dagan

Towards Reliable Proof Generation with LLMs: A Neuro-Symbolic Approach

Oren Sultan, Eitan Stern, Dafna Shahaf

Large Temporal Models: Unlocking Temporal Understanding in LLMs for Temporal Relation Classification

Omri Homburger, Kfir Bar

Fine-Grained Arabic Offensive Language Classification with Taxonomy, Sentiment, and Emotions

Natalia Vanetik, Marina Litvak, Chaya Liebeskind

TwoHillsLab: A Scalable Platform for Quantitative Analysis of Biblical Hebrew

Guy Shaked

DRAGged into CONFLICTS: Detecting and Addressing Conflicting Sources in Search-Augmented LLMs

Arie Cattan, Alon Jacovi, Ori Ram, Jonathan Herzig, Roee Aharoni, Sasha Goldshtein, Eran Ofek, Idan Szpektor, Avi Caciularu

Beyond Pairwise: Global Zero-shot Temporal Graph Generation

Alon Eirew, Kfir Bar, Ido Dagan

Inside-Out: Hidden Factual Knowledge in LLMs

Zorik Gekhman, Eyal Ben-David, Hadas Orgad, Eran Ofek, Yonatan Belinkov, Idan Szpektor, Roi Reichart

Beyond the Noise: Aligning Prompts with Latent Representations in Diffusion Models

Vasco Ramos, Regev Cohen, Idan Szpektor, Joao Magalhaes

Do LLMs Understand Harmfulness?

Hadas Orgad, Boyi Wei, Kaden Zheng, Peter Henderson, Seraphina Goldfarb-Tarrant, Yonatan Belinkov

Session 2 (13:45 - 14:45)

The Distracting Effect: Understanding Irrelevant Passages in RAG

Chen Amiraz, Florin Cuconasu, Simone Filice, Zohar Karnin

The Alternative Annotator Test for LLM-as-a-Judge: How to Statistically Justify Replacing Human Annotators with LLMs

Nitay Calderon, Roi Reichart, Rotem Dror

Connections Between The Pre-Training Data To Model Representations

Omer Ben Shahar, Guy Kaplan, Roy Schwartz

Debatable Intelligence: Benchmarking LLM Judges via Debate Speech Evaluation

Noy Sternlicht, Ariel Gera, Roy Bar-Haim, Tom Hope, Noam Slonim

Accelerating LLM Inference with Lossless Speculative Decoding Algorithms for Heterogeneous Vocabularies

Nadav Timor, Jonathan Mamou, Daniel Korat, Moshe Berchansky, Gaurav Jain, Oren Pereg, Moshe Wasserblat, David Harel, Roy Schwartz

Segment-Based Attention Masking for GPTs

Shahar Katz, Liran Ringel, Yaniv Romano, Lior Wolf

Easy as PIE? Identifying Multi-Word Expressions with LLMs

Kai Golan Hashiloni, Ofri Hefetz, Kfir Bar

Déjà Vu? Decoding Repeated Reading from Eye Movements

Yoav Meiri, Omer Shubi, Cfir Avraham Hadar, Ariel Kreisberg Nitzav, Yevgeni Berzak

Multi-Domain Explainability of Preferences

Nitay Calderon, Liat Ein-Dor, Roi Reichart

ManagerBench: Evaluating the Safety-Pragmatism Trade-off in Autonomous LLMs

Adi Simhi, Jonathan Herzig, Martin Tutek, Itay Itzhak, Idan Szpektor, Yonatan Belinkov

Reverse-Engineering the Retrieval Process in GenIR Models

Anja Reusch, Yonatan Belinkov

How Many Van Goghs Does It Take to Van Gogh? Finding the Imitation Threshold

Sahil Verma, Royi Rassin, Arnav Mohanty Das, Gantavya Bhatt, Preethi Seshadri, Chirag Shah, Jeff Bilmes, Hannaneh Hajishirzi, Yanai Elazar

Estimating Scientific Quality on the Web: A Multilingual LLM Approach

Nir Grinberg, Or Meiri, Ayelet Baram-Tsabari

Leveraging Digitized Newspapers to Collect Summarization Data in Low-Resource Languages

Noam Dahan, Omer Kidron, Gabriel Stanovsky

QA-Noun: Representing Nominal Semantics via Natural Language Question-Answer Pairs

Maria Tseytlin, Paul Roit, Omri Abend, Ido Dagan, Ayal Klein

TabSTAR: A Tabular Foundation Model for Tabular Data with Text Fields

Alan Arazi, Eilam Shapira, Roi Reichart

Beyond N-Grams: Rethinking Evaluation Metrics and Strategies for Multilingual Abstractive Summarization

Itai Mondshine, Tzuf Paz-Argaman, Reut Tsarfaty

Cross-lingual Extractive Question Answering with Unanswerable Questions

Yuval Gorodissky, Elior Sulem, Dan Roth

How Much Pretraining Does Structured Data Need?

Daniel Fadlon, Kfir Bar

CToT: Causal Tree of Thoughts for Inference-Time Compute

Zachary Elisha Bamberger, Till Raphael Saenger, Ofra Amir, Brandon M. Stewart, Amir Feder

Don't Overthink it. Preferring Shorter Thinking Chains for Improved LLM Reasoning

Michael Hassid, Gabriel Synnaeve, Yossi Adi, Roy Schwartz

LMEnt: A Suite for Analyzing Knowledge in Language Models from Pretraining Data to Representations

Daniela Gottesman, Mor Geva, Yoav Gur-Arieh, Ido Cohen, Ori Yoran, Marius Mosbach

Distilling Examples into Task Instructions: Enhanced In-Context Learning for Long B2B Conversations

Guy Rotman, Adi Kopilov, Danit Berger Zalmanson, Omri Allouche

Can LLMs Help Encoder Models Maintain Both High Accuracy and Consistency in Temporal Relation Classification?

Adiel Meir, Kfir Bar

Hyper-RAG

Bar Block, Yuval Pinter

The Cross-Lingual Cost: Retrieval Biases in RAG over Arabic-English Corpora

Chen Amiraz, Yaroslav Fyodorov, Elad Haramaty, Zohar Karnin, Liane Lewin-Eytan

Will it Merge? Causes of Model Mergeability

Adir Rahamim, Asaf Yehudai, Boaz Carmeli, Leshem Choshen, Yosi Mass, Yonatan Belinkov

AdaptiVocab: Enhancing LLM Efficiency in Focused Domains through Lightweight Vocabulary Adaptation

Itay Nakash, Nitay Calderon, Eyal Ben-David, Elad Hoffer, Roi Reichart

Mixing Mechanisms: How Language Models Retrieve Bound Entities In-Context

Yoav Gur-Arieh, Mor Geva, Atticus Geiger

Teaching Values to Machines: Simulating Human-Like Behavior in LLMs with Value-Prompting

Asaf Yehudai, Naama Rozen, Ariel Gera

Beyond Word Boundaries: A Hebrew Coreference Benchmark for Morphologically Complex Text

Refael Shaked Greenfeld, Reut Tsarfaty

Word Pyramid Puzzles as a Multi-lingually Diverse Reasoning Benchmark

Omer Noy, Reut Tsarfaty, Omer Goldman

Building an Accurate Open-Source Hebrew ASR System through Crowdsourcing

Yanir Marmor, Yair Lifshitz, Yoad Snapir, Kinneret Misgav

Consensus or Conflict? Fine-Grained Evaluation of Conflicting Answers in Question-Answering

Eviatar Nachshoni, Arie Cattan, Shmuel Amar, Ori Shapira, Ido Dagan

Factual Retrieval in LLMs Is a Redundant, Non-Contiguous Process

Hail Hochman, Natalie Shapira, Yoav Goldberg

Pixels at BAREC Shared Task 2025: Visual Arabic Readability Assessment

Ben Sapirstein

How Should We Evaluate LLM Reasoning Quality For Fact Verification?

Ron Eliav, Arie Cattan, Eran Hirsch, Shahaf Bassan, Elias Stengel-Eskin, Mohit Bansal, Ido Dagan

Unveiling the spectrum of Arabic offensive language: Taxonomy and insights

Chaya Liebeskind, Yossef Haim Shrem, Marina Litvak, Natalia Vanetik

Differences in Input and Output Quality of LLMs Across Age Groups

Shira Darchi, Yuval Pinter

Inferring Functionality of Attention Heads from their Parameters

Amit Elhelo, Mor Geva

Break Out the Silverware: Semantic Understanding of Stored Household Items

Michaela Levi Richter, Reuth Mirsky, Oren Glickman

From Tokens to Thoughts: How LLMs and Humans Trade Compression for Meaning

Chen Shani, Liron Soffer, Dan Jurafsky, Yann LeCun, Ravid Shwartz-Ziv

DeLeaker: Dynamic Inference-Time Reweighting For Semantic Leakage Mitigation in Text-to-Image Models

Mor Ventura, Michael Toker, Or Patashnik, Yonatan Belinkov, Roi Reichart

Prompts in the Wild: A Large Analyzed Collection of Prompts in Code

Victoria Basmov, Yoav Goldberg, Reut Tsarfaty

Session 3 (16:40 - 17:40)

Readability Formulas, Systems and LLMs are Poor Predictors of Reading Ease

Keren Gruteke Klein, Shachar Frenkel, Omer Shubi, Yevgeni Berzak

PromptSuite: A Task-Agnostic Framework for Multi-Prompt Generation

Eliya Habba, Noam Dahan, Gili Lior, Gabriel Stanovsky

ArTyDi-QA: Question Answering and Question Generation in Arabic

Guy Mor-Lan, Abubakr Babiker, Reut Tsarfaty

mini-vec2vec: Scaling Universal Geometry Alignment with Linear Transformations

Guy Dar

GLEE: A Unified Framework and Benchmark for Language-based Economic Environments

Eilam Shapira, Omer Madmon, Itamar Reinman, Samuel Joseph Amouyal, Roi Reichart, Moshe Tennenholtz

LiveRAG: A diverse Q&A dataset with varying difficulty level for RAG evaluation

David Carmel, Simone Filice, Guy Horowitz, Yoelle Maarek, Alex Shtoff, Oren Somekh, Ran Tavory

A Holistic Approach towards Vocabulary Expansion for Language Adaptation

Orian Dabod, Amir David Nissan Cohen, Gabriel Stanovsky

Who are you, ChatGPT? Personality and Demographic Style in LLM-Generated Content

Dana Sotto Porat, Ella Rabinovich

Fine-Grained Detection of Context-Grounded Hallucinations Using LLMs

Yehonatan Peisakhovsky, Zorik Gekhman, Roi Reichart, Yosi Mass, Liat Ein-Dor

Effective QA-driven Annotation of Predicate-Argument Relations Across Languages

Jonathan Davidov, Aviv Slobodkin, Shmuel Tomi Klein, Reut Tsarfaty, Ido Dagan, Ayal Klein

Towards Enforcing Company Policy Adherence in Agentic Workflows

Naama Zwerdling, David Boaz, Ella Rabinovich, Guy Uziel, David Amid, Ateret Anaby Tavor

Out-of-Context Reasoning in Large Language Models

Jonathan Shaki, Emanuele La Malfa, Michael J. Wooldridge, Sarit Kraus

SpeLLM: Character-Level Multi-Head Decoding

Amit Ben-Artzy, Roy Schwartz

PaperFinder: a State-of-the-art LLM-based Scientific Search Agent

Yoav Goldberg, Dan Bareket, Aryeh Tiktinsky, Micah Shlain, Ben Eyal, Mark Polak, Menny Pinhasov, Sigal Rahamimov, Uri Katz, Guy Wiener

Dementia Through Different Eyes: Explainable Modeling of Human and LLM Perceptions for Early Awareness

Lotem Peled-Cohen, Maya Zadok, Nitay Calderon, Hila Gonen, Roi Reichart

Cooking Up Creativity: Enhancing LLM Creativity through Structured Recombination

Moran Mizrahi, Chen Shani, Gabriel Stanovsky, Dan Jurafsky, Dafna Shahaf

Making LVLMs Look Twice: Contrastive Decoding with Contrast Images

Avshalom Manevich, Reut Tsarfaty

Keep Guessing? When Considering Inference Scaling, Mind the Baselines

Or Honovich, Gal Yona, Omer Levy, Roee Aharoni

Letting the Data Speak: Automating Schema Discovery for Research

Eliya Habba, Renana Keydar, Gabriel Stanovsky

The Enemy from Within: A Study of Political Delegitimization Discourse in Israeli Political Speech

Naama Rivlin-Angert, Guy Mor-Lan

More Documents, Same Length: Isolating the Challenge of Multiple Documents in RAG

Shahar Levy, Nir Mazor, Lihi Shalmon, Michael Hassid, Gabriel Stanovsky

Where Did That Come From? Sentence-Level Error-Tolerant Attribution

Ori Ernst, Aviv Slobodkin, Meng Cao, Sihui Wei, Jackie CK Cheung

Hexagen: Improving Abstraction Reasoning Through Code Execution

Amit Lisha, Reut Tsarfaty

Enhancing Automated Interpretability with Output-Centric Feature Descriptions

Yoav Gur-Arieh, Roy Mayan, Chen Agassy, Atticus Geiger, Mor Geva

Differential Mamba

Nadav Schneider, Itamar Zimerman, Eliya Nachmani

Precise In-Parameter Concept Erasure in Large Language Models

Yoav Gur-Arieh, Clara Haya Suslik, Yihuai Hong, Fazl Barez, Mor Geva

SAEs Are Good for Steering - If You Select the Right Features

Dana Arad, Aaron Mueller, Yonatan Belinkov

Retrieve, Learn, Refine: An Interleaved Retrieval–Learning Agent for Exhaustive IR

Binyamin Cohen, Uri Katz, Yoav Goldberg

Expectation management shifts the representation of unexpectedness

Benjamin Menashe, Michal Ben-Shachar

LAQuer: Localized Attribution Queries in Content-grounded Generation

Eran Hirsch, Aviv Slobodkin, David Wan, Elias Stengel-Eskin, Mohit Bansal, Ido Dagan

The ShareLM Collection and Plugin: Contributing Human-Model Chats for the Benefit of the Community

Shachar Don-Yehiya, Leshem Choshen, Omri Abend

Learning a Continue-Thinking Token for Enhanced Test-Time Scaling

Liran Ringel, Elad Tolochinsky, Yaniv Romano

Decomposing MLP Activations into Interpretable Features via Semi-Nonnegative Matrix Factorization

Or David Shafran, Atticus Geiger, Mor Geva

Cross-Lingual and Cross-Cultural Variation in Image Descriptions

Uri Berger, Edoardo Ponti

A Unifying Scheme for Extractive Content Selection Tasks

Shmuel Amar, Ori Shapira, Aviv Slobodkin, Ido Dagan

MRLEval: A Benchmark for LLM Evaluation in Hebrew, Modern Standard Arabic and Levantine Arabic

Guy Mor-Lan, Reut Tsarfaty

Time to Talk: LLM Agents for Asynchronous Group Communication in Mafia Games

Niv Eckhaus, Uri Berger, Gabriel Stanovsky

ECLeKTic: a Novel Challenge Set for Evaluation of Cross-Lingual Knowledge Transfer

Omer Goldman, Uri Shaham, Dan Malkin, Sivan Eiger, Adi Mayrav Gilady, Idan Szpektor, Reut Tsarfaty, Matan Eyal

ReliableEval: A Recipe for Stochastic LLM Evaluation via Method of Moments

Gili Lior, Eliya Habba, Shahar Levy, Avi Caciularu, Gabriel Stanovsky

SastBench: A Benchmark for Testing Agentic SAST Triage

Jake Feiglin, Guy Dar

CHIMERA: A Knowledge Base of Idea Recombination in Scientific Literature

Noy Sternlicht, Tom Hope