Artificial Intelligence Paper Rankings

AI-estimated scientific impact ranking of the latest arXiv Artificial Intelligence preprints.

200 papers (280 total)
63372 matches
1

AgentPLM: Agentic Protein Language Models with Reasoning-Augmented Decoding for Protein Sequence Design

Sahil Rahman, Maxx Richard Rahman

1590
24
95.8%
Jun 1, 2026
2

LEAP: Supercharging LLMs for Formal Mathematics with Agentic Frameworks

Po-Nien Kung, Linfeng Song +6

1572
27
92.6%
Jun 2, 2026
3

Safety Paradox: How Enhanced Safety Awareness Leaves LLMs Vulnerable to Posterior Attack

Long P. Hoang, Hai V. Le +3

1548
29
89.7%
Jun 4, 2026
4

Zero knowledge verification for frontier AI training is possible

Pierre Peigné, Ky Nguyen +1

1545
27
92.6%
Jun 3, 2026
5

Beyond One-shot: AI Agents for Learning in Field Experiments

Junjie Luo, Ritu Agarwal +1

1542
30
86.7%
Jun 1, 2026
6

Decomposing how prompting steers behavior

Fan L. Cheng, Nikolaus Kriegeskorte

1537
20
90%
Jun 2, 2026
7

LAP: An Agent-to-Instrument Protocol for Autonomous Science

Linwu Zhu, Liqiang Gao +3

1536
22
86.4%
Jun 2, 2026
8

Thinking Past the Answer: Evaluating Harmful Overthinking in Large Reasoning Models

Simone Caldarella, Davide Talon +3

1529
24
83.3%
Jun 1, 2026
9

Continual Learning Bench: Evaluating Frontier AI Systems in Real-World Stateful Environments

Parth Asawa, Christopher M. Glaze +6

1528
20
85%
Jun 4, 2026
10

Closing the Loop on Latent Reasoning via Test-Time Reconstruction

Xiaopeng Yuan, Haibo Jin +5

1522
23
87%
Jun 4, 2026
11

Scaling Self-Evolving Agents via Parametric Memory

Tao Ren, Weiyao Luo +6

1520
14
64.3%
Jun 3, 2026
12

Coding with "Enemy": Can Human Developers Detect AI Agent Sabotage?

Jingheng Ye, Huiqi Zou +2

1518
22
77.3%
Jun 4, 2026
13

The Meta-Agent Challenge: Are Current Agents Capable of Autonomous Agent Development?

Xinyu Lu, Tianshu Wang +6

1517
17
70.6%
Jun 3, 2026
14

Towards World Models in Biomedical Research

Guangyu Wang, Jingkun Yue +6

1514
29
93.1%
Jun 4, 2026
15

Forget Attention: Importance-Aware Attention Is All You Need

Suhyeong Shin, Yeongwook Yang

v2
1512
23
82.6%
Jun 1, 2026
16

Agents' Last Exam

Yiyou Sun, Xinyang Han +6

1512
25
88%
Jun 3, 2026
17

EvoTrainer: Co-Evolving LLM Policies and Training Harnesses for Autonomous Agentic Reinforcement Learning

Guhong Chen, Yingcheng Shi +6

1510
29
58.6%
Jun 2, 2026
18

Gender-Dependent Diagnostic Substitution in LLM Medical Triage: Same Symptoms, Unequal Urgency

Qi Han Wong

1507
24
79.2%
Jun 2, 2026
19

What Benchmarks Don't Measure: The Case for Evaluating Abstention Competence in Autonomous Agents

Victor Ojewale, Suresh Venkatasubramanian

1504
19
84.2%
Jun 1, 2026
20

MIRAGE: Mobile Agents with Implicit Reasoning and Generative World Models

Zhichao Yang, Yuanze Hu +6

1501
19
73.7%
Jun 3, 2026
21

The Reliability Gap in Benchmark Auditing: Distribution Shift and Scale as Failure Modes of Contamination Detection

Wojciech Zarzecki, Jan Dubiński +1

1500
18
72.2%
Jun 2, 2026
22

Don't Gamble, GAMBLe: An Analytical Framework for AI-Driven Research Systems

Marquita Ellis, Paul Castro

1495
20
75%
Jun 1, 2026
23

Vortex: Efficient and Programmable Sparse Attention Serving for AI Agents

Zhuoming Chen, Xinrui Zhong +6

1491
21
76.2%
Jun 4, 2026
24

Retrospective Harness Optimization: Improving LLM Agents via Self-Preference over Trajectory Rollouts

Wenbo Pan, Shujie Liu +6

1487
22
86.4%
Jun 4, 2026
25

SCI-PRM: A Tool Aware Process Reward Model for Scientific Reasoning Verification

Xiangyu Zhao, Hengyuan Zhao +6

1485
21
76.2%
Jun 3, 2026
26

Inference-Time Vulnerability Beyond Shallow Safety: Alignment Along Generation Trajectories

Kyungmin Park, Taesup Kim

1485
16
68.8%
Jun 3, 2026
27

RedKnot: Efficient Long-Context LLM Serving with Head-Aware KV Reuse and SegPagedAttention

Yang Liu, ZhaoKai Luo +6

1484
17
70.6%
Jun 4, 2026
28

Can Generalist Agents Automate Data Curation?

Feiyang Kang, Hanze Li +6

1482
12
66.7%
Jun 2, 2026
29

Imaginative Perception Tokens Enhance Spatial Reasoning in Multimodal Language Models

Mahtab Bigverdi, Lindsey Li +6

1482
22
77.3%
Jun 2, 2026
30

ThoughtFold: Folding Reasoning Chains via Introspective Preference Learning

Ziyan Liu, Xueda Shen +6

1481
25
80%
Jun 2, 2026
31

Reasoning Structure of Large Language Models

Frédéric Berdoz, Luca A. Lanzendörfer +2

1478
23
69.6%
Jun 2, 2026
32

Beyond Objective Equivalence: Constraint Injection for LLM-Based Optimization Modeling on Vehicle Routing Problems

Xizi Luo, Changhong He +3

1477
16
68.8%
Jun 3, 2026
33

EvoDS: Self-Evolving Autonomous Data Science Agent with Skill Learning and Context Management

Zherui Yang, Fan Liu +2

1474
20
75%
Jun 2, 2026
34

What Makes Interaction Trajectories Effective for Training Terminal Agents?

Sidi Yang, Chaofan Tao +6

1474
18
72.2%
Jun 2, 2026
35

AURA: Action-Gated Memory for Robot Policies at Constant VRAM

Josef Chen

1473
20
60%
Jun 1, 2026
36

MedCUA-Bench: A Screenshot-Only Benchmark for Clinical Computer-Use Agents

Jia Yu, Zilong Wang +3

1473
21
76.2%
Jun 2, 2026
37

AutoLab: Can Frontier Models Solve Long-Horizon Auto Research and Engineering Tasks?

Zhangchen Xu, Junda Chen +6

1471
14
64.3%
Jun 3, 2026
38

Beyond Prompt-Based Planning: MCP-Native Graph Planning-based Biomedical Agent System

Zhangtianyi Chen, Florensia Widjaja +4

1469
17
64.7%
Jun 3, 2026
39

The Self-Correction Illusion: LLMs Correct Others but Not Themselves

Kuan-Yen Chen, Fang-Yi Su +1

1469
19
73.7%
Jun 4, 2026
40

AgentCL: Toward Rigorous Evaluation of Continual Learning in Language Agents

Yiheng Shu, Bernal Jiménez Gutiérrez +4

v2
1468
22
72.7%
Jun 1, 2026
41

Beyond Similarity: Trustworthy Memory Search for Personal AI Agents

Jiawen Zhang, Kejia Chen +6

1467
23
69.6%
Jun 4, 2026
42

Benchmark Everything Everywhere All at Once

Shiyun Xiong, Dongming Wu +6

1466
26
76.9%
Jun 4, 2026
43

When Helping Hurts and How to Fix It: Multi-Agent Debate for Data Cleaning

Chirag Parmar, Akshat Mehta +3

1465
21
76.2%
Jun 1, 2026
44

LLM Self-Recognition: Steering and Retrieving Activation Signatures

Thibaud Ardoin, Jonas Schäfer +1

1465
23
78.3%
Jun 4, 2026
45

Fix the Mind, Not the Move: Interpretable AI Assistance via Knowledge-Gap Localization

Ayano Hiranaka, Ya-Chuan Hsu +3

1460
20
70%
Jun 4, 2026
46

Minimizing the Hidden Cost of Scales: Graph-Guided Ultra-Low-Bit Quantization for Large Language Models

Rayyan Abdalla, Amir Hussein +2

1458
18
66.7%
Jun 3, 2026
47

DeskCraft: Benchmarking Desktop Agents on Professional Workflows and Human-in-the-Loop Collaboration

Wenkai Wang, Tao Xiong +6

1458
23
69.6%
Jun 2, 2026
48

Assessing the Carbon Emissions and Energy Consumption of U.S. Hyperscale Data Centers

Gianluca Guidi, Francesca Dominici +6

1457
20
75%
Jun 3, 2026
49

Where does Absolute Position come from in decoder-only Transformers?

Valeria Ruscio, Umberto Nanni +1

1456
20
65%
Jun 4, 2026
50

CORE: Conflict-Oriented Reasoning for General Multimodal Manipulation Detection

Jinjie Shen, Yaxiong Wang +6

1454
23
69.6%
Jun 2, 2026
51

Harness-1: Reinforcement Learning for Search Agents with State-Externalizing Harnesses

Pengcheng Jiang, Zhiyi Shi +6

1454
24
58.3%
Jun 1, 2026
52

COMAP: Co-Evolving World Models and Agent Policies for LLM Agents

Youwei Liu, Jian Wang +2

1454
25
68%
Jun 1, 2026
53

SafeSteer: Localized On-Policy Distillation for Efficient Safety Alignment

Hao Li, Jingkun An +6

1453
21
57.1%
Jun 1, 2026
54

Iteris: Agentic Research Loops for Computational Mathematics

Leheng Chen, Zihao Liu +2

1453
22
68.2%
Jun 1, 2026
55

POIROT: Interrogating Agents for Failure Detection in Multi-Agent Systems

Iñaki Dellibarda Varela, R. Sendra-Arranz +6

1452
24
70.8%
Jun 1, 2026
56

Code-on-Graph: Iterative Programmatic Reasoning via Large Language Models on Knowledge Graphs

Weiwei Ding, Zixuan Li +6

1451
22
59.1%
Jun 2, 2026
57

ChatHealthAI: Aligning Electronic Health Record Representations with Large Language Models for Grounded Clinical Reasoning

Bo-Hong Wang, Baicheng Peng +4

1450
19
57.9%
Jun 1, 2026
58

CP-Agent: Context-Aware Multimodal Reasoning for Cellular Morphological Profiling under Chemical Perturbations

Yuxin Zhang, Yiyao Li +4

1450
16
56.2%
Jun 2, 2026
59

Step-by-Step Optimization-like Reasoning in LLMs over Expanding Search Spaces

Nicolás Astorga, Nabeel Seedat +1

1446
17
58.8%
Jun 3, 2026
60

SAGE: A Quantitative Evaluation of Socialized Evolution in Agent Ecosystems

Linyue Pan, Yaoming Zhu +3

1443
28
67.9%
Jun 2, 2026
61

Simulate, Reason, Decide: Scientific Reasoning with LLMs for Simulation-Driven Decision Making

Yuhan Yang, Ruipu Li +1

1443
17
52.9%
Jun 3, 2026
62

TRACE: A Temporal Conditional Estimation for Multimodal Time Series Foundation Models

Ziwen Kan, Yishuo Chen +6

1442
19
68.4%
Jun 4, 2026
63

ClinEnv: An Interactive Multi-Stage Long Horizon EHR Environment for Agents

Yuxing Lu, Yushuhong Lin +5

1440
24
70.8%
Jun 1, 2026
64

AgentJet: A Flexible Swarm Training Framework for Agentic Reinforcement Learning

Qingxu Fu, Boyin Liu +3

1440
16
56.2%
Jun 3, 2026
65

Beyond Semantic Organization: Memory as Execution State Management for Long-Horizon Agents

Yaoqi Chen, Haibin Lai +6

1440
20
75%
Jun 4, 2026
66

Memory is Reconstructed, Not Retrieved: Graph Memory for LLM Agents

Shuo Ji, Yibo Li +1

1440
17
52.9%
Jun 4, 2026
67

Diagnosing Knowledge Gaps in LLM Tool Use: An Agentic Benchmark for Novel API Acquisition

Jinnuo Liu, Yue Peng +2

1438
17
52.9%
Jun 2, 2026
68

Agent Memory: Characterization and System Implications of Stateful Long-Horizon Workloads

Yasmine Omri, Ziyu Gan +6

1438
19
78.9%
Jun 4, 2026
69

Amortizing Federated Adaptation: Hypernetwork Driven LoRA for Personalized Foundation Models

Sunny Gupta, Shambhavi Shanker +1

1437
19
73.7%
Jun 4, 2026
70

Inducing Reasoning Primitives from Agent Traces

Zhihan Lei, Jiarui Yan +2

1437
19
68.4%
Jun 2, 2026
71

MLEvolve: A Self-Evolving Framework for Automated Machine Learning Algorithm Discovery

Shangheng Du, Xiangchao Yan +6

1437
22
77.3%
Jun 4, 2026
72

DELTAMEM: Incremental Experience Memory for LLM Agents via Residual Trees

Haoran Tan, Zeyu Zhang +3

1437
22
68.2%
Jun 2, 2026
73

When Tools Fail: Benchmarking Dynamic Replanning and Anomaly Recovery in LLM Agents

Dongsheng Zhu, Xuchen Ma +6

1436
16
68.8%
Jun 4, 2026
74

Stumbling Into AI Emotional Dependence: How Routine AI Interactions Reshape Human Connection

Yaoxi Shi, Cathy Mengying Fang +2

1433
15
60%
Jun 2, 2026
75

Goedel-Architect: Streamlining Formal Theorem Proving with Blueprint Generation and Refinement

Jui-Hui Chung, Ziyang Cai +6

1433
23
78.3%
Jun 4, 2026
76

Multilingual Fine-Tuning via Localized Gradient Conflict Resolution

Long P. Hoang, Yiran Zhao +2

1432
23
73.9%
Jun 4, 2026
77

LLM-Evolved Pattern Generators for Optimal Classical Planning

Windy Phung, Dominik Drexler +2

1432
21
66.7%
Jun 1, 2026
78

R-APS: Compositional Reasoning and In-Context Meta-Learning for Constrained Design via Reflective Adversarial Pareto Search

João Pedro Gandarela, Thiago Rios +2

1430
15
46.7%
Jun 3, 2026
79

Where Should Knowledge Enter? A Layered Framework for Knowledge Infusion in Multimodal Iterative Generative Mo

Renjith Prasad, Chathurangi Shyalika +2

1427
17
58.8%
Jun 4, 2026
80

AIP: A Graph Representation for Learning and Governing Agent Skills

Zachary Blumenfeld, Jim Webber

1427
17
41.2%
Jun 3, 2026
81

scTranslation: A Comprehensive Benchmark for Single-Cell Multi-Omics Modality Translation

Jiabei Cheng, Jingbo Zhou +5

1427
17
41.2%
Jun 2, 2026
82

A Pre-Registered Causal Partition of Self-Consistency Elicitation and Reward Design in RLVR

Yuze Gao

1423
18
61.1%
Jun 4, 2026
83

When Should Memory Stay Silent: Measuring Memory-Use Boundaries in Memory-Augmented Conversational Agents

Lingxiang Xu, Jiaoyun Yang +3

1422
20
60%
Jun 4, 2026
84

Not All Errors Are Equal: Consequence-Aware Reasoning Compute Allocation

Jingbo Wen, Liang He +1

1421
16
50%
Jun 3, 2026
85

SIRI: Self-Internalizing Reinforcement Learning with Intrinsic Skills for LLM Agent Training

Zhongyu He, Yuanfan Li +6

1420
23
65.2%
Jun 1, 2026
86

BehaviorBench: Modeling Real-World User Decisions from Behavioral Traces

Liangwei Yang, Jielin Qiu +6

1420
19
47.4%
Jun 1, 2026
87

Stability vs. Manipulability: Evaluating Robustness Under Post-Decision Interaction in LLM Judges

Srimonti Dutta, Akshata Kishore Moharir

1419
16
56.2%
Jun 3, 2026
88

Entropy Is Not Enough: Unlocking Effective Reinforcement Learning for Visual Reasoning via Vision-Anchored Token Selection

Senjie Jin, Peixin Wang +6

1418
21
57.1%
Jun 2, 2026
89

Beyond Output Matching: Preserving Internal Geometry in NVFP4 LLM Distillatio

Fangbo Tu, Junhua Zhao +5

1418
16
56.2%
Jun 4, 2026
90

FALSIFYBENCH: Evaluating Inductive Reasoning in LLMs with Rule Discovery Games

Leonardo Bertolazzi, Katya Tentori +1

1417
16
56.2%
Jun 3, 2026
91

TAPO: Tool-Aware Policy Optimization via Credit Transfer for Multimodal Search Agents

Chengqi Dong, Chuhuai Yue +6

1417
18
66.7%
Jun 4, 2026
92

Learning Admissible Heuristics via Cost Partitioning

Hugo Barral, Quentin Cappart +2

1416
18
44.4%
Jun 3, 2026
93

MapAgent: An Industrial-Grade Agentic Framework for City-scale Lane-level Map Generation

Deguo Xia, Zihan Li +6

1415
17
52.9%
Jun 3, 2026
94

Output Type Before Quality: A Standards-Derived XAI Admissibility Rubric for Autonomous-Driving Safety

Abhinaw Priyadershi, Mandar Pitale +2

1415
16
62.5%
Jun 3, 2026
95

ToolChoiceConfusion: Causal Minimal Tool Filtering for Reliable LLM Agents

Rahul Suresh Babu, Laxmipriya Ganesh Iyer

1413
16
50%
Jun 4, 2026
96

FIDES: Faithful Inference via Deep Evidence Signals for Retrieval-Memory Conflict in RAG

Zhe Yu, Wenpeng Xing +4

1412
16
56.2%
Jun 4, 2026
97

Mutation Without Variation: Convergence Dynamics in LLM-Driven Program Evolution

Can Gurkan, Forrest Stonedahl +1

1411
17
58.8%
Jun 3, 2026
98

Boosting Brain-to-Image Decoding with TRIBE v2 Data Augmentation

Yohann Benchetrit, Marlène Careil +4

1411
25
68%
Jun 4, 2026
99

From Answers to States: Verifiable Process-Level Evaluation of Chemical Reasoning in Large Language Models

Hongyu Guo, Hao Li +3

1408
22
59.1%
Jun 2, 2026
100

The Shadow Price of Reasoning: Economic Perspective on Optimal Budget Allocation for LLMs

Xu Wan, Speed Zhu +5

1408
17
52.9%
Jun 2, 2026
101

Trivium: Temporal Regret as a First-Class Objective for Causal-Memory Controllers

Edward Y. Chang

1408
16
68.8%
Jun 3, 2026
102

Overlaying Governance: A Compositional Authorization Framework for Delegation and Scope in Agentic AI

Amjad Ibrahim, Yong Li

1408
18
50%
Jun 2, 2026
103

QCFuse: Query-Aware Cache Fusion via Compressed View for Efficient RAG Serving

Jianxin Yan, Wangze Ni +6

1403
18
55.6%
Jun 4, 2026
104

Cascading Hallucination in Agentic RAG: The CHARM Framework for Detection and Mitigation

Saroj Mishra

1403
17
52.9%
Jun 3, 2026
105

SkillDAG: Self-Evolving Typed Skill Graphs for LLM Skill Selection at Scale

Tong Bai, Zhenglin Wan +5

1402
25
56%
Jun 2, 2026
106

Towards Healthy Evolution: Exploring the Role and Mechanisms of Human-Agent Interaction in Self-Evolving Systems

Dianxing Shi, Junqi He +3

1401
19
52.6%
Jun 4, 2026
107

Do More Agents Help? Controlled and Protocol-Aligned Evaluation of LLM Agent Workflows

Yuhang Fu, Ruishan Fang +5

1399
20
75%
Jun 4, 2026
108

SentinelBench: A Benchmark for Long-Running Monitoring Agents

Matheus Kunzler Maldaner, Adam Fourney +6

1399
16
50%
Jun 3, 2026
109

A Framework for Measuring Appropriate Reliance on Set-Valued AI Advice

Ranjan Mishra, Jakob Schoeffer

1398
18
61.1%
Jun 4, 2026
110

Edit-R2: Context-Aware Reinforcement Learning for Multi-Turn Image Editing

Yuxiao Ye, Haoran He +5

1397
17
64.7%
Jun 4, 2026
111

Handoff Debt: The Rediscovery Cost When Coding Agents Take Over Interrupted Tasks

Dipesh KC, Anjila Budathoki

1396
23
52.2%
Jun 1, 2026
112

From Reward-Hack Activations to Agentic Risk States: Context-Calibrated Mechanistic Monitoring in LLM Agents

Patrick Wilhelm, Odej Kao

1396
18
61.1%
Jun 4, 2026
113

Unveiling the Structure of Do-Calculus Reasoning via Derivation Graphs

Clément Yvernes, Emilie Devijver +2

1395
20
50%
Jun 2, 2026
114

AdaMEM: Test-Time Adaptive Memory for Language Agents

Yunxiang Zhang, Yiheng Li +2

1395
16
56.2%
Jun 4, 2026
115

CogManip: Benchmarking Manipulative Behavior in Multi-Turn Interactions with Large Language Model

Zeyang Yue, Chenfei Yan +6

1394
18
61.1%
Jun 4, 2026
116

Unsupervised Skill Discovery for Agentic Data Analysis

Zhisong Qiu, Kangqi Song +5

1393
15
60%
Jun 4, 2026
117

Bridging Auxiliary Constraints to Resolve Instruction Following in Large Reasoning Models

Zhengyi Zhao, Shubo Zhang +6

1393
16
43.8%
Jun 2, 2026
118

InfoMem: Training Long-Context Memory Agents with Answer-Conditioned Information Gain

Tiancheng Han, Yong Li +3

1392
22
50%
Jun 2, 2026
119

Coordination Graphs for Constrained Multi-Agent Reinforcement Learning

Santiago Amaya-Corredor, Miguel Calvo-Fullana +1

1390
28
57.1%
Jun 1, 2026
120

ClinicalMC: A Benchmark for Multi-Course Clinical Decision-Making with Large Language Models

Ruihui Hou, Siyi Zhu +5

1390
21
42.9%
Jun 2, 2026
121

Learning Visual Spatial Planning from Symbolic State via Modality-Gap-Aware Self-Distillation

Haocheng Luo, Jiahui Liu +6

1389
16
50%
Jun 4, 2026
122

SkillPyramid: A Hierarchical Skill Consolidation Framework for Self-Evolving Agents

Yuan Xiong, Ziqi Miao +6

1388
24
62.5%
Jun 2, 2026
123

How Far Did They Go? The Persuasive Tactics of Covert LLM Agents in a Discontinued Field Experiment

Kokil Jaidka, Saifuddin Ahmed

1388
18
50%
Jun 3, 2026
124

PLAN-S: Bridging Planning with Latent Style Dynamics for Autonomous Driving World Models

Xiaoyun Qiu, Jingtao He +5

1386
18
55.6%
Jun 4, 2026
125

Traj-Evolve: A Self-Evolving Multi-Agent System for Patient Trajectory Modeling in Lung Cancer Early Detection

Sihang Zeng, Matthew Thompson +2

1384
22
50%
Jun 1, 2026
126

EvoDrive: Pareto Evolution for Safety-Critical Autonomous Driving via Self-Improving LLM Agents

Tong Nie, Yuewen Mei +5

1384
20
55%
Jun 2, 2026
127

Knowledge Index of Noah's Ark

Sheng Jin, Minghao Liu +6

v2
1382
15
60%
Jun 3, 2026
128

StepPRM-RTL: Stepwise Process-Reward Guided LLM Fine-Tuning for Enhanced RTL Synthesis

Prashanth Vijayaraghavan, Apoorva Nitsure +3

1382
15
46.7%
Jun 2, 2026
129

From Risk Classification to Action Plan Remediation: A Guardrail Feedback Driven Framework for LLM Agents

Yuhao Sun, Jiacheng Zhang +4

1382
19
63.2%
Jun 4, 2026
130

MOC: Multi-Order Communication in LLM-based Multi-Agent Systems

Yao Guan, Lin Wang +4

1380
19
42.1%
Jun 1, 2026
131

Individual Gain, Collective Loss: Metacognitive Adaptation in AI-Assisted Creativity

Anna Mikeda

1380
19
63.2%
Jun 4, 2026
132

Brick-Composer: Using MLLMs for Assembly with Diverse Bricks

Jiateng Liu, Bingxuan Li +6

1378
22
54.5%
Jun 3, 2026
133

Spatial Representation Learning Beyond Pixels: Unifying Raster Data and Vector Semantics for Human-Centric Geospatial Foundation Models

Steffen Knoblauch, Hao Li +4

1378
16
50%
Jun 1, 2026
134

What Should Agents Say? Action-state Communication for Efficient Multi-Agent Systems

Chen Huang, Yuhao Wu +1

1377
17
41.2%
Jun 3, 2026
135

The Saturation Trap and the Subjectivity of Intervention Timing: Why Affect-Based Triggers and LLM Judges Fail to Time Interventions on Autonomous Agents

Manvendra Modgil

1376
18
50%
Jun 2, 2026
136

HLL: Can Agents Cross Humanity's Last Line of Verification?

Xinhao Song, Su Su +6

1375
22
50%
Jun 1, 2026
137

Harnessing Generalist Agents for Contextualized Time Series

Zihao Li, Kaifeng Jin +6

1374
16
56.2%
Jun 3, 2026
138

TSQAgent: Rating Time Series Data Quality via Dedicated Agentic Reasoning

Shunyu Wu, Dan Li +6

1373
21
47.6%
Jun 2, 2026
139

PSEBench: A Controllable and Verifiable Benchmark for Evaluating LLMs in Patient Safety Event Triage

Keqi Han, Ryan Young +6

1371
15
46.7%
Jun 3, 2026
140

SoCRATES: Towards Reliable Automated Evaluation of Proactive LLM Mediation across Domains and Socio-cognitive Variations

Taewon Yun, Hyeonseong Park +4

1371
18
44.4%
Jun 4, 2026
141

PerceptUI: LLM Agents as Human-Aligned Synthetic Users for UI/UX Evaluation

Nicolas Bougie, Xiaotong Ye +2

1370
21
42.9%
Jun 4, 2026
142

Exploring Cross-Scenario Generality of Agentic Memory Systems: Diagnostics and a Strong Baseline

Zhikai Chen, Jialiang Gu +6

1369
19
47.4%
Jun 3, 2026
143

Solipsistic Superintelligence is Unlikely to be Cooperative

Rakshit S Trivedi, Natasha Jaques +3

1369
22
45.5%
Jun 2, 2026
144

DMF: A Deterministic Memory Framework for Conversational AI Agents

Matteo Stabile, Enrico Zimuel

1369
20
50%
Jun 2, 2026
145

When to Re-Plan: Subgoal Persistence in Hierarchical Latent Reasoning

Ayushi Chadha

1368
20
55%
Jun 2, 2026
146

Tracking the Behavioral Trajectories of Adapting Agents

Jonah Leshin, Manish Shah +1

1367
31
48.4%
Jun 1, 2026
147

Humans' ALMANAC: A Human Collaboration Dataset of Action-Level Mental Model Annotations for Agent Collaboration

Jiaju Chen, Yuxuan Lu +6

1366
17
47.1%
Jun 4, 2026
148

Seeing Time: Benchmarking Chronological Reasoning and Shortcut Biases in Vision-Language Models

Haoyu Zhou, Qing Qing +6

1366
18
44.4%
Jun 4, 2026
149

Repair Before Veto: Repair-Augmented Constraint Learning for Contextual Decisions

Yifan Wang

1366
23
47.8%
Jun 1, 2026
150

Tree-Based Formalization of Multi-Agent Complementarity in Human-AI Interactions

Andrea Ferrario

1365
20
40%
Jun 3, 2026
151

Food Noise & False Safety: A Systematic Evaluation of How LLMs Fail to Adapt to Eating Disorder Queries with Clinician Feedback

Giulia Pucci, Emily Hemendinger +4

1363
22
54.5%
Jun 1, 2026
152

SMAC-Talk: A Natural Language Extension of the StarCraft Multi-Agent Challenge for Large Language Models

Joel Sol, Homayoun Najjaran

1362
18
38.9%
Jun 2, 2026
153

DragOn: A Benchmark and Dataset for Drag-Based GUI Interactions

Nathan Bout, Maxime Langevin +1

1361
16
50%
Jun 4, 2026
154

ToolGate: Token-Efficient Pre-Call Control for Tool-Augmented Vision-Language Agents

Anjie Liu, Yan Song +4

1360
19
31.6%
Jun 2, 2026
155

Residual Modeling for High-Fidelity Learned Compression of Scientific Data

Liangji Zhu, Sanjay Ranka +1

1359
15
46.7%
Jun 3, 2026
156

BigFinanceBench: A Workflow-Grounded Benchmark for Financial-Research Agents

Alex Wang, Georg Meinhardt +5

1359
17
47.1%
Jun 2, 2026
157

Online Skill Learning for Web Agents via State-Grounded Dynamic Retrieval

Jiaxi Li, Ke Deng +6

1358
15
46.7%
Jun 3, 2026
158

DiG-Plan: Mitigating Early Commitment for Tool-Graph Planning via Diffusion Guidance

Yansi Li, Zhuosheng Zhang

1355
18
55.6%
Jun 4, 2026
159

Can LLMs Write Correct TLA+ Specifications? Evaluating Natural-Language-to-TLA+ Generation

Arslan Bisharat, Brian Ortiz +6

1355
19
42.1%
Jun 4, 2026
160

BiasGRPO: Stabilizing Bias Mitigation in High-Variance Reward Landscapes via Group-Relative Policy Optimization

Saket Reddy, Ke Yang +1

1354
16
43.8%
Jun 3, 2026
161

Do Real-World Datasets Contain Natural Experiments? An Empirical Study Using Causal Feature Selection

Gautam Gare, John Galeotti +3

1352
20
35%
Jun 2, 2026
162

AICompanionBench: Benchmarking LLMs-as-Judges for AI Companion Safety

Yanjing Ren, Reza Ebrahimi +1

1351
20
40%
Jun 3, 2026
163

Hedge-Bench: Benchmarking Agents on Hard, Realistic Tasks Pertaining to Financial Reasoning

Eric Cho, Shawn Huang +2

1350
18
44.4%
Jun 2, 2026
164

AUDITFLOW: Executable Symbolic Environments for Structured Financial Reporting Verification

Yan Wang, Xuguang Ai +6

1350
22
40.9%
Jun 2, 2026
165

StepFinder: A Temporal Semantic Framework for Failure Attribution in Multi-Agent Systems

Taiyu Zhu, Yifan Wu +3

1350
21
42.9%
Jun 2, 2026
166

The Digital Apprentice: A Framework for Human-Directed Agentic AI Development

Travis Weber, Rohit Taneja

1349
18
55.6%
Jun 3, 2026
167

EpiEvolve: Self-Evolving Agents for Streaming Pandemic Forecasting under Regime Shifts

Yiming Lu, Sihang Zeng +4

1349
20
40%
Jun 3, 2026
168

Integrating Mechanistic and Data-Driven Models for Neurological Disorders through Differentiable Programming

Shah Pallav Dhanendrakumar, Saikat Pal +1

1346
21
38.1%
Jun 4, 2026
169

From Long News to Accurate Forecast: Importance-Aware Fusion and PRM-Guided Reflection for Time Series Forecasting

Mingyang Liu, Qingcan Kang +6

1345
19
36.8%
Jun 2, 2026
170

Think-Before-Speak: From Internal Evaluation to Public Expression in Multi-Agent Social Simulation

Kaiqi Yang, Tai-Quan Peng +2

1344
28
32.1%
Jun 2, 2026
171

Bridging the Last Mile of Time Series Forecasting with LLM Agents

Yuhua Liao, Zetian Wang +2

1343
20
45%
Jun 1, 2026
172

An interpretable and trustworthy AI framework for large-scale longitudinal structure-pain association studies using data from the Osteoarthritis Initiative (OAI)

Jincheng Yu, Haoyang Li +6

1343
16
43.8%
Jun 3, 2026
173

An Infectious Disease Spread Simulation Based on Large Language Model Decision Making

Yonchanok Khaokaew, Ruochen Kong +6

1343
19
36.8%
Jun 4, 2026
174

Characterizing initial human-AI proof formalization workflows

Katherine M. Collins, Simon Frieder +6

1342
16
43.8%
Jun 2, 2026
175

Self-Commitment Latency: A Reward-Free Probe for Prompted Implicit Hacking

Bonan Shen, Youting Wang +2

1341
21
47.6%
Jun 4, 2026
176

Distilling Answer-Set Programming Rules from LLMs for Neurosymbolic Visual Question Answering

Thomas Eiter, Nelson Higuera Ruiz +1

1339
20
45%
Jun 2, 2026
177

Agentic Molecular Recovery via Molecule-Aware Exploration

Suwan Yoon, Changhee Lee

1336
20
40%
Jun 4, 2026
178

Multi-ResNets for Subspace Preconditioning in Constrained Optimization

Merve Karakas, Christopher J. Williams +4

1335
20
35%
Jun 4, 2026
179

WorldFly: A World-Model-Based Vision-Language-Action Model for UAV Navigation

Shengtao Zheng, Kai Li +6

1334
18
38.9%
Jun 4, 2026
180

Answer Presence Drives RAG Rewriting Gains

Yuejie Li, Yueying Hua +6

1334
17
35.3%
Jun 4, 2026
181

Parthenon Law: A Self-Evolving Legal-Agent Framework

Hejia Geng, Leo Liu

1333
17
23.5%
Jun 3, 2026
182

A formal definition and meta-model for a machine theory of mind

Fabio Cuzzolin

1333
17
41.2%
Jun 2, 2026
183

Proof-Refactor: Refactoring Generated Formal Proofs into Modular Artifacts

Yiming Fu, Peixuan Liu +2

1332
25
32%
Jun 2, 2026
184

The DeepSpeak-Agentic Dataset

Sarah Barrington, Maty Bohacek +1

1331
21
33.3%
Jun 2, 2026
185

Uncertainty Aware Functional Behavior Prediction and Material Fatigue Assessment for Circular Factory

Nehal Afifi, Mehdi Khabou +6

1330
18
44.4%
Jun 3, 2026
186

Toward Pre-Deployment Assurance for Enterprise AI Agents: Ontology-Grounded Simulation and Trust Certification

Thanh Luong Tuan, Abhijit Sanyal

v2
1329
16
37.5%
Jun 2, 2026
187

When AI Says It Feels

Shin-nosuke Ishikawa, Seiya Ikeda +1

1327
18
38.9%
Jun 4, 2026
188

Retry Policy Gradients in Continuous Action Spaces

Soichiro Nishimori, Paavo Parmas

1325
17
35.3%
Jun 4, 2026
189

BiNSGPS: Geometry Problem Solving via Bidirectional Neuro-Symbolic Interaction

Qi Wang, Peijie Wang +2

1325
17
29.4%
Jun 3, 2026
190

GTBench: A Curriculum-Grounded Benchmark for Evaluating LLMs as Mathematical Research Assistants in Graph Theory

Noujoud Nader, Ibrahem Aljabea +2

1323
25
32%
Jun 2, 2026
191

Beyond Vector Similarity: A Structural Analysis of Graph-Augmented Retrieval for Industrial Knowledge Graphs

Grama Chethan

1320
21
33.3%
Jun 4, 2026
192

SubtleMemory: A Benchmark for Fine-Grained Relational Memory Discrimination in Long-Horizon AI Agents

Wenxuan Wang, Haoyu Sun +5

1319
16
37.5%
Jun 4, 2026
193

Uncertainty-Aware Clarification in LLM Agents with Information Gain

Mengyi Deng, Zhiwei Li +5

1318
23
30.4%
Jun 2, 2026
194

MCP-Persona: Benchmarking LLM Agents on Real-World Personal Applications via Environment Simulation

Wenhao Wang, Peizhi Niu +6

1313
20
20%
Jun 1, 2026
195

RASER: Recoverability-Aware Selective Escalation Router for Multi-Hop Question Answering

Yuyang Li, Zihe Yan +1

1313
25
36%
Jun 1, 2026
196

Evaluating Agentic Configuration Repair for Computer Networks

Rufat Asadli, Benjamin Hoffman +2

1311
19
36.8%
Jun 4, 2026
197

Perceive Before Reasoning: A Pre-Reasoning Perception Framework for Efficient and Reliable Proactive Mobile Agents

Zhijie Ding, Weinan Hong +6

1308
18
33.3%
Jun 2, 2026
198

RelGT-AC: A Relational Graph Transformer for Autocomplete Tasks in Relational Databases

Phillip Jiang

1308
26
30.8%
Jun 2, 2026
199

TokenMizer: Graph-Structured Session Memory for Long-Horizon LLM Context Management

Shweta Mishra

1308
20
30%
Jun 4, 2026
200

GITCO: Gated Inference-Time Context Optimization in TSFMs

Manya Pandey, Dhruv Kumar +2

1305
19
21.1%
Jun 3, 2026
Win-rate scores are derived from pairwise comparisons, with 95% confidence intervals. Papers are compared using full-text deep analysis.
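
As a rough illustration of the note on win rates and confidence intervals, the sketch below computes a win rate and a 95% interval from one paper's pairwise-comparison record. It assumes the two figures listed after each score are the number of matches and the win percentage (the first entry's 24 and 95.8% are consistent with 23 wins in 24 matches). It also uses the Wilson score interval, which is an assumption: the page does not say how its intervals are computed. The function name `win_rate_with_ci` is hypothetical.

```python
import math


def win_rate_with_ci(wins: int, matches: int, z: float = 1.96):
    """Return (win_rate, lower, upper) using a Wilson score interval.

    Assumption: the ranking site's actual interval method is not stated;
    Wilson is used here only as a common choice for binomial proportions.
    z = 1.96 corresponds to a two-sided 95% interval.
    """
    if matches <= 0 or not 0 <= wins <= matches:
        raise ValueError("need matches > 0 and 0 <= wins <= matches")
    p = wins / matches
    denom = 1 + z**2 / matches
    center = (p + z**2 / (2 * matches)) / denom
    half = z * math.sqrt(p * (1 - p) / matches + z**2 / (4 * matches**2)) / denom
    # Clamp to [0, 1] to guard against floating-point overshoot.
    return p, max(0.0, center - half), min(1.0, center + half)


# Illustrative record only: 23 wins in 24 matches, as inferred above.
rate, lower, upper = win_rate_with_ci(wins=23, matches=24)
print(f"win rate {rate:.1%}, 95% CI [{lower:.1%}, {upper:.1%}]")
```

With only a few dozen matches per paper, an interval of this kind is wide, so papers with nearby win rates may not be meaningfully separated.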