M-A-P Daily Paper


The M-A-P daily paper project curates and reviews a selection of new papers published daily on arXiv, providing insightful commentary on cutting-edge research across various scientific disciplines.



back to main page

🛠️ Papers This Week


13/9/2024

| Paper | Comments |
| --- | --- |
| A Survey of Inverse Constrained Reinforcement Learning: Definitions, Progress and Challenges | Discusses an intriguing topic in data collection for RL simulation environments: ICRL, which infers the implicit constraints that expert agents adhere to, leveraging experience from both the environment and the observed demonstration dataset. |
| The Role of Deep Learning Regularizations on Actors in Offline RL | Highlights that the generalization of the actor network remains a significant bottleneck in offline RL and examines the effectiveness of classic deep learning regularizations. The ablation study on ensembling the tricks is particularly interesting; a rough sketch of this kind of actor regularization follows the table. |
| What Makes a Maze Look Like a Maze? | A form of visual Chain-of-Thought (CoT); Tables 1 and 2 appear quite impressive. |
| AudioBERT: Audio Knowledge Augmented Language Model | |
| DSBench: How Far Are Data Science Agents to Becoming Data Science Experts? | A highly challenging benchmark for structured data processing. |
| Can We Count on LLMs? The Fixed-Effect Fallacy and Claims of GPT-4 Capabilities | |
| IFAdapter: Instance Feature Control for Grounded Text-to-Image Generation | |
| Learning Causally Invariant Reward Functions from Diverse Demonstrations | |
| OmniQuery: Contextually Augmenting Captured Multimodal Memory to Enable Personal Question Answering | |
| Windows Agent Arena: Evaluating Multi-Modal OS Agents at Scale | |
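
For the offline-RL regularization entry above, here is a minimal PyTorch sketch of the kind of actor regularization such ablations typically cover (layer normalization and dropout inside the actor MLP). The module name, layer sizes, and defaults are illustrative assumptions, not the paper's architecture.

```python
# A minimal sketch (not the paper's code) of an actor MLP where LayerNorm and
# Dropout can be toggled on or off for a regularization ablation.
import torch
import torch.nn as nn

class RegularizedActor(nn.Module):
    def __init__(self, obs_dim, act_dim, hidden=256,
                 use_layer_norm=True, dropout_p=0.1):
        super().__init__()
        layers, in_dim = [], obs_dim
        for _ in range(2):  # two hidden layers, a common choice for offline-RL actors
            layers.append(nn.Linear(in_dim, hidden))
            if use_layer_norm:
                layers.append(nn.LayerNorm(hidden))
            layers.append(nn.ReLU())
            if dropout_p > 0:
                layers.append(nn.Dropout(dropout_p))
            in_dim = hidden
        layers.append(nn.Linear(hidden, act_dim))
        self.net = nn.Sequential(*layers)

    def forward(self, obs):
        # Tanh keeps actions in [-1, 1], the usual convention for continuous control.
        return torch.tanh(self.net(obs))

# Usage: actor = RegularizedActor(obs_dim=17, act_dim=6); actions = actor(torch.randn(32, 17))
```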
12/9/2024

| Paper | Comments |
| --- | --- |
| Policy Filtration in RLHF to Fine-Tune LLM for Code Generation | Introduces PF-PPO, which uses the coefficient of determination (R²) between rewards and actual scores on filtered samples as a metric for filtering out noisy rewards in settings with long reasoning steps. Built on OpenRLHF, it should be easy to adapt; a rough sketch of the R² filtering idea follows the table. |
| What to align in multimodal contrastive learning? | |
| Synthetic Continued Pretraining | Proposes a potentially effective approach to expanding domain-specific data with synthetic data. The method appears to abstract entities, relevant descriptions, and potential relationships, then generate synthetic data describing candidate CoT chains and relationships; it resembles a complex version of HotpotQA. Scalable, though its utility is uncertain. Tatsunori's team frequently presents interesting ideas; a recent related reference is "Improving Pretraining Data Using Perplexity Correlations." |
| Recurrent Aggregators in Neural Algorithmic Reasoning | |
| Generative Hierarchical Materials Search | Similar to the previous entry, this GDM work does not yet show significant practical utility. |
| Agent Workflow Memory | |
| FreeRide: Harvesting Bubbles in Pipeline Parallelism | |
| You Have Thirteen Hours in Which to Solve the Labyrinth: Enhancing AI Game Masters with Function Calling | Interactive storytelling is a niche yet intriguing field. This paper introduces a straightforward approach that incorporates fixed story-background elements and the role-playing dice mechanics of tabletop RPGs. An exciting concept for RPG enthusiasts, and likely of interest to fans of interactive novel generation. |
| Neural Algorithmic Reasoning with Multiple Correct Solutions | |
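
A rough sketch of the R² filtering idea from the PF-PPO entry above: score candidate reward-filtering strategies by the coefficient of determination between reward-model scores and actual scores on the samples each strategy keeps. The helper names, strategies, and synthetic data are illustrative assumptions, not the paper's implementation.

```python
# A minimal sketch (assumptions, not PF-PPO's code) of choosing a reward-filtering
# strategy by the R^2 between reward-model scores and actual scores on kept samples.
import numpy as np

def r_squared(x, y):
    """Coefficient of determination computed here as the squared Pearson correlation."""
    return float(np.corrcoef(x, y)[0, 1] ** 2)

def filter_samples(rewards, strategy="top_half"):
    """Hypothetical filtering strategies over sampled responses; returns kept indices."""
    order = np.argsort(rewards)[::-1]  # indices sorted by reward, descending
    n = len(rewards)
    if strategy == "top_half":
        return order[: n // 2]
    if strategy == "top_and_bottom":   # keep extremes, drop the noisy middle
        return np.concatenate([order[: n // 4], order[-(n // 4):]])
    return order                        # no filtering

# Placeholder data: reward-model scores and actual scores (e.g., unit-test pass rates).
rewards = np.random.rand(1000)
actual = (rewards + 0.3 * np.random.randn(1000)).clip(0, 1)

scores = {}
for s in ["none", "top_half", "top_and_bottom"]:
    keep = filter_samples(rewards, s)
    scores[s] = r_squared(rewards[keep], actual[keep])
print("strategy with highest R^2:", max(scores, key=scores.get))
```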
11/9/2024

| Paper | Comments |
| --- | --- |
| HexaCoder: Secure Code Generation via Oracle-Guided Synthetic Training Data | A practical approach for automatically generating synthetic code data that incorporates security libraries. It appears relatively user-friendly and easy to integrate. |
| Learning Generative Interactive Environments By Trained Agent Exploration | A toy project in the direction of Google's Genie data generation. This area receives limited attention at present but is highly interesting; the Genie paper only briefly touches on data collection and training. An intriguing question is whether and how Decision Transformers can generalize across multiple games, a direction Google has explored consistently. Related literature: Read to Play (R2-Play): Decision Transformer with Multimodal Game Instruction; Multi-Game Decision Transformers. |
| An End-to-End Approach for Chord-Conditioned Song Generation | Related reading for SongCreator, by the same research team. |
| SongCreator: Lyrics-based Universal Song Generation | Related work on universal song generation with lyrics-based conditioning. |
| Quantifying and Enabling the Interpretability of CLIP-like Models | A collaboration between Berkeley and Intel presenting CLIP-InterpreT, a practical library for interpreting CLIP that offers five types of analyses: property-based nearest-neighbor search, per-head topic segmentation, contrastive segmentation, per-head nearest neighbors of an image, and per-head nearest neighbors of text. Notably, the study finds that larger CLIP models learn significantly stronger representations than relatively smaller ones. |
| LLaMA-Omni: Seamless Speech Interaction with Large Language Models | Constructs a speech-interaction dataset, InstructS2S-200k, which could be of some utility, albeit situational. |
| Geometric-Averaged Preference Optimization for Soft Preference Labels | A contribution from GDM that introduces distributional soft preference labels into DPO to capture distributional differences in individual or annotator preferences. The approach integrates easily with various DPO-style objectives; a hedged sketch of soft-label DPO appears after the table. |
| Draw an Audio: Leveraging Multi-Instruction for Video-to-Audio Synthesis | Work on video-to-audio synthesis using multi-instruction techniques. |
| Doppelgänger's Watch: A Split Objective Approach to Large Language Models | Work from Meta exploring potential applications. |
| Scalable Multitask Learning Using Gradient-based Estimation of Task Affinity | A study from Google that models datasets and tasks in a fairly general way. It is unclear whether the approach could be adopted directly on different LLM dataset subsets. They replace full combination experiments with individual fit experiments for each subset and then train a linearized model for the fitting. |
| USCD: Improving Code Generation of LLMs by Uncertainty-Aware Selective Contrastive Decoding | Work on improving LLM code generation through uncertainty-aware selective contrastive decoding. |
| Larger Language Models Don't Care How You Think: Why Chain-of-Thought Prompting Fails in Subjective Tasks | Offers an intriguing observation: like ICL, CoT mainly retrieves a task/reasoning prior and processes input patterns based on that prior. Studying the extent to which these priors are learned during pre-training and how they are activated during alignment and inference could be an important topic. Recommended reading: What Do Language Models Learn in Context? The Structured Task Hypothesis. |
| DiPT: Enhancing LLM reasoning through diversified perspective-taking | Another approach similar to CoT+BoN, introducing the concept of perspective. |
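
For the Geometric-Averaged Preference Optimization entry, here is a hedged sketch of how soft preference labels can enter a DPO-style objective. It uses a simple soft-label weighting of the Bradley-Terry term rather than the paper's geometric averaging; the function name and signature are illustrative assumptions.

```python
# A minimal sketch (an assumption, not GDM's exact geometric-averaging objective) of one
# simple way to use a soft preference label p in a DPO-style loss: p weights the two
# directions of the pairwise term instead of treating the pair as a hard win/loss.
import torch
import torch.nn.functional as F

def soft_label_dpo_loss(logp_w, logp_l, ref_logp_w, ref_logp_l, p, beta=0.1):
    """
    logp_w / logp_l:         policy log-probs of the "preferred" / "dispreferred" responses
    ref_logp_w / ref_logp_l: reference-model log-probs of the same responses
    p:                       soft preference label in [0, 1], i.e. P(response_w preferred)
    """
    margin = beta * ((logp_w - ref_logp_w) - (logp_l - ref_logp_l))
    # p = 1 recovers standard DPO; p = 0.5 expresses annotator indifference.
    return -(p * F.logsigmoid(margin) + (1.0 - p) * F.logsigmoid(-margin)).mean()

# Usage with dummy batch values:
b = 4
loss = soft_label_dpo_loss(torch.randn(b), torch.randn(b),
                           torch.randn(b), torch.randn(b),
                           p=torch.tensor([0.9, 0.6, 0.75, 0.5]))
```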
10/9/2024

| Paper | Comments |
| --- | --- |
| Towards a Unified View of Preference Learning for Large Language Models: A Survey | An overview of preference data. |
| Benchmarking Chinese Knowledge Rectification in Large Language Models | Scenarios involving idioms and humor explanations may serve as valuable sources of LLM test items; the benchmark leans towards understanding cultural metaphors in Chinese. |
| MMEvol: Empowering Multimodal Large Language Models with Evol-Instruct | As the title suggests. |
| Semifactual Explanations for Reinforcement Learning | Introduces the concept of semifactual testing to improve the understanding of RL agent behaviors. The "even if" design seems potentially applicable to value-alignment testing of language models. |
| Evaluating Open-Source Sparse Autoencoders on Disentangling Factual Knowledge in GPT-2 Small | Sparse autoencoders (SAEs) do not yet suffice for causal analysis, and the conclusions may lack certainty. |
| Untie the Knots: An Efficient Data Augmentation Strategy for Long-Context Pre-Training in Language Models | A synthetic-data method for long-context fine-tuning, somewhat like needle-in-a-haystack combined with sentence-order prediction; a rough sketch of this style of data construction follows the table. Additional recommended readings on synthetic data for long contexts: Long Context is Not Long at All: A Prospector of Long-Dependency Data for Large Language Models; LongCite: Enabling LLMs to Generate Fine-Grained Citations in Long-Context QA. |
| Selective Self-Rehearsal: A Fine-Tuning Approach to Improve Generalization in Large Language Models | For a given query, when the current model's response is acceptable, a target closer to the base model's output is used; otherwise the query is treated as a new skill, and the SSR algorithm is designed around this split. The experiments lack robustness, despite the claim that SSR improves generalization. Recommended readings: Language Models Resist Alignment; Reward-Directed Score-Based Diffusion Models via q-Learning. |
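
As referenced in the Untie the Knots entry above, here is a rough sketch of needle-in-a-haystack-plus-sentence-order-prediction style long-context data construction. The tagging scheme, chunking, and task wording are illustrative assumptions, not the paper's pipeline.

```python
# A minimal sketch (assumptions, not the "Untie the Knots" method): chunks from several
# documents are tagged, shuffled into one long context, and the target asks the model to
# recover the original order of one document's chunks.
import random

def make_long_context_example(docs, chunk_size=200, target_doc=0, seed=0):
    rng = random.Random(seed)
    tagged = []
    for d, text in enumerate(docs):
        chunks = [text[i:i + chunk_size] for i in range(0, len(text), chunk_size)]
        for c, chunk in enumerate(chunks):
            tagged.append((f"[DOC{d}-CHUNK{c}]", chunk, d, c))
    rng.shuffle(tagged)  # interleave chunks from all documents into one long haystack

    context = "\n".join(f"{tag} {chunk}" for tag, chunk, _, _ in tagged)
    answer = " ".join(tag for tag, _, d, c in sorted(
        (t for t in tagged if t[2] == target_doc), key=lambda t: t[3]))
    prompt = (context + "\n\n"
              f"List the chunk tags of document {target_doc} in their original order.")
    return {"prompt": prompt, "answer": answer}

example = make_long_context_example(["alpha " * 300, "beta " * 300, "gamma " * 300])
```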
9/9/2024

| Paper | Comments |
| --- | --- |
| Learning vs Retrieval: The Role of In-Context Examples in Regression with LLMs | Examines the relationship between knowledge retrieval within the model and in-context learning (ICL), with a particular focus on the data efficiency of ICL examples. The perspective is interesting; the study is conducted on three toy regression datasets, and the results may be influenced by the model's basic arithmetic capabilities. A rough sketch of this ICL regression setup follows the table. |
| How Do Your Code LLMs Perform? Empowering Code Instruction Tuning with High-Quality Data | An effective scheme for code data decontamination and code instruction data pruning. |
| Multi-Programming Language Ensemble for Code Generation in Large Language Model | As stated in the title. |
| Can LLMs Generate Novel Research Ideas? A Large-Scale Human Study with 100+ NLP Researchers | The experimental design for testing humans on complex problems is interesting and yields intriguing results. Although many ideas proposed by the AI were not feasible, they were notably more novel than those of most human researchers, especially after re-ranking. |
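
A rough sketch of the ICL regression setup referenced above: numeric (x, y) pairs are serialized into a few-shot prompt and the model is asked to complete y for a held-out x. The formatting and the toy linear relation are assumptions for illustration, not the paper's exact protocol, and the LLM call itself is left abstract.

```python
# A minimal sketch (assumptions, not the paper's protocol) of in-context regression prompts.
def build_icl_regression_prompt(examples, query_x):
    """examples: list of (x, y) pairs; query_x: the input whose y the model should predict."""
    lines = [f"x = {x:.2f}, y = {y:.2f}" for x, y in examples]
    lines.append(f"x = {query_x:.2f}, y =")
    return "\n".join(lines)

# Toy linear relation y = 3x + 1 as the in-context "dataset".
train = [(x, 3 * x + 1) for x in (0.5, 1.0, 2.0, 4.0)]
prompt = build_icl_regression_prompt(train, query_x=3.0)
# completion = some_llm.complete(prompt)  # hypothetical call; parse the numeric answer
```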

If you are interested in our published work, please navigate to our full paper list.