projects | Zach Studdiford

Uncovering the Computational Ingredients of Human-Like Representations in LLMs

Which components of transformer pre- and post-training support more human-like representations? We use a cognitive-science-inspired triplet task to obtain semantic representations from more than 70 open-source models and compare them to semantic representations in humans.

Contextual Effects in Human and LLM Causal Reasoning

Are human world-model representations qualitatively different from those of LLMs? What kinds of grounded causal inference tasks are easy for people and difficult for models? We evaluate performance on simple causal reasoning problems in humans and LLMs, and find convergences in both the behavior and the representations elicited from people and models.

Evaluating Steering Techniques using Human Similarity Judgments

What kinds of semantic representations support tasks requiring cognitive control in humans? We compare human and LLM representations obtained from behavioral judgments, and evaluate the extent to which popular steering methods (SAEs, task vectors, etc.) support alignment with human representations.