Google Scholar

User profiles for Stephen Casper

Stephen Casper

PhD student, MIT

Verified email at mit.edu

Cited by 552

[PDF] arxiv.org

Clusterability in neural networks

D Filan, S Casper, S Hod, C Wild, A Critch… - arXiv preprint arXiv …, 2021 - arxiv.org

The learned weights of a neural network have often been considered devoid of scrutable
internal structure. In this paper, however, we look for structure in the form of clusterability: how …

Cited by 32

Concussion: A History of Science and Medicine, 1870‐2005

ST Casper - Headache: The Journal of Head and Face Pain, 2018 - Wiley Online Library

Objective To review the intellectual history of concussion from the mid‐19th century to the
opening decade of the 21st century. Background Head injuries (HI) and their acute and long‐…

Cited by 35

[PDF] arxiv.org

Open problems and fundamental limitations of reinforcement learning from human feedback

S Casper, X Davies, C Shi, TK Gilbert… - arXiv preprint arXiv …, 2023 - arxiv.org

Reinforcement learning from human feedback (RLHF) is a technique for training AI systems
to align with human goals. RLHF has emerged as the central method used to finetune state-…

Cited by 168

[HTML] sciencedirect.com

The punch-drunk boxer and the battered wife: Gender and brain injury research

ST Casper, K O'Donnell - Social Science & Medicine, 2020 - Elsevier

This essay uses gender as a category of historical and sociological analysis to situate two
populations—boxers and victims of domestic violence—in context and explain the temporal …

Cited by 18

[PDF] arxiv.org

Toward transparent AI: A survey on interpreting the inner structures of deep neural networks

T Räuker, A Ho, S Casper… - 2023 IEEE Conference …, 2023 - ieeexplore.ieee.org

The last decade of machine learning has seen drastic increases in scale and capabilities.
Deep neural networks (DNNs) are increasingly being deployed in the real world. However, …

Cited by 91

[PDF] neurips.cc

Robust feature-level adversaries are interpretability tools

S Casper, M Nadeau… - Advances in Neural …, 2022 - proceedings.neurips.cc

The literature on adversarial attacks in computer vision typically focuses on pixel-level
perturbations. These tend to be very difficult to interpret. Recent work that manipulates the latent …

Cited by 24

[PDF] arxiv.org

Explore, establish, exploit: Red teaming language models from scratch

S Casper, J Lin, J Kwon, G Culp… - arXiv preprint arXiv …, 2023 - arxiv.org

… Stephen Casper received support for this work from … [5] Isabelle Augenstein, Christina
Lioma, Dongsheng Wang, Lucas Chaves Lima, Casper Hansen, Christian Hansen, and Jakob …

Cited by 42

[PDF] neurips.cc

Red teaming deep neural networks with feature synthesis tools

S Casper, T Bu, Y Li, J Li, K Zhang… - Advances in …, 2023 - proceedings.neurips.cc

Interpretable AI tools are often motivated by the goal of understanding model behavior in out-of-distribution
(OOD) contexts. Despite the attention this area of study receives, there are …

Cited by 10

[PDF] arxiv.org

Scalable and transferable black-box jailbreaks for language models via persona modulation

R Shah, S Pour, A Tagade, S Casper… - arXiv preprint arXiv …, 2023 - arxiv.org

Despite efforts to align large language models to produce harmless responses, they are still
vulnerable to jailbreak prompts that elicit unrestricted behaviour. In this work, we investigate …

Cited by 27

[PDF] aaai.org

Frivolous units: Wider networks are not really that wide

S Casper, X Boix, V D'Amario, L Guo… - Proceedings of the …, 2021 - ojs.aaai.org

A remarkable characteristic of overparameterized deep neural networks (DNNs) is that their
accuracy does not degrade when the network width is increased. Recent evidence suggests …

Cited by 26
