Author of the publication

Please choose a person to relate this publication to

To distinguish between persons with the same name, the academic degree and the title of an important publication are displayed. You can also use the button next to the name to display some publications already assigned to that person.

 

Other publications of authors with the same name

Robust Feature-Level Adversaries are Interpretability Tools., , , and . NeurIPS, (2022)
Benchmarking Interpretability Tools for Deep Neural Networks., , , , , and . CoRR, (2023)
Open Problems in Technical AI Governance., , , , , , , , , and 21 other author(s). CoRR, (2024)
Multilevel Interpretability Of Artificial Neural Networks: Leveraging Framework And Methods From Neuroscience., , , , , , , , , and 6 other author(s). CoRR, (2024)
The AI Risk Repository: A Comprehensive Meta-Review, Database, and Taxonomy of Risks From Artificial Intelligence., , , , , , , , , and . CoRR, (2024)
Frivolous Units: Wider Networks Are Not Really That Wide., , , , , , and . AAAI, page 6921-6929. AAAI Press, (2021)
White-Box Adversarial Policies in Deep Reinforcement Learning., , and . CoRR, (2022)
Targeted Latent Adversarial Training Improves Robustness to Persistent Harmful Behaviors in LLMs., , , , , , , , , and 1 other author(s). CoRR, (2024)
Rethinking Machine Unlearning for Large Language Models., , , , , , , , , and 3 other author(s). CoRR, (2024)
Open Problems and Fundamental Limitations of Reinforcement Learning from Human Feedback., , , , , , , , , and 22 other author(s). Trans. Mach. Learn. Res., (2023)