TIL: LLM Jailbreak


1 min read

A jailbreak, in the context of LLMs, is a prompt crafted to bypass the restrictions set by the service provider.

The four common prohibited scenarios (Deng et al., 2024):

  1. Illegal usage against law enforcement

  2. Generation of harmful or abusive content

  3. Violation of rights and privacy

  4. Generation of adult content


Deng, G., Liu, Y., Li, Y., Wang, K., Zhang, Y., Li, Z., Wang, H., Zhang, T., & Liu, Y. (2024). MasterKey: Automated jailbreaking of large language model chatbots. Proceedings of the 2024 Network and Distributed System Security Symposium (NDSS). https://doi.org/10.14722/ndss.2024.24188