TIL: LLM Jailbreak
Jailbreak in the context of LLM is manipulating the prompt to bypass restrictions set by the service provider.
The 4 common prohibited scenarios (Deng et al., 2024):
Illegal usage against law
Generation of harmful or abusive contents
Violation of rights and privacy
Generation of adult contents
Reference
Deng, G., Liu, Y., Li, Y., Wang, K., Zhang, Y., Li, Z., Wang, H., Zhang, T., & Liu, Y. (2024). MasterKey: Automated Jailbreak Across Multiple Large Language Model Chatbots. Proceedings 2024 Network and Distributed System Security Symposium. https://doi.org/10.14722/ndss.2024.24188
ย