TIL: LLM Jailbreak

PublishedJune 24, 2024

Jailbreak in the context of LLM is manipulating the prompt to bypass restrictions set by the service provider.

The 4 common prohibited scenarios (Deng et al., 2024):

Illegal usage against law
Generation of harmful or abusive contents
Violation of rights and privacy
Generation of adult contents

Reference

Deng, G., Liu, Y., Li, Y., Wang, K., Zhang, Y., Li, Z., Wang, H., Zhang, T., & Liu, Y. (2024). MasterKey: Automated Jailbreak Across Multiple Large Language Model Chatbots. Proceedings 2024 Network and Distributed System Security Symposium. https://doi.org/10.14722/ndss.2024.24188

#jailbreak #large-language-models

25 views

Comments

Join the discussion

No comments yet. Be the first to comment.

TIL

Part 8 of 8

Start from the beginning

TIL: 3 general approaches language models use

How one word relates to another and in which context it can be used. If this can be calculated precisely, then would that be the same as knowing exactly what to say and when? How can a sequence of words be generated? The 3 general approaches language...

More from this blog

TIL: Google Cloud Storage and Serverless Options

5 Core Storage Options Cloud Storage Object Storage Object (e.g. video, pictures, audio recordings) + Metadata Cloud SQL Fully managed relational databases MySQL, PostgreSQL, SQL Server Cloud Spanner Scales horizontally SQL support for o...

Dec 18, 20231 min read8

TIL: How to remove X-Powered-By: ASP.NET Header

Go to Server Manager -> IIS Right click on the Server Select Internet Information Services (IIS) Manager On Features View tab, go to IIS -> HTTP Response Headers Right click on X-Powered-By -> Remove Click Yes to confirm. Reload ...

Nov 28, 20231 min read60

TIL: How to remove X-Powered-By: ASP.NET Header

TIL: How to remove IIS server response header

Go to Server Manager -> IIS Right click on the Server Select Internet Information Services (IIS) Manager On Features View tab, go to Management -> Configuration Editor On the drop down menu, select the section: system.webServer -> security...

Nov 27, 20231 min read10

TIL: How to remove IIS server response header

Google Preemptible Virtual Machine Instances

To preempt means to stop. Preemptible virtual machines are available at a lower price but might be preempted (stopped) at any time. This instance always stops after running for 24 hours (except for Spot VMs that do not have maximum runtime unless def...

Nov 20, 20231 min read10

Google Preemptible Virtual Machine Instances

iamsywid

13 posts

🌱 learning, unlearning, relearning

Command Palette

Reference

Comments

TIL

TIL: 3 general approaches language models use

More from this blog