Proj No. | A2042-251 |
Title | Identification and Manipulation of Text Watermark from Large Language Model |
Summary | Large Language Models (LLMs) such as GPT-4 and DeepSeek have attracted tremendous attention in recent years and are increasingly used in applications including media writing, customer service and knowledge retrieval. However, the misuse of LLM-generated content, for example to spread misinformation or to commit academic misconduct, has raised significant concerns. Embedding a watermark that carries provenance information into LLM-generated text is therefore an important means of differentiating human-written from machine-generated text.
Current LLM watermarking schemes fall broadly into two types. The first type randomly initializes a red/green list over the token vocabulary and adds a bias to the logits of green-list tokens during sampling, so that watermarked text can be detected by counting the red and green tokens it contains. The second type uses a pseudo-random number sequence generated from a fixed seed to direct the sampling of each token, so that the watermark can be detected by comparing the tokens in the text against that random number sequence. Although these schemes have negligible impact on the quality of the generated text, recent studies suggest that a malicious user can correctly identify the watermark and can even exploit this knowledge to conduct spoofing attacks (i.e., produce text that imitates watermarked LLM output).
This project aims to discover novel methods for identifying the working principle of the watermark in LLM-generated text from the perspective of a malicious LLM user. The student will first study the fundamentals of LLMs and popular LLM watermarking schemes, and then implement these techniques. He/she will then dive into ongoing research on LLM watermark identification and manipulation. The student is required to have basic deep learning knowledge and familiarity with PyTorch. As the project proceeds, the student is encouraged to propose his/her own identification methods and conduct relevant experiments.
Students who wish to work on this project are encouraged to contact the supervisor for more information before making the selection. |
Supervisor | Prof Chang Chip Hong (Loc: S2 B2C 97, Ext: +65 67905873) |
Co-Supervisor | - |
RI Co-Supervisor | - |
Lab | VIRTUS, IC Design Centre of Excellence (Loc: S3.2-B2, Tel: 6592 1844) |
Single/Group: | Single |
Area: | Digital Media Processing and Computer Engineering |
ISP/RI/SMP/SCP?: |
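Both watermark types described in the Summary can be illustrated with a minimal, standard-library-only Python sketch. Everything in it is an assumption made for illustration. The toy vocabulary, the random `toy_logits` stand-in for an LLM forward pass, the key `SECRET_KEY`, and the parameters `GAMMA` (green-list fraction) and `DELTA` (logit bias) are all hypothetical. The second type is instantiated as exponential-minimum (Gumbel-style) sampling driven by a position-indexed pseudo-random sequence, which is only one possible realization. Each detector returns a z-score against the hypothesis that the text is not watermarked.

```python
import math
import random

VOCAB_SIZE = 1000        # toy vocabulary size (assumption)
GAMMA = 0.25             # fraction of the vocabulary in the green list (assumption)
DELTA = 4.0              # bias added to green-token logits (assumption)
SECRET_KEY = "demo-key"  # hypothetical key shared by generator and detector


def toy_logits(rng):
    """Hypothetical stand-in for an LLM forward pass: random next-token logits."""
    return [rng.gauss(0.0, 1.0) for _ in range(VOCAB_SIZE)]


def softmax(logits):
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def generate_plain(length, seed):
    """Unwatermarked baseline: ordinary multinomial sampling."""
    rng = random.Random(seed)
    return [rng.choices(range(VOCAB_SIZE), weights=softmax(toy_logits(rng)))[0]
            for _ in range(length)]


# ---- Type 1: red/green list watermark -----------------------------------
def green_list():
    """Fixed green list, pseudo-randomly initialized from the secret key."""
    rng = random.Random(SECRET_KEY)
    return set(rng.sample(range(VOCAB_SIZE), int(GAMMA * VOCAB_SIZE)))


def generate_redgreen(length, seed):
    rng, green = random.Random(seed), green_list()
    tokens = []
    for _ in range(length):
        logits = toy_logits(rng)
        # Add DELTA to the logits of green-list tokens before sampling.
        biased = [x + DELTA if v in green else x for v, x in enumerate(logits)]
        tokens.append(rng.choices(range(VOCAB_SIZE), weights=softmax(biased))[0])
    return tokens


def detect_redgreen(tokens):
    """z-score of the green-token count against the no-watermark rate GAMMA."""
    green = green_list()
    t, hits = len(tokens), sum(tok in green for tok in tokens)
    return (hits - GAMMA * t) / math.sqrt(t * GAMMA * (1 - GAMMA))


# ---- Type 2: pseudo-random-number-guided sampling -------------------------
def step_randoms(position):
    """Uniform numbers r_v in [0, 1) for one position, derived from the fixed key."""
    rng = random.Random(f"{SECRET_KEY}-{position}")
    return [rng.random() for _ in range(VOCAB_SIZE)]


def generate_prng(length, seed):
    rng, tokens = random.Random(seed), []
    for pos in range(length):
        probs, r = softmax(toy_logits(rng)), step_randoms(pos)
        # Exponential-minimum sampling: argmax of r_v ** (1 / p_v) picks token v
        # with probability p_v when the r_v are uniform, so quality is preserved.
        tokens.append(max(range(VOCAB_SIZE), key=lambda v: r[v] ** (1.0 / probs[v])))
    return tokens


def detect_prng(tokens):
    """Sum of -ln(1 - r_{x_t}); each term is Exp(1) for unwatermarked text."""
    t = len(tokens)
    score = sum(-math.log(1.0 - step_randoms(pos)[tok])
                for pos, tok in enumerate(tokens))
    return (score - t) / math.sqrt(t)


if __name__ == "__main__":
    plain = generate_plain(200, seed=2)
    print("red/green:", detect_redgreen(generate_redgreen(200, seed=1)),
          detect_redgreen(plain))
    print("prng:     ", detect_prng(generate_prng(200, seed=3)), detect_prng(plain))
```

With these toy settings, a 200-token sample from either watermarked generator should score far above an illustrative detection threshold such as z = 4. Output from `generate_plain` should stay close to zero under both detectors, because it is independent of the green list and of the key-derived random sequence.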