For GPT, every input and output is a string of tokens, which could be words but also punctuation marks, parts of words, or more—there are about 100,000 tokens in total.
At its core, GPT is constantly generating a probability distribution over the next token to generate, conditional on the string of previous tokens.
After the neural net generates the distribution, the OpenAI server then actually samples a token according to that distribution—or some modified version of the distribution, depending on a parameter called ‘temperature.’
As long as the temperature is nonzero, though, there will usually be some randomness in the choice of the next token: you could run over and over with the same prompt, and get a different completion (i.e., string of output tokens) each time.
So then to watermark, instead of selecting the next token randomly, the idea will be to select it pseudorandomly, using a cryptographic pseudorandom function, whose key is known only to OpenAI.”
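A minimal sketch of the idea in Python, under stated assumptions. The neural net is replaced by a hypothetical `toy_logits` function over a 32-token vocabulary. Temperature is assumed to divide the raw scores before a softmax, with 0 meaning "always take the top token" (a common convention, not something the quote specifies). HMAC-SHA256 stands in for the cryptographic pseudorandom function, which here sees the last `WINDOW` tokens (a hypothetical value) plus the candidate token. The selection rule (pick the token maximizing r_i^(1/p_i)) and the detection score (mean of ln(1/(1 - r))) are one known construction with the right properties, not a description of OpenAI's actual implementation.

```python
import hashlib
import hmac
import math
import random

WINDOW = 4  # hypothetical: number of preceding tokens fed to the pseudorandom function


def toy_logits(context: tuple) -> list:
    """Hypothetical stand-in for the neural net: context-dependent scores for 32 tokens."""
    digest = hashlib.sha256(repr(context).encode()).digest()  # 32 bytes -> 32 scores
    return [b / 64 for b in digest]


def softmax_with_temperature(logits: list, temperature: float) -> list:
    """Next-token distribution (assumed convention); temperature 0 puts all mass on the top token."""
    if temperature == 0:
        top = max(range(len(logits)), key=lambda i: logits[i])
        return [1.0 if i == top else 0.0 for i in range(len(logits))]
    scaled = [x / temperature for x in logits]
    m = max(scaled)  # subtract the max for numerical stability
    exps = [math.exp(x - m) for x in scaled]
    total = sum(exps)
    return [e / total for e in exps]


def prf_uniform(key: bytes, context: tuple, candidate: int) -> float:
    """Keyed pseudorandom number in (0, 1); HMAC-SHA256 is an assumed stand-in for OpenAI's PRF."""
    msg = ",".join(map(str, context + (candidate,))).encode()
    digest = hmac.new(key, msg, hashlib.sha256).digest()
    n = int.from_bytes(digest[:8], "big") >> 12  # keep 52 bits so the float below is exact
    return (n + 0.5) / 2**52


def watermarked_choice(key: bytes, context: tuple, probs: list) -> int:
    """Pick the token i that maximizes r_i ** (1 / p_i) (one known rule, assumed here).

    If the r_i were truly uniform, token i would win with probability p_i, so the
    output distribution is unchanged; yet the choice is fixed by the key and context.
    """
    best, best_score = -1, -math.inf
    for i, p in enumerate(probs):
        if p <= 0:
            continue
        score = math.log(prf_uniform(key, context, i)) / p  # log of r_i ** (1 / p_i)
        if score > best_score:
            best, best_score = i, score
    return best


def generate(prompt, n_tokens, temperature=1.0, key=None, rng=None):
    """Extend the prompt: pseudorandom (watermarked) choice if key is set, else random via rng."""
    tokens = list(prompt)
    for _ in range(n_tokens):
        context = tuple(tokens[-WINDOW:])
        probs = softmax_with_temperature(toy_logits(context), temperature)
        if key is not None:
            tokens.append(watermarked_choice(key, context, probs))
        else:
            tokens.append(rng.choices(range(len(probs)), weights=probs, k=1)[0])
    return tokens


def detection_score(key: bytes, tokens: list) -> float:
    """Mean of ln(1 / (1 - r)) over tokens after the first WINDOW; about 1 without the key."""
    terms = []
    for t in range(WINDOW, len(tokens)):
        r = prf_uniform(key, tuple(tokens[t - WINDOW:t]), tokens[t])
        terms.append(-math.log(1.0 - r))
    return sum(terms) / len(terms)


secret = b"hypothetical-secret-key"
prompt = [1, 2, 3, 4]
marked = generate(prompt, 200, key=secret)
plain = generate(prompt, 200, rng=random.Random(0))
print(marked == generate(prompt, 200, key=secret))  # True: same key and prompt, same text
print(detection_score(secret, marked))  # expected to be well above 1
print(detection_score(secret, plain))   # expected to be close to 1
```

The point of the r_i^(1/p_i) rule is that, to anyone without the key, each r_i looks uniform, so the winning token still follows the model's temperature-adjusted distribution; only the key holder can recompute the r values and notice that the chosen tokens have suspiciously large ones. At temperature 0 there is no randomness to replace, so in this sketch the watermark has nothing to act on.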
How you can fool ChatGPT:
“Now, this can all be defeated with enough effort.
For example, if you used another AI to paraphrase GPT’s output—well okay, we’re not going to be able to detect that.”
It seems that the watermarking can be defeated, at least as of November, when the above statements were made.
Source: Scott Aaronson’s blog post.