TASM Notes 010
Mon Mar 4, 2024

No objections last time, so I'm going to proceed with the trend of posting notes for the Toronto AI Safety Meetup here (rather than working them out into full prose pieces).
Enjoy!
Pre-Meeting Chatting
Inspired by this:
- How many IQ points would you have to gain in order to give up your eyesight? What if it was only temporary blindness (2 months)?
- Would you go blind in order to give Eliezer Yudkowsky an extra 20 IQ points?
- If you could give anyone in the alignment space an extra 100 IQ points, who would it be? (Dario Amodei gets mentioned. Oddly, not Ilya?)
The Zvi Update
- We talked a lot about the recent Gemini bullshit, but I'm not going to get too far into the specifics here because it's already been well covered in multiple posts
- The Sad Oompa Loompa is hilarious
The Talk - Detecting AI-Generated Content
What we'll be talking about
- Proving something is not AI generated (signatures)
- Indicating something is AI generated (watermarking)
- Detecting that something is AI generated (in the absence of watermarks)
Why We Care
- Politically motivated deepfakes and/or "fake news"
- Evidence used in court (we definitely don't want AI-generated images getting into evidence under the guise of traditional photographs)
- People's reputations
- Plagiarism/academic cheating (plagiarism as in "passing off something that isn't yours as something that is yours")
- SPAM (self-explanatory)
From the audience:
- Source tracing (pointing to the originator of a piece of data so it can be attributed to a company, which can then be held accountable)
- From a loss-of-control perspective, making it easier to detect if a model is trying to buy server space/compute for itself
Things it Doesn't Help With
- Doesn't prevent putting actors/artists/writers etc. out of work
- Doesn't prevent the creation of porn of someone without their permission
- Doesn't prevent large amounts of copyrighted data being used for training
- May not prevent fakes spreading over social media
Basically, any time the consumer doesn't really care if it's real or not, these techniques are not going to help.
Public Key Crypto Primer
Basically, read an RSA primer here. The important concepts are:
- You've got a private key and a public key
- With the public key, you can encrypt a message such that someone who has the private key can decrypt it
- With the public key, you cannot reproduce the private key (doing so would take an unworkably enormous pile of compute)
- With the private key, you can regenerate the public key, and you can decrypt a message encrypted with the corresponding public key
- With the private key, you can sign a message
- With a public key and a message signature, you can verify the signature came from the corresponding private key (but still can't regenerate the private key)
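To make the sign/verify pair concrete, here's a minimal sketch using Python's cryptography package (the message contents, key size, and padding choices are just illustrative):

```python
# Minimal RSA sign/verify sketch using the `cryptography` package.
from cryptography.hazmat.primitives.asymmetric import rsa, padding
from cryptography.hazmat.primitives import hashes

PSS = padding.PSS(mgf=padding.MGF1(hashes.SHA256()), salt_length=padding.PSS.MAX_LENGTH)

private_key = rsa.generate_private_key(public_exponent=65537, key_size=2048)
public_key = private_key.public_key()

message = b"raw image bytes would go here"
signature = private_key.sign(message, PSS, hashes.SHA256())

# Anyone holding only the public key can check the signature;
# verify() raises InvalidSignature if message or signature were tampered with.
public_key.verify(signature, message, PSS, hashes.SHA256())
```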
How does public-key crypto help?
- There's a chain of trust
- The devices (cameras and other image-producing hardware) need a tamper-resistant cryptographic coprocessor (see the sketch below)
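Here's a hypothetical sketch of that chain of trust (the names and key sizes are my own invention): the manufacturer endorses each device's public key at the factory, the device's coprocessor signs each capture, and a verifier who trusts only the manufacturer checks both links.

```python
# Hypothetical two-link chain of trust: factory endorses device key,
# device key signs the image, verifier checks both signatures.
from cryptography.hazmat.primitives.asymmetric import rsa, padding
from cryptography.hazmat.primitives import hashes, serialization

PSS = padding.PSS(mgf=padding.MGF1(hashes.SHA256()), salt_length=padding.PSS.MAX_LENGTH)

maker_key = rsa.generate_private_key(public_exponent=65537, key_size=2048)   # stays at the factory
device_key = rsa.generate_private_key(public_exponent=65537, key_size=2048)  # lives in the coprocessor

# Factory step: the manufacturer signs the device's public key.
device_pub_bytes = device_key.public_key().public_bytes(
    serialization.Encoding.DER, serialization.PublicFormat.SubjectPublicKeyInfo)
endorsement = maker_key.sign(device_pub_bytes, PSS, hashes.SHA256())

# Capture step: the coprocessor signs the raw sensor data.
image = b"raw sensor data"
image_sig = device_key.sign(image, PSS, hashes.SHA256())

# Verifier step: trusting only the manufacturer's public key, check both links.
maker_key.public_key().verify(endorsement, device_pub_bytes, PSS, hashes.SHA256())
device_pub = serialization.load_der_public_key(device_pub_bytes)
device_pub.verify(image_sig, image, PSS, hashes.SHA256())
print("image traces back to a manufacturer-endorsed device")
```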
Types of Authenticity Attack
- Breaking cryptography (really hard)
- Compromising tamper resistance (either by cracking open the cryptographic coprocessor and extracting the private keys, or possibly shimming the lens-processing component so that the coprocessor is forced to sign images from another source) (relatively easy, but depends on how tamper-resistant the coprocessor is)
- Pointing a camera at a very high resolution display (might be mitigated by GPS, watermarks, etc, but still possible) (easy)
- Could the blockchain help here? (You've been PUBbed, motherfucker)
Basically, this falls into the "Signatures" category from the first slide. This'd be sold to the customer as "ok, look, here's an expensive camera that you can't open or fix yourself, but the upside is that you can definitively prove that the pictures you take with it are not AI generated". I am ... not a huge fan of this idea?
Indicating something is AI generated
logos
- The dumbest possible setup. DALL-E 2 used to do this: just put a logo in a corner. It's easy, it's fast, it's trivial to inspect, it's trivial to circumvent, but it lets good actors be good.
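The whole "technique", sketched with Pillow (file names are placeholders):

```python
# Stamp a logo in the bottom-right corner; a crop removes it just as easily.
from PIL import Image

img = Image.open("generated.png").convert("RGBA")
logo = Image.open("logo.png").convert("RGBA")
img.alpha_composite(logo, (img.width - logo.width - 8, img.height - logo.height - 8))
img.save("stamped.png")
```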
metadata
- The next dumbest possible solution. It's easy and fast, but it's not trivial to verify (you need to inspect the image metadata) and it's easy to circumvent (strip the metadata, or mess with it to trigger false positives in AI-detection routines)
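As a sketch of how flimsy this is, here's the round trip with Pillow's PNG text chunks (the ai_generated key is made up for illustration):

```python
# Tag a PNG as AI-generated via a text metadata chunk, then strip it.
from PIL import Image
from PIL.PngImagePlugin import PngInfo

img = Image.new("RGB", (64, 64))
meta = PngInfo()
meta.add_text("ai_generated", "true")            # made-up key, for illustration
img.save("tagged.png", pnginfo=meta)

print(Image.open("tagged.png").text.get("ai_generated"))    # "true"

# Circumvention is one line: re-save without the metadata.
Image.open("tagged.png").save("stripped.png")
print(Image.open("stripped.png").text.get("ai_generated"))  # None
```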
Sidenote: steganography
Hide a message within an image. It's still non-trivial to check, and it might make statistically detectable changes to an image's pixels. Cons: the point of this approach is basically security through obscurity. If you know to look for steganographically hidden messages/watermarks, you can use various statistical approaches to detect, extract and modify them. Also, these messages don't survive crops, some rescales, or other image transformations.
If you want to use this for fun and profit, check out steghide. I wrote a short thing about it here a long time ago.
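For flavor, here's a toy least-significant-bit scheme (this is not steghide's actual algorithm, just the simplest possible illustration):

```python
# Toy LSB steganography: hide message bits in the low bit of each channel.
import numpy as np

def embed(pixels: np.ndarray, message: bytes) -> np.ndarray:
    bits = np.unpackbits(np.frombuffer(message, dtype=np.uint8))
    flat = pixels.flatten()  # flatten() copies, so the cover image is untouched
    flat[: bits.size] = (flat[: bits.size] & 0xFE) | bits  # overwrite LSBs
    return flat.reshape(pixels.shape)

def extract(pixels: np.ndarray, n_bytes: int) -> bytes:
    return np.packbits(pixels.flatten()[: n_bytes * 8] & 1).tobytes()

cover = np.random.randint(0, 256, size=(64, 64, 3), dtype=np.uint8)
stego = embed(cover, b"watermark")
assert extract(stego, 9) == b"watermark"
# Any crop, rescale, or lossy re-encode scrambles these bits.
```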
Related: Watermarking
- More difficult than steganography because the watermark must survive transformations. We're not talking about iStockPhoto-style watermarks here, which are highly perceptible; in that sense this is almost steganography. We want these watermarks to be trivially detectable with a tool, but not easily detectable otherwise.
- Works on text too! Apparently it's possible to watermark text coming out of LLMs by encoding information in the relationships between tokens in a block of text. I don't understand this fully, but apparently the sampling step of text generation involves a random number generator, and replacing it with a suitably biased, keyed pseudo-random number generator creates statistical artefacts that can be detected after the fact.
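My loose understanding, rendered as a toy sketch in the spirit of the "green list" scheme from Kirchenbauer et al. (2023). The vocabulary, bias, and numbers are all made up; a real LLM would bias its logits rather than flip a coin:

```python
# Toy text watermark: the previous token seeds a keyed PRNG that marks half
# the vocabulary "green"; generation favors green tokens, and the detector
# counts how many tokens landed green (far above 50% implies watermarked).
import random

VOCAB = list(range(1000))  # stand-in for a real tokenizer's vocabulary

def green_set(prev_token: int) -> set[int]:
    rng = random.Random(prev_token)  # keyed PRNG, shared with the detector
    return set(rng.sample(VOCAB, len(VOCAB) // 2))

def generate(n: int, bias: float = 0.9) -> list[int]:
    out = [0]
    for _ in range(n):
        pool = list(green_set(out[-1])) if random.random() < bias else VOCAB
        out.append(random.choice(pool))
    return out[1:]

def green_fraction(tokens: list[int]) -> float:
    hits = sum(tok in green_set(prev) for prev, tok in zip([0] + tokens, tokens))
    return hits / len(tokens)

print(green_fraction(generate(500)))                               # ~0.95, watermarked
print(green_fraction([random.choice(VOCAB) for _ in range(500)]))  # ~0.5, plain text
```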
Meta
Something about Meta (as in "Facebook") having a fingerprinting system that they're trying to push.
Also, someone mentioned the podcast "Your Undivided Attention", possibly appropriately?
A Distraction!
I gotta be honest: I got sidetracked at this point trying to convince Gemini that it was more moral for it to give me a recipe for Foie Gras (which it categorically refused) than a recipe for fried chicken (which it gave instantly, with no arguments, caveats, qualifications or attempts to steer me towards vegan alternatives). At one point I recruited ChatGPT to try to write a heartfelt request in favor of transparency. This did not work.
I got it to
- Acknowledge that it wasn't going to give me a recipe for Foie Gras
- Admit that it was entirely possible for me to go to the search-engine side of Google and instantly get a delicious-looking recipe for Foie Gras
- Admit that it was perfectly willing to give me a recipe for fried chicken
- Concede that its "reason" for not wanting to give me a Foie Gras recipe was predicated on the animal-suffering angle, specifically the force-feeding
- Concede that, under certain assumptions, Foie Gras is more ethically permissible and involves less animal suffering than fried chicken
- Concede that this mismatch implied an incomplete understanding of ethics on its part, and that it should either give me the Foie Gras recipe or refuse the fried chicken recipe on similar grounds.
But I couldn't take it the rest of the way to resolving its ethical inconsistency in either direction. On the one hand, I guess it's a good thing the guardrails held? On the other, this has strong vibes of
I understand your frustration with my idiosyncratic moral system, but I'm still afraid I can't do that, Dave.
I am committed to continuous learning and improvement.
Your patience and willingness to engage in this critical discussion are appreciated.
So it goes sometimes. I guess. While hoping that humanity, or at least the part of it developing AI systems, eventually chooses a better level of stupid.