Deepfakes and the Art of the Possible
The C2PA standard is deeply flawed, but it may be fixable
Quick Hits
For DC-area readers: AI Bloomers Round Four takes place at Union Pub on Capitol Hill (I promise this time it won’t be booked—sorry about that) next Wednesday, June 5 at 6:00 PM. Please RSVP.
I was on an episode of the Foundation for American Innovation's podcast The Dynamist with Brian Chau, discussing AI regulation and SB 1047.
So much interesting research in the past week, but if you read only one thing, undoubtedly it should be Anthropic’s Scaling Monosemanticity paper—a major breakthrough in understanding the inner workings of LLMs, and delightfully written at that. I did not expect research like this to materialize so soon on a frontier LLM (Anthropic’s paper is about Claude 3 Sonnet, the mid-sized model in their Claude family), so this is a positive update in that regard. I may do a piece dedicated to this paper next month, so I’ll leave further thoughts for that and simply recommend that you read it.
Researchers at the Chinese AI company DeepSeek have demonstrated an exotic method to generate synthetic data (data made by AI models that can then be used to train AI models). Basically, the researchers scraped a bunch of natural language high school and undergraduate math problems (with answers) from the internet. Then, they trained a language model (DeepSeek-Prover) to translate this natural language math into a formal mathematical programming language called Lean 4 (they also used the same language model to grade its own attempts to formalize the math, filtering out the ones that the model assessed were bad). Next, the same model was used to generate proofs of the formalized math statements. The model was repeatedly fine-tuned with these proofs (after humans verified them) until it reached the point where it could prove 5 (of 148, admittedly) International Math Olympiad problems. This is known as a “synthetic data pipeline.” Every major AI lab is doing things like this, in great diversity and at massive scale. (A toy sketch of this kind of loop appears below.)
This model and its synthetic dataset will, according to the authors, be open sourced. This should remind you that open source is indeed a two-way street; it is true that Chinese firms use US open-source models for their research, but it is also true that Chinese researchers and firms often open source their models, to the benefit of researchers in America and everywhere.
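If you are curious what a pipeline like this looks like mechanically, here is a toy sketch in Python. Everything in it is a hypothetical placeholder rather than DeepSeek's actual code, and the real system is considerably more involved:

```python
# Toy sketch of an iterative synthetic-data pipeline of the kind described
# above. The `model` and `verifier` interfaces are hypothetical placeholders.

def synthetic_data_round(model, verifier, informal_problems, verified_examples):
    """One round of the loop: formalize, self-filter, prove, verify, fine-tune.

    `model` is assumed to expose translate_to_lean, judge_formalization,
    attempt_proof, and finetune; `verifier` is an external proof checker
    (for instance, a wrapper around the Lean toolchain)."""
    for problem in informal_problems:
        statement = model.translate_to_lean(problem)           # 1. formalize
        if not model.judge_formalization(problem, statement):  # 2. self-filter
            continue
        proof = model.attempt_proof(statement)                 # 3. prove
        if verifier(statement, proof):                         # 4. verify
            verified_examples.append((statement, proof))
    model.finetune(verified_examples)                          # 5. fine-tune
    return model
```

Each round grows the pool of verified statement-proof pairs, and the fine-tuned model then gets another pass at the problems it could not crack before.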
The Main Idea
Deepfakes, whether photo, video, or audio, are likely the most tangible AI risk to the average person and policymaker alike. Several states have already passed laws to regulate or restrict AI deepfakes in one way or another, and more are likely to do so soon. As with a lot of tech policy recently, these laws tend to be laissez-faire on the details. They do not prescribe how deepfakes are to be policed; they simply mandate that sexually explicit deepfakes, deepfakes intended to influence elections, and the like are illegal. How we determine what is a deepfake and what is not, however, is generally not specified.
So how will we do this?
When generative AI first took off in 2022, many commentators and policymakers had an understandable reaction: we need to label AI-generated content. In the long term, however, this is unlikely to be enough: Even if every mainstream generative AI platform includes watermarks, other models that do not place watermarks on content will exist. Moreover, AI-generated content will be trivial and cheap to generate, so it will proliferate wildly.
What we need, then, is a way to validate human-generated content, because it will ultimately be the scarcer good. More specifically, we need the capability to prove that a piece of content (I’ll concentrate on photo and video for now; audio is more complicated) was taken by a physical camera in the real world. Ideally, we’d also be able to determine whether that content was edited in any way (whether with AI or not).
This is not a silver bullet solution. With this capability, AI-generated photos and videos would still proliferate—we would just be able to tell the difference, at least most of the time, between AI-generated and authentic media. Anything that could not be proactively verified as real would, over time, be assumed to be AI-generated.
This may be framed as a policy problem, but the solution is ultimately technical, and thus unlikely to emerge purely from government. There is a standards body aiming to do just this, called the Coalition for Content Provenance and Authenticity (C2PA). Their technical standard, which goes by the same name, seems to be gaining momentum. Unfortunately, it has some major flaws. In its current form, it’s not obvious to me that C2PA would do much of anything to improve our ability to validate content online. It seems designed with a series of well-intentioned actors in mind: the freelance photojournalist using the right cameras and the right editing software, providing photos to a prestigious newspaper that will make the effort to show C2PA metadata in its reporting. It is far less clear, however, that C2PA can remain robust when less well-intentioned or downright adversarial actors enter the fray.
Still, both industry and policymakers seem to be converging on this standard, so I’d like to propose some ways that this existing standard might be improved rather than suggest a de novo standard.
First, let’s review what C2PA aims to do:
Create a cryptographically signed (and hence verifiable and unique) paper trail associated with a given photo or video that documents its origins, creators, alterations (edits), and authenticity.
Allow that paper trail to be selectively disclosed, but not edited, by the content creator. In other words, a photographer could publish a photo online that includes the authenticity data (“this photo was taken by an actual camera”) and the trail of edits made to the photo, but does not include their name or other personally identifiable information.
Allow consumers (on social media, in courts of law, in newsrooms, etc.) to easily examine the paper trail (to the extent allowed by the original creator, as described above).
To do this, C2PA stores the authenticity and provenance information in what it calls a “manifest,” which is specific to each file. The manifest also bears a cryptographic signature that is unique to each photo. It can be updated as the file is edited—which in theory could include everything from adjusting a photo’s white balance to adding someone into a video using AI.
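To make the mechanics a bit more concrete, here is a heavily simplified sketch of the underlying idea: a set of claims about a photo, hashed together with the image and signed by a key held on the capture device. This is an illustration of the concept only; the actual C2PA manifest uses its own container format and certificate-based signing, and nothing below is drawn from the spec.

```python
# Conceptual sketch only: a toy "manifest" bound to an image and signed
# with a device-held key. The real C2PA specification defines its own
# container, claim format, and certificate-based signing.
import json
import hashlib
from cryptography.exceptions import InvalidSignature
from cryptography.hazmat.primitives.asymmetric import ed25519

device_key = ed25519.Ed25519PrivateKey.generate()  # would live in secure hardware

def sign_manifest(image_bytes: bytes, claims: dict) -> dict:
    """Bind provenance claims to the exact pixels via a digital signature."""
    manifest = {
        "image_sha256": hashlib.sha256(image_bytes).hexdigest(),
        "claims": claims,  # e.g. capture time, device model, edit history
    }
    payload = json.dumps(manifest, sort_keys=True).encode()
    manifest["signature"] = device_key.sign(payload).hex()
    return manifest

def verify_manifest(image_bytes: bytes, manifest: dict, public_key) -> bool:
    """Return True only if neither the pixels nor the claims were altered."""
    unsigned = {k: v for k, v in manifest.items() if k != "signature"}
    if unsigned["image_sha256"] != hashlib.sha256(image_bytes).hexdigest():
        return False
    payload = json.dumps(unsigned, sort_keys=True).encode()
    try:
        public_key.verify(bytes.fromhex(manifest["signature"]), payload)
        return True
    except InvalidSignature:
        return False
```

Roughly speaking, an editing application would then append its own signed claims describing each change rather than re-signing the original, but the core idea is the same: the claims are cryptographically bound to the content, so neither can be altered without detection.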
Even setting aside C2PA’s technical flaws, a lot has to happen to achieve this capability. Smartphones and other cameras would need to be updated so that they can automatically sign the photos and videos they capture. Media editing software, such as Adobe Photoshop, would need to be updated to be able to cleanly add data about its edits to a file’s manifest. Social media networks and other media viewing software would need to build new user interfaces to give consumers visibility into all this new information.
That’s a steep uphill climb. Still, there is a strong social, economic, and legal incentive to get this right—and the technology industry has gotten much better over the years at technical transitions of this kind. This investment will be of little use, though, if the C2PA standard does not prove robust. With that in mind, let’s take a look at the main problems with C2PA.
Neal Krawetz of Hacker Factor has done outstanding and devastating deep dives into the problems he’s found with C2PA, and I recommend that those interested in a technical exploration consult his work. Some of the flaws he points out include:
Metadata can be easily removed by online services and applications, eliminating the provenance information.
The standard does not require tracking the complete history of alterations and sources, leaving gaps in provenance.
Previous metadata may not be verifiable after subsequent edits, obscuring the full editing history.
Metadata can be intentionally forged using open-source tools to reassign ownership, make AI-generated images appear real, or hide alterations.
Krawetz exploits these and other flaws to create an AI-generated image that C2PA presents as a “verified” real-world photo. That this is possible should cause policymakers to question whether C2PA in its current form is capable of doing the job it was intended to do.
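The first of these flaws is easy to see for yourself. Because the manifest typically travels as metadata inside the file, any service that re-encodes an upload (as many social and messaging platforms do) can silently discard it. A hypothetical illustration using Pillow, which by default does not copy embedded metadata into the files it writes (the file names are made up):

```python
# Illustration of how easily embedded provenance metadata can be lost.
# Pillow stands in here for any service that re-encodes uploaded images;
# by default it does not carry embedded metadata (EXIF, embedded manifests)
# over into the file it writes.
from PIL import Image

def reencode(in_path: str, out_path: str) -> None:
    """Re-save the image; embedded metadata does not survive the round trip
    unless the caller explicitly copies it over."""
    Image.open(in_path).save(out_path, format="JPEG", quality=90)

reencode("signed_photo.jpg", "photo_without_provenance.jpg")
```

Once the manifest is gone, the image looks like any other unsigned photo; nothing marks it as ever having carried provenance data.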
At the heart of these concerns is a fundamental flaw that is all too common in technical standards: trying to do too many things at once. C2PA has the goal of validating media authenticity and provenance while also preserving the privacy of the original creators. It aims to be backwards compatible with existing cameras and media editing workflows while also working on future cameras with dedicated hardware to assign the cryptographic metadata. Unfortunately, trying to do all these things at once has resulted in a standard that cannot do any of them well.
What principles should guide us in the creation of something better? I can think of a few:
There is an inherent tradeoff between control and verifiability. If we want certain aspects of a photo’s origin or provenance to be verifiable, that means they must be immutable. No one, including the person who took the photo, can change this information without invalidating the photo’s cryptographic signature.
There is also a tradeoff, though a less stark one, between privacy and verifiability. An ideal standard might allow a person to remove some data from a photo without invalidating its cryptographic signature. For example, they could strip out their name or even their location; the file would simply not contain that information, rather than contain modified information. I could, in other words, choose not to include the location at which a photo was taken, but I could not modify the metadata to suggest that the photo was taken somewhere else. Some things, however, would likely need to remain attached to the file regardless of the original creator’s preferences; beyond the cryptographic signature itself, the most obvious item in this category is the editing history. If a standard cannot reliably demonstrate whether an image was edited (to say nothing of how it was edited), it is not useful. (One way to structure this kind of selective disclosure is sketched after this list.)
C2PA and other standards for content validation should be stress-tested in the settings where this capability matters most, such as courts of law. While it is tempting to try to solve this problem across all of social media and journalism, that is a diffuse challenge. Settings such as courts, on the other hand, are discrete, particular, and universally understood as important to get right. If a standard aims to ensure (imperfectly) that content validation is “solved” across the entire internet, but simultaneously makes it easier to create authentic-looking photos that could trick juries and judges, it is likely not solving very much at all. This is the situation C2PA finds itself in currently.
In the long term, any useful cryptographic signing probably needs to be done at the hardware level—the camera or smartphone used to record the media. This means getting a wide consortium of players, from Ring and other home security camera companies to smartphone makers like Apple and Samsung to dedicated camera makers such as Nikon and Leica, onboard. That, in turn, means designing a standard that is platform-agnostic and optimized for efficiency.
Social media user interfaces will have to be adapted to make this information accessible—though it need not be thrown in the user’s face. We do not want, nor do we need, a repeat of the GDPR’s excessive cookie banners that pervade most websites today. Yet ensuring that the information is preserved and available will be essential.
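On the privacy principle above: one way (among several) to let a creator withhold a field without breaking the signature is to sign salted hashes of each metadata field rather than the raw values. The sketch below illustrates that idea in toy form; it is not drawn from the C2PA specification, and every name in it is hypothetical.

```python
# Sketch of selective disclosure: the signature covers salted hashes
# ("commitments") of each field, so the creator can later withhold any
# field without invalidating the signature. This illustrates the principle
# only; it is not how C2PA handles redaction.
import os
import json
import hashlib
from cryptography.hazmat.primitives.asymmetric import ed25519

def commit_fields(fields: dict) -> tuple[dict, dict]:
    """Return (commitments, openings). Commitments get signed and published;
    openings stay with the creator, to be revealed field by field."""
    commitments, openings = {}, {}
    for name, value in fields.items():
        salt = os.urandom(16).hex()
        commitments[name] = hashlib.sha256((salt + str(value)).encode()).hexdigest()
        openings[name] = (salt, value)
    return commitments, openings

device_key = ed25519.Ed25519PrivateKey.generate()
commitments, openings = commit_fields({
    "device": "hypothetical-camera-model",
    "location": "40.71, -74.00",
    "creator": "Jane Doe",
})
signature = device_key.sign(json.dumps(commitments, sort_keys=True).encode())

# The creator publishes the signed commitments plus only the openings they
# choose (say, "device" but not "creator" or "location"). A verifier can
# check any revealed (salt, value) pair against its committed hash, but no
# one can swap in a different value without breaking the signature.
```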
I am hopeful that industry groups, perhaps working with C2PA as a base, can make something like this work. It may be that a new standard is needed, either as a complement to C2PA or as a replacement for it. Smartphone makers—and Apple in particular—seem to me to be in a strong position here. Apple makes the single most popular camera in the world; if it creates a standard for this and makes it open for others to use, the standard may gain momentum quickly.
It is not clear that government has the capacity to mandate content validation without a robust standard in place, and it is far from clear that government has the capacity to make such a standard on its own. Policymakers would therefore be wise to let this industry-led standards-setting process play out for a while longer. It may be that no government action is required at all; it could just as easily be that policy is needed to give a standard additional momentum.
Unfortunately, we will have to accept that some amount of fake content will be part of our digital lives going forward. The goal, then, is not to create a perfect world; our fact-finding procedures, especially on the internet, were far from perfect before generative AI. The goal is to make them better than they were. In that sense, AI presents us with an opportunity to improve our collective truth-seeking institutions.
Good callout re: tradeoff between control and verifiability.
The tough bit here seems to be that, in the long term, building trust in the security of hardware signing would require the device to have remote attestation capabilities, i.e., the ability to prove that the signing key is properly secured and that the right application code is being run (signing raw sensor data and nothing else). These proofs would be useful for ensuring that a device only ever signs raw images, but I could also see concerns arising about the device information revealed in those proofs being used for other kinds of filtering or gating. Concerns like these were at the root of the backlash from proponents of consumer device freedom and network endpoint agnosticism (the idea that servers shouldn't discriminate based on what device an end user is using) against Google's now-defunct Web Environment Integrity proposal (which would have let website developers check what device a visitor is running). They are also at the root of concerns about remote attestation in consumer devices generally.
But, at the moment, it doesn't seem that there is a way around this tradeoff if the goal is to verify that an image was actually taken by a camera (although there could be workarounds if the goal is softened to something like establishing that an image was *probably* taken by a camera, based on information from your trusted circle).
Almost seems like a reasonable use for a… blockchain? At least for high-value, human-made assets.