Generally speaking, I think evidence tampering is not a new problem, and even though it's easy in some cases, I don't think it's _that_ widespread. Just like it's possible to lie on the stand, but people usually think twice before they do it, because _if_ they are found to have lied, they're in trouble.
My main concern is rather that legit evidence can now easily be called into question. That seems to me like a much higher risk than fake evidence, considering the overall dynamics.
But ultimately: Humanity has coped without photo, audio or video evidence for most of its existence. I suppose it will cope again.
Have you ever heard some of the wiretap or hidden microphone recordings used to convict mafia bosses back in the 1980s? It was so bad I can't believe it was accepted. It could easily have been faked. The only thing that made it work was the sworn statements of authenticity from the people who did the recording, and the chain of custody thereafter.
I know a lawyer who was convicted on one of those. The detectives had the mic + transmitter taped to his groin so it would get through a pat down. Then the undercover just spoke both sides of the conversation (the defendant's side was whispered and muffled). Someone testified it sounded like the defendant. Conviction was overturned on appeal on a simpler issue unrelated to the evidence.
In fact, I remember a drug dealer I was helping with his defense. The hidden mic on the undercover was taped under his armpit. He arrived at the defendant's hotel room for a deal and the defendant made him undress. The recording is hilarious because the defendant is like "You fucking snitch, what's that under your armpit?" and the undercover says "It's my .. er .. MP3 player?" LOL
I agree. The article didn't touch on this aspect, but we're now at the point where even authentic recordings could be plausibly denied and claimed to be fake. So the entire usage of recordings as evidence will suffer a hit. We may essentially be knocked back to an 18th century level of reliance on eyewitness testimony. One wonders what the consequences for justice will be.
I wouldn't say we'd be quite back to pre-photo evidence days. I feel a lot, if not most, of the value in a video/audio recording is not just that the medium has traditionally been difficult to edit, but that it's attesting to a lot of details with high specificity. There's a lot to potentially get caught out on, with not a lot of wiggle room when inconsistencies are spotted (compared to recalling from memory). Document scans and static images are still useful despite having long been trivial to edit, for instance.
There's already a process for this, it's called chain of custody. If you can't prove the evidence has a solid chain of custody, then it was potentially tampered with and isn't reliable.
The usual chain of custody goes something like: the store has a video surveillance system which the police collect the footage from, so the chain of custody goes through the store and the police, which implies that nobody other than those two has tampered with it.
But then you have an inside job where the perpetrators work for the store and have doctored the footage before the police come to pick it up, or a corrupt cop who wants to convict someone without proving their case or is accepting bribes to convict the wrong person and now has easy access to forgeries. Chain of custody can't help you in either of these cases, and both of those things definitely happen in real life, so how do you determine when they happen or don't?
Surely chain of custody applies if the accused has access to the evidence? Perhaps I’m missing your point or I’m overly optimistic about the legal system.
Suppose the store manager is having a dispute with a kid who keeps skateboarding in the parking lot, so the store manager decides to commit insurance fraud by robbing the store herself and then submits forged video of the kid doing it to the police.
The store manager is in the chain of custody but isn't a suspect, the accused is the kid. The kid doesn't even know who actually committed the crime. How is the kid supposed to prove this?
In this case, chain of custody needs to extend to the capture device itself, and to any software that exists in the supply chain for the video content.
There are some experimental specifications that exist to provide attestation as to the authenticity of media. But most of what I’ve seen so far is a “perjury based” approach that just requires a human to say that something is authentic.
> In this case, chain of custody needs to extend to the capture device itself, and to any software that exists in the supply chain for the video content.
There are two major problems with this.
First, is all footage from existing surveillance systems going to be thrown out because it doesn't use this technology? Answer: No, because it would be impractical. But then nobody cares to adopt the technology because using it isn't required. How's that IPv6 transition going?
Second, that sort of thing doesn't actually work anyway. Surveillance cameras are made by the lowest bidder. Their security record is appalling. They're going to publish their private keys on github and expose buffer overflows to the public internet and leave a telnet server running on the camera that gives you a root shell with no password. Does it sound like hyperbole? Those are all things that have actually happened.
There is only one known way to prevent this from happening: Do not allow the hardware vendor to write the software. Any of the software. Instead, demand hardware documentation so that the firmware can be written by open source software people instead of lowest bidder hardware companies. This is incompatible with using the hardware vendor as the root of trust, which is a natural consequence because the hardware vendors are completely untrustworthy.
But let's suppose we find some way to do it. We'll pass a law imposing a $100 fine on any company that has a security vulnerability. Then there will never be a security vulnerability again because security vulnerabilities will be illegal; I'm assured this is how laws work. At that point the forger takes the camera and points it at a high resolution playback of the forgery, and the camera records and signs the forgery.
I kind of wish people would stop suggesting this. It's completely useless but it creates the false impression that it can be solved this way and then people stop trying to find a real solution.
Chain of custody isn't real as long as the judiciary gives the government a 'good faith' pass when chain of custody isn't maintained/documentable in court. Go into LexisNexis and look up 'good faith' related to 'chain of custody'. Any 'protections' that can be waived away at the judge's whim when the standard isn't met by the government are not actually real, but pure theater to lend legitimacy to the American judicial system that it doesn't deserve.
Yep, "chain of custody." Came here hoping to see that concept discussed since it's how the system already deals with cases of potential evidence tampering. If the evidence is of material importance and there's no sufficiently credible chain of custody, then its validity can be questioned. The concept started around purely physical evidence but applies to image, audio and video. The good thing about the ubiquity of deepfake memes on social media is that it familiarizes judges and juries with how easy it now is to create plausible fake media.
Chain of custody only covers from when the evidence came into the hands of the police; the real issue here is original provenance, which chain of custody doesn't solve.
Evidence of provenance is already important, to be sure, but the ability to have some degree of validation of the contents has itself provided some evidence of provenance; lose that and there is a real challenge.
Who needs a whole blockchain? Just basic public-key cryptography would do the job.
Imagine, if you will, that the NVR (recording system) has a unique private key flashed in during manufacturing, with the corresponding public key printed on its nameplate. The device can sign a video clip and its related metadata before exporting. Now, any decent hacker could see possible holes in this system, but it could be made tamper-resistant enough that any non-expert wouldn't be able to fabricate a signed video. Then the evidence becomes the signed video plus the NVR's serial number and public key. Not perfect, but probably good enough.
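For what it's worth, here's a minimal sketch of what that export step could look like, assuming an Ed25519 device key and Python's `cryptography` package; the function name and metadata fields are made up for illustration, not an actual NVR API:

    import hashlib
    import json
    from cryptography.hazmat.primitives.asymmetric.ed25519 import Ed25519PrivateKey

    # Stand-in for the key flashed at manufacturing time; the matching
    # public key would be printed on the nameplate.
    device_key = Ed25519PrivateKey.generate()
    nameplate_public_key = device_key.public_key()

    def export_signed_clip(clip: bytes, metadata: dict) -> dict:
        # Sign a digest of the clip together with its metadata.
        payload = json.dumps(
            {"clip_sha256": hashlib.sha256(clip).hexdigest(), **metadata},
            sort_keys=True,
        ).encode()
        return {"payload": payload, "signature": device_key.sign(payload)}

    # Later, anyone with the nameplate public key can check the export;
    # verify() raises if the clip or metadata was altered.
    signed = export_signed_clip(b"<raw video bytes>", {"camera_serial": "NVR-1234"})
    nameplate_public_key.verify(signed["signature"], signed["payload"])

Verification only needs the public key from the nameplate; flipping a single bit in the clip or metadata makes the check fail.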
This is such BS. The government is ALWAYS deferred to when the chain of custody is broken because 'good faith' is applied. As long as 'good faith' is routinely dispensed, 'chain of custody' is nothing but propaganda for the justice system, not an actual tool used for justice.
As long as chain of custody can be discarded because of 'good faith' whenever it becomes inconvenient, it is not a real thing.
I can easily imagine a future where video evidence is only acceptable in the form of chemically developed analog film, at resolutions that are prohibitively expensive to model, and audio recordings of any kind are not admissible as evidence at all. Signatures on paper, faxes, etc are, of course, inadmissible, too.
The point of a signature on paper is that (at least in my country) you can summon somebody in court and ask "is this your signature?" If they say it is, it is, even if it does not look much like any of their other signatures. Then there might be a suspicion of false testimony or of being blackmailed etc., but that's not always the most important issue in a case. E.g.: if it is a contract and all the signers state that they agree with its terms, then there is not much else to discuss.
That's a great point. The ability to undermine real evidence by claiming it's AI-generated could be just as (if not more) damaging than fake evidence itself.
At this stage it's more a risk for people who have their likeness out in the public domain where it can be copied. But that's just about everyone these days.
> But ultimately: Humanity has coped without photo, audio or video evidence for most of its existence. I suppose it will cope again.
Same argument works for electricity, the internet, sanitation, democracy... "we just didn't have it before and survived" doesn't seem like a great test.
Well, I'm not arguing "good riddance". I just optimistically think we'll manage. I wouldn't want to miss any of the things you listed, actually. I'd also prefer evidence tampering to be impossible, or at least very difficult. But that's not a call I can make. All I can decide about things out of my control is how I think about them. And here I'm carefully optimistic.
>But ultimately: Humanity has coped without photo, audio or video evidence for most of its existence.
which allowed witches to be burned based on the testimony of citizens in good standing.
And that leads us to using Neuralink and similar tech, plus next-gen lie detectors (say, the defendant's fMRI, most probably interpreted by AI too), to look into the brain and extract a confession. No need for evidence, depositions and all that expensive, time-consuming stuff standing in the way of truth, especially given that it can't be trusted anymore in the face of AI capabilities.
I kind of want to have an LLM which takes absolutely any criticism of AI or news of it doing something bad and then generates a plausible HN comment that basically goes "I don't think it's a new problem, X has always been possible, there's nothing really new"... because that comment always appears like clockwork beneath it :)
Strangely, I'm not even offended by the notion that an AI can replace this work I apparently do :D Though my thought here was just pure optimism in the face of something bad, not an attempt to frame a bad side effect of something good as not a big deal.
When it comes to generative AI, I personally don't see a lot of good applications, but a plethora of bad ones. The only solution I could imagine would be regulation to the degree that using or distributing models with certain capabilities is just illegal. Judging from the war on file sharing a few decades ago, probably very difficult to enforce, even if it is perhaps still worth doing.
But I don't see any governments lining up to do it. Given that, I think we'll manage to deal with this particular (semi-)new development: that generative AI is effective for evidence tampering.
I question the wisdom of setting the judge up as a superjury / gatekeeper for this kind of situation. This seems like a reliability / weight of the evidence scenario, not a reliability / qualification of the witness scenario (as with an expert witness).
Why would the judge be better qualified to determine whether the voice was authentic, as opposed to the witness? And why should the judge effectively determine the witness's credibility or ability to discern, when that's what juries are for?
All that said, emulated voices do pose big problems for litigation.
I don't know what the latest is, but often judges are supposed to not allow "expert testimony" without checking that the person is an expert. However this is a really complex area. Judges don't want to be the one deciding a case, but not allowing some "expert" is in a way deciding the case.
There’s a built-in design paradox. How does a judge assess an expert in a field where he is not one? There’s probably some improvement that comes from experience but it’s not perfect.
Qualifications usually mean academics and career experience. There's no way to know if they actually learned something and aren't a fraud. Even corporations that do thousands of interviews get duped.
If it was solely up to the opposing party, then they'd strike down every single expert. The judge still is the ultimate decider on what evidence (including expert testimony) is admissible.
It's the jury that ultimately decides whether the evidence is convincing, even if the judge allows it. The defense will try to cast doubt on the evidence when they make their arguments.
This also reminds me of one of Norm Macdonald's bits. He says (very seriously) that if he were on a jury he would not convict someone on the basis of DNA evidence. "What do I know about DNA?" he says.
But I think the argument here is less about judges making definitive calls on authenticity and more about ensuring that clearly questionable evidence isn't automatically admitted just because a witness vouches for it.
You mean like expert witness polygraphers that were treated as fact by the courts for years and still used to re-incarcerate people on parole/probation?
Or do you mean bite mark analysis that was again wrong?
Many of these forensic methods were used for decades and presented/treated as conclusive evidence before being challenged leading to wrongful convictions. But yes, let's have 'certified' voice experts whose living is based on being hired by the government and giving testimony to convict people. Surely this time it will be altruistic and scientific.
What always worried me is that nobody ever challenged these methods. We just accepted them as fact. Endless crime TV shows reinforce that these methods are infallible. People that watch these shows then become jurors. Scary.
Excluding fake evidence is very much a responsibility of the judge. In the age of Fox News, letting the jury decide for themselves whether or not made-up bullshit is actual evidence seems like a recipe for disaster, and not necessarily one that errs on the side of caution.
To the contrary, in US courts the jury determines whether evidence (documentary, testimonial) is credible or not, and what weight (if any) to assign it. (Experts, à la Daubert etc., are a different matter because they give expert opinions, not factual evidence based on personal witnessing of the events in the case, so the judge does perform a gatekeeping function, essentially to ensure the underlying field/science is reliable.)
While certainly Fox News headlines would not reach the jury in most instances, that is on account of hearsay, lack of qualification, materiality, relevance, and similar rules. It is not a prior credibility or weight determination by the judge, as I understand TFA to be advocating. So: did the witness hear a voice that he believed to be the one in question? If so, jury gets to decide (unless unfairly prejudicial or some other overriding rule comes into play).
IANAL, and I am assuming you are a lawyer - I interpreted @NoMoreNicksLeft as saying that it's a judge's responsibility to determine whether or not evidence is admissible - before it gets to a jury. And that's also what the article is talking about - "The examples should illustrate circumstances that may satisfy the authentication requirement while still leaving judges discretion to exclude an item of evidence if there is other proof that it is a fake. "
Judges have a role in excluding evidence which is more prejudicial than probative. In the case of a fake voice recording, hearing someone who sounds like the accused participating in a crime may prejudice the jury even if they later hear evidence that the recording is fake. (This is probably even more true for faked videos.)
>While certainly Fox News headlines would not reach the jury in most instances, that is on account of hearsay,
Nor should obviously fake evidence reach the jury. They can judge for themselves whether testimony is credible, but this is far different than admitting faked evidence. And if you can't see the difference, I'm not sure I'm qualified to explain it to you.
How is AI voice faking any different than any other type of faking? How is it different than a manipulated recording, or a recording where someone is imitating another?
It is just as easy to fake many paper documents, and we have accepted documents as evidence for centuries.
Photos can be faked, video can be edited or faked, witnesses lie or misremember.
Is this just about telling lawyers that unvetted audio recordings can be unreliable? Because that shouldn't be news.
Edit: this is a good faith question. I'm legitimately just curious. Splicing and editing have been around since recording was invented, I was legitimately curious why voice recordings would have been given extra evidential weight when manipulating recordings is a known possibility.
Presuming good faith here, faking recordings has been harder to do, easier to detect, and less equivocal in the past than it is now.
If it takes an FX house to generate a plausible recording of me saying something I didn't say, that's a risky enterprise with a lot of witnesses.
If my enemy can do it in their basement with an hour of research, the exposure risk goes way down, and consequently the expectation you'll see it in real life goes way up.
I understand what you are saying, but my point is that I could make plausible sounding recordings that did not reflect reality by, for example, cutting recordings up with freeware like Audacity, or even using a consumer level double tapedeck before that. It wasn't Manhattan project levels of effort before this.
This seems more like people losing their minds over 3d printed guns, when hobbyists with a drill press have been making guns in the garage for decades.
Yeah, it's easier now to fake voice, but it's not as if what this article warns against wasn't possible before the latest AI hype cycle. And it is also worth noting that voice cloning/changing technology is not particularly new either (I've been able to sound like Morgan Freeman using a phone app for at least half a decade).
I agree that courts should be cautious around accepting voice recording evidence, I just don't think that the ability to do this is new.
> I could make plausible sounding recordings that did not reflect reality by, for example, cutting recordings up with freeware like Audacity,
Cutting up what recordings, exactly? If you're mixing-and-matching, you need a pretty broad corpus of source material with fairly consistent recording quality (background noise, etc), and even still you're limited to reproducing words that are already present. You can't cut-and-splice together a recording of me saying 'Yes, dghlsakjg, I embezzled $1 million and blew it on the ponies' because there will be no recording of me saying your username, 'embezzled', or 'ponies.'
The problem comes with the voice-cloning technology that can construct entirely new sentences based on relatively short voice profile samples.
> I've been able to sound like Morgan Freeman using a phone app for at least half a decade
You've been able to sound like Morgan Freeman because of specific, hard tuning work put into the voice changer. Now, you can sound like your boss, or your neighbour, or your ex.
I'm giving limited examples, not a conclusive list of how audio could be manipulated. I understand that there are circumstances where my examples don't work.
My broader point is that audio evidence has never been perfect, just like every other medium that evidence can exist in.
Generative audio technology is not the first time that audio has been corruptible as evidence, and any lawyer who believed otherwise was naive at best.
I edited the audio of a few math videos to upload to YouTube and I had to fix "typos" like "then one plus zero is zero". The problem is that there are no spaces in the voice, so it sounds like "thenonepluszeroiszero."
So I had to look for a "one" that is in a similar context, after an "s" and before a ".", or at least a similar one. And hope the speed matches, and adjust the volume.
They were free videos, so it was not necessary to get it perfect, but at least to reduce the distraction caused by mistakes.
> I could make plausible sounding recordings that did not reflect reality by, for example, cutting recordings up with freeware like Audacity, or even using a consumer level double tapedeck before that.
Being able to do it at scale, convincingly, in real-time, for any arbitrary text, with just 30s or so of someone's voice as a sample, changes the calculus a lot.
I've developed a novel approach to creating tamper-evident video via cryptographic feedback loops between projectors and cameras. The process works as follows:
1. A projector displays a challenge pattern (Perlin noise derived from a hash)
2. A camera captures this projection
3. The system hashes the captured image concatenated with the previous hash and uses it to derive the next projection
4. This chain demonstrates true temporal sequentiality that's difficult to forge
By incorporating random noise derived from Byzantine Fault Tolerant networks and using these networks as timestamping servers, the proofs inherit the network's decentralization properties. ML then confirms that the feature distributions in projection-photograph pairs match expected patterns from the training dataset.
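If it helps make steps 1-3 concrete, here's a toy sketch of the chaining loop, assuming SHA-256, with the projector and camera I/O stubbed out; the function names are placeholders, not the actual implementation:

    import hashlib
    import os

    def render_noise_pattern(seed: bytes) -> bytes:
        # Placeholder: in the real system this would be Perlin noise
        # derived from the seed and sent to the projector.
        return hashlib.sha256(b"pattern:" + seed).digest()

    def capture_projection(pattern: bytes) -> bytes:
        # Placeholder for the camera photographing the projected pattern;
        # random bytes stand in for the physical scene.
        return hashlib.sha256(pattern + os.urandom(32)).digest()

    # The chain starts from an externally timestamped value (e.g. signed
    # randomness from the BFT network) and each link commits to the last,
    # so the sequence can only be produced in order.
    prev_hash = hashlib.sha256(b"signed randomness from timestamp network").digest()
    for _ in range(3):
        pattern = render_noise_pattern(prev_hash)
        frame = capture_projection(pattern)
        prev_hash = hashlib.sha256(frame + prev_hash).digest()
        print(prev_hash.hex())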
There are actually (at least) 3 different places where cryptography is needed here:
* Proof that this starts after a given time. Traditionally this has used methods like "this is the headline of a major newspaper today", which is limited to 1-day granularity and has problems if you can just generate a large number of expected headlines and use them in parallel. But with crypto, we can just query any random-number-timestamp-signing server, and a network of such servers can mutually sign each other's previous packets so it's very reliable both against downtime and against attacks.
* Proof of sequencing. This is trivial with a chain of hashes, though it does prevent recompression.
* Proof that this ends before a given time. This requires actively submitting your signature data to a timestamp server for additional signing, which is a much more complicated task than the initial half. It is still possible to eliminate the single source of vulnerability, but much more work.
"Camera looks at monitor" is going to be a much cheaper way to make this air-gapped than adding a projector. And this doesn't strictly need to be continuous; most things are tolerant of one-day granularity and almost everything of 15-minute granularity.
Largely true, although submitting timestamp hashes to the blockchain is probably the easiest bit.
"Camera looking at a monitor": While that might be simpler in some setups, it doesn’t really solve my main issue. I want the signal to permeate the entire scene, not just appear in the corner of a display or overlaid on the video. By projecting the challenge onto all visible surfaces, we create a physical environment that’s difficult to fake (since you’d have to convincingly generate or remove those patterns in real time). Air-gapping isn’t really the goal right now.
Finally, we need much finer granularity than 15 minutes! The point is to lower the generation time below what is achievable with a generative model.
Thank you for the comment, and I hope these clarifications are useful. It's a new concept, so please forgive the clumsiness with which I may be communicating it.
"The blockchain" is just a wasteful way of doing things compared to the plain crypto.
"Overlay the entire scene" doesn't actually appear to add any information-theoretic value compared to simply bounding the timestamp at which the video was made. Nothing either of us is talking about will actually prevent fakes (before the camera signs it), only constrain the time at which the fakes are made.
Slower-than-real-time generation of fakes is still significantly inhibited by the fact that the hash sequence can be checked for continuity across long lengths of time whose bounds are verified using the other steps.
Thank you! It is indeed a little like a signature based on proof-of-projection.
As they say, once you have a signature, you have most of a cryptosystem.
I've been experimenting with those and other applications of non-linear functions in projector-camera systems.
In the future this stuff will get so good that the public will beg to be surveilled at all times because it will be the only way to prove what you didn't do. You will learn to love Total Information Awareness. Consent status: manufactured :)
It could easily go the other way, where the public doesn't care what people think they did or didn't do and just does whatever they want, because they don't respect the state and believe the social contract has been broken. "Fuck justice, talk to my AR-15."
There's ample evidence that this is already happening, eg. recent headlines about kids being radicalized at increasingly younger ages, groups like No Lives Matter that embrace violent nihilism, increased domestic terrorism, record high gun ownership across both sides of the political spectrum, authority figures that just do whatever they want and ignore any form of law or accountability, etc.
We're already at a place where most people don't care what other people think of them.
Issue is, as long as the government has the big guns, what the government thinks of you will still matter in a major way.
In such an environment, most people are going to choose to have some kind of way to prove to the government what they did and what they didn't do. Not because they care what other people think as you're implying, but rather because they very much care that the government not get the wrong idea about them. Because the government getting the wrong idea about you can be fatal.
Government is based on the threat of the use of force, not the actual use of force. If you're a government that is regularly using force against a significant proportion of your citizens, you have problems, and probably will not remain the government for long.
I suspect that we're saying the same thing but with reversed causality. Both of us agree that non-deterministic enforcement breaks down the incentives needed for pro-social behavior. You're saying that this will cause people to demand ways to improve the government's enforcement abilities. I'm saying that this will cause people to adapt their behavior to the new, lessened enforcement abilities. In defense of my point, I'd note that changes to government are a coordination problem while adaptation of behavior is an individual-only response, and it is much easier to change your own behavior than it is to convince 300 million people to agree on a solution and implement it, particularly when the root problem is a lack of enforcement ability.
Keep in mind that being able to demonstrate your innocence to the government is also just an individual change in behavior. A person might not necessarily care if you use the tracking technologies or submit yourself to all the cameras in society. But they'll choose to do it themselves just so that there's that record out there.
The government coming up with one way to track everyone is not really necessary to get people to submit to tracking. In fact, most of the law enforcement "watcher" types wouldn't want that anyway. Not only is having a myriad of ways to track and surveil people far preferable for law enforcement, but it also allows people to individually choose to set up all the Ring doorbells and security cameras and GPS trackers and smart-glasses-based body cams etc. all on their own.
And they'll happily choose to buy all that stuff voluntarily and without regard to what everyone else is doing.
I think the issue demonstrated by this article is that technology is getting such that demonstrating your innocence to the government is not an individual change of behavior, and that there are ways to abuse the demonstration mechanisms that effectively feed false information to the government.
Imagine that the populace supports widespread surveillance techniques, and so cameras are setup everywhere. Some hacker group figures out how to hack into the cameras and insert deepfakes in them. Now members of that hacker group have a government-proof alibi whenever they want it, and can commit crimes at will, and get it blamed on others. Justice goes out the window.
Which speaks to why the law enforcement types prefer multiple surveillance and tracking channels. A hacker group compromising one may be possible, but to get away with a crime, it would be necessary to compromise the feeds coming from all of the channels at the same time. Extremely unlikely. The deep fake of someone else robbing a store that you put on the store's security camera system is nice. But the police are likely to be more convinced by the other ten thousand smart phones, smart glasses, and security and door bell cameras in the neighborhood that recorded you and your buddies running out of that store with guns at the time of the robbery.
This is how the "watcher" types are trained. Information is only valid if they can get it from multiple independent sources. So they love when new tracking and surveillance channels are released. (Social media apps, or smart glasses, or doorbell cameras or what have you.)
The bar would be much higher than compromising a single channel. With multiplying channels, the task of compromising them all approaches impossibility.
1. Cryptographically hash each piece of media when it's recorded.
2. Submit the hash to a "trusted" authority.
3. It will add a timestamp and sign the result.
4. Now, as long as you keep the original, without re-compressing, and you trust the authority, you have some evidence that the media existed at a timestamp. On or before.
This doesn't prove authenticity, but in many cases, establishing a timestamp would be enough. Forgeries probably wouldn't be created until later, after the shit hit the fan.
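A minimal sketch of the client side of steps 1-3, with the authority stubbed out (a real one, e.g. an RFC 3161 timestamp authority, would return a signed token); the function names here are just for illustration:

    import hashlib
    from datetime import datetime, timezone

    def hash_media(path: str) -> str:
        # Hash the original file in chunks, right after recording.
        h = hashlib.sha256()
        with open(path, "rb") as f:
            for chunk in iter(lambda: f.read(1 << 20), b""):
                h.update(chunk)
        return h.hexdigest()

    def request_timestamp(media_hash: str) -> dict:
        # Stand-in for the trusted authority: it appends the current time
        # and signs the pair; keep the token alongside the untouched original.
        return {
            "hash": media_hash,
            "timestamp": datetime.now(timezone.utc).isoformat(),
            "signature": "<authority signature over hash + timestamp>",
        }

    # e.g. token = request_timestamp(hash_media("recording.mp4"))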
You can do this already by yourself or use a service which you need not even trust.
Every x seconds, collect all the pieces of data that need to be notarized as existing in the world prior to some time t. It can be an arbitrary amount of data.
Construct a Merkle Tree[1] of all the hashes (or HMACs) of all the data to be notarized. Compute the Merkle Root. Make sure that everyone gets their Merkle Proof (path from leaf to root) or publish the Merkle Tree publicly.
Embed the Merkle Root into one or more cryptocurrency blockchains to exploit their immutability guarantees, either by including it as "additional data" in some transaction or just straight up as a fictional cryptocurrency address.
Every piece of data processed in this way will have a Merkle Proof (cryptographic path from hash of the data to the Merkle Root) that proves it existed prior to the creation of the Merkle Root. The Merkle Root will have its creation time bounded by the proof of work conducted on the cryptocurrency blockchain.
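Here's a small self-contained sketch of that batching step, assuming SHA-256 leaves; embedding the root in a blockchain transaction is left out:

    import hashlib

    def h(data: bytes) -> bytes:
        return hashlib.sha256(data).digest()

    def merkle_root_and_proofs(leaves: list[bytes]):
        # Build the tree level by level, recording each leaf's sibling path.
        level = [h(leaf) for leaf in leaves]
        proofs = [[] for _ in level]
        positions = list(range(len(level)))
        while len(level) > 1:
            if len(level) % 2:                      # duplicate last node on odd levels
                level.append(level[-1])
            for j, pos in enumerate(positions):
                proofs[j].append(level[pos ^ 1])    # record the sibling
                positions[j] = pos // 2
            level = [h(level[i] + level[i + 1]) for i in range(0, len(level), 2)]
        return level[0], proofs

    def verify(leaf: bytes, proof: list[bytes], root: bytes, index: int) -> bool:
        node = h(leaf)
        for sibling in proof:
            node = h(node + sibling) if index % 2 == 0 else h(sibling + node)
            index //= 2
        return node == root

    clips = [b"clip-A", b"clip-B", b"clip-C"]
    root, proofs = merkle_root_and_proofs(clips)    # publish or embed `root`
    assert all(verify(c, p, root, i) for i, (c, p) in enumerate(zip(clips, proofs)))

Each proof is only log(n) hashes long, so every submitter can keep their own proof without storing anyone else's data.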
Thieves are planning an inside job. They forge the surveillance video ahead of time, do the theft and submit the forgery to be timestamped while it's happening.
Also, a lot of surveillance systems are purposely kept offline to prevent them from being compromised, but your system doesn't allow that because they would need external connectivity to get signatures.
Sure when you're controlling the source, you can fake it. But requiring the fakes be prepared ahead of time locks the faker into one story that may be contradicted by other evidence.
That's pretty much what happens anyway. The police are going to come collect the footage as soon as the crime is reported and you can't change it after they do.
Isn't this basically achieved by standard cloud storage on phones? Photos and videos get uploaded automatically (with authentication for the user account) and presumably that action is logged somewhere the user can't edit. Just need to prove those logs are secure, and there you go.
Unless you're the police in a murder case, you're probably not going to get Google or Apple to give you something that certifies that a file was uploaded before a specific time and not modified after. And just showing me the modification date in the UI wouldn't convince me.
Much easier with the account holder's consent, it's probably in the "data download" thing you can request. And probably available by subpoena. It's like phone records.
I don't mean "date modified" in the file metadata when it's uploaded (of course that could easily be spoofed before upload) but the actual "date uploaded". You know something must have been made before a certain time.
This is a pretty terrifying problem, especially considering how easily accessible AI voice cloning tech has become. The fact that courts still operate under rules written before AI-generated evidence was even a concept is concerning.
What shocks (and irritates!) me is that Charles Schwab keeps wanting me to set up voice ID. Why would I want to set up a voice ID for something that is now trivially spoofed?
When was the last time you encountered this? I remember getting nags up until around the end of last year, but not lately. I like to think they dropped the program because I expressed concerns about it to so many reps, but more likely I've just been dialing in on a different number.
Schwab used to have an 8 character maximum on passwords (although at least they changed that). They have never been a paragon of good security practices.
I'll say it again, even though it is rather unpopular here: there has never been a need to develop these tools, nor to make them easy to deploy, nor to make them easy to use. Yet all this has happened, and now it may happen that someone is acquitted because AI-generated media is so good the evidence might be artificial. If that happens, and the suspect commits another crime, it's on the conscience of the people who contributed to this. You cannot create something and pretend its use has nothing to do with you.
The tools aren't perfect yet, so it's not too late to stop. Stop the ridiculous image and audio generation tools before it's too late. Nothing of value is lost when these models are made private again, and research is simply halted.
It actually is too late for that. For anyone unaware, the open source models are already more than sufficient for imitations and deepfakes. For better or worse, there's no going back.
Personally, I'd rather we all know this tech is out there and develop defense mechanisms rather than thinking hiding it away will prevent harm.
The cyber security industry exists because of all the privacy and security issues posed by all the tech we already have had for the past several decades.
I'm confident the same will happen for AI simply because it is a business opportunity and other businesses and institutions are already talking about these issues.
There is an argument that "AI-generated shit" (on social media, or otherwise) can influence elections, and thus war. The best example I can think of is https://www.bbc.com/news/world-asia-india-68918330.
> Nothing of value is lost when these models are made private again
To list a few uses I see for voice-generation/cloning tools:
* Real-time translation of a user's voice, maintaining emotion and intonation
* Professional-quality audio from cheap microphone setups (for video tutorials, indie games, etc.)
* Allowing those with speech impairments to communicate using their natural voice again, or:
* Allowing those uncomfortable with their natural voice to communicate closer to how they wish to be perceived
* Customization of voice assistants, such as to use a native accent/dialect
* Movies, podcasts, audiobooks, news broadcasts, etc. made available in a huge range of languages
* And of course: memes, satire, and parody
If it exists but isn't widely accessible, it's likely in the hands of Musk/Zuck and various state actors. To me that seems possibly the worst alternative: the public generally unaware that it's possible and receiving few of the benefits listed, yet still having it available as a tool for competent disinformation.
What a truly fascinating perspective. I wonder if you yearn to build complex tunnel systems underground at the behest of a queen? I’m of course describing the most common habit on the planet.
There is some use to bringing deepfakes to mass adoption. The thing is that since the tech exists, powerful actors with lots of resources will develop these tools for their own use either way. The question is whether they'll be able to fool the masses who are unaware that such realistic deepfakes can exist, or whether they will have no effect as everyone and their mom have already seen similar AI slop on their Facebook feed
Wait some more time and photos or even video recordings will be deemed just as dangerous. And then what? Even if there is real evidence, it will have to be discarded unless it can be sufficiently validated. It will get very hard to prove anything.
Or perhaps camera manufacturers will start putting traces in. Anyone who makes surveillance systems (like stores use) should build in some end-to-end verification so they can say "this camera took that recording and we can see that it wasn't tampered with by...". (If anyone works for a surveillance company, please run with this!) With encryption we can verify a lot of things, but it sets the bar higher than just "someone took a picture".
Of course in a darkroom someone skilled could always make a fake photo - but the bar is a lot lower with AI.
I think the best proof of authenticity is already there, in the quirks each particular camera model has in the recording encoding. It's likely that a forgery would easily be shown to have differences in encoding when compared with a video file from the same camera. It would be more difficult with an audio recorder, as you could just record the AI voice in the room. An audio expert could probably show that the acoustics of the fakery don't match where it was claimed to be recorded.
Cryptography doesn't really fix it. There are a zillion camera makers and all it takes is one of them to have poor security and leak the keys. Then the forger uses any of the cameras with extractable keys to sign their forgery.
Or they just point a camera at a high resolution playback of a forged video.
This also assumes you can trust all the camera makers, because by doing this you're implicitly giving them each authorization to produce forged videos. Recall that many of these companies are based in China.
The point is the camera maker certifies in court under perjury penalty that their cameras are not compromised and that is their image. "Other camera systems are compromised, but ours is not...".
I wonder if film photography (slide film in particular) will become viable again and that photojournalists will once again become necessary for society.
Rather than cryptography which could be difficult to grok for a non-technical jury, a physical slide of film would be the source of truth.
It can still be faked by photographing a manufactured/generated scene though.
So... those talking head "influencers" who leave multiple hours of voice and video samples on social networking for anyone to download and clone are the most at risk for an attack like this?
AI threatens all digital perceptions, not just voice. Images, videos, recordings, ... I think soon enough proving things in court beyond a reasonable doubt when the evidence is digital media will be difficult/impossible.
On a related note, why oh why does Lloyds Bank insist on grabbing my voice for login every time I call them? I have to keep saying "no, fcuk off!" a dozen times until it gives up.
I think we're going to see C2PA (https://c2pa.org/) become mandatory for cellphones, at least once there's at least one implementation and all digital cameras have an integrated TPM.