Voice Analysis
Voice Analysis screens an audio clip for signs that it was synthesised or cloned (for example, a fake "manager" voice note asking for a payment). This section explains what it checks and, just as importantly, what it does not do.
What it is (and isn't)
Important: Voice Analysis is rule-based spectral analysis, not a trained machine-learning model. Its version is rule-based-v1.1. A CNN-based classifier is planned but not yet shipped. Please read the limitations below before relying on it for high-stakes decisions.
It works by extracting acoustic features with librosa and applying calibrated thresholds. Each check that "fires" adds to a synthetic-voice score.
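The idea of checks that "fire" and add to a score can be sketched as follows. This is only an illustration: the threshold values, the point values, and the names `Check`, `CHECKS`, and `score_features` are all hypothetical and are not the values used by rule-based-v1.1.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List, Tuple


@dataclass(frozen=True)
class Check:
    name: str                       # feature this check looks at
    fires: Callable[[float], bool]  # True when the value looks synthetic
    points: int                     # contribution to the score when it fires


# Hypothetical thresholds and point values, for illustration only.
CHECKS: List[Check] = [
    Check("spectral_flatness", lambda v: v > 0.30, 15),                 # too high
    Check("zero_crossing_rate", lambda v: not 0.02 <= v <= 0.20, 10),   # out of range
    Check("pitch_variation", lambda v: v < 10.0, 15),                   # too low
    Check("harmonic_ratio", lambda v: v < 0.50, 20),                    # too low
]


def score_features(features: Dict[str, float]) -> Tuple[int, List[str]]:
    """Return (score capped at 100, names of the checks that fired)."""
    fired = [
        c for c in CHECKS
        if c.name in features and c.fires(features[c.name])
    ]
    score = min(100, sum(c.points for c in fired))
    return score, [c.name for c in fired]
```

With these made-up numbers, a clip with high spectral flatness and a low harmonic ratio but normal pitch variation would fire two checks and accumulate their points; nothing in the result is a probability.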
What it checks
The detector computes eight acoustic indicators:
| Indicator | Fires when | Notes |
|---|---|---|
| Spectral flatness | Too high | Synthetic audio is often spectrally "flatter" |
| Zero-crossing rate | Unusually low or high | Out-of-range articulation |
| Pitch variation | Too low | Flat, monotone pitch (a strong cloning tell) |
| MFCC variance | Too low | Least useful on its own |
| Energy variance | Too low | Unnaturally even loudness |
| Harmonic ratio (HPSS) | Too low | One of the two strongest discriminators |
| Silence ratio | Too high | Can indicate splicing |
| Bandwidth variation | Too high | The other strongest discriminator |
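To give a feel for what two of the simpler indicators in the table measure, here is a dependency-free sketch of zero-crossing rate and silence ratio. The detector itself extracts its features with librosa; the function names, the frame length, and the RMS floor below are assumptions chosen for illustration and may differ from what the detector actually uses.

```python
import math
from typing import Sequence


def zero_crossing_rate(samples: Sequence[float]) -> float:
    """Fraction of adjacent sample pairs whose signs differ."""
    if len(samples) < 2:
        return 0.0
    crossings = sum(
        1 for a, b in zip(samples, samples[1:]) if (a >= 0) != (b >= 0)
    )
    return crossings / (len(samples) - 1)


def silence_ratio(
    samples: Sequence[float],
    frame_length: int = 2048,   # hypothetical frame size
    rms_floor: float = 0.01,    # hypothetical "silent" RMS level
) -> float:
    """Fraction of fixed-size frames whose RMS energy is below rms_floor."""
    frames = [
        samples[i:i + frame_length]
        for i in range(0, len(samples), frame_length)
    ]
    if not frames:
        return 0.0
    quiet = sum(
        1 for frame in frames
        if math.sqrt(sum(x * x for x in frame) / len(frame)) < rms_floor
    )
    return quiet / len(frames)
```

A signal that flips sign on every sample gives a zero-crossing rate of 1.0, and a clip whose first half is digital silence gives a silence ratio of about 0.5.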
What you get back
The result includes a deepfake score (0–100), a risk level, plain-English indicators, the raw technical feature values, a confidence label, and a recommended action. The score maps to a risk level and verdict as follows:
| Score | Risk level | Verdict |
|---|---|---|
| ≤ 30 | Low (green) | Likely genuine human voice |
| 31–60 | Medium (yellow) | Some synthetic characteristics detected |
| > 60 | High (red) | Strong indicators of synthetic or cloned voice |
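The score bands in the table can be written as a small lookup. The function name `classify_score` is hypothetical; the cut-offs and verdict strings are taken from the table, and treating a fractional score between 30 and 31 as Medium is an assumption.

```python
from typing import Tuple


def classify_score(score: float) -> Tuple[str, str]:
    """Map a 0-100 deepfake score to (risk level, verdict)."""
    if not 0 <= score <= 100:
        raise ValueError("score must be between 0 and 100")
    if score <= 30:
        return "Low (green)", "Likely genuine human voice"
    if score <= 60:
        # Assumption: anything above 30 and up to 60 counts as Medium.
        return "Medium (yellow)", "Some synthetic characteristics detected"
    return "High (red)", "Strong indicators of synthetic or cloned voice"
```

Note that the boundaries are inclusive on the lower band: a score of exactly 30 is Low and a score of exactly 60 is Medium.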
Each result also carries a detection type: TTS synthesis, voice cloning, audio splicing, or authentic.
Limitations (please read)
- It is heuristic, not ML. There is no trained classifier making a probabilistic judgement; the score reflects how many threshold checks fired.
- Confidence is derived from the number of indicators, not from a calibrated probability.
- Harmonic ratio and bandwidth variation are the only strong discriminators; some indicators (like MFCC variance) are weak on their own.
- Treat a result as a signal to verify through another channel (call the person back on a known number), not as proof.
Privacy
DPDP note: Your audio is deleted immediately after analysis, synchronously and before the response is returned. Only the verdict and metadata (filename, format, size, score, detection type, and the spectral-check results) are retained; the audio itself is never persisted. (Some internal docstrings describe this as "within 5 minutes"; the actual behaviour is stricter: deletion is immediate.)
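One common way to get "deleted before the response is returned" behaviour is a `try`/`finally` around the analysis step. The sketch below assumes the upload is written to a temporary file for analysis; the names `analyse_and_delete` and `analyse` are hypothetical, and this is not necessarily how the service implements it.

```python
import os
import tempfile
from typing import Callable, Dict


def analyse_and_delete(audio_bytes: bytes, analyse: Callable[[str], Dict]) -> Dict:
    """Write audio to a temp file, analyse it, and delete it before returning."""
    fd, path = tempfile.mkstemp(suffix=".audio")
    try:
        with os.fdopen(fd, "wb") as handle:
            handle.write(audio_bytes)
        result = analyse(path)  # hypothetical analysis callback
    finally:
        # Runs synchronously, even if analysis raises, so the audio
        # is gone before any response reaches the caller.
        if os.path.exists(path):
            os.remove(path)
    return result
```

Because the removal sits in `finally`, a failed analysis still deletes the file; only the dictionary returned by the callback (verdict and metadata) survives the call.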