Voice Analysis

Voice Analysis screens an audio clip for signs that it was synthesised or cloned (for example, a fake "manager" voice note asking for a payment). This section explains what it checks and, just as importantly, what it does not do.

What it is (and isn't)

Important: Voice Analysis is rule-based spectral analysis, not a trained machine-learning model. Its version is rule-based-v1.1. A CNN-based classifier is planned but not yet shipped. Please read the limitations below before relying on it for high-stakes decisions.

It works by extracting acoustic features with librosa and applying calibrated thresholds. Each check that "fires" adds to a synthetic-voice score.

What it checks

The detector computes eight acoustic indicators:

Indicator	Fires when	Notes
Spectral flatness	Too high	Synthetic audio is often spectrally "flatter"
Zero-crossing rate	Unusually low or high	Articulation outside the range typical of natural speech
Pitch variation	Too low	Flat, monotone pitch (a strong cloning tell)
MFCC variance	Too low	Least useful on its own
Energy variance	Too low	Unnaturally even loudness
Harmonic ratio (HPSS)	Too low	One of the two strongest discriminators
Silence ratio	Too high	Can indicate splicing
Bandwidth variation	Too high	The other strongest discriminator
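The table can be read as a list of rules, each with a direction ("too high", "too low", or "out of range"). The sketch below encodes that reading. Every threshold and point value is a made-up placeholder, since the shipped numbers are not given here; weighting the harmonic-ratio and bandwidth checks more heavily is likewise only an assumption suggested by the "strongest discriminator" notes.

```python
from typing import Callable

# (feature name, "fires when" predicate, points added to the score).
# All thresholds and point values below are hypothetical placeholders.
RULES: list[tuple[str, Callable[[float], bool], int]] = [
    ("spectral_flatness", lambda v: v > 0.30, 10),               # too high
    ("zero_crossing_rate", lambda v: v < 0.02 or v > 0.20, 10),  # out of range
    ("pitch_variation", lambda v: v < 10.0, 15),                 # too low
    ("mfcc_variance", lambda v: v < 50.0, 5),                    # too low
    ("energy_variance", lambda v: v < 0.001, 10),                # too low
    ("harmonic_ratio", lambda v: v < 0.50, 20),                  # too low
    ("silence_ratio", lambda v: v > 0.40, 10),                   # too high
    ("bandwidth_variation", lambda v: v > 800.0, 20),            # too high
]


def score_features(features: dict[str, float]) -> tuple[int, list[str]]:
    """Return (score capped at 100, names of the checks that fired)."""
    fired = [name for name, fires, _ in RULES
             if name in features and fires(features[name])]
    points = sum(pts for name, _, pts in RULES if name in fired)
    return min(100, points), fired
```

With these placeholder numbers, a clip whose only anomalies are a low harmonic ratio and a high bandwidth variation would score 40.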

What you get back

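The result includes a deepfake score from 0 to 100 (the synthetic-voice score built up from the checks that fired), a risk level, plain-English indicators, the raw technical feature values, a confidence label, and a recommended action. The score maps to a risk level and verdict as follows: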

Score	Risk level	Verdict
≤ 30	Low (green)	Likely genuine human voice
31–60	Medium (yellow)	Some synthetic characteristics detected
> 60	High (red)	Strong indicators of synthetic or cloned voice
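The banding in the table is a simple step function of the score. A minimal sketch, assuming an integer score and a hypothetical helper name `risk_level`:

```python
def risk_level(score: int) -> tuple[str, str]:
    """Map a 0-100 score to (risk level, verdict) using the bands in the table."""
    if not 0 <= score <= 100:
        raise ValueError(f"score must be between 0 and 100, got {score}")
    if score <= 30:
        return ("Low", "Likely genuine human voice")
    if score <= 60:
        return ("Medium", "Some synthetic characteristics detected")
    return ("High", "Strong indicators of synthetic or cloned voice")
```

The boundaries are inclusive on the lower band: 30 is still Low and 60 is still Medium.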

Each result is also labelled with a detection type: TTS synthesis, voice cloning, audio splicing, or authentic.

Limitations (please read)

It is heuristic, not ML. There is no trained classifier making a probabilistic judgement; the detector counts how many threshold checks fired.
Confidence is derived from the number of indicators that fired, not from a calibrated probability.
Harmonic ratio and bandwidth variation are the only strong discriminators; some indicators (like MFCC variance) are weak on their own.
Treat a result as a signal to verify through another channel (call the person back on a known number), not as proof.
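To make the point about confidence concrete, the sketch below shows what a count-based label looks like. The cutoffs (2 and 4) and the label names are hypothetical; only the idea that the label depends on the count of fired indicators comes from this section.

```python
def confidence_label(fired_count: int, total_checks: int = 8) -> str:
    """Turn a count of fired indicators into a coarse label (assumed cutoffs)."""
    if not 0 <= fired_count <= total_checks:
        raise ValueError("fired_count must be between 0 and total_checks")
    if fired_count >= 4:   # hypothetical cutoff
        return "high"
    if fired_count >= 2:   # hypothetical cutoff
        return "medium"
    return "low"
```

Because the label is a step function of a count, a "high" label says that several heuristics agreed. It does not say the clip is synthetic with any particular probability.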

Privacy

DPDP note: Your audio is deleted synchronously as soon as analysis finishes, before the response is returned. Only the verdict and metadata (filename, format, size, score, detection type, and the spectral-check results) are retained; the audio itself is never persisted. (Some internal docstrings describe this as "within 5 minutes"; the actual behaviour is stricter: deletion is immediate.)
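One common way to get "deleted before the response is returned" is a `try`/`finally` around the analysis call. The sketch below illustrates that pattern under stated assumptions: `analyse_upload` and the `analyse` callback are hypothetical names, and the real service may handle uploads differently.

```python
import os
import tempfile
from typing import Callable


def analyse_upload(data: bytes,
                   analyse: Callable[[str], dict],
                   suffix: str = ".wav") -> dict:
    """Write the upload to a temp file, analyse it, and delete it before returning."""
    fd, path = tempfile.mkstemp(suffix=suffix)
    try:
        with os.fdopen(fd, "wb") as fh:
            fh.write(data)
        result = analyse(path)
    finally:
        # Runs whether analysis succeeded or raised, and always before the
        # caller sees a result, so the audio does not outlive the request.
        if os.path.exists(path):
            os.remove(path)
    # Only metadata and the analysis result are returned; the bytes are gone.
    return {"size": len(data), "format": suffix.lstrip("."), **result}
```

The `finally` clause is what makes the deletion synchronous: there is no background job or timer that could leave the file on disk for minutes.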