ScamShield AI
Documentation

Voice Analysis

Voice Analysis screens an audio clip for signs that it was synthesised or cloned (for example, a fake "manager" voice note asking for a payment). This section explains what it checks and — just as importantly — what it does not do.

What it is (and isn't)

Important: Voice Analysis is rule-based spectral analysis, not a trained machine-learning model. It reports its version as rule-based-v1.1. A classifier based on a convolutional neural network (CNN) is planned but has not shipped yet. Please read the limitations below before relying on it for high-stakes decisions.

It works by extracting acoustic features with librosa and comparing each one against a calibrated threshold. Every check that "fires" (crosses its threshold) adds to a synthetic-voice score.
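The "checks that fire add to a score" idea can be sketched in plain Python. This assumes each check is a threshold predicate worth a fixed number of points; the feature names, thresholds and weights below are hypothetical placeholders, not ScamShield's calibrated values, and only four of the eight indicators are shown.

```python
from typing import Callable, Dict, List, Tuple

# A rule is (predicate that returns True when the check fires, points it adds).
Rule = Tuple[Callable[[float], bool], int]

# HYPOTHETICAL thresholds and weights, for illustration only.
RULES: Dict[str, Rule] = {
    "spectral_flatness": (lambda v: v > 0.30, 10),     # fires when too high
    "pitch_variation": (lambda v: v < 15.0, 15),       # fires when too low
    "harmonic_ratio": (lambda v: v < 0.50, 20),        # fires when too low
    "bandwidth_variation": (lambda v: v > 400.0, 20),  # fires when too high
}


def score_features(features: Dict[str, float]) -> Tuple[int, List[str]]:
    """Return (score clamped to 0-100, names of the checks that fired)."""
    fired = [
        name
        for name, (fires, _points) in RULES.items()
        if name in features and fires(features[name])
    ]
    score = sum(RULES[name][1] for name in fired)
    return min(score, 100), fired
```

For example, `score_features({"spectral_flatness": 0.4, "pitch_variation": 8.0, "harmonic_ratio": 0.7, "bandwidth_variation": 120.0})` fires the flatness and pitch checks and returns a score of 25. Giving harmonic ratio and bandwidth variation the largest weights mirrors the note that they are the strongest discriminators.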

What it checks

The detector computes eight acoustic indicators:

Indicator | Fires when | Notes
--- | --- | ---
Spectral flatness | Too high | Synthetic audio is often spectrally "flatter"
Zero-crossing rate | Unusually low or high | Out-of-range articulation
Pitch variation | Too low | Flat, monotone pitch (a strong cloning tell)
MFCC variance | Too low | Least useful on its own
Energy variance | Too low | Unnaturally even loudness
Harmonic ratio (HPSS) | Too low | One of the two strongest discriminators
Silence ratio | Too high | Can indicate splicing
Bandwidth variation | Too high | The other strongest discriminator
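To make three of the simpler indicators concrete, here is a dependency-free approximation of zero-crossing rate, energy variance and silence ratio computed over fixed, non-overlapping frames. This is a simplified stand-in, not the product's code: librosa's `librosa.feature.zero_crossing_rate` and `librosa.feature.rms` work on overlapping frames, and the 512-sample frame size and 0.01 RMS silence threshold used here are assumptions.

```python
import math
from statistics import pvariance
from typing import Dict, List, Sequence


def frames(samples: Sequence[float], size: int) -> List[Sequence[float]]:
    """Split samples into non-overlapping frames, dropping any short tail."""
    return [samples[i:i + size] for i in range(0, len(samples) - size + 1, size)]


def zero_crossing_rate(frame: Sequence[float]) -> float:
    """Fraction of adjacent sample pairs whose sign differs."""
    crossings = sum(1 for a, b in zip(frame, frame[1:]) if (a >= 0) != (b >= 0))
    return crossings / (len(frame) - 1)


def rms(frame: Sequence[float]) -> float:
    """Root-mean-square energy of one frame."""
    return math.sqrt(sum(x * x for x in frame) / len(frame))


def indicators(samples: Sequence[float], frame_size: int = 512,
               silence_rms: float = 0.01) -> Dict[str, float]:
    """Simplified versions of three indicators (hypothetical parameters)."""
    fs = frames(samples, frame_size)
    if not fs:
        raise ValueError("need at least one full frame of audio")
    energies = [rms(f) for f in fs]
    return {
        # Mean per-frame zero-crossing rate.
        "zero_crossing_rate": sum(zero_crossing_rate(f) for f in fs) / len(fs),
        # Spread of loudness across frames; unnaturally low suggests synthesis.
        "energy_variance": pvariance(energies),
        # Share of frames quieter than the silence threshold.
        "silence_ratio": sum(e < silence_rms for e in energies) / len(fs),
    }
```

Half a second of a 440 Hz tone followed by half a second of digital silence (at 16 kHz) gives a silence ratio of 0.5, the kind of value that would push the silence-ratio check toward firing.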

What you get back

The result includes a deepfake score (0–100), a risk level, plain-English indicators, the raw technical feature values, a confidence label, and a recommended action:

Score | Risk level | Verdict
--- | --- | ---
≤ 30 | Low (green) | Likely genuine human voice
31–60 | Medium (yellow) | Some synthetic characteristics detected
> 60 | High (red) | Strong indicators of synthetic or cloned voice
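The score-to-band mapping in the table can be stated directly as code. This sketch assumes integer scores, as the 30/31 and 60/61 boundaries suggest; the function name is hypothetical.

```python
def risk_level(score: int) -> str:
    """Map a 0-100 deepfake score to the documented risk bands."""
    if not 0 <= score <= 100:
        raise ValueError(f"score must be in 0-100, got {score}")
    if score <= 30:
        return "Low"      # green: likely genuine human voice
    if score <= 60:
        return "Medium"   # yellow: some synthetic characteristics
    return "High"         # red: strong indicators of synthetic or cloned voice
```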

Each result is also labelled with one of four detection types: TTS synthesis, voice cloning, audio splicing, or authentic.
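For integrators, the documented contents of a result can be pictured as a typed record. The class names, field names and enum values below are hypothetical illustrations of what the page lists, not the service's actual response schema; storing the version string on the result is also an assumption.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Dict, List


class DetectionType(Enum):
    """The four documented detection types (value strings are hypothetical)."""
    TTS_SYNTHESIS = "tts_synthesis"
    VOICE_CLONING = "voice_cloning"
    AUDIO_SPLICING = "audio_splicing"
    AUTHENTIC = "authentic"


@dataclass
class VoiceAnalysisResult:
    deepfake_score: int          # 0-100
    risk_level: str              # "Low", "Medium" or "High"
    indicators: List[str]        # plain-English descriptions of checks that fired
    features: Dict[str, float]   # raw technical feature values
    confidence: str              # label derived from the number of indicators
    recommended_action: str
    detection_type: DetectionType
    version: str = "rule-based-v1.1"
```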

Limitations (please read)

  • It is heuristic, not ML. There is no trained classifier making a probabilistic judgement — it counts how many threshold checks fired.
  • Confidence is derived from the number of indicators, not from a calibrated probability.
  • Harmonic ratio and bandwidth variation are the only strong discriminators; some indicators (like MFCC variance) are weak on their own.
  • Treat a result as a signal to verify through another channel (call the person back on a known number), not as proof.
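To see why a count-based confidence is not a probability, consider a sketch in which the label depends only on how many of the eight checks fired. The cut-offs (3 and 5) and the label strings are hypothetical; the real mapping is not documented here.

```python
def confidence_label(fired_count: int, total_checks: int = 8) -> str:
    """HYPOTHETICAL: derive a confidence label purely from the fired-check count."""
    if not 0 <= fired_count <= total_checks:
        raise ValueError(f"fired_count must be in 0-{total_checks}")
    if fired_count >= 5:
        return "high"
    if fired_count >= 3:
        return "medium"
    return "low"
```

Under a scheme like this, two weak indicators (say MFCC variance and silence ratio) earn the same label as the two strong ones (harmonic ratio and bandwidth variation), which is why a result should prompt verification rather than serve as proof.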

Privacy

DPDP note: Your audio is deleted immediately after analysis — synchronously, before the response is returned. Only the verdict and metadata (filename, format, size, score, detection type, and the spectral-check results) are retained; the audio itself is never persisted. (Some internal docstrings describe this as "within 5 minutes"; the actual behaviour is stricter — immediate deletion.)
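Synchronous deletion before the response is returned corresponds to a try/finally pattern: the temporary audio file is removed even if analysis fails, and only metadata plus the verdict leave the function. This is a minimal sketch; `analyse_upload` and the `analyse` callable are hypothetical names, and the service's actual storage layer may differ.

```python
import os
import tempfile
from typing import Callable, Dict


def analyse_upload(audio: bytes, filename: str,
                   analyse: Callable[[str], Dict[str, object]]) -> Dict[str, object]:
    """Analyse an uploaded clip, deleting the audio before returning.

    `analyse` is a hypothetical callable that takes a file path and returns
    the verdict fields (score, detection type, spectral-check results).
    """
    suffix = os.path.splitext(filename)[1]
    fd, path = tempfile.mkstemp(suffix=suffix)
    try:
        with os.fdopen(fd, "wb") as fh:
            fh.write(audio)
        verdict = analyse(path)
    finally:
        # Runs before the function returns, even if analysis raised.
        os.remove(path)
    # Keep only metadata and the verdict; the audio bytes are not stored.
    return {
        "filename": filename,
        "format": suffix.lstrip(".").lower(),
        "size": len(audio),
        **verdict,
    }
```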
ScamShield AI Pvt Ltd, Ahmedabad, Gujarat, India
ScamShield AI Ltd, London, UK (Company No. 17092415)
© 2026 ScamShield AI · Made in India