DPDP Compliance

ScamShield AI is designed with India's Digital Personal Data Protection (DPDP) Act, 2023 in mind. This section summarises what is stored, what is not, how data is pseudonymised, and how audio is handled, based on the product's actual behaviour.

What is stored vs. not stored

The rule: detection metadata is retained and message content is not, with one deliberate exception (the Universal Scanner). The table below gives the precise picture for each pipeline:

Pipeline	Content stored?	What is retained
Email analysis (Threat)	No body, no raw subject/sender	A SHA-256 hash of the sender domain and the subject length only
Gmail scan (Inbox message)	No body	Hashed Gmail message-ID, hashed sender domain, and a ≤200-char subject preview
Voice	No audio	Verdict + metadata (filename, format, size, score, detection type, spectral checks)
Universal Scanner (text/URL/UPI)	Yes — full submitted input is stored	The raw text/URL/UPI you submitted, plus the result
Universal Scanner (image)	Image + OCR text stored	The screenshot and the extracted OCR text

Be precise about the exception: the email, Gmail, and voice pipelines store metadata only (for Gmail, that includes a subject preview of at most 200 characters). The Universal Scanner stores the content you submit (raw_text for text/URL/UPI; the image and its OCR text for screenshots). This is not a truncated preview: the full submitted input is persisted for the scan record. If you paste sensitive content into the scanner, it is stored. Plan your usage (and any customer-facing privacy copy) accordingly.
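To make the email analysis row of the table concrete, here is a minimal sketch of reducing a message to the two retained values: a SHA-256 hash of the sender domain and the subject length. The function name, the dictionary keys, and the lowercasing of the domain before hashing are assumptions for illustration, not the product's actual schema.

```python
import hashlib
from email.utils import parseaddr


def email_retained_metadata(sender: str, subject: str) -> dict:
    """Reduce an email to a hashed sender domain and the subject length.

    Hypothetical helper: field names and domain lowercasing are assumptions.
    """
    # parseaddr splits "Alice <alice@example.com>" into (name, address).
    _, address = parseaddr(sender)
    # Keep only the part after the last "@"; the local part is discarded.
    domain = address.rpartition("@")[2].strip().lower()
    return {
        "sender_domain_sha256": hashlib.sha256(domain.encode("utf-8")).hexdigest(),
        "subject_length": len(subject),
    }


record = email_retained_metadata("Alice <alice@Example.com>", "Your KYC is expiring")
```

Neither the raw sender nor the subject text appears in the returned record; only the digest and an integer length do.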

Pseudonymisation

Where identifiers must cross the tenant boundary (to power the shared threat-intelligence network), they are hashed, never stored in the clear:

The shared threat-intel graph is DPDP-safe by design: it exposes only aggregate counts (total sightings, distinct-organisation count) and never reveals which company reported an indicator. Its scope is UPI IDs and domains only — phone numbers and bank accounts are explicitly out of scope.
Indicators are normalised and keyed by a SHA-256 hash (sha256("{type}:{normalized_value}")). The per-company link is internal bookkeeping and is never surfaced.
Elsewhere: email sender domains are hashed, Gmail message-IDs and sender domains are hashed, IP addresses are hashed and truncated, and feedback training records keep only a short subject snippet plus the sender domain — never PII.
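The indicator key described above, sha256("{type}:{normalized_value}"), can be sketched as follows. The normalisation rules (trim, lowercase, drop a trailing dot on domains) and the type labels "upi" and "domain" are assumptions; the source only states that indicators are normalised and that the scope is UPI IDs and domains.

```python
import hashlib

# Assumed type labels; the source only says the scope is UPI IDs and domains.
IN_SCOPE_TYPES = {"upi", "domain"}


def normalize_indicator(indicator_type: str, value: str) -> str:
    """Assumed normalisation: trim whitespace and lowercase the value."""
    cleaned = value.strip().lower()
    if indicator_type == "domain":
        # Treat "example.com." and "example.com" as the same domain.
        cleaned = cleaned.rstrip(".")
    return cleaned


def indicator_key(indicator_type: str, value: str) -> str:
    """Return the SHA-256 hex digest of "{type}:{normalized_value}"."""
    if indicator_type not in IN_SCOPE_TYPES:
        # Phone numbers and bank accounts are out of scope for the shared graph.
        raise ValueError(f"indicator type out of scope: {indicator_type}")
    normalized = normalize_indicator(indicator_type, value)
    return hashlib.sha256(f"{indicator_type}:{normalized}".encode("utf-8")).hexdigest()


key = indicator_key("domain", " Example.COM. ")
```

Because the type is part of the hashed string, a UPI ID and a domain with the same text produce different keys, and differently formatted reports of the same indicator collapse onto one key.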

Audio handling

DPDP note: Uploaded voice audio is deleted immediately after analysis — synchronously, within the same request, before the response is returned. There is no audio file left on disk waiting for a cleanup job. (Some internal documentation phrases this as "within 5 minutes"; the implemented behaviour is stricter and immediate.)
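A minimal sketch of the delete-before-response pattern described above, assuming the upload is written to a temporary file for analysis. The function names and the placeholder analyser are hypothetical; the point is the try/finally, which removes the file before the result reaches the caller, even if analysis raises.

```python
import os
import tempfile
from typing import Callable


def analyse_audio(path: str) -> dict:
    """Hypothetical stand-in for the real detector; returns metadata only."""
    return {"size_bytes": os.path.getsize(path), "verdict": "unknown"}


def handle_voice_upload(
    audio_bytes: bytes,
    suffix: str = ".wav",
    analyse: Callable[[str], dict] = analyse_audio,
) -> dict:
    """Write the upload to disk, analyse it, and delete it in the same call."""
    fd, path = tempfile.mkstemp(suffix=suffix)
    try:
        with os.fdopen(fd, "wb") as handle:
            handle.write(audio_bytes)
        return analyse(path)
    finally:
        # Runs before the return value reaches the caller, and also on errors,
        # so no audio file is left waiting for a cleanup job.
        if os.path.exists(path):
            os.remove(path)


result = handle_voice_upload(b"\x00\x01\x02\x03")
```

Only the returned metadata outlives the request; the temporary file no longer exists by the time result is available.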

Your rights and controls

Audit log — security and account events (logins, MFA changes, integrations connected/disconnected, data export, data deletion, API-key actions) are recorded with timestamp, IP, and user agent.
Compliance export — Professional/active companies can export a DPDP compliance report (PDF) for a date range. It includes category counts and a high-severity table, and attests *"Email body content stored: ZERO — DPDP compliant."*
Deletion requests — a DPDP erasure request is tracked with a scheduled deletion time and a grace period (pending → processing → completed, with the option to cancel during the grace window).
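The deletion-request lifecycle above (pending → processing → completed, cancellable during the grace window) can be modelled as a small state machine. This is a sketch under assumptions: the class and method names, the "cancelled" status label, and the 7-day grace period in the usage line are illustrative and not taken from the product.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta, timezone
from enum import Enum


class Status(Enum):
    PENDING = "pending"
    PROCESSING = "processing"
    COMPLETED = "completed"
    CANCELLED = "cancelled"  # assumed label for a request cancelled in the grace window


@dataclass
class DeletionRequest:
    requested_at: datetime
    grace_period: timedelta
    status: Status = Status.PENDING

    @property
    def scheduled_deletion_at(self) -> datetime:
        """Deletion is scheduled for the end of the grace period."""
        return self.requested_at + self.grace_period

    def cancel(self, now: datetime) -> None:
        """Cancel only while still pending and inside the grace window."""
        if self.status is not Status.PENDING or now >= self.scheduled_deletion_at:
            raise ValueError("request can no longer be cancelled")
        self.status = Status.CANCELLED

    def start_processing(self, now: datetime) -> None:
        """Move to processing once the grace window has elapsed."""
        if self.status is not Status.PENDING or now < self.scheduled_deletion_at:
            raise ValueError("request is not ready for processing")
        self.status = Status.PROCESSING

    def complete(self) -> None:
        if self.status is not Status.PROCESSING:
            raise ValueError("only a processing request can complete")
        self.status = Status.COMPLETED


# Hypothetical 7-day grace period, chosen only for the example.
request = DeletionRequest(datetime(2024, 1, 1, tzinfo=timezone.utc), timedelta(days=7))
```

Passing the current time in explicitly keeps the transitions easy to test and makes the grace-window check visible at each call site.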

Verify before you publish: these docs describe ScamShield AI's *technical* data handling as implemented. They are not legal advice and are not a substitute for your own DPDP notice, consent records, and data-processing agreements. Have your compliance/legal owner review any customer-facing privacy statements — especially the Universal Scanner content-retention point above.