Olympus Docs
CookbookDefensive security

Identity graph for fraud detection

Connect identities across signals to spot abuse

A single user might create multiple accounts (anti-spam evasion, multi-account abuse, sock-puppets). Identity graph connects accounts that share signals.

Data model

CREATE TABLE identity_signals (
  identity_id UUID,
  signal_type TEXT,    -- "ip", "device_fingerprint", "phone", "address", "stripe_customer"
  signal_value TEXT,
  observed_at TIMESTAMPTZ DEFAULT NOW()
);
CREATE INDEX ix_signals ON identity_signals(signal_type, signal_value);

-- Find linked identities
SELECT DISTINCT s1.identity_id, s2.identity_id
FROM identity_signals s1
JOIN identity_signals s2 ON s1.signal_value = s2.signal_value
                         AND s1.signal_type = s2.signal_type
                         AND s1.identity_id != s2.identity_id;

Two identities share a signal → connected.

Signals to capture

  • IP address (changes but stable for periods).
  • Device fingerprint.
  • Phone number (rare same-phone reuse).
  • Credit card token / Stripe customer (very strong link).
  • Address (delivery, billing).
  • User agent (weak; many share).
  • Email domain.

Strength of signals

SignalStrengthReuse rate
EmailIdentifier(not a link, that's the user)
Credit cardHigh1-2% legitimate reuse
PhoneHigh< 5% legitimate (family)
Device fingerprintMedium5-10% (shared computers)
IPLow20-30% (shared NAT)
User agentVery lowmany share

Score per signal:

const weights = {
  credit_card: 0.9,
  phone: 0.8,
  device: 0.6,
  ip: 0.3,
  user_agent: 0.1,
};

Sum scores between two identities = link strength.

Graph queries

"Find identities likely the same as X":

WITH linked AS (
  SELECT 
    s2.identity_id AS related_id,
    SUM(weights[s1.signal_type]) AS strength
  FROM identity_signals s1
  JOIN identity_signals s2 ON s1.signal_value = s2.signal_value
                           AND s1.signal_type = s2.signal_type
                           AND s1.identity_id != s2.identity_id
  WHERE s1.identity_id = $X
  GROUP BY s2.identity_id
)
SELECT * FROM linked WHERE strength > 0.5
ORDER BY strength DESC
LIMIT 20;

Threshold 0.5: high confidence same person.

Use cases

Multi-account abuse

User created 10 accounts to abuse free-tier. Signals:

  • Same credit card across 10 accounts.
  • Same device fingerprint across 10 accounts.

Detection:

SELECT identity_id, COUNT(DISTINCT related_id) FROM ... GROUP BY 1 HAVING COUNT > 3;

Flag, suspend, investigate.

Account takeover

Existing user's account suddenly has new device/IP. Cross-reference: does the new device match any other identity's known device?

If yes: known bad actor's device. Strong evidence of ATO.

Sock puppets (forums, reviews)

User has multiple accounts to post fake content. Detect by signal cluster.

SELECT graph.cluster_id, COUNT(*) AS member_count
FROM identity_graph
GROUP BY 1
HAVING COUNT(*) > 5;

Clusters of 5+ identities sharing signals = suspect.

Privacy

The identity graph is a powerful tool, and a privacy concern. Document:

  • What signals you collect.
  • Why.
  • How long.

Don't sell the data. Don't share without legal basis.

Storage

Signals can be voluminous. For 1M users with 10 signals each = 10M rows. Indexable, manageable.

For long-term: aggregate after 90 days (frequency counts, no per-event records).

Manual review

Don't auto-ban based on graph alone. False positives:

  • Family share IPs.
  • Shared device (library, kiosk).
  • Same card across accounts (parent paying for child).

Use graph as INPUT to manual review queue.

Combine with anomaly

Graph + anomaly + behavior = stronger signal.

Graph alone: suspect.
Graph + behavior anomaly: stronger suspect.
Graph + anomaly + new device + bad IP: act.

Weighted score combines.

Cluster IDs

ALTER TABLE identities ADD COLUMN cluster_id UUID;

-- Background job: assign cluster IDs based on link strength
UPDATE identities SET cluster_id = ... WHERE id IN (...);

Membership in cluster useful for grouping investigations.

Visualizations

For investigators, graph viz:

// Build edges
const links = await db`SELECT id1, id2, strength FROM identity_links`;
// Render with d3 or similar

Manual inspection of who-is-connected-to-who. Detective work.

Tools

For complex graphs:

  • Neo4j (graph database).
  • TigerGraph.
  • Self-hosted with Postgres + pg_graph (sufficient for most).

For Olympus scale: Postgres is fine.

When you don't need this

For B2C apps with low abuse / low fraud, identity graph is overkill. Worth it when:

  • Free tier with valuable resources.
  • Marketplace fraud risk.
  • Multi-account incentive (referral programs).
  • Regulated industries.

On this page