Deduplication isn’t one-size-fits-all. The right method depends on your data quality and what “duplicate” means in your context.
Option 1: Full-row deduplication
Full-row deduplication treats two rows as duplicates only if every column matches.
Use it when:
- You want to remove exact duplicates
- You’re merging exports from the same system
- You trust that identical records truly are duplicates
Downside:
- If one column differs (spacing, casing, timestamp), the rows won’t be considered duplicates.
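To make the trade-off concrete, here is a minimal sketch of full-row deduplication in Python. It assumes rows are lists of strings; the function name `dedupe_full_rows` and the sample data are hypothetical and are not taken from any particular tool.

```python
def dedupe_full_rows(rows):
    """Keep the first occurrence of each row; two rows match only if every column is equal."""
    seen = set()
    unique = []
    for row in rows:
        fingerprint = tuple(row)  # the whole row, made hashable
        if fingerprint not in seen:
            seen.add(fingerprint)
            unique.append(row)
    return unique


# Hypothetical sample data: name, email, signup date
rows = [
    ["Ann", "ann@example.com", "2024-01-05"],
    ["Ann", "ann@example.com", "2024-01-05"],  # exact duplicate: removed
    ["Ann", "Ann@Example.com", "2024-01-05"],  # casing differs: kept
]

full_row_result = dedupe_full_rows(rows)
print(len(full_row_result))  # 2
```

The third row survives because its email differs in casing, which is exactly the downside described above.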
Option 2: Key-based deduplication
Key-based deduplication treats two rows as duplicates when they share the same value in a single column (the key), such as:
- email
- customer_id
- order_id
Use it when:
- You know a stable identifier exists
- You’re merging multiple sources
- You want one record per key
Downside:
- If the key is missing or inconsistently formatted, distinct records can be merged and true duplicates can slip through.
- You must decide which row “wins” (first seen is a common strategy).
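A minimal sketch of key-based deduplication with a "first seen wins" rule might look like the following. It assumes rows are dictionaries; the function name `dedupe_by_key` and the sample data are hypothetical. Keeping rows whose key is missing, rather than collapsing them into one, is an assumption made for this example and may not suit every dataset.

```python
def dedupe_by_key(rows, key):
    """Keep one row per key value; the first row seen for a key wins."""
    seen = set()
    kept = []
    for row in rows:
        value = row.get(key)
        if value is None or value == "":
            # Assumption: rows with no key are kept as-is instead of
            # being treated as duplicates of each other.
            kept.append(row)
            continue
        if value not in seen:
            seen.add(value)
            kept.append(row)
    return kept


# Hypothetical sample data
orders = [
    {"order_id": "A1", "total": "10.00"},
    {"order_id": "A2", "total": "25.00"},
    {"order_id": "A1", "total": "12.00"},  # same key: dropped, first seen wins
    {"order_id": "", "total": "7.50"},     # missing key: kept
]

key_result = dedupe_by_key(orders, "order_id")
print([row["total"] for row in key_result])  # ['10.00', '25.00', '7.50']
```

Swapping "first seen" for "last seen" or "most complete row" changes only which row is stored for each key; the one-record-per-key guarantee stays the same.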
Best practices
- Trim and normalize key columns before deduping (e.g., email to lowercase).
- Validate how many rows were removed.
- Export and inspect a preview for sanity.
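These practices can be sketched together in a few lines of Python. The helper names (`normalize_key`, `dedupe_normalized`) and the sample data are hypothetical, and trimming plus lowercasing is only one possible normalization; other keys may need different rules.

```python
def normalize_key(value):
    """Trim surrounding whitespace and lowercase the key before comparing."""
    return value.strip().lower()


def dedupe_normalized(rows, key):
    """Dedupe on the normalized key (first seen wins) and report how many rows were removed."""
    seen = set()
    kept = []
    for row in rows:
        normalized = normalize_key(row[key])
        if normalized not in seen:
            seen.add(normalized)
            kept.append(row)
    removed = len(rows) - len(kept)
    return kept, removed


# Hypothetical sample data
contacts = [
    {"email": "ann@example.com", "name": "Ann"},
    {"email": " Ann@Example.com ", "name": "Ann B."},  # same key after normalizing
    {"email": "bob@example.com", "name": "Bob"},
]

kept_rows, removed_count = dedupe_normalized(contacts, "email")
print(f"removed {removed_count} of {len(contacts)} rows")  # removed 1 of 3 rows

# Preview the first few surviving rows as a sanity check
for row in kept_rows[:5]:
    print(row)
```

If the removed count is far higher or lower than expected, that is usually a sign the key or the normalization needs another look before exporting.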
Oriah Sheet supports both modes: full-row comparison or dedup by a selected column key.