How Text Diff Works
4 min read · Developer tools
What is a diff?
A diff (short for difference) is a representation of the changes between two versions of a text. It shows what was added, removed, or left unchanged — without repeating the parts that stayed the same.
Every time you run git diff, open a pull request, or review a code change, you are looking at a diff. The diff command-line tool has been part of Unix since 1974.
How to read a unified diff
The most common format is the unified diff, used by git:
--- a/app.js
+++ b/app.js
@@ -1,4 +1,5 @@
 function greet(name) {
-  return "Hello " + name;
+  const greeting = `Hello, ${name}!`;
+  return greeting;
 }
 module.exports = { greet };
--- and +++: the two files being compared (before and after)
@@ -1,4 +1,5 @@: the hunk header. The old file's part of the hunk starts at line 1 and spans 4 lines; the new file's part starts at line 1 and spans 5 lines. Both counts include context lines
Lines starting with -: removed from the original
Lines starting with +: added in the new version
Lines starting with a space: unchanged context lines (git shows up to 3 on each side of a change by default, fewer near the start or end of a file, as here)
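To make the format concrete, here is a minimal Python sketch that builds a unified diff with the standard library's difflib module and then reads the numbers back out of a hunk header. The two in-memory versions of app.js and the helper name parse_hunk_header are assumptions made for illustration; git has its own diff implementation, so treat this as a way to explore the format rather than a description of what git does internally.

```python
import difflib
import re

# Two hypothetical versions of a small file, one list entry per line.
old_lines = [
    "function greet(name) {",
    '  return "Hello " + name;',
    "}",
    "module.exports = { greet };",
]
new_lines = [
    "function greet(name) {",
    "  const greeting = `Hello, ${name}!`;",
    "  return greeting;",
    "}",
    "module.exports = { greet };",
]

# n=3 requests up to three context lines around each change.
# lineterm="" is used because the input lines carry no trailing newline.
diff_output = list(
    difflib.unified_diff(
        old_lines,
        new_lines,
        fromfile="a/app.js",
        tofile="b/app.js",
        n=3,
        lineterm="",
    )
)

HUNK_HEADER = re.compile(r"^@@ -(\d+)(?:,(\d+))? \+(\d+)(?:,(\d+))? @@")


def parse_hunk_header(line):
    """Return (old_start, old_count, new_start, new_count) for a hunk header."""
    match = HUNK_HEADER.match(line)
    if match is None:
        raise ValueError(f"not a hunk header: {line!r}")
    old_start, old_count, new_start, new_count = match.groups()
    # When a count is omitted, the range covers exactly one line.
    return (
        int(old_start),
        int(old_count or 1),
        int(new_start),
        int(new_count or 1),
    )


if __name__ == "__main__":
    for line in diff_output:
        print(line)
    print(parse_hunk_header("@@ -12,7 +12,8 @@"))
```

The first two output lines are the file headers, the third is the hunk header, and every remaining line starts with a space, a minus, or a plus, exactly as described in the list above.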
How the algorithm works
Most diff tools are built around the longest common subsequence (LCS) problem. Given two sequences of lines, the tool finds the longest sequence of lines that appears in both files in the same order, though not necessarily next to each other. Those lines are reported as unchanged. Every other line is either a deletion (present only in the old file) or an insertion (present only in the new file).
For example, diffing these two lists:
Original
apple banana cherry date
Modified
apple blueberry cherry date elderberry
The LCS is apple, cherry, date. The diff shows: banana removed, blueberry added, elderberry added.
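The same example can be worked through in code. The sketch below uses the textbook dynamic-programming table for LCS and then walks the table to emit one operation per line. The function names lcs_table and lcs_diff are made up for this illustration, and production diff tools use faster approaches than this quadratic table.

```python
def lcs_table(a, b):
    """table[i][j] is the LCS length of a[i:] and b[j:]."""
    table = [[0] * (len(b) + 1) for _ in range(len(a) + 1)]
    for i in range(len(a) - 1, -1, -1):
        for j in range(len(b) - 1, -1, -1):
            if a[i] == b[j]:
                table[i][j] = table[i + 1][j + 1] + 1
            else:
                table[i][j] = max(table[i + 1][j], table[i][j + 1])
    return table


def lcs_diff(a, b):
    """Return a list of (marker, line) pairs: ' ' kept, '-' removed, '+' added."""
    table = lcs_table(a, b)
    ops = []
    i = j = 0
    while i < len(a) and j < len(b):
        if a[i] == b[j]:
            ops.append((" ", a[i]))
            i += 1
            j += 1
        elif table[i + 1][j] >= table[i][j + 1]:
            # Dropping a[i] loses nothing from the best achievable LCS.
            ops.append(("-", a[i]))
            i += 1
        else:
            ops.append(("+", b[j]))
            j += 1
    # Whatever is left on either side is a pure deletion or insertion.
    ops.extend(("-", line) for line in a[i:])
    ops.extend(("+", line) for line in b[j:])
    return ops


original = ["apple", "banana", "cherry", "date"]
modified = ["apple", "blueberry", "cherry", "date", "elderberry"]

if __name__ == "__main__":
    for marker, line in lcs_diff(original, modified):
        print(marker + line)
```

The lines marked with a space form the LCS; the lines marked - and + are the deletions and insertions listed above.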
Myers diff algorithm
The algorithm used by git (and most modern tools) is the Myers diff algorithm, published in 1986. It finds the shortest edit script — the minimum number of insertions and deletions needed to transform one file into the other.
Myers is fast enough for large files and usually produces compact, human-readable diffs. Git also ships alternative algorithms, patience and histogram, which you can select with git diff --diff-algorithm=histogram or the diff.algorithm config setting. They try to avoid anchoring the match on common boilerplate lines (like empty lines or closing braces), which can otherwise produce confusing diffs.
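For readers curious about the shortest edit script idea, here is a compact Python sketch of the greedy search from the Myers paper. It only returns the number of edits, not the script itself, and the function name shortest_edit_length is an assumption for this example. It is a teaching sketch, not git's implementation, which adds further optimizations.

```python
def shortest_edit_length(a, b):
    """Return the minimum number of line insertions plus deletions
    needed to turn sequence a into sequence b."""
    n, m = len(a), len(b)
    # furthest[k] is the largest x reached so far on diagonal k, where k = x - y.
    # x counts lines consumed from a, y counts lines consumed from b.
    furthest = {1: 0}
    for d in range(n + m + 1):
        for k in range(-d, d + 1, 2):
            if k == -d or (k != d and furthest[k - 1] < furthest[k + 1]):
                x = furthest[k + 1]      # move down: insert a line from b
            else:
                x = furthest[k - 1] + 1  # move right: delete a line from a
            y = x - k
            # Follow the "snake": matching lines cost no edits.
            while x < n and y < m and a[x] == b[y]:
                x += 1
                y += 1
            furthest[k] = x
            if x >= n and y >= m:
                return d
    return n + m


if __name__ == "__main__":
    old = ["apple", "banana", "cherry", "date"]
    new = ["apple", "blueberry", "cherry", "date", "elderberry"]
    print(shortest_edit_length(old, new))
```

The outer loop tries edit counts d = 0, 1, 2, ... in order, so the first d that reaches the end of both sequences is the length of a shortest edit script.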
Diff in git
Common git diff commands:
# Changes in working directory (not staged)
git diff
# Changes that are staged (ready to commit)
git diff --staged
# Changes between two commits
git diff abc123 def456
# Changes between a branch and main
git diff main..feature/my-branch
# Word-level diff (highlights changed words, not lines)
git diff --word-diff
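To show what a word-level diff means, the sketch below splits two lines into words and compares them with difflib.SequenceMatcher from the Python standard library. It imitates the [-removed-] and {+added+} markers that git's plain word-diff mode prints, but the function name word_diff and the whitespace-only tokenization are assumptions for this example; git's own word splitting is more configurable.

```python
import difflib


def word_diff(old, new):
    """Return a one-line, word-level diff of two strings."""
    old_words = old.split()
    new_words = new.split()
    matcher = difflib.SequenceMatcher(a=old_words, b=new_words, autojunk=False)
    parts = []
    for tag, i1, i2, j1, j2 in matcher.get_opcodes():
        if tag == "equal":
            parts.append(" ".join(old_words[i1:i2]))
            continue
        # A "replace" is reported as a removal followed by an addition.
        if tag in ("delete", "replace"):
            parts.append("[-" + " ".join(old_words[i1:i2]) + "-]")
        if tag in ("insert", "replace"):
            parts.append("{+" + " ".join(new_words[j1:j2]) + "+}")
    return " ".join(parts)


if __name__ == "__main__":
    print(word_diff("the quick brown fox", "the slow brown fox"))
```

A line-level diff would report the whole line as removed and re-added; comparing words narrows the change down to the single word that differs.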
Three-way merge
When two people edit the same file in parallel, git needs to merge their changes. It does this with a three-way merge: it compares both edited versions against their common ancestor (the most recent commit both branches share, also called the merge base).
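The decision rule at the heart of a three-way comparison can be sketched in a few lines of Python. This is a deliberate simplification: it assumes all three versions have the same number of lines, so that line i means the same thing in each, whereas real merge tools first align the versions with a diff. The function name merge_aligned, the "ours" and "theirs" labels, and the sample lines are hypothetical.

```python
def merge_aligned(base, ours, theirs):
    """Merge three line-aligned versions of a file.

    Returns (merged_lines, conflict_indexes).
    Assumption: all three lists have the same length.
    """
    if not (len(base) == len(ours) == len(theirs)):
        raise ValueError("this sketch only handles line-aligned versions")
    merged = []
    conflicts = []
    for index, (b, o, t) in enumerate(zip(base, ours, theirs)):
        if o == t:
            merged.append(o)   # both sides agree (changed or not)
        elif o == b:
            merged.append(t)   # only their side changed this line
        elif t == b:
            merged.append(o)   # only our side changed this line
        else:
            # Both sides changed the same line in different ways.
            conflicts.append(index)
            merged.extend(["<<<<<<< ours", o, "=======", t, ">>>>>>> theirs"])
    return merged, conflicts


if __name__ == "__main__":
    base = ["function greet(name) {", '  return "Hello " + name;', "}"]
    ours = ["function greet(name) {", '  return "Hello, " + name;', "}"]
    theirs = ["function greet(name) {", "  return `Hi, ${name}!`;", "}"]
    merged, conflicts = merge_aligned(base, ours, theirs)
    print("\n".join(merged))
    print("conflicts at line indexes:", conflicts)
```

The common ancestor is what makes the rule work: without it, two differing lines would look the same whether one side or both sides had changed them.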
If two people edited different parts of the file, git can merge automatically. If they edited the same lines, git flags a merge conflict and asks you to resolve it manually:
<<<<<<< HEAD
return "Hello " + name;
=======
return `Hi, ${name}!`;
>>>>>>> feature/greeting
Compare text in your browser
Paste two versions of any text and see the differences highlighted side by side.
Text Diff Tool →