There’s a really cool diff class hosted by Google here:
http://code.google.com/p/google-diff-match-patch/
I’ve used it before on a few web sites, but now I need to use it within an Excel macro to compare text between two cells.
However, it is only available in JavaScript, Python, Java, and C++, not VBA.
My users are limited to Excel 2003, so a pure .NET solution wouldn't work. Translating the code to VBA manually would take too much time and make upgrading difficult.
One option I considered was to compile the JavaScript or Java source using the .NET compilers (JScript.NET or J#), use Reflector to output as VB.NET, then finally downgrade the VB.NET code manually to VBA, giving me a pure VBA solution. After having problems getting it to compile with any .NET compiler, I abandoned this path.
Assuming I could have gotten a working .NET library, I could have also used ExcelDna (http://www.codeplex.com/exceldna), an open-source Excel add-in to make .NET code integration easier.
My last idea was to host an Internet Explorer object, send it the JavaScript source, and call it. Even if I got this to work, my guess is it would be dirt-slow and messy.
UPDATE: Solution found!
I used the WSC (Windows Script Component) method described below in the accepted answer. I had to change the WSC code a little to clean up the diffs and give me back a VBA-compatible array of arrays:
function DiffFast(text1, text2)
{
var d = dmp.diff_main(text1, text2, true);
dmp.diff_cleanupSemantic(d);
var dictionary = new ActiveXObject("Scripting.Dictionary"); // VBA-compatible array
for ( var i = 0; i < d.length; i++ ) {
dictionary.add(i, JS2VBArray(d[i]));
}
return dictionary.Items();
}
function JS2VBArray(objJSArray)
{
var dictionary = new ActiveXObject("Scripting.Dictionary");
for (var i = 0; i < objJSArray.length; i++) {
dictionary.add( i, objJSArray[ i ] );
}
return dictionary.Items();
}
I registered the WSC and it worked just fine. The code in VBA for calling it is as follows:
Public Function GetDiffs(ByVal s1 As String, ByVal s2 As String) As Variant()
Dim objWMIService As Object
Dim objDiff As Object
Set objWMIService = GetObject("winmgmts:")
Set objDiff = CreateObject("Google.DiffMatchPath.WSC")
GetDiffs = objDiff.DiffFast(s1, s2)
Set objDiff = Nothing
Set objWMIService = Nothing
End Function
(I tried keeping a single global objWMIService and objDiff around so I wouldn't have to create and destroy them for each cell, but it didn't seem to make a difference in performance.)
I then wrote my main macro. It takes three parameters: a range (one column) of original values, a range of new values, and a range where the diff should dump the results. All three are assumed to have the same number of rows; I don't have any serious error checking going on here.
Public Sub DiffAndFormat(ByRef OriginalRange As Range, ByRef NewRange As Range, ByRef DeltaRange As Range)
Dim idiff As Long
Dim thisDiff() As Variant
Dim diffop As String
Dim difftext As String
difftext = ""
Dim diffs() As Variant
Dim OriginalValue As String
Dim NewValue As String
Dim DeltaCell As Range
Dim row As Long ' Long, not Integer: a worksheet can have more rows than Integer's 32,767 limit
Dim CalcMode As Integer
' These next three lines speed up the update without botching the user's preferred calculation mode later:
Application.ScreenUpdating = False
CalcMode = Application.Calculation
Application.Calculation = xlCalculationManual
For row = 1 To OriginalRange.Rows.Count
difftext = ""
OriginalValue = OriginalRange.Cells(row, 1).Value
NewValue = NewRange.Cells(row, 1).Value
Set DeltaCell = DeltaRange.Cells(row, 1)
If OriginalValue = "" And NewValue = "" Then
' Erasing the previous diffs, if any, is important:
Erase diffs
' This test is a visual shortcut for my users so it's clear when there's no change at all:
ElseIf OriginalValue = NewValue Then
difftext = "No change."
Erase diffs
Else
' Combine all the text together as the delta cell value, whether the text was identical, inserted, or deleted:
diffs = GetDiffs(OriginalValue, NewValue)
For idiff = 0 To UBound(diffs)
thisDiff = diffs(idiff)
difftext = difftext & thisDiff(1)
Next
End If
' You have to set the value before starting the formatting:
DeltaCell.Value2 = difftext
Call FormatDiff(diffs, DeltaCell)
Next
Application.ScreenUpdating = True
Application.Calculation = CalcMode
End Sub
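To make the combining step concrete, here is a minimal JavaScript sketch of what the inner loop does with the diff data. It assumes the diffs arrive as [operation, text] pairs, where -1 marks deleted text, 0 unchanged text, and 1 inserted text, which is the shape the VBA code reads with thisDiff(0) and thisDiff(1). The sample data and the name combineDiffText are made up for illustration and are not part of the macro.

```javascript
// Hypothetical sample in the [operation, text] shape used above:
// -1 = deleted, 0 = unchanged, 1 = inserted.
const sampleDiffs = [
  [0, "The "],
  [-1, "quick"],
  [1, "slow"],
  [0, " fox"],
];

// Concatenate every segment, regardless of operation, so the delta
// cell holds deleted and inserted text side by side.
function combineDiffText(diffs) {
  return diffs.map((pair) => pair[1]).join("");
}

console.log(combineDiffText(sampleDiffs)); // The quickslow fox
```

The deleted and inserted words end up next to each other in one string; the formatting pass is what lets a reader tell them apart.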
Here's the code that interprets the diffs and formats the delta cell:
Public Sub FormatDiff(ByRef diffs() As Variant, ByVal cell As Range)
Dim idiff As Long
Dim thisDiff() As Variant
Dim diffop As String
Dim difftext As String
cell.Font.Strikethrough = False
cell.Font.ColorIndex = 0
cell.Font.Bold = False
If Not diffs Then Exit Sub
Dim lastlen As Long
Dim thislen As Long
lastlen = 1
For idiff = 0 To UBound(diffs)
thisDiff = diffs(idiff)
diffop = thisDiff(0)
thislen = Len(thisDiff(1))
Select Case diffop
Case -1
cell.Characters(lastlen, thislen).Font.Strikethrough = True
cell.Characters(lastlen, thislen).Font.ColorIndex = 16 ' Dark Gray http://www.microsoft.com/technet/scriptcenter/resources/officetips/mar05/tips0329.mspx
Case 1
cell.Characters(lastlen, thislen).Font.Bold = True
cell.Characters(lastlen, thislen).Font.ColorIndex = 32 ' Blue
End Select
lastlen = lastlen + thislen
Next
End Sub
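The offset bookkeeping in FormatDiff can also be sketched outside Excel. The JavaScript below is only an illustration under the same assumption about the [operation, text] pair shape: it walks the pairs and reports the 1-based start and length that would be handed to cell.Characters for each deleted or inserted segment. The function name diffRanges and the sample data are hypothetical.

```javascript
// Returns the 1-based character ranges that need formatting,
// mirroring the lastlen/thislen bookkeeping in FormatDiff.
function diffRanges(diffs) {
  const ranges = [];
  let start = 1; // Characters() in Excel is 1-based
  for (const [op, text] of diffs) {
    if (op !== 0) {
      ranges.push({ op: op, start: start, length: text.length });
    }
    // Every segment, changed or not, occupies space in the cell text.
    start += text.length;
  }
  return ranges;
}

// Hypothetical sample data.
const exampleDiffs = [[0, "The "], [-1, "quick"], [1, "slow"], [0, " fox"]];
console.log(diffRanges(exampleDiffs));
```

For the sample, the deletion covers 5 characters starting at position 5 and the insertion covers 4 characters starting at position 10, which is why the running offset has to advance past unchanged segments too.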
There are some opportunities for optimization, but so far it's working just fine. Thanks to everyone who helped!