I wanna make a program that detect a potential duplicates with 3 severity level.
let consider my data is only in two column, but with thousands row.
data in second column delimited only with comma. data example :
Number | Material
1 | helmet,valros,42
2 | helmet,iron,knight
3 | valros,helmet,42
4 | knight,helmet
5 | valros,helmet,42
6 | plain,helmet
7 | helmet, leather
and my 3 levels is :
very high : A,B,C vs A,B,C
high : A,B,C vs B,C,A
so so : A,B,C vs A,B
so far i just can make the first level, i don't know how to do the 2nd and the 3rd level.
what I've tried.
Sub duplicates_separation()
Dim duplicate(), i As Long
Dim delrange As Range, cell As Long
Dim shtIn As Worksheet, shtOut As Worksheet
Set shtIn = ThisWorkbook.Sheets("input")
Set shtOut = ThisWorkbook.Sheets("output")
x = 2
y = 1
Set delrange = shtIn.Range("b1:b10000") 'set your range here
ReDim duplicate(0)
'search duplicates in 2nd column
For cell = 1 To delrange.Cells.Count
If Application.CountIf(delrange, delrange(cell)) > 1 Then
ReDim Preserve duplicate(i)
duplicate(i) = delrange(cell).Address
i = i + 1
End If
Next
'print duplicates
For i = UBound(duplicate) To LBound(duplicate) Step -1
shtOut.Cells(x, 1).EntireRow.Value = shtIn.Range(duplicate(i)).EntireRow.Value
End Sub
duplicates detected by the program :
3 | valros,helmet,42
5 | valros,helmet,42
what i expected:
Number | Material
1 | helmet,valros,42
3 | valros,helmet,42
5 | valros,helmet,42
4 | knight,helmet
2 | helmet,iron,knight
i have an idea for detecting duplicates lv 2, but I think it will be so complicated and make the program slow.
- turn column 2 to columns with "text to columns" command
- sort column from A to Z (alphabetically)
- concatenate the column
- do countif like done in detecting duplicates lv 1
is there a way to detect the 2nd & 3rd level duplicates?
UPDATE
Yesterday I went to friend's house to consult about this problem, but his solution is in JAVA languange.. >which i don't understand
public class ali {
static void sPrint(String[] Printed) {
for (int iC = 0; iC < Printed.length; iC++) {
System.out.print(String.valueOf(Printed[iC]) + " | ");
}
System.out.println();
}
public static void main(String Args[]) {
int defaultLength = 10;
int indexID = 0;
int indexDesc = 1;
String[] DETECTORP1 = new String[defaultLength];
String[] DETECTORP2 = new String[defaultLength];
String[] DETECTORP3 = new String[defaultLength];
String[] DETECTORP4 = new String[defaultLength];
String[][] theString = new String[5][2];
theString[0] = new String[]{"1", "A, B, C, D"};
theString[1] = new String[]{"2", "A, B, C, D"};
theString[2] = new String[]{"3", "A, B, C, D, E"};
theString[3] = new String[]{"4", "A, B, D, C, E"};
theString[4] = new String[]{"5", "A, B, D, C, E, F"};
int P1 = 0;
int P2 = 0;
int P3 = 0;
int P4 = 0;
for (int iC = 0; iC < theString.length; iC++) {
System.out.println(theString[iC][indexID] + " -> " + theString[iC][indexDesc]);
}
for (int iC = 0; iC < theString.length; iC++) {
int LEX;
String theReference[] = theString[iC][indexDesc].replace(",", ";;").split(";;");
for (int iD = 0; iD < theString.length; iD++) {
if (iC != iD) {
String theCompare[] = theString[iD][1].replace(",", ";;").split(";;");
if (theReference.length == theCompare.length) {
LEX=0;
int theLength = theReference.length;
for (int iE = 0; iE < theLength; iE++) {
if (theReference[iE].equals(theCompare[iE])) {
LEX += 1;
}
}
if (LEX == theLength) {
DETECTORP1[P1] = theString[iC][indexID] + " WITH " + theString[iD][indexID];
P1 += 1;
} else {
LEX = 0;
for (int iF = 0; iF < theReference.length; iF++) {
for (int iG = 0; iG < theCompare.length; iG++) {
if (theReference[iF].equals(theCompare[iG])) {
LEX += 1;
break;
}
}
}
if (LEX == theReference.length) {
DETECTORP2[P2] = theString[iC][indexID] + " WITH " + theString[iD][indexID];
P2 += 1;
}
}
} else {
LEX = 0;
if (theReference.length > theCompare.length) {
for (int iF = 0; iF < theReference.length; iF++) {
for (int iG = 0; iG < theCompare.length; iG++) {
if (iG == iF) {
if (theReference[iF].equals(theCompare[iF])) {
LEX += 1;
break;
}
}
}
}
if (LEX <= theReference.length && LEX >= theCompare.length) {
DETECTORP3[P3] = theString[iC][indexID] + " WITH " + theString[iD][indexID];
P3 += 1;
}
} else {
LEX =0;
for (int iF = 0; iF < theCompare.length; iF++) {
for (int iG = 0; iG < theReference.length; iG++) {
if (iG == iF) {
if (theCompare[iF].equals(theReference[iF])) {
LEX += 1;
// System.out.println(theReference[iG] + "==" + theCompare[iG]);
break;
}
}
}
}
if (LEX <= theCompare.length && LEX >= theReference.length) {
DETECTORP3[P3] = theString[iC][indexID] + " WITH " + theString[iD][indexID];
P3 += 1;
}
}
}
}
}
}
sPrint(DETECTORP1);
sPrint(DETECTORP2);
sPrint(DETECTORP3);
}
}
how to do this in VBA?
See Question&Answers more detail:
os