I have a big dataset df
(354903 rows) with two columns named df$ColumnName
and df$ColumnName.1
head(df)
CompleteName CompleteName.1
1 Lefebvre Arnaud Lefebvre Schuhl Anne
1.1 Lefebvre Arnaud Abe Lyu
1.2 Lefebvre Arnaud Abe Lyu
1.3 Lefebvre Arnaud Louvet Nicolas
1.4 Lefebvre Arnaud Muller Jean Michel
1.5 Lefebvre Arnaud De Dinechin Florent
I am trying to create labels to see whether the name is the same or not.
When I try a small subset it works [1 if they are the same, 0 if not]:
> match(df$CompleteName[1], df$CompleteName.1[1], nomatch = 0)
[1] 0
> match(df$CompleteName[1:10], df$CompleteName.1[1:10], nomatch = 0)
[1] 0 0 0 0 0 0 0 0 0 0
But as soon as I throw the complete columns, it gives me complete different values, which seem nonsense to me:
> match(df$CompleteName, df$CompleteName.1, nomatch = 0)
[1] 101 101 101 101 101 101 101 101 101 101 101 101 101 101 101 101 101 101 101 101 101 101
[23] 101 101 101 101 101 101 101 101 101 101 101 101 101 101 101 101 101 101 101 101 101 101
[45] 101 101 101 101 101 101 101 101 101 101 101 101 101 101 101 101 101 101 101 101 101 101
Should I use sapply
? I did not figured it out, I tried this with an error:
sapply(df, function(x) match(x$CompleteName, x$CompleteName.1, nomatch = 0))
Please help!!!
See Question&Answers more detail:
os 与恶龙缠斗过久,自身亦成为恶龙;凝视深渊过久,深渊将回以凝视…