Welcome to OGeek Q&A Community for programmer and developer-Open, Learning and Share
Welcome To Ask or Share your Answers For Others

C# - I want to compare two files or text boxes to find the degree of similarity between them

Sorry for the disturbance. I've removed the code and edited the post...

The real problem is that I'm trying to measure the degree of similarity (or plagiarism) between two texts or files. How can I do that? If you guide me ...

I need the code for the above algorithm to be included in my project.

Using Visual Studio 2013 ... C#

EDITED: OK, so far I've done this ...

        int i = 0;
        int j = 0;
        long lena1 = txtFile1.Text.Length;
        long lenb1 = lena1;
        long len2 = txtFile2.Text.Length;
        string str1 = txtFile1.Text;
        string str2 = txtFile2.Text;
        string str3;
        bool match = false;
        int count = 0;
        int nowords1 = 0;
        int nowords2 = 0;
        string str4;
        int k = 0;
        int m = 0;
        int nowords_match = 0;


        char[] array1 = str1.ToArray();
        char[] array2 = str2.ToArray();
        int[] loc1 = new int[1048576];
        int[] loc2 = new int[1048576];

        while (i < array1.Length)
        {
            if (array1[i] == ' ')
            {
                nowords1++;
                loc1[j] = i;
                j++;
            }

            i++;

        }

        i = j = 0;

        while (i < array2.Length)
        {

            if (array2[i] == ' ')
            {
                nowords2++;
                loc2[j] = i;
                j++;
            }

            i++;

        }

        i = j = 0;
        m = 0;

        for (k = 0; k < loc1.Length-2; k++)
        {
            str3 = str1.Substring(loc1[m], loc1[m + 1] - loc1[m]);
            match = true;

            if (match == true && count > 3)
            {
               txtPlagiarism.Text += " " + loc1[i-3] + loc1[i-2] + " " + loc1[i];
            }

            else
            {
                count = 0;
                match = false;
            }

            j = 0;
            i = 0;

            while (i < nowords2)
            {

                if (j != nowords2)
                {
                    str4 = str2.Substring(loc2[j], loc2[j + 1] - (loc2[j]));
                }

                else
                {
                    break;
                }

                if (str4.Equals(str3)) 
                {
                    nowords_match++;
                    count ++;
                }

                j++;
                i++;

            }

            m++;

        }

I'm just counting the number of matched words so that I can pick that number of words from the first file's text for the copy-case text, but I'm getting a run-time error:

**System.ArgumentOutOfRangeException was unhandled
  HResult=-2146233086
  Message=Length cannot be less than zero.
Parameter name: length
  Source=mscorlib
  ParamName=length
  StackTrace:
       at System.String.InternalSubStringWithChecks(Int32 startIndex, Int32 length, Boolean fAlwaysCopy)
   at System.String.Substring(Int32 startIndex, Int32 length)
   at Calculate_File_Checksum.Form1.btnDetectPlagiairism_Click(Object sender, EventArgs e) in c:UsersBLOOMDocumentsVisual Studio 2013App2TestCalculate_File_ChecksumCalculate_File_ChecksumForm1.cs:line 363
   at System.Windows.Forms.Control.OnClick(EventArgs e)
   at System.Windows.Forms.Button.OnClick(EventArgs e)
   at System.Windows.Forms.Button.OnMouseUp(MouseEventArgs mevent)
   at System.Windows.Forms.Control.WmMouseUp(Message& m, MouseButtons button, Int32 clicks)
   at System.Windows.Forms.Control.WndProc(Message& m)
   at System.Windows.Forms.ButtonBase.WndProc(Message& m)
   at System.Windows.Forms.Button.WndProc(Message& m)
   at System.Windows.Forms.Control.ControlNativeWindow.OnMessage(Message& m)
   at System.Windows.Forms.Control.ControlNativeWindow.WndProc(Message& m)
   at System.Windows.Forms.NativeWindow.DebuggableCallback(IntPtr hWnd, Int32 msg, IntPtr wparam, IntPtr lparam)
   at System.Windows.Forms.UnsafeNativeMethods.DispatchMessageW(MSG& msg)
   at System.Windows.Forms.Application.ComponentManager.System.Windows.Forms.UnsafeNativeMethods.IMsoComponentManager.FPushMessageLoop(IntPtr dwComponentID, Int32 reason, Int32 pvLoopData)
   at System.Windows.Forms.Application.ThreadContext.RunMessageLoopInner(Int32 reason, ApplicationContext context)
   at System.Windows.Forms.Application.ThreadContext.RunMessageLoop(Int32 reason, ApplicationContext context)
   at System.Windows.Forms.Application.Run(Form mainForm)
   at Calculate_File_Checksum.Program.Main() in c:UsersBLOOMDocumentsVisual Studio 2013App2TestCalculate_File_ChecksumCalculate_File_ChecksumProgram.cs:line 19
   at System.AppDomain._nExecuteAssembly(RuntimeAssembly assembly, String[] args)
   at System.AppDomain.ExecuteAssembly(String assemblyFile, Evidence assemblySecurity, String[] args)
   at Microsoft.VisualStudio.HostingProcess.HostProc.RunUsersAssembly()
   at System.Threading.ThreadHelper.ThreadStart_Context(Object state)
   at System.Threading.ExecutionContext.RunInternal(ExecutionContext executionContext, ContextCallback callback, Object state, Boolean preserveSyncCtx)
   at System.Threading.ExecutionContext.Run(ExecutionContext executionContext, ContextCallback callback, Object state, Boolean preserveSyncCtx)
   at System.Threading.ExecutionContext.Run(ExecutionContext executionContext, ContextCallback callback, Object state)
   at System.Threading.ThreadHelper.ThreadStart()
  InnerException:** 

I don't understand why this is happening, because I've given it the correct values ... Can anyone please help?



1 Reply


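There are numerous ways to compare the similarity of strings. One is the Levenshtein distance, the minimum number of single-character insertions, deletions, and substitutions needed to turn one string into the other; Martin put together an algorithm for it.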
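
Martin's code is not reproduced in this thread, so what follows is a separate, minimal sketch of the standard dynamic-programming Levenshtein distance, not his implementation. The class name `TextSimilarity` and the `Similarity` helper are assumptions made up for illustration. The helper normalises the distance by the longer string's length so the result reads as a score between 0 and 1. The code avoids C# 6 features so it should compile under Visual Studio 2013.

```csharp
using System;

// Hypothetical helper class; the names are illustrative, not from the original answer.
public static class TextSimilarity
{
    // Minimum number of single-character insertions, deletions,
    // and substitutions needed to turn a into b.
    public static int LevenshteinDistance(string a, string b)
    {
        if (a == null) throw new ArgumentNullException("a");
        if (b == null) throw new ArgumentNullException("b");
        if (a.Length == 0) return b.Length;
        if (b.Length == 0) return a.Length;

        // Only two rows of the DP table are kept in memory.
        // previous[j] = distance between the first i-1 chars of a and the first j chars of b.
        int[] previous = new int[b.Length + 1];
        int[] current = new int[b.Length + 1];
        for (int j = 0; j <= b.Length; j++) previous[j] = j;

        for (int i = 1; i <= a.Length; i++)
        {
            current[0] = i; // deleting i characters from a yields the empty prefix of b
            for (int j = 1; j <= b.Length; j++)
            {
                int cost = a[i - 1] == b[j - 1] ? 0 : 1;
                current[j] = Math.Min(
                    Math.Min(current[j - 1] + 1,   // insertion
                             previous[j] + 1),     // deletion
                    previous[j - 1] + cost);       // substitution or match
            }
            int[] swap = previous;
            previous = current;
            current = swap;
        }
        return previous[b.Length];
    }

    // Assumed normalisation: 1.0 for identical strings, 0.0 when nothing lines up.
    public static double Similarity(string a, string b)
    {
        int longest = Math.Max(a.Length, b.Length);
        if (longest == 0) return 1.0;
        return 1.0 - (double)LevenshteinDistance(a, b) / longest;
    }
}
```

With the text boxes from the question this could be called as `TextSimilarity.Similarity(txtFile1.Text, txtFile2.Text)`. The running time grows with the product of the two lengths, so for large files it may be more practical to compare line by line or word by word.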


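As for the exception in the question itself, a likely cause is the index bookkeeping. `loc1` and `loc2` are allocated with 1048576 entries, but only the first few are filled with space positions; the rest stay 0. The outer loop runs up to `loc1.Length - 2`, so once `m` reaches the last recorded space, `loc1[m + 1] - loc1[m]` is 0 minus a positive index, which is a negative length, and `Substring` rejects it. The inner loop has the same problem with `loc2[j + 1]` at the last word.

If the goal is to count matching words, one way to sidestep the manual indices is to let `String.Split` do the splitting. The sketch below is an assumption about what would suit the project, not a drop-in fix; `WordOverlap` and `SharedWordRatio` are hypothetical names.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical helper; names are illustrative only.
public static class WordOverlap
{
    private static readonly char[] Separators = { ' ', '\t', '\r', '\n' };

    // Splits on whitespace and drops empty entries, so repeated spaces are harmless.
    public static string[] Words(string text)
    {
        return text.Split(Separators, StringSplitOptions.RemoveEmptyEntries);
    }

    // Fraction of the distinct words in first that also occur in second,
    // compared case-insensitively. Returns 0.0 when first has no words.
    public static double SharedWordRatio(string first, string second)
    {
        var firstWords = new HashSet<string>(Words(first), StringComparer.OrdinalIgnoreCase);
        if (firstWords.Count == 0) return 0.0;

        var secondWords = new HashSet<string>(Words(second), StringComparer.OrdinalIgnoreCase);

        int shared = 0;
        foreach (string word in firstWords)
        {
            if (secondWords.Contains(word)) shared++;
        }
        return (double)shared / firstWords.Count;
    }
}
```

Punctuation is not stripped here, so "fox" and "fox," count as different words; trimming punctuation before comparing would be a reasonable next step.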