Welcome to OGeek Q&A Community for programmer and developer-Open, Learning and Share
Welcome To Ask or Share your Answers For Others

Categories

0 votes
904 views
in Technique[技术] by (71.8m points)

algorithm - Given the runtime data, how to know if sorting program uses bubble sort or insertion sort?

I measured a sorting program / algorithm and based on the runtime data, I have narrowed it down to two sorting algorithms - bubble sort and insertion sort.

Is there a way to know for sure which one it is? Without knowing the code of course.

They both have the same time complexity and I'm out of ideas.

Time complexity data:

  • Sorted: O(n) Time taken for 1000 numbers = 0.0047s
  • Random unsorted: O(n^2) Time taken for 1000 numbers = 0.021s
  • Descending order: O(n^2) Time taken for 1000 numbers = 0.04s

Thanks in advance!

See Question&Answers more detail:os

与恶龙缠斗过久,自身亦成为恶龙;凝视深渊过久,深渊将回以凝视…
Welcome To Ask or Share your Answers For Others

1 Reply

0 votes
by (71.8m points)
  1. Your 1000 elements for sort is too low

    the measured times are too low to represent a valid measurement (as majority of the time might not be used by the sort itself but initialization of window, openning files etc ...).

    you need times at least 100ms or more (1 sec is ideal).

  2. if you have access to data that is being sorted

    You can introduce data set that will be challenging for each type of sort (and from times infer used algo)... so for example bubble sort is slowest for sorted array in reverse order ... so pass sorted data ascending and descending and random and compare times. let call the times tasc,tdes,trnd and assuming ascending sort then if bubble sort is involved it should be:

    tasc O(n) < trnd  < tdes O(n^2)
    

    so:

    tasc*n == tdes + margin_of error
    

    so just test tdes/tasc is close to n ... with some margin for error ...

    so you just need to create a sample data that will be hard for specific type of sort and not for the others ... and from the times detect if it is the case or not until you find algo used.

    Here some data (all times are in [ms]) I tested on mine bubble sort and asc ordered data:

       n     tasc    tdesc    tasc*n
    1000  0.00321  2.96147  3.205750
    2000  0.00609 11.76799 12.181855
    4000  0.01186 45.58834 47.445111
    

    to be more clear if we have runtime for complexity O(n)

    t(O(n)) = c*n
    

    to convert to runtime with complexity O(n^2) (assuming the same constant time c):

    t(O(n^2)) = c*n*n = t(O(n)) * n
    

    This way you can compare times with different complexities you just need to convert all measured time into single common complexity...

  3. if you can chose sorted data size

    then as it was mentioned in comments you can infer the growth rate of the times with increasing n (doubling) from that you can estimate complexity and from that you can tell which algo was used.

    So lets assume measured times from #2 then for O(n) the constant time c should be the same so for tasc (O(n)):

       n     tasc    c=tasc/n
    1000  0.00321 0.000003210
    2000  0.00609 0.000003045 
    4000  0.01186 0.000002965 
    

    and for tdesc (O(n^2)):

       n     tdesc        tdesc/n^2
    1000   2.96147 0.00000296147000
    2000  11.76799 0.00000294199750
    4000  45.58834 0.00000284927125
    

    as you can see the c is more or less the same for both times tasc,tdesc which means they comply their estimated complexities O(n),O(n^2)

However without knowing what the tested App does is hard to be sure as sorting might be preceded by processing ... for example data might be scanned to detect the form (sorted, random, almost sorted ...) which is doable in O(n) and with the result along with data size it might chose which sorting algo to use ... So your measurements might measure different routines invalidating results ...

[edit1] I had an insane idea of detecting the complexity automaticaly

Simply by testing if constant time constant is more or less the same between all meassured times versus their corresponding n... Here simple C++/VCL code:

//$$---- Form CPP ----
//---------------------------------------------------------------------------
#include <vcl.h>
#include <math.h>
#pragma hdrstop
#include "Unit1.h"
//---------------------------------------------------------------------------
#pragma package(smart_init)
#pragma resource "*.dfm"
TForm1 *Form1;
//---------------------------------------------------------------------------
double factorial[]= // n[-],t[ms]
    {
     11,0.008,
     12,0.012,
     13,0.013,
     14,0.014,
     15,0.016,
     16,0.014,
     17,0.015,
     18,0.017,
     19,0.019,
     20,0.016,
     21,0.017,
     22,0.019,
     23,0.021,
     24,0.023,
     25,0.025,
     26,0.027,
     27,0.029,
     28,0.032,
     29,0.034,
     30,0.037,
     31,0.039,
     32,0.034,
     33,0.037,
     34,0.039,
     35,0.041,
     36,0.039,
     37,0.041,
     38,0.044,
     39,0.046,
     40,0.041,
     41,0.044,
     42,0.046,
     43,0.049,
     44,0.048,
     45,0.050,
     46,0.054,
     47,0.056,
     48,0.056,
     49,0.060,
     50,0.063,
     51,0.066,
     52,0.065,
     53,0.069,
     54,0.072,
     55,0.076,
     56,0.077,
     57,0.162,
     58,0.095,
     59,0.093,
     60,0.089,
     61,0.093,
     62,0.098,
     63,0.096,
     64,0.090,
     65,0.100,
     66,0.104,
     67,0.111,
     68,0.100,
     69,0.121,
     70,0.109,
     71,0.119,
     72,0.104,
     73,0.124,
     74,0.113,
     75,0.118,
     76,0.118,
     77,0.123,
     78,0.129,
     79,0.133,
     80,0.121,
     81,0.119,
     82,0.131,
     83,0.150,
     84,0.141,
     85,0.148,
     86,0.154,
     87,0.163,
     88,0.211,
     89,0.151,
     90,0.157,
     91,0.166,
     92,0.161,
     93,0.169,
     94,0.173,
     95,0.188,
     96,0.181,
     97,0.187,
     98,0.194,
     99,0.201,
    100,0.185,
    101,0.191,
    102,0.202,
    103,0.207,
    104,0.242,
    105,0.210,
    106,0.215,
    107,0.221,
    108,0.217,
    109,0.226,
    110,0.232,
    111,0.240,
    112,0.213,
    113,0.231,
    114,0.240,
    115,0.252,
    116,0.248,
    117,0.598,
    118,0.259,
    119,0.261,
    120,0.254,
    121,0.263,
    122,0.270,
    123,0.281,
    124,0.290,
    125,0.322,
    126,0.303,
    127,0.313,
    128,0.307,
      0,0.000
    };
//---------------------------------------------------------------------------
double sort_asc[]=
    {
    1000,0.00321,
    2000,0.00609,
    4000,0.01186,
       0,0.000
    };
//---------------------------------------------------------------------------
double sort_desc[]=
    {
    1000, 2.96147,
    2000,11.76799,
    4000,45.58834,
       0,0.000
    };
//---------------------------------------------------------------------------
double sort_rand[]=
    {
    1000, 3.205750,
    2000,12.181855,
    4000,47.445111,
       0,0.000
    };
//---------------------------------------------------------------------------
double div(double a,double b){ return (fabs(b)>1e-10)?a/b:0.0; }
//---------------------------------------------------------------------------
AnsiString get_complexity(double *dat)  // expect dat[] = { n0,t(n0), n1,t(n1), ... , 0,0 }
    {
    AnsiString O="O(?)";
    int i,e;
    double t,n,c,c0,c1,a,dc=1e+10;
    #define testbeg for (e=1,i=0;dat[i]>0.5;){ n=dat[i]; i++; t=dat[i]; i++;
    #define testend(s) if ((c<=0.0)||(n<2.0)) continue; if (e){ e=0; c0=c; c1=c; } if (c0>c) c0=c; if (c1<c) c1=c; } a=fabs(1.0-div(c0,c1)); if (dc>=a){ dc=a; O=s; }


    testbeg;            c=div(t,n);                 testend("O(n)");
    testbeg;            c=div(t,n*n);               testend("O(n^2)");
    testbeg;            c=div(t,n*n*n);             testend("O(n^3)");
    testbeg;            c=div(t,n*n*n*n);           testend("O(n^4)");
    testbeg; a=log(n);  c=div(t,a);                 testend("O(log(n))");
    testbeg; a=log(n);  c=div(t,a*a);               testend("O(log^2(n))");
    testbeg; a=log(n);  c=div(t,a*a*a);             testend("O(log^3(n))");
    testbeg; a=log(n);  c=div(t,a*a*a*a);           testend("O(log^4(n))");
    testbeg; a=log(n);  c=div(t,n*a);               testend("O(n.log(n))");
    testbeg; a=log(n);  c=div(t,n*n*a);             testend("O(n^2.log(n))");
    testbeg; a=log(n);  c=div(t,n*n*n*a);           testend("O(n^3.log(n))");
    testbeg; a=log(n);  c=div(t,n*n*n*n*a);         testend("O(n^4.log(n))");
    testbeg; a=log(n);  c=div(t,n*a*a);             testend("O(n.log^2(n))");
    testbeg; a=log(n);  c=div(t,n*n*a*a);           testend("O(n^2.log^2(n))");
    testbeg; a=log(n);  c=div(t,n*n*n*a*a);         testend("O(n^3.log^2(n))");
    testbeg; a=log(n);  c=div(t,n*n*n*n*a*a);       testend("O(n^4.log^2(n))");
    testbeg; a=log(n);  c=div(t,n*a*a*a);           testend("O(n.log^3(n))");
    testbeg; a=log(n);  c=div(t,n*n*a*a*a);         testend("O(n^2.log^3(n))");
    testbeg; a=log(n);  c=div(t,n*n*n*a*a*a);       testend("O(n^3.log^3(n))");
    testbeg; a=log(n);  c=div(t,n*n*n*n*a*a*a);     testend("O(n^4.log^3(n))");
    testbeg; a=log(n);  c=div(t,n*a*a*a*a);         testend("O(n.log^4(n))");
    testbeg; a=log(n);  c=div(t,n*n*a*a*a*a);       testend("O(n^2.log^4(n))");
    testbeg; a=log(n);  c=div(t,n*n*n*a*a*a*a);     testend("O(n^3.log^4(n))");
    testbeg; a=log(n);  c=div(t,n*n*n*n*a*a*a*a);   testend("O(n^4.log^4(n))");

    #undef testend
    #undef testbeg
    return O+AnsiString().sprintf(" error = %.6lf",dc);
    }
//---------------------------------------------------------------------------
__fastcall TForm1::TForm1(TComponent* Owner):TForm(Owner)
    {
    mm_log->Lines->Clear();
    mm_log->Lines->Add("factorial "+get_complexity(factorial));
    mm_log->Lines->Add("sort asc  "+get_complexity(sort_asc));
    mm_log->Lines->Add("sort desc "+get_complexity(sort_desc));
    mm_log->Lines->Add("sort rand "+get_complexity(sort_rand));
    }
//-------------------------------------------------------------------------

with relevant times measurements of mine fast exact bigint factorial where I simply used only the bigger times above 8ms, and also the sorting measurement from above which outputs this:

factorial O(n.log^2(n)) error = 0.665782
sort asc  O(n) error = 0.076324
sort desc O(n^2) error = 0.037886
sort rand O(n^2) error = 0.075000

The code just test few supported complexities and outputs the one that has lowest error (variation of c constant time between different n)...

Just ignore the VCL stuff and convert the AnsiString into any string or output you want ...


与恶龙缠斗过久,自身亦成为恶龙;凝视深渊过久,深渊将回以凝视…
OGeek|极客中国-欢迎来到极客的世界,一个免费开放的程序员编程交流平台!开放,进步,分享!让技术改变生活,让极客改变未来! Welcome to OGeek Q&A Community for programmer and developer-Open, Learning and Share
Click Here to Ask a Question

...