Welcome to OGeek Q&A Community for programmer and developer-Open, Learning and Share

merge - New observations are added after merging - Stata

I am new to Stata and hope that one of you can help me with this problem! I have three data sets that I want to merge. The first two include the same variables; therefore I already merged these in the first step. In the second step I now want to merge the other file. After doing so, I end up with more observations than before. Am I doing something wrong?

This is what the first two data sets look like before I merged them with the third one:

    Variable |        Obs        Mean    Std. Dev.       Min        Max
-------------+---------------------------------------------------------
     mergeid |          0
      ch001_ |     46,398    2.215785    1.570519         -2         17
        wave |     67,576    1.549781    .4975194          1          2
      br001_ |     46,389    3.113993    1.998745         -2          5
        bmi2 |     66,916    2.694468     .939573         -3          4
-------------+---------------------------------------------------------
       eurod |     65,568    2.322566    2.277759          0         12
       spheu |     30,284    2.304913     .940147         -2          5
isced1997y_r |     30,344     10.5346     8.95301         -2         97

This is what the third data set looks like before the merge:

    Variable |        Obs        Mean    Std. Dev.       Min        Max
-------------+---------------------------------------------------------
     mergeid |          0
     country |     28,472     18.6361    5.445199         11         30
         yob |     28,472    1942.239    10.05177       1907       1984
      gender |     28,472     1.56034    .4963544          1          2
  sl_cs004d1 |     27,894    .9609235    .2006878         -2          1
-------------+---------------------------------------------------------
  sl_cs004d2 |     27,894    .9078655    .2938936         -2          1
 sl_cs007dno |     28,391    .2837167    .4594766         -2          1
   sl_cs008_ |     28,392    2.046598    1.235409         -2          5
   sl_rp002_ |     28,418    1.236188    .9437631         -1          5

Now, after I have merged them m:1 using mergeid as the key, this is what I end up with:

    Variable |        Obs        Mean    Std. Dev.       Min        Max
-------------+---------------------------------------------------------
     mergeid |          0
      ch001_ |     28,951    2.162792    1.445806         -2         14
        wave |     41,967    1.603188    .4892422          1          2
      br001_ |     27,201    3.108709    1.997605         -2          5
        bmi2 |     41,741    2.730553    .8908915         -3          4
-------------+---------------------------------------------------------
       eurod |     41,193    2.233923    2.189107          0         12
       spheu |     16,616    2.216057    .8818345         -2          5
isced1997y_r |     16,638    10.51301    8.484587         -2         97
     country |     41,967    17.91288    4.886818         11         30
         yob |     41,967    1941.642    9.929027       1907       1978
-------------+---------------------------------------------------------
      gender |     41,967    1.563919    .4959034          1          2
  sl_cs004d1 |     41,162    .9599145    .2029794         -2          1
  sl_cs004d2 |     41,162    .9060055      .29645         -2          1
 sl_cs007dno |     41,866    .2776955    .4567399         -2          1
   sl_cs008_ |     41,868    2.029426    1.233308         -2          5
-------------+---------------------------------------------------------
   sl_rp002_ |     41,906    1.242113    .9549139         -1          5
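
For reference, here is a minimal sketch of how an m:1 merge on mergeid behaves, using made-up toy data rather than the files above. The IDs, values and the tempfile name `persons` are assumptions for illustration only. In an m:1 merge, the single row per key in the using data is copied onto every master row with the same key, so a person who appears in two waves carries the person-level values twice.

```stata
* Toy illustration only: IDs and values are invented, not taken from the post.
* Run as one block (do-file), because the tempfile name is a local macro.
clear
input str12 mergeid byte country
"A-001" 11
"A-002" 12
end
tempfile persons
save `persons'

* Master data: one row per person and wave
clear
input str12 mergeid byte wave
"A-001" 1
"A-001" 2
"A-002" 1
"A-002" 2
"A-003" 1
end

* Many master rows per mergeid, exactly one using row per mergeid
merge m:1 mergeid using `persons'

tabulate _merge
count if !missing(country)   // 4: each of the 2 person rows is attached to 2 wave rows
```

In this toy case the using data has 2 rows, but country ends up non-missing in 4 rows after the merge, which is the same pattern as a person-level variable showing a higher Obs count in a person-wave file.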

All variables from the third data set have more observations now than they had before the merge. Does anyone know what I can do to solve this problem?
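
One way to check whether the higher counts simply reflect repeated persons is to count each mergeid once. The sketch below uses invented toy data and a hypothetical helper variable `person_tag`; on the real merged file only the last three commands would be needed. It rests on the assumption that the master file holds one row per person and wave.

```stata
* Toy merged data: two persons, each observed in two waves (invented values)
clear
input str12 mergeid byte wave byte country
"A-001" 1 11
"A-001" 2 11
"A-002" 1 12
"A-002" 2 12
end

summarize country                    // Obs = 4: person-wave rows
egen byte person_tag = tag(mergeid)  // marks exactly one row per distinct mergeid
summarize country if person_tag      // Obs = 2: distinct persons
```

If the person-level count obtained this way on the real data is no larger than the count in the third data set, no new persons were created by the merge. The values were only repeated across waves.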

This is a description of all variables, in case this is helpful.

              storage   display    value
variable name   type    format     label      variable label
------------------------------------------------------------------------------------------------------------------------------
mergeid         str12   %12s                  Person identifier (fix across modules and waves)
ch001_          byte    %10.0f     dkrf       Number of children
wave            float   %9.0g                 
br001_          byte    %10.0f     yesno      Ever smoked daily
bmi2            byte    %27.0g     bmi2       Bmi categories
eurod           byte    %14.0g     eurod      Depression scale EURO-D - high is depressed
spheu           byte    %10.0g     spheu      Self-perceived health - european version
isced1997y_r    float   %25.0g     iscedy     Respondent: years of education derived from ISCED-97
country         byte    %14.0g     country    Country identifier
yob             int     %10.0g     dkrf       Year of birth of respondent
gender          byte    %10.0g     gender     Gender of respondent
sl_cs004d1      byte    %12.0g     dummi      Lived in hh when ten: biological mother
sl_cs004d2      byte    %12.0g     dummi      Lived in hh when ten: biological father
sl_cs007dno     byte    %12.0g     dummi      Features of accommodation when ten: none of these
sl_cs008_       byte    %58.0g     cs008      Number of books when ten
sl_rp002_       byte    %10.0g     yesno      Ever been married
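
Because mergeid is stored as str12, summarize reports 0 observations for it. That is how Stata treats string variables and does not mean the key is empty. A hedged sketch of checks that could be run on the key before merging, again on invented toy data:

```stata
* Toy person-level data (invented values) standing in for the third data set
clear
input str12 mergeid int yob
"A-001" 1942
"A-002" 1950
end

summarize mergeid            // Obs = 0 because mergeid is a string variable
count if !missing(mergeid)   // 2: rows with a non-empty key
isid mergeid                 // silent if mergeid uniquely identifies rows, error otherwise
```

For an m:1 merge, the uniqueness check belongs on the using data. The master data may legitimately contain the same mergeid more than once.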


1 Reply

Waiting for an expert to answer.
