Stata Merge Two Variables



  1. Stata Merge Specific Variables
  2. Merge Two Variables Together In Stata
  3. Stata Merge Two Variables Into One

I work with messy administrative data and very often have to merge datasets by people's or cities' names. String variables often come with typos, different spellings, and so on. Add languages that use diacritical marks and you have a complete mess.

Stata has a handy user-written command called reclink, built for this purpose. It uses record linkage methods to match identifiers across the two datasets. A nice feature is that you can use more than one identifier variable and give a weight to each of them. For example, suppose I want to merge the datasets based on make and foreign, but I know the mismatches are in the variable make: I give a large weight (10) to make and only 2 to foreign.
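reclink itself is Stata code; what follows is only a toy Python analogue of the weighting idea, not reclink's actual scoring. It assumes a character-bigram (Dice) similarity for each field and a weighted average across fields, so that with a weight of 10 on make and 2 on foreign, a typo in make still yields a high overall score. The function names and sample records are hypothetical.

```python
# Toy analogue of weighted record linkage (NOT reclink's actual algorithm):
# each identifier gets a similarity in [0, 1], and the overall score is the
# weighted average, so a field with weight 10 counts five times as much as
# a field with weight 2.


def bigrams(s: str) -> set:
    """Set of adjacent character pairs in a lower-cased string."""
    s = s.lower()
    return {s[i:i + 2] for i in range(len(s) - 1)}


def bigram_similarity(a: str, b: str) -> float:
    """Dice coefficient on character bigrams; 1.0 for identical strings."""
    if a.lower() == b.lower():
        return 1.0
    ba, bb = bigrams(a), bigrams(b)
    if not ba or not bb:
        return 0.0
    return 2 * len(ba & bb) / (len(ba) + len(bb))


def weighted_score(master: dict, using: dict, weights: dict) -> float:
    """Weighted average of per-field similarities, in [0, 1]."""
    total = sum(weights.values())
    score = 0.0
    for field, w in weights.items():
        score += w * bigram_similarity(str(master[field]), str(using[field]))
    return score / total


weights = {"make": 10, "foreign": 2}  # weights from the example in the text
master = {"make": "AMC Concord", "foreign": "Domestic"}
using = {"make": "AMC Concrd", "foreign": "Domestic"}  # typo in make
print(round(weighted_score(master, using, weights), 3))
```

Because foreign carries little weight, a disagreement there barely moves the score, while make dominates, which is the behaviour the weights are meant to express.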


Stata Merge Specific Variables

To use reclink, you also have to create an id variable in each dataset and pass them to the options idm() (master) and idu() (using). This lets reclink report the matches in terms of ids as well. Additionally, it creates a variable, named in the gen() option, that contains the matching score for each merged observation. The scores range from 0 to 1, where 1 means an exact match.
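To make the role of the ids and the 0-1 score concrete, here is a toy Python sketch, again not reclink's implementation. It uses difflib's SequenceMatcher ratio as a stand-in string similarity, pairs each master id with its best-scoring using id, and leaves a record unmatched when the best score falls below a hypothetical min_score threshold. The records, ids and threshold are made up for illustration.

```python
from difflib import SequenceMatcher

# Hypothetical master and using records, each keyed by its own id
# (the analogue of the variables passed to idm() and idu()).
master = {1: "AMC Concord", 2: "Buick Regal", 3: "Ford Fiesta"}
using = {101: "AMC Concrd", 102: "Buick Regal", 103: "Honda Civic"}


def similarity(a: str, b: str) -> float:
    """Stand-in string similarity in [0, 1]; 1.0 means identical strings."""
    return SequenceMatcher(None, a.lower(), b.lower()).ratio()


def link(master: dict, using: dict, min_score: float = 0.6) -> list:
    """For each master id, return (idm, idu, score) for its best using match,
    or (idm, None, None) when no candidate reaches min_score (hypothetical)."""
    links = []
    for idm, name in master.items():
        idu, score = max(
            ((u, similarity(name, other)) for u, other in using.items()),
            key=lambda pair: pair[1],
        )
        if score >= min_score:
            links.append((idm, idu, round(score, 3)))
        else:
            links.append((idm, None, None))
    return links


for row in link(master, using):
    print(row)
```

The exact match scores 1.0, the misspelled make still links to the right id with a score just below 1, and a name with no plausible counterpart stays unmatched.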

Merge Two Variables Together In Stata

Unlike Stata's merge, where a variable present in both datasets keeps only the master's values, the reclink result keeps the variables from both datasets: it adds a capital 'U' to the beginning of each of the using dataset's variable names. This is pretty handy because you can eyeball the matches straight away and check whether the weights you defined have done a good job, and it's easy enough to drop all these variables afterwards. In my example I used Stata's auto.dta; type help reclink in Stata for more details.
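The 'U' prefix convention can be mimicked in a few lines. This is a toy Python illustration with hypothetical helpers, combine and drop_using, not part of reclink: the using record's fields are copied in under prefixed names so both versions sit side by side, and dropping them afterwards restores the master record.

```python
def combine(master_row: dict, using_row: dict, prefix: str = "U") -> dict:
    """Return one row holding the master fields plus the using fields,
    the latter renamed with the prefix (e.g. make -> Umake)."""
    row = dict(master_row)
    for name, value in using_row.items():
        row[prefix + name] = value
    return row


def drop_using(row: dict, using_names, prefix: str = "U") -> dict:
    """Remove only the prefixed columns that combine() added, so master
    variables that happen to start with 'U' are left alone."""
    added = {prefix + name for name in using_names}
    return {k: v for k, v in row.items() if k not in added}


master_row = {"make": "AMC Concord", "foreign": "Domestic"}
using_row = {"make": "AMC Concrd", "foreign": "Domestic"}

row = combine(master_row, using_row)
print(row["make"], "|", row["Umake"])  # eyeball the match side by side
print(drop_using(row, using_row))      # back to the master fields only
```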

Stata Merge Two Variables Into One

How can I merge multiple files in Stata? A good first step is to describe the data, which you can do without actually opening the file (describe using filename). It is also not uncommon to find that a large dataset contains many variables you are not going to use, so it pays to drop them before merging. Note that Stata's merge creates a variable, _merge, in the merged result, which indicates how the merge was done for each observation. The value of _merge is 1 if the observation comes from file1 (the master file) only, 2 if it comes from file2 (the using file) only, and 3 if it comes from both files; in other words, 3 means the observation is matched. Stata's merge can also match on multiple key variables by listing them all in the merge command's variable list.
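The meaning of the _merge codes can be mimicked in a few lines of Python. This is a hypothetical helper, merge_codes, not Stata code: it classifies each key as master-only (1), using-only (2) or matched (3).

```python
def merge_codes(master_keys, using_keys) -> dict:
    """Map each key to 1 (master only), 2 (using only) or 3 (both),
    mirroring the meaning of Stata's _merge values."""
    master_keys, using_keys = set(master_keys), set(using_keys)
    codes = {}
    for key in master_keys | using_keys:
        if key in master_keys and key in using_keys:
            codes[key] = 3
        elif key in master_keys:
            codes[key] = 1
        else:
            codes[key] = 2
    return codes


print(sorted(merge_codes(["a", "b", "c"], ["b", "c", "d"]).items()))
# prints [('a', 1), ('b', 3), ('c', 3), ('d', 2)]
```

Passing tuples such as (make, foreign) as keys mimics merging on multiple match variables: a row counts as matched only when every key variable agrees exactly, which is precisely why typos in names call for a fuzzy tool like reclink.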
