
To get help in Stata, type help followed by a topic or command name, e.g., help codebook.
Most Stata commands follow the same basic syntax: Command varlist, options.
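As a minimal sketch of that pattern, the lines below use the auto example data that ships with Stata. That data set is an assumption made only so the snippet runs on its own; it is not one of the workshop files. The command comes first, then the variable list, then a comma followed by any options.

```stata
* load the example data bundled with Stata (replaces any data in memory)
sysuse auto, clear
* command = summarize, varlist = price mpg, option = detail
summarize price mpg, detail
* a command used without options simply omits the comma
describe price mpg
```

The same shape applies to most of the commands used in the rest of these notes.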
Start your Do-file with a comment describing it, and use comments throughout.
* Use '*' to comment a line and '//' for in-line comments
* Make Stata say hello:
disp "Hello " "World!" // 'disp' is short for 'display'
Hello World!
* Use '///' to break varlists over multiple lines:
disp "Hello" ///
" World!"
Hello World!
* change directory
// cd "C:/Users/dataclass/Desktop/StataIntro"
cd dataSets
// open the gss.dta data set
use gss.dta, clear
// save data file:
save newgss.dta, replace // "replace" option means OK to overwrite existing file
file newgss.dta saved
* import data from a .csv file
import delimited gss.csv, clear
* save data to a .csv file
export delimited gss_new.csv, replace
(7 vars, 451 obs)
file gss_new.csv saved
* import/export SAS xport files
clear
import sasxport gss.xpt
export sasxport gss_new, replace
file gss_new.xpt saved
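After any import it can help to confirm that the data arrived as expected. The sketch below assumes only that some data set with at least five observations is in memory; it does not depend on a particular file.

```stata
* variable names, storage types, and labels of the data in memory
describe
* number of observations
count
* first five rows, to spot-check that values were read as expected
list in 1/5
```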
* In your .do file, change directory (cd) to the dataSets folder, then load the data:
use gss.dta, clear
sum educ // statistical summary of education
    Variable |        Obs        Mean    Std. Dev.       Min        Max
-------------+---------------------------------------------------------
        educ |        217    13.52995      3.0687          1         20
codebook region // information about how region is coded
-----------------------------------------------------------------------
region                                                      (unlabeled)
-----------------------------------------------------------------------
                  type:  string (str5)
         unique values:  4                  missing "":  0/217
            tabulation:  Freq.  Value
                            54  "east"
                            48  "north"
                            48  "south"
                            67  "west"
tab sex // numbers of male and female participants
respondents |
        sex |      Freq.     Percent        Cum.
------------+-----------------------------------
       male |        114       52.53       52.53
     female |        103       47.47      100.00
------------+-----------------------------------
      Total |        217      100.00
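A few common variations on tab, sketched under the assumption that gss.dta is still in memory with the sex and happy variables used elsewhere in these notes:

```stata
* cross-tabulate sex by happiness, adding row percentages
tab sex happy, row
* show the underlying numeric codes instead of the value labels
tab sex, nolabel
* include missing values as their own category
tab happy, missing
```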
/* Histograms */
hist educ
(bin=14, start=1, width=1.3571429)
// histogram with normal curve; see 'help hist' for other options
hist age, normal
(bin=14, start=18, width=4.2142857)
/* scatterplots */
twoway (scatter educ age)
graph matrix educ age inc
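Graphs can be layered and saved to disk. The following is a sketch that assumes gss.dta is in memory; the axis titles and the file name educ_age.png are placeholders chosen for illustration, and the export formats available depend on your platform.

```stata
* scatterplot with a linear fit overlaid, plus axis titles
twoway (scatter educ age) (lfit educ age), ///
    xtitle("Age") ytitle("Years of education")
* write the current graph to a PNG file in the working directory
graph export educ_age.png, replace
```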
* By Processing
bysort sex: tab happy // tabulate happy separately for men and women
-----------------------------------------------------------------------
-> sex = male
      general |
    happiness |      Freq.     Percent        Cum.
--------------+-----------------------------------
   very happy |         32       28.07       28.07
 pretty happy |         68       59.65       87.72
not too happy |         14       12.28      100.00
--------------+-----------------------------------
        Total |        114      100.00
-----------------------------------------------------------------------
-> sex = female
      general |
    happiness |      Freq.     Percent        Cum.
--------------+-----------------------------------
   very happy |         33       32.04       32.04
 pretty happy |         61       59.22       91.26
not too happy |          9        8.74      100.00
--------------+-----------------------------------
        Total |        103      100.00
bysort marital: sum educ // summarize education by marital status
-----------------------------------------------------------------------
-> marital = married
    Variable |        Obs        Mean    Std. Dev.       Min        Max
-------------+---------------------------------------------------------
        educ |        103    13.65049    3.374381          1         20
-----------------------------------------------------------------------
-> marital = widowed
    Variable |        Obs        Mean    Std. Dev.       Min        Max
-------------+---------------------------------------------------------
        educ |          6    12.33333     1.36626         11         15
-----------------------------------------------------------------------
-> marital = divorced
    Variable |        Obs        Mean    Std. Dev.       Min        Max
-------------+---------------------------------------------------------
        educ |         39    13.46154    2.501012          6         19
-----------------------------------------------------------------------
-> marital = separate
    Variable |        Obs        Mean    Std. Dev.       Min        Max
-------------+---------------------------------------------------------
        educ |          9    12.11111    2.803767          6         14
-----------------------------------------------------------------------
-> marital = never ma
    Variable |        Obs        Mean    Std. Dev.       Min        Max
-------------+---------------------------------------------------------
        educ |         60        13.7    3.004516          6         20
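The per-group summaries above can also be collected into one compact table, or stored as a new variable. This is a sketch assuming gss.dta is in memory and educ has not yet been renamed; mean_educ is a hypothetical variable name.

```stata
* one table with selected statistics of educ for each marital status
tabstat educ, by(marital) statistics(n mean sd min max)
* store each group's mean education on every row of that group
bysort marital: egen mean_educ = mean(educ)
```

The egen version is useful when a later step needs the group mean alongside each observation.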
happy for married individuals only
/* Labelling and renaming */
// Label variable inc "household income"
label var inc "household income"
// change the name 'educ' to 'education'
rename educ education
// you can search names and labels with 'lookfor'
lookfor household
              storage   display    value
variable name   type    format     label      variable label
-----------------------------------------------------------------------
inc             byte    %8.0g      rincom06   household income
/* define a value label for sex */
label define mySexLabel 1 "Male" 2 "Female"
/* assign our label set to the sex variable */
label val sex mySexLabel
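To check that the value label was attached as intended, a short sketch (assuming the two label commands above have been run on gss.dta):

```stata
* show the mapping stored in the label set
label list mySexLabel
* tabulate with the labels, then with the underlying numeric codes
tab sex
tab sex, nolabel
```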

| var | rename to | label with |
|---|---|---|
| v1 | marital | marital status |
| v2 | age | age of respondent |
| v3 | educ | education |
| v4 | sex | respondent's sex |
| v5 | inc | household income |
| v6 | happy | general happiness |
| v7 | region | region of interview |

| value | label |
|---|---|
| 1 | "married" |
| 2 | "widowed" |
| 3 | "divorced" |
| 4 | "separated" |
| 5 | "never married" |
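The two tables above translate into commands along the following lines, shown here for the first variable only. This is a sketch under two assumptions: the data in memory have columns named v1 to v7, and v1 is numeric with codes 1 to 5. The label-set name maritalLabel is hypothetical.

```stata
* rename the first column and give it a descriptive variable label
rename v1 marital
label var marital "marital status"
* define the value labels from the table and attach them to marital
label define maritalLabel 1 "married" 2 "widowed" 3 "divorced" ///
    4 "separated" 5 "never married"
label val marital maritalLabel
```

The remaining variables follow the same rename and label var pattern.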

| Operator | Meaning |
|---|---|
| == | equal to |
| != | not equal to |
| > | greater than |
| >= | greater than or equal to |
| < | less than |
| <= | less than or equal to |
| & | and |
| \| | or |
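These operators are most often used in an if qualifier, which restricts a command to the observations that satisfy a condition. The sketch below assumes gss.dta is in memory with the age, inc, and happy variables; the cut-off values are arbitrary choices for illustration.

```stata
* relational operator in an if qualifier: respondents younger than 30
count if age < 30
* '&' requires both conditions to hold
count if age < 30 & inc < 10
* '|' requires at least one condition to hold
tab happy if age < 30 | age >= 65
* '!=' excludes a value; here, rows with missing age are left out
sum inc if age != .
```

Stata treats a numeric missing value (.) as larger than any number, so a condition such as age >= 65 is also true for rows where age is missing.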
// create a new variable named mc_inc
// equal to inc minus the mean of inc
gen mc_inc = inc - 15.37
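Typing the mean by hand works, but the value has to be updated whenever the data change. A sketch of an alternative, assuming gss.dta is in memory: summarize leaves the mean in r(mean), which can be used directly. The name mc_inc2 is hypothetical and chosen to avoid clashing with mc_inc.

```stata
* run summarize without printing its table; results are left in r()
quietly summarize inc
* subtract the stored mean instead of a typed-in number
gen mc_inc2 = inc - r(mean)
* the centered variable should have a mean of (approximately) zero
summarize mc_inc2
```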
/* the 'generate and replace' strategy */
// generate a column of missings
gen age_wealth = .
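// Next, start adding your qualifications
// note: rows with age equal to 30 or inc equal to 10 match none of the four
// conditions below and stay missing; a missing age or inc counts as larger
// than any number, so it satisfies 'age>30' or 'inc > 10'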
replace age_wealth=1 if age<30 & inc < 10
replace age_wealth=2 if age<30 & inc > 10
replace age_wealth=3 if age>30 & inc < 10
replace age_wealth=4 if age>30 & inc > 10
// conditions can also be combined with "or"
gen young=0
replace young=1 if age_wealth==1 | age_wealth==2
(217 missing values generated)
(19 real changes made)
(26 real changes made)
(22 real changes made)
(134 real changes made)
(45 real changes made)
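The generate-and-replace strategy is sensitive to boundary values and to missing data. The sketch below uses a small toy data set with hypothetical values so the result can be checked by eye. Placing age 30 and inc 10 in the upper groups is an assumption made for illustration, not something the original code specifies.

```stata
* toy data (hypothetical values); clear replaces any data in memory
clear
input age inc
25  5
25 10
30  5
30 10
40  .
end
gen age_wealth = .
* use >= so the boundary values fall in the upper groups, and exclude
* missing values, which would otherwise satisfy the >= conditions
replace age_wealth = 1 if age <  30 & inc <  10
replace age_wealth = 2 if age <  30 & inc >= 10 & !missing(inc)
replace age_wealth = 3 if age >= 30 & !missing(age) & inc <  10
replace age_wealth = 4 if age >= 30 & !missing(age) & inc >= 10 & !missing(inc)
list
* rows with both values present but no category assigned (should be none)
count if missing(age_wealth) & !missing(age, inc)
```

The last row keeps a missing age_wealth because its inc is missing, which is usually the intended behavior.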