---
title: "Introduction to Stata workshop notes"
always_allow_html: yes
output:
  html_document:
    highlight: tango
    toc: true
    toc_float:
      collapsed: true
---

Introduction

Workshop description

Why Stata?

Stata interface

Do-files

Stata help

To get help in Stata, type help followed by a topic or command name, e.g., help codebook.
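As a small illustration of the help lookup described above, the lines below can be typed in the Command window or run from a do-file. The keyword passed to search is only an example topic, not something taken from the workshop materials.

```stata
* Open the help file for a command you already know by name
help codebook

* Help files exist for most commands; summarize is another example
help summarize

* If you do not know the command name, search by keyword instead
search linear regression
```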

General Stata command syntax

Most Stata commands follow the same basic syntax: command varlist, options.
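A minimal sketch of this pattern follows. It assumes the auto example dataset that ships with Stata, which is not part of the workshop data; the variable names price, mpg, foreign, and rep78 belong to that dataset.

```stata
* Load an example dataset bundled with Stata (an assumption for illustration)
sysuse auto, clear

* command varlist
summarize price mpg

* command varlist, options
summarize price mpg, detail

* A different command with the same shape: two-way table with row percentages
tabulate foreign rep78, row
```

Everything before the comma says what to do and to which variables; everything after the comma modifies how the command behaves.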

Commenting and formatting syntax

Start with a comment describing your do-file, and use comments throughout.
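One way a commented do-file might look is sketched below. The header fields and the use of the bundled auto dataset are placeholders for illustration. Note that the // and /// styles work in do-files but not when typed interactively in the Command window.

```stata
* Purpose: illustrate comment styles in a do-file (hypothetical example)
* Author:  <your name>      Date: <date>

sysuse auto, clear          // a double slash starts an end-of-line comment

/* A slash-star block can span
   several lines */
summarize price mpg

* Three slashes continue a long command onto the next line
tabulate foreign rep78, ///
    row missing
```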

Let's get started

Getting data into Stata

Data file commands

A note about path names

Where's my data?

What if my data is not a Stata file?

What if my data is from another statistical software program?

Exercise 1: Importing data

  1. Save any work you've done so far. Close down Stata and open a new session.
  2. Start Stata and open your .do file.
  3. Change directory (cd) to the dataSets folder.
  4. Try opening the following files:
    • A comma separated value file: gss.csv
    • An Excel file: gss.xlsx
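The import steps in this exercise could be sketched as follows. This assumes a recent Stata version in which import delimited and import excel are available, and that dataSets can be reached from your current working directory; adjust the path to match your machine. The firstrow option is an assumption about the spreadsheet layout.

```stata
* Move to the folder holding the workshop data (adjust the path as needed)
cd "dataSets"

* Comma-separated file
import delimited "gss.csv", clear

* Excel file; firstrow assumes the first spreadsheet row holds variable names
import excel "gss.xlsx", firstrow clear
```

The clear option discards whatever dataset is currently in memory, so save your work first if you need to keep it.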

Statistics and graphs

Frequently used commands

Basic graphing commands

The "by" command

Exercise 2: Descriptive statistics

  1. Use the dataset, gss.dta
  2. Examine a few selected variables using the describe, sum and codebook commands
  3. Tabulate the variable "marital" with and without labels
  4. Summarize the variable "income" by marital status
  5. Cross-tabulate marital with region
  6. Summarize the variable happy for married individuals only
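One possible way to work through these steps is shown below. The variables chosen for step 2 (age, educ) and the assumption that code 1 of marital means "married" are taken from the codebook used later in these notes; check them against your copy of gss.dta with codebook before relying on them.

```stata
use "gss.dta", clear

* Step 2: examine a few variables
describe marital age educ
summarize marital age educ
codebook marital

* Step 3: tabulate with value labels, then with the underlying numeric codes
tabulate marital
tabulate marital, nolabel

* Step 4: summarize income separately for each marital status
bysort marital: summarize income

* Step 5: cross-tabulation
tabulate marital region

* Step 6: married respondents only (assumes code 1 = married)
summarize happy if marital == 1
```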

Basic data management

Labels

Variable and value labels

Exercise 3: Variable labels and value labels

  1. Open the data set gss.csv
  2. Familiarize yourself with the data using describe, sum, etc.
  3. Rename and label variables using the following codebook:
     var   rename to   label with
     v1    marital     marital status
     v2    age         age of respondent
     v3    educ        education
     v4    sex         respondent's sex
     v5    inc         household income
     v6    happy       general happiness
     v7    region      region of interview
  4. Add value labels to your "marital" variable using this codebook:
     value   label
     1       "married"
     2       "widowed"
     3       "divorced"
     4       "separated"
     5       "never married"
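The renaming and labeling steps above might be written as in the sketch below. It assumes that import delimited names the columns v1 through v7, as the codebook implies, and the label name marital_lbl is a hypothetical choice; any valid name would do.

```stata
import delimited "gss.csv", clear
describe
summarize

* Rename variables following the codebook
rename v1 marital
rename v2 age
rename v3 educ
rename v4 sex
rename v5 inc
rename v6 happy
rename v7 region

* Attach variable labels
label variable marital "marital status"
label variable age     "age of respondent"
label variable educ    "education"
label variable sex     "respondent's sex"
label variable inc     "household income"
label variable happy   "general happiness"
label variable region  "region of interview"

* Define a set of value labels (marital_lbl is a made-up name), then attach it
label define marital_lbl 1 "married" 2 "widowed" 3 "divorced" ///
    4 "separated" 5 "never married"
label values marital marital_lbl

* Check the result
tabulate marital
```

Value labels take two steps: label define creates the mapping from codes to text, and label values attaches that mapping to a variable.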

Working on subsets

Operator   Meaning
==         equal to
!=         not equal to
>          greater than
>=         greater than or equal to
<          less than
<=         less than or equal to
&          and
|          or
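To show how these operators combine with the if qualifier, here is a brief sketch. The variable names and the marital codes are taken from the exercise codebook in these notes and are assumed to apply to gss.dta as well; the age cutoffs are arbitrary.

```stata
use "gss.dta", clear

* Equal to: married respondents only (assumes code 1 = married)
summarize age if marital == 1

* Not equal to: everyone except the never married (assumes code 5)
tabulate region if marital != 5

* And: respondents aged 30 through 50
summarize educ if age >= 30 & age <= 50

* Or: respondents younger than 25 or older than 65
tabulate marital if age < 25 | age > 65

* Stata treats missing numeric values as larger than any number,
* so exclude them explicitly when using > or >=
summarize educ if age > 65 & !missing(age)
```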

Generating and replacing variables

Exercise 4: Manipulating variables

  1. Use the dataset, gss.dta
  2. Generate a new variable, age2 equal to age squared
  3. Generate a new "high income" variable that will take on a value of "1" if a person has an income value greater than "15" and "0" otherwise
  4. Generate a new divorced/separated dummy variable that will take on a value of "1" if a person is either divorced or separated and "0" otherwise
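A sketch of one approach to these steps follows. The new variable names hi_inc and divsep are hypothetical, the income variable is assumed to be called income in gss.dta, and codes 3 and 4 are assumed to mean divorced and separated, following the codebook in Exercise 3.

```stata
use "gss.dta", clear

* Step 2: age squared
generate age2 = age^2

* Step 3: 1 if income is above 15, 0 otherwise, missing if income is missing
generate hi_inc = (income > 15) if !missing(income)

* Step 4: 1 if divorced or separated (assumed codes 3 and 4), 0 otherwise
generate divsep = inlist(marital, 3, 4) if !missing(marital)

* Check the new variables against the originals
tabulate hi_inc
tabulate marital divsep
```

The if !missing() qualifier keeps respondents with missing values from being coded as 0 or 1 by accident. An equivalent route is to generate the variable as 0 and then use replace to set it to 1 for the cases of interest.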