/*******************************************************************************
********************************************************************************

					INTRODUCTION TO STATA - FT 2021
						
							Matthias Krapf

			Copyright by Marcus Roller, University of Bern
			
********************************************************************************
********************************************************************************



CONTENT:
1. General Header
2. Import Data
3. Data Manipulation
4. Descriptive Statistics
5. Graphs
6. Regressions
7. General Ending




********************************************************************************
1. General Header
*******************************************************************************/


*Clear all memory
	clear all
	
*Set workspace to 1 gigabyte (obsolete: since Stata 12 memory is managed
*automatically and this command is ignored)
	set mem 1g
	
*Turn more off - enables processing of do-file without stops
	set more off
	
*Set working directory
	cd "C:\Users\krapfm\Dropbox\Stata\"
	
*Close any open log file and start a new one
*(capture suppresses the error if no log is open)
	capture log close
	log using myfirstwageregression, text replace 

/*******************************************************************************
2. Import Data
*******************************************************************************/

*Import the data if it is a csv-file (.csv)
	insheet using rawdata.csv, names clear
*or, in Stata 13 and later, where import delimited supersedes insheet:
*	import delimited using rawdata.csv, varnames(1) clear
*you might have to set a different delimiter:
*	insheet using rawdata.csv, delimiter(";") names clear
*Save imported data as Stata file (.dta)
	save statadata, replace
	
*Have a look at the data
	browse	
	describe
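*Further quick checks on the imported data (a sketch, not part of the original
*course file): number of observations, a compact overview of every variable,
*and a count of missing values per variable
	count
	codebook, compact
	misstable summarize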
	
/*******************************************************************************
3. Data Manipulation
*******************************************************************************/
	

*Generate logwage (instead of "generate" it is enough to type "gen")
	generate logwage=ln(wage)
	
*Age
	gen age=year-birthyr
	
*Black-Dummy
	gen black=0
	replace black=1 if race==2

*Union dummy
	gen union=0
	replace union=1 if unionfee>0
	browse
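	*Inspect: where unionfee is missing (.), union is now 1, because Stata treats
	*missing values as larger than any number, so unionfee>0 is true for them
		drop union
		gen union=0 if unionfee!=.
		replace union=1 if !missing(unionfee) & unionfee>0
	*Both if conditions exclude missing values; they differ only for extended
	*missing values (.a to .z), which !missing() also catches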
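	*A quick check of the corrected dummy (a sketch, not part of the original
	*course file; it assumes unionfee is numeric and contains no extended
	*missing values .a to .z): union should be missing wherever unionfee is
		tabulate union, missing
		count if missing(unionfee) & !missing(union)
		assert missing(union) if missing(unionfee)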
	
*Industry wage average
	sort indcode
	by indcode: egen indmwage=mean(wage)

*Delete observations with very high wages
*(missing wages count as larger than any number, so they are dropped too)
	drop if wage>50

*Delete variable birthyr (no longer needed once age has been computed)
	drop birthyr


/*******************************************************************************
4. Descriptive Statistics
*******************************************************************************/
	
*Mean Wage
	sum wage

*Detailed summary
	sum wage, detail

*Correlation
	correlate wage grade age tenure union
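*Sketch (not part of the original course file): correlate drops every observation
*with a missing value in any listed variable, whereas pwcorr uses all observations
*available for each pair; sig adds p-values and obs the number of observations used
	pwcorr wage grade age tenure union, sig obs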

*Frequencies of union membership
	describe
	tabulate union
*Generate Labels for union
	label variable union "Union Membership"
	label define member 0 "No Member" 1 "Union Member"
	label values union member	
	
	describe union
	tabulate union
	
*Mean wage by union status
	sort union
	by union: sum wage
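*Sketch (not part of the original course file): tabstat gives the same group
*comparison in one compact table and does not require sorting first
	tabstat wage, by(union) statistics(n mean sd min max)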

*Test the difference in mean wages between groups (t test allowing unequal variances)
	ttest wage, by(union) unequal


/*******************************************************************************
5. Graphs
*******************************************************************************/

*Histograms
	histogram wage

*Scatter plots
	scatter logwage age
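*Sketch (not part of the original course file): overlay a linear fit on the
*scatter plot and save the graph to the working directory; the file name
*logwage_age.png is hypothetical
	twoway (scatter logwage age) (lfit logwage age)
	graph export logwage_age.png, replace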



/*******************************************************************************
6. Regressions
*******************************************************************************/

*OLS-Regression
	gen agesq=age^2
	regress logwage age agesq exper black tenure union

*Robust Standard Errors
	regress logwage age agesq exper black tenure union, robust

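*Joint test that the coefficients on age and agesq are both zero
*(test refers to the most recent regression, here the one with robust standard errors)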
	test age=agesq=0
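*Sketch (not part of the original course file): listing the coefficients tests
*the same joint hypothesis, since test sets each listed coefficient to zero
	test age agesq
*Under the quadratic specification, predicted logwage peaks (or bottoms out) at
*age = -b_age/(2*b_agesq); nlcom reports this with a delta-method standard error.
*The label turning_age is hypothetical
	nlcom (turning_age: -_b[age]/(2*_b[agesq]))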
	

/*******************************************************************************
7. General Ending
*******************************************************************************/

*Clear the data from memory and close the running log file
	clear
	log close
*Inspect the log file
	view myfirstwageregression.log
