Purpose
The objective of this homework is for you to practice concepts learned in class and apply them to data. In addition, we want you to get comfortable with accessing data that could be useful for any future project.
Guidelines
- You can work by yourself or with groups of up to three.
- Submit your independent answers to gradescope (within Canvas).
- You will get points for correct answers. You will get points deducted if the answer contains more information that’s not necessary or if the answer contains incorrect statements among correct statements. In short, we are trying to incentivize students to use the least amount of characters while maximizing the accuracy of responses.
- Always work on a do-file that you can save.
- Your responses should be professionally formatted and written.
- Due date is Monday February 5th at 8am EDT.
Preamble
The current population survey sponsored jointly by the U.S. Census Bureau and the U.S. Bureau of Labor Statistics (BLS), is the primary source of labor force statistics for the population of the United States. This survey is the main way we get statistics about employment, demographics, and other broad aspects of the population year by year and it is representative at the state level and federal level. In short, it’s a survey that policy people working in the U.S. should know well. You can read more about it here
One can download the data directly from the census website, if you are trying to improve your data skills you can start there. Integrated Public Use Microdata Series (IPUMS) is a project that was carried out by the Minnesota Population Center in collaboration with Unicon Research Corporation, and what they do is download a number of surveys from the government, and harmonizing across the years, and make the data much easier to access and download. Hence, generally, when people want to use surveys like CPS, they use the IPUMS version or IPUMS-CPS. This is also where we’ll start.
Exercises
- First task is to download a dataset from IPUMS-CPS. The task of downloading it is the thing I want y’all to practice and it may take some time to get it right (have to create an account, understand a bit the layout, downloading it once, maybe twice, etc). Once you download you may need to unzip the file. You should download a data set that contains the following variables:
- Years: 2010-2019, from “ASEC”. ASEC stands for Annual Social and Economic Supplements, this is a supplement from the CPS survey usually administered in March of each year.
- Notice that there are some “pre-selected” variables (i.e. variable that one will always have because they are pre-selected by IPUMS). In addition, to the pre-selected variables, pick the following variables: age, sex, race, region, and state (fips code), total personal income, and education (educational attainment recode).
- Once you have selected the variables and sample you then have to “submit” your extract. This lingo means the following: you submit a request to download this data, and you get an email later on, saying that your submission has been approved and you can download the data. This should take a couple of minutes. Once you receive the email, follow the links and you’ll be able to download the data. The page should look like the image below.
- When you click on downloading the data it will download a file that is in “DAT” format. This is a very general type of file, in the same way that the data can be in state format (DTA), excel (CVS), etc, DAT is a particular format. The challenge here is turning into a STATA dataset or R dataset.
- Fortunately, IPUMS does all this work for us. Notice that there will be a link under “Command Files” for each type of language you want to work with. For the case of STATA, if you click that option, either a do-file will download or a do-file will appear. Save that do-file. This do-file opens the data as DTA and then “labels” it and then you can see it in STATA and save it as whatever name you will like. Let’s go with “hw1.dta”.
- Notice that the do-file needs to be told where the “DAT” file is. I recommend creating an RMDA Folder, which contains a folder for each HW, and downloading all of these files under the “hw1” folder. Then in the IPUMS do-file you just download it, the only thing you have to add is:
- the file path of where the DAT file is
- adding a line to save the dataset as a STATA dataset and name it hw1
- Now the dataset is in your hands. We want to investigate the difference in income by race, and over time. A peer suggests to run the following regressions:
- Let’s first clean up the income variable: the variable should only have positive numbers and notice on what’s a missing value or NIU and what is not. Set negative values to 0. In the future, we’ll just say “Clean up the variable” and you should think about what that implies.
- Now with that cleaned income variable, run the regression as seen above (no need to clean up or do anything with the race variable). Answer the following questions:
- What type of time frame is the income variable? (i.e. monthly, quarterly, etc)
- What type of variable is the income variable?
- What type of variable is the race variable? (Using the lingo that we learned in class?)
- There are two main measurement issues with the set up of the regression above. What are those two issues?
- How would you change the regression above, in order to get at the main objective which is to know what is the difference in income across races, and across time. Notes: we are not interest in causality, just descriptives.
- Use the tools of regressions, means and/or ratios to assess the following question as it related to magnitudes. Use the information only from the dataset itself. Show your work in the code and write the conclusions given your results from the code.
- Has the income gap between men and women changed substantially or not between the years 2010 and 2019?
Notice there that rectype, year, serial and month, have been pre-selected. Notice also that in this screenshot I already pick my sample first and so only “ASEC year” appears on the right columns. The x indicates if that variable appears in that particular year.
Your data cart should look like this.
* NOTE: You need to set the Stata working directory to the path
* where the data file is located.
* ### this is my way to changing the file path, but you may have your own
* ### way.
rmda 2024 1
cd "$hw"
set more off
clear
quietly infix ///
int year 1-4 ///
long serial 5-9 ///
byte month 10-11 ///
double cpsid 12-25 ///
byte asecflag 26-26 ///
byte hflag 27-27 ///
double asecwth 28-38 ///
byte region 39-40 ///
byte statefip 41-42 ///
byte pernum 43-44 ///
double cpsidv 45-59 ///
double cpsidp 60-73 ///
double asecwt 74-84 ///
byte age 85-86 ///
byte sex 87-87 ///
int race 88-90 ///
int educ 91-93 ///
double inctot 94-102 ///
using `"cps_00043.dat"'
replace asecwth = asecwth / 10000
replace asecwt = asecwt / 10000
format cpsid %14.0f
format asecwth %11.4f
format cpsidv %15.0f
format cpsidp %14.0f
format asecwt %11.4f
format inctot %9.0f
label var year `"Survey year"'
label var serial `"Household serial number"'
label var month `"Month"'
label var cpsid `"CPSID, household record"'
label var asecflag `"Flag for ASEC"'
label var hflag `"Flag for the 3/8 file 2014"'
label var asecwth `"Annual Social and Economic Supplement Household weight"'
label var region `"Region and division"'
label var statefip `"State (FIPS code)"'
label var pernum `"Person number in sample unit"'
label var cpsidv `"Validated Longitudinal Identifier"'
label var cpsidp `"CPSID, person record"'
label var asecwt `"Annual Social and Economic Supplement Weight"'
label var age `"Age"'
label var sex `"Sex"'
label var race `"Race"'
label var educ `"Educational attainment recode"'
label var inctot `"Total personal income"'
label define month_lbl 01 `"January"'
label define month_lbl 02 `"February"', add
label define month_lbl 03 `"March"', add
label define month_lbl 04 `"April"', add
label define month_lbl 05 `"May"', add
label define month_lbl 06 `"June"', add
label define month_lbl 07 `"July"', add
label define month_lbl 08 `"August"', add
label define month_lbl 09 `"September"', add
label define month_lbl 10 `"October"', add
label define month_lbl 11 `"November"', add
label define month_lbl 12 `"December"', add
label values month month_lbl
label define asecflag_lbl 1 `"ASEC"'
label define asecflag_lbl 2 `"March Basic"', add
label values asecflag asecflag_lbl
label define hflag_lbl 0 `"5/8 file"'
label define hflag_lbl 1 `"3/8 file"', add
label values hflag hflag_lbl
label define region_lbl 11 `"New England Division"'
label define region_lbl 12 `"Middle Atlantic Division"', add
label define region_lbl 21 `"East North Central Division"', add
label define region_lbl 22 `"West North Central Division"', add
label define region_lbl 31 `"South Atlantic Division"', add
label define region_lbl 32 `"East South Central Division"', add
label define region_lbl 33 `"West South Central Division"', add
label define region_lbl 41 `"Mountain Division"', add
label define region_lbl 42 `"Pacific Division"', add
label define region_lbl 97 `"State not identified"', add
label values region region_lbl
label define statefip_lbl 01 `"Alabama"'
label define statefip_lbl 02 `"Alaska"', add
label define statefip_lbl 04 `"Arizona"', add
label define statefip_lbl 05 `"Arkansas"', add
label define statefip_lbl 06 `"California"', add
label define statefip_lbl 08 `"Colorado"', add
label define statefip_lbl 09 `"Connecticut"', add
label define statefip_lbl 10 `"Delaware"', add
label define statefip_lbl 11 `"District of Columbia"', add
label define statefip_lbl 12 `"Florida"', add
label define statefip_lbl 13 `"Georgia"', add
label define statefip_lbl 15 `"Hawaii"', add
label define statefip_lbl 16 `"Idaho"', add
label define statefip_lbl 17 `"Illinois"', add
label define statefip_lbl 18 `"Indiana"', add
label define statefip_lbl 19 `"Iowa"', add
label define statefip_lbl 20 `"Kansas"', add
label define statefip_lbl 21 `"Kentucky"', add
label define statefip_lbl 22 `"Louisiana"', add
label define statefip_lbl 23 `"Maine"', add
label define statefip_lbl 24 `"Maryland"', add
label define statefip_lbl 25 `"Massachusetts"', add
label define statefip_lbl 26 `"Michigan"', add
label define statefip_lbl 27 `"Minnesota"', add
label define statefip_lbl 28 `"Mississippi"', add
label define statefip_lbl 29 `"Missouri"', add
label define statefip_lbl 30 `"Montana"', add
label define statefip_lbl 31 `"Nebraska"', add
label define statefip_lbl 32 `"Nevada"', add
label define statefip_lbl 33 `"New Hampshire"', add
label define statefip_lbl 34 `"New Jersey"', add
label define statefip_lbl 35 `"New Mexico"', add
label define statefip_lbl 36 `"New York"', add
label define statefip_lbl 37 `"North Carolina"', add
label define statefip_lbl 38 `"North Dakota"', add
label define statefip_lbl 39 `"Ohio"', add
label define statefip_lbl 40 `"Oklahoma"', add
label define statefip_lbl 41 `"Oregon"', add
label define statefip_lbl 42 `"Pennsylvania"', add
label define statefip_lbl 44 `"Rhode Island"', add
label define statefip_lbl 45 `"South Carolina"', add
label define statefip_lbl 46 `"South Dakota"', add
label define statefip_lbl 47 `"Tennessee"', add
label define statefip_lbl 48 `"Texas"', add
label define statefip_lbl 49 `"Utah"', add
label define statefip_lbl 50 `"Vermont"', add
label define statefip_lbl 51 `"Virginia"', add
label define statefip_lbl 53 `"Washington"', add
label define statefip_lbl 54 `"West Virginia"', add
label define statefip_lbl 55 `"Wisconsin"', add
label define statefip_lbl 56 `"Wyoming"', add
label define statefip_lbl 61 `"Maine-New Hampshire-Vermont"', add
label define statefip_lbl 65 `"Montana-Idaho-Wyoming"', add
label define statefip_lbl 68 `"Alaska-Hawaii"', add
label define statefip_lbl 69 `"Nebraska-North Dakota-South Dakota"', add
label define statefip_lbl 70 `"Maine-Massachusetts-New Hampshire-Rhode Island-Vermont"', add
label define statefip_lbl 71 `"Michigan-Wisconsin"', add
label define statefip_lbl 72 `"Minnesota-Iowa"', add
label define statefip_lbl 73 `"Nebraska-North Dakota-South Dakota-Kansas"', add
label define statefip_lbl 74 `"Delaware-Virginia"', add
label define statefip_lbl 75 `"North Carolina-South Carolina"', add
label define statefip_lbl 76 `"Alabama-Mississippi"', add
label define statefip_lbl 77 `"Arkansas-Oklahoma"', add
label define statefip_lbl 78 `"Arizona-New Mexico-Colorado"', add
label define statefip_lbl 79 `"Idaho-Wyoming-Utah-Montana-Nevada"', add
label define statefip_lbl 80 `"Alaska-Washington-Hawaii"', add
label define statefip_lbl 81 `"New Hampshire-Maine-Vermont-Rhode Island"', add
label define statefip_lbl 83 `"South Carolina-Georgia"', add
label define statefip_lbl 84 `"Kentucky-Tennessee"', add
label define statefip_lbl 85 `"Arkansas-Louisiana-Oklahoma"', add
label define statefip_lbl 87 `"Iowa-N Dakota-S Dakota-Nebraska-Kansas-Minnesota-Missouri"', add
label define statefip_lbl 88 `"Washington-Oregon-Alaska-Hawaii"', add
label define statefip_lbl 89 `"Montana-Wyoming-Colorado-New Mexico-Utah-Nevada-Arizona"', add
label define statefip_lbl 90 `"Delaware-Maryland-Virginia-West Virginia"', add
label define statefip_lbl 99 `"State not identified"', add
label values statefip statefip_lbl
label define age_lbl 00 `"Under 1 year"'
label define age_lbl 01 `"1"', add
label define age_lbl 02 `"2"', add
label define age_lbl 03 `"3"', add
label define age_lbl 04 `"4"', add
label define age_lbl 05 `"5"', add
label define age_lbl 06 `"6"', add
label define age_lbl 07 `"7"', add
label define age_lbl 08 `"8"', add
label define age_lbl 09 `"9"', add
label define age_lbl 10 `"10"', add
label define age_lbl 11 `"11"', add
label define age_lbl 12 `"12"', add
label define age_lbl 13 `"13"', add
label define age_lbl 14 `"14"', add
label define age_lbl 15 `"15"', add
label define age_lbl 16 `"16"', add
label define age_lbl 17 `"17"', add
label define age_lbl 18 `"18"', add
label define age_lbl 19 `"19"', add
label define age_lbl 20 `"20"', add
label define age_lbl 21 `"21"', add
label define age_lbl 22 `"22"', add
label define age_lbl 23 `"23"', add
label define age_lbl 24 `"24"', add
label define age_lbl 25 `"25"', add
label define age_lbl 26 `"26"', add
label define age_lbl 27 `"27"', add
label define age_lbl 28 `"28"', add
label define age_lbl 29 `"29"', add
label define age_lbl 30 `"30"', add
label define age_lbl 31 `"31"', add
label define age_lbl 32 `"32"', add
label define age_lbl 33 `"33"', add
label define age_lbl 34 `"34"', add
label define age_lbl 35 `"35"', add
label define age_lbl 36 `"36"', add
label define age_lbl 37 `"37"', add
label define age_lbl 38 `"38"', add
label define age_lbl 39 `"39"', add
label define age_lbl 40 `"40"', add
label define age_lbl 41 `"41"', add
label define age_lbl 42 `"42"', add
label define age_lbl 43 `"43"', add
label define age_lbl 44 `"44"', add
label define age_lbl 45 `"45"', add
label define age_lbl 46 `"46"', add
label define age_lbl 47 `"47"', add
label define age_lbl 48 `"48"', add
label define age_lbl 49 `"49"', add
label define age_lbl 50 `"50"', add
label define age_lbl 51 `"51"', add
label define age_lbl 52 `"52"', add
label define age_lbl 53 `"53"', add
label define age_lbl 54 `"54"', add
label define age_lbl 55 `"55"', add
label define age_lbl 56 `"56"', add
label define age_lbl 57 `"57"', add
label define age_lbl 58 `"58"', add
label define age_lbl 59 `"59"', add
label define age_lbl 60 `"60"', add
label define age_lbl 61 `"61"', add
label define age_lbl 62 `"62"', add
label define age_lbl 63 `"63"', add
label define age_lbl 64 `"64"', add
label define age_lbl 65 `"65"', add
label define age_lbl 66 `"66"', add
label define age_lbl 67 `"67"', add
label define age_lbl 68 `"68"', add
label define age_lbl 69 `"69"', add
label define age_lbl 70 `"70"', add
label define age_lbl 71 `"71"', add
label define age_lbl 72 `"72"', add
label define age_lbl 73 `"73"', add
label define age_lbl 74 `"74"', add
label define age_lbl 75 `"75"', add
label define age_lbl 76 `"76"', add
label define age_lbl 77 `"77"', add
label define age_lbl 78 `"78"', add
label define age_lbl 79 `"79"', add
label define age_lbl 80 `"80"', add
label define age_lbl 81 `"81"', add
label define age_lbl 82 `"82"', add
label define age_lbl 83 `"83"', add
label define age_lbl 84 `"84"', add
label define age_lbl 85 `"85"', add
label define age_lbl 86 `"86"', add
label define age_lbl 87 `"87"', add
label define age_lbl 88 `"88"', add
label define age_lbl 89 `"89"', add
label define age_lbl 90 `"90 (90+, 1988-2002)"', add
label define age_lbl 91 `"91"', add
label define age_lbl 92 `"92"', add
label define age_lbl 93 `"93"', add
label define age_lbl 94 `"94"', add
label define age_lbl 95 `"95"', add
label define age_lbl 96 `"96"', add
label define age_lbl 97 `"97"', add
label define age_lbl 98 `"98"', add
label define age_lbl 99 `"99+"', add
label values age age_lbl
label define sex_lbl 1 `"Male"'
label define sex_lbl 2 `"Female"', add
label define sex_lbl 9 `"NIU"', add
label values sex sex_lbl
label define race_lbl 100 `"White"'
label define race_lbl 200 `"Black"', add
label define race_lbl 300 `"American Indian/Aleut/Eskimo"', add
label define race_lbl 650 `"Asian or Pacific Islander"', add
label define race_lbl 651 `"Asian only"', add
label define race_lbl 652 `"Hawaiian/Pacific Islander only"', add
label define race_lbl 700 `"Other (single) race, n.e.c."', add
label define race_lbl 801 `"White-Black"', add
label define race_lbl 802 `"White-American Indian"', add
label define race_lbl 803 `"White-Asian"', add
label define race_lbl 804 `"White-Hawaiian/Pacific Islander"', add
label define race_lbl 805 `"Black-American Indian"', add
label define race_lbl 806 `"Black-Asian"', add
label define race_lbl 807 `"Black-Hawaiian/Pacific Islander"', add
label define race_lbl 808 `"American Indian-Asian"', add
label define race_lbl 809 `"Asian-Hawaiian/Pacific Islander"', add
label define race_lbl 810 `"White-Black-American Indian"', add
label define race_lbl 811 `"White-Black-Asian"', add
label define race_lbl 812 `"White-American Indian-Asian"', add
label define race_lbl 813 `"White-Asian-Hawaiian/Pacific Islander"', add
label define race_lbl 814 `"White-Black-American Indian-Asian"', add
label define race_lbl 815 `"American Indian-Hawaiian/Pacific Islander"', add
label define race_lbl 816 `"White-Black--Hawaiian/Pacific Islander"', add
label define race_lbl 817 `"White-American Indian-Hawaiian/Pacific Islander"', add
label define race_lbl 818 `"Black-American Indian-Asian"', add
label define race_lbl 819 `"White-American Indian-Asian-Hawaiian/Pacific Islander"', add
label define race_lbl 820 `"Two or three races, unspecified"', add
label define race_lbl 830 `"Four or five races, unspecified"', add
label define race_lbl 999 `"Blank"', add
label values race race_lbl
label define educ_lbl 000 `"NIU or no schooling"'
label define educ_lbl 001 `"NIU or blank"', add
label define educ_lbl 002 `"None or preschool"', add
label define educ_lbl 010 `"Grades 1, 2, 3, or 4"', add
label define educ_lbl 011 `"Grade 1"', add
label define educ_lbl 012 `"Grade 2"', add
label define educ_lbl 013 `"Grade 3"', add
label define educ_lbl 014 `"Grade 4"', add
label define educ_lbl 020 `"Grades 5 or 6"', add
label define educ_lbl 021 `"Grade 5"', add
label define educ_lbl 022 `"Grade 6"', add
label define educ_lbl 030 `"Grades 7 or 8"', add
label define educ_lbl 031 `"Grade 7"', add
label define educ_lbl 032 `"Grade 8"', add
label define educ_lbl 040 `"Grade 9"', add
label define educ_lbl 050 `"Grade 10"', add
label define educ_lbl 060 `"Grade 11"', add
label define educ_lbl 070 `"Grade 12"', add
label define educ_lbl 071 `"12th grade, no diploma"', add
label define educ_lbl 072 `"12th grade, diploma unclear"', add
label define educ_lbl 073 `"High school diploma or equivalent"', add
label define educ_lbl 080 `"1 year of college"', add
label define educ_lbl 081 `"Some college but no degree"', add
label define educ_lbl 090 `"2 years of college"', add
label define educ_lbl 091 `"Associate's degree, occupational/vocational program"', add
label define educ_lbl 092 `"Associate's degree, academic program"', add
label define educ_lbl 100 `"3 years of college"', add
label define educ_lbl 110 `"4 years of college"', add
label define educ_lbl 111 `"Bachelor's degree"', add
label define educ_lbl 120 `"5+ years of college"', add
label define educ_lbl 121 `"5 years of college"', add
label define educ_lbl 122 `"6+ years of college"', add
label define educ_lbl 123 `"Master's degree"', add
label define educ_lbl 124 `"Professional school degree"', add
label define educ_lbl 125 `"Doctorate degree"', add
label define educ_lbl 999 `"Missing/Unknown"', add
label values educ educ_lbl
*"
save "hw1.dta", replace
replace inctot=. if inctot==999999999
replace inctot=. if inctot==999999998
* But this is only for 60s data, so don't need it
replace inctot=0 if inctot<0
reg inctot race if year==2010
reg inctot race if year==2019