Introduction Course Stata Axe Santé HES-SO Valais-Wallis

Today is 19 Dec 2021

This is the version of 20. December 2021

Preface

This introductory course was financed by the Axe Santé / Health Technology Innovation Center of the HES-SO Valais-Wallis.

Just a few words on how the web pages of this course were made: I used dyndoc (see, for example, this YouTube video here).

Here is another nice website for a tutorial on [how to use dyndoc.](https://www.techtips.surveydesign.com.au/post/how-to-create-an-html-webpage-in-stata-using-markdown)

In a nutshell: I created a txt file in which I can mix comments, text and Stata code. I used some twists: for the comments, I had problems with the "[comment]: <> (this is a comment )" approach, as some comment lines were still shown in the HTML document.

Therefore I used the solution proposed here.

I added some jQuery to create a simple table of contents (see here) and some CSS (see here).

In the end, you put everything in one folder, set the working directory of Stata to this folder, and then type dyndoc name_of_the_textfile.txt in the command line of Stata. If you want to overwrite an existing HTML file, add , replace to the command.

Of course, more time would buy me the opportunity to make this better looking…

Philosophy of this Course

This is not a course on statistics, just an introduction to Stata. We use Stata 17 for this course, but most things will work in older versions too. By the way, it's Stata and not, as a lot of people write in their publications, STATA. So behave and don't shout (see here for the proof). There is a face-to-face course with this content; however, the course can also be done alone by just following these pages. As almost always, there will be some errors, and I would be happy to receive comments.

Stata has an excellent collection of books; check out their bookstore.

Different Flavours of Stata

There are different flavours of Stata, and before you buy a licence, you should check this website to decide which one you need.

First Tour of Stata after Installation

By the way, Stata has its own YouTube channel.

Stata has several windows that you will use. You can move these windows and rearrange them if you like (see a short video here). By the way, if you rearrange the windows, you might also want to change the colours: Edit -> Preferences -> General preferences -> Results and Viewer, and then choose, e.g., Classic (see video here).

Settings

Some settings can easily be found by clicking through the Edit -> Preferences menus; however, it would be nice to do this programmatically. Type query in the command line (and hit enter). You'll see a list of settings you can change.
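As a minimal sketch of changing a setting programmatically: the setting linesize and the value 100 are only picked for illustration and are not a recommendation of this course.

```stata
* Show all current settings
query

* Show only the output-related settings
query output

* Change the line width of the Results window for this session
set linesize 100

* Read the current value back from the c() system values
display c(linesize)
```

Settings changed this way normally last for the current session only; help set lists the settings that can be made permanent.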

Three Ways to Interact with Stata

  1. The command window: here you write commands and send them by pressing enter to Stata
  2. The point-and-click menu "Statistics" or "Graphics"
  3. The do-files via the do editor window

We will not talk much about the point-and-click menu. During the first lessons, we will mainly use the command window. Of course, every time you want something to be stored and reproducible, you should create a do-file.

Please see this video for a short illustration of the three methods.
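To give an idea of what a do-file might contain, here is a minimal sketch. The file name is hypothetical, and the auto dataset that ships with Stata is used instead of the course data; the version line assumes Stata 17, as used in this course.

```stata
* my_first_dofile.do  (hypothetical file name)
version 17            // interpret the commands as Stata 17 does
clear all             // start from an empty session
sysuse auto, clear    // example dataset shipped with Stata
summarize price mpg   // the analysis you want to keep reproducible
```

If the file is saved under that name in the working directory, typing do my_first_dofile.do in the command window runs all lines at once.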

Stata Commands and Ado Commands

Most commands you will use are already built into Stata. From time to time, you might want to do something that Stata cannot do exactly, or you do not like the way Stata does it; then you can download user-written commands, so-called ado-files. If you know the name of an ado-file, you can install it from the command line with ssc install name_of_ado, or you can search for it with findit name_of_ado.
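A short sketch of how installing a user-written command could look. The package estout is only an example of a package available on SSC, not one this course depends on, and the commands need an internet connection.

```stata
* estout serves only as an example of a package on SSC
ssc describe estout           // read the package description first
ssc install estout, replace   // download and install (or reinstall)
which esttab                  // check where the installed command lives
ado dir                       // list all installed user-written packages
```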

If you would like to watch a video that shows how you can write your own commands, you find one here.

Updating Stata

Minor updates, i.e. the ones that do not require a new licence, can be installed by typing update all in the command line; typing update query first shows whether updates are available. If you want to update ado-files, type ado update in the command line. Stata will show a list of ado-files that can be updated. You can then update them with ado update, update.

Looking for Help

If you know the name of a command for which you want some help, just type help name_of_command, for example: help table. In the help file, you will also see a link to the PDF documentation, which is just great and contains many examples. Most or all examples can be run with datasets that are provided by Stata. You can load these datasets with sysuse (for example sysuse auto). Some examples use webuse to download a dataset from a repository. If you type help resources in the command line, a list of very useful resources pops up. Stata has a very active user community, and you will also find very useful help online.
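The help and example-data commands mentioned above can be tried like this. This is only a sketch: nlswork is just one of the datasets used in the Stata manuals, and webuse needs an internet connection.

```stata
help summarize          // opens the help file in the Viewer
sysuse dir              // list the datasets installed with Stata
sysuse auto, clear      // load one of them
webuse nlswork, clear   // download an example dataset (needs internet)
describe, short         // quick overview of what was loaded
```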

Add some Data to Play Around

To play around with some commands, we first load the data. Copy the following into your command line: use http://people.ucalgary.ca/~patten/Datasets/deidentified_dataset.dta

You can have a look at the data by typing browse (or just br) in the command line. If you type edit or ed in the command line, you can even edit the data, which we normally don't want. Most datasets come with some notes. So type notes into the command line and see what happens. We can also add a note of our own to the dataset. Try this out and write the following in the command line:

*notes: We downloaded this dataset with the following command use http://people.ucalgary.ca/~patten/Datasets/deidentified_dataset.dta into Stata. The paper can be found here: https://link.springer.com/article/10.1186/s12875-018-0862-y*

Then type notes again into the command line. Our new note will now appear. You can also add notes to a variable (which the authors already did) by writing *notes variablename: Age in years*. You can add more than one note to each variable. We can use the command *describe* to get an overview of the data. You might also want to [look here for other useful commands to explore a dataset](https://stats.idre.ucla.edu/stata/seminars/notes/stata-class-notesexploring-data/). For example, the command *codebook* produces a codebook, in the example below just for two variables. The command *labelbook* presents all the value labels stored with the dataset, in the example below just for the labels named sex and SF369C.

. use http://people.ucalgary.ca/~patten/Datasets/deidentified_dataset.dta

. describe 

Contains data from http://people.ucalgary.ca/~patten/Datasets/deidentified_datas
> et.dta
 Observations:           516                  
    Variables:           133                  13 Oct 2018 14:07
--------------------------------------------------------------------------------
Variable      Storage   Display    Value
    name         type    format    label      Variable label
--------------------------------------------------------------------------------
idno            int     %8.0g                 
PRACTICE        float   %9.0g                 
collectionpoint byte    %8.0g      COLLECTI   Data collection point
ageR            float   %9.0g                 Age Group
CSRI2           byte    %8.0g      sex        What is your gender?
CSRI3           byte    %8.0g      CSRI3      What is your marital status?
CSRI4           byte    %8.0g      CSRI4      What is the highest degree or
                                                level of school you have
                                                completed?
CSRI5           byte    %8.0g      CSRI5      What is your total household
                                                income?
CSRI6           byte    %8.0g      CSRI6      What is your mother tongue?
CSRI7           byte    %8.0g      CSRI7      Country of Birth?
CSRI8           byte    %8.0g      CSRI8      Who do you usually live with?
CSRI9           byte    %8.0g      CSRI9      Current employment status
TIME            float   %9.0g                 
PHQ91           byte    %8.0g      PHQ91      Little interest or pleasure in
                                                doing things
PHQ92           byte    %8.0g      PHQ92      Feeling down, depressed, or
                                                hopeless
PHQ93           byte    %8.0g      PHQ93      Trouble falling or staying asleep,
                                                or sleeping too much
PHQ94           byte    %8.0g      PHQ94      Feeling tired or having little
                                                energy
PHQ95           byte    %8.0g      PHQ95      Poor appetite or overeating
PHQ96           byte    %8.0g      PHQ96    * Feeling bad about yourself or that
                                                you are a failure or have let
                                                yourself or you
PHQ97           byte    %8.0g      PHQ97    * Trouble concentrating on things,
                                                such as reading the newspaper or
                                                watching telev
PHQ98           byte    %8.0g      PHQ98    * Moving or speaking so slowly that
                                                other poeple could have noticed?
                                                Or the opposi
PHQ99           byte    %8.0g      PHQ99      Thoughts that you would be better
                                                off dead or of hurting yourself
                                                in some way
PHQ_TOT         byte    %8.0g                 
LEAPS1          byte    %8.0g      LEAPS1     What kind of paid work do you do
LEAPS2          byte    %8.0g      LEAPS2     hours scheduled or expected to
                                                work
LEAPS3          byte    %8.0g      LEAPS3     hours of work missed because of
                                                the way you were feeling
LEAPS4A         byte    %8.0g      LEAPS4A    low energy or motivation
LEAPS4B         byte    %8.0g      LEAPS4B    Poor concentration or memory
LEAPS4C         byte    %8.0g      LEAPS4C    Anxiety or irritability
LEAPS4D         byte    %8.0g      LEAPS4D    Getting less work done
LEAPS4E         byte    %8.0g      LEAPS4E    Doing poor quality work
LEAPS4F         byte    %8.0g      LEAPS4F    Making more mistakes
LEAPS4G         byte    %8.0g      LEAPS4G    Having trouble getting along with
                                                people, or avioding them
LEAPS_M         double  %12.0g                
LEAPS_tot       double  %12.0g                
SDS1            byte    %8.0g      SDS1       The symptoms have disrupted your
                                                work/school work:
SDS2            byte    %8.0g      SDS2       The symptoms have disrupted your
                                                social life/leisure activities:
SDS3            byte    %8.0g      SDS3       The symptoms have disrupted your
                                                family life/home
                                                responsibilities:
SDS_M           double  %12.0g                
SDSTOT_fixed    float   %9.0g                 
SDS4            byte    %8.0g      SDS4       days last week lost
SDS5            byte    %8.0g      SDS5       days last week unproductive
SF361           byte    %8.0g      SF361      In general, would you say your
                                                health is
SF362           byte    %8.0g      SF362      Compared to one year ago, how
                                                would you rate your health in
                                                general now?
SF363A          byte    %8.0g      SF363A   * The following questions are about
                                                activities you might do during a
                                                typical day.
SF363B          byte    %8.0g      SF363B   * Moderate activities, such as
                                                moving a table, pushing a vacuum
                                                cleaner, bowling,
SF363C          byte    %8.0g      SF363C     Lifting or carrying groceries
SF363D          byte    %8.0g      SF363D     Climbing several flights of stairs
SF363E          byte    %8.0g      SF363E     Climbing one flight of stairs
SF363F          byte    %8.0g      SF363F     Bending, kneeling, or stopping
SF363G          byte    %8.0g      SF363G     Walking more than a mile
SF363H          byte    %8.0g      SF363H     Walking several hundred yards
SF363I          byte    %8.0g      SF363I     Walking one hundred yards
SF363J          byte    %8.0g      SF363J     Bathing or dressing yourself
SF364A          byte    %8.0g      SF364A   * During the past 4 weeks, how much
                                                of the time have you had any of
                                                the following
SF364B          byte    %8.0g      SF364B     Accomplished less than you would
                                                like
SF364C          byte    %8.0g      SF364C     Were limited in the kind of work
                                                or other activities
SF364D          byte    %8.0g      SF364D   * Had difficulty performing the work
                                                or other activities (for
                                                example, it took ext
SF365A          byte    %8.0g      SF365A   * During the past 4 weeks, how much
                                                of the time have you had any of
                                                the following
SF365B          byte    %8.0g      SF365B     Accomplished less than you would
                                                like
SF365C          byte    %8.0g      SF365C     Did work or other activities less
                                                carefully than usual
SF366           byte    %8.0g      SF366    * During the past 4 weeks, to what
                                                extent has your physical health
                                                or emotional pr
SF367           byte    %8.0g      SF367      How much bodily pain have you had
                                                during the past 4 weeks?
SF368           byte    %8.0g      SF368    * During the past 4 weeks, how much
                                                did pain interfere with your
                                                normal work (incl
SF369A          byte    %8.0g      SF369A   * These questions are about how you
                                                feel and how things have been
                                                with you during
SF369B          byte    %8.0g      SF369B     Have you been very nervous?
SF369C          byte    %8.0g      SF369C     Have you felt so down in the dumps
                                                that nothing could cheer you up?
SF369D          byte    %8.0g      SF369D     Have you felt calm and peaceful?
SF369E          byte    %8.0g      SF369E     Did you have a lot of energy?
SF369F          byte    %8.0g      SF369F     Have you felt downhearted and
                                                depressed?
SF369G          byte    %8.0g      SF369G     Did you feel worn out?
SF369H          byte    %8.0g      SF369H     Have you been happy?
SF369I          byte    %8.0g      SF369I     Did you feel tired?
SF3610          byte    %8.0g      SF3610   * During the past 4 weeks, how much
                                                of the time has your physical
                                                health or emotio
SF3611A         byte    %8.0g      SF3611A  * How true or false is each of the
                                                following statements for you? I
                                                seem to get sic
SF3611B         byte    %8.0g      SF3611B    I am as healthy as anybody I know
SF3611C         byte    %8.0g      SF3611C    I expect my health to get worse
SF3611D         byte    %8.0g      SF3611D    My health is excellent
SF36_PF         float   %9.0g                 
SF36_RP         float   %9.0g                 
SF36_RE         float   %9.0g                 
SF36_VT         float   %9.0g                 
SF36_MH         float   %9.0g                 
SF36_SF         float   %9.0g                 
SF36_BP         float   %9.0g                 
SF36_GH         float   %9.0g                 
SF36_TOT        float   %9.0g                 
CSIa            byte    %23.0g     CSI1       The services I get here are a big
                                                help to me
CSIb            byte    %23.0g     LABK       People here really seem to care
                                                about me
CSIc            byte    %23.0g     LABK       I would come back here if I need
                                                help again
CSI4            byte    %23.0g     LABK       I feel that no one here really
                                                listens to me
CSI4_R          float   %9.0g                 
CSIe            byte    %23.0g     LABK       People here treat me like a
                                                person, not like a number
CSIf            byte    %23.0g     LABK       I have learned a lot here about
                                                how to deal with my problems
CSI7            byte    %23.0g     LABK       People here want to do things
                                                their way, instead of helping me
                                                find my way
CSI7_R          float   %9.0g                 
CSIh            byte    %23.0g     LABK       I would recommend this place to
                                                people I care about
CSIi            byte    %23.0g     LABK       People here really know what they
                                                are doing
CSIj            byte    %23.0g     LABK       I get the kind of help here that I
                                                really need
CSIk            byte    %23.0g     LABK       People here accept me for who I am
CSIl            byte    %23.0g     LABK       I feel much better now than when I
                                                first came here
CSIm            byte    %23.0g     LABK       I thought no one could help me
                                                until I came here
CSIn            byte    %23.0g     LABK       The help I get here is really
                                                worth what it costs
CSIo            byte    %23.0g     LABK       People here put my needs ahead of
                                                their needs
CSI16           byte    %23.0g     LABK       People here put me down when I
                                                disagree with them
CSI16_R         float   %9.0g                 
CSI1q           byte    %23.0g     LABK       The biggest help I get here is
                                                learning how to help myself
CSI18           byte    %23.0g     LABK       People here are just trying to get
                                                rid of me
CSI18_R         float   %9.0g                 
CSIs            byte    %23.0g     LABK       People who know me say this place
                                                has made a positive change in me
CSIt            byte    %23.0g     LABK       People here have shown me how to
                                                get help from other places
CSIu            byte    %23.0g     LABK       People here seem to understand how
                                                I feel
CSI22           byte    %23.0g     LABK       People here are only concerned
                                                about getting paid
CSI22_R         float   %9.0g                 
CSIw            byte    %23.0g     LABK       I feel I can really talk to people
                                                here
CSIx            byte    %23.0g     LABK       The help I get here is better than
                                                I expected
CSIy            byte    %8.0g      CSI25      I look forward to the sessions I
                                                have with people here
csimean         float   %9.0g                 
csi_tot         float   %9.0g                 
sci_SF_mean     float   %9.0g                 
csi_SF_tot      float   %9.0g                 
T1              float   %9.0g                 One month indicator
T2              float   %9.0g                 Two month indicator
T3              float   %9.0g                 Three month indicator
T6              float   %9.0g                 Six month indicator
grp             float   %12.0g     group_labels
                                              Study group
int1            float   %9.0g                 Month one by treatment interaction
int2            float   %9.0g                 Month two by treatment interaction
int3            float   %9.0g                 Month three by treatment
                                                interaction
int6            float   %9.0g                 Month four by treatment
                                                interaction
Tb              float   %9.0g                 Baseline PHQ9 total score
employment      float   %9.0g                 
antidep         float   %9.0g      YesNo      antidepressant use (any time
                                                point)
                                            * indicated variables have notes
--------------------------------------------------------------------------------
Sorted by: TIME

. codebook ageR LEAPS1

--------------------------------------------------------------------------------
ageR                                                                   Age Group
--------------------------------------------------------------------------------

                  Type: Numeric (float)

                 Range: [1,6]                         Units: 1
         Unique values: 6                         Missing .: 152/516

            Tabulation: Freq.  Value
                           66  1
                           98  2
                           78  3
                           59  4
                           37  5
                           26  6
                          152  .

--------------------------------------------------------------------------------
LEAPS1                                         What kind of paid work do you do 
--------------------------------------------------------------------------------

                  Type: Numeric (byte)
                 Label: LEAPS1

                 Range: [1,27]                        Units: 1
         Unique values: 19                        Missing .: 270/516

              Examples: 10    Human Resources
                        21    Other
                        .     
                        .     

. labelbook  sex SF369C 

--------------------------------------------------------------------------------
Value label SF369C 
--------------------------------------------------------------------------------

      Values                                    Labels
       Range:  [-99,6]                   String length:  [11,20]
           N:  7                 Unique at full length:  yes
        Gaps:  yes                 Unique at length 12:  yes
  Missing .*:  0                           Null string:  no
                               Leading/trailing blanks:  no
                                    Numeric -> numeric:  no
  Definition
         -99   Not Applicable
           1   All of the time
           2   Most of the time
           3   Some of the time
           4   A little of the time
           5   None of the time
           6   No response

   Variables:  SF369C


--------------------------------------------------------------------------------
Value label sex 
--------------------------------------------------------------------------------

      Values                                    Labels
       Range:  [0,1]                     String length:  [4,6]
           N:  2                 Unique at full length:  yes
        Gaps:  no                  Unique at length 12:  yes
  Missing .*:  0                           Null string:  no
                               Leading/trailing blanks:  no
                                    Numeric -> numeric:  no
  Definition
           0   female
           1   male

   Variables:  CSRI2
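If the download of the course data does not work, the same commands can be tried on a dataset that ships with Stata. The following is only a sketch on the auto data; the note texts are made up for illustration.

```stata
sysuse auto, clear

* Add a note to the dataset and one to a variable (both texts are made up)
notes: Practice note added during the Stata course
notes mpg: Mileage is measured in miles per gallon

* Show all notes, then explore variables and value labels
notes
describe make price mpg foreign
codebook mpg foreign
labelbook origin        // origin is the value label attached to foreign
```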


We can also add a label to a dataset, which is shown when we use describe. Be aware that this overwrites an existing data label.

. label data "This is our exercise data set"

. label data "This is our data set to play around"

We will talk more about data management later, but here is how to save a dataset: the command save saves it to the current working directory, which you can display by typing pwd. So normally, at the beginning of a working session (most often in a do-file), you would set the working directory with, for example:

cd "F:\Dropbox\000_homepages\pt-wissen\stata"

You can also do this by clicking on File -> Change working directory… Alternatively, you can put a path in front of the dataset name, either an absolute path or a relative path.

Absolute path: "C:\Windows\calc.exe"

Relative path: "../02_data/file_to_load.dta" (the ../ tells Stata to go one folder up, then enter the 02_data folder and load the file file_to_load.dta from there).

. pwd
F:\Dropbox\000_homepages\pt-wissen\stata

. save "this_is_our_exercise_dataset.dta", replace 
file this_is_our_exercise_dataset.dta saved
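The steps above (label the data, check the working directory, save) can be sketched on the auto dataset as follows. The file name auto_practice.dta is hypothetical, and the file is written to whatever folder pwd reports.

```stata
sysuse auto, clear
label data "Auto data for practice"   // overwrites the existing data label

pwd                                    // show the current working directory
save "auto_practice.dta", replace      // hypothetical file name, saved there
use "auto_practice.dta", clear         // load it again from the same place
describe, short                        // header information only
```

The option replace is needed only if a file of that name already exists; without it, Stata refuses to overwrite the file.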

The Command Syntax

The basic anatomy of a command line is: [prefix:] command [varlist] [if] [in] [weight] [, options]

The parts in [] above are optional; you don't always need them.

Here is an example: the prefix bysort: can be put in front of a command to run it separately for each group (stratum) of a variable. *bysort ageR: sum LEAPS_tot*

. bysort ageR: sum LEAPS_tot 

--------------------------------------------------------------------------------
-> ageR = 1

    Variable |        Obs        Mean    Std. dev.       Min        Max
-------------+---------------------------------------------------------
   LEAPS_tot |         40      7.9125    7.556528          0         28

--------------------------------------------------------------------------------
-> ageR = 2

    Variable |        Obs        Mean    Std. dev.       Min        Max
-------------+---------------------------------------------------------
   LEAPS_tot |         75    5.813333    4.625702          0         21

--------------------------------------------------------------------------------
-> ageR = 3

    Variable |        Obs        Mean    Std. dev.       Min        Max
-------------+---------------------------------------------------------
   LEAPS_tot |         60    6.166667     5.90078          0         27

--------------------------------------------------------------------------------
-> ageR = 4

    Variable |        Obs        Mean    Std. dev.       Min        Max
-------------+---------------------------------------------------------
   LEAPS_tot |         40       7.725    5.444298          0         22

--------------------------------------------------------------------------------
-> ageR = 5

    Variable |        Obs        Mean    Std. dev.       Min        Max
-------------+---------------------------------------------------------
   LEAPS_tot |          8       5.625    4.438066          0         12

--------------------------------------------------------------------------------
-> ageR = 6

    Variable |        Obs        Mean    Std. dev.       Min        Max
-------------+---------------------------------------------------------
   LEAPS_tot |          0

--------------------------------------------------------------------------------
-> ageR = .

    Variable |        Obs        Mean    Std. dev.       Min        Max
-------------+---------------------------------------------------------
   LEAPS_tot |          0


A similar result, even a bit nicer, can be obtained by using the option by() instead of the prefix bysort:

. tabstat LEAPS_tot, by(ageR) stats(mean, sd, min max, p25, p50, p75, n)

Summary for variables: LEAPS_tot
Group variable: ageR (Age Group)

    ageR |      Mean        SD       Min       Max       p25       p50       p75
---------+----------------------------------------------------------------------
       1 |    7.9125  7.556528         0        28       2.5         5        13
       2 |  5.813333  4.625702         0        21         2         5         8
       3 |  6.166667   5.90078         0        27         2         5       9.5
       4 |     7.725  5.444298         0        22         4       6.5        10
       5 |     5.625  4.438066         0        12         1         7       8.5
       6 |         .         .         .         .         .         .         .
---------+----------------------------------------------------------------------
   Total |  6.621076  5.751314         0        28         2         5         9
--------------------------------------------------------------------------------

    ageR |         N
---------+----------
       1 |        40
       2 |        75
       3 |        60
       4 |        40
       5 |         8
       6 |         0
---------+----------
   Total |       223
--------------------
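The difference between a prefix and an option can also be tried on a dataset that ships with Stata. A minimal sketch, assuming the auto data with foreign as the grouping variable:

```stata
sysuse auto, clear

* Prefix: run summarize separately for each level of foreign
bysort foreign: summarize mpg

* Option: one compact table with the statistics you ask for
tabstat mpg, by(foreign) statistics(mean sd min max n)
```

The prefix works with most commands, whereas by() is an option that only some commands, such as tabstat, offer.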

Let's look at the [if] part. Pay special attention to the second example, where we combine two conditions with or (written as |).

. summarize LEAPS_tot if ageR==1

    Variable |        Obs        Mean    Std. dev.       Min        Max
-------------+---------------------------------------------------------
   LEAPS_tot |         40      7.9125    7.556528          0         28

. summarize LEAPS_tot if ageR==1 | ageR==2

    Variable |        Obs        Mean    Std. dev.       Min        Max
-------------+---------------------------------------------------------
   LEAPS_tot |        115    6.543478    5.867905          0         28
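A few more variations of the if qualifier, sketched on the auto dataset (an assumption, chosen because it ships with Stata). The last two lines illustrate a common pitfall: Stata treats missing values as larger than any number, so a "greater than" condition also selects observations with missing values unless you exclude them.

```stata
sysuse auto, clear

* Both conditions must hold: foreign cars with more than 25 mpg
summarize price if foreign == 1 & mpg > 25

* Either condition may hold
count if rep78 == 1 | rep78 == 2

* Missing values count as larger than any number, so this includes them ...
count if rep78 > 3

* ... and this excludes them
count if rep78 > 3 & !missing(rep78)
```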

We can also show how the [in] part works. Have a look at the following example (of course, the result depends on how the data are sorted):

You can sort in ascending order with the command sort. If you want to sort in descending order, use gsort with a minus sign in front of the variable name (gsort +varname = ascending, gsort -varname = descending).

. sort idno

. list LEAPS_tot in 1/10

     +----------+
     | LEAPS_~t |
     |----------|
  1. |       13 |
  2. |        . |
  3. |        5 |
  4. |        7 |
  5. |        . |
     |----------|
  6. |        . |
  7. |        . |
  8. |        . |
  9. |        4 |
 10. |       12 |
     +----------+

or like this:

. sort idno

. count 
  516

. list LEAPS_tot in 15

     +----------+
     | LEAPS_~t |
     |----------|
 15. |        . |
     +----------+

A negative number counts from the end of the dataset; the lowercase letters f and l stand for the first observation (f) and the last observation (l).

or like this:

. sort idno

. count 
  516

. list LEAPS_tot in -10/l

     +----------+
     | LEAPS_~t |
     |----------|
507. |        9 |
508. |        2 |
509. |        6 |
510. |        5 |
511. |        6 |
     |----------|
512. |        3 |
513. |        7 |
514. |        8 |
515. |        1 |
516. |        0 |
     +----------+
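Sorting and the in qualifier can be combined to look at the top or bottom of a ranking. A small sketch on the auto dataset (an assumption; the course data are not needed for it):

```stata
sysuse auto, clear

sort price                   // ascending order
list make price in 1/3       // the three cheapest cars

gsort -price                 // descending order
list make price in f/3       // now the three most expensive cars
list make price in -3/l      // and the three cheapest at the end
```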

Variable Names

Stata is case sensitive. Variable names can't start with a number. A variable name can have at most 32 characters. If there is no ambiguity, you don't need to write the whole variable name; if there is ambiguity, however, Stata will throw an error. You can also use * as a wildcard for zero, one or several characters.

Look at the example: the variable is called ageR, but it's enough to write age. However, because there are several variables that start with LEAPS, the second command does not run. That's why I added the prefix *capture:* in front of it, so that the programme continues to run despite the error (otherwise, this HTML page would not be created). The error message, which is now suppressed by *capture:*, would be: LEAP ambiguous abbreviation. The third line, however, works.

. sum age 

    Variable |        Obs        Mean    Std. dev.       Min        Max
-------------+---------------------------------------------------------
        ageR |        364    2.947802    1.486401          1          6

. capture: sum LEAP // This will not run, error: LEAP ambiguous abbreviation

. sum LEAP*

    Variable |        Obs        Mean    Std. dev.       Min        Max
-------------+---------------------------------------------------------
      LEAPS1 |        246    14.93902    7.840784          1         27
      LEAPS2 |        229    7.262009    2.578827          1         11
      LEAPS3 |        226         1.5    1.567021          1          9
     LEAPS4A |        223    1.367713    1.056613          0          4
     LEAPS4B |        222    1.085586    1.005356          0          4
-------------+---------------------------------------------------------
     LEAPS4C |        223    1.278027    1.108449          0          4
     LEAPS4D |        223    .8654709    1.039673          0          4
     LEAPS4E |        222    .5630631    .8734835          0          4
     LEAPS4F |        223     .632287    .8643269          0          4
     LEAPS4G |        223    .8206278    1.144562          0          4
-------------+---------------------------------------------------------
     LEAPS_M |        223     .945868    .8216163          0          4
   LEAPS_tot |        223    6.621076    5.751314          0         28
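Here is a small sketch of a few more ways to use abbreviations and wildcards, again with variable names from our example dataset. The prefix capture noisily is a variant of capture that still shows the error message, so you can see what went wrong while a do-file keeps running.

```stata
* Sketch: wildcards and captured errors
summarize PHQ9*                  // all variables whose names start with PHQ9
describe *_R                     // all variables whose names end with _R
capture noisily summarize LEAP   // the error message is shown, but the run continues
display _rc                      // return code of the captured command (0 means no error)
```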

Variable Labels

There are three kinds of labels in Stata: the data label, which we already saw above, the value labels, which we will look at later, and the variable labels. In our example dataset, some variables already have labels. You can see the labels either in the Variables window, or you can type describe in the command line.

. describe

Contains data from this_is_our_exercise_dataset.dta
 Observations:           516                  This is our data set to play
                                                around
    Variables:           133                  19 Dec 2021 09:46
--------------------------------------------------------------------------------
Variable      Storage   Display    Value
    name         type    format    label      Variable label
--------------------------------------------------------------------------------
idno            int     %8.0g                 
PRACTICE        float   %9.0g                 
collectionpoint byte    %8.0g      COLLECTI   Data collection point
ageR            float   %9.0g                 Age Group
CSRI2           byte    %8.0g      sex        What is your gender?
CSRI3           byte    %8.0g      CSRI3      What is your marital status?
CSRI4           byte    %8.0g      CSRI4      What is the highest degree or
                                                level of school you have
                                                completed?
CSRI5           byte    %8.0g      CSRI5      What is your total household
                                                income?
CSRI6           byte    %8.0g      CSRI6      What is your mother tongue?
CSRI7           byte    %8.0g      CSRI7      Country of Birth?
CSRI8           byte    %8.0g      CSRI8      Who do you usually live with?
CSRI9           byte    %8.0g      CSRI9      Current employment status
TIME            float   %9.0g                 
PHQ91           byte    %8.0g      PHQ91      Little interest or pleasure in
                                                doing things
PHQ92           byte    %8.0g      PHQ92      Feeling down, depressed, or
                                                hopeless
PHQ93           byte    %8.0g      PHQ93      Trouble falling or staying asleep,
                                                or sleeping too much
PHQ94           byte    %8.0g      PHQ94      Feeling tired or having little
                                                energy
PHQ95           byte    %8.0g      PHQ95      Poor appetite or overeating
PHQ96           byte    %8.0g      PHQ96    * Feeling bad about yourself or that
                                                you are a failure or have let
                                                yourself or you
PHQ97           byte    %8.0g      PHQ97    * Trouble concentrating on things,
                                                such as reading the newspaper or
                                                watching telev
PHQ98           byte    %8.0g      PHQ98    * Moving or speaking so slowly that
                                                other poeple could have noticed?
                                                Or the opposi
PHQ99           byte    %8.0g      PHQ99      Thoughts that you would be better
                                                off dead or of hurting yourself
                                                in some way
PHQ_TOT         byte    %8.0g                 
LEAPS1          byte    %8.0g      LEAPS1     What kind of paid work do you do
LEAPS2          byte    %8.0g      LEAPS2     hours scheduled or expected to
                                                work
LEAPS3          byte    %8.0g      LEAPS3     hours of work missed because of
                                                the way you were feeling
LEAPS4A         byte    %8.0g      LEAPS4A    low energy or motivation
LEAPS4B         byte    %8.0g      LEAPS4B    Poor concentration or memory
LEAPS4C         byte    %8.0g      LEAPS4C    Anxiety or irritability
LEAPS4D         byte    %8.0g      LEAPS4D    Getting less work done
LEAPS4E         byte    %8.0g      LEAPS4E    Doing poor quality work
LEAPS4F         byte    %8.0g      LEAPS4F    Making more mistakes
LEAPS4G         byte    %8.0g      LEAPS4G    Having trouble getting along with
                                                people, or avioding them
LEAPS_M         double  %12.0g                
LEAPS_tot       double  %12.0g                
SDS1            byte    %8.0g      SDS1       The symptoms have disrupted your
                                                work/school work:
SDS2            byte    %8.0g      SDS2       The symptoms have disrupted your
                                                social life/leisure activities:
SDS3            byte    %8.0g      SDS3       The symptoms have disrupted your
                                                family life/home
                                                responsibilities:
SDS_M           double  %12.0g                
SDSTOT_fixed    float   %9.0g                 
SDS4            byte    %8.0g      SDS4       days last week lost
SDS5            byte    %8.0g      SDS5       days last week unproductive
SF361           byte    %8.0g      SF361      In general, would you say your
                                                health is
SF362           byte    %8.0g      SF362      Compared to one year ago, how
                                                would you rate your health in
                                                general now?
SF363A          byte    %8.0g      SF363A   * The following questions are about
                                                activities you might do during a
                                                typical day.
SF363B          byte    %8.0g      SF363B   * Moderate activities, such as
                                                moving a table, pushing a vacuum
                                                cleaner, bowling,
SF363C          byte    %8.0g      SF363C     Lifting or carrying groceries
SF363D          byte    %8.0g      SF363D     Climbing several flights of stairs
SF363E          byte    %8.0g      SF363E     Climbing one flight of stairs
SF363F          byte    %8.0g      SF363F     Bending, kneeling, or stopping
SF363G          byte    %8.0g      SF363G     Walking more than a mile
SF363H          byte    %8.0g      SF363H     Walking several hundred yards
SF363I          byte    %8.0g      SF363I     Walking one hundred yards
SF363J          byte    %8.0g      SF363J     Bathing or dressing yourself
SF364A          byte    %8.0g      SF364A   * During the past 4 weeks, how much
                                                of the time have you had any of
                                                the following
SF364B          byte    %8.0g      SF364B     Accomplished less than you would
                                                like
SF364C          byte    %8.0g      SF364C     Were limited in the kind of work
                                                or other activities
SF364D          byte    %8.0g      SF364D   * Had difficulty performing the work
                                                or other activities (for
                                                example, it took ext
SF365A          byte    %8.0g      SF365A   * During the past 4 weeks, how much
                                                of the time have you had any of
                                                the following
SF365B          byte    %8.0g      SF365B     Accomplished less than you would
                                                like
SF365C          byte    %8.0g      SF365C     Did work or other activities less
                                                carefully than usual
SF366           byte    %8.0g      SF366    * During the past 4 weeks, to what
                                                extent has your physical health
                                                or emotional pr
SF367           byte    %8.0g      SF367      How much bodily pain have you had
                                                during the past 4 weeks?
SF368           byte    %8.0g      SF368    * During the past 4 weeks, how much
                                                did pain interfere with your
                                                normal work (incl
SF369A          byte    %8.0g      SF369A   * These questions are about how you
                                                feel and how things have been
                                                with you during
SF369B          byte    %8.0g      SF369B     Have you been very nervous?
SF369C          byte    %8.0g      SF369C     Have you felt so down in the dumps
                                                that nothing could cheer you up?
SF369D          byte    %8.0g      SF369D     Have you felt calm and peaceful?
SF369E          byte    %8.0g      SF369E     Did you have a lot of energy?
SF369F          byte    %8.0g      SF369F     Have you felt downhearted and
                                                depressed?
SF369G          byte    %8.0g      SF369G     Did you feel worn out?
SF369H          byte    %8.0g      SF369H     Have you been happy?
SF369I          byte    %8.0g      SF369I     Did you feel tired?
SF3610          byte    %8.0g      SF3610   * During the past 4 weeks, how much
                                                of the time has your physical
                                                health or emotio
SF3611A         byte    %8.0g      SF3611A  * How true or false is each of the
                                                following statements for you? I
                                                seem to get sic
SF3611B         byte    %8.0g      SF3611B    I am as healthy as anybody I know
SF3611C         byte    %8.0g      SF3611C    I expect my health to get worse
SF3611D         byte    %8.0g      SF3611D    My health is excellent
SF36_PF         float   %9.0g                 
SF36_RP         float   %9.0g                 
SF36_RE         float   %9.0g                 
SF36_VT         float   %9.0g                 
SF36_MH         float   %9.0g                 
SF36_SF         float   %9.0g                 
SF36_BP         float   %9.0g                 
SF36_GH         float   %9.0g                 
SF36_TOT        float   %9.0g                 
CSIa            byte    %23.0g     CSI1       The services I get here are a big
                                                help to me
CSIb            byte    %23.0g     LABK       People here really seem to care
                                                about me
CSIc            byte    %23.0g     LABK       I would come back here if I need
                                                help again
CSI4            byte    %23.0g     LABK       I feel that no one here really
                                                listens to me
CSI4_R          float   %9.0g                 
CSIe            byte    %23.0g     LABK       People here treat me like a
                                                person, not like a number
CSIf            byte    %23.0g     LABK       I have learned a lot here about
                                                how to deal with my problems
CSI7            byte    %23.0g     LABK       People here want to do things
                                                their way, instead of helping me
                                                find my way
CSI7_R          float   %9.0g                 
CSIh            byte    %23.0g     LABK       I would recommend this place to
                                                people I care about
CSIi            byte    %23.0g     LABK       People here really know what they
                                                are doing
CSIj            byte    %23.0g     LABK       I get the kind of help here that I
                                                really need
CSIk            byte    %23.0g     LABK       People here accept me for who I am
CSIl            byte    %23.0g     LABK       I feel much better now than when I
                                                first came here
CSIm            byte    %23.0g     LABK       I thought no one could help me
                                                until I came here
CSIn            byte    %23.0g     LABK       The help I get here is really
                                                worth what it costs
CSIo            byte    %23.0g     LABK       People here put my needs ahead of
                                                their needs
CSI16           byte    %23.0g     LABK       People here put me down when I
                                                disagree with them
CSI16_R         float   %9.0g                 
CSI1q           byte    %23.0g     LABK       The biggest help I get here is
                                                learning how to help myself
CSI18           byte    %23.0g     LABK       People here are just trying to get
                                                rid of me
CSI18_R         float   %9.0g                 
CSIs            byte    %23.0g     LABK       People who know me say this place
                                                has made a positive change in me
CSIt            byte    %23.0g     LABK       People here have shown me how to
                                                get help from other places
CSIu            byte    %23.0g     LABK       People here seem to understand how
                                                I feel
CSI22           byte    %23.0g     LABK       People here are only concerned
                                                about getting paid
CSI22_R         float   %9.0g                 
CSIw            byte    %23.0g     LABK       I feel I can really talk to people
                                                here
CSIx            byte    %23.0g     LABK       The help I get here is better than
                                                I expected
CSIy            byte    %8.0g      CSI25      I look forward to the sessions I
                                                have with people here
csimean         float   %9.0g                 
csi_tot         float   %9.0g                 
sci_SF_mean     float   %9.0g                 
csi_SF_tot      float   %9.0g                 
T1              float   %9.0g                 One month indicator
T2              float   %9.0g                 Two month indicator
T3              float   %9.0g                 Three month indicator
T6              float   %9.0g                 Six month indicator
grp             float   %12.0g     group_labels
                                              Study group
int1            float   %9.0g                 Month one by treatment interaction
int2            float   %9.0g                 Month two by treatment interaction
int3            float   %9.0g                 Month three by treatment
                                                interaction
int6            float   %9.0g                 Month four by treatment
                                                interaction
Tb              float   %9.0g                 Baseline PHQ9 total score
employment      float   %9.0g                 
antidep         float   %9.0g      YesNo      antidepressant use (any time
                                                point)
                                            * indicated variables have notes
--------------------------------------------------------------------------------
Sorted by: idno

Let’s take a few minutes to skim through the help file for labels. Type help labels in the command line. This opens the Viewer window with the help for labels, which is a shorter version of the manual entry. If you want the longer version, click on (View complete PDF manual entry) in the second line of the help file.

. help labels

In the help file we saw that it is quite easy to label a variable. So let’s label the variable idno, which does not have a label yet.

. label variable idno "Identification Number of Participants"

. describe idno

Variable      Storage   Display    Value
    name         type    format    label      Variable label
--------------------------------------------------------------------------------
idno            int     %8.0g                 Identification Number of
                                                Participants

Can we change a variable label? Let’s try it (it works fine):

. label variable idno "Identification Code of Participants"

. describe idno

Variable      Storage   Display    Value
    name         type    format    label      Variable label
--------------------------------------------------------------------------------
idno            int     %8.0g                 Identification Code of
                                                Participants
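As a further sketch, we could label one of the variables that has no label yet and then search the names and labels with lookfor. The label text for PHQ_TOT is my assumption about what this variable contains. Variable labels can be up to 80 characters long.

```stata
* Sketch: add a label and search names and labels
label variable PHQ_TOT "PHQ-9 total score"   // assumed meaning of PHQ_TOT
describe PHQ_TOT
lookfor pain                                  // lists variables with "pain" in their name or label
```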

Types of Variables

Maybe it’s time to look at the different data types. The best way to learn this is with the PDF manual. We can easily access this chapter via the help file for data types. There, we click on the second line (View complete PDF manual entry), and in the PDF we click on See [U] 12 Data for details.

We can also visit the same document online by clicking here. Of course I will not repeat here what we can read in the help chapter. So come back here after reading chapter 12 on Data.

. help data types

We should also read chapter 24 (working with strings) and chapter 26 (working with categorical data and factor variables) of the manual. We save chapter 25 (working with dates and times) for later.

Now, let’s do an exercise. We will create a new variable for sex. We will first create it as a string variable with the content Women and Men (a label for Other is added at the end of the exercise). This way of storing the information on sex is not efficient; it is better to store it as a numeric categorical variable. Therefore, we then create a numeric categorical variable from the string variable.

. tab CSRI2

    What is |
       your |
    gender? |      Freq.     Percent        Cum.
------------+-----------------------------------
     female |        264       72.53       72.53
       male |        100       27.47      100.00
------------+-----------------------------------
      Total |        364      100.00

. tab CSRI2, nol 

    What is |
       your |
    gender? |      Freq.     Percent        Cum.
------------+-----------------------------------
          0 |        264       72.53       72.53
          1 |        100       27.47      100.00
------------+-----------------------------------
      Total |        364      100.00

. gen sex_string=""
(516 missing values generated)

. replace sex_string="Men" if CSRI2==1
variable sex_string was str1 now str3
(100 real changes made)

. replace sex_string="Women" if CSRI2==0
variable sex_string was str3 now str5
(264 real changes made)

. tab sex_string CSRI2

           | What is your gender?
sex_string |    female       male |     Total
-----------+----------------------+----------
       Men |         0        100 |       100 
     Women |       264          0 |       264 
-----------+----------------------+----------
     Total |       264        100 |       364 

As I wrote above, storing data like this is not the best way, so most often we want to create a numeric variable and add a value label to each value.

. encode sex_string, gen(sex_num)

. tab sex_string sex_num

           |        sex_num
sex_string |       Men      Women |     Total
-----------+----------------------+----------
       Men |       100          0 |       100 
     Women |         0        264 |       264 
-----------+----------------------+----------
     Total |       100        264 |       364 

. tab sex_num 

    sex_num |      Freq.     Percent        Cum.
------------+-----------------------------------
        Men |        100       27.47       27.47
      Women |        264       72.53      100.00
------------+-----------------------------------
      Total |        364      100.00

. tab sex_num, nol

    sex_num |      Freq.     Percent        Cum.
------------+-----------------------------------
          1 |        100       27.47       27.47
          2 |        264       72.53      100.00
------------+-----------------------------------
      Total |        364      100.00

. recode sex_num (1=0) (2=1)
(364 changes made to sex_num)

. tab sex_num 

    sex_num |      Freq.     Percent        Cum.
------------+-----------------------------------
          0 |        100       27.47       27.47
        Men |        264       72.53      100.00
------------+-----------------------------------
      Total |        364      100.00

. labelbook sex_num 

--------------------------------------------------------------------------------
Value label sex_num 
--------------------------------------------------------------------------------

      Values                                    Labels
       Range:  [1,2]                     String length:  [3,5]
           N:  2                 Unique at full length:  yes
        Gaps:  no                  Unique at length 12:  yes
  Missing .*:  0                           Null string:  no
                               Leading/trailing blanks:  no
                                    Numeric -> numeric:  no
  Definition
           1   Men
           2   Women

   Variables:  sex_num


. label define sex_num 1"Women" 0"Men", modify

. label value sex_num sex_num 

. tab sex_num CSRI2

           | What is your gender?
   sex_num |    female       male |     Total
-----------+----------------------+----------
       Men |         0        100 |       100 
     Women |       264          0 |       264 
-----------+----------------------+----------
     Total |       264        100 |       364 

In our data, we have only two values for sex. But even though there is no other value, we could add a label for potential other values, such as “Other”. We could overwrite the existing label (which is named sex_num), or we could create a new label. Let’s call it sex_WomenMenOther.

. label define sex_WomenMenOther 0"Men" 1"Women" 2"Other"

. label value sex_num sex_WomenMenOther

. tab sex_num, mi

    sex_num |      Freq.     Percent        Cum.
------------+-----------------------------------
        Men |        100       19.38       19.38
      Women |        264       51.16       70.54
          . |        152       29.46      100.00
------------+-----------------------------------
      Total |        516      100.00

Please also read the PDF manual entry for encode and the help file for recode.
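The recode step in the exercise can be avoided if the value label exists before encode is run. This is only a sketch: the names sex_lbl and sex_num2 are made up for this illustration. With the label() option, encode uses the codes that are already defined in the named value label instead of numbering the strings alphabetically.

```stata
* Sketch: define the coding first, then let encode use it
label define sex_lbl 0 "Men" 1 "Women" 2 "Other"      // sex_lbl is a hypothetical label name
encode sex_string, generate(sex_num2) label(sex_lbl)  // sex_num2 is a hypothetical variable name
tab sex_num2 CSRI2                                    // check against the original variable
tab sex_num2, nolabel                                 // show the underlying numbers
```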

Simple Tables

We use the new table command that is available since version 17. After the table command, you put two sets of parentheses, the first for the row variables and the second for the column variables. Look at this example (the second variant is without the totals):

. table (ageR)(CSRI2)

-------------------------------------
          |    What is your gender?  
          |   female    male    Total
----------+--------------------------
Age Group |                          
  1       |       53      13       66
  2       |       72      26       98
  3       |       55      23       78
  4       |       36      23       59
  5       |       27      10       37
  6       |       21       5       26
  Total   |      264     100      364
-------------------------------------

. table (ageR)(CSRI2), nototals

-----------------------------------
          |   What is your gender? 
          |       female       male
----------+------------------------
Age Group |                        
  1       |           53         13
  2       |           72         26
  3       |           55         23
  4       |           36         23
  5       |           27         10
  6       |           21          5
-----------------------------------
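One of the two sets of parentheses can also be left empty, and the cells do not have to contain counts. The following lines are a sketch with variables from our example dataset. The choice of PHQ_TOT for the mean and the display format are just for illustration.

```stata
* Sketch: one-way tables and a statistic other than the frequency
table (ageR)                     // row variable only
table () (CSRI2)                 // column variable only
table (ageR) (CSRI2), statistic(mean PHQ_TOT) nformat(%9.2f mean)   // mean of PHQ_TOT per cell, two decimals
```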

We can also create nested tables.

. table (ageR, CSRI2) (collectionpoint)

-------------------------------------------------------------------------
                         |              Data collection point            
                         |  1 Month   2 Month   3 Month   6 Month   Total
-------------------------+-----------------------------------------------
Age Group                |                                               
  1                      |                                               
    What is your gender? |                                               
      female             |       16        12        12        13      53
      male               |        4         3         3         3      13
      Total              |       20        15        15        16      66
  2                      |                                               
    What is your gender? |                                               
      female             |       18        17        16        21      72
      male               |        6         5         7         8      26
      Total              |       24        22        23        29      98
  3                      |                                               
    What is your gender? |                                               
      female             |       16        12        13        14      55
      male               |        6         7         6         4      23
      Total              |       22        19        19        18      78
  4                      |                                               
    What is your gender? |                                               
      female             |       10         9         9         8      36
      male               |        7         5         5         6      23
      Total              |       17        14        14        14      59
  5                      |                                               
    What is your gender? |                                               
      female             |        8         7         6         6      27
      male               |        3         3         2         2      10
      Total              |       11        10         8         8      37
  6                      |                                               
    What is your gender? |                                               
      female             |        5         4         6         6      21
      male               |        1         1         1         2       5
      Total              |        6         5         7         8      26
  Total                  |                                               
    What is your gender? |                                               
      female             |       73        61        62        68     264
      male               |       27        24        24        25     100
      Total              |      100        85        86        93     364
-------------------------------------------------------------------------

Let’s add the percentages, too:

. table (ageR, CSRI2) (collectionpoint),statistic(frequency) statistic(percent)

--------------------------------------------------------------------------
                         |               Data collection point            
                         |  1 Month   2 Month   3 Month   6 Month    Total
-------------------------+------------------------------------------------
Age Group                |                                                
  1                      |                                                
    What is your gender? |                                                
      female             |                                                
        Frequency        |       16        12        12        13       53
        Percent          |     4.40      3.30      3.30      3.57    14.56
      male               |                                                
        Frequency        |        4         3         3         3       13
        Percent          |     1.10      0.82      0.82      0.82     3.57
      Total              |                                                
        Frequency        |       20        15        15        16       66
        Percent          |     5.49      4.12      4.12      4.40    18.13
  2                      |                                                
    What is your gender? |                                                
      female             |                                                
        Frequency        |       18        17        16        21       72
        Percent          |     4.95      4.67      4.40      5.77    19.78
      male               |                                                
        Frequency        |        6         5         7         8       26
        Percent          |     1.65      1.37      1.92      2.20     7.14
      Total              |                                                
        Frequency        |       24        22        23        29       98
        Percent          |     6.59      6.04      6.32      7.97    26.92
  3                      |                                                
    What is your gender? |                                                
      female             |                                                
        Frequency        |       16        12        13        14       55
        Percent          |     4.40      3.30      3.57      3.85    15.11
      male               |                                                
        Frequency        |        6         7         6         4       23
        Percent          |     1.65      1.92      1.65      1.10     6.32
      Total              |                                                
        Frequency        |       22        19        19        18       78
        Percent          |     6.04      5.22      5.22      4.95    21.43
  4                      |                                                
    What is your gender? |                                                
      female             |                                                
        Frequency        |       10         9         9         8       36
        Percent          |     2.75      2.47      2.47      2.20     9.89
      male               |                                                
        Frequency        |        7         5         5         6       23
        Percent          |     1.92      1.37      1.37      1.65     6.32
      Total              |                                                
        Frequency        |       17        14        14        14       59
        Percent          |     4.67      3.85      3.85      3.85    16.21
  5                      |                                                
    What is your gender? |                                                
      female             |                                                
        Frequency        |        8         7         6         6       27
        Percent          |     2.20      1.92      1.65      1.65     7.42
      male               |                                                
        Frequency        |        3         3         2         2       10
        Percent          |     0.82      0.82      0.55      0.55     2.75
      Total              |                                                
        Frequency        |       11        10         8         8       37
        Percent          |     3.02      2.75      2.20      2.20    10.16
  6                      |                                                
    What is your gender? |                                                
      female             |                                                
        Frequency        |        5         4         6         6       21
        Percent          |     1.37      1.10      1.65      1.65     5.77
      male               |                                                
        Frequency        |        1         1         1         2        5
        Percent          |     0.27      0.27      0.27      0.55     1.37
      Total              |                                                
        Frequency        |        6         5         7         8       26
        Percent          |     1.65      1.37      1.92      2.20     7.14
  Total                  |                                                
    What is your gender? |                                                
      female             |                                                
        Frequency        |       73        61        62        68      264
        Percent          |    20.05     16.76     17.03     18.68    72.53
      male               |                                                
        Frequency        |       27        24        24        25      100
        Percent          |     7.42      6.59      6.59      6.87    27.47
      Total              |                                                
        Frequency        |      100        85        86        93      364
        Percent          |    27.47     23.35     23.63     25.55   100.00
--------------------------------------------------------------------------

An excellent video on how to build tables in Stata can be found here; there is also a [second part for two-way tables](https://youtu.be/u_Efw1oWxWk).

You might also find these webpages interesting:

Stata introduction on tables

Stata tables, part 1

Stata 17, part 2, the new collect command for tables

Stata tables, part 3

Stata tables, part 4

Stata tables, part 5

Stata tables, part 6

Stata tables, part 7

Create and Replace Variables

If we create a numeric variable, its default storage type is float. You can change this default with set type double; check the current settings with query. Maybe you want to read this webpage.

. gen numeric_float_variable=rnormal(50, 12)

. describe numeric_float_variable

Variable      Storage   Display    Value
    name         type    format    label      Variable label
--------------------------------------------------------------------------------
numeric_float~e float   %9.0g                 

. sum numeric_float_variable

    Variable |        Obs        Mean    Std. dev.       Min        Max
-------------+---------------------------------------------------------
numeric_fl~e |        516    50.38684    11.60436   11.99885   92.54129

Sometimes the storage type of a variable causes trouble. In the following example, comparing a float variable with a number fails:

. generate x = 1.1

. count if x==1.1
  0

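We would expect count if to find observations, because we just set x to 1.1 in every observation. It finds none because 1.1 has no exact binary representation: x stores it with float precision, while the literal 1.1 in the comparison is evaluated with double precision, so the two values differ slightly. The next code sections show two solutions: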

<<dd_do>>
count if x==float(1.1)
<</dd_do>>

Here we specified that the comparison is made with “float” precision. In the next solution, we create the variable as double (which has more precision).

. gen double y=1.1

. describe y

Variable      Storage   Display    Value
    name         type    format    label      Variable label
--------------------------------------------------------------------------------
y               double  %10.0g                

. count if y==1.1
  516

You find the explanation here.
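To see where the mismatch comes from, it can help to print both values with many decimals. The following is a minimal sketch and not part of the course data: it builds a small throwaway dataset (the five observations and the variable name x_demo are arbitrary choices) and compares the float and the double version of 1.1.

```stata
clear
set obs 5                          // arbitrary small dataset, for illustration only
generate float x_demo = 1.1        // stored with float precision

* Print the float value and the double-precision literal with 18 decimals
display %21.18f float(1.1)
display %21.18f 1.1

count if x_demo == 1.1             // float compared with double: no match
count if x_demo == float(1.1)      // both sides rounded to float: all match
```

The two displayed numbers differ after the seventh decimal, which is why the first count finds nothing.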

When you calculate with dates and times, it is important to store them as double, see here.

Back to replacing values of variables. Often you generate an empty variable and then you replace it based on conditions.

. tab CSRI2

    What is |
       your |
    gender? |      Freq.     Percent        Cum.
------------+-----------------------------------
     female |        264       72.53       72.53
       male |        100       27.47      100.00
------------+-----------------------------------
      Total |        364      100.00

. tab CSRI2, nol 

    What is |
       your |
    gender? |      Freq.     Percent        Cum.
------------+-----------------------------------
          0 |        264       72.53       72.53
          1 |        100       27.47      100.00
------------+-----------------------------------
      Total |        364      100.00

. capture: drop sex 

. gen employed_women=. 
(516 missing values generated)

. replace employed_women = 1 if CSRI2 == 0 & employment == 1
(119 real changes made)

. replace employed_women = 0 if CSRI2 == 1 & employment == 0 
(47 real changes made)

. label define empl_wom 0 "Unemployed (Men or Women)" 1"Employed Women"

. label values employed_women empl_wom

. tab employed_women

           employed_women |      Freq.     Percent        Cum.
--------------------------+-----------------------------------
Unemployed (Men or Women) |         47       28.31       28.31
           Employed Women |        119       71.69      100.00
--------------------------+-----------------------------------
                    Total |        166      100.00
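One thing to watch with this generate-then-replace pattern is how conditions treat missing values: Stata treats a missing value as larger than any number, so a condition such as rep78 >= 4 is also true when rep78 is missing. Here is a minimal sketch on the auto dataset that ships with Stata (not the course data); the variable name good_repair, the label name and the cut-off 4 are made up for illustration.

```stata
sysuse auto, clear

* Start with an empty variable, then fill it by condition
generate byte good_repair = .
replace good_repair = 1 if rep78 >= 4 & !missing(rep78)   // exclude missing explicitly
replace good_repair = 0 if rep78 <  4

label define good_repair_lbl 0 "Repair record 1-3" 1 "Repair record 4-5"
label values good_repair good_repair_lbl

* The option missing shows the observations that stayed empty
tabulate good_repair, missing
```

Without the !missing(rep78) part, the cars with an unknown repair record would be coded as 1.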

Create Variables with Extended gen egen

We will often use extended generate functions, as implemented in egen, for example to count the number of missing values across a series of variables.

. egen missing_LEAPS = rowmiss(LEAPS1 - LEAPS4G)

. tab missing_LEAPS

missing_LEA |
         PS |      Freq.     Percent        Cum.
------------+-----------------------------------
          0 |        217       42.05       42.05
          1 |          4        0.78       42.83
          2 |          2        0.39       43.22
          7 |          6        1.16       44.38
          8 |          2        0.39       44.77
          9 |         16        3.10       47.87
         10 |        269       52.13      100.00
------------+-----------------------------------
      Total |        516      100.00

There are many other egen functions, so check out the help file help egen.
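To give an idea of what else egen offers, here is a short sketch with three commonly used functions. It uses the auto dataset shipped with Stata instead of the course data, and the names of the new variables are arbitrary.

```stata
sysuse auto, clear

* Number of missing values per observation across several variables
egen n_missing = rowmiss(price mpg rep78)

* Row-wise mean of two variables (missing values are ignored)
egen mean_size = rowmean(trunk headroom)

* Group-wise statistic: mean price within each level of foreign
egen mean_price_origin = mean(price), by(foreign)

tabulate n_missing
```

Functions starting with row work across variables within one observation, while functions such as mean() work across observations, optionally within groups.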

Working with Dates and Times

For those who worked with older Stata versions: Stata has added new date and time features, see here. We will now save the current dataset and load an example with dates and times.

. save exercise.dta, replace 
file exercise.dta saved

. // import delimited dates_and_times.csv, clear // if you have saved it locally
>  
. use http://pt-wissen.ch/stata/dates_and_times.dta, clear  

The first date is saved as a string variable, so we first need to transform it:

. gen tagmonatjahr=date( Tag_Monat_Jahr, "DMY")

Open the data in the browser and have a look at what we have created. The values are the number of days since 1 January 1960. Now we want to transform this into something more readable.

. format tagmonatjahr %td

Now we can read it.

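For the variable with the time, dmy_hms, we need to create a variable of type double, because datetime values count the milliseconds since 1 January 1960 and are too large to be stored exactly as float: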

. gen double tagmonatjahr_stundeminutesekunde=clock(dmy_hms, "DMYhms")

. format tagmonatjahr_stundeminutesekunde %tc

. list tagmonatjahr tagmonatjahr_stundeminutesekunde

     +--------------------------------+
     | tagmona~r   tagmonatjahr_stu~e |
     |--------------------------------|
  1. | 28jan2020   28jan2020 17:55:05 |
  2. | 03feb2019   03feb2019 09:30:10 |
  3. | 01dec1987   01dec1987 20:30:10 |
  4. | 08sep1821   08sep1821 10:45:20 |
     +--------------------------------+
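Why the double matters can be checked directly. The sketch below is an illustration on a one-row throwaway dataset (the variable names s, t_double and t_float are hypothetical): it stores the same datetime once as double and once as float and compares the two.

```stata
clear
set obs 1
generate str19 s = "28.01.2020 17:55:05"

generate double t_double = clock(s, "DMYhms")   // exact number of milliseconds
generate float  t_float  = clock(s, "DMYhms")   // rounded to float precision

format t_double t_float %tc
list t_double t_float

* The double matches the datetime literal, the float does not
assert t_double == tc(28jan2020 17:55:05)
assert t_float  != t_double
```

The float version is off by some seconds or minutes, which is easy to overlook in a listing.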

. gen date_of_birth = date(dob, "DMY")

. format date_of_birth %td 

. gen date_of_death = date(dod, "DMY")
(2 missing values generated)

. format date_of_death %td

. list dob dod date_of_birth date_of_death 

     +-------------------------------------------------+
     | dob          dod          date_~rth   date_~ath |
     |-------------------------------------------------|
  1. | 18.06.1942                18jun1942           . |
  2. | 09.10.1940   08.12.1980   09oct1940   08dec1980 |
  3. | 25.02.1943   29.11.2001   25feb1943   29nov2001 |
  4. | 7.7.1940                  07jul1940           . |
     +-------------------------------------------------+

Calculating the time between two dates is easy:

. gen age_years_at_death_v1 = (date_of_death - date_of_birth)/365.25
(2 missing values generated)

Since Stata 17 there are new functions for this (versions 2 to 4 in the code below).

. gen age_years_at_death_v2=age(date_of_birth, date_of_death)
(2 missing values generated)

. gen age_years_at_death_v3=age_frac(date_of_birth, date_of_death)
(2 missing values generated)

. gen age_years_at_death_v4=datediff(date_of_birth, date_of_death, "year")
(2 missing values generated)
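The versions do not always agree. A small sketch with made-up dates (not from the course data) shows the difference on a first birthday after a non-leap year, where exactly 365 days have passed; the scalar names are arbitrary.

```stata
* Hypothetical dates: born 1 January 2001, evaluated on 1 January 2002
scalar s_birth    = td(01jan2001)
scalar s_birthday = td(01jan2002)

* Dividing by 365.25 gives slightly less than one year
display (s_birthday - s_birth) / 365.25
display floor((s_birthday - s_birth) / 365.25)

* The calendar-based functions count one completed year
display age(s_birth, s_birthday)
display datediff(s_birth, s_birthday, "year")
```

If you need age in completed years, age() or datediff() is therefore the safer choice than dividing by 365.25 and rounding down.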

Exercise: Calculate with Date and Time Variables

Let’s do some exercises: create Stata dates for the other variables, then calculate the time between the release of the White Album and the rooftop concert, as well as the duration of the rooftop concert.

First, we read in some data:

<<dd_do>>
input id str10 Tag_Monat_Jahr str10 Monat_Tag_Jahr str19 dmy_hms str10 dob str10 dod str10 date_event str10 date_release_white_album str16 start_rooftop_concert str16 end_rooftop_concert
1	28.1.2020	1.28.2020	"28:01:2020:17:55:05"	18.06.1942	NA	3.01.1962	22.11.1968	"30.01.1969.12:30"	"30.01.1969.13:12"
2	03.2.2019	2.3.2019	"03.2.2019:09:30:10"	09.10.1940	08.12.1980	3.01.1962	22.11.1968	"30.01.1969.12:30"	"30.01.1969.13:12"
3	01.12.1987	12.01.1987	"01/12/1987/20:30:10"	25.02.1943	29.11.2001	3.01.1962	22.11.1968	"30.01.1969.12:30"	"30.01.1969.13:12"
4	8.9.1821	9.8.1821	"08:09:1821:10:45:20"	7.7.1940	NA	3.01.1962	22.11.1968	"30.01.1969.12:30"	"30.01.1969.13:12"
end
<</dd_do>>

The data are stored as strings, so we need to convert them to a Stata date (if the string contains only a date) or a Stata datetime (if the string contains a date and a time). We start with a string that contains only a date. The first line converts it to a number, the days since 01jan1960.

<<dd_do>>
gen tagmonatjahr=date( Tag_Monat_Jahr, "DMY")
list tagmonatjahr
<</dd_do>>

The second line makes this readable for us.

<<dd_do>>
format tagmonatjahr %td
list tagmonatjahr
<</dd_do>>

If the string variable contains a time, we need to generate a double variable.

<<dd_do>>
gen double tagmonatjahr_stundeminutesekunde=clock(dmy_hms, "DMYhms")
format tagmonatjahr_stundeminutesekunde %tc
<</dd_do>>

The same for the remaining variables:

<<dd_do>>
gen date_of_birth = date(dob, "DMY")
format date_of_birth %td
gen date_of_death = date(dod, "DMY")
format date_of_death %td

list dob dod date_of_birth date_of_death
<</dd_do>>

There are different methods to calculate the years between two timepoints, here the age at death. The first method does not take leap years into account; it just assumes that every year has 365.25 days.

<<dd_do>>
gen age_years_at_death_v1 = (date_of_death - date_of_birth)/365.25
<</dd_do>>

The next method calculates the age in completed years, that is, an integer that is rounded down:

<<dd_do>>
gen age_years_at_death_v2=age(date_of_birth, date_of_death)
<</dd_do>>

The next method calculates years with fractions of a year added:

<<dd_do>>
gen age_years_at_death_v3=age_frac(date_of_birth, date_of_death)
<</dd_do>>

Another method to calculate the time between two timepoints, here in whole years:

<<dd_do>>
gen age_years_at_death_v4=datediff(date_of_birth, date_of_death, "year")
<</dd_do>>

Now we want to calculate the time between the release of the White Album and the start of the rooftop concert. We will run into a problem, because the two variables are not of the same type: the release of the White Album is a date (i.e. no time included), while the start of the rooftop concert contains a time.

<<dd_do>>
gen date_release_white_album_date = date(date_release_white_album, "DMY")
format date_release_white_album_date %td

* datetime values must be stored as double
gen double start_rooftop_concert_dt = clock(start_rooftop_concert, "DMYhm")
format start_rooftop_concert_dt %tc
<</dd_do>>

Therefore, we need to convert one of the variables, see here: https://www.stata.com/manuals13/ddatetime.pdf#ddatetimeSyntaxSIF-to-SIFconversion We decide to convert the date variable date_release_white_album_date to a clock variable (date and time included). This adds a time of 00:00:00 to the date.

<<dd_do>>
* cofd() returns a datetime, so the result must be stored as double
gen double date_release_white_album_dt = cofd(date_release_white_album_date)
format date_release_white_album_dt %tc
list date_release_white_album_dt start_rooftop_concert_dt
<</dd_do>>

Now that we have converted the variable, we can calculate the duration in days between the two variables with clockdiff. If we had converted the clock variable to a date instead, we would have to use datediff instead of clockdiff.

<<dd_do>>
gen days_between_white_album_concert=clockdiff(date_release_white_album_dt, start_rooftop_concert_dt, "day")
list days_between_white_album_concert date_release_white_album_dt start_rooftop_concert_dt
<</dd_do>>
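The last part of the exercise, the duration of the rooftop concert, can be solved in the same way. The following is one possible sketch, not necessarily the intended solution: it re-enters the two strings from the exercise data so that it runs on its own, and the names start_dt, end_dt and the two duration variables are chosen for illustration.

```stata
clear
input str16 start_rooftop_concert str16 end_rooftop_concert
"30.01.1969.12:30" "30.01.1969.13:12"
end

* Both strings contain a time, so both become double datetime variables
generate double start_dt = clock(start_rooftop_concert, "DMYhm")
generate double end_dt   = clock(end_rooftop_concert, "DMYhm")
format start_dt end_dt %tc

* Duration in whole minutes with clockdiff (Stata 17 or newer)
generate duration_min = clockdiff(start_dt, end_dt, "minute")

* The same by hand: difference in milliseconds divided by one minute
generate duration_min_v2 = (end_dt - start_dt) / msofminutes(1)

list start_dt end_dt duration_min duration_min_v2
```

Both variants give the 42 minutes between 12:30 and 13:12.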

Just to give some context to this exercise: a song from the White Album and here an extract of the rooftop concert.

Do-Files

Until now we worked in the Command window. Normally we want every step to be reproducible, so we work in so-called do-files. Stata has its own Do-file Editor, but you can also work in any other editor you prefer. Please check the new features of the Do-file Editor in Stata 17.

Please check out this video on new functions in the do-editor and see here a video on the new bookmark function in the Stata do-editor.

If you prefer a video in German, you can find one here.

In smaller projects, it makes sense to put all code within one do-file. In larger projects, I create one project folder that contains several subfolders, amongst them a folder for the scripts. In this script folder, I have several do-files. One large do-file has the advantage that you know exactly where a specific task was done, namely in this very do-file. You can structure your do-file with sections and you can search it easily. However, it can still be overwhelming to have all the code in one file, and I prefer to split the do-files up by task.

Fig.1 - An example of a bunch of do-files. Trade-off between too large do-files and too many do-files.

The master do-file is the one do-file you run in the end. The master do-file invokes all the other do-files with the command include.

Fig.2 - An example of a master do file. This is the only place where an absolute path will be used - here to set the working directory. Even this path could be made relative.
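As a rough sketch of how such a master do-file could look: the example below first writes two tiny do-files into the working directory so that it can be run on its own as one do-file. The file names step_01_load.do and step_02_describe.do, the file handle fh and the local n_loaded are hypothetical, and the auto dataset stands in for real project data.

```stata
* Create two small do-files for the demonstration (hypothetical names)
capture file close fh
file open fh using "step_01_load.do", write text replace
file write fh "sysuse auto, clear" _n
file write fh "local n_loaded = _N" _n
file close fh

file open fh using "step_02_describe.do", write text replace
file write fh "summarize price mpg" _n
file close fh

* --- the master part: run the steps in order ---
include "step_01_load.do"
include "step_02_describe.do"

* include runs a file as if its lines were written here, so the
* local macro defined in step_01_load.do is still visible
display "Observations loaded: `n_loaded'"
```

This is also the main difference to the command do: with do, local macros defined inside the called do-file would not be visible in the master do-file.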