当前位置：文档之家› Stata面板门槛回归-南开大学王群勇

Stata面板门槛回归-南开大学王群勇

The Stata Journal(2015)

15,Number1,pp.121–134

Fixed-e?ect panel threshold model using Stata

Qunyong Wang

Institute of Statistics and Econometrics

Nankai University

Tianjin,China

QunyongWang@https://www.doczj.com/doc/2811989624.html,

Abstract.Threshold models are widely used in macroeconomics and?nancial

analysis for their simple and obvious economic implications.With these models,

however,estimation and inference is complicated by the existence of nuisance

parameters.To combat this issue,Hansen(1999,Journal of Econometrics93:345–

368)proposed the?xed-e?ect panel threshold model.In this article,I introduce

a new command(xthreg)for implementing this model.I also use Monte Carlo

simulations to show that,although the size distortion of the threshold-e?ect test

is small,the coverage rate of the con?dence interval estimator is unsatisfactory.I

include an example on?nancial constraints(originally from Hansen[1999,Journal

of Econometrics93:345–368])to further demonstrate the use of xthreg.

Keywords:st0373,xthreg,panel threshold,?xed e?ect

1Introduction

Heterogeneity is a common problem of panel data.That is to say,each individual in a study is di?erent,and structural relationships may vary across individuals.The classical ?xed e?ect or random e?ect re?ects only the heterogeneity in intercepts.Hsiao(2003) considers many varying slope models for this problem.Among these models,Hansen’s (1999)panel threshold model has a simple speci?cation but obvious implications for economic policy.Though threshold models are familiar in time-series analysis,their use with panel data has been limited.

The threshold model describes the jumping character or structural break in the re-lationship between variables.This model type is popular in nonlinear time series,one example being the threshold autoregressive(TAR)model(Tong1983).This model can capture many economic phenomena.For example,using?ve-year interval averages of standard measures of?nancial development,in?ation,and growth for84countries from 1960to1995,Rousseau and Wachtel(2009)showed that there is an in?ation thresh-old for the?nance and growth relationship that lies between13–25%.When in?ation exceeds the threshold,?nance ceases to increase economic growth.In?ation’s e?ect on economic growth depends on the in?ation level.High levels of in?ation are harmful to economic growth,while low levels of in?ation are bene?cial to economic growth.As an-other example,the technical spillover of foreign direct investment(FDI)has been widely studied.Girma(2005)found that the productivity bene?t from FDI increases with ab-sorptive capacity until some threshold level,at which point it becomes less pronounced. There is also a minimum absorptive capacity threshold level below which productivity spillovers from FDI are negligible or even negative.

c 2015StataCorp LP st0373

122Fixed-e?ect panel threshold model using Stata This article is arranged as follows.In section2,I review some basic theories about ?xed-e?ect panel threshold models.I then describe the new xthreg command in sec-tion3.In section4,I perform Monte Carlo simulations to study test-power distortion and the coverage rate of con?dence interval estimators in?nite samples.I illustrate use of the command with an example from Hansen(1999)in section5.In section6,I conclude the article.

2Fixed-e?ect panel threshold models

2.1Single-threshold model

Consider the following single-threshold model:

y it=μ+X it(q it<γ)β1+X it(q it≥γ)β2+u i+e it(1) The variable q it is the threshold variable,andγis the threshold parameter that divides the equation into two regimes with coe?cientsβ1andβ2.The parameter u i is the individual e?ect,while e it is the disturbance.We can also write(1)as

y it=μ+X it(q it,γ)β+u i+e it

where

X it(q it,γ)=

X it I(q it<γ) X it I(q it≥γ)

Givenγ,the ordinary least-squares estimator ofβis

β={X?(γ) X?(γ)}?1{X?(γ) y?}

where y?and X?are within-group deviations.The residual sum of squares(RSS)is equal to e? e?.To estimateγ,one can search over a subset of the threshold variable q it. Instead of searching over the whole sample,we restrict the range within the interval (γ,γ),which are quantiles of q it.γ’s estimator is the value that minimizes the RSS,that is,

γ=arg min

S1(γ)

Ifγis known,the model is no di?erent from the ordinary linear model.But if γis unknown,there is a nuisance parameter problem,which makes theγestimator’s distribution nonstandard.Hansen(1999)proved that γis a consistent estimator forγ, and he argued that the best way to testγ=γ0is to form the con?dence interval using the“no-rejection region”method with a likelihood-ratio(LR)statistic,as follows:

LR1(γ)={LR1(γ)?LR1( γ)}

σ2

??→ξ

Pr(x<ξ)=(1?e?x2)2(2)

Q.Wang 123

Given signi?cance level α,the lower limit corresponds to the maximum value in the LR series,which is less than the αquantile,and the upper limit corresponds to the minimum value in the LR series,which is less than the αquantile.The αquantile can be computed from the following inverse function of (2):

c (α)=?2log 1?√1?α

For example,for α=0.1,0.05,and 0.01,the quantiles are 6.53,7.35,and 10.59,respec-tively.If LR 1(γ0)exceeds c (α),then we reject H 0.

Testing for a threshold e?ect is the same as testing for whether the coe?cients are the same in each regime.The null hypothesis and the alternative hypothesis (the linear versus the single-threshold model)are

H 0:β1=β2

H a :β1=β2

The F statistic is constructed as

F 1=

S 0?S 1

σ2

(3)

Under H 0,the threshold γis not identi?ed,and F 1has nonstandard asymptotic

distribution.We use bootstrap on the critical values of the F statistic to test the signi?cance of the threshold e?ect.S 0is the RSS of the linear model.Hansen (1996)suggested the following bootstrap design:

Step 1:Fit the model under H a and obtain the residual e ?it .

Step 2:Make a cluster resampling e ?it with replacement,and obtain the new residual

v ?it .

Step 3:Generate a new series under the H a data-generating process (DGP ),y ?

it =X ?it β+

v ?

it ,where βcan take arbitrary values.

Step 4:Fit the model under H 0and H a ,and compute the F statistic using (3).Step 5:Repeat steps 1–4B times,and the probability of F is Pr =I (F >F 1),namely,

the proportion of F >F 1in bootstrap number B .

2.2Multiple-thresholds model

If there are multiple thresholds (that is,multiple regimes),we ?t the model sequentially.We use a double-threshold model as an example.

y it =μ+X it (q it <γ1)β1+X it (γ1≤q it <γ2)β2+X it (q it ≥γ2)β3+u i +e it Here,γ1and γ2are the thresholds that divide the equation into three regimes with coe?cients β1,β2,and β3.We need to compute this (N ×T )2times using the grid search method,which is infeasible in practice.According to Bai (1997)and Bai and Perron (1998),the sequential estimator is consistent,so we estimate the thresholds as follows:

124Fixed-e?ect panel threshold model using Stata Step1:Fit the single-threshold model to obtain the threshold estimatorγ1and the RSS S1( γ1).

Step2:Given γ1,the second threshold and its con?dence interval are

γr2=arg min

γ2

{S r2(γ2)}

S r2=S{min( γ1,γ2)max( γ1,γ2)}

LR r2(γ2)={S r2(γ2)?S r2( γr2)}

σ222

Step3: γr2is e?cient but γr1is not.We reestimate the?rst threshold as

γr1=arg min

γ1

{S r1(γ1)}

S r1=S{min(γ1, γ2)max(γ1, γ2)}

LR r1(γ1)={S r1(γ1)?S r1( γr1)}

σ221

The threshold-e?ect test is also sequential;that is,if we reject the null hypothesis in a single-threshold model,then we must test the double-threshold model.The null hy-pothesis is a single-threshold model,and the alternative hypothesis is a double-threshold model.The F statistic is constructed as

F2={S1( γ1)?S r2( γr2)}

σ222(4)

The bootstrapping design for this is similar to that in the single-threshold model.In step3,we generate a new series under the H0DGP,y?it=X?itβS+v?it.The estimator βS is the estimator in a single-threshold model,that is,under the H a DGP.We use the

predicted values.

For models with more than two threshold parameters,the process is similar.Chan (1993)and Hansen(1999)show that the dependence of the estimation and inference of βon the threshold estimate is not of?rst-order asymptotic importance,so the inference ofβcan proceed becauseγis given.

Q.Wang125 3The xthreg command

3.1Syntax

xthreg depvar

indepvars

,rx(varlist)qx(varname)

thnum(#)

grid(#)trim(numlist)bs(numlist)thlevel(#)gen(newvarname)noreg nobslog thgiven options

where depvar is the dependent variable and indepvars are the regime-independent vari-ables.

3.2Options

rx(varlist)is the regime-dependent variable.Time-series operators are allowed.rx() is required.

qx(varname)is the threshold variable.Time-series operators are allowed.qx()is required.

thnum(#)is the number of thresholds.In the current version(Stata13),#must be equal to or less than3.The default is thnum(1).

grid(#)is the number of grid points.grid()is used to avoid consuming too much time when computing large samples.The default is grid(300).

trim(numlist)is the trimming proportion to estimate each threshold.The number of trimming proportions must be equal to the number of thresholds speci?ed in thnum().The default is trim(0.01)for all thresholds.For example,to?t a triple-threshold model,you may set trim(0.010.010.05).

bs(numlist)is the number of bootstrap replications.If bs()is not set,xthreg does not use bootstrap for the threshold-e?ect test.

thlevel(#)speci?es the con?dence level,as a percentage,for con?dence intervals of the threshold.The default is thlevel(95).

gen(newvarname)generates a new categorical variable with0,1,2,...for each regime.

The default is gen(cat).

noreg suppresses the display of the regression result.

nobslog suppresses the iteration process of the bootstrap.

thgiven?ts the model based on previous results.

options are any options available for xtreg(see[XT]xtreg).

Time-series operators are allowed in depvar,indepvars,rx(),and qx().

126Fixed-e?ect panel threshold model using Stata 3.3Stored results

xthreg uses xtreg(see[XT]xtreg)to?t the?xed-e?ect panel threshold model given the threshold estimator.Along with the standard stored results from xtreg,xthreg also stores the following results in e():

Scalars

e(thnum)number of thresholds

e(grid)number of grid search points

Macros

e(depvar)name of dependent variable

e(ix)regime-independent variables

e(rx)regime-dependent variables

e(qx)threshold variable

Matrices

e(Thrss)threshold estimator and con?dence interval

e(Fstat)threshold-e?ect test result

e(bs)bootstrap number

e(trim)trimming proportion

e(LR)LR statistics for single-threshold model

e(LR21)LR statistics for?rst threshold in double-threshold model

e(LR22)LR statistics for second threshold in double-threshold model

e(LR3)LR statistics for third threshold in triple-threshold model

The?xed-e?ect panel threshold model requires balanced panel data,which is checked automatically by xthreg.The estimation and test of the threshold e?ect are computed in Mata.

You can?t the model with more thresholds using the previous result.For example, assume you use xthreg to?t a single-threshold model by using thnum(1),but the single threshold is not su?cient to capture the nonlinear e?ect.To now?t a double-threshold model,you can run the xthreg command using thgiven.Stata will search the second threshold using the previous result and will not?t the single-threshold model.I illustrate this in the example below.

4Monte Carlo simulation

xthreg implements the method by Hansen(1999),in which the bootstrap method is used to test the null hypothesis of no threshold.Under the null,the distribution is continuous,so the bootstrap method can be applied.However,there is no formal justi?cation for using the bootstrap test with a multiple-threshold model.Moreover,if bootstrap is used to create con?dence intervals for the threshold model,then there may be a problem.

Enders,Falk,and Siklos(2007)compared the?nite-sample performance of the fol-lowing three methods in TAR:inverting the LR statistic using the asymptotic critical values,using the bootstrapped distribution of the LR to determine the critical values, and using the bootstrap percentile method.They found that none of the three meth-ods performs satisfactorily for the discontinuous-TAR model.All three methods are too conservative,creating con?dence intervals that are too wide.Of the methods,the bootstrapped LR method performs the worst.

Q.Wang127 However,new bootstrap methods for the TAR model have recently been introduced. Gonzalo and Wolf(2005)proposed to use a subsampling method—like that introduced by Politis,Romano,and Wolf(1999)—to improve the?nite-sample performance of the threshold estimator in the self-exciting TAR model.Andrews and Guggenberger(2009) then introduced a hybrid subsampling method and size-corrected methods of construct-ing tests and con?dence intervals that have correct asymptotic size.However,literature concerning the panel threshold model is still very limited,and there is still no theoret-ical development for overcoming wide con?dence intervals.Following Hansen(1999),I invert the LR statistic to construct the con?dence interval of the threshold estimator. Hansen(1999)did not perform Monte Carlo simulations to study the coverage rate for this method.

In this section,I perform simulations to study the size of the threshold-e?ect test and the coverage rate of the threshold estimator.For the real signi?cance level of the threshold-e?ect test,the DGP for the null hypothesis(without the threshold e?ect)is simply a linear regression model,that is,

y it=1+z it+x it+u i+e it

The variables are generated according to z it,x it～χ2(1),u i～χ2(1)?1,and e it～N(0,1),where z is regime independent and x is regime dependent.The alternative hypothesis is that the coe?cient of x is regime dependent on some threshold variable q. We set n=50,100,200,500and T=5,20,50.The bootstrap iteration number is set to300for this single-threshold model.The iteration number of Monte Carlo simulation is set to500.

The simulation is time consuming because bootstrap is used to test the threshold e?ect for each simulation.On a computer with Inter Core i7,2.22GH z processor,and 8GB RAM,the simulation takes about26hours.Table1lists the result.

The test size distortion is small,even for small n and T.However,there is no con-vergence of the bias with increasing sample sizes.As Hansen(1997)noted,“the boot-strap procedure attains the?rst-order asymptotic distribution,so p-values constructed from the bootstrap are asymptotically valid.Because the asymptotic distribution is nonpivotal,bootstrap size will not have an accelerated rate of convergence relative to conventional asymptotic approximations.”

128Fixed-e?ect panel threshold model using Stata Table1.Simulation result of the size of the threshold-e?ect test

n T1%5%10%

5050.0100.0640.118

10050.0240.0620.114

20050.0220.0580.094

50050.0080.0480.106

50200.0120.0500.108

100200.0160.0460.090

200200.0080.0400.098

500200.0140.0580.122

50500.0140.0580.108

100500.0120.0440.114

200500.0100.0480.102

500500.0140.0560.122

The following DGP is used to simulate the coverage rate where q it～χ2(1):

y it=1+z it+x it(q it<1)+2x it(q it≥1)+u i+e it

The iteration number of Monte Carlo simulation is set to500.The result is given in table2.

Table2.Simulation result of the coverage rate

n T RMSE Rate n T RMSE Rate n T RMSE Rate

5050.01030.46050200.000400.80850500.000100.952

10050.00260.636100200.000050.920100500.000050.994

20050.00050.768200200.000030.988200500.000031.000

50050.00010.948500200.000010.998500500.000021.000 Obviously,the root of the mean squared error decreases with larger n or larger T. Figure1is the kernel density plot of the estimators for n=50,100,200,500and T=5, which demonstrates the consistency of the estimator.Because of the consistency of the threshold estimator,the estimators of the regression coe?cient given the threshold are consistent with the ordinary linear?xed-e?ect model.However,as n or T increases, the con?dence interval gets wider and the coverage rate of the threshold estimator gets bigger than the nominal level.This means that the con?dence interval gets too wide for large n and T,because the inverse LR method is not a very e?cient method for constructing con?dence intervals.As previously discussed,this problem is not speci?c to the panel threshold model;it is common to almost all threshold models.In sum,the inverse LR method does not perform that well for small sample sizes,and more research is needed to improve the coverage rate for the panel threshold model.

130Fixed-e?ect panel threshold model using Stata .use hansen1999,clear

(The Value and Performance of U.S.Corporations(B.H.Hall&R.E.Hall,1993))

.xthreg i q1q2q3d1qd1,rx(c1)qx(d1)thnum(1)grid(400)trim(0.01)bs(300)

Estimating the threshold parameters:1st......Done

Boostrap for single threshold

..................................................+50

..................................................+100

..................................................+150

..................................................+200

..................................................+250

..................................................+300

Threshold estimator(level=95):

model Threshold Lower Upper

Th-10.01540.01410.0167

Threshold effect test(bootstrap=300):

Threshold RSS MSE Fstat Prob Crit10Crit5Crit1 Single17.78180.002335.200.003311.974914.025922.8402 Fixed-effects(within)regression Number of obs=7910

Group variable:id Number of groups=565

R-sq:within=0.0951Obs per group:min=14 between=0.0692avg=14.0

overall=0.0660max=14

F(7,7338)=110.21 corr(u_i,Xb)=-0.3972Prob>F=0.0000

i Coef.Std.Err.t P>|t|[95%Conf.Interval]

q1.0105555.000891711.840.000.0088075.0123035

q2-.0202872.0025602-7.920.000-.025306-.0152683

q3.0010785.0001952 5.530.000.0006959.0014612

d1-.0229482.0042381-5.410.000-.031256-.0146403

qd1.0007392.00142780.520.605-.0020597.0035381 _cat#c.c1

0.0552454.005334310.360.000.0447885.0657022

1.0862498.005202216.580.000.07605

2.0964476

_cons.0628165.001695737.050.000.0594925.0661405

sigma_u.03980548

sigma_e.04922656

rho.39535508(fraction of variance due to u_i)

F test that all u_i=0:F(564,7338)= 6.90Prob>F=0.0000

Q.Wang131 The output consists of four parts.The?rst part outputs the estimation and boot-strap results.The second part outputs the threshold estimators and their con?dence intervals.Th-1denotes the estimator in single-threshold models.In the threshold estimator table,Th-21and Th-22denote the two estimators in a double-threshold model.Sometimes,Th-21is the same as Th-1.The third part lists the threshold-e?ect test,including the RSS,the mean squared error(MSE),the F statistic(Fstat), the probability value of the F statistic(Prob),and critical values at10%,5%,and1% signi?cance levels(Crit10,Crit5,and Crit1,respectively).The fourth part outputs the?xed-e?ect regression.

In this example,the single-threshold model’s estimator is0.0154with95%con?dence interval[0.0141,0.0167].The F statistic is highly signi?cant.Therefore,we reject the linear model and?t a double-or triple-threshold model.

Next,we directly?t a triple-threshold model based on the result above.The trim-ming values are set to be0.01and0.05for the estimation of the second and third thresholds.Note that the trimming proportion(0.01)for the single-threshold model still needs to be set because xthreg searches the second threshold using the trimmed series in the single-threshold model.We set the bootstrap number to300for the double-and triple-threshold models;however,we set this to0for the single-threshold model because there is no need to use bootstrap for it again(the full option is bs(0300300)). We suppress the output of bootstrap replications and the?xed-e?ect regression.

.xthreg i q1q2q3d1qd1,rx(c1)qx(d1)thnum(3)grid(400)trim(0.010.010.05

>)bs(0300300)thgiven nobslog noreg

Estimating the threshold parameters:2nd......3rd......Done

Boostrapping for threshold effect test:2nd......3rd......Done

Threshold estimator(level=95):

model Threshold Lower Upper

Th-10.01540.01410.0167

Th-210.01540.01410.0167

Th-220.54180.52680.5473

Th-30.47780.47550.4823

Threshold effect test(bootstrap=0300300):

Threshold RSS MSE Fstat Prob Crit10Crit5Crit1 Single17.78180.002335.200.003311.974914.025922.8402

Double17.72580.002224.970.013310.890112.840927.6450

Triple17.71190.0022 6.200.593316.800621.874031.6187 Note that xthreg gets a similar but not identical F statistic because of the ran-domness of bootstrap sampling.Of course,this does not a?ect the conclusion.In the threshold-e?ect test table,Single corresponds to H0(linear model)and H a(single-threshold model),Double corresponds to H0(single-threshold model)and H a(double-threshold model),and so forth.Obviously,the double-threshold model is accepted with probability value0.59.

Q.Wang133 estimator is consistent but the con?dence interval gets wider and the coverage rate of the threshold estimator gets bigger than the nominal level as n or T increases.

7Acknowledgments

I thank H.Joseph Newton(the editor)for providing advice and encouragement.Pro-fessional and valuable comments from Zongwu Cai,Bruce E.Hansen,Ted Juhl,and an anonymous referee substantially helped the revision of this article.The article is ?nanced by the Natural Science Foundation of China(Grant Number71101075).

8References

Andrews,D.W.K.,and P.Guggenberger.2009.Hybrid and size-corrected subsampling methods.Econometrica77:721–762.

Bai,J.1997.Estimating multiple breaks one at a time.Econometric Theory13:315–352.

Bai,J.,and P.Perron.1998.Estimating and testing linear models with multiple struc-tural changes.Econometrica66:47–78.

Chan,K.S.1993.Consistency and limiting distribution of the least squares estimator of a threshold autoregressive model.Annals of Statistics21:520–533.

Enders,W.,B.L.Falk,and P.Siklos.2007.A threshold model of real U.S.GDP and the problem of constructing con?dence intervals in TAR models.Studies in Nonlinear Dynamics and Econometrics11:1–28.

Girma,S.2005.Absorptive capacity and productivity spillovers from FDI:A threshold regression analysis export.Oxford Bulletin of Economics and Statistics67:281–306. Gonzalo,J.,and M.Wolf.2005.Subsampling inference in threshold autoregressive models.Journal of Econometrics127:201–224.

Hansen,B.E.1996.Inference when a nuisance parameter is not identi?ed under the null hypothesis.Econometrica64:413–430.

.1997.Inference in TAR models.Studies in Nonlinear Dynamics and Economet-rics2:1–16.

.1999.Threshold e?ects in non-dynamic panels:Estimation,testing,and infer-ence.Journal of Econometrics93:345–368.

Hsiao,C.2003.Analysis of Panel Data.2nd ed.Cambridge:Cambridge University Press.

Politis,D.N.,J.P.Romano,and M.Wolf.1999.Subsampling.New York:Springer.

134Fixed-e?ect panel threshold model using Stata Rousseau,P.L.,and P.Wachtel.2009.What is happening to the impact of?nancial deepening on economic growth?Vanderbilt University Department of Economics Working Papers No.0915,Vanderbilt University Department of Economics.

https://www.doczj.com/doc/2811989624.html,/p/van/wpaper/0915.html.

Tong,H.1983.Threshold models in non-linear time series analysis.In Lecture Notes in Statistics,21,ed.D.Brillinger,S.Fienberg,J.Gani,J.Hartigan,and K.Krickeberg. Berlin:Springer.

About the author

Qunyong Wang received his PhD from Nankai University.He works at the Institute of Statistics and Econometrics at Nankai University.He is currently a visiting professor at the University of Kansas under the?nancial support of the China Scholarship Council.

The Stata Journal(2015)

15,Number1,pp.135–154

Frailty models and frailty-mixture models for

recurrent event times

Ying Xu

Center for Quantitative Medicine Duke–NUS Graduate Medical School

Singapore

and Department of Biostatistics Singapore Clinical Research Institute

Singapore

tina.xuying@https://www.doczj.com/doc/2811989624.html,.sg

Yin Bun Cheung

Center for Quantitative Medicine Duke–NUS Graduate Medical School

Singapore

and Department of International Health University of Tampere

Finland

yinbun.cheung@https://www.doczj.com/doc/2811989624.html,.sg

Abstract.The analysis of recurrent event times faces three challenges:between-

subject heterogeneity(frailty),within-subject event dependence,and the possibil-

ity of a cured fraction.Frailty can be handled by including a latent random-e?ects

term in a Cox-type model.Event dependence may be considered as contributing to

the intervention e?ect,or it may be considered as a source of nuisance,depending

on the analysts’speci?c research questions.If it is seen as a nuisance,the analysis

can stratify the recurrent event times according to event order.If it is seen as

contributing to the intervention e?ect,strati?cation should not be used.Models

with and without strati?cation for event order estimate two types of treatment

e?ects.They are analogous to per-protocol analysis and intention-to-treat analy-

sis,respectively.In the context of chronic disease treatment,we want to estimate

whether there is a cured fraction;for infectious disease prevention,this is called a

nonsusceptible fraction.In infectious disease prevention,we want to understand

whether an intervention protects each of its recipients to some extent(“leaky”

model)or whether it totally protects some recipients but o?ers no protection to

the rest(“all-or-none”model).The truth may be a mixture of the two modes of

protection.We describe a class of regression models that can handle all three is-

sues in the analysis of recurrent event times.The model parameters are estimated

by the expectation-maximization algorithm,and their variances are estimated by

Louis’s formula.We provide a new command,strmcure,for implementing these

models.

Keywords:st0374,strmcure,frailty models,frailty-mixture models,recurrent

event times,event dependence,cured fraction

1Introduction

Recurrent event times are common in biomedical and social studies.Some examples include times to respiratory symptom exacerbations,hospital readmissions,and malaria disease https://www.doczj.com/doc/2811989624.html,pared with the analysis of time-to-?rst or only event,the analysis of recurrent event times o?ers some advantages.For example,in analysis of time-to-?rst event,the unobserved heterogeneity(or frailty)impacts on the individual follow-up time(or person-time):subjects who are more frail will experience their?rst events and c 2015StataCorp LP st0374

STATA面板数据模型操作命令要点

STATA 面板数据模型估计命令一览表一、静态面板数据的STATA 处理命令 εαβit ++=x y it i it 固定效应模型 μβit +=x y it it ε αμit +=it it 随机效应模型（一）数据处理输入数据 ●tsset code year 该命令是将数据定义为“面板”形式 ●xtdes 该命令是了解面板数据结构 ●summarize sq cpi unem g se5 ln 各变量的描述性统计（统计分析） ●gen lag_y=L.y /////// 产生一个滞后一期的新变量

gen F_y=F.y /////// 产生一个超前项的新变量 gen D_y=D.y /////// 产生一个一阶差分的新变量 gen D2_y=D2.y /////// 产生一个二阶差分的新变量（二）模型的筛选和检验 ●1、检验个体效应（混合效应还是固定效应）（原假设：使用OLS混合模型）●xtreg sq cpi unem g se5 ln,fe 对于固定效应模型而言，回归结果中最后一行汇报的F统计量便在于检验所有的个体效应整体上显著。在我们这个例子中发现F统计量的概率为0.0000，检验结果表明固定效应模型优于混合OLS模型。 ●2、检验时间效应（混合效应还是随机效应）（检验方法：LM统计量）（原假设：使用OLS混合模型） ●qui xtreg sq cpi unem g se5 ln,re (加上“qui”之后第一幅图将不会呈现) xttest0

可以看出，LM检验得到的P值为0.0000，表明随机效应非常显著。可见，随机效应模型也优于混合OLS模型。 ●3、检验固定效应模型or随机效应模型（检验方法：Hausman检验）原假设：使用随机效应模型（个体效应与解释变量无关）通过上面分析，可以发现当模型加入了个体效应的时候，将显著优于截距项为常数假设条件下的混合OLS模型。但是无法明确区分FE or RE的优劣，这需要进行接下来的检验，如下： Step1：估计固定效应模型，存储估计结果 Step2：估计随机效应模型，存储估计结果 Step3：进行Hausman检验 ●qui xtreg sq cpi unem g se5 ln,fe est store fe qui xtreg sq cpi unem g se5 ln,re est store re hausman fe (或者更优的是hausman fe,sigmamore/ sigmaless) 可以看出，hausman检验的P值为0.0000，拒绝了原假设，认为随机效应模型的基本假设得不到满足。此时，需要采用工具变量法和是使用固定效应模型。

5分钟速学stata面板数据回归(初学者超实用!)

5分钟速学stata面板数据回归（超实用！）第一步：编辑数据。面板数据的回归，比如该回归模型为：Y it=β0+β1X1it+β2X2it+β3X3it+εt，在stata中进行回归，需要先将各个变量的数据逐个编辑好，该模型中共有Y X1 X2 X3三个变量，那么先从Y的数据开始编辑，将变量Y的面板数据编辑到stata软件中，较方便的做法是，将excel的数据直接复制到stata软件的数据编辑框中，而excel中的数据需要如下图编辑：从数据的第二行开始选中20个样本数据，如图：

直接复制粘贴至stata中的data editor中，如图: 第二步：格式调整。首先，请将代表样本的var1Y变量数据是选20个省份5年的数据为样本，那么口令为rename var1 province 。例如：本例中的Y变量数据编辑接下来需要输入口令为reshape long var,i(province) 其中，var代表的是所有的年份（var2,var3,var4,var5,var6），转化后格式如图：转化成功后，继续重命名，其中_j这里代表原始表中的年份，var代表该变量的名称

例如，我们编辑的是Y变量的数据，所以口令3和口令4的输入如下：口令3：rename _j year 口令4：rename var taxi （注：taxi就是Y变量，我们用taxi表示Y）命名完，数据编辑框如下图所示。第三步：排序。例如，本例中的Y变量（taxi），是20个省份和5年的面板数据，那么口令4为sort province year （虽意思是将province按升序排列，然后再根据排好的province数列排year这一列升序排列。然很多时候在执行sort之前，数据已经符合排序要求了，但为以防万一，请务必执行此操作）第三步：保存。

(完整版)Stata门限模型的操作和结果详细解读

一、门限面板模型概览如果你不愿意看下面一堆堆的文字，更不想看计量模型的估计和检验原理，那就去《数量经济技术经济研究》上，找一篇标题带有“双门槛（或者双门限）”的文章，浏览一遍，看看文章计量部分列示的统计量和检验结果。这样，在软件操作时，你就知道每一步得到的结果有什么意义，怎么解释了，起码心里会有点印象。一般情况下，一个研究生花费在研究上的时间越多，他的成果越丰富，也就是说，研究成果和研究时间存在某种正向关联。但是，这种关联是线性的吗？在最初阶段，他可能看了两三年的文献，也没有写出一篇优秀的文章，但是一旦过了这个基础期，他的能量和成果将如火山爆发一样喷涌出来，此时，他投入少量的时间，就能产出大量优质文章。再过几年，他可能会进入另外一种境界，虽然比以前有了极大提高，但是研究进入新的瓶颈期，文章发表的数量减少。由此可以看出，研究成果与研究年限存在一种阶段性的线性关系。这个基础期的结点、瓶颈期的起点就像“门槛”一样把研究阶段分成三个部分，在不同部分，成果和时间的线性关系都不同。这个效应被称为门槛效应或门限效应。门限效应，是指当一个经济参数达到特定的数值后，引起另外一个经济参数发生突然转向其它发展形式的现象。作为原因现象的临界值称为门限值。在上面的例子中，成果和时间存在非线性关系，但是在每个阶段是线性关系。有些人将这样的模型称为门槛模型，或者门限模型。如果模型的研究对象包含多个个体多个年度，那么就是门限面板模型。汉森（Bruce E. Hansen）在门限回归模型上做出了很多贡献。了解门限模型最好的办法，首先就要阅读他的文章。他的文章很有特点：条理很清晰，推导过程详细，语言简练，语法不复杂。有关他的论文、程序、数据可以参考Hansen的个人网站： https://www.doczj.com/doc/2811989624.html,/~bhansen/progs/progs_subject.htm。 Hansen于1996年在《Econometrica》上发表文章《Inference when a nuisance parameter is not identified under the null hypothesis》，提出了时间序列门限自回归模型（TAR）的估计和检验。之后，他在门限模型上连续追踪，发表了几篇经典文章，尤其是1999年的《Threshold effects in non-dynamic panels: Estimation, testing and inference》，2000年的《Sample splitting and threshold estimation》和2004年与他人合作的《Instrumental Variable Estimation of a Threshold Model》。在这些文章中，Hansen介绍了包含个体固定效应的静态平衡面板数据门限回归模型，阐述了计量分析方法。方法方面，首先要通过减去时间均值方程，消除个体固定效应，然后再利用OLS（最小二乘法）进行系数估计。如果样本数量有限，那么可以使用自举法（Bootstrap）重复抽取样本，提高门限效应的显著性检验效率。在Hansen（1999）的模型中，解释变量中不能包含内生解释变量，无法扩展应用领域。Caner和Hansen在2004年解决了这个问题。他们研究了带有内生变量和一个外生门限变量的面板门限模型。与静态面板数据门限回归模型有所不同，在含有内生解释变量的面板数据门限回归模型中，需要利用简化型对内生变量进行一定的处理，然后用2SLS（两阶段最小二乘法）或者GMM（广义矩估计）对参数进行估计。当然，有关门限回归模型的最新研究，还可以参考《Inflation and Growth: New Evidence From a Dynamic Panel Threshold Analysis》（Stephanie Kremer，Alexander Bick，Dieter Nautz，2009）。二、计量模型的假设、估计和检验略

stata处理面板数据及修正命令集合

步骤一：导入数据原始表如下，数据请以时间（1998，1999，2000，2001??）为横轴，样本名（北京，天津，河北??）为纵轴将中文地名替换为数字。注意：表中不能有中文字符，否则会出现错误。面板数据中不能有空值。去除年份的一行，将其余部分复制到stata的data editor中，或保存为csv格式。打开stata，调用数据。方法一：直接复制到data editor中。方法二：使用口令：insheet using??文件路径调用例如：insheet using? C:\STUDY\paper\taxi.csv 其中csv格式可用excel的“另存为”导出步骤二：调整格式首先请将代表样本的var1重命名口令：rename var1?样本名例如：rename var1 province ?也可直接在var1处双击，在弹出的窗口中修改: 接下来将数据转化为面板数据的格式口令：reshape long var, i(样本名) 例如：reshape long var, i(province) 其中var代表的是所有的年份（var2,var3,var4??）转化成功后继续重命名，其中_j 这里代表原始表中的年份，var代表该变量的名称口令例如： rename _j year rename var taxi

也可直接在需要修改的名称处双击，在弹出的窗口中修改步骤三：排序口令：sort?变量名例如：sort province year 意思为将province按升序排列，然后再根据排好的province数列排year这一列最后，保存。至此，一个变量的前期数据处理就完成了，请如法炮制的处理所有的变量，也就是说每个变量都做一个dta文件。在处理新变量前请使用口令：clear 将stata重置步骤四：合并数据任意打开一个处理过的变量的dta文件作为基础表（推荐使用因变量的dta文件，这里使用so2作为因变量）口令：?merge?样本名时间?using?文件路径例如：merge province year using C:\STUDY\paper\taxi.dta ?意思是将taxi的数据添加到so2的数据表中然后使用口令：tab _merge 然后使用口令：drop _merge 将数据表中的_merge一列去掉，接着重新使用口令：sort?样本名时间例如：sort province year 为新生成的表排序。如法炮制，将所有的变量都添加到基础表中，

Stata面板数据分析

5分钟搞定Stata面板数据分析简易教程步骤一：导入数据原始表如下，数据请以时间（1998，1999，2000，2001??）为横轴，样本名（北京，天津，河北??）为纵轴将中文地名替换为数字。

注意：表中不能有中文字符，否则会出现错误。面板数据中不能有空值。去除年份的一行，将其余部分复制到stata的data editor中，或保存为csv格式。

打开stata，调用数据。方法一：直接复制到data editor中。方法二：使用口令：insheet using 文件路径调用例如：insheet using C:\STUDY\paper\taxi.csv 其中csv格式可用excel的“另存为”导出如图：

步骤二：调整格式首先请将代表样本的var1重命名口令：rename var1 样本名例如：rename var1 province 也可直接在var1处双击，在弹出的窗口中修改:

接下来将数据转化为面板数据的格式口令：reshape long var, i(样本名) 例如：reshape long var, i(province) 其中var代表的是所有的年份（var2,var3,var4??）转化后的格式如图：转化成功后继续重命名，其中_j 这里代表原始表中的年份，var代表该变量的名称口令例如： rename _j year rename var taxi 也可直接在需要修改的名称处双击，在弹出的窗口中修改如图：

步骤三：排序口令：sort 变量名例如：sort province year 意思为将province按升序排列，然后再根据排好的province数列排year这一列如图：

Stata面板回归操作过程、基本指令及概要

Stata面板回归操作过程、基本指令及概要在使用Stata过程中，录入面板数据后，一般需要对初始数据进行识别，因此需要首先进行面板数据的识别，其指令为： 1.面板数据识别指令： tsset region year 案例： ②部分初始数据录入数据操作为：

②将上述初始数据录入stata后（注意：录入数据及首行只能是英文字母或者数字，不能有汉字），显示如下： ③输入指令tsset region year，显示如下结果 . tsset region year panel variable: region (strongly balanced) time variable: year, 2005 to 2014 delta: 1 unit 2.面板数据固定效应回归指令： xtregy erseqs x1 x2 x3 x4 x5,fe 案例：录入数据，并进行面板数据识别之后，输入以上指令： xtregy erseqs x1 x2 x3 x4 x5,fe 其中，xtreg为面板回归指令，y为选取的因变量，ers、eqs、x1、x2、x3、x4、x5为自变量，末尾加fe表示为固定效应，如果末尾加re则是随机效应。上述回归结果显示如下：

3.面板数据随机效应回归指令： xtregy erseqs x1 x2 x3 x4 x5,re 4.hausman 检验指令： Hausman检验是固定效应或者随机效应回归之后，需要加入的一个检验，具体指令如下： quixtregy erseqs x1 x2 x3 x4 x5,fe est store fe quixtregy erseqs x1 x2 x3 x4 x5,fe est store re hausmanfe re 5.门限回归指令

5分钟搞定Stata面板数据分析

【原创】5分钟搞定Stata面板数据分析简易教程ver2.0作者：张达 5分钟搞定Stata面板数据分析简易教程步骤一：导入数据原始表如下，数据请以时间（1998，1999，2000，2001??）为横轴，样本名（北京，天津，河北??）为纵轴将中文地名替换为数字。

注意：表中不能有中文字符，否则会出现错误。面板数据中不能有空值。去除年份的一行，将其余部分复制到stata的data editor中，或保存为csv格式。

步骤二：调整格式首先请将代表样本的var1重命名口令：rename var1 样本名例如：rename var1 province 也可直接在var1处双击，在弹出的窗口中修改:

STATA面板数据模型操作命令

S T A T A 面板数据模型估计命令一览表一、静态面板数据的STATA 处理命令 εαβit ++=x y it i it 固定效应模型 εαμit +=it it 随机效应模型（一）数据处理输入数据 ●tsset code year 该命令是将数据定义为“面板”形式 ●xtdes 该命令是了解面板数据结构 ●summarize sq cpi unem g se5 ln 各变量的描述性统计（统计分析） ●gen lag_y=L.y /////// 产生一个滞后一期的新变量 gen F_y=F.y /////// 产生一个超前项的新变量 gen D_y=D.y /////// 产生一个一阶差分的新变量 gen D2_y=D2.y /////// 产生一个二阶差分的新变量（二）模型的筛选和检验 ●1、检验个体效应（混合效应还是固定效应）（原假设：使用OLS 混合模型） ●xtreg sq cpi unem g se5 ln,fe 对于固定效应模型而言，回归结果中最后一行汇报的F 统计量便在于检验所有的个体效应整体上显着。在我们这个例子中发现F 统计量的概率为0.0000，检验结果表明固定效应模型优于混合OLS 模型。 ●2、检验时间效应（混合效应还是随机效应）（检验方法：LM 统计量）（原假设：使用OLS 混合模型） ●qui xtreg sq cpi unem g se5 ln,re (加上“qui ”之后第一幅图将不会呈现) xttest0 可以看出，LM 检验得到的P 值为0.0000，表明随机效应非常显着。可见，随机效应

模型也优于混合OLS模型。 ●3、检验固定效应模型or随机效应模型（检验方法：Hausman检验）原假设：使用随机效应模型（个体效应与解释变量无关）通过上面分析，可以发现当模型加入了个体效应的时候，将显着优于截距项为常数假设条件下的混合OLS模型。但是无法明确区分FE or RE的优劣，这需要进行接下来的检验，如下： Step1：估计固定效应模型，存储估计结果 Step2：估计随机效应模型，存储估计结果 Step3：进行Hausman检验 ●qui xtreg sq cpi unem g se5 ln,fe est store fe qui xtreg sq cpi unem g se5 ln,re est store re hausman fe (或者更优的是hausman fe,sigmamore/ sigmaless) 可以看出，hausman检验的P值为0.0000，拒绝了原假设，认为随机效应模型的基本假设得不到满足。此时，需要采用工具变量法和是使用固定效应模型。（三）静态面板数据模型估计 ●1、固定效应模型估计 ●xtreg sq cpi unem g se5 ln,fe (如下图所示) 其中选项fe表明我们采用的是固定效应模型，表头部分的前两行呈现了模型的估计方法、界面变量的名称（id）、以及估计中使用的样本数目和个体的数目。第3行到第5行列示了模型的拟合优度、分为组内、组间和样本总体三个层面，通常情况下，关注的是组内（within），第6行和第7行分别列示了针对模型中所有非常数变量执行联合检验得到的F统计量和相应的P值，可以看出，参数整体上相当显着。需要注意的是，表中最后一行列示了检验固定效应是否显着的F统计量和相应的P值。显然，本例中固定效应非常显着。 ●2、随机效应模型估计