Tax data in the Continuous Sample of Working Lives: some ideas for their exploitation
The main objective of this study was to show how the information in the «tax file» of the Continuous Sample of Working Lives (CSWL), together with the personal details of Social Security contributors, can be mined in depth for the period 2004-2009. Using data from the tax module has advantages and disadvantages. Its advantages over other statistical sources are the following. First, and most fundamentally, it provides income data for individuals that can be linked across several waves (longitudinal data) and combined with personal information (personal files) and work information (contribution files), for different groups defined by type of income: salaried workers, pensioners, the self-employed and recipients of unemployment benefits. This income information is not available in the Labour Force Survey (LFS), although the LFS has provided wage distribution data expressed in deciles based on Form 190 since 2010. The Personal Income Tax Filers Panel, for its part, contains tax data but no detailed labour variables.
Second, the list of recipients reported by payers covers everyone who receives income subject to income tax, whether or not they are obliged to file a tax return, even when the remuneration is below the statutory minimum for exemption, is subject to a zero withholding rate or is exempt income. This information is not available in the Personal Income Tax Filers Panel, which only contains tax information for individuals who are obliged to file a return.
Third, the information in the CSWL closely matches that supplied by other sources. For example, the sample data on salaries are comparable to those of the Labour Cost Survey, and the data on the amounts awarded to recipients of unemployment benefit are comparable to the labour statistics published by the Public Employment Service.
The main disadvantages of the database are as follows. First, reading the files into statistical packages such as SPSS, SAS or Stata requires considerable effort from the researcher, because the database contains millions of records. For example, the 2009 tax module has more than 2 million payer records, with an average of almost two payers per person, and some individuals have more than 1,800 records.
Second, some groups are not covered by the CSWL tax data: the inactive population that has never worked, and workers whose social welfare provision lies outside the Social Security system (civil servants receiving pensions) or who have none. These gaps are among the differences with the LFS, which does have information on the inactive population and civil servants. For another group, the unemployed not receiving benefits, information is available in the CSWL from the data on registrations and cancellations in employment and in the unemployment compensation system.
Third, and related to the above, the CSWL contains no tax information for residents of Navarre and the Basque Country (although it does for those working outside those regions), nor for workers under the Special Home Regime or self-employed workers in any Social Security regime (with some exceptions).
Linking the tax module to the personal and contribution files involved several steps, which the article illustrates with detailed descriptive statistics and proposals for further analysis. These steps have led to some relevant recommendations for mining the data.
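As a rough illustration of what such a link can look like, the following minimal sketch joins toy versions of the three files on a common person identifier and adds up the income reported by each person's payers. It is not the authors' procedure: the function `link_files` and the field names `person_id`, `payer_id`, `income_type` and `amount` are hypothetical and do not correspond to the actual CSWL variable names.

```python
def link_files(personal, contributions, tax):
    """Attach contribution and tax records to each person in the personal file.

    All field names are hypothetical placeholders, not real CSWL variables.
    """
    linked = {
        row["person_id"]: {
            "personal": row,
            "contributions": [],
            "tax": [],
            "total_income": 0.0,
        }
        for row in personal
    }
    for row in contributions:
        person = linked.get(row["person_id"])
        if person is not None:  # skip records with no match in the personal file
            person["contributions"].append(row)
    for row in tax:
        person = linked.get(row["person_id"])
        if person is not None:
            person["tax"].append(row)  # one record per payer and income type
            person["total_income"] += row["amount"]
    return linked


# Toy data: person 1 has two payers, person 2 has no tax record,
# and the tax record for person 3 has no match in the personal file.
personal = [
    {"person_id": 1, "birth_year": 1970},
    {"person_id": 2, "birth_year": 1985},
]
contributions = [
    {"person_id": 1, "regime": "general", "start": "2008-01-01"},
    {"person_id": 2, "regime": "self-employed", "start": "2007-06-01"},
]
tax = [
    {"person_id": 1, "payer_id": "A", "income_type": "wage", "amount": 18000.0},
    {"person_id": 1, "payer_id": "B", "income_type": "unemployment_benefit", "amount": 2500.0},
    {"person_id": 3, "payer_id": "C", "income_type": "wage", "amount": 9000.0},
]

linked = link_files(personal, contributions, tax)
for person_id, record in linked.items():
    print(person_id, len(record["tax"]), record["total_income"])
```

Keeping one tax record per payer, rather than collapsing them before the join, preserves the distinction by type of income. Individuals without any tax record, such as person 2 here, remain in the linked data with an empty list, so the coverage gaps described above stay visible.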
In short, when the data in the CSWL tax module are treated correctly and linked with the other files in the sample, they support analyses such as the following: changes in income distribution, distinguished by type of payment and by group; wage dynamics and their determinants; wage gains or losses generated by labour mobility and by passing through the unemployment compensation system; and differences between individuals in the amount of unemployment benefits and their possible influence on the process of exiting unemployment.
FULL REFERENCE:
Arranz, J.M., García-Serrano, C. (2011), «Los datos fiscales de la Muestra Continua de Vidas Laborales: algunas ideas para su explotación», Hacienda Pública Española, 199(4), 151-186.
http://www.ief.es/documentos/recursos/publicaciones/revistas/hac_pub/199_Art5.pdf