by Uwe Warner, Thomas Tamisier and Fernand Feltz
Now online and regularly accessed, the Sedo homepage is the entry point to a mine of useful information about citizens' lives in Luxembourg. Underneath, the site is powered by a customisable platform for maintaining huge banks of data. Ergonomic search methods quickly deliver main trends or detailed figures. Both advanced users and people without statistical training can perform their own calculations with the guarantee of clear and accurate results.
The use of the Net for querying socio-economic databases has recently caught the attention of both experts and the public at large, with a number of online services provided by statistical organisations of various countries, such as those in Europe that belong to the CESSDA. Not only does the networked publication of statistics allow instantaneous updates and corrections, but it is also suited to building a relationship between the data provider and the data consumer. The consumer can give feedback and formalise requirements, and the provider has the means to control the use of the data, so that the figures are used better and more appropriately. Other advantages are easing the retrieval of large data sets, allowing them to be customised, and enabling end-users to execute statistical functions for processing the data on the server.
The CEPS/INSTEAD is a public research institute for statistics and socio-economic studies dedicated to population, poverty and socio-economic policy. Through the Sedo (Socio-Economic Database Online) platform, set up in co-operation with the Centre de Recherche Public - Gabriel Lippmann, it offers generalised and convenient access to the voluminous database on the national population. The information, collected through socio-economic household panels, consists of longitudinal data, which means that the same questions are periodically put to the same respondents. Relevant domains are: demography, education, post-natal child-care, employment, equal opportunities, resources, housing, equipment and consumption, and opinions about the monetary situation of the household and its evolution. Such exhaustive and disparate information is difficult to exploit due to the complexity and volume of the data. Up to now, only researchers with a solid background in longitudinal analysis could effectively benefit from this work, whereas it is relevant to all kinds of professional activities outside the circle of expert statisticians. In addition, thanks to its precision and its constant updating, the socio-economic picture drawn by the combination of these miscellaneous themes is apt to significantly ease communication between the population and public servants and to support the decisions made by the political authorities.
Since autumn 2005, Sedo has been available online, featuring three different types of user access. The first is monitored free access for everybody to statistical tables and graphs on the themes of the study. The second is controlled access on demand, using a login and password, which allows registered users to explore the banks of data without having to install statistical software on their own computers. The third is access for registered users who are trained in statistical analysis and have made a confidentiality pledge, which allows them to carry out individual and sophisticated statistics and research on the micro data. In addition, an Internet-based discussion forum allows all users to exchange their experience and to communicate with those conducting the surveys and with the research community. Compared to other information tools in the community, the Sedo project has great flexibility, universal portability and the possibility of adapting to different databases. The most obvious benefit of the system is being able to obtain customised presentations of statistics with guaranteed accuracy and without the need for specialised statistical tools or advanced expertise in statistical techniques.
For the sake of reusability, an integrated platform was built rather than developing features on top of existing tools. This approach gives room for the integration of powerful statistical tools and lets the user perform online calculations according to the data. Every variable or result is associated with information types such as the statistical level of measurement (metric, ordinal, nominal), the population concerned (individuals, households), or the expected results for the variable. Based on these types, the system determines which statistical analyses can be performed, in order to ensure the coherence and soundness of the operations delivered to the user. To guarantee the coherence of the results and to ensure that they are correct with respect to the interpretation criteria of the user, Sedo also features an automatic calculation of the actual sample size after a sequence of statistical operations.
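The type-based checks described above can be pictured with a short Python sketch. It is only an illustration under stated assumptions: the names `Level`, `Variable`, `MIN_LEVEL`, `allowed_statistics`, `can_cross` and `effective_sample_size` are hypothetical and do not come from Sedo, and the rule table simply follows the usual textbook convention for levels of measurement.

```python
from dataclasses import dataclass
from enum import Enum


class Level(Enum):
    """Level of measurement, ordered from least to most informative."""
    NOMINAL = 1
    ORDINAL = 2
    METRIC = 3


# Hypothetical rule table: the lowest level at which each statistic is meaningful.
MIN_LEVEL = {
    "frequency": Level.NOMINAL,
    "mode": Level.NOMINAL,
    "median": Level.ORDINAL,
    "mean": Level.METRIC,
    "std_dev": Level.METRIC,
}


@dataclass(frozen=True)
class Variable:
    name: str
    level: Level
    population: str  # "individuals" or "households"


def allowed_statistics(var):
    """Return the statistics that are sound for this variable's level."""
    return sorted(
        stat for stat, needed in MIN_LEVEL.items()
        if var.level.value >= needed.value
    )


def can_cross(a, b):
    """Assumed rule: two variables may be combined only if they
    describe the same population."""
    return a.population == b.population


def effective_sample_size(rows, filters):
    """Count the respondents that remain after applying every
    filter in sequence."""
    remaining = list(rows)
    for keep in filters:
        remaining = [row for row in remaining if keep(row)]
    return len(remaining)
```

With these definitions, a nominal variable such as `Variable("sex", Level.NOMINAL, "individuals")` would be offered only frequencies and the mode, while a metric variable would also be offered the median, mean and standard deviation. Reporting the count from `effective_sample_size` alongside each result lets the user judge whether enough respondents remain for the figure to be interpreted.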
From the programmer's point of view, full latitude is allowed for the integration of thesauruses and the customisation of the search engine. The system also leaves total freedom in the choice and management of the databases.
With Sedo, the database server can be distinct from the server where the client requests are processed and the statistical operations are performed. Thanks to this independence of data storage and data processing, Sedo is portable to any kind of database. This separation is also useful for protecting the confidentiality of the data, since the different kinds of users are not all entitled to the same access to the data. It also minimises the volume of data used during the calculations, because only the data relevant to the requests is extracted and passed on to the processing phase.
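The separation between storage and processing can be sketched as follows. This is a minimal Python illustration, not Sedo's implementation: an in-memory SQLite database stands in for the remote database server, and the table `panel`, its columns, the sample rows and the functions `extract`, `process` and `demo_connection` are all invented for the example.

```python
import sqlite3
from statistics import mean

# Hypothetical schema: the only columns a request may ask for.
ALLOWED_COLUMNS = {"year", "household_size", "income"}


def extract(conn, column, year):
    """Storage side: return only the values needed for one request."""
    if column not in ALLOWED_COLUMNS:
        raise ValueError(f"unknown column: {column}")
    # The column name is checked against the whitelist above;
    # the year is passed as a bound parameter.
    cursor = conn.execute(
        f"SELECT {column} FROM panel "
        f"WHERE year = ? AND {column} IS NOT NULL",
        (year,),
    )
    return [row[0] for row in cursor]


def process(values):
    """Processing side: works on the extracted values only and
    never touches the database."""
    return {"n": len(values), "mean": mean(values) if values else None}


def demo_connection():
    """Build a tiny in-memory database with invented sample rows."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE panel "
        "(year INTEGER, household_size INTEGER, income REAL)"
    )
    conn.executemany(
        "INSERT INTO panel VALUES (?, ?, ?)",
        [
            (2004, 2, 3000.0),
            (2004, 4, 5000.0),
            (2004, 1, None),
            (2003, 3, 4000.0),
        ],
    )
    return conn
```

A request would then run as `process(extract(demo_connection(), "income", 2004))`. Because `process` receives a plain list of values, the storage back end could be swapped for another database without changing the statistical code, and access restrictions could be enforced entirely inside `extract`.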
Uwe Warner, Centre d'Etude de Population, de Pauvreté et de Politiques
Tel: +352 58 58 55 554
Thomas Tamisier, Centre de Recherche Public - Gabriel Lippmann, Luxembourg
Tel: +352 47 02 61 622.