Together with Andreas Reinstaller, Werner Hölzl and Christian Schmid I have been contributing to the report on the product space of european countries
Dienstag, 22. Januar 2013
Montag, 29. Oktober 2012
XSLT script to transform xml tree to table for import into sql database
PARENT_ID, CHILD_ID, CHILD Text
I need this to get it into a postgresql database.
<?xml version="1.0"?>
<xsl:stylesheet version="1.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
<xsl:output method="text" encoding="UTF-8"/>
<xsl:template match="/">
<xsl:apply-templates/>
</xsl:template>
<xsl:template name="linebreak">
<xsl:text>
</xsl:text>
</xsl:template>
<xsl:template match="map">
<xsl:apply-templates select="child::node"/>
</xsl:template>
<xsl:template match="node">
<xsl:param name="commaCount">0</xsl:param>
<xsl:if test="$commaCount > 0">
<xsl:call-template name="writeCommas">
<xsl:with-param name="commaCount" select="$commaCount"/>
</xsl:call-template>
</xsl:if>
<xsl:value-of select="../@ID"/>,<xsl:value-of select="@ID"/>,<xsl:value-of select="@TEXT"/>
<xsl:call-template name="linebreak"/>
<xsl:apply-templates select="child::node">
<xsl:with-param name="commaCount" select="$commaCount + 1"/>
</xsl:apply-templates>
</xsl:template>
<xsl:template name="writeCommas">
<xsl:param name="commaCount">0</xsl:param>
<xsl:if test="$commaCount > 0"> <xsl:call-template name="writeCommas">
<xsl:with-param name="commaCount" select="$commaCount - 1"/>
</xsl:call-template>
</xsl:if>
</xsl:template>
</xsl:stylesheet>
Dienstag, 23. Oktober 2012
household savings is not household savings
http://epp.eurostat.ec.europa.eu/tgm/table.do?tab=table&init=1&plugin=1&language=en&pcode=tsdec240
is the household savings rate calculated by eurostat which is defined as:
“(ESA95 code: B8G) divided by gross disposable income (B6G), with the latter being adjusted for the change in the net equity of households in pension funds reserves (D8net)”
I have documented the differences here:http://rpubs.com/johanneskutsam/savingsrate
Texreg and discourse analysis
Phillip Leifeld has developed two useful tools: texreg – a tool to print latex tables from different types regressions in R and a software for discourse analysis that looks useful for my political science amigos.
Mittwoch, 10. Oktober 2012
multiple regressions in one table
You often want to compare coefficients between different models. mtable from the memisc package does exactly that. As its author (Martin Elff) just pointed out to me it is possible to use mtable of the number of regression results is variable. See the following example:
library(memisc)
x=rnorm(10)
y=rnorm(10)
z=rnorm(10)
l1=lm(x~z)
l2=lm(z~y)
li=list(l1,l2)
attributes(li)=list(names=list(1,2))
do.call(mtable,li)
In a future version of mtable Martin said there will be a functionality to include lists directly
mtable(list=list("1"=l1,"2"=l2))
Thanks Maritn for the great tool!
Montag, 8. Oktober 2012
Eurostat in R in GIT
I have published the source for the free read.eurostat() in R at https://github.com/toprach/eurostat_r.git
Users of STATA can buy my stata version from here: https://sites.google.com/site/eustatuse/
Update: I have just found out that the rdatamarket library can also download data from EUROSTAT so you may have a look at it too.
install.packages(“rdatamarket”)
library(rdatamarket)
dmlist(http://data.is/PllzbZ)
My impression is that rdatamarket is a little slower than my implementation – but rdatamarket gives access to UN, Eurostat, Worldbank, FAO, and other providers – via a single Interface. I guess I will use my library for EUROSTAT and rdatamarket for the other sources.
There is also a library WDI to access Worldbank data directly from R.
Samstag, 6. Oktober 2012
Downloading datasets from EUROSTAT with R
read.eurostat(datasetname, LANGUAGE=”en”)
Is a function I have written to facilitate the downloading and creation of data.table objects from eurostat data.
library is used to download data from eurostat and to find datasets
#later extend to extend to find datasets with certain dimensions
#download data from eurostat
#unpack and convert to dataframe
#load label descriptions
#load factors
#save as r data object
datasetname
#this="jvs_a_nace2"
LANGUAGE="en"
read.eurostat=function(datasetname,LANGUAGE="en",save=FALSE){
library(RCurl)
library(data.table)
library(reshape)
library(stringr)
baseurl="http://epp.eurostat.ec.europa.eu/NavTree_prod/everybody/BulkDownloadListing?sort=1&file=data%2F"
fullfilename=paste(datasetname,".tsv.gz",sep="")
temp <- paste(tempfile(),".gz",sep="")
download.file(paste(baseurl,fullfilename,sep=""),temp)
dataconnection <- gzfile(temp)
d=read.delim(dataconnection)
longdata=melt(d,id=colnames(d)[1])
firstname=colnames(d)[1] # remove .time and count how many headings are there
firstname=substr(firstname,1,nchar(firstname)-nchar(".time"))
headings=toupper(strsplit(firstname,".",fixed=TRUE)[[1]])
headingcount=length(headings)
colnames(longdata)=c("dimensions","time","value")
#get the data on the dimension tables
df=data.frame(dimensions=as.character(longdata[,"dimensions"]))
df = transform(df, dimensions= colsplit(dimensions, split = "\\,",names=headings))
dimensions=data.table(df$dimensions)
#download the dimension labels - save headings as better variable
dimfile=paste("http://epp.eurostat.ec.europa.eu/NavTree_prod/everybody/BulkDownloadListing?sort=1&file=dic%2F",LANGUAGE,"%2Fdimlst.dic",sep="")
temp <- paste(tempfile(),".gz",sep="")
download.file(dimfile,temp)
dataconnection <- gzfile(temp)
dimdata=read.delim(dataconnection,header=FALSE)
colnames(dimdata)=c("colname","desc")
lab=dimdata$desc
names(lab)=dimdata$colname
#create headings that speak for themselves for columns
speakingheadings=as.character(lab[headings])
#download factors for each heading and add
for(heading in headings){
factorfile=paste("http://epp.eurostat.ec.europa.eu/NavTree_prod/everybody/BulkDownloadListing?sort=1&file=dic%2F",LANGUAGE,"%2F",tolower(heading),".dic",sep="")
temp <- paste(tempfile(),".gz",sep="")
download.file(factorfile,temp)
dataconnection <- gzfile(temp)
factordata=read.delim(dataconnection,header=FALSE)
colnames(factordata)=c(heading,paste(heading,"_desc",sep=""))
#join the heading to the heading dataset
dimensions=merge(dimensions,factordata,by=heading,all.x=TRUE)
}
#at the end at speaking headings
setnames(dimensions,colnames(dimensions)[1:length(speakingheadings)],speakingheadings)
#add data columns by writing and reading again---FASTER ;-)
temp=tempfile()
values=data.frame(value=as.character(longdata$value),stringsAsFactors=FALSE)
v=try(transform(values, value= colsplit(value, split = "\\ ",names=c("value","flag"))),silent=TRUE)
if(class(v)=="try-error"){
print("there are no flags")
values=data.table(values)
values$value=as.character(values$value)
values$value=as.double(values$value)
}else{
values=v$value
values=data.table(values)
values$value=as.character(values$value)
values$value=as.double(values$value)
values$flag=as.character(values$flag)
values[value==flag,flag:=NA]
}
#fix the time column
identifytimedim=as.character(longdata$time[1])
#TODO continue here: identify Q for quarter, M for month, - if non YEAR
if found Q then zoo -> Q else zoo -> M
as.yearqtr("200706", "%Y%m")
grep(,"\\")
eurostatdata=cbind(dimensions,time=longdata$time,values)
if(save) save(eurostatdata,file=paste(datasetname,".RData",sep=""))
eurostatdata
}