This package implements the estimation of a topological ordering for a Linear Structural Equation Model (SEM) with non-Gaussian errors, as outlined in Ruiz et. al (2022+):
G Ruiz, OH Madrid-Padilla, Q Zhou. “Sequentially learning the topological ordering of directed acyclic graphs with likelihood ratio scores.” Transactions on Machine Learning Research, 2022+. [Link]
See the manual for function documentation. See below for some examples.
The package devtools
is needed for installation.
devtools::install_github(repo='gabriel-ruiz/scorelingam')
Load scorelingam
once installed.
library(scorelingam)
p = 5; numRoots = 1; numParentsMin = 1; numParentsMax = 2
scaleParam = runif(n=p,min=0.5,max=1.2)
causalOrder = 5:1
lingamParams = randomWeightedAdjacencyMatrix(p=p,num.roots=numRoots,pa.min=numParentsMin,pa.max=numParentsMax,
pa.wt.min = 0.25,pa.wt.max = 0.9,prob.pos = 0.5,perm = causalOrder)
# weighted adjacency matrix
print(lingamParams$B)
## [,1] [,2] [,3] [,4] [,5]
## [1,] 0.0000000 0.0000000 0.0000000 0.0000000 0
## [2,] 0.3839051 0.0000000 0.0000000 0.0000000 0
## [3,] 0.4824674 -0.8666502 0.0000000 0.0000000 0
## [4,] 0.0000000 0.0000000 0.6784132 0.0000000 0
## [5,] 0.0000000 0.0000000 -0.5450611 0.6573171 0
When generating the data matrix, the possible options for the family argument are ‘laplace’, ‘logistic’, and ‘t’, in which case the additional argument df is needed (df=10 is the default)
n = 5000
X = generateData(B=lingamParams$B,scale = scaleParam,perm = causalOrder,n = n,family = 'laplace')
dim(X)
## [1] 5000 5
When estimating a topological ordering for the linear SEM, the possible options for the family argument are ‘laplace’, ‘logistic’, and ‘t’, in which case the additional argument df is needed (df=10 is the default).
# neighborhoods specified to be all other nodes
mbhat = lapply(1:ncol(X),function(j){(1:ncol(X))[-j]})
estOrder = scorelingam(X=X,mb=mbhat,numUpdates=ncol(X),family='laplace')
print(estOrder)
## [,1]
## [1,] 5
## [2,] 4
## [3,] 3
## [4,] 2
## [5,] 1
Calculate the proportion of parents sorted after a child (lower is better).
# check errors (ideally close to zero)
checkSortingErrors(estOrder=estOrder,A=lingamParams$B)
## [1] 0
paHat = getParents(mb=mbhat,ordering=estOrder)
(Bhat = getWeights(X=scale(X,scale=F,center=T),pa=paHat))
## [,1] [,2] [,3] [,4] [,5]
## [1,] 0.000000000 0.00000000 0.0000000 0.0000000 0
## [2,] 0.391906115 0.00000000 0.0000000 0.0000000 0
## [3,] 0.494511285 -0.85062512 0.0000000 0.0000000 0
## [4,] -0.008728058 -0.03435524 0.6737897 0.0000000 0
## [5,] 0.007557937 0.02683172 -0.5371846 0.6568952 0
# maximum entry-wise difference in absolute value
norm(x=lingamParams$B-Bhat,type = 'i')
## [1] 0.0477068
p = 10000;numRoots = 100; numParentsMin = 1; numParentsMax = 2
scaleParam = runif(n=p,min=0.5,max=1.2)
causalOrder = c(seq(2,p,by=2),seq(1,p-1,by=2))
lingamParams = randomWeightedAdjacencyMatrix(p=p,num.roots=numRoots,pa.min=numParentsMin,
pa.max=numParentsMax, pa.wt.min = 0.25,
pa.wt.max = 0.9,prob.pos = 0.5,perm = causalOrder)
When generating the data matrix, the possible options for the family argument are ‘laplace’, ‘logistic’, and ‘t’, in which case the additional argument df is needed (df=10 is the default)
n = 5000
X = generateData(B=lingamParams$B,scale = scaleParam,perm = causalOrder,n = n,family = 'laplace')
dim(X)
## [1] 5000 10000
When estimating a topological ordering for the Linear SEM, the possible options for the family argument are ‘laplace’, ‘logistic’, and ‘t’, in which case the additional argument df is needed (df=10 is the default).
# neighborhoods specified to be true markov blankets
mbhat = moralize(lingamParams$B)
start = Sys.time()
estOrder = scorelingam(X=X,mb=mbhat,numUpdates=5,family='laplace')
end = Sys.time()
difftime(end,start,units='mins')
## Time difference of 2.0367 mins
Calculate the proportion of parents sorted after child (lower is better).
checkSortingErrors(estOrder=estOrder,A=lingamParams$B)
## [1] 0.08549762