# Overview

This package implements the estimation of a topological ordering for a Linear Structural Equation Model (SEM) with non-Gaussian errors, as outlined in Ruiz et. al (2022+):

G Ruiz, OH Madrid-Padilla, Q Zhou. “Sequentially learning the topological ordering of directed acyclic graphs with likelihood ratio scores.” Transactions on Machine Learning Research, 2022+. [Link]

## Package Docummentation

See the manual for function documentation. See below for some examples.

## Installation

The package `devtools` is needed for installation.

``````devtools::install_github(repo='gabriel-ruiz/scorelingam')
``````

Load `scorelingam` once installed.

``````library(scorelingam)
``````

# An example for our sorting procedure: p = 5

## Generate data

### Linear SEM Parameters

``````p = 5; numRoots = 1; numParentsMin = 1; numParentsMax = 2
scaleParam = runif(n=p,min=0.5,max=1.2)
causalOrder = 5:1
pa.wt.min = 0.25,pa.wt.max = 0.9,prob.pos = 0.5,perm = causalOrder)

print(lingamParams\$B)

##           [,1]       [,2]       [,3]      [,4] [,5]
## [1,] 0.0000000  0.0000000  0.0000000 0.0000000    0
## [2,] 0.3839051  0.0000000  0.0000000 0.0000000    0
## [3,] 0.4824674 -0.8666502  0.0000000 0.0000000    0
## [4,] 0.0000000  0.0000000  0.6784132 0.0000000    0
## [5,] 0.0000000  0.0000000 -0.5450611 0.6573171    0
``````

### Data matrix with n = 5000

When generating the data matrix, the possible options for the family argument are ‘laplace’, ‘logistic’, and ‘t’, in which case the additional argument df is needed (df=10 is the default)

``````n = 5000
X = generateData(B=lingamParams\$B,scale = scaleParam,perm = causalOrder,n = n,family = 'laplace')
dim(X)

##  5000    5
``````

## Obtain a topological ordering estimate

When estimating a topological ordering for the linear SEM, the possible options for the family argument are ‘laplace’, ‘logistic’, and ‘t’, in which case the additional argument df is needed (df=10 is the default).

### Specify the neighborhood sets

``````# neighborhoods specified to be all other nodes
mbhat = lapply(1:ncol(X),function(j){(1:ncol(X))[-j]})
``````

### Obtain the ordering estimate

``````estOrder = scorelingam(X=X,mb=mbhat,numUpdates=ncol(X),family='laplace')
print(estOrder)

##      [,1]
## [1,]    5
## [2,]    4
## [3,]    3
## [4,]    2
## [5,]    1
``````

## Check the accuracy of the ordering

Calculate the proportion of parents sorted after a child (lower is better).

``````# check errors (ideally close to zero)
checkSortingErrors(estOrder=estOrder,A=lingamParams\$B)

##  0
``````

### Check accuracy of estimated weighted adjacency matrix

``````paHat = getParents(mb=mbhat,ordering=estOrder)
(Bhat = getWeights(X=scale(X,scale=F,center=T),pa=paHat))

##              [,1]        [,2]       [,3]      [,4] [,5]
## [1,]  0.000000000  0.00000000  0.0000000 0.0000000    0
## [2,]  0.391906115  0.00000000  0.0000000 0.0000000    0
## [3,]  0.494511285 -0.85062512  0.0000000 0.0000000    0
## [4,] -0.008728058 -0.03435524  0.6737897 0.0000000    0
## [5,]  0.007557937  0.02683172 -0.5371846 0.6568952    0

# maximum entry-wise difference in absolute value
norm(x=lingamParams\$B-Bhat,type = 'i')

##  0.0477068
``````

# A higher dimensional example: p = 10, 000

## Generate data

### Linear SEM Parameters

``````p = 10000;numRoots = 100; numParentsMin = 1; numParentsMax = 2
scaleParam = runif(n=p,min=0.5,max=1.2)
causalOrder = c(seq(2,p,by=2),seq(1,p-1,by=2))
pa.max=numParentsMax, pa.wt.min = 0.25,
pa.wt.max = 0.9,prob.pos = 0.5,perm = causalOrder)
``````

### Data matrix with n = 5000

When generating the data matrix, the possible options for the family argument are ‘laplace’, ‘logistic’, and ‘t’, in which case the additional argument df is needed (df=10 is the default)

``````n = 5000
X = generateData(B=lingamParams\$B,scale = scaleParam,perm = causalOrder,n = n,family = 'laplace')
dim(X)

##   5000 10000
``````

## Obtain a topological ordering estimate and time the algorithm.

When estimating a topological ordering for the Linear SEM, the possible options for the family argument are ‘laplace’, ‘logistic’, and ‘t’, in which case the additional argument df is needed (df=10 is the default).

### Specify the neighborhood sets

``````# neighborhoods specified to be true markov blankets
mbhat = moralize(lingamParams\$B)
``````

### Obtain the ordering estimate

``````start = Sys.time()
end = Sys.time()
difftime(end,start,units='mins')

## Time difference of 2.0367 mins
``````

## Check accuracy of estimated ordering

Calculate the proportion of parents sorted after child (lower is better).

``````checkSortingErrors(estOrder=estOrder,A=lingamParams\$B)

##  0.08549762
``````