Title: | Functional Adjacency Spectral Embedding |
---|---|
Description: | Latent process embedding for functional network data with the Functional Adjacency Spectral Embedding. Fits smooth latent processes based on cubic spline bases. Also generates functional network data from three models, and evaluates a network generalized cross-validation criterion for dimension selection. For more information, see MacDonald, Zhu and Levina (2022+) <arXiv:2210.07491>. |
Authors: | Peter W. MacDonald [aut, cre, cph] |
Maintainer: | Peter W. MacDonald <[email protected]> |
License: | GPL (>= 3) |
Version: | 1.0.1.9000 |
Built: | 2025-01-26 06:11:21 UTC |
Source: | https://github.com/peterwmacd/fase |
fase fits a functional adjacency spectral embedding to snapshots
of (undirected) functional network data. The latent processes are fit
in a spline basis specified by the user, with additional options for
ridge penalization.
fase(A,d,self_loops,spline_design,lambda,optim_options,output_options)
A |
An |
d |
A positive integer, the number of latent space dimensions of the functional embedding. |
self_loops |
A Boolean, if |
spline_design |
A list, containing the spline design information.
For fitting with a B-spline design:
For fitting with a smoothing spline design:
|
lambda |
A positive scalar, the scale factor for the generalized ridge
penalty (see Details). Defaults to |
optim_options |
A list, containing additional optional arguments controlling the gradient descent algorithm.
|
output_options |
A list, containing additional optional arguments controlling
the output of
|
fase finds a functional adjacency spectral embedding of an array A of symmetric adjacency matrices on a common set of nodes, where each slice is associated with a scalar snapshot index.
Embedding requires the specification of a latent space dimension d and spline design information (with the argument spline_design).
fase can fit latent processes using either a cubic B-spline basis with equally spaced knots, or a natural cubic spline basis with a second derivative (generalized ridge) smoothing penalty: a smoothing spline.
To fit with a B-spline design (spline_design$type = 'bs'), one must minimally provide a basis dimension q of at least 4
and at most .
When fitting with a smoothing spline design, the generalized ridge penalty is scaled by a factor determined by the argument lambda. See MacDonald et al. (2022+), Appendix E for more details.
lambda can also be used to introduce a ridge penalty on the basis coordinates when fitting with B-splines.
Fitting minimizes a least squares loss using gradient descent (Algorithm 2) on the basis coordinates of each component process.
Additional options for the fitting algorithm, including initialization, can be specified by the argument optim_options.
For more details on the fitting and initialization algorithms, see MacDonald et al. (2022+), Section 3.
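As a rough illustration of the objective, the sketch below evaluates a least squares loss in base R under an inner product model. The helper name fase_loss_sketch, the array layouts (A as nodes x nodes x snapshots, Z as nodes x dimensions x snapshots), the exact form of the loss, and the handling of the diagonal when self_loops is FALSE are all assumptions made for illustration; none of them is taken from the package source.

```r
# Hypothetical helper, not part of the fase package.
# Assumed layouts: A is n x n x m, Z is n x d x m.
fase_loss_sketch <- function(A, Z, self_loops = TRUE) {
  n <- dim(Z)[1]
  total <- 0
  for (k in seq_len(dim(A)[3])) {
    Zk <- matrix(Z[, , k], nrow = n)        # latent positions at snapshot k
    resid <- A[, , k] - tcrossprod(Zk)      # A_k - Z_k Z_k^T
    if (!self_loops) diag(resid) <- 0       # assumption: ignore the diagonal
    total <- total + sum(resid^2)           # squared Frobenius norm
  }
  total
}

set.seed(1)
n <- 8; d <- 2; m <- 5
Z <- array(rnorm(n * d * m), dim = c(n, d, m))
A_exact <- array(0, dim = c(n, n, m))
for (k in seq_len(m)) A_exact[, , k] <- tcrossprod(Z[, , k])
A_noisy <- A_exact + 0.1

loss_exact <- fase_loss_sketch(A_exact, Z)   # zero: Z reproduces A exactly
loss_noisy <- fase_loss_sketch(A_noisy, Z)   # positive: every entry is off by 0.1
```

A gradient descent routine would lower this quantity by updating the basis coordinates that generate Z, rather than Z itself.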
By default, fase will return estimates of the latent processes evaluated at the snapshot indices as an array, after performing a Procrustes alignment of the consecutive snapshots.
This extra alignment step can be skipped.
fase will also return the spline design information used to fit the embedding, convergence information for gradient descent, and (if specified) the basis coordinates.
When fitting with B-splines, fase can return a network generalized cross validation criterion, described in MacDonald et al. (2022+), Section 3.3. This criterion can be minimized for dimension selection.
A list is returned with the functional adjacency spectral embedding, the spline design information, and some additional optimization output:
Z |
An |
W |
For |
spline_design |
A list, describing the spline design:
|
ngcv |
A scalar, the network generalized cross validation criterion
(see Details). Only returned for |
K |
A positive integer, the number of iterations run in gradient descent. |
converged |
An integer convergence code, |
# Gaussian edge data with sinusoidal latent processes
set.seed(1)
data <- gaussian_snapshot_ss(n=50,d=2,
                             x_vec=seq(0,1,length.out=50),
                             self_loops=FALSE,sigma_edge=4)

# fase fit with B-spline design
fit_bs <- fase(data$A,d=2,self_loops=FALSE,
               spline_design=list(type='bs',q=9,x_vec=data$spline_design$x_vec),
               optim_options=list(eps=1e-4,K_max=40),
               output_options=list(return_coords=TRUE))

# fase fit with smoothing spline design
fit_ss <- fase(data$A,d=2,self_loops=FALSE,
               spline_design=list(type='ss',x_vec=data$spline_design$x_vec),
               lambda=.5,
               optim_options=list(eta=1e-4,K_max=40,verbose=FALSE),
               output_options=list(align_output=FALSE))

# NOTE: both examples fit with small optim_options$K_max=40 for demonstration
fase_seq fits a functional adjacency spectral embedding to snapshots
of (undirected) functional network data, with each
of the latent dimensions fit sequentially. The latent processes are fit
in a spline basis specified by the user, with additional options for
ridge penalization.
fase_seq(A,d,self_loops,spline_design,lambda,optim_options,output_options)
A |
An |
d |
A positive integer, the number of latent space dimensions of the functional embedding. |
self_loops |
A Boolean, if |
spline_design |
A list, containing the spline design information.
For fitting with a B-spline design:
For fitting with a smoothing spline design:
|
lambda |
A positive scalar, the scale factor for the generalized ridge
penalty (see Details). Defaults to |
optim_options |
A list, containing additional optional arguments controlling the gradient descent algorithm.
|
output_options |
A list, containing additional optional arguments controlling
the output of
|
Note that fase_seq is a wrapper for fase. When d = 1, fase_seq coincides with fase.
fase_seq finds a functional adjacency spectral embedding of an array A of symmetric adjacency matrices on a common set of nodes, where each slice is associated with a scalar snapshot index.
Embedding requires the specification of a latent space dimension d and spline design information (with the argument spline_design).
fase_seq can fit latent processes using either a cubic B-spline basis with equally spaced knots, or a natural cubic spline basis with a second derivative (generalized ridge) smoothing penalty: a smoothing spline.
To fit with a B-spline design (spline_design$type = 'bs'), one must minimally provide a basis dimension q of at least 4
and at most .
When fitting with a smoothing spline design, the generalized ridge penalty is scaled by a factor determined by the argument lambda. See MacDonald et al. (2022+), Appendix E for more details.
lambda can also be used to introduce a ridge penalty on the basis coordinates when fitting with B-splines.
Fitting minimizes a least squares loss using gradient descent (Algorithm 1) on the basis coordinates of each component process.
Additional options for the fitting algorithm, including initialization, can be specified by the argument optim_options.
For more details on the fitting and initialization algorithms, see MacDonald et al. (2022+), Section 3.
By default, fase_seq will return estimates of the latent processes evaluated at the snapshot indices as an array, after performing a Procrustes alignment of the consecutive snapshots.
This extra alignment step can be skipped.
fase_seq will also return the spline design information used to fit the embedding, convergence information for gradient descent, and (if specified) the basis coordinates.
When fitting with B-splines, fase_seq can return a network generalized cross validation criterion, described in MacDonald et al. (2022+), Section 3.3. This criterion can be minimized for dimension selection.
A list is returned with the functional adjacency spectral embedding, the spline design information, and some additional optimization output:
Z |
An |
W |
For |
spline_design |
A list, describing the spline design:
|
ngcv |
A scalar, the network generalized cross validation criterion
(see Details). Only returned for |
K |
A positive integer, the number of iterations run in gradient descent. |
converged |
An integer convergence code, |
# Gaussian edge data with sinusoidal latent processes
set.seed(1)
data <- gaussian_snapshot_ss(n=50,d=2,
                             x_vec=seq(0,1,length.out=50),
                             self_loops=FALSE,sigma_edge=4)

# fase fit with B-spline design
fit_bs <- fase_seq(data$A,d=2,self_loops=FALSE,
                   spline_design=list(type='bs',q=9,x_vec=data$spline_design$x_vec),
                   optim_options=list(eps=1e-4,K_max=40),
                   output_options=list(return_coords=TRUE))

# fase fit with smoothing spline design
fit_ss <- fase_seq(data$A,d=2,self_loops=FALSE,
                   spline_design=list(type='ss',x_vec=data$spline_design$x_vec),
                   lambda=.5,
                   optim_options=list(eta=1e-4,K_max=40,verbose=FALSE))

# NOTE: both models fit with small optim_options$K_max=40 for demonstration
gaussian_snapshot_bs simulates a realization of a functional network
with Gaussian edges, according to an inner product latent process model.
The latent processes are generated from a B-spline basis with equally
spaced knots.
gaussian_snapshot_bs(n,d,m,self_loops=TRUE, spline_design,sigma_edge=1, process_options)
n |
A positive integer, the number of nodes. |
d |
A positive integer, the number of latent space dimensions. |
m |
A positive integer, the number of snapshots.
If this argument is not specified, it
is determined from the snapshot index vector |
self_loops |
A Boolean, if |
spline_design |
A list, describing the
|
sigma_edge |
A positive scalar,
the entry-wise standard deviation for the Gaussian edge variables.
Defaults to |
process_options |
A list, containing additional optional arguments:
|
The spline design of the functional network data (snapshot indices, basis dimension) is generated using the information provided in spline_design, producing a q-dimensional cubic B-spline basis with equally spaced knots.
The latent process basis coordinates are generated as iid Gaussian random variables with standard deviation process_options$sigma_coord. Each latent process is given by the corresponding linear combination of the B-spline basis functions.
Then, the symmetric adjacency matrix for each snapshot has independent Gaussian entries with standard deviation sigma_edge and mean given by the inner product of the two nodes' latent positions at that snapshot, for node pairs i <= j (or i < j with no self loops).
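The generative steps described above can be sketched in base R with the splines package. This is only an illustration under stated assumptions: the knot placement via seq(), the n x d x q layout of the coordinate array W, the index range [0, 1], and the variable names are choices made here, and the package's own construction may differ.

```r
library(splines)

set.seed(3)
n <- 20; d <- 2; m <- 15; q <- 6
sigma_coord <- 0.75; sigma_edge <- 1
x_vec <- seq(0, 1, length.out = m)

# q-dimensional cubic B-spline basis: q - 4 equally spaced interior knots
interior <- seq(0, 1, length.out = q - 2)[-c(1, q - 2)]
basis <- bs(x_vec, knots = interior, degree = 3, intercept = TRUE,
            Boundary.knots = c(0, 1))
Bmat <- matrix(as.numeric(basis), nrow = m)   # plain m x q matrix

# iid Gaussian basis coordinates, stored as n x d x q (assumed layout)
W <- array(rnorm(n * d * q, sd = sigma_coord), dim = c(n, d, q))

A <- array(0, dim = c(n, n, m))
up <- upper.tri(matrix(0, n, n), diag = TRUE)
for (k in seq_len(m)) {
  # Latent positions at snapshot k: combine coordinates with basis values
  Zk <- apply(W, c(1, 2), function(w) sum(w * Bmat[k, ]))
  # Symmetric Gaussian noise: draw the upper triangle, then mirror it
  E <- matrix(0, n, n)
  E[up] <- rnorm(sum(up), sd = sigma_edge)
  E <- E + t(E) - diag(diag(E))
  A[, , k] <- tcrossprod(Zk) + E              # inner product mean plus noise
}
```

The sketch keeps self loops; dropping them would amount to discarding the diagonal of each slice.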
A list is returned with the realizations of the basis coordinates, spline design, and the multiplex network snapshots:
A |
An array of dimension |
W |
An array of dimension |
spline_design |
A list, describing the
|
# Gaussian edge data with B-spline latent processes, Gaussian coordinates
# NOTE: x_vec is automatically populated given m
data <- gaussian_snapshot_bs(n=100,d=4,m=100,
                             self_loops=FALSE,
                             spline_design=list(q=12),
                             sigma_edge=3,
                             process_options=list(sigma_coord=.75))
gaussian_snapshot_ss simulates a realization of a functional network
with Gaussian edges, according to an inner product latent process model.
The latent processes are randomly generated sinusoidal functions.
gaussian_snapshot_ss(n,d,m,x_vec,self_loops=TRUE, sigma_edge=1,process_options)
n |
A positive integer, the number of nodes. |
d |
A positive integer, the number of latent space dimensions. |
m |
A positive integer, the number of snapshots.
If this argument is not specified, it
is determined from the snapshot index vector |
x_vec |
A vector, the snapshot evaluation indices for the data.
Defaults to an equally spaced sequence of length
|
self_loops |
A Boolean, if |
sigma_edge |
A positive scalar,
the entry-wise standard deviation for the Gaussian edge variables.
Defaults to |
process_options |
A list, containing additional optional arguments:
|
The latent process for each node in each latent dimension is generated independently as a sinusoidal function built from three random ingredients: a Gaussian variable with mean 0, a Bernoulli variable with mean 1/2, and a uniform variable with minimum spline_design$x_min and maximum spline_design$x_max.
The frequency parameter f is specified with process_options$frequency, and the maximum amplitude parameter is specified with process_options$amplitude.
Roughly, each process is a randomly shifted sine function which goes through f cycles on the index set, with amplitude either increasing or decreasing, up to the maximum amplitude.
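Then, the symmetric adjacency matrix for each snapshot has independent Gaussian entries with standard deviation sigma_edge and mean given by the inner product of the two nodes' latent positions at that snapshot, for node pairs i <= j (or i < j with no self loops).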
This function may return the latent processes as an
array evaluated at the prespecified snapshot indices, or as a function which
takes a vector of indices and returns the corresponding evaluations of
the latent process matrices.
It also returns the spline design information required to
fit a FASE embedding to this data with a natural cubic spline.
A list is returned with the realizations of the basis coordinates, spline design, and the multiplex network snapshots:
A |
An array of dimension |
Z |
If |
spline_design |
A list, describing the
|
# Gaussian edge data with sinusoidal latent processes
# NOTE: latent processes are returned as a function
data <- gaussian_snapshot_ss(n=100,d=2,
                             x_vec=seq(0,3,length.out=80),
                             self_loops=TRUE,
                             sigma_edge=4,
                             process_options=list(amplitude=4,
                                                  frequency=3,
                                                  return_fn=TRUE))
proc_align orthogonally transforms the columns of a matrix A to find the best approximation (in terms of Frobenius norm) to a second matrix B. Optionally, it may also return the optimal transformation matrix.
proc_align(A,B,return_orth=FALSE)
A |
An |
B |
An |
return_orth |
A Boolean which specifies whether to return the
orthogonal transformation.
Defaults to |
If return_orth is FALSE, returns the matrix resulting from applying the optimal aligning transformation to the columns of A.
Otherwise, returns a list with two entries:
Ao |
The |
orth |
The |
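For intuition, orthogonal Procrustes alignment has a closed-form solution through a singular value decomposition. The base R sketch below is a stand-in written for illustration, not the package's implementation; the helper name procrustes_sketch is hypothetical, and the output names Ao and orth simply mirror the entries listed above.

```r
# Hypothetical helper: orthogonal Procrustes alignment of A onto B via SVD.
procrustes_sketch <- function(A, B) {
  s <- svd(crossprod(A, B))      # SVD of t(A) %*% B
  Q <- s$u %*% t(s$v)            # orthogonal Q minimizing ||A %*% Q - B||_F
  list(Ao = A %*% Q, orth = Q)
}

set.seed(1)
B <- matrix(rnorm(20), nrow = 10, ncol = 2)
theta <- pi / 3
R <- matrix(c(cos(theta), sin(theta), -sin(theta), cos(theta)), 2, 2)
A <- B %*% R                     # B with its columns rotated

out <- procrustes_sketch(A, B)
err <- max(abs(out$Ao - B))      # numerically zero: the rotation is undone
```

Because A here is an exact rotation of B, the recovered transformation is the inverse rotation; with noisy inputs the same formula gives the best orthogonal fit in Frobenius norm.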
proc_align_slicewise3 applies an orthogonal transformation to the columns of each of the slices of an array A to find the best approximation (in terms of matrix Frobenius norm) to the corresponding slice of a second array B.
proc_align_slicewise3(A,B)
A |
An |
B |
An |
Returns the array resulting from applying the optimal aligning transformations to the columns of the slices of A.
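Slice-by-slice alignment can be sketched by solving a separate orthogonal Procrustes problem for each slice. The helper below is hypothetical and assumes both arrays are n x d x m with slices taken along the third index; the package's own layout and implementation may differ.

```r
# Hypothetical helper: align each slice of A to the matching slice of B.
# Assumed layout: both arrays are n x d x m, slices along the third index.
procrustes_slicewise_sketch <- function(A, B) {
  stopifnot(identical(dim(A), dim(B)))
  n <- dim(A)[1]
  out <- A
  for (k in seq_len(dim(A)[3])) {
    Ak <- matrix(A[, , k], nrow = n)
    Bk <- matrix(B[, , k], nrow = n)
    s <- svd(crossprod(Ak, Bk))           # one SVD per slice
    out[, , k] <- Ak %*% s$u %*% t(s$v)   # slice-specific rotation
  }
  out
}

set.seed(2)
B <- array(rnorm(10 * 2 * 3), dim = c(10, 2, 3))
A <- B
for (k in 1:3) {
  th <- k * pi / 5                        # a different rotation for each slice
  A[, , k] <- B[, , k] %*% matrix(c(cos(th), sin(th), -sin(th), cos(th)), 2, 2)
}
aligned <- procrustes_slicewise_sketch(A, B)
err <- max(abs(aligned - B))              # numerically zero
```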
proc_align3 applies one orthogonal transformation to the columns of each of the slices of an array A to find the best approximation (in terms of matrix Frobenius norm, averaged over the slices) to a second array B.
Optionally, it may also return the optimal transformation matrix.
proc_align3(A,B,return_orth=FALSE)
A |
An |
B |
An |
return_orth |
A Boolean which specifies whether to return the
orthogonal transformation.
Defaults to |
If return_orth is FALSE, returns the array resulting from applying the optimal aligning transformation to the columns of the slices of A.
Otherwise, returns a list with two entries:
Ao |
The |
orth |
The |
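When a single orthogonal transformation is shared by all slices, the Procrustes solution can be computed from the sum of the slice-wise cross-products. The sketch below is a hypothetical base R illustration that assumes n x d x m arrays; it is not the package's code, and the output names Ao and orth only mirror the entries listed above.

```r
# Hypothetical helper: one common orthogonal transformation for all slices.
# Assumed layout: both arrays are n x d x m, slices along the third index.
procrustes_common_sketch <- function(A, B) {
  stopifnot(identical(dim(A), dim(B)))
  d <- dim(A)[2]
  M <- matrix(0, d, d)
  for (k in seq_len(dim(A)[3])) {
    # Accumulate t(A_k) %*% B_k over slices
    M <- M + crossprod(matrix(A[, , k], ncol = d), matrix(B[, , k], ncol = d))
  }
  s <- svd(M)
  Q <- s$u %*% t(s$v)            # minimizes the summed squared Frobenius error
  Ao <- A
  for (k in seq_len(dim(A)[3])) Ao[, , k] <- matrix(A[, , k], ncol = d) %*% Q
  list(Ao = Ao, orth = Q)
}

set.seed(5)
B <- array(rnorm(10 * 2 * 4), dim = c(10, 2, 4))
R <- matrix(c(cos(1), sin(1), -sin(1), cos(1)), 2, 2)
A <- B
for (k in 1:4) A[, , k] <- B[, , k] %*% R   # same rotation in every slice
out <- procrustes_common_sketch(A, B)
err <- max(abs(out$Ao - B))                 # numerically zero
```

Minimizing the summed error and minimizing the averaged error give the same transformation, since the two objectives differ only by a constant factor.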
rdpg_snapshot_bs simulates a realization of a functional network
with Bernoulli edges, according to an inner product latent process model.
The latent processes are generated from a B-spline basis with equally
spaced knots.
rdpg_snapshot_bs(n,d,m,self_loops=TRUE, spline_design,process_options)
n |
A positive integer, the number of nodes. |
d |
A positive integer, the number of latent space dimensions. |
m |
A positive integer, the number of snapshots.
If this argument is not specified, it
is determined from the snapshot index vector |
self_loops |
A Boolean, if |
spline_design |
A list, describing the
|
process_options |
A list, containing additional optional arguments:
|
The spline design of the functional network data (snapshot indices, basis dimension) is generated using the information provided in spline_design, producing a q-dimensional cubic B-spline basis with equally spaced knots.
The latent process basis coordinates for each node are generated as iid Dirichlet random variables with d-dimensional parameter process_options$alpha_coord or rep(process_options$alpha_coord,d), depending on the dimension of process_options$alpha_coord.
Roughly, smaller values of process_options$alpha_coord will tend to generate latent positions closer to the corners of the simplex.
The coordinates are then rescaled so the overall network density is approximately process_options$density, while Euclidean norms never exceed 1.
If the density requested is too high, it will revert to the maximum density under this model.
Then each latent process is given by the corresponding linear combination of the B-spline basis functions.
The symmetric adjacency matrix for each snapshot has independent Bernoulli entries with mean given by the inner product of the two nodes' latent positions at that snapshot, for node pairs i <= j (or i < j with no self loops).
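A simplified version of this generative process can be sketched in base R. The sketch omits the density rescaling step entirely, draws Dirichlet vectors by normalizing gamma variables, and assumes an n x d x q coordinate layout on the index range [0, 1]; the helper rdirichlet_sketch and all variable names are hypothetical and not part of the package.

```r
library(splines)

set.seed(4)
n <- 30; d <- 3; m <- 10; q <- 5
alpha <- rep(0.5, d)                      # d-dimensional Dirichlet parameter
x_vec <- seq(0, 1, length.out = m)

# q-dimensional cubic B-spline basis with equally spaced interior knots
interior <- seq(0, 1, length.out = q - 2)[-c(1, q - 2)]
basis <- bs(x_vec, knots = interior, degree = 3, intercept = TRUE,
            Boundary.knots = c(0, 1))
Bmat <- matrix(as.numeric(basis), nrow = m)   # plain m x q matrix

# Dirichlet draws obtained by normalizing independent gamma variables
rdirichlet_sketch <- function(N, alpha) {
  G <- matrix(rgamma(N * length(alpha), shape = rep(alpha, each = N)), nrow = N)
  G / rowSums(G)
}

# One Dirichlet vector per node and basis function, stored as n x d x q
W <- array(0, dim = c(n, d, q))
for (j in seq_len(q)) W[, , j] <- rdirichlet_sketch(n, alpha)

A <- array(0, dim = c(n, n, m))
up <- upper.tri(matrix(0, n, n), diag = TRUE)
for (k in seq_len(m)) {
  Zk <- apply(W, c(1, 2), function(w) sum(w * Bmat[k, ]))
  P <- pmin(pmax(tcrossprod(Zk), 0), 1)   # guard against rounding outside [0, 1]
  Ak <- matrix(0, n, n)
  Ak[up] <- rbinom(sum(up), size = 1, prob = P[up])
  A[, , k] <- Ak + t(Ak) - diag(diag(Ak)) # mirror the upper triangle
}
```

Since the B-spline basis values are non-negative and sum to one at each index, every latent position stays inside the simplex, so the inner products are valid edge probabilities without further adjustment.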
A list is returned with the realizations of the basis coordinates, spline design, and the multiplex network snapshots:
A |
An array of dimension |
W |
An array of dimension |
spline_design |
A list, describing the
|
# Bernoulli edge data with B-spline latent processes, Dirichlet coordinates
# NOTE: for B-splines, x_max and x_min do not need to coincide with the
# max and min snapshot times.
data <- rdpg_snapshot_bs(n=100,d=10,
                         self_loops=FALSE,
                         spline_design=list(q=8,
                                            x_vec=seq(-1,1,length.out=50),
                                            x_min=-1.1,x_max=1.1),
                         process_options=list(alpha_coord=.2,
                                              density=1/10))