Title: Two-Step Kernel Ridge Regression for Network Predictions
Description: Fit a two-step kernel ridge regression model for predicting edges in networks, and carry out cross-validation using shortcuts for swift and accurate performance assessment (Stock et al., 2018 <doi:10.1093/bib/bby095>).
Authors: Joris Meys [cre, aut], Michiel Stock [aut]
Maintainer: Joris Meys <[email protected]>
License: GPL-3
Version: 0.1.11
Built: 2024-11-19 05:26:32 UTC
Source: https://github.com/centerforstatistics-ugent/xnet
This package implements the two-step kernel ridge regression model, a supervised network prediction method that can be used for all kinds of network analyses. Examples include protein-protein interaction networks and food webs.
Joris Meys and Michiel Stock
Send your bug reports to:
https://github.com/CenterForStatistics-UGent/xnet/issues
More background in the paper by Stock et al, 2018:
http://doi.org/10.1093/bib/bby095
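As a quick orientation, here is a minimal sketch of a typical workflow, using only the functions documented below and the bundled example data:

library(xnet)
data(drugtarget)

# Fit a heterogeneous two-step kernel ridge regression model
mod <- tskrr(drugTargetInteraction, targetSim, drugSim, lambda = c(0.01, 0.01))

pred  <- fitted(mod)           # fitted values for the training network
cv    <- loo(mod)              # leave-one-out shortcuts
tuned <- tune(mod, ngrid = 10) # grid search for the lambda values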
These functions allow you to extract slots from objects of the
class linearFilter
.
alpha(x)

na_removed(x)

## S3 method for class 'linearFilter'
mean(x, ...)

## S4 method for signature 'linearFilter'
mean(x, ...)

## S4 method for signature 'linearFilter'
colMeans(x)

## S4 method for signature 'linearFilter'
rowMeans(x)

## S4 method for signature 'linearFilter'
alpha(x)

## S4 method for signature 'linearFilter'
na_removed(x)
x |
a linearFilter object. |
... |
arguments passed to or from other methods. |
for mean
: the mean of the original matrix
for colMeans
: a numeric vector with the column means
for rowMeans
: a numeric vector with the row means
for alpha
: a numeric vector of length 4 with the alpha
values.
for na_removed
: a logical value indicating whether
missing values were removed prior to the fitting of the filter.
data(drugtarget)

lf <- linear_filter(drugTargetInteraction, alpha = 0.25)

alpha(lf)
mean(lf)
colMeans(lf)
na_removed(lf)
These functions allow converting models that inherit from the
tskrr
and
tskrrTune
class into each other,
keeping track of whether the model is homogeneous or heterogeneous.
The dots argument allows specifying values for possible extra slots
when converting from tskrr
to tskrrTune
.
More information on these slots can be found
on the help page of tskrrTune
.
These functions are not exported.
as_tuned(x, ...)

as_tskrr(x, ...)

## S4 method for signature 'tskrrHomogeneous'
as_tuned(x, ...)

## S4 method for signature 'tskrrHeterogeneous'
as_tuned(x, ...)

## S4 method for signature 'tskrrTune'
as_tskrr(x)

## S4 method for signature 'tskrrImpute'
as_tskrr(x)

## S4 method for signature 'tskrr'
as_tskrr(x)
x |
a model of class tskrr, tskrrTune or tskrrImpute, depending on the method. |
... |
values for the extra slots defined by the class tskrrTune. |
For as_tuned
:
a tskrrTune
object of
the proper class (homogeneous or heterogeneous)
For as_tskrr
: an object of class
tskrrHomogeneous
or
tskrrHeterogeneous
depending
on whether the original object was homogeneous or heterogeneous.
These functions do NOT tune a model. They are used internally to make the connection between both types in the methods.
tune
for actually tuning a model.
tskrrTune
for
names and possible values of the slots passed through
...
This function creates a grid of values for
tuning a tskrr
model. The grid is equally spaced on
a logarithmic scale. Normally it's not needed to call this method
directly, it's usually called from tune
.
create_grid(lim = c(1e-04, 10000), ngrid = 10)
lim |
a numeric vector with 2 values giving the lower and upper limit for the grid. |
ngrid |
the number of values that have to be produced. If this number is not an integer, it is truncated. The value should be 2 or larger. |
The lim argument sets the boundaries of the domain in which the lambdas are sought. The lambda values at which the function is evaluated are calculated as:
exp(seq(log(lim[1]), log(lim[2]), length.out = ngrid))
a numeric vector with values evenly spaced on a logarithmic scale.
tune
for tuning a tskrr model.
create_grid(lim = c(1e-4, 1), ngrid = 5)
These functions allow you to extract the dimensions of a tskrr object. These dimensions are essentially the dimensions of the label matrix y.
## S4 method for signature 'tskrr'
dim(x)
x |
a tskrr object. |
a vector with two values indicating the number of rows and the number of columns.
data(drugtarget)

mod <- tskrr(drugTargetInteraction, targetSim, drugSim)

dim(mod)
nrow(mod)
ncol(mod)
A dataset for examining the interaction between 54 drugs and 26 neural receptors. It consists of three different matrices.
drugTargetInteraction
for drugTargetInteraction: a numeric matrix of 26 rows by 54 columns.
For drugSim: a numeric square matrix with 54 rows/columns.
For targetSim: a numeric square matrix with 26 rows/columns.
The dataset consists of the following objects :
drugTargetInteraction: a matrix indicating whether or not a certain drug compound interacts with a certain neural receptor.
targetSim: a similarity matrix for the neural receptors.
drugSim: a similarity matrix for the drugs
The data originates from Yamanishi et al (2008) but was partly reworked to be suitable for two-step kernel ridge regression. This is explained in detail in the Preparation of the example data vignette.
https://doi.org/10.1093/bioinformatics/btn162
Yamanishi et al, 2008 : Prediction of drug-target interaction networks from the integration of chemical and genomic spaces.
These functions calculate either the hat matrix, the mapping matrix or the original (kernel) matrix for a two-step kernel ridge regression, based on the eigendecomposition of the kernel matrix.
eigen2hat(eigen, val, lambda)

eigen2map(eigen, val, lambda)

eigen2matrix(eigen, val)
eigen |
a matrix with the eigenvectors. |
val |
a numeric vector with the eigenvalues. |
lambda |
a single numeric value for the hyperparameter lambda |
For the hat matrix, this boils down to:

H = U * Sigma * (Sigma + lambda * I)^(-1) * t(U)

For the map matrix, this is:

M = U * (Sigma + lambda * I)^(-1) * t(U)

with U the matrix with eigenvectors, Sigma a diagonal matrix with the eigenvalues on the diagonal, I the identity matrix and lambda the hyperparameter linked to this kernel. The internal calculation is optimized to avoid having to invert a matrix. This is done using the fact that (Sigma + lambda * I) is a diagonal matrix.
a numeric matrix representing either the hat matrix
(eigen2hat
), the map matrix (eigen2map
) or
the original matrix (eigen2matrix
)
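As an illustration of this relation, a small sketch comparing the shortcut with the direct formula K (K + lambda * I)^(-1); it assumes the kernel matrix is symmetric, as targetSim is:

data(drugtarget)
lambda <- 0.01

eig <- eigen(targetSim)
H1  <- eigen2hat(eig$vectors, eig$values, lambda)

# Direct computation without the eigendecomposition shortcut
K  <- targetSim
H2 <- K %*% solve(K + lambda * diag(nrow(K)))

all.equal(H1, H2, check.attributes = FALSE)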
This function extracts the fitted predictions from a
tskrr
object or an object
inheriting from that class. The xnet
package provides an S4 generic for the function
fitted
from the package stats
,
and a method for tskrr
objects.
## S3 method for class 'tskrr'
fitted(object, labels = TRUE, ...)

## S3 method for class 'linearFilter'
fitted(object, ...)

## S4 method for signature 'tskrr'
fitted(object, labels = TRUE, ...)

## S4 method for signature 'linearFilter'
fitted(object, ...)
object |
an object for which the extraction of model fitted values is meaningful. |
labels |
a logical value indicating whether the labels should be shown. Defaults to TRUE |
... |
arguments passed to or from other methods. |
a numeric matrix with the predictions
data(drugtarget)

mod <- tskrr(drugTargetInteraction, targetSim, drugSim)
pred <- fitted(mod)
This function returns the correct function needed to perform one of the leave-one-out cross-validations. It's primarily meant for internal use but can be useful when doing simulations.
get_loo_fun(x, ...)

## S4 method for signature 'tskrrHeterogeneous'
get_loo_fun(
  x,
  exclusion = c("interaction", "row", "column", "both"),
  replaceby0 = FALSE
)

## S4 method for signature 'tskrrHomogeneous'
get_loo_fun(
  x,
  exclusion = c("edges", "vertices", "interaction", "both"),
  replaceby0 = FALSE
)

## S4 method for signature 'linearFilter'
get_loo_fun(x, replaceby0 = FALSE)

## S4 method for signature 'character'
get_loo_fun(
  x = c("tskrrHeterogeneous", "tskrrHomogeneous", "linearFilter"),
  ...
)

## S4 method for signature 'tskrrTune'
get_loo_fun(x, ...)
x |
a character value with the class, or a tskrr or linearFilter object. |
... |
arguments passed to or from other methods. |
exclusion |
a character value with possible values "interaction", "row", "column", "both" for heterogeneous models, and "edges", "vertices", "interaction" or "both" for homogeneous models. Defaults to "interaction". See details. |
replaceby0 |
a logical value indicating whether the interaction
should be simply removed (FALSE) or replaced by 0 (TRUE). |
This function can be used to select the correct loo function in
a simulation or tuning algorithm, based on the model object you
created. Depending on its class, the returned functions will have
different arguments, so you should only use this if you know
what you're doing and after you checked the actual returned
functions in loo_internal
.
Using replaceby0
only makes sense if you only remove the interaction.
In all other cases, this argument is ignored.
For the class tskrrHomogeneous
, it doesn't make sense to
remove rows or columns. If you chose this option, the function will
throw an error. Removing edges corresponds to the setting "edges" or
"interaction". Removing vertices corresponds to the setting "vertices" or
"both". These terms can be used interchangeably.
For the class linearFilter
it only makes sense to exclude the
interaction (i.e., a single cell). Therefore you do not have an argument
exclusion
for that method.
For the classes tskrrTune
and tskrrImpute
,
not specifying exclusion
or replaceby0
returns the used
loo function. If you specify either of them,
it will use the method for the appropriate model and return
a new loo function.
a function taking the arguments y, and possibly pred
for calculating the leave-one-out cross-validation. For class
tskrrHeterogeneous
, the returned function also
has an argument Hk and Hg, representing the hat matrix for the rows
and the columns respectively. For class tskrrHomogeneous
,
only the extra argument Hk is available. For class linearFilter
,
the extra argument is called alpha
and takes the alpha vector
of that model.
loo
for carrying out a leave-one-out cross-validation,
and loo_internal
for more information on the internal
functions one retrieves with this one.
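As a sketch of how the returned function could be used, assuming the shortcut for exclusion = "interaction" takes the label matrix, the two hat matrices and the predictions, in that order (as described above and in loo_internal):

data(drugtarget)
mod <- tskrr(drugTargetInteraction, targetSim, drugSim)

# Rebuild the hat matrices from the stored eigendecompositions
eK <- get_eigen(mod)
eG <- get_eigen(mod, which = "column")
Hk <- eigen2hat(eK$vectors, eK$values, lambda(mod)["k"])
Hg <- eigen2hat(eG$vectors, eG$values, lambda(mod)["g"])

loofun <- get_loo_fun(mod, exclusion = "interaction")
loo_manual <- loofun(response(mod), Hk, Hg, fitted(mod))

all.equal(loo_manual, loo(mod), check.attributes = FALSE)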
The functions described here are convenience functions to get
information out of a tskrrImpute
object.
has_imputed_values(x)

which_imputed(x)

is_imputed(x)
x |
a tskrr or tskrrImpute object. |
For has_imputed_values
: a logical value indicating whether
the model has imputed values. If x
is not some form of a
tskrr
model, the function will return an error.
For which_imputed
: an integer vector with the positions
for which the values are imputed.
For is_imputed
: a matrix of the same dimensions as the
label matrix. It contains the value FALSE
at positions that
were not imputed, and TRUE
at positions that were.
data(drugtarget)

mod <- tskrr(drugTargetInteraction, targetSim, drugSim)

naid <- sample(length(drugTargetInteraction), 30)
drugTargetInteraction[naid] <- NA

impmod <- impute_tskrr(drugTargetInteraction, targetSim, drugSim)

has_imputed_values(mod)
has_imputed_values(impmod)

# For illustration: extract imputed values
id <- is_imputed(impmod)
fitted(impmod)[id]
This function returns the hat matrix or hat matrices of a tskrr model. The xnet package creates an S4 generic for hat and links the default method to the hat function of the stats package.
hat(x, ...)

## S4 method for signature 'tskrrHeterogeneous'
hat(x, which = c("row", "column"))

## S4 method for signature 'tskrrHomogeneous'
hat(x, ...)
x |
a tskrr model |
... |
arguments passed to other methods. |
which |
a character value with possible values "row" or "column" to indicate which should be returned. For homogeneous models, this parameter is ignored. |
the requested hat matrix of the model.
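A short usage sketch, storing the hat matrices explicitly with keep = TRUE:

data(drugtarget)
mod <- tskrr(drugTargetInteraction, targetSim, drugSim, keep = TRUE)

Hk <- hat(mod, which = "row")     # hat matrix for the rows (targets)
Hg <- hat(mod, which = "column")  # hat matrix for the columns (drugs)
dim(Hk)
dim(Hg)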
This function implements an optimization algorithm that allows
imputing missing values in the label matrix while fitting a
tskrr
model.
impute_tskrr(
  y,
  k,
  g = NULL,
  lambda = 0.01,
  testdim = TRUE,
  testlabels = TRUE,
  symmetry = c("auto", "symmetric", "skewed"),
  keep = FALSE,
  niter = 10000,
  tol = sqrt(.Machine$double.eps),
  start = mean(y, na.rm = TRUE),
  verbose = FALSE
)
y |
a label matrix |
k |
a kernel matrix for the rows |
g |
an optional kernel matrix for the columns |
lambda |
a numeric vector with one or two values for the hyperparameter lambda. If two values are given, the first one is used for the k matrix and the second for the g matrix. |
testdim |
a logical value indicating whether symmetry
and the dimensions of the kernel(s) should be tested.
Defaults to TRUE. |
testlabels |
a logical value indicating whether the row and column names of the matrices have to be checked for consistency. Defaults to TRUE. |
symmetry |
a character value with the possibilities "auto", "symmetric" or "skewed". In case of a homogeneous fit, you can either specify whether the label matrix is symmetric or skewed, or you can let the function decide (option "auto"). |
keep |
a logical value indicating whether the kernel hat
matrices should be stored in the model object. Doing so makes the
model object considerably larger, but can speed up predictions in some cases. Defaults to FALSE. |
niter |
an integer giving the maximum number of iterations |
tol |
a numeric value indicating the tolerance for convergence of the algorithm. It is the maximum sum of squared differences between two iteration steps. |
start |
a numeric value indicating the value with which NA's are replaced in the first step of the algorithm. Defaults to the mean of the non-missing values of y. |
verbose |
either a logical value, 1 or 2. |
A tskrr
model of the class tskrrImputeHeterogeneous
or tskrrImputeHomogeneous
depending on whether or
not g
has a value.
data(drugtarget)

naid <- sample(length(drugTargetInteraction), 30)
drugTargetInteraction[naid] <- NA

impute_tskrr(drugTargetInteraction, targetSim, drugSim)
This function provides an interface for the imputation of values
based on a tskrr
model and is the internal function
used by impute_tskrr
.
impute_tskrr.fit(y, Hk, Hg, naid = NULL, niter, tol, start, verbose)
y |
a label matrix |
Hk |
a hat matrix for the rows (see also eigen2hat). |
Hg |
a hat matrix for the columns. For homogeneous networks, this should be Hk again. |
naid |
an optional index with the values that have to be imputed,
i.e. at which positions you find a NA value. |
niter |
an integer giving the maximum number of iterations |
tol |
a numeric value indicating the tolerance for convergence of the algorithm. It is the maximum sum of squared differences between two iteration steps. |
start |
a numeric value indicating the value with which NA's are replaced in the first step of the algorithm. |
verbose |
either a logical value, 1 or 2. |
This function is mostly available for internal use. In most cases,
it makes much more sense to use impute_tskrr
, as that
function returns an object one can work with. The function
impute_tskrr.fit
could be useful when doing simulations or
creating fitting algorithms.
a list with two elements:
a matrix y
with the imputed values filled in.
a numeric value niter
with the number of iterations used.
impute_tskrr
for the user-level function, and
eigen2hat
for conversion of an eigendecomposition to
a hat matrix.
data(drugtarget)

K <- eigen(targetSim)
G <- eigen(drugSim)

Hk <- eigen2hat(K$vectors, K$values, lambda = 0.01)
Hg <- eigen2hat(G$vectors, G$values, lambda = 0.05)

drugTargetInteraction[c(3, 17, 123)] <- NA

res <- impute_tskrr.fit(drugTargetInteraction, Hk, Hg,
                        niter = 1000, tol = 10e-10,
                        start = 0, verbose = FALSE)
The function isSymmetric
tests for symmetry of a matrix but also
takes row and column names into account. This function is a toned-down
(and slightly faster) version that ignores row and column names.
Currently, the function only works for real matrices, not complex ones.
is_symmetric(x, tol = 100 * .Machine$double.eps)
x |
a matrix to be tested. |
tol |
the tolerance for comparing the numbers. |
a logical value indicating whether or not the matrix is symmetric
x <- matrix(1:16, ncol = 4)
is_symmetric(x)

x <- x %*% t(x)
is_symmetric(x)
The functions described here are convenience functions to get
information out of a tskrrTune
object.
is_tuned(x)

get_grid(x)

get_loss_values(x)

has_onedim(x)
x |
a tskrrTune object. |
For is_tuned
: a logical value indicating whether the
model is tuned.
For get_grid
a list with the elements k
and
possibly g
, each containing the different lambdas tried in
the tuning for the row and column kernel matrices respectively.
For get_loss_values
a matrix with the calculated
loss values. Note that each row represents the result for one
lambda value related to the row kernel matrix K. For heterogeneous
models, every column represents the result for one lambda related
to the column kernel matrix G.
For has_onedim: a single logical value telling whether the grid search in the object was one-dimensional.
data(drugtarget)

mod <- tskrr(drugTargetInteraction, targetSim, drugSim)
tuned <- tune(mod, ngrid = 10)

is_tuned(mod)
is_tuned(tuned)

# Basic visualization of the grid.
gridvals <- get_grid(tuned)
z <- get_loss_values(tuned)

## Not run:
image(gridvals$k, gridvals$g, log(z), log = 'xy',
      xlab = "lambda k", ylab = "lambda g")
## End(Not run)
These functions allow you to extract the labels from a
tskrr
object. The function labels
and the
function dimnames
are aliases and do the exact same
thing. The functions rownames
and colnames
work like
you would expect. Note that contrary to the latter two, labels
will never return NULL
. If no labels are found, it will construct
labels using the prefixes defined in the argument prefix
.
## S3 method for class 'tskrr'
labels(
  object,
  prefix = if (is_homogeneous(object)) "row" else c("row", "col"),
  ...
)

## S4 method for signature 'tskrr'
labels(
  object,
  prefix = if (is_homogeneous(object)) "row" else c("row", "col"),
  ...
)

## S4 method for signature 'tskrr'
dimnames(x)

## S4 method for signature 'tskrr'
rownames(x, do.NULL = TRUE, prefix = "row")

## S4 method for signature 'tskrr'
colnames(x, do.NULL = TRUE, prefix = "col")
object |
a tskrr object. |
prefix |
a prefix used for construction of the labels in case
none are available. For heterogeneous models, a vector of length 2 can be given; the first value is used for the rows and the second for the columns. |
... |
arguments passed to/from other methods. |
x |
a tskrr object. |
do.NULL |
logical. If FALSE and the labels are NULL, labels are created from the prefix. |
for labels
and dimnames
: a list with two elements k
and
g
If the original data didn't contain row- or column names for the
label matrix, rownames
and colnames
will return
NULL
. Other functions will extract the automatically generated
labels, so don't count on rownames
and colnames
if you
want to predict output from other functions!
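A brief usage sketch of these extractors:

data(drugtarget)
mod <- tskrr(drugTargetInteraction, targetSim, drugSim)

labels(mod)    # a list with elements k (rows) and g (columns)
rownames(mod)  # target labels, or NULL if none were supplied
colnames(mod)  # drug labels, or NULL if none were supplied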
This function fits a linear filter over a label matrix. It calculates the row, column and total means, and uses those to construct the linear filter.
linear_filter(y, alpha = 0.25, na.rm = FALSE)
y |
a label matrix |
alpha |
a vector with 4 alpha values, or a single alpha value which then is used for all 4 alphas. |
na.rm |
a logical value indicating whether missing values should be removed before calculating the row-, column- and total means. |
If there are missing values and they are removed before calculating the
means, a warning is issued. If na.rm = FALSE
and there are
missing values present, the outcome is, by definition, a matrix filled
with NA values.
an object of class linearFilter
data(drugtarget)

linear_filter(drugTargetInteraction, alpha = 0.25)
linear_filter(drugTargetInteraction, alpha = c(0.1, 0.1, 0.4, 0.4))
The class represents the outcome of a linear filter, and is normally
generated by the function linear_filter
y
the original label matrix with responses.
alpha
a numeric vector with the 4 alpha values of the model.
pred
a matrix with the predictions
mean
a numeric vector containing the global mean of y
colmeans
a numeric vector containing the column means of y
rowmeans
a numeric vector containing the row means of y.
na.rm
a logical value indicating whether missing values were removed prior to the calculation of the means.
linear_filter
for creating a linear filter model,
and getter functions for linearFilter.
Perform a leave-one-out cross-validation for two-step kernel ridge regression based on the shortcuts described in Stock et al, 2018. (http://doi.org/10.1093/bib/bby095).
loo(x, ...)

## S4 method for signature 'tskrrHeterogeneous'
loo(
  x,
  exclusion = c("interaction", "row", "column", "both"),
  replaceby0 = FALSE
)

## S4 method for signature 'tskrrHomogeneous'
loo(
  x,
  exclusion = c("edges", "vertices", "interaction", "both"),
  replaceby0 = FALSE
)

## S4 method for signature 'linearFilter'
loo(x, replaceby0 = FALSE)
x |
an object of class tskrr or linearFilter. |
... |
arguments passed to methods. See Details. |
exclusion |
a character value with possible values "interaction", "row", "column", "both" for heterogeneous models, and "edges", "vertices", "interaction" or "both" for homogeneous models. Defaults to "interaction". See details. |
replaceby0 |
a logical value indicating whether the interaction
should be simply removed (FALSE) or replaced by 0 (TRUE). |
The parameter exclusion
defines what is left out.
The value "interaction" means that a single interaction is removed.
In the case of a homogeneous model, this can be interpreted as the removal of a single edge, i.e. the interaction between two vertices. The values "row" and "column" mean that all interactions for a row vertex resp. a column vertex are removed. The value "both" removes all interactions for both a row and a column vertex.
In the case of a homogeneous model, "row" and "column" don't make sense and will be replaced by "both" with a warning. This can be interpreted as removing a vertex, i.e. all interactions between that vertex and all other vertices. Alternatively one can use "edges" to remove edges and "vertices" to remove vertices. In the case of a homogeneous model, the setting "edges" translates to "interaction", and "vertices" translates to "both". For more information, see Stock et al. (2018).
Replacing by 0 only makes sense when exclusion = "interaction"
and the
label matrix contains only 0 and 1 values. The function checks whether
the conditions are fulfilled and if not, returns an error.
a numeric matrix with the leave-one-out predictions for the model.
data(drugtarget)

mod <- tskrr(drugTargetInteraction, targetSim, drugSim,
             lambda = c(0.01, 0.01))

delta <- loo(mod, exclusion = 'both') - response(mod)
delta0 <- loo(mod, replaceby0 = TRUE) - response(mod)
These functions implement different cross-validation scenarios for two-step kernel ridge regression. It uses the shortcuts for leave-one-out cross-validation.
loo.i(Y, Hk, Hg, pred)

loo.i0(Y, Hk, Hg, pred)

loo.r(Y, Hk, Hg, ...)

loo.c(Y, Hk, Hg, ...)

loo.b(Y, Hk, Hg, ...)

loo.e.sym(Y, Hk, pred)

loo.e.skew(Y, Hk, pred)

loo.e0.sym(Y, Hk, pred)

loo.e0.skew(Y, Hk, pred)

loo.v(Y, Hk, ...)

loo.i.lf(Y, alpha, pred)

loo.i0.lf(Y, alpha, pred)
Y |
the matrix with responses |
Hk |
the hat matrix for the first kernel (rows of Y) |
Hg |
the hat matrix for the second kernel (columns of Y) |
pred |
the predictions |
... |
added to allow for specifying pred even when not needed. |
alpha |
a vector of length 4 with the alpha values from a linearFilter model. |
These functions are primarily for internal use and hence not exported. Be careful when using them, as they do not perform any sanity check on the input. It is up to the user to make sure the input makes sense.
a matrix with the leave-one-out predictions
loo
for the user-level function.
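Because these functions are not exported, they have to be reached with xnet:::. A hedged sketch, assuming the two-step predictions are Hk %*% Y %*% Hg:

data(drugtarget)

eK <- eigen(targetSim)
eG <- eigen(drugSim)
Hk <- eigen2hat(eK$vectors, eK$values, lambda = 0.01)
Hg <- eigen2hat(eG$vectors, eG$values, lambda = 0.05)

pred  <- Hk %*% drugTargetInteraction %*% Hg
loo_i <- xnet:::loo.i(drugTargetInteraction, Hk, Hg, pred)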
This function allows calculating the loss of a tskrr model using
either one of the functions defined in loss_functions
or a custom user function. If the model inherits from class
tskrrTune
and no additional arguments
are given, the loss is returned for the settings used when tuning.
The function can also be used to extract the original loss from a
permtest
object.
loss(x, ...)

## S4 method for signature 'tskrr'
loss(
  x,
  fun = loss_mse,
  exclusion = c("interaction", "row", "column", "both"),
  replaceby0 = FALSE,
  predictions = FALSE,
  ...
)

## S4 method for signature 'tskrrTune'
loss(
  x,
  fun = loss_mse,
  exclusion = c("interaction", "row", "column", "both"),
  replaceby0 = FALSE,
  predictions = FALSE,
  ...
)

## S4 method for signature 'permtest'
loss(x, ...)
x |
a model that inherits from class tskrr or tskrrTune, or a permtest object. |
... |
extra arguments passed to the loss function in fun. |
fun |
a function to be used for calculating the loss. This can also be a character value giving the name of one of the loss functions provided in the package |
exclusion |
a character value with possible values "interaction",
"row", "column" or "both".
See also loo. |
replaceby0 |
a logical value indicating whether the interaction
should be simply removed (FALSE) or replaced by 0 (TRUE). |
predictions |
a logical value to indicate whether the
predictions should be used instead of leave-one-out cross-validation. If set to TRUE, the arguments exclusion and replaceby0 are ignored. |
a numeric value with the calculated loss
loss_functions
for possible loss functions
tune
for tuning a model based on loss functions
data(drugtarget)

mod <- tskrr(drugTargetInteraction, targetSim, drugSim)
loss(mod, fun = loss_auc)

tuned <- tune(mod, fun = loss_auc)

loss(tuned)
loss(tuned, fun = loss_mse)
These functions can be used as loss functions in tune
.
Currently, two functions are provided: a function calculating the
classic mean squared error (loss_mse
) and a function
calculating 1 - AUC (loss_auc
).
loss_mse(Y, LOO, na.rm = FALSE)

loss_auc(Y, LOO)
Y |
the label matrix with observed responses |
LOO |
the leave-one-out crossvalidation (or predictions if you
must). This one can be calculated by the function loo. |
na.rm |
a logical value indicating whether missing values should be removed before calculating the mean squared error. |
The AUC is calculated by sorting the Y
matrix based on
the order of the values in the LOO
matrix. The false and true
positive rates are calculated solely based on that ordering, which
allows for values in LOO
outside the range [0,1]. It's
a naive implementation which is good enough for tuning, but
shouldn't be used as a correct value for 1 - auc in case the
values in LOO
are outside the range [0,1].
The function loss_auc
should only be used for a Y
matrix that contains solely the values 0 and 1.
tune
for application of the loss function
x <- c(1, 0, 0, 1, 0, 0, 1, 0, 1)
y <- c(0.8, -0.1, 0.2, 0.2, 0.4, 0.01, 1.12, 0.9, 0.9)

loss_mse(x, y)
loss_auc(x, y)
Reorders the label matrix based on the labels of the kernel matrices.
In case there are no labels, the original label matrix is returned,
but with the labels in rows
and cols
as rownames and
column names respectively.
match_labels(y, rows, cols = NULL)
y |
a matrix representing the label matrix. |
rows |
a character vector with the labels for the rows or a matrix with rownames that will be used as labels. |
cols |
a character vector with the labels for the cols or a matrix
with colnames that will be used as labels. If NULL, the row labels are used for the columns as well. |
a matrix with the rows and columns reordered.
mat <- matrix(1:6, ncol = 2,
              dimnames = list(c("b", "a", "d"), c("ca", "cb")))

match_labels(mat, c("a", "b", "d"), c("ca", "cb"))

# Using matrices
data(drugtarget)
out <- match_labels(drugTargetInteraction, targetSim, drugSim)
This function does a permutation-based evaluation of the impact of different edges on the final result. It does so by permuting the kernel matrices, refitting the model and calculating a loss function.
permtest(x, ...)

## S3 method for class 'permtest'
print(x, digits = max(3L, getOption("digits") - 3), ...)

## S4 method for signature 'tskrrHeterogeneous'
permtest(
  x,
  n = 100,
  permutation = c("both", "row", "column"),
  exclusion = c("interaction", "row", "column", "both"),
  replaceby0 = FALSE,
  fun = loss_mse,
  exact = FALSE
)

## S4 method for signature 'tskrrHomogeneous'
permtest(
  x,
  n = 100,
  permutation = c("both"),
  exclusion = c("interaction", "both"),
  replaceby0 = FALSE,
  fun = loss_mse,
  exact = FALSE
)

## S4 method for signature 'tskrrTune'
permtest(x, permutation = c("both", "row", "column"), n = 100)
x |
either a tskrr or tskrrTune model, or (for the print method) a permtest object. |
... |
arguments passed to other methods |
digits |
the number of digits shown in the output |
n |
the number of permutations for every kernel matrix |
permutation |
a character string that defines whether the row, column or both kernel matrices should be permuted. Ignored in case of a homogeneous network |
exclusion |
the exclusion to be used in the loo function. See also loo. |
replaceby0 |
a logical value indicating whether the interaction should be simply removed (FALSE) or replaced by 0 (TRUE). |
fun |
a function (or a character string with the name of a
function) that calculates the loss. See also |
exact |
a logical value that indicates whether or not an exact p-value should be calculated, or be approximated based on a normal distribution. |
The test involved uses a normal approximation. It assumes that under the null hypothesis, the loss values are approximately normally distributed. The cumulative probability of a loss as small or smaller than the one found in the original model, is calculated based on a normal distribution from which the mean and sd are calculated from the permutations.
An object of the class permtest.
It should be noted that this normal approximation is an ad-hoc approach. There's no guarantee that the actual distribution of the loss under the null hypothesis is normal. Depending on the loss function, a significant deviation from the theoretical distribution can exist. Hence this function should only be used as rough guidance in model evaluation.
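To make the approximation concrete, the p-value can in principle be reproduced from the permutation losses (a sketch, not necessarily the package's exact internal computation):

data(drugtarget)
mod   <- tskrr(drugTargetInteraction, targetSim, drugSim)
ptest <- permtest(mod, n = 100, fun = loss_mse)

# Probability of a loss as small or smaller, under a normal
# distribution fitted to the permutation losses
p_approx <- pnorm(loss(ptest),
                  mean = mean(permutations(ptest)),
                  sd   = sd(permutations(ptest)))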
# Heterogeneous network
data(drugtarget)

mod <- tskrr(drugTargetInteraction, targetSim, drugSim)
permtest(mod, fun = loss_auc)
This class represents the permutation test outcomes. See also
the function permtest
.
orig_loss
a numeric value with the original loss of the model.
perm_losses
a numeric vector with the losses of the different permutations.
n
the number of permutations
loss_function
the function used to calculate the losses.
exclusion
a character value indicating the exclusion setting used for the test
replaceby0
a logical value that indicates whether the
exclusion was done by replacing with zero. See also
loo
.
permutation
a character value that indicates which kernel matrices were permuted.
pval
a p value indicating how likely it is to find a smaller loss than the one of the model based on a normal approximation.
exact
a logical value indicating whether the P value was calculated exactly or approximated by the normal distribution.
the function permtest
for the actual test.
the function loo
for the leave-one-out procedures.
the function t.test
for the actual test
The functions described here are convenience functions to get
information out of a permtest
object.
permutations(x)

## S4 method for signature 'permtest'
x[i]
x |
a permtest object. |
i |
either a numeric vector, a logical vector or a character vector with the elements that need extraction. |
the requested values
loss
to extract the original loss value.
data(drugtarget)

mod <- tskrr(drugTargetInteraction, targetSim, drugSim)
ptest <- permtest(mod, fun = loss_auc)

loss(ptest)
ptest[c(2, 3)]
permutations(ptest)
With this function, you can visualize the grid search for optimal
lambdas from a tskrrTune
object.
In the case of two-dimensional grid search, this function plots a
contour plot on a grid, based on the functions image
and contour
. For one-dimensional grid search, the function
creates a single line plot.
plot_grid(
  x,
  addlambda = TRUE,
  lambdapars = list(col = "red"),
  log = TRUE,
  opts.contour = list(nlevels = 10),
  ...
)
x |
an object that inherits from class tskrrTune. |
addlambda |
a logical value indicating whether the lambda with the minimum loss should be added to the plot. In case of a one dimensional plot, this adds a colored vertical line. In the case of a two dimensional plot, this adds a colored point at the minimum. |
lambdapars |
a list with named graphical parameters used to mark the optimal lambda on the plot. |
log |
a logical value indicating whether the lambdas should be plotted at a log scale (the default) or not. |
opts.contour |
options passed to the function contour. |
... |
arguments passed to other functions. For a one
dimensional plot, this will be the function plot. |
NULL
invisibly
data(drugtarget)

## One dimensional tuning
tuned1d <- tune(drugTargetInteraction, targetSim, drugSim,
                lim = c(1e-4, 2), ngrid = 40,
                fun = loss_auc, onedim = TRUE)

plot_grid(tuned1d)
plot_grid(tuned1d, lambdapars = list(col = "green", lty = 1, lwd = 2),
          log = FALSE, las = 2, main = "1D tuning")

## Two dimensional tuning
tuned2d <- tune(drugTargetInteraction, targetSim, drugSim,
                lim = c(1e-4, 10), ngrid = 20,
                fun = loss_auc)

plot_grid(tuned2d)
This function plots a heatmap of the fitted values in a
tskrr
model. The function is loosely based on
heatmap
, but uses a different mechanism and adds
a legend by default.
## S3 method for class 'tskrr'
plot(
  x,
  dendro = c("both", "row", "col", "none"),
  which = c("fitted", "loo", "response", "residuals"),
  exclusion = c("interaction", "row", "column", "both"),
  replaceby0 = FALSE,
  nbest = 0,
  rows,
  cols,
  col = rev(heat.colors(20)),
  breaks = NULL,
  legend = TRUE,
  main = NULL,
  xlab = NULL,
  ylab = NULL,
  labRow = NULL,
  labCol = NULL,
  margins = c(5, 5),
  ...
)
x |
a tskrr model |
dendro |
a character value indicating whether a dendrogram should be constructed. |
which |
a character value indicating whether the fitted values, the leave-one-out values, the original response values or the residuals should be plotted. |
exclusion |
if which = "loo", this argument is passed to loo. |
replaceby0 |
if which = "loo", this argument is passed to loo. |
nbest |
a single integer value indicating the amount of best values
that should be selected. If 0, all values are shown. |
rows |
a numeric or character vector indicating which rows should be selected from the model. |
cols |
a numeric or character vector indicating which columns should be selected from the model. |
col |
a vector with colors to be used for plotting |
breaks |
a single value specifying the number of
breaks (must be 1 more than number of colors), or a numeric
vector with the breaks used for the color code. If NULL, suitable breaks are calculated automatically. |
legend |
a logical value indicating whether or not the legend should be added to the plot. |
main |
a character value with a title for the plot |
xlab |
a character label for the X axis |
ylab |
a character label for the Y axis |
labRow |
a character vector with labels to be used on the rows.
Note that these labels are used as is (possibly reordered to match
the dendrogram). They can replace the labels from the model. Set to
NA to remove the row labels. |
labCol |
the same as labRow, but for the column labels. |
margins |
a numeric vector with 2 values indicating the margins to
be used for the row and column labels (cf. the margins argument of heatmap). |
... |
currently ignored |
The function can select a part of the model for plotting. Either you
specify rows
and cols
, or you specify nbest
.
If nbest
is specified, rows
and cols
are ignored.
The n highest values are looked up in the plotted values, and only
the rows and columns related to these values are then shown. This
allows for a quick selection of the highest predictions.
Dendrograms are created by converting the kernel matrices to a distance, using
d(x,y)^2 = K(x,x) + K(y,y) - 2*K(x,y)
with K being the kernel function. The resulting distances are
clustered using hclust
and converted to a
dendrogram using as.dendrogram
.
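A sketch of that conversion for the row kernel, assuming the standard kernel-induced distance (the package's internal code may differ in details):

data(drugtarget)

K  <- targetSim
D2 <- outer(diag(K), diag(K), "+") - 2 * K  # squared distances
D2[D2 < 0] <- 0                             # guard against rounding
ddK <- as.dendrogram(hclust(as.dist(sqrt(D2))))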
an invisible list with the following elements:
val
: the values plotted
ddK
: if a row dendrogram was requested, the row dendrogram
ddG
: if a column dendrogram was requested,
the column dendrogram
breaks
: the breaks used for the color codes
col
: the colors used
tskrr
, tune
and
impute_tskrr
to construct tskrr models.
data(drugtarget)

mod <- tskrr(drugTargetInteraction, targetSim, drugSim)

plot(mod)
plot(mod, dendro = "row", legend = FALSE)
plot(mod, col = rainbow(20), dendro = "none", which = "residuals")
plot(mod, labCol = NA, labRow = NA, margins = c(0.2, 0.2))
Obtains the predictions from a tskrr
model for new data.
To get the predictions on the training data,
use the function fitted
or set both k
and g
to NULL
.
## S3 method for class 'tskrr'
predict(object, k = NULL, g = NULL, testdim = TRUE, ...)

## S4 method for signature 'tskrr'
predict(object, k = NULL, g = NULL, testdim = TRUE, ...)
object |
an object of class tskrr. |
k |
a new K matrix, or NULL. |
g |
a new G matrix, or NULL. |
testdim |
a logical value indicating whether the dimensions should
be checked prior to the calculation. You can set this to FALSE to skip those checks. |
... |
arguments passed to or from other methods |
Predictions can be calculated between new vertices and the vertices used to train the model, between new sets of vertices, or both. Which predictions are given, depends on the kernel matrices passed to the function.
In any case, both the K and G matrix need the kernel values for every combination of the new vertices and the vertices used to train the model. This is illustrated for both homogeneous and heterogeneous networks in the examples.
To predict the links between a new set of vertices and the training
vertices, you need to provide the kernel matrix for either the K
or the G set of vertices. If you want to predict the mutual links
between two new sets of vertices, you have to provide both the
K and the G matrix. This is particularly important for homogeneous
networks: if you only supply the k
argument, you will get
predictions for the links between the new vertices and the vertices
on which the model is trained. So in order to get the
mutual links between the new vertices, you need to provide the kernel
matrix as the value for both the k
and the g
argument.
a matrix with predicted values.
This function was changed in version 0.1.9 to be more consistent in how it expects the K and G matrices to be ordered. Up to version 0.1.8, the new vertices had to be on the rows for the K matrix and on the columns for the G matrix. This led to confusion.
If you're using old code, you'll get an error pointing this out. You need to transpose the G matrix in the old code to make it work with the new version.
tskrr
and tskrrTune
for
fitting the models.
## Predictions for homogeneous networks
data(proteinInteraction)

idnew <- sample(nrow(Kmat_y2h_sc), 20)

trainY <- proteinInteraction[-idnew, -idnew]
trainK <- Kmat_y2h_sc[-idnew, -idnew]
testK  <- Kmat_y2h_sc[idnew, -idnew]

mod <- tskrr(trainY, trainK, lambda = 0.1)

# Predict interaction between test vertices
predict(mod, testK, testK)

# Predict interaction between test and train vertices
predict(mod, testK)
predict(mod, g = testK)

## Predictions for heterogeneous networks
data("drugtarget")

idnewK <- sample(nrow(targetSim), 10)
idnewG <- sample(ncol(drugSim), 10)

trainY <- drugTargetInteraction[-idnewK, -idnewG]
trainK <- targetSim[-idnewK, -idnewK]
trainG <- drugSim[-idnewG, -idnewG]
testK  <- targetSim[idnewK, -idnewK]
testG  <- drugSim[idnewG, -idnewG]

mod <- tskrr(trainY, trainK, trainG, lambda = 0.01)

# Predictions for new targets on drugs in model
predict(mod, testK)
# Predictions for new drugs on targets in model
predict(mod, g = testG)
# Predictions for new drugs and targets
predict(mod, testK, testG)
A dataset for examining the interaction between proteins of yeast. The dataset consists of the following objects:
proteinInteraction
proteinInteraction: a numeric square matrix with 150 rows/columns
Kmat_y2h_sc: a numeric square matrix with 150 rows/columns
proteinInteraction: the label matrix based on the protein network taken from the KEGG/PATHWAY database
Kmat_y2h_sc: a kernel matrix indicating similarity of proteins.
The proteins in the dataset are a subset of the 769 proteins used in Yamanishi et al (2004). The kernel matrix used is the combination of 4 kernels: one based on expression data, one on protein interaction data, one on localization data and one on phylogenetic profile. These kernels and their combination are also explained in Yamanishi et al (2004).
https://doi.org/10.1093/bioinformatics/bth910
Yamanishi et al, 2004: Protein network inference from multiple genomic data: a supervised approach.
This function returns the residuals for
an object inheriting from class tskrr
residuals(object, ...)

## S3 method for class 'tskrr'
residuals(
  object,
  method = c("predictions", "loo"),
  exclusion = c("interaction", "row", "column", "both"),
  replaceby0 = FALSE,
  ...
)

## S4 method for signature 'tskrr'
residuals(
  object,
  method = c("predictions", "loo"),
  exclusion = c("interaction", "row", "column", "both"),
  replaceby0 = FALSE,
  ...
)
object |
a tskrr model |
... |
arguments passed from/to other methods. |
method |
a character value indicating whether the residuals should be based on the predictions or on a leave-one-out crossvalidation. |
exclusion |
a character value with possible values "interaction", "row", "column", "both" for heterogeneous models, and "edges", "vertices", "interaction" or "both" for homogeneous models. Defaults to "interaction". See details. |
replaceby0 |
a logical value indicating whether the interaction
should be simply removed (FALSE) or replaced by 0 (TRUE). |
The parameter exclusion
defines what is left out.
The value "interaction" means that a single interaction is removed.
In the case of a homogeneous model, this can be interpreted as the removal of a single edge, i.e. the interaction between two vertices. The values "row" and "column" mean that all interactions for a row vertex resp. a column vertex are removed. The value "both" removes all interactions for both a row and a column vertex.
In the case of a homogeneous model, "row" and "column" don't make sense and will be replaced by "both" with a warning. This can be interpreted as removing a vertex, i.e. all interactions between that vertex and all other vertices. Alternatively one can use "edges" to remove edges and "vertices" to remove vertices. In the case of a homogeneous model, the setting "edges" translates to "interaction", and "vertices" translates to "both". For more information, see Stock et al. (2018).
Replacing by 0 only makes sense when exclusion = "interaction"
and the
label matrix contains only 0 and 1 values. The function checks whether
the conditions are fulfilled and if not, returns an error.
a matrix(!) with the requested residuals
data(drugtarget)

mod <- tskrr(drugTargetInteraction, targetSim, drugSim,
             lambda = c(0.01, 0.01))

delta <- response(mod) - loo(mod, exclusion = "both")
resid <- residuals(mod, method = "loo", exclusion = "both")

all.equal(delta, resid)
The functions described here are convenience functions to get
information out of a tskrr
object.
## S4 method for signature 'tskrr'
response(x, ...)

## S4 method for signature 'tskrrHomogeneous'
lambda(x)

## S4 method for signature 'tskrrHeterogeneous'
lambda(x)

is_tskrr(x)

is_homogeneous(x)

is_heterogeneous(x)

symmetry(x)

get_eigen(x, which = c("row", "column"))

get_kernelmatrix(x, which = c("row", "column"))

has_hat(x)

get_kernel(x, which = c("row", "column"))
x |
a tskrr object. |
... |
arguments passed to other methods. |
which |
a character value indicating whether the eigen decomposition for the row kernel matrix or the column kernel matrix should be returned. |
For response
: the original label matrix
For lambda
: a named numeric vector with one (homogeneous models) or both (heterogeneous models) lambda
values used in the model. The names are "k" and "g" respectively.
For is_tskrr
a logical value indicating whether the
object is a tskrr
object
For is_homogeneous
a logical value indicating whether the
tskrr model is a homogeneous one.
For is_heterogeneous
a logical value indicating whether the
tskrr model is a heterogeneous one.
For symmetry
a character value indicating the symmetry
for a homogeneous model. If the model is not homogeneous, NA is returned.
For get_eigen
the eigen decomposition of the requested
kernel matrix.
For get_kernelmatrix
the original kernel matrix
for the rows or columns.
For has_hat
a logical value indicating whether
the tskrr model contains the kernel hat matrices.
The function get_kernel
is deprecated.
Use get_kernelmatrix
instead.
data(drugtarget)

mod <- tskrr(drugTargetInteraction, targetSim, drugSim)

is_homogeneous(mod)

EigR <- get_eigen(mod)
EigC <- get_eigen(mod, which = 'column')

lambda(mod)
This function tells you whether a matrix is symmetric,
skewed symmetric, or not symmetric. It's used by tskrr
to determine which kind of homogeneous network is represented by
the label matrix.
test_symmetry(x, tol = .Machine$double.eps)
x |
a matrix |
tol |
a single numeric value with the tolerance for comparison |
a character value with the possible values "symmetric", "skewed" or "none".
tskrrHomogeneous
for
more information on the values for the slot symmetry
mat1 <- matrix(c(1, 0, 0, 1), ncol = 2)
test_symmetry(mat1)

mat2 <- matrix(c(1, 0, 0, -1), ncol = 2)
test_symmetry(mat2)

mat3 <- matrix(1:4, ncol = 2)
test_symmetry(mat3)
tskrr
is the primary function for fitting a two-step kernel
ridge regression model. It can be used for both homogeneous and heterogeneous
networks.
tskrr(
  y,
  k,
  g = NULL,
  lambda = 1e-04,
  testdim = TRUE,
  testlabels = TRUE,
  symmetry = c("auto", "symmetric", "skewed"),
  keep = FALSE
)
y |
a label matrix |
k |
a kernel matrix for the rows |
g |
an optional kernel matrix for the columns |
lambda |
a numeric vector with one or two values for the hyperparameter lambda. If two values are given, the first one is used for the k matrix and the second for the g matrix. |
testdim |
a logical value indicating whether symmetry
and the dimensions of the kernel(s) should be tested.
Defaults to TRUE. |
testlabels |
a logical value indicating whether the row and column names of the matrices have to be checked for consistency. Defaults to TRUE. |
symmetry |
a character value with the possibilities "auto", "symmetric" or "skewed". In case of a homogeneous fit, you can either specify whether the label matrix is symmetric or skewed, or you can let the function decide (option "auto"). |
keep |
a logical value indicating whether the kernel hat
matrices should be stored in the model object. Doing so makes the
model object considerably larger, but can speed up predictions in some cases. Defaults to FALSE. |
a tskrr
object
response
, fitted
,
get_eigen
, eigen2hat
# Heterogeneous network
data(drugtarget)

mod <- tskrr(drugTargetInteraction, targetSim, drugSim)
Y <- response(mod)
pred <- fitted(mod)

# Homogeneous network
data(proteinInteraction)

modh <- tskrr(proteinInteraction, Kmat_y2h_sc)
Yh <- response(modh)
pred <- fitted(modh)
The class tskrr represents a two step kernel ridge regression fitting
object, and is normally generated by the function tskrr
.
This is a superclass so it should not be instantiated directly.
y
the matrix with responses
k
the eigen decomposition of the kernel matrix for the rows
lambda.k
the lambda value used for k
pred
the matrix with the predictions
has.hat
a logical value indicating whether the kernel hat matrices are stored in the object.
Hk
the kernel hat matrix for the rows.
labels
a list with two character vectors, k
and
g
, containing the labels for the rows resp. columns. See
tskrrHomogeneous
and
tskrrHeterogeneous
for more details.
the classes tskrrHomogeneous
and
tskrrHeterogeneous
for the actual classes.
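Objects of this class are created by tskrr() and inspected through accessor functions rather than by touching the slots directly. A short sketch (all accessors used below appear elsewhere in this documentation):

data(drugtarget)
mod <- tskrr(drugTargetInteraction, targetSim, drugSim)

is_homogeneous(mod)   # FALSE: a heterogeneous subclass
has_hat(mod)          # FALSE unless fitted with keep = TRUE
lambda(mod)           # the stored lambda value(s)
dim(response(mod))    # the y slot, accessed via response()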
This function provides an interface for two-step kernel ridge regression.
To use this function, you need at least one kernel matrix and one
label matrix. It's the internal engine used by the function
tskrr
.
tskrr.fit(y, k, g = NULL, lambda.k = NULL, lambda.g = NULL, ...)
y |
a matrix representing the links between the nodes of both networks. |
k |
an object of class eigen containing the eigen decomposition of the kernel matrix for the rows. |
g |
an optional object of class eigen containing the eigen decomposition of the kernel matrix for the columns, or NULL for homogeneous networks. |
lambda.k |
a numeric value for the lambda parameter tied to the first kernel. |
lambda.g |
a numeric value for the lambda parameter tied
to the second kernel. If NULL, the value of lambda.k is used. |
... |
arguments passed to other functions. Currently ignored. |
This function is mostly available for internal use. In most cases, it
makes much more sense to use tskrr
, as that function
returns an object one can work with. The function
tskrr.fit
could be useful when doing simulations or
fitting algorithms, as the information returned from this function
is enough to use the functions returned by get_loo_fun
.
a list with three elements:
k : the hat matrix for the rows
g : the hat matrix for the columns (or NULL for homogeneous networks)
pred : the predictions
data(drugtarget)
K <- eigen(targetSim)
G <- eigen(drugSim)

res <- tskrr.fit(drugTargetInteraction, K, G,
                 lambda.k = 0.01, lambda.g = 0.05)
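As a hedged sketch (assuming the returned hat matrices act on the rows and the columns as described above), the predictions in the result can be reconstructed from the two hat matrices:

# res$pred should equal Hk %*% Y %*% Hg with the hat matrices returned above
pred_byhand <- res$k %*% drugTargetInteraction %*% res$g
all.equal(res$pred, pred_byhand)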
The class tskrrHeterogeneous is a subclass of the superclass
tskrr
specifically for
heterogeneous networks.
y
the matrix with responses
k
the eigen decomposition of the kernel matrix for the rows
lambda.k
the lambda value used for k
pred
the matrix with the predictions
g
the eigen decomposition of the kernel matrix for the columns
lambda.g
the lambda value used for g
has.hat
a logical value indicating whether the kernel hat matrices are stored in the object.
Hk
the kernel hat matrix for the rows.
Hg
the kernel hat matrix for the columns.
labels
a list with elements k
and g
(see
tskrr-class
).
If any element is NA
, the labels used
are integers indicating the row or column number, respectively.
The class tskrrHomogeneous is a subclass of the superclass
tskrr
specifically for
homogeneous networks.
y
the matrix with responses
k
the eigen decomposition of the kernel matrix for the rows
lambda.k
the lambda value used for k
pred
the matrix with the predictions
symmetry
a character value that can have the possible values
"symmetric"
, "skewed"
or "not"
. It indicates
whether the y
matrix is symmetric, skewed-symmetric or not
symmetric.
has.hat
a logical value indicating whether the kernel hat matrices are stored in the object.
Hk
the kernel hat matrix for the rows.
labels
a list with elements k
and g
(see
tskrr-class
). For homogeneous networks, g
is always NA
. If k
is NA
, the labels used
are integers indicating the row or column number, respectively.
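A small sketch relating the symmetry slot to test_symmetry(), using the proteinInteraction example data from the package:

data(proteinInteraction)
modh <- tskrr(proteinInteraction, Kmat_y2h_sc)

is_homogeneous(modh)           # TRUE: a tskrrHomogeneous object
test_symmetry(response(modh))  # the value mirrored by the symmetry slot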
The class tskrrImpute
is a virtual class that represents a
tskrr
model with imputed values in
the label matrix Y. Apart from the model, it contains the
following extra information on the imputed values.
imputeid
a vector with integer values indicating which of
the values in y
are imputed
niter
an integer value giving the number of iterations used
tol
a numeric value with the tolerance used
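A hedged sketch of how such an object typically arises; it assumes that impute_tskrr() takes the label matrix and kernel matrices in the same order as tskrr():

data(drugtarget)
Ymiss <- drugTargetInteraction
Ymiss[1, 2] <- Ymiss[3, 1] <- NA        # introduce some missing labels
# argument order assumed to mirror tskrr()
imp <- impute_tskrr(Ymiss, targetSim, drugSim)
class(imp)                              # a tskrrImpute subclass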
The class tskrrImputeHeterogeneous
is a subclass of the
class tskrrHeterogeneous
and
tskrrImpute
specifically for heterogeneous networks with imputed values. It is
the result of the function impute_tskrr
.
y
the matrix with responses
k
the eigen decomposition of the kernel matrix for the rows
lambda.k
the lambda value used for k
pred
the matrix with the predictions
g
the eigen decomposition of the kernel matrix for the columns
lambda.g
the lambda value used for g
has.hat
a logical value indicating whether the kernel hat matrices are stored in the object.
Hk
the kernel hat matrix for the rows.
Hg
the kernel hat matrix for the columns.
labels
a list with elements k
and g
(see
tskrr-class
).
If any element is NA
, the labels used
are integers indicating the row or column number, respectively.
imputeid
a vector with integer values indicating which of
the values in y
are imputed
niter
an integer value giving the number of iterations used
tol
a numeric value with the tolerance used
The class tskrrImputeHomogeneous
is a subclass of the
class tskrrHomogeneous
and
tskrrImpute
specifically for homogeneous networks with imputed values. It is
the result of the function impute_tskrr
on a
homogeneous network model.
y
the matrix with responses
k
the eigen decomposition of the kernel matrix for the rows
lambda.k
the lambda value used for k
pred
the matrix with the predictions
symmetry
a character value that can have the possible values
"symmetric"
, "skewed"
or "not"
. It indicates
whether the y
matrix is symmetric, skewed-symmetric or not
symmetric.
has.hat
a logical value indicating whether the kernel hat matrices are stored in the object.
Hk
the kernel hat matrix for the rows.
labels
a list with elements k
and g
(see
tskrr-class
). For homogeneous networks, g
is always NA
. If k
is NA
, the labels used
are integers indicating the row or column number, respectively.
imputeid
a vector with integer values indicating which of
the values in y
are imputed
niter
an integer value giving the number of iterations used
tol
a numeric value with the tolerance used
The class tskrrTune represents a tuned tskrr
model, and is the output of the function tune
. Apart from
the model, it contains extra information on the tuning procedure. This is
a virtual class only.
lambda_grid
a list object with the elements k
and possibly
g
indicating the tested lambda values for the row kernel K
and - if applicable - the column kernel G
. Both elements have
to be numeric.
best_loss
a numeric value with the loss associated with the best lambdas
loss_values
a matrix with the loss results from the searched grid. The rows form the X dimension (related to the first lambda), the columns form the Y dimension (related to the second lambda if applicable)
loss_function
the used loss function
exclusion
a character value describing the exclusion used
replaceby0
a logical value indicating whether or not the cross validation replaced the excluded values by zero
onedim
a logical value indicating whether the grid search was done in one dimension. For homogeneous networks, this is true by default.
the function tune
for the tuning itself
the class tskrrTuneHomogeneous
and
tskrrTuneHeterogeneous
for the actual classes.
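The extra information is normally inspected through accessor functions rather than through the slots. A brief sketch, using the accessors that also appear in the tune() example further in this documentation:

data(drugtarget)
tuned <- tune(tskrr(drugTargetInteraction, targetSim, drugSim))

get_grid(tuned)         # the searched grid (lambda_grid slot)
get_loss_values(tuned)  # the loss over that grid (loss_values slot)
lambda(tuned)           # the contained model is refit at the best lambdas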
The class tskrrTuneHeterogeneous represents a tuned Heterogeneous
tskrr
model. It inherits from
the classes tskrrHeterogeneous
and tskrrTune
.
The class tskrrTuneHomogeneous represents a tuned homogeneous
tskrr
model. It inherits from
the classes tskrrHomogeneous
and tskrrTune
.
This function lets you tune the lambda parameter(s) of a two-step
kernel ridge regression model for optimal performance. You can either
tune a previously fitted tskrr
model, or pass the
label matrix and kernel matrices to fit and tune a model in
one go.
## S4 method for signature 'tskrrHomogeneous'
tune(x, lim = c(1e-04, 1), ngrid = 10, lambda = NULL, fun = loss_mse,
     exclusion = "edges", replaceby0 = FALSE, onedim = TRUE, ...)

## S4 method for signature 'tskrrHeterogeneous'
tune(x, lim = c(1e-04, 1), ngrid = 10, lambda = NULL, fun = loss_mse,
     exclusion = "interaction", replaceby0 = FALSE, onedim = FALSE, ...)

## S4 method for signature 'matrix'
tune(x, k, g = NULL, lim = c(1e-04, 1), ngrid = 10, lambda = NULL,
     fun = loss_mse, exclusion = "interaction", replaceby0 = FALSE,
     testdim = TRUE, testlabels = TRUE,
     symmetry = c("auto", "symmetric", "skewed"), keep = FALSE,
     onedim = is.null(g), ...)
x |
a tskrr model or a label matrix. |
lim |
a vector with 2 values that give the boundaries for the domain in which lambda is searched, or possibly a list with 2 elements. See details |
ngrid |
a single numeric value giving the number of points in a single dimension of the grid, or possibly a list with 2 elements. See details. |
lambda |
a vector with the lambdas that need checking for
homogeneous networks, or possibly a list with two elements for
heterogeneous networks. See Details. Defaults to
NULL |
fun |
a loss function that takes the label matrix Y and the result of the cross-validation LOO as input. The function name can be passed as a character string as well. |
exclusion |
a character value with possible values "interaction", "row", "column", "both" for heterogeneous models, and "edges", "vertices", "interaction" or "both" for homogeneous models. Defaults to "interaction". See details. |
replaceby0 |
a logical value indicating whether the interaction
should be simply removed (FALSE) or replaced by 0 (TRUE). |
onedim |
a logical value indicating whether the search should be done in a single dimension. See details. |
... |
arguments to be passed to the loss function |
k |
a kernel matrix for the rows |
g |
an optional kernel matrix for the columns |
testdim |
a logical value indicating whether symmetry
and the dimensions of the kernel(s) should be tested.
Defaults to TRUE |
testlabels |
a logical value indicating whether the row and column
names of the matrices have to be checked for consistency. Defaults to
TRUE |
symmetry |
a character value with the possibilities "auto", "symmetric" or "skewed". In case of a homogeneous fit, you can either specify whether the label matrix is symmetric or skewed, or you can let the function decide (option "auto"). |
keep |
a logical value indicating whether the kernel hat
matrices should be stored in the model object. Doing so makes the
model object considerably larger, but it can speed up predictions in
some cases. Defaults to FALSE |
This function currently only performs a simple grid search for all
(combinations of) lambda values. If no specific lambda values are
provided, then the function uses create_grid
to
create an evenly spaced (on a logarithmic scale) grid.
In the case of a heterogeneous network, you can specify different values
for the two parameters that need tuning. To do so, you need to
provide a list with the settings for every parameter to the arguments
lim
, ngrid
and/or lambda
. If you
try this for a homogeneous network, the function will return an error.
Alternatively, you can speed up the grid search by searching in a
single dimension. When onedim = TRUE
, the search for a
heterogeneous network will only consider cases where both lambda values
are equal.
The arguments exclusion
and replaceby0
are used by
the function get_loo_fun
to find the correct
leave-one-out function.
By default, the function uses standard mean squared error based on
the cross-validation results as a measure for optimization. However, you
can provide a custom function if needed, as long as it takes
two matrices as input: Y
being the observed interactions and
LOO
being the result of the chosen cross-validation.
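As an illustration, a minimal custom loss function could look as follows. The name loss_mae is purely hypothetical (it is not a function exported by the package); only the two-matrix (Y, LOO) convention is taken from the description above:

# mean absolute error between observed labels and leave-one-out predictions
loss_mae <- function(Y, LOO) mean(abs(Y - LOO))

data(drugtarget)
mod <- tskrr(drugTargetInteraction, targetSim, drugSim)
tuned_mae <- tune(mod, lim = c(1e-4, 1), ngrid = 10, fun = loss_mae)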
a model of class tskrrTune
loo
, loo_internal
and
get_loo_fun
for more information on how leave one out
validation works.
tskrr
for fitting a two-step kernel ridge regression.
loss_functions
for different loss functions.
data(drugtarget)
mod <- tskrr(drugTargetInteraction, targetSim, drugSim)
tuned <- tune(mod, lim = c(0.1, 1), ngrid = list(5, 10), fun = loss_auc)

## Not run:
# This is just some visualization of the matrix
# It can be run safely.
gridvals <- get_grid(tuned)
z <- get_loss_values(tuned)   # loss values

image(gridvals$k, gridvals$g, z, log = 'xy',
      xlab = "lambda k", ylab = "lambda g",
      col = rev(heat.colors(20)))

## End(Not run)
This function allows you to refit a tskrr
with a
new lambda. It can be used to do manual tuning/cross-validation.
If the object has the hat matrices stored, these are updated
as well.
update(object, ...)

## S4 method for signature 'tskrrHomogeneous'
update(object, lambda)

## S4 method for signature 'tskrrHeterogeneous'
update(object, lambda)
object |
a tskrr model. |
... |
arguments passed to methods |
lambda |
a numeric vector with one or two values for the hyperparameter lambda. If two values are given, the first one is used for the k matrix and the second for the g matrix. |
an updated tskrr
object
fitted with the new lambdas.
data(drugtarget)
mod <- tskrr(drugTargetInteraction, targetSim, drugSim)

# Update with the same lambda
mod2 <- update(mod, lambda = 1e-3)

# Use different lambda for rows and columns
mod3 <- update(mod, lambda = c(0.01, 0.001))

# A model with the hat matrices stored
lambda <- c(0.001, 0.01)
modkeep <- tskrr(drugTargetInteraction, targetSim, drugSim, keep = TRUE)
Hk_1 <- hat(modkeep, which = "row")
modkeep2 <- update(modkeep, lambda = lambda)
Hk_2 <- hat(modkeep2, which = "row")

# Calculate new hat matrix by hand:
decomp <- get_eigen(modkeep, which = "row")
Hk_byhand <- eigen2hat(decomp$vectors, decomp$values, lambda = lambda[1])
identical(Hk_2, Hk_byhand)
These functions allow you to check whether the dimensions of the
label matrix and the kernel matrix (matrices) are compatible.
valid_dimensions
checks whether both k and g are square matrices,
whether y has as many rows as k and whether y has as many columns as g.
is_square
checks whether both dimensions are the same.
valid_dimensions(y, k, g = NULL)

is_square(x)
y |
a label matrix |
k |
a kernel matrix |
g |
an optional second kernel matrix, or NULL otherwise. |
x |
any matrix |
a logical value indicating whether the dimensions of the matrices are compatible for a two step kernel ridge regression.
The function is_square
is not exported
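A minimal sketch of these checks with toy matrices (valid_dimensions() is assumed to be callable by the user, since only is_square() is flagged above as not exported):

y <- matrix(0, nrow = 3, ncol = 2)   # 3 row nodes, 2 column nodes
k <- diag(3)                         # square, matches nrow(y)
g <- diag(2)                         # square, matches ncol(y)
valid_dimensions(y, k, g)            # TRUE
valid_dimensions(y, k, diag(3))      # FALSE: g does not match ncol(y)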
This function checks whether the labels between the Y, K, and G
matrices make sense. This means that all the labels found as
rownames for y
can be found as rownames and column
names of k
, and all the colnames for y
can be found
as rownames and colnames of g
(if provided).
valid_labels(y, k, g = NULL)
y |
the label matrix |
k |
the kernel matrix for the rows |
g |
the kernel matrix for the columns (optional). If not available,
it takes the value NULL. |
Compatible labels mean that it is unequivocally clear which rows and columns can be linked throughout the model. In case none of the matrices have row- or colnames, the labels are considered compatible. In all other cases, all matrices should have both row and column names. They should fulfill the following conditions:
the row- and column names of a kernel matrix must contain the same values in the same order. Otherwise, the matrix can't be symmetric.
the rownames of y
should correspond to the rownames
of k
the colnames of y
should correspond to the colnames
of g
if it is supplied, or the colnames of k
in
case g
is NULL
TRUE
if all labels are compatible, an error otherwise.
This is a non-exported convenience function.
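Since the function is not exported, a quick interactive check would need the ::: operator. A sketch using the drugtarget example data, whose labels are consistent:

data(drugtarget)
xnet:::valid_labels(drugTargetInteraction, targetSim, drugSim)   # TRUE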
This function calculates the weight matrix for calculating the predictions of a tskrr model.
## S4 method for signature 'tskrrHeterogeneous'
weights(object)

## S4 method for signature 'tskrrHomogeneous'
weights(object)
object |
a tskrr model. |
The weight matrix is calculated from the map matrices through the
function eigen2map
.
a matrix with the weights for the tskrr model.
The package xnet
adds an S4 generic function
for weights
.