A T-SQL Adapter for the Bioconductor and GraphViz R Packges.

This was published by SQLServerCentral, it describes a generic stored procedure in SQL Server that accepts a set of ‘Node’ names and ‘Edges’ (Node refer to Node) as inputs  and returns an R script that produces the diagrams. There are two great things about the R packages, they support network hierarchies with multiple edges for each node, something many similar R packages (Data Tree) and SQL Server features (recursive queries/hierarchy data types) don’t do well. The diagrams also scale well, when there are thousands of nodes the graph is still laid out nicely in most cases.

Example Network 

The diagram below shows a machine generated network topology consisting of 36 nodes, each node is referenced by the nodes immediately left, above left and below left.

The diagram below shows a subset of the same network, ‘I19’ and ‘H24’ were selected as the anchors then only nodes referred to or referred to by were included. Labels can be applied to each node, there are various attributes for both nodes and edges, just the anchor node fill colour was set below.


Simple Example Execution
Execute the script below, after running the sp_GraphNELGene setup:


---------------------------------------------------------------------------
-- Simple hierarchy
USE tempdb;
GO
DECLARE @Node AS GraphNELNodeTableType; 
DECLARE @NodeReferedTo AS GraphNELNodeRefersToTableType;
DECLARE @AnchorNodes GraphNELAnchorNodeTableType;
 
INSERT INTO @Node (NodeName, NodeLabel) SELECT '1','Kirk';
INSERT INTO @Node (NodeName, NodeLabel) SELECT '2','Spock';
INSERT INTO @Node (NodeName, NodeLabel) SELECT '3','Scotty';
 
INSERT INTO @NodeReferedTo SELECT '1','2';
INSERT INTO @NodeReferedTo SELECT '2','3';
INSERT INTO @NodeReferedTo SELECT '3','1';
 
INSERT INTO @AnchorNodes SELECT '1';
 
EXECUTE dbo.sp_GraphNELGene 
   @Node
  ,@NodeReferedTo
  ,@AnchorNodes
  ,@IncludeLabels = 1
GO
 

The stored procedures generates and returns the R script below:


#SetRepositories()
#source("https://bioconductor.org/biocLite.R")
#biocLite()
#install.packages("graph")
#install.packages("graphNEL")
#install.packages("Rgraphviz")
rm(list = ls())
library(graph)
library(Rgraphviz)
NodeName <- c(
  "n1",
  "n2",
  "n3"
)
EdgeName = list(
  n1 = list(edges = c(2)),
  n2 = list(edges = c(3)),
  n3 = list(edges = c(1))
)
NodeLabel <- c(
  "Kirk",
  "Spock",
  "Scotty"
)
gR <- graphNEL(nodes = NodeName, edgeL = EdgeName, edgemode = "directed")
nAttrs <- list()
NodeLabel <- NodeLabel[1:numNodes(gR)]
names(NodeLabel) = nodes(gR)
nAttrs$label <- NodeLabel
nAttrs$fillcolor <- c(n1 = "grey")
eAttrs  <- list()
attrs <- list(graph=list(rankdir="LR"),node=list(shape="ellipse", fixedsize=FALSE, color="grey"),edge=list(color="grey"))
plot(gR, nodeAttrs = nAttrs, edgeAttrs=eAttrs, attrs=attrs)

When this is run in R or Visual Studio it generates the graph below:

Example of Attribute Setting and Scale

It’s quick to produce very large graphs and PDF documents even on a small laptop but the Bioconductor Organisation and SQL Server Central both have 150 KB image size limits. The small example diagram below shows a subset of a 200 node computer generated network anchored at node ‘I110’, Labels have been included, ‘Referred To By’ nodes have been excluded so image size < 150 KB.

 

Conclusion

The Bioconductor packages were developed for ‘Gnome mapping in wet lab environments’ but they might be useful for other purposes. The level of complexity, scaling and the different network types and relationships they support is really impressive.

Limitations

The R script worked in CRAN R Studio once setRepositories() was correct, and in Visual Studio with the R tools installed, the main limitations were:

  • The labelling of nodes when the diagrams scale, there’s not much space for text descriptions inside each node, the shorter the label the better .
  • R and Visual studio can be slow parsing large R scripts, they typically execute quickly though.
  • Much like recursive queries and hierarchy data types not working well with these types of network, the R Server installed with SQL Server doesn’t easily lend itself to the execution of the R scripts created by this procedure. To automate the process and publish the graphs, it’s easy to save the R script to a text file then execute it in a windows batch file or Power Shell.

Nonfunctional Requirements

Source Code and Example Execution

Features

The stored procedure (sp_GraphNELGene) accepts the following parameters:

  • Nodes – Table variable with a list of Node Names and Labels
  • Node Refers To – Table variable with a list of Node Names and the name of the Referred to Node
  • Anchor Nodes – Table variable with a list of the Anchor Node Names
  • Include Labels – Defaults to No, Displays Node Label rather than Node Name in the graph
  • Include Refers To – Defaults to Yes
  • Include Referred To By – Defaults to Yes
  • PDF File – Path and file name for generated graph

Bioconductor References

 



Categories: General

%d bloggers like this: