This was published by SQLServerCentral, it describes a generic stored procedure in SQL Server that accepts a set of ‘Node’ names and ‘Edges’ (Node refer to Node) as inputs and returns an R script that produces the diagrams. There are two great things about the R packages, they support network hierarchies with multiple edges for each node, something many similar R packages (Data Tree) and SQL Server features (recursive queries/hierarchy data types) don’t do well. The diagrams also scale well, when there are thousands of nodes the graph is still laid out nicely in most cases.
The diagram below shows a machine generated network topology consisting of 36 nodes, each node is referenced by the nodes immediately left, above left and below left.
The diagram below shows a subset of the same network, ‘I19’ and ‘H24’ were selected as the anchors then only nodes referred to or referred to by were included. Labels can be applied to each node, there are various attributes for both nodes and edges, just the anchor node fill colour was set below.
Simple Example Execution
Execute the script below, after running the sp_GraphNELGene setup:
--------------------------------------------------------------------------- -- Simple hierarchy USE tempdb; GO DECLARE @Node AS GraphNELNodeTableType; DECLARE @NodeReferedTo AS GraphNELNodeRefersToTableType; DECLARE @AnchorNodes GraphNELAnchorNodeTableType; INSERT INTO @Node (NodeName, NodeLabel) SELECT '1','Kirk'; INSERT INTO @Node (NodeName, NodeLabel) SELECT '2','Spock'; INSERT INTO @Node (NodeName, NodeLabel) SELECT '3','Scotty'; INSERT INTO @NodeReferedTo SELECT '1','2'; INSERT INTO @NodeReferedTo SELECT '2','3'; INSERT INTO @NodeReferedTo SELECT '3','1'; INSERT INTO @AnchorNodes SELECT '1'; EXECUTE dbo.sp_GraphNELGene @Node ,@NodeReferedTo ,@AnchorNodes ,@IncludeLabels = 1 GO
The stored procedures generates and returns the R script below:
#SetRepositories() #source("https://bioconductor.org/biocLite.R") #biocLite() #install.packages("graph") #install.packages("graphNEL") #install.packages("Rgraphviz") rm(list = ls()) library(graph) library(Rgraphviz) NodeName <- c( "n1", "n2", "n3" ) EdgeName = list( n1 = list(edges = c(2)), n2 = list(edges = c(3)), n3 = list(edges = c(1)) ) NodeLabel <- c( "Kirk", "Spock", "Scotty" ) gR <- graphNEL(nodes = NodeName, edgeL = EdgeName, edgemode = "directed") nAttrs <- list() NodeLabel <- NodeLabel[1:numNodes(gR)] names(NodeLabel) = nodes(gR) nAttrs$label <- NodeLabel nAttrs$fillcolor <- c(n1 = "grey") eAttrs <- list() attrs <- list(graph=list(rankdir="LR"),node=list(shape="ellipse", fixedsize=FALSE, color="grey"),edge=list(color="grey")) plot(gR, nodeAttrs = nAttrs, edgeAttrs=eAttrs, attrs=attrs)
When this is run in R or Visual Studio it generates the graph below:
Example of Attribute Setting and Scale
It’s quick to produce very large graphs and PDF documents even on a small laptop but the Bioconductor Organisation and SQL Server Central both have 150 KB image size limits. The small example diagram below shows a subset of a 200 node computer generated network anchored at node ‘I110’, Labels have been included, ‘Referred To By’ nodes have been excluded so image size < 150 KB.
The Bioconductor packages were developed for ‘Gnome mapping in wet lab environments’ but they might be useful for other purposes. The level of complexity, scaling and the different network types and relationships they support is really impressive.
The R script worked in CRAN R Studio once setRepositories() was correct, and in Visual Studio with the R tools installed, the main limitations were:
- The labelling of nodes when the diagrams scale, there’s not much space for text descriptions inside each node, the shorter the label the better .
- R and Visual studio can be slow parsing large R scripts, they typically execute quickly though.
- Much like recursive queries and hierarchy data types not working well with these types of network, the R Server installed with SQL Server doesn’t easily lend itself to the execution of the R scripts created by this procedure. To automate the process and publish the graphs, it’s easy to save the R script to a text file then execute it in a windows batch file or Power Shell.
- Generates a 1,000 node, 3,000 edge diagram with multiple anchor nodes, using R Studio, SQL Express on a Microsoft Surface Pro 4 in less than 5 minutes. – PDF file here, try using the ‘Find’ and ‘Zoom’ features to navigate – https://paulbrewer.files.wordpress.com/2018/08/bioconductor_example5.pdf
Source Code and Example Execution
- 01 – Creates the GRAPHNEL table types and stored procedure in the tempdb database – https://paulbrewer.wordpress.com/sp_graphnelgene/
- 02 – Creates the R scripts to produce the first two graphs in this post – https://paulbrewer.wordpress.com/sp_graphnelgene_exampleexecution/
The stored procedure (sp_GraphNELGene) accepts the following parameters:
- Nodes – Table variable with a list of Node Names and Labels
- Node Refers To – Table variable with a list of Node Names and the name of the Referred to Node
- Anchor Nodes – Table variable with a list of the Anchor Node Names
- Include Labels – Defaults to No, Displays Node Label rather than Node Name in the graph
- Include Refers To – Defaults to Yes
- Include Referred To By – Defaults to Yes
- PDF File – Path and file name for generated graph