Network Hierarchy – R data.tree Package Diagram Error

The R data.tree package presents hierarchies in tree structures that can be traversed and charted but it doesn’t plot ‘network’ hierarchy types where a node has two or more parents correctly. For example, given the simple hypothetical organisational chart shown below:

data.tree plots the chart shown below, the ‘Accountants’ reporting lines to both the Managers of ‘Operations’ and ‘Finance’ are not shown, the line between 4 and 2 has been omitted.

In contrast, a simple print of the data.tree does present the correct organisational structure as shown below:

The T-SQL code below was used to construct the data tree demo:


USE master;
GO
-----------------------------------------------------------------------------
-- Create Database and Tables
BEGIN TRY
	ALTER DATABASE DataTreeDemo SET SINGLE_USER WITH ROLLBACK IMMEDIATE;
	DROP DATABASE DataTreeDemo;
	RAISERROR('DataTreeDemo setup in progress...',0,0);
END TRY
BEGIN CATCH
	RAISERROR('DataTreeDemo setup in progress...',0,0);
END CATCH
CREATE DATABASE DataTreeDemo;
GO

USE DataTreeDemo;
GO

CREATE TABLE dbo.Person(
  PersonID int NOT NULL,
  PersonName varchar(30) NULL,
  CONSTRAINT PK_Person PRIMARY KEY CLUSTERED (PersonID) ON [PRIMARY]
) ON [PRIMARY]
GO

CREATE TABLE dbo.PersonReportsTo(
  PersonID int NOT NULL,
  ReportsToID INT NOT NULL,
  CONSTRAINT PK_PersonReportsTo PRIMARY KEY CLUSTERED (PersonID, ReportsToID) ON [PRIMARY]
) ON [PRIMARY]
GO

ALTER TABLE [dbo].[PersonReportsTo]  WITH CHECK ADD  CONSTRAINT [FK_PersonReportsTo_Person] FOREIGN KEY([PersonID])
REFERENCES [dbo].[Person] ([PersonID])
GO

ALTER TABLE [dbo].[PersonReportsTo] CHECK CONSTRAINT [FK_PersonReportsTo_Person]
GO

ALTER TABLE [dbo].[PersonReportsTo]  WITH CHECK ADD  CONSTRAINT [FK_PersonReportsTo_ReportsTo] FOREIGN KEY([ReportsToID])
REFERENCES [dbo].[Person] ([PersonID])
GO

ALTER TABLE [dbo].[PersonReportsTo] CHECK CONSTRAINT [FK_PersonReportsTo_ReportsTo]
GO

-----------------------------------------------------------------------------
-- Create network hierarchy demo data
INSERT INTO Person(PersonID, PersonName) VALUES(1,'CEO');
INSERT INTO Person(PersonID, PersonName) VALUES(2,'Manager of Operations');
INSERT INTO Person(PersonID, PersonName) VALUES(3,'Manager of Finance');
INSERT INTO Person(PersonID, PersonName) VALUES(4,'Accountant');
INSERT INTO Person(PersonID, PersonName) VALUES(5,'Operations Staff');
INSERT INTO Person(PersonID, PersonName) VALUES(6,'Accounts Administrator');
INSERT INTO Person(PersonID, PersonName) VALUES(7,'Chairman');
INSERT INTO Person(PersonID, PersonName) VALUES(8,'Clerk1');
INSERT INTO Person(PersonID, PersonName) VALUES(9,'Clerk2');

INSERT INTO PersonReportsTo(PersonID, ReportsToID) VALUES(2,1);
INSERT INTO PersonReportsTo(PersonID, ReportsToID) VALUES(3,1);
INSERT INTO PersonReportsTo(PersonID, ReportsToID) VALUES(5,2);
INSERT INTO PersonReportsTo(PersonID, ReportsToID) VALUES(4,2);
INSERT INTO PersonReportsTo(PersonID, ReportsToID) VALUES(4,3);
INSERT INTO PersonReportsTo(PersonID, ReportsToID) VALUES(6,3);
INSERT INTO PersonReportsTo(PersonID, ReportsToID) VALUES(8,4);
INSERT INTO PersonReportsTo(PersonID, ReportsToID) VALUES(9,4);

The R code below was used to produce the data.tree diagram and report:


rm(list=ls())
library("data.tree")
library("RODBC")

dbhandle <- odbcDriverConnect('driver=SQL Server;server=localhost;database=DataTreeDemo;trusted_connection=true')
results = sqlQuery(dbhandle,
"
WITH PersonsConstruct AS
(
  SELECT 0 AS PersonID, CAST('Organisation' AS VARCHAR(30)) AS PersonName, -1 AS PersonReportsTo
  UNION ALL	
  SELECT P.PersonID, PersonName, ISNULL(PRT.ReportsToID,0) AS PersonReportsTo
  FROM dbo.Person p
  LEFT OUTER JOIN dbo.PersonReportsTo prt
  ON P.PersonID = PRT.PersonID
),
  PersonPath (pathString, PersonID, PersonName, PersonReportsTo) AS
  (
  SELECT 
  CAST('/0/' AS HIERARCHYID) AS pathString,
  PersonID,
  PersonName,
  PersonReportsTo
  FROM PersonsConstruct
  WHERE PersonID = 0
  UNION ALL
  SELECT
  CAST(pathString.ToString() + CAST(pc.PersonID AS VARCHAR(30)) + '/' AS HIERARCHYID) AS pathString,  
  pc.PersonID,
  pc.PersonName,
  pc.PersonReportsTo
  FROM PersonsConstruct pc
  JOIN PersonPath pp
  ON pc.PersonReportsTo = pp.PersonID
  )
  SELECT 
  PersonID,
  PersonName,
  PersonReportsTo,
  pathString.ToString() AS pathString,
  pathString.GetLevel() AS pathStringLevel
  FROM PersonPath
  ORDER BY PersonID
  OPTION(MAXRECURSION 32767);
")

close(dbhandle)
orgChart <- as.Node(results)
print(orgChart,"PersonName","PersonReportsTo")
plot(orgChart)

Summary
The documentation associated with the R data.tree package package doesn’t mention the ‘network’ hierarchy type limitation and no error is thrown, but charts are incomplete. This has been communicated to the CRAN institute, hopefully this blog post might save someone elseĀ  time debugging the data.tree diagrams for network hierarchies.

References
https://cran.r-project.org/web/packages/data.tree/vignettes/data.tree.html
https://www.statmethods.net/advstats/cart.html

 



Categories: General

%d bloggers like this: