Title: | The Ball Mapper Algorithm |
---|---|
Description: | The core algorithm is described in "Ball mapper: a shape summary for topological data analysis" by Pawel Dlotko (2019) <arXiv:1901.07410>. Please consult the following YouTube video <https://www.youtube.com/watch?v=M9Dm1nl_zSQ> for the idea of the functionality. Ball Mapper provides a topologically accurate summary of data in the form of an abstract graph. To create it, please provide the coordinates of the points (in the points array), the values of a function of interest at those points (these can be initialized randomly if you do not have them), and the value epsilon, which is the radius of the balls in the Ball Mapper construction. Epsilon can be understood as the minimal resolution at which the model of the data is created. |
Authors: | Pawel Dlotko [aut, cre] |
Maintainer: | Pawel Dlotko <[email protected]> |
License: | MIT + file LICENCE |
Version: | 0.2.0 |
Built: | 2025-02-18 03:40:55 UTC |
Source: | https://github.com/cran/BallMapper |
Create vertices and edges (with additional properties) of a Ball Mapper graph representation of the input data. Please be aware that the program does not perform any normalization of the data. As with cluster analysis, we recommend that you consider whether to normalize the data before running the function.
BallMapper(points, values, epsilon)
points |
a collection of input points in the form of a data frame. These are typically points in Euclidean space. By default the Euclidean distance is used to construct the Ball Mapper. |
values |
a collection of outcome values which apply to the data points. Mean values of this variable within any given ball will be used to color the Ball Mapper graph. If it is not available, please set it to a constant array with the same length as the number of observations in the dataset. |
epsilon |
the value of radius of balls used in the Ball Mapper construction. |
The function returns a list with the following elements:
vertices |
two bound lists. The first contains the increasing sequence of numbers from 1 to the number of vertices; each of them corresponds to a landmark point. The second contains the number of points covered by the ball of radius epsilon centered at the corresponding landmark point. |
edges |
a collection of undirected edges, each composed of a first and a second vertex. The ordering of the vertices has no meaning. |
edges_strength |
for every edge [a,b], its strength is the number of points covered by both landmarks a and b. This array contains the strength of every edge in the Ball Mapper graph. |
points_covered_by_landmarks |
a list of vectors. The i-th vector contains the positions of the points covered by the i-th landmark. |
landmarks |
a list of positions of the landmark points used to construct the balls. |
coloring |
a vector with as many positions as the number of landmarks. It contains the averaged outcome values of the coloring variable over the points covered by each landmark. |
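The construction behind these outputs can be illustrated with a minimal base R sketch. It assumes a greedy landmark selection (every point not yet covered becomes a new landmark) and Euclidean distance; the function name `ball_mapper_sketch` is hypothetical and this is not the package's implementation, only an illustration of how landmarks, covered points, edges, and edge strengths relate.

```r
# Illustrative sketch only; not the code used by the BallMapper package.
ball_mapper_sketch <- function(points, epsilon) {
  pts <- as.matrix(points)
  n <- nrow(pts)
  # Euclidean distances from point i to all points.
  dist_to <- function(i) sqrt(rowSums(sweep(pts, 2, pts[i, ])^2))
  landmarks <- integer(0)
  covered <- rep(FALSE, n)
  # Greedy selection (assumption): an uncovered point becomes a landmark.
  for (i in seq_len(n)) {
    if (!covered[i]) {
      landmarks <- c(landmarks, i)
      covered[dist_to(i) <= epsilon] <- TRUE
    }
  }
  # Indices of the points inside the ball around each landmark.
  cover <- lapply(landmarks, function(i) which(dist_to(i) <= epsilon))
  # Two landmarks are joined when their balls share at least one point;
  # the strength of the edge is the number of shared points.
  edges <- matrix(integer(0), ncol = 2)
  strength <- integer(0)
  k <- length(landmarks)
  if (k > 1) {
    for (a in 1:(k - 1)) {
      for (b in (a + 1):k) {
        shared <- length(intersect(cover[[a]], cover[[b]]))
        if (shared > 0) {
          edges <- rbind(edges, c(a, b))
          strength <- c(strength, shared)
        }
      }
    }
  }
  list(landmarks = landmarks, points_covered_by_landmarks = cover,
       edges = edges, edges_strength = strength)
}

# Points sampled from a circle, as in the examples of this manual.
var <- seq(from = 0, to = 6.3, by = 0.1)
bm <- ball_mapper_sketch(cbind(sin(var), cos(var)), 0.25)
length(bm$landmarks)  # number of vertices of the sketch graph
```

In this sketch a smaller epsilon gives more landmarks and a finer graph, while a larger epsilon merges more of the data into each ball.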
var <- seq(from = 0, to = 6.3, by = 0.1)
points <- as.data.frame(cbind(sin(var), cos(var)))
values <- as.data.frame(sin(var))
epsilon <- 0.25
l <- BallMapper(points, values, epsilon)
This function provides a new coloring given by the minimal and the average distance of the points in the point cloud to the reference points. The output of this procedure can be used as an alternative coloring in BallMapper.
color_by_distance_to_reference_points(allPoints, refPoints)
allPoints |
is a collection of all points in the dataset. |
refPoints |
is a subset of all points. The function will compute the distance from each point in allPoints to the points in refPoints. |
a pair of minimal and average distances. They can be used to color the BallMapper graph.
var <- seq(from = 0, to = 6.3, by = 0.1)
points <- as.data.frame(cbind(sin(var), cos(var)))
values <- as.data.frame(sin(var))
l <- BallMapper(points, values, 0.25)
pts <- as.data.frame(points_covered_by_landmarks(l, 1))
new_coloring_function <- color_by_distance_to_reference_points(points, pts)
l$coloring <- new_coloring_function[, 1]
ColorIgraphPlot(l)
l$coloring <- new_coloring_function[, 2]
ColorIgraphPlot(l)
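The two distances can be made concrete with a small base R sketch. It assumes Euclidean distance and returns one row per point; the function name `distance_coloring_sketch` and the column names are hypothetical, and the package's own function may differ in details.

```r
# Illustrative sketch only: minimal and average Euclidean distance
# from every point to a set of reference points.
distance_coloring_sketch <- function(all_points, ref_points) {
  a <- as.matrix(all_points)
  r <- as.matrix(ref_points)
  # d[i, j] is the distance from point i to reference point j.
  d <- apply(r, 1, function(ref) sqrt(rowSums(sweep(a, 2, ref)^2)))
  d <- matrix(d, nrow = nrow(a))
  data.frame(min_distance = apply(d, 1, min),
             mean_distance = rowMeans(d))
}

all_points <- rbind(c(0, 0), c(3, 4))
ref_points <- rbind(c(0, 0), c(0, 4))
distance_coloring_sketch(all_points, ref_points)
# point 1: distances 0 and 4, so min 0 and mean 2
# point 2: distances 5 and 3, so min 3 and mean 4
```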
Produce a collection of png files with mapper graphs, each colored by one coordinate of the data in turn (so that the number of files is the same as the number of coordinates).
colorByAllVariables(outputFromBallMapper, points, fileNamePrefix = "output_", defaultXResolution = 512, defaultYResolution = 512)
outputFromBallMapper |
an output from the BallMapper function |
points |
a collection of input points in the form of a data frame, used to create the Ball Mapper graph. |
fileNamePrefix |
a prefix of a file name. A plot that uses i-th variable as a coloring will contain this string as a prefix followed by the number i. Set to "output_" by default. |
defaultXResolution |
the default resolution of the image in the x direction. Set to 512 by default. |
defaultYResolution |
the default resolution of the image in the y direction. Set to 512 by default. |
none.
Produce a new coloring vector given by the average of the values of a given function at the points covered by each vertex of the Ball Mapper graph.
colorByAverageValueOfOtherVariable(outputFromBallMapper, newFunctionOnPoints)
outputFromBallMapper |
an output from the BallMapper function |
newFunctionOnPoints |
values of function on points. |
Vector of function values on the vertices of the Ball Mapper graph.
var <- seq(from = 0, to = 6.3, by = 0.1)
points <- as.data.frame(cbind(sin(var), cos(var)))
values <- as.data.frame(sin(var))
l <- BallMapper(points, values, 0.25)
ColorIgraphPlot(l)
new_coloring <- colorByAverageValueOfOtherVariable(l, cos(var))
l$coloring <- new_coloring
ColorIgraphPlot(l)
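The averaging step can be sketched in base R as follows. The list `cover` is a hypothetical stand-in for the points_covered_by_landmarks element (one vector of point indices per vertex), and `average_per_ball` is an illustrative name, not a function of the package.

```r
# Illustrative sketch only: mean of a function over the points in each ball.
average_per_ball <- function(cover, f) {
  vapply(cover, function(idx) mean(f[idx]), numeric(1))
}

cover <- list(c(1, 2, 3), c(3, 4), 5)  # hypothetical coverage of 5 points
f <- c(10, 20, 30, 40, 50)             # function values at the 5 points
average_per_ball(cover, f)
# 20 35 50
```

Point 3 lies in two balls here, so its value contributes to the average of both vertices.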
Produce a new coloring vector given by the standard deviation of the values of a given function at the points covered by each vertex of the Ball Mapper graph.
colorByStDevValueOfOtherVariable(outputFromBallMapper, newFunctionOnPoints)
outputFromBallMapper |
an output from the BallMapper function |
newFunctionOnPoints |
values of function on points. |
Vector of function values on the vertices of the Ball Mapper graph.
var <- seq(from = 0, to = 6.3, by = 0.1)
points <- as.data.frame(cbind(sin(var), cos(var)))
values <- as.data.frame(sin(var))
l <- BallMapper(points, values, 0.25)
ColorIgraphPlot(l)
new_coloring <- colorByStDevValueOfOtherVariable(l, sin(var))
l$coloring <- new_coloring
ColorIgraphPlot(l)
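A base R sketch of a per-ball standard deviation is given below. It assumes the sample standard deviation computed by `sd()`; whether the package uses the same estimator is not stated here. The names `cover` and `sd_per_ball` are hypothetical.

```r
# Illustrative sketch only: spread of a function over the points in each ball.
sd_per_ball <- function(cover, f) {
  vapply(cover, function(idx) sd(f[idx]), numeric(1))
}

cover <- list(c(1, 2, 3), c(3, 4), 5)  # hypothetical coverage of 5 points
f <- c(10, 20, 30, 40, 50)
sd_per_ball(cover, f)
# 10.000000  7.071068        NA
```

With `sd()`, a ball that covers a single point yields NA, so such vertices may need separate handling before the result is used as a coloring.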
This procedure produces a dynamic graph with colors. It allows zoom-in operation and displays information about vertices when they are clicked upon.
coloredDynamicNetwork(outputOfBallMapper, showLegend = FALSE)
outputOfBallMapper |
an output from the BallMapper function |
showLegend |
if set to TRUE a legend will be displayed indicating the coloring of the values of vertices. |
None
var <- seq(from = 0, to = 6.3, by = 0.1)
points <- as.data.frame(cbind(sin(var), cos(var)))
values <- as.data.frame(sin(var))
epsilon <- 0.25
l <- BallMapper(points, values, epsilon)
coloredDynamicNetwork(l)
Produce a static color visualization of the Ball Mapper graph. It is based on the output from BallMapper function.
ColorIgraphPlot(outputFromBallMapper, showVertexLabels = TRUE, showLegend = FALSE, minimal_ball_radius = 7, maximal_ball_scale = 20, maximal_color_scale = 10, seed_for_plotting = -1, store_in_file = "", default_x_image_resolution = 512, default_y_image_resolution = 512, number_of_colors = 100)
outputFromBallMapper |
an output from the BallMapper function |
showVertexLabels |
a boolean value determining if the vertex labels are to be shown (TRUE by default). |
showLegend |
a boolean value determining if the legend is to be shown (FALSE by default). |
minimal_ball_radius |
provide a minimal value of the radius of balls used in visualization (7 by default). |
maximal_ball_scale |
provide a maximal value of the radius of balls used in visualization (20 by default). |
maximal_color_scale |
Provide a maximal value (starting from 0) of the color of a ball (10 by default). |
seed_for_plotting |
if set to the same number, it fixes the random seed used by the plotting routine, so that the plots have the same layout every time. |
store_in_file |
if set to a string, will open a png file, and store the plot therein. By default it is set to an empty string. |
default_x_image_resolution |
the default resolution of the image in the x direction. Set to 512 by default. |
default_y_image_resolution |
the default resolution of the image in the y direction. Set to 512 by default. |
number_of_colors |
the number of colors used in the plot (100 by default). |
None.
var <- seq(from = 0, to = 6.3, by = 0.1)
points <- as.data.frame(cbind(sin(var), cos(var)))
values <- as.data.frame(sin(var))
epsilon <- 0.25
l <- BallMapper(points, values, epsilon)
ColorIgraphPlot(l)
This is an auxiliary function. It takes the coordinates of points, the ids of a subset of points, and the number of a coordinate, and returns a sorted vector of the given coordinate over the considered points. For instance, given the collection of points:
1 2 3
4 5 6
7 8 9
and which_subset = 2,3 and number_of_coordinate = 2, the procedure will return the vector [5,8].
coordinates_of_points_in_subcollection(points, which_subset, number_of_coordinate)
points |
is a collection of input points in the form of a data frame. The same one as on the input of the Ball Mapper. |
which_subset |
Indices of points in the given subset. |
number_of_coordinate |
which coordinate of the considered points to export. |
the sorted vector of values of a given variable at the collection of points.
var <- seq(from = 0, to = 6.3, by = 0.1)
points <- as.data.frame(cbind(sin(var), cos(var)))
values <- as.data.frame(sin(var))
l <- BallMapper(points, values, 0.25)
coordinates_of_points_in_subcollection(points, c(6, 7, 8), 1)
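The selection can be checked on the 3 x 3 collection of points from the description with a few lines of base R. This sketch assumes that each row of the data frame is one point and reproduces the selection by plain indexing; it does not call the package function.

```r
# Illustrative sketch only: rows are points, columns are coordinates.
points <- as.data.frame(matrix(1:9, nrow = 3, byrow = TRUE))

# Second coordinate of points 2 and 3, sorted.
sort(points[c(2, 3), 2])
# 5 8

# Second coordinate of all three points, sorted.
sort(points[c(1, 2, 3), 2])
# 2 5 8
```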
This procedure takes two subsets of points (that come from the vertices of Ball Mapper) and returns the coordinates on which the averages of those two collections differ most. To balance the effect of potentially different orders of magnitude of the data in the columns, we divide the difference in means by the mean of the whole column.
find_dominant_difference_using_averages(points, subset1, subset2)
points |
a collection of input points in the form of a data frame. The same one as on the input of the Ball Mapper. |
subset1 |
First subset of ids of points. |
subset2 |
Second subset of ids of points. |
Vector of coordinate ids with the absolute value of the difference between averages, ordered according to the second variable.
var <- seq(from = 0, to = 6.3, by = 0.1)
points <- as.data.frame(cbind(sin(var), cos(var)))
values <- as.data.frame(sin(var))
l <- BallMapper(points, values, 0.25)
g1 <- c(1, 21)
g2 <- c(11, 12)
find_dominant_difference_using_averages(points, g1, g2)
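The effect of dividing by the column mean can be seen in a small base R sketch. The function name `dominant_difference_sketch`, the use of the absolute column mean, and the decreasing ordering are assumptions made for illustration; the package function may order or normalize differently.

```r
# Illustrative sketch only: difference of subset means per coordinate,
# divided by the mean of the whole column.
dominant_difference_sketch <- function(points, subset1, subset2) {
  p <- as.matrix(points)
  m1 <- colMeans(p[subset1, , drop = FALSE])
  m2 <- colMeans(p[subset2, , drop = FALSE])
  score <- abs(m1 - m2) / abs(colMeans(p))
  ord <- order(score, decreasing = TRUE)
  data.frame(coordinate = ord, score = unname(score[ord]))
}

points <- data.frame(x = c(1, 2, 3, 4), y = c(100, 100, 200, 200))
dominant_difference_sketch(points, c(1, 2), c(3, 4))
# x: |1.5 - 3.5| / 2.5 = 0.8
# y: |100 - 200| / 150 = 0.667
```

The raw difference in means is far larger for y (100 versus 2), but after division by the column mean the x coordinate ranks first.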
This procedure takes two subsets of points (that come from the vertices of Ball Mapper) and returns the coordinates on which the averages of those two collections differ most. To balance the effect of potentially different orders of magnitude of the data in the columns, we divide the difference in means by the standard deviation of the whole column.
find_dominant_difference_using_averages_normalized_by_sd(points, subset1, subset2)
points |
a collection of input points in the form of a data frame. The same one as on the input of the Ball Mapper. |
subset1 |
First subset of ids of points. |
subset2 |
Second subset of ids of points. |
Vector of coordinate ids with the absolute value of the difference between averages normalized by the standard deviation of the considered column, ordered according to the second variable.
var <- seq(from = 0, to = 6.3, by = 0.1)
points <- as.data.frame(cbind(sin(var), cos(var)))
values <- as.data.frame(sin(var))
l <- BallMapper(points, values, 0.25)
g1 <- c(1, 21)
g2 <- c(11, 12)
find_dominant_difference_using_averages_normalized_by_sd(points, g1, g2)
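A base R sketch of the normalization by standard deviation is shown below. It assumes the sample standard deviation from `sd()` and a decreasing ordering; the function name `dominant_difference_sd_sketch` is hypothetical and the package function may differ.

```r
# Illustrative sketch only: difference of subset means per coordinate,
# divided by the standard deviation of the whole column.
dominant_difference_sd_sketch <- function(points, subset1, subset2) {
  p <- as.matrix(points)
  m1 <- colMeans(p[subset1, , drop = FALSE])
  m2 <- colMeans(p[subset2, , drop = FALSE])
  score <- abs(m1 - m2) / apply(p, 2, sd)
  ord <- order(score, decreasing = TRUE)
  data.frame(coordinate = ord, score = unname(score[ord]))
}

points <- data.frame(x = c(1, 2, 3, 4), y = c(100, 100, 200, 200))
dominant_difference_sd_sketch(points, c(1, 2), c(3, 4))
# x: 2 / sd(x) is about 1.549
# y: 100 / sd(y) is about 1.732
```

On this data the y coordinate ranks first, whereas dividing by the column mean (0.8 for x, about 0.667 for y) would rank x first, so the choice of normalization can change which coordinate is reported as dominant.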
Produce a static grayscale visualization of the Ball Mapper graph. It is based on the output from the BallMapper function.
GrayscaleIgraphPlot(outputFromBallMapper, showVertexLabels = TRUE, minimal_ball_radius = 7, maximal_ball_scale = 20, seed_for_plotting = -1, store_in_file = "", default_x_image_resolution = 512, default_y_image_resolution = 512)
outputFromBallMapper |
an output from the BallMapper function |
showVertexLabels |
a boolean value determining if vertex labels are to be shown (TRUE by default). |
minimal_ball_radius |
provide a minimal value of the radius of balls used in visualization (7 by default). |
maximal_ball_scale |
provides a maximal value of the radius of the balls used in visualization (20 by default). |
seed_for_plotting |
if set to the same number, it fixes the random seed used by the plotting routine, so that the plots have the same layout every time. |
store_in_file |
if set to a string, will open a png file, and store the plot therein. By default it is set to an empty string. |
default_x_image_resolution |
the default resolution of the image in the x direction. Set to 512 by default. |
default_y_image_resolution |
the default resolution of the image in the y direction. Set to 512 by default. |
None.
var <- seq(from = 0, to = 6.3, by = 0.1)
points <- as.data.frame(cbind(sin(var), cos(var)))
values <- as.data.frame(sin(var))
epsilon <- 0.25
l <- BallMapper(points, values, epsilon)
GrayscaleIgraphPlot(l)
This function normalizes each column (variable) of the input dataset so that the average of the normalized column is 0 and its standard deviation is 1.
normalize_to_average_0_stdev_1(points)
points |
a collection of input points in the form of a data frame. |
Normalized collection of points.
var <- seq(from = 0, to = 6.3, by = 0.1)
points <- as.data.frame(cbind(sin(var), cos(var)))
normalized_points <- normalize_to_average_0_stdev_1(points)
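The normalization to average 0 and standard deviation 1 can be written in base R as in the sketch below. It assumes the sample standard deviation from `sd()`; the function name `standardize_sketch` is hypothetical and this is not the package's code.

```r
# Illustrative sketch only: column-wise standardization of a data frame.
standardize_sketch <- function(points) {
  as.data.frame(lapply(points, function(col) (col - mean(col)) / sd(col)))
}

points <- data.frame(x = c(1, 2, 3, 4), y = c(10, 30, 50, 90))
z <- standardize_sketch(points)
colMeans(z)    # both numerically zero
sapply(z, sd)  # both equal to 1
```

Because Ball Mapper uses one radius epsilon for all coordinates, putting the columns on a common scale like this keeps a single large-valued variable from dominating the distances.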
This function normalizes each column (variable) of the input dataset so that the maximum is mapped to one, the minimum to zero, and the intermediate values linearly to the corresponding points of the interval [0,1].
normalize_to_min_0_max_1(points)
points |
a collection of input points in the form of a data frame. |
Normalized collection of points.
var <- seq(from = 0, to = 6.3, by = 0.1)
points <- as.data.frame(cbind(sin(var), cos(var)))
normalized_points <- normalize_to_min_0_max_1(points)
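The min-max rescaling can be sketched in base R as follows. The function name `min_max_sketch` is hypothetical and the sketch is not the package's implementation.

```r
# Illustrative sketch only: column-wise rescaling to the range from 0 to 1.
min_max_sketch <- function(points) {
  as.data.frame(lapply(points, function(col) {
    (col - min(col)) / (max(col) - min(col))
  }))
}

points <- data.frame(x = c(1, 2, 3, 5), y = c(-10, 0, 10, 30))
min_max_sketch(points)
# x becomes 0.00 0.25 0.50 1.00
# y becomes 0.00 0.25 0.50 1.00
```

In this sketch a constant column would produce NaN, because its maximum equals its minimum; such columns need to be handled separately.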
This function returns a list of points covered by the given collection of landmarks.
points_covered_by_landmarks(outputFromBallMapper, numbers_of_landmarks)
outputFromBallMapper |
an output from the BallMapper function |
numbers_of_landmarks |
a vector containing the numbers of the landmarks under consideration. |
A vector of points covered by the landmarks given in numbers_of_landmarks.
Produce a two-column list. The first column contains the number of a point (possibly with repetitions), and the second contains the number of a landmark point that covers it. For example, assume that point 1 is covered by landmarks 1 and 2, and point 2 is covered by landmark 2. In this case the obtained list is of the form:
1 1
1 2
2 2
This list can be used for further analysis of various parts of the Ball Mapper graph.
pointToBallList(coverageFromBallMapper)
coverageFromBallMapper |
the coverage element of the output of the BallMapper function. |
List of landmarks covering each point, as described above.
var <- seq(from = 0, to = 6.3, by = 0.1)
points <- as.data.frame(cbind(sin(var), cos(var)))
values <- as.data.frame(sin(var))
epsilon <- 0.25
l <- BallMapper(points, values, epsilon)
list <- pointToBallList(l$coverage)
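The two-column layout can be reproduced with a short base R sketch. It assumes that the coverage is a list whose i-th element holds the landmarks covering point i; that structure, the function name `point_to_ball_sketch`, and the column names are assumptions made for illustration.

```r
# Illustrative sketch only: flatten a point-to-landmarks list into pairs.
point_to_ball_sketch <- function(coverage) {
  cbind(point = rep(seq_along(coverage), lengths(coverage)),
        landmark = unlist(coverage))
}

# Point 1 is covered by landmarks 1 and 2, point 2 by landmark 2.
coverage <- list(c(1, 2), 2)
point_to_ball_sketch(coverage)
# rows: (1, 1), (1, 2), (2, 2)
```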
This procedure reads the BallMapper object from files. Its parameter is filename. We assume that the following files exist: filename_vertices, filename_edges, filename_edges_strength, filename_points_covered_by_landmarks, filename_landmarks, filename_coloring.
readBallMapperGraphFromFile(filename)
filename |
prefix of the name of the file containing elements of Ball Mapper graph. |
BallMapper object
var <- seq(from = 0, to = 6.3, by = 0.1)
points <- as.data.frame(cbind(sin(var), cos(var)))
values <- as.data.frame(sin(var))
l <- BallMapper(points, values, 0.25)
storeBallMapperGraphInFile(l, "my_favorite_BM_graph")
l_prime <- readBallMapperGraphFromFile("my_favorite_BM_graph")
This is a simple example of dynamic visualization using the networkD3 library. This version does not implement coloring of the vertices; it only gives a general overview of the edges.
simpleDynamicNetwork(outputFromBallMapper, storeAsHtml = FALSE)
outputFromBallMapper |
an output from BallMapper function. |
storeAsHtml |
if set to TRUE, the graph will be stored in an HTML file. |
None
var <- seq(from = 0, to = 6.3, by = 0.1)
points <- as.data.frame(cbind(sin(var), cos(var)))
values <- as.data.frame(sin(var))
epsilon <- 0.25
l <- BallMapper(points, values, epsilon)
simpleDynamicNetwork(l)
This procedure stores the Ball Mapper graph in a file in the following format:
storeBallMapperGraphInFile(outputFromBallMapper, filename = "BM_graph")
outputFromBallMapper |
output from the BallMapper procedure. |
filename |
the name of the file to store the data. |
None
var <- seq(from = 0, to = 6.3, by = 0.1)
points <- as.data.frame(cbind(sin(var), cos(var)))
values <- as.data.frame(sin(var))
l <- BallMapper(points, values, 0.25)
storeBallMapperGraphInFile(l, "my_favorite_BM_graph")