Learning distances between Directed Weighted Graphs

This tutorial cover the sythetic data assay presented in the ICCASP 2024 paper. Here, we illustrate and compare approaches used to learn distances using directed graphs.

[1]:

import networkx as nx
import scipy.linalg as slg
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd
import numpy.linalg as lg
from sklearn import preprocessing
from sklearn.manifold import MDS
import seaborn as sns
import scipy as sp
from pydiffmap import diffusion_map
import random
sns.set_style('white')
from sklearn.preprocessing import Normalizer
random.seed(8579)
import ot
import copy
from scaccordion import tl as actl
from scipy.spatial.distance import is_valid_dm
import warnings
warnings.filterwarnings("ignore")

2024-12-27 12:22:26.608637: W tensorflow/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libcudart.so.11.0'; dlerror: libcudart.so.11.0: cannot open shared object file: No such file or directory

Creating the synthetic network

As an initial step, we implement an template directed graph as shown before. Note that, the graph is composed to 4 local components. This property will we further used to generate perturbed versions of the graph.

[2]:

g1 = actl.datasets.make_gl_ciclic_graph()
nodecolors=['orchid']*4+['lightblue']*4+['lightgreen']*4 + ['coral']*4
sncol = []
for i in g1.nodes(data=True):
    sncol.append(nodecolors[i[0]-1])
plt.figure(figsize=(3,3))
lay=nx.nx_agraph.pygraphviz_layout(g1)
nx.set_node_attributes(g1,nodecolors,'color')
nx.draw(g1,
        pos=lay,
        with_labels=True,
        edge_color=list(nx.get_edge_attributes(g1,'color').values()),
    connectionstyle="arc3,rad=0.25",node_color=sncol)

../_images/notebooks_Global_Local_Comparison_3_0.png

Adding the Local and Global perturbations

Next, we copy the template and do the following perturbations. First, we flip the edge(15->16) chaging the flow within the orange component. Second, we flip the edge (12->15) which connects two componetns

[3]:

g2 = g1.copy()
g3 = g1.copy()
g2.remove_edge(12,15)
g2.add_edge(15,12,color='red',width=3)
g3.remove_edge(16,15)
g3.add_edge(15,16,color='red',width=3)

[4]:

aux1 = g2
g2 = g3
g3 = aux1

[5]:

_,ax2 = plt.subplots(1,3,figsize=(15,5),dpi=100)
lay=nx.nx_agraph.pygraphviz_layout(g1)
ax2 = ax2.ravel()
nx.draw(g1,
        pos=lay,
        with_labels=True,ax=ax2[0],
        edge_color=list(nx.get_edge_attributes(g1,'color').values()),
    connectionstyle="arc3,rad=0.25",node_color=sncol)
nx.draw(g2, pos=lay,with_labels=True,ax=ax2[1],
        edge_color=list(nx.get_edge_attributes(g2,'color').values()),node_color=sncol,
        connectionstyle="arc3,rad=0.25",width=list(nx.get_edge_attributes(g2,'width').values()))
nx.draw(g3, pos=lay,with_labels=True,ax=ax2[2],
        edge_color=list(nx.get_edge_attributes(g3,'color').values()),node_color=sncol,
        connectionstyle="arc3,rad=0.25",width=list(nx.get_edge_attributes(g3,'width').values()))
plt.savefig("ICASSP_example_fig1.pdf")

../_images/notebooks_Global_Local_Comparison_7_0.png

Baselines

To illustrate how our method works, we introduce several baselines.

The first baseline is the Frobenius norm, here we can appreciate that the F-norm cannot identify the changes between the perturbed graphs and the original template.

[6]:

l2c0 = np.linalg.norm(nx.to_numpy_array(g1) - nx.to_numpy_array(g1))
l2c1 = np.linalg.norm(nx.to_numpy_array(g1) - nx.to_numpy_array(g2))
l2c2 = np.linalg.norm(nx.to_numpy_array(g1) - nx.to_numpy_array(g3))
print('Frobenius_norm(A_{1}-A_{2}):'+f'{round(l2c1,3)}')
print('Frobenius_norm(A_{1}-A_{2}):'+f'{round(l2c2,3)}')
fro1=[l2c0,l2c1,l2c2]

Frobenius_norm(A_{1}-A_{2}):1.414
Frobenius_norm(A_{1}-A_{2}):1.414

The second baseline is proposed by Chowdhury et al,2019. Using excentriccity cost based optimal transport.

[7]:

mg1 = nx.to_numpy_array(g1)
mg2 = nx.to_numpy_array(g2)
mg3 = nx.to_numpy_array(g3)

n=len(g1.nodes())
mA =(1/n) *np.ones(n)
gwot0 = actl.GWOT.emd2RTLB(mg1,mg1,mA,mA)
gwot = actl.GWOT.emd2RTLB(mg1,mg2,mA,mA)
gwot1 = actl.GWOT.emd2RTLB(mg1,mg3,mA,mA)

lgwot=[gwot0[0],gwot[0],gwot1[0]]

pre-EMD-time:0.0010323484738667807 minutes
post-EMD-time:0.00043499072392781576 minutes
pre-EMD-time:0.0009955684343973795 minutes
post-EMD-time:1.1547406514485676e-05 minutes
pre-EMD-time:0.0009284734725952149 minutes
post-EMD-time:7.482369740804036e-06 minutes

The last baseline is proposed by Maretic et. al.,2019. Based on the graph laplacian and the optimal transport.

[8]:

ug1 = nx.to_undirected(g1)
ug2 = nx.to_undirected(g2)
ug3 = nx.to_undirected(g3)
lug1 = nx.laplacian_matrix(ug1)
lug2 = nx.laplacian_matrix(ug2)
lug3 = nx.laplacian_matrix(ug3)

ugot =[actl.GOT.wass_dist_(lug1.todense(),lug1.todense()),
       actl.GOT.wass_dist_(lug1.todense(),lug2.todense()),
       actl.GOT.wass_dist_(lug1.todense(),lug3.todense())]

scACCorDiON: Computing DWG distances

[9]:

synthetic = {}
synthetic["G"] = nx.to_pandas_edgelist(g1).iloc[:,0:2]
synthetic["G"].columns = ['source','target']
synthetic["G"]['lr_means']=np.ones(len(synthetic['G'].index))
synthetic["G"]['source']=synthetic["G"]['source'].astype('str')
synthetic["G"]['target'] = synthetic["G"]['target'].astype('str')

synthetic["G1"] = nx.to_pandas_edgelist(g2).iloc[:,0:2]
synthetic["G1"].columns = ['source','target']
synthetic["G1"]['lr_means']=np.ones(len(synthetic['G1'].index))
synthetic["G1"]['source']=synthetic["G1"]['source'].astype('str')
synthetic["G1"]['target'] = synthetic["G1"]['target'].astype('str')

synthetic["G2"] = nx.to_pandas_edgelist(g3).iloc[:,0:2]
synthetic["G2"].columns = ['source','target']
synthetic["G2"]['lr_means']=np.ones(len(synthetic['G2'].index))
synthetic["G2"]['source']=synthetic["G2"]['source'].astype('str')
synthetic["G2"]['target'] = synthetic["G2"]['target'].astype('str')

Step2: Making a PCA using the edgelist of the graphs

[10]:

Syn = actl.Accordion(tbls=synthetic,filter=0.0,weight='lr_means')
plt.figure(figsize=(4,4))
Syn.make_pca()
sns.scatterplot(x=0,y=1,data=Syn.Cs['PCA'])
for i in range(Syn.Cs['PCA'].shape[0]):
    plt.text(x=Syn.Cs['PCA'].iloc[i,0],
             y=Syn.Cs['PCA'].iloc[i,1],
             s=Syn.Cs['PCA'].index.tolist()[i])

../_images/notebooks_Global_Local_Comparison_19_0.png

Step3: Computing the Costs available in scACCorDiON

[11]:

Syn.compute_cost(mode='GRD')
Syn.compute_cost(mode='HTD')
Syn.compute_cost(mode='HTD',beta=1)

[12]:

tmp = pd.DataFrame(Syn.Cs['GRD'],index=Syn.expgraph.nodes)

[13]:

tmp.columns = Syn.expgraph.nodes

Step4: Computing the EMD using the directed weighted graph based distances

[14]:

#Syn.Cs['GRD'][np.isnan(Syn.Cs['GRD'])]=np.nanmax(Syn.Cs['GRD'])
Syn.compute_wassestein()
Syn.compute_wassestein(cost='HTD_0.5')
Syn.compute_wassestein(cost='HTD_1')

Using Gromov-Wasserstein Optimal Transport instead Wasserstein

To compare our Wasserstein Optimal Transport formulation, we also show how the differences DWG distances(costs) can be used to compute the Gromow Wasserstein distances between two graphs.

[15]:

g1m = nx.to_pandas_adjacency(g1).apply(lambda x:x/(sum(x)+1e-10),axis=1).to_numpy()
g2m = nx.to_pandas_adjacency(g2).apply(lambda x:x/(sum(x)+1e-10),axis=1).to_numpy()
g3m = nx.to_pandas_adjacency(g3).apply(lambda x:x/(sum(x)+1e-10),axis=1).to_numpy()

[16]:

ng1m = nx.from_numpy_array(g1m,create_using=nx.DiGraph)
ng2m = nx.from_numpy_array(g2m,create_using=nx.DiGraph)
ng3m = nx.from_numpy_array(g3m,create_using=nx.DiGraph)

[17]:

ctdg = actl.distances.getGRD(g1)
ctdg1 = actl.distances.getGRD(g2)
ctdg2 = actl.distances.getGRD(g3)

[18]:

gwgrd=[ot.gromov_wasserstein2(C1=ctdg,
                       C2=ctdg,
                       p=ot.unif(n=ctdg.shape[0]),
                       q=ot.unif(n=ctdg.shape[0]),
                       loss_fun='square_loss'),
ot.gromov_wasserstein2(C1=ctdg,
                       C2=ctdg1,
                       p=ot.unif(n=ctdg.shape[0]),
                       q=ot.unif(n=ctdg.shape[0]),
                       loss_fun='square_loss'),
ot.gromov_wasserstein2(C1=ctdg,
                       C2=ctdg2,
                       p=ot.unif(n=ctdg.shape[0]),
                       q=ot.unif(n=ctdg.shape[0]),
                       loss_fun='square_loss')]

[19]:

ctdg = actl.distances.getCTD(g1m)
ctdg1 = actl.distances.getCTD(g2m)
ctdg2 = actl.distances.getCTD(g3m)
gwhtd05=[ot.gromov_wasserstein2(C1=ctdg,
                       C2=ctdg,
                       p=ot.unif(n=ctdg.shape[0]),
                       q=ot.unif(n=ctdg.shape[0]),
                       loss_fun='square_loss'),
ot.gromov_wasserstein2(C1=ctdg,
                       C2=ctdg1,
                       p=ot.unif(n=ctdg.shape[0]),
                       q=ot.unif(n=ctdg.shape[0]),
                       loss_fun='square_loss'),
ot.gromov_wasserstein2(C1=ctdg,
                       C2=ctdg2,
                       p=ot.unif(n=ctdg.shape[0]),
                       q=ot.unif(n=ctdg.shape[0]),
                       loss_fun='square_loss')]
ctdg = actl.distances.getCTD(g1m,beta=1)
ctdg1 = actl.distances.getCTD(g2m,beta=1)
ctdg2 = actl.distances.getCTD(g3m,beta=1)

gwhtd1=[ot.gromov_wasserstein2(C1=ctdg,
                       C2=ctdg,
                       p=ot.unif(n=ctdg.shape[0]),
                       q=ot.unif(n=ctdg.shape[0]),
                       loss_fun='square_loss'),
ot.gromov_wasserstein2(C1=ctdg,
                       C2=ctdg1,
                       p=ot.unif(n=ctdg.shape[0]),
                       q=ot.unif(n=ctdg.shape[0]),
                       loss_fun='square_loss'),
ot.gromov_wasserstein2(C1=ctdg,
                       C2=ctdg2,
                       p=ot.unif(n=ctdg.shape[0]),
                       q=ot.unif(n=ctdg.shape[0]),
                       loss_fun='square_loss')]

[20]:

 distfrG = pd.DataFrame.from_dict({
                        '$Frobenius_{A}$':fro1,
                        "$GOT_{undirected}$":ugot,
                        "$GWOT$":lgwot,
                        '$EMD-GRD$':Syn.wdist['GRD']['G'].tolist(),
                        '$EMD-HTD^{0.5}$':Syn.wdist['HTD_0.5']['G'].tolist(),
                        '$EMD-HTD^{1}$':Syn.wdist['HTD_1']['G'].tolist(),
                        '$GW-L2-GRD$':gwgrd,
                        '$GW-L2-HTD^{0.5}$':gwhtd05,
                        '$GW-L2-HTD^{1}$':gwhtd1,})
sns.clustermap(distfrG,
               cmap='Reds',
               annot=True,
               fmt=".4f",
               row_cluster=False,figsize=(10,4),col_cluster=False,vmin=0,vmax=0.1)

[20]:

<seaborn.matrix.ClusterGrid at 0x7f20250ee890>

../_images/notebooks_Global_Local_Comparison_32_1.png