Finding Cell-Cell Communication groups using scACCorDiON
James S. Nagai
02.05.2023
Note: Before using ACCorDIoN make sure that you have the CrossTalkeR installed in your local R enviroment(github)
Here we introduce scACCorDIoN. In this notebook, we guide you into the framework steps. %matplotlib inline
Load Libraries
[1]:
from scaccordion import tl as actl
import seaborn as sns
import matplotlib.pyplot as plt
from sklearn.cluster import KMeans
import pydiffmap as dm
import pandas as pd
import networkx as nx
from skimpy import skim
from sklearn import covariance
import conorm
import phate
import warnings
import kmedoids
import numpy as np
from tqdm import tqdm
import ot
import os
from sklearn import manifold
import itertools as it
np.random.seed(859)
warnings.filterwarnings("ignore")
/home/james/.local/lib/python3.10/site-packages/pandas/core/arrays/masked.py:60: UserWarning: Pandas requires version '1.3.6' or newer of 'bottleneck' (version '1.3.2' currently installed).
from pandas.core import (
1. Loading data and metadata
Here, we explore the Pancreas Adenocarcinoma Cohort(Peng et.al.,2019). The data available in scACCorDiON were pre-processed. The Ligand-Receptor inference were performed using the CellphoneDB methods (Efremova et. al., 2020) implemented in LIANA(Dimitrov et. al.,2022). Networks were generated using CrossTalkeR(Nagai et. al., 2021)
[2]:
! wget https://costalab.org/wp-content/uploads/2025/01/PDAC_Data.zip
--2025-01-29 14:34:24-- https://costalab.org/wp-content/uploads/2025/01/PDAC_Data.zip
Resolving costalab.org (costalab.org)... 195.30.85.240
Connecting to costalab.org (costalab.org)|195.30.85.240|:443... connected.
HTTP request sent, awaiting response... 200 OK
Length: 4004362 (3.8M) [application/zip]
Saving to: ‘PDAC_Data.zip’
PDAC_Data.zip 100%[===================>] 3.82M --.-KB/s in 0.1s
2025-01-29 14:34:24 (30.8 MB/s) - ‘PDAC_Data.zip’ saved [4004362/4004362]
[3]:
! unzip PDAC_Data.zip
Archive: PDAC_Data.zip
creating: PDAC_Data/
creating: PDAC_Data/LR_data/
inflating: PDAC_Data/LR_data/ct_Peng_PDAC_processed|N1.csv
inflating: PDAC_Data/LR_data/ct_Peng_PDAC_processed|N10.csv
inflating: PDAC_Data/LR_data/ct_Peng_PDAC_processed|N11.csv
inflating: PDAC_Data/LR_data/ct_Peng_PDAC_processed|N2.csv
inflating: PDAC_Data/LR_data/ct_Peng_PDAC_processed|N3.csv
inflating: PDAC_Data/LR_data/ct_Peng_PDAC_processed|N4.csv
inflating: PDAC_Data/LR_data/ct_Peng_PDAC_processed|N5.csv
inflating: PDAC_Data/LR_data/ct_Peng_PDAC_processed|N6.csv
inflating: PDAC_Data/LR_data/ct_Peng_PDAC_processed|N7.csv
inflating: PDAC_Data/LR_data/ct_Peng_PDAC_processed|N8.csv
inflating: PDAC_Data/LR_data/ct_Peng_PDAC_processed|N9.csv
inflating: PDAC_Data/LR_data/ct_Peng_PDAC_processed|T1.csv
inflating: PDAC_Data/LR_data/ct_Peng_PDAC_processed|T10.csv
inflating: PDAC_Data/LR_data/ct_Peng_PDAC_processed|T11.csv
inflating: PDAC_Data/LR_data/ct_Peng_PDAC_processed|T12.csv
inflating: PDAC_Data/LR_data/ct_Peng_PDAC_processed|T13.csv
inflating: PDAC_Data/LR_data/ct_Peng_PDAC_processed|T14.csv
inflating: PDAC_Data/LR_data/ct_Peng_PDAC_processed|T15.csv
inflating: PDAC_Data/LR_data/ct_Peng_PDAC_processed|T16.csv
inflating: PDAC_Data/LR_data/ct_Peng_PDAC_processed|T17.csv
inflating: PDAC_Data/LR_data/ct_Peng_PDAC_processed|T18.csv
inflating: PDAC_Data/LR_data/ct_Peng_PDAC_processed|T19.csv
inflating: PDAC_Data/LR_data/ct_Peng_PDAC_processed|T2.csv
inflating: PDAC_Data/LR_data/ct_Peng_PDAC_processed|T20.csv
inflating: PDAC_Data/LR_data/ct_Peng_PDAC_processed|T21.csv
inflating: PDAC_Data/LR_data/ct_Peng_PDAC_processed|T22.csv
inflating: PDAC_Data/LR_data/ct_Peng_PDAC_processed|T23.csv
inflating: PDAC_Data/LR_data/ct_Peng_PDAC_processed|T24.csv
inflating: PDAC_Data/LR_data/ct_Peng_PDAC_processed|T3.csv
inflating: PDAC_Data/LR_data/ct_Peng_PDAC_processed|T4.csv
inflating: PDAC_Data/LR_data/ct_Peng_PDAC_processed|T5.csv
inflating: PDAC_Data/LR_data/ct_Peng_PDAC_processed|T6.csv
inflating: PDAC_Data/LR_data/ct_Peng_PDAC_processed|T7.csv
inflating: PDAC_Data/LR_data/ct_Peng_PDAC_processed|T8.csv
inflating: PDAC_Data/LR_data/ct_Peng_PDAC_processed|T9.csv
inflating: PDAC_Data/Peng_PDAC_metaacc.h5ad
1. Read the data and selecting the tables
[4]:
## Reading the data
#path = "/home/james/sciebo/Zenodo_scACCordion/data/ctkerout/Peng_PDAC_processed/"
path = "./PDAC_Data/LR_data/"
#print(lines)
data = {}
for idi in enumerate(os.listdir(path)):
if idi[1].endswith(".csv"):
#print(idi)
i = idi[1]
data[i[i.find("|")+1:i.find(".csv")]] = pd.read_csv(path+idi[1])
print(data.keys())
## Reading metadata
pdata_metadata = pd.read_hdf("./PDAC_Data/Peng_PDAC_metaacc.h5ad")
dict_keys(['T14', 'N1', 'T12', 'T2', 'T23', 'T22', 'N5', 'N8', 'T6', 'N3', 'T4', 'N11', 'T18', 'T9', 'N2', 'T21', 'T11', 'T16', 'T17', 'T3', 'N10', 'T5', 'N9', 'T15', 'N6', 'T10', 'N7', 'T20', 'T7', 'T24', 'T19', 'T8', 'N4', 'T1', 'T13'])
1.1 Original Labels
[5]:
pdata_metadata.groupby('accLabel').size()
[5]:
accLabel
normal 11
pdac 24
dtype: int64
1.2 Table Format
[6]:
data['N1'].head()
[6]:
Unnamed: 0 | ligand | receptor | source | target | lr_means | cellphone_pvals | type_gene_A | type_gene_B | gene_A | gene_B | MeanLR | |
---|---|---|---|---|---|---|---|---|---|---|---|---|
0 | 0 | INS | INSR | Endocrine cell | Endothelial cell | 4.543838 | 0.0 | Ligand | Receptor | INS | INSR | 4.543838 |
1 | 1 | INS | INSR | Endocrine cell | Acinar cell | 4.365789 | 0.0 | Ligand | Receptor | INS | INSR | 4.365789 |
2 | 2 | INS | INSR | Endocrine cell | Ductal cell type 1 | 4.270591 | 0.0 | Ligand | Receptor | INS | INSR | 4.270591 |
3 | 3 | INS | IGF1R | Endocrine cell | Endocrine cell | 4.250901 | 0.0 | Ligand | Receptor | INS | IGF1R | 4.250901 |
4 | 4 | INS | INSR | Endocrine cell | Fibroblast cell | 4.248840 | 0.0 | Ligand | Receptor | INS | INSR | 4.248840 |
2. Now with the tables from CrossTalkeR/Liana we build the Accordion Object
[7]:
def norm(x):
'''
Defining the normalization function.
'''
nf = conorm.tmm(x+1e-6,trim_lfc=0,trim_mag=0)
return (nf)
Creating an Accordion Object
[8]:
AaccPDAC = actl.Accordion(tbls=data,weight='lr_means',normf=norm,filter=0.2)
3. Computing a PCA using the edge distribuition per sample
[9]:
AaccPDAC.make_pca()
sns.scatterplot(x=AaccPDAC.Cs['PCA'][0],y=AaccPDAC.Cs['PCA'][1],
hue=pdata_metadata.loc[AaccPDAC.Cs['PCA'].index,'accLabel'])
plt.legend(loc='center left', bbox_to_anchor=(1, 0.5))
plt.xlabel("PC1")
plt.ylabel("PC2")
[9]:
Text(0, 0.5, 'PC2')

4 Optimal Transport: Wasserstein
In this step we compute the graph-to-graph distance solving the Optimal Transport minimization problem. Note that, the Optimal Transport is done in two main steps: (1) we use the function compute_cost() to compute an edge-to-edge distance and (2) we use the edge weights and the cost to get a graph-to-graph distance matrix using compute_wasserstein()
AaccPDAC.compute_cost_all() ## Run all the possible cost matrices
[10]:
AaccPDAC.compute_cost(mode='HTD') # Computing the cost
AaccPDAC.compute_wassestein(cost='HTD_0.5') # Getting the sample-to-sample distance
Metric Evaluation using K-means and the original labels
[11]:
dt=AaccPDAC.eval_all(y=pdata_metadata.loc[AaccPDAC.p.columns,'accLabel'].astype('category').cat.codes)
[12]:
dt
[12]:
{'HTD_0.5': {'ar': 1.0,
'r': 1.0,
'fm': 1.0,
'afm': 1.0,
'mi': 0.6224869206570488,
'nmi': 1.0,
'ami': 1.0,
'npa': 1.0,
'psi': 1.0,
'spsi': 1.0,
'nca': 1.0}}
5 Clustering using the graph-to-graph distance
Although K-medoids can also be used for clustering scACCorDiON patient-patient distance matrix, in the tutorial we focus in the usage of the K-Barycenters. This usage is motivated by the latent variables infered by K-Barycenters.
km = kmedoids.KMedoids(n_clusters=3, method='fasterpam')
plt.figure(figsize=(6,5))
c = km.fit(AaccPDAC.wdist['HTD_0.5'].to_numpy())
5.1.1 Here we compute the K-barycenters
[13]:
X =np.round(AaccPDAC.wdist['HTD_0.5'].loc[AaccPDAC.wdist['HTD_0.5'].index,AaccPDAC.wdist['HTD_0.5'].index],5)
clust,loss = [],[]
for _ in tqdm(range(100)):
kmeans = actl.KBarycenters(k=3,init='++',random_state=_,max_iters=100)
kmeans.fit(X,distr=AaccPDAC.p,cost=AaccPDAC.Cs['HTD_0.5'])
clust.append(kmeans)
loss.append(kmeans.err[-1])
ncls= clust[np.argmin(loss)]
100%|█████████████████████████████████████████| 100/100 [01:02<00:00, 1.59it/s]
5.1.2 Updating the distance matrix with the Barycenters
[14]:
aux = ncls.centroids.to_dict()
for i in range(3):
for j in range(i,3):
if i!=j:
aux[i][j] = ot.emd2(a=ncls.bary[i]/ncls.bary[i].sum(),
b=ncls.bary[j]/ncls.bary[j].sum(),M=AaccPDAC.Cs['HTD_0.5'])
aux[j][i] = ot.emd2(a=ncls.bary[i]/ncls.bary[i].sum(),
b=ncls.bary[j]/ncls.bary[j].sum(),M=AaccPDAC.Cs['HTD_0.5'])
else:
aux[i][j] = 0
eux = pd.concat([AaccPDAC.wdist['HTD_0.5'],ncls.centroids],axis=1)
eux = pd.concat([eux,pd.DataFrame.from_dict(aux).T])
tmplab = ncls.flabels.tolist()
for i in [0,1,2]:
tmplab.append(i)
Defining the barycenters
[15]:
tmpstl = ['0']*len(AaccPDAC.p.columns)+['1']*3
5.1.3 Computing the new embeddings and plotting the new data representation
[16]:
import phate
[17]:
%matplotlib agg
emb1 = phate.PHATE(knn=8,knn_dist='precomputed',random_state=42)
emb1 = emb1.fit_transform(eux.to_numpy())
fs,axs = plt.subplots(1,2,figsize=(14,5))
Calculating PHATE...
Running PHATE on precomputed distance matrix with 38 observations.
Calculating graph and diffusion operator...
Calculating affinities...
Calculating optimal t...
Automatically selected t = 5
Calculated optimal t in 0.01 seconds.
Calculating diffusion potential...
Calculating metric MDS...
Calculated PHATE in 0.03 seconds.
[18]:
barys = pd.DataFrame(ncls.bary,index=AaccPDAC.p.loc[list(AaccPDAC.expgraph.nodes()),:].index)
barys.reset_index(inplace=True)
barys[['u', 'v']] = barys['index'].str.split('$', expand=True)
cmapse =[plt.cm.Greens,plt.cm.Blues,plt.cm.Reds]
vtxcmap = {i[1]:plt.cm.tab10.colors[i[0]] for i in enumerate(AaccPDAC.nodes)}
[19]:
sns.scatterplot(x=emb1[0:-3,0],
y=emb1[0:-3,1],hue=pdata_metadata.loc[AaccPDAC.Cs['PCA'].index,'accLabel'],s=200,linewidth=0,ax=axs[0])
axs[0].legend(loc=0, bbox_to_anchor=(1, 0.5))
axs[0].invert_xaxis()
sns.scatterplot(x=emb1[:,0],
y=emb1[:,1],hue=tmplab,s=200,style=tmpstl,palette=list(plt.cm.Set1.colors)[0:3][::-1],
size=tmpstl,sizes=(400,200),linewidth=0,ax=axs[1])
axs[1].legend(loc=0, bbox_to_anchor=(1, 0.5))
#axs[1].invert_xaxis()
fs
[19]:
