Hiba NEJJARI Milly ON DIA5
!pip install datasets
!pip install sklearn-crfsuite
!pip install transformers
!pip install beautifulsoup4 nltk spacy
!pip install rdflib
from datasets import load_dataset
# Loading the CoNLL-2003 dataset
dataset = load_dataset("conll2003")
Preprocessing with NLTK and spaCy: tokenization, normalization (lowercasing and punctuation removal), stopword removal, and lemmatization.
import spacy
import nltk
import string
from nltk.corpus import stopwords
from nltk.tokenize import word_tokenize
# download required data
nltk.download('punkt')
nltk.download('stopwords')
# load spaCy English model
nlp = spacy.load("en_core_web_sm")
def preprocess_text(text):
    # tokenize
    tokens = word_tokenize(text)
    # lowercase
    tokens = [word.lower() for word in tokens]
    # remove punctuation tokens (the hyphen is kept); a set gives exact
    # single-character matches instead of substring matches on a string
    punctuation = set(string.punctuation) - {"-"}
    tokens = [word for word in tokens if word not in punctuation]
    # remove stopwords
    stop_words = set(stopwords.words("english"))
    filtered_tokens = [word for word in tokens if word not in stop_words]
    # lemmatize with spaCy
    doc = nlp(" ".join(filtered_tokens))
    lemmatized_text = " ".join([token.lemma_ for token in doc])
    return lemmatized_text
# example
text = "Apple was founded by Steve Jobs in 1976."
print(preprocess_text(text))
apple found steve job 1976
a. CRF-Based NER with Feature Extraction and Evaluation
import sklearn_crfsuite
from sklearn_crfsuite import metrics
train_dataset = dataset["train"]
val_dataset = dataset["validation"]
test_dataset = dataset["test"]
# feature extraction function
def word2features(sent, i):
    word = sent[i]
    features = {
        'bias': 1.0,
        'word.lower()': word.lower(),
        'word[-3:]': word[-3:],
        'word[-2:]': word[-2:],
        'word.isupper()': word.isupper(),
        'word.istitle()': word.istitle(),
        'word.isdigit()': word.isdigit(),
    }
    if i > 0:
        word1 = sent[i-1]
        features.update({
            '-1:word.lower()': word1.lower(),
            '-1:word.istitle()': word1.istitle(),
            '-1:word.isupper()': word1.isupper(),
        })
    else:
        features['BOS'] = True
    if i < len(sent)-1:
        word1 = sent[i+1]
        features.update({
            '+1:word.lower()': word1.lower(),
            '+1:word.istitle()': word1.istitle(),
            '+1:word.isupper()': word1.isupper(),
        })
    else:
        features['EOS'] = True
    return features

def extract_features_and_labels(dataset):
    sentences = [example["tokens"] for example in dataset]
    label_list = dataset.features["ner_tags"].feature.names
    labels = [[label_list[tag] for tag in example["ner_tags"]] for example in dataset]
    X = [[word2features(sent, i) for i in range(len(sent))] for sent in sentences]
    return X, labels
# preparing the data
X_train, y_train = extract_features_and_labels(train_dataset)
X_test, y_test = extract_features_and_labels(test_dataset)
# training CRF
crf = sklearn_crfsuite.CRF(
algorithm='lbfgs',
c1=0.1,
c2=0.1,
max_iterations=100,
all_possible_transitions=False
)
crf.fit(X_train, y_train)
# predicting and evaluating
y_pred = crf.predict(X_test)
print(metrics.flat_classification_report(y_test, y_pred))
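| | precision | recall | f1-score | support |
|---|---|---|---|---|
| B-LOC | 0.86 | 0.83 | 0.84 | 1668 |
| B-MISC | 0.80 | 0.76 | 0.78 | 702 |
| B-ORG | 0.82 | 0.71 | 0.76 | 1661 |
| B-PER | 0.82 | 0.84 | 0.83 | 1617 |
| I-LOC | 0.79 | 0.70 | 0.74 | 257 |
| I-MISC | 0.64 | 0.65 | 0.65 | 216 |
| I-ORG | 0.70 | 0.74 | 0.72 | 835 |
| I-PER | 0.86 | 0.95 | 0.90 | 1156 |
| O | 0.99 | 0.99 | 0.99 | 38323 |
| accuracy | | | 0.96 | 46435 |
| macro avg | 0.81 | 0.80 | 0.80 | 46435 |
| weighted avg | 0.96 | 0.96 | 0.96 | 46435 |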
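The table above summarizes the CRF model's token-level performance on the CoNLL-2003 test set in terms of precision, recall, and F1-score per tag. B-LOC, B-PER, and I-PER reach F1-scores between 0.83 and 0.90, while I-MISC (0.65) and I-ORG (0.72) are the weakest tags. The overall accuracy of 96% is dominated by the O tag (38,323 of 46,435 tokens), so the macro-averaged F1-score of 0.80 is the more informative summary; by either measure, the model is effective for this sequence labeling task.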
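The scores above are computed per token. NER is also commonly scored per entity, where a prediction counts only if both its span and its type exactly match a gold entity. The following is a minimal pure-Python sketch of that idea, in the spirit of the conlleval script; bio_to_spans and entity_prf are hypothetical helpers written for this illustration, not part of sklearn-crfsuite.

```python
from typing import List, Optional, Sequence, Set, Tuple

# (label, first_token, last_token) with an inclusive end, within one sentence
Span = Tuple[str, int, int]


def bio_to_spans(tags: Sequence[str]) -> Set[Span]:
    """Collect entity spans from one BIO tag sequence.

    An I- tag that does not continue an entity of the same type starts a new
    entity (a lenient convention, as in conlleval).
    """
    spans: Set[Span] = set()
    label: Optional[str] = None
    start = 0
    for i, tag in enumerate(list(tags) + ["O"]):  # trailing "O" closes the last entity
        prefix, _, kind = tag.partition("-")
        continues = prefix == "I" and kind == label
        if label is not None and not continues:
            spans.add((label, start, i - 1))
            label = None
        if prefix in ("B", "I") and not continues:
            label, start = kind, i
    return spans


def entity_prf(true_seqs: List[List[str]], pred_seqs: List[List[str]]) -> Tuple[float, float, float]:
    """Micro-averaged precision, recall and F1 over exactly matching entity spans."""
    tp = n_true = n_pred = 0
    for true_tags, pred_tags in zip(true_seqs, pred_seqs):
        gold, guess = bio_to_spans(true_tags), bio_to_spans(pred_tags)
        tp += len(gold & guess)
        n_true += len(gold)
        n_pred += len(guess)
    precision = tp / n_pred if n_pred else 0.0
    recall = tp / n_true if n_true else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return precision, recall, f1
```

Applied to the variables from this section, entity_prf(y_test, y_pred) would give micro-averaged entity-level precision, recall, and F1 for the CRF. Under this scheme an entity that is only partly tagged earns no credit.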
b. spaCy NER using en_ner_conll03 Model
# loading spaCy's pre-trained model
nlp = spacy.load(r"C:\Users\Nejjari\Documents\WebData Project\en_ner_conll03\best_ner_model")
# example sentence
text = "Apple was founded by Steve Jobs in 1976."
doc = nlp(text)
# extract and display named entities with their character offsets
entities = [(ent.text, ent.start_char, ent.end_char, ent.label_) for ent in doc.ents]
print("Extracted Entities:", entities)
C:\Users\Nejjari\anaconda3\Anaconda\Lib\site-packages\spacy\util.py:910: UserWarning: [W095] Model 'en_pipeline' (0.0.0) was trained with spaCy v3.7.5 and may not be 100% compatible with the current version (3.8.4). If you see errors or degraded performance, download a newer compatible model or retrain your custom model with the current spaCy version. For more details and available updates, run: python -m spacy validate warnings.warn(warn_msg)
Extracted Entities: [('Apple', 0, 5, 'ORG'), ('Steve Jobs', 21, 31, 'PER')]
from sklearn.metrics import classification_report
from itertools import chain
# convert spaCy predictions to the same token-level format as the CRF output
# (note: a token is tagged B-<label> only if its text exactly matches a whole
# predicted entity, so multi-token entities and all I- tags are never predicted)
def spacy_predict(doc_tokens):
    text = " ".join(doc_tokens)
    doc = nlp(text)
    ents = {ent.text: ent.label_ for ent in doc.ents}
    labels = []
    for token in doc_tokens:
        if token in ents:
            labels.append(f"B-{ents[token]}")
        else:
            labels.append("O")
    return labels

# applying to full test set
spacy_preds = []
true_labels = []
label_list = dataset["test"].features["ner_tags"].feature.names
for example in dataset["test"]:
    tokens = example["tokens"]
    true = [label_list[i] for i in example["ner_tags"]]
    pred = spacy_predict(tokens)
    # padding/truncating to match length
    pred += ["O"] * (len(true) - len(pred))  # if pred shorter
    pred = pred[:len(true)]  # in case pred is longer
    spacy_preds.append(pred)
    true_labels.append(true)
# flattens the lists
flat_true_labels = list(chain.from_iterable(true_labels))
flat_spacy_preds = list(chain.from_iterable(spacy_preds))
# evaluation
print("spaCy NER performance:")
print(classification_report(flat_true_labels, flat_spacy_preds))
spaCy NER performance:
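| | precision | recall | f1-score | support |
|---|---|---|---|---|
| B-LOC | 0.89 | 0.72 | 0.79 | 1668 |
| B-MISC | 0.85 | 0.57 | 0.68 | 702 |
| B-ORG | 0.76 | 0.49 | 0.60 | 1661 |
| B-PER | 0.66 | 0.21 | 0.32 | 1617 |
| I-LOC | 0.00 | 0.00 | 0.00 | 257 |
| I-MISC | 0.00 | 0.00 | 0.00 | 216 |
| I-ORG | 0.00 | 0.00 | 0.00 | 835 |
| I-PER | 0.00 | 0.00 | 0.00 | 1156 |
| O | 0.89 | 1.00 | 0.94 | 38323 |
| accuracy | | | 0.88 | 46435 |
| macro avg | 0.45 | 0.33 | 0.37 | 46435 |
| weighted avg | 0.83 | 0.88 | 0.85 | 46435 |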
C:\Users\Nejjari\anaconda3\Anaconda\Lib\site-packages\sklearn\metrics\_classification.py:1469: UndefinedMetricWarning: Precision and F-score are ill-defined and being set to 0.0 in labels with no predicted samples. Use `zero_division` parameter to control this behavior.
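The evaluation of the spaCy model shows a token-level accuracy of 88%, only modestly above the roughly 83% obtained by tagging every token as O (38,323 of 46,435 tokens), and a low macro-averaged F1-score of 0.37. The O class is handled well, but recall on the entity tags is low and every I- tag scores 0. These zero scores, and likely much of the low B-PER recall (0.21), stem from spacy_predict rather than from the model: it never produces I- tags and assigns a B- tag only when a single token matches the full text of a predicted entity, so multi-token names such as "Steve Jobs" are always counted as missed.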
c. Comparison of the two models
import pandas as pd
import matplotlib.pyplot as plt
# flattening crf outputs
flat_crf_true = list(chain.from_iterable(y_test)) # y_test from crf
flat_crf_pred = list(chain.from_iterable(y_pred)) # y_pred from crf
# flattening spacy outputs
flat_spacy_true = list(chain.from_iterable(true_labels))
flat_spacy_pred = list(chain.from_iterable(spacy_preds))
# generating classification reports as dictionaries
crf_report_dict = classification_report(flat_crf_true, flat_crf_pred, output_dict=True)
spacy_report_dict = classification_report(flat_spacy_true, flat_spacy_pred, output_dict=True)
# converting dictionaries to dataframes
crf_df = pd.DataFrame(crf_report_dict).transpose()
spacy_df = pd.DataFrame(spacy_report_dict).transpose()
# plotting macro average scores
metrics_to_plot = ["precision", "recall", "f1-score"]
x = range(len(metrics_to_plot))
plt.figure(figsize=(8, 5))
plt.bar([i - 0.2 for i in x], crf_df.loc["macro avg"][metrics_to_plot], width=0.4, label="crf")
plt.bar([i + 0.2 for i in x], spacy_df.loc["macro avg"][metrics_to_plot], width=0.4, label="spacy")
plt.xticks(x, metrics_to_plot)
plt.ylabel("score")
plt.ylim(0, 1.05)
plt.title("crf vs spacy ner performance (macro avg)")
plt.legend()
plt.tight_layout()
plt.show()
C:\Users\Nejjari\anaconda3\Anaconda\Lib\site-packages\sklearn\metrics\_classification.py:1469: UndefinedMetricWarning: Precision and F-score are ill-defined and being set to 0.0 in labels with no predicted samples. Use `zero_division` parameter to control this behavior.
# side-by-side macro avg comparison table
combined_df = pd.concat([
crf_df.loc[["macro avg"]][metrics_to_plot].rename(columns=lambda x: f"CRF_{x}"),
spacy_df.loc[["macro avg"]][metrics_to_plot].rename(columns=lambda x: f"spaCy_{x}")
], axis=1)
print("\nMacro Average Comparison (CRF vs spaCy):")
display(combined_df)
Macro Average Comparison (CRF vs spaCy):
| | CRF_precision | CRF_recall | CRF_f1-score | spaCy_precision | spaCy_recall | spaCy_f1-score |
|---|---|---|---|---|---|---|
| macro avg | 0.808326 | 0.79646 | 0.801214 | 0.449194 | 0.331623 | 0.369927 |
The CRF model outperforms the spaCy pre-trained model on every macro-averaged metric: precision of 0.81 vs 0.45, recall of 0.80 vs 0.33, and F1-score of 0.80 vs 0.37.
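Part of this gap is likely an artifact of the evaluation rather than of the spaCy model itself: spacy_predict never emits I- tags and only tags a token whose text matches a whole predicted entity, so every multi-token entity is scored as missed. The CRF, in contrast, was trained and evaluated directly on CoNLL-2003's token-level BIO tags. Differences between spaCy's tokenization of the space-joined sentence and the original CoNLL tokens may also contribute.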
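A fairer conversion would project spaCy's character-offset entity spans back onto the CoNLL tokens, producing both B- and I- tags. Below is a minimal sketch, assuming (as spacy_predict does) that the sentence given to spaCy was built with " ".join(tokens); char_spans_to_bio is a hypothetical helper, and a token that only partly overlaps a span is counted as inside it.

```python
from typing import List, Tuple


def char_spans_to_bio(tokens: List[str], spans: List[Tuple[int, int, str]]) -> List[str]:
    """Project (start_char, end_char, label) spans over " ".join(tokens) onto BIO tags."""
    # character offsets (start, end) of each token in the space-joined sentence
    offsets = []
    pos = 0
    for tok in tokens:
        offsets.append((pos, pos + len(tok)))
        pos += len(tok) + 1  # +1 for the joining space
    tags = ["O"] * len(tokens)
    for start, end, label in spans:
        # every token that overlaps the span belongs to the entity
        covered = [i for i, (s, e) in enumerate(offsets) if s < end and e > start]
        for n, i in enumerate(covered):
            tags[i] = ("B-" if n == 0 else "I-") + label
    return tags
```

With the loaded pipeline, char_spans_to_bio(tokens, [(e.start_char, e.end_char, e.label_) for e in nlp(" ".join(tokens)).ents]) could replace spacy_predict(tokens) in the evaluation loop. The offsets used in the test, (0, 5) for "Apple" and (21, 31) for "Steve Jobs", are the ones spaCy returned in section b.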
d. Saving the extracted entities along with their positions.
import csv

# helper function to extract entities from crf token-label pairs;
# returns (entity_text, label, start_token, end_token) with an inclusive end
def extract_crf_entities(tokens, labels):
    entities = []
    current_entity = []
    current_label = None
    start_idx = 0
    for idx, (token, label) in enumerate(zip(tokens, labels)):
        # a B- tag, or an I- tag that does not continue the current entity, starts a new entity
        if label.startswith("B-") or (label.startswith("I-") and current_label != label[2:]):
            if current_entity:
                entities.append((" ".join(current_entity), current_label, start_idx, idx - 1))
            current_entity = [token]
            current_label = label[2:]
            start_idx = idx
        elif label.startswith("I-"):
            current_entity.append(token)
        else:
            if current_entity:
                entities.append((" ".join(current_entity), current_label, start_idx, idx - 1))
            current_entity = []
            current_label = None
    if current_entity:
        entities.append((" ".join(current_entity), current_label, start_idx, len(tokens) - 1))
    return entities
# open file and write both outputs
# note: spaCy start/end are character offsets in the space-joined sentence (end exclusive),
# while CRF start/end are token indices (end inclusive)
with open("combined_ner_entities.csv", "w", newline="", encoding="utf-8") as file:
    writer = csv.writer(file)
    writer.writerow(["model", "sentence_id", "entity", "label", "start", "end"])
    for idx, (example, crf_labels) in enumerate(zip(dataset["test"], y_pred)):
        tokens = example["tokens"]
        text = " ".join(tokens)
        # --- spaCy entities ---
        doc = nlp(text)
        for ent in doc.ents:
            writer.writerow(["spaCy", idx, ent.text, ent.label_, ent.start_char, ent.end_char])
        # --- CRF entities (token index positions) ---
        crf_entities = extract_crf_entities(tokens, crf_labels)
        for ent_text, ent_label, start_idx, end_idx in crf_entities:
            writer.writerow(["CRF", idx, ent_text, ent_label, start_idx, end_idx])
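Here we save the entities extracted by both models, along with their positions, to a CSV file. The position columns use different units: spaCy rows store character offsets in the space-joined sentence (end exclusive), while CRF rows store token indices (end inclusive).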
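To make the two position columns comparable, the CRF token spans could be converted to character offsets in the same space-joined sentence that spaCy sees. A minimal sketch, assuming the sentence is " ".join(tokens); token_span_to_char_span is a hypothetical helper.

```python
from typing import List, Tuple


def token_span_to_char_span(tokens: List[str], start_tok: int, end_tok: int) -> Tuple[int, int]:
    """Convert an inclusive token span to (start_char, end_char) in " ".join(tokens).

    The returned end is exclusive, like spaCy's Span.start_char / Span.end_char.
    """
    # each preceding token contributes its length plus one joining space
    start_char = sum(len(t) + 1 for t in tokens[:start_tok])
    end_char = start_char + len(" ".join(tokens[start_tok:end_tok + 1]))
    return start_char, end_char
```

In the CRF branch of the loop above, writing token_span_to_char_span(tokens, start_idx, end_idx) instead of the raw indices would store both models' positions as end-exclusive character offsets.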
import spacy
# spaCy's model for relation extraction
nlp = spacy.load("en_core_web_sm")
def extract_relations(text):
    doc = nlp(text)
    relations = []
    for token in doc:
        # check for passive or active subject
        if (token.dep_ == "nsubj" or token.dep_ == "nsubjpass") and token.head.dep_ == "ROOT":
            subject = token.text
            predicate = token.head.text
            # look for object of preposition or agent
            for child in token.head.children:
                if child.dep_ in ("prep", "agent"):
                    for obj in child.children:
                        if obj.dep_ == "pobj":
                            object_phrase = " ".join([tok.text for tok in obj.subtree])
                            relations.append((subject, predicate, object_phrase))
    return relations
text = "Apple was founded by Steve Jobs."
print(extract_relations(text))
[('Apple', 'founded', 'Steve Jobs')]
Here we show the extracted relation triplet in the form (subject, predicate, object) using dependency patterns.
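The same dependency pattern can be traced without loading a model. The sketch below runs the subject / agent-or-prep / pobj rule over hand-written parses; the labels and heads mimic spaCy's English dependency scheme but are assumptions, not parser output. It also accepts a dobj child of the verb, an extension that is not part of extract_relations. Tok, in_subtree, children, phrase, and extract_triples_toy are hypothetical names.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass
class Tok:
    text: str
    dep: str   # dependency label, named as in spaCy's English models
    head: int  # index of the syntactic head; the ROOT token points to itself


def in_subtree(toks: List[Tok], j: int, i: int) -> bool:
    """True if token j is token i or one of its descendants."""
    while j != i:
        if toks[j].head == j:  # reached the ROOT without meeting i
            return False
        j = toks[j].head
    return True


def children(toks: List[Tok], h: int) -> List[int]:
    return [j for j, t in enumerate(toks) if t.head == h and j != h]


def phrase(toks: List[Tok], i: int) -> str:
    """Text of token i's subtree in sentence order (like spaCy's token.subtree)."""
    return " ".join(t.text for j, t in enumerate(toks) if in_subtree(toks, j, i))


def extract_triples_toy(toks: List[Tok]) -> List[Tuple[str, str, str]]:
    triples = []
    for t in toks:
        verb = toks[t.head]
        if t.dep in ("nsubj", "nsubjpass") and verb.dep == "ROOT":
            for c in children(toks, t.head):
                if toks[c].dep == "dobj":  # extension: direct object of an active verb
                    triples.append((t.text, verb.text, phrase(toks, c)))
                elif toks[c].dep in ("prep", "agent"):  # same rule as extract_relations
                    for o in children(toks, c):
                        if toks[o].dep == "pobj":
                            triples.append((t.text, verb.text, phrase(toks, o)))
    return triples


# hand-written parse of "Apple was founded by Steve Jobs ."
apple = [Tok("Apple", "nsubjpass", 2), Tok("was", "auxpass", 2), Tok("founded", "ROOT", 2),
         Tok("by", "agent", 2), Tok("Steve", "compound", 5), Tok("Jobs", "pobj", 3),
         Tok(".", "punct", 2)]
# hand-written parse of "Google acquired YouTube in 2006 ."
google = [Tok("Google", "nsubj", 1), Tok("acquired", "ROOT", 1), Tok("YouTube", "dobj", 1),
          Tok("in", "prep", 1), Tok("2006", "pobj", 3), Tok(".", "punct", 1)]
```

On the hand-written Google parse, the dobj branch recovers YouTube in addition to the prepositional object 2006 that extract_relations returns.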
Optional: visualizations and more
from spacy import displacy
displacy.render(nlp("Apple was founded by Steve Jobs."), style="dep", jupyter=True)
This dependency parse visualization illustrates how spaCy identifies grammatical relationships that enable relation extraction.
displacy.render(nlp("Apple was founded by Steve Jobs."), style="ent", jupyter=True)
texts = [
"Apple was founded by Steve Jobs.",
"Google acquired YouTube in 2006.",
# you can add more sentences here
]
for i, text in enumerate(texts):
    print(f"Sentence {i}: {text}")
    print("Extracted Relations:", extract_relations(text))
    print()
Sentence 0: Apple was founded by Steve Jobs.
Extracted Relations: [('Apple', 'founded', 'Steve Jobs')]
Sentence 1: Google acquired YouTube in 2006.
Extracted Relations: [('Google', 'acquired', '2006')]
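The second sentence shows a limitation of the current rules: only prep and agent children of the verb are inspected, so the direct object YouTube (dobj) is missed and the prepositional object 2006 is returned instead.
Extra rules added:
Support for both active (nsubj) and passive (nsubjpass) sentence constructions.
Detection of agents using agent and prep dependencies.
Reconstruction of full object phrases using subtree.
Mapping of extracted verbs to ontology predicates such as schema:founder (not performed by extract_relations above; the sample triples in the knowledge graph below use such predicates directly).
Default fallback using a generic ex: namespace for unmapped relations (likewise not part of extract_relations).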
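The predicate-mapping and fallback rules listed above can be sketched as a lookup table keyed by verb lemma. The table contents are assumptions (the two entries mirror the sample triples used for the knowledge graph), and PREDICATE_MAP, map_predicate, and align_triple are hypothetical names.

```python
from typing import Dict, Tuple

# hypothetical verb-lemma -> prefixed predicate table; entries mirror the sample triples
PREDICATE_MAP: Dict[str, str] = {
    "found": "schema:founder",
    "acquire": "dbo:acquisition",
}


def map_predicate(verb_lemma: str) -> str:
    """Return a prefixed ontology predicate, falling back to the generic ex: namespace."""
    key = verb_lemma.lower()
    return PREDICATE_MAP.get(key, f"ex:{key}")


def align_triple(subject: str, verb_lemma: str, obj: str) -> Tuple[str, str, str]:
    """Replace the raw verb of an extracted triple with its mapped predicate."""
    return (subject, map_predicate(verb_lemma), obj)
```

In extract_relations, passing token.head.lemma_ (for example "found", as in the preprocessing output above) instead of token.head.text would let these helpers turn ('Apple', 'found', 'Steve Jobs') into ('Apple', 'schema:founder', 'Steve Jobs').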
pip install rdflib
from rdflib import Graph, URIRef, Namespace
from rdflib.namespace import RDF
# sample aligned triples
triples = [
("Tesla", "schema:founder", "Elon Musk"),
("Google", "dbo:acquisition", "YouTube")
]
# create RDF graph
g = Graph()
# define namespaces
EX = Namespace("http://example.org/")
SCHEMA = Namespace("http://schema.org/")
DBO = Namespace("http://dbpedia.org/ontology/")
prefix_map = {
"schema": SCHEMA,
"dbo": DBO,
"ex": EX
}
# add triples to graph
for s, p, o in triples:
    prefix, pred = p.split(":")
    predicate_uri = prefix_map[prefix][pred]
    subject_uri = URIRef(EX[s.replace(" ", "_")])
    object_uri = URIRef(EX[o.replace(" ", "_")])
    g.add((subject_uri, predicate_uri, object_uri))
In this part, we construct an RDF knowledge graph by defining sample triples with namespace-prefixed predicates (for example schema:founder) and converting them into proper URIs. We use rdflib to add these triples to a graph and then serialize it in Turtle format.
print(g.serialize(format="turtle"))
@prefix ns1: <http://dbpedia.org/ontology/> .
@prefix ns2: <http://schema.org/> .

<http://example.org/Google> ns1:acquisition <http://example.org/YouTube> .

<http://example.org/Tesla> ns2:founder <http://example.org/Elon_Musk> .
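The output is a Turtle-serialized RDF graph expressing that Tesla's founder is Elon Musk and that Google acquired YouTube, with predicates taken from the DBpedia ontology and schema.org namespaces. rdflib prints these namespaces with the generated prefixes ns1 and ns2 because they were never bound to the graph; calling g.bind("dbo", DBO) and g.bind("schema", SCHEMA) before serializing should produce the more readable dbo: and schema: prefixes.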
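The graph above builds URIs by replacing spaces with underscores, which leaves characters such as "#", "?", or the curly apostrophe in "Han’s" unescaped; "#" and "?" in particular change how a URI is parsed. A minimal sketch of a stricter helper, assuming the same http://example.org/ base as EX; mint_uri and its safe-character set are illustrative choices, not an rdflib API.

```python
from urllib.parse import quote

EX_BASE = "http://example.org/"  # same base as the EX namespace above


def mint_uri(name: str, base: str = EX_BASE) -> str:
    """Turn an entity name into a URI: spaces become underscores, then anything
    outside letters, digits and the characters _.-~,()' is percent-encoded."""
    local = name.strip().replace(" ", "_")
    return base + quote(local, safe="_.-~,()'")
```

URIRef(mint_uri(s)) could then replace URIRef(EX[s.replace(" ", "_")]) when adding triples.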
To retrieve information from the RDF graph, we run a SPARQL query that extracts all predicate-object pairs for a given subject, in this case Google.
query = """
SELECT ?predicate ?object
WHERE {
<http://example.org/Google> ?predicate ?object
}
"""
for row in g.query(query):
    print(row.predicate, "-->", row.object)
http://dbpedia.org/ontology/acquisition --> http://example.org/YouTube
We define a new URI referencing Tesla’s actual resource on DBpedia to align our graph entities with real-world knowledge bases for interoperability.
tesla_uri = URIRef("http://dbpedia.org/resource/Tesla,_Inc.")
We then add a new triple to the RDF graph indicating that Tesla (from DBpedia) was founded by Elon Musk (also from DBpedia), using the schema:founder predicate for semantic consistency.
g.add((tesla_uri, SCHEMA.founder, URIRef("http://dbpedia.org/resource/Elon_Musk")))
Next, we apply the relation extraction function to the provided Star Wars text and print the extracted triples.
text = """Star Wars IV is a Movie where there are different kinds of creatures, like humans and wookies. Some creatures are Jedis; for instance, the human Luke is a Jedi, and Master Yoda — for whom the species is not known — is also a Jedi. The wookie named Chewbacca is Han’s co-pilot on the Millennium Falcon starship. The speed of Millennium Falcon is 1.5 (above the speed of light!)"""
triples = extract_relations(text)
print("Extracted Triples:")
for t in triples:
    print(t)
Extracted Triples:
('wookie', 'is', 'the Millennium Falcon starship')
First, we set up Selenium to drive Chrome in headless mode (no visible browser window).
pip install selenium beautifulsoup4 requests
Requirement already satisfied: selenium in c:\users\nejjari\anaconda3\anaconda\lib\site-packages (4.31.0) Requirement already satisfied: beautifulsoup4 in c:\users\nejjari\anaconda3\anaconda\lib\site-packages (4.12.2) Requirement already satisfied: requests in c:\users\nejjari\anaconda3\anaconda\lib\site-packages (2.31.0) Requirement already satisfied: urllib3[socks]<3,>=1.26 in c:\users\nejjari\anaconda3\anaconda\lib\site-packages (from selenium) (1.26.16) Requirement already satisfied: trio~=0.17 in c:\users\nejjari\anaconda3\anaconda\lib\site-packages (from selenium) (0.29.0) Requirement already satisfied: trio-websocket~=0.9 in c:\users\nejjari\anaconda3\anaconda\lib\site-packages (from selenium) (0.12.2) Requirement already satisfied: certifi>=2021.10.8 in c:\users\nejjari\anaconda3\anaconda\lib\site-packages (from selenium) (2023.7.22) Requirement already satisfied: typing_extensions~=4.9 in c:\users\nejjari\anaconda3\anaconda\lib\site-packages (from selenium) (4.13.2) Requirement already satisfied: websocket-client~=1.8 in c:\users\nejjari\anaconda3\anaconda\lib\site-packages (from selenium) (1.8.0) Requirement already satisfied: soupsieve>1.2 in c:\users\nejjari\anaconda3\anaconda\lib\site-packages (from beautifulsoup4) (2.4) Requirement already satisfied: charset-normalizer<4,>=2 in c:\users\nejjari\anaconda3\anaconda\lib\site-packages (from requests) (2.0.4) Requirement already satisfied: idna<4,>=2.5 in c:\users\nejjari\anaconda3\anaconda\lib\site-packages (from requests) (3.4) Requirement already satisfied: attrs>=23.2.0 in c:\users\nejjari\anaconda3\anaconda\lib\site-packages (from trio~=0.17->selenium) (25.3.0) Requirement already satisfied: sortedcontainers in c:\users\nejjari\anaconda3\anaconda\lib\site-packages (from trio~=0.17->selenium) (2.4.0) Requirement already satisfied: outcome in c:\users\nejjari\anaconda3\anaconda\lib\site-packages (from trio~=0.17->selenium) (1.3.0.post0) Requirement already satisfied: sniffio>=1.3.0 in 
c:\users\nejjari\anaconda3\anaconda\lib\site-packages (from trio~=0.17->selenium) (1.3.1) Requirement already satisfied: cffi>=1.14 in c:\users\nejjari\anaconda3\anaconda\lib\site-packages (from trio~=0.17->selenium) (1.15.1) Requirement already satisfied: wsproto>=0.14 in c:\users\nejjari\anaconda3\anaconda\lib\site-packages (from trio-websocket~=0.9->selenium) (1.2.0) Requirement already satisfied: PySocks!=1.5.7,<2.0,>=1.5.6 in c:\users\nejjari\anaconda3\anaconda\lib\site-packages (from urllib3[socks]<3,>=1.26->selenium) (1.7.1) Requirement already satisfied: pycparser in c:\users\nejjari\anaconda3\anaconda\lib\site-packages (from cffi>=1.14->trio~=0.17->selenium) (2.21) Requirement already satisfied: h11<1,>=0.9.0 in c:\users\nejjari\anaconda3\anaconda\lib\site-packages (from wsproto>=0.14->trio-websocket~=0.9->selenium) (0.14.0) Note: you may need to restart the kernel to use updated packages.
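In this code, we use Selenium and BeautifulSoup to scrape news articles from NPR's World section. The function launches Chrome, opens the section page, and parses its HTML. It then collects up to 10 unique article URLs. To do this, it selects anchor tags (<a>) inside <article> elements whose href starts with https://www.npr.org/ and contains /202 (the year segment of dated article paths), so that the links point to NPR article pages. Finally, it visits each URL and extracts the title, publication date, and body text.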
from selenium import webdriver
from selenium.webdriver.chrome.options import Options
from bs4 import BeautifulSoup
import time

def fetch_npr_articles():
    options = Options()
    options.add_argument("--headless=new")  # run Chrome without a visible window
    driver = webdriver.Chrome(options=options)
    print("✅ Driver launched. Navigating to listing page...")
    articles = []
    try:
        driver.get("https://www.npr.org/sections/world/")
        time.sleep(3)  # allow the page to load
        soup = BeautifulSoup(driver.page_source, "html.parser")
        print("✅ Section page loaded")
        # collect up to 10 unique article URLs (dated article paths contain '/202')
        links = []
        for a in soup.select("article a[href^='https://www.npr.org/']"):
            href = a['href']
            if '/202' in href and href not in links:
                links.append(href)
            if len(links) >= 10:
                break
        print(f"🔗 Collected {len(links)} article links")
        for url in links:
            try:
                print(f" Opening {url}")
                driver.get(url)
                time.sleep(3)  # wait for the article to load
                article_soup = BeautifulSoup(driver.page_source, "html.parser")
                # --- Title
                title_tag = article_soup.find("h1")
                title = title_tag.get_text(strip=True) if title_tag else "No title"
                # --- Date: datetime attribute of the first <time> element
                date_tag = article_soup.find("time")
                publication_date = date_tag.get("datetime", "Unknown") if date_tag else "Unknown"
                # --- Content: paragraphs inside the story-text container
                paragraphs = article_soup.select("div[class*='storytext'] p")
                content = "\n".join(p.get_text(strip=True) for p in paragraphs) if paragraphs else "No content"
                articles.append({
                    "title": title,
                    "url": url,
                    "publication_date": publication_date,
                    "content": content
                })
            except Exception as e:
                print(f" Error processing {url}: {e}")
    finally:
        driver.quit()  # always close the browser, even if scraping fails
    return articles
articles = fetch_npr_articles()
for i, article in enumerate(articles):
    print(f"\n Article {i+1}")
    print(f"Title: {article['title']}")
    print(f"Date: {article['publication_date']}")
    print(f"URL: {article['url']}")
    print(f"Content Preview:\n{article['content'][:300]}...\n{'-'*80}")
✅ Driver launched. Navigating to listing page... ✅ Section page loaded 🔗 Collected 10 article links Opening https://www.npr.org/2025/04/21/g-s1-61930/share-pope-francis-memories Opening https://www.npr.org/2025/04/21/g-s1-61719/pope-francis-death-world-reacts Opening https://www.npr.org/2025/04/21/g-s1-61909/pope-death-funeral-conclave-timeline Opening https://www.npr.org/2025/04/21/g-s1-61668/china-tariffs-trump-trade Opening https://www.npr.org/2025/04/21/g-s1-61662/kevin-farrell-camerlengo-vatican-pope Opening https://www.npr.org/2025/04/21/g-s1-61636/trump-pope-francis Opening https://www.npr.org/2025/04/21/g-s1-61624/argentina-milei-critic-francis-condolences Opening https://www.npr.org/2025/04/21/g-s1-61618/leaders-in-africa-mourn-the-passing-of-pope-francis Opening https://www.npr.org/2025/04/21/g-s1-61597/up-first-newsletter-pope-francis-dies-house-democrats-el-salvador Opening https://www.npr.org/2025/04/21/nx-s1-5304054/conclave-pope-chosen-francis-dies-white-black-smoke Article 1 Title: Do you have memories of Pope Francis to share? Send them our way Date: 2025-04-21T14:25:05-04:00 URL: https://www.npr.org/2025/04/21/g-s1-61930/share-pope-francis-memories Content Preview: Pope Francis drives through the crowds during the Inauguration Mass for the Pope in St. Peter's Square on March 19, 2013, in Vatican City, Vatican. The mass was held in front of an expected crowd of up to one million pilgrims and faithful who filled the square and the surrounding streets to see the ... 
-------------------------------------------------------------------------------- Article 2 Title: Pope Francis is remembered around the world for his generosity of spirit Date: 2025-04-21T14:05:36-04:00 URL: https://www.npr.org/2025/04/21/g-s1-61719/pope-francis-death-world-reacts Content Preview: People attend an interfaith memorial meeting to mourn the death of Pope Francis in New Delhi, India, on Monday.Imtiyaz Khan/Anadolu via Getty Imageshide caption Catholics across the globe are mourningthe death of Pope Francis, remembering him for his humility, generosity of spirit, concern for the p... -------------------------------------------------------------------------------- Article 3 Title: What happens next after a pope dies, according to recent history Date: 2025-04-21T13:14:58-04:00 URL: https://www.npr.org/2025/04/21/g-s1-61909/pope-death-funeral-conclave-timeline Content Preview: The funeral of Pope John Paul II at Saint Peter's Basilica in Rome, Italy on April 8, 2005.Eric Vandeville/Gamma-Rapho via Getty Imageshide caption This is a developing story. For more of our coverage head toour latest updates. Pope Francis' death on Monday sets in motion weeks-long series of events... -------------------------------------------------------------------------------- Article 4 Title: China warns of 'countermeasures' against any deals that harm its interests Date: 2025-04-21T11:08:30-04:00 URL: https://www.npr.org/2025/04/21/g-s1-61668/china-tariffs-trump-trade Content Preview: People walk past a screen showing the CSI 300 Index at a shopping mall in Guangzhou, in southern China's Guangdong province.Jade Gao/AFP via Getty Imageshide caption As the Trump administration negotiates trade deals with other countries, China has issued a warning against any agreements that harm i... -------------------------------------------------------------------------------- Article 5 Title: Who is Cardinal Kevin Farrell, the acting head of the Vatican? 
Date: 2025-04-21T10:34:12-04:00 URL: https://www.npr.org/2025/04/21/g-s1-61662/kevin-farrell-camerlengo-vatican-pope Content Preview: Cardinal Kevin Farrell, Camerlengo of the Apostolic Chamber, announced the death of Pope Francis from the Casa Santa Marta in Vatican City on Monday.Vatican Pool/Getty Imageshide caption Cardinal Kevin Farrell, who announcedPope Francis' deathon Monday morning, is now the acting head of the Vatican ... -------------------------------------------------------------------------------- Article 6 Title: A brief history of Trump's feud with Pope Francis Date: 2025-04-21T09:13:28-04:00 URL: https://www.npr.org/2025/04/21/g-s1-61636/trump-pope-francis Content Preview: Pope Francis exchanges gifts with US President Donald Trump (C) and US First Lady Melania Trump during a private audience at the Vatican on May 24, 2017. US President Donald Trump met Pope Francis at the Vatican today in a keenly-anticipated first face-to-face encounter between two world leaders who... -------------------------------------------------------------------------------- Article 7 Title: Argentina's president, a former critic of Pope Francis, offers his condolences Date: 2025-04-21T08:32:37-04:00 URL: https://www.npr.org/2025/04/21/g-s1-61624/argentina-milei-critic-francis-condolences Content Preview: Pope Francis meets with newly elected Argentinian President Javier Milei before a Canonization Ceremony in St. Peter's Basilica on Feb. 11, 2024 in Vatican City, Vatican.Vatican Pool/Getty Imageshide caption Argentina's president sent profound condolences to the family of Pope Francis and to all Cat... 
-------------------------------------------------------------------------------- Article 8 Title: Leaders in Africa mourn the passing of Pope Francis Date: 2025-04-21T08:07:03-04:00 URL: https://www.npr.org/2025/04/21/g-s1-61618/leaders-in-africa-mourn-the-passing-of-pope-francis Content Preview: Pope Francis meets with president of Kenya William Samoei Ruto during the G7 Leaders Summit on day two of the 50th G7 summit at Borgo Egnazia on June 14, 2024 in Fasano, Italy.Vatican Media via Vatican Pool/Getty Imageshide caption On Monday,Kenyan PresidentWilliam Rutoposted on X that Francis "exem... -------------------------------------------------------------------------------- Article 9 Title: Pope Francis dies at 88. And, House Democrats press for Abrego Garcia's return Date: 2025-04-21T07:19:41-04:00 URL: https://www.npr.org/2025/04/21/g-s1-61597/up-first-newsletter-pope-francis-dies-house-democrats-el-salvador Content Preview: Good morning. You're reading the Up First newsletter.Subscribehere to get it delivered to your inbox, andlistento the Up First podcast for all the news you need to start your day. Pope Francis died on Easter Monday at the age of 88.He was the first non-European head of the Roman Catholic Church in o... -------------------------------------------------------------------------------- Article 10 Title: Who will be the next pope? Here's how the conclave works Date: 2025-04-21T06:38:53-04:00 URL: https://www.npr.org/2025/04/21/nx-s1-5304054/conclave-pope-chosen-francis-dies-white-black-smoke Content Preview: White smoke billows from a chimney at the Sistine Chapel, signaling that cardinal electors have chosen a new pope — Pope Francis — on March 13, 2013.Jeff J Mitchell/Getty Imageshide caption Pope Francis has died at 88. For more of our coverage head toour latest updates. The white smoke is famous. Wh... --------------------------------------------------------------------------------
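The link-selection step of fetch_npr_articles does not depend on the browser, so it can be checked offline. The sketch below replaces BeautifulSoup's CSS selector article a[href^='https://www.npr.org/'] with the standard library's html.parser. It applies the same filters:
- the URL is an absolute NPR URL;
- it contains a /202 date segment;
- it is not a duplicate;
- no more than 10 links are kept.

The class, the helper names and the sample HTML are hypothetical. The sample is far simpler than NPR's real markup.

```python
from html.parser import HTMLParser

class ArticleLinkParser(HTMLParser):
    """Collect href values of <a> tags nested inside <article> elements."""

    def __init__(self):
        super().__init__()
        self.article_depth = 0  # greater than 0 while inside an <article>
        self.hrefs = []

    def handle_starttag(self, tag, attrs):
        if tag == "article":
            self.article_depth += 1
        elif tag == "a" and self.article_depth > 0:
            href = dict(attrs).get("href")
            if href:
                self.hrefs.append(href)

    def handle_endtag(self, tag):
        if tag == "article" and self.article_depth > 0:
            self.article_depth -= 1

def select_article_links(html, prefix="https://www.npr.org/", limit=10):
    """Same filters as fetch_npr_articles: NPR prefix, '/202' in the path, unique, capped."""
    parser = ArticleLinkParser()
    parser.feed(html)
    links = []
    for href in parser.hrefs:
        if href.startswith(prefix) and "/202" in href and href not in links:
            links.append(href)
        if len(links) >= limit:
            break
    return links

sample_html = """
<a href="https://www.npr.org/sections/world/">World</a>
<article>
  <a href="https://www.npr.org/2025/04/21/g-s1-61930/share-pope-francis-memories">Story</a>
  <a href="https://www.npr.org/2025/04/21/g-s1-61930/share-pope-francis-memories">Photo</a>
  <a href="https://www.npr.org/podcasts/510318/up-first">Podcast</a>
</article>
"""
print(select_article_links(sample_html))
# ['https://www.npr.org/2025/04/21/g-s1-61930/share-pope-francis-memories']
```

The section link outside <article>, the duplicate, and the undated podcast link are all dropped.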
First, we define a preprocessing pipeline that cleans text by lowercasing, removing digits and punctuation, filtering out stopwords, and applying lemmatization using NLTK.
import re
import string
import nltk
from nltk.corpus import stopwords
from nltk.stem import WordNetLemmatizer
# downloading necessary nltk resources
nltk.download('punkt')
nltk.download('stopwords')
nltk.download('wordnet')
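# build the stopword set and the lemmatizer once instead of on every call/token
STOP_WORDS = set(stopwords.words('english'))
lemmatizer = WordNetLemmatizer()

def preprocess_text(text):
    text = text.lower()                                               # lowercase
    text = re.sub(r'\d+', '', text)                                   # remove digits
    text = text.translate(str.maketrans('', '', string.punctuation))  # remove punctuation
    tokens = nltk.word_tokenize(text)                                 # tokenize
    tokens = [word for word in tokens if word not in STOP_WORDS]      # drop stopwords
    tokens = [lemmatizer.lemmatize(word) for word in tokens]          # lemmatize (treated as nouns by default)
    return ' '.join(tokens)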
[nltk_data] Downloading package punkt to [nltk_data] C:\Users\Nejjari\AppData\Roaming\nltk_data... [nltk_data] Package punkt is already up-to-date! [nltk_data] Downloading package stopwords to [nltk_data] C:\Users\Nejjari\AppData\Roaming\nltk_data... [nltk_data] Package stopwords is already up-to-date! [nltk_data] Downloading package wordnet to [nltk_data] C:\Users\Nejjari\AppData\Roaming\nltk_data... [nltk_data] Package wordnet is already up-to-date!
# applying preprocessing
for article in articles:
    article["cleaned_title"] = preprocess_text(article.get("title", ""))
    article["cleaned_content"] = preprocess_text(article.get("content", ""))
# displaying the results as an example for the first few articles
for i, article in enumerate(articles[:3], 1):  # preview first 3
    print(f"\n📰 Article {i}")
    print("🔹 Original Title :", article.get("title", ""))
    print("✅ Cleaned Title :", article["cleaned_title"])
    print("📄 Original Content :", article.get("content", "")[:300])
    print("✅ Cleaned Content :", article["cleaned_content"][:300])
    print("📅 Date :", article.get("publication_date", ""))
    print("🔗 URL :", article.get("url", ""))
    print("-" * 100)
📰 Article 1 🔹 Original Title : Do you have memories of Pope Francis to share? Send them our way ✅ Cleaned Title : memory pope francis share send way 📄 Original Content : Pope Francis drives through the crowds during the Inauguration Mass for the Pope in St. Peter's Square on March 19, 2013, in Vatican City, Vatican. The mass was held in front of an expected crowd of up to one million pilgrims and faithful who filled the square and the surrounding streets to see the ✅ Cleaned Content : pope francis drive crowd inauguration mass pope st peter square march vatican city vatican mass held front expected crowd one million pilgrim faithful filled square surrounding street see former cardinal buenos aire officially take role pontiffspencer plattgetty imageshide caption wed love hear refl 📅 Date : 2025-04-21T14:25:05-04:00 🔗 URL : https://www.npr.org/2025/04/21/g-s1-61930/share-pope-francis-memories ---------------------------------------------------------------------------------------------------- 📰 Article 2 🔹 Original Title : Pope Francis is remembered around the world for his generosity of spirit ✅ Cleaned Title : pope francis remembered around world generosity spirit 📄 Original Content : People attend an interfaith memorial meeting to mourn the death of Pope Francis in New Delhi, India, on Monday.Imtiyaz Khan/Anadolu via Getty Imageshide caption Catholics across the globe are mourningthe death of Pope Francis, remembering him for his humility, generosity of spirit, concern for the p ✅ Cleaned Content : people attend interfaith memorial meeting mourn death pope francis new delhi india mondayimtiyaz khananadolu via getty imageshide caption catholic across globe mourningthe death pope francis remembering humility generosity spirit concern poor steadfast effort restore trust church year scandal franci 📅 Date : 2025-04-21T14:05:36-04:00 🔗 URL : https://www.npr.org/2025/04/21/g-s1-61719/pope-francis-death-world-reacts 
---------------------------------------------------------------------------------------------------- 📰 Article 3 🔹 Original Title : What happens next after a pope dies, according to recent history ✅ Cleaned Title : happens next pope dy according recent history 📄 Original Content : The funeral of Pope John Paul II at Saint Peter's Basilica in Rome, Italy on April 8, 2005.Eric Vandeville/Gamma-Rapho via Getty Imageshide caption This is a developing story. For more of our coverage head toour latest updates. Pope Francis' death on Monday sets in motion weeks-long series of events ✅ Cleaned Content : funeral pope john paul ii saint peter basilica rome italy april eric vandevillegammarapho via getty imageshide caption developing story coverage head toour latest update pope francis death monday set motion weekslong series event period mourning process selecting successor vatican intricate set rule 📅 Date : 2025-04-21T13:14:58-04:00 🔗 URL : https://www.npr.org/2025/04/21/g-s1-61909/pope-death-funeral-conclave-timeline ----------------------------------------------------------------------------------------------------
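In the cleaned content above, tokens such as mondayimtiyaz and khananadolu appear because the scraped photo captions run straight into the article text (Monday.Imtiyaz Khan/Anadolu). preprocess_text deletes punctuation rather than splitting words at it, so the neighbouring words fuse.

The sketch below compares deletion with replacing punctuation by spaces on that caption fragment. Replacing is an alternative we suggest, not part of the original pipeline. It has its own trade-off: it splits abbreviations such as U.S. into single letters.

```python
import re
import string

# Caption fragment from Article 2's scraped content (shown above)
raw = "in New Delhi, India, on Monday.Imtiyaz Khan/Anadolu via Getty Imageshide caption"

# As in preprocess_text: delete punctuation characters outright
deleted = raw.lower().translate(str.maketrans("", "", string.punctuation))

# Alternative: map each punctuation character to a space, then collapse whitespace
to_space = str.maketrans(string.punctuation, " " * len(string.punctuation))
spaced = re.sub(r"\s+", " ", raw.lower().translate(to_space)).strip()

print(deleted)  # in new delhi india on mondayimtiyaz khananadolu via getty imageshide caption
print(spaced)   # in new delhi india on monday imtiyaz khan anadolu via getty imageshide caption
```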
We then load the custom-trained en_ner_conll03 model using spaCy, define an entity extraction function, and apply it to each article's content to extract named entities.
import spacy

# loading the custom-trained NER model
ner_model_path = r"C:\Users\Nejjari\Documents\WebData Project\en_ner_conll03\best_ner_model"
nlp_ner = spacy.load(ner_model_path)
# defining a named entity extraction function
def extract_entities(text):
    doc = nlp_ner(text)
    return [(ent.text, ent.label_) for ent in doc.ents]
# applying entity extraction to each article's original (uncleaned) content,
# since the NER model relies on casing and punctuation
for article in articles:
    article["entities"] = extract_entities(article.get("content", ""))
# showing the result from one article
print("🔍 Named Entities in Article 1:\n")
for entity, label in articles[0]["entities"]:
    print(f" - {entity} [{label}]")
C:\Users\Nejjari\anaconda3\Anaconda\Lib\site-packages\spacy\util.py:910: UserWarning: [W095] Model 'en_pipeline' (0.0.0) was trained with spaCy v3.7.5 and may not be 100% compatible with the current version (3.8.4). If you see errors or degraded performance, download a newer compatible model or retrain your custom model with the current spaCy version. For more details and available updates, run: python -m spacy validate warnings.warn(warn_msg)
🔍 Named Entities in Article 1: - Pope Francis [PER] - Inauguration Mass [ORG] - Pope [LOC] - St. Peter [LOC] - Vatican City [LOC] - Vatican [LOC] - Cardinal of Buenos Aires [ORG] - Spencer Platt [PER] - Getty Imageshide [PER] - Pope Francis [PER]
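The output lists the named entities found in the first article, each with its CoNLL-03 label (PER, LOC, ORG, or MISC), for easy inspection and validation. Some labels are questionable, for example “Pope” tagged as LOC and the photo-credit residue “Getty Imageshide” tagged as PER.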
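To make this inspection easier on longer articles, the pairs can be grouped by label with duplicates removed. Below is a minimal plain-Python sketch that uses the Article 1 output above as its data. group_by_label is a hypothetical helper; in the notebook it would be called on articles[0]["entities"].

```python
from collections import Counter, defaultdict

# The (text, label) pairs printed above for Article 1
entities = [
    ("Pope Francis", "PER"), ("Inauguration Mass", "ORG"), ("Pope", "LOC"),
    ("St. Peter", "LOC"), ("Vatican City", "LOC"), ("Vatican", "LOC"),
    ("Cardinal of Buenos Aires", "ORG"), ("Spencer Platt", "PER"),
    ("Getty Imageshide", "PER"), ("Pope Francis", "PER"),
]

def group_by_label(pairs):
    """Map each label to its distinct entity texts, in first-seen order."""
    groups = defaultdict(list)
    for text, label in pairs:
        if text not in groups[label]:
            groups[label].append(text)
    return dict(groups)

label_counts = Counter(label for _, label in entities)  # mentions per label, duplicates included
for label, texts in group_by_label(entities).items():
    print(f"{label} ({label_counts[label]} mentions): {', '.join(texts)}")
```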
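Below is a function that uses spaCy's dependency parser (en_core_web_sm) to extract subject–verb–object triples from text. For each sentence root that is a verb, it inspects the verb's children to find:
- the subject: nsubj or nsubjpass;
- the object: dobj, attr, or the object of a preposition.

It keeps a triple only when both the subject and the object can be matched to named entities. These entities are recognized by en_core_web_sm itself, not by the custom CoNLL-03 model loaded above.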
# load spaCy's small English model for dependency parsing
nlp_re = spacy.load("en_core_web_sm")
# define relation extraction function
def extract_relations(text):
    doc = nlp_re(text)
    relations = []
    for token in doc:
        # identifying the main verb (root of the sentence)
        if token.pos_ == "VERB" and token.dep_ == "ROOT":
            subject, object_ = None, None
            # checking verb's children for subject or object
            for child in token.children:
                if child.dep_ in ("nsubj", "nsubjpass"):
                    subject = child
                elif child.dep_ in ("dobj", "attr"):
                    object_ = child
                elif child.dep_ == "prep":
                    for pobj in child.children:
                        if pobj.dep_ == "pobj":
                            object_ = pobj
            # filtering only if both subject and object are named entities
            subj_ent = next((ent for ent in doc.ents if subject and subject.text in ent.text), None)
            obj_ent = next((ent for ent in doc.ents if object_ and object_.text in ent.text), None)
            if subj_ent and obj_ent:
                relations.append((subj_ent.text, token.lemma_, obj_ent.text))
    return relations
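The entity filter above, subject.text in ent.text, is a substring test on strings. It returns the first entity whose text merely contains the token's characters, even if that entity sits elsewhere in the article. For example, the pronoun it would match Vatican City, because City contains it.

A position-based check avoids this. In spaCy terms, it keeps the entity whose span satisfies ent.start <= token.i < ent.end. The sketch below shows the difference using hand-written, hypothetical tokens and entity spans instead of a real spaCy Doc.

```python
# Hand-written, hypothetical stand-ins for a spaCy Doc: token texts, plus entity
# spans as (start_token, end_token_exclusive) pairs, like Span.start / Span.end.
tokens = ["The", "mass", "was", "held", "in", "Vatican", "City", ";", "it", "drew", "pilgrims"]
entity_spans = [(5, 7)]  # "Vatican City"

def span_text(start, end):
    return " ".join(tokens[start:end])

def entity_by_substring(i):
    """Mimics extract_relations: first entity whose text contains the token text."""
    return next((span_text(s, e) for s, e in entity_spans if tokens[i] in span_text(s, e)), None)

def entity_by_position(i):
    """Returns the entity whose token span actually contains token i, if any."""
    return next((span_text(s, e) for s, e in entity_spans if s <= i < e), None)

it_index = tokens.index("it")
print(entity_by_substring(it_index))  # Vatican City ('it' occurs inside 'City')
print(entity_by_position(it_index))   # None
```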
In the cell below, the code iterates through all articles and applies the relation extraction function defined above. Extracted relations are added to each article under the "relations" key for further use or display.
# add extracted relations to each article
for article in articles:
    article["relations"] = extract_relations(article.get("content", ""))
Example:
print("🔗 Relations found in Article 1:\n")
for rel in articles[0]["relations"]:
    print(f"({rel[0]}) ---[{rel[1]}]---> ({rel[2]})")
🔗 Relations found in Article 1: (Francis) ---[drive]---> (Vatican City)
For better understanding, we use spaCy's displacy visualizer to show the syntactic structure of the first article's content, which helps explain how the relations were extracted from the dependency parse.
from spacy import displacy
# visualize dependency structure for article 1 in Jupyter
doc = nlp_re(articles[0]["content"])
displacy.render(doc, style="dep", jupyter=True)
In the diagram, each word is labeled with a part-of-speech (POS) tag, and each arrow is labeled with a dependency relation.
Among the POS tags, PROPN stands for proper noun (like the names “Pope”, “Francis”, “US”), NOUN for common noun (like “gifts”), and ADP for adposition (prepositions like “with”). Among the arrow labels, nsubj marks a nominal subject.
Arrows labeled compound show how words combine into a single phrase.
Here, “Pope Francis” is treated as one unit, and “US” modifies “gifts”. The structure shows that “Pope Francis” is the subject and “exchanges” is the main verb. “gifts”, together with the prepositional phrase “with US …”, is the object.
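The way extract_relations reads the parse can be mimicked on a hand-written dependency structure. The sketch below encodes the clause “Pope Francis exchanges gifts” as (text, dependency, head index) tuples. This is a hypothetical, simplified stand-in for spaCy's output.

The sketch finds the ROOT verb and its nsubj and dobj children. It then attaches compound modifiers, so that “Pope Francis” comes out as one unit. Unlike extract_relations, which keeps a single child token and then looks up a named entity, this sketch expands compounds directly.

```python
# Hypothetical, simplified parse of the clause "Pope Francis exchanges gifts":
# each token is (text, dependency label, index of its head); the root points to itself.
parse = [
    ("Pope", "compound", 1),
    ("Francis", "nsubj", 2),
    ("exchanges", "ROOT", 2),
    ("gifts", "dobj", 2),
]

def children(parse, i):
    """Indices of tokens whose head is token i (excluding i itself)."""
    return [j for j, (_, _, head) in enumerate(parse) if head == i and j != i]

def phrase(parse, i):
    """Token i plus its compound modifiers, in sentence order."""
    idx = sorted([j for j in children(parse, i) if parse[j][1] == "compound"] + [i])
    return " ".join(parse[j][0] for j in idx)

def svo(parse):
    """Return (subject phrase, root verb, object phrase) for the ROOT verb."""
    root = next(i for i, (_, dep, _) in enumerate(parse) if dep == "ROOT")
    kids = children(parse, root)
    subj = next((phrase(parse, j) for j in kids if parse[j][1] in ("nsubj", "nsubjpass")), None)
    obj = next((phrase(parse, j) for j in kids if parse[j][1] in ("dobj", "attr")), None)
    return subj, parse[root][0], obj

print(svo(parse))  # ('Pope Francis', 'exchanges', 'gifts')
```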
We initialize the RDF graph construction using rdflib. It defines a helper function to sanitize entity names for URIs, sets up namespaces, and begins creating RDF triples for each article. The graph includes metadata such as title, content, source URL, and publication date.
The loop continues by creating RDF triples for extracted named entities and their types, and for relations found in the articles (subject–predicate–object). Each piece of data is formatted using the defined clean_uri() function.
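Two per-value choices in the graph-building loop are worth checking in isolation.

The first is the date test date_str.count("-") == 2. It never succeeds for NPR timestamps such as 2025-04-21T14:25:05-04:00, which contain three hyphens (two in the date, one in the UTC offset). As a result, dates are typed as xsd:string in the Turtle output.

The second is clean_uri. It maps every character outside a–z, A–Z, 0–9 and _ to an underscore, so different spellings can collapse to the same URI. Because entity URIs depend only on the text, one name can also receive several rdf:type values (ex:Pope is typed both LOC and PER).

The sketch below reproduces clean_uri and suggests an alternative date check. date_datatype is a hypothetical helper. In the graph code, its xsd:dateTime result would correspond to datatype=XSD.dateTime.

```python
import re
from datetime import datetime

def clean_uri(value):
    """Same steps as the notebook's clean_uri helper."""
    value = value.strip()
    value = re.sub(r'[^a-zA-Z0-9_]', '_', value)  # every other character -> '_'
    value = re.sub(r'_+', '_', value)             # collapse runs of '_'
    return value.strip('_')                       # trim leading/trailing '_'

def date_datatype(value):
    """Hypothetical helper: choose an XSD datatype name for a scraped date string."""
    try:
        datetime.fromisoformat(value)  # accepts UTC offsets such as -04:00 (Python 3.7+)
    except ValueError:
        return "xsd:string"            # e.g. the "Unknown" fallback from the scraper
    return "xsd:dateTime" if "T" in value else "xsd:date"

timestamp = "2025-04-21T14:25:05-04:00"  # publication_date of Article 1
print(timestamp.count("-"))              # 3, so the notebook's == 2 test fails
for value in (timestamp, "2025-04-21", "Unknown"):
    print(value, "->", date_datatype(value))

for name in ("St. Peter", "U.S.", "U S"):
    print(repr(name), "->", clean_uri(name))  # 'U.S.' and 'U S' both become U_S
```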
from rdflib import Graph, URIRef, Namespace, RDF, Literal
from rdflib.namespace import XSD
import re

# helper to sanitize text for URIs
def clean_uri(value):
    value = value.strip()
    value = re.sub(r'[^a-zA-Z0-9_]', '_', value)  # replace every non-alphanumeric character with '_'
    value = re.sub(r'_+', '_', value)             # collapse repeated underscores
    return value.strip('_')                       # trim leading/trailing underscores

# namespaces
EX = Namespace("http://example.org/")
NPR = Namespace("https://www.npr.org/article/")
g = Graph()
g.bind("ex", EX)
g.bind("npr", NPR)

# building graph
for i, article in enumerate(articles, 1):
    # article metadata
    article_uri = URIRef(NPR[f"article_{i}"])
    g.add((article_uri, RDF.type, EX.Article))
    g.add((article_uri, EX.title, Literal(article.get("title", ""), datatype=XSD.string)))
    g.add((article_uri, EX.content, Literal(article.get("content", ""), datatype=XSD.string)))
    g.add((article_uri, EX.sourceURL, Literal(article.get("url", ""), datatype=XSD.anyURI)))
    # full timestamps such as 2025-04-21T14:25:05-04:00 have three hyphens,
    # so they fail this test and are stored as xsd:string
    date_str = article.get("publication_date", "")
    if date_str.count("-") == 2:
        g.add((article_uri, EX.date, Literal(date_str, datatype=XSD.date)))
    else:
        g.add((article_uri, EX.date, Literal(date_str, datatype=XSD.string)))
    # entities: one rdf:type triple per (entity text, label) pair
    for ent_text, ent_type in sorted(article.get("entities", [])):
        ent_uri = URIRef(EX[clean_uri(ent_text)])
        ent_class = URIRef(EX[clean_uri(ent_type)])
        g.add((ent_uri, RDF.type, ent_class))
    # relations: subject --predicate--> object
    for subj, pred, obj in article.get("relations", []):
        subj_uri = URIRef(EX[clean_uri(subj)])
        pred_uri = URIRef(EX[clean_uri(pred)])
        obj_uri = URIRef(EX[clean_uri(obj)])
        g.add((subj_uri, pred_uri, obj_uri))

# output graph in Turtle format
ttl_output = g.serialize(format="turtle")
print(ttl_output)
@prefix ex: <http://example.org/> .
@prefix npr: <https://www.npr.org/article/> .
@prefix xsd: <http://www.w3.org/2001/XMLSchema#> .
ex:115 ex:vote ex:April_2005at_age_84 .
ex:AFP_via_Getty_Imageshide a ex:ORG .
ex:AP_APhide a ex:ORG .
ex:AP_Photo a ex:ORG .
ex:AP_Photo_ASSOCIATED a ex:ORG .
ex:Abdul_Latif_Rashid a ex:PER .
ex:Accordingto_Britannica a ex:PER .
ex:Africa a ex:LOC .
ex:African a ex:MISC .
ex:AfterExtra a ex:ORG .
ex:Ahmedalso a ex:PER .
ex:Alessandra_Tarantino a ex:PER .
ex:Alessia_Pierdomenico a ex:LOC .
ex:America a ex:LOC .
ex:American a ex:MISC .
ex:Anadolu a ex:LOC .
ex:Andreas_Solaro a ex:PER .
ex:Andrew_Medichini_Andrew a ex:PER .
ex:Anthony_Kuhn a ex:PER .
ex:Apostolic_Camera a ex:ORG .
ex:Apostolic_Chamber a ex:MISC .
ex:Apostolic_Palace a ex:ORG .
ex:Archdiocese_of_Washington a ex:ORG .
ex:Argentina a ex:LOC .
ex:Argentine a ex:MISC .
ex:Argentinian a ex:MISC .
ex:Arthur a ex:PER ;
ex:encounter ex:The_Black_Knight .
ex:Asia a ex:LOC .
ex:Associated_Pressthat_Francis a ex:ORG .
ex:Association_of_United_States_Catholic_Priests a ex:ORG .
ex:Basilica a ex:LOC .
ex:Beijing a ex:LOC .
ex:Benedict_XVI a ex:ORG .
ex:Bianca_Lott a ex:PER ;
ex:study ex:her_spring_semester .
ex:Bishop a ex:PER .
ex:Black_Knight a ex:PER .
ex:Bloomberg a ex:LOC .
ex:Borgo_Egnazia a ex:LOC .
ex:Britannica a ex:LOC .
ex:Bry_Jensen a ex:PER ;
ex:tell ex:NPR .
ex:Buenos_Aires a ex:LOC .
ex:CSI_300 a ex:MISC .
ex:Camerlengo a ex:PER .
ex:Canon a ex:PER .
ex:Canon_Law a ex:PER .
ex:Canonization_Ceremony a ex:ORG .
ex:Cardinal_of_Buenos_Aires a ex:ORG .
ex:Cardinals a ex:ORG .
ex:Casa_Santa_Marta a ex:ORG,
ex:PER .
ex:Catholic a ex:MISC .
ex:Catholic_Charitable_Organizations a ex:ORG .
ex:Catholic_Church a ex:ORG .
ex:Catholic_University_of_America a ex:ORG .
ex:Catholics a ex:MISC .
ex:China a ex:LOC .
ex:Chinese a ex:MISC .
ex:Christian a ex:MISC .
ex:Christians a ex:MISC ;
ex:move ex:Gaza .
ex:Church a ex:LOC .
ex:Clarin a ex:PER .
ex:College_of_Cardinals a ex:ORG .
ex:Commission_for_Confidential_Matters a ex:ORG .
ex:Conclaves a ex:PER .
ex:Cristina_Sille a ex:PER .
ex:Dallas a ex:LOC .
ex:Dennis a ex:PER .
ex:Dicastery_for_Laity a ex:ORG .
ex:Divine_Spirit a ex:PER .
ex:Donald_Trump a ex:PER ;
ex:clash ex:recent_years ;
ex:praise ex:2013 .
ex:Doug_Rand a ex:PER .
ex:Edward_J_Weisenburger a ex:PER .
ex:El_Salvador a ex:LOC .
ex:Ethiopia a ex:LOC .
ex:Eugenio_Pacelli a ex:PER .
ex:Europe a ex:LOC .
ex:European a ex:MISC .
ex:Extra a ex:MISC .
ex:Family a ex:LOC .
ex:Farrell a ex:LOC,
ex:PER .
ex:Fasano a ex:LOC .
ex:Four_House_Democrats a ex:MISC .
ex:Francis a ex:PER ;
ex:bear ex:Jorge_Bergoglio ;
ex:choose ex:March_2013 ;
ex:die ex:88,
ex:Easter_Sunday,
ex:the_age_of_88_He ;
ex:drive ex:Vatican_City ;
ex:meet ex:a_Canonization_Ceremony,
ex:the_G7_Leaders_Summit,
ex:withLee_Yong_soo ;
ex:preside ex:Vatican_City ;
ex:write ex:Santa_Maria_Maggiore,
ex:his2025 .
ex:Francisroundly a ex:ORG .
ex:Franco_Origlia a ex:ORG .
ex:French a ex:MISC .
ex:Gabriel_Romanelli a ex:PER .
ex:Gallatin_Gateway a ex:PER .
ex:Gaza_City a ex:LOC .
ex:George_Antone a ex:PER .
ex:Germany a ex:LOC .
ex:Getty_Imageshide a ex:PER .
ex:Gioacchino_Pecci a ex:PER .
ex:Giovanni_Battistaannounced a ex:PER ;
ex:battistaannounce ex:tens_of_thousands .
ex:Global_South a ex:MISC .
ex:God a ex:PER .
ex:God_Bless a ex:PER .
ex:Graham_Chapman a ex:PER .
ex:Grand_Ayatollah_Ali a ex:PER .
ex:Gregg_Gassman a ex:PER .
ex:Guangdong a ex:LOC .
ex:Guangzhou a ex:LOC .
ex:Habemus_Papam a ex:ORG .
ex:Hatciri_Lopez a ex:PER .
ex:Here a ex:PER .
ex:Holy_Family_Church a ex:ORG .
ex:Holy_Father a ex:ORG .
ex:Holy_Gospels a ex:ORG .
ex:Holy_Grail_Fathom_Entertainmenthide a ex:ORG .
ex:Holy_Seeuntil a ex:ORG .
ex:Holy_Spirit a ex:ORG .
ex:ISIS a ex:LOC,
ex:ORG .
ex:Ididsponsor a ex:ORG .
ex:Imtiyaz_Khan a ex:PER .
ex:Inauguration_Mass a ex:ORG .
ex:India a ex:LOC .
ex:Iraq a ex:LOC .
ex:Iraqi a ex:MISC .
ex:Ireland a ex:LOC .
ex:Irish a ex:MISC .
ex:Islam a ex:MISC .
ex:Israel a ex:LOC .
ex:Italian a ex:MISC .
ex:Italy a ex:LOC .
ex:J_Mitchell a ex:PER .
ex:Jade_Gao a ex:PER .
ex:Jane_Arraf a ex:PER .
ex:Japanese a ex:MISC .
ex:Javier_Milei a ex:PER .
ex:Jensen a ex:PER .
ex:Jesus a ex:PER .
ex:Joe_Raedle a ex:PER .
ex:John a ex:PER .
ex:John_Paul a ex:PER .
ex:John_Paul_II a ex:PER ;
ex:appoint ex:2001 ;
ex:die ex:Easter .
ex:Johnston_County a ex:PER .
ex:Jorge a ex:PER .
ex:Jorge_Mario_Bergoglio a ex:PER .
ex:Joseph_Ratzinger a ex:PER ;
ex:announce ex:April_8 .
ex:Judaism a ex:LOC .
ex:Kenya_William_Samoei_Ruto a ex:ORG .
ex:Kenyan_PresidentWilliam_Rutoposted a ex:PER .
ex:Kevin_Farrell a ex:PER ;
ex:announce ex:Monday ;
ex:bear ex:Dublin ;
ex:criticize ex:2018 .
ex:Kilmar_Abrego_Garcia a ex:PER .
ex:Kurt_Martens a ex:PER .
ex:LGBT_Catholics_Westminster a ex:ORG .
ex:Larry_Hogancalled a ex:PER .
ex:Latin a ex:LOC .
ex:Latin_America a ex:LOC .
ex:Latin_American a ex:MISC .
ex:Latin_for a ex:MISC .
ex:Latinos a ex:LOC .
ex:Lee a ex:PER .
ex:Life a ex:ORG .
ex:Lisa_Maree_Williams a ex:PER .
ex:Lord_for_the a ex:ORG .
ex:Lori a ex:PER .
ex:Manila_Cathedral a ex:LOC .
ex:Many_Africans a ex:MISC .
ex:Martens a ex:PER .
ex:Martin_Pendergast a ex:PER ;
ex:tell ex:London .
ex:Mary_McAleese a ex:PER .
ex:Maryland a ex:LOC .
ex:Mass a ex:LOC .
ex:McCarrick a ex:ORG,
ex:PER .
ex:Mexico a ex:LOC .
ex:Milei a ex:PER .
ex:Mont a ex:PER .
ex:Muslim_Arab a ex:MISC .
ex:N_C a ex:LOC .
ex:New_Delhi a ex:LOC .
ex:Northfield a ex:LOC .
ex:Notably a ex:PER .
ex:Omar_Al a ex:PER .
ex:Opus_Dei a ex:PER .
ex:Palestinian_Christians a ex:MISC .
ex:Palestinians a ex:MISC .
ex:Papal_Basilicas a ex:PER .
ex:Papal_Swiss_Guard a ex:MISC .
ex:Patsy a ex:PER .
ex:Pauline_Chapel a ex:ORG .
ex:People_of_God a ex:ORG .
ex:Peter a ex:PER .
ex:Pius_IX a ex:ORG .
ex:Pontifacts a ex:ORG .
ex:Pontifical_Gregorian_University a ex:ORG .
ex:Pope a ex:LOC,
ex:PER .
ex:Pope_Benedict a ex:PER .
ex:Pope_Francis a ex:PER .
ex:Pope_Francisin a ex:PER .
ex:Pope_Gregory_X_The a ex:PER .
ex:Pope_John_Paul a ex:PER .
ex:Pope_John_Paul_II a ex:PER .
ex:Pope_Leo_XIII a ex:PER .
ex:Pope_Pius a ex:PER .
ex:PresidentCyril_Ramaphosasaid a ex:PER .
ex:Prophet_Abraham a ex:ORG .
ex:Python a ex:ORG .
ex:Qattaa a ex:PER .
ex:Rand a ex:PER .
ex:Rapho a ex:LOC .
ex:Rashid a ex:PER .
ex:Ratzinger a ex:PER .
ex:Reese a ex:PER .
ex:Reuters a ex:ORG .
ex:Roman a ex:MISC .
ex:Roman_Catholic_Church a ex:MISC .
ex:Roman_Catholicism a ex:MISC .
ex:Rome a ex:LOC .
ex:Ruth_Angelettia a ex:PER .
ex:Sacred_College_of_Cardinals a ex:ORG .
ex:Saint_Peter a ex:PER .
ex:Salvadoran a ex:LOC .
ex:Salvadorian a ex:MISC .
ex:Sanctae_Marthae a ex:PER .
ex:Shiite a ex:PER .
ex:Sistani a ex:PER .
ex:Sistine a ex:MISC .
ex:Sistine_Chapel a ex:ORG .
ex:South_Africa a ex:LOC .
ex:South_Korea a ex:LOC .
ex:Spain a ex:LOC .
ex:Spanish a ex:MISC .
ex:Spanish_Catholic_Center a ex:MISC .
ex:Spencer_Platt a ex:PER .
ex:Square a ex:PER .
ex:St_Peter a ex:LOC,
ex:ORG,
ex:PER .
ex:State_Department a ex:ORG .
ex:Stephen_P_Newton a ex:PER ;
ex:reflect ex:NPR .
ex:Summit a ex:PER .
ex:Supreme_Court a ex:ORG .
ex:Swiss a ex:MISC .
ex:Terry_Gilliam a ex:PER .
ex:Thomas_Reese a ex:PER .
ex:Trump a ex:MISC,
ex:PER .
ex:Trumpmet a ex:PER .
ex:Truth_Social a ex:ORG .
ex:Typically a ex:PER .
ex:US a ex:LOC .
ex:US_First_Lady_Melania_Trump a ex:ORG .
ex:U_S a ex:LOC .
ex:U_S_Catholic a ex:MISC .
ex:U_S_Mexico a ex:LOC .
ex:United_Arab_Emirates a ex:ORG .
ex:Universal_Shepherd a ex:PER .
ex:University_of_Monterrey a ex:ORG .
ex:University_of_Salamanca a ex:ORG .
ex:Up_First a ex:ORG .
ex:Ur a ex:LOC .
ex:Vance a ex:ORG,
ex:PER .
ex:Vandeville_Gamma a ex:PER .
ex:Vatican_City_State_Supreme_Court a ex:ORG .
ex:Vatican_Media a ex:ORG .
ex:Vatican_Pool a ex:LOC .
ex:Vatican_Press_Office a ex:ORG .
ex:Vaticanuntil a ex:ORG .
ex:WUNC a ex:ORG .
ex:Washington a ex:LOC .
ex:White a ex:MISC .
ex:White_House a ex:LOC .
ex:Willem_Marx a ex:PER .
ex:William_Lorirecalled a ex:PER .
ex:World_War_II a ex:MISC .
ex:X a ex:MISC .
ex:el_Papa_Francisco a ex:ORG .
ex:likePapabili a ex:LOC .
ex:m_pic_twitter_com_3dPPFoNWBr a ex:LOC .
ex:newspaperLaCroix_International a ex:LOC .
ex:the_College_of_Cardinals ex:lock ex:Vatican .
ex:the_People_of_God ex:attend ex:Monday .
ex:withLee_Yong a ex:PER .
npr:article_1 a ex:Article ;
ex:content """Pope Francis drives through the crowds during the Inauguration Mass for the Pope in St. Peter's Square on March 19, 2013, in Vatican City, Vatican. The mass was held in front of an expected crowd of up to one million pilgrims and faithful who filled the square and the surrounding streets to see the former Cardinal of Buenos Aires officially take up his role as pontiff. Spencer Platt/Getty Images
We'd love to hear about your reflections on Pope Francis and about any experiences you had with him over the years. You can fill out the form below or via this link, and share your stories, photos, voice memos, etc. An editor may be in touch to learn more."""^^xsd:string ;
ex:date "2025-04-21T14:25:05-04:00"^^xsd:string ;
ex:sourceURL "https://www.npr.org/2025/04/21/g-s1-61930/share-pope-francis-memories"^^xsd:anyURI ;
ex:title "Do you have memories of Pope Francis to share? Send them our way"^^xsd:string .
npr:article_10 a ex:Article ;
ex:content """White smoke billows from a chimney at the Sistine Chapel, signaling that cardinal electors have chosen a new pope — Pope Francis — on March 13, 2013. Jeff J Mitchell/Getty Images
Pope Francis has died at 88. For more of our coverage, head to our latest updates.
The white smoke is famous. When it streams out of a chimney at the Sistine Chapel, it signals that a new pope has been chosen and sets off celebrations among some 1.4 billion Catholics around the world.
Behind the scenes, a mysterious and intensely dramatic process culminates in that smoke — literally. It's created by burning the ballots cardinals just used. White smoke signals that the Roman Catholic Church has a new leader; black smoke means the cardinals will need to vote again.
With the death of Pope Francis, the elaborate mechanism will now begin to decide who sits in power at the Vatican, the seat of the last absolute monarchy in Europe. It centers around the conclave, a gathering whose name stems from the Latin for "with key."
"That actually comes from the 13th century," Bry Jensen, host of the long-running Pontifacts podcast, tells NPR. She says cardinals couldn't agree on a new pope in 1268 and the Church went nearly three years without a pontiff, despite growing frustration outside the cardinals' ranks.
"They locked the cardinals up behind closed doors, and then they put them on water and bread so that they would focus on the essentials," says Kurt Martens, ordinary professor of Canon law at the School of Canon Law at the Catholic University of America.
That initial conclave elected an archdeacon who was not an ordained priest, who became Pope Gregory X. The new pope ordered that future Church transitions would begin with a conclave, to avoid long vacancies.
Cardinals in the conclave will be locked away within the Vatican, cut off from the outside world. As they deliberate, news outlets point cameras at the chapel's chimney, and arcane words enter casual conversation, like papabili, or "pope-able," the term for cardinals with a chance of becoming pope.
Cardinals file into the Sistine Chapel for a conclave to elect a new pope on March 12, 2013 in the Vatican. MAURIX/Gamma-Rapho via Getty Images
When a reigning pope dies, an immediate duty falls to the camerlengo, a cardinal whose title translates to "chamberlain." The camerlengo declares the pope is deceased and administers the Holy See until a successor is chosen. The current camerlengo is Cardinal Kevin Farrell, the first American in that post.
You might have heard that the camerlengo uses a silver hammer to tap a pope's forehead three times, to ascertain whether he's alive. The practice has become a matter of legend, Martens says.
"The last time that that ritual was used was in 1878 when Pius IX died," he says, "but that's not done anymore."
Funeral rites for the late pope are held for nine days, as he is mourned and celebrated. Conclaves must begin within 15 to 20 days after a pope dies or resigns.
Upon the pope's death, the dean of the College of Cardinals calls the cardinal electors to the Vatican. There are currently 135 of them. To join the conclave, cardinals must be under 80 years old.
During the conclave, the cardinals live in the Domus Sanctae Marthae, a hotel-like facility next to St. Peter's Basilica. It's where Pope Francis opted to live, rather than in the Apostolic Palace's papal apartments. The residence has been compared to a three-star hotel.
"I've eaten there. I must say I'd rather go to a nice Roman restaurant than eat there," Martens says. But, he adds, that's part of the point.
"You don't want to make it more than a three-star hotel. Because you don't want the Cardinals to get comfortable," but instead to focus on electing a pontiff.
The rituals take place according to rules popes have refined over the centuries, clarifying the timeframe and obligations. But the conclave itself must be obscured by "total secrecy," as Pope John Paul II wrote. Cardinal electors must sign an oath of secrecy and seclusion, under threat of excommunication.
That's why the process intrigues so many people, says Gregg Gassman, a librarian who edits the Pontifacts podcast.
"Some of the mystery does come from the closed nature of the conclave itself," he says. "It's fascinating."
Once the cardinals are gathered, the dean of the College of Cardinals presides over a mass. The group then walks together from the Pauline Chapel to the Sistine Chapel, singing hymns invoking the Holy Spirit.
In the Sistine Chapel, the conclave swears a secrecy oath in Latin, touching the Holy Gospels.
"When that ceremony is over, you have the papal master of ceremonies who in a dramatic way says, Extra omnes," Martens says. "Roughly, it means, 'Get the hell out of here, all you who don't belong here,' meaning only the cardinal electors can remain."
Outside the chapel, the famed Papal Swiss Guard stands guard.
Swiss Guards line up in front of St. Peter's Basilica at the Vatican, Wednesday, Dec. 25, 2024. (AP Photo/Andrew Medichini)
"That's when the doors are locked, that's when the verbal and communicative gates go down," Jensen says. "After Extra omnes, there is no further communication until a pope has been elected, aside from smoke."
"There's only one round that first evening, and then you will see black smoke or white smoke," Martens says.
Typically, he says, the first round is merely an indication of the cardinals' priorities. On the following day, the conclave starts holding two rounds of voting each morning, and another two in the afternoon.
After each vote, a needle is pushed through the ballots, binding them together. If no winner emerges with a two-thirds majority, the two packages are "put together in that stove that is in the corner of the Sistine Chapel, to burn them and produce whatever smoke needs to be produced — white or black," Martens says.
A close-up of the stove in which the cardinals will burn their ballots during the election for a new pontiff in the Sistine Chapel in Vatican City, shown October 1978. At right, a container with chemicals to produce white and black smoke at the time of burning, to signal whether a new pope was named or not. (AP Photo)
The Church once used wet straw or dry straw to produce the right color, but to avoid confusion, the process now relies on chemicals.
The cardinals will continue to pray and contemplate — and vote — until a new pope is elected.
"All of the conclaves from the 1900s onwards have been under four days," Jensen says.
Pope Francis presides over a Mass at St. Peter's Square on Feb. 9, 2025 in Vatican City. During the Mass, the Pope asked his master of ceremonies to continue reading his homily for him, as he became short of breath. Franco Origlia/Getty Images
Francis was elected pope on the conclave's second day, for instance.
After a successful vote, the winning candidate is asked two questions. The first is whether they accept their election as pope.
"And then the second question is going to be, 'What name do you choose?' And then the name is chosen," Martens says.
Before Pope Francis was elected, many of the faithful in Buenos Aires knew their archbishop as simply "Father Jorge," as NPR reported in 2013.
Official documents are filled out, and the new pope is taken into a sacristy, to be fitted with papal attire.
"There are typically three sets of vestments ready," in sizes roughly equal to small, medium and large, Martens says.
Soon afterward, the senior cardinal deacon will appear on the balcony over St. Peter's Square, announcing, "Habemus Papam!" — "We have a pope!"
It will then be the new pope's turn to emerge onto the balcony and deliver his first blessing."""^^xsd:string ;
ex:date "2025-04-21T06:38:53-04:00"^^xsd:string ;
ex:sourceURL "https://www.npr.org/2025/04/21/nx-s1-5304054/conclave-pope-chosen-francis-dies-white-black-smoke"^^xsd:anyURI ;
ex:title "Who will be the next pope? Here's how the conclave works"^^xsd:string .
npr:article_2 a ex:Article ;
ex:content """People attend an interfaith memorial meeting to mourn the death of Pope Francis in New Delhi, India, on Monday. Imtiyaz Khan/Anadolu via Getty Images
Catholics across the globe are mourning the death of Pope Francis, remembering him for his humility, generosity of spirit, concern for the poor, and steadfast efforts to restore trust in the church after years of scandal.
Francis died early Monday in Rome at the age of 88, just one day after Easter Sunday. His death marks the end of a 12-year papacy that began in 2013 following the historic resignation of Benedict XVI — the first pontiff to step down in nearly six centuries.
The Vatican announced that the pope's body would be placed in a coffin on Monday evening, with Cardinal Kevin Farrell presiding over the rite in the chapel of Casa Santa Marta. The Dublin-born cardinal is the acting head of the Vatican until a new pope is elected.
Outside St. Peter's Basilica in Vatican City, American tourists were among those mourning Francis' death, including Doug Rand and his wife, Ruth Angelettia from Gallatin Gateway, Mont.
A digital screen shows an image of Pope Francis in Saint Peter's Square on Monday in Vatican City. Alessia Pierdomenico/Bloomberg via Getty Images
Rand described the pope as someone who worked "right up to the last day" for "the little guy and the poor and the disadvantaged and the abused."
Bianca Lott, from Northfield, Minn., is studying abroad in Rome for her spring semester. Given that Francis died on Easter Monday, she said she felt "a strange happiness at the timing," which she called "poetic."
Baltimore Archbishop William Lori recalled the pope's final appearance greeting crowds in St. Peter's Square on Easter, just hours before his death. "It was as if to say farewell to the People of God whom he loved so dearly and served so devotedly," Lori said in a statement. "Throughout his pontificate, Pope Francis showed deep compassion for the poor and marginalized, uplifting the voices of migrants, the sick, the elderly, and victims of injustice."
Former Maryland Gov. Larry Hogan called Francis "a truly extraordinary leader of the Church — humble, gracious, and deeply prayerful."
Detroit Archbishop Edward J. Weisenburger shared: "My heart is heavy as our world has lost a powerful, prophetic, and loving voice. Yet I rejoice in what I pray is a blessed reward of joy beyond all understanding for a truly great and loving Universal Shepherd."
The Rev. Stephen P. Newton, executive director of the Association of United States Catholic Priests, reflected in an email to NPR: "While we will miss his beautiful, often smiling presence, his example will continue to inspire us to become the church Jesus intended: one that is open and deeply listens to the movement of the Divine Spirit within us, our earth, and the universe."
On its website, Opus Dei, the conservative Catholic organization, offered prayers "to the Lord for the soul of our beloved Pope Francis," adding, "God will have rewarded his generous dedication to the service of the People of God and the whole world."
People pray during a service for Pope Francis in Buenos Aires Cathedral on Monday. Cristina Sille/Picture Alliance via Getty Images
Francis, the first-ever Latin American pope, once served as archbishop in Buenos Aires. In the Argentine capital, the government declared seven days of mourning and citizens gathered for a special mass at the city's cathedral, Reuters reports.
The pope also touched the lives of many Latinos around the world by communicating with them in Spanish. Hatciri Lopez, a lifelong Catholic from rural Johnston County, N.C., told NPR member station WUNC that Francis grew her faith.
"It's just easier for the message to get to your heart, instead of hearing it from a translator," she said. "Just as soon as I heard him speak, it would just strike my heart right away. I would just want to cry and just feel a sense of happiness and hope for the future."
In London, Martin Pendergast, secretary of LGBT Catholics Westminster, told The Associated Press that Francis was "the first pope to actually use the word 'gay,' so even the way he speaks has been a radical transformation — and some would say a bit of a revolution as well — compared with some of his predecessors."
In South Korea, political and religious leaders remembered the pope for his compassion toward the victims of conflict and disaster.
On a visit to South Korea in 2014, Francis met with Lee Yong-soo, who was forced into sexual servitude by the Japanese military during World War II.
"He must have gone to a good place," Lee, now 96, said of the pontiff following his death.
In besieged Gaza, where more than 50,000 Palestinians have been killed in more than 18 months of war with Israel, Christians there were deeply moved by Pope Francis' nightly phone call to them offering comfort amid the frightening conflict. The Rev. Gabriel Romanelli of the Holy Family Church said the pope's final call came the night before his death, according to Reuters.
Members of the clergy hold Mass for late Pope Francis at the Holy Family Church in Gaza City on Monday. Palestinian Christians in Gaza mourned the death of Pope Francis, who had maintained close and consistent contact with the besieged territory's small Christian community throughout the ongoing war with Israel. Omar Al-Qattaa/AFP via Getty Images
"We lost a saint who taught us every day how to be brave, how to keep patient and stay strong," George Antone, at the Holy Family Church in Gaza, told the news agency.
Francis focused on outreach to the overwhelmingly Muslim Arab world during his papacy, making groundbreaking visits to Iraq and the United Arab Emirates.
In 2021, Francis was the first pope in history to travel to Iraq, meeting the revered Shiite spiritual leader Grand Ayatollah Ali al Sistani and visiting Ur, the reputed birthplace of the Prophet Abraham, known as the patriarch of Judaism, Christianity and Islam.
In a post on X, Iraqi President Abdul Latif Rashid was also among those to mourn the pope's death, calling him "a remarkable religious and humanitarian leader whose life was devoted to promoting peace, alleviating poverty, and fostering interfaith tolerance."
"His humanitarian stance against war and violence, and his continuous calls for peace and coexistence, will leave an indelible impact on the world," Rashid wrote.
Willem Marx, Anthony Kuhn and Jane Arraf contributed reporting."""^^xsd:string ;
ex:date "2025-04-21T14:05:36-04:00"^^xsd:string ;
ex:sourceURL "https://www.npr.org/2025/04/21/g-s1-61719/pope-francis-death-world-reacts"^^xsd:anyURI ;
ex:title "Pope Francis is remembered around the world for his generosity of spirit"^^xsd:string .
npr:article_3 a ex:Article ;
ex:content """The funeral of Pope John Paul II at Saint Peter's Basilica in Rome, Italy on April 8, 2005. Eric Vandeville/Gamma-Rapho via Getty Images
This is a developing story. For more of our coverage, head to our latest updates.
Pope Francis' death on Monday sets in motion a weeks-long series of events, from a period of mourning to the process of selecting his successor.
The Vatican has an intricate set of rules governing the papal transition, a process the world doesn't get to watch unfold very often.
Pope Francis was chosen for the job in March 2013, several weeks after Pope Benedict XVI resigned from his post — a rare move that he blamed on his advanced age and diminishing strength. He died at age 95 in December 2022.
The most recent pope to die in office was Benedict's predecessor, Pope John Paul II. He died in April 2005 at age 84, after 26 years in the papacy.
Here's what happened after:
1. The pope's death is confirmed
The pope's death is supposed to be immediately verified and communicated to relevant parties, as Father Thomas Reese, then-editor in chief of America, told NPR in 2005.
The prefect of the papal household tells the camerlengo — in 2025, that's Cardinal Kevin Farrell — who must verify the pope's death in the presence of the papal master of ceremonies, the cleric prelates of the Apostolic Camera and the secretary of the Apostolic Camera, who draws up a death certificate.
Then the camerlengo and prefect of the papal household pass the news to various officials in the Vatican, who relay it to the people of Rome and the heads of nations.
"Although this is the formal procedure, in fact most people will first hear of the death of the pope from the media," Reese said.
John Paul II died at 9:37 p.m. on April 2, 2005 — six days after Easter. The Vatican Press Office informed journalists of his death via email exactly 17 minutes later.
Cardinal Giovanni Battista announced the pope's death to tens of thousands of people who had gathered for a vigil in St. Peter's Square in Vatican City. The crowd grew silent, and some people clapped in tribute.
2. The papal apartments are locked
The camerlengo locks and seals the pope's apartment.
While looting ("by staff, the cardinals or the Roman populace") was a concern in the past, modern popes are more concerned that their private papers stay out of the wrong hands.
The camerlengo destroys the pope's fisherman's ring and seal — traditionally with a special hammer, per Britannica — to symbolize the end of his reign and prevent misuse, like forging documents.
John Paul II's ring and seal were destroyed in a symbolic ritual on April 16, 2005 — at the end of the nine-day mourning period and before the start of the conclave.
3. The mourning period lasts nine days
Thousands of people wait in line at St. Peter's Basilica to view the body of Pope John Paul II on April 6, 2005 in Vatican City. Joe Raedle/Getty Images
A pope's death is followed by a nine-day mourning period called the novemdiales. The cardinals arrange for the funeral rites to be observed during this time.
This is also when a pope lies in state. Approximately 1 million mourners visited John Paul II's body as it lay in state — first in the Apostolic Palace for staff and then St. Peter's Basilica for public viewing — for several days before his funeral on April 8.
According to Reese, the date for the funeral and burial is set by the College of Cardinals, but the apostolic constitution states it is to "take place, except for special reasons, between the fourth and sixth day after death."
Francis wrote in his 2025 autobiography that he found the customary service "excessive."
"So I have arranged with the master of ceremonies to lighten it: no catafalque, no ceremony for the closure of the casket, nor the deposition of the cypress casket into a second of lead and a third of oak," he wrote.
4. Burial occurs within four to five days
The camerlengo is tasked with arranging the funeral in accordance with instructions the pope leaves behind.
John Paul II's funeral took place six days after his death in Saint Peter's Square, at 10 a.m. local time on April 8.
The three-hour ceremony was conducted by the dean of the Sacred College of Cardinals, Cardinal Joseph Ratzinger, with help from some 164 cardinals.
About 2 million people watched online, and the long list of dignitaries who attended in person included four kings, five queens and at least 70 presidents and prime ministers, according to DemocracyNow.
John Paul II was buried immediately afterward in the crypt of St. Peter's Basilica in the Vatican. Pope Francis wrote in his memoir that he has a different final resting place in mind:
"When it happens, I will not be buried in Saint Peter's but at Santa Maria Maggiore," he wrote, referring to one of the four Papal Basilicas in Rome. "The Vatican is the home of my last service, not my eternal home."
5. The conclave chooses the next pope
White smoke vents up from the chimney of the Sistine Chapel on April 19, 2005, meaning that Catholic Church cardinals elected a new leader after a conclave lasting little more than 24 hours. Andreas Solaro/AFP via Getty Images
The camerlengo is the acting head of the Vatican until the next pope is chosen and organizes the election process, which is called the conclave.
All cardinals under 80 years of age when the pope dies have the right to vote for the next pope — 115 of them voted in 2005. The process involves multiple rounds of voting over several days.
The conclave traditionally begins 15 to 20 days after the pope's death (the College of Cardinals sets the date and time). All of the conclaves since the 1900s have lasted less than four days — the last conclave to run more than five days was in 1831, and it lasted for 54.
The election takes place in the Sistine Chapel and is completely secret. But the public can watch the chimney for hints — black smoke means the cardinals will need to vote again; white smoke means a new pope has been chosen.
In 2005, the conclave began on the afternoon of April 18 — 16 days after the pope's death, and 10 days after his funeral. It lasted just two days, ending on April 19 when Cardinal Joseph Ratzinger was elected after just four ballots.
After the vote, the winning candidate is asked two questions: Do they accept their election, and what name will they choose? Then official documents are filled out, the new pope is fitted with papal attire — there are typically three sets of garments at the ready — and the news is announced to the public.
The senior cardinal deacon appears on the balcony over St. Peter's Square and announces "Habemus Papam!" — "We have a pope!"
Ratzinger was announced as Pope Benedict XVI on April 19 and installed as pope with Mass on April 24. He made his first foreign trip, to his native Germany, in August of that year."""^^xsd:string ;
ex:date "2025-04-21T13:14:58-04:00"^^xsd:string ;
ex:sourceURL "https://www.npr.org/2025/04/21/g-s1-61909/pope-death-funeral-conclave-timeline"^^xsd:anyURI ;
ex:title "What happens next after a pope dies, according to recent history"^^xsd:string .
npr:article_4 a ex:Article ;
ex:content """People walk past a screen showing the CSI 300 Index at a shopping mall in Guangzhou, in southern China's Guangdong province. Jade Gao/AFP via Getty Images
As the Trump administration negotiates trade deals with other countries, China has issued a warning against any agreements that harm its interests.
China's commerce ministry says it respects the efforts of others to try to resolve trade disputes with the U.S. through consultation. But it warns that it will take "corresponding countermeasures" if any deals are struck at the expense of China's interests. It did not give details.
The comments come after reports that Trump is hoping to use tariff negotiations with other countries to isolate China. At the same time, Trump has said he wants to do a deal with Beijing. This month he raised the base tariff on Chinese imports to a dizzying 145%. China responded in kind, with high tariffs on U.S. goods.
The Chinese commerce ministry says seeking tariff exemptions by harming the interests of others will only lead to failure on both sides and ultimately hurt everyone."""^^xsd:string ;
ex:date "2025-04-21T11:08:30-04:00"^^xsd:string ;
ex:sourceURL "https://www.npr.org/2025/04/21/g-s1-61668/china-tariffs-trump-trade"^^xsd:anyURI ;
ex:title "China warns of 'countermeasures' against any deals that harm its interests"^^xsd:string .
npr:article_5 a ex:Article ;
ex:content """Cardinal Kevin Farrell, Camerlengo of the Apostolic Chamber, announced the death of Pope Francis from the Casa Santa Marta in Vatican City on Monday. Vatican Pool/Getty Images
Cardinal Kevin Farrell, who announced Pope Francis' death on Monday morning, is now the acting head of the Vatican until a new pope is elected.
There's a name for the person with that job: the camerlengo.
According to Britannica, the cardinal camerlengo in Roman Catholicism is a key dignitary of the Vatican who is personally appointed by the pope and tasked with "a specific series of functions in the crucial time of transition from one pope to his successor."
Those tasks include verifying the pope's death, destroying the late pope's symbolic fisherman ring and preparing the conclave, the process by which a new pope is elected.
Farrell, a Dublin-born, naturalized U.S. citizen, held a series of positions at the Vatican before Pope Francis nominated him as the camerlengo in 2019. Here's what to know about him.
Farrell, 77, was born in September, 1947 in Dublin, and after completing secondary school went on to attend the University of Salamanca in Spain and the Pontifical Gregorian University in Rome.
He was ordained a priest on Dec. 24, 1978, and began his career serving as chaplain at the University of Monterrey in Mexico. He moved to the U.S. to join the Archdiocese of Washington in 1984, according to his Vatican biography.
Farrell held a series of positions in several parishes in the area, including director of the Spanish Catholic Center, acting executive director of the Catholic Charitable Organizations and secretary for financial affairs.
Pope John Paul II appointed Farrell as an auxiliary bishop of Washington in 2001. He served as moderator of the curia and chief vicar general until 2007, when he was appointed bishop of Dallas.
In 2016, Pope Francis appointed Farrell as the prefect of the newly established Dicastery for Laity, Family and Life.
"My administrative assistant came in and said, 'The Pope's on the telephone,' and I felt like saying, 'Yeah, yeah,'" Farrell said at a press conference at the time, according to the local NBC affiliate. "Eventually she did put on the Pope, and he told me that he would like me to go to Rome because Dallas needed a much better Bishop than I am."
The pope named Farrell a cardinal later that same year, and continued to elevate him to positions in the Vatican.
He was nominated as camerlengo in 2019, appointed as president of the Commission for Confidential Matters in 2020 and appointed as president of the Vatican City State Supreme Court effective January 2024.
Farrell has been in close proximity to scandal — and scandalous figures — during his career.
Notably, from 2002 to 2006, he worked and lived with Theodore McCarrick, a once-powerful Catholic cardinal who was defrocked by Pope Francis in 2019 after a Vatican investigation determined he had molested adults and children.
After those allegations came to light in 2018, Farrell publicly said he had not known or suspected anything about McCarrick's behavior.
Also in 2018, Farrell was criticized for allegedly barring a group called Voices of Faith from holding its fourth annual Women's Day event inside the Vatican.
Some people, including members of the group, believed the reason was that several of the would-be speakers — including former Irish President Mary McAleese — openly supported same-sex marriage, among other issues.
When asked about the controversy at an unrelated event weeks later, Farrell did not go into much detail about the reason behind his decision.
"Having been told subsequently that I did sponsor that event and having been told subsequently what the event was about, it was not appropriate for me to continue to sponsor such an event," he said, according to the French newspaper La Croix International. "Obviously, when I withdrew the sponsorship of the event it couldn't be inside the Vatican."
Farrell has said publicly that while the church cannot bless same-sex unions, no one should be excluded from the "pastoral care and love of the Church."
Farrell's position as camerlengo doesn't inherently disqualify or prime him for the position of pope.
The Times reports that only two camerlengos have been elected pope before: Gioacchino Pecci, as Pope Leo XIII in 1878, and Eugenio Pacelli, as Pope Pius XII in 1939.
There has never been a pope from Ireland or the U.S."""^^xsd:string ;
ex:date "2025-04-21T10:34:12-04:00"^^xsd:string ;
ex:sourceURL "https://www.npr.org/2025/04/21/g-s1-61662/kevin-farrell-camerlengo-vatican-pope"^^xsd:anyURI ;
ex:title "Who is Cardinal Kevin Farrell, the acting head of the Vatican?"^^xsd:string .
npr:article_6 a ex:Article ;
ex:content """Pope Francis exchanges gifts with US President Donald Trump (C) and US First Lady Melania Trump during a private audience at the Vatican on May 24, 2017. US President Donald Trump met Pope Francis at the Vatican today in a keenly anticipated first face-to-face encounter between two world leaders who have clashed repeatedly on several issues. Alessandra Tarantino/AFP via Getty Images
President Trump has acknowledged the pope's death in a one-line post on Truth Social, writing: "Rest in Peace Pope Francis! May God Bless him and all who loved him!"
Trump and Francis clashed repeatedly in recent years.
Trump praised the pope at the start of Francis' papacy, in 2013, several years before Trump reached the White House.
"The new Pope is a humble man, very much like me, which probably explains why I like him so much!" Trump tweeted in December of that year, several months after Francis became pope.
Things soured soon after. During the 2016 election, Francis roundly criticized Trump's campaign proposal to build a wall on the U.S.-Mexico border.
"A person who thinks only about building walls, wherever they may be, and not building bridges, is not Christian," Francis said at the time.
Trump — who aggressively courted evangelical Christian leaders and voters during his campaign — fired back immediately, saying, "for a religious leader to question a person's faith is disgraceful."
"If and when the Vatican is attacked by ISIS, which as everyone knows is ISIS's ultimate trophy, I can promise you that the Pope would have only wished and prayed that Donald Trump would have been President because this would not have happened," he added.
Trump met the Pope during a 2017 trip to the Vatican, later telling reporters: "He is something. We had a fantastic meeting." A photo from the visit, in which Trump is smiling next to a glum-looking Francis, quickly went viral.
Look at their faces. pic.twitter.com/0t84cBX8bZ
Nearly a decade later, amidst the second Trump administration's crackdown on immigration, the pope once again made a rare public rebuke of the president's policies.
In a public letter to U.S. Catholic bishops in February, Francis described the program of mass deportations as a "major crisis."
He said while nations have the right to defend themselves, "the rightly formed conscience cannot fail to make a critical judgment and express its disagreement with any measure that tacitly or explicitly identifies the illegal status of some migrants with criminality."
"The act of deporting people who in many cases have left their own land for reasons of extreme poverty, insecurity, exploitation, persecution or serious deterioration of the environment, damages the dignity of many men and women, and of entire families, and places them in a state of particular vulnerability and defenselessness," Francis wrote.
The letter also appeared to respond to widely-criticized comments that Vice President Vance, who is Catholic, had made weeks earlier. Vance said people should care for their family, communities and country before caring for others — and Francis disagreed.
"Christian love is not a concentric expansion of interests that little by little extend to other persons and groups," the pope wrote."""^^xsd:string ;
ex:date "2025-04-21T09:13:28-04:00"^^xsd:string ;
ex:sourceURL "https://www.npr.org/2025/04/21/g-s1-61636/trump-pope-francis"^^xsd:anyURI ;
ex:title "A brief history of Trump's feud with Pope Francis"^^xsd:string .
npr:article_7 a ex:Article ;
ex:content """Pope Francis meets with newly elected Argentinian President Javier Milei before a Canonization Ceremony in St. Peter's Basilica on Feb. 11, 2024 in Vatican City, Vatican.Vatican Pool/Getty Imageshide caption
Argentina's president sent profound condolences to the family of Pope Francis and to all Catholics ina messageposted to X from the pontiff's homeland.
Javier Milei, a far-right libertarian who stridently defends free markets, acknowledged his and the pope's differing viewpoints.
"Despite differences that seem minor today, having been able to know him in his goodness and wisdom was a true honor for me," Milei added on X. "I bid farewell to the Holy Father and stand with all of us who are today dealing with this sad news."
ADIÓSCon profundo dolor me entero esta triste mañana que el Papa Francisco, Jorge Bergoglio, falleció hoy y ya se encuentra descansando en paz. A pesar de diferencias que hoy resultan menores, haber podido conocerlo en su bondad y sabiduría fue un verdadero honor para mí.…pic.twitter.com/3dPPFoNWBr
During the 2023 presidential race, then-candidate Milei had decried the pope, calling him an "imbecile" who defended social justice and equating him to evil and the devil.
However, once in office, Milei softened his tone, even visiting the Vatican to meet with Francis.
Francis was born in Buenos Aires in 1936 as Jorge Bergoglio. His parents were Italian immigrants and as a boy he learned Italian, but Spanish was dominant in his home. He rose to be the Archbishop of Buenos Aires, and many in Argentina lament that he never came home to visit as pope.
Mass will be held today in his honor in the capital's cathedral where he presided. According to the newspaper Clarin, the country will observe seven days of mourning."""^^xsd:string ;
ex:date "2025-04-21T08:32:37-04:00"^^xsd:string ;
ex:sourceURL "https://www.npr.org/2025/04/21/g-s1-61624/argentina-milei-critic-francis-condolences"^^xsd:anyURI ;
ex:title "Argentina's president, a former critic of Pope Francis, offers his condolences"^^xsd:string .
npr:article_8 a ex:Article ;
ex:content """Pope Francis meets with president of Kenya William Samoei Ruto during the G7 Leaders Summit on day two of the 50th G7 summit at Borgo Egnazia on June 14, 2024 in Fasano, Italy.Vatican Media via Vatican Pool/Getty Imageshide caption
On Monday,Kenyan PresidentWilliam Rutoposted on X that Francis "exemplified servant leadership through his humility, his unwavering commitment to inclusivity and justice, and his deep compassion for the poor and the vulnerable."
In neighboring Ethiopia, Prime MinisterAbiy Ahmedalso turned to social media to mourn the pope, saying "may his legacy of compassion, humility, and service to humanity continue to inspire generations to come."
In South Africa, PresidentCyril Ramaphosasaid in a statement that Pope Francis was "a spiritual leader who sought to unite humanity and wished to see a world governed by fundamental human values.
Pope Francis, an Argentine, was notable as the first pontiff from the Global South. Many Africans will be watching as the Vatican decides on his successor, hoping for the first African pope."""^^xsd:string ;
ex:date "2025-04-21T08:07:03-04:00"^^xsd:string ;
ex:sourceURL "https://www.npr.org/2025/04/21/g-s1-61618/leaders-in-africa-mourn-the-passing-of-pope-francis"^^xsd:anyURI ;
ex:title "Leaders in Africa mourn the passing of Pope Francis"^^xsd:string .
npr:article_9 a ex:Article ;
ex:content """Good morning. You're reading the Up First newsletter.Subscribehere to get it delivered to your inbox, andlistento the Up First podcast for all the news you need to start your day.
Pope Francis died on Easter Monday at the age of 88.He was the first non-European head of the Roman Catholic Church in over a millennium and was one of themost popular popes in decades. He was elected to his exalted post in 2013 and cast an image of humility during years of strain and change, within his church and worldwide.
Pope Francis waves to thousands of followers as he arrives at the Philippines' Manila Cathedral on Jan. 16, 2015. During his papacy, Francis strove to reach out to what he called the "periphery" of the world in Asia, Africa and Latin America.Lisa Maree Williams/Getty Imageshide caption
A big draw to Pope Francis was his personal story.He was the son of immigrants and grew up in Argentina, where he lived through turbulent times. Francis, born Jorge Mario Bergoglio, was the first pope from Latin America.
Four House Democrats were scheduled to land in El Salvador today to demand the release and return of Kilmar Abrego Garcia, a Salvadorian citizen who lived in Maryland. He was deported to a Salvadoran prison due to an "administrative error," according to the Trump administration. The lawmakerssaid in a statementthey hope "to pressure" the White House "to abide by a Supreme Court order."
The State Department seal is seen on the briefing room lectern at the State Department in Washington, D.C., on Jan. 31, 2022.Mandel Ngan/APhide caption
The State Department seal is seen on the briefing room lectern at the State Department in Washington, D.C., on Jan. 31, 2022.
NPR has learned that the Trump administration is substantiallyscaling back the State Department's annual reportson international human rights to remove critiques of abuses such as harsh prison conditions and government corruption. These reports are intended to guide congressional foreign aid and security assistance decisions. Moving forward, the State Department will no longer call out governments for restricting freedom of movement and peaceful assembly. Additionally, the reports will not condemn the detention of political prisoners without due process or the limitations placed on free and fair elections.
nasal sprayDDurrich/iStockphoto/Getty Imageshide caption
Living Better is aspecial seriesabout what it takes to stay healthy in America.
It is not just in your head; seasonal allergies are getting worse every year. This is due to the warming from climate change, making the pollen season longer. Luckily, there are ways to keep the pollen from taking over your life. Here aretips from doctorson how to get relief:
King Arthur (Graham Chapman) and his servant Patsy (Terry Gilliam) encounter The Black Knight (John Cleese)Monty Python and the Holy Grail/Fathom Entertainmenthide caption
This newsletter was edited byYvonne Dennis."""^^xsd:string ;
ex:date "2025-04-21T07:19:41-04:00"^^xsd:string ;
ex:sourceURL "https://www.npr.org/2025/04/21/g-s1-61597/up-first-newsletter-pope-francis-dies-house-democrats-el-salvador"^^xsd:anyURI ;
ex:title "Pope Francis dies at 88. And, House Democrats press for Abrego Garcia's return"^^xsd:string .
ex:Dublin a ex:LOC .
ex:Easter a ex:PER .
ex:Easter_Sunday a ex:PER .
ex:Gaza a ex:LOC .
ex:Jorge_Bergoglio a ex:PER .
ex:London a ex:LOC .
ex:Santa_Maria_Maggiore a ex:LOC .
ex:Vatican a ex:LOC .
ex:his2025 a ex:MISC .
ex:NPR a ex:ORG .
ex:Vatican_City a ex:LOC .
This output is a Turtle serialization of the RDF graph built from the structured data extracted from the NPR news articles.
Each line of the form ex:Name a ex:TYPE . declares a resource (a person, location, organization or other entity) and assigns it a type (PER, LOC, ORG or MISC).
Articles are identified by unique URIs (e.g. npr:article_1) and carry the properties title, content, date and sourceURL.
Relationships between entities, such as who told whom or who met whom, are also stored as triples, linking everything into a small web of semantic data.
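To make the typing statements concrete, the sketch below groups lines such as ex:Dublin a ex:LOC . by type. It works on a few plain strings copied from the output above and uses a deliberately naive regular expression inside a hypothetical helper, group_by_type, so it only illustrates the structure of these lines; it is not a Turtle parser, and in practice the rdflib graph itself should be queried.

```python
import re
from collections import defaultdict

# Naive pattern for lines of the form "ex:Name a ex:TYPE ." (illustration only;
# real Turtle should be parsed with rdflib, not with regular expressions)
TYPE_LINE = re.compile(r"^ex:(\S+) a ex:(\S+) \.$")


def group_by_type(lines):
    """Map each type (PER, LOC, ...) to the sorted entity names declared with it."""
    groups = defaultdict(list)
    for line in lines:
        match = TYPE_LINE.match(line.strip())
        if match:
            entity, etype = match.groups()
            groups[etype].append(entity)
    return {etype: sorted(names) for etype, names in groups.items()}


# Typing lines copied from the Turtle output above
sample = [
    "ex:Dublin a ex:LOC .",
    "ex:Jorge_Bergoglio a ex:PER .",
    "ex:London a ex:LOC .",
    "ex:NPR a ex:ORG .",
    "ex:his2025 a ex:MISC .",
]
print(group_by_type(sample))
```

Lines that do not follow the simple pattern, such as npr:article_7 a ex:Article ; (which continues with more properties), are simply skipped.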
The next cell visualises a simplified view of the knowledge graph limited to the first two articles: the long text properties (content, title, sourceURL) are dropped, so each article node is shown linked to its publication date through the date predicate and to the Article class through its rdf:type statement.
import matplotlib.pyplot as plt
import networkx as nx
from rdflib import URIRef
G = nx.DiGraph()
# Choose only a couple of articles
target_articles = {"article_1", "article_2"}
max_node_length = 20  # truncate labels to keep node names short
for subj, pred, obj in g:
    subj_label = subj.split("/")[-1][:max_node_length]
    pred_label = pred.split("/")[-1][:max_node_length]
    obj_label = obj.split("/")[-1][:max_node_length] if isinstance(obj, URIRef) else str(obj)[:max_node_length]
    # Keep only triples that involve article_1 or article_2
    # (exact comparison: a substring test would also match article_10)
    if subj_label not in target_articles and obj_label not in target_articles:
        continue
    # Skip the long text-valued relations
    if pred_label in {"content", "title", "sourceURL"}:
        continue
    G.add_node(subj_label)
    G.add_node(obj_label)
    G.add_edge(subj_label, obj_label, label=pred_label)
# Draw the simplified graph
plt.figure(figsize=(12, 8))
pos = nx.spring_layout(G, k=0.8, seed=42)
nx.draw_networkx_nodes(G, pos, node_color="#AED6F1", node_size=1800)
nx.draw_networkx_edges(G, pos, arrows=True, edge_color="gray")
nx.draw_networkx_labels(G, pos, font_size=10)
edge_labels = nx.get_edge_attributes(G, "label")
nx.draw_networkx_edge_labels(G, pos, edge_labels=edge_labels, font_size=9)
plt.title("Simplified RDF Graph (First 2 Articles)", fontsize=14)
plt.axis("off")
plt.show()
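Two details of the label logic in the plotting cell are easy to miss. A substring test such as "article_1" in "article_10" is true, so article identifiers should be compared as whole labels. And split("/") leaves hash-style URIs such as rdf:type (http://www.w3.org/1999/02/22-rdf-syntax-ns#type) as 22-rdf-syntax-ns#type, which the 20-character cut then shortens to 22-rdf-syntax-ns#typ. A minimal sketch of a helper, local_name (a hypothetical name, not an rdflib function), that cuts at the last '#' or '/'; since rdflib's URIRef is a str subclass, it could be applied to subj, pred and obj directly.

```python
def local_name(uri: str) -> str:
    """Return the part of a URI after its last '#' or '/' (the whole string if neither occurs)."""
    cut = max(uri.rfind("#"), uri.rfind("/"))
    return uri[cut + 1:]


RDF_TYPE = "http://www.w3.org/1999/02/22-rdf-syntax-ns#type"
ARTICLE_10 = "https://www.npr.org/article/article_10"
targets = {"article_1", "article_2"}

print(local_name(RDF_TYPE))    # type
print(local_name(ARTICLE_10))  # article_10

# Substring matching wrongly selects article_10; exact matching does not
print(any(t in local_name(ARTICLE_10) for t in targets))  # True
print(local_name(ARTICLE_10) in targets)                  # False
```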
We first run three SPARQL queries on the RDF graph: one lists every entity typed as a person, one extracts subject-verb-object triples whose subject, relation and object are all example.org URIs, and one finds the articles whose content contains a given keyword:
from rdflib.plugins.sparql import prepareQuery
# List all people
query_persons = prepareQuery("""
SELECT ?entity WHERE {
?entity a <http://example.org/PER> .
}
""")
print("Persons found in the graph:")
for result in g.query(query_persons):
    print(result)
# Extract subject-verb-object triples whose subject, relation and object are all example.org URIs
query_triples = prepareQuery("""
SELECT ?sub ?rel ?obj WHERE {
?sub ?rel ?obj .
FILTER(STRSTARTS(STR(?sub), "http://example.org/")) .
FILTER(STRSTARTS(STR(?rel), "http://example.org/")) .
FILTER(STRSTARTS(STR(?obj), "http://example.org/")) .
}
""")
print("\nTriples with example.org URIs:")
for result in g.query(query_triples):
    print(result)
# Search for keyword in article contents
keyword = "Trump"
query_keyword = prepareQuery(f"""
SELECT ?doc ?text WHERE {{
?doc a <http://example.org/Article> ;
<http://example.org/content> ?text .
FILTER(CONTAINS(LCASE(STR(?text)), LCASE("{keyword}")))
}}
""")
print(f"\nArticles mentioning '{keyword}':")
for result in g.query(query_keyword):
    print(result)
Persons found in the graph:
(rdflib.term.URIRef('http://example.org/Getty_Imageshide'),)
(rdflib.term.URIRef('http://example.org/Pope_Francis'),)
(rdflib.term.URIRef('http://example.org/Spencer_Platt'),)
(rdflib.term.URIRef('http://example.org/Abdul_Latif_Rashid'),)
(rdflib.term.URIRef('http://example.org/Anthony_Kuhn'),)
(rdflib.term.URIRef('http://example.org/Bianca_Lott'),)
(rdflib.term.URIRef('http://example.org/Casa_Santa_Marta'),)
(rdflib.term.URIRef('http://example.org/Cristina_Sille'),)
(rdflib.term.URIRef('http://example.org/Divine_Spirit'),)
(rdflib.term.URIRef('http://example.org/Doug_Rand'),)
(rdflib.term.URIRef('http://example.org/Easter'),)
(rdflib.term.URIRef('http://example.org/Easter_Sunday'),)
(rdflib.term.URIRef('http://example.org/Edward_J_Weisenburger'),)
(rdflib.term.URIRef('http://example.org/Francis'),)
(rdflib.term.URIRef('http://example.org/Gabriel_Romanelli'),)
(rdflib.term.URIRef('http://example.org/Gallatin_Gateway'),)
(rdflib.term.URIRef('http://example.org/George_Antone'),)
(rdflib.term.URIRef('http://example.org/God'),)
(rdflib.term.URIRef('http://example.org/Grand_Ayatollah_Ali'),)
(rdflib.term.URIRef('http://example.org/Hatciri_Lopez'),)
(rdflib.term.URIRef('http://example.org/Imtiyaz_Khan'),)
(rdflib.term.URIRef('http://example.org/Jane_Arraf'),)
(rdflib.term.URIRef('http://example.org/Jesus'),)
(rdflib.term.URIRef('http://example.org/Johnston_County'),)
(rdflib.term.URIRef('http://example.org/Kevin_Farrell'),)
(rdflib.term.URIRef('http://example.org/Larry_Hogancalled'),)
(rdflib.term.URIRef('http://example.org/Lee'),)
(rdflib.term.URIRef('http://example.org/Lori'),)
(rdflib.term.URIRef('http://example.org/Martin_Pendergast'),)
(rdflib.term.URIRef('http://example.org/Mont'),)
(rdflib.term.URIRef('http://example.org/Omar_Al'),)
(rdflib.term.URIRef('http://example.org/Opus_Dei'),)
(rdflib.term.URIRef('http://example.org/Qattaa'),)
(rdflib.term.URIRef('http://example.org/Rand'),)
(rdflib.term.URIRef('http://example.org/Rashid'),)
(rdflib.term.URIRef('http://example.org/Ruth_Angelettia'),)
(rdflib.term.URIRef('http://example.org/Saint_Peter'),)
(rdflib.term.URIRef('http://example.org/Shiite'),)
(rdflib.term.URIRef('http://example.org/Sistani'),)
(rdflib.term.URIRef('http://example.org/Stephen_P_Newton'),)
(rdflib.term.URIRef('http://example.org/Universal_Shepherd'),)
(rdflib.term.URIRef('http://example.org/Willem_Marx'),)
(rdflib.term.URIRef('http://example.org/William_Lorirecalled'),)
(rdflib.term.URIRef('http://example.org/withLee_Yong'),)
(rdflib.term.URIRef('http://example.org/Andreas_Solaro'),)
(rdflib.term.URIRef('http://example.org/Giovanni_Battistaannounced'),)
(rdflib.term.URIRef('http://example.org/Here'),)
(rdflib.term.URIRef('http://example.org/Joe_Raedle'),)
(rdflib.term.URIRef('http://example.org/John_Paul'),)
(rdflib.term.URIRef('http://example.org/John_Paul_II'),)
(rdflib.term.URIRef('http://example.org/Joseph_Ratzinger'),)
(rdflib.term.URIRef('http://example.org/Papal_Basilicas'),)
(rdflib.term.URIRef('http://example.org/Peter'),)
(rdflib.term.URIRef('http://example.org/Pope_Benedict'),)
(rdflib.term.URIRef('http://example.org/Pope_John_Paul'),)
(rdflib.term.URIRef('http://example.org/Pope_John_Paul_II'),)
(rdflib.term.URIRef('http://example.org/Ratzinger'),)
(rdflib.term.URIRef('http://example.org/Reese'),)
(rdflib.term.URIRef('http://example.org/Square'),)
(rdflib.term.URIRef('http://example.org/St_Peter'),)
(rdflib.term.URIRef('http://example.org/Thomas_Reese'),)
(rdflib.term.URIRef('http://example.org/Vandeville_Gamma'),)
(rdflib.term.URIRef('http://example.org/Jade_Gao'),)
(rdflib.term.URIRef('http://example.org/Trump'),)
(rdflib.term.URIRef('http://example.org/Accordingto_Britannica'),)
(rdflib.term.URIRef('http://example.org/Bishop'),)
(rdflib.term.URIRef('http://example.org/Camerlengo'),)
(rdflib.term.URIRef('http://example.org/Eugenio_Pacelli'),)
(rdflib.term.URIRef('http://example.org/Farrell'),)
(rdflib.term.URIRef('http://example.org/Gioacchino_Pecci'),)
(rdflib.term.URIRef('http://example.org/Mary_McAleese'),)
(rdflib.term.URIRef('http://example.org/McCarrick'),)
(rdflib.term.URIRef('http://example.org/Notably'),)
(rdflib.term.URIRef('http://example.org/Pope'),)
(rdflib.term.URIRef('http://example.org/Pope_Francisin'),)
(rdflib.term.URIRef('http://example.org/Pope_Leo_XIII'),)
(rdflib.term.URIRef('http://example.org/Pope_Pius'),)
(rdflib.term.URIRef('http://example.org/Alessandra_Tarantino'),)
(rdflib.term.URIRef('http://example.org/Donald_Trump'),)
(rdflib.term.URIRef('http://example.org/God_Bless'),)
(rdflib.term.URIRef('http://example.org/Trumpmet'),)
(rdflib.term.URIRef('http://example.org/Vance'),)
(rdflib.term.URIRef('http://example.org/Clarin'),)
(rdflib.term.URIRef('http://example.org/Javier_Milei'),)
(rdflib.term.URIRef('http://example.org/Jorge_Bergoglio'),)
(rdflib.term.URIRef('http://example.org/Milei'),)
(rdflib.term.URIRef('http://example.org/Ahmedalso'),)
(rdflib.term.URIRef('http://example.org/Kenyan_PresidentWilliam_Rutoposted'),)
(rdflib.term.URIRef('http://example.org/PresidentCyril_Ramaphosasaid'),)
(rdflib.term.URIRef('http://example.org/Summit'),)
(rdflib.term.URIRef('http://example.org/Arthur'),)
(rdflib.term.URIRef('http://example.org/Black_Knight'),)
(rdflib.term.URIRef('http://example.org/Dennis'),)
(rdflib.term.URIRef('http://example.org/Graham_Chapman'),)
(rdflib.term.URIRef('http://example.org/John'),)
(rdflib.term.URIRef('http://example.org/Jorge_Mario_Bergoglio'),)
(rdflib.term.URIRef('http://example.org/Kilmar_Abrego_Garcia'),)
(rdflib.term.URIRef('http://example.org/Lisa_Maree_Williams'),)
(rdflib.term.URIRef('http://example.org/Patsy'),)
(rdflib.term.URIRef('http://example.org/Terry_Gilliam'),)
(rdflib.term.URIRef('http://example.org/Andrew_Medichini_Andrew'),)
(rdflib.term.URIRef('http://example.org/Bry_Jensen'),)
(rdflib.term.URIRef('http://example.org/Canon'),)
(rdflib.term.URIRef('http://example.org/Canon_Law'),)
(rdflib.term.URIRef('http://example.org/Conclaves'),)
(rdflib.term.URIRef('http://example.org/Gregg_Gassman'),)
(rdflib.term.URIRef('http://example.org/J_Mitchell'),)
(rdflib.term.URIRef('http://example.org/Jensen'),)
(rdflib.term.URIRef('http://example.org/Jorge'),)
(rdflib.term.URIRef('http://example.org/Kurt_Martens'),)
(rdflib.term.URIRef('http://example.org/Martens'),)
(rdflib.term.URIRef('http://example.org/Pope_Gregory_X_The'),)
(rdflib.term.URIRef('http://example.org/Sanctae_Marthae'),)
(rdflib.term.URIRef('http://example.org/Typically'),)
Triples with example.org URIs:
(rdflib.term.URIRef('http://example.org/Francis'), rdflib.term.URIRef('http://example.org/die'), rdflib.term.URIRef('http://example.org/Easter_Sunday'))
(rdflib.term.URIRef('http://example.org/Francis'), rdflib.term.URIRef('http://example.org/meet'), rdflib.term.URIRef('http://example.org/the_G7_Leaders_Summit'))
(rdflib.term.URIRef('http://example.org/Francis'), rdflib.term.URIRef('http://example.org/preside'), rdflib.term.URIRef('http://example.org/Vatican_City'))
(rdflib.term.URIRef('http://example.org/Francis'), rdflib.term.URIRef('http://example.org/drive'), rdflib.term.URIRef('http://example.org/Vatican_City'))
(rdflib.term.URIRef('http://example.org/Francis'), rdflib.term.URIRef('http://example.org/choose'), rdflib.term.URIRef('http://example.org/March_2013'))
(rdflib.term.URIRef('http://example.org/Christians'), rdflib.term.URIRef('http://example.org/move'), rdflib.term.URIRef('http://example.org/Gaza'))
(rdflib.term.URIRef('http://example.org/John_Paul_II'), rdflib.term.URIRef('http://example.org/appoint'), rdflib.term.URIRef('http://example.org/2001'))
(rdflib.term.URIRef('http://example.org/the_People_of_God'), rdflib.term.URIRef('http://example.org/attend'), rdflib.term.URIRef('http://example.org/Monday'))
(rdflib.term.URIRef('http://example.org/Martin_Pendergast'), rdflib.term.URIRef('http://example.org/tell'), rdflib.term.URIRef('http://example.org/London'))
(rdflib.term.URIRef('http://example.org/Francis'), rdflib.term.URIRef('http://example.org/meet'), rdflib.term.URIRef('http://example.org/withLee_Yong_soo'))
(rdflib.term.URIRef('http://example.org/John_Paul_II'), rdflib.term.URIRef('http://example.org/die'), rdflib.term.URIRef('http://example.org/Easter'))
(rdflib.term.URIRef('http://example.org/Francis'), rdflib.term.URIRef('http://example.org/write'), rdflib.term.URIRef('http://example.org/Santa_Maria_Maggiore'))
(rdflib.term.URIRef('http://example.org/Kevin_Farrell'), rdflib.term.URIRef('http://example.org/announce'), rdflib.term.URIRef('http://example.org/Monday'))
(rdflib.term.URIRef('http://example.org/the_College_of_Cardinals'), rdflib.term.URIRef('http://example.org/lock'), rdflib.term.URIRef('http://example.org/Vatican'))
(rdflib.term.URIRef('http://example.org/Francis'), rdflib.term.URIRef('http://example.org/meet'), rdflib.term.URIRef('http://example.org/a_Canonization_Ceremony'))
(rdflib.term.URIRef('http://example.org/Donald_Trump'), rdflib.term.URIRef('http://example.org/praise'), rdflib.term.URIRef('http://example.org/2013'))
(rdflib.term.URIRef('http://example.org/Arthur'), rdflib.term.URIRef('http://example.org/encounter'), rdflib.term.URIRef('http://example.org/The_Black_Knight'))
(rdflib.term.URIRef('http://example.org/115'), rdflib.term.URIRef('http://example.org/vote'), rdflib.term.URIRef('http://example.org/April_2005at_age_84'))
(rdflib.term.URIRef('http://example.org/Donald_Trump'), rdflib.term.URIRef('http://example.org/clash'), rdflib.term.URIRef('http://example.org/recent_years'))
(rdflib.term.URIRef('http://example.org/Bianca_Lott'), rdflib.term.URIRef('http://example.org/study'), rdflib.term.URIRef('http://example.org/her_spring_semester'))
(rdflib.term.URIRef('http://example.org/Joseph_Ratzinger'), rdflib.term.URIRef('http://example.org/announce'), rdflib.term.URIRef('http://example.org/April_8'))
(rdflib.term.URIRef('http://example.org/Francis'), rdflib.term.URIRef('http://example.org/bear'), rdflib.term.URIRef('http://example.org/Jorge_Bergoglio'))
(rdflib.term.URIRef('http://example.org/Francis'), rdflib.term.URIRef('http://example.org/die'), rdflib.term.URIRef('http://example.org/the_age_of_88_He'))
(rdflib.term.URIRef('http://example.org/Kevin_Farrell'), rdflib.term.URIRef('http://example.org/bear'), rdflib.term.URIRef('http://example.org/Dublin'))
(rdflib.term.URIRef('http://example.org/Kevin_Farrell'), rdflib.term.URIRef('http://example.org/criticize'), rdflib.term.URIRef('http://example.org/2018'))
(rdflib.term.URIRef('http://example.org/Francis'), rdflib.term.URIRef('http://example.org/write'), rdflib.term.URIRef('http://example.org/his2025'))
(rdflib.term.URIRef('http://example.org/Bry_Jensen'), rdflib.term.URIRef('http://example.org/tell'), rdflib.term.URIRef('http://example.org/NPR'))
(rdflib.term.URIRef('http://example.org/Stephen_P_Newton'), rdflib.term.URIRef('http://example.org/reflect'), rdflib.term.URIRef('http://example.org/NPR'))
(rdflib.term.URIRef('http://example.org/Francis'), rdflib.term.URIRef('http://example.org/die'), rdflib.term.URIRef('http://example.org/88'))
(rdflib.term.URIRef('http://example.org/Giovanni_Battistaannounced'), rdflib.term.URIRef('http://example.org/battistaannounce'), rdflib.term.URIRef('http://example.org/tens_of_thousands'))
Articles mentioning 'Trump':
(rdflib.term.URIRef('https://www.npr.org/article/article_4'), rdflib.term.Literal('People walk past a screen showing the CSI 300 Index at a shopping mall in Guangzhou, in southern China\'s Guangdong province.Jade Gao/AFP via Getty Imageshide caption\nAs the Trump administration negotiates trade deals with other countries, China has issued a warning against any agreements that harm its interests.\nChina\'s commerce ministry says it respects the efforts of others to try to resolve trade disputes with the U.S. through consultation. But it warns that it will take "corresponding countermeasures" if any deals are struck at the expense of China\'s interests. It did not give details.\nThe comments come after reports that Trump is hoping touse tariff negotiations with other countriesto isolate China. At the same time, Trump has said he wants to do a deal with Beijing. This month he raised the base tariff on Chinese imports to a dizzying 145%. China responded in kind, with high tariffs on U.S. goods.\nThe Chinese commerce ministry says seeking tariff exemptions by harming the interests of others will only lead to failure on both sides and ultimately hurt everyone.', datatype=rdflib.term.URIRef('http://www.w3.org/2001/XMLSchema#string')))
(rdflib.term.URIRef('https://www.npr.org/article/article_6'), rdflib.term.Literal('Pope Francis exchanges gifts with US President Donald Trump (C) and US First Lady Melania Trump during a private audience at the Vatican on May 24, 2017. US President Donald Trump met Pope Francis at the Vatican today in a keenly-anticipated first face-to-face encounter between two world leaders who have clashed repeatedly on several issues.Alessandra Tarantino/AFP via Getty Imageshide caption\nPresident Trump has acknowledged the pope\'s death in aone-line poston Truth Social, writing: "Rest in Peace Pope Francis! May God Bless him and all who loved him!"\nTrump and Francis clashed repeatedly in recent years.\nTrump praised the pope at the start of Francis\'s papacy, in 2013, several years before Trump reached the White House.\n"The new Pope is a humble man, very much like me, which probably explains why I like him so much!"Trump tweetedin December of that year, several months after Francis became pope.\nThings soured soon after. During the 2016 election, Francisroundly criticizedTrump\'s campaign proposal to build a wall on the U.S.-Mexico border.\n"A person who thinks only about building walls, wherever they may be, and not building bridges, is not Christian," Francis said at the time.\nTrump — who aggressively courted evangelical Christian leaders and voters during his campaign — fired back immediately, saying, "for a religious leader to question a person\'s faith is disgraceful."\n"If and when the Vatican is attacked by ISIS, which as everyone knows is ISIS\'s ultimate trophy, I can promise you that the Pope would have only wished and prayed that Donald Trump would have been President because this would not have happened," he added.\nTrumpmet the Popeduring a 2017 trip to the Vatican, later telling reporters: "He is something. We had a fantastic meeting." 
A photo from the visit, in which Trump is smiling next to a glum-looking Francis, quickly went viral.\nLook at their faces.pic.twitter.com/0t84cBX8bZ\nNearly a decade later, amidst the second Trump administration\'s crackdown on immigration, the pope once again made a rare public rebuke of the president\'s policies.\nIn apublic letterto U.S. Catholic bishops, February, Francis described the program of mass deportations as a "major crisis."\nHe said while nations have the right to defend themselves, "the rightly formed conscience cannot fail to make a critical judgment and express its disagreement with any measure that tacitly or explicitly identifies the illegal status of some migrants with criminality."\n"The act of deporting people who in many cases have left their own land for reasons of extreme poverty, insecurity, exploitation, persecution or serious deterioration of the environment, damages the dignity of many men and women, and of entire families, and places them in a state of particular vulnerability and defenselessness," Francis wrote.\nThe letter also appeared to respond to widely-criticized comments that Vice President Vance, who is Catholic, had made weeks earlier. Vance said people should care for their family, communities and country before caring for others — and Francis disagreed.\n"Christian love is not a concentric expansion of interests that little by little extend to other persons and groups," the pope wrote.', datatype=rdflib.term.URIRef('http://www.w3.org/2001/XMLSchema#string')))
(rdflib.term.URIRef('https://www.npr.org/article/article_9'), rdflib.term.Literal('Good morning. You\'re reading the Up First newsletter.Subscribehere to get it delivered to your inbox, andlistento the Up First podcast for all the news you need to start your day.\nPope Francis died on Easter Monday at the age of 88.He was the first non-European head of the Roman Catholic Church in over a millennium and was one of themost popular popes in decades. He was elected to his exalted post in 2013 and cast an image of humility during years of strain and change, within his church and worldwide.\nPope Francis waves to thousands of followers as he arrives at the Philippines\' Manila Cathedral on Jan. 16, 2015. During his papacy, Francis strove to reach out to what he called the "periphery" of the world in Asia, Africa and Latin America.Lisa Maree Williams/Getty Imageshide caption\nA big draw to Pope Francis was his personal story.He was the son of immigrants and grew up in Argentina, where he lived through turbulent times. Francis, born Jorge Mario Bergoglio, was the first pope from Latin America.\nFour House Democrats were scheduled to land in El Salvador today to demand the release and return of Kilmar Abrego Garcia, a Salvadorian citizen who lived in Maryland. He was deported to a Salvadoran prison due to an "administrative error," according to the Trump administration. The lawmakerssaid in a statementthey hope "to pressure" the White House "to abide by a Supreme Court order."\nThe State Department seal is seen on the briefing room lectern at the State Department in Washington, D.C., on Jan. 31, 2022.Mandel Ngan/APhide caption\nThe State Department seal is seen on the briefing room lectern at the State Department in Washington, D.C., on Jan. 
31, 2022.\nNPR has learned that the Trump administration is substantiallyscaling back the State Department\'s annual reportson international human rights to remove critiques of abuses such as harsh prison conditions and government corruption. These reports are intended to guide congressional foreign aid and security assistance decisions. Moving forward, the State Department will no longer call out governments for restricting freedom of movement and peaceful assembly. Additionally, the reports will not condemn the detention of political prisoners without due process or the limitations placed on free and fair elections.\nnasal sprayDDurrich/iStockphoto/Getty Imageshide caption\nLiving Better is aspecial seriesabout what it takes to stay healthy in America.\nIt is not just in your head; seasonal allergies are getting worse every year. This is due to the warming from climate change, making the pollen season longer. Luckily, there are ways to keep the pollen from taking over your life. Here aretips from doctorson how to get relief:\nKing Arthur (Graham Chapman) and his servant Patsy (Terry Gilliam) encounter The Black Knight (John Cleese)Monty Python and the Holy Grail/Fathom Entertainmenthide caption\nThis newsletter was edited byYvonne Dennis.', datatype=rdflib.term.URIRef('http://www.w3.org/2001/XMLSchema#string')))
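The keyword query splices keyword into the SPARQL text with an f-string, so a keyword containing a double quote or a backslash would yield an invalid or altered query. A minimal sketch, assuming the keyword is meant as a plain string literal: escape it with the SPARQL 1.1 string escapes before interpolation. sparql_escape is a hypothetical helper, not part of rdflib, and only the FILTER line of the query is rebuilt here.

```python
def sparql_escape(text: str) -> str:
    """Escape text for use inside a double-quoted SPARQL string literal."""
    replacements = {
        "\\": "\\\\",  # backslash
        '"': '\\"',    # double quote
        "\n": "\\n",   # newline
        "\r": "\\r",   # carriage return
        "\t": "\\t",   # tab
    }
    # Translating character by character avoids escaping the same backslash twice
    return "".join(replacements.get(ch, ch) for ch in text)


keyword = 'Pope "Francis"'
query_text = f'FILTER(CONTAINS(LCASE(STR(?text)), LCASE("{sparql_escape(keyword)}")))'
print(query_text)
```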
Another query retrieves the title and publication date of every article:
query_titles_dates = prepareQuery("""
SELECT ?article ?title ?date WHERE {
?article a <http://example.org/Article> ;
<http://example.org/title> ?title ;
<http://example.org/date> ?date .
}
""")
print("\nArticles with their titles and dates:")
for result in g.query(query_titles_dates):
    print(result)
Articles with their titles and dates:
(rdflib.term.URIRef('https://www.npr.org/article/article_1'), rdflib.term.Literal('Do you have memories of Pope Francis to share? Send them our way', datatype=rdflib.term.URIRef('http://www.w3.org/2001/XMLSchema#string')), rdflib.term.Literal('2025-04-21T14:25:05-04:00', datatype=rdflib.term.URIRef('http://www.w3.org/2001/XMLSchema#string')))
(rdflib.term.URIRef('https://www.npr.org/article/article_2'), rdflib.term.Literal('Pope Francis is remembered around the world for his generosity of spirit', datatype=rdflib.term.URIRef('http://www.w3.org/2001/XMLSchema#string')), rdflib.term.Literal('2025-04-21T14:05:36-04:00', datatype=rdflib.term.URIRef('http://www.w3.org/2001/XMLSchema#string')))
(rdflib.term.URIRef('https://www.npr.org/article/article_3'), rdflib.term.Literal('What happens next after a pope dies, according to recent history', datatype=rdflib.term.URIRef('http://www.w3.org/2001/XMLSchema#string')), rdflib.term.Literal('2025-04-21T13:14:58-04:00', datatype=rdflib.term.URIRef('http://www.w3.org/2001/XMLSchema#string')))
(rdflib.term.URIRef('https://www.npr.org/article/article_4'), rdflib.term.Literal("China warns of 'countermeasures' against any deals that harm its interests", datatype=rdflib.term.URIRef('http://www.w3.org/2001/XMLSchema#string')), rdflib.term.Literal('2025-04-21T11:08:30-04:00', datatype=rdflib.term.URIRef('http://www.w3.org/2001/XMLSchema#string')))
(rdflib.term.URIRef('https://www.npr.org/article/article_5'), rdflib.term.Literal('Who is Cardinal Kevin Farrell, the acting head of the Vatican?', datatype=rdflib.term.URIRef('http://www.w3.org/2001/XMLSchema#string')), rdflib.term.Literal('2025-04-21T10:34:12-04:00', datatype=rdflib.term.URIRef('http://www.w3.org/2001/XMLSchema#string')))
(rdflib.term.URIRef('https://www.npr.org/article/article_6'), rdflib.term.Literal("A brief history of Trump's feud with Pope Francis", datatype=rdflib.term.URIRef('http://www.w3.org/2001/XMLSchema#string')), rdflib.term.Literal('2025-04-21T09:13:28-04:00', datatype=rdflib.term.URIRef('http://www.w3.org/2001/XMLSchema#string')))
(rdflib.term.URIRef('https://www.npr.org/article/article_7'), rdflib.term.Literal("Argentina's president, a former critic of Pope Francis, offers his condolences", datatype=rdflib.term.URIRef('http://www.w3.org/2001/XMLSchema#string')), rdflib.term.Literal('2025-04-21T08:32:37-04:00', datatype=rdflib.term.URIRef('http://www.w3.org/2001/XMLSchema#string')))
(rdflib.term.URIRef('https://www.npr.org/article/article_8'), rdflib.term.Literal('Leaders in Africa mourn the passing of Pope Francis', datatype=rdflib.term.URIRef('http://www.w3.org/2001/XMLSchema#string')), rdflib.term.Literal('2025-04-21T08:07:03-04:00', datatype=rdflib.term.URIRef('http://www.w3.org/2001/XMLSchema#string')))
(rdflib.term.URIRef('https://www.npr.org/article/article_9'), rdflib.term.Literal("Pope Francis dies at 88. And, House Democrats press for Abrego Garcia's return", datatype=rdflib.term.URIRef('http://www.w3.org/2001/XMLSchema#string')), rdflib.term.Literal('2025-04-21T07:19:41-04:00', datatype=rdflib.term.URIRef('http://www.w3.org/2001/XMLSchema#string')))
(rdflib.term.URIRef('https://www.npr.org/article/article_10'), rdflib.term.Literal("Who will be the next pope? Here's how the conclave works", datatype=rdflib.term.URIRef('http://www.w3.org/2001/XMLSchema#string')), rdflib.term.Literal('2025-04-21T06:38:53-04:00', datatype=rdflib.term.URIRef('http://www.w3.org/2001/XMLSchema#string')))
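The dates come back as xsd:string literals (see the datatype in the output above) rather than xsd:dateTime, so any ordering or comparison is done on plain strings. The sketch below parses them into timezone-aware datetime objects with the standard library before sorting. It uses a few (title, date) pairs copied from that output, and it is an illustration rather than part of the original pipeline:

```python
from datetime import datetime

# A few (title, date) pairs copied from the query output above
rows = [
    ("What happens next after a pope dies, according to recent history", "2025-04-21T13:14:58-04:00"),
    ("Who will be the next pope? Here's how the conclave works", "2025-04-21T06:38:53-04:00"),
    ("Leaders in Africa mourn the passing of Pope Francis", "2025-04-21T08:07:03-04:00"),
]

# fromisoformat understands the "-04:00" UTC offset (Python 3.7+)
parsed = [(title, datetime.fromisoformat(date)) for title, date in rows]

# Sort chronologically, oldest first
parsed.sort(key=lambda pair: pair[1])

for title, when in parsed:
    print(when.isoformat(), title)

# Time span covered by these articles
span = parsed[-1][1] - parsed[0][1]
print("Span:", span)
```

For ISO-8601 strings that share the same UTC offset, string order happens to match chronological order. Parsing becomes necessary once offsets differ or date arithmetic is needed.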
The next query lists every distinct type used in the graph:
query_types = prepareQuery("""
SELECT DISTINCT ?type WHERE {
?entity a ?type .
}
""")
print("\nAll types found in the graph:")
for result in g.query(query_types):
    print(result)
All types found in the graph:
(rdflib.term.URIRef('http://example.org/Article'),)
(rdflib.term.URIRef('http://example.org/ORG'),)
(rdflib.term.URIRef('http://example.org/PER'),)
(rdflib.term.URIRef('http://example.org/LOC'),)
(rdflib.term.URIRef('http://example.org/MISC'),)
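Beyond listing the distinct types, it is often useful to count how many entities fall under each one. In SPARQL this is a GROUP BY with COUNT. The standard-library sketch below performs the same aggregation over plain (subject, predicate, object) string tuples. The triples are an illustrative subset based on the sample triples printed later in this notebook, not the full graph:

```python
from collections import Counter

RDF_TYPE = "http://www.w3.org/1999/02/22-rdf-syntax-ns#type"

# Illustrative subset of typed triples (subject, predicate, object)
triples = [
    ("http://example.org/John_Paul_II", RDF_TYPE, "http://example.org/PER"),
    ("http://example.org/Stephen_P_Newton", RDF_TYPE, "http://example.org/PER"),
    ("http://example.org/Iraq", RDF_TYPE, "http://example.org/LOC"),
    ("http://example.org/Vatican_Media", RDF_TYPE, "http://example.org/ORG"),
    ("https://www.npr.org/article/article_6", RDF_TYPE, "http://example.org/Article"),
]

# Equivalent SPARQL, for reference:
#   SELECT ?type (COUNT(DISTINCT ?entity) AS ?n)
#   WHERE { ?entity a ?type . } GROUP BY ?type
# set() removes duplicate triples, mirroring COUNT(DISTINCT ?entity)
counts = Counter(o.rsplit("/", 1)[-1] for s, p, o in set(triples) if p == RDF_TYPE)

for type_name, n in counts.most_common():
    print(f"{type_name}: {n}")
```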
In the following function (spotlight_link), we send text to the DBpedia Spotlight API to annotate named entities and link them to their corresponding DBpedia URIs.
It handles both successful and failed requests, returning an empty list when the request fails or when no entity is found.
import requests

def spotlight_link(text, confidence=0.5, support=20):
    """Annotate `text` with DBpedia Spotlight; return (surface form, DBpedia URI) pairs."""
    url = "https://api.dbpedia-spotlight.org/en/annotate"
    headers = {"Accept": "application/json"}
    params = {
        "text": text,
        "confidence": confidence,
        "support": support
    }
    try:
        # verify=False skips SSL certificate checks (insecure; triggers the InsecureRequestWarning below)
        response = requests.get(url, headers=headers, params=params, verify=False, timeout=30)
    except requests.exceptions.RequestException as e:
        # Covers SSL errors, connection failures and timeouts
        print("Request error:", e)
        return []
    if response.status_code != 200:
        print("HTTP error:", response.status_code)
        return []
    data = response.json()
    # "Resources" is absent when no entity passes the confidence/support thresholds
    return [(res["@surfaceForm"], res["@URI"]) for res in data.get("Resources", [])]
Example usage:
text = "Pope Francis met with Elon Musk and Donald Trump in Vatican City."
linked_entities = spotlight_link(text)
print("Linked Entities:")
for label, uri in linked_entities:
    print(f"{label} -> {uri}")
Linked Entities: Pope Francis -> http://dbpedia.org/resource/Pope_Francis Elon Musk -> http://dbpedia.org/resource/Elon_Musk Donald Trump -> http://dbpedia.org/resource/Donald_Trump Vatican City -> http://dbpedia.org/resource/Vatican_City
C:\Users\Nejjari\anaconda3\Anaconda\Lib\site-packages\urllib3\connectionpool.py:1056: InsecureRequestWarning: Unverified HTTPS request is being made to host 'api.dbpedia-spotlight.org'. Adding certificate verification is strongly advised. See: https://urllib3.readthedocs.io/en/1.26.x/advanced-usage.html#ssl-warnings warnings.warn(
Linking entities to DBpedia URIs in this way enables better integration with external knowledge bases.
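A common way to store these links is to add owl:sameAs triples that connect each local entity URI to its DBpedia counterpart. The sketch below assumes a hypothetical naming rule: local entities are named http://example.org/ plus the surface form, with spaces replaced by underscores, as sample URIs such as http://example.org/John_Paul_II suggest. It works on the (surface form, URI) pairs that spotlight_link returns, so no network access is needed:

```python
OWL_SAME_AS = "http://www.w3.org/2002/07/owl#sameAs"
LOCAL_NS = "http://example.org/"  # assumed namespace for local entities

def local_uri(surface_form):
    """Hypothetical naming rule: spaces become underscores."""
    return LOCAL_NS + surface_form.replace(" ", "_")

def same_as_triples(linked_entities):
    """Turn (surface form, DBpedia URI) pairs into owl:sameAs triples, skipping duplicates."""
    seen = set()
    result = []
    for surface, dbpedia_uri in linked_entities:
        triple = (local_uri(surface), OWL_SAME_AS, dbpedia_uri)
        if triple not in seen:
            seen.add(triple)
            result.append(triple)
    return result

# Pairs as printed in the example output above, plus a repeated mention
linked = [
    ("Pope Francis", "http://dbpedia.org/resource/Pope_Francis"),
    ("Elon Musk", "http://dbpedia.org/resource/Elon_Musk"),
    ("Donald Trump", "http://dbpedia.org/resource/Donald_Trump"),
    ("Vatican City", "http://dbpedia.org/resource/Vatican_City"),
    ("Pope Francis", "http://dbpedia.org/resource/Pope_Francis"),
]

for s, p, o in same_as_triples(linked):
    print(s, "owl:sameAs", o)
```

With rdflib, each tuple could then be added to the graph as g.add((URIRef(s), OWL.sameAs, URIRef(o))), using OWL from rdflib.namespace.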
To visually explore the RDF graph structure, we implemented a custom visualization function using NetworkX and Matplotlib, which colors nodes by type and displays clean labels.
import random
import matplotlib.pyplot as plt
import networkx as nx

def clean_label(uri):
    """Clean URI to extract a readable label."""
    if isinstance(uri, str):
        if 'rdf-syntax-ns#type' in uri:
            return 'type'
        return uri.split('/')[-1]  # Get last part of URI
    return str(uri)

def get_node_type(triples, node):
    """Return the rdf:type of a node if available."""
    for s, p, o in triples:
        if str(s) == node and 'rdf-syntax-ns#type' in str(p):
            return str(o).split('/')[-1]
    return None

def visualize_clean_rdf_graph(g, max_edges=50):
    G = nx.DiGraph()
    triples = list(g)
    # Filter out long literals (e.g. article content)
    filtered_triples = [
        (s, p, o) for s, p, o in triples
        if not (isinstance(o, str) and len(o) > 80)
    ]
    # Random subset of edges to keep the plot readable
    sampled_triples = random.sample(filtered_triples, min(len(filtered_triples), max_edges))
    # Map each typed node to its type name (PER, LOC, ...)
    node_types = {}
    for s, p, o in triples:
        if 'rdf-syntax-ns#type' in str(p):
            node_types[str(s)] = str(o).split('/')[-1]
    for s, p, o in sampled_triples:
        G.add_node(str(s))
        G.add_node(str(o))
        G.add_edge(str(s), str(o), label=clean_label(p))
    edge_labels = nx.get_edge_attributes(G, 'label')
    pos = nx.spring_layout(G, k=0.5, iterations=25)
    # Assign node colors based on type
    node_colors = []
    color_map = {
        "PER": "lightcoral",
        "LOC": "skyblue",
        "ORG": "orange",
        "Article": "yellowgreen",
        "MISC": "plum"
    }
    for node in G.nodes():
        node_type = node_types.get(node, None)
        color = color_map.get(node_type, 'lightgray')
        node_colors.append(color)
    # Clean node labels for display
    labels = {node: clean_label(node) for node in G.nodes()}
    plt.figure(figsize=(16, 12))
    nx.draw(G, pos, labels=labels, node_color=node_colors, node_size=2500, font_size=9, arrows=True)
    nx.draw_networkx_edge_labels(G, pos, edge_labels=edge_labels, font_color='green', font_size=8)
    plt.title("RDF Graph View (Grouped by Type, Clean Labels)")
    plt.axis("off")
    plt.show()

visualize_clean_rdf_graph(g, max_edges=60)
The RDF knowledge graph above visualizes the relationships between entities extracted from the collection of NPR news articles, as stated previously.
Each node represents a named entity (person, location, organization, or miscellaneous) or an article. Edges indicate semantic relationships such as type classification, publication date, or involvement in events.
Nodes are color-coded by entity type: persons (red), locations (blue), organizations (orange), articles (green), and miscellaneous entities (purple). Untyped nodes, such as literal values and the type classes themselves, are shown in gray.
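visualize_clean_rdf_graph calls random.sample without a seed, and spring_layout is also unseeded, so the drawn subgraph differs on every run. The standard-library sketch below covers the two steps that decide what gets drawn: a reproducible sample taken through a private random.Random(seed) instance, and the type-to-color lookup used above. The toy triples are illustrative. For a stable layout as well, networkx.spring_layout accepts a seed argument.

```python
import random

RDF_TYPE = "http://www.w3.org/1999/02/22-rdf-syntax-ns#type"
COLOR_MAP = {"PER": "lightcoral", "LOC": "skyblue", "ORG": "orange",
             "Article": "yellowgreen", "MISC": "plum"}

def sample_triples(triples, max_edges, seed=42):
    """Reproducible sample: a private Random instance leaves the global RNG untouched."""
    rng = random.Random(seed)
    return rng.sample(triples, min(len(triples), max_edges))

def node_colors(triples):
    """Map every node to a color; nodes without an rdf:type fall back to light gray."""
    node_types = {s: o.rsplit("/", 1)[-1] for s, p, o in triples if p == RDF_TYPE}
    nodes = {s for s, _, _ in triples} | {o for _, _, o in triples}
    return {n: COLOR_MAP.get(node_types.get(n), "lightgray") for n in nodes}

toy = [
    ("http://example.org/Iraq", RDF_TYPE, "http://example.org/LOC"),
    ("http://example.org/Vatican_Media", RDF_TYPE, "http://example.org/ORG"),
    ("https://www.npr.org/article/article_1", "http://example.org/date", "2025-04-21T14:25:05-04:00"),
]

first = sample_triples(toy, 2)
second = sample_triples(toy, 2)
print(first == second)  # same seed, same sample
print(node_colors(toy)["http://example.org/Iraq"])
```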
!pip install pykeen torch
First, we extract the RDF triples from the constructed graph g into a list of labeled (subject, predicate, object) string triples to prepare them for embedding.
triples = [(str(s), str(p), str(o)) for s, p, o in g]
We convert the list of RDF triples into a NumPy array, build a PyKEEN TriplesFactory from the labeled triples, and print the total number of triples along with sample entries for verification.
import numpy as np
from pykeen.triples import TriplesFactory
triples = [(str(s), str(p), str(o)) for s, p, o in g]
triples_array = np.array(triples, dtype=str)
# Create PyKEEN TriplesFactory
tf = TriplesFactory.from_labeled_triples(triples_array)
# printing to confirm
print(f"Total triples: {len(triples_array)}")
print("Sample triples:", triples_array[:5])
Total triples: 367 Sample triples: [['http://example.org/John_Paul_II' 'http://www.w3.org/1999/02/22-rdf-syntax-ns#type' 'http://example.org/PER'] ['https://www.npr.org/article/article_6' 'http://example.org/content' 'Pope Francis exchanges gifts with US President Donald Trump (C) and US First Lady Melania Trump during a private audience at the Vatican on May 24, 2017. US President Donald Trump met Pope Francis at the Vatican today in a keenly-anticipated first face-to-face encounter between two world leaders who have clashed repeatedly on several issues.Alessandra Tarantino/AFP via Getty Imageshide caption\nPresident Trump has acknowledged the pope\'s death in aone-line poston Truth Social, writing: "Rest in Peace Pope Francis! May God Bless him and all who loved him!"\nTrump and Francis clashed repeatedly in recent years.\nTrump praised the pope at the start of Francis\'s papacy, in 2013, several years before Trump reached the White House.\n"The new Pope is a humble man, very much like me, which probably explains why I like him so much!"Trump tweetedin December of that year, several months after Francis became pope.\nThings soured soon after. During the 2016 election, Francisroundly criticizedTrump\'s campaign proposal to build a wall on the U.S.-Mexico border.\n"A person who thinks only about building walls, wherever they may be, and not building bridges, is not Christian," Francis said at the time.\nTrump — who aggressively courted evangelical Christian leaders and voters during his campaign — fired back immediately, saying, "for a religious leader to question a person\'s faith is disgraceful."\n"If and when the Vatican is attacked by ISIS, which as everyone knows is ISIS\'s ultimate trophy, I can promise you that the Pope would have only wished and prayed that Donald Trump would have been President because this would not have happened," he added.\nTrumpmet the Popeduring a 2017 trip to the Vatican, later telling reporters: "He is something. 
We had a fantastic meeting." A photo from the visit, in which Trump is smiling next to a glum-looking Francis, quickly went viral.\nLook at their faces.pic.twitter.com/0t84cBX8bZ\nNearly a decade later, amidst the second Trump administration\'s crackdown on immigration, the pope once again made a rare public rebuke of the president\'s policies.\nIn apublic letterto U.S. Catholic bishops, February, Francis described the program of mass deportations as a "major crisis."\nHe said while nations have the right to defend themselves, "the rightly formed conscience cannot fail to make a critical judgment and express its disagreement with any measure that tacitly or explicitly identifies the illegal status of some migrants with criminality."\n"The act of deporting people who in many cases have left their own land for reasons of extreme poverty, insecurity, exploitation, persecution or serious deterioration of the environment, damages the dignity of many men and women, and of entire families, and places them in a state of particular vulnerability and defenselessness," Francis wrote.\nThe letter also appeared to respond to widely-criticized comments that Vice President Vance, who is Catholic, had made weeks earlier. Vance said people should care for their family, communities and country before caring for others — and Francis disagreed.\n"Christian love is not a concentric expansion of interests that little by little extend to other persons and groups," the pope wrote.'] ['http://example.org/Stephen_P_Newton' 'http://www.w3.org/1999/02/22-rdf-syntax-ns#type' 'http://example.org/PER'] ['http://example.org/Iraq' 'http://www.w3.org/1999/02/22-rdf-syntax-ns#type' 'http://example.org/LOC'] ['http://example.org/Vatican_Media' 'http://www.w3.org/1999/02/22-rdf-syntax-ns#type' 'http://example.org/ORG']]
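The second sample triple shows the full body of an article as the object of a content triple. PyKEEN therefore treats that whole text as an entity that appears in a single triple, which leaves almost nothing to learn from. One option is to keep only triples whose object is a resource before building the TriplesFactory; this is an assumption, not what this notebook does. The standard-library sketch below simplifies the URI test to an http(s) prefix. With rdflib, one would instead check isinstance(o, URIRef).

```python
def is_uri(value):
    """Crude URI check (assumption): http/https strings are resources, everything else is a literal."""
    return value.startswith(("http://", "https://"))

def entity_triples(triples):
    """Keep only triples whose object is a resource, dropping literal-valued ones (titles, dates, content)."""
    return [(s, p, o) for s, p, o in triples if is_uri(o)]

sample = [
    ("http://example.org/John_Paul_II", "http://www.w3.org/1999/02/22-rdf-syntax-ns#type", "http://example.org/PER"),
    # article body shortened here for the example
    ("https://www.npr.org/article/article_6", "http://example.org/content", "Pope Francis exchanges gifts with US President Donald Trump ..."),
    ("http://example.org/Iraq", "http://www.w3.org/1999/02/22-rdf-syntax-ns#type", "http://example.org/LOC"),
]

kept = entity_triples(sample)
print(f"kept {len(kept)} of {len(sample)} triples")
```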
We then move on to splitting the data.
We split the full RDF triple set into 80% training, 10% validation, and 10% testing subsets using train_test_split with a fixed random seed for reproducibility. We then create a TriplesFactory for each subset and display the number of triples in each split.
from pykeen.triples import TriplesFactory
from sklearn.model_selection import train_test_split
# converting RDF triples to list of strings
triples_list = [(str(s), str(p), str(o)) for s, p, o in g]
triples_array = np.array(triples_list)
# First split: 80% train, 20% temp (val + test)
train_triples, temp_triples = train_test_split(
triples_array, test_size=0.2, random_state=42
)
# Second split: 50/50 of remaining → 10% val, 10% test
val_triples, test_triples = train_test_split(
temp_triples, test_size=0.5, random_state=42
)
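# Create TriplesFactories; validation and test reuse the training ID mappings
# so that the same entity gets the same ID in every split
# (triples with entities or relations unseen in training are dropped by PyKEEN)
training = TriplesFactory.from_labeled_triples(train_triples)
validation = TriplesFactory.from_labeled_triples(
    val_triples,
    entity_to_id=training.entity_to_id,
    relation_to_id=training.relation_to_id,
)
testing = TriplesFactory.from_labeled_triples(
    test_triples,
    entity_to_id=training.entity_to_id,
    relation_to_id=training.relation_to_id,
)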
# Check the counts
print(f"Training: {training.num_triples}")
print(f"Validation: {validation.num_triples}")
print(f"Testing: {testing.num_triples}")
Training: 293 Validation: 37 Testing: 37
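The raw split sizes follow from how scikit-learn handles a fractional test_size: the test part gets ceil(test_size × n) samples, and the training part gets the rest. The short check below reproduces 293 / 37 / 37 for the 367 triples. It also reproduces the number of batches per epoch shown later in the training progress bars: 3 batches at batch_size=128 and 10 at batch_size=32.

```python
import math

def split_sizes(n, test_size):
    """Mimic train_test_split's sizes for a float test_size: test = ceil(test_size * n), train = rest."""
    n_test = math.ceil(test_size * n)
    return n - n_test, n_test

n_total = 367
n_train, n_temp = split_sizes(n_total, 0.2)  # 80% / 20%
n_val, n_test = split_sizes(n_temp, 0.5)     # the 20% halved into validation and test
print(n_train, n_val, n_test)

# Batches per epoch = ceil(training triples / batch size)
for batch_size in (128, 32):
    print(batch_size, math.ceil(n_train / batch_size))
```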
We then tried a cleaner approach that groups all training-related imports and prepares everything needed to train a TransE model: imports, RDF-to-array conversion, manual triple splitting, and creation of the train/validation/test factories.
import numpy as np
from pykeen.triples import TriplesFactory
from pykeen.models import TransE
from pykeen.training import SLCWATrainingLoop
from pykeen.evaluation import RankBasedEvaluator
from sklearn.model_selection import train_test_split
# Assumes the RDFLib graph `g` built earlier
# Convert RDF graph to PyKEEN triples
triples = [(str(s), str(p), str(o)) for s, p, o in g]
triples_array = np.array(triples, dtype=str)
# Build TriplesFactory over the full graph
tf = TriplesFactory.from_labeled_triples(triples_array)
# Manual split: 80% train, 10% validation, 10% test
train_triples, temp = train_test_split(triples_array, test_size=0.2, random_state=42)
valid_triples, test_triples = train_test_split(temp, test_size=0.5, random_state=42)
training = TriplesFactory.from_labeled_triples(train_triples)
# Reuse the training ID mappings so entity/relation IDs agree across splits
validation = TriplesFactory.from_labeled_triples(
    valid_triples, entity_to_id=training.entity_to_id, relation_to_id=training.relation_to_id
)
testing = TriplesFactory.from_labeled_triples(
    test_triples, entity_to_id=training.entity_to_id, relation_to_id=training.relation_to_id
)
model = TransE(triples_factory=training)
No random seed is specified. This may lead to non-reproducible results.
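Embedding models store one vector per integer ID, so the training, validation, and test factories must agree on which ID belongs to which entity. TriplesFactory.from_labeled_triples builds a fresh mapping unless entity_to_id and relation_to_id are passed in. The standard-library sketch below shows the difference by imitating PyKEEN's default of numbering the sorted labels. It uses toy labels and keeps the relation as a plain string for brevity.

```python
def build_mapping(triples):
    """Assign consecutive IDs to entity labels in sorted order (imitating PyKEEN's default mapping)."""
    labels = sorted({s for s, _, _ in triples} | {o for _, _, o in triples})
    return {label: i for i, label in enumerate(labels)}

def map_triples(triples, entity_to_id):
    """Map entity labels to IDs, dropping triples whose entities are missing from `entity_to_id`."""
    return [(entity_to_id[s], p, entity_to_id[o]) for s, p, o in triples
            if s in entity_to_id and o in entity_to_id]

train = [("Iraq", "type", "LOC"), ("John_Paul_II", "type", "PER"), ("Vatican_Media", "type", "ORG")]
test = [("Vatican_Media", "type", "ORG")]

train_ids = build_mapping(train)
test_ids_own = build_mapping(test)  # independent mapping: the source of the mismatch

print(train_ids["Vatican_Media"], test_ids_own["Vatican_Media"])  # same entity, different IDs
print(map_triples(test, train_ids))  # shared mapping: IDs match the training side
```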
The following code defines and runs the training loop for the TransE model using the SLCWATrainingLoop for 100 epochs.
# Instantiate training loop with triples_factory
training_loop = SLCWATrainingLoop(
model=model,
triples_factory=training # required at init
)
# Run training (triples_factory required again here)
training_loop_result = training_loop.train(
triples_factory=training, # also needed here
num_epochs=100,
batch_size=128
)
Training epochs on cpu: 0%| | 0/100 [00:00<?, ?epoch/s]
Training batches on cpu: 0%| | 0/3 [00:00<?, ?batch/s]
We then evaluate the trained TransE model using RankBasedEvaluator and display the mean reciprocal rank (MRR) and Hits@K metrics.
from pykeen.evaluation import RankBasedEvaluator
evaluator = RankBasedEvaluator()
results = evaluator.evaluate(
model=model,
mapped_triples=testing.mapped_triples,
additional_filter_triples=[
training.mapped_triples,
validation.mapped_triples
]
)
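# View key metrics. A notebook cell only displays its last expression,
# so the value shown below is the Hits@K result, not the MRR
results.get_metric("mean_reciprocal_rank")
results.get_metric("hits_at_k")  # a single Hits@K float, not separate hits@1/3/10 values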
Evaluating on cpu: 0%| | 0.00/37.0 [00:00<?, ?triple/s]
0.02702702702702703
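Both metrics are simple functions of the rank the model gives the correct entity for each test query. MRR is the mean of 1/rank, and Hits@K is the fraction of queries ranked within the top K. In the filtered setting used above (additional_filter_triples), other known true triples are removed from the candidate list before ranking. The sketch below uses hypothetical ranks, not the notebook's actual ones:

```python
def mean_reciprocal_rank(ranks):
    """Average of 1/rank over all queries (ranks start at 1)."""
    return sum(1.0 / r for r in ranks) / len(ranks)

def hits_at_k(ranks, k):
    """Fraction of queries whose correct entity is ranked in the top k."""
    return sum(1 for r in ranks if r <= k) / len(ranks)

# Hypothetical ranks for 10 test queries
ranks = [1, 4, 12, 57, 3, 150, 88, 9, 230, 40]

print(f"MRR     = {mean_reciprocal_rank(ranks):.3f}")
for k in (1, 3, 10):
    print(f"Hits@{k} = {hits_at_k(ranks, k):.2f}")
```

By default, PyKEEN ranks both the head and the tail of each test triple, so 37 test triples yield 74 ranks. A Hits@K of about 0.027 then corresponds to roughly 2 of those 74 ranks falling within the top K.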
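A notebook cell only displays its last expression, so the value shown above (≈ 0.027) is the Hits@K score returned by results.get_metric("hits_at_k"), not the Mean Reciprocal Rank (MRR). In other words, only about 2.7% of the test predictions placed the correct entity within the top K candidates.
This low score most likely reflects the limitations of the data and the model. The dataset contains only 367 triples, which is very small for training knowledge graph embedding models like TransE.
Additionally, TransE is a simple model that may struggle with complex relationships. It may also struggle with the long literal texts, such as article content, that end up treated as entities.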
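TransE models a relation as a translation: a triple (h, r, t) is plausible when h + r lies close to t, and the score is the negative distance between them. The toy standard-library sketch below uses hand-picked 2-D vectors, L2 distance, and a hypothetical "mentions" relation; PyKEEN's implementation differs in details such as the norm and normalization. It also shows the one-to-many limitation mentioned above: when one article mentions several entities, every correct tail must lie near the same point h + r, so TransE pulls those entities together.

```python
import math

def transe_score(h, r, t):
    """TransE plausibility: negative L2 distance between h + r and t (higher is more plausible)."""
    translated = [hi + ri for hi, ri in zip(h, r)]
    return -math.dist(translated, t)

# Hand-picked 2-D toy embeddings; "mentions" is a hypothetical relation
article = (0.0, 0.0)
mentions = (1.0, 0.0)
donald_trump = (1.0, 0.05)
pope_francis = (1.0, -0.05)
iraq = (-2.0, 3.0)

for name, tail in [("Donald_Trump", donald_trump), ("Pope_Francis", pope_francis), ("Iraq", iraq)]:
    print(name, round(transe_score(article, mentions, tail), 3))

# One-to-many: both true tails must lie near article + mentions, so they end up close to each other
print("distance between the two mentioned entities:", round(math.dist(donald_trump, pope_francis), 3))
```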
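We then project the learned entity embeddings into two dimensions with t-SNE to get a rough visual impression of their structure.
import matplotlib.pyplot as plt
from sklearn.manifold import TSNE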
# Extract learned entity embeddings
entity_embeddings = model.entity_representations[0]().detach().cpu().numpy()
# Reduce to 2D
tsne = TSNE(n_components=2, random_state=42)
entity_2d = tsne.fit_transform(entity_embeddings)
# Plot
plt.figure(figsize=(8,6))
plt.scatter(entity_2d[:, 0], entity_2d[:, 1], alpha=0.6)
plt.title("t-SNE of Entity Embeddings")
plt.xlabel("Component 1")
plt.ylabel("Component 2")
plt.show()
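The t-SNE plot has no labels, so it is hard to tell which point is which entity. A complementary check is to list an entity's nearest neighbours by cosine similarity in the original embedding space. The standard-library sketch below uses made-up 3-dimensional vectors. With the trained model, the rows of entity_embeddings could be paired with labels through training.entity_id_to_label; that wiring is an assumption, not code from this notebook.

```python
import math

def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    return dot / (math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v)))

def nearest(label, embeddings, k=2):
    """Return the k labels most similar to `label`, excluding the label itself."""
    query = embeddings[label]
    scored = [(other, cosine(query, vec)) for other, vec in embeddings.items() if other != label]
    return sorted(scored, key=lambda pair: pair[1], reverse=True)[:k]

# Made-up embeddings keyed by entity label
embeddings = {
    "Pope_Francis": [0.9, 0.1, 0.0],
    "John_Paul_II": [0.8, 0.2, 0.1],
    "Vatican_Media": [0.5, 0.5, 0.2],
    "Iraq": [-0.1, 0.2, 0.9],
}

for other, sim in nearest("Pope_Francis", embeddings):
    print(other, round(sim, 3))
```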
We then train and evaluate several knowledge graph embedding models (TransE, DistMult, ComplEx) with PyKEEN's pipeline and store their results for comparison.
from pykeen.pipeline import pipeline
models = ['TransE', 'DistMult', 'ComplEx']
model_results = {}
for model_name in models:
    print(f"Training {model_name}...")
    result = pipeline(
        training=training,
        validation=validation,
        testing=testing,
        model=model_name,
        model_kwargs=dict(embedding_dim=50),  # same embedding size for every model
        training_kwargs=dict(batch_size=32),
        epochs=100,
        random_seed=42,
    )
    model_results[model_name] = result
No cuda devices were available. The model runs on CPU
Training TransE...
Training epochs on cpu: 0%| | 0/100 [00:00<?, ?epoch/s]
Training batches on cpu: 0%| | 0/10 [00:00<?, ?batch/s]
INFO:pykeen.evaluation.evaluator:Currently automatic memory optimization only supports GPUs, but you're using a CPU. Therefore, the batch_size will be set to the default value. INFO:pykeen.evaluation.evaluator:No evaluation batch_size provided. Setting batch_size to '32'.
Evaluating on cpu: 0%| | 0.00/37.0 [00:00<?, ?triple/s]
INFO:pykeen.evaluation.evaluator:Evaluation took 0.05s seconds WARNING:pykeen.utils:No cuda devices were available. The model runs on CPU INFO:pykeen.pipeline.api:Using device: None
Training DistMult...
Training epochs on cpu: 0%| | 0/100 [00:00<?, ?epoch/s]
INFO:pykeen.evaluation.evaluator:Currently automatic memory optimization only supports GPUs, but you're using a CPU. Therefore, the batch_size will be set to the default value. INFO:pykeen.evaluation.evaluator:No evaluation batch_size provided. Setting batch_size to '32'.
Evaluating on cpu: 0%| | 0.00/37.0 [00:00<?, ?triple/s]
INFO:pykeen.evaluation.evaluator:Evaluation took 0.04s seconds WARNING:pykeen.utils:No cuda devices were available. The model runs on CPU INFO:pykeen.pipeline.api:Using device: None
Training ComplEx...
Training epochs on cpu: 0%| | 0/100 [00:00<?, ?epoch/s]
INFO:pykeen.evaluation.evaluator:Currently automatic memory optimization only supports GPUs, but you're using a CPU. Therefore, the batch_size will be set to the default value. INFO:pykeen.evaluation.evaluator:No evaluation batch_size provided. Setting batch_size to '32'.
Evaluating on cpu: 0%| | 0.00/37.0 [00:00<?, ?triple/s]
INFO:pykeen.evaluation.evaluator:Evaluation took 0.03s seconds
As mentioned previously, we manually evaluate each trained model with RankBasedEvaluator, filtering out the known training and validation triples, to extract detailed performance metrics and store them for analysis.
from pykeen.evaluation import RankBasedEvaluator
import pandas as pd
model_results_summary = {}
for model_name, result in model_results.items():
    print(f"=== Evaluation Results for {model_name} ===")
    evaluator = RankBasedEvaluator()
    metrics = evaluator.evaluate(
        model=result.model,
        mapped_triples=testing.mapped_triples,
        additional_filter_triples=[
            training.mapped_triples,
            validation.mapped_triples,
        ]
    )
    # Store the results as a nested dictionary (side -> rank type -> metric)
    model_results_summary[model_name] = metrics.to_dict()
INFO:pykeen.evaluation.evaluator:Currently automatic memory optimization only supports GPUs, but you're using a CPU. Therefore, the batch_size will be set to the default value. INFO:pykeen.evaluation.evaluator:No evaluation batch_size provided. Setting batch_size to '32'.
=== Evaluation Results for TransE ===
Evaluating on cpu: 0%| | 0.00/37.0 [00:00<?, ?triple/s]
INFO:pykeen.evaluation.evaluator:Evaluation took 0.04s seconds INFO:pykeen.evaluation.evaluator:Currently automatic memory optimization only supports GPUs, but you're using a CPU. Therefore, the batch_size will be set to the default value. INFO:pykeen.evaluation.evaluator:No evaluation batch_size provided. Setting batch_size to '32'.
=== Evaluation Results for DistMult ===
Evaluating on cpu: 0%| | 0.00/37.0 [00:00<?, ?triple/s]
INFO:pykeen.evaluation.evaluator:Evaluation took 0.03s seconds INFO:pykeen.evaluation.evaluator:Currently automatic memory optimization only supports GPUs, but you're using a CPU. Therefore, the batch_size will be set to the default value. INFO:pykeen.evaluation.evaluator:No evaluation batch_size provided. Setting batch_size to '32'.
=== Evaluation Results for ComplEx ===
Evaluating on cpu: 0%| | 0.00/37.0 [00:00<?, ?triple/s]
INFO:pykeen.evaluation.evaluator:Evaluation took 0.03s seconds
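To make the reported numbers easier to interpret, here is a minimal NumPy sketch of what a filtered, rank-based evaluation computes for a single query, and how the per-query ranks turn into MRR and Hits@k. It is an illustration under our own assumptions (a higher score means a more plausible triple; the "realistic" rank is taken as the average of the optimistic and pessimistic ranks, which is our reading of PyKEEN's convention), not PyKEEN's implementation. filtered_ranks and summarize are hypothetical helper names.

```python
import numpy as np


def filtered_ranks(scores, true_idx, known_true_idx):
    """Return (optimistic, pessimistic, realistic) rank of the true candidate.

    scores: 1-D array of model scores for every candidate entity (higher = better).
    true_idx: index of the entity that completes the test triple.
    known_true_idx: indices of other entities that also form known true triples
        (e.g. from training/validation); they are removed before ranking.
    """
    scores = np.asarray(scores, dtype=float)
    true_score = scores[true_idx]
    # Filtered setting: other known positives must not push the true entity down.
    mask = np.ones(scores.shape[0], dtype=bool)
    mask[np.asarray(list(known_true_idx), dtype=int)] = False
    mask[true_idx] = False  # compare only against the remaining candidates
    others = scores[mask]
    optimistic = 1 + np.sum(others > true_score)    # ties resolved in favour of the true entity
    pessimistic = 1 + np.sum(others >= true_score)  # ties resolved against it
    realistic = (optimistic + pessimistic) / 2      # expected rank under random tie-breaking
    return int(optimistic), int(pessimistic), float(realistic)


def summarize(ranks, ks=(1, 3, 10)):
    """Aggregate per-query ranks into MRR, mean rank and Hits@k."""
    ranks = np.asarray(ranks, dtype=float)
    out = {
        "mean_reciprocal_rank": float(np.mean(1.0 / ranks)),
        "arithmetic_mean_rank": float(np.mean(ranks)),
    }
    for k in ks:
        out[f"hits_at_{k}"] = float(np.mean(ranks <= k))
    return out


# Candidate 0 is another known true entity, so it is filtered out;
# candidate 3 ties with the true entity (index 2).
print(filtered_ranks([0.9, 0.5, 0.7, 0.7, 0.1], true_idx=2, known_true_idx={0}))  # (1, 2, 1.5)
print(summarize([1, 2, 4, 10]))
```

The "both" side in PyKEEN's output pools the head-prediction and tail-prediction ranks, which is why its count is twice the number of test triples.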
# Create a DataFrame for comparison (one column per model; rows: head, tail, both).
# Each cell holds a nested dict of metrics, so rounding would have no effect here.
summary_df = pd.DataFrame(model_results_summary)
print(summary_df)
TransE \
head {'optimistic': {'inverse_median_rank': 0.00561...
tail {'optimistic': {'inverse_median_rank': 0.00609...
both {'optimistic': {'inverse_median_rank': 0.00595...
DistMult \
head {'optimistic': {'inverse_median_rank': 0.00892...
tail {'optimistic': {'inverse_median_rank': 0.00847...
both {'optimistic': {'inverse_median_rank': 0.00877...
ComplEx
head {'optimistic': {'inverse_median_rank': 0.01219...
tail {'optimistic': {'inverse_median_rank': 0.01351...
both {'optimistic': {'inverse_median_rank': 0.01298...
We define a helper function to extract the core link prediction metrics (MRR, Hits@k) from PyKEEN's nested output:
def extract_core_metrics(metrics_dict):
    """Extracts core metrics for comparison from nested PyKEEN metric dicts."""
    try:
        # 'both' pools head and tail predictions; 'optimistic' is the rank variant that
        # resolves score ties in favour of the true entity. Filtering comes from the
        # additional_filter_triples passed to evaluate(), not from this key.
        both = metrics_dict['both']['optimistic']
        return {
            'MRR': both['mean_reciprocal_rank'],  # KeyError: PyKEEN names MRR 'inverse_harmonic_mean_rank'
            'Hits@1': both['hits_at_1'],
            'Hits@3': both['hits_at_3'],
            'Hits@10': both['hits_at_10']
        }
    except Exception as e:
        print("Error extracting metrics:", e)
        return {}
# Apply extraction for all results
flat_results = {}
for name, d in model_results_summary.items():
    flat_results[name] = extract_core_metrics(d)
df_metrics = pd.DataFrame(flat_results).T  # .T to make models the rows
print(df_metrics)
Error extracting metrics: 'mean_reciprocal_rank'
Error extracting metrics: 'mean_reciprocal_rank'
Error extracting metrics: 'mean_reciprocal_rank'
Empty DataFrame
Columns: []
Index: [TransE, DistMult, ComplEx]
import json
# print the full nested metrics for one model
print(json.dumps(model_results_summary['TransE'], indent=2))
{
"head": {
"optimistic": {
"inverse_median_rank": 0.0056179775280898875,
"adjusted_arithmetic_mean_rank": 1.1498720257844344,
"inverse_harmonic_mean_rank": 0.010257537252455581,
"geometric_mean_rank": 136.4341549450835,
"median_absolute_deviation": 94.88654198435853,
"z_arithmetic_mean_rank": -1.5843886702161658,
"arithmetic_mean_rank": 163.9189189189189,
"z_geometric_mean_rank": -1.7509208414395,
"z_inverse_harmonic_mean_rank": -0.9751401979423034,
"adjusted_inverse_harmonic_mean_rank": -0.011930175509493494,
"adjusted_arithmetic_mean_rank_index": -0.15093078758949852,
"count": 37.0,
"median_rank": 178.0,
"standard_deviation": 77.98592332382236,
"inverse_geometric_mean_rank": 0.00732954296087012,
"inverse_arithmetic_mean_rank": 0.006100577081615829,
"variance": 6081.804236669101,
"harmonic_mean_rank": 97.4892876709375,
"adjusted_geometric_mean_rank_index": -0.2751952661870738,
"hits_at_1": 0.0,
"hits_at_3": 0.0,
"hits_at_5": 0.0,
"hits_at_10": 0.0,
"z_hits_at_k": -1.1619626806801366,
"adjusted_hits_at_k": -0.03649043999044609
},
"realistic": {
"inverse_median_rank": 0.00561797758564353,
"adjusted_arithmetic_mean_rank": 1.1498719968550781,
"inverse_harmonic_mean_rank": 0.010257537476718426,
"geometric_mean_rank": 136.43414306640625,
"median_absolute_deviation": 94.88654198435853,
"z_arithmetic_mean_rank": -1.5843883643862817,
"arithmetic_mean_rank": 163.91891479492188,
"z_geometric_mean_rank": -1.7509201298293657,
"z_inverse_harmonic_mean_rank": -0.9751401792007366,
"adjusted_inverse_harmonic_mean_rank": -0.0119301752802032,
"adjusted_arithmetic_mean_rank_index": -0.15093075845577264,
"count": 37.0,
"median_rank": 178.0,
"standard_deviation": 77.98592376708984,
"inverse_geometric_mean_rank": 0.007329543586820364,
"inverse_arithmetic_mean_rank": 0.006100577302277088,
"variance": 6081.8046875,
"harmonic_mean_rank": 97.48928553950732,
"adjusted_geometric_mean_rank_index": -0.2751951543420743,
"hits_at_1": 0.0,
"hits_at_3": 0.0,
"hits_at_5": 0.0,
"hits_at_10": 0.0,
"z_hits_at_k": -1.1619626806801366,
"adjusted_hits_at_k": -0.03649043999044609
},
"pessimistic": {
"inverse_median_rank": 0.0056179775280898875,
"adjusted_arithmetic_mean_rank": 1.1498720257844344,
"inverse_harmonic_mean_rank": 0.010257537252455581,
"geometric_mean_rank": 136.4341549450835,
"median_absolute_deviation": 94.88654198435853,
"z_arithmetic_mean_rank": -1.5843886702161658,
"arithmetic_mean_rank": 163.9189189189189,
"z_geometric_mean_rank": -1.7509208414395,
"z_inverse_harmonic_mean_rank": -0.9751401979423034,
"adjusted_inverse_harmonic_mean_rank": -0.011930175509493494,
"adjusted_arithmetic_mean_rank_index": -0.15093078758949852,
"count": 37.0,
"median_rank": 178.0,
"standard_deviation": 77.98592332382236,
"inverse_geometric_mean_rank": 0.00732954296087012,
"inverse_arithmetic_mean_rank": 0.006100577081615829,
"variance": 6081.804236669101,
"harmonic_mean_rank": 97.4892876709375,
"adjusted_geometric_mean_rank_index": -0.2751952661870738,
"hits_at_1": 0.0,
"hits_at_3": 0.0,
"hits_at_5": 0.0,
"hits_at_10": 0.0,
"z_hits_at_k": -1.1619626806801366,
"adjusted_hits_at_k": -0.03649043999044609
}
},
"tail": {
"optimistic": {
"inverse_median_rank": 0.006097560975609756,
"adjusted_arithmetic_mean_rank": 1.1637410606482772,
"inverse_harmonic_mean_rank": 0.008433153844356594,
"geometric_mean_rank": 146.27384194941163,
"median_absolute_deviation": 109.71256416941455,
"z_arithmetic_mean_rank": -1.7310780985509666,
"arithmetic_mean_rank": 169.32432432432432,
"z_geometric_mean_rank": -2.1621542374267344,
"z_inverse_harmonic_mean_rank": -1.1066681704329815,
"adjusted_inverse_harmonic_mean_rank": -0.013403310310512798,
"adjusted_arithmetic_mean_rank_index": -0.16487421677733094,
"count": 37.0,
"median_rank": 164.0,
"standard_deviation": 78.62502663740504,
"inverse_geometric_mean_rank": 0.006836492339798164,
"inverse_arithmetic_mean_rank": 0.005905826017557861,
"variance": 6181.8948137326515,
"harmonic_mean_rank": 118.57959886136702,
"adjusted_geometric_mean_rank_index": -0.339942517617952,
"hits_at_1": 0.0,
"hits_at_3": 0.0,
"hits_at_5": 0.0,
"hits_at_10": 0.0,
"z_hits_at_k": -1.1495340671022203,
"adjusted_hits_at_k": -0.03571428571428572
},
"realistic": {
"inverse_median_rank": 0.006097560748457909,
"adjusted_arithmetic_mean_rank": 1.1637410691513639,
"inverse_harmonic_mean_rank": 0.008433152921497822,
"geometric_mean_rank": 146.27381896972656,
"median_absolute_deviation": 109.71256416941455,
"z_arithmetic_mean_rank": -1.7310781884459931,
"arithmetic_mean_rank": 169.32432556152344,
"z_geometric_mean_rank": -2.1621528893208533,
"z_inverse_harmonic_mean_rank": -1.106668248308493,
"adjusted_inverse_harmonic_mean_rank": -0.013403311253694932,
"adjusted_arithmetic_mean_rank_index": -0.16487422533926255,
"count": 37.0,
"median_rank": 164.0,
"standard_deviation": 78.62503051757812,
"inverse_geometric_mean_rank": 0.0068364934995770454,
"inverse_arithmetic_mean_rank": 0.005905826110392809,
"variance": 6181.8955078125,
"harmonic_mean_rank": 118.57961183779754,
"adjusted_geometric_mean_rank_index": -0.3399423056633659,
"hits_at_1": 0.0,
"hits_at_3": 0.0,
"hits_at_5": 0.0,
"hits_at_10": 0.0,
"z_hits_at_k": -1.1495340671022203,
"adjusted_hits_at_k": -0.03571428571428572
},
"pessimistic": {
"inverse_median_rank": 0.006097560975609756,
"adjusted_arithmetic_mean_rank": 1.1637410606482772,
"inverse_harmonic_mean_rank": 0.008433153844356594,
"geometric_mean_rank": 146.27384194941163,
"median_absolute_deviation": 109.71256416941455,
"z_arithmetic_mean_rank": -1.7310780985509666,
"arithmetic_mean_rank": 169.32432432432432,
"z_geometric_mean_rank": -2.1621542374267344,
"z_inverse_harmonic_mean_rank": -1.1066681704329815,
"adjusted_inverse_harmonic_mean_rank": -0.013403310310512798,
"adjusted_arithmetic_mean_rank_index": -0.16487421677733094,
"count": 37.0,
"median_rank": 164.0,
"standard_deviation": 78.62502663740504,
"inverse_geometric_mean_rank": 0.006836492339798164,
"inverse_arithmetic_mean_rank": 0.005905826017557861,
"variance": 6181.8948137326515,
"harmonic_mean_rank": 118.57959886136702,
"adjusted_geometric_mean_rank_index": -0.339942517617952,
"hits_at_1": 0.0,
"hits_at_3": 0.0,
"hits_at_5": 0.0,
"hits_at_10": 0.0,
"z_hits_at_k": -1.1495340671022203,
"adjusted_hits_at_k": -0.03571428571428572
}
},
"both": {
"optimistic": {
"inverse_median_rank": 0.005952380952380952,
"adjusted_arithmetic_mean_rank": 1.1568774629386376,
"inverse_harmonic_mean_rank": 0.009345345548406088,
"geometric_mean_rank": 141.26835461963404,
"median_absolute_deviation": 106.74735973240334,
"z_arithmetic_mean_rank": -2.345325544113173,
"arithmetic_mean_rank": 166.6216216216216,
"z_geometric_mean_rank": -2.8134040954014115,
"z_inverse_harmonic_mean_rank": -1.4715919275655787,
"adjusted_inverse_harmonic_mean_rank": -0.012666885393668326,
"adjusted_arithmetic_mean_rank_index": -0.1579743008314436,
"count": 74.0,
"median_rank": 168.0,
"standard_deviation": 78.35275443211982,
"inverse_geometric_mean_rank": 0.0070787261782195065,
"inverse_arithmetic_mean_rank": 0.006001622060016221,
"variance": 6139.154127100072,
"harmonic_mean_rank": 107.005139063109,
"adjusted_geometric_mean_rank_index": -0.31527984840562095,
"hits_at_1": 0.0,
"hits_at_3": 0.0,
"hits_at_5": 0.0,
"hits_at_10": 0.0,
"z_hits_at_k": -1.634498534649764,
"adjusted_hits_at_k": -0.03610221749620178
},
"realistic": {
"inverse_median_rank": 0.0059523810632526875,
"adjusted_arithmetic_mean_rank": 1.156877505888879,
"inverse_harmonic_mean_rank": 0.009345345199108124,
"geometric_mean_rank": 141.26834106445312,
"median_absolute_deviation": 106.74735973240334,
"z_arithmetic_mean_rank": -2.345326186221328,
"arithmetic_mean_rank": 166.6216278076172,
"z_geometric_mean_rank": -2.8134029611751874,
"z_inverse_harmonic_mean_rank": -1.4715919690474113,
"adjusted_inverse_harmonic_mean_rank": -0.01266688575072765,
"adjusted_arithmetic_mean_rank_index": -0.1579743440819794,
"count": 74.0,
"median_rank": 168.0,
"standard_deviation": 78.3527603149414,
"inverse_geometric_mean_rank": 0.007078726775944233,
"inverse_arithmetic_mean_rank": 0.006001621950417757,
"variance": 6139.15478515625,
"harmonic_mean_rank": 107.00514306260568,
"adjusted_geometric_mean_rank_index": -0.31527972130028514,
"hits_at_1": 0.0,
"hits_at_3": 0.0,
"hits_at_5": 0.0,
"hits_at_10": 0.0,
"z_hits_at_k": -1.634498534649764,
"adjusted_hits_at_k": -0.03610221749620178
},
"pessimistic": {
"inverse_median_rank": 0.005952380952380952,
"adjusted_arithmetic_mean_rank": 1.1568774629386376,
"inverse_harmonic_mean_rank": 0.009345345548406088,
"geometric_mean_rank": 141.26835461963404,
"median_absolute_deviation": 106.74735973240334,
"z_arithmetic_mean_rank": -2.345325544113173,
"arithmetic_mean_rank": 166.6216216216216,
"z_geometric_mean_rank": -2.8134040954014115,
"z_inverse_harmonic_mean_rank": -1.4715919275655787,
"adjusted_inverse_harmonic_mean_rank": -0.012666885393668326,
"adjusted_arithmetic_mean_rank_index": -0.1579743008314436,
"count": 74.0,
"median_rank": 168.0,
"standard_deviation": 78.35275443211982,
"inverse_geometric_mean_rank": 0.0070787261782195065,
"inverse_arithmetic_mean_rank": 0.006001622060016221,
"variance": 6139.154127100072,
"harmonic_mean_rank": 107.005139063109,
"adjusted_geometric_mean_rank_index": -0.31527984840562095,
"hits_at_1": 0.0,
"hits_at_3": 0.0,
"hits_at_5": 0.0,
"hits_at_10": 0.0,
"z_hits_at_k": -1.634498534649764,
"adjusted_hits_at_k": -0.03610221749620178
}
}
}
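The dump shows that the metric dictionary has no mean_reciprocal_rank key; the MRR is reported as inverse_harmonic_mean_rank (for TransE, 0.00935, which is 1 / 107.005, the reciprocal of its harmonic mean rank). A minimal sketch of a corrected extractor, assuming the same nested side -> rank type -> metric layout seen above (extract_headline_metrics and METRIC_KEYS are hypothetical names):

```python
# Map display labels to the key names used in PyKEEN's to_dict() output (as in the dump above).
METRIC_KEYS = {
    "MRR": "inverse_harmonic_mean_rank",
    "Hits@1": "hits_at_1",
    "Hits@3": "hits_at_3",
    "Hits@10": "hits_at_10",
}


def extract_headline_metrics(metrics_dict, side="both", rank_type="optimistic"):
    """Pull a few headline metrics from a nested {side: {rank_type: {metric: value}}} dict.

    Missing sides, rank types or metrics are skipped instead of raising KeyError.
    """
    block = metrics_dict.get(side, {}).get(rank_type, {})
    return {label: block[key] for label, key in METRIC_KEYS.items() if key in block}


toy = {"both": {"optimistic": {"inverse_harmonic_mean_rank": 0.25, "hits_at_1": 0.1,
                               "hits_at_3": 0.2, "hits_at_10": 0.4}}}
print(extract_headline_metrics(toy))
```

Applied to the notebook's results, pd.DataFrame({name: extract_headline_metrics(d) for name, d in model_results_summary.items()}).T would give one row per model; passing rank_type="realistic" selects the tie-averaged variant instead.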
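We then extract the core evaluation metrics (mean ranks and Hits@k) from the model results, flatten them, and display a transposed DataFrame for easy model comparison. As the dump above shows, PyKEEN stores MRR under inverse_harmonic_mean_rank rather than mean_reciprocal_rank, so the MRR entry in the list below is silently skipped and does not appear in the table: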
import pandas as pd
# Define core metrics to extract
core_metrics = [
    "mean_reciprocal_rank",  # absent from PyKEEN's to_dict() output (MRR is 'inverse_harmonic_mean_rank'), so skipped
    "arithmetic_mean_rank",
    "harmonic_mean_rank",
    "geometric_mean_rank",
    "hits_at_1",
    "hits_at_3",
    "hits_at_10"
]
# Extract metrics safely from 'both → optimistic'; missing keys are skipped
def extract_metrics(summary, metrics=core_metrics):
    extracted = {}
    optimistic = summary.get("both", {}).get("optimistic", {})
    for metric in metrics:
        value = optimistic.get(metric)
        if value is not None:
            extracted[metric] = value
    return extracted
# Flatten all model metrics
flat_results = {
    name: extract_metrics(summary)
    for name, summary in model_results_summary.items()
}
# Build and display DataFrame
df_metrics = pd.DataFrame(flat_results).T.round(4)
print(df_metrics)
arithmetic_mean_rank harmonic_mean_rank geometric_mean_rank \
TransE 166.6216 107.0051 141.2684
DistMult 119.8378 14.4287 66.3157
ComplEx 106.8649 15.6083 63.6594
hits_at_1 hits_at_3 hits_at_10
TransE 0.0000 0.0000 0.0000
DistMult 0.0405 0.0676 0.0946
ComplEx 0.0405 0.0405 0.0811
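ComplEx obtains the lowest arithmetic (106.86) and geometric (63.66) mean ranks, while DistMult obtains the lowest harmonic mean rank (14.43 vs. 15.61 for ComplEx).
On Hits@k, DistMult and ComplEx tie on Hits@1 (0.0405), and DistMult is ahead on Hits@3 (0.0676 vs. 0.0405) and Hits@10 (0.0946 vs. 0.0811), so neither model dominates on every metric.
TransE performs poorly in link prediction, with zero Hits@k scores and much higher mean ranks, indicating weaker predictive ability. With only 37 test triples (74 ranks), each step of about 0.0135 in Hits@k corresponds to a single prediction, so the gaps between DistMult and ComplEx should be read with caution.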
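Since MRR is the mean of the reciprocal ranks and the harmonic mean rank is the reciprocal of that same mean, MRR = 1 / harmonic_mean_rank, so the MRR missing from the table can be recovered from the harmonic_mean_rank column. A quick check with the rounded values printed above:

```python
# Harmonic mean rank values copied from the table above ("both", optimistic, rounded to 4 d.p.).
harmonic_mean_rank = {"TransE": 107.0051, "DistMult": 14.4287, "ComplEx": 15.6083}

# MRR = mean(1 / rank) and HMR = 1 / mean(1 / rank), hence MRR = 1 / HMR.
mrr = {model: 1.0 / hmr for model, hmr in harmonic_mean_rank.items()}

# Print models from best to worst MRR.
for model, value in sorted(mrr.items(), key=lambda kv: kv[1], reverse=True):
    print(f"{model:<9} MRR = {value:.4f}")
```

This gives roughly 0.069 for DistMult, 0.064 for ComplEx and 0.009 for TransE; the TransE value matches the inverse_harmonic_mean_rank of about 0.0093 in the JSON dump.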
import matplotlib.pyplot as plt
# Split metrics into two categories for clarity
rank_metrics = ['arithmetic_mean_rank', 'harmonic_mean_rank', 'geometric_mean_rank']
hits_metrics = ['hits_at_1', 'hits_at_3', 'hits_at_10']
# Plot Rank Metrics (lower is better)
df_metrics[rank_metrics].plot(kind='bar', figsize=(10, 5), title='Model Comparison - Rank Metrics')
plt.ylabel('Rank (lower is better)')
plt.xticks(rotation=0)
plt.grid(axis='y')
plt.tight_layout()
plt.show()
# Plot Hits@K Metrics (higher is better)
df_metrics[hits_metrics].plot(kind='bar', figsize=(10, 5), title='Model Comparison - Hits@K Metrics')
plt.ylabel('Score (higher is better)')
plt.xticks(rotation=0)
plt.grid(axis='y')
plt.tight_layout()
plt.show()
The bar charts show that ComplEx and DistMult clearly outperform TransE, with neither of the two dominating the other. ComplEx has the lowest arithmetic and geometric mean ranks, indicating that it places true entities higher on average.
DistMult has the lowest harmonic mean rank and the best Hits@3 and Hits@10 scores, which suggests it more often places the true entity at the very top of the ranking, even though its average rank is slightly worse.
TransE, on the other hand, performs substantially worse on every metric, suggesting it is less suited to the structure or complexity of this graph.
from sklearn.manifold import TSNE
import numpy as np
import matplotlib.pyplot as plt
for model_name, result in model_results.items():
    model = result.model
    # Get entity embeddings as a NumPy array on the CPU
    entity_embeddings = model.entity_representations[0]().detach().cpu().numpy()
    # t-SNE needs real-valued input: ComplEx embeddings are complex, so keep only the real part
    if np.iscomplexobj(entity_embeddings):
        print(f"⚠️ {model_name} embeddings are complex — using real part only")
        entity_embeddings = entity_embeddings.real
    # Reduce to 2D with t-SNE
    tsne = TSNE(n_components=2, random_state=42)
    entity_2d = tsne.fit_transform(entity_embeddings)
    # Plot
    plt.figure(figsize=(8, 6))
    plt.scatter(entity_2d[:, 0], entity_2d[:, 1], alpha=0.5)
    plt.title(f"{model_name} - t-SNE of Entity Embeddings")
    plt.xlabel("Component 1")
    plt.ylabel("Component 2")
    plt.grid(True)
    plt.show()
⚠️ ComplEx embeddings are complex — using real part only
The t-SNE visualizations of the entity embeddings highlight key differences in how each model structures the latent space.
TransE shows a scattered distribution with no clear clusters, indicating it may struggle to capture strong relational patterns.
DistMult shows distinct, compact clusters, suggesting that it learns meaningful groupings of related entities; this is consistent with its strong Hits@K scores and lowest harmonic mean rank.
ComplEx, while slightly more structured than TransE, still displays a relatively dispersed embedding space, possibly because only the real part of its complex embeddings is visualized, which discards half of each vector.
Overall, DistMult appears to produce the most organized embedding space among the three, although t-SNE layouts depend on the perplexity and random seed and give only a qualitative indication.
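An alternative that keeps all of the information in the ComplEx embeddings is to give t-SNE the real and imaginary parts side by side. The Euclidean distance between these 2d-dimensional real vectors equals the distance between the original complex vectors, so nothing is discarded. A minimal sketch (complex_to_real_features is a hypothetical helper name):

```python
import numpy as np


def complex_to_real_features(embeddings):
    """Map an (n, d) complex array to an (n, 2d) real array laid out as [Re | Im].

    Real-valued input is returned unchanged, so the same call works for TransE and DistMult.
    """
    embeddings = np.asarray(embeddings)
    if np.iscomplexobj(embeddings):
        return np.concatenate([embeddings.real, embeddings.imag], axis=1)
    return embeddings


z = np.array([[1 + 2j, 3 - 1j], [0 + 1j, -2 + 0j]])
print(complex_to_real_features(z))
```

In the t-SNE loop, calling complex_to_real_features(entity_embeddings) in place of entity_embeddings.real would be the only change; we have not checked how much it alters the ComplEx plot.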
Next, we move on to entity similarity: we perform a similarity search using the entity embeddings learned by the ComplEx model.
We start by defining the find_similar_entities function, which computes cosine similarity between a given entity and all others.
Next, we retrieve the entity-to-ID mappings from the training data and select a target entity (http://example.org/LOC). We then use the function to find the top 5 most similar entities to this target and display the results.
This allows us to explore which entities are positioned closely in the embedding space, indicating similarity.
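The search described above ranks all entities by cosine similarity to the target and skips the target itself. The sketch below does this in plain Python on toy vectors (hypothetical data). It excludes the query by index rather than assuming the query is its own top match. That distinction matters when another entity has an identical embedding: slicing off the single highest score can then drop a true neighbour and leave the query in the results.

```python
import math
from typing import List, Sequence, Tuple


def top_k_similar(query_idx: int, vectors: Sequence[Sequence[float]], k: int = 5) -> List[Tuple[int, float]]:
    """Return (index, cosine) pairs for the k vectors most similar to vectors[query_idx], excluding it."""

    def norm(vec: Sequence[float]) -> float:
        return math.sqrt(sum(x * x for x in vec))

    query = vectors[query_idx]
    query_norm = norm(query)
    scored = []
    for i, vec in enumerate(vectors):
        if i == query_idx:
            continue  # exclude the query by index, even if another vector ties with it
        denom = query_norm * norm(vec)
        sim = sum(a * b for a, b in zip(query, vec)) / denom if denom > 0 else 0.0
        scored.append((i, sim))
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:k]


# Hypothetical 2-D embeddings; entity 1 is an exact duplicate of the query (entity 0)
toy = [[1.0, 0.0], [1.0, 0.0], [0.9, 0.1], [0.0, 1.0], [-1.0, 0.0]]
print(top_k_similar(0, toy, k=2))
```

Here the duplicate (index 1) is correctly reported as the closest neighbour, followed by index 2.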
import numpy as np
from sklearn.metrics.pairwise import cosine_similarity
# 1. Extract embeddings, keeping only the real part (ComplEx embeddings are complex-valued)
entity_embeddings = model_results['ComplEx'].model.entity_representations[0]().detach().cpu().numpy().real
# 2. Define similarity function
def find_similar_entities(entity_id, entity_embeddings, top_k=5):
    entity_vector = entity_embeddings[entity_id].reshape(1, -1)
    similarities = cosine_similarity(entity_vector, entity_embeddings)[0]
    similarities[entity_id] = -np.inf  # exclude the entity itself explicitly
    most_similar = np.argsort(similarities)[::-1][:top_k]
    return most_similar
# 3. Build ID mappings
tf = model_results['ComplEx'].training
entity_to_id = tf.entity_to_id
id_to_entity = {v: k for k, v in entity_to_id.items()}
# 4. Choose a target entity
target = "http://example.org/LOC"
target_id = entity_to_id[target]
# 5. Find similar entities
similar_ids = find_similar_entities(target_id, entity_embeddings)
# 6. Display results
print(f"Entities similar to {target}:")
for i in similar_ids:
    print("-", id_to_entity[i])
Entities similar to http://example.org/LOC:
- http://example.org/Vatican_City
- http://example.org/Universal_Shepherd
- http://example.org/Pope
- http://example.org/his2025
- http://example.org/Mass
Next, we use cosine similarity to retrieve the entities most similar to the same target entity in each model's embeddings. This lets us compare how each model captures semantic neighborhoods:
# Extract entity embeddings as a real-valued NumPy array
def preprocess_embeddings(model_name, model):
    raw_embeddings = model.entity_representations[0]().detach().cpu().numpy()
    if model_name == 'ComplEx':
        # ComplEx embeddings are complex-valued: keep the real part
        # (alternative: np.abs(raw_embeddings) to use the magnitude instead)
        return raw_embeddings.real
    return raw_embeddings
# Similarity function using cosine similarity
def find_similar_entities(entity_id, entity_embeddings, top_k=5):
    entity_vector = entity_embeddings[entity_id].reshape(1, -1)
    similarities = cosine_similarity(entity_vector, entity_embeddings)[0]
    similarities[entity_id] = -np.inf  # exclude the entity itself explicitly
    return np.argsort(similarities)[::-1][:top_k]
# Choose the target entity
target_entity = "http://example.org/LOC"
# Iterate over all 3 models
for model_name in ['TransE', 'DistMult', 'ComplEx']:
    print(f"\n🔍 Similar entities in model: {model_name}")
    # Get model and training triples factory
    model = model_results[model_name].model
    tf = model_results[model_name].training
    # Preprocess embeddings
    entity_embeddings = preprocess_embeddings(model_name, model)
    # ID mappings
    entity_to_id = tf.entity_to_id
    id_to_entity = {v: k for k, v in entity_to_id.items()}
    # Get target ID and compute similarity
    target_id = entity_to_id[target_entity]
    similar_ids = find_similar_entities(target_id, entity_embeddings)
    # Show top similar entities
    print(f"Entities similar to {target_entity} using {model_name}:")
    for i in similar_ids:
        print("-", id_to_entity[i])
🔍 Similar entities in model: TransE
Entities similar to http://example.org/LOC using TransE:
- http://example.org/Guangzhou
- http://example.org/U_S
- http://example.org/Latinos
- http://example.org/likePapabili
- http://example.org/Rome
🔍 Similar entities in model: DistMult
Entities similar to http://example.org/LOC using DistMult:
- https://www.npr.org/article/article_2
- http://example.org/John_Paul_II
- http://example.org/Pope_Francis
- http://example.org/Eugenio_Pacelli
- http://example.org/Easter
🔍 Similar entities in model: ComplEx
Entities similar to http://example.org/LOC using ComplEx:
- http://example.org/Vatican_City
- http://example.org/Universal_Shepherd
- http://example.org/Pope
- http://example.org/his2025
- http://example.org/Mass
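The three neighbour lists above share no entities. One simple way to quantify agreement between models is the Jaccard overlap of their top-k neighbour sets. The sketch below applies it to the lists printed above, with namespace prefixes dropped for brevity. It ignores neighbour order; a rank-aware measure could be substituted if order matters.

```python
from itertools import combinations
from typing import Dict, List, Sequence, Tuple


def jaccard(a: Sequence[str], b: Sequence[str]) -> float:
    """|A ∩ B| / |A ∪ B| for two neighbour lists (order is ignored)."""
    set_a, set_b = set(a), set(b)
    union = set_a | set_b
    return len(set_a & set_b) / len(union) if union else 1.0


# Top-5 neighbours of http://example.org/LOC copied from the output above,
# with namespace prefixes dropped for brevity.
neighbours: Dict[str, List[str]] = {
    "TransE": ["Guangzhou", "U_S", "Latinos", "likePapabili", "Rome"],
    "DistMult": ["article_2", "John_Paul_II", "Pope_Francis", "Eugenio_Pacelli", "Easter"],
    "ComplEx": ["Vatican_City", "Universal_Shepherd", "Pope", "his2025", "Mass"],
}

overlaps: Dict[Tuple[str, str], float] = {
    (m1, m2): jaccard(neighbours[m1], neighbours[m2])
    for m1, m2 in combinations(neighbours, 2)
}
for (m1, m2), score in overlaps.items():
    print(f"{m1} vs {m2}: Jaccard = {score:.2f}")
```

Every pair scores 0.00 for these lists, so the three models place LOC in entirely different neighbourhoods. Any comparison between them therefore rests on which neighbourhood looks more sensible, not on shared neighbours.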
We implemented a reusable function, plot_entity_embeddings(), that applies t-SNE dimensionality reduction and visualizes the entity embeddings of a selected model. It preprocesses the embeddings and annotates only a random sample of entities to avoid overcrowding the plot.
We then apply this visualization to all three models in a simple loop, which makes it easy to compare the spatial distributions they learn.
def plot_entity_embeddings(model_name, model_results, sample_size=50, seed=42):
    model = model_results[model_name].model
    tf = model_results[model_name].training
    entity_embeddings = preprocess_embeddings(model_name, model)
    # Reverse map: id -> label
    id_to_entity = {v: k for k, v in tf.entity_to_id.items()}
    # Dimensionality reduction
    tsne = TSNE(n_components=2, random_state=42)
    embeddings_2d = tsne.fit_transform(entity_embeddings)
    # Choose a reproducible random sample of entities to show (avoids overcrowding)
    rng = np.random.default_rng(seed)
    sample_indices = rng.choice(len(embeddings_2d), size=min(sample_size, len(embeddings_2d)), replace=False)
    # Plot the sampled points, then annotate each with a shortened label
    plt.figure(figsize=(12, 8))
    plt.scatter(embeddings_2d[sample_indices, 0], embeddings_2d[sample_indices, 1], alpha=0.6)
    for idx in sample_indices:
        x, y = embeddings_2d[idx]
        label = id_to_entity[int(idx)].split("/")[-1][:20]  # shorten long URIs
        plt.annotate(label, (x, y), fontsize=8, alpha=0.75)
    plt.title(f"{model_name} - Entity Embeddings (t-SNE)")
    plt.grid(True)
    plt.tight_layout()
    plt.show()
for name in ['TransE', 'DistMult', 'ComplEx']:
    print(f"\n🔍 Visualizing embeddings for model: {name}")
    plot_entity_embeddings(name, model_results)
🔍 Visualizing embeddings for model: TransE
🔍 Visualizing embeddings for model: DistMult
🔍 Visualizing embeddings for model: ComplEx
# We list the available relations in the model to help choose valid triples for prediction.
print("Relations in model:")
for r in model_results['ComplEx'].training.relation_to_id.keys():
    print(r)
Relations in model:
http://example.org/announce
http://example.org/appoint
http://example.org/attend
http://example.org/battistaannounce
http://example.org/bear
http://example.org/clash
http://example.org/content
http://example.org/criticize
http://example.org/date
http://example.org/die
http://example.org/encounter
http://example.org/lock
http://example.org/meet
http://example.org/move
http://example.org/praise
http://example.org/preside
http://example.org/reflect
http://example.org/sourceURL
http://example.org/study
http://example.org/tell
http://example.org/title
http://example.org/vote
http://example.org/write
http://www.w3.org/1999/02/22-rdf-syntax-ns#type
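The relation list mixes two namespaces. Shortening URIs with `split("/")[-1]`, as the plotting helper does, turns the rdf:type URI into `22-rdf-syntax-ns#type`. The sketch below is a hypothetical convenience helper, not part of the notebook. It takes the local name after the last `#` or `/` and maps short names back to full URIs, which can make choosing valid relations less error-prone. It assumes local names are unique and raises an error when they are not.

```python
from typing import Dict, Iterable


def short_label(uri: str) -> str:
    """Local name of a URI: text after the last '#' if present, else after the last '/'."""
    if "#" in uri:
        return uri.rsplit("#", 1)[1]
    return uri.rstrip("/").rsplit("/", 1)[-1]


def build_short_index(uris: Iterable[str]) -> Dict[str, str]:
    """Map short label -> full URI, refusing ambiguous short labels."""
    index: Dict[str, str] = {}
    for uri in uris:
        name = short_label(uri)
        if name in index and index[name] != uri:
            raise ValueError(f"ambiguous short label {name!r}: {index[name]} vs {uri}")
        index[name] = uri
    return index


# A few relation URIs taken from the list above
relations = [
    "http://example.org/announce",
    "http://example.org/die",
    "http://www.w3.org/1999/02/22-rdf-syntax-ns#type",
]
rel_index = build_short_index(relations)
print(rel_index["type"])
```

Labels without `/` or `#`, such as the timestamp entities, are returned unchanged.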
We define a robust function that predicts the top-k tail entities for a given head entity and relation by scoring every candidate tail with the model. It handles device setup and tensor formatting, and returns human-readable (label, score) pairs.
import torch
def predict_tail_entities(model, head, relation, triples_factory, k=5):
    device = torch.device('cuda' if torch.cuda.is_available() else 'cpu')
    model = model.to(device)
    model.eval()  # inference mode (disables dropout etc.)
    # Convert head and relation labels to IDs
    try:
        head_id = torch.tensor([triples_factory.entity_to_id[head]], device=device)
        rel_id = torch.tensor([triples_factory.relation_to_id[relation]], device=device)
    except KeyError as e:
        print(f"Entity or relation not found: {e}")
        return []
    # Build one candidate triple (h, r, t) for every possible tail t
    all_tail_ids = torch.arange(triples_factory.num_entities, device=device)
    # Shape: [num_entities, 3]
    triples = torch.stack([
        head_id.repeat(len(all_tail_ids)),
        rel_id.repeat(len(all_tail_ids)),
        all_tail_ids,
    ], dim=1)
    # Score all candidate triples
    with torch.no_grad():
        scores = model.score_hrt(triples)
    # Guard against an empty result
    if scores.numel() == 0:
        print("No scores returned by model.")
        return []
    # score_hrt returns shape [n, 1]; flatten to 1D
    scores = scores.view(-1)
    actual_k = min(k, scores.shape[0])
    # Top-k scoring tails
    topk = torch.topk(scores, k=actual_k)
    top_indices = topk.indices.tolist()
    top_scores = topk.values.tolist()
    id_to_label = {idx: label for label, idx in triples_factory.entity_to_id.items()}
    return [(id_to_label[idx], score) for idx, score in zip(top_indices, top_scores)]
head = 'http://example.org/Pope_Francis'
relation = 'http://example.org/die'
for name, result in model_results.items():
    print(f"\n🔮 Predictions using {name}:")
    try:
        preds = predict_tail_entities(
            model=result.model,
            head=head,
            relation=relation,
            triples_factory=result.training,
            k=5,
        )
        for entity, score in preds:
            print(f"- {entity:60} score: {score:.4f}")
    except Exception as e:
        print(f"Error with {name}: {e}")
🔮 Predictions using TransE:
- http://example.org/Pope_Francis score: -5.7292
- http://example.org/Latinos score: -7.4719
- http://example.org/Rome score: -7.4900
- http://example.org/el_Papa_Francisco score: -8.1880
- http://example.org/Gaza score: -8.2402
🔮 Predictions using DistMult:
- https://www.npr.org/2025/04/21/g-s1-61930/share-pope-francis-memories score: 0.0907
- http://example.org/Imtiyaz_Khan score: 0.0899
- http://example.org/Eugenio_Pacelli score: 0.0791
- http://example.org/Jorge_Mario_Bergoglio score: 0.0755
- http://example.org/Vance score: 0.0727
🔮 Predictions using ComplEx:
- http://example.org/Archdiocese_of_Washington score: 47.8624
- http://example.org/Lori score: 38.7026
- https://www.npr.org/article/article_9 score: 37.1092
- http://example.org/Pope_Pius score: 36.9296
- 2025-04-21T06:38:53-04:00 score: 36.5162
ComplEx produces the largest raw scores, from 36.52 to 47.86, and returns entities related to the head, such as Archdiocese_of_Washington and Pope_Pius.
DistMult produces small positive scores between 0.0727 and 0.0907. Its predictions, such as Eugenio_Pacelli and Jorge_Mario_Bergoglio, are contextually appropriate but separated by very small score margins.
TransE scores are negative by construction, because it scores a triple by the negative distance between h + r and t, so the sign of its scores (-5.73 to -8.24) is not itself a sign of failure. However, its top prediction is the head entity Pope_Francis itself, followed by loosely related entities such as Gaza, which suggests weak or incorrect predictions.
Because each model uses a different scoring function, raw scores are only comparable within a model. Judged by the entities returned, ComplEx appears to be the most reliable model in this context, followed by DistMult, while TransE fails to provide meaningful predictions for this relation.
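The score ranges differ so much across models because each model uses a different scoring function. The plain-Python sketch below writes out the three standard interactions (TransE as a negative distance, DistMult as a trilinear product, ComplEx as the real part of a complex trilinear product) and ranks candidate tails with each. The embeddings are made-up toy values, and the norm order for TransE is configurable in practice; this sketch uses L2. Scaling all DistMult embeddings by 10 multiplies every score by 1000 but leaves the ranking unchanged. This illustrates that raw magnitudes say little about plausibility across models.

```python
import math
from typing import Callable, List, Sequence, Tuple

Vec = Sequence[float]


def transe_score(h: Vec, r: Vec, t: Vec) -> float:
    """Negative L2 distance between h + r and t; always <= 0."""
    return -math.sqrt(sum((hi + ri - ti) ** 2 for hi, ri, ti in zip(h, r, t)))


def distmult_score(h: Vec, r: Vec, t: Vec) -> float:
    """Trilinear product sum_i h_i * r_i * t_i; unbounded in sign and size."""
    return sum(hi * ri * ti for hi, ri, ti in zip(h, r, t))


def complex_score(h: Sequence[complex], r: Sequence[complex], t: Sequence[complex]) -> float:
    """Re(sum_i h_i * r_i * conj(t_i)), the ComplEx interaction."""
    return sum(hi * ri * ti.conjugate() for hi, ri, ti in zip(h, r, t)).real


def rank_tails(score: Callable, h, r, candidates: List, k: int = 3) -> List[Tuple[int, float]]:
    """Score (h, r, t) for every candidate tail and return the top-k (index, score) pairs."""
    scored = [(i, score(h, r, t)) for i, t in enumerate(candidates)]
    return sorted(scored, key=lambda pair: pair[1], reverse=True)[:k]


# Hypothetical 2-D embeddings for one head, one relation and three candidate tails
h, r = [0.5, -0.2], [0.3, 0.4]
candidates = [[0.8, 0.2], [0.0, 1.0], [-0.5, -0.5]]

print("TransE  :", rank_tails(transe_score, h, r, candidates))
print("DistMult:", rank_tails(distmult_score, h, r, candidates))


def scale(vec: Vec) -> List[float]:
    return [10 * x for x in vec]


# Same ranking, scores 1000x larger: magnitude alone is not a plausibility measure
print("DistMult x10:", rank_tails(distmult_score, scale(h), scale(r), [scale(t) for t in candidates]))
```

If scores must be compared across models, comparing ranks, or normalizing scores within each model (for example with a softmax over all candidate tails), is a more defensible choice than comparing raw values.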
We print a few sample entity and relation labels from the training set to help with testing and interactive exploration:
print("Sample heads:", list(training.entity_to_id.keys())[:5])
print("Sample relations:", list(training.relation_to_id.keys())[:5])
Sample heads: ['2025-04-21T06:38:53-04:00', '2025-04-21T07:19:41-04:00', '2025-04-21T08:07:03-04:00', '2025-04-21T08:32:37-04:00', '2025-04-21T11:08:30-04:00']
Sample relations: ['http://example.org/announce', 'http://example.org/appoint', 'http://example.org/attend', 'http://example.org/battistaannounce', 'http://example.org/bear']
We run predictions with PyKEEN's predict_target, collect the top predictions from each model, and display them side by side in a DataFrame:
import pandas as pd
from IPython.display import display
from pykeen.predict import predict_target
sample_head = list(model_results['TransE'].training.entity_labeling.label_to_id.keys())[0]
sample_relation = list(model_results['TransE'].training.relation_labeling.label_to_id.keys())[0]
print(f"🎯 Predicting: ({sample_head}, {sample_relation}, ?)")
# Store predictions
all_predictions = {}
for name, result in model_results.items():
    print(f"🔮 Predicting with model: {name}")
    try:
        # Run prediction: score every candidate tail for (sample_head, sample_relation, ?)
        pred_df = predict_target(
            model=result.model,
            head=sample_head,
            relation=sample_relation,
            triples_factory=result.training,
        ).df
        # Keep only the top 5 by score
        top_preds = pred_df.sort_values('score', ascending=False).head(5)[['tail_label', 'score']].copy()
        all_predictions[f'{name} Tail'] = top_preds['tail_label'].tolist()
        all_predictions[f'{name} Score'] = top_preds['score'].round(4).tolist()
    except Exception as e:
        print(f"Error with model {name}: {e}")
        all_predictions[f'{name} Tail'] = ['ERROR'] * 5
        all_predictions[f'{name} Score'] = [None] * 5
# Combine into a single DataFrame
df_all = pd.DataFrame(all_predictions)
pd.set_option('display.max_colwidth', None)
display(df_all)
🎯 Predicting: (2025-04-21T06:38:53-04:00, http://example.org/announce, ?)
🔮 Predicting with model: TransE
🔮 Predicting with model: DistMult
🔮 Predicting with model: ComplEx
| | TransE Tail | TransE Score | DistMult Tail | DistMult Score | ComplEx Tail | ComplEx Score |
|---|---|---|---|---|---|---|
| 0 | 2025-04-21T06:38:53-04:00 | -5.8472 | http://example.org/Ur | 0.0979 | http://example.org/Many_Africans | 48.3942 |
| 1 | http://example.org/University_of_Monterrey | -7.1925 | http://example.org/Northfield | 0.0946 | http://example.org/Archdiocese_of_Washington | 39.9880 |
| 2 | http://example.org/the_age_of_88_He | -7.2320 | http://example.org/Vatican_Pool | 0.0941 | https://www.npr.org/article/article_9 | 36.6563 |
| 3 | http://example.org/Vaticanuntil | -7.3815 | https://www.npr.org/article/article_8 | 0.0928 | http://example.org/Square | 34.9605 |
| 4 | http://example.org/Northfield | -7.3889 | http://example.org/Judaism | 0.0921 | http://example.org/God | 29.8242 |
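The table above is built from equal-length columns, one tail column and one score column per model. `pd.DataFrame` fails if one model returns fewer than five predictions. The sketch below is a hypothetical helper, not part of the notebook, that builds the same column layout in plain Python and pads short lists with `None`. The example rows are taken from the table above.

```python
from typing import Dict, List, Optional, Sequence, Tuple

Prediction = Tuple[str, float]


def side_by_side(preds: Dict[str, Sequence[Prediction]], k: int = 5) -> Dict[str, List[Optional[object]]]:
    """Build equal-length '<model> Tail' / '<model> Score' columns, padding short lists with None."""
    columns: Dict[str, List[Optional[object]]] = {}
    for model, rows in preds.items():
        top = list(rows)[:k]
        pad: List[Optional[object]] = [None] * (k - len(top))
        columns[f"{model} Tail"] = [label for label, _ in top] + pad
        columns[f"{model} Score"] = [round(score, 4) for _, score in top] + pad
    return columns


# Rows copied from the table above; DistMult is deliberately truncated to one row
cols = side_by_side(
    {
        "TransE": [
            ("2025-04-21T06:38:53-04:00", -5.8472),
            ("http://example.org/University_of_Monterrey", -7.1925),
        ],
        "DistMult": [("http://example.org/Ur", 0.0979)],
    },
    k=2,
)
for name, values in cols.items():
    print(name, values)
```

The resulting dictionary can be passed directly to `pd.DataFrame(cols)`.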
To conclude, our evaluation and comparison of TransE, DistMult, and ComplEx across multiple tasks (ranking metrics, similarity search, t-SNE visualization, and link prediction) indicate that ComplEx performs best overall. DistMult remains competitive on Hits@10 and produces the most clearly clustered t-SNE plot.
In both prediction scenarios, ComplEx returns semantically meaningful entities, which suggests it captures complex relational patterns well. Its raw scores are also the largest (up to 48.39), but because each model uses a different scoring function, these magnitudes should not be compared across models.
DistMult provides moderate performance, with reasonable predictions and scores around 0.09. TransE clearly underperforms: in both scenarios its top prediction is the head entity itself, and its remaining predictions are largely irrelevant, indicating its limitations on this dataset.
Overall, ComplEx is the most effective and reliable model for link prediction and entity representation in our case.