!pip install datasets
!pip install sklearn-crfsuite
!pip install transformers
!pip install beautifulsoup4 nltk spacy
!pip install rdflib



from datasets import load_dataset

# Loading the CoNLL-2003 dataset
dataset = load_dataset("conll2003")



import spacy
import nltk
import string
from nltk.corpus import stopwords
from nltk.tokenize import word_tokenize

# download required data
nltk.download('punkt')
nltk.download('stopwords')

# load spaCy English model
nlp = spacy.load("en_core_web_sm")

def preprocess_text(text):
    # tokenize
    tokens = word_tokenize(text)

    # lowercase
    tokens = [word.lower() for word in tokens]

    # remove punctuation tokens (hyphens are kept); a set gives exact-token matching
    # instead of substring matching against the punctuation string
    punctuation = set(string.punctuation) - {'-'}
    tokens = [word for word in tokens if word not in punctuation]

    # remove stopwords
    stop_words = set(stopwords.words("english"))
    filtered_tokens = [word for word in tokens if word not in stop_words]

    # lemmatize
    doc = nlp(" ".join(filtered_tokens))
    lemmatized_text = " ".join([token.lemma_ for token in doc])

    return lemmatized_text

# example
text = "Apple was founded by Steve Jobs in 1976."
print(preprocess_text(text))

[nltk_data] Downloading package punkt to
[nltk_data]     C:\Users\Nejjari\AppData\Roaming\nltk_data...
[nltk_data]   Package punkt is already up-to-date!
[nltk_data] Downloading package stopwords to
[nltk_data]     C:\Users\Nejjari\AppData\Roaming\nltk_data...
[nltk_data]   Package stopwords is already up-to-date!

apple found steve job 1976


import sklearn_crfsuite
from sklearn_crfsuite import metrics

train_dataset = dataset["train"]
val_dataset = dataset["validation"]
test_dataset = dataset["test"]

# feature extraction function
def word2features(sent, i):
    word = sent[i]
    features = {
        'bias': 1.0,
        'word.lower()': word.lower(),
        'word[-3:]': word[-3:],
        'word[-2:]': word[-2:],
        'word.isupper()': word.isupper(),
        'word.istitle()': word.istitle(),
        'word.isdigit()': word.isdigit(),
    }
    if i > 0:
        word1 = sent[i-1]
        features.update({
            '-1:word.lower()': word1.lower(),
            '-1:word.istitle()': word1.istitle(),
            '-1:word.isupper()': word1.isupper(),
        })
    else:
        features['BOS'] = True

    if i < len(sent)-1:
        word1 = sent[i+1]
        features.update({
            '+1:word.lower()': word1.lower(),
            '+1:word.istitle()': word1.istitle(),
            '+1:word.isupper()': word1.isupper(),
        })
    else:
        features['EOS'] = True
    return features

def extract_features_and_labels(dataset):
    sentences = [example["tokens"] for example in dataset]
    label_list = dataset.features["ner_tags"].feature.names
    labels = [[label_list[tag] for tag in example["ner_tags"]] for example in dataset]

    X = [[word2features(sent, i) for i in range(len(sent))] for sent in sentences]
    return X, labels

# preparing the data 
X_train, y_train = extract_features_and_labels(train_dataset)
X_test, y_test = extract_features_and_labels(test_dataset)

# training CRF
crf = sklearn_crfsuite.CRF(
    algorithm='lbfgs',
    c1=0.1,
    c2=0.1,
    max_iterations=100,
    all_possible_transitions=False
)
crf.fit(X_train, y_train)

# predicting and evaluating
y_pred = crf.predict(X_test)
print(metrics.flat_classification_report(y_test, y_pred))

              precision    recall  f1-score   support

       B-LOC       0.86      0.83      0.84      1668
      B-MISC       0.80      0.76      0.78       702
       B-ORG       0.82      0.71      0.76      1661
       B-PER       0.82      0.84      0.83      1617
       I-LOC       0.79      0.70      0.74       257
      I-MISC       0.64      0.65      0.65       216
       I-ORG       0.70      0.74      0.72       835
       I-PER       0.86      0.95      0.90      1156
           O       0.99      0.99      0.99     38323

    accuracy                           0.96     46435
   macro avg       0.81      0.80      0.80     46435
weighted avg       0.96      0.96      0.96     46435
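The report above scores individual tokens, so the dominant O class lifts accuracy and the weighted average. A span-level check, where an entity counts only if its boundaries and type both match, can be sketched as below. This is a minimal illustration and not part of the original pipeline; `bio_to_spans` and `span_f1` are hypothetical helper names.

```python
def bio_to_spans(labels):
    """Collect (start, end, type) spans from one BIO-tagged sentence."""
    spans, start, kind = set(), None, None
    # A trailing "O" sentinel closes a span that runs to the end of the sentence.
    for i, label in enumerate(list(labels) + ["O"]):
        inside = label.startswith("I-") and label[2:] == kind
        if start is not None and not inside:
            spans.add((start, i - 1, kind))
            start, kind = None, None
        if label.startswith("B-"):
            start, kind = i, label[2:]
    return spans


def span_f1(y_true, y_pred):
    """Exact-match precision, recall and F1 over entity spans."""
    tp = fp = fn = 0
    for gold, pred in zip(y_true, y_pred):
        g, p = bio_to_spans(gold), bio_to_spans(pred)
        tp += len(g & p)
        fp += len(p - g)
        fn += len(g - p)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return precision, recall, f1


# Toy example: the PER span is cut short, the LOC span is exact.
gold = [["B-PER", "I-PER", "O", "B-LOC"]]
pred = [["B-PER", "O", "O", "B-LOC"]]
print(span_f1(gold, pred))
```

With the notebook's variables, `span_f1(y_test, y_pred)` would give the span-level figures for the CRF.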


# load a custom-trained spaCy NER pipeline from a local directory (rebinds `nlp`, replacing en_core_web_sm)
nlp = spacy.load(r"C:\Users\Nejjari\Documents\WebData Project\en_ner_conll03\best_ner_model")

# example sentence
text = "Apple was founded by Steve Jobs in 1976."
doc = nlp(text)

# extract and display named entities with their character offsets
entities = [(ent.text, ent.start_char, ent.end_char, ent.label_) for ent in doc.ents]
print("Extracted Entities:", entities)

C:\Users\Nejjari\anaconda3\Anaconda\Lib\site-packages\spacy\util.py:910: UserWarning: [W095] Model 'en_pipeline' (0.0.0) was trained with spaCy v3.7.5 and may not be 100% compatible with the current version (3.8.4). If you see errors or degraded performance, download a newer compatible model or retrain your custom model with the current spaCy version. For more details and available updates, run: python -m spacy validate
  warnings.warn(warn_msg)

Extracted Entities: [('Apple', 0, 5, 'ORG'), ('Steve Jobs', 21, 31, 'PER')]


from sklearn.metrics import classification_report
from itertools import chain

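# function to convert spaCy predictions to the same format as CRF
# NOTE: entities are matched by exact token text, so only single-token entities
# are labelled (always as B-); I- tags are never produced by this function.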
def spacy_predict(doc_tokens):
    text = " ".join(doc_tokens)
    doc = nlp(text)
    ents = {ent.text: ent.label_ for ent in doc.ents}
    labels = []
    for token in doc_tokens:
        if token in ents:
            labels.append(f"B-{ents[token]}")
        else:
            labels.append("O")
    return labels

# applying to full test set
spacy_preds = []
true_labels = []
label_list = dataset["test"].features["ner_tags"].feature.names

for example in dataset["test"]:
    tokens = example["tokens"]
    true = [label_list[i] for i in example["ner_tags"]]
    pred = spacy_predict(tokens)

    # padding/truncating to match length
    pred += ["O"] * (len(true) - len(pred))  # if pred shorter
    pred = pred[:len(true)]  # in case pred is longer

    spacy_preds.append(pred)
    true_labels.append(true)




# flattens the lists
flat_true_labels = list(chain.from_iterable(true_labels))
flat_spacy_preds = list(chain.from_iterable(spacy_preds))

# evaluation
print("spaCy NER performance:")
print(classification_report(flat_true_labels, flat_spacy_preds))

spaCy NER performance:
              precision    recall  f1-score   support

       B-LOC       0.89      0.72      0.79      1668
      B-MISC       0.85      0.57      0.68       702
       B-ORG       0.76      0.49      0.60      1661
       B-PER       0.66      0.21      0.32      1617
       I-LOC       0.00      0.00      0.00       257
      I-MISC       0.00      0.00      0.00       216
       I-ORG       0.00      0.00      0.00       835
       I-PER       0.00      0.00      0.00      1156
           O       0.89      1.00      0.94     38323

    accuracy                           0.88     46435
   macro avg       0.45      0.33      0.37     46435
weighted avg       0.83      0.88      0.85     46435

C:\Users\Nejjari\anaconda3\Anaconda\Lib\site-packages\sklearn\metrics\_classification.py:1469: UndefinedMetricWarning: Precision and F-score are ill-defined and being set to 0.0 in labels with no predicted samples. Use `zero_division` parameter to control this behavior.
  _warn_prf(average, modifier, msg_start, len(result))
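The zero rows for every I- tag follow from the conversion step: `spacy_predict` compares whole entity strings with single tokens, so a multi-token entity such as "Steve Jobs" never matches. One way to keep multi-token entities is to project each entity's character span onto the tokens. The sketch below is a standalone illustration that assumes the text was built with `" ".join(tokens)`, as in the cell above; `spans_to_bio` is a hypothetical helper, not a spaCy function.

```python
def spans_to_bio(tokens, char_spans):
    """Project (start_char, end_char, label) spans onto space-joined tokens."""
    labels = ["O"] * len(tokens)

    # Character offset at which each token starts in " ".join(tokens).
    starts, pos = [], 0
    for tok in tokens:
        starts.append(pos)
        pos += len(tok) + 1  # +1 for the joining space

    for span_start, span_end, label in char_spans:
        first = True
        for i, tok in enumerate(tokens):
            tok_start, tok_end = starts[i], starts[i] + len(tok)
            # A token belongs to the entity if it lies fully inside the span.
            if tok_start >= span_start and tok_end <= span_end:
                labels[i] = ("B-" if first else "I-") + label
                first = False
    return labels


tokens = ["Apple", "was", "founded", "by", "Steve", "Jobs", "in", "1976", "."]
spans = [(0, 5, "ORG"), (21, 31, "PER")]  # offsets as printed by the earlier cell
print(spans_to_bio(tokens, spans))
```

In the notebook, the spans would come from `[(ent.start_char, ent.end_char, ent.label_) for ent in doc.ents]`. Entities whose boundaries fall inside a token are only partly covered by this check, so the result should be treated as approximate.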


import pandas as pd
import matplotlib.pyplot as plt

# flattening crf outputs
flat_crf_true = list(chain.from_iterable(y_test))      # y_test from crf
flat_crf_pred = list(chain.from_iterable(y_pred))      # y_pred from crf

# flattening spacy outputs
flat_spacy_true = list(chain.from_iterable(true_labels))       
flat_spacy_pred = list(chain.from_iterable(spacy_preds))

# generating classification reports as dictionaries
crf_report_dict = classification_report(flat_crf_true, flat_crf_pred, output_dict=True)
spacy_report_dict = classification_report(flat_spacy_true, flat_spacy_pred, output_dict=True)

# converting dictionaries to dataframes
crf_df = pd.DataFrame(crf_report_dict).transpose()
spacy_df = pd.DataFrame(spacy_report_dict).transpose()


# plotting macro average scores
metrics_to_plot = ["precision", "recall", "f1-score"]
x = range(len(metrics_to_plot))

plt.figure(figsize=(8, 5))
plt.bar([i - 0.2 for i in x], crf_df.loc["macro avg"][metrics_to_plot], width=0.4, label="CRF")
plt.bar([i + 0.2 for i in x], spacy_df.loc["macro avg"][metrics_to_plot], width=0.4, label="spaCy")

plt.xticks(x, metrics_to_plot)
plt.ylabel("score")
plt.ylim(0, 1.05)
plt.title("CRF vs spaCy NER performance (macro avg)")
plt.legend()
plt.tight_layout()
plt.show()

C:\Users\Nejjari\anaconda3\Anaconda\Lib\site-packages\sklearn\metrics\_classification.py:1469: UndefinedMetricWarning: Precision and F-score are ill-defined and being set to 0.0 in labels with no predicted samples. Use `zero_division` parameter to control this behavior.
  _warn_prf(average, modifier, msg_start, len(result))


# side-by-side macro avg comparison table
combined_df = pd.concat([
    crf_df.loc[["macro avg"]][metrics_to_plot].rename(columns=lambda x: f"CRF_{x}"),
    spacy_df.loc[["macro avg"]][metrics_to_plot].rename(columns=lambda x: f"spaCy_{x}")
], axis=1)

print("\nMacro Average Comparison (CRF vs spaCy):")
display(combined_df)

Macro Average Comparison (CRF vs spaCy):


import csv

# helper function to extract entities from crf token-label pairs
def extract_crf_entities(tokens, labels):
    entities = []
    current_entity = []
    current_label = None
    start_idx = None  # token index where the current entity begins

    for idx, (token, label) in enumerate(zip(tokens, labels)):
        if label.startswith("B-"):
            if current_entity:
                entities.append((" ".join(current_entity), current_label, start_idx, idx - 1))
            current_entity = [token]
            current_label = label[2:]
            start_idx = idx
        elif label.startswith("I-") and current_label == label[2:]:
            current_entity.append(token)
        else:
            if current_entity:
                entities.append((" ".join(current_entity), current_label, start_idx, idx - 1))
                current_entity = []
                current_label = None

    if current_entity:
        entities.append((" ".join(current_entity), current_label, start_idx, len(tokens) - 1))

    return entities

# open file and write both outputs
with open("combined_ner_entities.csv", "w", newline="", encoding="utf-8") as file:
    writer = csv.writer(file)
    writer.writerow(["model", "sentence_id", "entity", "label", "start", "end"])

    for idx, (example, crf_labels) in enumerate(zip(dataset["test"], y_pred)):
        tokens = example["tokens"]
        text = " ".join(tokens)

        # --- spaCy entities (character offsets, end exclusive) ---
        doc = nlp(text)
        for ent in doc.ents:
            writer.writerow(["spaCy", idx, ent.text, ent.label_, ent.start_char, ent.end_char])

        # --- CRF entities (token index positions) ---
        crf_entities = extract_crf_entities(tokens, crf_labels)
        for ent_text, ent_label, start_idx, end_idx in crf_entities:
            writer.writerow(["CRF", idx, ent_text, ent_label, start_idx, end_idx])
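In the CSV written above, the `start` and `end` columns hold character offsets for spaCy rows and token indices for CRF rows, so the two models cannot be compared position by position. A small conversion, sketched here under the assumption that the sentence text is `" ".join(tokens)` as in the loop above, puts the CRF spans on the same character scale. `token_span_to_char_span` is a hypothetical helper.

```python
def token_span_to_char_span(tokens, start_idx, end_idx):
    """Map inclusive token indices to (start_char, end_char) in " ".join(tokens).

    end_char is exclusive, matching the convention of spaCy's ent.end_char.
    """
    # Every token before the span contributes its length plus one joining space.
    start_char = sum(len(tok) + 1 for tok in tokens[:start_idx])
    end_char = start_char + len(" ".join(tokens[start_idx:end_idx + 1]))
    return start_char, end_char


tokens = ["Apple", "was", "founded", "by", "Steve", "Jobs", "in", "1976", "."]
print(token_span_to_char_span(tokens, 4, 5))  # span covering "Steve Jobs"
```

Applying this to each `(start_idx, end_idx)` pair before `writer.writerow` would give both models the same units.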


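import spacy

# load spaCy's small English pipeline; its dependency parser drives the relation extraction
# (this rebinds `nlp`, replacing the custom NER pipeline loaded earlier)
nlp = spacy.load("en_core_web_sm")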

def extract_relations(text):
    doc = nlp(text)
    relations = []

    for token in doc:
        # check for passive or active subject
        if (token.dep_ == "nsubj" or token.dep_ == "nsubjpass") and token.head.dep_ == "ROOT":
            subject = token.text
            predicate = token.head.text

            # look for object of preposition or agent
            # (direct objects, dep_ == "dobj", are not collected by this loop)
            for child in token.head.children:
                if child.dep_ in ("prep", "agent"):
                    for obj in child.children:
                        if obj.dep_ == "pobj":
                            object_phrase = " ".join([tok.text for tok in obj.subtree])
                            relations.append((subject, predicate, object_phrase))


    return relations

text = "Apple was founded by Steve Jobs."
print(extract_relations(text))

[('Apple', 'founded', 'Steve Jobs')]


from spacy import displacy
displacy.render(nlp("Apple was founded by Steve Jobs."), style="dep", jupyter=True)


displacy.render(nlp("Apple was founded by Steve Jobs."), style="ent", jupyter=True)


texts = [
    "Apple was founded by Steve Jobs.",
    "Google acquired YouTube in 2006.",
    # you can add more sentences here
]

for i, text in enumerate(texts):
    print(f"Sentence {i}: {text}")
    print("Extracted Relations:", extract_relations(text))
    print()

Sentence 0: Apple was founded by Steve Jobs.
Extracted Relations: [('Apple', 'founded', 'Steve Jobs')]

Sentence 1: Google acquired YouTube in 2006.
Extracted Relations: [('Google', 'acquired', '2006')]
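The second result pairs "acquired" with "2006" because the function only follows `prep` and `agent` children, so the direct object "YouTube" is skipped. The sketch below shows the same rule with a `dobj` branch added. To stay runnable without a model, it works on a hand-written parse: the `Tok` class and the dependency labels for this sentence are assumptions chosen to mirror what the output above implies, not actual parser output.

```python
from dataclasses import dataclass


@dataclass
class Tok:
    text: str
    dep: str   # dependency label, using the names from the function above
    head: int  # index of the syntactic head (the root points to itself)


def relations_with_dobj(tokens):
    """Return (subject, verb, object) triples, preferring direct objects."""
    triples = []
    for tok in tokens:
        if tok.dep not in ("nsubj", "nsubjpass") or tokens[tok.head].dep != "ROOT":
            continue
        verb = tok.head
        children = [j for j, t in enumerate(tokens) if t.head == verb and j != verb]
        direct = [tokens[j].text for j in children if tokens[j].dep == "dobj"]
        via_prep = [
            t.text
            for j in children if tokens[j].dep in ("prep", "agent")
            for t in tokens if t.head == j and t.dep == "pobj"
        ]
        # Fall back to prepositional objects only when no direct object exists.
        for obj in direct or via_prep:
            triples.append((tok.text, tokens[verb].text, obj))
    return triples


# Hand-written parse of "Google acquired YouTube in 2006."
parse = [
    Tok("Google", "nsubj", 1),
    Tok("acquired", "ROOT", 1),
    Tok("YouTube", "dobj", 1),
    Tok("in", "prep", 1),
    Tok("2006", "pobj", 3),
    Tok(".", "punct", 1),
]
print(relations_with_dobj(parse))
```

Unlike the notebook function, this sketch returns only the head word of each object rather than its full subtree; in spaCy the same branch would read `child.dep_ == "dobj"` inside the loop over `token.head.children`.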


pip install rdflib


from rdflib import Graph, URIRef, Namespace
from rdflib.namespace import RDF

# sample aligned triples
triples = [
    ("Tesla", "schema:founder", "Elon Musk"),
    ("Google", "dbo:acquisition", "YouTube")
]

# create RDF graph
g = Graph()

# define namespaces
EX = Namespace("http://example.org/")
SCHEMA = Namespace("http://schema.org/")
DBO = Namespace("http://dbpedia.org/ontology/")

prefix_map = {
    "schema": SCHEMA,
    "dbo": DBO,
    "ex": EX
}

# add triples to graph
for s, p, o in triples:
    prefix, pred = p.split(":")
    predicate_uri = prefix_map[prefix][pred]
    subject_uri = URIRef(EX[s.replace(" ", "_")])
    object_uri = URIRef(EX[o.replace(" ", "_")])
    g.add((subject_uri, predicate_uri, object_uri))
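The loop above does two string-level jobs before anything reaches the graph: it expands a prefixed name such as `schema:founder` into a full IRI, and it turns an entity label into an IRI under `http://example.org/`. The sketch below isolates those two steps with the standard library only, adding an explicit error for unknown prefixes and percent-encoding for labels that contain more than spaces. `expand_curie` and `mint_entity_iri` are hypothetical helpers, not rdflib functions.

```python
from urllib.parse import quote

# Same namespaces as in the cell above, kept as plain strings.
PREFIXES = {
    "schema": "http://schema.org/",
    "dbo": "http://dbpedia.org/ontology/",
    "ex": "http://example.org/",
}


def expand_curie(curie, prefixes=PREFIXES):
    """Turn 'schema:founder' into a full IRI; raise on unknown prefixes."""
    prefix, _, local = curie.partition(":")
    if prefix not in prefixes or not local:
        raise ValueError(f"cannot expand {curie!r}")
    return prefixes[prefix] + local


def mint_entity_iri(label, base=PREFIXES["ex"]):
    """Build an IRI for an entity label, percent-encoding unsafe characters."""
    return base + quote(label.strip().replace(" ", "_"), safe="_")


print(expand_curie("schema:founder"))
print(mint_entity_iri("Elon Musk"))
print(mint_entity_iri("Tesla, Inc."))
```

The resulting strings could be wrapped in `URIRef(...)` before calling `g.add`.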


print(g.serialize(format="turtle"))

@prefix ns1: <http://dbpedia.org/ontology/> .
@prefix ns2: <http://schema.org/> .

<http://example.org/Google> ns1:acquisition <http://example.org/YouTube> .

<http://example.org/Tesla> ns2:founder <http://example.org/Elon_Musk> .


query = """
SELECT ?predicate ?object
WHERE {
  <http://example.org/Google> ?predicate ?object
}
"""

for row in g.query(query):
    print(row.predicate, "-->", row.object)

http://dbpedia.org/ontology/acquisition --> http://example.org/YouTube


tesla_uri = URIRef("http://dbpedia.org/resource/Tesla,_Inc.")


g.add((tesla_uri, SCHEMA.founder, URIRef("http://dbpedia.org/resource/Elon_Musk")))

<Graph identifier=Nc39f8bbe3d6e464dbbe616ccbb4c5eab (<class 'rdflib.graph.Graph'>)>


text = """Star Wars IV is a Movie where there are different kinds of creatures, like humans and wookies. Some creatures are Jedis; for instance, the human Luke is a Jedi, and Master Yoda — for whom the species is not known — is also a Jedi. The wookie named Chewbacca is Han’s co-pilot on the Millennium Falcon starship. The speed of Millennium Falcon is 1.5 (above the speed of light!)"""


triples = extract_relations(text)
print("Extracted Triples:")
for t in triples:
    print(t)

Extracted Triples:
('wookie', 'is', 'the Millennium Falcon starship')
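Triples in this raw form carry the verb as it appears in the text, whereas the graph-building cell earlier expects predicates such as `schema:founder`. A lookup table is one simple way to bridge the two. The mapping below is purely illustrative (its entries are assumptions modelled on the sample triples used earlier), and `align_triples` is a hypothetical helper.

```python
# Hypothetical mapping from extracted verbs to vocabulary terms.
PREDICATE_MAP = {
    "founded": "schema:founder",
    "acquired": "dbo:acquisition",
}


def align_triples(raw_triples, predicate_map=PREDICATE_MAP):
    """Split raw triples into aligned ones and ones with no known predicate."""
    aligned, skipped = [], []
    for subj, pred, obj in raw_triples:
        term = predicate_map.get(pred.lower())
        if term is None:
            skipped.append((subj, pred, obj))
        else:
            aligned.append((subj, term, obj))
    return aligned, skipped


raw = [
    ("Apple", "founded", "Steve Jobs"),
    ("wookie", "is", "the Millennium Falcon starship"),
]
aligned, skipped = align_triples(raw)
print("aligned:", aligned)
print("skipped:", skipped)
```

Keeping the skipped triples visible makes it easier to see which verbs still need a mapping, or which extractions, like the one above, are not meaningful relations.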


pip install selenium beautifulsoup4 requests

Note: you may need to restart the kernel to use updated packages.


from selenium import webdriver
from selenium.webdriver.chrome.options import Options
from bs4 import BeautifulSoup
import time

def fetch_npr_articles():
    options = Options()
    driver = webdriver.Chrome(options=options)
    articles = []

    try:
        print("✅ Driver launched. Navigating to listing page...")
        driver.get("https://www.npr.org/sections/world/")
        time.sleep(3)  # fixed wait so the page can finish loading

        soup = BeautifulSoup(driver.page_source, "html.parser")
        print("✅ Section page loaded")

        # collect up to 10 unique article URLs; '/202' keeps only dated article paths
        links = []
        for a in soup.select("article a[href^='https://www.npr.org/']"):
            href = a['href']
            if '/202' in href and href not in links:
                links.append(href)
            if len(links) >= 10:
                break

        print(f"🔗 Collected {len(links)} article links")

        for url in links:
            try:
                print(f" Opening {url}")
                driver.get(url)
                time.sleep(3)  # wait for the article to load

                article_soup = BeautifulSoup(driver.page_source, "html.parser")

                # --- Title
                title_tag = article_soup.find("h1")
                title = title_tag.get_text(strip=True) if title_tag else "No title"

                # --- Date (ISO 8601 string from the <time datetime="..."> attribute)
                date_tag = article_soup.find("time")
                publication_date = date_tag.get("datetime") if date_tag else "Unknown"

                # --- Content: paragraphs inside the story body. The selector also picks up
                # photo captions, and get_text(strip=True) joins inline elements without a
                # separator, which is why some words run together in the output
                paragraphs = article_soup.select("div[class*='storytext'] p")
                content = "\n".join(p.get_text(strip=True) for p in paragraphs) if paragraphs else "No content"

                articles.append({
                    "title": title,
                    "url": url,
                    "publication_date": publication_date,
                    "content": content
                })

            except Exception as e:
                print(f" Error processing {url}: {e}")
    finally:
        # close the browser even if the listing page or an article fails
        driver.quit()

    return articles

articles = fetch_npr_articles()
for i, article in enumerate(articles):
    print(f"\n Article {i+1}")
    print(f"Title: {article['title']}")
    print(f"Date: {article['publication_date']}")
    print(f"URL: {article['url']}")
    print(f"Content Preview:\n{article['content'][:300]}...\n{'-'*80}")

✅ Driver launched. Navigating to listing page...
✅ Section page loaded
🔗 Collected 10 article links
 Opening https://www.npr.org/2025/04/21/g-s1-61930/share-pope-francis-memories
 Opening https://www.npr.org/2025/04/21/g-s1-61719/pope-francis-death-world-reacts
 Opening https://www.npr.org/2025/04/21/g-s1-61909/pope-death-funeral-conclave-timeline
 Opening https://www.npr.org/2025/04/21/g-s1-61668/china-tariffs-trump-trade
 Opening https://www.npr.org/2025/04/21/g-s1-61662/kevin-farrell-camerlengo-vatican-pope
 Opening https://www.npr.org/2025/04/21/g-s1-61636/trump-pope-francis
 Opening https://www.npr.org/2025/04/21/g-s1-61624/argentina-milei-critic-francis-condolences
 Opening https://www.npr.org/2025/04/21/g-s1-61618/leaders-in-africa-mourn-the-passing-of-pope-francis
 Opening https://www.npr.org/2025/04/21/g-s1-61597/up-first-newsletter-pope-francis-dies-house-democrats-el-salvador
 Opening https://www.npr.org/2025/04/21/nx-s1-5304054/conclave-pope-chosen-francis-dies-white-black-smoke

 Article 1
Title: Do you have memories of Pope Francis to share? Send them our way
Date: 2025-04-21T14:25:05-04:00
URL: https://www.npr.org/2025/04/21/g-s1-61930/share-pope-francis-memories
Content Preview:
Pope Francis drives through the crowds during the Inauguration Mass for the Pope in St. Peter's Square on March 19, 2013, in Vatican City, Vatican. The mass was held in front of an expected crowd of up to one million pilgrims and faithful who filled the square and the surrounding streets to see the ...
--------------------------------------------------------------------------------

 Article 2
Title: Pope Francis is remembered around the world for his generosity of spirit
Date: 2025-04-21T14:05:36-04:00
URL: https://www.npr.org/2025/04/21/g-s1-61719/pope-francis-death-world-reacts
Content Preview:
People attend an interfaith memorial meeting to mourn the death of Pope Francis in New Delhi, India, on Monday.Imtiyaz Khan/Anadolu via Getty Imageshide caption
Catholics across the globe are mourningthe death of Pope Francis, remembering him for his humility, generosity of spirit, concern for the p...
--------------------------------------------------------------------------------

 Article 3
Title: What happens next after a pope dies, according to recent history
Date: 2025-04-21T13:14:58-04:00
URL: https://www.npr.org/2025/04/21/g-s1-61909/pope-death-funeral-conclave-timeline
Content Preview:
The funeral of Pope John Paul II at Saint Peter's Basilica in Rome, Italy on April 8, 2005.Eric Vandeville/Gamma-Rapho via Getty Imageshide caption
This is a developing story. For more of our coverage head toour latest updates.
Pope Francis' death on Monday sets in motion weeks-long series of events...
--------------------------------------------------------------------------------

 Article 4
Title: China warns of 'countermeasures' against any deals that harm its interests
Date: 2025-04-21T11:08:30-04:00
URL: https://www.npr.org/2025/04/21/g-s1-61668/china-tariffs-trump-trade
Content Preview:
People walk past a screen showing the CSI 300 Index at a shopping mall in Guangzhou, in southern China's Guangdong province.Jade Gao/AFP via Getty Imageshide caption
As the Trump administration negotiates trade deals with other countries, China has issued a warning against any agreements that harm i...
--------------------------------------------------------------------------------

 Article 5
Title: Who is Cardinal Kevin Farrell, the acting head of the Vatican?
Date: 2025-04-21T10:34:12-04:00
URL: https://www.npr.org/2025/04/21/g-s1-61662/kevin-farrell-camerlengo-vatican-pope
Content Preview:
Cardinal Kevin Farrell, Camerlengo of the Apostolic Chamber, announced the death of Pope Francis from the Casa Santa Marta in Vatican City on Monday.Vatican Pool/Getty Imageshide caption
Cardinal Kevin Farrell, who announcedPope Francis' deathon Monday morning, is now the acting head of the Vatican ...
--------------------------------------------------------------------------------

 Article 6
Title: A brief history of Trump's feud with Pope Francis
Date: 2025-04-21T09:13:28-04:00
URL: https://www.npr.org/2025/04/21/g-s1-61636/trump-pope-francis
Content Preview:
Pope Francis exchanges gifts with US President Donald Trump (C) and US First Lady Melania Trump during a private audience at the Vatican on May 24, 2017. US President Donald Trump met Pope Francis at the Vatican today in a keenly-anticipated first face-to-face encounter between two world leaders who...
--------------------------------------------------------------------------------

 Article 7
Title: Argentina's president, a former critic of Pope Francis, offers his condolences
Date: 2025-04-21T08:32:37-04:00
URL: https://www.npr.org/2025/04/21/g-s1-61624/argentina-milei-critic-francis-condolences
Content Preview:
Pope Francis meets with newly elected Argentinian President Javier Milei before a Canonization Ceremony in St. Peter's Basilica on Feb. 11, 2024 in Vatican City, Vatican.Vatican Pool/Getty Imageshide caption
Argentina's president sent profound condolences to the family of Pope Francis and to all Cat...
--------------------------------------------------------------------------------

 Article 8
Title: Leaders in Africa mourn the passing of Pope Francis
Date: 2025-04-21T08:07:03-04:00
URL: https://www.npr.org/2025/04/21/g-s1-61618/leaders-in-africa-mourn-the-passing-of-pope-francis
Content Preview:
Pope Francis meets with president of Kenya William Samoei Ruto during the G7 Leaders Summit on day two of the 50th G7 summit at Borgo Egnazia on June 14, 2024 in Fasano, Italy.Vatican Media via Vatican Pool/Getty Imageshide caption
On Monday,Kenyan PresidentWilliam Rutoposted on X that Francis "exem...
--------------------------------------------------------------------------------

 Article 9
Title: Pope Francis dies at 88. And, House Democrats press for Abrego Garcia's return
Date: 2025-04-21T07:19:41-04:00
URL: https://www.npr.org/2025/04/21/g-s1-61597/up-first-newsletter-pope-francis-dies-house-democrats-el-salvador
Content Preview:
Good morning. You're reading the Up First newsletter.Subscribehere to get it delivered to your inbox, andlistento the Up First podcast for all the news you need to start your day.
Pope Francis died on Easter Monday at the age of 88.He was the first non-European head of the Roman Catholic Church in o...
--------------------------------------------------------------------------------

 Article 10
Title: Who will be the next pope? Here's how the conclave works
Date: 2025-04-21T06:38:53-04:00
URL: https://www.npr.org/2025/04/21/nx-s1-5304054/conclave-pope-chosen-francis-dies-white-black-smoke
Content Preview:
White smoke billows from a chimney at the Sistine Chapel, signaling that cardinal electors have chosen a new pope — Pope Francis — on March 13, 2013.Jeff J Mitchell/Getty Imageshide caption
Pope Francis has died at 88. For more of our coverage head toour latest updates.
The white smoke is famous. Wh...
--------------------------------------------------------------------------------
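
The `Date:` values printed above are ISO 8601 timestamps with a UTC offset, taken verbatim from the `datetime` attribute of the `<time>` tag. As a minimal sketch (the helper name `parse_publication_date` is hypothetical and not part of the original notebook), the standard library can validate and normalize such strings before they are used further, assuming Python 3.7+ for `datetime.fromisoformat`:

```python
from datetime import datetime, timezone

def parse_publication_date(value):
    """Return a datetime for an ISO 8601 string, or None if it cannot be parsed.

    Hypothetical helper for illustration; the scraper above stores the raw string.
    """
    try:
        return datetime.fromisoformat(value)
    except (TypeError, ValueError):
        # covers placeholders such as "Unknown" and missing (None) values
        return None

stamp = parse_publication_date("2025-04-21T14:25:05-04:00")
print(stamp.date().isoformat())                    # 2025-04-21
print(stamp.astimezone(timezone.utc).isoformat())  # 2025-04-21T18:25:05+00:00
print(parse_publication_date("Unknown"))           # None
```

Returning `None` instead of raising keeps the `"Unknown"` placeholder from the scraper easy to detect in later steps.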


import re
import string
import nltk
from nltk.corpus import stopwords
from nltk.stem import WordNetLemmatizer

# downloading necessary nltk resources
nltk.download('punkt')
nltk.download('stopwords')
nltk.download('wordnet')

# build these once; calling stopwords.words('english') for every token is very slow
STOP_WORDS = set(stopwords.words('english'))
lemmatizer = WordNetLemmatizer()

def preprocess_text(text):
    text = text.lower()
    text = re.sub(r'\d+', '', text)  # drop digits
    text = text.translate(str.maketrans('', '', string.punctuation))  # drop ASCII punctuation
    tokens = nltk.word_tokenize(text)
    tokens = [word for word in tokens if word not in STOP_WORDS]
    tokens = [lemmatizer.lemmatize(word) for word in tokens]
    return ' '.join(tokens)

[nltk_data] Downloading package punkt to
[nltk_data]     C:\Users\Nejjari\AppData\Roaming\nltk_data...
[nltk_data]   Package punkt is already up-to-date!
[nltk_data] Downloading package stopwords to
[nltk_data]     C:\Users\Nejjari\AppData\Roaming\nltk_data...
[nltk_data]   Package stopwords is already up-to-date!
[nltk_data] Downloading package wordnet to
[nltk_data]     C:\Users\Nejjari\AppData\Roaming\nltk_data...
[nltk_data]   Package wordnet is already up-to-date!


# applying preprocessing

for article in articles:
    article["cleaned_title"] = preprocess_text(article.get("title", ""))
    article["cleaned_content"] = preprocess_text(article.get("content", ""))

# displaying the results as an example for the first few articles
for i, article in enumerate(articles[:3], 1):  # preview first 3
    print(f"\n📰 Article {i}")
    print("🔹 Original Title     :", article.get("title", ""))
    print("✅ Cleaned Title      :", article["cleaned_title"])
    print("📄 Original Content   :", article.get("content", "")[:300])
    print("✅ Cleaned Content    :", article["cleaned_content"][:300])
    print("📅 Date               :", article.get("publication_date", ""))
    print("🔗 URL                :", article.get("url", ""))
    print("-" * 100)

📰 Article 1
🔹 Original Title     : Do you have memories of Pope Francis to share? Send them our way
✅ Cleaned Title      : memory pope francis share send way
📄 Original Content   : Pope Francis drives through the crowds during the Inauguration Mass for the Pope in St. Peter's Square on March 19, 2013, in Vatican City, Vatican. The mass was held in front of an expected crowd of up to one million pilgrims and faithful who filled the square and the surrounding streets to see the 
✅ Cleaned Content    : pope francis drive crowd inauguration mass pope st peter square march vatican city vatican mass held front expected crowd one million pilgrim faithful filled square surrounding street see former cardinal buenos aire officially take role pontiffspencer plattgetty imageshide caption wed love hear refl
📅 Date               : 2025-04-21T14:25:05-04:00
🔗 URL                : https://www.npr.org/2025/04/21/g-s1-61930/share-pope-francis-memories
----------------------------------------------------------------------------------------------------

📰 Article 2
🔹 Original Title     : Pope Francis is remembered around the world for his generosity of spirit
✅ Cleaned Title      : pope francis remembered around world generosity spirit
📄 Original Content   : People attend an interfaith memorial meeting to mourn the death of Pope Francis in New Delhi, India, on Monday.Imtiyaz Khan/Anadolu via Getty Imageshide caption
Catholics across the globe are mourningthe death of Pope Francis, remembering him for his humility, generosity of spirit, concern for the p
✅ Cleaned Content    : people attend interfaith memorial meeting mourn death pope francis new delhi india mondayimtiyaz khananadolu via getty imageshide caption catholic across globe mourningthe death pope francis remembering humility generosity spirit concern poor steadfast effort restore trust church year scandal franci
📅 Date               : 2025-04-21T14:05:36-04:00
🔗 URL                : https://www.npr.org/2025/04/21/g-s1-61719/pope-francis-death-world-reacts
----------------------------------------------------------------------------------------------------

📰 Article 3
🔹 Original Title     : What happens next after a pope dies, according to recent history
✅ Cleaned Title      : happens next pope dy according recent history
📄 Original Content   : The funeral of Pope John Paul II at Saint Peter's Basilica in Rome, Italy on April 8, 2005.Eric Vandeville/Gamma-Rapho via Getty Imageshide caption
This is a developing story. For more of our coverage head toour latest updates.
Pope Francis' death on Monday sets in motion weeks-long series of events
✅ Cleaned Content    : funeral pope john paul ii saint peter basilica rome italy april eric vandevillegammarapho via getty imageshide caption developing story coverage head toour latest update pope francis death monday set motion weekslong series event period mourning process selecting successor vatican intricate set rule
📅 Date               : 2025-04-21T13:14:58-04:00
🔗 URL                : https://www.npr.org/2025/04/21/g-s1-61909/pope-death-funeral-conclave-timeline
----------------------------------------------------------------------------------------------------


import spacy

# loading the custom trained NER model
ner_model_path = r"C:\Users\Nejjari\Documents\WebData Project\en_ner_conll03\best_ner_model"
nlp_ner = spacy.load(ner_model_path)

# defining a named entity extraction function
def extract_entities(text):
    doc = nlp_ner(text)
    return [(ent.text, ent.label_) for ent in doc.ents]

# applying entity extraction to each article's original content
# (not the cleaned text: NER relies on casing and punctuation)
for article in articles:
    article["entities"] = extract_entities(article.get("content", ""))

# showing the result from one article
print("🔍 Named Entities in Article 1:\n")
for entity, label in articles[0]["entities"]:
    print(f"  - {entity} [{label}]")

C:\Users\Nejjari\anaconda3\Anaconda\Lib\site-packages\spacy\util.py:910: UserWarning: [W095] Model 'en_pipeline' (0.0.0) was trained with spaCy v3.7.5 and may not be 100% compatible with the current version (3.8.4). If you see errors or degraded performance, download a newer compatible model or retrain your custom model with the current spaCy version. For more details and available updates, run: python -m spacy validate
  warnings.warn(warn_msg)

🔍 Named Entities in Article 1:

  - Pope Francis [PER]
  - Inauguration Mass [ORG]
  - Pope [LOC]
  - St. Peter [LOC]
  - Vatican City [LOC]
  - Vatican [LOC]
  - Cardinal of Buenos Aires [ORG]
  - Spencer Platt [PER]
  - Getty Imageshide [PER]
  - Pope Francis [PER]
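
The `Getty Imageshide [PER]` entity above comes from the photo caption paragraph, whose text ends with the page's "hide caption" widget label. A minimal sketch of filtering such paragraphs before entity extraction follows. It assumes, based only on the scraped samples shown in this notebook, that each caption is its own line in `content` and ends with `hide caption`. The helper `drop_caption_lines` is hypothetical, and it discards the whole caption, including its descriptive sentence.

```python
def drop_caption_lines(content, marker="hide caption"):
    """Remove lines (paragraphs) that end with the caption widget text.

    Hypothetical helper; `marker` is an assumption taken from the scraped output.
    """
    kept = [line for line in content.split("\n")
            if not line.rstrip().endswith(marker)]
    return "\n".join(kept)

sample = (
    "Pope Francis drives through the crowds.Spencer Platt/Getty Imageshide caption\n"
    "We'd love to hear about your reflections on Pope Francis."
)
print(drop_caption_lines(sample))
# We'd love to hear about your reflections on Pope Francis.
```

Applied before `extract_entities`, a filter like this would keep photographer credits and the widget label out of the entity list, at the cost of losing entities mentioned only in captions.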


# load spaCy's small English model for dependency parsing
nlp_re = spacy.load("en_core_web_sm")

# define relation extraction function
def extract_relations(text):
    doc = nlp_re(text)
    relations = []

    for token in doc:
        # identifying the main verb (root of the sentence)
        if token.pos_ == "VERB" and token.dep_ == "ROOT":
            subject, object_ = None, None

            # checking verb's children for subject or object
            for child in token.children:
                if child.dep_ in ("nsubj", "nsubjpass"):
                    subject = child
                elif child.dep_ in ("dobj", "attr"):
                    object_ = child
                elif child.dep_ == "prep":
                    for pobj in child.children:
                        if pobj.dep_ == "pobj":
                            object_ = pobj

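            # keep the triple only if both tokens match an entity from en_core_web_sm's own NER
            # (doc.ents here, not the custom model); the match is a substring test on the text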
            subj_ent = next((ent for ent in doc.ents if subject and subject.text in ent.text), None)
            obj_ent = next((ent for ent in doc.ents if object_ and object_.text in ent.text), None)

            if subj_ent and obj_ent:
                relations.append((subj_ent.text, token.lemma_, obj_ent.text))

    return relations


# add extracted relations to each article
for article in articles:
    article["relations"] = extract_relations(article.get("content", ""))


print("🔗 Relations found in Article 1:\n")
for rel in articles[0]["relations"]:
    print(f"({rel[0]}) ---[{rel[1]}]---> ({rel[2]})")

🔗 Relations found in Article 1:

(Francis) ---[drive]---> (Vatican City)
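
`extract_relations` links a token to an entity by checking whether the token's text occurs anywhere inside the entity's text, which is a substring test. A stricter alternative is to compare token positions: in spaCy a `Token` exposes its index as `.i`, and a `Span` covers the half-open token range from `.start` to `.end`. The sketch below illustrates the difference with plain tuples standing in for spaCy objects, so it runs without any model. The names `Ent`, `find_by_substring`, and `find_by_position` are hypothetical.

```python
from typing import NamedTuple, Optional, Sequence

class Ent(NamedTuple):
    """Stand-in for a spaCy Span: token range [start, end) plus surface text."""
    start: int
    end: int
    text: str

def find_by_substring(token_text: str, ents: Sequence[Ent]) -> Optional[Ent]:
    # mirrors the check used in extract_relations: substring of the entity text
    return next((e for e in ents if token_text in e.text), None)

def find_by_position(token_index: int, ents: Sequence[Ent]) -> Optional[Ent]:
    # stricter check: the token must sit inside the entity's token range
    return next((e for e in ents if e.start <= token_index < e.end), None)

# tokens of "He praised Helen": 0 "He", 1 "praised", 2 "Helen"
ents = [Ent(start=2, end=3, text="Helen")]

print(find_by_substring("He", ents))  # Ent(start=2, end=3, text='Helen'), a false match
print(find_by_position(0, ents))      # None, the pronoun is not part of any entity
print(find_by_position(2, ents))      # Ent(start=2, end=3, text='Helen')
```

With real spaCy objects the positional test would read `ent.start <= token.i < ent.end`.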


from spacy import displacy

# visualize dependency structure for article 1 in Jupyter
doc = nlp_re(articles[0]["content"])
displacy.render(doc, style="dep", jupyter=True)


from rdflib import Graph, URIRef, Namespace, RDF, Literal
from rdflib.namespace import XSD
import re

# helper to sanitize text for URIs
def clean_uri(value):
    value = value.strip()
    value = re.sub(r'[^a-zA-Z0-9_]', '_', value)
    value = re.sub(r'_+', '_', value)
    return value.strip('_')

# namespaces
EX = Namespace("http://example.org/")
NPR = Namespace("https://www.npr.org/article/")
g = Graph()
g.bind("ex", EX)
g.bind("npr", NPR)

# building graph
for i, article in enumerate(articles, 1):
    article_uri = URIRef(NPR[f"article_{i}"])
    g.add((article_uri, RDF.type, EX.Article))
    g.add((article_uri, EX.title, Literal(article.get("title", ""), datatype=XSD.string)))
    g.add((article_uri, EX.content, Literal(article.get("content", ""), datatype=XSD.string)))
    g.add((article_uri, EX.sourceURL, Literal(article.get("url", ""), datatype=XSD.anyURI)))

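    date_str = article.get("publication_date", "")
    # only strings with exactly two hyphens are treated as plain dates (xsd:date);
    # timestamps such as 2025-04-21T14:25:05-04:00 contain three hyphens and
    # therefore fall back to xsd:string
    if date_str.count("-") == 2: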
        g.add((article_uri, EX.date, Literal(date_str, datatype=XSD.date)))
    else:
        g.add((article_uri, EX.date, Literal(date_str, datatype=XSD.string)))

    for ent_text, ent_type in sorted(article.get("entities", [])):
        ent_uri = URIRef(EX[clean_uri(ent_text)])
        ent_class = URIRef(EX[clean_uri(ent_type)])
        g.add((ent_uri, RDF.type, ent_class))

    for subj, pred, obj in article.get("relations", []):
        subj_uri = URIRef(EX[clean_uri(subj)])
        pred_uri = URIRef(EX[clean_uri(pred)])
        obj_uri = URIRef(EX[clean_uri(obj)])
        g.add((subj_uri, pred_uri, obj_uri))

# output graph in Turtle format
ttl_output = g.serialize(format="turtle")
print(ttl_output)

@prefix ex: <http://example.org/> .
@prefix npr: <https://www.npr.org/article/> .
@prefix xsd: <http://www.w3.org/2001/XMLSchema#> .

ex:115 ex:vote ex:April_2005at_age_84 .

ex:AFP_via_Getty_Imageshide a ex:ORG .

ex:AP_APhide a ex:ORG .

ex:AP_Photo a ex:ORG .

ex:AP_Photo_ASSOCIATED a ex:ORG .

ex:Abdul_Latif_Rashid a ex:PER .

ex:Accordingto_Britannica a ex:PER .

ex:Africa a ex:LOC .

ex:African a ex:MISC .

ex:AfterExtra a ex:ORG .

ex:Ahmedalso a ex:PER .

ex:Alessandra_Tarantino a ex:PER .

ex:Alessia_Pierdomenico a ex:LOC .

ex:America a ex:LOC .

ex:American a ex:MISC .

ex:Anadolu a ex:LOC .

ex:Andreas_Solaro a ex:PER .

ex:Andrew_Medichini_Andrew a ex:PER .

ex:Anthony_Kuhn a ex:PER .

ex:Apostolic_Camera a ex:ORG .

ex:Apostolic_Chamber a ex:MISC .

ex:Apostolic_Palace a ex:ORG .

ex:Archdiocese_of_Washington a ex:ORG .

ex:Argentina a ex:LOC .

ex:Argentine a ex:MISC .

ex:Argentinian a ex:MISC .

ex:Arthur a ex:PER ;
    ex:encounter ex:The_Black_Knight .

ex:Asia a ex:LOC .

ex:Associated_Pressthat_Francis a ex:ORG .

ex:Association_of_United_States_Catholic_Priests a ex:ORG .

ex:Basilica a ex:LOC .

ex:Beijing a ex:LOC .

ex:Benedict_XVI a ex:ORG .

ex:Bianca_Lott a ex:PER ;
    ex:study ex:her_spring_semester .

ex:Bishop a ex:PER .

ex:Black_Knight a ex:PER .

ex:Bloomberg a ex:LOC .

ex:Borgo_Egnazia a ex:LOC .

ex:Britannica a ex:LOC .

ex:Bry_Jensen a ex:PER ;
    ex:tell ex:NPR .

ex:Buenos_Aires a ex:LOC .

ex:CSI_300 a ex:MISC .

ex:Camerlengo a ex:PER .

ex:Canon a ex:PER .

ex:Canon_Law a ex:PER .

ex:Canonization_Ceremony a ex:ORG .

ex:Cardinal_of_Buenos_Aires a ex:ORG .

ex:Cardinals a ex:ORG .

ex:Casa_Santa_Marta a ex:ORG,
        ex:PER .

ex:Catholic a ex:MISC .

ex:Catholic_Charitable_Organizations a ex:ORG .

ex:Catholic_Church a ex:ORG .

ex:Catholic_University_of_America a ex:ORG .

ex:Catholics a ex:MISC .

ex:China a ex:LOC .

ex:Chinese a ex:MISC .

ex:Christian a ex:MISC .

ex:Christians a ex:MISC ;
    ex:move ex:Gaza .

ex:Church a ex:LOC .

ex:Clarin a ex:PER .

ex:College_of_Cardinals a ex:ORG .

ex:Commission_for_Confidential_Matters a ex:ORG .

ex:Conclaves a ex:PER .

ex:Cristina_Sille a ex:PER .

ex:Dallas a ex:LOC .

ex:Dennis a ex:PER .

ex:Dicastery_for_Laity a ex:ORG .

ex:Divine_Spirit a ex:PER .

ex:Donald_Trump a ex:PER ;
    ex:clash ex:recent_years ;
    ex:praise ex:2013 .

ex:Doug_Rand a ex:PER .

ex:Edward_J_Weisenburger a ex:PER .

ex:El_Salvador a ex:LOC .

ex:Ethiopia a ex:LOC .

ex:Eugenio_Pacelli a ex:PER .

ex:Europe a ex:LOC .

ex:European a ex:MISC .

ex:Extra a ex:MISC .

ex:Family a ex:LOC .

ex:Farrell a ex:LOC,
        ex:PER .

ex:Fasano a ex:LOC .

ex:Four_House_Democrats a ex:MISC .

ex:Francis a ex:PER ;
    ex:bear ex:Jorge_Bergoglio ;
    ex:choose ex:March_2013 ;
    ex:die ex:88,
        ex:Easter_Sunday,
        ex:the_age_of_88_He ;
    ex:drive ex:Vatican_City ;
    ex:meet ex:a_Canonization_Ceremony,
        ex:the_G7_Leaders_Summit,
        ex:withLee_Yong_soo ;
    ex:preside ex:Vatican_City ;
    ex:write ex:Santa_Maria_Maggiore,
        ex:his2025 .

ex:Francisroundly a ex:ORG .

ex:Franco_Origlia a ex:ORG .

ex:French a ex:MISC .

ex:Gabriel_Romanelli a ex:PER .

ex:Gallatin_Gateway a ex:PER .

ex:Gaza_City a ex:LOC .

ex:George_Antone a ex:PER .

ex:Germany a ex:LOC .

ex:Getty_Imageshide a ex:PER .

ex:Gioacchino_Pecci a ex:PER .

ex:Giovanni_Battistaannounced a ex:PER ;
    ex:battistaannounce ex:tens_of_thousands .

ex:Global_South a ex:MISC .

ex:God a ex:PER .

ex:God_Bless a ex:PER .

ex:Graham_Chapman a ex:PER .

ex:Grand_Ayatollah_Ali a ex:PER .

ex:Gregg_Gassman a ex:PER .

ex:Guangdong a ex:LOC .

ex:Guangzhou a ex:LOC .

ex:Habemus_Papam a ex:ORG .

ex:Hatciri_Lopez a ex:PER .

ex:Here a ex:PER .

ex:Holy_Family_Church a ex:ORG .

ex:Holy_Father a ex:ORG .

ex:Holy_Gospels a ex:ORG .

ex:Holy_Grail_Fathom_Entertainmenthide a ex:ORG .

ex:Holy_Seeuntil a ex:ORG .

ex:Holy_Spirit a ex:ORG .

ex:ISIS a ex:LOC,
        ex:ORG .

ex:Ididsponsor a ex:ORG .

ex:Imtiyaz_Khan a ex:PER .

ex:Inauguration_Mass a ex:ORG .

ex:India a ex:LOC .

ex:Iraq a ex:LOC .

ex:Iraqi a ex:MISC .

ex:Ireland a ex:LOC .

ex:Irish a ex:MISC .

ex:Islam a ex:MISC .

ex:Israel a ex:LOC .

ex:Italian a ex:MISC .

ex:Italy a ex:LOC .

ex:J_Mitchell a ex:PER .

ex:Jade_Gao a ex:PER .

ex:Jane_Arraf a ex:PER .

ex:Japanese a ex:MISC .

ex:Javier_Milei a ex:PER .

ex:Jensen a ex:PER .

ex:Jesus a ex:PER .

ex:Joe_Raedle a ex:PER .

ex:John a ex:PER .

ex:John_Paul a ex:PER .

ex:John_Paul_II a ex:PER ;
    ex:appoint ex:2001 ;
    ex:die ex:Easter .

ex:Johnston_County a ex:PER .

ex:Jorge a ex:PER .

ex:Jorge_Mario_Bergoglio a ex:PER .

ex:Joseph_Ratzinger a ex:PER ;
    ex:announce ex:April_8 .

ex:Judaism a ex:LOC .

ex:Kenya_William_Samoei_Ruto a ex:ORG .

ex:Kenyan_PresidentWilliam_Rutoposted a ex:PER .

ex:Kevin_Farrell a ex:PER ;
    ex:announce ex:Monday ;
    ex:bear ex:Dublin ;
    ex:criticize ex:2018 .

ex:Kilmar_Abrego_Garcia a ex:PER .

ex:Kurt_Martens a ex:PER .

ex:LGBT_Catholics_Westminster a ex:ORG .

ex:Larry_Hogancalled a ex:PER .

ex:Latin a ex:LOC .

ex:Latin_America a ex:LOC .

ex:Latin_American a ex:MISC .

ex:Latin_for a ex:MISC .

ex:Latinos a ex:LOC .

ex:Lee a ex:PER .

ex:Life a ex:ORG .

ex:Lisa_Maree_Williams a ex:PER .

ex:Lord_for_the a ex:ORG .

ex:Lori a ex:PER .

ex:Manila_Cathedral a ex:LOC .

ex:Many_Africans a ex:MISC .

ex:Martens a ex:PER .

ex:Martin_Pendergast a ex:PER ;
    ex:tell ex:London .

ex:Mary_McAleese a ex:PER .

ex:Maryland a ex:LOC .

ex:Mass a ex:LOC .

ex:McCarrick a ex:ORG,
        ex:PER .

ex:Mexico a ex:LOC .

ex:Milei a ex:PER .

ex:Mont a ex:PER .

ex:Muslim_Arab a ex:MISC .

ex:N_C a ex:LOC .

ex:New_Delhi a ex:LOC .

ex:Northfield a ex:LOC .

ex:Notably a ex:PER .

ex:Omar_Al a ex:PER .

ex:Opus_Dei a ex:PER .

ex:Palestinian_Christians a ex:MISC .

ex:Palestinians a ex:MISC .

ex:Papal_Basilicas a ex:PER .

ex:Papal_Swiss_Guard a ex:MISC .

ex:Patsy a ex:PER .

ex:Pauline_Chapel a ex:ORG .

ex:People_of_God a ex:ORG .

ex:Peter a ex:PER .

ex:Pius_IX a ex:ORG .

ex:Pontifacts a ex:ORG .

ex:Pontifical_Gregorian_University a ex:ORG .

ex:Pope a ex:LOC,
        ex:PER .

ex:Pope_Benedict a ex:PER .

ex:Pope_Francis a ex:PER .

ex:Pope_Francisin a ex:PER .

ex:Pope_Gregory_X_The a ex:PER .

ex:Pope_John_Paul a ex:PER .

ex:Pope_John_Paul_II a ex:PER .

ex:Pope_Leo_XIII a ex:PER .

ex:Pope_Pius a ex:PER .

ex:PresidentCyril_Ramaphosasaid a ex:PER .

ex:Prophet_Abraham a ex:ORG .

ex:Python a ex:ORG .

ex:Qattaa a ex:PER .

ex:Rand a ex:PER .

ex:Rapho a ex:LOC .

ex:Rashid a ex:PER .

ex:Ratzinger a ex:PER .

ex:Reese a ex:PER .

ex:Reuters a ex:ORG .

ex:Roman a ex:MISC .

ex:Roman_Catholic_Church a ex:MISC .

ex:Roman_Catholicism a ex:MISC .

ex:Rome a ex:LOC .

ex:Ruth_Angelettia a ex:PER .

ex:Sacred_College_of_Cardinals a ex:ORG .

ex:Saint_Peter a ex:PER .

ex:Salvadoran a ex:LOC .

ex:Salvadorian a ex:MISC .

ex:Sanctae_Marthae a ex:PER .

ex:Shiite a ex:PER .

ex:Sistani a ex:PER .

ex:Sistine a ex:MISC .

ex:Sistine_Chapel a ex:ORG .

ex:South_Africa a ex:LOC .

ex:South_Korea a ex:LOC .

ex:Spain a ex:LOC .

ex:Spanish a ex:MISC .

ex:Spanish_Catholic_Center a ex:MISC .

ex:Spencer_Platt a ex:PER .

ex:Square a ex:PER .

ex:St_Peter a ex:LOC,
        ex:ORG,
        ex:PER .

ex:State_Department a ex:ORG .

ex:Stephen_P_Newton a ex:PER ;
    ex:reflect ex:NPR .

ex:Summit a ex:PER .

ex:Supreme_Court a ex:ORG .

ex:Swiss a ex:MISC .

ex:Terry_Gilliam a ex:PER .

ex:Thomas_Reese a ex:PER .

ex:Trump a ex:MISC,
        ex:PER .

ex:Trumpmet a ex:PER .

ex:Truth_Social a ex:ORG .

ex:Typically a ex:PER .

ex:US a ex:LOC .

ex:US_First_Lady_Melania_Trump a ex:ORG .

ex:U_S a ex:LOC .

ex:U_S_Catholic a ex:MISC .

ex:U_S_Mexico a ex:LOC .

ex:United_Arab_Emirates a ex:ORG .

ex:Universal_Shepherd a ex:PER .

ex:University_of_Monterrey a ex:ORG .

ex:University_of_Salamanca a ex:ORG .

ex:Up_First a ex:ORG .

ex:Ur a ex:LOC .

ex:Vance a ex:ORG,
        ex:PER .

ex:Vandeville_Gamma a ex:PER .

ex:Vatican_City_State_Supreme_Court a ex:ORG .

ex:Vatican_Media a ex:ORG .

ex:Vatican_Pool a ex:LOC .

ex:Vatican_Press_Office a ex:ORG .

ex:Vaticanuntil a ex:ORG .

ex:WUNC a ex:ORG .

ex:Washington a ex:LOC .

ex:White a ex:MISC .

ex:White_House a ex:LOC .

ex:Willem_Marx a ex:PER .

ex:William_Lorirecalled a ex:PER .

ex:World_War_II a ex:MISC .

ex:X a ex:MISC .

ex:el_Papa_Francisco a ex:ORG .

ex:likePapabili a ex:LOC .

ex:m_pic_twitter_com_3dPPFoNWBr a ex:LOC .

ex:newspaperLaCroix_International a ex:LOC .

ex:the_College_of_Cardinals ex:lock ex:Vatican .

ex:the_People_of_God ex:attend ex:Monday .

ex:withLee_Yong a ex:PER .

npr:article_1 a ex:Article ;
    ex:content """Pope Francis drives through the crowds during the Inauguration Mass for the Pope in St. Peter's Square on March 19, 2013, in Vatican City, Vatican. The mass was held in front of an expected crowd of up to one million pilgrims and faithful who filled the square and the surrounding streets to see the former Cardinal of Buenos Aires officially take up his role as pontiff.Spencer Platt/Getty Imageshide caption
We'd love to hear about your reflections on Pope Francis and about any experiences you had with him over the years. You can fill out the form belowor via this link, and share your stories, photos, voice memos, etc. An editor may be in touch to learn more."""^^xsd:string ;
    ex:date "2025-04-21T14:25:05-04:00"^^xsd:string ;
    ex:sourceURL "https://www.npr.org/2025/04/21/g-s1-61930/share-pope-francis-memories"^^xsd:anyURI ;
    ex:title "Do you have memories of Pope Francis to share? Send them our way"^^xsd:string .

npr:article_10 a ex:Article ;
    ex:content """White smoke billows from a chimney at the Sistine Chapel, signaling that cardinal electors have chosen a new pope — Pope Francis — on March 13, 2013.Jeff J Mitchell/Getty Imageshide caption
Pope Francis has died at 88. For more of our coverage head toour latest updates.
The white smoke is famous. When it streams out of a chimney at the Sistine Chapel, it signals that a new pope has been chosen and sets off celebrations among some 1.4 billion Catholics around the world.
Behind the scenes, a mysterious and intensely dramatic process culminates in that smoke — literally. It's created by burning the ballots cardinals just used. White smoke signals that the Roman Catholic Church has a new leader; black smoke means the cardinals will need to vote again.
With the death of Pope Francis, the elaborate mechanism will now begin to decide who sits in power at the Vatican, the seat of the last absolute monarchy in Europe. It centers around the conclave, a gathering whose name stems from the Latin for "with key."
"That actually comes from the 13th century," Bry Jensen, host of the long-running Pontifacts podcast, tells NPR. She says cardinals couldn't agree on a new pope in 1268 and the Church went nearly three years without a pontiff, despite growing frustration outside the cardinals' ranks.
"They locked the cardinals up behind closed doors, and then they put them on water and bread so that they would focus on the essentials," says Kurt Martens, ordinary professor of Canon law at the School of Canon Law at the Catholic University of America.
That initial conclave elected an archdeacon who was not an ordained priest, who became Pope Gregory X. The new pope ordered that future Church transitions would begin with a conclave, to avoid long vacancies.
Cardinals in the conclave will be locked away within the Vatican, cut off from the outside world. As they deliberate, news outlets point cameras at the chapel's chimney, and arcane words enter casual conversation, like Papabili, or "pope-able," the term for cardinals with a chance of becoming pope.
Cardinals file into the Sistine Chapel for a conclave to elect a new pope on March 12, 2013 in the Vatican. MAURIX/Gamma-Rapho via Getty Images
When a reigning pope dies, an immediate duty falls to the camerlengo, a cardinal whose title translates to "chamberlain." The camerlengo declares the pope is deceased and administers the Holy See until a successor is chosen. The current camerlengo is Cardinal Kevin Farrell, the first American in that post.
You might have heard that the camerlengo uses a silver hammer to tap a pope's forehead three times, to ascertain whether he's alive. The practice has become a matter of legend, Martens says.
"The last time that that ritual was used was in 1878 when Pius IX died," he says, "but that's not done anymore."
Funeral rites for the late pope are held for nine days, as he is mourned and celebrated. Conclaves must begin within 15 to 20 days after a pope dies or resigns.
Upon the pope's death, the dean of the College of Cardinals calls the cardinal electors to the Vatican. There are currently 135 of them. To join the conclave, cardinals must be under 80 years old.
During the conclave, the cardinals live in the Domus Sanctae Marthae, a hotel-like facility next to St. Peter's Basilica. It's where Pope Francis opted to live, rather than in the Apostolic Palace's papal apartments. The residence has been compared to a three-star hotel.
"I've eaten there. I must say I'd rather go to a nice Roman restaurant than eat there," Martens says. But, he adds, that's part of the point.
"You don't want to make it more than a three-star hotel. Because you don't want the Cardinals to get comfortable," but instead to focus on electing a pontiff.
The rituals take place according to rules popes have refined over the centuries, clarifying the timeframe and obligations. But the conclave itself must be obscured by "total secrecy," as Pope John Paul II wrote. Cardinal electors must sign an oath of secrecy and seclusion, under threat of excommunication.
That's why the process intrigues so many people, says Gregg Gassman, a librarian who edits the Pontifacts podcast.
"Some of the mystery does come from the closed nature of the conclave itself," he says. "It's fascinating."
Once the cardinals are gathered, the dean of the College of Cardinals presides over a mass. The group then walks together from the Pauline Chapel to the Sistine Chapel, singing hymns invoking the Holy Spirit.
In the Sistine Chapel, the conclave swears a secrecy oath in Latin, touching the Holy Gospels.
"When that ceremony is over, you have the papal master of ceremonies who in a dramatic way says, Extra omnes," Martens says. "Roughly, it means, 'Get the hell out of here, all you who don't belong here,' meaning only the cardinal electors can remain."
Outside the chapel, the famed Papal Swiss Guard stands guard.
Swiss Guards line up in front of St. Peter's Basilica at the Vatican, Wednesday, Dec. 25, 2024. (AP Photo/Andrew Medichini) Andrew Medichini/AP/AP
"That's when the doors are locked, that's when the verbal and communicative gates go down," Jensen says. "After Extra omnes, there is no further communication until a pope has been elected, aside from smoke."
"There's only one round that first evening, and then you will see black smoke or white smoke," Martens says.
Typically, he says, the first round is merely an indication of the cardinals' priorities. On the following day, the conclave starts holding two rounds of voting each morning, and another two in the afternoon.
After each vote, a needle is pushed through the ballots, binding them together.  If no winner emerges with a two-thirds majority, the two packages are "put together in that stove that is in the corner of the Sistine Chapel, to burn them and produce whatever smoke needs to be produced — white or black," Martens says.
A close up of the stove in which the cardinals will burn their ballots during the election for a new pontiff in the Sistine chapel in Vatican City shown October 1978. At right a container with chemicals to produce white and black smoke in the time of burning to say if a new pope was named or not. (AP Photo) ASSOCIATED PRESS/AP
The Church once used wet straw or dry straw to produce the right color, but to avoid confusion, the process now relies on chemicals.
The cardinals will continue to pray and contemplate — and vote — until a new pope is elected.
"All of the conclaves from the 1900s onwards have been under four days," Jensen says.
Pope Francis presides over a Mass at St. Peter's Square on Feb. 9, 2025 in Vatican City. During the Mass, the Pope asked his master of ceremonies to continue reading his homily for him, as he became short of breath. Franco Origlia/Getty Images
Francis was elected pope on the conclave's second day, for instance.
After a successful vote, the winning candidate is asked two questions. The first is whether they accept their election as pope.
"And then the second question is going to be, 'What name do you choose?' And then the name is chosen," Martens says.
Before Pope Francis was elected, many of the faithful in Buenos Aires knew their archbishop as simply "Father Jorge," as NPR reported in 2013.
Official documents are filled out, and the new pope is taken into a sacristy, to be fitted with papal attire.
"There are typically three sets of vestments ready," in sizes roughly equal to small, medium and large, Martens says.
Soon afterward, the senior cardinal deacon will appear on the balcony over St. Peter's Square, announcing, Habemus Papam! — "We have a pope!"
It will then be the new pope's turn to emerge onto the balcony and deliver his first blessing."""^^xsd:string ;
    ex:date "2025-04-21T06:38:53-04:00"^^xsd:string ;
    ex:sourceURL "https://www.npr.org/2025/04/21/nx-s1-5304054/conclave-pope-chosen-francis-dies-white-black-smoke"^^xsd:anyURI ;
    ex:title "Who will be the next pope? Here's how the conclave works"^^xsd:string .

npr:article_2 a ex:Article ;
    ex:content """People attend an interfaith memorial meeting to mourn the death of Pope Francis in New Delhi, India, on Monday. Imtiyaz Khan/Anadolu via Getty Images
Catholics across the globe are mourning the death of Pope Francis, remembering him for his humility, generosity of spirit, concern for the poor, and steadfast efforts to restore trust in the church after years of scandal.
Francis died early Monday in Rome at the age of 88, just one day after Easter Sunday. His death marks the end of a 12-year papacy that began in 2013 following the historic resignation of Benedict XVI — the first pontiff to step down in nearly six centuries.
The Vatican announced that the pope's body would be placed in a coffin on Monday evening, with Cardinal Kevin Farrell presiding over the rite in the chapel of Casa Santa Marta. The Dublin-born cardinal is the acting head of the Vatican until a new pope is elected.
Outside St. Peter's Basilica in Vatican City, American tourists were among those mourning Francis' death, including Doug Rand and his wife, Ruth Angelettia from Gallatin Gateway, Mont.
A digital screen shows an image of Pope Francis in Saint Peter's Square on Monday in Vatican City. Alessia Pierdomenico/Bloomberg via Getty Images
Rand described the pope as someone who worked "right up to the last day" for "the little guy and the poor and the disadvantaged and the abused."
Bianca Lott, from Northfield, Minn., is studying abroad in Rome for her spring semester. Given that Francis died on Easter Monday, she said she felt "a strange happiness at the timing," which she called "poetic."
Baltimore Archbishop William Lori recalled the pope's final appearance greeting crowds in St. Peter's Square on Easter, just hours before his death. "It was as if to say farewell to the People of God whom he loved so dearly and served so devotedly," Lori said in a statement. "Throughout his pontificate, Pope Francis showed deep compassion for the poor and marginalized, uplifting the voices of migrants, the sick, the elderly, and victims of injustice."
Former Maryland Gov. Larry Hogan called Francis "a truly extraordinary leader of the Church — humble, gracious, and deeply prayerful."
Detroit Archbishop Edward J. Weisenburger shared: "My heart is heavy as our world has lost a powerful, prophetic, and loving voice. Yet I rejoice in what I pray is a blessed reward of joy beyond all understanding for a truly great and loving Universal Shepherd."
The Rev. Stephen P. Newton, executive director of the Association of United States Catholic Priests, reflected in an email to NPR: "While we will miss his beautiful, often smiling presence, his example will continue to inspire us to become the church Jesus intended: one that is open and deeply listens to the movement of the Divine Spirit within us, our earth, and the universe."
On its website, Opus Dei, the conservative Catholic organization, offered prayers "to the Lord for the soul of our beloved Pope Francis," adding, "God will have rewarded his generous dedication to the service of the People of God and the whole world."
People pray during a service for Pope Francis in Buenos Aires Cathedral on Monday. Cristina Sille/Picture Alliance via Getty Images
Francis, the first-ever Latin American pope, once served as archbishop in Buenos Aires. In the Argentine capital, the government declared seven days of mourning and citizens gathered for a special mass at the city's cathedral, Reuters reports.
The pope also touched the lives of many Latinos around the world by communicating with them in Spanish. Hatciri Lopez, a lifelong Catholic from rural Johnston County, N.C., told NPR member station WUNC that Francis grew her faith.
"It's just easier for the message to get to your heart, instead of hearing it from a translator," she said. "Just as soon as I heard him speak, it would just strike my heart right away. I would just want to cry and just feel a sense of happiness and hope for the future."
In London, Martin Pendergast, secretary of LGBT Catholics Westminster, told The Associated Press that Francis was "the first pope to actually use the word 'gay,' so even the way he speaks has been a radical transformation — and some would say a bit of a revolution as well — compared with some of his predecessors."
In South Korea, political and religious leaders remembered the pope for his compassion toward the victims of conflict and disaster.
On a visit to South Korea in 2014, Francis met with Lee Yong-soo, who was forced into sexual servitude by the Japanese military during World War II.
"He must have gone to a good place," Lee, now 96, said of the pontiff following his death.
In besieged Gaza, where more than 50,000 Palestinians have been killed in more than 18 months of war with Israel, Christians there were deeply moved by Pope Francis' nightly phone call to them offering comfort amid the frightening conflict. The Rev. Gabriel Romanelli of the Holy Family Church said the pope's final call came the night before his death, according to Reuters.
Members of the clergy hold Mass for late Pope Francis at the Holy Family Church in Gaza City on Monday. Palestinian Christians in Gaza mourned the death of Pope Francis, who had maintained close and consistent contact with the besieged territory's small Christian community throughout the ongoing war with Israel. Omar Al-Qattaa/AFP via Getty Images
"We lost a saint who taught us every day how to be brave, how to keep patient and stay strong," George Antone, at the Holy Family Church in Gaza, told the news agency.
Francis focused on outreach to the overwhelmingly Muslim Arab world during his papacy, making groundbreaking visits to Iraq and the United Arab Emirates.
In 2021, Francis was the first pope in history to travel to Iraq, meeting the revered Shiite spiritual leader Grand Ayatollah Ali al Sistani and visiting Ur, the reputed birthplace of the Prophet Abraham, known as the patriarch of Judaism, Christianity and Islam.
In a post on X, Iraqi President Abdul Latif Rashid was also among those to mourn the pope's death, calling him "a remarkable religious and humanitarian leader whose life was devoted to promoting peace, alleviating poverty, and fostering interfaith tolerance."
"His humanitarian stance against war and violence, and his continuous calls for peace and coexistence, will leave an indelible impact on the world," Rashid wrote.
Willem Marx, Anthony Kuhn and Jane Arraf contributed reporting."""^^xsd:string ;
    ex:date "2025-04-21T14:05:36-04:00"^^xsd:string ;
    ex:sourceURL "https://www.npr.org/2025/04/21/g-s1-61719/pope-francis-death-world-reacts"^^xsd:anyURI ;
    ex:title "Pope Francis is remembered around the world for his generosity of spirit"^^xsd:string .

npr:article_3 a ex:Article ;
    ex:content """The funeral of Pope John Paul II at Saint Peter's Basilica in Rome, Italy on April 8, 2005. Eric Vandeville/Gamma-Rapho via Getty Images
This is a developing story. For more of our coverage head to our latest updates.
Pope Francis' death on Monday sets in motion a weeks-long series of events, from a period of mourning to the process of selecting his successor.
The Vatican has an intricate set of rules governing the papal transition, a process the world doesn't get to watch unfold very often.
Pope Francis was chosen for the job in March 2013, several weeks after Pope Benedict XVI resigned from his post — a rare move that he blamed on his advanced age and diminishing strength. He died at age 95 in December 2022.
The most recent pope to die in office was the previous pontiff, Pope John Paul II. He died in April 2005 at age 84, after 26 years in the papacy.
Here's what happened after:
1. The pope's death is confirmed
The pope's death is supposed to be immediately verified and communicated to relevant parties, as Father Thomas Reese, then-editor in chief of America, told NPR in 2005.
The prefect of the papal household tells the camerlengo — in 2025, that's Cardinal Kevin Farrell — who must verify the pope's death in the presence of the papal master of ceremonies, the cleric prelates of the Apostolic Camera and the secretary of the Apostolic Camera, who draws up a death certificate.
Then the camerlengo and prefect of the papal household pass the news to various officials in the Vatican, who relay it to the people of Rome and the heads of nations.
"Although this is the formal procedure, in fact most people will first hear of the death of the pope from the media," Reese said.
John Paul II died at 9:37 p.m. on April 2, 2005 — six days after Easter. The Vatican Press Office informed journalists of his death via email exactly 17 minutes later.
Cardinal Giovanni Battista announced the pope's death to tens of thousands of people who had gathered for a vigil in St. Peter's Square in Vatican City. The crowd grew silent, and some people clapped in tribute.
2. The papal apartments are locked
The camerlengo locks and seals the pope's apartment.
While looting ("by staff, the cardinals or the Roman populace") was a concern in the past, modern popes are more concerned that their private papers stay out of the wrong hands.
The camerlengo destroys the pope's fisherman's ring and seal — traditionally with a special hammer, per Britannica — to symbolize the end of his reign and prevent misuse, like forging documents.
John Paul II's ring and seal were destroyed in a symbolic ritual on April 16, 2005 — at the end of the nine-day mourning period and before the start of the conclave.
3. The mourning period lasts nine days
Thousands of people wait in line at the St Peter's Basilica to view the body of Pope John Paul II on April 6, 2005 in Vatican City. Joe Raedle/Getty Images
A pope's death is followed by a nine-day mourning period called the novemdiales. The cardinals arrange for the funeral rites to be observed during this time.
This is also when a pope lies in state. Approximately 1 million mourners visited John Paul II's body as it lay in state — first in the Apostolic Palace for staff and then St. Peter's Basilica for public viewing — for several days before his funeral on April 8.
According to Reese, the date for the funeral and burial is set by the College of Cardinals, but the apostolic constitution states it is to "take place, except for special reasons, between the fourth and sixth day after death."
Francis wrote in his 2025 autobiography that he found the customary service "excessive."
"So I have arranged with the master of ceremonies to lighten it: no catafalque, no ceremony for the closure of the casket, nor the deposition of the cypress casket into a second of lead and a third of oak," he wrote.
4. Burial occurs within four to five days
The camerlengo is tasked with arranging the funeral in accordance with instructions the pope leaves behind.
John Paul II's funeral took place six days after his death in Saint Peter's Square, at 10 a.m. local time on April 8.
The three-hour ceremony was conducted by the dean of the Sacred College of Cardinals, Cardinal Joseph Ratzinger, with help from some 164 cardinals.
About 2 million people watched online, and the long list of dignitaries who attended in person included four kings, five queens and at least 70 presidents and prime ministers, according to DemocracyNow.
John Paul II was buried immediately afterward in the crypt of St. Peter's Basilica in the Vatican. Pope Francis wrote in his memoir that he has a different final resting place in mind:
"When it happens, I will not be buried in Saint Peter's but at Santa Maria Maggiore," he wrote, referring to one of the four Papal Basilicas in Rome. "The Vatican is the home of my last service, not my eternal home."
5. The conclave chooses the next pope
White smoke vents up from the chimney of the Sistine Chapel on April 19, 2005, meaning that Catholic Church cardinals elected a new leader after a conclave lasting little more than 24 hours. Andreas Solaro/AFP via Getty Images
The camerlengo is the acting head of the Vatican until the next pope is chosen and organizes the election process, which is called the conclave.
All cardinals under 80 years of age when the pope dies have the right to vote for the next pope — 115 of them voted in 2005. The process involves multiple rounds of voting over several days.
The conclave traditionally begins 15 to 20 days after the pope's death (the College of Cardinals sets the date and time). All of the conclaves since the 1900s have lasted less than four days — the last conclave to run more than five days was in 1831, and it lasted for 54.
The election takes place in the Sistine Chapel and is completely secret. But the public can watch the chimney for hints — black smoke means the cardinals will need to vote again; white smoke means a new pope has been chosen.
In 2005, the conclave began on the afternoon of April 18 — 16 days after the pope's death, and 10 days after his funeral. It lasted just two days, ending on April 19 when Cardinal Joseph Ratzinger was elected after just four ballots.
After the vote, the winning candidate is asked two questions: Do they accept their election, and what name will they choose? Then official documents are filled out, the new pope is fitted with papal attire — there are typically three sets of garments at the ready — and the news is announced to the public.
The senior cardinal deacon appears on the balcony over St. Peter's Square and announces "Habemus Papam!" — "We have a pope!"
Ratzinger was announced as Pope Benedict XVI on April 19 and installed as pope with Mass on April 24. He made his first foreign trip, to his native Germany, in August of that year."""^^xsd:string ;
    ex:date "2025-04-21T13:14:58-04:00"^^xsd:string ;
    ex:sourceURL "https://www.npr.org/2025/04/21/g-s1-61909/pope-death-funeral-conclave-timeline"^^xsd:anyURI ;
    ex:title "What happens next after a pope dies, according to recent history"^^xsd:string .

npr:article_4 a ex:Article ;
    ex:content """People walk past a screen showing the CSI 300 Index at a shopping mall in Guangzhou, in southern China's Guangdong province. Jade Gao/AFP via Getty Images
As the Trump administration negotiates trade deals with other countries, China has issued a warning against any agreements that harm its interests.
China's commerce ministry says it respects the efforts of others to try to resolve trade disputes with the U.S. through consultation. But it warns that it will take "corresponding countermeasures" if any deals are struck at the expense of China's interests. It did not give details.
The comments come after reports that Trump is hoping to use tariff negotiations with other countries to isolate China. At the same time, Trump has said he wants to do a deal with Beijing. This month he raised the base tariff on Chinese imports to a dizzying 145%. China responded in kind, with high tariffs on U.S. goods.
The Chinese commerce ministry says seeking tariff exemptions by harming the interests of others will only lead to failure on both sides and ultimately hurt everyone."""^^xsd:string ;
    ex:date "2025-04-21T11:08:30-04:00"^^xsd:string ;
    ex:sourceURL "https://www.npr.org/2025/04/21/g-s1-61668/china-tariffs-trump-trade"^^xsd:anyURI ;
    ex:title "China warns of 'countermeasures' against any deals that harm its interests"^^xsd:string .

npr:article_5 a ex:Article ;
    ex:content """Cardinal Kevin Farrell, Camerlengo of the Apostolic Chamber, announced the death of Pope Francis from the Casa Santa Marta in Vatican City on Monday. Vatican Pool/Getty Images
Cardinal Kevin Farrell, who announced Pope Francis' death on Monday morning, is now the acting head of the Vatican until a new pope is elected.
There's a name for the person with that job: the camerlengo.
According to Britannica, the cardinal camerlengo in Roman Catholicism is a key dignitary of the Vatican who is personally appointed by the pope and tasked with "a specific series of functions in the crucial time of transition from one pope to his successor."
Those tasks include verifying the pope's death, destroying the late pope's symbolic fisherman ring and preparing the conclave, the process by which a new pope is elected.
Farrell, a Dublin-born, naturalized U.S. citizen, held a series of positions at the Vatican before Pope Francis nominated him as the camerlengo in 2019. Here's what to know about him.
Farrell, 77, was born in September, 1947 in Dublin, and after completing secondary school went on to attend the University of Salamanca in Spain and the Pontifical Gregorian University in Rome.
He was ordained a priest on Dec. 24, 1978, and began his career serving as chaplain at the University of Monterrey in Mexico. He moved to the U.S. to join the Archdiocese of Washington in 1984, according to his Vatican biography.
Farrell held a series of positions in several parishes in the area, including director of the Spanish Catholic Center, acting executive director of the Catholic Charitable Organizations and secretary for financial affairs.
Pope John Paul II appointed Farrell as an auxiliary bishop of Washington in 2001. He served as moderator of the curia and chief vicar general until 2007, when he was appointed bishop of Dallas.
In 2016, Pope Francis appointed Farrell as the prefect of the newly established Dicastery for Laity, Family and Life.
"My administrative assistant, came in and said, 'The Pope's on the telephone, and I felt like saying, 'Yeah, yeah,'" Farrell said at a press conference at the time, according to the local NBC affiliate. "Eventually she did put on the Pope, and he told me that he would like me to go to Rome because Dallas needed a much better Bishop than I am."
The pope named Farrell a cardinal later that same year, and continued to elevate him to positions in the Vatican.
He was nominated as camerlengo in 2019, appointed as president of the Commission for Confidential Matters in 2020 and appointed as president of the Vatican City State Supreme Court effective January 2024.
Farrell has been in close proximity to scandal — and scandalous figures — during his career.
Notably, from 2002 to 2006, he worked and lived with Theodore McCarrick, a once-powerful Catholic cardinal who was defrocked by Pope Francis in 2019 after a Vatican investigation determined he had molested adults and children.
After those allegations came to light in 2018, Farrell publicly said he had not known or suspected anything about McCarrick's behavior.
Also in 2018, Farrell was criticized for allegedly barring a group called Voices of Faith from holding its fourth annual Women's Day event inside the Vatican.
Some people, including members of the group, believed the reason was that several of the would-be speakers — including former Irish President Mary McAleese — openly supported same-sex marriage, among other issues.
When asked about the controversy at an unrelated event weeks later, Farrell did not go into much detail about the reason behind his decision.
"Having been told subsequently that I did sponsor that event and having been told subsequently what the event was about, it was not appropriate for me to continue to sponsor such an event," he said, according to the French newspaper LaCroix International. "Obviously, when I withdrew the sponsorship of the event it couldn't be inside the Vatican."
Farrell has said publicly that while the church cannot bless same-sex unions, no one should be excluded from the "pastoral care and love of the Church."
Farrell's position as camerlengo doesn't inherently disqualify or prime him for the position of pope.
The Times reports that only two camerlengos have been elected pope before: Gioacchino Pecci, as Pope Leo XIII in 1878, and Eugenio Pacelli, as Pope Pius XII in 1939.
There has never been a pope from Ireland or the U.S."""^^xsd:string ;
    ex:date "2025-04-21T10:34:12-04:00"^^xsd:string ;
    ex:sourceURL "https://www.npr.org/2025/04/21/g-s1-61662/kevin-farrell-camerlengo-vatican-pope"^^xsd:anyURI ;
    ex:title "Who is Cardinal Kevin Farrell, the acting head of the Vatican?"^^xsd:string .

npr:article_6 a ex:Article ;
    ex:content """Pope Francis exchanges gifts with US President Donald Trump (C) and US First Lady Melania Trump during a private audience at the Vatican on May 24, 2017. US President Donald Trump met Pope Francis at the Vatican today in a keenly-anticipated first face-to-face encounter between two world leaders who have clashed repeatedly on several issues. Alessandra Tarantino/AFP via Getty Images
President Trump has acknowledged the pope's death in a one-line post on Truth Social, writing: "Rest in Peace Pope Francis! May God Bless him and all who loved him!"
Trump and Francis clashed repeatedly in recent years.
Trump praised the pope at the start of Francis's papacy, in 2013, several years before Trump reached the White House.
"The new Pope is a humble man, very much like me, which probably explains why I like him so much!" Trump tweeted in December of that year, several months after Francis became pope.
Things soured soon after. During the 2016 election, Francis roundly criticized Trump's campaign proposal to build a wall on the U.S.-Mexico border.
"A person who thinks only about building walls, wherever they may be, and not building bridges, is not Christian," Francis said at the time.
Trump — who aggressively courted evangelical Christian leaders and voters during his campaign — fired back immediately, saying, "for a religious leader to question a person's faith is disgraceful."
"If and when the Vatican is attacked by ISIS, which as everyone knows is ISIS's ultimate trophy, I can promise you that the Pope would have only wished and prayed that Donald Trump would have been President because this would not have happened," he added.
Trump met the Pope during a 2017 trip to the Vatican, later telling reporters: "He is something. We had a fantastic meeting." A photo from the visit, in which Trump is smiling next to a glum-looking Francis, quickly went viral.
Look at their faces. pic.twitter.com/0t84cBX8bZ
Nearly a decade later, amidst the second Trump administration's crackdown on immigration, the pope once again made a rare public rebuke of the president's policies.
In a public letter to U.S. Catholic bishops in February, Francis described the program of mass deportations as a "major crisis."
He said while nations have the right to defend themselves, "the rightly formed conscience cannot fail to make a critical judgment and express its disagreement with any measure that tacitly or explicitly identifies the illegal status of some migrants with criminality."
"The act of deporting people who in many cases have left their own land for reasons of extreme poverty, insecurity, exploitation, persecution or serious deterioration of the environment, damages the dignity of many men and women, and of entire families, and places them in a state of particular vulnerability and defenselessness," Francis wrote.
The letter also appeared to respond to widely-criticized comments that Vice President Vance, who is Catholic, had made weeks earlier. Vance said people should care for their family, communities and country before caring for others — and Francis disagreed.
"Christian love is not a concentric expansion of interests that little by little extend to other persons and groups," the pope wrote."""^^xsd:string ;
    ex:date "2025-04-21T09:13:28-04:00"^^xsd:string ;
    ex:sourceURL "https://www.npr.org/2025/04/21/g-s1-61636/trump-pope-francis"^^xsd:anyURI ;
    ex:title "A brief history of Trump's feud with Pope Francis"^^xsd:string .

npr:article_7 a ex:Article ;
    ex:content """Pope Francis meets with newly elected Argentinian President Javier Milei before a Canonization Ceremony in St. Peter's Basilica on Feb. 11, 2024 in Vatican City, Vatican. Vatican Pool/Getty Images
Argentina's president sent profound condolences to the family of Pope Francis and to all Catholics in a message posted to X from the pontiff's homeland.
Javier Milei, a far-right libertarian who stridently defends free markets, acknowledged his and the pope's differing viewpoints.
"Despite differences that seem minor today, having been able to know him in his goodness and wisdom was a true honor for me," Milei added on X. "I bid farewell to the Holy Father and stand with all of us who are today dealing with this sad news."
GOODBYE. With deep sorrow I learned this sad morning that Pope Francis, Jorge Bergoglio, passed away today and is now resting in peace. Despite differences that seem minor today, having been able to know him in his goodness and wisdom was a true honor for me.… pic.twitter.com/3dPPFoNWBr
During the 2023 presidential race, then-candidate Milei had decried the pope, calling him an "imbecile" who defended social justice and equating him to evil and the devil.
However, once in office, Milei softened his tone, even visiting the Vatican to meet with Francis.
Francis was born in Buenos Aires in 1936 as Jorge Bergoglio. His parents were Italian immigrants and as a boy he learned Italian, but Spanish was dominant in his home. He rose to be the Archbishop of Buenos Aires, and many in Argentina lament that he never came home to visit as pope.
Mass will be held today in his honor in the capital's cathedral where he presided. According to the newspaper Clarin, the country will observe seven days of mourning."""^^xsd:string ;
    ex:date "2025-04-21T08:32:37-04:00"^^xsd:string ;
    ex:sourceURL "https://www.npr.org/2025/04/21/g-s1-61624/argentina-milei-critic-francis-condolences"^^xsd:anyURI ;
    ex:title "Argentina's president, a former critic of Pope Francis, offers his condolences"^^xsd:string .

npr:article_8 a ex:Article ;
    ex:content """Pope Francis meets with president of Kenya William Samoei Ruto during the G7 Leaders Summit on day two of the 50th G7 summit at Borgo Egnazia on June 14, 2024 in Fasano, Italy.Vatican Media via Vatican Pool/Getty Imageshide caption
On Monday,Kenyan PresidentWilliam Rutoposted on X that Francis "exemplified servant leadership through his humility, his unwavering commitment to inclusivity and justice, and his deep compassion for the poor and the vulnerable."
In neighboring Ethiopia, Prime MinisterAbiy Ahmedalso turned to social media to mourn the pope, saying "may his legacy of compassion, humility, and service to humanity continue to inspire generations to come."
In South Africa, PresidentCyril Ramaphosasaid in a statement that Pope Francis was "a spiritual leader who sought to unite humanity and wished to see a world governed by fundamental human values.
Pope Francis, an Argentine, was notable as the first pontiff from the Global South. Many Africans will be watching as the Vatican decides on his successor, hoping for the first African pope."""^^xsd:string ;
    ex:date "2025-04-21T08:07:03-04:00"^^xsd:string ;
    ex:sourceURL "https://www.npr.org/2025/04/21/g-s1-61618/leaders-in-africa-mourn-the-passing-of-pope-francis"^^xsd:anyURI ;
    ex:title "Leaders in Africa mourn the passing of Pope Francis"^^xsd:string .

npr:article_9 a ex:Article ;
    ex:content """Good morning. You're reading the Up First newsletter.Subscribehere to get it delivered to your inbox, andlistento the Up First podcast for all the news you need to start your day.
Pope Francis died on Easter Monday at the age of 88.He was the first non-European head of the Roman Catholic Church in over a millennium and was one of themost popular popes in decades. He was elected to his exalted post in 2013 and cast an image of humility during years of strain and change, within his church and worldwide.
Pope Francis waves to thousands of followers as he arrives at the Philippines' Manila Cathedral on Jan. 16, 2015. During his papacy, Francis strove to reach out to what he called the "periphery" of the world in Asia, Africa and Latin America.Lisa Maree Williams/Getty Imageshide caption
A big draw to Pope Francis was his personal story.He was the son of immigrants and grew up in Argentina, where he lived through turbulent times. Francis, born Jorge Mario Bergoglio, was the first pope from Latin America.
Four House Democrats were scheduled to land in El Salvador today to demand the release and return of Kilmar Abrego Garcia, a Salvadorian citizen who lived in Maryland. He was deported to a Salvadoran prison due to an "administrative error," according to the Trump administration. The lawmakerssaid in a statementthey hope "to pressure" the White House "to abide by a Supreme Court order."
The State Department seal is seen on the briefing room lectern at the State Department in Washington, D.C., on Jan. 31, 2022.Mandel Ngan/APhide caption
The State Department seal is seen on the briefing room lectern at the State Department in Washington, D.C., on Jan. 31, 2022.
NPR has learned that the Trump administration is substantiallyscaling back the State Department's annual reportson international human rights to remove critiques of abuses such as harsh prison conditions and government corruption. These reports are intended to guide congressional foreign aid and security assistance decisions. Moving forward, the State Department will no longer call out governments for restricting freedom of movement and peaceful assembly. Additionally, the reports will not condemn the detention of political prisoners without due process or the limitations placed on free and fair elections.
nasal sprayDDurrich/iStockphoto/Getty Imageshide caption
Living Better is aspecial seriesabout what it takes to stay healthy in America.
It is not just in your head; seasonal allergies are getting worse every year. This is due to the warming from climate change, making the pollen season longer. Luckily, there are ways to keep the pollen from taking over your life. Here aretips from doctorson how to get relief:
King Arthur (Graham Chapman) and his servant Patsy (Terry Gilliam) encounter The Black Knight (John Cleese)Monty Python and the Holy Grail/Fathom Entertainmenthide caption
This newsletter was edited byYvonne Dennis."""^^xsd:string ;
    ex:date "2025-04-21T07:19:41-04:00"^^xsd:string ;
    ex:sourceURL "https://www.npr.org/2025/04/21/g-s1-61597/up-first-newsletter-pope-francis-dies-house-democrats-el-salvador"^^xsd:anyURI ;
    ex:title "Pope Francis dies at 88. And, House Democrats press for Abrego Garcia's return"^^xsd:string .

ex:Dublin a ex:LOC .

ex:Easter a ex:PER .

ex:Easter_Sunday a ex:PER .

ex:Gaza a ex:LOC .

ex:Jorge_Bergoglio a ex:PER .

ex:London a ex:LOC .

ex:Santa_Maria_Maggiore a ex:LOC .

ex:Vatican a ex:LOC .

ex:his2025 a ex:MISC .

ex:NPR a ex:ORG .

ex:Vatican_City a ex:LOC .


import matplotlib.pyplot as plt
import networkx as nx
from rdflib import URIRef


G = nx.DiGraph()

# Choose only a couple of articles
target_articles = {"article_1", "article_2"}
max_node_length = 20  # short label names

for subj, pred, obj in g:
    subj_label = subj.split("/")[-1][:max_node_length]
    # Also split on "#" so rdf:type is labeled "type" instead of a truncated "22-rdf-syntax-ns#typ"
    pred_label = pred.split("/")[-1].split("#")[-1][:max_node_length]
    obj_label = obj.split("/")[-1][:max_node_length] if isinstance(obj, URIRef) else str(obj)[:max_node_length]

    # Keep only triples whose subject or object is exactly article_1 or article_2
    # (a substring test would also let article_10 through)
    if subj_label not in target_articles and obj_label not in target_articles:
        continue

    # Skip content/text relations
    if pred_label in {"content", "title", "sourceURL"}:
        continue

    G.add_node(subj_label)
    G.add_node(obj_label)
    G.add_edge(subj_label, obj_label, label=pred_label)

# Draw the simplified graph
plt.figure(figsize=(12, 8))
pos = nx.spring_layout(G, k=0.8, seed=42)

nx.draw_networkx_nodes(G, pos, node_color="#AED6F1", node_size=1800)
nx.draw_networkx_edges(G, pos, arrows=True, edge_color="gray")
nx.draw_networkx_labels(G, pos, font_size=10)

edge_labels = nx.get_edge_attributes(G, 'label')
nx.draw_networkx_edge_labels(G, pos, edge_labels=edge_labels, font_size=9)

plt.title("Simplified RDF Graph (First 2 Articles)", fontsize=14)
plt.axis("off")
plt.show()



from rdflib.plugins.sparql import prepareQuery

# List all people
query_persons = prepareQuery("""
  SELECT ?entity WHERE {
    ?entity a <http://example.org/PER> .
  }
""")

print("Persons found in the graph:")
for result in g.query(query_persons):
    print(result)

# Extract subject-verb-object triples with URI pattern
query_triples = prepareQuery("""
  SELECT ?sub ?rel ?obj WHERE {
    ?sub ?rel ?obj .
    FILTER(STRSTARTS(STR(?sub), "http://example.org/")) .
    FILTER(STRSTARTS(STR(?rel), "http://example.org/")) .
    FILTER(STRSTARTS(STR(?obj), "http://example.org/")) .
  }
""")

print("\nTriples with example.org URIs:")
for result in g.query(query_triples):
    print(result)

# Search for keyword in article contents
keyword = "Trump"
query_keyword = prepareQuery(f"""
  SELECT ?doc ?text WHERE {{
    ?doc a <http://example.org/Article> ;
         <http://example.org/content> ?text .
    FILTER(CONTAINS(LCASE(STR(?text)), LCASE("{keyword}")))
  }}
""")

print(f"\nArticles mentioning '{keyword}':")
for result in g.query(query_keyword):
    print(result)

Persons found in the graph:
(rdflib.term.URIRef('http://example.org/Getty_Imageshide'),)
(rdflib.term.URIRef('http://example.org/Pope_Francis'),)
(rdflib.term.URIRef('http://example.org/Spencer_Platt'),)
(rdflib.term.URIRef('http://example.org/Abdul_Latif_Rashid'),)
(rdflib.term.URIRef('http://example.org/Anthony_Kuhn'),)
(rdflib.term.URIRef('http://example.org/Bianca_Lott'),)
(rdflib.term.URIRef('http://example.org/Casa_Santa_Marta'),)
(rdflib.term.URIRef('http://example.org/Cristina_Sille'),)
(rdflib.term.URIRef('http://example.org/Divine_Spirit'),)
(rdflib.term.URIRef('http://example.org/Doug_Rand'),)
(rdflib.term.URIRef('http://example.org/Easter'),)
(rdflib.term.URIRef('http://example.org/Easter_Sunday'),)
(rdflib.term.URIRef('http://example.org/Edward_J_Weisenburger'),)
(rdflib.term.URIRef('http://example.org/Francis'),)
(rdflib.term.URIRef('http://example.org/Gabriel_Romanelli'),)
(rdflib.term.URIRef('http://example.org/Gallatin_Gateway'),)
(rdflib.term.URIRef('http://example.org/George_Antone'),)
(rdflib.term.URIRef('http://example.org/God'),)
(rdflib.term.URIRef('http://example.org/Grand_Ayatollah_Ali'),)
(rdflib.term.URIRef('http://example.org/Hatciri_Lopez'),)
(rdflib.term.URIRef('http://example.org/Imtiyaz_Khan'),)
(rdflib.term.URIRef('http://example.org/Jane_Arraf'),)
(rdflib.term.URIRef('http://example.org/Jesus'),)
(rdflib.term.URIRef('http://example.org/Johnston_County'),)
(rdflib.term.URIRef('http://example.org/Kevin_Farrell'),)
(rdflib.term.URIRef('http://example.org/Larry_Hogancalled'),)
(rdflib.term.URIRef('http://example.org/Lee'),)
(rdflib.term.URIRef('http://example.org/Lori'),)
(rdflib.term.URIRef('http://example.org/Martin_Pendergast'),)
(rdflib.term.URIRef('http://example.org/Mont'),)
(rdflib.term.URIRef('http://example.org/Omar_Al'),)
(rdflib.term.URIRef('http://example.org/Opus_Dei'),)
(rdflib.term.URIRef('http://example.org/Qattaa'),)
(rdflib.term.URIRef('http://example.org/Rand'),)
(rdflib.term.URIRef('http://example.org/Rashid'),)
(rdflib.term.URIRef('http://example.org/Ruth_Angelettia'),)
(rdflib.term.URIRef('http://example.org/Saint_Peter'),)
(rdflib.term.URIRef('http://example.org/Shiite'),)
(rdflib.term.URIRef('http://example.org/Sistani'),)
(rdflib.term.URIRef('http://example.org/Stephen_P_Newton'),)
(rdflib.term.URIRef('http://example.org/Universal_Shepherd'),)
(rdflib.term.URIRef('http://example.org/Willem_Marx'),)
(rdflib.term.URIRef('http://example.org/William_Lorirecalled'),)
(rdflib.term.URIRef('http://example.org/withLee_Yong'),)
(rdflib.term.URIRef('http://example.org/Andreas_Solaro'),)
(rdflib.term.URIRef('http://example.org/Giovanni_Battistaannounced'),)
(rdflib.term.URIRef('http://example.org/Here'),)
(rdflib.term.URIRef('http://example.org/Joe_Raedle'),)
(rdflib.term.URIRef('http://example.org/John_Paul'),)
(rdflib.term.URIRef('http://example.org/John_Paul_II'),)
(rdflib.term.URIRef('http://example.org/Joseph_Ratzinger'),)
(rdflib.term.URIRef('http://example.org/Papal_Basilicas'),)
(rdflib.term.URIRef('http://example.org/Peter'),)
(rdflib.term.URIRef('http://example.org/Pope_Benedict'),)
(rdflib.term.URIRef('http://example.org/Pope_John_Paul'),)
(rdflib.term.URIRef('http://example.org/Pope_John_Paul_II'),)
(rdflib.term.URIRef('http://example.org/Ratzinger'),)
(rdflib.term.URIRef('http://example.org/Reese'),)
(rdflib.term.URIRef('http://example.org/Square'),)
(rdflib.term.URIRef('http://example.org/St_Peter'),)
(rdflib.term.URIRef('http://example.org/Thomas_Reese'),)
(rdflib.term.URIRef('http://example.org/Vandeville_Gamma'),)
(rdflib.term.URIRef('http://example.org/Jade_Gao'),)
(rdflib.term.URIRef('http://example.org/Trump'),)
(rdflib.term.URIRef('http://example.org/Accordingto_Britannica'),)
(rdflib.term.URIRef('http://example.org/Bishop'),)
(rdflib.term.URIRef('http://example.org/Camerlengo'),)
(rdflib.term.URIRef('http://example.org/Eugenio_Pacelli'),)
(rdflib.term.URIRef('http://example.org/Farrell'),)
(rdflib.term.URIRef('http://example.org/Gioacchino_Pecci'),)
(rdflib.term.URIRef('http://example.org/Mary_McAleese'),)
(rdflib.term.URIRef('http://example.org/McCarrick'),)
(rdflib.term.URIRef('http://example.org/Notably'),)
(rdflib.term.URIRef('http://example.org/Pope'),)
(rdflib.term.URIRef('http://example.org/Pope_Francisin'),)
(rdflib.term.URIRef('http://example.org/Pope_Leo_XIII'),)
(rdflib.term.URIRef('http://example.org/Pope_Pius'),)
(rdflib.term.URIRef('http://example.org/Alessandra_Tarantino'),)
(rdflib.term.URIRef('http://example.org/Donald_Trump'),)
(rdflib.term.URIRef('http://example.org/God_Bless'),)
(rdflib.term.URIRef('http://example.org/Trumpmet'),)
(rdflib.term.URIRef('http://example.org/Vance'),)
(rdflib.term.URIRef('http://example.org/Clarin'),)
(rdflib.term.URIRef('http://example.org/Javier_Milei'),)
(rdflib.term.URIRef('http://example.org/Jorge_Bergoglio'),)
(rdflib.term.URIRef('http://example.org/Milei'),)
(rdflib.term.URIRef('http://example.org/Ahmedalso'),)
(rdflib.term.URIRef('http://example.org/Kenyan_PresidentWilliam_Rutoposted'),)
(rdflib.term.URIRef('http://example.org/PresidentCyril_Ramaphosasaid'),)
(rdflib.term.URIRef('http://example.org/Summit'),)
(rdflib.term.URIRef('http://example.org/Arthur'),)
(rdflib.term.URIRef('http://example.org/Black_Knight'),)
(rdflib.term.URIRef('http://example.org/Dennis'),)
(rdflib.term.URIRef('http://example.org/Graham_Chapman'),)
(rdflib.term.URIRef('http://example.org/John'),)
(rdflib.term.URIRef('http://example.org/Jorge_Mario_Bergoglio'),)
(rdflib.term.URIRef('http://example.org/Kilmar_Abrego_Garcia'),)
(rdflib.term.URIRef('http://example.org/Lisa_Maree_Williams'),)
(rdflib.term.URIRef('http://example.org/Patsy'),)
(rdflib.term.URIRef('http://example.org/Terry_Gilliam'),)
(rdflib.term.URIRef('http://example.org/Andrew_Medichini_Andrew'),)
(rdflib.term.URIRef('http://example.org/Bry_Jensen'),)
(rdflib.term.URIRef('http://example.org/Canon'),)
(rdflib.term.URIRef('http://example.org/Canon_Law'),)
(rdflib.term.URIRef('http://example.org/Conclaves'),)
(rdflib.term.URIRef('http://example.org/Gregg_Gassman'),)
(rdflib.term.URIRef('http://example.org/J_Mitchell'),)
(rdflib.term.URIRef('http://example.org/Jensen'),)
(rdflib.term.URIRef('http://example.org/Jorge'),)
(rdflib.term.URIRef('http://example.org/Kurt_Martens'),)
(rdflib.term.URIRef('http://example.org/Martens'),)
(rdflib.term.URIRef('http://example.org/Pope_Gregory_X_The'),)
(rdflib.term.URIRef('http://example.org/Sanctae_Marthae'),)
(rdflib.term.URIRef('http://example.org/Typically'),)

Triples with example.org URIs:
(rdflib.term.URIRef('http://example.org/Francis'), rdflib.term.URIRef('http://example.org/die'), rdflib.term.URIRef('http://example.org/Easter_Sunday'))
(rdflib.term.URIRef('http://example.org/Francis'), rdflib.term.URIRef('http://example.org/meet'), rdflib.term.URIRef('http://example.org/the_G7_Leaders_Summit'))
(rdflib.term.URIRef('http://example.org/Francis'), rdflib.term.URIRef('http://example.org/preside'), rdflib.term.URIRef('http://example.org/Vatican_City'))
(rdflib.term.URIRef('http://example.org/Francis'), rdflib.term.URIRef('http://example.org/drive'), rdflib.term.URIRef('http://example.org/Vatican_City'))
(rdflib.term.URIRef('http://example.org/Francis'), rdflib.term.URIRef('http://example.org/choose'), rdflib.term.URIRef('http://example.org/March_2013'))
(rdflib.term.URIRef('http://example.org/Christians'), rdflib.term.URIRef('http://example.org/move'), rdflib.term.URIRef('http://example.org/Gaza'))
(rdflib.term.URIRef('http://example.org/John_Paul_II'), rdflib.term.URIRef('http://example.org/appoint'), rdflib.term.URIRef('http://example.org/2001'))
(rdflib.term.URIRef('http://example.org/the_People_of_God'), rdflib.term.URIRef('http://example.org/attend'), rdflib.term.URIRef('http://example.org/Monday'))
(rdflib.term.URIRef('http://example.org/Martin_Pendergast'), rdflib.term.URIRef('http://example.org/tell'), rdflib.term.URIRef('http://example.org/London'))
(rdflib.term.URIRef('http://example.org/Francis'), rdflib.term.URIRef('http://example.org/meet'), rdflib.term.URIRef('http://example.org/withLee_Yong_soo'))
(rdflib.term.URIRef('http://example.org/John_Paul_II'), rdflib.term.URIRef('http://example.org/die'), rdflib.term.URIRef('http://example.org/Easter'))
(rdflib.term.URIRef('http://example.org/Francis'), rdflib.term.URIRef('http://example.org/write'), rdflib.term.URIRef('http://example.org/Santa_Maria_Maggiore'))
(rdflib.term.URIRef('http://example.org/Kevin_Farrell'), rdflib.term.URIRef('http://example.org/announce'), rdflib.term.URIRef('http://example.org/Monday'))
(rdflib.term.URIRef('http://example.org/the_College_of_Cardinals'), rdflib.term.URIRef('http://example.org/lock'), rdflib.term.URIRef('http://example.org/Vatican'))
(rdflib.term.URIRef('http://example.org/Francis'), rdflib.term.URIRef('http://example.org/meet'), rdflib.term.URIRef('http://example.org/a_Canonization_Ceremony'))
(rdflib.term.URIRef('http://example.org/Donald_Trump'), rdflib.term.URIRef('http://example.org/praise'), rdflib.term.URIRef('http://example.org/2013'))
(rdflib.term.URIRef('http://example.org/Arthur'), rdflib.term.URIRef('http://example.org/encounter'), rdflib.term.URIRef('http://example.org/The_Black_Knight'))
(rdflib.term.URIRef('http://example.org/115'), rdflib.term.URIRef('http://example.org/vote'), rdflib.term.URIRef('http://example.org/April_2005at_age_84'))
(rdflib.term.URIRef('http://example.org/Donald_Trump'), rdflib.term.URIRef('http://example.org/clash'), rdflib.term.URIRef('http://example.org/recent_years'))
(rdflib.term.URIRef('http://example.org/Bianca_Lott'), rdflib.term.URIRef('http://example.org/study'), rdflib.term.URIRef('http://example.org/her_spring_semester'))
(rdflib.term.URIRef('http://example.org/Joseph_Ratzinger'), rdflib.term.URIRef('http://example.org/announce'), rdflib.term.URIRef('http://example.org/April_8'))
(rdflib.term.URIRef('http://example.org/Francis'), rdflib.term.URIRef('http://example.org/bear'), rdflib.term.URIRef('http://example.org/Jorge_Bergoglio'))
(rdflib.term.URIRef('http://example.org/Francis'), rdflib.term.URIRef('http://example.org/die'), rdflib.term.URIRef('http://example.org/the_age_of_88_He'))
(rdflib.term.URIRef('http://example.org/Kevin_Farrell'), rdflib.term.URIRef('http://example.org/bear'), rdflib.term.URIRef('http://example.org/Dublin'))
(rdflib.term.URIRef('http://example.org/Kevin_Farrell'), rdflib.term.URIRef('http://example.org/criticize'), rdflib.term.URIRef('http://example.org/2018'))
(rdflib.term.URIRef('http://example.org/Francis'), rdflib.term.URIRef('http://example.org/write'), rdflib.term.URIRef('http://example.org/his2025'))
(rdflib.term.URIRef('http://example.org/Bry_Jensen'), rdflib.term.URIRef('http://example.org/tell'), rdflib.term.URIRef('http://example.org/NPR'))
(rdflib.term.URIRef('http://example.org/Stephen_P_Newton'), rdflib.term.URIRef('http://example.org/reflect'), rdflib.term.URIRef('http://example.org/NPR'))
(rdflib.term.URIRef('http://example.org/Francis'), rdflib.term.URIRef('http://example.org/die'), rdflib.term.URIRef('http://example.org/88'))
(rdflib.term.URIRef('http://example.org/Giovanni_Battistaannounced'), rdflib.term.URIRef('http://example.org/battistaannounce'), rdflib.term.URIRef('http://example.org/tens_of_thousands'))

Articles mentioning 'Trump':
(rdflib.term.URIRef('https://www.npr.org/article/article_4'), rdflib.term.Literal('People walk past a screen showing the CSI 300 Index at a shopping mall in Guangzhou, in southern China\'s Guangdong province.Jade Gao/AFP via Getty Imageshide caption\nAs the Trump administration negotiates trade deals with other countries, China has issued a warning against any agreements that harm its interests.\nChina\'s commerce ministry says it respects the efforts of others to try to resolve trade disputes with the U.S. through consultation. But it warns that it will take "corresponding countermeasures" if any deals are struck at the expense of China\'s interests. It did not give details.\nThe comments come after reports that Trump is hoping touse tariff negotiations with other countriesto isolate China. At the same time, Trump has said he wants to do a deal with Beijing. This month he raised the base tariff on Chinese imports to a dizzying 145%. China responded in kind, with high tariffs on U.S. goods.\nThe Chinese commerce ministry says seeking tariff exemptions by harming the interests of others will only lead to failure on both sides and ultimately hurt everyone.', datatype=rdflib.term.URIRef('http://www.w3.org/2001/XMLSchema#string')))
(rdflib.term.URIRef('https://www.npr.org/article/article_6'), rdflib.term.Literal('Pope Francis exchanges gifts with US President Donald Trump (C) and US First Lady Melania Trump during a private audience at the Vatican on May 24, 2017. US President Donald Trump met Pope Francis at the Vatican today in a keenly-anticipated first face-to-face encounter between two world leaders who have clashed repeatedly on several issues.Alessandra Tarantino/AFP via Getty Imageshide caption\nPresident Trump has acknowledged the pope\'s death in aone-line poston Truth Social, writing: "Rest in Peace Pope Francis! May God Bless him and all who loved him!"\nTrump and Francis clashed repeatedly in recent years.\nTrump praised the pope at the start of Francis\'s papacy, in 2013, several years before Trump reached the White House.\n"The new Pope is a humble man, very much like me, which probably explains why I like him so much!"Trump tweetedin December of that year, several months after Francis became pope.\nThings soured soon after. During the 2016 election, Francisroundly criticizedTrump\'s campaign proposal to build a wall on the U.S.-Mexico border.\n"A person who thinks only about building walls, wherever they may be, and not building bridges, is not Christian," Francis said at the time.\nTrump — who aggressively courted evangelical Christian leaders and voters during his campaign — fired back immediately, saying, "for a religious leader to question a person\'s faith is disgraceful."\n"If and when the Vatican is attacked by ISIS, which as everyone knows is ISIS\'s ultimate trophy, I can promise you that the Pope would have only wished and prayed that Donald Trump would have been President because this would not have happened," he added.\nTrumpmet the Popeduring a 2017 trip to the Vatican, later telling reporters: "He is something. We had a fantastic meeting." 
A photo from the visit, in which Trump is smiling next to a glum-looking Francis, quickly went viral.\nLook at their faces.pic.twitter.com/0t84cBX8bZ\nNearly a decade later, amidst the second Trump administration\'s crackdown on immigration, the pope once again made a rare public rebuke of the president\'s policies.\nIn apublic letterto U.S. Catholic bishops, February, Francis described the program of mass deportations as a "major crisis."\nHe said while nations have the right to defend themselves, "the rightly formed conscience cannot fail to make a critical judgment and express its disagreement with any measure that tacitly or explicitly identifies the illegal status of some migrants with criminality."\n"The act of deporting people who in many cases have left their own land for reasons of extreme poverty, insecurity, exploitation, persecution or serious deterioration of the environment, damages the dignity of many men and women, and of entire families, and places them in a state of particular vulnerability and defenselessness," Francis wrote.\nThe letter also appeared to respond to widely-criticized comments that Vice President Vance, who is Catholic, had made weeks earlier. Vance said people should care for their family, communities and country before caring for others — and Francis disagreed.\n"Christian love is not a concentric expansion of interests that little by little extend to other persons and groups," the pope wrote.', datatype=rdflib.term.URIRef('http://www.w3.org/2001/XMLSchema#string')))
(rdflib.term.URIRef('https://www.npr.org/article/article_9'), rdflib.term.Literal('Good morning. You\'re reading the Up First newsletter.Subscribehere to get it delivered to your inbox, andlistento the Up First podcast for all the news you need to start your day.\nPope Francis died on Easter Monday at the age of 88.He was the first non-European head of the Roman Catholic Church in over a millennium and was one of themost popular popes in decades. He was elected to his exalted post in 2013 and cast an image of humility during years of strain and change, within his church and worldwide.\nPope Francis waves to thousands of followers as he arrives at the Philippines\' Manila Cathedral on Jan. 16, 2015. During his papacy, Francis strove to reach out to what he called the "periphery" of the world in Asia, Africa and Latin America.Lisa Maree Williams/Getty Imageshide caption\nA big draw to Pope Francis was his personal story.He was the son of immigrants and grew up in Argentina, where he lived through turbulent times. Francis, born Jorge Mario Bergoglio, was the first pope from Latin America.\nFour House Democrats were scheduled to land in El Salvador today to demand the release and return of Kilmar Abrego Garcia, a Salvadorian citizen who lived in Maryland. He was deported to a Salvadoran prison due to an "administrative error," according to the Trump administration. The lawmakerssaid in a statementthey hope "to pressure" the White House "to abide by a Supreme Court order."\nThe State Department seal is seen on the briefing room lectern at the State Department in Washington, D.C., on Jan. 31, 2022.Mandel Ngan/APhide caption\nThe State Department seal is seen on the briefing room lectern at the State Department in Washington, D.C., on Jan. 
31, 2022.\nNPR has learned that the Trump administration is substantiallyscaling back the State Department\'s annual reportson international human rights to remove critiques of abuses such as harsh prison conditions and government corruption. These reports are intended to guide congressional foreign aid and security assistance decisions. Moving forward, the State Department will no longer call out governments for restricting freedom of movement and peaceful assembly. Additionally, the reports will not condemn the detention of political prisoners without due process or the limitations placed on free and fair elections.\nnasal sprayDDurrich/iStockphoto/Getty Imageshide caption\nLiving Better is aspecial seriesabout what it takes to stay healthy in America.\nIt is not just in your head; seasonal allergies are getting worse every year. This is due to the warming from climate change, making the pollen season longer. Luckily, there are ways to keep the pollen from taking over your life. Here aretips from doctorson how to get relief:\nKing Arthur (Graham Chapman) and his servant Patsy (Terry Gilliam) encounter The Black Knight (John Cleese)Monty Python and the Holy Grail/Fathom Entertainmenthide caption\nThis newsletter was edited byYvonne Dennis.', datatype=rdflib.term.URIRef('http://www.w3.org/2001/XMLSchema#string')))


query_titles_dates = prepareQuery("""
  SELECT ?article ?title ?date WHERE {
    ?article a <http://example.org/Article> ;
             <http://example.org/title> ?title ;
             <http://example.org/date> ?date .
  }
""")

print("\nArticles with their titles and dates:")
for result in g.query(query_titles_dates):
    print(result)

Articles with their titles and dates:
(rdflib.term.URIRef('https://www.npr.org/article/article_1'), rdflib.term.Literal('Do you have memories of Pope Francis to share? Send them our way', datatype=rdflib.term.URIRef('http://www.w3.org/2001/XMLSchema#string')), rdflib.term.Literal('2025-04-21T14:25:05-04:00', datatype=rdflib.term.URIRef('http://www.w3.org/2001/XMLSchema#string')))
(rdflib.term.URIRef('https://www.npr.org/article/article_2'), rdflib.term.Literal('Pope Francis is remembered around the world for his generosity of spirit', datatype=rdflib.term.URIRef('http://www.w3.org/2001/XMLSchema#string')), rdflib.term.Literal('2025-04-21T14:05:36-04:00', datatype=rdflib.term.URIRef('http://www.w3.org/2001/XMLSchema#string')))
(rdflib.term.URIRef('https://www.npr.org/article/article_3'), rdflib.term.Literal('What happens next after a pope dies, according to recent history', datatype=rdflib.term.URIRef('http://www.w3.org/2001/XMLSchema#string')), rdflib.term.Literal('2025-04-21T13:14:58-04:00', datatype=rdflib.term.URIRef('http://www.w3.org/2001/XMLSchema#string')))
(rdflib.term.URIRef('https://www.npr.org/article/article_4'), rdflib.term.Literal("China warns of 'countermeasures' against any deals that harm its interests", datatype=rdflib.term.URIRef('http://www.w3.org/2001/XMLSchema#string')), rdflib.term.Literal('2025-04-21T11:08:30-04:00', datatype=rdflib.term.URIRef('http://www.w3.org/2001/XMLSchema#string')))
(rdflib.term.URIRef('https://www.npr.org/article/article_5'), rdflib.term.Literal('Who is Cardinal Kevin Farrell, the acting head of the Vatican?', datatype=rdflib.term.URIRef('http://www.w3.org/2001/XMLSchema#string')), rdflib.term.Literal('2025-04-21T10:34:12-04:00', datatype=rdflib.term.URIRef('http://www.w3.org/2001/XMLSchema#string')))
(rdflib.term.URIRef('https://www.npr.org/article/article_6'), rdflib.term.Literal("A brief history of Trump's feud with Pope Francis", datatype=rdflib.term.URIRef('http://www.w3.org/2001/XMLSchema#string')), rdflib.term.Literal('2025-04-21T09:13:28-04:00', datatype=rdflib.term.URIRef('http://www.w3.org/2001/XMLSchema#string')))
(rdflib.term.URIRef('https://www.npr.org/article/article_7'), rdflib.term.Literal("Argentina's president, a former critic of Pope Francis, offers his condolences", datatype=rdflib.term.URIRef('http://www.w3.org/2001/XMLSchema#string')), rdflib.term.Literal('2025-04-21T08:32:37-04:00', datatype=rdflib.term.URIRef('http://www.w3.org/2001/XMLSchema#string')))
(rdflib.term.URIRef('https://www.npr.org/article/article_8'), rdflib.term.Literal('Leaders in Africa mourn the passing of Pope Francis', datatype=rdflib.term.URIRef('http://www.w3.org/2001/XMLSchema#string')), rdflib.term.Literal('2025-04-21T08:07:03-04:00', datatype=rdflib.term.URIRef('http://www.w3.org/2001/XMLSchema#string')))
(rdflib.term.URIRef('https://www.npr.org/article/article_9'), rdflib.term.Literal("Pope Francis dies at 88. And, House Democrats press for Abrego Garcia's return", datatype=rdflib.term.URIRef('http://www.w3.org/2001/XMLSchema#string')), rdflib.term.Literal('2025-04-21T07:19:41-04:00', datatype=rdflib.term.URIRef('http://www.w3.org/2001/XMLSchema#string')))
(rdflib.term.URIRef('https://www.npr.org/article/article_10'), rdflib.term.Literal("Who will be the next pope? Here's how the conclave works", datatype=rdflib.term.URIRef('http://www.w3.org/2001/XMLSchema#string')), rdflib.term.Literal('2025-04-21T06:38:53-04:00', datatype=rdflib.term.URIRef('http://www.w3.org/2001/XMLSchema#string')))


query_types = prepareQuery("""
  SELECT DISTINCT ?type WHERE {
    ?entity a ?type .
  }
""")

print("\nAll types found in the graph:")
for result in g.query(query_types):
    print(result)

All types found in the graph:
(rdflib.term.URIRef('http://example.org/Article'),)
(rdflib.term.URIRef('http://example.org/ORG'),)
(rdflib.term.URIRef('http://example.org/PER'),)
(rdflib.term.URIRef('http://example.org/LOC'),)
(rdflib.term.URIRef('http://example.org/MISC'),)


import requests

def spotlight_link(text, confidence=0.5, support=20):
    url = "https://api.dbpedia-spotlight.org/en/annotate"
    headers = {"Accept": "application/json"}
    params = {
        "text": text,
        "confidence": confidence,
        "support": support
    }

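    try:
        # verify=False skips TLS certificate verification; this is insecure and
        # is what triggers urllib3's InsecureRequestWarning
        response = requests.get(url, headers=headers, params=params, verify=False, timeout=30)
        if response.status_code == 200:
            data = response.json()
            # "Resources" is absent when Spotlight finds no entities
            return [(res["@surfaceForm"], res["@URI"]) for res in data.get("Resources", [])]
        print("HTTP error:", response.status_code)
        return []
    except requests.exceptions.RequestException as e:
        # Covers SSL errors as well as timeouts and connection failures
        print("Request error:", e)
        return []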


text = "Pope Francis met with Elon Musk and Donald Trump in Vatican City."

linked_entities = spotlight_link(text)
print("Linked Entities:")
for label, uri in linked_entities:
    print(f"{label} -> {uri}")

Linked Entities:
Pope Francis -> http://dbpedia.org/resource/Pope_Francis
Elon Musk -> http://dbpedia.org/resource/Elon_Musk
Donald Trump -> http://dbpedia.org/resource/Donald_Trump
Vatican City -> http://dbpedia.org/resource/Vatican_City

C:\Users\Nejjari\anaconda3\Anaconda\Lib\site-packages\urllib3\connectionpool.py:1056: InsecureRequestWarning: Unverified HTTPS request is being made to host 'api.dbpedia-spotlight.org'. Adding certificate verification is strongly advised. See: https://urllib3.readthedocs.io/en/1.26.x/advanced-usage.html#ssl-warnings
  warnings.warn(
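
Each entry under `Resources` carries more than the surface form and URI. The sketch below parses a response dictionary offline and keeps only entities above a similarity threshold. The sample payload is hand-written for this illustration (it is not a recorded API response), `parse_spotlight` and `min_similarity` are hypothetical names, and the `@similarityScore` field is assumed to arrive as a string.

```python
def parse_spotlight(data, min_similarity=0.0):
    """Return (surface form, URI) pairs whose similarity score passes the threshold."""
    pairs = []
    for res in data.get("Resources", []):
        # Scores are assumed to be serialized as strings, so convert before comparing.
        score = float(res.get("@similarityScore", 0.0))
        if score >= min_similarity:
            pairs.append((res["@surfaceForm"], res["@URI"]))
    return pairs


# Hand-written sample payload, for illustration only.
sample = {
    "Resources": [
        {"@surfaceForm": "Pope Francis",
         "@URI": "http://dbpedia.org/resource/Pope_Francis",
         "@similarityScore": "0.99"},
        {"@surfaceForm": "Vatican City",
         "@URI": "http://dbpedia.org/resource/Vatican_City",
         "@similarityScore": "0.72"},
    ]
}

high_confidence = parse_spotlight(sample, min_similarity=0.9)
print(high_confidence)
```

Keeping the parsing separate from the HTTP call makes the filtering logic easy to test without network access.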


import random

import matplotlib.pyplot as plt
import networkx as nx

def clean_label(uri):
    """Clean URI to extract a readable label."""
    if isinstance(uri, str):
        if 'rdf-syntax-ns#type' in uri:
            return 'type'
        return uri.split('/')[-1]  # Get last part of URI
    return str(uri)

def get_node_type(triples, node):
    """Return the rdf:type of a node if available."""
    for s, p, o in triples:
        if str(s) == node and 'rdf-syntax-ns#type' in str(p):
            return o.split('/')[-1]
    return None

def visualize_clean_rdf_graph(g, max_edges=50):
    G = nx.DiGraph()
    triples = list(g)

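    # Drop triples whose object is longer than 80 characters (mostly long literals;
    # URIRef is also a str subclass, so very long URIs are dropped too)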
    filtered_triples = [
        (s, p, o) for s, p, o in triples
        if not (isinstance(o, str) and len(o) > 80)
    ]

    sampled_triples = random.sample(filtered_triples, min(len(filtered_triples), max_edges))

    node_types = {}
    for s, p, o in triples:
        if 'rdf-syntax-ns#type' in str(p):
            node_types[str(s)] = o.split('/')[-1]

    for s, p, o in sampled_triples:
        G.add_node(str(s))
        G.add_node(str(o))
        G.add_edge(str(s), str(o), label=clean_label(p))

    edge_labels = nx.get_edge_attributes(G, 'label')
    pos = nx.spring_layout(G, k=0.5, iterations=25)

    # Assign node colors based on type
    node_colors = []
    color_map = {
        "PER": "lightcoral",
        "LOC": "skyblue",
        "ORG": "orange",
        "Article": "yellowgreen",
        "MISC": "plum"
    }
    for node in G.nodes():
        node_type = node_types.get(node, None)
        color = color_map.get(node_type, 'lightgray')
        node_colors.append(color)

    # Clean node labels for display
    labels = {node: clean_label(node) for node in G.nodes()}

    plt.figure(figsize=(16, 12))
    nx.draw(G, pos, labels=labels, node_color=node_colors, node_size=2500, font_size=9, arrows=True)
    nx.draw_networkx_edge_labels(G, pos, edge_labels=edge_labels, font_color='green', font_size=8)
    plt.title("RDF Graph View (Grouped by Type, Clean Labels)")
    plt.axis("off")
    plt.show()


visualize_clean_rdf_graph(g, max_edges=60)
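
`clean_label` splits on `/` only, so a hash-namespace URI such as `http://www.w3.org/2001/XMLSchema#string` would be displayed as `XMLSchema#string`. A possible alternative, sketched here under the assumption that the label should be whatever follows the last `#` or `/`, is shown below; `local_name` is a hypothetical helper, not part of the notebook.

```python
def local_name(uri):
    """Return the part of a URI after the last '#' or '/' (hypothetical helper)."""
    text = str(uri).rstrip("/#")
    # rpartition yields the text after the last separator,
    # or the whole string when the separator is absent.
    after_hash = text.rpartition("#")[2]
    return after_hash.rpartition("/")[2]


print(local_name("http://www.w3.org/1999/02/22-rdf-syntax-ns#type"))
print(local_name("http://example.org/Vatican_Media"))
print(local_name("http://www.w3.org/2001/XMLSchema#string"))
```

With this helper the special case for `rdf-syntax-ns#type` would no longer be needed, since `type` falls out of the generic rule.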


!pip install pykeen torch


import numpy as np
from pykeen.triples import TriplesFactory

triples = [(str(s), str(p), str(o)) for s, p, o in g]
triples_array = np.array(triples, dtype=str)

# Create PyKEEN TriplesFactory
tf = TriplesFactory.from_labeled_triples(triples_array)

# printing to confirm
print(f"Total triples: {len(triples_array)}")
print("Sample triples:", triples_array[:5])

Total triples: 367
Sample triples: [['http://example.org/John_Paul_II'
  'http://www.w3.org/1999/02/22-rdf-syntax-ns#type'
  'http://example.org/PER']
 ['https://www.npr.org/article/article_6' 'http://example.org/content'
  'Pope Francis exchanges gifts with US President Donald Trump (C) and US First Lady Melania Trump during a private audience at the Vatican on May 24, 2017. US President Donald Trump met Pope Francis at the Vatican today in a keenly-anticipated first face-to-face encounter between two world leaders who have clashed repeatedly on several issues.Alessandra Tarantino/AFP via Getty Imageshide caption\nPresident Trump has acknowledged the pope\'s death in aone-line poston Truth Social, writing: "Rest in Peace Pope Francis! May God Bless him and all who loved him!"\nTrump and Francis clashed repeatedly in recent years.\nTrump praised the pope at the start of Francis\'s papacy, in 2013, several years before Trump reached the White House.\n"The new Pope is a humble man, very much like me, which probably explains why I like him so much!"Trump tweetedin December of that year, several months after Francis became pope.\nThings soured soon after. During the 2016 election, Francisroundly criticizedTrump\'s campaign proposal to build a wall on the U.S.-Mexico border.\n"A person who thinks only about building walls, wherever they may be, and not building bridges, is not Christian," Francis said at the time.\nTrump — who aggressively courted evangelical Christian leaders and voters during his campaign — fired back immediately, saying, "for a religious leader to question a person\'s faith is disgraceful."\n"If and when the Vatican is attacked by ISIS, which as everyone knows is ISIS\'s ultimate trophy, I can promise you that the Pope would have only wished and prayed that Donald Trump would have been President because this would not have happened," he added.\nTrumpmet the Popeduring a 2017 trip to the Vatican, later telling reporters: "He is something. We had a fantastic meeting." 
A photo from the visit, in which Trump is smiling next to a glum-looking Francis, quickly went viral.\nLook at their faces.pic.twitter.com/0t84cBX8bZ\nNearly a decade later, amidst the second Trump administration\'s crackdown on immigration, the pope once again made a rare public rebuke of the president\'s policies.\nIn apublic letterto U.S. Catholic bishops, February, Francis described the program of mass deportations as a "major crisis."\nHe said while nations have the right to defend themselves, "the rightly formed conscience cannot fail to make a critical judgment and express its disagreement with any measure that tacitly or explicitly identifies the illegal status of some migrants with criminality."\n"The act of deporting people who in many cases have left their own land for reasons of extreme poverty, insecurity, exploitation, persecution or serious deterioration of the environment, damages the dignity of many men and women, and of entire families, and places them in a state of particular vulnerability and defenselessness," Francis wrote.\nThe letter also appeared to respond to widely-criticized comments that Vice President Vance, who is Catholic, had made weeks earlier. Vance said people should care for their family, communities and country before caring for others — and Francis disagreed.\n"Christian love is not a concentric expansion of interests that little by little extend to other persons and groups," the pope wrote.']
 ['http://example.org/Stephen_P_Newton'
  'http://www.w3.org/1999/02/22-rdf-syntax-ns#type'
  'http://example.org/PER']
 ['http://example.org/Iraq'
  'http://www.w3.org/1999/02/22-rdf-syntax-ns#type'
  'http://example.org/LOC']
 ['http://example.org/Vatican_Media'
  'http://www.w3.org/1999/02/22-rdf-syntax-ns#type'
  'http://example.org/ORG']]
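
The sample shows that literal objects, including full article bodies, become entities once every term is converted with `str(...)`. If that is not intended, one possible pre-filter is sketched below. It works on plain string triples and relies on a heuristic that is an assumption of this example: objects starting with `http://` or `https://` are treated as URIs. With the rdflib graph at hand, checking `isinstance(o, URIRef)` before converting to strings would likely be the more robust test. `split_entity_and_literal_triples` is a hypothetical name, and the literal text in the sample is shortened.

```python
def split_entity_and_literal_triples(triples):
    """Separate triples whose object looks like a URI from those with literal objects."""
    entity_triples, literal_triples = [], []
    for s, p, o in triples:
        # Heuristic (an assumption): URIs in this graph start with http:// or https://.
        if o.startswith(("http://", "https://")):
            entity_triples.append((s, p, o))
        else:
            literal_triples.append((s, p, o))
    return entity_triples, literal_triples


sample_triples = [
    ("http://example.org/Iraq",
     "http://www.w3.org/1999/02/22-rdf-syntax-ns#type",
     "http://example.org/LOC"),
    ("https://www.npr.org/article/article_6",
     "http://example.org/content",
     "Pope Francis exchanges gifts (article text shortened)"),
]

entity_triples, literal_triples = split_entity_and_literal_triples(sample_triples)
print(len(entity_triples), "entity triples,", len(literal_triples), "literal triples")
```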


from pykeen.triples import TriplesFactory
from sklearn.model_selection import train_test_split

# converting RDF triples to list of strings
triples_list = [(str(s), str(p), str(o)) for s, p, o in g]
triples_array = np.array(triples_list)

# First split: 80% train, 20% temp (val + test)
train_triples, temp_triples = train_test_split(
    triples_array, test_size=0.2, random_state=42
)

# Second split: 50/50 of remaining → 10% val, 10% test
val_triples, test_triples = train_test_split(
    temp_triples, test_size=0.5, random_state=42
)

# Create TriplesFactories
training = TriplesFactory.from_labeled_triples(train_triples)
validation = TriplesFactory.from_labeled_triples(val_triples)
testing = TriplesFactory.from_labeled_triples(test_triples)

# Check the counts
print(f"Training: {training.num_triples}")
print(f"Validation: {validation.num_triples}")
print(f"Testing: {testing.num_triples}")

Training: 293
Validation: 37
Testing: 37
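
One point worth checking with a manual split: `TriplesFactory.from_labeled_triples` appears to derive its label-to-ID mappings from the triples it receives unless `entity_to_id` and `relation_to_id` are passed in. If that holds for the three factories above, the same entity can receive a different integer ID in each split. The toy sketch below illustrates the effect with plain dictionaries; the triples and the `build_entity_index` helper are invented for the example and do not reproduce PyKEEN's internals.

```python
def build_entity_index(triples):
    """Assign consecutive integer IDs to the sorted entity labels of a triple list."""
    entities = sorted({s for s, _, _ in triples} | {o for _, _, o in triples})
    return {label: idx for idx, label in enumerate(entities)}


# Toy triples, invented for illustration.
train = [("a", "r", "b"), ("b", "r", "c"), ("c", "r", "d")]
test = [("b", "r", "d")]

train_index = build_entity_index(train)
test_index = build_entity_index(test)  # built independently of the training index

print(train_index)  # IDs as a model trained on `train` would know them
print(test_index)   # IDs as an independently indexed test set would use them

# The same label can end up with different IDs in the two indexes.
mismatched = {e for e in test_index if test_index[e] != train_index.get(e)}
print("Labels with inconsistent IDs:", sorted(mismatched))

# Reusing the training index keeps the IDs consistent for the test triples.
test_ids = [(train_index[s], train_index[o]) for s, _, o in test]
print(test_ids)
```

In PyKEEN terms, passing `entity_to_id=training.entity_to_id` and `relation_to_id=training.relation_to_id` when building the validation and testing factories, or splitting a single factory with `TriplesFactory.split`, should keep the mappings aligned.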


from pykeen.triples import TriplesFactory
from pykeen.models import TransE  
from pykeen.training import SLCWATrainingLoop
from pykeen.evaluation import RankBasedEvaluator
from sklearn.model_selection import train_test_split


# Assume you already have your RDFLib graph `g`
# Convert RDF graph to PyKEEN triples
triples = [(str(s), str(p), str(o)) for s, p, o in g]
triples_array = np.array(triples, dtype=str)

# Build TriplesFactory
tf = TriplesFactory.from_labeled_triples(triples_array)

# Manual split (80/10/10), same procedure as in the previous cell
train_triples, temp = train_test_split(triples_array, test_size=0.2, random_state=42)
valid_triples, test_triples = train_test_split(temp, test_size=0.5, random_state=42)

training = TriplesFactory.from_labeled_triples(train_triples)
validation = TriplesFactory.from_labeled_triples(valid_triples)
testing = TriplesFactory.from_labeled_triples(test_triples)


model = TransE(triples_factory=training)

No random seed is specified. This may lead to non-reproducible results.


# Instantiate training loop with triples_factory
training_loop = SLCWATrainingLoop(
    model=model,
    triples_factory=training  # required at init
)

# Run training (triples_factory required again here)
training_loop_result = training_loop.train(
    triples_factory=training,  # also needed here
    num_epochs=100,
    batch_size=128
)

Training epochs on cpu:   0%|          | 0/100 [00:00<?, ?epoch/s]

Training batches on cpu:   0%|          | 0/3 [00:00<?, ?batch/s]


from pykeen.evaluation import RankBasedEvaluator

evaluator = RankBasedEvaluator()
results = evaluator.evaluate(
    model=model,
    mapped_triples=testing.mapped_triples,
    additional_filter_triples=[
        training.mapped_triples,
        validation.mapped_triples
    ]
)

# View key metrics (in a notebook only the last expression is displayed)
results.get_metric("mean_reciprocal_rank")
results.get_metric("hits_at_k")  # a single value: Hits@k at the default k (10 in PyKEEN)

Evaluating on cpu:   0%|          | 0.00/37.0 [00:00<?, ?triple/s]

0.02702702702702703
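
For reference, both metrics are simple functions of the rank each true entity receives among all candidates. The sketch below computes them from a list of ranks; the ranks are invented for illustration and are unrelated to the model above.

```python
def mean_reciprocal_rank(ranks):
    """Average of 1/rank over all evaluated triples (higher is better, max 1.0)."""
    return sum(1.0 / r for r in ranks) / len(ranks)


def hits_at_k(ranks, k):
    """Fraction of evaluated triples whose true entity is ranked in the top k."""
    return sum(1 for r in ranks if r <= k) / len(ranks)


ranks = [1, 3, 12]  # invented ranks for three test triples

mrr = mean_reciprocal_rank(ranks)
print("MRR:", round(mrr, 4))               # (1 + 1/3 + 1/12) / 3
print("Hits@1:", round(hits_at_k(ranks, 1), 4))
print("Hits@10:", round(hits_at_k(ranks, 10), 4))
```

Because MRR weights each triple by its reciprocal rank, a few well-ranked triples dominate the score, whereas Hits@k only counts how many fall inside the cutoff.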


from sklearn.manifold import TSNE

# Extract learned entity embeddings
entity_embeddings = model.entity_representations[0]().detach().cpu().numpy()

# Reduce to 2D
tsne = TSNE(n_components=2, random_state=42)
entity_2d = tsne.fit_transform(entity_embeddings)

# Plot
plt.figure(figsize=(8,6))
plt.scatter(entity_2d[:, 0], entity_2d[:, 1], alpha=0.6)
plt.title("t-SNE of Entity Embeddings")
plt.xlabel("Component 1")
plt.ylabel("Component 2")
plt.show()


from pykeen.pipeline import pipeline

models = ['TransE', 'DistMult', 'ComplEx']
model_results = {}

for model_name in models:
    print(f"Training {model_name}...")
    result = pipeline(
        training=training,
        validation=validation,
        testing=testing,
        model=model_name,
        model_kwargs=dict(embedding_dim=50),  # <- put it here
        training_kwargs=dict(batch_size=32),
        epochs=100,
        random_seed=42,
    )
    model_results[model_name] = result

No cuda devices were available. The model runs on CPU

Training TransE...

Training epochs on cpu:   0%|          | 0/100 [00:00<?, ?epoch/s]

Training batches on cpu:   0%|          | 0/10 [00:00<?, ?batch/s]


from pykeen.evaluation import RankBasedEvaluator
import pandas as pd

model_results_summary = {}

for model_name, result in model_results.items():
    print(f"=== Evaluation Results for {model_name} ===")

    evaluator = RankBasedEvaluator()
    metrics = evaluator.evaluate(
        model=result.model,
        mapped_triples=testing.mapped_triples,
        additional_filter_triples=[
            training.mapped_triples,
            validation.mapped_triples,
        ]
    )

    # Store the results as dictionary
    model_results_summary[model_name] = metrics.to_dict()

INFO:pykeen.evaluation.evaluator:Currently automatic memory optimization only supports GPUs, but you're using a CPU. Therefore, the batch_size will be set to the default value.
INFO:pykeen.evaluation.evaluator:No evaluation batch_size provided. Setting batch_size to '32'.

=== Evaluation Results for TransE ===

Evaluating on cpu:   0%|          | 0.00/37.0 [00:00<?, ?triple/s]

INFO:pykeen.evaluation.evaluator:Evaluation took 0.04s seconds
INFO:pykeen.evaluation.evaluator:Currently automatic memory optimization only supports GPUs, but you're using a CPU. Therefore, the batch_size will be set to the default value.
INFO:pykeen.evaluation.evaluator:No evaluation batch_size provided. Setting batch_size to '32'.

=== Evaluation Results for DistMult ===

Evaluating on cpu:   0%|          | 0.00/37.0 [00:00<?, ?triple/s]

INFO:pykeen.evaluation.evaluator:Evaluation took 0.03s seconds
INFO:pykeen.evaluation.evaluator:Currently automatic memory optimization only supports GPUs, but you're using a CPU. Therefore, the batch_size will be set to the default value.
INFO:pykeen.evaluation.evaluator:No evaluation batch_size provided. Setting batch_size to '32'.

=== Evaluation Results for ComplEx ===

Evaluating on cpu:   0%|          | 0.00/37.0 [00:00<?, ?triple/s]

INFO:pykeen.evaluation.evaluator:Evaluation took 0.03s seconds


# create DataFrame for comparison
summary_df = pd.DataFrame(model_results_summary)
summary_df = summary_df.round(4)  # no visible effect here: each cell holds a nested dict, not a number

                                                 TransE  \
head  {'optimistic': {'inverse_median_rank': 0.00561...   
tail  {'optimistic': {'inverse_median_rank': 0.00609...   
both  {'optimistic': {'inverse_median_rank': 0.00595...   

                                               DistMult  \
head  {'optimistic': {'inverse_median_rank': 0.00892...   
tail  {'optimistic': {'inverse_median_rank': 0.00847...   
both  {'optimistic': {'inverse_median_rank': 0.00877...   

                                                ComplEx  
head  {'optimistic': {'inverse_median_rank': 0.01219...  
tail  {'optimistic': {'inverse_median_rank': 0.01351...  
both  {'optimistic': {'inverse_median_rank': 0.01298...


def extract_core_metrics(metrics_dict):
    """Extracts core metrics for comparison from nested PyKEEN metric dicts."""
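    try:
        # "both" averages head and tail prediction; "optimistic" is the rank type
        # that decides how ties between equally scored candidates are counted
        both = metrics_dict['both']['optimistic']
        return {
            # Raises KeyError with the metric names in the dump below, where MRR
            # appears as 'inverse_harmonic_mean_rank'
            'MRR': both['mean_reciprocal_rank'],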
            'Hits@1': both['hits_at_1'],
            'Hits@3': both['hits_at_3'],
            'Hits@10': both['hits_at_10']
        }
    except Exception as e:
        print("Error extracting metrics:", e)
        return {}

# Apply extraction for all results
flat_results = {}
for name, d in model_results_summary.items():
    flat_results[name] = extract_core_metrics(d)

df_metrics = pd.DataFrame(flat_results).T  # .T to make models as rows

Error extracting metrics: 'mean_reciprocal_rank'
Error extracting metrics: 'mean_reciprocal_rank'
Error extracting metrics: 'mean_reciprocal_rank'
Empty DataFrame
Columns: []
Index: [TransE, DistMult, ComplEx]


import json

# print the full nested metrics for one model
print(json.dumps(model_results_summary['TransE'], indent=2))

{
  "head": {
    "optimistic": {
      "inverse_median_rank": 0.0056179775280898875,
      "adjusted_arithmetic_mean_rank": 1.1498720257844344,
      "inverse_harmonic_mean_rank": 0.010257537252455581,
      "geometric_mean_rank": 136.4341549450835,
      "median_absolute_deviation": 94.88654198435853,
      "z_arithmetic_mean_rank": -1.5843886702161658,
      "arithmetic_mean_rank": 163.9189189189189,
      "z_geometric_mean_rank": -1.7509208414395,
      "z_inverse_harmonic_mean_rank": -0.9751401979423034,
      "adjusted_inverse_harmonic_mean_rank": -0.011930175509493494,
      "adjusted_arithmetic_mean_rank_index": -0.15093078758949852,
      "count": 37.0,
      "median_rank": 178.0,
      "standard_deviation": 77.98592332382236,
      "inverse_geometric_mean_rank": 0.00732954296087012,
      "inverse_arithmetic_mean_rank": 0.006100577081615829,
      "variance": 6081.804236669101,
      "harmonic_mean_rank": 97.4892876709375,
      "adjusted_geometric_mean_rank_index": -0.2751952661870738,
      "hits_at_1": 0.0,
      "hits_at_3": 0.0,
      "hits_at_5": 0.0,
      "hits_at_10": 0.0,
      "z_hits_at_k": -1.1619626806801366,
      "adjusted_hits_at_k": -0.03649043999044609
    },
    "realistic": {
      "inverse_median_rank": 0.00561797758564353,
      "adjusted_arithmetic_mean_rank": 1.1498719968550781,
      "inverse_harmonic_mean_rank": 0.010257537476718426,
      "geometric_mean_rank": 136.43414306640625,
      "median_absolute_deviation": 94.88654198435853,
      "z_arithmetic_mean_rank": -1.5843883643862817,
      "arithmetic_mean_rank": 163.91891479492188,
      "z_geometric_mean_rank": -1.7509201298293657,
      "z_inverse_harmonic_mean_rank": -0.9751401792007366,
      "adjusted_inverse_harmonic_mean_rank": -0.0119301752802032,
      "adjusted_arithmetic_mean_rank_index": -0.15093075845577264,
      "count": 37.0,
      "median_rank": 178.0,
      "standard_deviation": 77.98592376708984,
      "inverse_geometric_mean_rank": 0.007329543586820364,
      "inverse_arithmetic_mean_rank": 0.006100577302277088,
      "variance": 6081.8046875,
      "harmonic_mean_rank": 97.48928553950732,
      "adjusted_geometric_mean_rank_index": -0.2751951543420743,
      "hits_at_1": 0.0,
      "hits_at_3": 0.0,
      "hits_at_5": 0.0,
      "hits_at_10": 0.0,
      "z_hits_at_k": -1.1619626806801366,
      "adjusted_hits_at_k": -0.03649043999044609
    },
    "pessimistic": {
      "inverse_median_rank": 0.0056179775280898875,
      "adjusted_arithmetic_mean_rank": 1.1498720257844344,
      "inverse_harmonic_mean_rank": 0.010257537252455581,
      "geometric_mean_rank": 136.4341549450835,
      "median_absolute_deviation": 94.88654198435853,
      "z_arithmetic_mean_rank": -1.5843886702161658,
      "arithmetic_mean_rank": 163.9189189189189,
      "z_geometric_mean_rank": -1.7509208414395,
      "z_inverse_harmonic_mean_rank": -0.9751401979423034,
      "adjusted_inverse_harmonic_mean_rank": -0.011930175509493494,
      "adjusted_arithmetic_mean_rank_index": -0.15093078758949852,
      "count": 37.0,
      "median_rank": 178.0,
      "standard_deviation": 77.98592332382236,
      "inverse_geometric_mean_rank": 0.00732954296087012,
      "inverse_arithmetic_mean_rank": 0.006100577081615829,
      "variance": 6081.804236669101,
      "harmonic_mean_rank": 97.4892876709375,
      "adjusted_geometric_mean_rank_index": -0.2751952661870738,
      "hits_at_1": 0.0,
      "hits_at_3": 0.0,
      "hits_at_5": 0.0,
      "hits_at_10": 0.0,
      "z_hits_at_k": -1.1619626806801366,
      "adjusted_hits_at_k": -0.03649043999044609
    }
  },
  "tail": {
    "optimistic": {
      "inverse_median_rank": 0.006097560975609756,
      "adjusted_arithmetic_mean_rank": 1.1637410606482772,
      "inverse_harmonic_mean_rank": 0.008433153844356594,
      "geometric_mean_rank": 146.27384194941163,
      "median_absolute_deviation": 109.71256416941455,
      "z_arithmetic_mean_rank": -1.7310780985509666,
      "arithmetic_mean_rank": 169.32432432432432,
      "z_geometric_mean_rank": -2.1621542374267344,
      "z_inverse_harmonic_mean_rank": -1.1066681704329815,
      "adjusted_inverse_harmonic_mean_rank": -0.013403310310512798,
      "adjusted_arithmetic_mean_rank_index": -0.16487421677733094,
      "count": 37.0,
      "median_rank": 164.0,
      "standard_deviation": 78.62502663740504,
      "inverse_geometric_mean_rank": 0.006836492339798164,
      "inverse_arithmetic_mean_rank": 0.005905826017557861,
      "variance": 6181.8948137326515,
      "harmonic_mean_rank": 118.57959886136702,
      "adjusted_geometric_mean_rank_index": -0.339942517617952,
      "hits_at_1": 0.0,
      "hits_at_3": 0.0,
      "hits_at_5": 0.0,
      "hits_at_10": 0.0,
      "z_hits_at_k": -1.1495340671022203,
      "adjusted_hits_at_k": -0.03571428571428572
    },
    "realistic": {
      "inverse_median_rank": 0.006097560748457909,
      "adjusted_arithmetic_mean_rank": 1.1637410691513639,
      "inverse_harmonic_mean_rank": 0.008433152921497822,
      "geometric_mean_rank": 146.27381896972656,
      "median_absolute_deviation": 109.71256416941455,
      "z_arithmetic_mean_rank": -1.7310781884459931,
      "arithmetic_mean_rank": 169.32432556152344,
      "z_geometric_mean_rank": -2.1621528893208533,
      "z_inverse_harmonic_mean_rank": -1.106668248308493,
      "adjusted_inverse_harmonic_mean_rank": -0.013403311253694932,
      "adjusted_arithmetic_mean_rank_index": -0.16487422533926255,
      "count": 37.0,
      "median_rank": 164.0,
      "standard_deviation": 78.62503051757812,
      "inverse_geometric_mean_rank": 0.0068364934995770454,
      "inverse_arithmetic_mean_rank": 0.005905826110392809,
      "variance": 6181.8955078125,
      "harmonic_mean_rank": 118.57961183779754,
      "adjusted_geometric_mean_rank_index": -0.3399423056633659,
      "hits_at_1": 0.0,
      "hits_at_3": 0.0,
      "hits_at_5": 0.0,
      "hits_at_10": 0.0,
      "z_hits_at_k": -1.1495340671022203,
      "adjusted_hits_at_k": -0.03571428571428572
    },
    "pessimistic": {
      "inverse_median_rank": 0.006097560975609756,
      "adjusted_arithmetic_mean_rank": 1.1637410606482772,
      "inverse_harmonic_mean_rank": 0.008433153844356594,
      "geometric_mean_rank": 146.27384194941163,
      "median_absolute_deviation": 109.71256416941455,
      "z_arithmetic_mean_rank": -1.7310780985509666,
      "arithmetic_mean_rank": 169.32432432432432,
      "z_geometric_mean_rank": -2.1621542374267344,
      "z_inverse_harmonic_mean_rank": -1.1066681704329815,
      "adjusted_inverse_harmonic_mean_rank": -0.013403310310512798,
      "adjusted_arithmetic_mean_rank_index": -0.16487421677733094,
      "count": 37.0,
      "median_rank": 164.0,
      "standard_deviation": 78.62502663740504,
      "inverse_geometric_mean_rank": 0.006836492339798164,
      "inverse_arithmetic_mean_rank": 0.005905826017557861,
      "variance": 6181.8948137326515,
      "harmonic_mean_rank": 118.57959886136702,
      "adjusted_geometric_mean_rank_index": -0.339942517617952,
      "hits_at_1": 0.0,
      "hits_at_3": 0.0,
      "hits_at_5": 0.0,
      "hits_at_10": 0.0,
      "z_hits_at_k": -1.1495340671022203,
      "adjusted_hits_at_k": -0.03571428571428572
    }
  },
  "both": {
    "optimistic": {
      "inverse_median_rank": 0.005952380952380952,
      "adjusted_arithmetic_mean_rank": 1.1568774629386376,
      "inverse_harmonic_mean_rank": 0.009345345548406088,
      "geometric_mean_rank": 141.26835461963404,
      "median_absolute_deviation": 106.74735973240334,
      "z_arithmetic_mean_rank": -2.345325544113173,
      "arithmetic_mean_rank": 166.6216216216216,
      "z_geometric_mean_rank": -2.8134040954014115,
      "z_inverse_harmonic_mean_rank": -1.4715919275655787,
      "adjusted_inverse_harmonic_mean_rank": -0.012666885393668326,
      "adjusted_arithmetic_mean_rank_index": -0.1579743008314436,
      "count": 74.0,
      "median_rank": 168.0,
      "standard_deviation": 78.35275443211982,
      "inverse_geometric_mean_rank": 0.0070787261782195065,
      "inverse_arithmetic_mean_rank": 0.006001622060016221,
      "variance": 6139.154127100072,
      "harmonic_mean_rank": 107.005139063109,
      "adjusted_geometric_mean_rank_index": -0.31527984840562095,
      "hits_at_1": 0.0,
      "hits_at_3": 0.0,
      "hits_at_5": 0.0,
      "hits_at_10": 0.0,
      "z_hits_at_k": -1.634498534649764,
      "adjusted_hits_at_k": -0.03610221749620178
    },
    "realistic": {
      "inverse_median_rank": 0.0059523810632526875,
      "adjusted_arithmetic_mean_rank": 1.156877505888879,
      "inverse_harmonic_mean_rank": 0.009345345199108124,
      "geometric_mean_rank": 141.26834106445312,
      "median_absolute_deviation": 106.74735973240334,
      "z_arithmetic_mean_rank": -2.345326186221328,
      "arithmetic_mean_rank": 166.6216278076172,
      "z_geometric_mean_rank": -2.8134029611751874,
      "z_inverse_harmonic_mean_rank": -1.4715919690474113,
      "adjusted_inverse_harmonic_mean_rank": -0.01266688575072765,
      "adjusted_arithmetic_mean_rank_index": -0.1579743440819794,
      "count": 74.0,
      "median_rank": 168.0,
      "standard_deviation": 78.3527603149414,
      "inverse_geometric_mean_rank": 0.007078726775944233,
      "inverse_arithmetic_mean_rank": 0.006001621950417757,
      "variance": 6139.15478515625,
      "harmonic_mean_rank": 107.00514306260568,
      "adjusted_geometric_mean_rank_index": -0.31527972130028514,
      "hits_at_1": 0.0,
      "hits_at_3": 0.0,
      "hits_at_5": 0.0,
      "hits_at_10": 0.0,
      "z_hits_at_k": -1.634498534649764,
      "adjusted_hits_at_k": -0.03610221749620178
    },
    "pessimistic": {
      "inverse_median_rank": 0.005952380952380952,
      "adjusted_arithmetic_mean_rank": 1.1568774629386376,
      "inverse_harmonic_mean_rank": 0.009345345548406088,
      "geometric_mean_rank": 141.26835461963404,
      "median_absolute_deviation": 106.74735973240334,
      "z_arithmetic_mean_rank": -2.345325544113173,
      "arithmetic_mean_rank": 166.6216216216216,
      "z_geometric_mean_rank": -2.8134040954014115,
      "z_inverse_harmonic_mean_rank": -1.4715919275655787,
      "adjusted_inverse_harmonic_mean_rank": -0.012666885393668326,
      "adjusted_arithmetic_mean_rank_index": -0.1579743008314436,
      "count": 74.0,
      "median_rank": 168.0,
      "standard_deviation": 78.35275443211982,
      "inverse_geometric_mean_rank": 0.0070787261782195065,
      "inverse_arithmetic_mean_rank": 0.006001622060016221,
      "variance": 6139.154127100072,
      "harmonic_mean_rank": 107.005139063109,
      "adjusted_geometric_mean_rank_index": -0.31527984840562095,
      "hits_at_1": 0.0,
      "hits_at_3": 0.0,
      "hits_at_5": 0.0,
      "hits_at_10": 0.0,
      "z_hits_at_k": -1.634498534649764,
      "adjusted_hits_at_k": -0.03610221749620178
    }
  }
}
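The rank-based entries in the JSON above all summarize the same per-triple ranks. For instance, `inverse_harmonic_mean_rank` (the mean reciprocal rank) is the reciprocal of `harmonic_mean_rank`: 1 / 107.005 ≈ 0.009345 in the block above. The sketch below recomputes a few of these quantities from a made-up list of ranks. `rank_metrics` and `toy_ranks` are illustrative names, not part of PyKEEN, and PyKEEN's own implementation may differ in details such as tie handling.

```python
import numpy as np

def rank_metrics(ranks, ks=(1, 3, 10)):
    """Summarize a 1-D collection of ranks (1 = best) with common rank-based metrics."""
    ranks = np.asarray(ranks, dtype=float)
    metrics = {
        "arithmetic_mean_rank": float(ranks.mean()),
        "harmonic_mean_rank": float(len(ranks) / np.sum(1.0 / ranks)),
        "geometric_mean_rank": float(np.exp(np.mean(np.log(ranks)))),
        # Mean reciprocal rank (MRR): the reciprocal of the harmonic mean rank
        "inverse_harmonic_mean_rank": float(np.mean(1.0 / ranks)),
    }
    for k in ks:
        # Fraction of test triples whose true entity is ranked within the top k
        metrics[f"hits_at_{k}"] = float(np.mean(ranks <= k))
    return metrics

# Hypothetical ranks for five test triples (not taken from the notebook)
toy_ranks = [1, 2, 4, 10, 50]
toy = rank_metrics(toy_ranks)
for name, value in toy.items():
    print(f"{name:28s} {value:.4f}")
```

Lower is better for the mean-rank metrics, while higher is better for the MRR and Hits@K.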


import pandas as pd

# Define core metrics to extract
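core_metrics = [
    # Not a key in this summary, so extract_metrics skips it;
    # the MRR is stored as "inverse_harmonic_mean_rank".
    "mean_reciprocal_rank",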
    "arithmetic_mean_rank",
    "harmonic_mean_rank",
    "geometric_mean_rank",
    "hits_at_1",
    "hits_at_3",
    "hits_at_10"
]

# Extract metrics safely from 'both → optimistic'
def extract_metrics(summary, metrics=core_metrics):
    extracted = {}
    optimistic = summary.get("both", {}).get("optimistic", {})
    for metric in metrics:
        value = optimistic.get(metric)
        if value is not None:
            extracted[metric] = value
    return extracted

# Flatten all model metrics
flat_results = {
    name: extract_metrics(summary)
    for name, summary in model_results_summary.items()
}

# Build and display DataFrame
df_metrics = pd.DataFrame(flat_results).T.round(4)
print(df_metrics)

          arithmetic_mean_rank  harmonic_mean_rank  geometric_mean_rank  \
TransE                166.6216            107.0051             141.2684   
DistMult              119.8378             14.4287              66.3157   
ComplEx               106.8649             15.6083              63.6594   

          hits_at_1  hits_at_3  hits_at_10  
TransE       0.0000     0.0000      0.0000  
DistMult     0.0405     0.0676      0.0946  
ComplEx      0.0405     0.0405      0.0811


import matplotlib.pyplot as plt

# Split metrics into two categories for clarity
rank_metrics = ['arithmetic_mean_rank', 'harmonic_mean_rank', 'geometric_mean_rank']
hits_metrics = ['hits_at_1', 'hits_at_3', 'hits_at_10']

# Plot Rank Metrics (lower is better)
df_metrics[rank_metrics].plot(kind='bar', figsize=(10, 5), title='Model Comparison - Rank Metrics')
plt.ylabel('Rank (lower is better)')
plt.xticks(rotation=0)
plt.grid(axis='y')
plt.tight_layout()
plt.show()

# Plot Hits@K Metrics (higher is better)
df_metrics[hits_metrics].plot(kind='bar', figsize=(10, 5), title='Model Comparison - Hits@K Metrics')
plt.ylabel('Score (higher is better)')
plt.xticks(rotation=0)
plt.grid(axis='y')
plt.tight_layout()
plt.show()


import numpy as np
from sklearn.manifold import TSNE

for model_name, result in model_results.items():
    model = result.model

    # Get entity embeddings and move to CPU
    entity_embeddings = model.entity_representations[0]().detach().cpu().numpy()

    # Fix for complex embeddings (e.g. ComplEx)
    if np.iscomplexobj(entity_embeddings):
        print(f" {model_name} embeddings are complex — using real part only")
        entity_embeddings = entity_embeddings.real

    # Reduce to 2D with t-SNE
    tsne = TSNE(n_components=2, random_state=42)
    entity_2d = tsne.fit_transform(entity_embeddings)

    # Plot
    plt.figure(figsize=(8, 6))
    plt.scatter(entity_2d[:, 0], entity_2d[:, 1], alpha=0.5)
    plt.title(f"{model_name} - t-SNE of Entity Embeddings")
    plt.xlabel("Component 1")
    plt.ylabel("Component 2")
    plt.grid(True)
    plt.show()

⚠️ ComplEx embeddings are complex — using real part only
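Keeping only the real part discards the imaginary half of each ComplEx embedding. One possible alternative, sketched below, is to concatenate the real and imaginary parts so that t-SNE sees both. `complex_to_real_features` is a hypothetical helper and the toy array is made up; whether this gives a more useful plot for this graph is untested here.

```python
import numpy as np

def complex_to_real_features(embeddings):
    """Return a real-valued matrix; complex inputs become [real | imag] columns."""
    embeddings = np.asarray(embeddings)
    if np.iscomplexobj(embeddings):
        # Shape (n, d) complex -> shape (n, 2 * d) real
        return np.concatenate([embeddings.real, embeddings.imag], axis=1)
    return embeddings

# Toy complex "embeddings" for two entities with two dimensions each
toy_complex = np.array([[1 + 2j, 3 - 1j],
                        [0 + 1j, 2 + 0j]])
toy_features = complex_to_real_features(toy_complex)
print(toy_features.shape)
print(toy_features)
```

The result could be passed to `TSNE.fit_transform` in place of `entity_embeddings.real`.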


from sklearn.metrics.pairwise import cosine_similarity

# 1. Extract the ComplEx entity embeddings; they are complex-valued, so keep only the real part
entity_embeddings = model_results['ComplEx'].model.entity_representations[0]().detach().cpu().numpy().real

# 2. Define similarity function
def find_similar_entities(entity_id, entity_embeddings, top_k=5):
    entity_vector = entity_embeddings[entity_id].reshape(1, -1)
    similarities = cosine_similarity(entity_vector, entity_embeddings)[0]
    ranked = np.argsort(similarities)[::-1]  # most similar first
    ranked = ranked[ranked != entity_id]     # drop the query entity explicitly
    return ranked[:top_k]

# 3. Build ID mappings
tf = model_results['ComplEx'].training
entity_to_id = tf.entity_to_id
id_to_entity = {v: k for k, v in entity_to_id.items()}

# 4. Choose a target entity
target = "http://example.org/LOC"
target_id = entity_to_id[target]

# 5. Find similar entities
similar_ids = find_similar_entities(target_id, entity_embeddings)

# 6. Display results
print(f"Entities similar to {target}:")
for i in similar_ids:
    print("-", id_to_entity[i])

Entities similar to http://example.org/LOC:
- http://example.org/Vatican_City
- http://example.org/Universal_Shepherd
- http://example.org/Pope
- http://example.org/his2025
- http://example.org/Mass


# Return real-valued entity embeddings (for complex models such as ComplEx, keep the real part).
# model_name is unused but kept so existing calls still work.
def preprocess_embeddings(model_name, model):
    # Extract embeddings as a NumPy array on the CPU
    raw_embeddings = model.entity_representations[0]().detach().cpu().numpy()
    if np.iscomplexobj(raw_embeddings):
        # Option 1 (used here): keep the real part
        # Option 2 (alternative): use the magnitude, np.abs(raw_embeddings)
        return raw_embeddings.real
    return raw_embeddings

# Similarity function using cosine similarity
def find_similar_entities(entity_id, entity_embeddings, top_k=5):
    entity_vector = entity_embeddings[entity_id].reshape(1, -1)
    similarities = cosine_similarity(entity_vector, entity_embeddings)[0]
    ranked = np.argsort(similarities)[::-1]  # most similar first
    ranked = ranked[ranked != entity_id]     # drop the query entity explicitly
    return ranked[:top_k]

# Choose the target entity
target_entity = "http://example.org/LOC"

# Iterate over all 3 models
for model_name in ['TransE', 'DistMult', 'ComplEx']:
    print(f"\n Similar entities in model: {model_name}")
    
    # Get model and training triples factory
    model = model_results[model_name].model
    tf = model_results[model_name].training
    
    # Preprocess embeddings
    entity_embeddings = preprocess_embeddings(model_name, model)
    
    # ID mappings
    entity_to_id = tf.entity_to_id
    id_to_entity = {v: k for k, v in entity_to_id.items()}
    
    # Get target ID and compute similarity
    target_id = entity_to_id[target_entity]
    similar_ids = find_similar_entities(target_id, entity_embeddings)

    # Show top similar entities
    print(f"Entities similar to {target_entity} using {model_name}:")
    for i in similar_ids:
        print("-", id_to_entity[i])

🔍 Similar entities in model: TransE
Entities similar to http://example.org/LOC using TransE:
- http://example.org/Guangzhou
- http://example.org/U_S
- http://example.org/Latinos
- http://example.org/likePapabili
- http://example.org/Rome

🔍 Similar entities in model: DistMult
Entities similar to http://example.org/LOC using DistMult:
- https://www.npr.org/article/article_2
- http://example.org/John_Paul_II
- http://example.org/Pope_Francis
- http://example.org/Eugenio_Pacelli
- http://example.org/Easter

🔍 Similar entities in model: ComplEx
Entities similar to http://example.org/LOC using ComplEx:
- http://example.org/Vatican_City
- http://example.org/Universal_Shepherd
- http://example.org/Pope
- http://example.org/his2025
- http://example.org/Mass
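The three neighbour lists printed above can be compared directly. The sketch below computes the pairwise Jaccard overlap of the top-5 lists, using the entity names from the output with the URI prefixes shortened. `jaccard` and `neighbors` are illustrative names introduced here.

```python
from itertools import combinations

def jaccard(a, b):
    """Jaccard overlap |A ∩ B| / |A ∪ B| of two collections (0.0 if both are empty)."""
    a, b = set(a), set(b)
    union = a | b
    return len(a & b) / len(union) if union else 0.0

# Top-5 neighbours of http://example.org/LOC, copied from the output above
neighbors = {
    "TransE": ["Guangzhou", "U_S", "Latinos", "likePapabili", "Rome"],
    "DistMult": ["article_2", "John_Paul_II", "Pope_Francis", "Eugenio_Pacelli", "Easter"],
    "ComplEx": ["Vatican_City", "Universal_Shepherd", "Pope", "his2025", "Mass"],
}

overlap = {
    (m1, m2): jaccard(neighbors[m1], neighbors[m2])
    for m1, m2 in combinations(neighbors, 2)
}
for (m1, m2), score in overlap.items():
    print(f"{m1} vs {m2}: {score:.2f}")
```

For this target entity no two models share a single neighbour, so the nearest-neighbour lists should be read with some caution.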


def plot_entity_embeddings(model_name, model_results, sample_size=50):
    model = model_results[model_name].model
    tf = model_results[model_name].training
    entity_embeddings = preprocess_embeddings(model_name, model)
    
    # Reverse map: id -> label
    id_to_entity = {v: k for k, v in tf.entity_to_id.items()}
    
    # Dimensionality reduction
    tsne = TSNE(n_components=2, random_state=42)
    embeddings_2d = tsne.fit_transform(entity_embeddings)
    
    # Choose sample entities to show (seeded so the same entities are labelled on every run)
    rng = np.random.default_rng(42)
    sample_indices = rng.choice(len(embeddings_2d), size=min(sample_size, len(embeddings_2d)), replace=False)
    
    # Plot
    plt.figure(figsize=(12, 8))
    for idx in sample_indices:
        x, y = embeddings_2d[idx]
        label = id_to_entity[idx].split("/")[-1][:20]  # shorten long URIs
        plt.scatter(x, y, alpha=0.6)
        plt.annotate(label, (x, y), fontsize=8, alpha=0.75)

    plt.title(f"{model_name} - Entity Embeddings (t-SNE)")
    plt.grid(True)
    plt.tight_layout()
    plt.show()


for name in ['TransE', 'DistMult', 'ComplEx']:
    print(f"\n Visualizing embeddings for model: {name}")
    plot_entity_embeddings(name, model_results)

🔍 Visualizing embeddings for model: TransE

🔍 Visualizing embeddings for model: DistMult

🔍 Visualizing embeddings for model: ComplEx


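# List the relations available in the model to help choose valid triples for prediction.
# The triples factory is looked up explicitly instead of relying on a leftover loop variable.
print("Relations in model:")
for r in model_results['ComplEx'].training.relation_to_id.keys():
    print(r)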

Relations in model:
http://example.org/announce
http://example.org/appoint
http://example.org/attend
http://example.org/battistaannounce
http://example.org/bear
http://example.org/clash
http://example.org/content
http://example.org/criticize
http://example.org/date
http://example.org/die
http://example.org/encounter
http://example.org/lock
http://example.org/meet
http://example.org/move
http://example.org/praise
http://example.org/preside
http://example.org/reflect
http://example.org/sourceURL
http://example.org/study
http://example.org/tell
http://example.org/title
http://example.org/vote
http://example.org/write
http://www.w3.org/1999/02/22-rdf-syntax-ns#type


import torch


def predict_tail_entities(model, head, relation, triples_factory, k=5):
    """Score every entity as the tail of (head, relation, ?) and return the top k.

    No filtering is applied: the head itself and tails already seen in training
    can appear in the results.
    """
    device = torch.device('cuda' if torch.cuda.is_available() else 'cpu')
    model = model.to(device)
    model.eval()  # disable training-only behaviour such as dropout

    # Convert head and relation labels to IDs
    try:
        head_id = triples_factory.entity_to_id[head]
        rel_id = triples_factory.relation_to_id[relation]
    except KeyError as e:
        print(f"Entity or relation not found: {e}")
        return []

    # Build all candidate triples (h, r, t), one per entity t; shape: [num_entities, 3]
    all_tail_ids = torch.arange(triples_factory.num_entities, device=device)
    triples = torch.stack([
        torch.full_like(all_tail_ids, head_id),
        torch.full_like(all_tail_ids, rel_id),
        all_tail_ids,
    ], dim=1)

    # Score all candidate triples and flatten to a 1D tensor
    with torch.no_grad():
        scores = model.score_hrt(triples).view(-1)

    if scores.numel() == 0:
        print("No scores returned by model.")
        return []

    # Top-k scoring tails (k is capped at the number of entities)
    topk = torch.topk(scores, k=min(k, scores.numel()))

    id_to_label = {idx: label for label, idx in triples_factory.entity_to_id.items()}
    return [
        (id_to_label[idx], score)
        for idx, score in zip(topk.indices.tolist(), topk.values.tolist())
    ]


head = 'http://example.org/Pope_Francis'
relation = 'http://example.org/die'

for name, result in model_results.items():
    print(f"\n Predictions using {name}:")
    try:
        preds = predict_tail_entities(
            model=result.model,
            head=head,
            relation=relation,
            triples_factory=result.training,
            k=5
        )
        for entity, score in preds:
            print(f"- {entity:60}  score: {score:.4f}")
    except Exception as e:
        print(f" Error with {name}: {e}")

🔮 Predictions using TransE:
- http://example.org/Pope_Francis                               score: -5.7292
- http://example.org/Latinos                                    score: -7.4719
- http://example.org/Rome                                       score: -7.4900
- http://example.org/el_Papa_Francisco                          score: -8.1880
- http://example.org/Gaza                                       score: -8.2402

🔮 Predictions using DistMult:
- https://www.npr.org/2025/04/21/g-s1-61930/share-pope-francis-memories  score: 0.0907
- http://example.org/Imtiyaz_Khan                               score: 0.0899
- http://example.org/Eugenio_Pacelli                            score: 0.0791
- http://example.org/Jorge_Mario_Bergoglio                      score: 0.0755
- http://example.org/Vance                                      score: 0.0727

🔮 Predictions using ComplEx:
- http://example.org/Archdiocese_of_Washington                  score: 47.8624
- http://example.org/Lori                                       score: 38.7026
- https://www.npr.org/article/article_9                         score: 37.1092
- http://example.org/Pope_Pius                                  score: 36.9296
- 2025-04-21T06:38:53-04:00                                     score: 36.5162
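In the output above, TransE ranks `Pope_Francis` itself as the most likely tail of `(Pope_Francis, die, ?)`, because every entity, including the head, is scored as a candidate. One possible post-processing step is to mask unwanted candidates before taking the top k. The sketch below works on a plain score array; `top_k_filtered` and the toy scores are hypothetical and are not part of PyKEEN.

```python
import numpy as np

def top_k_filtered(scores, k=5, exclude_ids=()):
    """Return the k best (index, score) pairs, skipping the indices in exclude_ids."""
    scores = np.asarray(scores, dtype=float).copy()
    exclude = list(exclude_ids)
    if exclude:
        scores[exclude] = -np.inf            # masked candidates can never win
    order = np.argsort(-scores, kind="stable")  # highest score first
    order = order[np.isfinite(scores[order])]   # drop the masked candidates
    return [(int(i), float(scores[i])) for i in order[:k]]

# Toy scores for five candidate tails; index 0 plays the role of the head entity
toy_scores = [-5.7, -7.5, -7.4, -8.2, -9.0]
top = top_k_filtered(toy_scores, k=3, exclude_ids=[0])
print(top)
```

The same idea could be applied to the tensor returned by `model.score_hrt`, for example by excluding the head ID and the IDs of tails already present in the training triples.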


print("Sample heads:", list(training.entity_to_id.keys())[:5])
print("Sample relations:", list(training.relation_to_id.keys())[:5])

Sample heads: ['2025-04-21T06:38:53-04:00', '2025-04-21T07:19:41-04:00', '2025-04-21T08:07:03-04:00', '2025-04-21T08:32:37-04:00', '2025-04-21T11:08:30-04:00']
Sample relations: ['http://example.org/announce', 'http://example.org/appoint', 'http://example.org/attend', 'http://example.org/battistaannounce', 'http://example.org/bear']


from pykeen.predict import predict_target


sample_head = list(model_results['TransE'].training.entity_labeling.label_to_id.keys())[0]
sample_relation = list(model_results['TransE'].training.relation_labeling.label_to_id.keys())[0]

print(f" Predicting: ({sample_head}, {sample_relation}, ?)")

# Store predictions
all_predictions = {}

for name, result in model_results.items():
    print(f" Predicting with model: {name}")
    
    try:
        # Run prediction
        pred_df = predict_target(
            model=result.model,
            head=sample_head,
            relation=sample_relation,
            triples_factory=result.training
        ).df

        # Keep only top 5
        top_preds = pred_df.head(5)[['tail_label', 'score']].copy()
        all_predictions[f'{name} Tail'] = top_preds['tail_label'].tolist()
        all_predictions[f'{name} Score'] = top_preds['score'].round(4).tolist()
    
    except Exception as e:
        print(f" Error with model {name}: {e}")
        all_predictions[f'{name} Tail'] = ['ERROR'] * 5
        all_predictions[f'{name} Score'] = [None] * 5

# Combine into a single DataFrame
df_all = pd.DataFrame(all_predictions)
pd.set_option('display.max_colwidth', None)
display(df_all)

🎯 Predicting: (2025-04-21T06:38:53-04:00, http://example.org/announce, ?)
🔮 Predicting with model: TransE
🔮 Predicting with model: DistMult
🔮 Predicting with model: ComplEx

	TransE Tail	TransE Score	DistMult Tail	DistMult Score	ComplEx Tail	ComplEx Score
0	2025-04-21T06:38:53-04:00	-5.8472	http://example.org/Ur	0.0979	http://example.org/Many_Africans	48.3942
1	http://example.org/University_of_Monterrey	-7.1925	http://example.org/Northfield	0.0946	http://example.org/Archdiocese_of_Washington	39.9880
2	http://example.org/the_age_of_88_He	-7.2320	http://example.org/Vatican_Pool	0.0941	https://www.npr.org/article/article_9	36.6563
3	http://example.org/Vaticanuntil	-7.3815	https://www.npr.org/article/article_8	0.0928	http://example.org/Square	34.9605
4	http://example.org/Northfield	-7.3889	http://example.org/Judaism	0.0921	http://example.org/God	29.8242

Project: Web scraping, knowledge base construction

Part 1: Web scraping and knowledge base construction

Task 1: Model for NER

1. Text Cleaning & Preprocessing

2. Named Entity Recognition (NER)

3. Relation Extraction (RE)

4. Knowledge Graph Building

Task 2: Pipeline for Knowledge Graph Construction

1. Fetch News Articles

2. Use Methods from Task 1

Part 2: Knowledge Graph Embedding