https://joenasriani.github.io/ai-knowledge-graph

This system takes an unstructured text document, uses an LLM of your choice to extract knowledge in the form of Subject-Predicate-Object (SPO) triplets, and visualizes the relationships as an interactive knowledge graph. A demo of a knowledge graph created with this project can be found here: Industrial-Revolution Knowledge Graph
Install the required packages (pip install -r requirements.txt or uv sync):
pip install -r requirements.txt
Configure your settings in config.toml, then run the system:
python generate-graph.py --input your_text_file.txt --output knowledge_graph.html
Or with UV:
uv run generate-graph.py --input your_text_file.txt --output knowledge_graph.html
Or install it and use it as a module:
pip install --upgrade -e .
generate-graph --input your_text_file.txt --output knowledge_graph.html
The system can be configured using the config.toml file:
[llm]
model = "gemma3" # Google open-weight model
api_key = "sk-1234"
base_url = "http://localhost:11434/v1/chat/completions" # Local Ollama instance (can be any OpenAI-compatible endpoint)
max_tokens = 8192
temperature = 0.2
[chunking]
chunk_size = 200 # Number of words per chunk
overlap = 20 # Number of words to overlap between chunks
[standardization]
enabled = true # Enable entity standardization
use_llm_for_entities = true # Use LLM for additional entity resolution
[inference]
enabled = true # Enable relationship inference
use_llm_for_inference = true # Use LLM for relationship inference
apply_transitive = true # Apply transitive inference rules
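To make chunk_size and overlap concrete, here is a minimal sliding-window sketch: each chunk holds up to chunk_size words, and consecutive chunks share overlap words. The function name chunk_words is hypothetical and not taken from text_utils.py; in particular, its handling of the final chunk is an assumption and may differ from the project's (the console output below, for example, ends with a short 20-word chunk).

```python
def chunk_words(text: str, chunk_size: int = 200, overlap: int = 20) -> list[str]:
    """Split text into overlapping windows of whitespace-separated words.

    Hypothetical helper for illustration; not the project's text_utils.py.
    """
    if chunk_size <= 0:
        raise ValueError("chunk_size must be positive")
    if not 0 <= overlap < chunk_size:
        raise ValueError("overlap must be in [0, chunk_size)")

    words = text.split()
    step = chunk_size - overlap  # how far each window advances
    chunks: list[str] = []
    start = 0
    while start < len(words):
        window = words[start:start + chunk_size]
        chunks.append(" ".join(window))
        if start + chunk_size >= len(words):
            break  # this window already reaches the end of the text
        start += step
    return chunks
```

With chunk_size = 4 and overlap = 1, a ten-word text yields three chunks that each repeat the last word of the previous chunk. Requiring overlap < chunk_size guarantees that every window advances, so the loop always terminates.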
Command-line options:
--input FILE: Input text file to process
--output FILE: Output HTML file for visualization (default: knowledge_graph.html)
--config FILE: Path to config file (default: config.toml)
--debug: Enable debug output with raw LLM responses
--no-standardize: Disable entity standardization
--no-inference: Disable relationship inference
--test: Generate sample visualization using test data
To print the full help text:
generate-graph --help
usage: generate-graph [-h] [--test] [--config CONFIG] [--output OUTPUT] [--input INPUT] [--debug] [--no-standardize] [--no-inference]
Knowledge Graph Generator and Visualizer
options:
-h, --help show this help message and exit
--test Generate a test visualization with sample data
--config CONFIG Path to configuration file
--output OUTPUT Output HTML file path
--input INPUT Path to input text file (required unless --test is used)
--debug Enable debug output (raw LLM responses and extracted JSON)
--no-standardize Disable entity standardization
--no-inference Disable relationship inference
Command:
generate-graph --input data/industrial-revolution.txt --output industrial-revolution-kg.html
Console Output:
Using input text from file: data/industrial-revolution.txt
==================================================
PHASE 1: INITIAL TRIPLE EXTRACTION
==================================================
Processing text in 13 chunks (size: 100 words, overlap: 20 words)
Processing chunk 1/13 (100 words)
Processing chunk 2/13 (100 words)
Processing chunk 3/13 (100 words)
Processing chunk 4/13 (100 words)
Processing chunk 5/13 (100 words)
Processing chunk 6/13 (100 words)
Processing chunk 7/13 (100 words)
Processing chunk 8/13 (100 words)
Processing chunk 9/13 (100 words)
Processing chunk 10/13 (100 words)
Processing chunk 11/13 (100 words)
Processing chunk 12/13 (86 words)
Processing chunk 13/13 (20 words)
Extracted a total of 216 triples from all chunks
==================================================
PHASE 2: ENTITY STANDARDIZATION
==================================================
Starting with 216 triples and 201 unique entities
Standardizing entity names across all triples...
Applied LLM-based entity standardization for 15 entity groups
Standardized 201 entities into 181 standard forms
After standardization: 216 triples and 160 unique entities
==================================================
PHASE 3: RELATIONSHIP INFERENCE
==================================================
Starting with 216 triples
Top 5 relationship types before inference:
- enables: 20 occurrences
- impacts: 15 occurrences
- enabled: 12 occurrences
- pioneered: 10 occurrences
- invented: 9 occurrences
Inferring additional relationships between entities...
Identified 9 disconnected communities in the graph
Inferred 3 new relationships between communities
Inferred 3 new relationships between communities
Inferred 3 new relationships between communities
Inferred 3 new relationships between communities
Inferred 3 new relationships between communities
Inferred 3 new relationships between communities
Inferred 3 new relationships between communities
Inferred 3 new relationships between communities
Inferred 3 new relationships between communities
Inferred 3 new relationships between communities
Inferred 9 new relationships within communities
Inferred 2 new relationships within communities
Inferred 88 relationships based on lexical similarity
Added -22 inferred relationships
Top 5 relationship types after inference:
- related to: 65 occurrences
- advances via Artificial Intelligence: 36 occurrences
- pioneered via computing: 26 occurrences
- enables via computing: 24 occurrences
- enables: 21 occurrences
Added 370 inferred relationships
Final knowledge graph: 564 triples
Saved raw knowledge graph data to /mnt/c/Users/rmcdermo/Documents/industrial-revolution-kg.json
Processing 564 triples for visualization
Found 161 unique nodes
Found 355 inferred relationships
Detected 9 communities using Louvain method
Nodes in NetworkX graph: 161
Edges in NetworkX graph: 537
Knowledge graph visualization saved to /mnt/c/Users/rmcdermo/Documents/industrial-revolution-kg.html
Graph Statistics: {
"nodes": 161,
"edges": 564,
"original_edges": 209,
"inferred_edges": 355,
"communities": 9
}
Knowledge Graph Statistics:
Nodes: 161
Edges: 564
Communities: 9
To view the visualization, open the following file in your browser:
file:///mnt/c/Users/rmcdermo/Documents/industrial-revolution-kg.html
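The statistics above report 564 edges (209 original plus 355 inferred), while the NetworkX graph contains only 537 edges. The output does not explain the gap. One plausible reason, which is an assumption here, is that a simple (non-multi) NetworkX graph keeps a single edge per node pair, so several triples linking the same two entities collapse into one edge. The sketch below uses a hypothetical Triple type and graph_statistics function to show how such counts relate. It assumes directed pairs; an undirected graph would collapse even more.

```python
from typing import NamedTuple


class Triple(NamedTuple):
    """Hypothetical SPO triple with a flag marking inferred relationships."""
    subject: str
    predicate: str
    object: str
    inferred: bool = False


def graph_statistics(triples: list[Triple]) -> dict[str, int]:
    # Every distinct subject or object becomes a node.
    nodes = {t.subject for t in triples} | {t.object for t in triples}
    # In a simple directed graph, parallel triples between the same
    # (subject, object) pair collapse into a single edge.
    distinct_pairs = {(t.subject, t.object) for t in triples}
    inferred = sum(1 for t in triples if t.inferred)
    return {
        "nodes": len(nodes),
        "edges": len(triples),
        "original_edges": len(triples) - inferred,
        "inferred_edges": inferred,
        "simple_graph_edges": len(distinct_pairs),
    }
```

Keeping the triple count and the distinct-pair count separate makes it clear that "edges" in the statistics refers to relationships, not to drawn graph edges.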
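The system uses the LLM in up to three passes:
Triple extraction: the initial extraction of SPO triples from each chunk (the process_with_llm function)
Entity standardization: LLM-based entity resolution (controlled by the standardization.use_llm_for_entities config option)
Relationship inference: LLM-based inference of new relationships (controlled by the inference.use_llm_for_inference config option)
Both the second and third passes are optional and can be disabled in the configuration to minimize LLM usage or to control these processes manually.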
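The first pass depends on turning free-form LLM output into structured triples. The response schema is not documented here, so the following sketch assumes the model returns a JSON array of objects with subject, predicate, and object keys, possibly wrapped in prose or a code fence. extract_triples is a hypothetical helper, not the project's extract_json_from_text.

```python
import json
import re


def extract_triples(response: str) -> list[dict[str, str]]:
    """Parse SPO triples from an LLM response (assumed JSON-array format)."""
    # Models often wrap JSON in prose or code fences, so take the span
    # from the first '[' to the last ']' and try to parse it.
    match = re.search(r"\[.*\]", response, re.DOTALL)
    if match is None:
        return []
    try:
        data = json.loads(match.group(0))
    except json.JSONDecodeError:
        return []

    keys = ("subject", "predicate", "object")
    triples: list[dict[str, str]] = []
    for item in (data if isinstance(data, list) else []):
        if not isinstance(item, dict):
            continue
        parts = [item.get(k) for k in keys]
        # Keep only items where all three fields are non-empty strings.
        if all(isinstance(p, str) and p.strip() for p in parts):
            triples.append({k: p.strip() for k, p in zip(keys, parts)})
    return triples
```

Malformed or incomplete items are dropped rather than raising an error, so one bad chunk does not abort the whole extraction phase. The --debug flag exists for inspecting exactly these raw responses.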
.
├── config.toml # Main configuration file for the system
├── generate-graph.py # Entry point when run directly as a script
├── pyproject.toml # Python project metadata and build configuration
├── requirements.txt # Python dependencies for 'pip' users
├── uv.lock # Python dependencies for 'uv' users
└── src/ # Source code
├── generate_graph.py # Main entry point script when run as a module
└── knowledge_graph/ # Core package
├── __init__.py # Package initialization
├── config.py # Configuration loading and validation
├── entity_standardization.py # Entity standardization algorithms
├── llm.py # LLM interaction and response processing
├── main.py # Main program flow and orchestration
├── prompts.py # Centralized collection of LLM prompts
├── text_utils.py # Text processing and chunking utilities
├── visualization.py # Knowledge graph visualization generator
└── templates/ # HTML templates for visualization
└── graph_template.html # Base template for interactive graph
This diagram illustrates the program flow.
flowchart TD
%% Main entry points
A[main.py - Entry Point] --> B{Parse Arguments}
%% Test mode branch
B -->|--test flag| C[sample_data_visualization]
C --> D[visualize_knowledge_graph]
%% Normal processing branch
B -->|normal processing| E[load_config]
E --> F[process_text_in_chunks]
%% Text processing
F --> G[chunk_text]
G --> H[process_with_llm]
%% LLM processing
H --> I[call_llm]
I --> J[extract_json_from_text]
%% Entity standardization phase
F --> K{standardization enabled?}
K -->|yes| L[standardize_entities]
K -->|no| M{inference enabled?}
L --> M
%% Relationship inference phase
M -->|yes| N[infer_relationships]
M -->|no| O[visualize_knowledge_graph]
N --> O
%% Visualization components
O --> P[_calculate_centrality_metrics]
O --> Q[_detect_communities]
O --> R[_calculate_node_sizes]
O --> S[_add_nodes_and_edges_to_network]
O --> T[_get_visualization_options]
O --> U[_save_and_modify_html]
%% Subprocesses
L --> L1[_resolve_entities_with_llm]
N --> N1[_identify_communities]
N --> N2[_infer_relationships_with_llm]
N --> N3[_infer_within_community_relationships]
N --> N4[_apply_transitive_inference]
N --> N5[_infer_relationships_by_lexical_similarity]
N --> N6[_deduplicate_triples]
%% File outputs
U --> V[HTML Visualization]
F --> W[JSON Data Export]
%% Prompts usage
Y[prompts.py] --> H
Y --> L1
Y --> N2
Y --> N3
%% Module dependencies
subgraph Modules
main.py
config.py
text_utils.py
llm.py
entity_standardization.py
visualization.py
prompts.py
end
%% Phases
subgraph Phase 1: Triple Extraction
G
H
I
J
end
subgraph Phase 2: Entity Standardization
L
L1
end
subgraph Phase 3: Relationship Inference
N
N1
N2
N3
N4
N5
N6
end
subgraph Phase 4: Visualization
O
P
Q
R
S
T
U
end
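Entry Point: The program starts in main.py, which parses command-line arguments.
Mode Selection: If the --test flag is provided, it generates a sample visualization; otherwise, it processes the input text.
Configuration: Loads settings from config.toml using config.py.
Text Processing: Splits the text into overlapping chunks using text_utils.py.
Knowledge Extraction: Uses prompts from prompts.py to guide the LLM's extraction process.
Entity Standardization: Optionally uses prompts from prompts.py for entity resolution.
Relationship Inference: Optionally uses prompts from prompts.py for relationship inference.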
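Predicates such as "pioneered via computing" and "enables via computing" in the console output suggest that transitive inference names each new edge after the first predicate and the intermediate entity. The sketch below follows that reading, which is an assumption. apply_transitive is a hypothetical function, not the project's _apply_transitive_inference. It performs a single hop over the original triples and skips pairs that are already connected.

```python
from collections import defaultdict


def apply_transitive(
    triples: list[tuple[str, str, str]],
) -> list[tuple[str, str, str]]:
    """Infer (A, "p via B", C) from (A, p, B) and (B, q, C); one hop only."""
    # Index outgoing edges by subject for quick lookup of B's neighbours.
    outgoing: dict[str, list[tuple[str, str]]] = defaultdict(list)
    for s, p, o in triples:
        outgoing[s].append((p, o))

    connected = {(s, o) for s, _, o in triples}
    inferred: list[tuple[str, str, str]] = []
    for a, p, b in triples:
        for _q, c in outgoing.get(b, []):
            if c == a or (a, c) in connected:
                continue  # skip self-loops and pairs that are already linked
            inferred.append((a, f"{p} via {b}", c))
            connected.add((a, c))  # avoid inferring the same pair twice
    return inferred
```

Inferred triples are not fed back into the loop, which keeps the result bounded. Chaining further hops would quickly flood a dense graph with weak "via" relationships.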