pyKE

Welcome to pyKE’s documentation. It covers installation, a quickstart example, the available models, and the API reference.

README

An open-source library for knowledge embedding, forked from github.com/thunlp/OpenKE. The original API has been changed drastically to make it more Pythonic.

Overview

This is an implementation based on [TensorFlow](http://www.tensorflow.org) for knowledge representation learning (KRL). Underlying operations such as data preprocessing and negative sampling are implemented natively in C++. Each model is implemented in TensorFlow with a Python interface, which makes it convenient to run models on GPUs.
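Negative sampling itself is done by the C++ library. As a rough illustration of the idea only, the pure-Python sketch below corrupts each positive triple by swapping its head or tail for a random entity (neg_ent times) or its relation for a random relation (neg_rel times). The function name corrupt_batch and the (head, tail, relation) id layout are assumptions for illustration, not part of pyKE's API. Picking head or tail with probability 0.5 is the uniform strategy, which presumably corresponds to bern=False in the Quickstart.

```python
import random
from typing import List, Optional, Tuple

Triple = Tuple[int, int, int]  # (head, tail, relation) as integer ids


def corrupt_batch(
    triples: List[Triple],
    ent_count: int,
    rel_count: int,
    neg_ent: int = 1,
    neg_rel: int = 0,
    rng: Optional[random.Random] = None,
) -> List[Triple]:
    """Hypothetical helper: build negative triples by uniform corruption.

    For every positive triple, create `neg_ent` corruptions that replace the
    head or the tail (50/50) with a random entity, followed by `neg_rel`
    corruptions that replace the relation with a random relation.
    Corruptions are not filtered, so a negative may equal a known positive.
    """
    rng = rng or random.Random()
    negatives: List[Triple] = []
    for head, tail, rel in triples:
        for _ in range(neg_ent):
            if rng.random() < 0.5:
                negatives.append((rng.randrange(ent_count), tail, rel))
            else:
                negatives.append((head, rng.randrange(ent_count), rel))
        for _ in range(neg_rel):
            negatives.append((head, tail, rng.randrange(rel_count)))
    return negatives


if __name__ == "__main__":
    # Two positives, two entity corruptions and one relation corruption each.
    print(corrupt_batch([(0, 1, 0), (2, 3, 1)], ent_count=5, rel_count=2,
                        neg_ent=2, neg_rel=1, rng=random.Random(42)))
```

A real implementation usually also filters out corruptions that appear in the training set; that step is left out here for brevity.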

Installation

  1. Clone the repository and enter its directory

    git clone https://github.com/ifis-tu-bs/pyKE.git
    cd pyKE
    
  2. Install the package

    python setup.py install
    

Quickstart

To compute a knowledge graph embedding, first load a dataset and configure the training parameters, then train the model and export the results. The following example trains the TransE model on the FB15K dataset.

from pyke.dataset import Dataset
from pyke.embedding import Embedding
from pyke.models import TransE

# Read the dataset
dataset = Dataset("./benchmarks/fb15k.nt")
embedding = Embedding(
    dataset,
    TransE,
    folds=20,
    epochs=20,
    neg_ent=1,
    neg_rel=0,
    bern=False,
    workers=4,
    dimension=50,  # TransE-specific
    margin=1.0,  # TransE-specific
)

# Train the model. It is saved in the process.
embedding.train(prefix="./TransE", post_epoch=print)

# Save the embedding to a JSON file
embedding.save_to_json("TransE.json")

Interfaces

The class pyke.embedding.Embedding represents an embedding; it requires a dataset and a model class. Load your dataset from an N-Triples file with the class pyke.dataset.Dataset.
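Embedding models work on integer ids rather than IRIs (BaseModel takes ent_count and rel_count). The sketch below shows one plausible way to map parsed (subject, predicate, object) statements to such ids. The function index_triples and its return layout are hypothetical; this is not a description of how pyke.dataset.Dataset works internally.

```python
from typing import Dict, Iterable, List, Tuple


def index_triples(
    statements: Iterable[Tuple[str, str, str]],
) -> Tuple[Dict[str, int], Dict[str, int], List[Tuple[int, int, int]]]:
    """Hypothetical helper: assign consecutive ids to entities and relations.

    Input statements are (subject, predicate, object) strings; the output
    triples are (head_id, tail_id, relation_id), following the
    head/tail/label order used by BaseModel.fit.
    """
    entities: Dict[str, int] = {}
    relations: Dict[str, int] = {}
    triples: List[Tuple[int, int, int]] = []
    for subj, pred, obj in statements:
        # setdefault hands out the next free id the first time a name appears
        head = entities.setdefault(subj, len(entities))
        tail = entities.setdefault(obj, len(entities))
        rel = relations.setdefault(pred, len(relations))
        triples.append((head, tail, rel))
    return entities, relations, triples
```

The sizes of the two dictionaries then give the entity and relation counts a model needs.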

Models

The class pyke.models.base.BaseModel declares the methods shared by all implemented model classes, including the loss function necessary for training (inserting information into the model) and prediction (retrieving information from the model). This project implements the following model classes:

  • RESCAL
  • TransE
  • TransH
  • TransR
  • TransD
  • HolE
  • ComplEx
  • DistMult
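To make the training and prediction roles concrete, here is a minimal sketch of the TransE idea: a statement (h, r, t) is plausible when h + r lies close to t, and training uses a margin-based ranking loss between a positive and a corrupted statement. This is a plain-Python illustration of the technique with hypothetical function names, not pyKE's TensorFlow implementation. The L1 distance mirrors BaseModel's default norm_func, and margin plays the role of the margin parameter in the Quickstart.

```python
import math
from typing import Sequence


def transe_distance(h: Sequence[float], r: Sequence[float], t: Sequence[float],
                    norm: str = "l1") -> float:
    """Distance ||h + r - t||; a lower value means a more plausible statement."""
    diff = [hi + ri - ti for hi, ri, ti in zip(h, r, t)]
    if norm == "l1":
        return sum(abs(d) for d in diff)
    return math.sqrt(sum(d * d for d in diff))  # euclidean (L2) norm


def margin_loss(pos_distance: float, neg_distance: float, margin: float = 1.0) -> float:
    """Hinge loss: zero once the negative is at least `margin` farther away."""
    return max(0.0, margin + pos_distance - neg_distance)


h, r = [1.0, 0.0], [0.0, 1.0]
pos = transe_distance(h, r, [1.0, 1.0])  # true tail
neg = transe_distance(h, r, [0.0, 0.0])  # corrupted tail
print(pos, neg, margin_loss(pos, neg))
```

The other models differ mainly in how they score a statement; the margin-based loss against corrupted statements is shared by several of them.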

Notes

The original fork relies on a C++ library that is compiled the first time you use the project. Note that compilation is only supported on UNIX-based systems. In the future, the C++ library is meant to be replaced by a Python library.

API reference

pyke package

Subpackages

pyke.models package
Submodules
pyke.models.ComplEx module
pyke.models.DistMult module
pyke.models.HolE module
pyke.models.RESCAL module
pyke.models.TransD module
pyke.models.TransE module
pyke.models.TransH module
pyke.models.TransR module
pyke.models.base module
class pyke.models.base.BaseModel(ent_count=None, rel_count=None, batch_size=0, variants=0, optimizer=None, norm_func=pyke.norm.l1, per_process_gpu_memory_fraction=0.5)

Bases: object

Properties and behaviour that different embedding models share.

entity(head=None)

Embeds a batch of subjects.

fit(head, tail, label, score)

Trains the model on a batch of weighted statements.

get_all_instance(in_batch=False)
get_all_labels(in_batch=False)
get_negative_instance(in_batch=True)
get_positive_instance(in_batch=True)
get_predict_instance()
predict(head, tail, label)

Evaluates the model’s scores on a batch of statements.

relation(label=None)

Embeds a batch of predicates.

restore(prefix: str)

Reads a model from the filesystem.

Parameters: prefix – Prefix of the model to load
save(prefix: str, step: int = None)

Saves the model to the filesystem.

Parameters:
  • prefix – File prefix for the model
  • step – Step of the model (appended to prefix)
save_to_json(filename: str)

Saves the embedding as a JSON file. The file contains the embedding parameters (e.g. entity and relation matrices), which depend on the model.

Parameters: filename – Filename for the output JSON file
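Because the parameter names and layout in the exported file depend on the model, it can help to inspect the file after calling save_to_json. The stdlib-only sketch below assumes the file holds a JSON object whose values are nested lists of numbers; the helper summarize_embedding_json and the key names used in the check are hypothetical.

```python
import json
from typing import Dict, Tuple


def summarize_embedding_json(filename: str) -> Dict[str, Tuple[int, ...]]:
    """Hypothetical helper: report the shape of every nested-list parameter.

    Assumes a JSON object whose values are (nested) lists of numbers, e.g. an
    entity matrix of shape (entity count, dimension). Scalars get shape ().
    """
    with open(filename, encoding="utf-8") as f:
        params = json.load(f)
    shapes: Dict[str, Tuple[int, ...]] = {}
    for name, value in params.items():
        shape = []
        # Walk down the first element of each nesting level to read its length.
        while isinstance(value, list):
            shape.append(len(value))
            value = value[0] if value else None
        shapes[name] = tuple(shape)
    return shapes
```

For the Quickstart export, a call like summarize_embedding_json("TransE.json") would show which matrices the model wrote and whether their second axis matches dimension=50.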
Module contents

Submodules

pyke.dataset module

pyke.embedding module

pyke.library module

class pyke.library.Library

Bases: object

Manages the connection to the C++ library.

CPP_BASE = 'cpp_library/Base.cpp'
MAKE_SCRIPT = 'cpp_library/make.sh'
static compile_library(destination: str)

Compiles the library to the path destination.

Parameters: destination – Path for the library
static get_library(temp_dir: str = None, library_name: str = None)

Returns the C++ library. If the library does not exist yet, it is compiled first; then it is loaded.

Parameters:
  • temp_dir – Directory where the library is saved (optional)
  • library_name – Filename of the library
Returns:

The C++ library

library = None
library_name = 'pyke.so'
static load_library(path: str)

Loads the library from path.

Parameters: path – Path to the library (.so)
temp_dir = '.pyke'
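The compile-on-first-use behaviour of get_library can be sketched as follows. This is not pyKE's code: the compile step is passed in as a callable (hypothetically, something that runs make.sh and writes the .so), and loading defaults to ctypes.cdll.LoadLibrary, the standard way to load a shared library from Python. The defaults for temp_dir and library_name mirror the class attributes above.

```python
import ctypes
import os
from typing import Callable, Optional


def get_library(
    compile_library: Callable[[str], None],
    temp_dir: str = ".pyke",
    library_name: str = "pyke.so",
    load_library: Optional[Callable[[str], object]] = None,
):
    """Hypothetical sketch: compile the shared library once, then load it."""
    path = os.path.join(temp_dir, library_name)
    if not os.path.exists(path):
        os.makedirs(temp_dir, exist_ok=True)
        compile_library(path)  # expected to write the compiled library to `path`
    # Default loader: ctypes needs an absolute path (or one on the search path).
    loader = load_library or (lambda p: ctypes.cdll.LoadLibrary(os.path.abspath(p)))
    return loader(path)
```

Caching the compiled file under a fixed directory means later runs skip the (UNIX-only) build step entirely.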

pyke.norm module

pyke.norm.l1(vectors)

Implements the L1 norm on a vector space.

Parameters: vectors – Tensor of rank at least one containing the vectors whose norms are to be computed.

Returns: Tensor of reduced rank containing the norms, in the same order as the input vectors.

pyke.norm.l2(vectors)

Implements the Euclidean (L2) norm on a vector space.

Parameters: vectors – Tensor of rank at least one containing the vectors whose norms are to be computed.

Returns: Tensor of reduced rank containing the norms, in the same order as the input vectors.
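As a NumPy stand-in for the two norm functions, the sketch below reduces over the last axis, so an input of shape (batch, dimension) yields one norm per vector, in order. The choice of the last axis is an assumption; pyKE operates on TensorFlow tensors, where tf.abs and tf.reduce_sum would play the same roles.

```python
import numpy as np


def l1(vectors: np.ndarray) -> np.ndarray:
    """Sum of absolute values over the last axis (drops that axis)."""
    return np.sum(np.abs(vectors), axis=-1)


def l2(vectors: np.ndarray) -> np.ndarray:
    """Euclidean length over the last axis (drops that axis)."""
    return np.sqrt(np.sum(np.square(vectors), axis=-1))


batch = np.array([[3.0, -4.0], [1.0, 1.0]])
print(l1(batch), l2(batch))
```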

pyke.parser module

pyke.utils module

pyke.utils.get_array_pointer(a)

Returns the memory address of a NumPy array.

Parameters: a – NumPy array
Returns: Memory address of the array
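An address like this is what lets the C++ library read and write NumPy buffers directly. The sketch below shows one common way to obtain it, via the array interface, and checks it by viewing the same memory through ctypes; whether pyKE uses __array_interface__ or a.ctypes.data is not stated, so treat this as an illustration. The array must be contiguous and must stay alive while C code uses the address.

```python
import ctypes

import numpy as np


def get_array_pointer(a: np.ndarray) -> int:
    """Address of the first element of the array's data buffer."""
    return a.__array_interface__["data"][0]


arr = np.arange(4, dtype=np.int64)
addr = get_array_pointer(arr)
# View the same memory through ctypes, as C code receiving `addr` would see it.
view = (ctypes.c_int64 * arr.size).from_address(addr)
print(list(view))
```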
pyke.utils.md5(filename: str)

Returns the MD5 hash sum of a file.

Parameters: filename – Filename
Returns: MD5 hash sum of the file
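A minimal sketch of such a function with the standard hashlib module, reading the file in chunks so large datasets do not have to fit in memory. Returning the hex digest string is an assumption; the documentation does not state the return format.

```python
import hashlib


def md5(filename: str, chunk_size: int = 1 << 20) -> str:
    """Hex MD5 digest of a file, computed in 1 MiB chunks by default."""
    digest = hashlib.md5()
    with open(filename, "rb") as f:
        # iter() with a sentinel keeps reading until read() returns b"".
        for chunk in iter(lambda: f.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()
```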
pyke.utils.split_nt_line(line: str)

Splits a line from an N-Triples file into subject, predicate, and object.

Parameters: line – Line from an N-Triples file
Returns: Tuple of subject, predicate, and object
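An N-Triples statement has the form `<subject> <predicate> <object> .`, where the object may be a quoted literal containing spaces. The sketch below splits such a line by dropping the trailing period and splitting at most twice on whitespace. It is a hypothetical reimplementation: whether pyKE keeps angle brackets and quotes is not documented (this version keeps terms verbatim), and it does not handle blank lines or comments.

```python
from typing import Tuple


def split_nt_line(line: str) -> Tuple[str, str, str]:
    """Hypothetical sketch: split '<s> <p> <o> .' into its three terms."""
    body = line.strip()
    if body.endswith("."):
        body = body[:-1].rstrip()  # drop the statement terminator
    # maxsplit=2 keeps a literal object such as "Hello world" in one piece
    subject, predicate, obj = body.split(None, 2)
    return subject, predicate, obj
```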

Module contents
