distributions package

Submodules

distributions.base_distribution module

class distributions.base_distribution.BaseDistribution(scoring_function)[source]

Bases: Distribution

Base distribution class, which can be used to build an EBM.

constrain(features, moments=None, proposal=None, context_distribution=<distributions.single_context_distribution.SingleContextDistribution object>, context_sampling_size=1, n_samples=512, iterations=1000, learning_rate=0.05, tolerance=1e-05, sampling_size=32)[source]

Constrains the features on the base distribution according to their moments, thus producing an EBM

Parameters:
features: list(feature)

multiple features to constrain

moments: list(float)

moments for the features. There should be as many moments as there are features

proposal: distribution

distribution to sample from, if different from self

context_distribution: distribution

to contextualize the sampling and scoring

context_sampling_size: int

size of the batch when sampling contexts

n_samples: int

number of samples to use to fit the coefficients

learning_rate: float

multiplier of the delta used when fitting the coefficients

tolerance: float

accepted difference between the target moments and the estimated ones

sampling_size: int

size of the batch when sampling

Returns:
exponential scorer with fitted coefficients
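
A rough usage sketch follows; the feature below is a plain callable standing in for whatever feature type the library actually expects, since that type is not documented in this module:

    from distributions.lm_distribution import LMDistribution

    # Base language model distribution (GPT-2 by default).
    base = LMDistribution()

    # Hypothetical feature: 1.0 when a sample mentions "amazing", 0.0 otherwise.
    # The exact feature interface is an assumption made for illustration.
    amazing = lambda s, c: float("amazing" in s.text)

    # Require the feature's moment (expected value) under the EBM to be 1.0,
    # i.e. sampled sequences should always mention "amazing".
    fitted_scorer = base.constrain([amazing], moments=[1.0])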

distributions.context_distribution module

class distributions.context_distribution.ContextDistribution(path='contexts.txt')[source]

Bases: Distribution

Context distribution class, fetching the contexts from a text file. It can be used as a template for other context distributions.

log_score(contexts)[source]

Computes log-probabilities of the contexts

Parameters:
contexts: list(str)

list of contexts to (log-)score

Returns:
tensor of log-probabilities

sample(sampling_size=32)[source]

Samples random elements from the list of contexts

Parameters:
sampling_size: int

number of contexts to sample

Returns:
tuple of (list of texts, tensor of logprobs)
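
A brief usage sketch, assuming contexts.txt contains one context per line:

    from distributions.context_distribution import ContextDistribution

    contexts = ContextDistribution("contexts.txt")

    # Draw a batch of contexts together with their log-probabilities.
    texts, log_probs = contexts.sample(sampling_size=8)

    # (Log-)score an arbitrary list of contexts.
    scores = contexts.log_score(texts)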

distributions.dataset_context_distribution module

class distributions.dataset_context_distribution.DatasetContextDistribution(dataset='', subset='', split='train', key='text', prefix='')[source]

Bases: Distribution

Context distribution class, fetching the contexts from a dataset specified by name, subset and split, reading the field given by key. It can be used as a template for other context distributions.

log_score(contexts)[source]

Computes plausible log-probabilities of the contexts. Note that there’s no check that the contexts are part of the dataset, hence the plausible qualifier.

Parameters:
contexts: list(str)

list of contexts to (log-)score

Returns:
tensor of log-probabilities

sample(sampling_size=32)[source]

Samples random elements from the list of contexts

Parameters:
sampling_size: int

number of contexts to sample

Returns:
tuple of (list of texts, tensor of logprobs)
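
A brief usage sketch; the dataset identifiers below are placeholders, assuming a Hugging Face datasets-style name, subset and split:

    from distributions.dataset_context_distribution import DatasetContextDistribution

    # Contexts taken from the "text" field of the train split of a (placeholder) dataset.
    contexts = DatasetContextDistribution(
        dataset="wikitext", subset="wikitext-2-raw-v1", split="train", key="text"
    )

    texts, log_probs = contexts.sample(sampling_size=8)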

distributions.distribution module

class distributions.distribution.Distribution(scoring_function)[source]

Bases: PositiveScorer

Abstract distribution class, a core entity: a PositiveScorer that can also produce samples.

abstract sample(context)[source]

Produces samples for the context from the distribution.
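
The concrete distributions in this package each pair log_score with sample; a toy subclass could follow the same pattern. This is only a sketch: the exact contract of the scoring_function constructor argument is not documented here, so passing the instance’s own log_score to it is an assumption.

    import math
    import random

    import torch

    from distributions.distribution import Distribution


    class UniformChoiceDistribution(Distribution):
        """Toy distribution sampling uniformly from a fixed list of strings."""

        def __init__(self, choices):
            self.choices = list(choices)
            # Assumption: the base class accepts this instance's own log_score
            # as its scoring_function.
            super().__init__(self.log_score)

        def log_score(self, samples, context=""):
            # Uniform log-probability for known strings, -inf otherwise.
            logp = -math.log(len(self.choices))
            return torch.tensor(
                [logp if s in self.choices else float("-inf") for s in samples]
            )

        def sample(self, context="", sampling_size=32):
            picks = [random.choice(self.choices) for _ in range(sampling_size)]
            return picks, self.log_score(picks, context)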

distributions.lm_distribution module

class distributions.lm_distribution.LMDistribution(network='gpt2', tokenizer='gpt2', nature='causal', freeze=True, length=40, device='cpu', **config)[source]

Bases: BaseDistribution

Language model distribution class, a core class for all NLP use-cases, relying on Hugging Face’s Transformers library.

freeze(frozen=True)[source]

Freeze (or unfreeze) parameters for gradient computation.

Parameters:
frozen: boolean (True)

state to transition to, default is to freeze

log_score(samples, context='', grad=False, sum=True)[source]

Computes log-probabilities for the samples according to the language model network in the given context

Parameters:
samples: list(Sample)

samples to (log-)score

context: text

context for which to (log-)score the samples

grad: boolean

flag to optionally compute the gradients, e.g. when fitting

sum: boolean

flag to sum the token-level scores; when False, the token-level tensor is returned instead

Returns:
tensor of log-probabilities

sample(context='', sampling_size=32, sum=True)[source]

Samples sequences from the language model in the given context

Parameters:
context: text

contextual text for which to sample

sampling_size: int

number of sequences to sample

sum: boolean

flag to sum the token-level scores; when False, the token-level tensor is returned instead

Returns:
tuple of (list of TextSample(token_ids, text), tensor of logprobs)

to(device)[source]

class distributions.lm_distribution.TextSample(token_ids, text)

Bases: tuple

property text

Alias for field number 1

property token_ids

Alias for field number 0
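
A short usage sketch of sampling and scoring; the context string is only an illustration:

    from distributions.lm_distribution import LMDistribution

    # GPT-2 by default, causal, frozen, generating sequences of length 40.
    lm = LMDistribution()
    # lm.to("cuda")  # optionally move to a GPU

    # Sample 8 sequences for a context, together with their log-probabilities.
    samples, log_probs = lm.sample(context="The best thing about", sampling_size=8)

    # Each sample is a TextSample, a named tuple of (token_ids, text).
    print(samples[0].text)

    # Re-score the same samples, this time unfrozen and with gradients,
    # e.g. when fitting.
    lm.freeze(False)
    scores = lm.log_score(samples, context="The best thing about", grad=True)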

distributions.single_context_distribution module

class distributions.single_context_distribution.SingleContextDistribution(context='')[source]

Bases: Distribution

Single context distribution class, useful to always sample the same context, that is, to fall back to a fixed-context case.

log_score(contexts)[source]

Computes log-probabilities of the contexts to match the instance’s context

Parameters:
contexts: list(str)

list of contexts to (log-)score

Returns:
tensor of log-probabilities

sample(sampling_size=32)[source]

Samples multiple copies of the instance’s context

Parameters:
sampling_size: int

number of contexts to sample

Returns:
tuple of (list of texts, tensor of log-probabilities)
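
A brief usage sketch of the fixed-context case:

    from distributions.single_context_distribution import SingleContextDistribution

    # Every sampled context is the same fixed string.
    single = SingleContextDistribution("Once upon a time")

    texts, log_probs = single.sample(sampling_size=4)
    # texts is presumably ["Once upon a time"] * 4, and each log-probability
    # should be 0.0 if all of the probability mass sits on that single context.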

Module contents