distributions package

Submodules

distributions.base_distribution module

class distributions.base_distribution.BaseDistribution(scoring_function)[source]

Bases: Distribution

Base distribution class, which can be used to build an EBM.

constrain(features, moments=None, proposal=None, context_distribution=<distributions.single_context_distribution.SingleContextDistribution object>, context_sampling_size=1, n_samples=512, iterations=1000, learning_rate=0.05, tolerance=1e-05, sampling_size=32)[source]

Constrains the features on the base distribution according to their moments, thus producing an EBM

Parameters:
features: list(feature)

multiple features to constrain

moments: list(float)

moments for the features. There should be as many moments as there are features

proposal: distribution

distribution to sample from, if different from self

context_distribution: distribution

to contextualize the sampling and scoring

context_sampling_size: int

size of the batch when sampling contexts

n_samples: int

number of samples to use to fit the coefficients

learning_rate: float

multiplier of the delta used when fitting the coefficients

tolerance: float

accepted difference between the target moments and the estimated ones

sampling_size: int

size of the batch when sampling

Returns:
exponential scorer with fitted coefficients
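
A rough usage sketch follows; the feature below is a plain callable standing in for whatever feature type the library actually expects, since that type is not documented in this module:

    from distributions.lm_distribution import LMDistribution

    # Base language model distribution (GPT-2 by default).
    base = LMDistribution()

    # Hypothetical feature: 1.0 when a sample mentions "amazing", 0.0 otherwise.
    # The exact feature interface is an assumption made for illustration.
    amazing = lambda s, c: float("amazing" in s.text)

    # Require the feature's moment (expected value) under the EBM to be 1.0,
    # i.e. sampled sequences should always mention "amazing".
    fitted_scorer = base.constrain([amazing], moments=[1.0])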

distributions.context_distribution module

class distributions.context_distribution.ContextDistribution(path='contexts.txt')[source]

Bases: Distribution

Context distribution class, fetching the contexts from a text file. It can be used as a template for other context distributions.

log_score(contexts)[source]

Computes log-probabilities of the contexts

Parameters:
contexts: list(str)

list of contexts to (log-)score

Returns:
tensor of log-probabilities

sample(sampling_size=32)[source]

Samples random elements from the list of contexts

Parameters:
sampling_size: int

number of contexts to sample

Returns:
tuple of (list of texts, tensor of logprobs)
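
A brief usage sketch, assuming contexts.txt contains one context per line:

    from distributions.context_distribution import ContextDistribution

    contexts = ContextDistribution("contexts.txt")

    # Draw a batch of contexts together with their log-probabilities.
    texts, log_probs = contexts.sample(sampling_size=8)

    # (Log-)score an arbitrary list of contexts.
    scores = contexts.log_score(texts)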

distributions.dataset_context_distribution module

class distributions.dataset_context_distribution.DatasetContextDistribution(dataset='', subset='', split='train', key='text', prefix='')[source]

Bases: Distribution

Context distribution class, fetching the contexts from a dataset specified by name, subset and split, reading the field given by key. It can be used as a template for other context distributions.

log_score(contexts)[source]

Computes plausible log-probabilities of the contexts. Note that there’s no check that the contexts are part of the dataset, hence the plausible qualifier.

Parameters:
contexts: list(str)

list of contexts to (log-)score

Returns:
tensor of log-probabilities

sample(sampling_size=32)[source]

Samples random elements from the list of contexts

Parameters:
sampling_size: int

number of contexts to sample

Returns:
tuple of (list of texts, tensor of logprobs)
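
A brief usage sketch; the dataset identifiers below are placeholders, assuming a Hugging Face datasets-style name, subset and split:

    from distributions.dataset_context_distribution import DatasetContextDistribution

    # Contexts taken from the "text" field of the train split of a (placeholder) dataset.
    contexts = DatasetContextDistribution(
        dataset="wikitext", subset="wikitext-2-raw-v1", split="train", key="text"
    )

    texts, log_probs = contexts.sample(sampling_size=8)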

distributions.distribution module

class distributions.distribution.Distribution(scoring_function)[source]

Bases: PositiveScorer

Abstract distribution class, a core entity: a PositiveScorer that can also produce samples.

abstract sample(context)[source]

Produces samples for the context from the distribution.
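
The concrete distributions in this package each pair log_score with sample; a toy subclass could follow the same pattern. This is only a sketch: the exact contract of the scoring_function constructor argument is not documented here, so passing the instance’s own log_score to it is an assumption.

    import math
    import random

    import torch

    from distributions.distribution import Distribution


    class UniformChoiceDistribution(Distribution):
        """Toy distribution sampling uniformly from a fixed list of strings."""

        def __init__(self, choices):
            self.choices = list(choices)
            # Assumption: the base class accepts this instance's own log_score
            # as its scoring_function.
            super().__init__(self.log_score)

        def log_score(self, samples, context=""):
            # Uniform log-probability for known strings, -inf otherwise.
            logp = -math.log(len(self.choices))
            return torch.tensor(
                [logp if s in self.choices else float("-inf") for s in samples]
            )

        def sample(self, context="", sampling_size=32):
            picks = [random.choice(self.choices) for _ in range(sampling_size)]
            return picks, self.log_score(picks, context)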

distributions.lm_distribution module

class distributions.lm_distribution.LMDistribution(network='gpt2', tokenizer='gpt2', nature='causal', freeze=True, length=40, device='cpu', **config)[source]

Bases: BaseDistribution

Language model distribution class, a core class for all NLP use-cases, relying on Hugging Face’s Transformers library.

freeze(frozen=True)[source]

Freeze (or unfreeze) parameters for gradient computation.

Parameters:
frozen: boolean (True)

state to transition to, default is to freeze

log_score(samples, context='', grad=False, sum=True)[source]

Computes log-probabilities for the samples according to the language model network in the given context

Parameters:
samples: list(Sample)

samples to (log-)score

context: text

context for which to (log-)score the samples

grad: boolean

flag to optionally compute the gradients, e.g. when fitting

sum: boolean

flag to sum the token-level scores; when False, the token-level tensor is returned instead

Returns:
tensor of log-probabilities

sample(context='', sampling_size=32, sum=True)[source]

Samples sequences from the language model in the given context

Parameters:
context: text

contextual text for which to sample

sampling_size: int

number of sequences to sample

sum: boolean

flag to sum the token-level scores; when False, the token-level tensor is returned instead

Returns:
tuple of (list of TextSample(token_ids, text), tensor of logprobs)

to(device)[source]

class distributions.lm_distribution.TextSample(token_ids, text)

Bases: tuple

property text

Alias for field number 1

property token_ids

Alias for field number 0
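
A short usage sketch of sampling and scoring; the context string is only an illustration:

    from distributions.lm_distribution import LMDistribution

    # GPT-2 by default, causal, frozen, generating sequences of length 40.
    lm = LMDistribution()
    # lm.to("cuda")  # optionally move to a GPU

    # Sample 8 sequences for a context, together with their log-probabilities.
    samples, log_probs = lm.sample(context="The best thing about", sampling_size=8)

    # Each sample is a TextSample, a named tuple of (token_ids, text).
    print(samples[0].text)

    # Re-score the same samples, this time unfrozen and with gradients,
    # e.g. when fitting.
    lm.freeze(False)
    scores = lm.log_score(samples, context="The best thing about", grad=True)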

distributions.single_context_distribution module

class distributions.single_context_distribution.SingleContextDistribution(context='')[source]

Bases: Distribution

Single context distribution class, useful to always sample the same context, that is, to fall back to a fixed-context case.

log_score(contexts)[source]

Computes log-probabilities of the contexts to match the instance’s context

Parameters:
contexts: list(str)

list of contexts to (log-)score

Returns:
tensor of log-probabilities

sample(sampling_size=32)[source]

Samples multiple copies of the instance’s context

Parameters:
sampling_size: int

number of contexts to sample

Returns:
tuple of (list of texts, tensor of log-probabilities)
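
A brief usage sketch of the fixed-context case:

    from distributions.single_context_distribution import SingleContextDistribution

    # Every sampled context is the same fixed string.
    single = SingleContextDistribution("Once upon a time")

    texts, log_probs = single.sample(sampling_size=4)
    # texts is presumably ["Once upon a time"] * 4, and each log-probability
    # should be 0.0 if all of the probability mass sits on that single context.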

Module contents