distributions package¶
Submodules¶
distributions.base_distribution module¶
- class distributions.base_distribution.BaseDistribution(scoring_function)[source]¶
Bases:
Distribution
Base distribution class, which can be used to build an EBM.
- constrain(features, moments=None, proposal=None, context_distribution=<distributions.single_context_distribution.SingleContextDistribution object>, context_sampling_size=1, n_samples=512, iterations=1000, learning_rate=0.05, tolerance=1e-05, sampling_size=32)[source]¶
Constrains the features on the base distribution according to their target moments, thus producing an EBM
- Parameters:
- features: list(feature)
multiple features to constrain
- moments: list(float)
moments for the features. There should be as many moments as there are features
- proposal: distribution
distribution to sample from, if different from self
- context_distribution: distribution
to contextualize the sampling and scoring
- context_sampling_size: int
size of the batch when sampling contexts
- n_samples: int
number of samples to use to fit the coefficients
- iterations: int
number of iterations to run when fitting the coefficients
- learning_rate: float
multiplier of the delta applied when fitting the coefficients
- tolerance: float
accepted difference between the target moments and the estimated moments
- sampling_size: int
size of the batch when drawing samples
- Returns:
- exponential scorer with fitted coefficients
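The fitting loop behind constrain can be pictured with a minimal, self-contained sketch (plain Python over toy string samples, not the library's actual implementation): an exponential coefficient is nudged by the gap between the target moment and the moment estimated under the reweighted samples, until the gap falls below the tolerance.

```python
import math
import random

def fit_coefficient(samples, feature, target_moment,
                    learning_rate=0.05, iterations=1000, tolerance=1e-5):
    """Adjust an exponential coefficient until the reweighted moment
    of `feature` over `samples` matches `target_moment`."""
    lam = 0.0
    for _ in range(iterations):
        weights = [math.exp(lam * feature(s)) for s in samples]
        z = math.fsum(weights)
        moment = math.fsum(w * feature(s) for w, s in zip(weights, samples)) / z
        delta = target_moment - moment
        if abs(delta) < tolerance:
            break
        # move the coefficient by a multiple of the delta
        lam += learning_rate * delta
    return lam

rng = random.Random(0)
# toy "base distribution": 512 random 5-letter strings over {a, b}
samples = ["".join(rng.choice("ab") for _ in range(5)) for _ in range(512)]
feature = lambda s: float(s.startswith("a"))  # binary feature to constrain
lam = fit_coefficient(samples, feature, target_moment=0.9, learning_rate=1.0)
```

A sample scored by the resulting EBM gets the base score multiplied by `exp(lam * feature(sample))`; the fitted `lam` is positive here because the target moment (0.9) exceeds the base moment (about 0.5).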
distributions.context_distribution module¶
- class distributions.context_distribution.ContextDistribution(path='contexts.txt')[source]¶
Bases:
Distribution
Context distribution class, fetching the contexts from a text file. It can be used as a template for other context distributions.
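A minimal sketch of what a file-backed context distribution can look like (a toy stand-in with an assumed uniform-sampling behavior, not the library class): contexts are read once from a text file, one per line, and sampled uniformly with replacement.

```python
import random
import tempfile

class FileContextDistribution:
    """Toy file-backed context distribution: one context per line,
    sampled uniformly (a sketch, not the library class)."""
    def __init__(self, path="contexts.txt"):
        with open(path, encoding="utf-8") as f:
            self.contexts = [line.rstrip("\n") for line in f if line.strip()]

    def sample(self, sampling_size=1):
        return random.choices(self.contexts, k=sampling_size)

# demo with a temporary contexts file
with tempfile.NamedTemporaryFile("w", suffix=".txt", delete=False) as f:
    f.write("Once upon a time\nIn a galaxy far away\n")
    path = f.name

dist = FileContextDistribution(path)
batch = dist.sample(sampling_size=4)
```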
distributions.dataset_context_distribution module¶
- class distributions.dataset_context_distribution.DatasetContextDistribution(dataset='', subset='', split='train', key='text', prefix='')[source]¶
Bases:
Distribution
Context distribution class, fetching the contexts from a dataset, e.g. one from Hugging Face's Datasets library. It can be used as a template for other context distributions.
distributions.distribution module¶
distributions.lm_distribution module¶
- class distributions.lm_distribution.LMDistribution(network='gpt2', tokenizer='gpt2', nature='causal', freeze=True, length=40, device='cpu', **config)[source]¶
Bases:
BaseDistribution
Language model distribution class, a core class for all NLP use cases, relying on Hugging Face's Transformers library.
- freeze(frozen=True)[source]¶
Freeze (or unfreeze) parameters for gradient computation.
- Parameters:
- frozen: boolean (True)
state to transition to, default is to freeze
- log_score(samples, context='', grad=False, sum=True)[source]¶
Computes log-probabilities for the samples according to the language model network in the given context
- Parameters:
- samples: list(Sample)
samples to (log-)score as a list()
- context: text
context for which to (log-)score the samples
- grad: boolean
flag to optionally compute gradients, e.g. when fitting
- sum: boolean
flag to return a single summed log-probability per sample; when False, a token-level tensor of scores is returned
- Returns:
- tensor of log-probabilities
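The effect of the sum flag can be pictured with a toy scorer over plain lists (self-contained, not the Transformers-backed implementation): per-token log-probabilities are either returned as-is or summed into one sequence-level score per sample.

```python
import math

def log_score(samples_token_probs, sum_tokens=True):
    """Toy log-scorer mirroring the sum flag: each sample is a list of
    per-token model probabilities; returns one summed log-probability
    per sample, or the token-level scores when sum_tokens is False."""
    scores = [[math.log(p) for p in probs] for probs in samples_token_probs]
    if sum_tokens:
        return [math.fsum(token_scores) for token_scores in scores]
    return scores

# two samples of three tokens each
probs = [[0.5, 0.25, 0.5], [0.1, 0.2, 0.5]]
sequence_scores = log_score(probs)                  # one score per sample
token_scores = log_score(probs, sum_tokens=False)   # token-level scores
```

Summing the token-level scores of a sample recovers its sequence-level log-probability, since the probability of a sequence factorizes over its tokens.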
- sample(context='', sampling_size=32, sum=True)[source]¶
Samples sequences from the language model in the given context
- Parameters:
- context: text
contextual text for which to sample
- sampling_size: int
number of sequences to sample
- sum: boolean
flag to return a single summed log-probability per sample; when False, token-level scores are returned
- Returns:
- tuple of (list of Sample(tokens, text), tensor of logprobs)
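The documented return shape, a tuple of (samples, log-probabilities), can be illustrated with a toy unigram "language model" over a three-word vocabulary (a self-contained sketch; the real class samples from a Transformers network):

```python
import math
import random

VOCAB = {"the": 0.5, "cat": 0.3, "sat": 0.2}  # toy unigram "LM"

def sample(context="", sampling_size=32, length=5, rng=None):
    """Toy sampler mirroring the documented return shape: a tuple of
    (list of (tokens, text) samples, list of log-probabilities)."""
    rng = rng or random.Random(0)
    words, probs = zip(*VOCAB.items())
    samples, logprobs = [], []
    for _ in range(sampling_size):
        tokens = tuple(rng.choices(words, weights=probs, k=length))
        # sequence log-probability is the sum of token log-probabilities
        logprobs.append(math.fsum(math.log(VOCAB[t]) for t in tokens))
        samples.append((tokens, " ".join((context,) + tokens).strip()))
    return samples, logprobs

samples, logprobs = sample(context="Once", sampling_size=4)
```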
distributions.single_context_distribution module¶
- class distributions.single_context_distribution.SingleContextDistribution(context='')[source]¶
Bases:
Distribution
Single context distribution class, useful to always sample the same context, that is, to fall back to a fixed-context case.
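The fixed-context fallback amounts to a distribution whose sampling always yields the same context; a minimal sketch (not the library class):

```python
class SingleContext:
    """Toy single-context distribution: sampling always returns the
    same fixed context, regardless of the batch size."""
    def __init__(self, context=""):
        self.context = context

    def sample(self, sampling_size=1):
        return [self.context] * sampling_size

dist = SingleContext("It is a truth universally acknowledged")
batch = dist.sample(sampling_size=3)
```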