LLooM Operators
The LLooM Operators are a lower-level API for the operators that underlie the LLooM algorithm. The operators are defined as the concept_induction
module within the text_lloom
Python package. The LLooM Workbench module calls these functions internally to carry out concept induction.
import text_lloom.concept_induction as ci
Core operators:
Distill
: Shards out and scales down data to the context window while preserving salient details.Distill-filter
: Performs extractive summarization that selects exact quotes from the original text.Distill-summarize
: Performs abstractive summarization in the form of bullet point text summaries.
Cluster
: Recombines shards from the Distill step into groupings that share enough meaningful overlap to induce meaningful rather than surface-level conceptsSynthesize
: Prompts the model to generalize from provided examples to generate concept descriptions and criteria in natural language.Score
: Labels all text documents by applying concept criteria expressed as zero- shot prompts.
Additional operators:
Seed
: Allows the user to steer concept induction. Accepts a user-provided seed term to condition the Distill or Synthesize operators, which can improve the quality and alignment of the output concepts.Loop
: Further iterates on concepts by looping back to concept generation after scoring.
🚧 Under construction
Detailed documentation coming soon!
distill_filter
distill_filter(text_df, doc_col, doc_id_col, model_name, n_quotes=3, seed=None, sess=None)
distill_summarize
distill_summarize(text_df, doc_col, doc_id_col, model_name, n_bullets="2-4", n_words_per_bullet="5-8", seed=None, sess=None):
cluster
cluster(text_df, doc_col, doc_id_col, cluster_id_col="cluster_id", min_cluster_size=None, embed_model_name="text-embedding-ada-002", batch_size=20, randomize=False, sess=None)
synthesize
synthesize(cluster_df, doc_col, doc_id_col, model_name, cluster_id_col="cluster_id", concept_col_prefix="concept", n_concepts=None, batch_size=None, verbose=False, pattern_phrase="unifying pattern", dedupe=True, seed=None, sess=None, return_logs=False)
score_concepts
score_concepts(text_df, text_col, doc_id_col, concepts, model_name="gpt-3.5-turbo", batch_size=5, get_highlights=False, sess=None, threshold=1.0)
loop
loop(score_df, doc_col, doc_id_col, debug=False)
review
review(concepts, concept_df, concept_col_prefix, model_name, debug=False, sess=None, return_logs=False)