Alchemist
Alchemist is a powerful platform designed to streamline the process of creating instruction fine-tuning datasets for
language models. By focusing on dataset curation rather than the fine-tuning process itself, Alchemist empowers users to
build high-quality datasets efficiently, setting the stage for more effective model training.
Sample Ingestion
The Alchemist workflow begins with uploading prompt logs from your existing systems. These logs serve as the raw material
for your fine-tuning dataset. Alchemist’s interface makes it straightforward to import large volumes of data, giving you
a rich pool of samples to work with. This initial step lays the foundation for the entire curation process.
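As an illustration of what ingested data might look like, the sketch below collects a handful of prompt/response records into a JSONL file ahead of upload. The field names (prompt, response, timestamp) and the choice of JSONL are assumptions made for the example, not Alchemist’s documented schema.

import json

# Hypothetical prompt-log records captured from an existing system.
# The field names here are illustrative assumptions, not Alchemist's
# required ingestion schema.
logs = [
    {"prompt": "Summarize this support ticket about a billing error",
     "response": "The customer reports being charged twice...",
     "timestamp": "2024-05-01T12:00:00Z"},
    {"prompt": "Translate to French: Hello",
     "response": "Bonjour",
     "timestamp": "2024-05-01T12:05:00Z"},
]

# Write one JSON object per line (JSONL), a common format for bulk upload.
with open("prompt_logs.jsonl", "w", encoding="utf-8") as f:
    for record in logs:
        f.write(json.dumps(record, ensure_ascii=False) + "\n")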
Data Curation
Once your data is uploaded, Alchemist provides a robust set of tools for searching through and curating a subset of
samples that best represent your desired outcomes. This curation process combines manual search capabilities with
Alchemist’s proprietary algorithms, creating a semi-automated, human-in-the-loop workflow. Users can leverage advanced
search functionalities to identify relevant samples, while Alchemist’s algorithms assist in surfacing potentially
valuable data points that might otherwise be overlooked. This hybrid approach ensures both efficiency in processing
large datasets and the quality that comes from human oversight.
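The sketch below illustrates the general human-in-the-loop idea, using a simple lexical similarity score as a stand-in for Alchemist’s own ranking: samples are ordered against a search query, and a reviewer keeps only the hits that match the desired behaviour. Alchemist’s actual search and scoring algorithms are not described here, so treat this purely as a conceptual example.

from difflib import SequenceMatcher

samples = [
    {"id": 1, "prompt": "Summarize this support ticket about a billing error", "response": "..."},
    {"id": 2, "prompt": "Write a haiku about autumn", "response": "..."},
    {"id": 3, "prompt": "Summarize the refund policy discussion", "response": "..."},
]

def relevance(sample, query):
    # Crude lexical similarity as a stand-in for a real ranking model;
    # Alchemist's proprietary algorithms are not reproduced here.
    return SequenceMatcher(None, sample["prompt"].lower(), query.lower()).ratio()

query = "summarize billing and refund conversations"
ranked = sorted(samples, key=lambda s: relevance(s, query), reverse=True)

# Human-in-the-loop step: a reviewer inspects the top-ranked samples and
# keeps only those that reflect the desired behaviour.
curated = ranked[:2]  # e.g. the reviewer approves the top two hits
print([s["id"] for s in curated])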
Instruction Generation
After curating your dataset, Alchemist takes the process a step further by automatically generating instructions from
the samples in your curated set. These instructions are tailored to fit the format required by your chosen model or
platform, whether it’s the Llama format for AWS or the OpenAI format for Azure. This versatility ensures that your
curated dataset can be seamlessly integrated into a variety of fine-tuning pipelines. While Alchemist doesn’t perform the
actual fine-tuning, it hands you a meticulously prepared dataset, formatted and ready for use, which significantly
reduces the time and effort needed before training can begin.
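As a rough sketch of what the generated output might look like, the snippet below converts a curated sample into an OpenAI-style chat fine-tuning record and a Llama-2-style instruction string. The helper names, the system prompt, and the exact templates are illustrative assumptions; the formats Alchemist emits for a given platform may differ.

import json

curated = [
    {"prompt": "Summarize this support ticket about a billing error",
     "response": "The customer was double-charged and requests a refund."},
]

def to_openai_chat(sample, system_prompt="You are a helpful support assistant."):
    # OpenAI-style chat fine-tuning record (one JSON object per JSONL line).
    return {
        "messages": [
            {"role": "system", "content": system_prompt},
            {"role": "user", "content": sample["prompt"]},
            {"role": "assistant", "content": sample["response"]},
        ]
    }

def to_llama_instruction(sample):
    # Llama-2-style instruction formatting with [INST] tags; the exact template
    # required depends on the fine-tuning service being targeted.
    return {"text": f"<s>[INST] {sample['prompt']} [/INST] {sample['response']} </s>"}

with open("openai_finetune.jsonl", "w", encoding="utf-8") as f:
    for s in curated:
        f.write(json.dumps(to_openai_chat(s)) + "\n")

with open("llama_finetune.jsonl", "w", encoding="utf-8") as f:
    for s in curated:
        f.write(json.dumps(to_llama_instruction(s)) + "\n")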