Generating Synthetic LLM Training Data with datakeg
I’ve been keen to have a play with fine-tuning a local LLM for a little while now, and one thing I’ve wanted is an easy way to turn documentation into training data. Ultimately, I’d like to turn documentation into an LLM that you can use as a local expert without needing RAG. The first step is creating the training data that makes that possible, so I built datakeg, a small CLI tool that helps synthesise training data. ...
