Creating a new Wikipedia-based qna.yaml
In this tutorial we will walk you through building out a new qna.yaml
for adding new or updated knowledge to the granite-7b-lab
model. Let's get started!
mkdir instructlab
cd instructlab
git clone git@github.com:erictherobot/wikipedia-markdown-generator.git
erictherobot has written a helpful tool to pull down Markdown versions of the articles for us.
git clone git@github.com:<USERNAME>/instructlab-knowledge-docs.git
After this, clone your InstructLab knowledge docs repository. You can name it whatever you like, but if you use https://ui.instructlab.ai, you will notice that you already have a repository named instructlab-knowledge-docs.
cd wikipedia-markdown-generator
python3.11 -m venv venv-md-gen
source venv-md-gen/bin/activate
pip install -r requirements.txt
python3 wiki-to-md.py Texas_Longhorns_football
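Next, set up a virtual environment, install the tool's requirements, and run it. The Texas_Longhorns_football argument there is the Wikipedia article I wanted to pull down and create the qna.yaml against. Choose an article that covers whatever new knowledge you want to add.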
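If you are starting from an article's URL rather than its title, the argument can be derived with plain shell parameter expansion. This is a minimal sketch: the `url` and `title` variable names are illustrative, and it assumes the tool accepts the title exactly as it appears at the end of the URL, as the example above suggests.

```shell
# Hypothetical example: derive the article title from a Wikipedia URL.
url="https://en.wikipedia.org/wiki/Texas_Longhorns_football"

# Strip everything up to and including the last "/".
title="${url##*/}"

echo "$title"   # prints Texas_Longhorns_football
```

The resulting value could then be passed along as `python3 wiki-to-md.py "$title"`.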
cp md_output/Texas_Longhorns_football.md ../instructlab-knowledge-docs/
cd ../instructlab-knowledge-docs
git add .
git commit -m "added markdown doc"
git push origin main
cd ..
Next, copy the Markdown file into the knowledge repository, commit it, and push it up to GitHub.
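Before committing, it can help to confirm that the tool actually produced a file. The following is a minimal sketch of a guarded copy; `copy_markdown` is a hypothetical helper, not part of any of the repositories used here.

```shell
# Hypothetical helper: copy a generated Markdown file only if it exists,
# is non-empty, and the destination directory is present.
copy_markdown() {
  src="$1"
  dest_dir="$2"
  if [ ! -s "$src" ]; then
    echo "error: $src is missing or empty" >&2
    return 1
  fi
  if [ ! -d "$dest_dir" ]; then
    echo "error: $dest_dir is not a directory" >&2
    return 1
  fi
  cp "$src" "$dest_dir"/
}
```

Call it with the generated file and the path to your knowledge docs clone; a non-zero exit status means nothing was copied, so there is nothing new to commit.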
git clone git@github.com:instructlab/taxonomy.git
cd taxonomy
Next we pull down the upstream public taxonomy repository and cd into it.
mkdir -p arts/sports/american_football/college/university_of_texas/
Next, create the directory that will hold the qna.yaml. In this case, the Dewey Decimal System says sports should be under arts; this is American football, at the college level, for the University of Texas. Also, notice the underscores in place of spaces; this is important.
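The naming convention described here (lowercase, underscores instead of spaces) can be applied mechanically. This is only a sketch: `to_segment` is a hypothetical helper, and the labels are the ones from this tutorial's example.

```shell
# Hypothetical helper: turn a human-readable label into a path segment
# (lowercase, with spaces replaced by underscores).
to_segment() {
  printf '%s' "$1" | tr '[:upper:]' '[:lower:]' | tr ' ' '_'
}

# Join the segments with "/" to form the directory path.
path=""
for label in "arts" "sports" "American Football" "college" "University of Texas"; do
  path="${path:+$path/}$(to_segment "$label")"
done

echo "$path"   # prints arts/sports/american_football/college/university_of_texas
```

Running `mkdir -p "$path"` afterwards would create the same directory as the command above.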
wget https://raw.githubusercontent.com/instructlab/taxonomy/main/docs/template_qna.yaml
mv template_qna.yaml arts/sports/american_football/college/university_of_texas/qna.yaml
Next, pull down template_qna.yaml, move it into place as qna.yaml, and fill it out with the needed questions and answers. Be sure to keep each context to a maximum of about 500 tokens and the questions and answers to around 250 tokens.
vim arts/sports/american_football/college/university_of_texas/qna.yaml
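To get a rough sense of whether a context or answer fits the token limits, a word count can serve as a stand-in. The sketch below assumes about 1.3 tokens per whitespace-separated word, which is only a rule of thumb; the model's real tokenizer will give different numbers. Both function names are hypothetical.

```shell
# Rough token estimate for text on standard input.
# ASSUMPTION: about 1.3 tokens per word; a real tokenizer will differ.
estimate_tokens() {
  words=$(wc -w)
  echo $(( words * 13 / 10 ))
}

# Hypothetical helper: succeed only if the text on standard input
# is estimated to be within the given token budget.
within_budget() {
  limit="$1"
  tokens=$(estimate_tokens)
  [ "$tokens" -le "$limit" ]
}
```

For example, after pasting a context into a scratch file (say, `context.txt`), `within_budget 500 < context.txt` exits with status 0 when the estimate is at or under 500.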