Community Model Build Process

Note

This document describes the Community Model Build (CMB) process: the general steps for getting a CMB built.

Community Model Build diagram

Add the PRs to the local tree

Add the PRs you want built into the run, and tag each of them with "cmb-running".

mkdir -p compositional_skills/general/synonyms
vi compositional_skills/general/synonyms/attribution.txt
vi compositional_skills/general/synonyms/qna.yaml

Verify changes

ilab taxonomy diff

Warning

Before starting, ~/.local/share/instructlab/datasets should be empty, and every GPU should be "empty", that is, at 0%. Check the GPUs with nvidia-smi.
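The two checks in the warning can be scripted. What follows is a minimal sketch, not part of ilab: the helper names `datasets_dir_is_empty` and `gpus_are_idle` are made up for illustration, and it assumes that "0%" refers to the GPU utilization figure reported by nvidia-smi.

```shell
#!/bin/bash
# Pre-flight sketch. Helper names are illustrative, not part of ilab.

DATASETS_DIR="${DATASETS_DIR:-$HOME/.local/share/instructlab/datasets}"

# Succeeds when the directory is missing or contains no entries.
datasets_dir_is_empty() {
    local dir="$1"
    [ ! -d "$dir" ] || [ -z "$(ls -A "$dir")" ]
}

# Reads one utilization percentage per line on stdin.
# Succeeds only when every value is 0.
gpus_are_idle() {
    awk '$1 + 0 > 0 { busy = 1 } END { exit busy + 0 }'
}

if datasets_dir_is_empty "$DATASETS_DIR"; then
    echo "datasets directory is empty"
else
    echo "datasets directory is NOT empty: $DATASETS_DIR"
fi

# Only query the GPUs when nvidia-smi is installed on this machine.
if command -v nvidia-smi >/dev/null 2>&1; then
    if nvidia-smi --query-gpu=utilization.gpu --format=csv,noheader,nounits | gpus_are_idle; then
        echo "all GPUs idle"
    else
        echo "at least one GPU is busy"
    fi
fi
```

The script only reports; it does not delete anything, so clearing out old datasets remains a deliberate manual step.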

Create the data

ilab data generate

Run the training after data generation is complete

ilab model train --strategy lab-multiphase --phased-phase1-data ~/.local/share/instructlab/datasets/knowledge_train_msgs_XXXXXXX.jsonl --phased-phase2-data ~/.local/share/instructlab/datasets/skills_train_msgs_XXXXXXX.jsonl
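The XXXXXXX parts of the file names differ from run to run. As a rough sketch, the newest matching files can be looked up and the training command printed for review. `newest_match` is a hypothetical helper, and the sketch assumes the most recently modified file of each kind is the one produced by the latest generate run.

```shell
#!/bin/bash
# Sketch: find the newest generated files. newest_match is a hypothetical helper.

DATASETS_DIR="${DATASETS_DIR:-$HOME/.local/share/instructlab/datasets}"

# Print the most recently modified file in dir $1 named <prefix $2>_*.jsonl,
# or an empty line when nothing matches.
newest_match() {
    local dir="$1" prefix="$2" newest="" f
    for f in "$dir"/"$prefix"_*.jsonl; do
        [ -e "$f" ] || continue
        if [ -z "$newest" ] || [ "$f" -nt "$newest" ]; then
            newest="$f"
        fi
    done
    printf '%s\n' "$newest"
}

phase1="$(newest_match "$DATASETS_DIR" knowledge_train_msgs)"
phase2="$(newest_match "$DATASETS_DIR" skills_train_msgs)"

if [ -n "$phase1" ] && [ -n "$phase2" ]; then
    # Print the command for review instead of running it.
    echo "ilab model train --strategy lab-multiphase" \
         "--phased-phase1-data $phase1" \
         "--phased-phase2-data $phase2"
else
    echo "no generated datasets found in $DATASETS_DIR"
fi
```

Printing rather than executing keeps a human in the loop before a long training run starts.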

Post training evaluation steps

If you want to run a quick sanity check, you can set these two variables to run only a subset of each evaluation:

export INSTRUCTLAB_EVAL_FIRST_N_QUESTIONS=10 # mtbench
export INSTRUCTLAB_EVAL_MMLU_MIN_TASKS=true # mmlu

(Optional: sanity-check one specific sample checkpoint.)

ilab model evaluate --benchmark mt_bench --model ~/.local/share/instructlab/checkpoints/hf_format/samples_XXXXXX
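Exported variables stay set for the rest of the shell session, so a later full evaluation could silently run on the reduced subset. One way around that, sketched here with a hypothetical wrapper named `with_quick_eval`, is to set the two variables for a single command only.

```shell
#!/bin/bash
# Sketch: apply the sanity-check variables to one command only.
# with_quick_eval is a hypothetical wrapper, not an ilab feature.

with_quick_eval() {
    env INSTRUCTLAB_EVAL_FIRST_N_QUESTIONS=10 \
        INSTRUCTLAB_EVAL_MMLU_MIN_TASKS=true \
        "$@"
}

# Demonstration with a harmless command; in practice the arguments would be
# the ilab model evaluate command shown above.
with_quick_eval sh -c 'echo "first_n=$INSTRUCTLAB_EVAL_FIRST_N_QUESTIONS min_tasks=$INSTRUCTLAB_EVAL_MMLU_MIN_TASKS"'
```

After the wrapped command returns, the calling shell is unchanged, so the next evaluation runs in full unless the variables were exported separately.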

Tip

Run the evaluation again at this point: we want to re-verify the numbers before going any further.

General Benchmarking

  • mmlu: general model knowledge and general facts; a knowledge score out of 100
  • mt_bench: skill-based tasks (extraction, etc.); a score out of 10

Note

We want an mt_bench average of around 7.1 for a model candidate.
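When several candidates are being compared, a small helper can flag which ones reach the target. This is only a sketch: `meets_target` is a hypothetical name, the scores are typed in by hand rather than parsed from ilab output, and "around 7.1" is treated as a hard cutoff purely for illustration.

```shell
#!/bin/bash
# Sketch: compare hand-entered mt_bench averages with the 7.1 target.
# meets_target is a hypothetical helper; the scores below are made-up examples.

MT_BENCH_TARGET="${MT_BENCH_TARGET:-7.1}"

# Succeeds when score $1 is at least the target (awk handles the decimals).
meets_target() {
    awk -v score="$1" -v target="$MT_BENCH_TARGET" \
        'BEGIN { exit !(score + 0 >= target + 0) }'
}

for score in 7.25 6.80; do
    if meets_target "$score"; then
        echo "$score: candidate"
    else
        echo "$score: below target"
    fi
done
```

Because the target is approximate, a score just under the cutoff still deserves a human look rather than automatic rejection.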

Specific Benchmarking

mmlu_branch: the knowledge-specific benchmark

ilab model evaluate --benchmark mmlu_branch --model ~/.local/share/instructlab/checkpoints/hf_format/<checkpoint> --tasks-dir ~/.local/share/instructlab/datasets/<node-dataset> --base-model ~/.cache/instructlab/models/granite-7b-redhat-lab

mt_bench_branch: the skills-specific benchmark

ilab model evaluate --benchmark mt_bench_branch --model ~/.local/share/instructlab/checkpoints/hf_format/<checkpoint> --taxonomy-path ~/.local/share/instructlab/taxonomy --judge-model ~/.cache/instructlab/models/prometheus-8x7b-v2-0 --base-model ~/.cache/instructlab/models/granite-7b-redhat-lab --base-branch main --branch main

Hosting the release candidates

rsync over the files

mkdir $(date +%F)
cd $(date +%F)
rsync --info=progress2 -avz -e ssh <USERNAME>@<REMOTE>:~/.local/share/instructlab/checkpoints/hf_format/samples_xxxxx ./

Set up (if needed)

python3.11 -m venv venv
source venv/bin/activate
pip install vllm
./run.sh <DIRECTORY>

run.sh

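#!/bin/bash
# Usage: ./run.sh <DIRECTORY>
# Serves the model in DIRECTORY with vLLM on a random port, behind a random API key.

DIRECTORY=$1
if [ -z "$DIRECTORY" ]; then
    echo "usage: $0 <DIRECTORY>" >&2
    exit 1
fi

DATE=$(date +%F)
RANDOM_STRING=$(tr -dc A-Za-z0-9 </dev/urandom | head -c 10; echo)
RANDOM_PORT=$(shuf -i 8001-8800 -n 1)
API_KEY=$RANDOM_STRING-$DATE

# Record the directory, key, and port so the endpoint can be looked up later.
echo "$DIRECTORY,$API_KEY,$RANDOM_PORT" >> model_hosting.csv

# Save the matching chat command to hand to PR authors.
echo "ilab model chat --endpoint-url http://cmb-staging.DOMAIN.xx:$RANDOM_PORT/v1 --api-key $API_KEY --model $DIRECTORY" >> model_ilab_scripting.sh

# Start the OpenAI-compatible server across two GPUs.
python -m vllm.entrypoints.openai.api_server --model "$DIRECTORY" --api-key "$API_KEY" --host 0.0.0.0 --port "$RANDOM_PORT" --tensor-parallel-size 2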

Find the generated ilab command for chatting with the hosted model (it carries the random port and API key), and send it on after the PR letter

cat model_ilab_scripting.sh
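With several candidates hosted at once, it can help to see which directory is served on which port. The following sketch reads the model_hosting.csv file that run.sh appends to; `list_hosted_models` is a hypothetical helper, and the column order (directory, API key, port) is taken from the echo line in run.sh.

```shell
#!/bin/bash
# Sketch: print one line per hosted model from the CSV written by run.sh.
# list_hosted_models is a hypothetical helper name.

list_hosted_models() {
    local csv="${1:-model_hosting.csv}" directory api_key port
    # Nothing to list when no model has been hosted yet.
    [ -f "$csv" ] || return 0
    while IFS=, read -r directory api_key port; do
        [ -n "$directory" ] || continue
        printf 'model=%s port=%s key=%s\n' "$directory" "$port" "$api_key"
    done < "$csv"
}

list_hosted_models model_hosting.csv
```

The output includes the API keys, so treat it like the CSV itself and avoid pasting it into public channels.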

Form letter for PRs

Hi! 👋 Thank you for submitting this PR. We are ready to do some validation now, and we have a few candidates to see if they improve the model. We have some resources to run these release candidates, but we need you to help us. Can you reach out to me either on Slack (@awesome) or email me at awesomeATinstructlab.ai so I can get you access via ilab model chat? We can only run these models for a "week" or so, so please reach out as soon as possible and tell me which one is best for you on this PR.

With confirmed success

With confirmed success, tag the PR with "ready-for-merge" and remove the "community-build-ready" tag. Wait until the "week" is up before shutting down the staging instance, then merge in all the PRs that have been tagged.
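The label swap can be done in the web UI. If the GitHub CLI (`gh`) is installed and authenticated, which this document does not otherwise assume, it can also be scripted. The sketch below only prints the commands; `relabel_cmd` is a hypothetical helper and the PR numbers are placeholders.

```shell
#!/bin/bash
# Sketch: print (not run) the GitHub CLI commands that swap the labels.
# relabel_cmd is a hypothetical helper; 101 and 102 are placeholder PR numbers.

relabel_cmd() {
    printf 'gh pr edit %s --add-label "ready-for-merge" --remove-label "community-build-ready"\n' "$1"
}

for pr in 101 102; do
    relabel_cmd "$pr"
done
```

The printed commands are meant to be run from inside a checkout of the taxonomy repository, after checking that the PR numbers are the confirmed ones.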

Steps to Merge and Release

After you have merged the PRs into the taxonomy, you need to push the new model to Hugging Face. If you don't have access to Hugging Face, you will need to find someone to add you to it ;).

1) Clone down the repository on the staging box if you haven't already

git clone https://huggingface.co/instructlab/granite-7b-lab
cd granite-7b-lab
vi .git/config
# url = git@hf.co:instructlab/granite-7b-lab
# verify you can authenticate with hf.co: ssh -T git@hf.co

2) Copy the samples_xxxx contents into granite-7b-lab

3) git add . && git commit

4) Write up a good commit message

5) Tag and push

git tag cmb-run-XXXXX
git push origin main
git push origin cmb-run-XXXXX
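For step 1, the remote URL can also be switched without opening .git/config in an editor. A minimal sketch follows, where `use_ssh_remote` is a hypothetical helper and the SSH URL is the one given in the comment in step 1.

```shell
#!/bin/bash
# Sketch: point the clone's origin at the SSH URL, then print it to confirm.
# use_ssh_remote is a hypothetical helper name.

HF_SSH_URL="git@hf.co:instructlab/granite-7b-lab"

use_ssh_remote() {
    local repo="$1"
    git -C "$repo" remote set-url origin "$HF_SSH_URL"
    git -C "$repo" remote get-url origin
}

# Only act when the clone from step 1 exists in the current directory.
if [ -d granite-7b-lab/.git ]; then
    use_ssh_remote granite-7b-lab
fi
```

Checking authentication with ssh -T git@hf.co is still worthwhile before the first push.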

Convert to gguf

git clone https://github.com/ggerganov/llama.cpp.git
cd llama.cpp/
pip install -r requirements.txt
make -j8
./convert_hf_to_gguf.py ../granite-7b-lab --outfile granite-7b-fp16.gguf
./llama-quantize granite-7b-fp16.gguf granite-7b-_Q4_K_M.gguf Q4_K_M
./llama-cli -m granite-7b-_Q4_K_M.gguf -p "who is batman?" -n 128
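Before sharing the converted files, a quick check that each one starts with the four-byte "GGUF" magic can catch a truncated or failed conversion. This is a sketch with a hypothetical helper named `is_gguf`. It checks only the header, not the model contents.

```shell
#!/bin/bash
# Sketch: confirm a file starts with the 4-byte "GGUF" magic.
# is_gguf is a hypothetical helper name.

is_gguf() {
    [ -f "$1" ] && [ "$(head -c 4 "$1")" = "GGUF" ]
}

for f in granite-7b-fp16.gguf granite-7b-_Q4_K_M.gguf; do
    if is_gguf "$f"; then
        echo "$f: looks like GGUF"
    else
        echo "$f: missing or not GGUF"
    fi
done
```

The llama-cli prompt above remains the real functional test; the header check is just a cheap first filter.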