Pipeline overview — from answer to recommendation
This page is the bird’s-eye view of the model: what comes in, which four numbers define each micro-skill, how one student answer updates mastery, and how the next task is selected. Use it as an entry point to the guide or as a cheat sheet for explaining the system to a product, teaching, or data team.
0. Decoder for abbreviations
| Short form | Full English | Meaning in this guide |
|---|---|---|
| BKT | Bayesian Knowledge Tracing | a probabilistic model of whether a student has learned a skill |
| EM | Expectation–Maximization | an algorithm that fits hidden-state models from observations |
| HMM | Hidden Markov Model | a model where the real state is hidden and only its effects are observed |
| ZPD | Zone of Proximal Development | the task difficulty band where growth is likely |
| P(L) | Probability(Learned) | probability that the student currently knows a skill |
| P(L₀) / pInit | initial probability learned | prior P(L) before any attempts |
| pT | transition probability | chance of learning after one attempt |
| pS | slip probability | chance of answering wrong despite knowing |
| pG | guess probability | chance of answering right without knowing |
| P(solve) | probability of solving a task | estimated chance of a correct answer on this task now |
| closeness | closeness to target | how close P(solve) is to the target, about 0.7 |
| mastery | per-student knowledge estimate | the student’s P(L) vector over all micro-skills |
| micro-skill | atomic skill | the smallest skill unit for which we track one P(L) |
| prereq | prerequisite | a predecessor skill in the dependency graph |
| rarity bonus | bonus for under-trained skills | nudges the selector toward tasks involving weak skills |
| Baum–Welch | a specific EM algorithm | EM for hidden Markov models |
1. Bird’s-eye view
🔧 Offline — every N weeks
Flow: Answer log → EM → Parameters
| Step | What it means |
|---|---|
| Answer log | All historical answers: (student, task, correct/wrong, timestamp). This is the input to the offline stage. |
| EM | Fits four BKT parameters for each micro-skill so they best explain the observed answers. This is not run during every student session. |
| Parameters | The fitted or default values per micro-skill. In the hackathon version we use the literature defaults listed in section 3. |
The parameters are passed to the online engine and stay there until the next fitting cycle.
⚡ Online — on every student answer
Flow: applyAttempt → P(L) → P(solve) → ZPD score → Top-N
| Step | What it means |
|---|---|
| applyAttempt | Updates P(L) for every micro-skill tagged on the task, using the current P(L), whether the answer was correct, and BKT parameters. |
| P(L) | The student’s mastery vector: one number from 0 to 1 for each micro-skill. |
| P(solve) | The probability of solving a specific task right now, computed from the student’s mastery on that task’s micro-skills. |
| ZPD score | The task priority: closeness to target P(solve) ≈ 0.7 plus a small bonus for under-trained skills. |
| Top-N | The best candidate tasks, with recent repeats filtered out. |
2. What the model receives
| Field | Type | Example | Source |
|---|---|---|---|
| student_id | string | "u_142" | session / user database |
| mastery | Record<skillId, number> | { "define.t1.add": 0.31 } | accumulated online |
| history | AttemptRecord[] | see below | answer log |
| task.id | string | "q_007" | task pool |
| task.microskills | string[] | ["define.t2.mix"] | task tagging |
| task.difficulty | number ∈ [0,1] | 0.55 | author estimate / tie-breaker |
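The fields in this table can be sketched as TypeScript types. The type names `SkillId`, `TaskSketch`, and `StudentState`, and the helper `isValidState`, are assumptions made for illustration; the field names follow the table and the examples on this page, not necessarily the definitions in packages/bkt-core.

```typescript
// Hypothetical type sketch; names are illustrative, not copied from bkt-core.
type SkillId = string;

interface AttemptRecord {
  task_id: string;
  correct: boolean;
  ts: string; // ISO 8601 timestamp
  per_skill: Record<SkillId, boolean>;
}

interface TaskSketch {
  id: string;
  microskills: SkillId[];
  difficulty: number; // author estimate in [0, 1], used as a tie-breaker
}

interface StudentState {
  student_id: string;
  mastery: Record<SkillId, number>; // one P(L) per micro-skill
  history: AttemptRecord[];
}

// Sanity check: every mastery value must be a probability in [0, 1].
function isValidState(state: StudentState): boolean {
  return Object.keys(state.mastery).every((skill) => {
    const p = state.mastery[skill];
    return Number.isFinite(p) && p >= 0 && p <= 1;
  });
}

const exampleState: StudentState = {
  student_id: "u_142",
  mastery: { "define.t1.add": 0.31 },
  history: [],
};
```

A check like `isValidState` is cheap to run whenever a state is loaded, so that a corrupted mastery value is caught before it reaches the selector.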
Example AttemptRecord:
{ "task_id": "q_007", "correct": true, "ts": "2026-05-07T18:42:11Z", "per_skill": { "define.t2.mix": true } }
Example Task:
{ "id": "q_007", "topic": "linear", "microskills": ["define.t2.mix"], "difficulty": 0.55, "prompt_et": "Pille on 3 aastat vanem kui Mart…", "answer": "x = 12" }
3. BKT parameters — four numbers per micro-skill
| Name | Meaning | Hackathon default |
|---|---|---|
| P(L₀) / pInit | prior probability that the skill was already learned | 0.20 |
| pT | probability of learning after one attempt | 0.10 |
| pS | slip: knew it but answered wrong | 0.10 |
| pG | guess: did not know but answered right | 0.20 |
Source: packages/bkt-core/src/microskills.ts → DEFAULT_BKT.
Why these defaults? They are conservative literature-style defaults for school mathematics: learning is gradual, slips are possible, and occasional guesses are allowed. The model therefore does not conclude “learned” after a single correct answer or “knows nothing” after one wrong answer.
4. Online learning update — on every answer
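For one task micro-skill, first compute the posterior given the observed answer (the standard BKT Bayes step):
$$P(L \mid \text{correct}) = \frac{P(L)\,(1 - pS)}{P(L)\,(1 - pS) + (1 - P(L))\,pG}$$
$$P(L \mid \text{wrong}) = \frac{P(L)\,pS}{P(L)\,pS + (1 - P(L))\,(1 - pG)}$$
then the learning transition:
$$P(L)_{\text{next}} = P(L \mid \text{answer}) + \bigl(1 - P(L \mid \text{answer})\bigr)\,pT$$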
Implementation: bkt-core/src/bkt.ts (bktUpdate). For multi-skill tasks,
this update is applied independently to every tagged micro-skill via
applyAttempt.
Interactive: BktSimulator lets you click “correct / wrong” and watch P(L) move.
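A minimal sketch of this update, assuming the standard BKT form (Bayes posterior followed by the learning transition). The names `bktUpdateSketch`, `applyAttemptSketch`, and `BktParams` are hypothetical stand-ins and may not match the signatures in bkt-core/src/bkt.ts; the parameter values are the hackathon defaults from section 3.

```typescript
// Hypothetical parameter shape; values are the hackathon defaults from section 3.
interface BktParams { pInit: number; pT: number; pS: number; pG: number }
const DEFAULTS: BktParams = { pInit: 0.2, pT: 0.1, pS: 0.1, pG: 0.2 };

// One BKT step: Bayes posterior for the observed answer, then the learning transition.
function bktUpdateSketch(pL: number, correct: boolean, p: BktParams): number {
  const likKnown = correct ? 1 - p.pS : p.pS;   // P(answer | skill known)
  const likUnknown = correct ? p.pG : 1 - p.pG; // P(answer | skill not known)
  const posterior = (pL * likKnown) / (pL * likKnown + (1 - pL) * likUnknown);
  return posterior + (1 - posterior) * p.pT;
}

// Apply the same step independently to every micro-skill tagged on the task.
function applyAttemptSketch(
  mastery: Record<string, number>,
  skills: string[],
  correct: boolean,
  p: BktParams,
): Record<string, number> {
  const next: Record<string, number> = { ...mastery };
  for (const skill of skills) {
    const current = skill in next ? next[skill] : p.pInit; // unseen skills start at the prior
    next[skill] = bktUpdateSketch(current, correct, p);
  }
  return next;
}

const afterCorrect = bktUpdateSketch(0.2, true, DEFAULTS); // about 0.576
const afterWrong = bktUpdateSketch(0.2, false, DEFAULTS);  // about 0.127
```

Starting from the prior of 0.20, one correct answer lifts P(L) to roughly 0.58 and one wrong answer lowers it to roughly 0.13, so neither a single success nor a single failure settles the question.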
5. Task selection
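For every task in the pool we compute a joint P(solve) over the micro-skills involved. We use a geometric mean, which is stricter than an arithmetic mean because one weak component pulls the whole value down:
$$P(\text{solve}) = \Bigl(\prod_{k=1}^{n} p_k\Bigr)^{1/n}$$
where $p_k$ is the student’s estimate for the k-th of the task’s $n$ tagged micro-skills.
The task score is:
$$\text{score} = \text{closeness} + \text{rarity bonus}, \qquad \text{closeness} = \exp\!\left(-\frac{(P(\text{solve}) - \text{target})^2}{\sigma^2}\right)$$
The Gaussian peaks around 0.7, the ZPD target. The rarity bonus gently favours tasks that include under-trained micro-skills.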
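The scoring described here can be sketched as follows. Several pieces are assumptions, because this page does not state them: the per-skill term is taken to be P(L) itself, `SIGMA_SQ` and `RARITY_WEIGHT` are placeholder values, and the rarity bonus is modelled as the mean of (1 − P(L)) over the task's skills. The skill IDs `paren`, `add`, and `mul` are made up for the demo.

```typescript
const ZPD_TARGET = 0.7;    // target stated on this page
const SIGMA_SQ = 0.05;     // ASSUMPTION: the real window width is not given here
const RARITY_WEIGHT = 0.1; // ASSUMPTION: weight of the "small bonus"

type Mastery = Record<string, number>;

// Geometric mean of per-skill P(L), computed in log space for stability.
function pSolveSketch(mastery: Mastery, skills: string[], pInit = 0.2): number {
  if (skills.length === 0) return 0; // no tagged skills: nothing to estimate
  let logSum = 0;
  for (const s of skills) {
    const p = s in mastery ? mastery[s] : pInit; // unseen skills start at the prior
    logSum += Math.log(Math.max(p, 1e-9));       // clamp to avoid log(0)
  }
  return Math.exp(logSum / skills.length);
}

// Gaussian closeness to the ZPD target: 1 at the target, near 0 far from it.
function closenessSketch(pSolve: number): number {
  const d = pSolve - ZPD_TARGET;
  return Math.exp(-(d * d) / SIGMA_SQ);
}

// ASSUMPTION: rarity bonus = mean of (1 - P(L)) over the task's skills.
function rarityBonusSketch(mastery: Mastery, skills: string[], pInit = 0.2): number {
  if (skills.length === 0) return 0;
  let sum = 0;
  for (const s of skills) sum += 1 - (s in mastery ? mastery[s] : pInit);
  return sum / skills.length;
}

function scoreTaskSketch(mastery: Mastery, skills: string[]): number {
  return closenessSketch(pSolveSketch(mastery, skills))
    + RARITY_WEIGHT * rarityBonusSketch(mastery, skills);
}

const demoMastery: Mastery = { paren: 0.05, add: 0.9, mul: 0.9 };
const single = pSolveSketch(demoMastery, ["paren"]);              // the weak skill alone
const mixed = pSolveSketch(demoMastery, ["paren", "add", "mul"]); // weak skill plus two strong ones
```

With these made-up numbers the arithmetic mean of the three skills would be about 0.62, while the geometric mean stays well below it, which is the "one weak component pulls the whole value down" effect.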
Implementation: bkt-core/src/bkt.ts (scoreTaskForStudent and
recommend), with the last five task IDs filtered out to reduce repetition.
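The final selection step can be sketched as a filter followed by a sort. `recommendSketch` and `ScoredTask` are hypothetical names; the sketch assumes scores have already been computed and only illustrates the "drop the last five task IDs, keep the top N" behaviour described above.

```typescript
interface ScoredTask { id: string; score: number }

// Drop the most recently seen task IDs, then return the N highest-scoring tasks.
function recommendSketch(
  scored: ScoredTask[],
  recentIds: string[],
  topN: number,
  recentWindow = 5, // the page mentions the last five task IDs
): ScoredTask[] {
  const recent = new Set(recentIds.slice(-recentWindow));
  return scored
    .filter((t) => !recent.has(t.id))  // filter() copies, so the input is not mutated
    .sort((a, b) => b.score - a.score) // highest score first
    .slice(0, topN);
}

const picks = recommendSketch(
  [
    { id: "q_001", score: 0.9 },
    { id: "q_002", score: 0.8 },
    { id: "q_003", score: 0.5 },
    { id: "q_004", score: 0.7 },
  ],
  ["q_000", "q_001"], // q_001 was just shown, so it is skipped
  2,
);
```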
Numerical example
- Suppose P(L) is very low for the “parentheses” micro-skill.
- A one-skill parentheses task then has a correspondingly low P(solve).
- Closeness to the 0.7 target is therefore almost zero.
- A multi-skill task combining parentheses with familiar arithmetic may land closer to the target and enter the ZPD.
closeness = exp(−(p−target)²/σ²). It is highest at the peak and falls off quickly toward the edges. The smaller σ², the narrower the “ZPD window”.
6. Where parameters come from — EM fitting offline
The goal is to recover the four BKT parameters (pInit, pT, pS, pG) of each micro-skill from answer histories. The algorithm is EM / Baum–Welch for a two-state hidden Markov model: “knows” and “does not know”.
The algorithm:
- Collect observations, roughly 3000 answers per skill.
- Start with guessed parameters, such as literature defaults.
- E-step: estimate the probability of each hidden “knew / did not know” state over time.
- M-step: re-estimate parameters so the observed answers become more likely.
- Repeat until parameters stabilise, roughly 20 iterations.
| Item | Value |
|---|---|
| Data volume per skill | ~3000 observations |
| Iterations to convergence | ~20 |
| Parameter precision | about ±0.01 |
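The E-step and M-step above can be sketched for one micro-skill as follows. This is a minimal illustration, assuming a two-state model with no forgetting and one ordered answer sequence per student; `emStep`, `fitBktSketch`, `logLikelihood`, and the toy data are hypothetical and are not the project's fitting code.

```typescript
interface BktParams { pInit: number; pT: number; pS: number; pG: number }

// Log-likelihood of all answer sequences; useful for monitoring convergence.
function logLikelihood(seqs: boolean[][], p: BktParams): number {
  let ll = 0;
  for (const obs of seqs) {
    let k = p.pInit; // P(known) before the current answer
    for (const o of obs) {
      const jk = k * (o ? 1 - p.pS : p.pS);
      const ju = (1 - k) * (o ? p.pG : 1 - p.pG);
      ll += Math.log(jk + ju);
      const post = jk / (jk + ju);
      k = post + (1 - post) * p.pT;
    }
  }
  return ll;
}

// One Baum-Welch iteration (scaled forward-backward, then re-estimation).
function emStep(seqs: boolean[][], p: BktParams): BktParams {
  let init = 0, nSeq = 0, trNum = 0, trDen = 0;
  let slip = 0, known = 0, guess = 0, unknown = 0;
  for (const obs of seqs) {
    const T = obs.length;
    if (T === 0) continue;
    const eK = obs.map((o) => (o ? 1 - p.pS : p.pS)); // P(answer | known)
    const eU = obs.map((o) => (o ? p.pG : 1 - p.pG)); // P(answer | not known)
    const aK: number[] = [], aU: number[] = [], c: number[] = [];
    // Forward pass, rescaled so aK[t] + aU[t] = 1.
    for (let t = 0; t < T; t++) {
      const k = (t === 0 ? p.pInit : aK[t - 1] + aU[t - 1] * p.pT) * eK[t];
      const u = (t === 0 ? 1 - p.pInit : aU[t - 1] * (1 - p.pT)) * eU[t];
      c.push(k + u); aK.push(k / (k + u)); aU.push(u / (k + u));
    }
    // Backward pass with the same scaling factors.
    const bK: number[] = new Array(T).fill(1);
    const bU: number[] = new Array(T).fill(1);
    for (let t = T - 2; t >= 0; t--) {
      bK[t] = (eK[t + 1] * bK[t + 1]) / c[t + 1];
      bU[t] = (p.pT * eK[t + 1] * bK[t + 1] + (1 - p.pT) * eU[t + 1] * bU[t + 1]) / c[t + 1];
    }
    // E-step: accumulate expected state and transition counts.
    for (let t = 0; t < T; t++) {
      const gK = aK[t] * bK[t], gU = aU[t] * bU[t]; // P(known | all answers), P(not known | all answers)
      if (t === 0) { init += gK; nSeq += 1; }
      known += gK; unknown += gU;
      if (obs[t]) guess += gU; else slip += gK;
      if (t < T - 1) {
        trDen += gU;
        trNum += (aU[t] * p.pT * eK[t + 1] * bK[t + 1]) / c[t + 1];
      }
    }
  }
  // M-step: expected counts become the new parameter estimates.
  const ratio = (n: number, d: number, fallback: number) => (d > 0 ? n / d : fallback);
  return {
    pInit: ratio(init, nSeq, p.pInit),
    pT: ratio(trNum, trDen, p.pT),
    pS: ratio(slip, known, p.pS),
    pG: ratio(guess, unknown, p.pG),
  };
}

function fitBktSketch(seqs: boolean[][], start: BktParams, iterations = 20): BktParams {
  let p = start;
  for (let i = 0; i < iterations; i++) p = emStep(seqs, p);
  return p;
}

// Tiny made-up answer log, for illustration only.
const toySeqs: boolean[][] = [
  [false, false, true, true, true],
  [false, true, true, true],
  [true, true, false, true],
  [false, false, false, true, true, true],
];
const startParams: BktParams = { pInit: 0.2, pT: 0.1, pS: 0.1, pG: 0.2 };
const fitted = fitBktSketch(toySeqs, startParams);
```

EM never decreases the likelihood of the observed answers, so comparing `logLikelihood` before and after fitting is a quick check that the loop is wired correctly. With a handful of toy sequences the fitted values are not meaningful; the data volume quoted in the table is what makes them stable.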
More detail: NB-3 EM fitting.
7. Do we use the dependency graph?
Short answer: not directly. The graph is drawn in the UI for humans, but the selector code does not read it.
Two different concepts
| What | Where it lives | Who fills it | Used by code? |
|---|---|---|---|
| Skill dependency DAG (t3.mix → t1.add, t2.mix…) | data/matx-define/microskills.json, field prereq | curriculum author | ❌ no (rendered in UI only) |
| Task tags (task.microskills = ["t3.mix", "t1.add", …]) | data/matx-define/tasks.json | teacher / content author | ✅ yes (read by recommend()) |
The dependency graph is visualised in ProgressionMatrix.tsx, but
recommend() does not load it.
What the teacher does
When adding a task, the teacher writes the prompt and answer, then manually
lists all micro-skills the task uses, including prerequisites. For a t3.mix
task the tags might include define.t3.mix, define.t2.mix, define.t1.add,
and define.t1.mul.
That means the teacher flattens the graph into the task tag list. In the current bank, 16 of 20 tasks are multi-skill tasks.
Why the model still works
The selector looks at task.microskills and state.mastery. When a weak
prerequisite is already present in the task tags, the geometric mean pulls
the joint P(solve) down. So prerequisite checking happens
indirectly, because prerequisites are included in the task’s micro-skill
list.
What we lose by not reading the graph directly
- Tagging mistakes matter: if a teacher forgets to include t1.add, the model cannot know the task depends on it.
- We cannot write explicit rules such as “do not show t3.mix until t2.mix reaches a given P(L) threshold”.
- The selector cannot explain missing prerequisites unless that information is duplicated in task tags.
Current task distribution
9 micro-skills, 20 tasks. Multi-skill tasks: 16 / 20.
| Skills in task | Number of tasks |
|---|---|
| 1 | 4 |
| 2 | 6 |
| 3 | 5 |
| 5 | 3 |
| 6 | 1 |
| 9 | 1 |
So 80% of tasks use at least two skills; the geometric mean matters on most recommendations.
Open design questions
1. Should v2 make the code read the dependency graph?
Currently the selector only reads task tags and student mastery. If it read
the graph, it could know that a t3.mix task is premature without t1.add,
even if the teacher forgot to include that prerequisite in the task tags.
That would be more predictable, but it adds rules that may conflict with ZPD
scoring.
2. Who owns tagging — teacher or code?
Currently the teacher lists every skill manually. Code could instead let the teacher specify only the top-level skill and expand prerequisites from the DAG. That reduces routine work but makes graph errors propagate everywhere.
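As an illustration of the second option, prerequisite expansion is a transitive closure over the DAG. The sketch below is hypothetical: `PrereqGraph`, `expandTags`, and the example edges are assumptions for the demo, and the real graph in the data files may be shaped differently.

```typescript
// Hypothetical graph shape: skill ID -> IDs of its direct prerequisites.
type PrereqGraph = Record<string, string[]>;

// Collect a skill plus all of its transitive prerequisites (depth-first, cycle-safe).
function expandTags(topSkill: string, graph: PrereqGraph): string[] {
  const seen = new Set<string>();
  const stack: string[] = [topSkill];
  while (stack.length > 0) {
    const skill = stack.pop() as string;
    if (seen.has(skill)) continue; // already expanded; also guards against cycles
    seen.add(skill);
    for (const pre of graph[skill] || []) stack.push(pre);
  }
  return Array.from(seen).sort();
}

// Illustrative edges only, not the project's actual graph.
const demoGraph: PrereqGraph = {
  "define.t3.mix": ["define.t2.mix", "define.t1.add"],
  "define.t2.mix": ["define.t1.add", "define.t1.mul"],
};

const expanded = expandTags("define.t3.mix", demoGraph);
// expanded: define.t1.add, define.t1.mul, define.t2.mix, define.t3.mix
```

The trade-off named above shows up directly: a wrong edge in `demoGraph` would change the tag list of every task that sits above it.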
3. One parameter set for all skills, or separate parameters per skill?
Currently all 9 micro-skills share the same default parameters (pInit, pT, pS, pG). In real life, slip and guess rates differ by skill: long multi-step equations invite more slips, while simple arithmetic is harder to guess. Per-skill parameters need enough data, roughly 3000 answers per skill.
4. Forced gap-closing vs ZPD
For a new student with a low P(L) on basic arithmetic, any task depending on that skill has a low joint P(solve). Pedagogically, this pushes the student to close the basics first. But it may become boring if the bank has only a few pure basic tasks. A future version may need a “rescue mode” that widens the ZPD window after too many repeats.
5. What happens when MATx adds skills from other topics?
After integration, skills may come from percentages, equations, auxiliary formulas, and modelling. Cross-topic tasks will appear. We need to decide whether a single formula over all skills is enough, or whether the selector should balance parallel topic tracks.
6. Should the ZPD width be static or dynamic?
The current fixed σ² gives a reasonable window around the 0.7 target. A dynamic version could widen the window for newcomers, narrow it for advanced students, and react to streaks of correct or wrong answers.
8. Where to look in code
- Types and default parameters: packages/bkt-core/src/microskills.ts
- Update and selection: packages/bkt-core/src/bkt.ts
- Skill graph: data/matx-bridge.json
- Simulator widgets: study-guide/src/widgets/