
Pipeline overview — from answer to recommendation

This page is the bird’s-eye view of the model: what comes in, which four numbers define each micro-skill, how one student answer updates mastery, and how the next task is selected. Use it as an entry point to the guide or as a cheat sheet for explaining the system to a product, teaching, or data team.

| Short form | Full English | Meaning in this guide |
| --- | --- | --- |
| BKT | Bayesian Knowledge Tracing | a probabilistic model of whether a student has learned a skill |
| EM | Expectation–Maximization | an algorithm that fits hidden-state models from observations |
| HMM | Hidden Markov Model | a model where the real state is hidden and only its effects are observed |
| ZPD | Zone of Proximal Development | the task difficulty band where growth is likely |
| P(L) | Probability(Learned) | probability that the student currently knows a skill |
| P(L₀) / pInit | initial probability learned | prior P(L) before any attempts |
| pT | transition probability | chance of learning after one attempt |
| pS | slip probability | chance of answering wrong despite knowing |
| pG | guess probability | chance of answering right without knowing |
| P(solve) | probability of solving a task | estimated chance of a correct answer on this task now |
| closeness | closeness to target | how close P(solve) is to the target, about 0.7 |
| mastery | per-student knowledge estimate | the student’s P(L) vector over all micro-skills |
| micro-skill | atomic skill | the smallest skill unit for which we track one P(L) |
| prereq | prerequisite | a predecessor skill in the dependency graph |
| rarity bonus | bonus for under-trained skills | nudges the selector toward tasks involving weak skills |
| Baum–Welch | a specific EM algorithm | EM for hidden Markov models |

flowchart LR
classDef off fill:#fde68a,stroke:#a16207,color:#0f172a
A1[Answer log]:::off --> A2[EM]:::off --> A3[Parameters]:::off

| Step | What it means |
| --- | --- |
| Answer log | All historical answers: (student, task, correct/wrong, timestamp). This is the input to the offline stage. |
| EM | Fits four BKT parameters for each micro-skill so they best explain the observed answers. This is not run during every student session. |
| Parameters | The fitted or default values $\{p_{init}, p_T, p_S, p_G\}$ per micro-skill. In the hackathon version we use literature defaults $(0.2, 0.1, 0.1, 0.2)$. |

The parameters are passed to the online engine and stay there until the next fitting cycle.

flowchart LR
classDef onl fill:#bbf7d0,stroke:#15803d,color:#0f172a
classDef sel fill:#e9d5ff,stroke:#7e22ce,color:#0f172a
B1[applyAttempt]:::onl --> B2["P(L)"]:::onl --> B3["P(solve)"]:::onl --> B4[ZPD score]:::sel --> B5[Top-N]:::sel

| Step | What it means |
| --- | --- |
| applyAttempt | Updates P(L) for every micro-skill tagged on the task, using the current P(L), whether the answer was correct, and BKT parameters. |
| P(L) | The student’s mastery vector: one number from 0 to 1 for each micro-skill. |
| P(solve) | The probability of solving a specific task right now, computed from the student’s mastery on that task’s micro-skills. |
| ZPD score | The task priority: closeness to target P(solve) ≈ 0.7 plus a small bonus for under-trained skills. |
| Top-N | The best candidate tasks, with recent repeats filtered out. |

| Field | Type | Example | Source |
| --- | --- | --- | --- |
| student_id | string | "u_142" | session / user database |
| mastery | Record<skillId, number> | { "define.t1.add": 0.31 } | accumulated online |
| history | AttemptRecord[] | see below | answer log |
| task.id | string | "q_007" | task pool |
| task.microskills | string[] | ["define.t2.mix"] | task tagging |
| task.difficulty | number ∈ [0,1] | 0.55 | author estimate / tie-breaker |

Example AttemptRecord:

{
  "task_id": "q_007",
  "correct": true,
  "ts": "2026-05-07T18:42:11Z",
  "per_skill": { "define.t2.mix": true }
}

Example Task:

{
  "id": "q_007",
  "topic": "linear",
  "microskills": ["define.t2.mix"],
  "difficulty": 0.55,
  "prompt_et": "Pille on 3 aastat vanem kui Mart…",
  "answer": "x = 12"
}
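
The two JSON examples above can be described with TypeScript types. This is a minimal sketch: the field names are taken from the examples, while the optional marker on `per_skill`, the `isValidTask` helper, and the rule that a task needs at least one micro-skill are assumptions made for illustration, not the definitions in packages/bkt-core.

```typescript
// Hypothetical types mirroring the JSON examples; not the project's own definitions.
interface AttemptRecord {
  task_id: string;
  correct: boolean;
  ts: string; // ISO 8601 timestamp, e.g. "2026-05-07T18:42:11Z"
  per_skill?: Record<string, boolean>; // assumed optional
}

interface Task {
  id: string;
  topic: string;
  microskills: string[];
  difficulty: number; // author estimate in [0, 1]
  prompt_et: string;
  answer: string;
}

// Hypothetical helper: checks the range from the field table,
// plus the assumed rule that every task is tagged with at least one skill.
function isValidTask(task: Task): boolean {
  return (
    task.id.length > 0 &&
    task.microskills.length > 0 &&
    task.difficulty >= 0 &&
    task.difficulty <= 1
  );
}

const exampleTask: Task = {
  id: "q_007",
  topic: "linear",
  microskills: ["define.t2.mix"],
  difficulty: 0.55,
  prompt_et: "Pille on 3 aastat vanem kui Mart…",
  answer: "x = 12",
};
```

A check like this could run when the task bank is loaded, so that an untagged task is caught before it reaches the selector.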

3. BKT parameters — four numbers per micro-skill

| Name | Meaning | Hackathon default |
| --- | --- | --- |
| $p_{init}$ | prior probability that the skill was already learned | 0.20 |
| $p_T$ | probability of learning after one attempt | 0.10 |
| $p_S$ | slip: knew it but answered wrong | 0.10 |
| $p_G$ | guess: did not know but answered right | 0.20 |

Source: packages/bkt-core/src/microskills.ts → DEFAULT_BKT.

Why these defaults? They are conservative literature-style defaults for school mathematics: learning is gradual, slips are possible, and occasional guesses are allowed. The model therefore does not conclude “learned” after a single correct answer or “knows nothing” after one wrong answer.

4. Online learning update — on every answer


For one task micro-skill:

$$P(L \mid correct) = \frac{P(L) \cdot (1 - p_S)}{P(L) \cdot (1 - p_S) + (1 - P(L)) \cdot p_G}$$

$$P(L \mid wrong) = \frac{P(L) \cdot p_S}{P(L) \cdot p_S + (1 - P(L)) \cdot (1 - p_G)}$$

then the learning transition:

$$P(L_{new}) = posterior + (1 - posterior) \cdot p_T$$

Implementation: bkt-core/src/bkt.ts (bktUpdate). For multi-skill tasks, this update is applied independently to every tagged micro-skill via applyAttempt.
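
The two-step update above (Bayesian posterior, then learning transition) can be sketched as follows. The function and type names (`bktUpdateSketch`, `applyAttemptSketch`, `BktParams`) and the choice to start unseen skills at `pInit` are assumptions for illustration; the real signatures in bkt.ts may differ.

```typescript
// Hypothetical parameter shape; values are the hackathon defaults from this page.
interface BktParams {
  pInit: number;
  pT: number;
  pS: number;
  pG: number;
}

const DEFAULTS: BktParams = { pInit: 0.2, pT: 0.1, pS: 0.1, pG: 0.2 };

// One BKT step for one micro-skill: posterior given the answer, then transition.
function bktUpdateSketch(pL: number, correct: boolean, p: BktParams): number {
  const posterior = correct
    ? (pL * (1 - p.pS)) / (pL * (1 - p.pS) + (1 - pL) * p.pG)
    : (pL * p.pS) / (pL * p.pS + (1 - pL) * (1 - p.pG));
  return posterior + (1 - posterior) * p.pT;
}

// Applies the same update independently to every micro-skill tagged on the task.
// Assumption: a skill with no stored mastery starts at pInit.
function applyAttemptSketch(
  mastery: Record<string, number>,
  skills: string[],
  correct: boolean,
  p: BktParams
): Record<string, number> {
  const next: Record<string, number> = { ...mastery };
  for (const skill of skills) {
    const current = next[skill] ?? p.pInit;
    next[skill] = bktUpdateSketch(current, correct, p);
  }
  return next;
}
```

With the defaults and a starting P(L) of 0.2, one correct answer moves the estimate to about 0.576 and one wrong answer moves it to about 0.127, which matches the claim that a single answer never settles the question.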

Interactive: BktSimulator lets you click “correct / wrong” and watch $P(L)$ move.

For every task in the pool we compute a joint $P(\text{solve})$ over the micro-skills involved. We use a geometric mean, which is stricter than an arithmetic mean because one weak component pulls the whole value down:

$$P(solve)_{joint} = \exp\!\left(\frac{1}{n} \sum_{i=1}^{n} \log P(solve)_i\right)$$
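
To see how much stricter the geometric mean is, here is a small sketch of the joint formula. The function name `jointPSolve` is hypothetical, and the input is assumed to be the per-skill P(solve) values already computed for one task.

```typescript
// Geometric mean of per-skill solve probabilities, computed in log space
// exactly as in the formula above. A per-skill value of 0 yields a joint of 0.
function jointPSolve(perSkill: number[]): number {
  if (perSkill.length === 0) {
    throw new Error("task has no micro-skills");
  }
  const meanLog =
    perSkill.reduce((sum, p) => sum + Math.log(p), 0) / perSkill.length;
  return Math.exp(meanLog);
}

// Arithmetic mean, shown only for comparison.
function arithmeticMean(values: number[]): number {
  return values.reduce((sum, v) => sum + v, 0) / values.length;
}
```

For a task with one strong and one weak skill, `jointPSolve([0.9, 0.3])` is about 0.52, while the arithmetic mean of the same values is 0.6: the weak skill drags the joint estimate further down.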

The task score is:

$$score = \underbrace{\exp\!\left(-\frac{(P(solve)_{joint} - 0.7)^2}{0.03}\right)}_{closeness} + 0.15 \cdot \underbrace{\frac{|\{s : P(L_s) < 0.4\}|}{n}}_{rarity}$$

The Gaussian peaks around 0.7 — the ZPD target. The rarity bonus gently favours tasks that include under-trained micro-skills.

Implementation: bkt-core/src/bkt.ts (scoreTaskForStudent and recommend), with the last five task IDs filtered out to reduce repetition.

  • $P(L) = 0.166$ for the “parentheses” micro-skill.
  • A one-skill parentheses task has $P(solve) = 0.166 \cdot 0.9 + 0.834 \cdot 0.2 \approx 0.316$.
  • Closeness is $\exp(-(0.316 - 0.7)^2 / 0.03) \approx 0.0073$, almost zero.
  • A multi-skill task combining parentheses with familiar arithmetic may land around $P(solve) \in [0.55, 0.65]$ and enter the ZPD.
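
The scoring formula and the worked example above can be reproduced with a short sketch. All names here (`closeness`, `scoreTaskSketch`, `recommendSketch`, the constants) are hypothetical stand-ins for scoreTaskForStudent and recommend; the sketch also assumes that unseen skills default to P(L) = 0.2 and that all skills share the default slip and guess values.

```typescript
interface TaskLike {
  id: string;
  microskills: string[];
}

// Constants taken from the formulas on this page.
const TARGET = 0.7;
const SIGMA2 = 0.03;
const RARITY_WEIGHT = 0.15;
const WEAK_THRESHOLD = 0.4;
// Assumed shared defaults (pInit, pS, pG).
const P_INIT = 0.2;
const P_S = 0.1;
const P_G = 0.2;

// Gaussian-shaped closeness of a joint P(solve) to the ZPD target.
function closeness(pSolve: number): number {
  const d = pSolve - TARGET;
  return Math.exp(-(d * d) / SIGMA2);
}

function scoreTaskSketch(task: TaskLike, mastery: Record<string, number>): number {
  const pLs = task.microskills.map((s) => mastery[s] ?? P_INIT);
  // Per-skill P(solve): knows and does not slip, or does not know and guesses.
  const perSkill = pLs.map((pL) => pL * (1 - P_S) + (1 - pL) * P_G);
  const joint = Math.exp(
    perSkill.reduce((sum, p) => sum + Math.log(p), 0) / perSkill.length
  );
  const rarity = pLs.filter((pL) => pL < WEAK_THRESHOLD).length / pLs.length;
  return closeness(joint) + RARITY_WEIGHT * rarity;
}

// Top-N tasks by score, skipping the last five task IDs the student saw.
function recommendSketch(
  tasks: TaskLike[],
  mastery: Record<string, number>,
  recentIds: string[],
  topN = 3
): TaskLike[] {
  const recent = new Set(recentIds.slice(-5));
  return tasks
    .filter((t) => !recent.has(t.id))
    .map((t) => ({ task: t, score: scoreTaskSketch(t, mastery) }))
    .sort((a, b) => b.score - a.score)
    .slice(0, topN)
    .map((entry) => entry.task);
}
```

For the one-skill parentheses task, closeness is about 0.0073, and the rarity term contributes the full 0.15 because the only skill is below 0.4. The total score stays near 0.157, far below a task sitting at the target.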

closeness = exp(−(p − target)²/σ²): highest at the peak, falling off quickly toward the edges. The smaller σ² is, the narrower the “ZPD window”.

6. Where parameters come from — EM fitting offline


The goal is to recover $(p_{init}, p_T, p_S, p_G)$ from answer histories. The algorithm is EM / Baum–Welch for a two-state hidden Markov model: “knows” and “does not know”.

The algorithm:

  1. Collect observations, roughly 3000 answers per skill.
  2. Start with guessed parameters, such as literature defaults.
  3. E-step: estimate the probability of each hidden “knew / did not know” state over time.
  4. M-step: re-estimate parameters so the observed answers become more likely.
  5. Repeat until parameters stabilise, roughly 20 iterations.

| Item | Value |
| --- | --- |
| Data volume per skill | ~3000 observations |
| Iterations to convergence | ~20 |
| Parameter precision | about ±0.01 |
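
The E-step and M-step listed above can be sketched for the two-state BKT model as follows. This is an illustrative Baum–Welch iteration, not the project's fitting code: every name is hypothetical, the input is assumed to be one boolean answer sequence per student for a single skill, the model assumes no forgetting, and the loop runs a fixed number of iterations instead of testing for convergence. It places no bounds on the slip and guess estimates.

```typescript
interface BktParams { pInit: number; pT: number; pS: number; pG: number; }

// P(observed answer | hidden state) under the slip/guess model.
function emit(p: BktParams, learned: boolean, correct: boolean): number {
  const pCorrect = learned ? 1 - p.pS : p.pG;
  return correct ? pCorrect : 1 - pCorrect;
}

// Keeps the old value when a denominator is empty (e.g. only length-1 sequences).
const ratio = (num: number, den: number, fallback: number): number =>
  den > 0 ? num / den : fallback;

// One EM iteration. State 0 = not learned, state 1 = learned.
function emStep(seqs: boolean[][], p: BktParams): BktParams {
  const A = [[1 - p.pT, p.pT], [0, 1]]; // no forgetting: learned stays learned
  let n = 0, init1 = 0, trans01 = 0, from0 = 0;
  let slipNum = 0, slipDen = 0, guessNum = 0, guessDen = 0;

  for (const obs of seqs) {
    const T = obs.length;
    if (T === 0) continue;
    n++;
    const e = obs.map((o) => [emit(p, false, o), emit(p, true, o)]);

    // E-step, forward pass with per-step scaling to avoid underflow.
    const alpha: number[][] = [];
    const scale: number[] = [];
    let prior = [1 - p.pInit, p.pInit];
    for (let t = 0; t < T; t++) {
      const a0 = prior[0] * e[t][0];
      const a1 = prior[1] * e[t][1];
      const c = a0 + a1;
      scale.push(c);
      alpha.push([a0 / c, a1 / c]);
      // Predicted state distribution before the next attempt.
      prior = [
        (a0 / c) * A[0][0] + (a1 / c) * A[1][0],
        (a0 / c) * A[0][1] + (a1 / c) * A[1][1],
      ];
    }

    // E-step, backward pass using the same scaling factors.
    const beta: number[][] = obs.map(() => [1, 1]);
    for (let t = T - 2; t >= 0; t--) {
      for (const i of [0, 1]) {
        beta[t][i] =
          (A[i][0] * e[t + 1][0] * beta[t + 1][0] +
            A[i][1] * e[t + 1][1] * beta[t + 1][1]) / scale[t + 1];
      }
    }

    // Accumulate expected counts from the state posteriors.
    for (let t = 0; t < T; t++) {
      const g0 = alpha[t][0] * beta[t][0]; // P(not learned at t | all answers)
      const g1 = alpha[t][1] * beta[t][1]; // P(learned at t | all answers)
      if (t === 0) init1 += g1;
      guessDen += g0;
      slipDen += g1;
      if (obs[t]) guessNum += g0; else slipNum += g1;
      if (t < T - 1) {
        from0 += g0;
        // Expected "not learned -> learned" transitions between t and t + 1.
        trans01 +=
          (alpha[t][0] * A[0][1] * e[t + 1][1] * beta[t + 1][1]) / scale[t + 1];
      }
    }
  }

  // M-step: re-estimate each parameter as a ratio of expected counts.
  return {
    pInit: ratio(init1, n, p.pInit),
    pT: ratio(trans01, from0, p.pT),
    pS: ratio(slipNum, slipDen, p.pS),
    pG: ratio(guessNum, guessDen, p.pG),
  };
}

function fitBktSketch(seqs: boolean[][], start: BktParams, iterations = 20): BktParams {
  let p = start;
  for (let i = 0; i < iterations; i++) p = emStep(seqs, p);
  return p;
}
```

Starting `fitBktSketch` from the literature defaults corresponds to step 2 of the list. With only a handful of sequences the estimates are not meaningful, which is why the page quotes roughly 3000 observations per skill.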

More detail: NB-3 EM fitting.

Short answer: not directly. The graph is drawn in the UI for humans, but the selector code does not read it.

| What | Where it lives | Who fills it | Used by code? |
| --- | --- | --- | --- |
| Skill dependency DAG (t3.mix → t1.add, t2.mix…) | data/matx-define/microskills.json, field prereq | curriculum author | ❌ no, only rendered in UI |
| Task tags (task.microskills = ["t3.mix", "t1.add", …]) | data/matx-define/tasks.json | teacher / content author | ✅ yes, read by recommend() |

The dependency graph is visualised in ProgressionMatrix.tsx, but recommend() does not load it.

When adding a task, the teacher writes the prompt and answer, then manually lists all micro-skills the task uses, including prerequisites. For a t3.mix task the tags might include define.t3.mix, define.t2.mix, define.t1.add, and define.t1.mul.

That means the teacher flattens the graph into the task tag list. In the current bank, 16 of 20 tasks are multi-skill tasks.

The selector looks at task.microskills and state.mastery. When a weak prerequisite is already present in the task tags, the geometric mean pulls the joint $P(\text{solve})$ down. So prerequisite checking happens indirectly, because prerequisites are included in the task’s micro-skill list.

What we lose by not reading the graph directly

  • Tagging mistakes matter: if a teacher forgets to include t1.add, the model cannot know the task depends on it.
  • We cannot write explicit rules such as “do not show t3.mix until t2.mix reaches $P(L) \geq 0.5$”.
  • The selector cannot explain missing prerequisites unless that information is duplicated in task tags.

9 micro-skills, 20 tasks. Multi-skill tasks: 16 / 20.

| Skills in task | Number of tasks |
| --- | --- |
| 1 | 4 |
| 2 | 6 |
| 3 | 5 |
| 5 | 3 |
| 6 | 1 |
| 9 | 1 |

So 80% of tasks use at least two skills; the geometric mean matters on most recommendations.

1. Should v2 make the code read the dependency graph?


Currently the selector only reads task tags and student mastery. If it read the graph, it could know that a t3.mix task is premature without t1.add, even if the teacher forgot to include that prerequisite in the task tags. That would be more predictable, but it adds rules that may conflict with ZPD scoring.

Currently the teacher lists every skill manually. Code could instead let the teacher specify only the top-level skill and expand prerequisites from the DAG. That reduces routine work but makes graph errors propagate everywhere.
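
Expanding prerequisites from the DAG amounts to a transitive closure over the prereq edges. The sketch below is one possible shape for it, not existing code: `expandPrereqs` and `PrereqGraph` are hypothetical names, and the example edges are invented for illustration and are not read from microskills.json.

```typescript
// Hypothetical graph shape: skill id -> direct prerequisite ids.
type PrereqGraph = Record<string, string[]>;

// Returns the given skills plus all direct and indirect prerequisites.
// The visited set makes the walk safe even if the graph contains a cycle.
function expandPrereqs(topLevel: string[], graph: PrereqGraph): string[] {
  const seen = new Set<string>();
  const stack = [...topLevel];
  while (stack.length > 0) {
    const skill = stack.pop() as string;
    if (seen.has(skill)) continue;
    seen.add(skill);
    for (const pre of graph[skill] ?? []) {
      stack.push(pre);
    }
  }
  return Array.from(seen).sort();
}

// Illustrative edges only (assumed, not the real curriculum graph).
const exampleGraph: PrereqGraph = {
  "define.t3.mix": ["define.t2.mix"],
  "define.t2.mix": ["define.t1.add", "define.t1.mul"],
};
```

Under these assumed edges, tagging a task with only `define.t3.mix` expands to the same four tags the teacher currently writes by hand. It also shows the trade-off named above: a single wrong edge in the graph changes the tags of every task that depends on it.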

3. One parameter set for all skills, or separate parameters per skill?


Currently all 9 micro-skills share $(0.2, 0.1, 0.1, 0.2)$. In real life, slip and guess rates differ by skill: long multi-step equations invite more slips, while simple arithmetic is harder to guess. Per-skill parameters need enough data, roughly 3000 answers per skill.

For a new student with $P(L) = 0$ on basic arithmetic, any task depending on that skill has a low joint $P(\text{solve})$. Pedagogically, this pushes the student to close the basics first, but it may become boring if the bank has only a few pure basic tasks. A future version may need a “rescue mode” that widens the ZPD window after too many repeats.

5. What happens when MATx adds skills from other topics?


After integration, skills may come from percentages, equations, auxiliary formulas, and modelling. Cross-topic tasks will appear. We need to decide whether a single formula over all skills is enough, or whether the selector should balance parallel topic tracks.

6. Should the ZPD width be static or dynamic?


The current $\sigma^2 = 0.03$ gives a reasonable window around $P(\text{solve}) \in [0.55, 0.85]$. A dynamic version could widen the window for newcomers, narrow it for advanced students, and react to streaks of correct or wrong answers.
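
One way to relate $\sigma^2$ to a window width is to ask where closeness stays above a cutoff $c$. The cutoff is an assumption here, since the page does not say how the window edges are defined; taking $c = 0.5$ (half the peak value) gives edges close to the quoted ones:

$$\exp\!\left(-\frac{(p - 0.7)^2}{\sigma^2}\right) \geq c \iff |p - 0.7| \leq \sqrt{\sigma^2 \ln(1/c)}$$

$$c = 0.5,\ \sigma^2 = 0.03: \quad \sqrt{0.03 \cdot \ln 2} \approx 0.144 \ \Rightarrow\ p \in [0.556,\ 0.844]$$

Under this reading, the half-width grows with $\sqrt{\sigma^2}$, so doubling the window width requires quadrupling $\sigma^2$.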

  • Types and default parameters — packages/bkt-core/src/microskills.ts
  • Update and selection — packages/bkt-core/src/bkt.ts
  • Skill graph — data/matx-bridge.json
  • Simulator widgets — study-guide/src/widgets/