Pipeline overview — from answer to recommendation

This page is the bird’s-eye view of the model: what comes in, which four numbers define each micro-skill, how one student answer updates mastery, and how the next task is selected. Use it as an entry point to the guide or as a cheat sheet for explaining the system to a product, teaching, or data team.

0. Decoder for abbreviations

Short form	Full English	Meaning in this guide
BKT	Bayesian Knowledge Tracing	a probabilistic model of whether a student has learned a skill
EM	Expectation–Maximization	an algorithm that fits hidden-state models from observations
HMM	Hidden Markov Model	a model where the real state is hidden and only its effects are observed
ZPD	Zone of Proximal Development	the task difficulty band where growth is likely
P(L)	Probability(Learned)	probability that the student currently knows a skill
P(L₀) / pInit	initial probability learned	prior P(L) before any attempts
pT	transition probability	chance of moving from “not learned” to “learned” on one attempt
pS	slip probability	chance of answering wrong despite knowing
pG	guess probability	chance of answering right without knowing
P(solve)	probability of solving a task	estimated chance of a correct answer on this task now
closeness	closeness to target	how close P(solve) is to the target, about 0.7
mastery	per-student knowledge estimate	the student’s P(L) vector over all micro-skills
micro-skill	atomic skill	the smallest skill unit for which we track one P(L)
prereq	prerequisite	a predecessor skill in the dependency graph
rarity bonus	bonus for under-trained skills	nudges the selector toward tasks involving weak skills
Baum–Welch	a specific EM algorithm	EM for hidden Markov models

1. Bird’s-eye view

🔧 Offline — every N weeks

flowchart LR
  classDef off fill:#fde68a,stroke:#a16207,color:#0f172a
  A1[Answer log]:::off --> A2[EM]:::off --> A3[Parameters]:::off

Step	What it means
Answer log	All historical answers: `(student, task, correct/wrong, timestamp)`. This is the input to the offline stage.
EM	Fits four BKT parameters for each micro-skill so they best explain the observed answers. This is not run during every student session.
Parameters	The fitted or default values $\{p_{init}, p_T, p_S, p_G\}$ per micro-skill. In the hackathon version we use the literature defaults $(p_{init}, p_T, p_S, p_G) = (0.2, 0.1, 0.1, 0.2)$.

The parameters are passed to the online engine and stay there until the next fitting cycle.

⚡ Online — on every student answer

flowchart LR
  classDef onl fill:#bbf7d0,stroke:#15803d,color:#0f172a
  classDef sel fill:#e9d5ff,stroke:#7e22ce,color:#0f172a
  B1[applyAttempt]:::onl --> B2["P(L)"]:::onl --> B3["P(solve)"]:::onl --> B4[ZPD score]:::sel --> B5[Top-N]:::sel

Step	What it means
applyAttempt	Updates P(L) for every micro-skill tagged on the task, using the current P(L), whether the answer was correct, and BKT parameters.
P(L)	The student’s mastery vector: one number from 0 to 1 for each micro-skill.
P(solve)	The probability of solving a specific task right now, computed from the student’s mastery on that task’s micro-skills.
ZPD score	The task priority: closeness to target P(solve) ≈ 0.7 plus a small bonus for under-trained skills.
Top-N	The best candidate tasks, with recent repeats filtered out.

2. What the model receives

Field	Type	Example	Source
`student_id`	string	`"u_142"`	session / user database
`mastery`	`Record<skillId, number>`	`{ "define.t1.add": 0.31 }`	accumulated online
`history`	`AttemptRecord[]`	see below	answer log
`task.id`	string	`"q_007"`	task pool
`task.microskills`	`string[]`	`["define.t2.mix"]`	task tagging
`task.difficulty`	number ∈ [0,1]	`0.55`	author estimate / tie-breaker

Example AttemptRecord:

{
  "task_id": "q_007",
  "correct": true,
  "ts": "2026-05-07T18:42:11Z",
  "per_skill": { "define.t2.mix": true }
}

Example Task:

{
  "id": "q_007",
  "topic": "linear",
  "microskills": ["define.t2.mix"],
  "difficulty": 0.55,
  "prompt_et": "Pille on 3 aastat vanem kui Mart…",
  "answer": "x = 12"
}
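
The field table and the two JSON examples can be read as TypeScript types. The sketch below is an assumption inferred only from those examples: the names `SkillId`, `StudentState`, `inUnitInterval`, and `isValidTask` are hypothetical, and the real definitions in bkt-core may differ.

```typescript
// Hypothetical type sketch derived from the field table and JSON examples above.
type SkillId = string;

interface AttemptRecord {
  task_id: string;
  correct: boolean;
  ts: string; // ISO 8601 timestamp, e.g. "2026-05-07T18:42:11Z"
  per_skill: Record<SkillId, boolean>;
}

interface Task {
  id: string;
  topic: string;
  microskills: SkillId[];
  difficulty: number; // author estimate in [0, 1]
  prompt_et: string;
  answer: string;
}

interface StudentState {
  student_id: string;
  mastery: Record<SkillId, number>; // P(L) per micro-skill
  history: AttemptRecord[];
}

// True when x is a finite number in the closed interval [0, 1].
function inUnitInterval(x: number): boolean {
  return Number.isFinite(x) && x >= 0 && x <= 1;
}

// Minimal sanity check: a task needs at least one tag and a difficulty in [0, 1].
function isValidTask(task: Task): boolean {
  return task.microskills.length > 0 && inUnitInterval(task.difficulty);
}
```

A check like `isValidTask` is only one possible guard; it catches an empty tag list early, which matters because the selector relies entirely on `task.microskills`.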

3. BKT parameters — four numbers per micro-skill

Name	Meaning	Hackathon default
$p_{init}$	prior probability that the skill was already learned	0.20
$p_T$	probability of moving from “not learned” to “learned” on one attempt	0.10
$p_S$	slip: knew it but answered wrong	0.10
$p_G$	guess: did not know but answered right	0.20

Source: packages/bkt-core/src/microskills.ts → DEFAULT_BKT.

Why these defaults? They are conservative literature-style defaults for school mathematics: learning is gradual, slips are possible, and occasional guesses are allowed. The model therefore does not conclude “learned” after a single correct answer or “knows nothing” after one wrong answer.
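
As a minimal sketch, the four numbers can be held in one object and checked for plausibility. The field names follow the glossary (`pInit`, `pT`, `pS`, `pG`) and the constant name `DEFAULT_BKT` comes from the source note above, but the exact shape in `microskills.ts` is an assumption, and `isPlausible` is a hypothetical helper. The condition $p_G < 1 - p_S$ is the usual requirement that a correct answer is more likely when the skill is known than when it is not; without it a correct answer would not raise $P(L)$.

```typescript
interface BktParams {
  pInit: number; // prior probability that the skill is already learned
  pT: number;    // probability of learning on one attempt
  pS: number;    // slip: knew it but answered wrong
  pG: number;    // guess: did not know but answered right
}

// Hackathon defaults from the table above (object shape is assumed).
const DEFAULT_BKT: BktParams = { pInit: 0.2, pT: 0.1, pS: 0.1, pG: 0.2 };

// Hypothetical helper: all values are probabilities and guessing is less
// likely to produce a correct answer than actually knowing the skill.
function isPlausible(p: BktParams): boolean {
  const values = [p.pInit, p.pT, p.pS, p.pG];
  const allInRange = values.every((x) => x >= 0 && x <= 1);
  return allInRange && p.pG < 1 - p.pS;
}
```

With the defaults, a correct answer has probability 0.9 when the skill is known and 0.2 when it is not, so the check passes with a wide margin.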

4. Online learning update — on every answer

For one task micro-skill:

$$P(L \mid \text{correct}) = \frac{P(L) \cdot (1 - p_S)}{P(L) \cdot (1 - p_S) + (1 - P(L)) \cdot p_G}$$

$$P(L \mid \text{wrong}) = \frac{P(L) \cdot p_S}{P(L) \cdot p_S + (1 - P(L)) \cdot (1 - p_G)}$$

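then the learning transition, where posterior is whichever of the two values above matches the observed answer:

$$P(L_{\text{new}}) = \text{posterior} + (1 - \text{posterior}) \cdot p_T$$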

Implementation: bkt-core/src/bkt.ts (bktUpdate). For multi-skill tasks, this update is applied independently to every tagged micro-skill via applyAttempt.
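
The two formulas and the transition step can be sketched as follows. The function names `bktUpdate` and `applyAttempt` come from the text, but the signatures, the `BktParams` shape, and the fallback to `pInit` for a skill with no mastery entry are assumptions for illustration, not the actual code in `bkt.ts`.

```typescript
interface BktParams { pInit: number; pT: number; pS: number; pG: number; }

// One BKT step for a single micro-skill:
// Bayesian posterior given the answer, then the learning transition.
function bktUpdate(pL: number, correct: boolean, p: BktParams): number {
  const posterior = correct
    ? (pL * (1 - p.pS)) / (pL * (1 - p.pS) + (1 - pL) * p.pG)
    : (pL * p.pS) / (pL * p.pS + (1 - pL) * (1 - p.pG));
  return posterior + (1 - posterior) * p.pT;
}

// Applies the same update independently to every micro-skill tagged on the task.
// Returns a new mastery record and leaves the input untouched.
function applyAttempt(
  mastery: Record<string, number>,
  skills: string[],
  correct: boolean,
  p: BktParams,
): Record<string, number> {
  const next = { ...mastery };
  for (const skill of skills) {
    // Assumption: a skill never seen before starts at the prior pInit.
    next[skill] = bktUpdate(mastery[skill] ?? p.pInit, correct, p);
  }
  return next;
}
```

With the hackathon defaults and a starting $P(L) = 0.2$, one correct answer moves the estimate to about 0.58 and one wrong answer moves it to about 0.13, so a single answer shifts the estimate without settling it.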

Interactive: BktSimulator lets you click “correct / wrong” and watch $P(L)$ move.

5. Task selection

For every task in the pool we compute a joint $P(\text{solve})$ over the micro-skills involved. We use a geometric mean, which is stricter than an arithmetic mean because one weak component pulls the whole value down:

$$P(\text{solve})_{\text{joint}} = \exp\!\left(\frac{1}{n} \sum_{i=1}^{n} \log P(\text{solve})_i\right)$$
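
A minimal sketch of the geometric mean next to the arithmetic mean shows why it is the stricter choice. The function names are hypothetical, and the sketch assumes a non-empty list of strictly positive per-skill probabilities (which holds whenever $p_G > 0$).

```typescript
// Geometric mean computed in log space: exp of the average log-probability.
function jointPSolve(perSkill: number[]): number {
  const meanLog = perSkill.reduce((sum, p) => sum + Math.log(p), 0) / perSkill.length;
  return Math.exp(meanLog);
}

// Arithmetic mean, shown only for comparison.
function arithmeticMean(perSkill: number[]): number {
  return perSkill.reduce((sum, p) => sum + p, 0) / perSkill.length;
}

// Two strong components and one weak one.
const example = [0.9, 0.9, 0.3];
console.log(arithmeticMean(example).toFixed(3), jointPSolve(example).toFixed(3));
```

For this example the arithmetic mean is 0.7 while the geometric mean is about 0.62: the single weak component costs noticeably more under the geometric mean.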

The task score is:

$$\text{score} = \underbrace{\exp\!\left(-\frac{(P(\text{solve})_{\text{joint}} - 0.7)^2}{0.03}\right)}_{\text{closeness}} + 0.15 \cdot \underbrace{\frac{|\{s : P(L_s) < 0.4\}|}{n}}_{\text{rarity}}$$

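The Gaussian term peaks at $P(\text{solve})_{joint} = 0.7$, the ZPD target, where closeness equals 1. The rarity bonus adds at most 0.15, so it gently favours tasks that include under-trained micro-skills (those with $P(L) < 0.4$) without overriding closeness.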

Implementation: bkt-core/src/bkt.ts (scoreTaskForStudent and recommend), with the last five task IDs filtered out to reduce repetition.
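
The scoring and selection steps can be sketched together as below. This is an illustration under stated assumptions, not the code in `bkt.ts`: the names `scoreTask` and the signature of `recommend`, the constants, the default `topN`, and the fallback to `pInit` for unseen skills are all hypothetical. The per-skill $P(\text{solve}) = P(L)(1 - p_S) + (1 - P(L))\,p_G$ is the same expression used in the numerical example in this section.

```typescript
interface BktParams { pInit: number; pT: number; pS: number; pG: number; }
interface Task { id: string; microskills: string[]; }
type Mastery = Record<string, number>;

const TARGET = 0.7;         // ZPD target for the joint P(solve)
const SIGMA_SQ = 0.03;      // width of the Gaussian window
const RARITY_WEIGHT = 0.15; // weight of the under-trained-skill bonus
const WEAK_THRESHOLD = 0.4; // P(L) below this counts as under-trained

function scoreTask(task: Task, mastery: Mastery, p: BktParams): number {
  const pLs = task.microskills.map((s) => mastery[s] ?? p.pInit);
  // Joint P(solve): geometric mean of the per-skill values, in log space.
  const logSum = pLs.reduce(
    (sum, pL) => sum + Math.log(pL * (1 - p.pS) + (1 - pL) * p.pG), 0);
  const joint = Math.exp(logSum / pLs.length);
  const closeness = Math.exp(-((joint - TARGET) ** 2) / SIGMA_SQ);
  const rarity = pLs.filter((pL) => pL < WEAK_THRESHOLD).length / pLs.length;
  return closeness + RARITY_WEIGHT * rarity;
}

function recommend(
  tasks: Task[], mastery: Mastery, p: BktParams, recentIds: string[], topN = 3,
): Task[] {
  const recent = new Set(recentIds.slice(-5)); // exclude the last five task IDs
  return tasks
    .filter((t) => !recent.has(t.id))
    .map((t) => ({ task: t, score: scoreTask(t, mastery, p) }))
    .sort((a, b) => b.score - a.score) // highest score first
    .slice(0, topN)
    .map((entry) => entry.task);
}
```

Filtering recent tasks before scoring keeps the ranking logic free of repetition rules; the trade-off is that a small task bank can run out of candidates once five tasks are excluded.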

Numerical example

$P(L) = 0.166$ for the “parentheses” micro-skill.
A one-skill parentheses task has $P(solve) = 0.166 \cdot 0.9 + 0.834 \cdot 0.2 \approx 0.316$.
Closeness is $\exp(-(0.316 - 0.7)^2 / 0.03) \approx 0.0073$, almost zero.
A multi-skill task combining parentheses with familiar arithmetic may land around $P(solve) \in [0.55, 0.65]$ and enter the ZPD.
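
The example can be reproduced with a few lines. The single-skill numbers come from the text; the multi-skill case is a hypothetical illustration that assumes two familiar arithmetic skills at $P(L) = 0.99$, which is not a value taken from the task bank.

```typescript
const P_S = 0.1; // slip (hackathon default)
const P_G = 0.2; // guess (hackathon default)

// Per-skill probability of a correct answer.
const pSolve = (pL: number): number => pL * (1 - P_S) + (1 - pL) * P_G;

// Gaussian closeness to the ZPD target.
const closeness = (p: number, target = 0.7, sigmaSq = 0.03): number =>
  Math.exp(-((p - target) ** 2) / sigmaSq);

// Geometric mean over per-skill P(solve) values.
const geoMean = (ps: number[]): number =>
  Math.exp(ps.reduce((sum, p) => sum + Math.log(p), 0) / ps.length);

// Single-skill parentheses task, P(L) = 0.166.
const single = pSolve(0.166);

// Hypothetical mixed task: parentheses plus two well-known skills (assumed P(L) = 0.99).
const mixed = geoMean([pSolve(0.166), pSolve(0.99), pSolve(0.99)]);

console.log(single.toFixed(3), closeness(single).toFixed(4));
console.log(mixed.toFixed(3), closeness(mixed).toFixed(2));
```

Under these assumptions the single-skill task sits at roughly 0.32 with closeness near 0.007, while the mixed task lands at roughly 0.63 with closeness near 0.86, inside the band quoted above.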

Widget settings: target = 0.70, σ² (width) = 0.030.

closeness = exp(−(p−target)²/σ²). The value is highest at the peak and falls off quickly toward the edges. The smaller σ² is, the narrower the “ZPD window”.

6. Where parameters come from — EM fitting offline

The goal is to recover $(p_{init}, p_T, p_S, p_G)$ from answer histories. The algorithm is EM / Baum–Welch for a two-state hidden Markov model: “knows” and “does not know”.

The algorithm:

Collect observations, roughly 3000 answers per skill.
Start with guessed parameters, such as literature defaults.
E-step: estimate the probability of each hidden “knew / did not know” state over time.
M-step: re-estimate parameters so the observed answers become more likely.
Repeat until parameters stabilise, roughly 20 iterations.

Item	Value
Data volume per skill	~3000 observations
Iterations to convergence	~20
Parameter precision	about ±0.01
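
The E-step and M-step described above can be sketched for the two-state BKT model as follows. This is a minimal illustration, not the project's fitting code: all names are hypothetical, sequences are assumed to be non-empty per-student answer lists for one skill with at least one sequence longer than one answer, forgetting is assumed impossible (learned stays learned), and the forward-backward pass is left unscaled, which is only safe for short sequences.

```typescript
interface BktParams { pInit: number; pT: number; pS: number; pG: number; }

// P(observed answer | hidden state); state 0 = not learned, 1 = learned.
function emit(p: BktParams, state: number, correct: boolean): number {
  const pCorrect = state === 1 ? 1 - p.pS : p.pG;
  return correct ? pCorrect : 1 - pCorrect;
}

// Unscaled forward pass: alpha[t][state].
function forward(p: BktParams, obs: boolean[]): number[][] {
  const A = [[1 - p.pT, p.pT], [0, 1]];
  const alpha: number[][] = [
    [(1 - p.pInit) * emit(p, 0, obs[0]), p.pInit * emit(p, 1, obs[0])],
  ];
  for (let t = 1; t < obs.length; t++) {
    const prev = alpha[t - 1];
    alpha.push([0, 1].map((j) =>
      (prev[0] * A[0][j] + prev[1] * A[1][j]) * emit(p, j, obs[t])));
  }
  return alpha;
}

function logLikelihood(p: BktParams, seqs: boolean[][]): number {
  return seqs.reduce((sum, obs) => {
    const last = forward(p, obs)[obs.length - 1];
    return sum + Math.log(last[0] + last[1]);
  }, 0);
}

// One EM iteration: expected state counts (E-step), then re-estimation (M-step).
function emStep(p: BktParams, seqs: boolean[][]): BktParams {
  const A = [[1 - p.pT, p.pT], [0, 1]];
  let init = 0, trans = 0, transDen = 0, guess = 0, unlearned = 0, slip = 0, learned = 0;
  for (const obs of seqs) {
    const T = obs.length;
    const alpha = forward(p, obs);
    const beta: number[][] = Array.from({ length: T }, () => [1, 1]);
    for (let t = T - 2; t >= 0; t--) {
      beta[t] = [0, 1].map((i) =>
        A[i][0] * emit(p, 0, obs[t + 1]) * beta[t + 1][0] +
        A[i][1] * emit(p, 1, obs[t + 1]) * beta[t + 1][1]);
    }
    const like = alpha[T - 1][0] + alpha[T - 1][1];
    for (let t = 0; t < T; t++) {
      const g0 = (alpha[t][0] * beta[t][0]) / like; // P(not learned at t | answers)
      const g1 = (alpha[t][1] * beta[t][1]) / like; // P(learned at t | answers)
      if (t === 0) init += g1;
      unlearned += g0;
      learned += g1;
      if (obs[t]) guess += g0; else slip += g1;
      if (t < T - 1) {
        transDen += g0;
        // Expected number of not-learned -> learned transitions at step t.
        trans += (alpha[t][0] * A[0][1] * emit(p, 1, obs[t + 1]) * beta[t + 1][1]) / like;
      }
    }
  }
  return {
    pInit: init / seqs.length,
    pT: trans / transDen,
    pS: slip / learned,
    pG: guess / unlearned,
  };
}

// Repeats EM for a fixed number of iterations (about 20 in the text above).
function fit(start: BktParams, seqs: boolean[][], iterations = 20): BktParams {
  let p = start;
  for (let i = 0; i < iterations; i++) p = emStep(p, seqs);
  return p;
}
```

Each EM iteration cannot decrease the likelihood of the observed answers, which is a convenient sanity check when wiring this up; a production fit would typically add scaling or log-space arithmetic and bounds on the slip and guess rates.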

More detail: NB-3 EM fitting.

7. Do we use the dependency graph?

Short answer: not directly. The graph is drawn in the UI for humans, but the selector code does not read it.

Two different concepts

What	Where it lives	Who fills it	Used by code?
Skill dependency DAG (`t3.mix → t1.add, t2.mix…`)	`data/matx-define/microskills.json`, field `prereq`	curriculum author	❌ no — rendered in UI
Task tags (`task.microskills = ["t3.mix", "t1.add", …]`)	`data/matx-define/tasks.json`	teacher / content author	✅ yes — read by `recommend()`

The dependency graph is visualised in ProgressionMatrix.tsx, but recommend() does not load it.

What the teacher does

When adding a task, the teacher writes the prompt and answer, then manually lists all micro-skills the task uses, including prerequisites. For a t3.mix task the tags might include define.t3.mix, define.t2.mix, define.t1.add, and define.t1.mul.

That means the teacher flattens the graph into the task tag list. In the current bank, 16 of 20 tasks are multi-skill tasks.

Why the model still works

The selector looks at task.microskills and state.mastery. When a weak prerequisite is already present in the task tags, the geometric mean pulls the joint $P(\text{solve})$ down. So prerequisite checking happens indirectly — because prerequisites are included in the task’s micro-skill list.

What we lose by not reading the graph directly

Tagging mistakes matter: if a teacher forgets to include t1.add, the model cannot know the task depends on it.
We cannot write explicit rules such as “do not show t3.mix until t2.mix reaches $P(L) \geq 0.5$”.
The selector cannot explain missing prerequisites unless that information is duplicated in task tags.
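
One way to catch the tagging mistakes listed above without changing the selector is an offline check that compares task tags against the dependency graph. The sketch below is hypothetical: it assumes the `prereq` field can be loaded as a map from skill ID to its direct prerequisites, and the function names and the sample graph are illustrative only.

```typescript
// Assumed shape: skill ID -> direct prerequisites (from the `prereq` field).
type PrereqGraph = Record<string, string[]>;

// Collects all direct and indirect prerequisites of a skill (depth-first).
// The `seen` set also guards against accidental cycles in the data.
function allPrereqs(
  graph: PrereqGraph, skill: string, seen: Set<string> = new Set(),
): Set<string> {
  for (const pre of graph[skill] ?? []) {
    if (!seen.has(pre)) {
      seen.add(pre);
      allPrereqs(graph, pre, seen);
    }
  }
  return seen;
}

// Returns prerequisites implied by the tags but absent from the tag list.
function missingPrereqTags(graph: PrereqGraph, tags: string[]): string[] {
  const tagged = new Set(tags);
  const missing = new Set<string>();
  for (const tag of tags) {
    allPrereqs(graph, tag).forEach((pre) => {
      if (!tagged.has(pre)) missing.add(pre);
    });
  }
  return Array.from(missing).sort();
}

// Hypothetical graph and task, for illustration only.
const sampleGraph: PrereqGraph = {
  "define.t3.mix": ["define.t2.mix"],
  "define.t2.mix": ["define.t1.add"],
};
console.log(missingPrereqTags(sampleGraph, ["define.t3.mix", "define.t2.mix"]));
```

For the sample data the check reports `define.t1.add` as missing. Running such a check when tasks are authored keeps the graph out of the online path while still surfacing forgotten prerequisites.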

Current task distribution

9 micro-skills, 20 tasks. Multi-skill tasks: 16 / 20.

Skills in task	Number of tasks
1	4
2	6
3	5
5	3
6	1
9	1

So 80% of tasks use at least two skills; the geometric mean matters on most recommendations.

Open design questions

1. Should v2 make the code read the dependency graph?

Currently the selector only reads task tags and student mastery. If it read the graph, it could know that a t3.mix task is premature without t1.add, even if the teacher forgot to include that prerequisite in the task tags. That would be more predictable, but it adds rules that may conflict with ZPD scoring.

2. Who owns tagging — teacher or code?

Currently the teacher lists every skill manually. Code could instead let the teacher specify only the top-level skill and expand prerequisites from the DAG. That reduces routine work but makes graph errors propagate everywhere.

3. One parameter set for all skills, or separate parameters per skill?

Currently all 9 micro-skills share $(p_{init}, p_T, p_S, p_G) = (0.2, 0.1, 0.1, 0.2)$. In real life, slip and guess rates differ by skill: long multi-step equations invite more slips, while simple arithmetic is harder to guess. Per-skill parameters need enough data, roughly 3000 answers per skill.

4. Forced gap-closing vs ZPD

For a new student with $P(L)=0$ on basic arithmetic, any task depending on that skill has a low joint $P(\text{solve})$. Pedagogically, this steers the student toward closing gaps in the basics first. But it may become boring if the bank has only a few pure basic tasks. A future version may need a “rescue mode” that widens the ZPD window after too many repeats.

5. What happens when MATx adds skills from other topics?

After integration, skills may come from percentages, equations, auxiliary formulas, and modelling. Cross-topic tasks will appear. We need to decide whether a single formula over all skills is enough, or whether the selector should balance parallel topic tracks.

6. Should the ZPD width be static or dynamic?

The current $\sigma^2 = 0.03$ keeps closeness above roughly 0.47 for $P(\text{solve}) \in [0.55, 0.85]$, which acts as the effective ZPD window. A dynamic version could widen the window for newcomers, narrow it for advanced students, and react to streaks of correct or wrong answers.
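
The link between $\sigma^2$ and the window width follows from inverting the closeness formula. The threshold $c$ below is a hypothetical cut-off introduced only for this derivation; the selector itself uses the continuous score rather than a hard window.

```latex
\exp\!\left(-\frac{(p - 0.7)^2}{\sigma^2}\right) \ge c
\;\Longleftrightarrow\;
(p - 0.7)^2 \le \sigma^2 \ln\frac{1}{c}
\;\Longleftrightarrow\;
|p - 0.7| \le \sqrt{\sigma^2 \ln\frac{1}{c}}

% With \sigma^2 = 0.03 and c = 0.5:
\sqrt{0.03 \cdot \ln 2} \approx \sqrt{0.0208} \approx 0.144
\quad\Rightarrow\quad p \in [0.556,\ 0.844]

% At the edges of [0.55, 0.85]:
\exp\!\left(-\frac{0.15^2}{0.03}\right) = \exp(-0.75) \approx 0.47
```

Because the half-width grows with $\sqrt{\sigma^2}$, doubling $\sigma^2$ widens the window by a factor of about 1.41 rather than 2, which is worth keeping in mind when tuning a dynamic version.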

8. Where to look in code

Types and default parameters — packages/bkt-core/src/microskills.ts
Update and selection — packages/bkt-core/src/bkt.ts
Skill graph — data/matx-bridge.json
Simulator widgets — study-guide/src/widgets/