We did it!
The newest version of Parser / Parsing Library works with 100% accuracy across the PDFs. Not just Physics, Math, or Computer Science, but every A-level PDF. There is still some room for improvement, which I will work on tomorrow. But I guess I will now gradually start shifting my focus from OpenPastPaper towards the hackathon I am currently attending.
Anyway, it can classify all the questions, along with their sub-questions, in any PDF. Sub-questions can be important if we are going to separate them, since each sub-question can belong to a different chapter, which will be interesting to see. I am still skeptical about the reliability of sub-question extraction and about its use case for now, so I may just get rid of it in the future if it proves too time-consuming.
Bug — what is this stupid bug.
The problem is with a dictionary that stores some metadata about its characters. When I append it to an array, every entry in the array ends up holding the same single copy of the metadata, which is weird. This is not supposed to happen, and right now I cannot even pinpoint the bug.
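For reference, here is a minimal sketch of one common way to get exactly this symptom, assuming the code is Python. This is a guess at the pattern, not the actual parser code: the function names and the "char" / "upper" metadata keys are made up for illustration. If one dictionary is created outside the loop and mutated on each pass, the list ends up holding many references to that same object, so every entry shows the last values written.

```python
def collect_buggy(chars):
    """Reuses ONE dict for every character (hypothetical reproduction)."""
    meta = {}  # created once, outside the loop
    out = []
    for ch in chars:
        meta["char"] = ch
        meta["upper"] = ch.isupper()
        out.append(meta)  # appends a reference to the same dict, not a copy
    return out


def collect_fixed(chars):
    """Builds a fresh dict per character, so entries are independent."""
    out = []
    for ch in chars:
        meta = {"char": ch, "upper": ch.isupper()}  # new object each pass
        out.append(meta)
    return out


buggy = collect_buggy("aBc")
fixed = collect_fixed("aBc")

print([m["char"] for m in buggy])  # ['c', 'c', 'c']
print([m["char"] for m in fixed])  # ['a', 'B', 'c']
print(buggy[0] is buggy[2])        # True: all entries are the same object
```

If the dictionary has to be built up outside the loop, appending `meta.copy()` would also break the sharing for flat metadata, and `copy.deepcopy(meta)` from the standard library covers nested values.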
Update — The issue has been resolved!