There's a dilemma: should I store the entire question, including the equations and variables, or just the text? Compare the two versions:
Solve the inequality |x − 2| ≥ |x + 5|
Solve the inequality
Given what I am trying to build, the obvious answer would be to store only the text, since the classifier would have a better grasp of what it is reading. But thinking about it from a human perspective, I can easily classify a question just by looking at its equations; the text is only there to tell me what I am supposed to do with them. The counterargument also holds: equations might just make the input more complex for the neural network to learn from.
On the other hand, the neural network might treat symbols and numbers written in such a specific way as a signal for classifying the chapters, which could be beneficial in the end. And most of my training data might contain questions like this:
Solve : |x − 2| ≥ |x + 5|
Here, the word "Solve" does not point to any particular chapter, but the equation does. For this reason, I will use the entire text plus the equations to build the training dataset.
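To make the trade-off concrete, here is a rough sketch of the two storage options. The function names (`tokenize`, `text_only`, `full_question`) and the rule for what counts as "text" are my own assumptions for illustration, not a fixed part of the pipeline.

```python
import re

# Hypothetical set of math symbols to split off as separate tokens.
MATH_SYMBOLS = r"([|+\-−*/=<>≤≥^()])"

def tokenize(question):
    # Pad each math symbol with spaces so "|x" becomes "|", "x".
    spaced = re.sub(MATH_SYMBOLS, r" \1 ", question)
    return spaced.split()

def text_only(question):
    # Assumption: "text" means alphabetic tokens longer than one character.
    # This drops variables like x, but also real words like "a" or "I".
    return " ".join(t for t in tokenize(question) if t.isalpha() and len(t) > 1)

def full_question(question):
    # Keep everything: words, variables, numbers and symbols.
    return " ".join(tokenize(question))

q = "Solve : |x − 2| ≥ |x + 5|"
print(text_only(q))      # only the instruction word survives
print(full_question(q))  # the equation tokens are kept as features
```

With the text-only variant, a question like the one above collapses to a single word that carries almost no chapter information, which is the main reason for keeping the equations.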