The Notebooks come in five 'generations': 64-65, 68-72, 79-80, 06 and 07-10.
Since a person's handwriting is likely to change over the years, we typed up pages from every generation as the basis of our model. The following numbers of pages were typed up from each generation.
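Our first model was trained on the first four generations (64, 68, 79 and 06) and tested on the fifth (07). It gave a character error rate (CER) of 12.99% on the validation set. This gives us a sense of the results to expect when the model has to recognise a generation it was not trained on ('out of generation' recognition).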
Our second model was trained on the first 18 pages from each generation and tested on the remaining pages. It gave a CER of 6.61% on the validation set. This indicates the level of recognition we can expect across all the data.
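The per-generation split used for the second model can be sketched as follows. This is only an illustration: the `(generation, page_id)` layout and the function name `split_by_generation` are hypothetical, and we assume the pages are listed in notebook order so that "first 18" is well defined.

```python
from collections import defaultdict

TRAIN_PAGES_PER_GENERATION = 18  # figure taken from the text above


def split_by_generation(pages, n_train=TRAIN_PAGES_PER_GENERATION):
    """Split pages into (train, validation) lists.

    `pages` is an iterable of (generation, page_id) tuples in notebook
    order (a hypothetical layout, not the project's actual format).
    The first `n_train` pages of each generation go to training and
    the remainder to validation.
    """
    by_generation = defaultdict(list)
    for generation, page_id in pages:
        by_generation[generation].append(page_id)

    train, validation = [], []
    for generation, page_ids in by_generation.items():
        train += [(generation, p) for p in page_ids[:n_train]]
        validation += [(generation, p) for p in page_ids[n_train:]]
    return train, validation
```

Splitting inside each generation, rather than across the pooled pages, keeps every generation represented in both the training and the validation set.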
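Our third model was trained on all the data. We had no test set for this model, but its CER on the training set was 2.81%. Because that figure is measured on pages the model has already seen, it is not directly comparable to the validation CERs of the first two models. This is the model we will use for recognition on all the data. It is likely to be better than our second model since it was trained on all available data.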
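For reference, a CER is usually defined as the character-level edit distance between the typed-up text and the recognised text, divided by the number of characters in the typed-up text. The sketch below assumes that standard definition. The recognition software may normalise whitespace or punctuation differently, so it is not guaranteed to reproduce the exact figures quoted here. The function names `edit_distance` and `cer` are our own.

```python
def edit_distance(ref: str, hyp: str) -> int:
    """Levenshtein distance: the minimum number of single-character
    insertions, deletions and substitutions that turn ref into hyp."""
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, start=1):
        curr = [i]
        for j, h in enumerate(hyp, start=1):
            cost = 0 if r == h else 1
            curr.append(min(prev[j] + 1,          # delete r from ref
                            curr[j - 1] + 1,      # insert h into ref
                            prev[j - 1] + cost))  # substitute or match
        prev = curr
    return prev[-1]


def cer(references, hypotheses) -> float:
    """Character error rate, as a percentage, over paired lines of
    typed-up (reference) and recognised (hypothesis) text."""
    total_edits = sum(edit_distance(r, h)
                      for r, h in zip(references, hypotheses))
    total_chars = sum(len(r) for r in references)
    if total_chars == 0:
        raise ValueError("reference text is empty")
    return 100.0 * total_edits / total_chars
```

For example, `cer(["abcd"], ["abxd"])` is 25.0: one wrong character out of four.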