Step 3 in detail: Processing
As explained in this previous article, processing is a complex and very important part of the project architecture, so I want to explain it in detail here.
What are the different elements of processing?
As mentioned, there are four main elements:
This is particularly important because we try to get the best possible match, so that the search is accurate.
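The article does not say how keyword matches are scored. The sketch below shows one simple approach. It assumes each paper is a dict with hypothetical `title` and `abstract` fields. A paper is scored by the fraction of query keywords it contains, and only papers above a threshold are kept. `tokenize`, `keyword_match_score`, `rank_papers` and the 0.5 threshold are illustrative names and values, not the project's actual code.

```python
import re


def tokenize(text: str) -> set[str]:
    """Lowercase the text and split it into a set of alphanumeric tokens."""
    return set(re.findall(r"[a-z0-9]+", text.lower()))


def keyword_match_score(keywords: list[str], document: str) -> float:
    """Return the fraction of query keywords found in the document (0.0 to 1.0)."""
    if not keywords:
        return 0.0
    doc_tokens = tokenize(document)
    # A multi-word keyword counts as matched only if all of its tokens
    # appear somewhere in the document (adjacency is not required).
    matched = sum(1 for kw in keywords if tokenize(kw) and tokenize(kw) <= doc_tokens)
    return matched / len(keywords)


def rank_papers(keywords: list[str], papers: list[dict], threshold: float = 0.5):
    """Keep (score, paper) pairs whose score reaches the threshold, best first.

    Assumes hypothetical 'title' and 'abstract' fields on each paper dict.
    """
    scored = [
        (keyword_match_score(keywords, p["title"] + " " + p["abstract"]), p)
        for p in papers
    ]
    kept = [(s, p) for s, p in scored if s >= threshold]
    kept.sort(key=lambda sp: sp[0], reverse=True)
    return kept
```

For example, `rank_papers(["entity recognition", "biomedical"], papers)` keeps a paper titled "Named entity recognition for biomedical text" with a score of 1.0 and drops unrelated papers. A real system would probably add stemming, synonyms or a weighted scheme such as TF-IDF. The set-based version just keeps the idea visible.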
Named Entity Recognition
Once the keywords match correctly, we apply a very common Natural Language Processing technique: Named Entity Recognition (NER).
This makes the summarization more precise.
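The article does not name the NER tool it uses. In practice a trained model is the usual choice, for example spaCy, whose `Doc.ents` exposes the detected entities. The sketch below instead shows the simplest form of the technique, dictionary (gazetteer) lookup, so that it runs with the standard library only. The `GAZETTEER` contents and the `find_entities` function are hypothetical.

```python
import re

# Hypothetical gazetteer: entity surface forms mapped to entity labels.
GAZETTEER = {
    "BERT": "MODEL",
    "ImageNet": "DATASET",
    "Stanford University": "ORG",
}


def find_entities(text: str, gazetteer: dict[str, str]) -> list[tuple[str, str, int]]:
    """Return (entity, label, start_offset) tuples for gazetteer entries found in text."""
    found = []
    taken = [False] * len(text)  # characters already covered by a match
    # Try longer names first so a longer entity wins over a shorter one it contains.
    for name in sorted(gazetteer, key=len, reverse=True):
        for m in re.finditer(r"\b" + re.escape(name) + r"\b", text):
            if any(taken[m.start():m.end()]):
                continue  # overlaps an entity that was already matched
            for i in range(m.start(), m.end()):
                taken[i] = True
            found.append((m.group(0), gazetteer[name], m.start()))
    # Report entities in the order they appear in the text.
    return sorted(found, key=lambda e: e[2])
```

On "We fine-tune BERT on ImageNet captions at Stanford University." this returns BERT (MODEL), ImageNet (DATASET) and Stanford University (ORG), in that order. A gazetteer only finds names it already knows, which is why statistical NER models are generally preferred for open-ended scientific text.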
Because of the large number of papers, there will probably be many duplicates, so it is important to apply a duplicate-elimination strategy.
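The article does not describe its duplicate-elimination strategy. The sketch below shows one common approach. It assumes papers are dicts with a `title` and an optional `doi` field; both field names are hypothetical. A paper is treated as a duplicate if its DOI or its normalized title has already been seen. `normalize_title` and `deduplicate` are illustrative helpers.

```python
import re
import unicodedata


def normalize_title(title: str) -> str:
    """Canonical title: accents stripped, lowercased, punctuation removed, spaces collapsed."""
    title = unicodedata.normalize("NFKD", title)
    title = "".join(ch for ch in title if not unicodedata.combining(ch))
    title = re.sub(r"[^\w\s]", " ", title.lower())
    return " ".join(title.split())


def deduplicate(papers: list[dict]) -> list[dict]:
    """Keep the first occurrence of each paper.

    A paper is a duplicate if its DOI (case-insensitive) or its normalized
    title matches one already kept. Assumes hypothetical 'title'/'doi' fields.
    """
    seen: set[tuple[str, str]] = set()
    unique = []
    for paper in papers:
        keys = {("title", normalize_title(paper["title"]))}
        doi = (paper.get("doi") or "").strip().lower()
        if doi:
            keys.add(("doi", doi))
        if keys & seen:
            continue  # DOI or title already seen: drop this copy
        seen |= keys
        unique.append(paper)
    return unique
```

Matching on either key catches the same paper returned by two sources, even when only one of them provides a DOI. The trade-off is that two distinct papers sharing an identical title would be merged. Fuzzy matching on titles, for example with an edit-distance threshold, is a possible refinement.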
The architecture of this part of the system is shown below: