This code takes an input DataFrame (assuming the server code has already converted the incoming JSON into a Python DataFrame) and applies our mapping, threshold, and de_duplicate steps to it, in that order.
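As a rough illustration of what such a pipeline might look like, here is a minimal pandas sketch. Everything specific in it is an assumption made for the example and is not taken from the actual code: the column names (`id`, `label`, `score`), the `LABEL_MAP` dictionary, the `SCORE_THRESHOLD` value, the `process` function name, and the rule that de-duplication keeps the highest-scoring row per `(id, label)` pair.

```python
import pandas as pd

# Hypothetical mapping from raw labels to canonical labels (assumed for illustration).
LABEL_MAP = {"cat": "animal", "dog": "animal", "car": "vehicle"}

# Hypothetical score cutoff (assumed for illustration).
SCORE_THRESHOLD = 0.5


def process(df: pd.DataFrame) -> pd.DataFrame:
    """Apply mapping, then threshold, then de-duplication to a copy of df."""
    out = df.copy()

    # 1. Mapping: translate raw labels; labels missing from LABEL_MAP become NaN
    #    and are dropped.
    out["label"] = out["label"].map(LABEL_MAP)
    out = out.dropna(subset=["label"])

    # 2. Threshold: keep only rows whose score reaches the cutoff.
    out = out[out["score"] >= SCORE_THRESHOLD]

    # 3. De-duplicate: keep the highest-scoring row for each (id, label) pair.
    out = (
        out.sort_values("score", ascending=False)
        .drop_duplicates(subset=["id", "label"], keep="first")
        .sort_values(["id", "label"])
        .reset_index(drop=True)
    )
    return out


# Example input, standing in for the DataFrame the server built from JSON.
raw = pd.DataFrame(
    {
        "id": [1, 1, 1, 2, 2],
        "label": ["cat", "dog", "car", "dog", "bird"],
        "score": [0.9, 0.7, 0.3, 0.6, 0.8],
    }
)
result = process(raw)
print(result)
```

In this sketch the steps run on a copy, so the caller's DataFrame is left unchanged. The real mapping, threshold, and de-duplication rules may differ from the ones assumed here.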