
Duplicates with different X-Status not treated as duplicates in log, but missing from final folder counts

0 votes

In the appraisal module, I found differences between what the log counted as total messages minus duplicates and what showed up in the final folder counts. Digging into it, I found that when two messages were duplicates but one had been replied to and therefore had an X-Status header, the duplicate was not considered "already in the archive" or listed in the dups count. However, only one of the messages showed up in the archive after ingest and was counted in the folder totals. So, for example, for this folder the log would list 118 emails as indexed with no duplicates, but only 117 would show up in the folder total.

This makes things hard, as I'm tracking the logs from PST to Aid4Mail to ePADD, and the numbers make it look like I've lost emails along the way. With accounts that have tens of thousands of messages, it's extremely cumbersome to track these down one by one. Is there a way this could be logged?
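
Until something like this is logged, a rough way to find the affected messages outside ePADD is to fingerprint each message while ignoring the Status and X-Status headers, then list the groups that collide. The sketch below is only an illustration: it assumes the Aid4Mail output is an mbox file, and the fingerprint rule (all other headers plus the decoded body parts) is my own guess at what "duplicate" means here, not ePADD's actual matching logic. The names `fingerprint` and `group_duplicates` are made up for this example.

```python
import hashlib
from collections import defaultdict

# Headers that a mail client rewrites when a message is read or replied to.
VOLATILE_HEADERS = {"status", "x-status"}


def fingerprint(msg):
    """Hash every header except the volatile ones, plus all body parts.

    Assumption: this is a stand-in for "same message", not ePADD's rule.
    """
    h = hashlib.sha256()
    for name, value in msg.items():
        if name.lower() not in VOLATILE_HEADERS:
            h.update(f"{name.lower()}:{value}\n".encode("utf-8", "replace"))
    for part in msg.walk():
        if not part.is_multipart():
            h.update(part.get_payload(decode=True) or b"")
    return h.hexdigest()


def group_duplicates(messages):
    """Return lists of positions whose messages match once status headers are ignored."""
    groups = defaultdict(list)
    for position, msg in enumerate(messages):
        groups[fingerprint(msg)].append(position)
    return [positions for positions in groups.values() if len(positions) > 1]
```

To run it on a real folder, pass the standard library's `mailbox.mbox("folder.mbox")` (the file name is a placeholder) to `group_duplicates`. Each returned list holds the positions of messages that differ at most in their status headers.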

asked Dec 7, 2016 by Sarah

2 Answers

0 votes

Hi Sarah,

Thanks for letting us know. We're discussing this and will post here once we have more info.


Josh Schneider
ePADD Community Manager

answered Dec 9, 2016 by Josh_Schneider (5,040 points)
0 votes

If ePADD shows 1 duplicate message and 117 unique messages, the total is 117 + 1 = 118, which matches your count. I am not sure I understand the issue.

answered Dec 10, 2016 by Peter_Chan (2,770 points)

The issue is that epadd doesn't show 1 duplicate message. This message is not included in the duplicate count.

So when ingesting only this folder, the log would show

EmailFetcherThread INFO - total # of messages: 118 reduced # of messages: 118
EmailFetcherThread INFO - 118 messages will be fetched for indexing
EmailFetcherThread INFO - 0 message(s) already in the archive

and then at the end only 117 messages would be shown, and the missing message would not be counted in the dups.

Seems like it's not enough of a dup (because of the X-Status difference) to qualify as a dup in the log, but enough of a dup that it isn't ingested a second time, or if it is, it overwrites what was previously there.
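
To make the mismatch concrete, here is a small sketch that pulls the counts out of log lines like the three quoted above and compares them with the folder total. It assumes the lines have exactly the wording shown in this post, and that the "fetched" number is what should end up in the folder; both are assumptions on my part, and `reconcile` is a hypothetical helper, not part of ePADD.

```python
import re

# Patterns copied from the wording of the log lines quoted in this post.
FETCH_RE = re.compile(r"(\d+) messages will be fetched for indexing")
DUP_RE = re.compile(r"(\d+) message\(s\) already in the archive")


def reconcile(log_text, folder_total):
    """Compare the counts reported in the log with the folder total shown after ingest."""
    fetched = sum(int(n) for n in FETCH_RE.findall(log_text))
    already = sum(int(n) for n in DUP_RE.findall(log_text))
    return {
        "fetched": fetched,
        "already_in_archive": already,
        "folder_total": folder_total,
        # Messages the log says were fetched but that the folder does not show.
        "unexplained": fetched - folder_total,
    }
```

With the three lines above and a folder total of 117, `unexplained` comes out as 1, which is the message I can't account for from the log alone.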

You mentioned that "only 117 would show up in the folder total" and that there are 118 messages. ePADD considers one of the messages a duplicate and shows 117.
What do you mean by "The issue is that epadd doesn't show 1 duplicate message"?

Could you tell us where you got the "EmailFetcherThread INFO" lines?
ePADD shows duplicate messages at http://localhost:9099/epadd/report

ePADD is a software package developed by Stanford University's Special Collections & University Archives that supports archival processes around the appraisal, ingest, processing, discovery, and delivery of email archives.