Author Topic: Interesting LoTW Data  (Read 12884 times)
N7SMI
Member
Posts: 366
« on: December 14, 2012, 02:45:47 PM »

Out of curiosity, I have logged hourly snapshots of LoTW data (both the queue status and front page statistics) over the last 48 hours. Here are some items of interest:

QSO Rate
In 48 hours, LoTW added 373,641 QSOs, an average of 7,784 per hour, or 2.16 per second. Interestingly, the rate of new QSOs added varied greatly, from a minimum of 1,768 per hour (0.5 per second) to a maximum of 19,544 per hour (5.4 per second).

At the current processing rate it would take LoTW 48 days to process the 9 million QSOs currently in the queue, assuming no duplicates. Assuming that an exorbitant 50% of QSOs currently in the queue are duplicates, it would still be approximately 24 days before logs uploaded today will be processed.
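
For anyone who wants to check the arithmetic, here is a quick back-of-the-envelope sketch in Python using the figures above (the 50% duplicate figure is purely an assumption, and the variable names are mine):

# Rough LoTW throughput math from the 48-hour sample described above.
SAMPLE_HOURS = 48
QSOS_ADDED = 373641            # new QSOs added during the sample
QUEUE_QSOS = 9000000           # QSOs currently sitting in the queue
ASSUMED_DUPE_FRACTION = 0.50   # assumption only: half the queued QSOs are duplicates

qsos_per_hour = QSOS_ADDED / SAMPLE_HOURS          # ~7,784
qsos_per_second = qsos_per_hour / 3600             # ~2.16

days_to_clear = QUEUE_QSOS / qsos_per_hour / 24    # ~48 days if nothing is a duplicate
days_if_half_dupes = days_to_clear * (1 - ASSUMED_DUPE_FRACTION)   # ~24 days

print(f"{qsos_per_hour:.0f} QSOs/hour ({qsos_per_second:.2f}/second)")
print(f"~{days_to_clear:.0f} days to clear the queue, ~{days_if_half_dupes:.0f} if half are dupes")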

Queue Size
In 48 hours, the number of log files in the queue increased 3%, from 28,303 to 29,142. The number of QSOs in the queue increased only slightly. This suggests that many folks are re-uploading very recent, smaller logs, wondering why they haven't shown up yet. This only exacerbates the slowdown.

There seems to be some queue cleanup occurring. At one point yesterday, the queue size dropped by 150,000 QSOs in one hour, but only 7,921 new QSOs were added in that same hour. Either files were dropped from the queue, or perhaps it simply processed LOTS of duplicates that hour. This pattern occurred a couple of times.

Queue Delay and Projections
The queue delay (meaning how far back the currently processed logs were uploaded) increased by 9 hours per day, to a current delay of 10 days 2.4 hours. At this rate of slowdown, logs uploaded January 1st would be processed around January 18th. Logs uploaded on February 1st (7 weeks from now, when the ARRL has indicated new hardware will be installed) would be processed around March 1st. This assumes no upgrades and that things do not get even slower as the queue size increases significantly (and these data suggest things are getting slower).
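
The projection above is just simple extrapolation; here is a minimal sketch of it in Python, assuming the delay keeps growing linearly at 9 hours per day:

from datetime import date, timedelta

# Assumptions from the observations above: ~10.1 days of delay today,
# growing by about 9 hours (0.375 days) per calendar day.
TODAY = date(2012, 12, 14)
CURRENT_DELAY_DAYS = 10.1
DELAY_GROWTH_PER_DAY = 9 / 24

def projected_processing_date(upload_date):
    # Naive projection: a log uploaded on upload_date waits roughly whatever
    # the queue delay has grown to by that date.
    days_out = (upload_date - TODAY).days
    delay_days = CURRENT_DELAY_DAYS + DELAY_GROWTH_PER_DAY * days_out
    return upload_date + timedelta(days=round(delay_days))

print(projected_processing_date(date(2013, 1, 1)))   # ~2013-01-18
print(projected_processing_date(date(2013, 2, 1)))   # ~2013-03-01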

Log File Size
LoTW processed an average of 84 log files per hour (one every 43 seconds). The average log file contained 2,356 QSOs. These numbers are greatly skewed because this morning LoTW spent 8 solid hours processing one very large log file from LP1H, which resulted in many tens of thousands of QSOs dating back at least a full year. Excluding this log, the average is around 218 QSOs per file.

Observations
Logs are clearly coming in much faster than they are currently being processed. It appears that increased backlogs result in further slowdowns.

Duplicate uploads are a significant issue. For example, while processing the single LP1H log (clearly no duplicates here), the new QSO rate was over 16000 per hour, which is twice as fast as the average. This suggests that the system probably spends much of its time processing duplicate records that don't result in QSOs. Simply removing duplicate records from the queue (meaning not queuing log files that are already in the queue) would almost certainly make a significant difference - perhaps doubling the throughput. Additionally, LoTW could do a much better job of informing users to not re-upload recent logs.
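
To illustrate what I mean by not queuing log files that are already in the queue, here is a hypothetical sketch (none of this reflects how LoTW actually works; the names are made up):

import hashlib

# Hypothetical file-level de-duplication at upload time: refuse to queue a
# log file whose contents exactly match a file already waiting in the queue.
queued_digests = set()    # digests of files currently waiting in the queue

def enqueue_upload(tq8_bytes, queue):
    # Queue an uploaded log unless an identical file is already waiting.
    digest = hashlib.sha256(tq8_bytes).hexdigest()
    if digest in queued_digests:
        return False      # identical file already queued; don't add it again
    queued_digests.add(digest)
    queue.append(tq8_bytes)
    return True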

Disclaimer
This is a small sampling of data and is likely not representative, but I believe it demonstrates the predicament LoTW is in if it is not optimized. The current processing speed is excruciatingly slow (only 2 QSOs per second!?!) and I am confident that hardware upgrades are not the solution, but only a temporary fix.

AA6YQ
Member
Posts: 1807
« Reply #1 on: December 14, 2012, 02:58:04 PM »


Disclaimer
This is a small sampling of data and is likely not representative, but I believe it demonstrates the predicament LoTW is in if it is not optimized. The current processing speed is excruciatingly slow (only 2 QSOs per second!?!) and I am confident that hardware upgrades are not the solution, but only a temporary fix.

Please explain the source of your confidence that the current LotW throughput bottleneck is not caused by the write rate of its storage subsystem. Have you examined the code? Reviewed the database schema? Instrumented the application?

AB8MA
Member
Posts: 762
« Reply #2 on: December 14, 2012, 04:28:20 PM »

Dave, I see that you might get involved in optimizing this code. I think that is fantastic.

I must admit that I have been spoiled by LoTW and miss the good old days when an upload was processed within 5 minutes.

N7SMI
Member
Posts: 366
« Reply #3 on: December 14, 2012, 05:09:02 PM »

Please explain the source of your confidence that the current LotW throughput bottleneck is not caused by the write rate of its storage subsystem. Have you examined the code? Reviewed the database schema? Instrumented the application?

None of the above. Having worked with databases of similar size and load I can simply suggest that the system shouldn't be nearly this slow even on relatively minimal hardware *IF* the system is optimized and designed properly. My intention is not to suggest fixes, but to simply point out that things do not appear to be in a very good way at the moment. I hope the hardware upgrade does the trick.

AA6YQ
Member
Posts: 1807
« Reply #4 on: December 14, 2012, 07:22:17 PM »

Please explain the source of your confidence that the current LotW throughput bottleneck is not caused by the write rate of its storage subsystem. Have you examined the code? Reviewed the database schema? Instrumented the application?

None of the above. Having worked with databases of similar size and load I can simply suggest that the system shouldn't be nearly this slow even on relatively minimal hardware *IF* the system is optimized and designed properly. My intention is not to suggest fixes, but to simply point out that things do not appear to be in a very good way at the moment. I hope the hardware upgrade does the trick.

If you've seriously worked on any database system, then you must know that as perfectly optimized as such a system may be, the ingestion rate can be limited by a sufficiently slow write speed.

N7SMI
Member
Posts: 366
« Reply #5 on: December 14, 2012, 07:34:06 PM »

If you've seriously worked on any database system, then you must know that as perfectly optimized as such a system may be, the ingestion rate can be limited by a sufficiently slow write speed.

Of course. But you have to admit that 2.16 QSO writes per second to a database of only a couple hundred million records is very, very slow. Like I said, I truly hope that this is all that is wrong.

The bigger concern is that these data suggest that by the time the hardware upgrades occur, they may be looking at well over a month of backlog.

N3QE
Member
Posts: 2367
« Reply #6 on: December 14, 2012, 07:35:47 PM »

Duplicate uploads are a significant issue. For example, while processing the single LP1H log (clearly no duplicates here), the new QSO rate was over 16000 per hour, which is twice as fast as the average. This suggests that the system probably spends much of its time processing duplicate records that don't result in QSOs. Simply removing duplicate records from the queue (meaning not queuing log files that are already in the queue) would almost certainly make a significant difference - perhaps doubling the throughput. Additionally, LoTW could do a much better job of informing users to not re-upload recent logs.

LOTW already does a pretty good job at quickly and efficiently rejecting log files that are exact copies of previously submitted and processed files.

Typical rejection of an already submitted file takes just a few seconds, regardless of the size of the file.

e.g.

2012-11-24 13:47:32 LOTW_QSO: Processing file: msg-6709-1.tq8
2012-11-24 13:47:32 LOTW_QSO: User file: nov10.tq8
2012-11-24 13:47:32 LOTW_QSO: Certificate found for N3QE - UNITED STATES OF AMERICA (291)
2012-11-24 13:47:35 LOTW_QSO: Processing ABORTED: Log file was previously processed with this result:
2012-11-24 13:47:35 LOTW_QSO: =====================================================
2012-11-24 13:47:35 LOTW_QSO: 2012-11-20 02:18:34 LOTW_QSO: Processing file: msg-5561-1.tq8
2012-11-24 13:47:35 LOTW_QSO: 2012-11-20 02:18:34 LOTW_QSO: User file: nov10.tq8
2012-11-24 13:47:35 LOTW_QSO: 2012-11-20 02:18:34 LOTW_QSO: Certificate found for N3QE - UNITED STATES OF AMERICA (291)


I think the big problem is when 10,000 previously processed QSO's are included with 100 new QSO's. It knows that the previously processed QSO's are already in the system, but it doesn't reject them nearly as quickly as it rejects an entire file for being a duplicate.
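
In other words, what would help there is a cheap existence check on each incoming QSO before it goes through the full matching path. Purely as an illustration (this is not LoTW's schema or code, just the general idea):

# Illustrative only: skip QSOs whose de-duplication key is already on record,
# so only genuinely new QSOs take the expensive matching/insert path.
# (In a real database this would be a unique-index lookup, not an in-memory set.)

def qso_key(qso):
    # Build a de-duplication key from fields that identify a QSO.
    return (qso["call"], qso["band"], qso["mode"], qso["datetime"])

def split_new_and_seen(incoming_qsos, existing_keys):
    new, seen = [], []
    for qso in incoming_qsos:
        if qso_key(qso) in existing_keys:
            seen.append(qso)    # already processed; skip the expensive path
        else:
            new.append(qso)
    return new, seen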

AA6YQ
Member
Posts: 1807
« Reply #7 on: December 14, 2012, 09:27:15 PM »

If you've seriously worked on any database system, then you must know that as perfectly optimized as such a system may be, the ingestion rate can be limited by a sufficiently slow write speed.

Of course.


Then how can you be confident that a hardware upgrade that mitigates the write performance bottleneck "is only a temporary fix"?

W9KEY
Member
Posts: 1138
« Reply #8 on: December 15, 2012, 01:37:06 AM »

If you've seriously worked on any database system, then you must know that as perfectly optimized as such a system may be, the ingestion rate can be limited by a sufficiently slow write speed.

Of course.


Then how can you be confident that a hardware upgrade that mitigates the write performance bottleneck "is only a temporary fix"?

Are you confident the database system is perfectly optimized and that a hardware upgrade will be more than a temporary fix?

AA6YQ
Member
Posts: 1807
« Reply #9 on: December 15, 2012, 10:34:20 PM »

If you've seriously worked on any database system, then you must know that as perfectly optimized as such a system may be, the ingestion rate can be limited by a sufficiently slow write speed.

Of course.


Then how can you be confident that a hardware upgrade that mitigates the write performance bottleneck "is only a temporary fix"?

Are you confident the database system is perfectly optimized and that a hardware upgrade will be more than a temporary fix?

I did not say that LotW is perfectly optimized; I said that even if a system were perfectly optimized, its performance could still be throttled by its storage subsystem's update rate.

All transaction-oriented software systems have a performance bottleneck that limits their throughput. Eliminating that bottleneck via hardware upgrades or software improvements always reveals the next bottleneck.

The ARRL believes that the current LotW implementation's throughput is being throttled by its storage subsystem's update rate. They have ordered a significant upgrade for this subsystem. This upgrade should eliminate the current bottleneck, and thus be permanent rather than temporary. However, the new bottleneck may not allow throughput to rise to an acceptable level; if that's the case, it will be necessary to repeat the process of identifying and eliminating bottlenecks until acceptable throughput is attained.

In summary, the storage subsystem upgrade is necessary, but may or may not be sufficient. We'll know in 4-6 weeks.

     73,

         Dave, AA6YQ

W9KEY
Member
Posts: 1138
« Reply #10 on: December 15, 2012, 10:57:27 PM »

Thanks for the reply.  The point of my question is only to emphasize that until the software is optimized, upgrading the hardware may or may not solve the issue. Given that the hardware is weeks away, it seems to me to make sense to get the software vetted now rather than pushing back the problem 4 to 6 weeks should the hardware upgrade not do the trick. 

Admittedly I am outside my expertise here, but from what I have read from those in-the-know, getting the software up-to-speed seems crucial -- especially given that it sounds like the system is pretty old and things change at a fast pace in this field.

WD4ELG
Member
Posts: 877
« Reply #11 on: December 16, 2012, 01:39:25 AM »

N3QE, I don't see these types of status messages.  Can we plz email offline so I can get schooled on how to see queue status of logs submitted previously?  Thanks

AA6YQ
Member
Posts: 1807
« Reply #12 on: December 16, 2012, 08:47:24 AM »

Thanks for the reply.  The point of my question is only to emphasize that until the software is optimized, upgrading the hardware may or may not solve the issue. Given that the hardware is weeks away, it seems to me to make sense to get the software vetted now rather than pushing back the problem 4 to 6 weeks should the hardware upgrade not do the trick. 

Admittedly I am outside my expertise here, but from what I have read from those in-the-know, getting the software up-to-speed seems crucial -- especially given that it sounds like the system is pretty old and things change at a fast pace in this field.

Anyone who claims to be "in the know", but hasn't reviewed LotW's architecture and database schema, isn't.

If eliminating the write performance bottleneck isn't sufficient, we'll know soon enough.

N3QE
Member
Posts: 2367
« Reply #13 on: December 16, 2012, 09:08:38 AM »

N3QE, I don't see these types of status messages.  Can we plz email offline so I can get schooled on how to see queue status of logs submitted previously?  Thanks

I don't think it's a secret. You can see the results of processing each file you submit: log in to LOTW, go to "Your Account", then "Account Status", then "Your Activity", and for each processed log you can see the outcome (including QSO's and how long it took to process).

Another thing is the recently added status page that shows statistics about not-yet-processed logs: http://www.arrl.org/logbook-queue-status

W9KEY
Member
Posts: 1138
« Reply #14 on: December 16, 2012, 12:36:25 PM »

If eliminating the write performance bottleneck isn't sufficient, we'll know soon enough.

A 4 to 6 week delay to try a potential fix on a system that is already nearly 2 weeks backlogged is a lot of time in today's world.
I suppose being state-of-the-art is no longer a goal for amateur radio, but if there is even a 25% chance the problem is the software, and the software should be tweaked regularly anyway -- doesn't it make sense to optimize it in the interim?