Call Search
     

New to Ham Radio?
My Profile

Community
Articles
Forums
News
Reviews
Friends Remembered
Strays
Survey Question

Operating
Contesting
DX Cluster Spots
Propagation

Resources
Calendar
Classifieds
Ham Exams
Ham Links
List Archives
News Articles
Product Reviews
QSL Managers

Site Info
eHam Help (FAQ)
Support the site
The eHam Team
Advertising Info
Vision Statement
About eHam.net

   Home   Help Search  
Pages: [1]   Go Down
  Print  
Author Topic: Figuring out LOTW database indices  (Read 716 times)
N3QE
Member

Posts: 2422




Ignore
« on: December 12, 2012, 08:21:02 PM »

Others here have been making comments/speculation as to what database engine and/or layout is at the core of LOTW.

There are some things that LOTW seems to be very efficient at, and must indicate that there's a substantial number of keys available for searches.

e.g. I have 30000+ QSO's in my LOTW account, and 14000+ QSL's.

If I search for my "most recent QSL's", this search completes in 0.001 second. Pretty decent. Obviously some combination of my callsign with an ordered date field (QSL_DATE) is an important index or key.

If I search for my "most recent QSO's", this also completes in 0.001 second. Pretty decent. Obviously some combination of my callsign with an ordered date field (QSO_DATE) is an important index or key.

If I search for QSO's of mine, that were between Nov 1 2010 and Nov 7 2010, the search comes back fast, again 0.001 second ballpark. OK, it seems that searching for QSO's a few years old is just as fast as searching for my most recent QSO's.

If I search for my most recent QSO's with 4O3A, this completes in 0.013 seconds. OK, this is a little less efficient. But still not so bad. If I search for my most recent confirmed QSL's with 4O3A, 0.019 seconds. Again not so bad.

If I limit my search to QSO's with 4O3A between 13:00 and 14:00 on May 27 2010, this comes back real fast. 0.001 seconds. There is one QSO in that range. If I search between 12:00 and 13:00 on that day, this takes a little longer, 0.002 seconds, and returns no QSO's. I think the search result is correct, moderately interesting that it takes a little longer to find no record than it does to find one record, but still not bad.

If I search for QSO's with 4O3A between 13:00 and 14:00 on 15M  CW on May 27 2010, this comes back real fast too, 0.001 seconds.

If I do a search for all my QSO's confirmed with Afghanistan, this comes back real fast. 0.0008 seconds. (2 QSO/QSL's in result). Obviously there's some index to help make lookups by DXCC entitiy real fast too.

My conclusion, is that as far as looking up QSO's, the LOTW QSO/QSL database seems to have a lot of indices that make the most common queries (including the queries that would have to be done on every insert to establish "yes QSL confirmed" or "no QSL not yet confirmed") very lightweight. This seems to be pretty good design.

Yet... LOTW is very very slow inserting new records into databases. Others here have noticed that the insert rate from new logs, is as low as a few QSO's per second. Maybe there's some really large number of indices that it maintains, a high cost at insert time, but making lookups later very efficient. Maybe there's a very large transaction-based logging overhead on doing any insert, that is not a cost at all on a simple lookup. Maybe inserts and writes into the database involve large number of expensive table locks. If there were only one inserter at a time, some databases would let you run lean and not have to do expensive table locks on inserts. But others would pooh-pooh this lean design and insist that tables do have to be locked on any insert. There may be some large number of tables that I'm unaware of, tying all the records and searches together in very efficient ways, that all have to be locked on an insert. Maybe there's a table of everyone's DXCC record, maybe there's a table of everyone's WAS record, maybe there's a table of everyone's WPX record, and on any QSO/QSL all these are locked. Maybe that could add up pretty fast.

Wow, armchair quarterback database design is fun. Too bad I can't figure out why inserts take so long. If I was doing LOTW with  a modern database I could do an EXPLAIN on the costly parts of each database action to help figure this out. As it is, just from running the queries that would seem to be necessary to do QSO insertion with a QSL time-range check, LOTW seems pretty dang good.
Logged
KY6R
Member

Posts: 3291


WWW

Ignore
« Reply #1 on: December 12, 2012, 08:53:26 PM »

N3QE:

Excellent analysis, and I agree that the indexing sure helps query performance and degrades bulk data loading. You must be a DBA (I have been one since Oracle V3), and have also supported PostgreSQL, MySQL and Teradata - and now HBase.

I tuned ETL at Lithium Technologies where it took > 24 hours to load the previous customers day worth of data into a data warehouse (MySQL with silly MyISAM - great for query speed - awful for data loading - couldn't run parallel loads).

To fix their problem - I horizontally partitioned the database into US / EU / ASIAPac Data Warehouses and did it by customer id (we supported multi-tennant customer databases in the cloud.

Then I multithreaded the ETL "within process" using Pentaho PDI.

Dropped the load time from > 24 hours to 10 hours. Then moved them off MySQL to HBase - a poor mans Teradata (hi hi).

This stuff costs a lot of money and time to do though.

I'd love to know the LOTW architecture.
Logged
Pages: [1]   Go Up
  Print  
 
Jump to:  

Powered by MySQL Powered by PHP Powered by SMF 1.1.11 | SMF © 2006-2009, Simple Machines LLC Valid XHTML 1.0! Valid CSS!