Actions:
|
2013-07-09 08:01 AEST by Glen Starrett - Add a new option for rlog to output
just the unique tags, to be used in place of sending back the full output of
rlog -h and then parsing it on the client, for the benefit of very large
repositories.
The TortoiseCVS function "refresh list" (for a list of available tags) with the
'search subfolders' option checked runs "cvs ... rlog -h MODULE" then parses
that output for tag names, then sorts them to get unique entries. The
additional processing takes roughly 30 to 45 times longer than
simply running the command at a command line (e.g. 9 minutes in TCVS vs 12
seconds at the command line for a large-ish number of files, or 5 hours vs 10
minutes for a very large repository).
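As a rough illustration of the tag extraction described above, here is a minimal Python sketch. It assumes the usual `rlog -h` header layout, where tags appear as indented `NAME: revision` lines under a `symbolic names:` line. The function name `unique_tags` and the sample data are hypothetical, and this is not TortoiseCVS's actual code.

```python
import io

def unique_tags(rlog_lines):
    """Collect unique tag names from rlog -h style output, one line at a time."""
    tags = set()
    in_symbols = False
    for line in rlog_lines:
        if line.startswith("symbolic names:"):
            in_symbols = True
            continue
        if in_symbols:
            # Tag lines are indented and look like "<TAB>NAME: revision".
            if line[:1] in (" ", "\t") and ":" in line:
                tags.add(line.split(":", 1)[0].strip())
            else:
                # First non-indented line ends the tag list for this file.
                in_symbols = False
    return sorted(tags)

# Hypothetical sample input covering two files that share one tag.
sample = """\
RCS file: /repo/mod/a.c,v
head: 1.3
symbolic names:
\tREL_1_0: 1.2
\tdev-branch: 1.1.0.2
keyword substitution: kv
=============
RCS file: /repo/mod/b.c,v
head: 1.1
symbolic names:
\tREL_1_0: 1.1
keyword substitution: kv
"""

print(unique_tags(io.StringIO(sample)))
```

Because the input is consumed as a stream and the tags are kept in a set, memory use depends on the number of distinct tags rather than on the total length of the output.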
Moving this processing back to the server should improve the performance
greatly: reduce the amount of data sent back to the client and reduce
processing on the client. |
|
2013-07-09 08:05 AEST by Glen Starrett - Created an attachment (id=2645)
Log files and test data
I spent some time (wow... 2 hours) this morning collecting data and traces on
this, but I'm pretty much at the conclusion that he's got a TON of data and
should use that "update list" button very, very sparingly (or switch to EVS).
The output from the rlog command that I'm using is just shy of a million lines
(978,974). From the command line it takes ~16 seconds to save it off to a
file, or 206 seconds to output it to the screen. I thought about that (after
doing other stuff) and timed just "type rlog-out.txt" to the console; that
covers about 90% of the difference in those times. In other words, it just
takes time to output that much data.
Now, Tortoise isn't outputting it. It's scanning it for tags -- so it should
be more efficient than output to console, but not as good as redirected to
file. It isn't in that range; it's actually slower than the output-to-console
timing. My guess is that its scanning method isn't as efficient as it could be
and the list storage is POOR (I think it takes disproportionately longer as the
list size grows, which is why the customer is seeing a 32:1 TCVS to CLI-to-file
time ratio vs 16:1 in my test).
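To illustrate the guess about list storage, here is a small Python sketch comparing de-duplication through a linear list scan with de-duplication through a set. It is only an assumption that the client does something like the list version. The function names are hypothetical and the timings will vary by machine.

```python
import time

def dedupe_with_list(names):
    """Keep unique names with a linear membership scan (quadratic overall)."""
    seen = []
    for name in names:
        if name not in seen:  # scans the whole list on every lookup
            seen.append(name)
    return sorted(seen)

def dedupe_with_set(names):
    """Keep unique names with hashing (roughly linear, plus the final sort)."""
    return sorted(set(names))

def time_call(fn, names):
    """Return the wall-clock seconds taken by fn(names)."""
    start = time.perf_counter()
    fn(names)
    return time.perf_counter() - start

if __name__ == "__main__":
    for n in (1000, 2000, 4000):
        # Each synthetic tag appears twice, as it might across several files.
        names = [f"TAG_{i}" for i in range(n)] * 2
        print(n, time_call(dedupe_with_list, names), time_call(dedupe_with_set, names))
```

If the list version's time roughly quadruples each time the input doubles while the set version's time roughly doubles, that would be consistent with the TCVS to CLI ratio growing with repository size.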
That's my guess anyway. What do you think? Should I increase the size of my
test set to correspond more closely to the customer's timing, to see whether
the ratio holds or increases? If so, I need a better way to time the TCVS
"update list" window, because watching it is TEDIOUS.
Notes follow, traces attached. The trace files correspond to the numbers next
to the "trace" entries in the notes below.
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>
MH: Picking up again on the slow performance of TCVS 'fetch list' (rlog
command). Running on the VM (just 1 pass each; I'm not trying to establish
statistical regularity here, but I did throw away the first run to allow the
repository to be somewhat loaded in cache).
Working with an output length of 978,974 lines.
Test                                      Trace files    Time
CLI redirected to file                                   16.3 sec
CLI redirected to file (no focus on VM)                  15.6 sec
CLI to screen                                            206.5 sec
CLI traced (to file)                      2412 & 2624    15.8 sec
CVS Suite TortoiseCVS regular                            258 sec
CVS Suite TortoiseCVS traced              2440 & 1672    655 sec
OSS TortoiseCVS regular                                  220 sec
Start 7:42:00am end 7:46:18am >> 4min 18s >> 4*60+18=258 sec
Start 7:59:00am end 8:09:55am >> 10min 55s >> 10*60+55=655 sec
NOTE: TCVS is also using the "-x" option on the command, which encrypts all
net traffic. I haven't set that option in the prefs (it's at default values),
so I've no idea why it is on. It's not included in my CLI tests, so I need
to re-check those to see if it affects the timing.
CLI Redirected to file + encrypted + traced 17.28s
CLI Redirected to file + encrypted 19.08s, 17.5s
CLI to screen + encrypted 214.9s
>> Doesn't seem significant when compared to above.
Uninstalling CVS Suite / Installing OSS CVS Suite
>> Saved VM state with bookmark, reverted to install OSS TCVS.
Installing TortoiseCVS 1.12.5.
Start 8:36:00am end 8:39:40am >> 3min 40 sec >> 3*60+40=220 sec
The results seem to be tied more to "doing something on the client end" than
to straight output. I'm wondering: if I run the output to a file, then do a
type on the file, will that have a total that looks like the 'output to screen'
results? E.g.:
cvs -d … rlog mingw > log.txt && type log.txt
Does that equal the "CLI to screen" thing?
Type rlog-out.txt 161.7s
Total for CLI to file + type output 16+161.7=177.7
Difference in totals 177.7-206.5=-28.8
>> Looks like MOST of the difference in time is due to plain Windows console
output delays.
TCVS to CLI ratio in TEST: 258/16=16.125
Customer tests are 5-10 min at CLI and 5.25 hours with TCVS.
5.25 hours to seconds: 5.25*60*60=18,900
TCVS to CLI ratio by Customer: 18900/600=31.5
<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<
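The arithmetic in the notes can be re-checked with a short Python sketch. It uses only the timings recorded above. The helper `to_seconds` and the variable names are made up for illustration, and the customer's CLI time is taken as the 10-minute upper bound, as in the notes.

```python
def to_seconds(hours=0, minutes=0, seconds=0):
    """Convert an hours/minutes/seconds duration to seconds."""
    return hours * 3600 + minutes * 60 + seconds

# Test VM: CVS Suite TortoiseCVS vs CLI redirected to file (rounded to 16 s).
test_tcvs = to_seconds(minutes=4, seconds=18)
test_ratio = test_tcvs / 16

# Customer: 5.25 hours in TCVS vs 10 minutes at the CLI.
customer_tcvs = to_seconds(hours=5.25)
customer_ratio = customer_tcvs / to_seconds(minutes=10)

# Share of the screen-vs-file gap that plain console output ("type") explains.
cli_screen, cli_file, type_time = 206.5, 16.3, 161.7
share = type_time / (cli_screen - cli_file)

print(test_tcvs, test_ratio)
print(customer_tcvs, customer_ratio)
print(round(share, 2))
```

With the unrounded 16.3 second CLI figure, console output accounts for about 85% of the gap between the to-screen and to-file runs.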
|