From 99bd27ebe745c56f6a43ce342d5f803a8ab9e9d0 Mon Sep 17 00:00:00 2001
From: Junio C Hamano
Date: Thu, 16 Feb 2006 01:32:23 -0800
Subject: [PATCH] Update TOpic script to show how old they are.

Signed-off-by: Junio C Hamano
---
 ClonePlus.txt      | 125 +++++++++++++++++++++++++++++++++++++++++++++++++++++
 ResettingPaths.txt |  64 +++++++++++++++++++++++++++
 TO                 |  16 ++++---
 3 files changed, 199 insertions(+), 6 deletions(-)
 create mode 100644 ClonePlus.txt
 create mode 100644 ResettingPaths.txt

diff --git a/ClonePlus.txt b/ClonePlus.txt
new file mode 100644
index 00000000..7417623a
--- /dev/null
+++ b/ClonePlus.txt
@@ -0,0 +1,125 @@
+From: Junio C Hamano
+Subject: Re: Make "git clone" less of a deathly quiet experience
+Date: Sun, 12 Feb 2006 19:36:41 -0800
+Message-ID: <7v4q3453qu.fsf@assigned-by-dhcp.cox.net>
+References:
+	<7vwtg2o37c.fsf@assigned-by-dhcp.cox.net>
+	
+	<1139685031.4183.31.camel@evo.keithp.com> <43EEAEF3.7040202@op5.se>
+	<1139717510.4183.34.camel@evo.keithp.com>
+	<46a038f90602121806jfcaac41tb98b8b4cd4c07c23@mail.gmail.com>
+Content-Type: text/plain; charset=us-ascii
+Cc: Keith Packard , Andreas Ericsson ,
+	Linus Torvalds ,
+	Git Mailing List ,
+	Petr Baudis
+Return-path:
+In-Reply-To: <46a038f90602121806jfcaac41tb98b8b4cd4c07c23@mail.gmail.com>
+	(Martin Langhoff's message of "Mon, 13 Feb 2006 15:06:42 +1300")
+
+Martin Langhoff writes:
+
+> +1... there should be an easy-to-compute threshold trigger to say --
+> hey, let's quit being smart and send this client the packs we got and
+> get it over with. Or perhaps a client flag so large projects can
+> recommend that users do their initial clone with --gimme-all-packs?
+
+What upload-pack does boils down to:
+
+ * find out the latest of what the client has, and what it asked for.
+
+ * run "rev-list --objects ^client ours" to make a list of
+   objects the client needs.  The actual command line has multiple
+   "clients" to exclude objects that need not be sent, and
+   multiple "ours" to include the refs asked for.  When you are doing
+   a full clone, ^client is empty and ours is essentially
+   --all.
+
+ * feed that output to "pack-objects --stdout" and send out
+   the result.
+
+If you run this command:
+
+	$ git-rev-list --objects --all |
+	  git-pack-objects --stdout >/dev/null
+
+It prints progress messages.  The phases of operation are:
+
+	Generating pack...
+	Counting objects XXXX...
+	Done counting XXXX objects.
+	Packing XXXXX objects.....
+
+Phase (1).  From "Generating pack..." up to "Done counting
+XXXX objects.", the time is spent in rev-list, listing all
+the objects to be sent out.
+
+Phase (2).  After that, it decides which object to delta
+against which other object, while twenty or so dots are
+printed after "Packing XXXXX objects." (see the #git IRC log from a
+couple of days ago; Linus describes how pack building works).
+
+Phase (3).  After the dots stop, the program becomes silent.
+That is where it actually does delta compression and write-out.
+
+You will notice that quite a lot of time is spent in every
+phase.
+
+There is an internal hook to create a full-repository pack inside
+upload-pack (which is what runs on the other end when you run
+fetch-pack or clone-pack), but it works slightly differently
+from what you are suggesting, in that it still tries to do the
+"correct" thing.  It still runs "rev-list --objects --all", so
+"dangling objects" are never sent out.
+
+We could cheat in all phases to speed things up, at the expense
+of sending excess objects.  So let's pretend we decided to
+treat every pack in .git/objects/pack/pack-* (and the ones
+found in alternates as well) as containing objects of interest
+for the cloner.
+
+(1) This part unfortunately cannot be totally eliminated.  By
+    assuming all packs are interesting, we could use the object
+    names from the pack indices, which is a lot cheaper than
+    rev-list object traversal.  We still need to run rev-list
+    --objects --all --unpacked to pick up the loose objects,
+    which we cannot learn about by looking at the pack indices,
+    to cover the rest.
+
+    This, however, needs to be done in conjunction with the second
+    phase change.  pack-objects relies on the pathname hints in the
+    rev-list --objects output to group blobs and trees with
+    the same pathnames together, and that greatly affects the
+    packing efficiency.  Unfortunately the pack index does not have
+    that information -- it records neither type nor pathname.
+    Type is relatively cheap to obtain, but pathnames for blob
+    objects are inherently unavailable.
+
+(2) This part can be mostly eliminated for already packed
+    objects, because we have already decided to cheat by sending
+    everything, so we can just reuse how objects are deltified
+    in existing packs.  It still needs to be done for the loose
+    objects we collected to fill the gap in (1).
+
+(3) This can also be sped up by reusing what is already in
+    packs.  The pack index records the starting (but not the ending)
+    offset of each object in the pack, so we can sort by offset to
+    find out which part of the existing pack corresponds to which
+    object, and reorder the objects in the final pack.  This
+    needs to be done somewhat carefully to preserve the locality
+    of objects (again, see the #git log).  The deltifying and
+    compressing of loose objects cannot be avoided.
+
+    While we are writing things out in (3), we need to keep
+    track of a running SHA1 sum of what we write out so that we
+    can fill in the correct checksum at the end, but I am
+    guessing that is relatively cheap compared to the
+    deltification and compression cost we are currently paying
+    in this phase.
+
+NB. In the #git log, Linus made it sound like I am clueless
+about how packs are generated, but if you check commit 9d5ab96,
+the "recency of delta is inherited from base", one of the tricks
+that have a big performance impact, was done by me ;-).
+
+
diff --git a/ResettingPaths.txt b/ResettingPaths.txt
new file mode 100644
index 00000000..d101814e
--- /dev/null
+++ b/ResettingPaths.txt
@@ -0,0 +1,64 @@
+From: Junio C Hamano
+Subject: Resetting paths
+Date: Thu, 09 Feb 2006 20:40:15 -0800
+Message-ID: <7vlkwjzv0w.fsf@assigned-by-dhcp.cox.net>
+Content-Type: text/plain; charset=us-ascii
+Return-path:
+
+While working on the "assume unchanged" git series, I found one
+thing missing from the current set of tools.
+
+While I worked on the parts of the system that deal with the cached
+lstat() information, I needed a way to debug them, so I hacked the
+ls-files -t option to show entries marked as "always matches the
+index" with lowercase tag letters.  This was primarily a
+debugging-aid hack.
+
+Then I committed the whole thing with "git commit -a" by
+mistake.  To rewind the HEAD to its pre-commit state, I can
+say "git reset --soft HEAD^", but after doing that, I want
+to "un-update" the index so that ls-files.c matches the pre-commit
+HEAD.
+
+"git reset --mixed" is a heavy-handed tool for that.  It reads
+the entire index from the HEAD commit without touching the
+working tree, so I would then need to add the other modified paths
+back with "git update-index".
+
+The low-level voodoo to do so for this particular case is this
+one-liner:
+
+	git ls-tree HEAD ls-files.c | git update-index --index-info
+
+Have people found themselves in a similar situation?  This
+could take different forms.
+
+ * you did "git update-index" on the wrong path.  This is my
+   example and the above voodoo is a recipe for recovery.
+
+ * you did "git add" on the wrong path and you want to remove it.
+   This is easier than the above:
+
+	git update-index --force-remove path
+
+ * you did the above recovery from "git add" on the wrong path,
+   and you want to add it again.  The same voodoo would work in
+   this case as well.
+
+	git ls-tree HEAD path | git update-index --index-info
+
+We could add "git reset path..." to reduce typing for the above,
+but I am wondering if it is worth it.
+
+BTW, this shows how "index centric" git is.  With other SCMs that
+have only the last commit and the working tree files, you do not
+have to worry about any of these things, so the index might appear
+to be just a nuisance.  But if you do not have any "registry of
+paths to be committed", you cannot do a partial commit like the one
+I did above ("commit changes to all files other than
+ls-files.c") without listing all the paths to be committed, or
+falling back on CVS-style "one path at a time" commits, which break
+atomicity; so not having an index has a drawback as well.
+
+
+
diff --git a/TO b/TO
index 6fb39243..e8e17f49 100755
--- a/TO
+++ b/TO
@@ -37,7 +37,7 @@ sed -n \
 	-e '/^[^\/][^\/]\//p' |
 while read topic
 do
-	rebase= done= not_done= trouble=
+	rebase= done= not_done= trouble= date=
 
 	# (1)
 	only_next_1=`git-rev-list ^master "^$topic" ${next} | sort`
@@ -55,16 +55,14 @@ do
 
 	# (2)
 	not_in_master=`
-		git-rev-list --pretty=oneline ^master "$topic" |
-		sed -e 's/^[0-9a-f]* //'
+		git-rev-list ^master "$topic"
 	`
 	test -z "$not_in_master" &&
 	done="${LF}Fully merged -- delete."
 
 	# (3)
 	not_in_next=`
-		git-rev-list --pretty=oneline ^${next} "$topic" |
-		sed -e 's/^[0-9a-f]* / - /'
+		git-rev-list --pretty=oneline ^${next} "$topic"
 	`
 	if test -n "$not_in_next"
 	then
@@ -72,6 +70,12 @@ do
 		then
 			trouble="${LF}### MODIFIED AFTER COOKED ###"
 		fi
+		last=`expr "$not_in_next" : '\([0-9a-f]*\) '`
+		date=`
+			git-rev-list -1 --pretty "$last" |
+			sed -ne 's/^Date: *\(.*\)/ (\1)/p'
+		`
+		not_in_next=`echo "$not_in_next" | sed -e 's/^[0-9a-f]* / - /'`
 		not_done="${LF}Still not merged in ${next}$rebase.$LF$not_in_next"
 	elif test -n "$done"
 	then
@@ -80,7 +84,7 @@ do
 		not_done="${LF}Up to date."
 	fi
 
-	echo "*** $topic ***$trouble$done$not_done"
+	echo "*** $topic ***$date$trouble$done$not_done"
 
 	if test -z "$trouble$not_done" &&
 	   test -n "$done" &&
-- 
2.11.0
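The "cheat" proposed in ClonePlus.txt for phase (1) amounts to taking the object names recorded in every local pack index and adding the loose objects reported by `rev-list --objects --all --unpacked`. A minimal sketch, assuming current git command names (`git show-index`, `git rev-list`, `git rev-parse`); the function name `cheat_object_list` is hypothetical, and packs reached through alternates are ignored.

```shell
#!/bin/sh
# cheat_object_list (hypothetical helper)
# Lists every object the "send all packs" cheat would ship.  Packed
# objects come straight from the pack indices (no type, no pathname);
# loose objects come from rev-list, which still supplies pathname hints.
# Assumption: only local packs are scanned; alternates are not handled.
cheat_object_list () {
	git_dir=$(git rev-parse --git-dir) || return
	for idx in "$git_dir"/objects/pack/pack-*.idx
	do
		# The glob stays unexpanded when there are no packs.
		test -f "$idx" || continue
		# show-index prints "offset sha1" (v1) or "offset sha1 (crc)" (v2);
		# keep only the object name.
		git show-index <"$idx" | cut -d' ' -f2
	done
	# Reachable objects that are not stored in any pack.
	git rev-list --objects --all --unpacked
}
```

The list can be piped into `git pack-objects --stdout` just like the rev-list output in the message. As the message points out, the packed entries carry no pathname hints, so pack-objects has less to go on when grouping them for delta search.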
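Item (3) of ClonePlus.txt relies on the pack index recording only each object's starting offset. After sorting by offset, an object's extent runs up to the next object's start, and the last object ends where the pack's 20-byte SHA-1 trailer begins. A minimal sketch of that computation, assuming input lines in the format `git show-index` prints; the function name `pack_extents` is hypothetical. It only finds the byte ranges and makes no attempt at the locality-preserving reordering the message warns about.

```shell
#!/bin/sh
# pack_extents PACKSIZE (hypothetical helper)
# Reads "offset sha1 [(crc)]" lines on stdin, as printed by
# `git show-index < pack-X.idx`, and prints "sha1 offset length"
# for each object in pack order.
# Assumption: the pack ends with a 20-byte SHA-1 trailer, so the last
# object's data stops at PACKSIZE - 20.
pack_extents () {
	sort -n -k1,1 |
	awk -v data_end="$(( $1 - 20 ))" '
		# Each object ends where the next one (in offset order) starts.
		NR > 1 { print prev_sha, prev_off, $1 - prev_off }
		{ prev_off = $1; prev_sha = $2 }
		# The last object ends at the trailer.
		END { if (NR > 0) print prev_sha, prev_off, data_end - prev_off }
	'
}
```

For a real pack this would be run as `git show-index < "$idx" | pack_extents "$(wc -c < "${idx%.idx}.pack")"`. The lengths should agree with the size-in-packfile column of `git verify-pack -v`, which makes a convenient cross-check.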
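The two recipes in ResettingPaths.txt can be folded into the "git reset path..." the message wonders about. If HEAD has the path, its `ls-tree` entry is fed back through `update-index --index-info`. If it does not, the path was added by mistake and is removed from the index. A minimal sketch, assuming a POSIX shell and current dash-less command names; the function name `reset_paths` is hypothetical. It is meant for individual files, as in the message.

```shell
#!/bin/sh
# reset_paths PATH... (hypothetical helper)
# Makes the index entry for each PATH match HEAD again, leaving the
# working tree alone.  Assumption: each PATH names a file; a directory
# in HEAD would produce a tree entry, which the index cannot hold.
reset_paths () {
	for path
	do
		# Prints "mode type sha1<TAB>path" if PATH exists in HEAD, else nothing.
		entry=$(git ls-tree HEAD "$path") || return
		if test -n "$entry"
		then
			# The voodoo from the message: put HEAD's entry back.
			printf '%s\n' "$entry" | git update-index --index-info
		else
			# Not in HEAD (a mistaken "git add"): drop it from the index.
			git update-index --force-remove -- "$path"
		fi
	done
}
```

Later versions of git gained `git reset -- <path>`, which does the same job natively.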
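In hunk (3) of the TO change, the order of operations matters. `expr` needs the object names still present in the `--pretty=oneline` output to pick out the first (newest) commit, so the `sed` that turns them into " - " bullets now runs afterwards. The steps can be isolated into small functions and exercised on canned text. This is a minimal sketch in which every function name is hypothetical; only `topic_age` needs a repository.

```shell
#!/bin/sh
# Hypothetical helpers isolating the steps the patch adds to TO.

# newest_commit LIST
# LIST is `git rev-list --pretty=oneline` output ("sha1 subject" lines,
# newest first by default); print the object name on its first line.
newest_commit () {
	expr "$1" : '\([0-9a-f]*\) '
}

# date_suffix
# Reduces `git rev-list -1 --pretty COMMIT` output on stdin to " (DATE)".
date_suffix () {
	sed -ne 's/^Date: *\(.*\)/ (\1)/p'
}

# as_bullets LIST
# Replaces each leading object name with " - ", which the patched TO
# does only after the newest name has been extracted.
as_bullets () {
	printf '%s\n' "$1" | sed -e 's/^[0-9a-f]* / - /'
}

# topic_age COMMIT (needs a repository): " (DATE)" for COMMIT.
topic_age () {
	git rev-list -1 --pretty "$1" | date_suffix
}
```

On current git, `git log -1 --format=' (%ad)' COMMIT` should print the same author-date string without the `sed` step.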