1. sync everything except for repomd.xml
2. then sync repomd.xml files only, and invalidate caches
3. gently wait a bit to give current downloads a chance
4. delete old metadata, shouldn't be needed
---
 roles/s3-mirror/files/s3-sync-path.sh | 99 ++++++++++++++-------------
 1 file changed, 53 insertions(+), 46 deletions(-)
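The four steps above can be sketched as a dry-run bash function that prints the commands in the order the patch runs them. The bucket, distribution id and `/pub/demo/` path are placeholders, the long exclude list is omitted, and reading `MAX_CACHE_SEC`/`DNF_GENTLY_TIMEOUT` from the environment is only a hypothetical way to make them configurable (the patch sets them as plain variables).

```shell
#!/usr/bin/env bash
# Dry-run sketch of the four-phase order: run() prints instead of executing.
# Bucket, distribution id and /pub/demo/ are placeholders; the long exclude
# list from the real scripts is left out for brevity.
set -euo pipefail

# Hypothetical overridable defaults (the patch hard-codes 60 and 120).
MAX_CACHE_SEC=${MAX_CACHE_SEC:-60}
DNF_GENTLY_TIMEOUT=${DNF_GENTLY_TIMEOUT:-120}

run() { printf '%s\n' "$*"; }

publish() {
    local path=$1 bucket=$2 dist_id=$3    # path must end in '/', e.g. /pub/demo/
    local src="/srv$path" dst="s3://$bucket$path"
    local sync=( aws s3 sync --no-follow-symlinks )

    # 1. Upload everything except repomd.xml and delete nothing, so the old
    #    repomd.xml still points at files that exist.
    run "${sync[@]}" --exclude "*/repomd.xml" "$src" "$dst"

    # 2. Upload only repomd.xml with a short max-age, then invalidate it.
    run "${sync[@]}" --exclude "*" --include "*/repomd.xml" \
        --cache-control "max-age=$MAX_CACHE_SEC" "$src" "$dst"
    run aws cloudfront create-invalidation --distribution-id "$dist_id" \
        --paths "${path}repodata/repomd.xml"

    # 3. Give clients that fetched the old repomd.xml time to finish.
    run sleep $(( MAX_CACHE_SEC + DNF_GENTLY_TIMEOUT ))

    # 4. Only now delete files that the new metadata no longer references.
    run "${sync[@]}" --delete "$src" "$dst"
}

publish /pub/demo/ example-bucket EXAMPLEDISTID
```

With the defaults, step 3 waits 60 + 120 = 180 seconds, instead of the fixed `sleep 600` that `s3.sh` used before the patch.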
1. sync everything except for repomd.xml
2. then sync repomd.xml files only, and invalidate caches
3. gently wait a bit to give current downloads a chance
4. delete outdated RPMs and metadata, shouldn't be needed

Also make the sleep/cache configurable.
---
 roles/s3-mirror/files/s3-sync-path.sh | 99 ++++++++++++++-------------
 roles/s3-mirror/files/s3.sh           | 19 +++--
 2 files changed, 65 insertions(+), 53 deletions(-)
diff --git a/roles/s3-mirror/files/s3-sync-path.sh b/roles/s3-mirror/files/s3-sync-path.sh
index 79b4d63eb..5a414e3ad 100644
--- a/roles/s3-mirror/files/s3-sync-path.sh
+++ b/roles/s3-mirror/files/s3-sync-path.sh
@@ -9,58 +9,65 @@ if [[ "$1" == "" ]] || [[ $1 != /pub* ]] || [[ $1 != */ ]]; then
     exit 1
 fi
 
+aws_sync=( aws s3 sync --no-follow-symlinks )
+
 # first run do not delete anything or copy the repodata.
-CMD1="aws s3 sync \
-    --exclude */repodata/* \
-    --exclude *.snapshot/* \
-    --exclude *source/* \
-    --exclude *SRPMS/* \
-    --exclude *debug/* \
-    --exclude *beta/* \
-    --exclude *ppc/* \
-    --exclude *ppc64/* \
-    --exclude *repoview/* \
-    --exclude *Fedora/* \
-    --exclude *EFI/* \
-    --exclude *core/* \
-    --exclude *extras/* \
-    --exclude *LiveOS/* \
-    --exclude *development/rawhide/* \
-    --no-follow-symlinks \
-    --only-show-errors \
-    "
-    #--dryrun \
+exclude=(
+    --exclude "*/repodata/*"
+    --exclude "*.snapshot/*"
+    --exclude "*source/*"
+    --exclude "*SRPMS/*"
+    --exclude "*debug/*"
+    --exclude "*beta/*"
+    --exclude "*ppc/*"
+    --exclude "*ppc64/*"
+    --exclude "*repoview/*"
+    --exclude "*Fedora/*"
+    --exclude "*EFI/*"
+    --exclude "*core/*"
+    --exclude "*extras/*"
+    --exclude "*LiveOS/*"
+    --exclude "*development/rawhide/*"
+    --only-show-errors
+)
 
-# second we delete old content and also copy the repodata
-CMD2="aws s3 sync \
-    --delete \
-    --exclude *.snapshot/* \
-    --exclude *source/* \
-    --exclude *SRPMS/* \
-    --exclude *debug/* \
-    --exclude *beta/* \
-    --exclude *ppc/* \
-    --exclude *ppc64/* \
-    --exclude *repoview/* \
-    --exclude *Fedora/* \
-    --exclude *EFI/* \
-    --exclude *core/* \
-    --exclude *extras/* \
-    --exclude *LiveOS/* \
-    --exclude *development/rawhide/* \
-    --no-follow-symlinks \
-    --only-show-errors \
-    "
-    #--dryrun \
+S3_MIRROR=s3-mirror-us-west-1-02.fedoraproject.org
+DIST_ID=E2KJMDC0QAJDMU
+MAX_CACHE_SEC=60
+DNF_GENTLY_TIMEOUT=120
+
+# First run this command that syncs, but does not delete.
+# It also excludes repomd.xml.
+CMD1=( "${aws_sync[@]}" "${excludes[@]}" --exclude "*/repomd.xml" )
+
+# Next we run this command which syncs repomd.xml files. Include must precede
+# the large set of excludes. Make sure that the 'max-age' isn't too large so
+# we know that we can start removing old data ASAP.
+CMD2=( "${aws_sync[@]}" --exclude "*" --include "*/repomd.xml" "${excludes[@]}"
+       --cache-control "max-age=$MAX_CACHE_SEC" )
+
+# Then we delete old RPMs and old metadata (but after invalidating caches).
+CMD3=( "${aws_sync[@]}" "${excludes[@]}" --delete )
 
 #echo "$CMD /srv$1 s3://s3-mirror-us-west-1-02.fedoraproject.org$1"
 echo "Starting $1 sync at $(date)" >> /var/log/s3-mirror/timestamps
-$CMD1 /srv$1 s3://s3-mirror-us-west-1-02.fedoraproject.org$1
-$CMD1 /srv$1/repodata/ s3://s3-mirror-us-west-1-02.fedoraproject.org$1/repodata/
+"${CMD1[@]}" "/srv$1" "s3://$S3_MIRROR$1"
+"${CMD2[@]}" "/srv$1" "s3://$S3_MIRROR$1"
+
 # Always do the invalidations because they are quick and prevent issues
 # depending on which path is synced.
-for file in $(echo $1/repodata/* ); do
-    aws cloudfront create-invalidation --distribution-id E2KJMDC0QAJDMU --paths "$file" > /dev/null
+for file in $(echo $1/repodata/repomd.xml ); do
+    aws cloudfront create-invalidation --distribution-id $DIST_ID --paths "$file" > /dev/null
 done
-$CMD2 /srv$1 s3://s3-mirror-us-west-1-02.fedoraproject.org$1
+
+SLEEP=$(( MAX_CACHE_SEC + DNF_GENTLY_TIMEOUT ))
+echo "Ready $1 sync, giving dnf downloads ${SLEEP}s before delete, at $(date)" >> /var/log/s3-mirror/timestamps
+
+# Consider some DNF processes started downloading metadata before we invalidated
+# caches, and started with outdated repomd.xml file. Give it few more seconds
+# so they have chance to download the rest of metadata and RPMs.
+sleep $SLEEP
+
+"${CMD3[@]}" "/srv$1" "s3://$S3_MIRROR$1"
+
 echo "Ending $1 sync at $(date)" >> /var/log/s3-mirror/timestamps
diff --git a/roles/s3-mirror/files/s3.sh b/roles/s3-mirror/files/s3.sh
index c157b0cdb..df58ac153 100644
--- a/roles/s3-mirror/files/s3.sh
+++ b/roles/s3-mirror/files/s3.sh
@@ -88,6 +88,11 @@ excludes=(
     --exclude "*/updates/testing/29/*"
 )
 
+S3_MIRROR=s3-mirror-us-west-1-02.fedoraproject.org
+DIST_ID=E2KJMDC0QAJDMU
+MAX_CACHE_SEC=60
+DNF_GENTLY_TIMEOUT=120
+
 # First run this command that syncs, but does not delete.
 # It also excludes repomd.xml.
 CMD1=( "${aws_sync[@]}" "${excludes[@]}" --exclude "*/repomd.xml" )
@@ -95,14 +100,12 @@ CMD1=( "${aws_sync[@]}" "${excludes[@]}" --exclude "*/repomd.xml" )
 # Next we run this command which syncs repomd.xml files. Include must precede
 # the large set of excludes. Make sure that the 'max-age' isn't too large so
 # we know that we can start removing old data ASAP.
-CMD2=( "${aws_sync[@]}" --exclude "*" --include "*/repomd.xml" "${excludes[@]}" --cache-control max-age=300 )
+CMD2=( "${aws_sync[@]}" --exclude "*" --include "*/repomd.xml" "${excludes[@]}"
+       --cache-control "max-age=$MAX_CACHE_SEC" )
 
 # Then we delete old RPMs and old metadata (but after invalidating caches).
 CMD3=( "${aws_sync[@]}" "${excludes[@]}" --delete )
 
-S3_MIRROR=s3-mirror-us-west-1-02.fedoraproject.org
-DIST_ID=E2KJMDC0QAJDMU
-
 # Sync EPEL
 #echo $CMD /srv/pub/epel/ s3://$S3_MIRROR/pub/epel/
 echo "Starting EPEL sync at $(date)" >> /var/log/s3-mirror/timestamps
@@ -132,10 +135,12 @@ for file in $(echo /srv/pub/fedora/linux/updates/*/*/*/repodata/repomd.xml | sed
     aws cloudfront create-invalidation --distribution-id "$DIST_ID" --paths "$file"
 done
 
+SLEEP=$(( MAX_CACHE_SEC + DNF_GENTLY_TIMEOUT ))
+
 # Consider some DNF processes started downloading metadata before we invalidated
-# caches, and started with outdated repomd.xml file. Give it 10 minutes so they
-# have chance to download the rest of metadata and RPMs.
-sleep 600
+# caches, and started with outdated repomd.xml file. Give it few more seconds
+# so they have chance to download the rest of metadata and RPMs.
+sleep $SLEEP
 
 "${CMD3[@]}" /srv/pub/epel/ "s3://$S3_MIRROR/pub/epel/"
 "${CMD3[@]}" /srv/pub/fedora/ s3://$S3_MIRROR/pub/fedora/
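The patch also replaces the old `CMD1="aws s3 sync --exclude */repodata/* ..."` strings with bash arrays. Expanding a string unquoted (`$CMD1`) subjects it to word splitting and pathname expansion, so an unquoted pattern such as `*/repodata/*` could be replaced by whatever paths happen to match in the script's working directory. A small demonstration in a scratch directory (`show_args` and `demo` are throwaway helpers, not part of the scripts):

```shell
#!/usr/bin/env bash
# show_args prints each argument it receives wrapped in <...>.
show_args() { printf '<%s>' "$@"; printf '\n'; }

# Run both styles inside a scratch directory that contains a path matching
# the exclude pattern.
demo() {
    local dir
    dir=$(mktemp -d)
    mkdir -p "$dir/x/repodata"
    touch "$dir/x/repodata/repomd.xml"
    (
        cd "$dir" || exit 1
        # Old style: a string expanded unquoted is word-split AND globbed.
        cmd_string='--exclude */repodata/*'
        show_args $cmd_string
        # New style: each array element stays one literal argument.
        cmd_array=( --exclude "*/repodata/*" )
        show_args "${cmd_array[@]}"
    )
    rm -rf "$dir"
}

demo
# <--exclude><x/repodata/repomd.xml>
# <--exclude><*/repodata/*>
```

Quoting each pattern inside an array, as the new `aws_sync`, `exclude` and `CMD1` to `CMD3` arrays do, keeps the pattern text intact for `aws s3 sync` regardless of the current directory.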
On Fri, Mar 27, 2020 at 09:33:03AM +0100, Pavel Raiskup wrote:
+exclude=(
+    --exclude "*/repodata/*"
The global 'exclude' has '--exclude "*/repodata/*"' and you are using "${excludes[@]}" everywhere, in all three syncs. This looks like 'repodata/*' will never be synced.
Besides that, I think it is a very good change.
Adrian
On Friday, March 27, 2020 11:34:28 AM CET Adrian Reber wrote:
The global 'exclude' has '--exclude "*/repodata/*"' and you are using "${excludes[@]}" everywhere, in all three syncs. This looks like 'repodata/*' will never be synced.
Good catch! Thank you for the review. I am attaching an updated patch.
Pavel
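Adrian's catch follows from how `aws s3 sync` evaluates filters: every file starts out included, and when several `--exclude`/`--include` filters match, the one that appears later on the command line wins. The helper below is a hypothetical emulation of that documented rule (it is not part of the AWS CLI), using bash `[[ == ]]` pattern matching, where `*` also matches `/`. Because all patterns here start with `*`, it should not matter whether the CLI matches against a full or a relative path; the sample paths are illustrative.

```shell
#!/usr/bin/env bash
# is_synced PATH [--exclude|--include PATTERN]...
# Returns 0 if PATH would be transferred under "later filter wins" rules.
is_synced() {
    local path=$1; shift
    local verdict=include                 # everything is included by default
    while (( $# >= 2 )); do
        # Unquoted $2 on the right of == is treated as a glob pattern.
        if [[ $path == $2 ]]; then
            case $1 in
                --exclude) verdict=exclude ;;
                --include) verdict=include ;;
            esac
        fi
        shift 2
    done
    [[ $verdict == include ]]
}

p=epel/8/Everything/x86_64/repodata/repomd.xml
# Filters as posted before the review: the trailing */repodata/* exclude wins.
is_synced "$p" --exclude "*" --include "*/repomd.xml" --exclude "*/repodata/*" && echo synced || echo skipped
# skipped
# Updated patch: no repodata exclude, so the include wins.
is_synced "$p" --exclude "*" --include "*/repomd.xml" --exclude "*debug/*" && echo synced || echo skipped
# synced
```

The same rule explains the patch comment that the include must precede the large set of excludes: placed after them, `--include "*/repomd.xml"` would re-include repomd.xml files under excluded trees such as `debug/`.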
On Fri, Mar 27, 2020 at 12:06:22PM +0100, Pavel Raiskup wrote:
Good catch! Thank you for the review. I am attaching an updated patch.
Looks good. I have not verified everything again, but the excludes look correct now.
Looking forward to seeing whether this improves the caching behaviour of repomd.xml.
Adrian
From 9bad0d6a2dcf48b1d611290e11e1ad38c27a6eff Mon Sep 17 00:00:00 2001
From: Pavel Raiskup <praiskup@redhat.com>
Date: Fri, 27 Mar 2020 09:16:12 +0100
Subject: [PATCH] s3-mirror: sync s3-sync-path script with ideas from s3.sh
- sync everything except for repomd.xml
- then sync repomd.xml files only, and invalidate caches
- gently wait a bit to give current downloads a chance
- delete outdated RPMs and metadata, shouldn't be needed
Also make the sleep/cache configurable.
---
 roles/s3-mirror/files/s3-sync-path.sh | 98 ++++++++++++++-------------
 roles/s3-mirror/files/s3.sh           | 19 ++++--
 2 files changed, 64 insertions(+), 53 deletions(-)
diff --git a/roles/s3-mirror/files/s3-sync-path.sh b/roles/s3-mirror/files/s3-sync-path.sh
index 79b4d63eb..4913767f7 100644
--- a/roles/s3-mirror/files/s3-sync-path.sh
+++ b/roles/s3-mirror/files/s3-sync-path.sh
@@ -9,58 +9,64 @@ if [[ "$1" == "" ]] || [[ $1 != /pub* ]] || [[ $1 != */ ]]; then
     exit 1
 fi
 
+aws_sync=( aws s3 sync --no-follow-symlinks )
+
 # first run do not delete anything or copy the repodata.
-CMD1="aws s3 sync \
-    --exclude */repodata/* \
-    --exclude *.snapshot/* \
-    --exclude *source/* \
-    --exclude *SRPMS/* \
-    --exclude *debug/* \
-    --exclude *beta/* \
-    --exclude *ppc/* \
-    --exclude *ppc64/* \
-    --exclude *repoview/* \
-    --exclude *Fedora/* \
-    --exclude *EFI/* \
-    --exclude *core/* \
-    --exclude *extras/* \
-    --exclude *LiveOS/* \
-    --exclude *development/rawhide/* \
-    --no-follow-symlinks \
-    --only-show-errors \
-    "
-    #--dryrun \
+exclude=(
+    --exclude "*.snapshot/*"
+    --exclude "*source/*"
+    --exclude "*SRPMS/*"
+    --exclude "*debug/*"
+    --exclude "*beta/*"
+    --exclude "*ppc/*"
+    --exclude "*ppc64/*"
+    --exclude "*repoview/*"
+    --exclude "*Fedora/*"
+    --exclude "*EFI/*"
+    --exclude "*core/*"
+    --exclude "*extras/*"
+    --exclude "*LiveOS/*"
+    --exclude "*development/rawhide/*"
+    --only-show-errors
+)
 
-# second we delete old content and also copy the repodata
-CMD2="aws s3 sync \
-    --delete \
-    --exclude *.snapshot/* \
-    --exclude *source/* \
-    --exclude *SRPMS/* \
-    --exclude *debug/* \
-    --exclude *beta/* \
-    --exclude *ppc/* \
-    --exclude *ppc64/* \
-    --exclude *repoview/* \
-    --exclude *Fedora/* \
-    --exclude *EFI/* \
-    --exclude *core/* \
-    --exclude *extras/* \
-    --exclude *LiveOS/* \
-    --exclude *development/rawhide/* \
-    --no-follow-symlinks \
-    --only-show-errors \
-    "
-    #--dryrun \
+S3_MIRROR=s3-mirror-us-west-1-02.fedoraproject.org
+DIST_ID=E2KJMDC0QAJDMU
+MAX_CACHE_SEC=60
+DNF_GENTLY_TIMEOUT=120
+
+# First run this command that syncs, but does not delete.
+# It also excludes repomd.xml.
+CMD1=( "${aws_sync[@]}" "${excludes[@]}" --exclude "*/repomd.xml" )
+
+# Next we run this command which syncs repomd.xml files. Include must precede
+# the large set of excludes. Make sure that the 'max-age' isn't too large so
+# we know that we can start removing old data ASAP.
+CMD2=( "${aws_sync[@]}" --exclude "*" --include "*/repomd.xml" "${excludes[@]}"
+       --cache-control "max-age=$MAX_CACHE_SEC" )
+
+# Then we delete old RPMs and old metadata (but after invalidating caches).
+CMD3=( "${aws_sync[@]}" "${excludes[@]}" --delete )
 
 #echo "$CMD /srv$1 s3://s3-mirror-us-west-1-02.fedoraproject.org$1"
 echo "Starting $1 sync at $(date)" >> /var/log/s3-mirror/timestamps
-$CMD1 /srv$1 s3://s3-mirror-us-west-1-02.fedoraproject.org$1
-$CMD1 /srv$1/repodata/ s3://s3-mirror-us-west-1-02.fedoraproject.org$1/repodata/
+"${CMD1[@]}" "/srv$1" "s3://$S3_MIRROR$1"
+"${CMD2[@]}" "/srv$1" "s3://$S3_MIRROR$1"
+
 # Always do the invalidations because they are quick and prevent issues
 # depending on which path is synced.
-for file in $(echo $1/repodata/* ); do
-    aws cloudfront create-invalidation --distribution-id E2KJMDC0QAJDMU --paths "$file" > /dev/null
+for file in $(echo $1/repodata/repomd.xml ); do
+    aws cloudfront create-invalidation --distribution-id $DIST_ID --paths "$file" > /dev/null
 done
-$CMD2 /srv$1 s3://s3-mirror-us-west-1-02.fedoraproject.org$1
+
+SLEEP=$(( MAX_CACHE_SEC + DNF_GENTLY_TIMEOUT ))
+echo "Ready $1 sync, giving dnf downloads ${SLEEP}s before delete, at $(date)" >> /var/log/s3-mirror/timestamps
+
+# Consider some DNF processes started downloading metadata before we invalidated
+# caches, and started with outdated repomd.xml file. Give it few more seconds
+# so they have chance to download the rest of metadata and RPMs.
+sleep $SLEEP
+
+"${CMD3[@]}" "/srv$1" "s3://$S3_MIRROR$1"
+
 echo "Ending $1 sync at $(date)" >> /var/log/s3-mirror/timestamps
diff --git a/roles/s3-mirror/files/s3.sh b/roles/s3-mirror/files/s3.sh
index c157b0cdb..df58ac153 100644
--- a/roles/s3-mirror/files/s3.sh
+++ b/roles/s3-mirror/files/s3.sh
@@ -88,6 +88,11 @@ excludes=(
     --exclude "*/updates/testing/29/*"
 )
 
+S3_MIRROR=s3-mirror-us-west-1-02.fedoraproject.org
+DIST_ID=E2KJMDC0QAJDMU
+MAX_CACHE_SEC=60
+DNF_GENTLY_TIMEOUT=120
+
 # First run this command that syncs, but does not delete.
 # It also excludes repomd.xml.
 CMD1=( "${aws_sync[@]}" "${excludes[@]}" --exclude "*/repomd.xml" )
@@ -95,14 +100,12 @@ CMD1=( "${aws_sync[@]}" "${excludes[@]}" --exclude "*/repomd.xml" )
 # Next we run this command which syncs repomd.xml files. Include must precede
 # the large set of excludes. Make sure that the 'max-age' isn't too large so
 # we know that we can start removing old data ASAP.
-CMD2=( "${aws_sync[@]}" --exclude "*" --include "*/repomd.xml" "${excludes[@]}" --cache-control max-age=300 )
+CMD2=( "${aws_sync[@]}" --exclude "*" --include "*/repomd.xml" "${excludes[@]}"
+       --cache-control "max-age=$MAX_CACHE_SEC" )
 
 # Then we delete old RPMs and old metadata (but after invalidating caches).
 CMD3=( "${aws_sync[@]}" "${excludes[@]}" --delete )
 
-S3_MIRROR=s3-mirror-us-west-1-02.fedoraproject.org
-DIST_ID=E2KJMDC0QAJDMU
-
 # Sync EPEL
 #echo $CMD /srv/pub/epel/ s3://$S3_MIRROR/pub/epel/
 echo "Starting EPEL sync at $(date)" >> /var/log/s3-mirror/timestamps
@@ -132,10 +135,12 @@ for file in $(echo /srv/pub/fedora/linux/updates/*/*/*/repodata/repomd.xml | sed
     aws cloudfront create-invalidation --distribution-id "$DIST_ID" --paths "$file"
 done
 
+SLEEP=$(( MAX_CACHE_SEC + DNF_GENTLY_TIMEOUT ))
+
 # Consider some DNF processes started downloading metadata before we invalidated
-# caches, and started with outdated repomd.xml file. Give it 10 minutes so they
-# have chance to download the rest of metadata and RPMs.
-sleep 600
+# caches, and started with outdated repomd.xml file. Give it few more seconds
+# so they have chance to download the rest of metadata and RPMs.
+sleep $SLEEP
 
 "${CMD3[@]}" /srv/pub/epel/ "s3://$S3_MIRROR/pub/epel/"
 "${CMD3[@]}" /srv/pub/fedora/ s3://$S3_MIRROR/pub/fedora/
-- 
2.25.1
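In the `s3-sync-path.sh` hunks above, the array is declared as `exclude=(` while `CMD1`, `CMD2` and `CMD3` expand `"${excludes[@]}"` (the name `s3.sh` uses). Unless `excludes` is set in a part of the script the diff does not show, bash expands `"${excludes[@]}"` to zero words, so neither the `--exclude` filters nor `--only-show-errors` would reach `aws s3 sync`. A quick check of that behaviour, with `count_args` as a throwaway helper:

```shell
#!/usr/bin/env bash
# count_args prints how many arguments it received.
count_args() { echo "$#"; }

# Declared under the name used in s3-sync-path.sh.
exclude=( --exclude "*.snapshot/*" --exclude "*debug/*" --only-show-errors )

count_args "${exclude[@]}"    # 5: both --exclude pairs plus --only-show-errors
count_args "${excludes[@]}"   # 0: 'excludes' is never set here, so nothing is passed
```

Using one name for both the declaration and the expansions makes the filters apply as intended.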
On Fri, Mar 27, 2020 at 12:06:22PM +0100, Pavel Raiskup wrote:
Good catch! Thank you for the review. I am attaching an updated patch.
Looks ok to me from a quick glance... let's give it a try. :)
I also think we should remove the test releases one. It's currently failing the invalidation step (because its repodata is not in the same place as the updates repos), and the only reason we would want, say, 32_Beta there is so we could point people to download ISOs from it. The repodata will never change.
kevin
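The invalidation loops in the patch build repomd.xml paths from a fixed layout (`$1/repodata/repomd.xml` in `s3-sync-path.sh`, `updates/*/*/*/repodata/repomd.xml` in `s3.sh`), which is why a tree with a different layout can trip them up. One hypothetical alternative is to look up the repomd.xml files that actually exist under the synced tree and invalidate those, skipping the call when there are none. `SRV_ROOT`, the function names and the dry-run `echo` are assumptions for illustration; `create-invalidation` does accept several paths after `--paths`.

```shell
#!/usr/bin/env bash
# Sketch only: the aws command is printed, not executed.
SRV_ROOT=${SRV_ROOT:-/srv}

# Print the URL path of every */repodata/repomd.xml below $SRV_ROOT$1.
repomd_paths() {
    local path=${1%/} file
    while IFS= read -r -d '' file; do
        printf '%s\n' "${file#"$SRV_ROOT"}"     # /srv/pub/... -> /pub/...
    done < <(find "$SRV_ROOT$path" -type f -path '*/repodata/repomd.xml' -print0 2>/dev/null | sort -z)
}

invalidate_repomd() {
    local dist_id=$1 path=$2
    local -a paths=()
    mapfile -t paths < <(repomd_paths "$path")
    if (( ${#paths[@]} == 0 )); then
        echo "no repomd.xml below $path, skipping invalidation" >&2
        return 0
    fi
    # One create-invalidation call can take several paths.
    echo aws cloudfront create-invalidation --distribution-id "$dist_id" --paths "${paths[@]}"
}

# Demo on a scratch tree; the layout is illustrative only.
SRV_ROOT=$(mktemp -d)
mkdir -p "$SRV_ROOT/pub/demo/32_Beta/Everything/x86_64/os/repodata"
touch "$SRV_ROOT/pub/demo/32_Beta/Everything/x86_64/os/repodata/repomd.xml"
invalidate_repomd EXAMPLEDISTID /pub/demo/
rm -rf "$SRV_ROOT"
```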
On Fri, Mar 27, 2020 at 05:14:48PM -0700, Kevin Fenzi wrote:
Looks ok to me from a quick glance... let's give it a try. :)
Accessing cloudfront, I can now see the correct header:
cache-control: max-age=60
and the 'age' header never gets larger than 60.
I can see 'age: 59', but each request after that gives me an
x-cache: RefreshHit from cloudfront
From the headers returned by cloudfront, it seems we now never see repomd.xml files older than 1 minute.
I would like to turn on the cloudfront mirror again to see if COPR still breaks. Any objections?
Adrian
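Adrian's check can be repeated from a shell: fetch the headers of a repomd.xml through cloudfront (for example with `curl -sI`; the distribution hostname is not given in the thread) and compare `age` with the `max-age` from `cache-control`. `check_freshness` is a hypothetical helper, a sketch rather than a monitoring tool:

```shell
#!/usr/bin/env bash
# check_freshness reads HTTP response headers on stdin, e.g. from
#   curl -sI https://<distribution-host>/<repo>/repodata/repomd.xml
# and reports whether 'age' is within the max-age from 'cache-control'.
check_freshness() {
    local name value max_age='' age=0
    while IFS=: read -r name value || [[ -n $name ]]; do
        name=${name,,}              # header names are case-insensitive
        value=${value//$'\r'/}      # strip CR from CRLF line endings
        value=${value# }
        case $name in
            cache-control)
                if [[ $value =~ max-age=([0-9]+) ]]; then max_age=${BASH_REMATCH[1]}; fi ;;
            age) age=$value ;;
        esac
    done
    if [[ -z $max_age ]]; then
        echo "no max-age in cache-control"; return 2
    elif (( age <= max_age )); then
        echo "fresh (age $age <= max-age $max_age)"
    else
        echo "stale (age $age > max-age $max_age)"; return 1
    fi
}

# Headers like the ones quoted in the thread:
printf 'cache-control: max-age=60\r\nage: 59\r\nx-cache: RefreshHit from cloudfront\r\n' | check_freshness
# fresh (age 59 <= max-age 60)
```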
On Sat, Mar 28, 2020 at 11:04:51AM +0100, Adrian Reber wrote:
I would like to turn on the cloudfront mirror again to see if COPR still breaks. Any objections?
None here, +1
kevin
On Fri, Mar 27, 2020 at 05:14:48PM -0700, Kevin Fenzi wrote:
I also think we should remove the test releases one. It's currently failing the invalidation step (because its repodata is not in the same place as the updates repos), and the only reason we would want, say, 32_Beta there is so we could point people to download ISOs from it. The repodata will never change.
Just to confirm from the MirrorManager side: MirrorManager is not aware of any repomd.xml files under releases/test. For 32, MirrorManager points to development/32, or to updates/32 and updates/testing/32.
So MirrorManager will not redirect any clients to releases/test except for ISOs.
Adrian
On Sat, Mar 28, 2020 at 11:09:42AM +0100, Adrian Reber wrote:
Just to confirm from the MirrorManager side: MirrorManager is not aware of any repomd.xml files under releases/test. For 32, MirrorManager points to development/32, or to updates/32 and updates/testing/32.
So MirrorManager will not redirect any clients to releases/test except for ISOs.
Yeah, so perhaps we should consider just syncing the ISOs? And it only needs to run once, so perhaps this could be a manually run script/playbook.
kevin
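Kevin's suggestion could look roughly like the one-off sketch below: exclude everything, then re-include `*.iso`, relying on the later filter taking precedence. The `/pub/fedora/linux/releases/test/32_Beta/` location and the bucket name are assumptions, and `run` only prints the command; `aws s3 sync` also has a `--dryrun` option for previewing a real run.

```shell
#!/usr/bin/env bash
# One-off, manually run ISO-only sync (sketch; prints instead of executing).
set -euo pipefail

run() { printf '%s\n' "$*"; }

sync_isos_once() {
    local path=$1 bucket=$2          # path must end in '/'
    # Exclude everything, then re-include *.iso: the later filter wins.
    run aws s3 sync --no-follow-symlinks --only-show-errors \
        --exclude "*" --include "*.iso" \
        "/srv$path" "s3://$bucket$path"
}

# Assumed location of the test release tree; bucket is a placeholder.
sync_isos_once /pub/fedora/linux/releases/test/32_Beta/ example-bucket
```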
infrastructure@lists.fedoraproject.org