When dealing with loads of URLs for PDFs, where the file names are in headers and may collide, how do you preserve the metadata and download them all?
Prompt claude.ai using 3.7 Sonnet:

```
one additional issue that the metadata json file is also named as the hash of the PDF, but if there are two different URLs that download a matching PDF, then while the hash will be the same the metadata is actually different for the two, so it will collide there. Lets also get a sha1 hash of the URL string itself, and the metadata json file is named with hash of the PDF, then a hyphen, then hash of the URL.
```

Signed-off-by: Vincent Batts <vbatts@hashbangbash.com>

| File |
|---|
| .gitignore |
| go.mod |
| LICENSE |
| main.go |
| README.md |

fetch-content
Download PDFs from a large set of URLs and preserve the metadata for each one, even when the file names are only given in headers and may collide.
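
The naming scheme described in the commit message can be sketched roughly as below. This is a minimal illustration, not the contents of `main.go`: the helper names `sha1Hex`, `metadataName`, and `headerFilename` are hypothetical, the use of SHA-1 for the PDF content hash is an assumption (the commit message only specifies SHA-1 for the URL), and reading the file name from a `Content-Disposition` header is likewise an assumption about which header carries it.

```go
package main

import (
	"crypto/sha1"
	"encoding/hex"
	"fmt"
	"mime"
)

// sha1Hex returns the lowercase hex SHA-1 digest of b.
func sha1Hex(b []byte) string {
	sum := sha1.Sum(b)
	return hex.EncodeToString(sum[:])
}

// metadataName (hypothetical helper) builds "<hash of PDF>-<hash of URL>.json".
// Two URLs that serve identical bytes share the first half of the name
// but differ in the second, so their metadata files no longer collide.
// Assumption: the PDF content hash is also SHA-1.
func metadataName(pdf []byte, url string) string {
	return sha1Hex(pdf) + "-" + sha1Hex([]byte(url)) + ".json"
}

// headerFilename (hypothetical helper) extracts the "filename" parameter
// from a Content-Disposition header value. It returns "" when the header
// is missing, unparsable, or carries no filename.
func headerFilename(contentDisposition string) string {
	_, params, err := mime.ParseMediaType(contentDisposition)
	if err != nil {
		return ""
	}
	return params["filename"]
}

func main() {
	// Placeholder bytes standing in for a downloaded PDF body.
	pdf := []byte("%PDF-1.7 example bytes")

	// The same content fetched from two different (made-up) URLs.
	for _, u := range []string{"https://example.com/a", "https://example.com/b"} {
		fmt.Println(metadataName(pdf, u))
	}

	fmt.Println(headerFilename(`attachment; filename="report.pdf"`))
}
```

Storing the PDF itself under its content hash deduplicates identical downloads, while the file name from the header can be kept inside the per-URL metadata JSON instead of being used as a path on disk.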