I'm honestly shocked that it even compiled on the first try, and behaved
remotely as expected, much less _exactly_ as expected.
Prompt to Claude.ai using 3.7 Sonnet:
```
this is awesome. Though on line 303, the os.Rename fails because /tmp is
on a different device/partition than the downloads/ directory. This
should instead create and write a new file and remove the old one.
```
Signed-off-by: Vincent Batts <vbatts@hashbangbash.com>
Prompt:
```
I'm going to give the transcript of me brainstorming an application that I want to write. I was thinking of golang, but it could be in C++ also.
Please organize my brainstorming so that I can begin building out this application:
This application is going to need to read in my list of URLs and use a WaitGroup for the goroutines that go out and fetch each of those URLs. From each fetch, there will be an HTTP response header with a Content-Disposition that has the file name intended for that file, and then for each one of these files, maybe not individually, but a whole JSON object for the whole download set too.
Record the URL and the associated Content-Disposition name, and then...
Okay, so: each URL is probably going to be fetched with some kind of a watcher and writer that calculates a SHA-1 sum for that file. That can't all be done in memory, so it might need to actually write it to some temp file while it's calculating the SHA-1 sum, and if it downloads successfully,
then it writes it to a file named after that SHA-1 checksum. And it also logs that data structure, in JSON or whatever, that maps that SHA-1 checksum to the file name it was given. Maybe even some information like the ETag, and if the headers show any kind of Last-Modified date,
that would be relevant, as well as the Content-Disposition headers, so that you can see what the file should have been named, and also the URL that it was fetched from. So there's correlating metadata of which file points to which URL and such. Because I'm pretty sure that
the file names would end up having collisions just based on the last part of the URL path, and maybe even some of the file names from the Content-Disposition headers might have collisions. So we can't rely on either one of those. I hope that even the SHA-1 hashes don't, so it might need to check
for the SHA-1-named file's existence before it starts writing. Maybe even just check the other files' metadata for the ETag. I wonder if I should do some kind of a scan first, for unique ETags? I'm not sure.
I feel like even the metadata (well, yes, it can be in memory) might not need to be a JSON per file. I don't know, but then it makes it clean to keep it as a JSON per file. It's just harder to traverse all of those if you're querying.
How to have all that in memory, as well as per file? So that if you have two metadata records referencing the same file, then you can sort that out. Heck, at that point, it might just be able to take the original output from the go spider,
which has a very predictable kind of file name, you know, where each line is a JSON object, and
maybe just output that same JSON structure, but including the new key values, in a new file or to standard out. Okay. As for the concurrency, I think this is just a WaitGroup
from the sync library. And obviously some default number of concurrent connections. It would be nice if it would reuse the same HTTP connection for keep-alives. I don't know if that's possible, but that would make it faster and less work for the web server that we're hitting.
And then spinning off all those different goroutines would need a context to be passed around, in case you need to cancel them. It probably also needs a signal handler, to sit and watch for signals in case you need to terminate or kill the whole ordeal.
Yeah, let's see here.
So the setup of this program is going to be: read in the list of URLs, then go ahead and scan for only the unique ones in case there are duplicates.
And check for the output metadata file, so you might want to have an output directory. Somehow come up with a checksum of the output there, maybe? A checksum of the URL string, I don't know, or of our list of URL strings.
```
Signed-off-by: Vincent Batts <vbatts@hashbangbash.com>