NZZ Downloader
The NZZ is the Swiss newspaper of record. Its first issue appeared all the way back in 1780. Even better, you can download every single issue ever published (provided you have a subscription, of course).
This little tool helps with downloading all released issues in a specified time span.
It was written because the archive website is not very user-friendly and offers no way to download all issues within a given time span.
Installation
You need to be comfortable with the command line to use the NZZ downloader. It has only been tested on *nix systems, though it should also work on Windows and macOS.
Prerequisites
Build the binaries with cargo: cargo build --release (they are created as target/release/{nzz-cookie,nzz-download}).
Usage
Authentication
The downloader needs an authentication cookie to work. This cookie can be generated with the nzz-cookie binary (with a lot of help from the WebDriver API and geckodriver).
Usage: nzz-cookie --username <USERNAME>
Options:
-u, --username <USERNAME> Username [env: USERNAME=]
-h, --help Print help
-V, --version Print version
Provide the password via stdin.
The resulting cookie is printed to stdout.
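Reading a secret from stdin usually means reading one line and dropping the trailing line ending, so that both `echo secret | nzz-cookie ...` and a file redirect like `<pw` work. The sketch below illustrates this in plain Rust. `read_secret` is a hypothetical helper and is not taken from the nzz-cookie source. It assumes that the password is on the first line of the input.

```rust
use std::io::{self, BufRead};

/// Hypothetical helper: reads the first line from `reader` (e.g. stdin or a
/// redirected file) and strips only the trailing "\n" or "\r\n", so other
/// whitespace that might be part of the password is preserved.
fn read_secret<R: BufRead>(mut reader: R) -> io::Result<String> {
    let mut line = String::new();
    reader.read_line(&mut line)?;
    let trimmed = line.strip_suffix('\n').unwrap_or(line.as_str());
    let trimmed = trimmed.strip_suffix('\r').unwrap_or(trimmed);
    Ok(trimmed.to_string())
}

fn main() -> io::Result<()> {
    let stdin = io::stdin();
    let password = read_secret(stdin.lock())?;
    // A real tool would log in here; this sketch only reports the length
    // on stderr so that stdout stays free for the cookie.
    eprintln!("read a password of {} characters", password.chars().count());
    Ok(())
}
```

Keeping diagnostics on stderr matters here, because stdout is what gets piped into nzz-download.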
Download
Usage: nzz-download [OPTIONS] --from <FROM> --to <TO>
Options:
-f, --from <FROM> Earliest issue to download (like 1780-12-31) [env: FROM=]
-t, --to <TO> Latest issue to download (like 1780-12-31) [env: TO=]
-o, --output-dir <OUTPUT_DIR> Output directory [env: OUTPUT_DIR=] [default: ./nzz]
-h, --help Print help
-V, --version Print version
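Each option can also be set through the environment variable shown in brackets (for example `FROM=2024-06-01 TO=2024-06-05 nzz-download`). The help text looks like it comes from clap. Assuming the usual clap precedence, an explicit flag wins over the environment variable, which in turn wins over the built-in default. The hypothetical `resolve_option` below shows that order using only the standard library. It is an illustration, not the tool's actual argument parsing.

```rust
use std::env;

/// Hypothetical helper mirroring the assumed precedence:
/// command-line value > environment variable > default.
fn resolve_option(cli_value: Option<&str>, env_var: &str, default: &str) -> String {
    if let Some(value) = cli_value {
        return value.to_string();
    }
    match env::var(env_var) {
        Ok(value) => value,              // e.g. OUTPUT_DIR=/data/nzz
        Err(_) => default.to_string(),   // unset (or not valid Unicode)
    }
}

fn main() {
    // No -o flag given: use $OUTPUT_DIR if set, otherwise "./nzz".
    let dir = resolve_option(None, "OUTPUT_DIR", "./nzz");
    println!("output directory: {dir}");
}
```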
Provide the authentication cookie via stdin.
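--from and --to define an inclusive range of dates in the form YYYY-MM-DD. The sketch below shows one dependency-free way to validate those two values and walk every calendar day between them. It uses Howard Hinnant's days-from-civil conversion. All function names here are hypothetical, and the real tool may do this differently, for example with a date crate. Not every calendar day necessarily has an issue, so a downloader would still need to skip days for which the archive has nothing.

```rust
/// Days since 1970-01-01 for a proleptic Gregorian date (Hinnant's algorithm).
fn days_from_civil(y: i64, m: i64, d: i64) -> i64 {
    let y = if m <= 2 { y - 1 } else { y };
    let era = y.div_euclid(400);
    let yoe = y.rem_euclid(400);                      // year of era, 0..=399
    let mp = (m + 9) % 12;                            // March = 0
    let doy = (153 * mp + 2) / 5 + d - 1;             // day of year, 0..=365
    let doe = yoe * 365 + yoe / 4 - yoe / 100 + doy;  // day of era
    era * 146_097 + doe - 719_468
}

/// Inverse of `days_from_civil`: returns (year, month, day).
fn civil_from_days(z: i64) -> (i64, i64, i64) {
    let z = z + 719_468;
    let era = z.div_euclid(146_097);
    let doe = z.rem_euclid(146_097);
    let yoe = (doe - doe / 1460 + doe / 36_524 - doe / 146_096) / 365;
    let doy = doe - (365 * yoe + yoe / 4 - yoe / 100);
    let mp = (5 * doy + 2) / 153;
    let d = doy - (153 * mp + 2) / 5 + 1;
    let m = if mp < 10 { mp + 3 } else { mp - 9 };
    let y = yoe + era * 400;
    (if m <= 2 { y + 1 } else { y }, m, d)
}

/// Parses "YYYY-MM-DD"; rejects malformed input and impossible dates.
fn parse_date(s: &str) -> Option<i64> {
    let mut parts = s.split('-');
    let y: i64 = parts.next()?.parse().ok()?;
    let m: i64 = parts.next()?.parse().ok()?;
    let d: i64 = parts.next()?.parse().ok()?;
    if parts.next().is_some() || !(1..=12).contains(&m) || !(1..=31).contains(&d) {
        return None;
    }
    let days = days_from_civil(y, m, d);
    // Round-trip check catches dates such as 2023-02-29.
    (civil_from_days(days) == (y, m, d)).then_some(days)
}

fn format_date(days: i64) -> String {
    let (y, m, d) = civil_from_days(days);
    format!("{y:04}-{m:02}-{d:02}")
}

/// Every calendar day from `from` to `to`, inclusive; None if invalid.
fn days_in_range(from: &str, to: &str) -> Option<Vec<String>> {
    let (start, end) = (parse_date(from)?, parse_date(to)?);
    if start > end {
        return None;
    }
    Some((start..=end).map(format_date).collect())
}

fn main() {
    for day in days_in_range("2024-06-01", "2024-06-05").expect("invalid range") {
        println!("{day}"); // one candidate issue date per line
    }
}
```

Converting both dates to a day count first makes month and leap-year boundaries trivial, because iterating is then just counting integers.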
Example
Log in and use the resulting cookie to download all issues from 2024-06-01 to 2024-06-05 into the default directory "./nzz", reading the password from the file pw:
nzz-cookie -u 'myuser@example.com' <pw | nzz-download -f 2024-06-01 -t 2024-06-05
Caveats
There are no retries on failed downloads yet; the program simply crashes. For that reason, I would advise against downloading large ranges at once until this is fixed.
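One common way to add retries would be to wrap each issue download in a loop with exponential backoff. The sketch below is a generic, hypothetical helper and is not part of the tool. It assumes a download step can be expressed as a closure returning `Result`, and the attempt count and delays are illustrative values. It retries a failing operation a fixed number of times, doubling the pause after each failure, and returns the last error if every attempt fails.

```rust
use std::thread;
use std::time::Duration;

/// Hypothetical helper: runs `op` up to `max_attempts` times, sleeping
/// base_delay, 2*base_delay, 4*base_delay, ... between failed attempts.
/// Returns the first success, or the error from the final attempt.
fn retry_with_backoff<T, E, F>(max_attempts: u32, base_delay: Duration, mut op: F) -> Result<T, E>
where
    F: FnMut(u32) -> Result<T, E>,
{
    assert!(max_attempts > 0, "need at least one attempt");
    let mut attempt = 1;
    loop {
        match op(attempt) {
            Ok(value) => return Ok(value),
            Err(err) if attempt >= max_attempts => return Err(err),
            Err(_) => {
                // Exponential backoff; fine for the small attempt counts used here.
                thread::sleep(base_delay * 2u32.saturating_pow(attempt - 1));
                attempt += 1;
            }
        }
    }
}

fn main() {
    // Simulated download of one issue that fails twice before succeeding.
    let result: Result<&str, String> = retry_with_backoff(5, Duration::from_millis(10), |attempt| {
        if attempt < 3 {
            Err(format!("attempt {attempt} failed"))
        } else {
            Ok("issue downloaded")
        }
    });
    println!("{result:?}");
}
```

Retrying per issue rather than per range means that one flaky request no longer aborts a long download. Issues that still fail after the last attempt could be logged and skipped instead of crashing the whole run.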
License
Licensed as AGPL-3.0.