# NZZ Downloader

The [NZZ](https://en.wikipedia.org/wiki/Neue_Z%C3%BCrcher_Zeitung) is the Swiss
newspaper of record. Its first issue was published all the way back in 1780. Even
better, you can download every single issue ever released (if you have a
subscription, of course).

This little tool downloads all issues released within a specified time span.

It was written because the archive website is not very user-friendly and offers
no way to download everything within a given time span.

![screenshot](screenshot.jpg)
## Installation

You need to be comfortable with the command line to use the NZZ downloader. It
has only been tested on *nix systems, though it should work fine on Windows or
macOS.
### Prerequisites

- [Firefox](https://www.mozilla.org/en-US/firefox/download/thanks/)
- [geckodriver](https://github.com/mozilla/geckodriver/releases)
- [Rust](https://www.rust-lang.org/learn/get-started)

Build the binaries with cargo: `cargo build --release` (they are created as
`target/release/{nzz-cookie,nzz-download}`).
## Usage

### Authentication

The downloader needs an authentication cookie to work. The `nzz-cookie` binary
generates one (with the huge help of the WebDriver API and geckodriver).
```
Usage: nzz-cookie --username <USERNAME>

Options:
  -u, --username <USERNAME>  Username [env: USERNAME=]
  -h, --help                 Print help
  -V, --version              Print version

Provide the password from stdin.
```

The resulting cookie is printed to stdout.
### Download

```
Usage: nzz-download [OPTIONS] --from <FROM> --to <TO>

Options:
  -f, --from <FROM>              Earliest issue to download (like 1780-12-31) [env: FROM=]
  -t, --to <TO>                  Latest issue to download (like 1780-12-31) [env: TO=]
  -o, --output-dir <OUTPUT_DIR>  Output directory [env: OUTPUT_DIR=] [default: ./nzz]
  -h, --help                     Print help
  -V, --version                  Print version

Provide the authentication cookie from stdin.
```
### Example

Log in and use the resulting cookie to download all issues from 2024-06-01
through 2024-06-05 to the default directory `./nzz`, reading the password from
the file `pw`:

```
nzz-cookie -u 'myuser@example.com' <pw | nzz-download -f 2024-06-01 -t 2024-06-05
```
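
For spans covering several years, it may be more practical to run one download per calendar year than a single huge one. The following is only a sketch in POSIX sh: `year_chunks` is a hypothetical helper that is not part of this tool, and the commented usage assumes the cookie was saved to a file named `cookie` and stays valid for the whole run.

```shell
#!/bin/sh
# Hypothetical helper, not part of nzz-downloader.

# year_chunks FROM TO
# Print one "START END" pair per calendar year covering the inclusive
# range FROM..TO. Dates are expected as YYYY-MM-DD and are not validated.
year_chunks() {
    from=$1
    to=$2
    first_year=${from%%-*}   # text before the first "-"
    last_year=${to%%-*}
    year=$first_year
    while [ "$year" -le "$last_year" ]; do
        start="$year-01-01"
        end="$year-12-31"
        # The first and last year are clipped to the requested dates.
        [ "$year" -eq "$first_year" ] && start=$from
        [ "$year" -eq "$last_year" ] && end=$to
        echo "$start $end"
        year=$((year + 1))
    done
}

# Possible use (assumes a still-valid cookie in the file "cookie"):
#   year_chunks 1990-03-01 1992-06-30 | while read -r start end; do
#       nzz-download -f "$start" -t "$end" <cookie
#   done
```

Reading the cookie from a file instead of a pipe matters here because every `nzz-download` run expects it on stdin again.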
## Caveats

There are no retries on a failed download so far; the program just crashes.
Until that is fixed, I would advise against downloading big ranges in one go.
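
Until retries exist in the tool itself, a small shell wrapper can stand in for them. This is a minimal sketch and not part of nzz-downloader: `retry` and `download_range` are hypothetical helpers, and it assumes that a failed run exits with a non-zero status and that the cookie, saved to the file named by `COOKIE_FILE`, is still valid on later attempts.

```shell
#!/bin/sh
# Hypothetical helpers, not part of nzz-downloader.

# retry MAX DELAY CMD [ARGS...]
# Run CMD until it exits successfully, at most MAX times, sleeping DELAY
# seconds between attempts. Returns 1 once all attempts have failed.
retry() {
    max=$1
    delay=$2
    shift 2
    attempt=1
    until "$@"; do
        if [ "$attempt" -ge "$max" ]; then
            echo "giving up after $attempt attempts: $*" >&2
            return 1
        fi
        attempt=$((attempt + 1))
        sleep "$delay"
    done
}

# download_range FROM TO
# Assumption: the cookie was saved to the file named by $COOKIE_FILE.
# A file is used because a pipe can only be read once, while every
# attempt needs the cookie on stdin again.
download_range() {
    nzz-download -f "$1" -t "$2" <"$COOKIE_FILE"
}

# Possible use:
#   umask 077                                   # keep the cookie file private
#   nzz-cookie -u 'myuser@example.com' <pw >cookie
#   COOKIE_FILE=cookie
#   retry 3 30 download_range 2024-06-01 2024-06-05
```

Whether a repeated run skips issues that were already downloaded is not stated here, so a retry may fetch some of them again.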
## License
Licensed as [AGPL-3.0](https://www.gnu.org/licenses/agpl-3.0.html).