# NZZ Downloader
The [NZZ](https://en.wikipedia.org/wiki/Neue_Z%C3%BCrcher_Zeitung) is the Swiss
newspaper of record. Its first issue appeared all the way back in 1780. Even
better, you can download every single issue ever released (if you have a
subscription, of course).

This little tool helps you download all issues released in a specified time
span.

It was written because the archive website is not very friendly in the author's
opinion and, of course, because it does not let you download everything in a
time span at once.

Because the archive website makes heavy use of JavaScript, this is done with
[Selenium](https://www.selenium.dev/), which remote controls a browser (Firefox
in this case). This is also why it is not all that fast, but that is OK.

Please only use this with your own credentials; the journalists deserve to be
paid for their work.

![screenshot](screenshot.jpg)
## Installation
You need to be comfortable with the command line to use the NZZ downloader. It
has only been tested on Linux, though it should work fine on Windows or macOS.

You need the following:

- [Node.js](https://nodejs.org/en/download/) (the LTS version is fine)
- [Firefox](https://www.mozilla.org/en-US/firefox/download/thanks/)
- [geckodriver](https://github.com/mozilla/geckodriver/releases)
- [nzz.js](https://code.vanwa.ch/sebastian/nzz-downloader/-/releases)
## Usage
```
Usage: nzz.js -f [date] -t [date] -o [path] -u [username] -p [password]

Options:
      --version   Show version number            [boolean]
  -h, --help      Show help                      [boolean]
  -f, --from      Earliest issue to download.    [default: "2020-12-23"]
  -t, --to        Latest issue to download.      [default: "2020-12-23"]
  -o, --out       Download directory.            [default: "./nzz"]
  -u, --user      Username for the nzz archive.  [required]
  -p, --password  Password for the user.         [required]
```
### Examples
Download all existing issues from 1780-01-01 until 1780-02-29 to the default
directory `./nzz`:

```
./nzz.js -u 'myuser@example.com' -p 'mypassword' -f 1780-01-01 -t 1780-02-29
```
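To make the meaning of `--from` and `--to` concrete, here is a minimal sketch of
how an inclusive date span could be expanded into individual days. It is not
taken from nzz.js: the names `parseDay` and `daysInSpan` are hypothetical, and
the sketch assumes both bounds are `YYYY-MM-DD` dates interpreted in UTC.

```javascript
// Hypothetical helpers, not part of nzz.js.

// Parses a YYYY-MM-DD string as a UTC date and rejects impossible dates
// such as 1780-02-30 by checking that the date round-trips unchanged.
function parseDay(text) {
  const date = new Date(`${text}T00:00:00Z`);
  if (Number.isNaN(date.getTime()) || date.toISOString().slice(0, 10) !== text) {
    throw new RangeError(`not a valid YYYY-MM-DD date: ${text}`);
  }
  return date;
}

// Expands an inclusive span into a list of YYYY-MM-DD strings, one per day.
function daysInSpan(from, to) {
  const start = parseDay(from);
  const end = parseDay(to);
  const msPerDay = 24 * 60 * 60 * 1000;
  const days = [];
  for (let t = start.getTime(); t <= end.getTime(); t += msPerDay) {
    days.push(new Date(t).toISOString().slice(0, 10));
  }
  return days;
}
```

Since 1780 was a leap year, `daysInSpan("1780-02-28", "1780-03-01")` yields
three days, including 1780-02-29. Not every calendar day necessarily has an
issue, so a real downloader would still need to skip days without one.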
## Caveats
You need a good internet connection, as the program only waits a couple of
seconds for the download of an issue to start. Unfortunately, this is hard to
solve.

If you get strange errors about elements not being visible, wait a bit and try
again; it is usually a network problem.

The proper way of doing this would be to figure out how the calls to the backend
work and make those directly, instead of using the heavy-handed approach of
instrumenting a browser.
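The "wait a bit and try again" advice above can be automated. The following is
a minimal sketch of a generic retry helper with a growing delay between
attempts. It does not describe how nzz.js behaves today; `retry`, `sleep`, and
the `attempts` and `delayMs` options are hypothetical names chosen for
illustration.

```javascript
// Hypothetical helper, not part of nzz.js.

// Resolves after the given number of milliseconds.
const sleep = (ms) => new Promise((resolve) => setTimeout(resolve, ms));

// Runs an async step, retrying on failure with a linearly growing delay.
// Rethrows the last error once all attempts are used up.
async function retry(step, { attempts = 3, delayMs = 2000 } = {}) {
  let lastError;
  for (let attempt = 1; attempt <= attempts; attempt++) {
    try {
      return await step(attempt);
    } catch (error) {
      lastError = error;
      if (attempt < attempts) {
        await sleep(delayMs * attempt); // wait longer after each failure
      }
    }
  }
  throw lastError;
}
```

A flaky step would then be wrapped as
`await retry(() => downloadIssue(day), { attempts: 5 })`, where `downloadIssue`
stands in for whatever function performs a single download.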
## Licence
Licensed as [MPL 2.0](https://www.mozilla.org/en-US/MPL/2.0/).