From 3db93bc6f7b46bc322694e6658b8f559433a03c6 Mon Sep 17 00:00:00 2001
From: Yuchen Pei
Date: Thu, 19 May 2022 22:23:10 +1000
Subject: Replacing the files with a haskell rewrite.
---
 README | 80 ------------------------------------------------------------------
 1 file changed, 80 deletions(-)
 delete mode 100644 README

diff --git a/README b/README
deleted file mode 100644
index afc8a26..0000000
--- a/README
+++ /dev/null
@@ -1,80 +0,0 @@

## Dependencies

 * python-debian

 * python-nose (to run test.py)

 * python-pandas

 * pv

## Overview

Data is collected from various sources by the "load" scripts and converted to
the Pandas library's "data frame" structure, which is somewhat similar to a
SQL database except that there is no schema. To put it another way, it is
like a sparse grid with named fields along one axis and numbered rows along
the other. This approach means that we can import data fairly directly from
fairly messy sources and work out the details at export time.

These data frames are saved into a pair of HDF (hierarchical data format)
files, `pkg.h5` and `cp.h5`, which contain general package information and
copyright/licensing information respectively.

We generate Semantic MediaWiki pages from this data using one of a pair
of export scripts. `export.py` exports the pages as a directory
containing one file per page. `export_json.py` exports the list of pages
as a single JSON file named index.json. This JSON file can be converted
to a directory of wiki pages using the `json_to_wiki.py` script.

To import and export all packages, do

    ./doall.sh

## Importing data from Debian

Loading data from package files:

    $ pv .../Packages | python load_packages.py

Packages files can be obtained from Debian mirrors, and are cached by APT in
/var/lib/apt/lists.
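A Packages file is a series of blank-line-separated stanzas of `Key: value` fields, with continuation lines indented by whitespace. The real `load_packages.py` is not shown here, and python-debian's `deb822` module is the proper tool for this format, but a minimal stdlib-only sketch of the parsing step might look like:

```python
from typing import Dict, Iterator, Iterable


def iter_stanzas(lines: Iterable[str]) -> Iterator[Dict[str, str]]:
    """Yield one dict per blank-line-separated stanza (Packages format).

    Simplified sketch: real Packages files have extra conventions
    (e.g. '.' lines marking paragraph breaks in Description fields)
    that python-debian's deb822 handles properly.
    """
    stanza: Dict[str, str] = {}
    key = None
    for line in lines:
        line = line.rstrip("\n")
        if not line:
            if stanza:
                yield stanza
                stanza, key = {}, None
            continue
        if line[0] in " \t" and key:
            # Continuation line: append to the previous field.
            stanza[key] += "\n" + line.strip()
        else:
            key, _, value = line.partition(":")
            stanza[key] = value.strip()
    if stanza:
        yield stanza


# Hypothetical two-package sample in Packages format.
sample = """Package: pandoc
Version: 2.9
Description: general markup converter
 Pandoc converts between markup formats.

Package: pv
Version: 1.6
"""
stanzas = list(iter_stanzas(sample.splitlines(keepends=True)))
```

A loader along these lines would feed the resulting dicts into a Pandas data frame, one row per stanza.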
Loading package descriptions:

    $ pv .../Translation-en | python load_descriptions.py

Loading data from copyright files:

    $ python load_copyright.py main/*/*/current/copyright | tee cp_import.log

## Exporting data

One package:

    $ python export.py pandoc

All packages, as wiki pages:

    $ python export.py

(Output is in the "output" directory.)

All packages, as JSON:

    $ python export_json.py

JSON output can be converted to wiki pages:

    $ python json_to_wiki.py < packages.json

(Output is in the "converted" directory.)

## Running the test suite

    $ python test.py
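The JSON-to-wiki conversion described under "Exporting data" can be sketched as follows. This is a hypothetical stand-in for `json_to_wiki.py`, assuming the JSON maps page titles to wikitext bodies; the real index layout may differ.

```python
import json
from pathlib import Path


def json_to_wiki(index_path: str, out_dir: str) -> int:
    """Write one wiki file per page from a JSON index.

    Hypothetical sketch: assumes the index is a flat mapping of
    page title -> wikitext body, which may not match the real
    index.json produced by export_json.py.
    """
    pages = json.loads(Path(index_path).read_text())
    out = Path(out_dir)
    out.mkdir(parents=True, exist_ok=True)
    for title, body in pages.items():
        # Slashes are not valid in file names, so flatten them.
        (out / (title.replace("/", "_") + ".wiki")).write_text(body)
    return len(pages)
```

Each page lands in its own file in the output directory, mirroring what `export.py` produces directly.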