Diffstat (limited to 'README')
-rw-r--r-- README 80
1 files changed, 0 insertions, 80 deletions
diff --git a/README b/README
deleted file mode 100644
index afc8a26..0000000
--- a/README
+++ /dev/null
@@ -1,80 +0,0 @@
-
-## Dependencies
-
- * python-debian
-
- * python-nose (to run test.py)
- <https://nose.readthedocs.org/en/latest/>
-
- * python-pandas
- <http://pandas.pydata.org/>
-
- * pv
-
-## Overview
-
-Data is collected from various sources by the "load" scripts and converted to
-the Pandas library's "data frame" structure, which is somewhat similar to a
-SQL table except that there is no fixed schema. To put it another way, it is
-like a sparse grid with named fields along one axis and numbered rows along
-the other. This approach lets us import data fairly directly from messy
-sources and work out the details at export time.
-
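As a rough illustration of the "sparse grid" idea, here is a minimal sketch using pandas. The records, field names, and values are hypothetical and are not taken from the load scripts; the point is only that no schema is declared and that fields a source did not supply become missing values.

```python
import pandas as pd

# Hypothetical records, as a "load" script might collect them from
# different sources; field names and values are illustrative only.
records = [
    {"Package": "pandoc", "Version": "1.0", "Section": "text"},
    {"Package": "pv", "License": "example-license"},
]

# No schema is declared up front: the columns are the union of all keys seen.
df = pd.DataFrame(records)

# Fields that a record never supplied show up as missing values (NaN).
missing = df.isna()
print(sorted(df.columns))
print(missing.loc[1, "Version"])
```

Rows are numbered from 0 and columns are named after the fields, so cleaning up gaps or inconsistent fields can be deferred to export time.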
-These data frames are saved into a pair of HDF (hierarchical data format)
-files, `pkg.h5` and `cp.h5`, which contain general package information and
-copyright/licensing information respectively.
-
-We generate Semantic MediaWiki pages from this data using one of a pair
-of export scripts. `export.py` exports the pages as a directory
-containing one file per page. `export_json.py` exports the list of pages
-as a single JSON file named index.json. This JSON file can be converted
-to a directory of wiki pages using the `json_to_wiki.py` script.
-
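The JSON-to-directory step might look roughly like the sketch below. The layout of the JSON file is not documented here, so the list of `{"title", "text"}` objects, the `write_pages` helper, and the sample page are all assumptions made for illustration; `json_to_wiki.py` may work differently.

```python
import json
import os
import tempfile


def write_pages(pages, out_dir):
    """Write one file per page; `pages` is a list of {"title", "text"} dicts
    (an assumed layout, not the real export format)."""
    os.makedirs(out_dir, exist_ok=True)
    written = []
    for page in pages:
        # Replace path separators so a title cannot escape out_dir.
        name = page["title"].replace("/", "_")
        path = os.path.join(out_dir, name)
        with open(path, "w", encoding="utf-8") as f:
            f.write(page["text"])
        written.append(path)
    return written


# Usage: round-trip a tiny hypothetical page list through JSON.
pages = json.loads(json.dumps([{"title": "Pandoc", "text": "{{Entry|Name=pandoc}}"}]))
out_dir = tempfile.mkdtemp()
paths = write_pages(pages, out_dir)
```

Keeping the pages in a single JSON file makes the export easy to move around, while the one-file-per-page directory is convenient for inspecting individual pages.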
-To import and export all packages, run:
-
- $ ./doall.sh
-
-## Importing data from Debian
-
-Loading data from package files:
-
- $ pv .../Packages | python load_packages.py
-
-Packages files can be obtained from Debian mirrors, and are cached by APT in
-/var/lib/apt/lists.
-
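For readers unfamiliar with the Packages format, the sketch below parses blank-line-separated stanzas of `Field: value` lines from an iterable of lines. It is a simplified stand-in, not the code in `load_packages.py`; the `iter_stanzas` helper and the sample text are hypothetical.

```python
def iter_stanzas(lines):
    """Yield one dict per blank-line-separated stanza of "Field: value" lines."""
    stanza, last = {}, None
    for raw in lines:
        line = raw.rstrip("\n")
        if not line.strip():
            # Blank line: the current stanza is complete.
            if stanza:
                yield stanza
            stanza, last = {}, None
        elif line[0] in " \t" and last is not None:
            # Continuation line: append to the previous field's value.
            stanza[last] += "\n" + line.strip()
        else:
            field, _, value = line.partition(":")
            last = field.strip()
            stanza[last] = value.strip()
    if stanza:
        yield stanza


# Hypothetical sample input in the style of a Packages file.
sample = """Package: pandoc
Section: text
Description: general markup converter
 continued description line

Package: pv
Section: utils
"""
stanzas = list(iter_stanzas(sample.splitlines(True)))
```

When data is piped in as in the commands above, the same function could be fed `sys.stdin`. The python-debian dependency ships a `debian.deb822` module for this format, which is likely a better choice in practice; whether the load scripts use it is not stated here.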
-Loading package descriptions:
-
- $ pv .../Translation-en | python load_descriptions.py
-
-Loading data from copyright files:
-
- $ python load_copyright.py main/*/*/current/copyright | tee cp_import.log
-
-
-## Exporting data
-
-One package:
-
- $ python export.py pandoc
-
-All packages, as wiki pages:
-
- $ python export.py
-
-(Output is in "output" directory.)
-
-All packages, as JSON:
-
- $ python export_json.py
-
-JSON output can be converted to wiki pages:
-
- $ python json_to_wiki.py < packages.json
-
-(Output is in "converted" directory.)
-
-## Running the test suite
-
- $ python test.py
-