Diffstat (limited to 'README')
-rw-r--r-- | README | 80 |
1 files changed, 0 insertions, 80 deletions
@@ -1,80 +0,0 @@
-
-## Dependencies
-
- * python-debian
-
- * python-nose (to run test.py)
-   <https://nose.readthedocs.org/en/latest/>
-
- * python-pandas
-   <http://pandas.pydata.org/>
-
- * pv
-
-## Overview
-
-Data is collected from various sources by the "load" scripts and converted to
-the Pandas library's "data frame" structure, which is somewhat similar to a
-SQL database except that there is no schema. Or to put it another way, it's
-like a sparse grid that has named fields along one axis and numbered rows on
-the other. This approach means that we can import data fairly directly from
-fairly messy sources and work out the details at export time.
-
-These data frames are saved into a pair of HDF (hierarchical data format)
-files, `pkg.h5` and `cp.h5`, which contain general package information and
-copyright/licensing information respectively.
-
-We generate Semantic MediaWiki pages from this data using one of a pair
-of export scripts. `export.py` exports the pages as a directory
-containing one file per page. `export_json.py` exports the list of pages
-as a single JSON file named index.json. This JSON file can be converted
-to a directory of wiki pages using the `json_to_wiki.py` script.
-
-To import and export all packages, do
-
-./doall.sh
-
-## Importing data from Debian
-
-Loading data from package files:
-
-    $ pv .../Packages python | python load_packages.py
-
-Packages files can be obtained from Debian mirrors, and are cached by APT in
-/var/lib/apt/lists.
-
-Loading package descriptions:
-
-    $ pv .../Translation-en | python load_descriptions.py
-
-Loading data from copyright files:
-
-    $ python load_copyright.py main/*/*/current/copyright | tee cp_import.log
-
-
-## Exporting data
-
-One package:
-
-    $ python export.py pandoc
-
-All packages, as wiki pages:
-
-    $ python export.py
-
-(Output is in "output" directory.)
-
-All packages, as JSON:
-
-    $ python export_json.py
-
-JSON output can be converted to wiki pages:
-
-    $ python json_to_wiki.py < packages.json
-
-(Output is in "converted" directory.)
-
-## Running the test suite
-
-    $ python test.py
-
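
The Overview text removed by this diff describes the data frame as a schema-less sparse grid, with named fields on one axis and numbered rows on the other. The following is a minimal plain-Python sketch of that idea only. It is not the project's code: the helper `to_grid` and the sample records are hypothetical, and the actual load scripts rely on pandas rather than anything like this.

```python
# Plain-Python stand-in for the schema-less "sparse grid" idea.
# The function name, field names, and values are hypothetical examples.

def to_grid(records):
    """Return (columns, rows).

    columns: union of all field names, in first-seen order.
    rows: one list per record, with None where a field is absent.
    """
    columns = []
    for record in records:
        for field in record:
            if field not in columns:
                columns.append(field)
    rows = [[record.get(field) for field in columns] for record in records]
    return columns, rows


# Messy input: each record carries only the fields its source provided.
records = [
    {"Package": "pandoc", "Version": "1.0"},
    {"Package": "pv", "Section": "utils"},
]

columns, rows = to_grid(records)

# Rows are identified by number, fields by name.
for number, row in enumerate(rows):
    print(number, dict(zip(columns, row)))
```

No schema is declared up front: the set of columns grows as new fields appear, and gaps are left empty to be dealt with later. Building a `pandas.DataFrame` from a list of dicts aligns fields in the same way, marking absent values as NaN.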