GoDot2P: Python software for moving OER out of Google Docs,
preprocessing, and uploading to Pressbooks.
Jonathan A. Poritz
Sometimes we say in the open movement: "Share early, share
often." Well, this software is in a very preliminary state, but in
case anyone can get value from it, whether from specific
code fragments or from the general strategy, I am sharing it right away.
I promise that I will frequently update the code here, and it will always
get cleaner and easier to use! More documentation and a general
description should also be posted here, to help folks who want to use
this software or just benefit from some of its particular approaches.
This software was developed during the course of work done for
Open Oregon Educational Resources, but the rightsholder is nevertheless
Jonathan A. Poritz, who is releasing it under the GPLv3 free software
license.
Here are all of the files, individually:
- OOadd_glossary: inserts Pressbooks (PB) glossary activation codes into PB HTML
- OOcapitalize_figures: capitalizes all "figures" in an HTML file
- OOdownload: downloads PB HTML files from a PB in the style of the Open Oregon (OO) PB OER
- OOfig_finder: makes a CSV file with info about all "Figures" in an HTML file, and also builds a web page showing those figures
- OOfind_img_tags: makes a report of all <img> tags in an HTML file, subject to certain selection criteria
- OOfix_activities: fixes "Activity" boxes in HTML files so they will use PB textboxes when uploaded to PB
- OOfix_in_focus: fixes "In Focus" boxes in HTML files so they will use PB textboxes when uploaded to PB
- OOfix_learn_more: fixes "Want to Learn More?" boxes in HTML files so they will use PB textboxes when uploaded to PB
- OOfix_links: fixes internal links in an HTML file so they refer to the appropriate PB URLs
- OOfix_refs: fixes paragraphs in <h1> Reference sections of an HTML file so they have hanging indents
- OOgloss_down: downloads PB glossary terms, making a glossary manifest file
- OOgloss_up: uploads glossary terms to PB as specified by a glossary manifest file
- OOimg_list: generates an images HTML file, as well as a corresponding docx file for uploading to PB
- OOlink_finder: finds all links in an HTML file and puts them, plus some context, into a new HTML file
- OOlist_links: lists external links in an HTML file
- OOmake_room: processes the HTML files from an OO PB to move them all one or more steps higher in section numbering, so section X.Y will become X.(Y+s), where the shift s defaults to 1
- OOoutliner: makes an outline showing the heading structure in an HTML file
- OOprep: does preparatory work on an HTML file downloaded from Google Docs (GD)
- OOreup: uploads a new version of PB HTML files
- OOshow_colors: shows class names and corresponding colors for GD HTML color-designating classes from an HTML file
- OOsplit: processes a prepared HTML file in various ways to load into PB
- OOupload: uploads HTML files to PB in a way specified in a manifest file
- wf: documents the workflow using these files to get an OER from GD into PB
- wf.a11y: documents the workflow to create an a11y report for a GD OER
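
The renumbering described for OOmake_room (section X.Y becomes X.(Y+s), with s defaulting to 1) can be illustrated with a short regular-expression sketch. This is not the code from OOmake_room itself: the function name shift_section_labels, the pattern, and the assumption that section labels appear as plain "X.Y" text are all hypothetical choices made for illustration.

```python
import re

# Assumed label shape: "X.Y" with exactly two numeric parts.
# The lookarounds skip longer dotted runs such as "1.2.3".
# Caveat: a decimal number in running prose (e.g. "3.14") would also match,
# so a real tool would probably restrict this to heading text.
SECTION_LABEL = re.compile(r"(?<![\d.])(\d+)\.(\d+)(?!\.?\d)")


def shift_section_labels(text, shift=1):
    """Return text with every X.Y label rewritten as X.(Y+shift)."""

    def bump(match):
        chapter = match.group(1)
        section = int(match.group(2))
        return f"{chapter}.{section + shift}"

    return SECTION_LABEL.sub(bump, text)


if __name__ == "__main__":
    # Hypothetical heading text, not taken from any real book.
    print(shift_section_labels("3.2 Culture and Society"))
    print(shift_section_labels("See sections 3.2 and 3.10.", shift=2))
```

Passing a function as the replacement to re.sub keeps the arithmetic on the section number explicit, which is harder to express with a plain replacement string.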
If you just want to download a single archive containing all of those files,
you can use either of the following:
- GoDot2P.tar: a tar archive, for Linux users
- GoDot2P.zip: a zip archive, for users of other operating systems
If you want to run the software yourself, you will need a machine with a
fairly recent Python 3 installation, including the following modules:
- argparse
- bs4
- code
- csv
- fileinput
- os
- random
- re
- requests
- selenium.webdriver
- sys
- time
- urllib.parse
- uuid
- warnings
Most of these are part of the Python standard library; bs4 (Beautiful Soup),
requests, and selenium are widely used third-party packages that must be
installed separately.
The tidy program (HTML Tidy), originally from the W3C but now found at the
site of the HTML Tidy Advocacy Community Group, is also used.
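
Before running any of the scripts, it may help to check that these requirements are met. The following is a minimal sketch, not part of GoDot2P: the helper names missing_modules and tidy_available are hypothetical, and it assumes the HTML Tidy executable is on the PATH under the name tidy.

```python
import importlib.util
import shutil

# Module names as listed in the requirements above.
REQUIRED_MODULES = [
    "argparse", "bs4", "code", "csv", "fileinput", "os", "random", "re",
    "requests", "selenium.webdriver", "sys", "time", "urllib.parse",
    "uuid", "warnings",
]


def missing_modules(names):
    """Return the names from `names` that cannot be found for import."""
    missing = []
    for name in names:
        try:
            found = importlib.util.find_spec(name) is not None
        except ImportError:
            # Raised for dotted names when the parent package is absent,
            # e.g. "selenium.webdriver" when selenium is not installed.
            found = False
        if not found:
            missing.append(name)
    return missing


def tidy_available():
    """Return True if an executable named "tidy" is on the PATH (assumed name)."""
    return shutil.which("tidy") is not None


if __name__ == "__main__":
    absent = missing_modules(REQUIRED_MODULES)
    print("Missing Python modules:", ", ".join(absent) if absent else "none")
    print("tidy found on PATH:", tidy_available())
```

Any module reported as missing is most likely one of the third-party packages, which can usually be installed with pip.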