[adapted from Poster for drama performance of "Waiting for Godot" by Fewskulchor released under Creative Commons Attribution-Share Alike 3.0 Unported; this version by Jonathan Poritz replaces the original text with the name of this software project and other relevant information, and is released under a Creative Commons Attribution-ShareAlike 4.0 International license]

GoDot2P: Python software for moving OER out of Google Docs, preprocessing, and uploading to Pressbooks.

Jonathan A. Poritz

Sometimes we say in the open movement: "Share early, share often." ...Well, this software is in a very preliminary state, but in case anyone can get any value from it, be it anything from specific code fragments to general strategy, I am sharing it right away.

I promise that I will update the code here frequently, and that it will keep getting cleaner and easier to use! More documentation and a general description should also be posted here, to help folks who want to use this software or just benefit from some of its particular approaches.

This software was developed in the course of work done for Open Oregon Educational Resources, but the rightsholder is nevertheless Jonathan A. Poritz, who is releasing it under a GPLv3 free software license.

Here are all of the files, individually:

  1. OOadd_glossary: puts PB glossary activation codes into PB HTML
  2. OOcapitalize_figures: capitalizes all "figures" in an HTML file
  3. OOdownload: downloads PB html files from a PB in the style of the OO PB OER
  4. OOfig_finder: makes a CSV file with info about all "Figures" in an HTML file, also building a web page showing those figures
  5. OOfind_img_tags: makes a report of all <img> tags in an HTML file, subject to certain selection criteria
  6. OOfix_activities: fix "Activity" boxes in html files so they will use PB textboxes when uploaded to PB
  7. OOfix_in_focus: fix "In Focus" boxes in html files so they will use PB textboxes when uploaded to PB
  8. OOfix_learn_more: fix "Want to Learn More?" boxes in html files so they will use PB textboxes when uploaded to PB
  9. OOfix_links: fix internal links in an html file so they refer to the appropriate PB URLs
  10. OOfix_refs: fix paragraphs in <h1> Reference sections of an html file so they have a hanging indent
  11. OOgloss_down: download PB glossary terms making a glossary manifest file
  12. OOgloss_up: uploads glossary terms to PB as specified by a glossary manifest file
  13. OOimg_list: generates images html file, as well as corresponding docx file for uploading to PB
  14. OOlink_finder: finds all links in an html file and puts them, plus some context, into a new html file
  15. OOlist_links: list external links in html file
  16. OOmake_room: processes the HTML files from an OO PB to move them all one or more steps higher in the numbering of sections, so section X.Y will become X.(Y+s), where the shift s defaults to 1
  17. OOoutliner: makes an outline showing heading structure in an html file
  18. OOprep: does preparatory work on an HTML file downloaded from GD
  19. OOreup: upload a new version of PB html files
  20. OOshow_colors: shows class names and corresponding colors for GD HTML color-designating class from an html file
  21. OOsplit: process a prepared html file in various ways to load into PB
  22. OOupload: uploads html files to PB in a way specified in a manifest file
  23. wf: documents workflow using these files to get an OER from GD into PB
  24. wf.a11y: documents workflow to create a11y report for a GD OER
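
To make the renumbering described for OOmake_room a bit more concrete, here is a minimal sketch of the idea. It is not the code in this project: the helper name `shift_section` is hypothetical, and it assumes that a section label of the form "X.Y" sits at the very start of a heading string, which may not match how the real HTML files are laid out.

```python
import re

# Hypothetical pattern: a leading section label such as "3.2".
SECTION_LABEL = re.compile(r"^(\d+)\.(\d+)\b")


def shift_section(heading, shift=1):
    """Return heading with a leading 'X.Y' label renumbered to 'X.(Y+shift)'.

    Headings that do not start with such a label are returned unchanged.
    """
    def bump(match):
        chapter = match.group(1)
        section = int(match.group(2))
        return f"{chapter}.{section + shift}"

    # count=1 so that only the leading label is touched.
    return SECTION_LABEL.sub(bump, heading, count=1)


if __name__ == "__main__":
    print(shift_section("3.2 Sample Section Title"))
    print(shift_section("3.2 Sample Section Title", shift=2))
```

With the default shift of 1, a heading starting with "3.2" comes back starting with "3.3", and a heading without a leading label is left alone.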

If you just want to download a single archive with all of those files in it, you could use either of the following two choices:

If you want to be able to run the software yourself, you will need a machine with a fairly recent Python installation (specifically Python 3), including the following packages:

  1. argparse
  2. bs4
  3. code
  4. csv
  5. fileinput
  6. os
  7. random
  8. re
  9. requests
  10. selenium.webdriver
  11. sys
  12. time
  13. urllib.parse
  14. uuid
  15. warnings
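All of these are quite standard: most ship with Python itself, while bs4 (provided by the beautifulsoup4 distribution), requests, and selenium are third-party packages that have to be installed separately, for example with pip.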
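
If it is unclear whether a given machine has everything in the list above, a small check along the following lines may help. This is only a sketch and is not part of GoDot2P: the function name `missing_modules` is made up for illustration, and the check only confirms that each module can be located, not that its version is suitable.

```python
import importlib.util

# The modules named in the list above.
REQUIRED = [
    "argparse", "bs4", "code", "csv", "fileinput", "os", "random", "re",
    "requests", "selenium.webdriver", "sys", "time", "urllib.parse",
    "uuid", "warnings",
]


def missing_modules(names):
    """Return the subset of names that cannot be located on this machine."""
    missing = []
    for name in names:
        try:
            found = importlib.util.find_spec(name) is not None
        except (ImportError, ValueError):
            # For a dotted name such as "selenium.webdriver", find_spec
            # imports the parent package first and raises if it is absent.
            found = False
        if not found:
            missing.append(name)
    return missing


if __name__ == "__main__":
    absent = missing_modules(REQUIRED)
    if absent:
        print("Missing modules:", ", ".join(absent))
    else:
        print("All required modules were found.")
```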

tidy, originally from the W3C, but now found at the site of the HTML Tidy Advocacy Community Group, is also used.
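
How the scripts here actually invoke tidy is not described above, so the following is just an illustrative sketch of calling it from Python. The helper names `tidy_command` and `run_tidy` are hypothetical, and the choice of the `-q` (quiet) and `-o` (output file) options is an assumption about what a typical cleanup call might look like.

```python
import shutil
import subprocess


def tidy_command(source, destination, executable="tidy"):
    """Build an argument list asking tidy to write a cleaned copy of source."""
    # -q suppresses nonessential output; -o names the output file.
    return [executable, "-q", "-o", destination, source]


def run_tidy(source, destination):
    """Run tidy if it is installed; return its exit status, or None if absent."""
    executable = shutil.which("tidy")
    if executable is None:
        return None
    # tidy exits with 0 on success, 1 if there were warnings, and 2 if there
    # were errors, so a nonzero status is returned rather than raised.
    completed = subprocess.run(tidy_command(source, destination, executable))
    return completed.returncode
```

A return value of `None` from `run_tidy` means that no `tidy` executable was found on the search path.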