% this next line updated by emacs every time the file is saved:
\def\tsval{Time-stamp: < 10 Oct 2022 03:38CEST >}
\documentclass[pdf,9pt,hyperref={colorlinks=true,urlcolor=blue}]{beamer}
\usepackage{amsmath, amsfonts, multirow, mwe, textcomp, tabularx, contour, ulem}
\usepackage{lmodern,datetime,amssymb,graphicx,colortbl,amscd,mathrsfs,xcolor,ccicons}
%macros for dealing with the emacs timestamp
\def\tsgutsb#1{\expandafter\tsgutsa #1\relax}
\def\tsgutsa#1 #2 #3 #4 #5 #6 #7\relax{#3 #4 #5 #6}
\def\timestamp{\tsgutsb{\tsval}}
\newcommand{\fulltitle}{{How many OER are there?}}
\newcommand{\runningtitle}{{How many OER are there?}}
\newcommand{\shorttitle}{{howmanyOER}}
\newcommand{\presentationdate}{{20 October 2022}}
\newcommand{\venue}{{Open Education Conference 2022}}
\newcommand{\footerdatevenue}{{20 Oct 2022, Open Ed 2022}}
\usetheme{CambridgeUS}
\definecolor{niceyellow}{HTML}{F2D920}
\definecolor{nicerust}{HTML}{F36352}
\definecolor{nicegreen}{HTML}{0F543D}
\setbeamercolor{palette primary}{fg=nicerust,bg=niceyellow}
\setbeamercolor{palette secondary}{fg=white,bg=nicegreen}
\setbeamercolor{palette tertiary}{bg=niceyellow,fg=black}
\setbeamercolor{frametitle}{fg=nicerust}
\setbeamercolor{title}{fg=nicerust}
\newcommand{\black}[1]{{\color{black}#1}}
\newcommand{\blue}[1]{{\color{blue}#1}}
\newcommand{\red}[1]{{\color{red}#1}}
\newcommand{\gray}[1]{{\color{gray}#1}}
\newcommand{\fns}{\footnotesize}
\newcommand{\scrs}{\scriptsize}
\newcolumntype{L}[1]{>{\raggedright\let\newline\\\arraybackslash\hspace{0pt}}m{#1}}
\newcommand{\Ee}{{\mathcal E}}
\newcommand{\Mm}{{\mathcal M}}
\newcommand{\NN}{{\mathbb N}}
\newcommand{\QQ}{{\mathbb Q}}
\newcommand{\ZZ}{{\mathbb Z}}
\renewcommand{\ULdepth}{1.8pt}
\newcommand{\XXX}{\hphantom{XXX}}
\newcommand{\Kk}{\mathcal K}
\contourlength{0.8pt}
\newcommand{\myuline}[1]{%
\uline{\phantom{#1}}%
\llap{\contour{white}{#1}}%
}
\begin{document}
\title[\runningtitle]{\fulltitle}
\author[\raisebox{-0.3\height}{\includegraphics[height=2.25mm]{CCcircle.pdf}}\raisebox{-0.3\height}{\includegraphics[height=2.25mm]{BYcircle.pdf}}\raisebox{-0.3\height}{\includegraphics[height=2.25mm]{SAcircle.pdf}}\ \ \href{https://poritz.net/j/share/\shorttitle}{https://poritz.net/j/share/\shorttitle}\ \ \ ]{Jonathan A.~Poritz}
\institute[{}]{\ \\
\begin{tabular}{cc}
\ &
\multirow{6}{*}{\raisebox{1.5\height}{\includegraphics[height=1.5cm]{poritz_net_jonathan_qr-code_colorful.pdf}}}\\
\ \\
\href{mailto:jonathan@poritz.net}{jonathan@poritz.net}&\\
{\tt\tiny \href{https://www.poritz.net/jonathan}{poritz.net/jonathan}}&\\
\ \\
\ \\
\end{tabular}\\
\ \\
\ \\
\ \\
\ \\
\presentationdate\\
\venue
}
\date[\footerdatevenue]{\raisebox{-1\height}{\includegraphics[height=5mm]{CC-BY-SA.pdf}}\ \,{\tiny This slide deck, except where otherwise indicated, is by \href{https://poritz.net/jonathan}{Jonathan Poritz} and is released under a\\[-4mm]\hskip25.5mm\href{http://creativecommons.org/licenses/by-sa/4.0/}{Creative Commons Attribution-ShareAlike 4.0 International License}. This version: \timestamp.\\[.5mm]\hskip24.5mm These slides, also in editable form and accompanied by the data and code used to make the graphs herein,\\[-1.25mm]\hskip-5.75mmare available at \href{https://poritz.net/jonathan/share/\shorttitle/}{https://poritz.net/jonathan/share/\shorttitle/}\ .}}
\begin{frame}
\titlepage
\end{frame}
\begin{frame}{Intro: Land acknowledgement}
\begin{center}
\begin{tabular}{p{9cm}}
Before we begin, I must acknowledge that I began this work while I was
living in the unceded territory of the Ute Peoples. The
earliest documented people in the area also include the Apache, Arapaho,
Comanche, and Cheyenne. An extended list of tribes with a legacy of
occupation there can be found on the
\href{https://www.coloradocollege.edu/other/indigenous-community/Colorado\%20Tribal\%20Acknowledgement\%20List.pdf}{Colorado
Tribal Acknowledgement List}.
\vskip3mm
I am grateful for the chance to have lived and worked in that beautiful
place and will always cherish that memory, even though I am no longer a
resident there.\footnote{\tiny Where I live now there is no tradition of
land acknowledgements of which I am aware.}
\end{tabular}
\end{center}
\end{frame}
\begin{frame}{The question, and what to do about it}
A while ago, I wondered
\vfill
\begin{center}
\textbf{\textit{\red{How many OER are there?}}}
\end{center}
\vfill
{\tiny[Hence the title of this talk.]}
\vfill
In this presentation, I will tell you about how I tried
\vskip2mm
\begin{itemize}
\item[$\bullet$] to decide what kind of answer I would be happy with,
\item[$\bullet$] to make the question a bit more precise,
\item[$\bullet$] to go about getting that answer, and
\item[$\bullet$] to understand what answer I was actually able to get.
\end{itemize}
\end{frame}
\begin{frame}{What kind of answer do I want?}
Whenever I see a single statistic, I feel like it is begging for some
\textit{context}.
\vfill
I also did not specify, when asking the question, a particular date and time.
Since the number of OER is probably constantly changing -- \textit{growing},
one imagines -- the best thing might be to give an answer for all possible
times one might specify.
\vfill
In fact, let's put all these numbers together in a
\textit{graph} and say that
\vfill
\begin{center}
\textbf{\textit{\red{{I want a graph of how many OER there have
been, over time.}}}}
\end{center}
\vfill
In fact, this is closer to what I was originally wondering: I wanted to
write a sentence about the well-known (presumably) trend of growth --
exponential, maybe? something like that -- of the body of existing OER.
But I couldn't find any prior results on the topic!\footnote{\tiny I did
ask smart people! \textit{E.g.,} Nicole Allen responded, and she suggested
that this wasn't actually the right question, that a better question would
be \hphantom{XXx.} about \textit{engagement} with OER. I absolutely agree!
But my less important question is still of some interest to a data geek
like me.} Hence this project....
\end{frame}
\begin{frame}{Making the question precise: What are those ``OER?''}
What are the things I want to count (repeatedly, at different times, to make
a graph)?
\vfill
They are ``Open Educational Resources [OER].''
\vfill
Fortunately, the UNESCO \href{http://portal.unesco.org/en/ev.php-URL_ID=49556&URL_DO=DO_TOPIC&URL_SECTION=201.html}{OER Recommendation} can be taken as
canonical:
\begin{center}
\begin{tabular}{p{10cm}}
\ \vskip-3.5mm
\hskip-6.4mm\textit{``1.\hskip2mmOpen Educational Resources (OER) are learning, teaching and research materials in any format and medium that reside in the public domain or are under copyright that have been released under an open license, that permit no-cost access, re-use, re-purpose, adaptation and redistribution by others.}\\\ \\
\hskip-4.75mm\textit{2.\hskip2mmOpen license refers to a license that respects the intellectual property rights of the copyright owner and provides permissions granting the public the rights to access, re-use, repurpose, adapt and redistribute educational materials.''}\\
\end{tabular}
\end{center}
\end{frame}
\begin{frame}{Making the question precise: UNESCO and \textit{licenses}}
Mapping the UNESCO definition to the
\href{https://creativecommons.org}{Creative Commons} [CC] licenses and public
domain tools (the most common approaches for OER), we get:
\ \vskip-8mm\
\begin{center}
\includegraphics[height=5cm]{spectrum.pdf}
\end{center}
\ \vskip-.8cm\
In particular, then,
\begin{center}
\textbf{\textit{\red{{The OER I want to count must all bear a CC PDM or CC0\\
tool or a CC BY, BY-SA, BY-NC, or BY-NC-SA license.}}}}
\end{center}
\end{frame}
\begin{frame}{Caveat: There could be other licenses or copyright statuses.}
There are other licenses which might meet the criteria expressed in the UNESCO
definition of OER! The \href{https://open.umn.edu/opentextbooks}{Open
Textbook Library [OTL]} from the \href{https://open.umn.edu/oen/}{Open Education
Network [OEN]} has seventeen works which bear the
\href{https://www.gnu.org/licenses/fdl-1.3.html}{GNU Free Documentation
License}, which seems to meet the UNESCO OER definition.
\vfill
Other works might have fallen into the global public domain but not bear the
CC PDM, simply because no competent authority had bothered to put one on a
commonly accessed version of the work. These should nevertheless be counted
among OER.
\vfill
Finally, the Creative Commons does not recommend that its licenses be used for
\textit{software}, saying there are many others which are better adapted to the
particular needs of code: see, \textit{e.g.},
\href{https://opensource.org/licenses}{a list of approved open-source licenses}
from the \href{https://opensource.org/}{Open Source Initiative}. Since more
and more OER -- even ones which are basically ``textbooks'' -- may
incorporate (as interactive elements) or be nearly entirely (as
in \href{https://jupyter.org/}{Jupyter Notebooks} or similar) \textit{code},
the open education community probably needs to stop using graphics like the one
on the previous slide\footnote{\tiny Yes, I am criticizing myself!} which portray
OER as \textbf{\textit{necessarily}} carrying one of those CC licenses/statuses.
\vskip1mm
\end{frame}
\begin{frame}{Making the question precise: UNESCO and \textit{materials}}
UNESCO says that OER are \textit{``...learning, teaching and research
\textbf{materials}''} [emphasis added].
\vfill
These could be classroom handouts, test banks, individual diagrams, videos,
pieces of software, \textit{etc.}
\vfill
Amorphous materials like that are hard to count, except perhaps as pages or
megabytes, which I will not do.
\vfill
One can, presumably, count \textit{books}, though. So
\vfill
\begin{center}
\textbf{\textit{\red{{The OER I want to count should all be
``textbooks.''}}}}
\end{center}
\vfill
There is certainly a tradition of doing this in the open education space.
\textit{E.g.}, before the
\href{https://open.umn.edu/oen/}{OEN} broadened its scope to all of Open
Education, it started out as the \textbf{Open Textbook Network [OTN]}! We may
eventually count more significant things like engagement, but shouldn't we
at least start by counting textbooks?
\end{frame}
\begin{frame}{Caveat: What exactly is a ``textbook''?}
There has been some interesting discussion in recent years about ``\href{https://hughmcguire.medium.com/what-is-a-book-in-the-age-of-the-web-part-1-of-5-3a529701e0df}{what is a `book' in the age of the web?},'' what will be ``the textbook
of the future''\footnote{\tiny Just use that phrase with your favorite search
engine!}, and even whether textbooks are the best tools for learning, even for
courses that have traditionally used them\footnote{\tiny See, \textit{e.g.,} \href{https://poritz.net/j/share/\#JITERsOct19}{my talk at the last in-person Open Ed conference, on ``Just-In-Time Educational Resources''}.}.
\vfill
But surely at least ``traditional textbooks'' have a clear definition?
\vfill
A number of organizations who run open educational professional development
programs or who otherwise support the movement, including
\href{https://www.rebus.community/}{the Rebus Community} and the
\href{https://open.umn.edu/oen/}{OEN}, often speak about textbooks as being in
some tension with books that might be called ``academic monographs'': textbooks
have to have some fairly fixed structure and common features like chapter
openers and closers, pedagogical elements, exercises or discussion prompts,
learning outcomes, \textit{etc.}
\vfill
In my own education, I took great courses which used academic monographs (or
novels or other forms of books). So I will make a very informal definition of
``textbook'' as anything which its author or some collector or cataloger has
called a textbook. Many ``monographs'' that might be used for education will
therefore be accepted in my OER counting project, for example.
\end{frame}
\begin{frame}{Making the question precise: What will time be in the graph?}
If I am going to count textbooks with certain legal statuses over time
to make a graph, I need to know what is meant by ``time'' in this project.
\vfill
My original motivation was to make that time plot of the number of OER, with
the idea of showing how many OER ``have been available to the community'' over
time. So I really want to know when these textbooks were made public.
\vfill
Often this \textit{publication date} will be the same thing as the work's
\textit{copyright year} -- copyright springs into existence in the US when
a work is ``fixed in a tangible medium of expression''\footnote{\tiny and in most other countries when the work is merely created, even without fixation.} --
and I suspect that most folks who go to the trouble of creating or adapting an
OER do so with the intention of using it, so they make it public just about
as soon as it is created [fixed, in the US].
\vfill
Therefore,
\vfill
\begin{center}
\textbf{\textit{\red{{``Time'' in my graph of the number of OER will be the
publication date or\\
copyright year, whichever is possible to determine for each particular
OER.}}}}
\end{center}
\end{frame}
\begin{frame}{Making the question precise: When should I count OER as different?}
To count things, you must know when two of them are to be considered the same
or different, in order to know when to increase the count.
\vfill
Since a foundational value, and commonly followed practice, in open education
is to \textit{adapt} existing works, similar OER abound!
\vfill
Fortunately, this is a problem which has already been solved in copyright law,
where often one must ask if two works are enough ``the same'' for one to count
as a copy of the other.
\vfill
That means
\vfill
\begin{center}
\textbf{\textit{\red{{I will count as separate exactly those books which
copyright law would consider to be different works.}}}}
\end{center}
\vfill
In particular, this means that a printout of an ebook is not a new book, nor is
an electronic version which is in a different file format from the original,
nor is a version which fixes a few typos or changes a font.
\vfill
A translation of a book, however, will almost always be considered a new work,
as will essentially any new infusion of original authorship.
\end{frame}
\begin{frame}{A test case: the OEN's OTL}
Let's use a limited but high-quality dataset to see if the approach
described above makes sense.
In particular, the \href{https://open.umn.edu/oen/}{OEN}'s
\href{https://open.umn.edu/opentextbooks}{OTL} definitely consists of textbooks,
and the \href{https://open.umn.edu/oen/}{OEN} shares a catalog CSV which tells
the works' licenses and copyright years. Removing entries for books which do
not have the correct license or copyright status, extracting the copyright
years, and making the resulting graph, gives this:
{\tiny
\begin{center}
\begin{tabular}{r p{5cm}}
\multirow{13}{*}{\raisebox{-0.3\height}{\includegraphics[height=4.5cm]{OTL_scatterplot.pdf}}} & \\[1mm]
& \ \\[1mm]
& for this:\\[1mm]
& The
\href{https://open.umn.edu/opentextbooks/download.csv}{OTL's catalog CSV}
was downloaded to a local copy
\href{https://poritz.net/j/share/howmanyOER/OTL.csv}{OTL.csv}.\\[1mm]
& Rows corresponding to items with incorrect licenses were\\
& removed, the column of copyright years was extracted and\\
& sorted into a file
\href{https://poritz.net/j/share/howmanyOER/otl_copyright_years}{otl\_copyright\_years}.\\[1mm]
& The graph was produced using the command\\[1mm]
& \hphantom{XX}{\tt ./cyears\_graph.py -d otl\_copyright\_years -r}\\
& \hphantom{XXXX}{\tt "the OEN's Open Textbook Library" -s 1985}\\
& \hphantom{XXXX}{\tt -t 1200 --endyear 2025}\\
& (all on one line)\\[1mm]
& This used an (openly licensed, of course) Python script \href{https://poritz.net/j/share/howmanyOER/cyears_graph.py}{cyears\_graph.py}.
\end{tabular}
\end{center}
}
\vfill
\end{frame}
\begin{frame}{Exponential fitting to the OTL graph}
I was hoping, you may recall, that the graph of how many OER there are would
show something like an exponential growth, and that scatterplot does really
seem to take off.
\vskip3mm
Unfortunately, the best exponential fit is not very good:
\vskip3mm
{\tiny
\begin{center}
\begin{tabular}{r p{5cm}}
\multirow{6}{*}{\raisebox{-0.3\height}{\includegraphics[height=4.5cm]{OTL_exp_reg.pdf}}} & \\[1mm]
& \ \\[1mm]
& command used here:\\[1mm]
& \hphantom{XX}{\tt ./cyears\_graph.py -d otl\_copyright\_years -r}\\
& \hphantom{XXXX}{\tt "the OEN's Open Textbook Library" -s 1985}\\
& \hphantom{XXXX}{\tt -t 1200 -e '(1987,2024):(1993,250)'}\\
& \hphantom{XXXX}{\tt --endyear 2025}\\[1mm]
& (all on one line)\\[1mm]
\end{tabular}
\end{center}
}
\vfill
\
\end{frame}
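\begin{frame}{Aside: what an ``exponential fit'' means here}
As a sketch only: the notation below is introduced for this slide and is an
assumption, not necessarily what \texttt{cyears\_graph.py} does internally.
Write $N(t)$ for the number of OER counted up to year $t$ (assuming the graph
shows a running total). An exponential model over a range of years starting
at $t_0$ is
\[
N(t) = A\,e^{k(t-t_0)}, \qquad A>0,\ k>0 .
\]
Taking logarithms turns this into a straight line in $t$:
\[
\ln N(t) = \ln A + k\,(t-t_0).
\]
So one common way to estimate $A$ and $k$ is an ordinary least-squares line
through the points $\bigl(t,\ln N(t)\bigr)$. A poor exponential fit then shows
up as points which visibly bend away from a straight line when the vertical
axis is logarithmic.
\end{frame}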
\begin{frame}{Piecewise linear fitting to the OTL graph}
Looking at the original scatterplot, it in fact seems as if there are two
quite linear regimes:
\vskip3mm
{\tiny
\begin{center}
\begin{tabular}{r p{5cm}}
\multirow{7}{*}{\raisebox{-0.3\height}{\includegraphics[height=4.5cm]{OTL_lin_reg.pdf}}} & \\[1mm]
& \ \\[1mm]
& command used here:\\[1mm]
& \hphantom{XX}{\tt ./cyears\_graph.py -d otl\_copyright\_years -r}\\
& \hphantom{XXXX}{\tt "the OEN's Open Textbook Library" -s 1985}\\
& \hphantom{XXXX}{\tt -t 1200 -l '(1987,2008):(1987,50)'}\\
& \hphantom{XXXX}{\tt -l '(2013,2022):(2001,550)'}\\
& \hphantom{XXXX}{\tt --endyear 2025}\\[1mm]
& (all on one line)\\[1mm]
\end{tabular}
\end{center}
}
\vskip2.3cm
Trust a data analyst: linearity is very rare in nature. I would guess that
during the whole life of the OTL, there have always been more new OER than
could be processed. In each of the two linear regimes in this graph, then,
there was a different system or group of personnel with a fixed capacity
(different between the two regimes) for ingesting a certain number of new
OER per day, and they just always operated at capacity.
\end{frame}
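\begin{frame}{Aside: the piecewise linear model, in symbols}
A minimal sketch of the capacity hypothesis, using hypothetical symbols
$t_1$, $t_2$, $r_1$, $r_2$ which are not taken from the script or the data.
If one team ingests $r_1$ books per year starting in year $t_1$, and a second
arrangement ingests $r_2$ books per year starting in year $t_2$, the count
would look like
\[
N(t) \approx
\begin{cases}
N(t_1) + r_1\,(t-t_1), & t_1 \le t < t_2,\\[1mm]
N(t_2) + r_2\,(t-t_2), & t \ge t_2 .
\end{cases}
\]
Each slope is a throughput: $r$ books per year corresponds to one new book
every $365/r$ days. As a purely illustrative number (not read off the
graph), $r = 100$ books per year would mean one new book about every
$3.65$ days.
\end{frame}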
\begin{frame}{Another test case: the {B.C.} Open Textbook Collection}
Another limited but high quality dataset is provided by the
\href{https://open.bccampus.ca/}{{B.C.} Open Textbook Collection} from
\href{https://bccampus.ca/}{BCcampus}, consisting of resources which all have
good OER licenses and which again are all clearly textbooks.
{\tiny
\begin{center}
\begin{tabular}{r p{5cm}}
\multirow{15}{*}{\raisebox{-0.3\height}{\includegraphics[height=4.5cm]{BC_scatterplot.pdf}}} & \\[1mm]
& \ \\[1mm]
& for this:\\[1mm]
& The
``Full Set (.mrc)'' of MARC records was downloaded from the page
\href{https://bceln.ca/services-initiatives-resource-sharing-bc-open-textbook-marc}{BC Open Textbook MARC Records} to a local copy
\href{https://poritz.net/j/share/howmanyOER/BCopentextbooks_RDA_fullset_Q2_June30_2022.mrc}{BCopentextbooks\_RDA\_fullset\_Q2\_June30\_2022.mrc}.\\
& \ \ \\
& Using a tool \href{https://pypi.org/project/marc2excel/}{marc2excel}, this was converted to the file\\
& \href{https://poritz.net/j/share/howmanyOER/BCopentextbooks_RDA_fullset_Q2_June30_2022.xlsx}{BCopentextbooks\_RDA\_fullset\_Q2\_June30\_2022.xlsx}\\
& whose column of copyright dates was extracted and sorted into a file
\href{https://poritz.net/j/share/howmanyOER/BCcampus_copyright_years}{BCcampus\_copyright\_years}.\\[1mm]
& \ \ \\
& The graph was produced using the command\\[1mm]
& \hphantom{XX}{\tt ./cyears\_graph.py -d BCcampus\_copyright\_years -r}\\
& \hphantom{XXXX}{\tt "the BC Open Textbook Collection" -s 1985}\\
& \hphantom{XXXX}{\tt -t 500 --endyear 2025}\\
& \ \ \\
& (all on one line)\\[1mm]
\end{tabular}
\end{center}
}
\end{frame}
\begin{frame}{Exponential fitting to the BCcampus graph}
Here again, the best exponential fit is not very good:
\vskip3mm
{\tiny
\begin{center}
\begin{tabular}{r p{5cm}}
\multirow{6}{*}{\raisebox{-0.3\height}{\includegraphics[height=4.5cm]{BC_exp_reg.pdf}}} & \\[1mm]
& \ \\[1mm]
& command used here:\\[1mm]
& \hphantom{XX}{\tt ./cyears\_graph.py -d BCcampus\_copyright\_years -r}\\
& \hphantom{XXXX}{\tt "the BC Open Textbook Collection" -s 1985}\\
& \hphantom{XXXX}{\tt -t 500 -e '(2005,2022):(1993,100)'}\\
& \hphantom{XXXX}{\tt --endyear 2025}\\[1mm]
& (all on one line)\\[1mm]
\end{tabular}
\end{center}
}
\vfill
\
\end{frame}
\begin{frame}{Piecewise linear fitting to the BCcampus graph}
Once again, looking at the original \href{https://bccampus.ca/}{BCcampus}
scatterplot, it in fact seems as if there are two quite linear regimes:
\vskip3mm
{\tiny
\begin{center}
\begin{tabular}{r p{5cm}}
\multirow{7}{*}{\raisebox{-0.3\height}{\includegraphics[height=4.5cm]{BC_lin_reg.pdf}}} & \\[1mm]
& \ \\[1mm]
& command used here:\\[1mm]
& \hphantom{XX}{\tt ./cyears\_graph.py -d BCcampus\_copyright\_years -r}\\
& \hphantom{XXXX}{\tt "the BC Open Textbook Collection" -s 1985}\\
& \hphantom{XXXX}{\tt -t 500 -l '(2012,2022):(2000,200)'}\\
& \hphantom{XXXX}{\tt -l '(2006,2012):(1993,25)'}\\
& \hphantom{XXXX}{\tt --endyear 2025}\\[1mm]
& (all on one line)\\[1mm]
\end{tabular}
\end{center}
}
\vskip2.3cm
The same data analyst's hypothesis about what is causing the ``hockey-stick''
shape of this graph applies here as in the OTL case.
\end{frame}
\begin{frame}{Another test case: OpenStax}
Another limited but high quality dataset is provided by the textbooks from
\href{https://openstax.org/}{OpenStax}. While these are all clearly textbooks,
and there are few enough that it is easy to look at their descriptions one by
one to find the relevant information, \href{https://openstax.org/}{OpenStax}'s
reporting of dates is a bit odd. For each of their books, they give both a
``publication date,'' often a few years ago, and a ``copyright date,'' often in
the last year or so. Why these are different is unclear to me.
For the reasons described above, I used the publication date.
\vskip2mm
{\tiny
\begin{center}
\begin{tabular}{r p{5cm}}
\multirow{10}{*}{\raisebox{-0.3\height}{\includegraphics[height=4.5cm]{OpenStax_scatterplot.pdf}}} & \\[1mm]
& \ \\[1mm]
& publication years extracted from each book's description on\\
& \href{https://openstax.org/}{OpenStax's website} and sorted into a file
\href{https://poritz.net/j/share/howmanyOER/OpenStax_pyears}{OpenStax\_pyears}.\\[1mm]
& then the command used was:\\[1mm]
& \hphantom{XX}{\tt ./cyears\_graph.py -d OpenStax\_pyears -r}\\
& \hphantom{XXXX}{\tt "the OpenStax catalog" -s 1985 -t 80}\\
& \hphantom{XXXX}{\tt -v "published" --endyear 2025}\\[1mm]
& (all on one line)\\[1mm]
\end{tabular}
\end{center}
}
\vfill
\end{frame}
\begin{frame}{Exponential fitting to the OpenStax graph}
Here again, the best exponential fit is not very good:
\vskip3mm
{\tiny
\begin{center}
\begin{tabular}{r p{5cm}}
\multirow{6}{*}{\raisebox{-0.3\height}{\includegraphics[height=4.5cm]{OpenStax_exp_reg.pdf}}} & \\[1mm]
& \ \\[1mm]
& command used here:\\[1mm]
& \hphantom{XX}{\tt ./cyears\_graph.py -d OpenStax\_pyears -r}\\
& \hphantom{XXXX}{\tt "the OpenStax catalog" -s 1985}\\
& \hphantom{XXXX}{\tt -t 80 -e '(2012,2022):(1993,10)'}\\
& \hphantom{XXXX}{\tt --endyear 2025 -v "published"}\\[1mm]
& (all on one line)\\[1mm]
\end{tabular}
\end{center}
}
\vfill
\
\end{frame}
\begin{frame}{Piecewise linear fitting to the OpenStax graph}
Once again, looking at the original \href{https://openstax.org/}{OpenStax}
scatterplot, it in fact seems as if there are two quite linear regimes:
\vskip3mm
{\tiny
\begin{center}
\begin{tabular}{r p{5cm}}
\multirow{7}{*}{\raisebox{-0.3\height}{\includegraphics[height=4.5cm]{OpenStax_lin_reg.pdf}}} & \\[1mm]
& \ \\[1mm]
& command used here:\\[1mm]
& \hphantom{XX}{\tt ./cyears\_graph.py -d OpenStax\_pyears -r}\\
& \hphantom{XXXX}{\tt "the OpenStax catalog" -s 1985}\\
& \hphantom{XXXX}{\tt -t 80 -l '(2012,2015):(1995,5)'}\\
& \hphantom{XXXX}{\tt -l '(2015,2022):(2000,40)'}\\
& \hphantom{XXXX}{\tt -v "published" --endyear 2025}\\[1mm]
& (all on one line)\\[1mm]
\end{tabular}
\end{center}
}
\vskip2.3cm
The same data analyst's hypothesis about what is causing the ``hockey-stick''
shape of this graph applies here as in the OTL case.
\end{frame}
\begin{frame}{Another test case: The Directory of Open Access Books}
\ \vskip-8mm
The \href{https://doabooks.org/}{Directory of Open Access Books [DOAB]} is a
large and wonderful site containing OA books (and some other resources, which
I will filter out) from a number of sources.
\vskip1mm
For reasons given above, I will count DOAB books, when licensed correctly, as
OER, even though many look like academic monographs rather than textbooks.
\vskip1mm
The DOAB catalog, available from their page
\href{https://doabooks.org/en/resources/metadata-harvesting-and-content-dissemination}{Metadata for Libraries and Aggregators} as the file
\href{https://poritz.net/j/share/howmanyOER/repository-export.csv}{repository-export.csv} gives good information on the date the works were ``issued'' -- which
I will take as the publication date -- and the licenses, from which we can
filter for the UNESCO OER definition-compatible ones. I did this in a Python
script \href{https://poritz.net/j/share/howmanyOER/doab.py}{doab.py}.
{\tiny
\begin{center}
\begin{tabular}{r p{5cm}}
\multirow{9}{*}{\raisebox{-0.3\height}{\includegraphics[height=4.5cm]{DOAB_scatterplot.pdf}}} & \\[1mm]
& \ \\[1mm]
& output of \href{https://poritz.net/j/share/howmanyOER/doab.py}{doab.py} script put in file \href{https://poritz.net/j/share/howmanyOER/DOAB_iyears}{DOAB\_iyears}\\[1mm]
& then the command used was:\\[1mm]
& \hphantom{XX}{\tt ./cyears\_graph.py -d DOAB\_iyears -r}\\
& \hphantom{XXXX}{\tt "the DOAB" -s 1985 -v "issued"}\\
& \hphantom{XXXX}{\tt --endyear 2025 -p "Potential OER "}\\[1mm]
& (all on one line)\\[1mm]
\end{tabular}
\end{center}
}
\ \vfill\ \vfill\
\end{frame}
\begin{frame}{Combined linear and exponential fitting to the DOAB graph}
The DOAB seems to have a regime with good linear fit and then one with good
exponential fit. That's fun!
\vskip3mm
{\tiny
\begin{center}
\begin{tabular}{r p{5cm}}
\multirow{7}{*}{\raisebox{-0.3\height}{\includegraphics[height=4.5cm]{DOAB_regressions.pdf}}} & \\[1mm]
& \ \\[1mm]
& command used here:\\[1mm]
& \hphantom{XX}{\tt ./cyears\_graph.py -d DOAB\_iyears -r}\\
& \hphantom{XXXX}{\tt "the DOAB" -s 1985 -v "issued"}\\
& \hphantom{XXXX}{\tt -l '(1985,2005):(1987,2500)'}\\
& \hphantom{XXXX}{\tt -e '(2006,2022):(1999,15000)'}\\
& \hphantom{XXXX}{\tt --endyear 2025 -p "Potential OER "}\\[1mm]
& (all on one line)\\[1mm]
\end{tabular}
\end{center}
}
\vfill
\
\end{frame}
\begin{frame}{Let's go!}
OK, it's pretty obvious how to wrap up the whole thing:
\begin{itemize}
\item[\textbf{1.}] Put together a list of all existing OER.
\item[\textbf{2.}] Remove from the list all items that are not ``textbooks.''
\item[\textbf{3.}] Remove from the list all items that do not have the Creative Commons
licenses or copyright statuses we are permitting.
\item[\textbf{4.}] Make another list, consisting of the publication or copyright
year of each of the items remaining on the first list.
\item[\textbf{5.}] Sort the list of dates, count how many dates are in each past year, and
make the corresponding graph.
\end{itemize}
\end{frame}
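\begin{frame}{Aside: the five steps as a formula}
The steps above can be summarized as follows, in notation introduced only for
this slide. Let $B$ be the list left after steps 1--3, and let $y(b)$ be the
publication or copyright year of a book $b \in B$ (step 4). Step 5 computes,
for each year $Y$,
\[
c(Y) = \#\{\, b \in B : y(b) = Y \,\}.
\]
Assuming the graph shows the running total rather than the per-year count,
the plotted quantity is
\[
N(Y) = \sum_{Y' \le Y} c(Y') = \#\{\, b \in B : y(b) \le Y \,\}.
\]
A toy example with made-up years $2019, 2020, 2020, 2022$: here $c(2020)=2$
and $c(2021)=0$, so $N(2019)=1$, $N(2020)=3$, $N(2021)=3$, and $N(2022)=4$.
\end{frame}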
\begin{frame}{Oh, snap.}
I think you've seen the flaws in my approach.
\vfill
How are we going to get ``a list of all existing OER?''
\vfill
Any approach which simply crawls or aggregates many known OER repositories will
both miss enormous numbers of useful OER and also catch some OER multiple times.
\vfill
(In principle removing duplicates can be done by hand or with code, but it will
be hard to know that ``Calculus, Second Edition'' and ``Calculus 2e'' are the
same thing, and checking whether enough new creativity has been added to make
a different version of a preexisting book an adaptation and not merely a copy
will be \textbf{very hard}.)
\vfill
I will try to continue aggregating repository catalogs and doing my best to
make an exhaustive list without duplicates, but this is a huge task that
will not have good results particularly soon ... and will probably never be
completely finished. Of course, any partial answer will still give a lower
bound on the number of OER that exist, so that is some good information.
\end{frame}
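\begin{frame}{Aside: why a partial list still gives a lower bound}
In symbols which are mine and only for illustration: let $R_1,\dots,R_m$ be
repository catalogs, each thought of as a set of distinct works that really
are OER, and let $N_{\text{all}}$ be the (unknown) number of all existing OER.
After removing duplicates, the aggregated list is the union of the catalogs,
and
\[
\Bigl\lvert \bigcup_{i=1}^{m} R_i \Bigr\rvert \le N_{\text{all}},
\qquad
\Bigl\lvert \bigcup_{i=1}^{m} R_i \Bigr\rvert \le \sum_{i=1}^{m} \lvert R_i \rvert .
\]
The first inequality is the lower bound. The second says that simply adding
catalog sizes overcounts; for two catalogs the overcount is exactly the
overlap:
\[
\lvert R_1 \cup R_2 \rvert = \lvert R_1 \rvert + \lvert R_2 \rvert
- \lvert R_1 \cap R_2 \rvert .
\]
With made-up sizes $1000$ and $400$ and $150$ works in common, there are
$1250$ distinct works, not $1400$.
\end{frame}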
\begin{frame}{What \textbf{have} we learned? (specifically)}
I think that the data analyst's perspectives on the graphs above lead us
to conclude that, essentially, the body of available OER is growing
exponentially with a doubling time of about 3.8 years.
\vfill
Or, at least, it wants to grow exponentially, at the moment\footnote{\tiny Data analysts also will say that exponential growth is common in nature ... but only for short periods of time, before the environment gets saturated.}. It seems
that in many particular organizations, though, the available capacity is
limiting the growth to be linear, with growth often on the order of one new
OER shared every few days.
\vfill
This suggests that, in the short term, adding capacity to support groups
polishing and sharing OER will result in greater output and greater numbers of
OER available to the community, almost without bound, at present. At some
point we will hit the end of that exponential growth, but that will
likely be in the saturated environment where just about everything is already
and always OER -- that's a world I'd be happy to live in, even if the growth
curve then levels off to something linear!
\end{frame}
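\begin{frame}{Aside: what a doubling time of 3.8 years means}
This is only arithmetic on the stated figure, not a new fit. Taking the
doubling time $T = 3.8$ years at face value, exponential growth of the count
$N(t)$ from a reference year $t_0$ can be written
\[
N(t) = N(t_0)\,2^{(t-t_0)/T} = N(t_0)\,e^{k(t-t_0)},
\]
where the growth rate is
\[
k = \frac{\ln 2}{T} \approx \frac{0.693}{3.8} \approx 0.18 \text{ per year}.
\]
The yearly growth factor is $2^{1/3.8} \approx 1.20$, that is, about $20\%$
more OER each year. If the same rate held for a decade, the count would
grow by a factor of $2^{10/3.8} \approx 6.2$.
\end{frame}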
\begin{frame}{What \textbf{have} we learned? (more generally)}
OER are spread out all over the Internet, so it is \textit{very hard} to do
research on them. Of course, we already knew this. It's also not
necessarily a bad thing that they are so spread out -- a prominent figure in
the OER world said a few years ago that they wanted their site to be the
``Facebook of OER.'' (This was before Facebook's recent losses of market
share.) I'm not really happy with that vision, TBH, even though it would
make this current research project much easier.
\vfill
Many OER folks aren't very careful to publish clear metadata, with licenses
and copyright/publication dates clearly shown. This has the potential to be
a big problem in the future, and certainly makes the 5Rs more difficult in an
entirely unnecessary way! So: Please mind your metadata, OER
folks!
\vfill
Clear metadata that is easily findable (in standard places and in standard
formats) will enable research like this project to work by simply crawling the
web and harvesting this metadata. Since we know from the whole history of the
Internet that crawling the web and looking for the things we want works much
better than trying to have curated lists of ``good stuff'' on the 'net, probably
that approach will work better also in the OER space, if only there is good
metadata. So I encourage my wonderful librarian colleagues in open education
not to try to make catalogs and all-encompassing repositories, but rather to
concentrate on helping the community make good metadata easy and the norm.
\end{frame}
\begin{frame}{Thanks}
I'd like to thank the following folks for sharing with me their data and,
more importantly, their insights:
\vfill
\begin{itemize}
\item[$\bullet$] Lauri Aesoph
\item[$\bullet$] Nicole Allen
\item[$\bullet$] Amanda Coolidge
\item[$\bullet$] David Ernst
\item[$\bullet$] Josie Gray
\item[$\bullet$] Delmar Larsen
\item[$\bullet$] Karen Lauritsen
\item[$\bullet$] Ethan Turner
\item[$\bullet$] Steel Wagstaff
\end{itemize}
\vfill
[In no order other than the arbitrary one determined by the alphabet and their
last names.\hskip-1mm\footnote{\tiny Sorry, Steel.}]
\vfill
\textbf{\red{Thank you all so much!}}
\end{frame}
\begin{frame}{Discussion and contact info}
\centerline{\LARGE\bf Discussion!!}
\vskip2mm
{\small Contact info:
\vskip2mm
Email:
\href{mailto:jonathan@poritz.net}{jonathan@poritz.net}\ ;
Tweety-bird: \href{https://twitter.com/poritzj}{@poritzj}\ .
\vskip1mm
Get these slides at
\href{https://poritz.net/j/share/\shorttitle.pdf}{poritz.net/j/share/\shorttitle.pdf}
and all files for remixing\footnote{\tiny subject to \href{http://creativecommons.org/licenses/by-sa/4.0/}{CC-BY-SA 4.0}} at
\href{https://poritz.net/j/share/\shorttitle/}{poritz.net/j/share/\shorttitle/}\ .
\vskip1mm
If you don't want to write down that full URL, just remember
\begin{tabular}{l}
\href{http://poritz.net/jonathan/share}{poritz.net/jonathan/share}\\
or \href{http://poritz.net/j/share}{poritz.net/j/share}\\
or \href{http://poritz.net/jonathan}{poritz.net/jonathan}\ \
{\scrs[then click \textbf{Always SHARE}]}\\
or \href{http://poritz.net/j}{poritz.net/j}\ \ {\scrs[then click
\textbf{Always SHARE}]}\\
or scan \ \ \ $\xrightarrow{\hspace*{6cm}}$\\
\hphantom{or }{\scrs[then click \textbf{Always SHARE}]}
\end{tabular}
\vskip-2.2cm
\begin{tabular}{p{8cm} c}
\ & \includegraphics[height=3cm]{poritz_net_jonathan_qr-code_colorful.pdf}
\end{tabular}}
\end{frame}
\end{document}