
ltx2crossrefxml(1)            LATEX CROSSREFWARE            ltx2crossrefxml(1)

NAME
       ltx2crossrefxml.pl - create XML files for submitting to crossref.org

SYNOPSIS
       ltx2crossrefxml [-c config_file]  [-o output_file] [-rpi-is-xml]
                       latex_file1 latex_file2 ...

OPTIONS
       -c config_file
           Configuration file.  If this file is absent, defaults are used.
           See below for its format.

       -o output_file
           Output file.  If this option is not used, the XML is output to
           stdout.

       -rpi-is-xml
           Do not transform author and title input strings; assume they are
           already valid XML.

       The usual "--help" and "--version" options are also supported. Options
       can begin with either "-" or "--", and can be given in any order.

DESCRIPTION
       For each given latex_file, this script reads the corresponding ".rpi"
       file and (if it exists) the ".bbl" file, and outputs XML that can be
       uploaded to Crossref (<https://crossref.org>). Any extension of
       latex_file is ignored, and latex_file itself is not read (and need not
       even exist).
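       Because only the base name matters, the mapping from a command-line
       argument to the two files that are actually read can be sketched as
       below. This is a Python illustration, not the script's Perl code; the
       helper name companion_files is hypothetical, and it assumes that
       "extension" means the text after the last dot of the final path
       component, as os.path.splitext defines it.

```python
import os.path


def companion_files(latex_file):
    """Return the (.rpi, .bbl) paths derived from a LaTeX file name.

    Hypothetical helper: the extension of latex_file (if any) is dropped,
    and latex_file itself is never opened, so it need not exist.
    """
    base, _ext = os.path.splitext(latex_file)
    return base + ".rpi", base + ".bbl"


rpi, bbl = companion_files("../paper1/paper1.tex")
print(rpi, bbl)  # ../paper1/paper1.rpi ../paper1/paper1.bbl
```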

       Each ".rpi" file specifies the metadata for a single article to be
       uploaded to Crossref (a "journal_article" element in their schema); an
       example is below. These files are output by the "resphilosophica"
       package (<https://ctan.org/pkg/resphilosophica>), but (as always) can
       also be created by hand or by whatever other method you implement.

       Any ".bbl" files present are used for the citation information in the
       output XML. See the CITATIONS section below.

       Unless "--rpi-is-xml" is specified, for all text (authors, title,
       citations), standard TeX control sequences are replaced with plain text
       or UTF-8 or eliminated, as appropriate. The "LaTeX::ToUnicode::convert"
       routine is used for this (<https://ctan.org/pkg/bibtexperllibs>).
       Tricky TeX control sequences will almost surely not be handled
       correctly. If "--rpi-is-xml" is given, the author and title strings
       from the rpi files are output as-is, assuming they are valid XML; no
       checking is done. Citation text from ".bbl" files is always converted
       from LaTeX to plain text.

       This script just writes an XML file. It's up to you to actually do the
       uploading to Crossref; for example, you can use their Java tool
       "crossref-upload-tool.jar"
       (<https://www.crossref.org/education/member-setup/direct-deposit-xml/https-post>).
       For the definition of their schema, see
       <https://data.crossref.org/reports/help/schema_doc/4.4.2/index.html>
       (this is the schema version currently followed by this script).

CONFIGURATION FILE FORMAT
       The configuration file is read as Perl code. Thus, comment lines
       starting with "#" and blank lines are ignored. The other lines are
       typically assignments in the form (spaces are optional):

           $variable = value ;

       Usually the value is a "string" enclosed in ASCII double-quote or
       single-quote characters, per Perl syntax. The idea is to specify the
       user-specific and journal-specific values needed for the Crossref
       upload. The variables which are used are these:

           $depositorName = "Depositor Name";
           $depositorEmail = 'depositor@example.org';
           $registrant = 'Registrant';  # organization name
           $fullTitle = "FULL TITLE";   # journal name
           $issn = "1234-5678";         # required
           $abbrevTitle = "ABBR. TTL."; # optional
           $coden = "CODEN";            # optional

       For a given run, all ".rpi" data read is assumed to belong to the
       journal that is specified in the configuration file. More precisely,
       the configuration data is written as a "journal_metadata" element, with
       given "full_title", "issn", etc., and then each ".rpi" is written as
       "journal_issue" plus "journal_article" elements.
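       The ordering described above, for a single ".rpi", can be sketched with
       Python's xml.etree.ElementTree. This is an illustration under stated
       assumptions, not the script's output: journal_skeleton and the input
       dictionaries are hypothetical, the "titles/title" and "issue" child
       elements are assumed from the Crossref schema, and most elements that
       Crossref actually requires (publication dates, doi_data, and so on) are
       deliberately left out.

```python
import xml.etree.ElementTree as ET


def journal_skeleton(config, article):
    """Build a simplified <journal> element for one .rpi (hypothetical helper).

    journal_metadata comes from the configuration file values; journal_issue
    and journal_article come from the .rpi fields. Many required elements
    are intentionally omitted.
    """
    journal = ET.Element("journal")
    meta = ET.SubElement(journal, "journal_metadata")
    for tag in ("full_title", "abbrev_title", "issn"):
        if config.get(tag):  # abbrev_title is optional
            ET.SubElement(meta, tag).text = config[tag]
    issue = ET.SubElement(journal, "journal_issue")
    ET.SubElement(issue, "issue").text = article["issue"]
    art = ET.SubElement(journal, "journal_article")
    titles = ET.SubElement(art, "titles")
    ET.SubElement(titles, "title").text = article["title"]
    return journal


config = {"full_title": "FULL TITLE", "issn": "1234-5678"}  # placeholder values
root = journal_skeleton(config, {"issue": "1", "title": "A Sample Paper"})
print(ET.tostring(root, encoding="unicode"))
```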

       The configuration file can also define one Perl function:
       "LaTeX_ToUnicode_convert_hook". If it is defined, it is called at the
       beginning of the procedure that converts LaTeX text to Unicode, which
       is done with the LaTeX::ToUnicode module, from the "bibtexperllibs"
       package (<https://ctan.org/pkg/bibtexperllibs>). The function must
       accept one string (the LaTeX text), and return one string (presumably
       the transformed string). The standard conversions are then applied to
       the returned string, so the configured function need only handle
       special cases, such as control sequences particular to the journal at
       hand.
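       The call order (hook first, standard conversions second) can be
       illustrated in Python. In the real script the hook is a Perl function
       named LaTeX_ToUnicode_convert_hook defined in the configuration file;
       here to_unicode, journal_hook, the \RPname macro, and the three-entry
       STANDARD table are all hypothetical stand-ins, and the table is a tiny
       fraction of what LaTeX::ToUnicode actually handles.

```python
def to_unicode(latex_text, hook=None):
    """Sketch of the conversion order: optional hook, then standard rules.

    STANDARD is a hypothetical miniature of LaTeX::ToUnicode's tables.
    """
    STANDARD = {'\\"e': "ë", "\\o{}": "ø", "\\&": "&"}
    if hook is not None:
        latex_text = hook(latex_text)  # journal-specific cases first
    for tex, uni in STANDARD.items():
        latex_text = latex_text.replace(tex, uni)
    return latex_text


def journal_hook(s):
    # Expand a journal-specific macro the standard tables do not know.
    return s.replace("\\RPname", "Res Philosophica")


print(to_unicode(r"\RPname \& Co.", hook=journal_hook))  # Res Philosophica & Co.
```

       Because the standard conversions run on the hook's result, the hook can
       emit ordinary TeX (such as "\&") and leave it to the standard rules.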

RPI FILE FORMAT
       Here's the (relevant part of the) ".rpi" file corresponding to the
       "rpsample.tex" example in the "resphilosophica" package
       (<https://ctan.org/pkg/resphilosophica>):

         %authors=Boris Veytsman\and A. U. Th{\o }r\and C. O. R\"espondent
         %title=A Sample Paper:\\ \emph  {A Template}
         %year=2012
         %volume=90
         %issue=1--2
         %startpage=1
         %endpage=1
         %doi=10.11612/resphil.A31245
         %paperUrl=http://borisv.lk.net/paper12
         %publicationType=full_text

       Other lines, some not beginning with %, are ignored (and not shown).
       For more details on processing, see the code.

       The %paperUrl value is what will be associated with the given %doi
       (output as the "resource" element). Crossref strongly recommends that
       the url be for a so-called landing page, and not directly for a pdf
       (<https://www.crossref.org/education/member-setup/creating-a-landing-page/>).
       Special case: if the url is not specified and the journal is
       Res Philosophica, a special-purpose search url using pdcnet.org is
       used. For any other journal, %paperUrl must always be specified.

       The %authors field is split at "\and" (ignoring whitespace before and
       after), and output as the "contributors" element, using
       sequence="first" for the first author listed and
       sequence="additional" for the remainder.
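       The split-and-sequence rule can be sketched in Python. split_authors is
       a hypothetical helper; the regular expression assumes "\and" must be a
       complete control word (so a hypothetical "\andx" would not match),
       which is an assumption about the script's behavior.

```python
import re

# "\and" as a whole control word, with any surrounding whitespace.
_AND = re.compile(r"\s*\\and(?![A-Za-z])\s*")


def split_authors(authors):
    """Return (name, sequence) pairs for an %authors value (sketch)."""
    names = [n for n in _AND.split(authors.strip()) if n]
    return [(name, "first" if i == 0 else "additional")
            for i, name in enumerate(names)]
```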

       If the %publicationType is not specified, it defaults to "full_text",
       since that has historically been the case; "full_text" can also be
       given explicitly. The other values allowed by the Crossref schema are
       "abstract_only" and "bibliographic_record". Finally, if the value is
       "omit", the "publication_type" attribute is omitted entirely from the
       given "journal_article" element.
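       The defaulting rule can be expressed as a small Python function that
       returns the attributes for "journal_article". The helper name is
       hypothetical, and raising an error for a value outside the four listed
       above is an assumption; the text does not say what the script does in
       that case.

```python
ALLOWED = {"full_text", "abstract_only", "bibliographic_record"}


def publication_type_attr(fields):
    """Map the .rpi %publicationType to journal_article attributes (sketch)."""
    value = fields.get("publicationType", "full_text")  # historical default
    if value == "omit":
        return {}  # attribute left out entirely
    if value not in ALLOWED:
        # Assumed behavior: reject values the Crossref schema does not allow.
        raise ValueError(f"unknown publicationType: {value!r}")
    return {"publication_type": value}
```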

       Each ".rpi" must contain information for only one article, but multiple
       files can be read in a single run. It would not be difficult to support
       multiple articles in a single ".rpi" file, but it makes debugging and
       error correction easier when each uploaded XML contains a single
       article.

   MORE ABOUT AUTHOR NAMES
       The three formats for names recognized are (not coincidentally) the
       same as BibTeX:

          First von Last
          von Last, First
          von Last, Jr., First

       The forms can be freely intermixed within a single %authors line,
       separated with "\and" (including the backslash). Commas as name
       separators are not supported, unlike BibTeX.

       In short, you may almost always use the first form; you shouldn't if
       either there's a Jr part, or the Last part has multiple tokens but
       there's no von part. See the "btxdoc" (``BibTeXing'' by Oren Patashnik)
       document for details.
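       The three forms can be told apart by the number of commas. The Python
       sketch below is a simplified version of the BibTeX rules described in
       "btxdoc", not the script's own name parser: parse_name is hypothetical,
       tokens are split on whitespace only (braces are not respected, so a
       name such as Th{\o }r with a space inside braces is mishandled), and a
       token counts as "von" when it starts with a lowercase letter.

```python
def _split_von_last(tokens):
    """von = tokens up to the last lowercase-initial one; Last keeps >= 1 token."""
    lower = [i for i, t in enumerate(tokens[:-1]) if t[0].islower()]
    cut = lower[-1] + 1 if lower else 0
    return " ".join(tokens[:cut]), " ".join(tokens[cut:])


def parse_name(name):
    """Return (first, von, last, jr) for one author name (simplified sketch)."""
    parts = [p.strip() for p in name.split(",")]
    if len(parts) == 1:  # First von Last
        tokens = parts[0].split()
        if not tokens:
            raise ValueError("empty name")
        lower = [i for i, t in enumerate(tokens[:-1]) if t[0].islower()]
        start = lower[0] if lower else len(tokens) - 1
        von, last = _split_von_last(tokens[start:])
        return " ".join(tokens[:start]), von, last, ""
    if len(parts) == 2:  # von Last, First
        von_last, jr, first = parts[0], "", parts[1]
    elif len(parts) == 3:  # von Last, Jr., First
        von_last, jr, first = parts
    else:
        raise ValueError(f"too many commas in name: {name!r}")
    tokens = von_last.split()
    if not tokens:
        raise ValueError(f"missing Last part: {name!r}")
    von, last = _split_von_last(tokens)
    return first, von, last, jr
```

       The second test case shows why a multi-token Last part without a von
       part needs a comma form: with no lowercase word to anchor on, the first
       form treats only the final token as Last.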

       In the %authors line of a ".rpi" file, some secondary directives are
       recognized, indicated by "|" characters. Easiest to explain with an
       example:

         %authors=|organization|\LaTeX\ Project Team \and Alex Brown|orcid=123

       Thus: 1) If "|organization|" is specified, the author name will be
       output as an "organization" contributor, instead of the usual
       "person_name", as the Crossref schema requires.

       2) If "|orcid=value|" is specified, the value is output as an "ORCID"
       element for that "person_name".

       These two directives, "|organization|" and "|orcid|", are mutually
       exclusive, because that's how the Crossref schema defines them. The "="
       sign after "orcid" is required, while all spaces after the "orcid"
       keyword are ignored. Other than that, the ORCID value is output
       literally. (E.g., the ORCID value of 123 above is clearly invalid, but
       it would be output anyway, with no warning.)

       Extra "|" characters, at the beginning or end of the entire %authors
       string, or doubled in the middle, are accepted and ignored. Whitespace
       is ignored around all "|" characters.
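       The directive rules above can be sketched in Python. This is an
       illustration, not the script's parser: parse_authors is hypothetical,
       it assumes a directive applies to whichever "\and"-separated entry it
       appears in (before or after the name, as in the example), and raising
       ValueError for a missing "=" or for combining the two directives is an
       assumed way of enforcing the stated rules.

```python
import re

_AND = re.compile(r"\s*\\and(?![A-Za-z])\s*")


def parse_authors(authors):
    """Split %authors into entries with optional |organization| / |orcid=...|."""
    entries = []
    for segment in _AND.split(authors.strip()):
        entry = {"name": "", "organization": False, "orcid": None}
        for piece in (p.strip() for p in segment.split("|")):
            if not piece:  # leading, trailing, or doubled '|'
                continue
            if piece == "organization":
                entry["organization"] = True
            elif piece.startswith("orcid"):
                m = re.fullmatch(r"orcid\s*=(.*)", piece, re.S)
                if m is None:
                    raise ValueError(f"'=' required after orcid: {piece!r}")
                # Spaces after the keyword are dropped; the rest is literal.
                entry["orcid"] = re.sub(r"\s+", "", m.group(1))
            else:
                entry["name"] = piece
        if entry["organization"] and entry["orcid"] is not None:
            raise ValueError("organization and orcid are mutually exclusive")
        if entry["name"]:
            entries.append(entry)
    return entries
```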

CITATIONS
       Each ".bbl" file corresponding to an input ".rpi" file is read and used
       to output a "citation_list" element for that "journal_article" in the
       output XML. If no ".bbl" file exists for a given ".rpi", no
       "citation_list" is output for that article.

       The ".bbl" processing is rudimentary: only so-called
       "unstructured_citation" references are produced for Crossref; that is,
       the text of each citation (each paragraph in the ".bbl") is output as a
       single flat string without markup.
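       The resulting element shape can be sketched with Python's
       xml.etree.ElementTree. The helper citation_list is hypothetical, and
       its input is assumed to be (key, text) pairs whose text has already
       been converted from TeX; ElementTree escapes characters such as "&"
       and "<" automatically.

```python
import xml.etree.ElementTree as ET


def citation_list(citations):
    """Build <citation_list> with one unstructured_citation per entry (sketch)."""
    clist = ET.Element("citation_list")
    for key, text in citations:
        cit = ET.SubElement(clist, "citation", key=key)
        ET.SubElement(cit, "unstructured_citation").text = text
    return clist


xml = ET.tostring(citation_list([("knuth1", "D. E. Knuth, The TeXbook.")]),
                  encoding="unicode")
print(xml)
```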

       Bibliography text is unconditionally converted from TeX to XML, via the
       method described above. It is not unusual for the conversion to be
       incomplete or incorrect.  It is up to you to check for this; e.g., if
       any backslashes remain in the output, it is most likely an error.
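       A quick way to apply the backslash heuristic to the generated file
       (for example result.xml) is sketched below in Python. The helper name
       leftover_tex is hypothetical; a reported line is only a likely
       problem, not a certain one.

```python
def leftover_tex(xml_text):
    """Return (line number, line) pairs that still contain a backslash."""
    return [(n, line)
            for n, line in enumerate(xml_text.splitlines(), start=1)
            if "\\" in line]


# Usage sketch:
# with open("result.xml", encoding="utf-8") as f:
#     for n, line in leftover_tex(f.read()):
#         print(f"result.xml:{n}: {line.strip()}")
```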

       Furthermore, it is assumed that the ".bbl" file contains a sequence of
       references, each starting with "\bibitem{KEY}" (which itself must be at
       the beginning of a line, preceded only by whitespace), and the whole
       bibliography ending with "\end{thebibliography}" (similarly at the
       beginning of a line). A bibliography not following this format will not
       produce useful results. Bibliographies can be created by hand, or with
       BibTeX, or any other method.

       The "key" attribute for the "citation" element is taken as the KEY
       argument to the "\bibitem" command. The sequential number of the
       citation (1, 2, ...) is appended. The argument to "\bibitem" can be
       empty ("\bibitem{}"), in which case the sequence number is used on its
       own. Although TeX does not handle empty "\bibitem" keys, they can be
       convenient when creating a ".bbl" purely for Crossref.
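       The format and key rules can be sketched together in Python. parse_bbl
       is hypothetical; it assumes the sequence number is appended with no
       separator, flattens all whitespace in each entry to single spaces,
       does not handle the optional "\bibitem[label]{KEY}" form, and leaves
       the entry text as LaTeX for the later conversion step.

```python
import re

_BIBITEM = re.compile(r"^\s*\\bibitem\{([^}]*)\}", re.M)
_END = re.compile(r"^\s*\\end\{thebibliography\}", re.M)


def parse_bbl(text):
    """Return (key, raw_text) pairs from .bbl text (sketch).

    key is the \\bibitem argument followed by the 1-based sequence number;
    raw_text is everything up to the next \\bibitem or the end marker.
    """
    end = _END.search(text)
    body = text[:end.start()] if end else text
    matches = list(_BIBITEM.finditer(body))
    items = []
    for n, m in enumerate(matches, start=1):
        stop = matches[n].start() if n < len(matches) else len(body)
        raw = " ".join(body[m.end():stop].split())  # one flat line
        items.append((f"{m.group(1)}{n}", raw))
    return items
```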

       The ".rpi" file is also scanned for bibliography entries in this same
       format.

       Feature request: if anyone is interested in figuring out how to
       generate structured citations
       (<https://data.crossref.org/reports/help/schema_doc/4.4.2/schema_4_4_2.html#citation>)
       instead of these flat text dumps, that would be great.

EXAMPLES
         ltx2crossrefxml.pl ../paper1/paper1.tex ../paper2/paper2.tex \
                             -o result.xml

         ltx2crossrefxml.pl -c myconfig.cfg paper.tex -o paper.xml

AUTHOR
       Boris Veytsman <https://github.com/borisveytsman/crossrefware>

COPYRIGHT AND LICENSE
       Copyright (C) 2012-2021  Boris Veytsman

       This is free software.  You may redistribute copies of it under the
       terms of the GNU General Public License
       <https://www.gnu.org/licenses/gpl.html>.  There is NO WARRANTY, to the
       extent permitted by law.

                                  2021-10-02                ltx2crossrefxml(1)
