Wordsplit - an enhanced word splitter
1 Overview
This package provides a set of C functions for parsing input strings.
Default parsing rules are similar to those used in Bourne shell.
This includes tilde expansion, variable expansion, quote removal, word
splitting, command substitution, and path expansion. Parsing is
controlled by a number of settings which allow the caller to alter
processing at each of these phases or even to disable any of them.
Thus, wordsplit can be used for parsing inputs in different formats,
from simple character-delimited entries, as in /etc/passwd, up to
complex shell statements.
The following code fragment shows the basic usage:
    /* This variable controls parsing */
    wordsplit_t ws;
    int rc;

    /* Provide variable definitions */
    ws.ws_env = (const char **) environ;
    /* Provide a function for expanding commands */
    ws.ws_command = runcom;
    /* Split input_string into words */
    rc = wordsplit(input_string, &ws,
                   WRDSF_QUOTE            /* Handle both single and double
                                             quoted strings as words. */
                   | WRDSF_SQUEEZE_DELIMS /* Compress adjacent delimiters */
                   | WRDSF_PATHEXPAND     /* Expand pathnames */
                   | WRDSF_SHOWERR);      /* Show errors */
    if (rc == 0) {
        /* Success.  The resulting words are returned in the NULL-terminated
           array ws.ws_wordv.  Number of words is in ws.ws_wordc */
    }

    /* Reclaim the allocated memory */
    wordsplit_free(&ws);
For a detailed discussion, please see the man page wordsplit(3) included in the package.
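The fragment above assigns a function named runcom to ws.ws_command without showing it. The sketch below is one possible shape for such a callback, not the package's own code. It assumes that the callback receives the command text as len bytes at cmd, stores a malloc'ed result in *ret, and returns 0 on success; the parameter list and the ownership of *ret are assumptions to be checked against wordsplit(3) and your copy of wordsplit.h. The RUNCOM_* codes are placeholders for this sketch; a real callback would return the error codes declared in wordsplit.h.

```c
#define _POSIX_C_SOURCE 200809L  /* for popen() and pclose() */
#include <assert.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/* Placeholder status codes used only by this sketch. */
enum { RUNCOM_OK = 0, RUNCOM_FAIL = 1 };

/* Hypothetical command-expansion callback.  The parameter list is an
   ASSUMPTION about what ws_command expects; verify it in wordsplit.h. */
int
runcom(char **ret, const char *cmd, size_t len, char **argv, void *clos)
{
    char chunk[256];
    char *command, *buf = NULL, *p;
    size_t size = 0, used = 0, n;
    FILE *fp;

    (void) argv;   /* unused in this sketch */
    (void) clos;   /* unused in this sketch */
    assert(ret != NULL && cmd != NULL);

    /* Treat cmd as len bytes, not as a NUL-terminated string. */
    command = malloc(len + 1);
    if (!command)
        return RUNCOM_FAIL;
    memcpy(command, cmd, len);
    command[len] = '\0';

    /* Run the command via the shell and read its standard output. */
    fp = popen(command, "r");
    free(command);
    if (!fp)
        return RUNCOM_FAIL;

    while ((n = fread(chunk, 1, sizeof chunk, fp)) > 0) {
        if (used + n + 1 > size) {
            size = (used + n + 1) * 2;
            p = realloc(buf, size);
            if (!p) {
                free(buf);
                pclose(fp);
                return RUNCOM_FAIL;
            }
            buf = p;
        }
        memcpy(buf + used, chunk, n);
        used += n;
    }
    pclose(fp);

    if (!buf) {
        /* The command produced no output: return an empty string. */
        buf = malloc(1);
        if (!buf)
            return RUNCOM_FAIL;
    }

    /* Drop trailing newlines, as shell command substitution does. */
    while (used > 0 && buf[used - 1] == '\n')
        used--;
    buf[used] = '\0';

    *ret = buf;
    return RUNCOM_OK;
}
```

The sketch ignores the exit status of the command and does not report errors in detail; a production callback would likely need both.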
2 Description
The package is designed as a drop-in facility for use in larger programs. It consists of the following files:
- wordsplit.h: Interface header.
- wordsplit.c: Main source file.
- wordsplit.3: Manual page.
For most uses, you will need only these three. The remaining files are for building the autotest-based testsuite:
- wsp.c: Auxiliary test program.
- wordsplit.at: The source for the testsuite.
3 Incorporating wordsplit into your project
The project is designed to be used as a git submodule. To incorporate
it into your project, first select the location for the wordsplit
directory within your project. Then add the submodule at this
location. The rest is quite straightforward: you need to add
wordsplit.c to your sources and add both wordsplit.c and wordsplit.h
to the distributed files.
The following describes each step in detail. For the rest of this
discussion it is assumed that wordsplit is the name of the location
selected for the submodule. It is also assumed that your project
uses the GNU autotools framework. If you are using plain makefiles, these
instructions are easy to adapt.
To add the submodule do:
    git submodule add git://git.gnu.org.ua/wordsplit.git wordsplit
There are two methods of including the sources in the project: direct
incorporation and incorporation via VPATH.
3.1 Direct incorporation
Add the subdir-objects option to the invocation of the AM_INIT_AUTOMAKE
macro in your configure.ac:

    AM_INIT_AUTOMAKE([subdir-objects])
In your Makefile.am, add both wordsplit/wordsplit.c and
wordsplit/wordsplit.h to the sources and -Iwordsplit to the cpp flags.
For example:

    program_SOURCES = main.c \
      wordsplit/wordsplit.c \
      wordsplit/wordsplit.h
    AM_CPPFLAGS = -I$(srcdir)/wordsplit
You can also put wordsplit.h in the noinst_HEADERS variable, if you like:

    program_SOURCES = main.c \
      wordsplit/wordsplit.c
    noinst_HEADERS = wordsplit/wordsplit.h
    AM_CPPFLAGS = -I$(srcdir)/wordsplit
If you are building an installable library and wish to export the
wordsplit API, install wordsplit.h to $(pkgincludedir), e.g.

    lib_LTLIBRARIES = libmy.la
    libmy_la_SOURCES = main.c \
      wordsplit/wordsplit.c
    AM_CPPFLAGS = -I$(srcdir)/wordsplit
    pkginclude_HEADERS = wordsplit/wordsplit.h
3.2 VPATH-based incorporation
Modify the VPATH variable in your Makefile.am:

    VPATH = $(srcdir):$(srcdir)/wordsplit
Add wordsplit.c to the nodist_program_SOURCES variable:

    nodist_program_SOURCES = wordsplit.c
The nodist_ prefix is necessary to prevent make from trying to
distribute this file from the current directory (where it doesn't
exist, of course). During compilation it will be located using VPATH.
Finally, add both wordsplit/wordsplit.c and wordsplit/wordsplit.h to
the EXTRA_DIST variable and modify AM_CPPFLAGS as shown in the
previous section.
An example Makefile.am:

    program_SOURCES = main.c
    nodist_program_SOURCES = wordsplit.c
    VPATH = $(srcdir):$(srcdir)/wordsplit
    EXTRA_DIST = wordsplit/wordsplit.c wordsplit/wordsplit.h
    AM_CPPFLAGS = -I$(srcdir)/wordsplit
It is also possible to use LDADD as shown in the example below:

    program_SOURCES = main.c
    LDADD = wordsplit.o
    VPATH = $(srcdir):$(srcdir)/wordsplit
    EXTRA_DIST = wordsplit/wordsplit.c wordsplit/wordsplit.h
    AM_CPPFLAGS = -I$(srcdir)/wordsplit
4 The testsuite
The package contains two files for building the testsuite: wsp.c,
which is used to build the auxiliary binary wsp, and wordsplit.at,
which can be included in a GNU autotest-based testsuite source.
The discussion below is for those who wish to include the wordsplit testsuite in their project. It assumes that the hosting project already has an autotest-based testsuite.
4.1 Additional files
To build the auxiliary tool wsp, you will need an additional file,
wordsplit-version.h. Normally, it should contain only a definition
of the macro or variable WORDSPLIT_VERSION. The following shell
fragment can be used to create it:

    version=$(cd wordsplit; git describe)
    cat > wordsplit-version.h <<EOF
    #define WORDSPLIT_VERSION "$version"
    EOF
This file should be listed in the EXTRA_DIST
variable to make sure
it is distributed with the tarball.
4.2 testsuite.at
Include the file wordsplit.at in your testsuite.at:

    m4_include(wordsplit.at)
4.3 Makefile.am
In the Makefile.am responsible for creating the testsuite, make sure
that the path to the wordsplit module is passed to the autotest
invocation, so that the above m4_include
statement will work. The
usual make goal to build the testsuite looks as follows:
    $(TESTSUITE): package.m4 $(TESTSUITE_AT)
    	$(AM_V_GEN)$(AUTOTEST) \
    	  -I $(srcdir)\
    	  -I $(top_srcdir)/wordsplit\
    	  testsuite.at -o $@.tmp
    	$(AM_V_at)mv $@.tmp $@
Then, add the following fragment to build the auxiliary files:
    # ###########################
    # Wordsplit testsuite
    # ###########################
    # NOTE: the libmailutils/wordsplit paths and the <mailutils/wordsplit.h>
    # include below appear to be taken from GNU Mailutils; adjust them to
    # the location of the submodule in your project.
    EXTRA_DIST += wordsplit-version.h
    $(srcdir)/wordsplit-version.h: $(top_srcdir)/configure.ac
    	$(AM_V_GEN){\
    	  if test -e $(top_srcdir)/libmailutils/wordsplit/.git; then \
    	    wsversion=$$(cd $(top_srcdir)/libmailutils/wordsplit; git describe); \
    	  else \
    	    wsversion="unknown"; \
    	  fi;\
    	  echo "#define WORDSPLIT_VERSION \"$$wsversion\"";\
    	  echo '#include <mailutils/wordsplit.h>'; } \
    	  > $(srcdir)/wordsplit-version.h

    noinst_PROGRAMS += wsp
    wsp_SOURCES =
    nodist_wsp_SOURCES = wsp.c
    wsp.o: $(srcdir)/wordsplit-version.h
    VPATH = $(srcdir):$(top_srcdir)/wordsplit
5 History
The first version of wordsplit appeared in March 2009 as a part of the Wydawca[1] project. Its main use was to assist in configuration file parsing. The parser subsystem proved to be quite useful and soon evolved into a separate project, Grecs[2]. Wordsplit has since been used (as a git submodule) in a number of other projects, such as GNU Dico[3] and Direvent[4], to name a few.
In 2010 the wordsplit sources were incorporated into the GNU
Mailutils[5] package, where they replaced the decommissioned argcv
module. Mailutils has its own configuration package, therefore using
Grecs was not expedient. The wordsplit sources were exported from
Grecs and incorporated into Mailutils. Since then the Mailutils and
Grecs versions of wordsplit were periodically synchronized.
Several other projects, such as GNU Rush[6] and fileserv[7], followed suit. It was therefore decided to maintain wordsplit as a separate package which could be easily included in another project without incurring unnecessary overhead. This separate package was created on July 7, 2019.
By the end of July 2019, all mentioned packages had switched to using wordsplit as a submodule.
6 Bug reporting
Please send bug reports, questions, suggestions and criticism via email to Sergey Poznyakoff or use the project's bug tracker. When sending bug reports, please make sure to provide the following information:
- Wordsplit invocation flags.
- Input string.
- Produced output.
- Expected output.
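As a rough illustration, the four items above could be collected by a small helper such as the one below. This is only a sketch: ws_bug_report is a hypothetical name and is not part of the wordsplit API. It assumes the caller passes the flags given to wordsplit, the input string, the produced words (for example ws.ws_wordv and ws.ws_wordc) and a free-form description of the expected result. Flags are printed in hexadecimal, and strings are quoted so that leading or trailing whitespace stays visible.

```c
#include <assert.h>
#include <stddef.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical helper (not part of wordsplit): write the information
   requested in a bug report to fp.  Returns 0 on success, -1 if a
   write error was detected on fp. */
int
ws_bug_report(FILE *fp, unsigned flags, const char *input,
              char **wordv, size_t wordc, const char *expected)
{
    size_t i;

    assert(fp != NULL && input != NULL && expected != NULL);

    fprintf(fp, "Flags:    0x%x\n", flags);
    fprintf(fp, "Input:    \"%s\" (%zu bytes)\n", input, strlen(input));
    fprintf(fp, "Produced: %zu word(s)\n", wordc);
    for (i = 0; i < wordc; i++)
        fprintf(fp, "  [%zu] \"%s\"\n", i, wordv[i]);
    fprintf(fp, "Expected: %s\n", expected);

    return ferror(fp) ? -1 : 0;
}
```

A call such as ws_bug_report(stderr, flags, input_string, ws.ws_wordv, ws.ws_wordc, "two words") placed after the wordsplit call would print a report ready to paste into an email.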
7 Copying
Copyright (C) 2009-2020 Sergey Poznyakoff
Permission is granted to anyone to make or distribute verbatim copies of this document as received, in any medium, provided that the copyright notice and this permission notice are preserved, thus giving the recipient permission to redistribute in turn.
Permission is granted to distribute modified versions of this document, or of portions of it, under the above conditions, provided also that they carry prominent notices stating who last changed them.
Footnotes:
[1] Wydawca - an automatic release submission daemon
    Home: http://puszcza.gnu.org.ua/software/wydawca
    Git: http://git.gnu.org.ua/cgit/wydawca.git
[2] Grecs - a library for parsing structured configuration files
    Home: https://puszcza.gnu.org.ua/projects/grecs
    Git: http://git.gnu.org.ua/cgit/grecs.git
[3] GNU Dico - a dictionary server
    Home: https://puszcza.gnu.org.ua/projects/dico
    Git: http://git.gnu.org.ua/cgit/dico.git
[4] GNU Direvent - filesystem event watching daemon
    Home: http://puszcza.gnu.org.ua/software/direvent
    Git: http://git.gnu.org.ua/cgit/direvent.git
[5] GNU Mailutils - a general-purpose mail package
    Home: http://mailutils.org
    Git: http://git.savannah.gnu.org/cgit/mailutils.git
[6] GNU Rush - a restricted user shell for remote access
    Home: http://puszcza.gnu.org.ua/software/rush
    Git: http://git.gnu.org.ua/cgit/rush.git
[7] fileserv - simple http server for serving static files
    Home: https://puszcza.gnu.org.ua/projects/fileserv
    Git: http://git.gnu.org.ua/cgit/fileserv.git