SELECT/POLL
"Asynchronous I/O" is the ability of a process to
perform input/output on multiple sources at the same time. The term is also
used for I/O that the system performs only when data is actually available or
ready to be sent, as opposed to issuing a read or write and blocking until it
completes. Events can arrive through several channels (timeouts, signals, data
on a socket, and so on), and the key is to monitor all of these channels
simultaneously.
Applications that want non-blocking I/O use the poll, select, and epoll system
calls to watch multiple file descriptors and determine whether they can read
from or write to one or more open files without blocking. These calls can also
block a process until any of the file descriptors being waited on becomes
available for reading or writing.
The functions poll and select pass a set of File Descriptors (FDs) to the
kernel, with an optional timeout value: poll takes an array of pollfd
structures, while select takes fd_set bitmasks. When there is activity, or when
the timeout expires, the poll/select system call returns. The application must
then scan the whole result set to see which FDs have an event that it was
interested in receiving. This scheme works well with small numbers of FDs, but
does not scale to thousands of FDs. The epoll call was added in Linux version
2.5.45 to scale to thousands of file descriptors.
This article talks about the poll system call.
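The scan step described above can be sketched as follows. This is a minimal illustration, not code from the original article: the helper name first_ready_pipe() and the use of three pipes are assumptions chosen so that the example is self-contained. The pollfd fields it uses are described in the next section.

```c
#include <poll.h>
#include <unistd.h>

#define NPIPES 3

/* Hypothetical helper: creates NPIPES pipes, optionally writes one byte
 * into pipe number `which`, polls all read ends, then scans the array.
 * Returns the index of the first readable pipe, -1 if none became
 * readable before the timeout, or -2 on error. */
int first_ready_pipe(int which)
{
    int pipes[NPIPES][2];
    struct pollfd fds[NPIPES];
    int created = 0, result = -2;

    for (; created < NPIPES; created++) {
        if (pipe(pipes[created]) < 0)
            goto out;
        fds[created].fd = pipes[created][0];   /* watch the read end */
        fds[created].events = POLLIN;
        fds[created].revents = 0;
    }

    if (which >= 0 && which < NPIPES &&
        write(pipes[which][1], "x", 1) != 1)
        goto out;

    if (poll(fds, NPIPES, 100) < 0)            /* wait up to 100 ms */
        goto out;

    /* poll() only reports how many entries are ready; the caller still
     * has to walk the whole array to find out which ones. */
    result = -1;
    for (int i = 0; i < NPIPES; i++) {
        if (fds[i].revents & POLLIN) {
            result = i;
            break;
        }
    }
out:
    for (int i = 0; i < created; i++) {
        close(pipes[i][0]);
        close(pipes[i][1]);
    }
    return result;
}
```

Calling first_ready_pipe(1) from a main() should report index 1. The linear scan over fds is the part that becomes expensive as the number of descriptors grows.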
poll() System Call
The poll() system call was introduced in Linux 2.1.23. The poll() library call
was introduced in libc 5.4.28. To check the version of glibc on an RPM-based
system, give the following command:
linux$ rpm -qa | grep glibc
#include <poll.h>
int poll (struct pollfd fds[], nfds_t nfds, int timeout);
nfds is the number of pollfd structures in the fds array.
timeout is the timeout value in milliseconds. A negative value means wait
indefinitely; zero means return immediately.
For each member of the array pointed to by fds, poll() examines the given file
descriptor for the event(s) specified in the events field of that member. The
elements of struct pollfd are as follows:
fd specifies an open file descriptor.
events specifies the events to be watched; it is a bitmask constructed by
OR'ing a combination of flags, some of which are given below.
POLLIN - Data other than high-priority data may be read without blocking.
POLLOUT - Normal data may be written without blocking.
POLLERR - An error has occurred on the device or stream. This flag is only
valid in the revents bitmask; it is ignored in the events member.
revents is set by the kernel with the bits indicating which events occurred on
fd.
struct pollfd {
    int fd;
    short events;
    short revents;
};
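The interaction between events, revents, and the timeout can be sketched with a pipe. The helper names poll_read_end() and poll_broken_pipe() are hypothetical, and the second helper relies on Linux pipe behaviour (an error is reported on the write end once the read end is closed), which is an assumption about the platform rather than something stated in the article.

```c
#include <poll.h>
#include <unistd.h>

/* Hypothetical helper: polls the read end of a fresh pipe with a zero
 * timeout, optionally after writing one byte. Returns the revents
 * bitmask (0 if nothing is ready) or -1 on error. */
int poll_read_end(int write_first)
{
    int p[2];
    struct pollfd pfd;
    int n;

    if (pipe(p) < 0)
        return -1;
    if (write_first && write(p[1], "x", 1) != 1) {
        close(p[0]);
        close(p[1]);
        return -1;
    }

    pfd.fd = p[0];
    pfd.events = POLLIN;     /* ask only for "readable" */
    pfd.revents = 0;
    n = poll(&pfd, 1, 0);    /* timeout 0: do not block */

    close(p[0]);
    close(p[1]);
    return n < 0 ? -1 : pfd.revents;
}

/* Hypothetical helper: polls the write end of a pipe whose read end is
 * already closed, with events set to 0. POLLERR is still reported,
 * because it is an output-only flag. Returns revents or -1 on error. */
int poll_broken_pipe(void)
{
    int p[2];
    struct pollfd pfd;
    int n;

    if (pipe(p) < 0)
        return -1;
    close(p[0]);             /* no reader left */

    pfd.fd = p[1];
    pfd.events = 0;          /* request nothing explicitly */
    pfd.revents = 0;
    n = poll(&pfd, 1, 0);

    close(p[1]);
    return n < 0 ? -1 : pfd.revents;
}
```

Note that revents is tested with the bitwise & operator, for example (revents & POLLIN), since several flags can be set at once.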
Device Driver
Support for any of these calls (poll, select, or epoll) requires support from
the device driver. This support (for all three calls) is provided through the
driver's poll method. This method has the following prototype:
unsigned int (*poll) (struct file *filp, poll_table *wait);
This driver method is called whenever the user-space program performs a poll,
select, or epoll system call involving a file descriptor associated with the
driver. The device method is in charge of these two steps:
1. Call poll_wait() on each wait queue that could indicate a change in the poll
status. If no file descriptors are currently available for I/O, the kernel
causes the process to wait on the wait queues for all file descriptors passed
to the system call.
2. Return a bit mask describing the operations that can be performed without
blocking. The kernel reports this mask to user space through the revents field
of the pollfd.
The second argument to the driver's poll method is the poll_table, which the
driver treats as an opaque handle; the kernel uses it to obtain a
poll_table_entry structure for each wait queue. It is passed to the driver
method so that the driver can load it with every wait queue that could wake up
the process and change the status of the poll operation. The driver adds a wait
queue to the poll_table structure by calling the function poll_wait().
Function Flow on poll() from user space
A user application calls select() or poll(). For poll(), the function that gets
called in kernel space is do_sys_poll(). For select(), the first function to be
called in kernel space is core_sys_select(), located in fs/select.c, which is a
wrapper around do_select(). This section looks at what the poll() path does.
do_sys_poll()
  Calls poll_initwait(), which sets the poll_table's qproc to __pollwait()
  Calls do_poll()
    Calls do_pollfd() for each file descriptor
      Calls the poll() routine implemented by the driver
        The driver calls poll_wait(), which calls __pollwait() // Defined in fs/select.c
          Calls poll_get_entry() to get a struct poll_table_entry
            struct poll_table_entry {
                struct file *filp;
                wait_queue_t wait;
                wait_queue_head_t *wait_address;
            };
          Sets entry->filp to filp
          Sets entry->wait_address to wait_address
          Calls init_waitqueue_entry() to store the "current" process in entry->wait.private
          Calls add_wait_queue() to add entry->wait to the wait queue wait_address.
Thus, at the end of poll_wait(), the process has been added to the event's
wait queue.
When a task wakes up, it has to be removed from all the wait queues it is on.
Having a list of all the wait queues a task is on helps save time.
Data Structure
This section explains some internals of the poll_table structure.
Whenever a user application calls poll, select, or epoll_ctl, the kernel
invokes the poll method of all files referenced by the system call, passing the
same poll_table to each of them [3]. The poll_table structure is a wrapper
around a function that builds the actual data structure.
typedef struct poll_table_struct {
    poll_queue_proc qproc;
} poll_table;
As noted in the last section, the function poll_initwait(), called from
do_sys_poll(), sets the poll_table's function pointer pt->qproc to
__pollwait().
The poll_table_page structure, for poll
and select, is a linked list of memory pages
containing poll_table_entry
structures.
struct poll_table_entry {
    struct file *filp;
    wait_queue_t wait;
    wait_queue_head_t *wait_address;
};
struct poll_table_page {
    struct poll_table_page * next;
    struct poll_table_entry * entry;
    struct poll_table_entry entries[0];
};
This structure is maintained by the kernel so that the
process can be removed from all of those queues before poll
or select returns. When the poll call completes, the poll_table structure is deallocated, and all wait queue entries
previously added to the poll table (if any) are removed from the table and
their wait queues.
The following figure is taken from [3].
Figure 1: Data structures behind poll
Code Example
We will take the example of signalfd() to explain the poll()
mechanism [4]. signalfd() is available on Linux since kernel 2.6.22.
The signalfd function creates a file descriptor that can be
used to accept signals targeted at the caller.
This provides an alternative to the use of a signal handler and has the
advantage that the file descriptor may be monitored by select, poll or epoll.
Synopsis
#include <sys/signalfd.h>
int signalfd (int fd, const sigset_t *mask, int flags);
The mask argument specifies the set of signals that the
caller wishes to accept via the file descriptor. The set of signals to be
received via the file descriptor should be blocked using sigprocmask(2), to
prevent the signals being handled according to their default dispositions.
If the fd argument is -1, then the call creates a new file
descriptor and associates the signal set specified in mask with that
descriptor. If fd is not -1, then it
must specify a valid existing signalfd file descriptor, and mask is used to
replace the signal set associated with that descriptor.
signalfd() returns a file descriptor that supports read, close, poll, select and epoll calls.
Driver Code
Add a poll() pointer to the file_operations structure and define it.
The following example is taken from fs/signalfd.c:
static const struct file_operations signalfd_fops = {
    .release = signalfd_release,
    .poll = signalfd_poll,
    .read = signalfd_read,
};
static unsigned int signalfd_poll(struct file *file, poll_table *wait)
{
    struct signalfd_ctx *ctx = file->private_data;
    unsigned int events = 0;
    poll_wait(file, &current->sighand->signalfd_wqh, wait);
    if (next_signal(&current->pending, &ctx->sigmask) ||
        next_signal(&current->signal->shared_pending,
                    &ctx->sigmask))
        events |= POLLIN;
    return events;
}
Later, when a signal is available, the driver calls:
wake_up(&current->sighand->signalfd_wqh);
This will cause the select/poll system call to wake up and
to check all file descriptors again (by calling the f_ops->poll function).
User Code
User Space Poll routine
#include <poll.h>
int poll(struct pollfd *ufds, unsigned int nfds, int timeout);
The following user code has been tried on a machine running Suse 11.1. This is
the output of "uname -a":
Linux linux-gg13 2.6.27.7-9-default #1 SMP 2008-12-04 18:10:04 +0100 x86_64 x86_64 x86_64 GNU/Linux
#include <stdio.h>
#include <unistd.h>
#include <signal.h>
#include <poll.h>
#include <sys/signalfd.h>

int main (int argc, char *argv[])
{
    int sigfd;
    sigset_t mask;
    struct pollfd fds[1];
    int timeout_msecs = 200000;
    int ret;

    /* Handle SIGTERM and SIGINT. */
    sigemptyset (&mask);
    sigaddset (&mask, SIGTERM); // kill -15
    sigaddset (&mask, SIGINT);  // signal value = 2, keyboard shortcut = ctrl-c

    /* Block signals handled using signalfd() to remove default signal actions */
    if (sigprocmask(SIG_BLOCK, &mask, NULL) < 0) {
        perror ("sigprocmask");
        return 1;
    }

    /* Create a file descriptor from which we will read the signals. */
    sigfd = signalfd (-1, &mask, 0);
    if (sigfd < 0) {
        perror ("signalfd");
        return 1;
    }

    fds[0].fd = sigfd;
    fds[0].events = POLLIN;
    fds[0].revents = 0;

    /* Sleep until a signal arrives or the timeout expires. */
    ret = poll(fds, 1, timeout_msecs);
    if (ret < 0) {
        perror ("poll");
        return 1;
    }

    /* Test the POLLIN bit with bitwise AND (not logical &&). */
    if (fds[0].revents & POLLIN) {
        printf("\n revents in fd[0] = 0x%x ", fds[0].revents);
    }

    if (ret > 0) {
        // an event has occurred on the fd
        struct signalfd_siginfo si;
        ssize_t res;

        res = read (sigfd, &si, sizeof(si));
        if (res < 0) {
            perror ("read");
            return 1;
        }
        if (si.ssi_signo == SIGTERM) {
            printf ("...SIGTERM\n");
        }
        else if (si.ssi_signo == SIGINT) {
            printf ("...SIGINT\n");
        }
    }
    else {
        printf("...Timeout\n");
    }

    close (sigfd);
    return 0;
}
Running User Code
When the above user code is run, the kernel log shows the following output as
soon as the program reaches the poll() call. The trace is obtained via a call
to dump_stack() added at the start of the signalfd_poll() routine.
Call Trace:
dump_stack
signalfd_poll
do_sys_poll
sys_poll
system_call_fastpath
When a ^C is pressed, sending a SIGINT to the process, the same
stack trace is seen, as the do_sys_poll() again calls signalfd_poll() after
being woken up.
dump_stack
signalfd_poll
do_sys_poll
sys_poll
system_call_fastpath
References:
[1] poll() specification, The Open Group, http://www.opengroup.org/onlinepubs/000095399/functions/poll.html
[2] Linux man page, poll(3), http://linux.die.net/man/3/poll
[3] Linux Device Drivers, 3rd Edition, by Jonathan Corbet, Greg Kroah-Hartman, Alessandro Rubini, http://www.makelinux.net/ldd3/chp-6-sect-3.shtml
[4] Linux man page, signalfd(2), http://www.kernel.org/doc/man-pages/online/pages/man2/signalfd.2.html