Thursday 31 May 2012

Understanding Netlink Socket

HISTORY
Netlink was added by Alan Cox during Linux kernel 1.3 development as a character driver interface to provide multiple bidirectional communication links between the kernel and user-space. Then, Alexey Kuznetsov extended it during Linux kernel 2.1 development to provide a flexible and extensible messaging interface to the new advanced routing infrastructure.

What is Netlink
Netlink is a datagram-oriented messaging system that allows passing messages from kernel to user-space and vice versa. It can also be used as an Inter-Process Communication (IPC) system.

DESIGN
Netlink is implemented on top of the generic BSD socket infrastructure; thus, it supports the usual primitives like socket(), bind(), sendmsg() and recvmsg(), as well as common socket polling mechanisms.
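As a minimal sketch of those primitives in user space, the helper below opens a Netlink socket on a given bus and binds it. The name nl_open is made up for this illustration; socket(), bind() and struct sockaddr_nl are the standard Linux interfaces.

```c
#include <string.h>
#include <sys/socket.h>
#include <unistd.h>
#include <linux/netlink.h>

/* nl_open() is a hypothetical helper name, not part of any library.
 * It returns a bound Netlink socket for the given bus (protocol),
 * or -1 with errno set if socket() or bind() fails. */
int nl_open(int protocol)
{
    struct sockaddr_nl local;
    int fd = socket(AF_NETLINK, SOCK_RAW, protocol);

    if (fd < 0)
        return -1;

    memset(&local, 0, sizeof(local));
    local.nl_family = AF_NETLINK;
    local.nl_pid = 0;     /* 0 lets the kernel pick a unique port ID */
    local.nl_groups = 0;  /* unicast only, no multicast groups */

    if (bind(fd, (struct sockaddr *)&local, sizeof(local)) < 0) {
        close(fd);
        return -1;
    }
    return fd;
}
```

A call such as nl_open(NETLINK_ROUTE) should give a descriptor that can be used with sendmsg(), recvmsg() and poll() like any other socket.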

Netlink socket bus
Netlink allows up to 32 busses in kernel-space. Generally each bus is attached to one kernel sub-system, although several kernel sub-systems may share the same bus.
  • nfnetlink is the netfilter bus used by all the firewalling sub-systems available in Linux.
  • rtnetlink is the networking bus, which is used by the networking device management, routing, neighbouring, and queuing sub-systems.

Netlink communication types
The Netlink family allows two sorts of communications, unicast and multicast.
  1. Unicast: Unicast is useful to establish a 1:1 communication channel between a kernel subsystem and one user-space process. Typically, unicast channels are used to send commands to kernel-space, to receive the result of commands, and to request some information from a given kernel subsystem.
  2. Multicast: Multicast is useful to establish 1:N communication channels. Typically the sender is the kernel and there are N possible listeners in user-space. This is useful for event-based notification. Before Linux kernel version 2.6.14, at most 32 multicast groups could be created; since then, up to 2^32 groups are possible.
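The two group limits correspond to two ways of subscribing. A rough sketch, assuming the standard Linux headers: the nl_groups field of struct sockaddr_nl is a 32-bit mask, so it can only address groups 1 to 32, while the NETLINK_ADD_MEMBERSHIP socket option (available since 2.6.14) takes a plain group number. The helper names nl_group_mask and nl_join_group are invented for this example.

```c
#include <stdint.h>
#include <sys/socket.h>
#include <linux/netlink.h>

#ifndef SOL_NETLINK
#define SOL_NETLINK 270  /* value from the kernel headers; older libcs omit it */
#endif

/* Hypothetical helper: bitmask for sockaddr_nl.nl_groups.
 * Only groups 1..32 fit in the 32-bit mask; returns 0 otherwise. */
uint32_t nl_group_mask(unsigned int group)
{
    if (group < 1 || group > 32)
        return 0;
    return (uint32_t)1 << (group - 1);
}

/* Hypothetical helper: join any group number on an already-open
 * Netlink socket. Returns 0 on success, -1 with errno set on failure. */
int nl_join_group(int fd, unsigned int group)
{
    return setsockopt(fd, SOL_NETLINK, NETLINK_ADD_MEMBERSHIP,
                      &group, sizeof(group));
}
```

For the first 32 groups either approach should work; the mask is simply OR-ed into nl_groups before bind().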
Netlink message format:
NLM_F_MULTI: this is a multi-part message. A Netlink subsystem replies with a multi-part message if it has previously received a request from user-space with the NLM_F_DUMP flag set.
NLM_F_ECHO: if this flag is set, the user-space application wants to get a report back via unicast of the request that it has sent. However, if the user-space application is also subscribed to event-based notifications, it does not receive the notification via multicast, as it already receives it via unicast.
Netlink messages are aligned to 32 bits and, generally speaking, they contain data that is expressed in host byte order. A Netlink message always starts with a fixed header of 16 bytes defined by struct nlmsghdr in <include/linux/netlink.h>:
  • Message length (32 bits): size of the message in bytes, including the header
  • Message type (16 bits): the type of this message. There are two sorts, data and control messages. Data messages depend on the set of actions that the given kernel-space subsystem allows. Control messages are common to all Netlink subsystems. Four types of control messages are currently defined, although 16 slots are reserved:
    • NLMSG_NOOP: no operation, this can be used to implement a Netlink ping utility to know if a given Netlink bus is available.
    • NLMSG_ERROR: this message contains an error.
    • NLMSG_DONE: this is the trailing message of a multi-part message. A multi-part message is composed of a set of messages, all with the NLM_F_MULTI flag set.
    • NLMSG_OVERRUN: reserved to signal that data was lost; it is currently unused.
  • Message flags (16 bits): several message flags like:
    • NLM_F_REQUEST: if this flag is set, this Netlink message contains a request. Messages that go from user-space to kernel-space must set this flag; otherwise the kernel subsystem must report an invalid argument (EINVAL) error to the user-space sender.
    • NLM_F_CREATE: the user-space application wants to issue a command or add a new configuration to the kernel-space subsystem.
    • NLM_F_EXCL: this is commonly used together with NLM_F_CREATE to trigger an error if the configuration that user-space wants to add already exists in kernel-space.
    • NLM_F_REPLACE: the user-space application wants to replace an existing configuration in the kernel-space subsystem.
    • NLM_F_APPEND: append a new configuration to the existing one. This is used for ordered data, like routing information, where the default is otherwise to prepend.
    • NLM_F_DUMP: the user-space application wants a full resynchronization with the kernel subsystem. The result is a batch of Netlink messages, also known as multi-part messages, which contain the kernel subsystem information.     
  • Sequence number (32 bits): message sequence number. This is useful together with NLM_F_ACK if a user-space application wants to make sure that a request has been correctly issued. Netlink uses the same sequence number in the messages that are sent as a reply to a given request. For event-based notifications from kernel-space, this is always zero.
  • Port-ID (32 bits): this field contains a numerical identifier that is assigned by Netlink. Netlink assigns different port-ID values to identify several socket channels opened by the same user-space process. The default value for the first socket is the Process Identifier (PID). This value is set to zero in two circumstances:
    • This message comes from kernel-space.
    • This message comes from user-space, and we want Netlink to automatically set the value according to the port ID assigned to this socket channel.
  • The payload of Netlink messages is composed of a set of attributes that are expressed in Type-Length-Value (TLV) format.
  • Each Netlink attribute header is defined by struct nlattr and it is composed of the following fields:
    • Type (16 bits): the attribute type according to the set of available types in the kernel subsystem. The two most significant bits of this field encode whether this is a nested attribute (the most significant bit, NLA_F_NESTED), which allows you to embed a set of attributes in the payload of one attribute, and whether the attribute payload is represented in network byte order (the next bit, NLA_F_NET_BYTEORDER). The remaining 14 bits limit the number of attribute types per Netlink subsystem to 16384.
    • Length (16 bits): size in bytes of the attribute. This includes the attribute header plus the payload size of this attribute, without the padding that aligns it to 32 bits.
    • Value: this field is variable in size but it is always aligned to 32 bits.  
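To make the layout concrete, here is a small sketch that writes one message, a 16-byte header followed by a single TLV attribute, into a caller-supplied buffer. The function name nl_build_msg and its parameters are invented for this example; the structures and the NLMSG_*/NLA_* macros come from <linux/netlink.h>.

```c
#include <string.h>
#include <linux/netlink.h>

/* Hypothetical helper: write a Netlink header plus one TLV attribute
 * into buf and return the total message length, or 0 if the payload is
 * too large for an attribute or buf is too small. */
unsigned int nl_build_msg(void *buf, unsigned int buflen,
                          unsigned short msg_type, unsigned int seq,
                          unsigned short attr_type,
                          const void *data, unsigned short datalen)
{
    struct nlmsghdr *nlh = buf;
    struct nlattr *nla;
    unsigned int attr_len, total;

    if (datalen > 0xFFFF - NLA_HDRLEN)
        return 0;                        /* would not fit in nla_len */

    attr_len = NLA_HDRLEN + datalen;     /* header + payload, unpadded */
    total = NLMSG_LENGTH(NLA_ALIGN(attr_len));
    if (total > buflen)
        return 0;

    memset(buf, 0, total);               /* also zeroes the padding */
    nlh->nlmsg_len = total;
    nlh->nlmsg_type = msg_type;
    nlh->nlmsg_flags = NLM_F_REQUEST;
    nlh->nlmsg_seq = seq;
    nlh->nlmsg_pid = 0;                  /* let Netlink fill in the port ID */

    nla = (struct nlattr *)NLMSG_DATA(nlh);
    nla->nla_type = attr_type;
    nla->nla_len = (unsigned short)attr_len;
    memcpy((char *)nla + NLA_HDRLEN, data, datalen);
    return total;
}
```

With a 5-byte payload, the attribute length field is 9 (4 bytes of header plus 5 of data), the attribute occupies 12 bytes after padding, and the message length is 28.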
 
 
Netlink error messages

The payload of an error message is defined by struct nlmsgerr in <include/linux/netlink.h> and it contains two fields:
  1. Error type (32 bits): this field contains a standardized error value that identifies the sort of error. The error value is defined by errno.h. These errors can be interpreted with perror().
  2. The Netlink message that contains the request that triggered the error.
NB: With regard to message integrity, the kernel subsystems that support Netlink usually report invalid argument (EINVAL) via recvmsg() if user-space sends a malformed message.
Netlink reliability mechanisms
In Netlink-based communications, there are two possible situations that may result in message loss:
  1. Memory exhaustion: there is no memory available to allocate the message.
  2. Buffer overrun: there is no space in the receiver queue that is used to store messages. This may occur in communications from kernel to user-space.
  
Sequence Diagram of Netlink Dump operation
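On the receiving side, a dump arrives as a series of NLM_F_MULTI messages terminated by NLMSG_DONE. The loop below is a rough sketch of how one receive buffer could be walked with the standard NLMSG_OK and NLMSG_NEXT macros; the function name nl_walk_dump and its return convention are assumptions made for this example.

```c
#include <stddef.h>
#include <linux/netlink.h>

/* Hypothetical helper: walk one receive buffer that may hold several
 * Netlink messages. Adds the number of data messages seen to *count and
 * returns 1 once the reply is complete, 0 if more data is still
 * expected, or -1 if an NLMSG_ERROR message appears in the buffer. */
int nl_walk_dump(void *buf, size_t len, unsigned int *count)
{
    struct nlmsghdr *nlh = buf;
    int remaining = (int)len;

    for (; NLMSG_OK(nlh, remaining); nlh = NLMSG_NEXT(nlh, remaining)) {
        if (nlh->nlmsg_type == NLMSG_DONE)
            return 1;               /* trailing message of the dump */
        if (nlh->nlmsg_type == NLMSG_ERROR)
            return -1;
        (*count)++;
        if (!(nlh->nlmsg_flags & NLM_F_MULTI))
            return 1;               /* single-part reply, nothing follows */
    }
    return 0;                       /* call recvmsg() again for the rest */
}
```

The caller would keep calling recvmsg() and passing each buffer to this function until it stops returning 0.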

Causes of buffer overrun:
  1. A user-space listener is too slow to handle all the Netlink messages that the kernel subsystem sends at a given rate.
  2. The queue that is used to store messages that go from kernel to user-space is too small.
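Both causes can be mitigated from user space. As a hedged sketch: the second cause can be addressed by asking for a larger receive queue with the generic SO_RCVBUF socket option, and an overrun that still happens shows up as a failed recvmsg() with errno set to ENOBUFS. The helper names nl_grow_rcvbuf and nl_is_overrun are made up for this illustration.

```c
#include <errno.h>
#include <sys/socket.h>

/* Hypothetical helper: ask for a larger receive queue and report the
 * size the kernel actually granted, or -1 on failure. */
int nl_grow_rcvbuf(int fd, int bytes)
{
    int granted = 0;
    socklen_t optlen = sizeof(granted);

    if (setsockopt(fd, SOL_SOCKET, SO_RCVBUF, &bytes, sizeof(bytes)) < 0)
        return -1;
    if (getsockopt(fd, SOL_SOCKET, SO_RCVBUF, &granted, &optlen) < 0)
        return -1;
    return granted;
}

/* Hypothetical helper: a failed recvmsg() with errno == ENOBUFS means
 * the kernel dropped messages because the receive queue was full. */
int nl_is_overrun(int recv_errno)
{
    return recv_errno == ENOBUFS;
}
```

After an overrun, a listener that needs a consistent view would usually resynchronize with the kernel subsystem through an NLM_F_DUMP request.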
SAMPLE PROGRAMS
Kernel Mode Program:
/*
Program Description :: A thread-based kernel module whose main purpose is to forward ping (ICMP) packets from kernel space to user space over a Netlink socket (keep the default ping interval, i.e. 1 second).
*/

/*
    *************************************************
    * Author            : Sudhansu Sekhar Mishra
    * Module           : Kernel Based Netlink Socket
    * OS version      : 2.6.9-55ELSmp
    * Type                : Open Source(GPL)
    * E_mail            : sudhansu.8454@gmail.com
    * File Name       : Netlink_Kernel_thread.c
    *************************************************
*/

/************ Kernel Header Files *************/
#include <linux/module.h>
#include <linux/skbuff.h>
#include <linux/netdevice.h> 
#include <linux/netfilter_ipv4.h>
#include <linux/netfilter_ipv6.h>
#include <linux/netfilter.h>
#include <linux/ip.h>
#include <linux/icmp.h>
#include <linux/udp.h>
#include <linux/tcp.h>
#include <linux/spinlock.h>
#include <linux/socket.h>
#include <linux/in.h>
#include <linux/completion.h>
#include <linux/signal.h>
#include <asm/param.h>
#include <linux/sched.h>
#include <linux/delay.h>
#include <linux/time.h>
#include <linux/ctype.h>
#include <linux/kernel.h>
#include <linux/types.h>
#include <linux/netlink.h>
#include <linux/kthread.h>
#include <linux/wait.h>
#include <net/sock.h>
#include <linux/timer.h>
#include <linux/jiffies.h>

/************ Module specific *************/
#define BUFSIZE 2048
#define TRIALS 10
#define TRUE 1
#define FALSE 0
#define NETLINK_HELLO 19
#define NETLINK_BYE 21

static struct sock *hello_sock_z = NULL;
static struct sock *bye_sock_z = NULL;

static int usr_pid_hello = 0;
static int usr_pid_bye = 0;

static int hello_thread_id = -1;
static int bye_thread_id = -1;

static DECLARE_WAIT_QUEUE_HEAD(event_wait_hello_z);
static DECLARE_WAIT_QUEUE_HEAD(event_wait_bye_z);

static int nf_registered_hello_ipv4;
static int nf_registered_bye_ipv4;
static int failed_to_send_hello = 0;
static int failed_to_send_bye = 0;

static spinlock_t send_hello_lock = SPIN_LOCK_UNLOCKED;
static spinlock_t send_bye_lock = SPIN_LOCK_UNLOCKED;

void register_IPv4_in_hook(void);

//NF HOOK for IPv4 incoming packets
unsigned int hello_IPv4_in_hook (unsigned int hook,
                                 struct sk_buff **pskb,
                                   const struct net_device *indev,
                                   const struct net_device *outdev);
unsigned int bye_IPv4_in_hook (unsigned int hook,
                                 struct sk_buff **pskb,
                                   const struct net_device *indev,
                                   const struct net_device *outdev);
static int send_bye_response( void *msg, int size );
static int send_hello_response( void *msg, int size );
int createThread_hello(void);
int createThread_bye(void);
static int process_hello_requests(void *data);
static int process_bye_requests(void *data);
static struct nf_hook_ops hello_IPv4_inHook_ops =
{
        {NULL, NULL},
        hello_IPv4_in_hook,        // hook function
        THIS_MODULE,            //owner
        PF_INET,                // protocol family for IPv4
        NF_IP_PRE_ROUTING,        // hook to be manipulated
        NF_IP_PRI_FILTER + 1    // priority
};
static struct nf_hook_ops bye_IPv4_inHook_ops =
{
        {NULL, NULL},
        bye_IPv4_in_hook,        // hook function
        THIS_MODULE,            //owner
        PF_INET,                // protocol family for IPv4
        NF_IP_PRE_ROUTING,      // hook to be manipulated
        NF_IP_PRI_FILTER + 1    // priority
};
unsigned int hello_IPv4_in_hook (unsigned int hook,
                                 struct sk_buff **pskb,
                                   const struct net_device *indev,
                                   const struct net_device *outdev)
{
    int iLength;
    int i;

    char msg[BUFSIZE];
    unsigned char *p = NULL;

    struct iphdr *iph = NULL;
   
    memset(&msg, 0, BUFSIZE);
    iph = (struct iphdr *)(*pskb)->nh.iph;

    if (iph == NULL){
        printk(KERN_CRIT"hello_IPv4_in_hook :: Failed to decode ip header :: drop the pkt\n");
        return NF_DROP;
    }
    if (iph->protocol != 0x01)
        goto getout;
    iLength = ntohs(iph->tot_len);
    printk(KERN_CRIT"Hello, Protocol = %x\n",iph->protocol);
    printk(KERN_CRIT"Hello, packet length = %d\n",iLength);
    p = (unsigned char *)((*pskb)->data);
   
    if (p)
    {
        /* Hex-encode at most (BUFSIZE / 2) - 1 bytes so that the NUL written
           by the last sprintf() still lands inside msg */
        int n = (iLength < (BUFSIZE / 2)) ? iLength : (BUFSIZE / 2) - 1;

        for (i = 0; i < n; ++i){
            sprintf(msg + i * 2, "%02x", p[i]);
        }
        //printk(KERN_CRIT"Hello Payload = %s\n",msg);
    }
    spin_lock(&send_hello_lock);
    send_hello_response(msg, strlen(msg));
    spin_unlock(&send_hello_lock);

getout:
    return NF_ACCEPT;
}
unsigned int bye_IPv4_in_hook (unsigned int hook,
                               struct sk_buff **pskb,
                               const struct net_device *indev,
                               const struct net_device *outdev)
{
        int iLength;
        int i;

        char msg[BUFSIZE];
        unsigned char *p = NULL;

        struct iphdr *iph = NULL;

        memset(&msg, 0, BUFSIZE);
        iph = (struct iphdr *)(*pskb)->nh.iph;

        if (iph == NULL){
                printk(KERN_CRIT"bye_IPv4_in_hook :: Failed to decode ip header :: drop the pkt\n");
                return NF_DROP;
        }
        if (iph->protocol != 0x01)
                goto getout;
        iLength = ntohs(iph->tot_len);
        printk(KERN_CRIT"BYE, Protocol = %x\n",iph->protocol);
        printk(KERN_CRIT"BYE, packet length = %d\n",iLength);
        p = (unsigned char *)((*pskb)->data);

        if (p)
        {
            /* Hex-encode at most (BUFSIZE / 2) - 1 bytes so that the NUL written
               by the last sprintf() still lands inside msg */
            int n = (iLength < (BUFSIZE / 2)) ? iLength : (BUFSIZE / 2) - 1;

            for (i = 0; i < n; ++i){
                sprintf(msg + i * 2, "%02x", p[i]);
            }
            //printk(KERN_CRIT"Bye Payload = %s\n",msg);
        }
        spin_lock(&send_bye_lock);
        send_bye_response(msg, strlen(msg));
        spin_unlock(&send_bye_lock);

getout:
        return NF_ACCEPT;
}

static int send_hello_response( void *msg, int size )
{
    int result;
    struct sk_buff *skb = NULL;
    struct nlmsghdr *nlh = NULL;
    if (0 == usr_pid_hello){
        //printk(KERN_CRIT"User spce Application is not ready\n");
        return -1;
    }
    skb = alloc_skb( NLMSG_SPACE( size ), GFP_ATOMIC );
    if( !skb )
    {
        printk(KERN_CRIT"send_hello_response :: Failed to allocate memory\n");
        return -ENOMEM;
    }
    result = -EINVAL;
    nlh = NLMSG_PUT( skb, 0, 0, 0, size );
    /* Clear only the payload; the header was just filled in by NLMSG_PUT */
    memset( NLMSG_DATA( nlh ), 0, size );
    memcpy( NLMSG_DATA( nlh ), msg, size );
    if (failed_to_send_hello)
        goto nlmsg_failure;
    if ((hello_sock_z != NULL) && (usr_pid_hello != 0)){
        //Sudhansu :: on failure netlink_unicast will free the buffer (skb)
        result = netlink_unicast( hello_sock_z, skb, usr_pid_hello, 0 );
    }
    else{
        printk(KERN_CRIT"Hello >> Communication between U-Plane and C-Plane closed\n");
        goto nlmsg_failure;
    }
    if( result < 0 )
    {
            printk(KERN_CRIT"Hello :: netlink_unicast fails\n");
            failed_to_send_hello = 1;
            skb = NULL;
            goto nlmsg_failure;
    }
    return result;

nlmsg_failure: /* Required by NLMSG_PUT */
    printk(KERN_CRIT"Hello :: netlink_unicast fails return code = %d\n",result);
    if (skb != NULL){
        kfree_skb(skb);
        skb = NULL;
    }
    return result;
}
static int send_bye_response( void *msg, int size )
{
    int result;
    struct sk_buff *skb = NULL;
    struct nlmsghdr *nlh = NULL;
    if (0 == usr_pid_bye){
        //printk(KERN_CRIT"User spce Application is not ready\n");
        return -1;
    }
    skb = alloc_skb( NLMSG_SPACE( size ), GFP_ATOMIC );
    if( !skb )
    {
        printk(KERN_CRIT"send_bye_response :: Failed to allocate memory\n");
        return -ENOMEM;
    }
    result = -EINVAL;
    nlh = NLMSG_PUT( skb, 0, 0, 0, size );
    /* Clear only the payload; the header was just filled in by NLMSG_PUT */
    memset( NLMSG_DATA( nlh ), 0, size );
    memcpy( NLMSG_DATA( nlh ), msg, size );
    if (failed_to_send_bye)
        goto nlmsg_failure;
    if ((bye_sock_z != NULL) && (usr_pid_bye != 0)){
        //Sudhansu :: on failure netlink_unicast will free the buffer (skb)
        result = netlink_unicast( bye_sock_z, skb, usr_pid_bye, 0 );
    }
    else{
        printk(KERN_CRIT"Bye >> Communication between U-Plane and C-Plane closed\n");
        goto nlmsg_failure;
    }
    if( result < 0 )
    {
            printk(KERN_CRIT"Bye :: netlink_unicast fails\n");
            failed_to_send_bye = 1;
            skb = NULL;
            goto nlmsg_failure;
    }
    return result;

nlmsg_failure: /* Required by NLMSG_PUT */
    printk(KERN_CRIT"Bye :: netlink_unicast fails return code = %d\n",result);
    if (skb != NULL){
        kfree_skb(skb);
        skb = NULL;
    }
    return result;
}
int process_hello_commands(char *msg, int len)
{
    int iRet = -1;

    spin_lock(&send_hello_lock);
    iRet = send_hello_response(msg, len);
    spin_unlock(&send_hello_lock);

    return iRet;
}
int process_bye_commands(char *msg, int len)
{
    int iRet = -1;

    spin_lock(&send_bye_lock);
    iRet = send_bye_response(msg, len);
    spin_unlock(&send_bye_lock);

    return iRet;
}
static int process_hello_requests(void *data)
{
    int result;
    struct sk_buff *skb = NULL;
    struct nlmsghdr *nlh = NULL;
    char msg[BUFSIZE];
   
    DECLARE_WAITQUEUE(wait, current);
    daemonize("Hello_Server_z");
    allow_signal(SIGKILL);
   
    set_current_state( TASK_INTERRUPTIBLE );
    add_wait_queue(&event_wait_hello_z, &wait);

    if( !skb_queue_len( &hello_sock_z->sk_receive_queue ) )
    {
        schedule();
    }
    set_current_state(TASK_RUNNING);
    remove_wait_queue(&event_wait_hello_z, &wait);

    skb = skb_dequeue(&hello_sock_z->sk_receive_queue );
    if( skb )
    {
        nlh = (struct nlmsghdr *)skb->data;
        usr_pid_hello = nlh->nlmsg_pid;
       
        printk(KERN_CRIT"Hello Server, user pid = %d\n",usr_pid_hello);
        printk(KERN_CRIT"Data from Hello Client = %s\n",(char *)NLMSG_DATA(nlh));
       
        memset(msg, 0, BUFSIZE);
        /* The client payload was logged above; the reply is a fixed greeting,
           so there is no need to copy BUFSIZE bytes out of the skb first */
        strncpy( msg, "Hello Server Good day Client", BUFSIZE);
       
        process_hello_commands(msg, BUFSIZE);
       
        if (skb != NULL)
            kfree_skb( skb );
       
        result = 0;
    }
    else
    {
        result = -1;
    }
    return result;
}
static int process_bye_requests(void *data)
{
    int result;
    struct sk_buff *skb = NULL;
    struct nlmsghdr *nlh = NULL;
    char msg[BUFSIZE];
   
    DECLARE_WAITQUEUE(wait, current);
    daemonize("Bye_Server_z");
    allow_signal(SIGKILL);
   
    set_current_state( TASK_INTERRUPTIBLE );
    add_wait_queue(&event_wait_bye_z, &wait);

    if( !skb_queue_len( &bye_sock_z->sk_receive_queue ) )
    {
        schedule();
    }
    set_current_state(TASK_RUNNING);
    remove_wait_queue(&event_wait_bye_z, &wait);

    skb = skb_dequeue(&bye_sock_z->sk_receive_queue );
    if( skb )
    {
        nlh = (struct nlmsghdr *)skb->data;
        usr_pid_bye = nlh->nlmsg_pid;
       
        printk(KERN_CRIT"Bye Server, user pid = %d\n",usr_pid_bye);
        printk(KERN_CRIT"Bye Client message  = %s\n",(char *)NLMSG_DATA(nlh));
   
        memset(msg, 0, BUFSIZE);
        /* The client payload was logged above; the reply is a fixed greeting,
           so there is no need to copy BUFSIZE bytes out of the skb first */
        strncpy( msg, "BYE Server Good Evening Client", BUFSIZE);
       
        process_bye_commands(msg, BUFSIZE);
       
        if (skb != NULL)
            kfree_skb( skb );
       
        result = 0;
    }
    else
    {
        result = -1;
    }
    return result;
}
int createThread_hello()
{
    hello_thread_id = kernel_thread(process_hello_requests, NULL, CLONE_FS | CLONE_FILES | CLONE_SIGHAND | SIGCHLD | CLONE_KERNEL);
    if (hello_thread_id < 0)    /* kernel_thread() returns a negative value on failure */
        return -EIO;
    return hello_thread_id;
}
int createThread_bye()
{
    bye_thread_id = kernel_thread(process_bye_requests, NULL, CLONE_FS | CLONE_FILES | CLONE_SIGHAND | SIGCHLD | CLONE_KERNEL);
    if (bye_thread_id < 0)    /* kernel_thread() returns a negative value on failure */
        return -EIO;
    return bye_thread_id;
}
static void Uplane_receive_hello( struct sock *sk, int len )
{
    wake_up_interruptible( &event_wait_hello_z);
}
static void Uplane_receive_bye( struct sock *sk, int len )
{
    wake_up_interruptible( &event_wait_bye_z);
}

int init_module(void)
{
    //Initializing spin_lock
    spin_lock_init(&send_hello_lock);
    spin_lock_init(&send_bye_lock);
   
   
    printk(KERN_CRIT"********* Creating Hello Thread *********\n");
    hello_sock_z = netlink_kernel_create( NETLINK_HELLO, Uplane_receive_hello);
    if (NULL == hello_sock_z)
    {
        printk(KERN_CRIT"Failed to create Hello Netlink Socket .. Exiting \n");
        return -1;
    }
    hello_thread_id = createThread_hello();
    if (-EIO == hello_thread_id)
    {
        printk(KERN_CRIT"Failed to create Hello Thread, return %d\n",hello_thread_id);
        sock_release( hello_sock_z->sk_socket );
        return -1;
    }
    printk(KERN_CRIT"Hello Thread ID = %d\n",hello_thread_id);

    printk(KERN_CRIT"********* Creating Bye Thread *********\n");
    bye_sock_z = netlink_kernel_create( NETLINK_BYE, Uplane_receive_bye);
    if (NULL == bye_sock_z)
    {
        printk(KERN_CRIT"Failed to create Bye Netlink Socket .. Exiting \n");
        cleanup_module();
        return -1;
    }
    bye_thread_id = createThread_bye();
    if (-EIO == bye_thread_id)
    {
        printk(KERN_CRIT"Failed to create Bye Thread, return %d\n", bye_thread_id);
        cleanup_module();
        return -1;
    }
    printk(KERN_CRIT"Bye Thread ID = %d\n",bye_thread_id);

    register_IPv4_in_hook();

    return 0;
   
}
void cleanup_module(void)
{
    if (hello_thread_id > 0)    /* only signal a thread that was actually created */
    {
        kill_proc(hello_thread_id, SIGKILL, 1);
        msleep(10L);
        printk(KERN_CRIT"Thread Id [%d] killed ....",hello_thread_id);
        hello_thread_id = 0;
    }
    if (bye_thread_id > 0)    /* only signal a thread that was actually created */
    {
        kill_proc(bye_thread_id, SIGKILL, 1);
        msleep(10L);
        printk(KERN_CRIT"Thread Id [%d] killed ....",bye_thread_id);
        bye_thread_id = 0;
    }
    if (hello_sock_z)
    {
        sock_release( hello_sock_z->sk_socket );
        hello_sock_z = NULL;
    }
    if (bye_sock_z)
    {
        sock_release( bye_sock_z->sk_socket );
        bye_sock_z = NULL;
    }
    if (nf_registered_hello_ipv4)
    {
        nf_unregister_hook(&hello_IPv4_inHook_ops);
        nf_registered_hello_ipv4 = 0;
    }
    if (nf_registered_bye_ipv4)
    {
        nf_unregister_hook(&bye_IPv4_inHook_ops);
        nf_registered_bye_ipv4 = 0;
    }
}
void register_IPv4_in_hook()
{
    if (!nf_registered_hello_ipv4)
    {
        nf_register_hook(&hello_IPv4_inHook_ops);
        nf_registered_hello_ipv4 = 1;
    }
    if (!nf_registered_bye_ipv4)
    {
        nf_register_hook(&bye_IPv4_inHook_ops);
        nf_registered_bye_ipv4 = 1;
    }
}
MODULE_AUTHOR("Sudhansu Mishra");
MODULE_LICENSE("GPL");
 
/* **************************** Make File ***********************************/
obj-m += Netlink_Kernel_thread.o

all:
	$(MAKE) -C /lib/modules/$(shell uname -r)/build M=$(PWD) modules

clean:
	$(MAKE) -C /lib/modules/$(shell uname -r)/build M=$(PWD) clean
 
/* ********************** User Space Program ***************************/

/*
    *************************************************
    * Author             : Sudhansu Sekhar Mishra
    * Module            : User Space Based Netlink Socket
    * OS version       : 2.6.9-55ELSmp
    * Type                 : Open Source(GPL)
    * E_mail             : sudhansu.8454@gmail.com
    * File Name        : Netlink_test.c
    *************************************************
*/
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <unistd.h>
#include <asm/types.h>
#include <sys/socket.h>
#include <linux/netlink.h>


#define NETLINK_TEST 19 //Hello
#define MAX_PAYLOAD 2048 /* maximum payload size*/

struct sockaddr_nl src_addr, dest_addr;
struct nlmsghdr *nlh = NULL;
struct iovec iov;
int sock_fd;
struct msghdr msg;

int main(void) {
        sock_fd = socket(PF_NETLINK, SOCK_RAW, NETLINK_TEST);
        if (sock_fd < 0) {
            perror("socket");
            return 1;
        }

        memset(&src_addr, 0, sizeof(src_addr));
        src_addr.nl_family = AF_NETLINK;
        src_addr.nl_pid = getpid(); /* self pid */
        src_addr.nl_groups = 0; /* not in mcast groups */
        if (bind(sock_fd, (struct sockaddr*)&src_addr,
                        sizeof(src_addr)) < 0) {
            perror("bind");
            close(sock_fd);
            return 1;
        }

        memset(&dest_addr, 0, sizeof(dest_addr));
        dest_addr.nl_family = AF_NETLINK;
        dest_addr.nl_pid = 0;   /* For Linux Kernel */
        dest_addr.nl_groups = 0; /* unicast */

        nlh = (struct nlmsghdr *)malloc(NLMSG_SPACE(MAX_PAYLOAD));
        if (nlh == NULL) {
            perror("malloc");
            close(sock_fd);
            return 1;
        }
        /* Fill the netlink message header */
        memset(nlh, 0, NLMSG_SPACE(MAX_PAYLOAD));
        nlh->nlmsg_len = NLMSG_SPACE(MAX_PAYLOAD);
        nlh->nlmsg_pid = getpid(); /* self pid */
        nlh->nlmsg_flags = 0;
        /* Fill in the netlink message payload */
        strcpy(NLMSG_DATA(nlh), "Sudhansu : User space !!!");

        memset(&msg, 0, sizeof(msg));
        iov.iov_base = (void *)nlh;
        iov.iov_len = nlh->nlmsg_len;
        msg.msg_name = (void *)&dest_addr;
        msg.msg_namelen = sizeof(dest_addr);
        msg.msg_iov = &iov;
        msg.msg_iovlen = 1;

        if (sendmsg(sock_fd, &msg, 0) < 0)
            perror("sendmsg");

        /* Print every message from the kernel until recvmsg() fails */
        while(1){
            memset(nlh, 0, NLMSG_SPACE(MAX_PAYLOAD));
            if (recvmsg(sock_fd, &msg, 0) < 0) {
                perror("recvmsg");
                break;
            }
            printf("\n");
            printf("Received data from  Kernel : [%s]", (char *)NLMSG_DATA(nlh));
            printf("\n");
        }

        /* Release the message buffer and close the Netlink socket */
        free(nlh);
        close(sock_fd);
        return 0;
}

8 comments:

  1. Good Article.Thanks for the info...

    But have a doubt?

    Why should I use netlink socket instead of ioctl.What is the basic difference between the two?

    Looking forward for an article on ioctl mechanism?

  2. Hi reader,
    Thanks for your valuable time and feedback.

    The basic difference between ioctl and netlink socket is that ioctl handlers cannot send asynchronous messages from kernel space to user space, whereas a netlink socket can. Apart from that, below are some disadvantages of ioctl.

    • Drivers add another level of indirection and use a single ioctl number for multiple purposes, adding extra complexity in debugging and emulation.
    • The ioctl number range is not properly assigned and registered in Documentation/ioctl-number.txt. This can lead to identical numbers assigned to different subsystems, which complicates debugging and 32-bit emulation.
    • The data structures are defined in private headers so that the 'strace' tool does not recognize them, even when it is built against the latest kernel headers.
    • The _IO/_IOW/_IOR macros are used incorrectly, meaning that the size of the data is not encoded properly.
    An article on ioctl system calls will follow soon. Keep reading and sending your feedback.

    Cheers
    Sudhansu

  3. Its indeed a very nice compilation and great study.. keep up great work.

  4. Really a nice article on Netlink socket. I just need some more information on "nlmsg_flags" field of netlink mesaage header. I have seen in the tester program that you are setting it to zero. What does it signify?

  5. where do you free memory for following call in userspace program:
    nlh=(struct nlmsghdr *)malloc(NLMSG_SPACE(MAX_PAYLOAD));

    Please describe.

  6. Hi,
    I am using a CentOS Linux release 7.0.1406 (Core)
    I am using kernel version 3.10.0-123.el7.x86_64

    When i am creating a netlink socket,it is getting failed.
    int sock_fd = socket(PF_NETLINK, SOCK_RAW, 1)
    value of sock_fd is -1


    When i was using 3.1 kernel version of CentOS,above socket function was working.
    I want to create netlink socket on kernel version 3.10.0


    Any help will be greatly admired.

    Replies
    1. I am getting "Unable to create netlink socket — protocol not supported"

  7. Hi the below code to create a netlink socket works with Ubuntu 14.04 kernel.

    int soc = socket(PF_NETLINK, SOCK_DGRAM, NETLINK_CONNECTOR);

    But the same when I used with one of the other embedded platform (arm based cisco router platform), getting "Protocol not supported error"

    What might be the issue?
