[Mono-list] Support for marshalling of C# string to unmanaged wchar_t on Linux
Kala B
2004-12-16 13:12:02 UTC
Hi,
Does mono support marshalling of C# string to
unmanaged wchar_t on Linux?

It does not seem to work. Consider the following
sample code,
( contents of 3 files )
1. chash.cs
This C# code makes a call to a C API, which takes a
wchar_t[].
2. testlib.c
This is the C code which implements the C API.
3. makefile
to build the library and C# exe.


Contents of chash.cs
--------------------
using System;
using System.Runtime.InteropServices;
using System.Text;
using System.Collections.Specialized;

[StructLayout(LayoutKind.Sequential, CharSet =
CharSet.Unicode)]
public class Id
{
public int len;
[MarshalAs(UnmanagedType.ByValTStr,
SizeConst = 256)]
public string name;
};
class Test
{
[DllImport("libtest.so")]
public static extern int TestFn (Id id);

public static void Main()
{
string name = "testname";
Id id = new Id();
id.name = name;
id.len = 8; //name.Length();
int rcode = TestFn(id);
Console.WriteLine("TestFn returns {0}",rcode);
}
}

Contents of testlib.c
---------------------
#include <stdio.h>
#include <wchar.h>

typedef struct _id
{
int len;
wchar_t name[256];
}Id;

int TestFn(Id *id)
{
printf("%s\t%S\t%d\n",__func__,id->name,id->len);
printf("wcslen returns.. %d\n",strlen(id->name));
return 1;
}

makefile
--------
all: lib test
lib: testlib.c
gcc -fPIC -c testlib.c
ld -x --shared -o libtest.so testlib.o
test:
mcs -debug chash.cs
clean :
rm ./libtest.so
rm ./chash.exe

When chash.exe is run, it prints some junk characters.
Even wcslen() does not print the expected output.

Could you please help?
If marshalling is not supported, could you please
suggest some alternate solution to solve the issue?

Thanks & Regards
Kala B.



Jonathan Pryor
2004-12-17 02:22:10 UTC
Your code is buggy, but I'll tackle it anyway. :-)
Post by Kala B
Hi,
Does mono support marshalling of C# string to
unmanaged wchar_t on Linux?
No. Actually, I was surprised it ran at all (I was expecting a
g_assert_not_reached() message), but once I ran it, the problem became
clear: When Mono marshals to CharSet.Unicode, it doesn't marshal to a
Unix wchar_t, it marshals to a Windows wchar_t.

wchar_t on most Unix platforms is 4 bytes (32 bits), while it's 2 bytes
(16 bits) on Windows. Marshalling to the 16-bit form is easy for Mono (it
stores all strings as 16-bit Unicode strings internally), and difficult for
lots of other people.
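
To make the size mismatch concrete, here is a minimal sketch of widening the 16-bit code units into the platform's wchar_t on the native side, so that wcslen() and friends see what they expect. The helper name utf16_to_wchar is hypothetical (it is not part of Mono or libc), and the sketch assumes BMP-only text with no surrogate pairs.

```c
#include <assert.h>
#include <stddef.h>
#include <wchar.h>

/* Hypothetical helper (not part of Mono or libc): widen 16-bit code
   units, as received from the managed side, into the platform wchar_t.
   Assumption: BMP-only text. Surrogate pairs are copied unchanged, which
   is only correct where wchar_t is itself 16 bits wide. */
size_t utf16_to_wchar(wchar_t *dst, size_t dst_len,
                      const unsigned short *src)
{
    size_t i = 0;

    assert(dst != NULL && src != NULL && dst_len > 0);

    /* Copy until the 16-bit terminator, leaving room for the wide NUL. */
    while (i + 1 < dst_len && src[i] != 0) {
        dst[i] = (wchar_t) src[i];
        ++i;
    }
    dst[i] = L'\0';

    return i; /* number of code units copied */
}
```

After the call, dst can be handed to wcslen() or printed with %ls, at the cost of one extra copy per string.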

Though wchar_t has enough problems in Unix/Linux that I've been told to
avoid it. It's more trouble than it's worth -- sticking with UTF-8 is
far easier to do. (Then there's the immortal question of how to
portably convert between wchar_t* and char*. You could use wcstombs,
but the standard doesn't say what encoding it'll use, which makes it
nearly useless for most practical purposes...)
Post by Kala B
It does not seem to work. Consider the following
sample code,
( contents of 3 files )
1. chash.cs
This C# code makes a call to a C API, which takes a
wchar_t[].
2. testlib.c
This is the C code which implements the C API.
3. makefile
to build the library and C# exe.
<snip/>
Post by Kala B
Contents of testlib.c
---------------------
#include <stdio.h>
#include <wchar.h>
typedef struct _id
{
int len;
wchar_t name[256];
Change "name" to the following and things work better:

unsigned char name[256];
Post by Kala B
}Id;
int TestFn(Id *id)
{
printf("%s\t%S\t%d\n",__func__,id->name,id->len);
This printf is broken -- %S isn't valid. %ls is the correct
standardized way to print a wchar_t string, but since name isn't a
wchar_t[] it won't work anyway. So we'll change it to make it nicer:

printf ("%s\t%d\n", __func__, id->len);
for (int i = 0; i < id->len; ++i)
printf ("\t%.3i: %c\n", i, (char) id->name[i]);
Post by Kala B
printf("wcslen returns.. %d\n",strlen(id->name));
Besides, you're not even using wcslen here, you're using strlen here.
OF COURSE it'll return "1" -- it'll hit the "null" embedded in the first
wide character.
Post by Kala B
Could you please help?
If marshalling is not supported, could you please
suggest some alternate solution to solve the issue?
Alternate solution: Use UTF-8. Lots of libraries support it, it
requires minimal changes and support, and most other libraries/platforms
are migrating toward it (see Gnome and KDE). The only reason to not use
UTF-8 is portability with Windows, which makes it easier to use the
WCHAR type, but Windows still supports UTF-8 in its conversion
functions.
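
As a rough sketch of what the UTF-8 route looks like on the native side: the struct carries plain char bytes, strlen() gives the byte length, and counting code points only requires skipping continuation bytes. The names Utf8Id, TestFnUtf8 and utf8_code_points are hypothetical, and the sketch assumes the managed side hands over the string as NUL-terminated UTF-8.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical UTF-8 variant of the struct from the original post. */
typedef struct {
    int  len;
    char name[256];   /* UTF-8 bytes, NUL-terminated (assumed) */
} Utf8Id;

/* Count code points by skipping UTF-8 continuation bytes (10xxxxxx). */
size_t utf8_code_points(const char *s)
{
    size_t n = 0;

    assert(s != NULL);

    for (; *s != '\0'; ++s) {
        if (((unsigned char) *s & 0xC0) != 0x80)
            ++n;
    }
    return n;
}

/* Hypothetical counterpart of TestFn: returns the byte length of name. */
int TestFnUtf8(const Utf8Id *id)
{
    assert(id != NULL);
    return (int) strlen(id->name);
}
```

For pure ASCII such as "testname" the byte length and the code point count agree; they diverge as soon as a multi-byte character appears.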

- Jon
Jonathan Pryor
2004-12-17 11:48:54 UTC
Corrections...

On Thu, 2004-12-16 at 21:22 -0500, Jonathan Pryor wrote:
<snip/>
Post by Jonathan Pryor
Post by Kala B
Contents of testlib.c
---------------------
#include <stdio.h>
#include <wchar.h>
typedef struct _id
{
int len;
wchar_t name[256];
unsigned char name[256];
This should be "unsigned short", not "unsigned char", obviously.
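
Pulling the suggested changes together, a corrected testlib.c might look roughly like the sketch below. utf16_len is a hypothetical helper standing in for wcslen(), which cannot be used here because the field holds 16-bit units rather than wchar_t; the bound check against 256 is also an addition, not something from the original code.

```c
#include <assert.h>
#include <stddef.h>
#include <stdio.h>

typedef struct _id
{
    int len;
    unsigned short name[256];   /* 16-bit code units, not wchar_t */
} Id;

/* Hypothetical helper: length in 16-bit code units up to the terminator. */
size_t utf16_len(const unsigned short *s)
{
    size_t n = 0;

    assert(s != NULL);

    while (s[n] != 0)
        ++n;
    return n;
}

int TestFn(Id *id)
{
    int i;

    assert(id != NULL);

    printf("%s\t%d\n", __func__, id->len);
    /* Print one code unit per line; the cast is only safe for ASCII. */
    for (i = 0; i < id->len && i < 256; ++i)
        printf("\t%.3i: %c\n", i, (char) id->name[i]);
    printf("utf16_len returns.. %lu\n", (unsigned long) utf16_len(id->name));
    return 1;
}
```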

<snip/>
Post by Jonathan Pryor
Post by Kala B
printf("wcslen returns.. %d\n",strlen(id->name));
Besides, you're not even using wcslen here, you're using strlen here.
OF COURSE it'll return "1" -- it'll hit the "null" embedded in the first
wide character.
Thinking about it, it'll be 0 or 1, depending on the endianness of your
platform. The first character, 't', will be 0x0074, so on little-endian
architectures strlen will return 1 while it will return 0 on big-endian
architectures. Silly me.
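
A small sketch of the byte-order effect, with the two layouts written out by hand so it behaves the same on any machine. The array and function names are hypothetical; the bytes are the 16-bit value for 't' (0x0074) followed by a 16-bit terminator.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* 0x0074 ('t') plus a 16-bit terminator, byte by byte per byte order. */
const char t_little_endian[] = { 0x74, 0x00, 0x00, 0x00 };
const char t_big_endian[]    = { 0x00, 0x74, 0x00, 0x00 };

/* Hypothetical helper: what strlen() sees when handed 16-bit data. */
size_t strlen_on_utf16_bytes(const char *bytes)
{
    assert(bytes != NULL);
    return strlen(bytes); /* stops at the first zero byte */
}
```

With the little-endian layout strlen() stops after one byte; with the big-endian layout the very first byte is zero, so it stops immediately.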

- Jon
Kala B
2004-12-17 12:59:32 UTC
Hi,
Thanks for your suggestions.

I had actually used wcslen() in the sample code. I was
trying strlen() (passing the string as CharSet.Auto) as
well as wcslen() (passing the string as
CharSet.Unicode). So, it was a copy-paste
problem. Sorry about that.

Thanks again,
Regards
Kala B.
_______________________________________________
http://lists.ximian.com/mailman/listinfo/mono-list