这一次央视的屠刀又批向了谷歌中国。对于央视乃至Z*F最近一连串的所谓过滤或屏蔽对儿童不良信息的举措我实在是觉得有矫枉过正之嫌。尽管我一直认为谷歌中国和美国的Google不在同一个数量级,但还是对他们的”搜索自动补全“进行了一些比较:
谷歌中国的自动补全
Google的自动补全
可见, Google英文搜索比中文搜索生猛的多。为过滤不良信息,上图中的生词,还是自己查一下吧,我就不 翻译了。
在谷歌中国输入”s*ex“时的提示
我们的生活越来越倚重于Internet,而Internet也成为现实社会的一种反映,如何在信息泛滥、良莠不齐的网络生活中保护儿童健康快乐的成长,这是全世界共同面临的一个教育难题。而现在相关部门却把它简化为一个技术问题,未免本末倒置。
这让我不仅联想起大禹治水的故事。禹的父亲鲧采用”堵“的方法,洪水不但不退反尔愈发泛滥,以失败而告终;禹治水采用”疏“的方法对洪水加以引导,终于治水成功。所谓”不良信息“,不也和洪水一样吗?小孩子终归要长大,总有洞房花烛的那一天,在他们对X好奇的时候,是大人们想堵就能堵的住的吗?
再者说来,小孩子绝对不像大人们想象的那么脆弱,我认识的很多人都是出污泥而不染的,我相信下一代也可以濯清涟而不妖。
如果要在UNIX/Linux下查看某个进程的详细参数, 可以用命令:
ps -axuww|grep <process>
但需要注意的是,在Solaris下,存在两个版本的ps:一是/usr/bin/ps;一是/usr/bin/ps。只有/usr/bin/ps可以用于上述命令。
"/usr/ucb/ps" 是从BSD UNIX中继承而来的,而/usr/bin/ps从AT&T SVR4继承而来的。二者之间存在较大区别, 如果有些
参数在某个ps中不能应用,可以考虑换用另外一个。
本文参考了如下网页:
http://www.sunmanagers.org/pipermail/summaries/2006-March/007182.html
在美国体验猪流感
赴美之前,很多人就提醒我美国的猪流感正流行,而且我所要去的Illinois更是重灾区。我自己也很担心,尤其是当中国也有了猪流感疑似病例之后。
公司正在逐渐把server升级为Solaris 10, 对于新平台我非常陌生(当然了,对于以前的Solaris 9也不怎么懂),遇到了几个值得记录下来的问题。
1. $HOME目录几乎所有文件(夹)的owner和group都被改为nobody
昨天在工作时遇到了问题,自己无法解决,求助于同事。同事帮忙查了半天,发现了我的$HOME下的所有的文件和文件夹的owner和group都被改为nobody了。我们都很诧异为什么会出现这种情况,求助于公司的IT,IT们说这是因为我们部门有两个NFS storage,其中一个和NFS的版本和Solaris 10兼容,另外一个不兼容。而我的$HOME恰好就在不兼容的Storage上,结果权限就变了。
今天上午IT把这个问题FIX了,但我对于NFS还是不清楚,对于权限如何变的也不知道。有时间再去仔细研究NFS的问题。
2. ld.so.1: cp:fatal: libsec.so.1: open failed: No such file or directory
ld.so.1: cp: fatal: libsec.so.1: version `SUNW_1.2' not found (required by file /usr/bin/cp)
这是一个动态链接库的问题。这个问题Google了一下发现很多人也有类似的问题,但没有看到如何解决。貌似
Sun推出了一个patch来解决这个问题。但具体情况不了解。
前几天早晨遇到了这个问题,和同事查了一上午也没发现原因,结果当我们绝望的时候这个问题自己消失了。莫名其妙。
但这个问题估计在往后系统重启以后还会发生。
<SCRIPT language=javascript> ax=0; function writeTable() { ax=Math.round(Math.random()*26); alphaArray=new Array("a", "n", "b", "d", "f", "h", "{", "i", "l", "v", "x", "z", "I", "J", "M", "N", "o", "O", "R", "S", "T", "U", "m", "6", "^", "u", "_", "[", "]") table="<table border=0 cellspacing=1 cellpadding=1 width='100%'><tr>" j=1; for ( i = 99 ; i >= 0 ; i-- ) { a=Math.round(Math.random()*26); if ( i%9 == 0 && i < 89 ) a=ax; table+="<td class='numtd'>"+i+"</td><td class='symtd'>"+alphaArray[a]+"</td>" if ( j%10 == 0 ) table+="</tr><tr>" j++ } table+="</table>" sym.innerHTML=table sh.innerHTML="" } function showAnswer() { sh.innerHTML=alphaArray[ax] sym.innerHTML="<span class=big><a href="javascript:writeTable()" mce_href="javascript:writeTable()" class=hot>Try Next 再试一次</a></font></span><br><br>" } </SCRIPT>
看了又看也没发现这段代码有何神奇之处,无非就是打印输出表格,用到了随机数改变输出的图像顺序而已。
如果用户熟悉Linux下的sed、awk、grep或vi,那么对正则表达式这一概念肯定不会陌生。由于它可以极大地简化处理字符串时的复杂度,因此现在已经在许多Linux实用工具中得到了应用。千万不要以为正则表达式只是Perl、Python、Bash等脚本语言的专利,作为C语言程序员,用户同样可以在自己的程序中运用正则表达式。
标准的C和C++都不支持正则表达式,但有一些函数库可以辅助C/C++程序员完成这一功能,其中最著名的当数Philip Hazel的Perl-Compatible Regular Expression库,许多Linux发行版本都带有这个函数库。
为了提高效率,在将一个字符串与正则表达式进行比较之前,首先要用regcomp()函数对它进行编译,将其转化为regex_t结构:
int regcomp(regex_t *preg, const char *regex, int cflags);
参数regex是一个字符串,它代表将要被编译的正则表达式;参数preg指向一个声明为regex_t的数据结构,用来保存编译结果;参数cflags决定了正则表达式该如何被处理的细节。
如果函数regcomp()执行成功,并且编译结果被正确填充到preg中后,函数将返回0,任何其它的返回结果都代表有某种错误产生。
一旦用regcomp()函数成功地编译了正则表达式,接下来就可以调用regexec()函数完成模式匹配:
int regexec(const regex_t *preg, const char *string, size_t nmatch,regmatch_t pmatch[], int eflags);
typedef struct {
regoff_t rm_so;
regoff_t rm_eo;
} regmatch_t;
参数preg指向编译后的正则表达式,参数string是将要进行匹配的字符串,而参数nmatch和pmatch则用于把匹配结果返回给调用程序,最后一个参数eflags决定了匹配的细节。
在调用函数regexec()进行模式匹配的过程中,可能在字符串string中会有多处与给定的正则表达式相匹配,参数pmatch就是用来保存这些匹配位置的,而参数nmatch则告诉函数regexec()最多可以把多少个匹配结果填充到pmatch数组中。当regexec()函数成功返回时,从string+pmatch[0].rm_so到string+pmatch[0].rm_eo是第一个匹配的字符串,而从string+pmatch[1].rm_so到string+pmatch[1].rm_eo,则是第二个匹配的字符串,依此类推。
无论什么时候,当不再需要已经编译过的正则表达式时,都应该调用函数regfree()将其释放,以免产生内存泄漏。
void regfree(regex_t *preg);
函数regfree()不会返回任何结果,它仅接收一个指向regex_t数据类型的指针,这是之前调用regcomp()函数所得到的编译结果。
如果在程序中针对同一个regex_t结构调用了多次regcomp()函数,POSIX标准并没有规定是否每次都必须调用regfree()函数进行释放,但建议每次调用regcomp()函数对正则表达式进行编译后都调用一次regfree()函数,以尽早释放占用的存储空间。
如果调用函数regcomp()或regexec()得到的是一个非0的返回值,则表明在对正则表达式的处理过程中出现了某种错误,此时可以通过调用函数regerror()得到详细的错误信息。
size_t regerror(int errcode, const regex_t *preg, char *errbuf, size_t errbuf_size);
参数errcode是来自函数regcomp()或regexec()的错误代码,而参数preg则是由函数regcomp()得到的编译结果,其目的是把格式化消息所必须的上下文提供给regerror()函数。在执行函数regerror()时,将按照参数errbuf_size指明的最大字节数,在errbuf缓冲区中填入格式化后的错误信息,同时返回错误信息的长度。
最后给出一个具体的实例,介绍如何在C语言程序中处理正则表达式。
#include <stdio.h>; #include <sys/types.h>; #include <regex.h>; /* 取子串的函数 */ static char* substr(const char*str, unsigned start, unsigned end) { unsigned n = end - start; static char stbuf[256]; strncpy(stbuf, str + start, n); stbuf[n] = 0; return stbuf; } /* 主程序 */ int main(int argc, char** argv) { char * pattern; int x, z, lno = 0, cflags = 0; char ebuf[128], lbuf[256]; regex_t reg; regmatch_t pm[10]; const size_t nmatch = 10; /* 编译正则表达式*/ pattern = argv[1]; z = regcomp(®, pattern, cflags); if (z != 0){ regerror(z, ®, ebuf, sizeof(ebuf)); fprintf(stderr, "%s: pattern '%s' /n", ebuf, pattern); return 1; } /* 逐行处理输入的数据 */ while(fgets(lbuf, sizeof(lbuf), stdin)) { ++lno; if ((z = strlen(lbuf)) > 0 && lbuf[z-1] == '/n') lbuf[z - 1] = 0; /* 对每一行应用正则表达式进行匹配 */ z = regexec(®, lbuf, nmatch, pm, 0); if (z == REG_NOMATCH) continue; else if (z != 0) { regerror(z, ®, ebuf, sizeof(ebuf)); fprintf(stderr, "%s: regcom('%s')/n", ebuf, lbuf); return 2; } /* 输出处理结果 */ for (x = 0; x < nmatch && pm[x].rm_so != -1; ++ x) { if (!x) printf("%04d: %s/n", lno, lbuf); printf(" %d='%s'/n", x, substr(lbuf, pm[x].rm_so, pm[x].rm_eo)); } } /* 释放正则表达式 */ regfree(®); return 0; }
unset ?-nocomplain? ?--? varName varName2 ...
8. The existence of a variable can be tested with the info exists command.
9. You should always group expressions in curly braces and let expr do command and variable substitutions.
10. In TCL, the # must occur at the beginning of a command.
11. A surprising property of Tcl comments is that curly braces inside comments are still counted for the purposes of finding matching brackets.
12. Command arguments are separated by white space, unless arguments are grouped with curly braces or double quotes as described below. If you forget the space, you will get syntax errors about unexpected characters after the closing brace or quote.
13. The Tcl shells pass the command-line arguments to the script as the value of the argv variable. The number of command-line arguments is given by the argc variable. The name of the program, or script, is not part of argv nor is it counted by argc. Instead, it is put into the argv0 variable. argv is a list, so you can use the lindex command to extract items from it:
set arg1 [lindex $argv 0]
14. Some command-line options are interpreted by wish, and they do not appear in the argv variable. The general form of the wish command line is:
wish ?options? ?script? ?arg1 arg2?
If no script is specified, then wish just enters an interactive command loop.
15. You can use the info script command to find out where the CGI script is, and from that load the supporting file. The file dirname and file join commands manipulate file names in a platform-independent way.
set dir [file dirname [info script]]
set datafile [file join $dir guestbook.data]
16. You can ask string for valid operations by giving it a bad one:
string junk
=> bad option "junk": should be bytelength, compare, equal, first, index, is, last, length
, map, match, range, repeat, replace, tolower, totitle, toupper, trim, trimleft, trimright,
wordend, or wordstart
17. The string map command translates a string based on a character map. The map is in the form of a input, output list. Wherever a string contains an input sequence, that is replaced with the corresponding output. For example:
string map {f p d l} food
=> pool
The inputs and outputs can be more than one character and they do not have to be the same length:
string map {f p d ll oo u} food
=> pull
18. A position specifier is i$, which means take the value from argument i as opposed to the normally corresponding argument. The position counts from 1. If a position is specified for one format keyword, the position must be used for all of them. If you group the format specification with double quotes, you need to quote the $ with a backslash:
set lang 2
format "%${lang}/$s" one un uno
=> un
19. The binary command provides conversions between strings and packed binary data representations. The binary format command takes values and packs them according to a template. The resulting binary value is returned:
binary format template value ?value ...?
The binary scan command extracts values from a binary string according to a similar template. It assigns values to a set of Tcl variables:
binary scan value template variable ?variable ...?
20. Tcl commands described are: list, lindex, llength,
lrange, lappend, linsert, lreplace,
lsearch, lset, lsort, concat, join,
and split.
21. Tcl commands related to list are: list, lindex, llength, lrange, lappend, linsert, lreplace, lsearch, lset, lsort, concat, join, and split.
22. Procedure
22.1 Default parameter values
proc P2 {a {b 7} {c -2} } {
expr $a / $b + $c
}
P2 6 3
=> 0
22.2 A procedure can take a variable number of arguments by
specifying the args keyword as the last parameter. When the procedure
is called, the args parameter is a list that contains all the remaining
values.
23. The rename command changes the name of a command:
rename foo foo.orig
The other thing you can do with rename is completely remove a command by renaming it to the empty string:
rename exec {}
24. Use the upvar command when you need to pass the name of a variable, as
opposed to its value, into a procedure.
25. Call by Name Using upvar
Use the upvar command when you need to pass the name of a variable, as opposed to its value, into a procedure. The upvar command associates a local variable with a variable in a scope up the Tcl call stack. The syntax of the upvar command is:
upvar ?level? varName localvar
The level argument is optional, and it defaults to 1, which means one level up the Tcl call stack. You can specify some other number of frames to go up, or you can specify an absolute frame number with a #number syntax. Level #0 is the global scope.
26. You can use any string as an index for array, but avoid using a string that contains spaces.
27. The array get and array set operations are used to convert between an array and a list. The list returned by array get has an even number of elements. The first element is an index, and the next is the corresponding array value. The list elements continue to alternate between index and value. The list argument to array set must have the same structure.
28.
PERL Handling Binary Scalar
I adopt select()...sysread()...syswrite() mechanism to handle socket messages, where messages are sysread() into $buffer (binary) before they are syswritten.
Now I want to change two bytes of the message, which denote the length of the whole message. At first, I use following code:
my $msglen=substr($buffer,0,2); # Get the first two bytes
my $declen=hex($msglen);
$declen += 3;
substr($buffer,0,2,$declen); # change the length
However, it doesn't work in this way. If the final value of $declen is 85, then the modified $buffer will be "0x35 0x35 0x00 0x02...". I insert digital number to $buffer but finally got ASCII!
I also tried this way:
my $msglen=substr($buffer,0,2); # Get the first two bytes,binary
$msglen += 0b11; # Or $msglen += 3;
my $msgbody=substr($buffer,2); # Get the rest part of message, binary
$buffer=join("", $msglen, $msgbody);
Sadly, this method also failed. The result is such as"0x33 0x 0x00 0x02..." I just wonder why two binary scalar can't be joined into a binary scalar?
Can you help me? Thank you!
[Answer 1]:
my $msglen=substr($buffer,0,2); # Get the first two bytes
my $number = unpack("S",$msglen);
$number += 3;
my $number_bin = pack("S",$number);
substr($buffer,0,2,$number_bin); # change the length
[Answer 2]:
比如:
my $buffer = "/x00/x01"; print unpack("H*", $buffer), "/n"; vec($buffer, 0, 16) += 3; print unpack("H*", $buffer), "/n";
I am writing a Perl program that sends and receives messages from two sockets and acta as a switch. I have to modify the received messages received from one socket, prepend 3 bytes to the data, and finally send the modified messages to another socket. I adopt select()...sysread()...syswrite() mechanism to poll for messages between sockets. The received messages are stored in $buffer during modification.
Now I can use following way to get the received messages:
my $hexmsg = unpack("H*", $buffer);
my @msg = ( $hexmsg =~ m/../g );
then I can insert 3 bytes to @msg. However, I don't know how to pack the message in @msg into a scalar(such as $buffer) and send it to another socket by syswrite(). Can anybody help me? Thank you in advance!
BTW, are messages in $buffer binary?
[Answer]:
Yes, messages in $buffer are binary (if I'm guessing what you mean by that correctly). If your only reason for unpacking it into @msg is to insert the bytes, don't. Use substr instead, and just write out the changed $buffer. For instance:
substr( $buffer, 0, 0, "/x01/x02/x03" ); # insert 3 bytes at beginning.
If you are doing other things with @msg, you could continue to use that as well as doing the substr insert before writing it out, or you could use substr or pack or split or vec or a regex to parse out the pieces you need. You'd need to describe what you are doing to get more specific help.
################################################################## # DumpMsg # # Dump received message in hex mode. # # Format: DumpMsg($type, $msglen, $msg); # # input: $type= 'MSC' | 'BSC' # $msglen= msg length # $msg= received msg # # output: Print received msg to stderr and log in hex type ################################################################## sub DumpMsg { my $type=$_[0]; my $msglen=$_[1]; CoolMisc::coolprint("Received message from $type: $msglen bytes/n" ); LogPrint("Received message from $type: $msglen bytes/n"); # change the received message from bin to hex string. # my @msg=split(/../,unpack("H*", $_[2]),$msglen); my $tmp = unpack("H*", $_[2]); my @msg = ( $tmp =~ m/../g ); # print the message to stdout and log file. for(my $i=0;$i<$msglen;$i++) { print "0x$msg[$i] "; syswrite($ROUTELOG,"0x$msg[$i] "); if ($i%8 ==7) { print "/n"; syswrite($ROUTELOG,"/n"); } } print "/n"; syswrite($ROUTELOG,"/n"); }
As always with Perl there is more than one way to do it. Below are a few examples of approaches to making common conversions between number representations. This is intended to be representational rather than exhaustive.
Some of the examples below use the Bit::Vector module from CPAN. The reason you might choose Bit::Vector over the perl built in functions is that it works with numbers of ANY size, that it is optimized for speed on some operations, and for at least some programmers the notation might be familiar.
Using perl's built in conversion of 0x notation:
|
Using the hex function:
|
Using pack:
|
Using the CPAN module Bit::Vector:
|
Using sprint:
|
Using unpack
|
Using Bit::Vector
|
And Bit::Vector supports odd bit counts:
|
Using Perl's built in conversion of numbers with leading zeros:
|
Using the oct function:
|
Using Bit::Vector:
|
Using sprintf:
|
Using Bit::Vector
|
Perl 5.6 lets you write binary numbers directly with the 0b notation:
|
Using pack and ord
|
Using pack and unpack for larger strings
|