2010年6月27日星期日

HOWTO BUILD A CROSS COMPILER WITH EMERGE !

HOWTO BUILD A CROSS COMPILER WITH EMERGE !
------------------------------------------------------------

Before you read any further, you should try using 'crossdev' in portage.
You're less likely to screw something up that way.  Simply do
`emerge '>=crossdev-0.9'` and then execute `crossdev --help`.

The instructions below are for those who wish to learn more about the
details of how crossdev works wrt portage.  You could probably give two
shits, so just use crossdev !

------------------------------------------------------------

------------- PACKAGE NOTES -------------

- binutils
    Only binutils versions that 'inherit toolchain-binutils' will work.
    The host system should also be using one of these versions or you'll
    probably break your system.

- linux-headers
    Any version that has 'inherit kernel-2' should work.

- gcc
    Only gcc versions that 'inherit toolchain' will work.
    Supported versions:
        3.3.6+
        3.4.6+
        4.0.3+
        4.1.1+
    Anything else you're on your own.

- glibc
    Supported versions:
        2.3.6-r4+
        2.4-r3+
    Anything else you're on your own.

------------- GENERAL NOTES -------------

- ARCH / USE
    Sometimes you may have to pass ARCH / USE to the emerge process for
    things to work properly.  For example, glibc will only apply hppa
    patches if 'hppa' is in ARCH / USE.  This is a bug and I am working
    on fixing it :).

- CFLAGS
    All packages should be built with your HOST's CFLAGS except for
    glibc.  When building glibc, change your CFLAGS to something for the
    target host.

------------- ON TO THE FUN -------------

(1) Pick your target.  Any valid 'CHOST' will do.  I will be using
    'hppa2.0-unknown-linux-gnu'.

(2) Setup 'categories'.
    $ mkdir -p /etc/portage
    $ echo cross-hppa2.0-unknown-linux-gnu >> /etc/portage/categories

(3) Setup PORTDIR_OVERLAY.  For more info, see the make.conf(5) manpage.
    My overlay is /usr/local/portage.
    $ mkdir -p /usr/local/portage/cross-hppa2.0-unknown-linux-gnu
    $ cd /usr/local/portage/cross-hppa2.0-unknown-linux-gnu
    $ ln -s /usr/portage/sys-devel/binutils binutils
    $ ln -s /usr/portage/sys-devel/gcc gcc
    $ ln -s /usr/portage/sys-kernel/linux-headers linux-headers
    $ ln -s /usr/portage/sys-libs/glibc glibc

(4) Emerge binutils.
    $ emerge cross-hppa2.0-unknown-linux-gnu/binutils

(5) Emerge gcc [C compiler only].
    $ USE=nocxx emerge cross-hppa2.0-unknown-linux-gnu/gcc
    < run gcc-config for the new compiler >

(6) Emerge linux headers.
    $ emerge cross-hppa2.0-unknown-linux-gnu/linux-headers

(7) Emerge glibc.
    $ USE=hppa emerge cross-hppa2.0-unknown-linux-gnu/glibc

(8) Emerge gcc [C and C++ compiler].
    $ emerge cross-hppa2.0-unknown-linux-gnu/gcc

(9) Build stuff !
    When you go to emerge packages, you should just need to do:
    $ ROOT=/blah/ CHOST=hppa2.0-unknown-linux-gnu emerge cpio

    - you should also set CBUILD to your building machine for
      completeness sake ... most of the time you won't need it,
      but sometimes you will :)
    - CTARGET is no longer needed so don't set it !

If you want to cross compile a kernel, do this:
    make ARCH=hppa CROSS_COMPILE=hppa2.0-unknown-linux-gnu-
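
As a quick sanity check of the finished toolchain (this is just an illustration, not part of the original HOWTO), a minimal C program can be cross compiled with the new gcc and inspected with file(1); the target triplet below is the example CHOST used above:

/* hello.c - smoke test for the new cross toolchain.
 * Build it on the host with the freshly emerged cross gcc:
 *   hppa2.0-unknown-linux-gnu-gcc -Wall -o hello hello.c
 * `file hello` should then report an hppa ELF executable rather than
 * one for the build machine. */
#include <stdio.h>

int main(void)
{
	printf("hello from the cross compiler\n");
	return 0;
}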

2010年6月17日星期四

49个瓶子你选哪个?很准的心理测试~

发信人: lala (这只是个ID), 信区: Picture
标  题: 49个瓶子你选哪个?很准的心理测试~
发信站: 水木社区 (Thu Jun 17 22:06:10 2010), 站内

RT.
--

※ 来源:・水木社区 http://newsmth.net・[FROM: 114.249.221.*]


此主题相关图片如下:瓶子.jpg (45KB)

发信人: lala (这只是个ID), 信区: Picture
标  题: Re: 49个瓶子你选哪个?很准的心理测试~
发信站: 水木社区 (Thu Jun 17 22:07:25 2010), 站内

【0号瓶】灵性解救瓶

你是一个神秘的人,与上天有所连结。你可能有很强的第六感或者灵通的能力。代表这个瓶子的塔罗牌是“傻瓜”,代表你有一种天真与信任的质量,可以处于浑沌与混乱的情况中。

你的困难与挑战:目前处于恼怒,混乱的状况。你想要为你的生命找一个出口,但是却无能为力。现在的你,就像处于最深的黑夜。危机就是转机,转个身,最大的平静就隐藏在你背后。

你的未来潜能:是一个情绪平衡的人,探寻灵性方面的真理。

【1号瓶】身体解救瓶

你是一个多才多艺,有很丰富创造力的人。但是你习惯将你自己内在的彩虹隐藏起 来。代表这个瓶子的塔罗牌是“魔术师”,代表你是一个善于沟通,有创造力的人。

你的困难与挑战:你习惯将你自己的光隐藏起来,虽然你有丰富的内涵,但你无法跟人分享。长久的隐藏,造成你有习惯性的忧郁。

你的未来潜能:知道他或她的理想,而且能够去实现它们。

【2号瓶】和平瓶

你是一个爱好和平的人,同时拥有很强的直觉力。你的出现,可以抚慰周遭人的心灵。这一瓶像是海洋,所以也暗示了你喜欢接近海洋,海洋总可以带给你平静与放松。代表这一瓶的塔罗牌是“女祭司”,代表你的感受很敏锐,可以体察到很细微的事 物,可以连结上天。

你的困难与挑战:你有口难言吗?你有太多的话藏在你的心中吗?因为有太多话没有被说出来,所以你无法得到内心的平静。

你的未来潜能:喜爱和平,跟自己很和谐地相处,承诺要追求和平。善用与喉咙有关的创造力(譬如跟演说有关),支持别人。

【3号瓶】心轮瓶

称为心轮瓶,也是双鱼座的瓶子,所以你是以你的心与感觉来过生活。你的心很柔软,很有爱心,很能站在对方的立场想。代表这一瓶的塔罗牌是“女皇”,代表你习惯照顾他人,具有母爱的特质。

你的困难与挑战:你觉得无法呼吸吗?觉得胸部没有多余的空间让新鲜的空气进入。你最近觉得心受伤吗?或者,你长久以来都觉得自己的心一直都没有被照顾到,没有被疼惜到呢?第二瓶选到这一瓶的你,真的该好好连结你的心,让心中的悲伤释放出来。

你的未来潜能:具有直觉力。能够帮助别人找到他们的方向。具有成为艺术家、老师、和治疗师的能力。

【4号瓶】太阳瓶

4号称为太阳瓶,绰号叫做智能瓶。所以你是一个很阳光的,很有智能的人。你很有创造力,喜欢自己动手作一些东西。对你来说,身体最重要的部位是你的胃,胃掌管你的自信,力量与欲望。代表这个瓶子的塔罗牌是“皇帝”,代表你是一个有领导力。

你的困难与挑战:你是一个过度理性的人,你的理性扼杀了你的感情。你重视目标,忽略了过程,这使得你的人生看似充满希望,但欲望的追求让你的生命像是阳光过剩的沙漠,生活紧凑的步调让你的胃肠常常对你发出警讯。

你的未来潜能:在管理和组织方面具有权威和很好的才能,将这些才能跟智能结合。能够知道来自前世的知识,并且能够利用这些知识。

【5号瓶】日出…日落

5号是狮子座的瓶子,代表你是一个具有领导能力以及王者之风的人。你知道如何善用你所具有的能量,你常带给周遭的人阳光与热情,也知道如何要到你要的东西。代表这个瓶子的塔罗牌是“教皇”,代表你很有智能,透过经验而学习,不是透过书本等死的知识来学习。

你的困难与挑战:你害怕生存不下去吗?你缺乏自信吗?或者你为了掩饰自己生存的恐惧以及自卑,所以你试着掌控你的生活,你的朋友,你的情人,你的下属。为了生存,你常紧张得胃痛。

你的未来潜能:具有很多能量,而且能够将这些能量都表达出来。有活力,而且具有个人特质。

【6号瓶】能量瓶

6号称为能量瓶,也是金牛座的瓶子,所以你很实际,很务实,重视物质生活,也具有很多的能量。代表这个瓶子的塔罗牌是“爱人”,这代表“爱”是你生命重要的课题,你可以了解到真正的爱不只是性,还包括你的心与灵性的爱。

你的困难与挑战:挫折感是你现在的生命主题。你觉得心中有很多的愤怒与无奈,使得你缺乏生命力。你真的需要为你的内在找一个出口,让你内在的能量释放。可能是长久以来不被接受的愤怒,可能是工作上的挫折,可能是长久以来压抑的性……

你的未来潜能:是一个很勇敢的人,在逆境中也准备好要去爱。

【7号瓶】耶稣被出卖而被捕的花园

你是一个很清楚自己要什么,同时你也有能力去要到你要的东西。同时你也是一个平衡的人,凡事不容易走到极端。代表这个瓶子的塔罗牌是“战车”,代表你的男性能量与女性能量很平衡,你很有力量,可以让事情发生。

你的困难与挑战:你最近在担心钱吗?你可能怀疑自己有没有能力可以赚到钱。你可能缺乏自信,对自己有所怀疑,并且因此而紧张焦虑得胃痛。

你的未来潜能:新时代的先驱,是一个了解人类的苦难和需求的理想家、哲学家。

【8号瓶】埃及的犬头神

8号瓶叫做Anubis,Anubis是埃及的犬头神,专门守护进入无意识世界的大门。所以你的灵魂本质跟埃及是有连结的。你常用心中的一把尺去衡量外界与其他的人,同时,你也很重视平衡。这也让你哲思当中,思考什么是真正的价值与平衡。代表这个瓶子的塔罗牌是“调适”或者”正义”,代表你是一个平衡的人,具有内省与向内看自己的能力。

你的困难与挑战:平衡是你目前生命中最重要的课题,凡事太过犹不及皆会带来问题。或许你最近时而忧郁时而亢奋,但始终找不到内在真正的平静。或许你积压了太多事情没有表达,或许你已经很久没有跟人说出你内心真正的话。

你的未来:对时间、平衡、公正有很好的感知;有条理,且对平等有特殊的认同(对所有的人,所有的宗教传统等等)。

【9号瓶】水晶洞…心内之心

你是一个喜欢接近大自然的人,山中海边是你徜徉的地方。你习惯独处,过自己的生活。你也是一个很具有“心能量”的人,你可以连结你的心,你也可以透过你的心来表达,所以你的话语具有感动人的力量。

你的困难与挑战:你能自在地表达你的感情吗?我猜不能。这造成你常常胸闷,心肺功能欠佳。有多久没有找朋友好好地说说话了?有多久没有放开心胸跟人表达了?你的心是不是被严肃的头脑捆绑住了,使你忘了如何游戏?使你忘了你内在幽默的一面?

你的未来潜能:是一个不自私的人,与自己的潜意识和内在声音都有连结。发现生命中隐藏的秘密并能诠释它们。


【 在 lala (这只是个ID) 的大作中提到: 】
: RT.

--

※ 来源:・水木社区 http://newsmth.net・[FROM: 114.249.221.*]

发信人: lala (这只是个ID), 信区: Picture
标  题: Re: 49个瓶子你选哪个?很准的心理测试~
发信站: 水木社区 (Thu Jun 17 22:08:28 2010), 站内

【10号瓶】去拥抱一棵树

你有一颗好心肠,很有爱,很柔软,很慈悲。跟你相处可以使人放松,没有压力。你喜欢自然,喜欢简单。代表这个瓶子的塔罗牌是“命运之轮”,代表你是一个自在、流动的人,可以顺着生命之流而改变。

你的困难与挑战:你常莫名地心悸吗?你会有没来由的恐惧吗?如果你仔细倾听你的心,你会发现你的心在告诉你:我好难过!你积压了太多的感觉在你的心,而你一直没有去看顾心中的感觉。

你的未来潜能:是一个只带领而不掌控对方的人,给予他人成长的空间。打从心底关心人类的问题,认知所需要做的事并将它完成。

【11号瓶】一串花朵

11号是xxxxx座的瓶子,所以你是一个纯洁,单纯,干干净净的一个人,宛如处子一般,像个纯�的小天使。代表这个瓶子的塔罗牌是“爱欲”,代表你对生命很有多的爱与热情。

你的困难与挑战:你常觉得缺乏爱,缺乏温暖吗?你在亲密关系中有困难吗?你有没有想过,你可以更爱你自己一些?你可能是个完美主义者,对自己要求甚高,或者,对别人要求甚高。由于过往的制约,你对自己或别人有很多的期待。

你的未来潜能:是一个强壮、有威力的人,散发出温暖、温柔和强烈的同理心;是一个真正懂得如何去爱的人。

【12号瓶】新时代中的和平

代表这个瓶子的塔罗牌是“倒吊者”,代表你是一个很有韧性的人,面对生命中的挑战与苦难,你总可以将其当作成长的机会。你具有水一般的特质,有弹性,有包容性。

你的困难与挑战:代表这个瓶子的塔罗牌是“倒吊者”,意思是“透过苦难而成长”,所以你会不会觉得自己的生命总不顺遂呢?或许你目前的生命就像处在深海一样,有些忧郁,缺乏阳光,像是找不到出路一般。

你的未来潜能:是一个和平、用“心”的感觉来反应的人。与个人的感觉和本能的智力有很深的连结。

【13号瓶】新时代中的改变

13号是巨蟹座的瓶子,所以你是一个顾家,恋家的人。你很重感情,喜欢一群人在一起的感觉。13号瓶叫做“新时代中的改变”,所以面对改变以及分离的情况时,可以让你真正连结到你的心。代表这个瓶子的塔罗牌是“死神”,代表你作事喜欢干净利落,对于事情可以快刀斩乱麻。

你的困难与挑战:你害怕离别吗?你害怕改变吗?你害怕失落吗?回想一下过往关于离别,死亡,改变与失落的经验,你真的让这些经验过去了吗?还是你的心里仍然留有不舍,悲伤与执着于过去?因为无法放手,所以你的心里留着十年前朋友给你的小纸条,留着小时候的旧衣服,留着前前前男友或女友的照片,留着小学老师跟你说过的一句话……

你的未来潜能:是一个清楚且仁慈地传授知识的领导者,同时也能证实你的远见。

【14号瓶】新时代中的智能

代表这个瓶子的塔罗牌是“艺术”,代表你有能力将不同的事物整合在一起,有能力面对极端的状况。你的灵魂本质带着智能的质量,这种智能是属于新时代的智能,而不是老掉牙的传统与想法,所以你可能有不同于一般人的洞见与智能。

你的困难与挑战:你知道聪明与智能的差别吗?前者来自于外在,譬如书本,知识,别人的建议;后者来在于你的内在以及你自己的生命经验,你可能很聪明但缺乏智能。

你的未来潜能:是一个在新时代中受过启示,非常重要的先锋,并与自己内在智能结合的人,散播和谐与平衡的观念,能感受到深度的喜悦。

【15号瓶】新时代中的治疗

你是一个具有治疗天份的人,这里所说的治疗是属于灵性上的治疗,同时你的直觉力也很强,对于周遭人的心情和能量感受灵敏。代表这个瓶子的塔罗牌是“恶魔”,代表你是一个可以面对现实的人,很实际,面对现实生活中的限制,你可以幽默地面对。

你的困难与挑战:你最近有遇到什么让你悲伤的事吗?还是你以前曾经遇过什么让你悲伤的事吗?这些在你内心深处的悲伤,让你感到生命空洞,茫然。

你的未来潜能:是一个强壮而有权威的人(正面意义),具有灵性的力量。即使在最不利的环境中也坚持自己的信念。

【16号瓶】紫袍

你跟灵性,宗教,修行很有缘分。你可能带有很强的直觉力与灵通能力,这种能力可以治疗他人的身心。代表这个瓶子的塔罗牌是“塔” ,代表你有能力释放掉老旧的自我模式与习惯。

你的困难与挑战:你想待在你的身体吗?我的意思是:你想留在地球上吗?或许你常觉得自己像个外星人,你的想法,感觉,行事风格,人生观……都与一般人不同。

你的未来潜能:是一个与神性计划一致的人,并将自己的生活建立在此观念上,经常获得自发性的"开悟经验"。

【17号瓶】抒情诗人,希望

这个瓶子叫做“抒情诗人”,所以你的灵魂中带着一种诗意以及浪漫的特质。你具有创作的天份,而且创作的来源是你的心以及来自上天的灵感。在创作的时候,你会觉得有股能量来到你身上,是这股能量在创作,而不是你。代表这个瓶子的塔罗牌是“星星”,代表你具有很深的“信任”的质量。

你的困难与挑战:你最近常感到失望,绝望吗?你觉得胸闷,呼吸不顺,有一种快要窒息的感觉吗?再更深的去感觉,你的内在是不是留了一些悲伤,这股悲伤,让你关闭你的心,好让你不需要去感觉。

你的未来潜能:对灵性和隐藏在生命中的奥秘充满兴趣,经由对人的爱和帮助人们与他们的灵性连结而经验喜悦。

【18号瓶】埃及瓶,潮流的改变

这是一个埃及瓶,所以你会不会觉得对埃及有兴趣?或者觉得跟埃及很有连结呢?这个瓶子具有一股王者的质量,你有没有发现你的灵魂本质有着王者与尊贵的品质呢?代表这个瓶子的塔罗牌是“月亮”,代表你给人一股神秘、未知的感觉,摸不透真正的你是如何的。

你的困难与挑战:你总是在追逐欲望,物质上,感情上,心灵上……各式各样的欲望,你试着让自己充满希望,充满阳光般的感觉。

你的未来潜能:按照自己的梦想来安排人生的人,具有改变的勇气,并享受人生目的的实现。

【19号瓶】生活在物质世界里

这是魔羯座的瓶子,所以你是一个实际,务实,理性,稳重,脚踏实地与负责任的人。你重视成就,常能爬到一个较高的位置。你有很好的社交能力,能与许多人建立起关系,并且保持联系。代表这个瓶子的塔罗牌是“太阳”,所以你是一个重视合作关系的人,你可以改变自己让合作的关系保持和谐。

你的困难与挑战:你有两个极端的面,一部分的你实际,理性;另一部分的你灵性,感性。这在你的内心造成一种分裂,使你感到矛盾。

你的未来潜能:跟自己和谐地相处,也把这种感觉传递给别人,拥有很多能量,包括灵通的能量,别人会感觉像磁铁般地为你所吸引。


【 在 lala (这只是个ID) 的大作中提到: 】
: RT.

--

※ 来源:・水木社区 http://newsmth.net・[FROM: 114.249.221.*]

发信人: Alpha7 (阿尔法赛文), 信区: Picture
标  题: Re: 49个瓶子你选哪个?很准的心理测试~
发信站: 水木社区 (Thu Jun 17 22:08:30 2010), 站内

下次搞999个,估计测得更准,hoho
【 在 lala (这只是个ID) 的大作中提到: 】
: RT.


--
   不想当保姆的厨师不是好司机


※ 来源:・水木社区 newsmth.net・[FROM: 119.123.199.*]

发信人: lala (这只是个ID), 信区: Picture
标  题: Re: 49个瓶子你选哪个?很准的心理测试~
发信站: 水木社区 (Thu Jun 17 22:09:05 2010), 站内

【20号瓶】星星小孩

这是双子座的瓶子,所以你的特质是喜欢变化,会在两极端摆荡,思绪敏捷,善于处理讯息,善于沟通。代表这个瓶子的塔罗牌是“重生”,代表你有能力让自己站到一个较高的角度来看事情,也就是说透过内省,你可以放弃原有的僵化思考,接受一个较新的观点。

你的困难与挑战:你能想象你自己的内在有一个小孩子吗?事实上,每个人内在都有一个小孩子,即使是七八十岁的老人。当我们可以照顾好自己内在的小孩时,我们可以变老,变成熟,但同时拥有小孩般的纯真。

你的未来潜能:一个乐观主义者,能够以建设性的方式来善用乐观主义。具有很深的内在祥和,而且已经免于恐惧。

【21号瓶】新开始的爱

你是一个成熟与圆融的人,你可以耐心的等待适当的时机让事情自然发生。你给人一 种春天的气息,可以滋养你周遭的人,给人一种生意盎然的感觉。代表这个瓶子的塔罗牌是“宇宙”,代表你是一个放松与无为的人,你有能力让事情自然地完成。

你的困难与挑战:你的心受伤了吗?你需要被爱吗?你真的该好好让过往心中的伤口被疗愈,否则,你如何有空间让新的爱进来呢?你的心需要爱来滋养,但首先从对自己的爱做起。

你的未来潜能:一个带着最高的觉知力和拥有真正成熟个性的人,爱神也爱人。是一个给予和接受爱的人,用建设性的方式传达爱(例如,由写作、艺术、公开演说,同时也透过家族的圈子中)。

【22号瓶】重生之人的瓶子,唤醒

你是一个阳光与温柔的人,你像太阳一般给人开心的能量,也同时给人温暖的照顾。代表这个瓶子的塔罗牌是第二轮回的“傻瓜”,代表你具有傻瓜天真与单纯的质量,但这种天真与单纯却保有成熟与历练过的质量,而不是给人幼稚的感觉。

你的困难与挑战:你会不会觉得自己的自信心不足呢?你会不会觉得自己的内心缺乏阳光的感觉?你会不会觉得你不知道自己要什么?就算你要到你要的,心里还是有股不满足?让我们再走深一些,这一切都源自于你缺乏对自己的爱。

你的未来潜能:一个新时代的教师,过着很着根于大地的灵性生活。与神性的智能连结,所以有被解放的感觉。

【23号瓶】爱与光

你是一个很柔软的人,具有女性温柔的特质,你待人很和善,很包容,愿意去照顾与体贴他人。

你的困难与挑战:"我要爱",就是这一瓶代表的讯息。为什么你这么渴望爱呢?你是不是期待某个浪漫的夜晚,有一个迷人的伴侣会送你一束花,然后共进浪漫的晚餐呢?如果现在那个人还没出现,何不多爱你自己一些,送自己一束美丽的花,带自己去吃一顿好吃的晚餐呢?你可能会问:如何爱自己呢?那我会告诉你:接受所有的你自己,不管好的或不好的你。

你的未来潜能:迫切地想要了解自己和实现自我认知(例如,透过治疗、静心等等),以正向和务实的态度接受命运。

【24号瓶】新的讯息

这个瓶子跟金星的能量息息相关,所以你的灵魂跟爱与美的能量有很深的连结。这个瓶子也暗示你有当媒人的天分,你可以帮助周遭的人找到他们的灵魂伴侣。这个瓶子中显示你是一个浪漫的人,喜欢美丽的,优雅的,享受的感觉。

你的困难与挑战:你会头痛,失眠,思绪烦乱吗?好好想一想,你最近是不是积压了很多的话,很多的感觉没有表达出来呢?如果最近没有,那可能是你一直都习惯压抑自己的感情,这可会导致你胸闷,呼吸不顺呢。

你的未来潜能:是一个能因材施教有洞见的人,能唤醒他人的自爱,具有非凡的个人特质、正直和成熟,把精神贯注在喜悦上。

【25号瓶】恢复期的瓶子,南丁格尔

这个瓶子叫做南丁格尔,显示出你天生具有照顾他人的天份,代表真正的你是一个暧暧内含光的人,你习惯把自己的光隐藏起来。

你的困难与挑战:你习惯隐藏自己喔,你把你丰富的内涵都封闭起来,怕被人看到。为什么?你在害怕什么呢?在小时候,当你被看到,被注意到之后会发生什么事呢?因为你习惯将自己隐藏起来,所以你内在容易累积很多能量,这股能量可能会让你常常感到恼怒,思绪不断,悲伤,头痛,失眠,甚至可能让别人觉得你很神秘,难以亲近。

你的未来潜能:是一个有很多能量可以给予他人的治疗者,具有往内观照的能力。能完成计划中的任务。

【26号瓶】惊吓瓶

你是一个喜欢游戏与欢乐的人,你常常像小孩一样咯咯地笑,同时你的笑也会感染到周围的人。你的感官非常灵敏,重视生活的品味与享受。

你的困难与挑战:你缺乏安全感,可能在最近或过去有受过惊吓的经验,而且那个惊吓的经验到现在还影响着你。由于缺乏安全感,所以你可能会过度敏感,无法信任他人或者新的环境,无法真实地展现你所有的一切。

你的未来潜能:一个非常独立的人,富有创造力,具有很深的、直觉性的智能,藉由教别人而学习,很聪明,而且很小心,做事不会太过火,凡事采取主动,能够感觉很深的喜悦。

【27号瓶】罗宾汉

这一瓶是射手座的瓶子,所以你爱好自由,不喜欢受约束。这一瓶的名字叫做罗宾汉,代表你是有一颗好心肠,有正义感,常替人打抱不平。你透过对人的热情来表现你的心。

你的困难与挑战:你是否觉得自己是女人住在男人的身体,或者男人住在女人的身体里呢?如果你为此而困扰的话,这一瓶罗宾汉可以帮助你。

你的未来潜能:是一个成功的人、努力工作、很专心、精力充沛、有决策力。坚定而且自己依靠自己。

【28号瓶】少女玛莉莲(罗宾汉之配偶)

你是一个看似柔软的人,心肠很好,但却从你的内在散发出一股坚定,不可侵犯的力量,具有韧性,绿色代表你给人的感觉是舒服的,没有压力的。

你的困难与挑战:你是否觉得自己是女人住在男人的身体,或者男人住在女人的身体里呢?如果你为此而困扰的话,这一瓶少女玛丽莲可以帮助你。

你的未来潜能:是一个经常以新的、清新的观点来看待事情的先锋。也是一个可靠、快乐、对自己诚实的人。

【29号瓶】起来,开步走

你是一个动静皆宜的人,你可以热情的与人交往,也可以静静的独处。这个瓶子的名字叫做“起来,开步走”,所以你是一个具有行动力的人,可以带领周遭的人起身去做一些事。

你的困难与挑战:你是否常觉得自己在两个极端之间摆荡呢?有时十分火热,外向,有时非常的冷静,内向。有时像是火山爆发,有时像是处于深海之中。

你的未来潜能:是一个想要改变习俗的改革者。为和平而付出自己、保护别人,希望能给予人类最好的。


【 在 lala (这只是个ID) 的大作中提到: 】
: RT.

--

※ 来源:・水木社区 http://newsmth.net・[FROM: 114.249.221.*]

发信人: lala (这只是个ID), 信区: Picture
标  题: Re: 49个瓶子你选哪个?很准的心理测试~
发信站: 水木社区 (Thu Jun 17 22:10:00 2010), 站内

【30号瓶】把天堂带到人间

你外表看似平静,但有一颗火热的心。就像是海底的滚热岩浆,不似火山爆发的猛烈,带着超高的热度与能量,静静地涌入海里。这一个瓶子叫做“把天堂带入人间”。

你的困难与挑战:你觉得忧郁吗?你觉得自己像是处在很深的海底,感觉幽暗,寒冷,缺乏阳光。你找不到任何人,或许也不想找任何人。

你的未来潜能:把观念蜕变成行动,对事情看得很清楚,具有超凡的洞察力。

【31号瓶】泉源

你最大的资产就是你的心,透过你的心,你可以将你深层的智能与黄金表达出来。你生活中最大的资产就是大自然,透过大自然,你可以取得源源不绝的智能。

你的困难与挑战:你缺乏自信吗?你害怕让自己的力量完全展现,或许是因为你的力量与很大的怒气夹杂在一块,如果你想要寻找人生舞台,往你的自我价值先开始找起。

你的未来潜能:具有外交能力,与人以心相待,建立良好的团队精神,接受责任,不认为它是负担,反而当成是一种喜悦。

【32号瓶】苏菲亚

你的直觉力很强,有时透过你的直觉,你会说出很有智能的话语,你的思考比较是跳跃式,乍听之下有些无厘头,但其实有些玄机存在。

你的困难与挑战:有时候你会陷入莫名的恼怒与混乱,思绪烦乱,无法平静。你其实拥有敏锐的直觉,但是你无法信任自己的直觉,常常以理性与逻辑打压你内在的声音。

你的未来潜能:具有个人魅力和修养,给予别人温暖、支持和注意,自己感觉平静,了解自然的力量,生活在此刻,能够清楚地与人沟通。

【33号瓶】海豚瓶

这个瓶子叫做海豚瓶,所以你具有海豚的质量:喜欢嬉戏玩耍,爱笑,幽默,喜欢与人亲近,同时,你拥有很强的直觉力,也具有创作的天份。

你的困难与挑战:在你内心深处想要游戏与放松,但是你却无法游戏与放松。你的头脑要你不能尽情的表达你的心与感觉,你的头脑让你非常严肃,你的头脑无法允许你自在,你的头脑扼杀了你的想象空间与直觉力。

你的未来潜能: 是一个为和平而工作的艺术家,与直觉和内在导引有连结。努力达成人生的目的,是一个不容许权威或权威人士反对自己目标的改革者。

【34号瓶】维纳斯的诞生

这个瓶子叫做“维纳斯的诞生”,维纳斯是掌管爱与美的女神,所以你具有天生的美感,你的外貌,举止,穿着,都散发出美的特质。此外,你也散发出爱的能量,有很好的人缘。

你的困难与挑战:你觉得没有人爱你,没有人了解你吗?你觉得自己孤单,常被误解吗?如果是的话,仔细回想一下,你习惯将你自己内心的感觉与感情表达出来吗?

你的未来潜能:是一个在生活中各方面都自给自足的人,不需要外在的帮助、自力更生、有内在力量。

【35号瓶】仁慈

这一个瓶子叫做“仁慈”,所以你的本质是一个很仁慈的人,愿意包容别人,心胸宽大,不喜欢与人争执。

你的困难与挑战:你觉得没有人爱你吗?你觉得世界缺乏温暖吗?或者你刚刚分手呢?这种缺乏爱的感觉,其实源自于你内在很深的悲伤。

你的未来潜能:透过帮助别人而体验到很深的满足,同时也帮助了自己,拥有治疗的天赋和远程治疗能力(意思是治疗一个不在附近的人)。

【36号瓶】博爱

这个瓶子叫做“博爱”,所以你的灵魂带有博爱的质量,你不只爱你的家人朋友,对于陌生人也不吝于给予。

你的困难与挑战:你常做白日梦吗?你周遭的人是否常对你说,实际一点,别作梦了。你常常思绪不断,脑筋大塞车吗?这一切源自于你不够爱自己。你只接受你脑中认为好的部分,你对自己有很多的期待,所以你内在许多的部分都被你压抑或忽略。

你的未来潜能:觉知到自己人生的目的。无论做什么事情,即使在不太顺利时,也很清楚地知道自己的内在目标。

【37号瓶】守护天使降临大地

这个瓶子叫做“守护天使降临人间”,所以你与天使的能量有所连结,你对于这个世界带有一些理想,你想要与人分享这些理想,并且让这些理想落实在这个世界。

你的困难与挑战:你觉得冷吗?我说的是你心里的冷,生命中有些忧郁,像是独自待在三千公尺下的深海一般;生命中有些不想被碰触悲伤,使你有如飘在天空,胡思乱想,脱离现实。

你的未来潜能:跟随自己的理想,有修持、灵感和很好的直觉,用自己的行为显现出人生是多么成功和有效率。

【38号瓶】抒情诗人,洞察力

这个瓶子叫做“抒情诗人”,指出你具有一颗善感的心,并且可以透过文字,绘画,音乐等创作加以抒发;此外,你也是一个优雅及充满诗意的人,喜欢徜徉于大自然之中,像山中舞动的精灵一般。

你的困难与挑战:你的心好吗?你有多久没有倾听你的心了呢?你了解你自己的感觉吗?你愿意释放你的心吗?你的心可能常对你说:我好难过,我好失望,我好高兴,我好生气,我想要这个,我想去那里……

你的未来潜能:具有自然的威严和很好的个人特质,与自己的心有连结,同时对社会有强烈的情感。

【39号瓶】埃及瓶,演布袋戏的人

这一瓶叫做“埃及瓶”,代表你的灵魂本质与埃及有所连结,所以你可能对古埃及文明会有兴趣或特别向往,你容易连结上灵性的讯息与知识,并且具有教导他人的能力,可能是一个灵性的导师。

你的困难与挑战:你相信你自己拥有智能吗?不是念很多书,听很多人说的那种外来的聪明才智,而是发自你最内在的智能。

你的未来潜能:具有改变世界的力量,同时了解这个改变必须从一个人的内在开始,从行动中找到了自由,享受知识的获得并将它传授给别人。

【 在 lala (这只是个ID) 的大作中提到: 】
: RT.

--

※ 来源:・水木社区 http://newsmth.net・[FROM: 114.249.221.*]

发信人: lala (这只是个ID), 信区: Picture
标  题: Re: 49个瓶子你选哪个?很准的心理测试~
发信站: 水木社区 (Thu Jun 17 22:10:44 2010), 站内

【40号瓶】我是

这是牡羊座的瓶子,代表你是一个热情,直接,行动力强与负责任的人,同时具有领导者的特质。这个瓶子的名字叫做“我是”,所以你独立,勇于做自己,就是呈现出自我原本的模样。

你的困难与挑战:你无法做自己,总是担心别人的看法,受别人的意见影响,这源自于你缺乏自信,缺乏勇气去做自己。缺乏自信的你可能会处于另一个极端:因为恐惧而控制,指责别人,自我中心。

你的未来潜能:内在带着很深的智能,并能将它表达出来。已很接近觉醒时刻,意思是在自我的探索中已经走过了一段很长的旅程。

【41号瓶】智能瓶

这个瓶子叫做“智能瓶”,代表你是一个很有智能的人,可以独立思考,不盲从,带着清晰的眼光来看待世界。

你的困难与挑战:你常被欺负吗?你的界线常被人侵入吗?你觉得自己不受重视吗?如果是的话,请问你:你重视自己吗?你觉得自己有价值吗?

你的未来潜能:在生活的每个层面里,是一个农夫,意思是耕作、收成,很和谐地生活在大自然的韵律中,用“自然”的角度来看待事情。

【42号瓶】收成

你是一个阳光般的人,常常可以带给周遭的人快乐,就像太阳一般,溶化人冰冷的心,你是一个聪明与理性的人,擅长逻辑性的思考,你清楚知道自己要的是什么。

你的困难与挑战:你只重视目标不重视过程,你喜欢追求欲望却忽略了享受过程。你是一个理性的人,你善于计划,重视逻辑,却忽略了你的感情,你的直觉,你的心。

你的未来潜能:是一个仁慈、和谐的“阳光”人物,从行动中也可以反映出这点(例如,可能具有舞蹈天份),跟随着自然的流、跟随着生命的流。

【43号瓶】创造力

这一瓶是水瓶座的瓶子,代表你内在有一个部分是理性的,爱好自由的,你重视人道主义,对于环境保护等新时代的议题具有热诚。这个瓶子叫做“创造力”,代表你拥有无穷的创造力,可能是在艺术方面,也可能是在生活或工作上展现。

你的困难与挑战:你的心想唱歌,可是你不给它唱;你的心想跳舞,可是你不给它跳;你的心想游戏,可是你不给它玩;你的心想表达,可是你掐住它的脖子。

你的未来潜能:是一个追求真理的人,经常检讨自己,感觉很和谐,展现“头脑的伸缩性”,意思是你是个富有弹性的人,容易接纳新的事物。

【44号瓶】守护天使

这个瓶子叫做“守护天使”,代表你跟天使的能量有所连结,你可能跟天使或外星人有过第三类接触,你的人就像这个瓶子的感觉:干干净净的,很可爱。你跟灵性,宗教或修行有缘分,如果你有接触静心,你很容易可以感受到很深的平静。

你的困难与挑战:你有着淡淡的忧愁,小小的自闭,做事情总缺少了那一分力气,无精打采。你有些梦幻,想法总是不切实际。你不想跟人联络,什么事都不想做,但独自一人却感到寂寞,无法享受单独,无法得到平静。

你的未来潜能:是一个蜕变过的人,解放自己也解放别人,肩负和平的任务,与上帝有连结,并以务实的方法蜕变神的旨意与神共同创造。

【45号瓶】爱的气息

这个瓶子叫做“爱的气息”,你的灵魂正有着这样的质量:连结着心,连结着爱。你容易为一些生活中的小事而感动:一个小婴儿的笑,一顿好吃的晚餐,情人对你的一个小小的心意……

你的困难与挑战:你习惯隐藏你自己,你不喜欢被人看见。慢慢的,你越来越不了解自己,不知道自己想要什么,想说什么,感觉是什么。然后,你开始觉得很难表达自己的感情,很难连结到你的心,在亲密关系上有困难。

你的未来潜能:接触到古代知识并对这些知识非常了解,是一个敏感的人,具有很强的直觉力,喜爱人生中所有美的事物。

【46号瓶】漂泊者

你是一个归于中心,稳重,有个禅一般的简单与朴实的质量,同时也连结你的心与感受,你是一个随遇而安的人,可以以四海为家。

你的困难与挑战:你是不是优柔寡断,常常难以做决定呢?你是否找不到人生的方向,自己的舞台呢?因为,你其实不知道自己要什么。造成这个现象的深层原因是:你习惯将自己隐藏起来。

你的未来潜能:是一个忠贞的人,过着归于中心的生活;虽然人生中有盛衰浮沉,但仍信任整个宇宙存在的运作,“努力工作也努力玩”,无论是何种任务,都将会很完善和彻底地完成。

【47号瓶】古老的灵魂

你是一个喜欢笑的人,你的笑像柠檬一般,给人清新的感觉,你有很强的直觉力,可能会无师自通会一些事情,你理性又不失感性,冷静又不失幽默。

你的困难与挑战:你是不是常在两极之间摆荡,有时很high,有时却显得忧郁;有时开怀大笑,有时突然陷入莫名的恼怒。

你的未来潜能:有超感应的天赋,与更高的我有连结,拥有奥秘学的知识,人生的方向清晰并以身作则来教导别人。

【48号瓶】治疗的双翅

你是一个非常洁净的灵魂,常常像天使一般守护你的家人及朋友。你的内在有一部分非常的敏感与纤细,这份敏感与纤细使你可以体察他人的情绪以及内心世界,就如同这个瓶子的名字“治疗的双翅”。

你的困难与挑战:你就像一面心灵的镜子,可以如实的反映他人的状态,你的内在就像彩虹般的丰富,可以与上天连结而感到喜悦,你很容易适应环境,很快的可以融入一个新的环境当中。你的内在隐藏着无尽的泪水,可能是因为最近发生的事情,也可能是过往甚至儿时经验累积下来的。这些隐藏的泪水使你头痛,头晕,思绪混乱,精神无法集中。

你的未来潜能:职业往往与灵疗、心理学或心理治疗有关。热爱生命并有很多与他人沟通的能量。内在有一份很深的相信,可能是很虔诚于宗教的人。感觉自己与上帝和存在“同一体”。觉知到自己人生的目的。

【 在 lala (这只是个ID) 的大作中提到: 】
: RT.

2010年6月12日星期六

G1手机上的VOIP之旅 - SIP Server + SipDroid

G1手机上的VOIP之旅 - SIP Server + SipDroid


前两天帮朋友用手机打了两个国际长途,用了错误的 17951 作为IP长途号(中移动17951国内长途有效,国际长途应该用12593),结果发现一分钟要¥3.6,即使没有人接电话,也会收。

于是在 G1 上玩 Android Market 留意了一下VOIP。

Skype 发布了 Android 平台版本。但是对于语音通话,和 iSkoot 一样,是用 PSTN(既普通的语音通话) 呼叫一个本地的电话号码,来进行转发的。不是VOIP。

后来找到了 SipDroid ,这个软件带我走入了 VOIP 的世界。

SIP VOIP 应该怎么玩呢?我是这样玩的:

  1. Gizmo5.com - SIPDroid 这个软件需要一个 SIP 服务商。经过比较 n 多后(如 nonoh.net 等),选中了看起来比较靠谱的 gizmo5.com (另外一个很大的原因是只有 gizmo5.com 能注册到 zixia 这个帐号……)。
    花费 $10 购买了呼出功能,打电话到全球大多数地区的费率是一两毛钱人民币吧,语音效果不错;
    花费$4购买了 Caller ID 功能,可以在部分支持的地区,显示自己的手机号;
  2. GTalk2Voip.com - 可以绑定 GTalk 到这个网站,然后在GTalk里面即可拨打 SIP 地址的电话,或是 PSTN 电话。
    同时还支持 SIP 电话拨打 GTalk,我的 gtalk SIP 地址是:zixia_at_zixia.net@gtalk2voip.com (如果你像我一样不是标准 gmail 用户,也是 Google Apps 或者 Jabber 用户,那么这个地址格式也许能帮到你);
    还可以在全球购买电话号码绑定到自己的 GTalk 上。
    花费 $10 开通了语音呼出功能,打电话到全球大多数地区的费率是一两毛钱人民币吧,语音效果不错;
    花费 $4 购买了一个美国本地的电话号码,拨打这个电话号码就可以用 GTalk 语音接收了($2 设置费,$2使用费/月);
  3. IPTel.org - 免费提供 SIP for Your Domain 的服务,也就是让你用自己的域名可以接收 SIP 呼叫。
    设置 zixia.net SIP 先需要修改DNS增加SIP设置,然后在 IPTel.org 上进行域名绑定即可。
    这样开通了我的 zixia@zixia.net 作为自己的 SIP 呼叫地址。
  4. Skype - 又给 Skype 充值了 $10 。
    Skype的充值会过期,以前好几个 $10 都是过期了,太奸商了。
  5. PBXes.com - 免费提供程控交换机的功能。你可以设置自己的内线号码,设置绑定多个 SIP 服务提供商,并可设置拨打某些国家的电话号码时,使用指定的 SIP 服务(比如最便宜的,或效果最好的),并有语音信箱功能;

最终的效果:

  1. 我拥有了属于自己的 SIP 地址: zixia@zixia.net 。为了方便,设置了呼叫转移,转移到 SIP://zixia@gizmo5.com
  2. 我的 G1 手机上运行了 SIPDroid 软件,可以通过 Wifi 或 Edge 登录到 Gizmo5.com ,随时接收到呼叫 SIP://zixia@zixia.net 的电话;
  3. 我在拨打国际长途的时候,G1会自动使用 SIPDroid 通过 Gizmo5.com 的 SIP 服务,使用 VOIP 拨打,费率¥0.1左右/分钟;
  4. 我拥有了一个美国本地电话号码:+1-567-251-5265 。如果我的手机上开着 SIPDroid ,拨打这个号码会和普通电话一样,我可以在手机上接听,免费。(欢迎骚扰~)
  5. 在没有 Wifi 的区域,使用 Edge 也可以很流畅的进行通话。(我的 Edge 流量是 ¥50 包 650MB。如果3G当然就更没问题了)
  6. 高兴的话,还可以用 PBXes 设置内线号码,集团彩铃,不同国家不同SIP线路,语音信箱,等等等等。(中关村买的电话交换机的功能,PBXes上几乎都有)

随着智能设备的不断普及,当大多数电话都已不再是一个简单的电路板和电磁铁组成,而是运行着Android等嵌入操作系统的智能设备时……

电信运营商,你们还能这么暴利吗?

标签: 

2010年6月10日星期四

悼念方老师 (zz)

发信人: sshengguang (历史的吊诡), 信区: DMSE.THU
标  题: 悼念方老师
发信站: 水木社区 (Thu Jun 10 07:41:29 2010), 站内

    今天下午知道了方老师去世的消息,一时竟呆在电脑前面,我印象中他老人家身体一直很硬朗。78岁的年纪,无论如何走的也是早了一点。方老师桃李满天下,系里很多老师都曾经是他的学生,无论如何也轮不到我来写悼念文章。但是我在材料系读书的短短三年,方老师的几件小事永远也难以忘怀

    我跟杨老师开始读研究生的时候,方老师已经退休了,每天早上仍然坚持来实验室工作半天,看文献,和研究生meeting,有几次我中午找老谭吃饭,打电话给他都没人接,后来谭师兄告诉我,方老师和他一起在做实验,亲自去打硬度,看组织,一丝不苟。由于种种原因,方老师一直都不是院士,但是在我们这一领域里,我还没见到哪一位院士能像方老师这样,开发出这么多真正有用的钢铁材料,把理论和实践结合的这样紧密。而在钢铁相变理论上,方老师同样有着国际声誉。今天他老人家驾鹤西去,我们历数方老师的无数成绩和贡献,院士这个虚名对他来说,也可以烟消云散了吧。

    我出国的时候找方老师签推荐信,大家的推荐信都是自己写的,一般nice的老师匆匆看一眼签字就完了,我本以为方老师也是这样,拿着信过去恨不得就说您签了得了。结果第一天去找方老师,他说信留在我这里,明天你再来。第二天过去,方老师很认真的跟我说,我理解你们出国都想把这信写的漂亮一点,但是你真的会做TEM吗,据我所知博士生真正会做TEM的都没几个。我大汗淋漓,因为自己写的时候吹会什么实验技术,SEM,XRD,TEM顺着就写出来了。我老老实实说不太会,只跟着看过几遍,做过几个样品。方老师说那这里最好去掉,别的我都没改,我拿到信又发现别的几个小错误他也帮我改过了。还有我写他的title的时候是Dr..,方老师也去掉了。我这才猛然想起他老人家是50年代新中国最早培养的研究生,那个时候我们国家还没有Ph.D。还是那句话,他的学术造诣和对钢铁事业的贡献,已经不需要任何虚名的累赘了。

    仅以这篇小文悼念方老师,我相信在天堂里,他还会醉心他钟爱的钢铁事业,还会想念永远的清华大学和永远的材料系。方老师安息。

--

※ 修改:・sshengguang 于 Jun 10 07:42:12 2010 修改本文・[FROM: 24.92.48.*]
※ 来源:・水木社区 newsmth.net・[FROM: 24.92.48.*]

2010年6月8日星期二

数据分析

印象非常深刻,业内好像只有畅游公司是在做路演的时候强调过他们是非常重视试图依
靠数据分析帮助网游走向成功的。我一直是科学数据分析的强烈支持者。昨天跟一个朋
友聊天的时候,被他给我描述的老外的数据分析雷到了,实在无法不拿出来分享一下:


    很长时间了,一直觉得看不懂webgame和SNS游戏,到底找什么样的人能,用怎样的
设计原则能把这类游戏做得更好?摸不到太多门道,觉得国内的绝大多数成功作品都捆
在某个具体人身上,甚至多数情况下连这个人自己也总结不太清楚。昨天我这个朋友跟
我讲了他去Zynga拜访的感受:他说国外早已经脱离国内这种单纯依靠某个天才的情况了
。看了Zynga的数据统计系统之后除了佩服只有佩服了,也明白为什么人家一家做SNS的
公司一年能挣6亿美金了!

    长话短说,就举一个最有代表性的例子吧:Zynga的数据分析系统能做到每一个更新
内容设计3个不同的版本,从后台同时推送给不同的用户群,在一个小时之内把数据统计
结果发回来,分别得到3种设计情况下用户的使用频率和比例,造成的付费或者病毒推广
效果的变化等等,并根据测试结果和事先设定的判断标准自动选择其中比较好的一种方
案全服更新上去。

    Zynga他们所有的产品经理每天一上班,首要任务就是先要花30%以上的时间进行数
据分析,然后才根据数据分析的结果布置新的任务,他们的数据分析细化到每一张图片
的位置和颜色等等。Zynga现在招聘产品经理基本上不要求一定要有同业工作经验,但一
定是数据统计能力超一流的高才生,经过他们的培养,很快可以成长成为优秀的产品经
理。他们也抄袭别人的游戏,但他们一抄过来就放入他们的数据分析工厂里,加上他们
强大的流程化的研发体系支持,造成他们的抄游戏很快就超过被抄的作品,这也许就是
他们的牧场游戏比我们中国人做的输出过去的同类游戏的ARPU高6倍的原因!

    国内网游开发的同行们:咱们得到的启示是什么?webgame和SNS游戏赚钱难,国内
成功者也多数很擅长数据分析的,但大型网游赚钱太容易了,很少有人抠得这么细,大
概抄一下都赚钱了,因为负责抄的人分析细节的能力有高有低,所以我们现在还停留在
靠“人”的阶段,在这方面我们是不是还有很大的提升余地呢?

号外!国内牛人超酷3D引擎超越虚幻3 (转载)

发信人: Jed (每一天都是新的,美好的!), 信区: GameIndustry
标  题: 号外!国内牛人超酷3D引擎超越虚幻3 (转载)
发信站: 水木社区 (Tue Jun  8 18:38:39 2010), 站内

【 以下文字转载自 Game 讨论区 】
发信人: acseed (advanced CS EE department), 信区: Game
标  题: 号外!国内牛人超酷3D引擎超越虚幻3 (转载)
发信站: 水木社区 (Tue Jun  8 18:33:16 2010), 站内

说来我也算是一个网游行业的高级程序员了,但这确实是我第一次在国内看到一个具有国际一流水平的3D引擎。欣喜过望,不敢独享。按作者的说法,“这个引擎整合了各种最先进的渲染技术,在画质上超越了虚幻3部分超越了CE2”,此言非虚。我看到,诸如高阶动态范围,多频率软阴影,屏幕空间环境遮掩,HDR景深,全局光照,体散射,次表面散射,动态天空,多线程渲染等曾今都只在国际一流引擎上看到的身影不仅被全部集成到 Quantumas中,而且作者还提出了多种不同程度的改进。同时我也很佩服作为一个程序员,Dex同时具 有一定美术功底,实在难能可贵。

废话少说,该有的东西都在这个demo视频中了:
优酷,中文版(分辨率较低,不推荐)
Quantumas 3D引擎演示
http://v.youku.com/v_show/id_XMTc5NTcwNTMy.html

Vimeo, 英文高清版(推荐,记得用HD+全屏)
http://www.vimeo.com/12382186
--
不许笑~~

※ 来源:・水木社区 http://newsmth.net・[FROM: 202.197.66.*]

Regular Expression Matching Can Be Simple And Fast

Regular Expression Matching Can Be Simple And Fast 
(but is slow in Java, Perl, PHP, Python, Ruby, ...)

Russ Cox 
rsc@swtch.com 
January 2007

Introduction

This is a tale of two approaches to regular expression matching. One of them is in widespread use in the standard interpreters for many languages, including Perl. The other is used only in a few places, notably most implementations of awk and grep. The two approaches have wildly different performance characteristics:

[Two graphs: Perl matching time (seconds) and Thompson NFA matching time (microseconds)]
Time to match a?^n a^n against a^n

Let's use superscripts to denote string repetition, so that a?^3 a^3 is shorthand for a?a?a?aaa. The two graphs plot the time required by each approach to match the regular expression a?^n a^n against the string a^n.

Notice that Perl requires over sixty seconds to match a 29-character string. The other approach, labeled Thompson NFA for reasons that will be explained later, requires twenty microseconds to match the string. That's not a typo. The Perl graph plots time in seconds, while the Thompson NFA graph plots time in microseconds: the Thompson NFA implementation is a million times faster than Perl when running on a minuscule 29-character string. The trends shown in the graph continue: the Thompson NFA handles a 100-character string in under 200 microseconds, while Perl would require over 10^15 years. (Perl is only the most conspicuous example of a large number of popular programs that use the same algorithm; the above graph could have been Python, or PHP, or Ruby, or many other languages. A more detailed graph later in this article presents data for other implementations.)

It may be hard to believe the graphs: perhaps you've used Perl, and it never seemed like regular expression matching was particularly slow. Most of the time, in fact, regular expression matching in Perl is fast enough. As the graph shows, though, it is possible to write so-called "pathological" regular expressions that Perl matches very very slowly. In contrast, there are no regular expressions that are pathological for the Thompson NFA implementation. Seeing the two graphs side by side prompts the question, "why doesn't Perl use the Thompson NFA approach?" It can, it should, and that's what the rest of this article is about.

Historically, regular expressions are one of computer science's shining examples of how using good theory leads to good programs. They were originally developed by theorists as a simple computational model, but Ken Thompson introduced them to programmers in his implementation of the text editor QED for CTSS. Dennis Ritchie followed suit in his own implementation of QED, for GE-TSS. Thompson and Ritchie would go on to create Unix, and they brought regular expressions with them. By the late 1970s, regular expressions were a key feature of the Unix landscape, in tools such as ed, sed, grep, egrep, awk, and lex.

Today, regular expressions have also become a shining example of how ignoring good theory leads to bad programs. The regular expression implementations used by today's popular tools are significantly slower than the ones used in many of those thirty-year-old Unix tools.

This article reviews the good theory: regular expressions, finite automata, and a regular expression search algorithm invented by Ken Thompson in the mid-1960s. It also puts the theory into practice, describing a simple implementation of Thompson's algorithm. That implementation, less than 400 lines of C, is the one that went head to head with Perl above. It outperforms the more complex real-world implementations used by Perl, Python, PCRE, and others. The article concludes with a discussion of how theory might yet be converted into practice in the real-world implementations.

Regular Expressions

Regular expressions are a notation for describing sets of character strings. When a particular string is in the set described by a regular expression, we often say that the regular expression matches the string.

The simplest regular expression is a single literal character. Except for the special metacharacters *+?()|, characters match themselves. To match a metacharacter, escape it with a backslash: \+ matches a literal plus character.

Two regular expressions can be alternated or concatenated to form a new regular expression: if e1 matches s and e2 matches t, then e1|e2 matches s or t, and e1e2 matches st.

The metacharacters *, +, and ? are repetition operators: e1* matches a sequence of zero or more (possibly different) strings, each of which match e1; e1+ matches one or more; e1? matches zero or one.

The operator precedence, from weakest to strongest binding, is first alternation, then concatenation, and finally the repetition operators. Explicit parentheses can be used to force different meanings, just as in arithmetic expressions. Some examples: ab|cd is equivalent to (ab)|(cd); ab* is equivalent to a(b*).

The syntax described so far is a subset of the traditional Unix egrep regular expression syntax. This subset suffices to describe all regular languages: loosely speaking, a regular language is a set of strings that can be matched in a single pass through the text using only a fixed amount of memory. Newer regular expression facilities (notably Perl and those that have copied it) have added many new operators and escape sequences. These additions make the regular expressions more concise, and sometimes more cryptic, but usually not more powerful: these fancy new regular expressions almost always have longer equivalents using the traditional syntax.

One common regular expression extension that does provide additional power is called backreferences. A backreference like \1 or \2 matches the string matched by a previous parenthesized expression, and only that string: (cat|dog)\1 matches catcat and dogdog but not catdog nor dogcat. As far as the theoretical term is concerned, regular expressions with backreferences are not regular expressions. The power that backreferences add comes at great cost: in the worst case, the best known implementations require exponential search algorithms, like the one Perl uses. Perl (and the other languages) could not now remove backreference support, of course, but they could employ much faster algorithms when presented with regular expressions that don't have backreferences, like the ones considered above. This article is about those faster algorithms.

Finite Automata

Another way to describe sets of character strings is with finite automata. Finite automata are also known as state machines, and we will use "automaton" and "machine" interchangeably.

As a simple example, here is a machine recognizing the set of strings matched by the regular expression a(bb)+a:

DFA for a(bb)+a

A finite automaton is always in one of its states, represented in the diagram by circles. (The numbers inside the circles are labels to make this discussion easier; they are not part of the machine's operation.) As it reads the string, it switches from state to state. This machine has two special states: the start state s0 and the matching state s4. Start states are depicted with lone arrowheads pointing at them, and matching states are drawn as a double circle.

The machine reads an input string one character at a time, following arrows corresponding to the input to move from state to state. Suppose the input string is abbbba. When the machine reads the first letter of the string, the a, it is in the start state s0. It follows the a arrow to state s1. This process repeats as the machine reads the rest of the string: b to s2, b to s3, b to s2, b to s3, and finally a to s4.

DFA execution on abbbba

The machine ends in s4, a matching state, so it matches the string. If the machine ends in a non-matching state, it does not match the string. If, at any point during the machine's execution, there is no arrow for it to follow corresponding to the current input character, the machine stops executing early.
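
For readers who prefer code to pictures, here is a small hand-written sketch (my illustration, not the article's code) of the same DFA as a transition function in C; states 0 through 4 correspond to s0 through s4 in the diagram:

#include <stdio.h>

/* delta returns the next state, or -1 if there is no arrow to follow. */
static int
delta(int state, int c)
{
	switch(state){
	case 0: return c == 'a' ? 1 : -1;
	case 1: return c == 'b' ? 2 : -1;
	case 2: return c == 'b' ? 3 : -1;
	case 3: if(c == 'b') return 2;	/* start another bb pair */
		if(c == 'a') return 4;	/* finish the match */
		return -1;
	case 4: return -1;		/* no arrows leave the matching state */
	}
	return -1;
}

int
dfamatch(char *s)
{
	int state;

	state = 0;	/* start state s0 */
	for(; *s; s++){
		state = delta(state, *s);
		if(state < 0)
			return 0;	/* stuck: stop executing early */
	}
	return state == 4;	/* matched only if we end in s4 */
}

int
main(void)
{
	printf("%d %d\n", dfamatch("abbbba"), dfamatch("abbba"));	/* prints 1 0 */
	return 0;
}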

The machine we have been considering is called a deterministic finite automaton (DFA), because in any state, each possible input letter leads to at most one new state. We can also create machines that must choose between multiple possible next states. For example, this machine is equivalent to the previous one but is not deterministic:

NFA for a(bb)+a

The machine is not deterministic because if it reads a b in state s2, it has multiple choices for the next state: it can go back to s1 in hopes of seeing another bb, or it can go on to s3 in hopes of seeing the final a. Since the machine cannot peek ahead to see the rest of the string, it has no way to know which is the correct decision. In this situation, it turns out to be interesting to let the machine always guess correctly. Such machines are called non-deterministic finite automata (NFAs or NDFAs). An NFA matches an input string if there is some way it can read the string and follow arrows to a matching state.

Sometimes it is convenient to let NFAs have arrows with no corresponding input character. We will leave these arrows unlabeled. An NFA can, at any time, choose to follow an unlabeled arrow without reading any input. This NFA is equivalent to the previous two, but the unlabeled arrow makes the correspondence with a(bb)+a clearest:

Another NFA for a(bb)+a

Converting Regular Expressions to NFAs

Regular expressions and NFAs turn out to be exactly equivalent in power: every regular expression has an equivalent NFA (they match the same strings) and vice versa. (It turns out that DFAs are also equivalent in power to NFAs and regular expressions; we will see this later.) There are multiple ways to translate regular expressions into NFAs. The method described here was first described by Thompson in his 1968 CACM paper.

The NFA for a regular expression is built up from partial NFAs for each subexpression, with a different construction for each operator. The partial NFAs have no matching states: instead they have one or more dangling arrows, pointing to nothing. The construction process will finish by connecting these arrows to a matching state.

The NFAs for matching single characters look like:

Single-character NFA

The NFA for the concatenation e1e2 connects the final arrow of the e1 machine to the start of the e2 machine:

Concatenation NFA

The NFA for the alternation e1|e2 adds a new start state with a choice of either the e1 machine or the e2 machine.

Alternation NFA

The NFA for e? alternates the e machine with an empty path:

Zero or one NFA

The NFA for e* uses the same alternation but loops a matching e machine back to the start:

Zero or more NFA

The NFA for e+ also creates a loop, but one that requires passing through e at least once:

One or more NFA

Counting the new states in the diagrams above, we can see that this technique creates exactly one state per character or metacharacter in the regular expression, excluding parentheses. Therefore the number of states in the final NFA is at most equal to the length of the original regular expression.

Just as with the example NFA discussed earlier, it is always possible to remove the unlabeled arrows, and it is also always possible to generate the NFA without the unlabeled arrows in the first place. Having the unlabeled arrows makes the NFA easier for us to read and understand, and they also make the C representation simpler, so we will keep them.

Regular Expression Search Algorithms

Now we have a way to test whether a regular expression matches a string: convert the regular expression to an NFA and then run the NFA using the string as input. Remember that NFAs are endowed with the ability to guess perfectly when faced with a choice of next state: to run the NFA using an ordinary computer, we must find a way to simulate this guessing.

One way to simulate perfect guessing is to guess one option, and if that doesn't work, try the other. For example, consider the NFA for abab|abbb run on the string abbb:

NFA for abab|abbb

Backtracking execution on abbb

At step 0, the NFA must make a choice: try to match abab or try to match abbb? In the diagram, the NFA tries abab, but that fails after step 3. The NFA then tries the other choice, leading to step 4 and eventually a match. This backtracking approach has a simple recursive implementation but can read the input string many times before succeeding. If the string does not match, the machine must try all possible execution paths before giving up. The NFA tried only two different paths in the example, but in the worst case, there can be exponentially many possible execution paths, leading to very slow run times.

A more efficient but more complicated way to simulate perfect guessing is to guess both options simultaneously. In this approach, the simulation allows the machine to be in multiple states at once. To process each letter, it advances all the states along all the arrows that match the letter.

Parallel execution on abbb

The machine starts in the start state and all the states reachable from the start state by unlabeled arrows. In steps 1 and 2, the NFA is in two states simultaneously. Only at step 3 does the state set narrow down to a single state. This multi-state approach tries both paths at the same time, reading the input only once. In the worst case, the NFA might be in every state at each step, but this results in at worst a constant amount of work independent of the length of the string, so arbitrarily large input strings can be processed in linear time. This is a dramatic improvement over the exponential time required by the backtracking approach. The efficiency comes from tracking the set of reachable states but not which paths were used to reach them. In an NFA with n nodes, there can only be n reachable states at any step, but there might be 2^n paths through the NFA.

Implementation

Thompson introduced the multiple-state simulation approach in his 1968 paper. In his formulation, the states of the NFA were represented by small machine-code sequences, and the list of possible states was just a sequence of function call instructions. In essence, Thompson compiled the regular expression into clever machine code. Forty years later, computers are much faster and the machine code approach is not as necessary. The following sections present an implementation written in portable ANSI C. The full source code (under 400 lines) and the benchmarking scripts are available online. (Readers who are unfamiliar or uncomfortable with C or pointers should feel free to read the descriptions and skip over the actual code.)

Implementation: Compiling to NFA

The first step is to compile the regular expression into an equivalent NFA. In our C program, we will represent an NFA as a linked collection of State structures:

struct State
{
	int c;
	State *out;
	State *out1;
	int lastlist;
};

Each State represents one of the following three NFA fragments, depending on the value of c.

Possible per-State NFA fragments

(Lastlist is used during execution and is explained in the next section.)
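
The code that follows also refers to two special values of c and to a singleton matching state whose definitions did not survive in this copy of the article. Below is a minimal sketch consistent with how the rest of the code uses them; the typedef lines are likewise assumptions needed by the structs shown in this section:

/* Forward typedefs assumed by the structs in this article. */
typedef struct State State;
typedef struct Frag Frag;
typedef union Ptrlist Ptrlist;

/* Values of c above any real character mark special states:
 * Match is the single matching state; Split has two unlabeled arrows. */
enum
{
	Match = 256,
	Split = 257
};

State matchstate = { Match };	/* the one matching state */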

Following Thompson's paper, the compiler builds an NFA from a regular expression in postfix notation with dot (.) added as an explicit concatenation operator. A separate function re2post rewrites infix regular expressions like "a(bb)+a" into equivalent postfix expressions like "abb.+.a.". (A "real" implementation would certainly need to use dot as the "any character" metacharacter rather than as a concatenation operator. A real implementation would also probably build the NFA during parsing rather than build an explicit postfix expression. However, the postfix version is convenient and follows Thompson's paper more closely.)
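
re2post itself is not shown in this copy of the article. The sketch below follows the approach used in the article's downloadable source: it tracks how many atoms and alternations are pending at each parenthesis level and emits the explicit . and | operators in the right places. Details such as the buffer size and nesting limit are illustrative:

#include <string.h>

/* Convert infix regexp re to postfix notation with . as an explicit
 * concatenation operator.  Returns NULL for malformed input.
 * Cheesy solution: a static buffer and a fixed paren-nesting limit. */
char*
re2post(char *re)
{
	int nalt, natom;
	static char buf[8000];
	char *dst;
	struct {
		int nalt;
		int natom;
	} paren[100], *p;

	p = paren;
	dst = buf;
	nalt = 0;
	natom = 0;
	if(strlen(re) >= sizeof buf/2)
		return NULL;
	for(; *re; re++){
		switch(*re){
		case '(':
			if(natom > 1){
				--natom;
				*dst++ = '.';
			}
			if(p >= paren+100)
				return NULL;
			p->nalt = nalt;
			p->natom = natom;
			p++;
			nalt = 0;
			natom = 0;
			break;
		case '|':
			if(natom == 0)
				return NULL;
			while(--natom > 0)
				*dst++ = '.';
			nalt++;
			break;
		case ')':
			if(p == paren || natom == 0)
				return NULL;
			while(--natom > 0)
				*dst++ = '.';
			for(; nalt > 0; nalt--)
				*dst++ = '|';
			--p;
			nalt = p->nalt;
			natom = p->natom;
			natom++;
			break;
		case '*':
		case '+':
		case '?':
			if(natom == 0)
				return NULL;
			*dst++ = *re;
			break;
		default:
			if(natom > 1){
				--natom;
				*dst++ = '.';
			}
			*dst++ = *re;
			natom++;
			break;
		}
	}
	if(p != paren)
		return NULL;
	while(--natom > 0)
		*dst++ = '.';
	for(; nalt > 0; nalt--)
		*dst++ = '|';
	*dst = 0;
	return buf;
}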

As the compiler scans the postfix expression, it maintains a stack of computed NFA fragments. Literals push new NFA fragments onto the stack, while operators pop fragments off the stack and then push a new fragment. For example, after compiling the abb in abb.+.a., the stack contains NFA fragments for a, b, and b. The compilation of the . that follows pops the two b NFA fragments from the stack and pushes an NFA fragment for the concatenation bb.. Each NFA fragment is defined by its start state and its outgoing arrows:

struct Frag
{
	State *start;
	Ptrlist *out;
};

Start points at the start state for the fragment, and out is a list of pointers to State* pointers that are not yet connected to anything. These are the dangling arrows in the NFA fragment.
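
The compilation cases below also call two small constructors, state and frag, that are missing from this copy; here are sketches in the style of the rest of the code (malloc error handling omitted, and nstate is a counter used by the driver sketch at the end of this section to size state lists):

#include <stdlib.h>

int nstate;	/* number of states allocated by the compiler */

/* Allocate and initialize a new State. */
State*
state(int c, State *out, State *out1)
{
	State *s;

	s = malloc(sizeof *s);
	s->lastlist = 0;
	s->c = c;
	s->out = out;
	s->out1 = out1;
	nstate++;
	return s;
}

/* Initialize a Frag struct. */
Frag
frag(State *start, Ptrlist *out)
{
	Frag n = { start, out };
	return n;
}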

Some helper functions manipulate pointer lists:

Ptrlist *list1(State **outp);
Ptrlist *append(Ptrlist *l1, Ptrlist *l2);
void patch(Ptrlist *l, State *s);

List1 creates a new pointer list containing the single pointer outp. Append concatenates two pointer lists, returning the result. Patch connects the dangling arrows in the pointer list l to the state s: it sets *outp = s for each pointer outp in l.
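
The pointer-list representation is not shown here either. In the article's source the trick is that the dangling State* slots themselves double as the list's links, since they do not point at anything yet; a sketch of that representation:

/* Since the out pointers in the list are always uninitialized until they
 * are patched, we reuse their storage to hold the list itself. */
union Ptrlist
{
	Ptrlist *next;
	State *s;
};

/* Create a singleton list containing just outp. */
Ptrlist*
list1(State **outp)
{
	Ptrlist *l;

	l = (Ptrlist*)outp;
	l->next = NULL;
	return l;
}

/* Patch the dangling arrows in the list l to point at state s. */
void
patch(Ptrlist *l, State *s)
{
	Ptrlist *next;

	for(; l; l = next){
		next = l->next;
		l->s = s;
	}
}

/* Join the two lists l1 and l2, returning the combination. */
Ptrlist*
append(Ptrlist *l1, Ptrlist *l2)
{
	Ptrlist *oldl1;

	oldl1 = l1;
	while(l1->next)
		l1 = l1->next;
	l1->next = l2;
	return oldl1;
}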

Given these primitives and a fragment stack, the compiler is a simple loop over the postfix expression. At the end, there is a single fragment left: patching in a matching state completes the NFA.

State*
post2nfa(char *postfix)
{
	char *p;
	Frag stack[1000], *stackp, e1, e2, e;
	State *s;

	#define push(s) *stackp++ = s
	#define pop()   *--stackp

	stackp = stack;
	for(p=postfix; *p; p++){
		switch(*p){
		/* compilation cases, described below */
		}
	}

	e = pop();
	patch(e.out, &matchstate);
	return e.start;
}

The specific compilation cases mimic the translation steps described earlier.

Literal characters:

default:
	s = state(*p, NULL, NULL);
	push(frag(s, list1(&s->out)));
	break;

Catenation:

case '.':
	e2 = pop();
	e1 = pop();
	patch(e1.out, e2.start);
	push(frag(e1.start, e2.out));
	break;

Alternation:

case '|':
	e2 = pop();
	e1 = pop();
	s = state(Split, e1.start, e2.start);
	push(frag(s, append(e1.out, e2.out)));
	break;

Zero or one:

case '?':
	e = pop();
	s = state(Split, e.start, NULL);
	push(frag(s, append(e.out, list1(&s->out1))));
	break;

Zero or more:

case '*':
	e = pop();
	s = state(Split, e.start, NULL);
	patch(e.out, s);
	push(frag(s, list1(&s->out1)));
	break;

One or more:

case '+':
	e = pop();
	s = state(Split, e.start, NULL);
	patch(e.out, s);
	push(frag(e.start, list1(&s->out1)));
	break;

Implementation: Simulating the NFA

Now that the NFA has been built, we need to simulate it. The simulation requires tracking State sets, which are stored as a simple array list:

struct List
{
	State **s;
	int n;
};

The simulation uses two lists: clist is the current set of states that the NFA is in, and nlist is the next set of states that the NFA will be in, after processing the current character. The execution loop initializes clist to contain just the start state and then runs the machine one step at a time.

int
match(State *start, char *s)
{
	List *clist, *nlist, *t;

	/* l1 and l2 are preallocated globals */
	clist = startlist(start, &l1);
	nlist = &l2;
	for(; *s; s++){
		step(clist, *s, nlist);
		t = clist; clist = nlist; nlist = t;	/* swap clist, nlist */
	}
	return ismatch(clist);
}

To avoid allocating on every iteration of the loop, match uses two preallocated lists l1 and l2 as clist and nlist, swapping the two after each step.

If the final state list contains the matching state, then the string matches.

int
ismatch(List *l)
{
	int i;

	for(i=0; i<l->n; i++)
		if(l->s[i] == &matchstate)
			return 1;
	return 0;
}

Addstate adds a state to the list, but not if it is already on the list. Scanning the entire list for each add would be inefficient; instead the variable listid acts as a list generation number. When addstate adds s to a list, it records listid in s->lastlist. If the two are already equal, then s is already on the list being built. Addstate also follows unlabeled arrows: if s is a Split state with two unlabeled arrows to new states, addstate adds those states to the list instead of s.

void
addstate(List *l, State *s)
{
	if(s == NULL || s->lastlist == listid)
		return;
	s->lastlist = listid;
	if(s->c == Split){
		/* follow unlabeled arrows */
		addstate(l, s->out);
		addstate(l, s->out1);
		return;
	}
	l->s[l->n++] = s;
}

Startlist creates an initial state list by adding just the start state:

List*
startlist(State *s, List *l)
{
	listid++;
	l->n = 0;
	addstate(l, s);
	return l;
}

Finally, step advances the NFA past a single character, using the current list clist to compute the next list nlist.

void
step(List *clist, int c, List *nlist)
{
	int i;
	State *s;

	listid++;
	nlist->n = 0;
	for(i=0; i<clist->n; i++){
		s = clist->s[i];
		if(s->c == c)
			addstate(nlist, s->out);
	}
}
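
A small driver ties the pieces together. This sketch follows the shape of the article's downloadable source; the globals below are the ones the functions above already assume (the full source declares them before the functions that use them), and re2post is the conversion function described earlier:

#include <stdio.h>
#include <stdlib.h>

List l1, l2;	/* preallocated lists swapped by match() */
int listid;	/* list generation number used by addstate(), startlist(), step() */

int
main(int argc, char **argv)
{
	int i;
	char *post;
	State *start;

	if(argc < 3){
		fprintf(stderr, "usage: nfa regexp string...\n");
		return 1;
	}

	post = re2post(argv[1]);		/* e.g. "a(bb)+a" -> "abb.+.a." */
	if(post == NULL){
		fprintf(stderr, "bad regexp %s\n", argv[1]);
		return 1;
	}

	start = post2nfa(post);			/* compile to an NFA */

	/* a state list can never hold more than one entry per NFA state */
	l1.s = malloc((nstate+1)*sizeof l1.s[0]);
	l2.s = malloc((nstate+1)*sizeof l2.s[0]);

	for(i=2; i<argc; i++)
		if(match(start, argv[i]))	/* run the NFA simulation */
			printf("%s\n", argv[i]);
	return 0;
}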

Performance

The C implementation just described was not written with performance in mind. Even so, a slow implementation of a linear-time algorithm can easily outperform a fast implementation of an exponential-time algorithm once the exponent is large enough. Testing a variety of popular regular expression engines on a so-called pathological regular expression demonstrates this nicely.

Consider the regular expression a?^n a^n. It matches the string a^n when the a? are chosen not to match any letters, leaving the entire string to be matched by the a^n. Backtracking regular expression implementations implement the zero-or-one ? by first trying one and then zero. There are n such choices to make, a total of 2^n possibilities. Only the very last possibility (choosing zero for all the ?) will lead to a match. The backtracking approach thus requires O(2^n) time, so it will not scale much beyond n=25.

In contrast, Thompson's algorithm maintains state lists of length approximately n and processes the string, also of length n, for a total of O(n^2) time. (The run time is superlinear, because we are not keeping the regular expression constant as the input grows. For a regular expression of length m run on text of length n, the Thompson NFA requires O(mn) time.)

The following graph plots time required to check whether a?^n a^n matches a^n:

Performance graph 
regular expression and text size n 
a?^n a^n matching a^n

Notice that the graph's y-axis has a logarithmic scale, in order to be able to see a wide variety of times on a single graph.

From the graph it is clear that Perl, PCRE, Python, and Ruby are all using recursive backtracking. PCRE stops getting the right answer at n=23, because it aborts the recursive backtracking after a maximum number of steps. As of Perl 5.6, Perl's regular expression engine is said to memoize the recursive backtracking search, which should, at some memory cost, keep the search from taking exponential amounts of time unless backreferences are being used. As the performance graph shows, the memoization is not complete: Perl's run time grows exponentially even though there are no backreferences in the expression. Although not benchmarked here, Java uses a backtracking implementation too. In fact, the java.util.regex interface requires a backtracking implementation, because arbitrary Java code can be substituted into the matching path. PHP uses the PCRE library.

The thick blue line is the C implementation of Thompson's algorithm given above. Awk, Tcl, GNU grep, and GNU awk build DFAs, either precomputing them or using the on-the-fly construction described in the next section.

Some might argue that this test is unfair to the backtracking implementations, since it focuses on an uncommon corner case. This argument misses the point: given a choice between an implementation with a predictable, consistent, fast running time on all inputs or one that usually runs quickly but can take years of CPU time (or more) on some inputs, the decision should be easy. Also, while examples as dramatic as this one rarely occur in practice, less dramatic ones do occur. Examples include using (.*) (.*) (.*) (.*) (.*) to split five space-separated fields, or using alternations where the common cases are not listed first. As a result, programmers often learn which constructs are expensive and avoid them, or they turn to so-called optimizers. Using Thompson's NFA simulation does not require such adaptation: there are no expensive regular expressions.

Caching the NFA to build a DFA

Recall that DFAs are more efficient to execute than NFAs, because DFAs are only ever in one state at a time: they never have a choice of multiple next states. Any NFA can be converted into an equivalent DFA in which each DFA state corresponds to a list of NFA states.

For example, here is the NFA we used earlier for abab|abbb, with state numbers added:

NFA for abab|abbb

The equivalent DFA would be:

DFA for abab|abbb

Each state in the DFA corresponds to a list of states from the NFA.

In a sense, Thompson's NFA simulation is executing the equivalent DFA: each List corresponds to some DFA state, and the step function is computing, given a list and a next character, the next DFA state to enter. Thompson's algorithm simulates the DFA by reconstructing each DFA state as it is needed. Rather than throw away this work after each step, we could cache the Lists in spare memory, avoiding the cost of repeating the computation in the future and essentially computing the equivalent DFA as it is needed. This section presents the implementation of such an approach. Starting with the NFA implementation from the previous section, we need to add less than 100 lines to build a DFA implementation.

To implement the cache, we first introduce a new data type that represents a DFA state:

struct DState
{
	List l;
	DState *next[256];
	DState *left;
	DState *right;
};

DState is the cached copy of the list l. The array next contains pointers to the next state for each possible input character: if the current state is d and the next input character is c, then d->next[c] is the next state. If d->next[c] is null, then the next state has not been computed yet. Nextstate computes, records, and returns the next state for a given state and character.

The regular expression match follows d->next[c] repeatedly, calling nextstate to compute new states as needed.

int
match(DState *start, char *s)
{
	int c;
	DState *d, *next;

	d = start;
	for(; *s; s++){
		c = *s & 0xFF;
		if((next = d->next[c]) == NULL)
			next = nextstate(d, c);
		d = next;
	}
	return ismatch(&d->l);
}

All the DStates that have been computed need to be saved in a structure that lets us look up a DState by its List. To do this, we arrange them in a binary tree using the sorted List as the key. The dstate function returns the DState for a given List, allocating one if necessary:

DState*
dstate(List *l)
{
	int i;
	DState **dp, *d;
	static DState *alldstates;

	qsort(l->s, l->n, sizeof l->s[0], ptrcmp);

	/* look in tree for existing DState */
	dp = &alldstates;
	while((d = *dp) != NULL){
		i = listcmp(l, &d->l);
		if(i < 0)
			dp = &d->left;
		else if(i > 0)
			dp = &d->right;
		else
			return d;
	}

	/* allocate, initialize new DState */
	d = malloc(sizeof *d + l->n*sizeof l->s[0]);
	memset(d, 0, sizeof *d);
	d->l.s = (State**)(d+1);
	memmove(d->l.s, l->s, l->n*sizeof l->s[0]);
	d->l.n = l->n;

	/* insert in tree */
	*dp = d;
	return d;
}

Nextstate runs the NFA step and returns the corresponding DState:

DState*
nextstate(DState *d, int c)
{
	step(&d->l, c, &l1);
	return d->next[c] = dstate(&l1);
}

Finally, the DFA's start state is the DState corresponding to the NFA's start list:

DState*
startdstate(State *start)
{
	return dstate(startlist(start, &l1));
}

(As in the NFA simulation, l1 is a preallocated List.)

The DStates correspond to DFA states, but the DFA is only built as needed: if a DFA state has not been encountered during the search, it does not yet exist in the cache. An alternative would be to compute the entire DFA at once. Doing so would make match a little faster by removing the conditional branch, but at the cost of increased startup time and memory use.

One might also worry about bounding the amount of memory used by the on-the-fly DFA construction. Since the DStates are only a cache of the step function, the implementation of dstate could choose to throw away the entire DFA so far if the cache grew too large. This cache replacement policy only requires a few extra lines of code in dstate and in nextstate, plus around 50 lines of code for memory management. An implementation is available online. (Awk uses a similar limited-size cache strategy, with a fixed limit of 32 cached states; this explains the discontinuity in its performance at n=28 in the graph above.)

NFAs derived from regular expressions tend to exhibit good locality: they visit the same states and follow the same transition arrows over and over when run on most texts. This makes the caching worthwhile: the first time an arrow is followed, the next state must be computed as in the NFA simulation, but future traversals of the arrow are just a single memory access. Real DFA-based implementations can make use of additional optimizations to run even faster. A companion article (not yet written) will explore DFA-based regular expression implementations in more detail.

Real world regular expressions

Regular expression usage in real programs is somewhat more complicated than what the regular expression implementations described above can handle. This section briefly describes the common complications; full treatment of any of these is beyond the scope of this introductory article.

Character classes. A character class, whether [0-9] or \w or . (dot), is just a concise representation of an alternation. Character classes can be expanded into alternations during compilation, though it is more efficient to add a new kind of NFA node to represent them explicitly, as sketched below. POSIX defines special character classes like [[:upper:]] that change meaning depending on the current locale, but the hard part of accommodating these is determining their meaning, not encoding that meaning into an NFA.
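
As a concrete illustration of the "new kind of NFA node" option (this is my sketch, not the article's code), a class can be stored as a 256-bit membership set attached to the state, with one extra test in step:

enum { CharClass = 258 };	/* another special value of State.c */

typedef struct CClass CClass;
struct CClass
{
	unsigned char bits[256/8];	/* one bit per possible byte value */
};

/* Test whether byte c is a member of the class. */
int
inclass(CClass *cc, int c)
{
	return (cc->bits[c/8] >> (c%8)) & 1;
}

/* step() would then treat a class state like a literal state:
 *
 *	if(s->c == c || (s->c == CharClass && inclass(s->cc, c)))
 *		addstate(nlist, s->out);
 *
 * where s->cc is a CClass* field added to struct State for class states. */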

Escape sequences. Real regular expression syntaxes need to handle escape sequences, both as a way to match metacharacters (\(, \), \\, etc.) and to specify otherwise difficult-to-type characters such as \n.

Counted repetition. Many regular expression implementations provide a counted repetition operator {n} to match exactly n strings matching a pattern; {n,m} to match at least n but no more than m; and {n,} to match n or more. A recursive backtracking implementation can implement counted repetition using a loop; an NFA or DFA-based implementation must expand the repetition: e{3} expands to eee; e{3,5} expands to eeee?e?; and e{3,} expands to eee+.
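
A small helper (again, an illustration rather than the article's code) makes the expansion rule concrete; e is the already-parenthesized unit being repeated, and m < 0 stands for the open-ended form {n,}:

#include <string.h>

/* Expand e{n,m} into traditional syntax.  The caller supplies a buffer
 * large enough to hold the result. */
void
expandrep(const char *e, int n, int m, char *buf)
{
	int i;

	buf[0] = '\0';
	if(m < 0){			/* e{n,}: n copies, trailing '+' on the last */
		for(i = 0; i < n; i++)
			strcat(buf, e);
		strcat(buf, "+");
	}else{				/* e{n,m}: n required copies, then m-n optional ones */
		for(i = 0; i < n; i++)
			strcat(buf, e);
		for(i = n; i < m; i++){
			strcat(buf, e);
			strcat(buf, "?");
		}
	}
}

/* expandrep("e", 3, 3, buf)  leaves "eee" in buf;
 * expandrep("e", 3, 5, buf)  leaves "eeee?e?";
 * expandrep("e", 3, -1, buf) leaves "eee+". */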

Submatch extraction. When regular expressions are used for splitting or parsing strings, it is useful to be able to find out which sections of the input string were matched by each subexpression. After a regular expression like ([0-9]+-[0-9]+-[0-9]+) ([0-9]+:[0-9]+) matches a string (say a date and time), many regular expression engines make the text matched by each parenthesized expression available. For example, one might write in Perl:

if(/([0-9]+-[0-9]+-[0-9]+) ([0-9]+:[0-9]+)/){
	print "date: $1, time: $2\n";
}

The extraction of submatch boundaries has been mostly ignored by computer science theorists, and it is perhaps the most compelling argument for using recursive backtracking. However, Thompson-style algorithms can be adapted to track submatch boundaries without giving up efficient performance. The Eighth Edition Unix regexp(3) library implemented such an algorithm as early as 1985, though as explained below, it was not very widely used or even noticed.

Unanchored matches. This article has assumed that regular expressions are matched against an entire input string. In practice, one often wishes to find a substring of the input that matches the regular expression. Unix tools traditionally return the longest matching substring that starts at the leftmost possible point in the input. An unanchored search for e is a special case of submatch extraction: it is like searching for .*(e).* where the first .* is constrained to match as short a string as possible.

Non-greedy operators. In traditional Unix regular expressions, the repetition operators ?, *, and + are defined to match as much of the string as possible while still allowing the entire regular expression to match: when matching (.+)(.+) against abcd, the first (.+) will match abc, and the second will match d. These operators are now called greedy. Perl introduced ??, *?, and +? as non-greedy versions, which match as little of the string as possible while preserving the overall match: when matching (.+?)(.+?) against abcd, the first (.+?) will match only a, and the second will match bcd. By definition, whether an operator is greedy cannot affect whether a regular expression matches a particular string as a whole; it only affects the choice of submatch boundaries. The backtracking algorithm admits a simple implementation of non-greedy operators: try the shorter match before the longer one. For example, in a standard backtracking implementation, e? first tries using e and then tries not using it; e?? uses the other order. The submatch-tracking variants of Thompson's algorithm can be adapted to accommodate non-greedy operators.

Assertions. The traditional regular expression metacharacters ^ and $ can be viewed as assertions about the text around them: ^ asserts that the previous character is a newline (or the beginning of the string), while $ asserts that the next character is a newline (or the end of the string). Perl added more assertions, like the word boundary \b, which asserts that the previous character is alphanumeric but the next is not, or vice versa. Perl also generalized the idea to arbitrary conditions called lookahead assertions: (?=re) asserts that the text after the current input position matches re, but does not actually advance the input position; (?!re) is similar but asserts that the text does not match re. The lookbehind assertions (?<=re) and (?<!re) are similar but make assertions about the text before the current input position. Simple assertions like ^, $, and \b are easy to accommodate in an NFA, delaying the match one byte for forward assertions. The generalized assertions are harder to accommodate but in principle could be encoded in the NFA.

Backreferences. As mentioned earlier, no one knows how to implement regular expressions with backreferences efficiently, though no one can prove that it's impossible either. (Specifically, the problem is NP-complete, meaning that if someone did find an efficient implementation, that would be major news to computer scientists and would win a million dollar prize.) The simplest, most effective strategy for backreferences, taken by the original awk and egrep, is not to implement them. This strategy is no longer practical: users have come to rely on backreferences for at least occasional use, and backreferences are part of the POSIX standard for regular expressions. Even so, it would be reasonable to use Thompson's NFA simulation for most regular expressions, and only bring out backtracking when it is needed. A particularly clever implementation could combine the two, resorting to backtracking only to accommodate the backreferences.

Backtracking with memoization. Perl's approach of using memoization to avoid exponential blowup during backtracking when possible is a good one. At least in theory, it should make Perl's regular expressions behave more like an NFA and less like backtracking. Memoization does not completely solve the problem, though: the memoization itself requires a memory footprint roughly equal to the size of the text times the size of the regular expression. Memoization also does not address the issue of the stack space used by backtracking, which is linear in the size of the text: matching long strings typically causes a backtracking implementation to run out of stack space:

$ perl -e '("a" x 100000) =~ /^(ab?)*$/;'
Segmentation fault (core dumped)
$
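
To make the trade-off concrete, here is a toy memoized backtracking matcher (an illustrative sketch, not any real engine) handling only literal characters, '.', and '*'. The memo is keyed by (position in the regular expression, position in the text), so it can hold up to len(regexp) * len(text) entries, which is exactly the footprint described above; in exchange, each position pair is evaluated at most once, so the classic exponential cases stay fast.

    # Toy memoized backtracking matcher: literals, '.', and '*' only.
    def match(regexp, text):
        memo = {}

        def m(i, j):
            # does regexp[i:] match text[j:] exactly?
            if (i, j) in memo:
                return memo[(i, j)]
            if i == len(regexp):
                ok = (j == len(text))
            elif i + 1 < len(regexp) and regexp[i + 1] == '*':
                first = j < len(text) and regexp[i] in ('.', text[j])
                # match zero occurrences, or consume one and stay on the star
                ok = m(i + 2, j) or (first and m(i, j + 1))
            else:
                first = j < len(text) and regexp[i] in ('.', text[j])
                ok = first and m(i + 1, j + 1)
            memo[(i, j)] = ok
            return ok

        return m(0, 0)

    # Without the memo this is a classic exponential case; with it, the
    # work is bounded by len(regexp) * len(text).  Note the recursion depth
    # still grows with the text, mirroring the stack-space problem that
    # memoization does not solve.
    print(match('a*a*a*a*a*b', 'a' * 30))    # False, and quickly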

Character sets. Modern regular expression implementations must deal with large non-ASCII character sets such as Unicode. The Plan 9 regular expression library incorporates Unicode by running an NFA with a single Unicode character as the input character for each step. That library separates the running of the NFA from decoding the input, so that the same regular expression matching code is used for both UTF-8 and wide-character inputs.
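
That separation can be sketched in a few lines. In the sketch below, step and accepting are hypothetical placeholders for an NFA engine; the point is only that the matching loop consumes one code point per iteration, so the same loop serves UTF-8 input and already-decoded wide-character input alike.

    def code_points_from_utf8(data):
        # decode UTF-8 bytes into a stream of Unicode code points
        for ch in data.decode('utf-8'):
            yield ord(ch)

    def run(start, step, accepting, code_points):
        states = start
        for c in code_points:
            states = step(states, c)      # one Unicode character per NFA step
        return accepting(states)

    # For wide-character (already decoded) input, the same loop applies:
    #   run(start, step, accepting, (ord(ch) for ch in some_string))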

History and References

Michael Rabin and Dana Scott introduced non-deterministic finite automata and the concept of non-determinism in 1959 [7], showing that NFAs can be simulated by (potentially much larger) DFAs in which each DFA state corresponds to a set of NFA states. (They won the Turing Award in 1976 for the introduction of the concept of non-determinism in that paper.)

R. McNaughton and H. Yamada [4] and Ken Thompson [9] are commonly credited with giving the first constructions to convert regular expressions into NFAs, even though neither paper mentions the then-nascent concept of an NFA. McNaughton and Yamada's construction creates a DFA, and Thompson's construction creates IBM 7094 machine code, but reading between the lines one can see latent NFA constructions underlying both. Regular expression to NFA constructions differ only in how they encode the choices that the NFA must make. The approach used above, mimicking Thompson, encodes the choices with explicit choice nodes (the Split nodes above) and unlabeled arrows. An alternative approach, the one most commonly credited to McNaughton and Yamada, is to avoid unlabeled arrows, instead allowing NFA states to have multiple outgoing arrows with the same label. McIlroy [3] gives a particularly elegant implementation of this approach in Haskell.

Thompson's regular expression implementation was for his QED editor running on the CTSS [10] operating system on the IBM 7094. A copy of the editor can be found in archived CTSS sources [5]. L. Peter Deutsch and Butler Lampson [1] developed the first QED, but Thompson's reimplementation was the first to use regular expressions. Dennis Ritchie, author of yet another QED implementation, has documented the early history of the QED editor [8]. (Thompson, Ritchie, and Lampson later won Turing awards for work unrelated to QED or finite automata.)

Thompson's paper marked the beginning of a long line of regular expression implementations. Thompson chose not to use his algorithm when implementing the text editor ed, which appeared in First Edition Unix (1971), or in its descendant grep, which first appeared in the Fourth Edition (1973). Instead, these venerable Unix tools used recursive backtracking! Backtracking was justifiable because the regular expression syntax was quite limited: it omitted grouping parentheses and the |, ?, and + operators. Al Aho's egrep, which first appeared in the Seventh Edition (1979), was the first Unix tool to provide the full regular expression syntax, using a precomputed DFA. By the Eighth Edition (1985), egrep computed the DFA on the fly, like the implementation given above.

While writing the text editor sam [6] in the early 1980s, Rob Pike wrote a new regular expression implementation, which Dave Presotto extracted into a library that appeared in the Eighth Edition. Pike's implementation incorporated submatch tracking into an efficient NFA simulation but, like the rest of the Eighth Edition source, was not widely distributed. Pike himself did not realize that his technique was anything new. Henry Spencer reimplemented the Eighth Edition library interface from scratch, but using backtracking, and released his implementation into the public domain. It became very widely used, eventually serving as the basis for the slow regular expression implementations mentioned earlier: Perl, PCRE, Python, and so on. (In his defense, Spencer knew the routines could be slow, and he didn't know that a more efficient algorithm existed. He even warned in the documentation, "Many users have found the speed perfectly adequate, although replacing the insides of egrep with this code would be a mistake.") Pike's regular expression implementation, extended to support Unicode, was made freely available with sam in late 1992, but the particularly efficient regular expression search algorithm went unnoticed. The code is now available in many forms: as part of sam, as Plan 9's regular expression library, or packaged separately for Unix. Ville Laurikari independently discovered Pike's algorithm in 1999, developing a theoretical foundation as well [2].

Finally, any discussion of regular expressions would be incomplete without mentioning Jeffrey Friedl's book Mastering Regular Expressions, perhaps the most popular reference among today's programmers. Friedl's book teaches programmers how best to use today's regular expression implementations, but not how best to implement them. What little text it devotes to implementation issues perpetuates the widespread belief that recursive backtracking is the only way to simulate an NFA. Friedl makes it clear that he neither understands nor respects the underlying theory.

Summary

Regular expression matching can be simple and fast, using finite automata-based techniques that have been known for decades. In contrast, Perl, PCRE, Python, Ruby, Java, and many other languages have regular expression implementations based on recursive backtracking that are simple but can be excruciatingly slow. With the exception of backreferences, the features provided by the slow backtracking implementations can be provided by the automata-based implementations at dramatically faster, more consistent speeds.

The next article in this series, "Regular Expression Matching: the Virtual Machine Approach," discusses NFA-based submatch extraction. The final article, "Regular Expression Matching in the Wild," examines a production implementation.

Acknowledgements

Lee Feigenbaum, James Grimmelmann, Alex Healy, William Josephson, and Arnold Robbins read drafts of this article and made many helpful suggestions. Rob Pike clarified some of the history surrounding his regular expression implementation. Thanks to all.

References

[1] L. Peter Deutsch and Butler Lampson, "An online editor," Communications of the ACM 10(12) (December 1967), pp. 793–799. http://doi.acm.org/10.1145/363848.363863

[2] Ville Laurikari, "NFAs with Tagged Transitions, their Conversion to Deterministic Automata and Application to Regular Expressions," in Proceedings of the Symposium on String Processing and Information Retrieval, September 2000. http://laurikari.net/ville/spire2000-tnfa.ps

[3] M. Douglas McIlroy, "Enumerating the strings of regular languages," Journal of Functional Programming 14 (2004), pp. 503–518. http://www.cs.dartmouth.edu/~doug/nfa.ps.gz (preprint)

[4] R. McNaughton and H. Yamada, "Regular expressions and state graphs for automata," IRE Transactions on Electronic Computers EC-9(1) (March 1960), pp. 39–47.

[5] Paul Pierce, "CTSS source listings." http://www.piercefuller.com/library/ctss.html (Thompson's QED is in the file com5 in the source listings archive and is marked as 0QED)

[6] Rob Pike, "The text editor sam," Software—Practice & Experience 17(11) (November 1987), pp. 813–845. http://plan9.bell-labs.com/sys/doc/sam/sam.html

[7] Michael Rabin and Dana Scott, "Finite automata and their decision problems," IBM Journal of Research and Development 3 (1959), pp. 114–125. http://www.research.ibm.com/journal/rd/032/ibmrd0302C.pdf

[8] Dennis Ritchie, "An incomplete history of the QED text editor." http://plan9.bell-labs.com/~dmr/qed.html

[9] Ken Thompson, "Regular expression search algorithm," Communications of the ACM 11(6) (June 1968), pp. 419–422. http://doi.acm.org/10.1145/363347.363387 (PDF)

[10] Tom Van Vleck, "The IBM 7094 and CTSS." http://www.multicians.org/thvv/7094.html



Copyright © 2007 Russ Cox. All Rights Reserved. 
http://swtch.com/~rsc/regexp/