1. Center Overview

1.1. Chinese

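The Big Data-Driven Knowledge Management and Decision-Making Team of the Trustworthy Software and Big Data Research Division, Beijing National Research Center for Information Science and Technology, Tsinghua University (formerly the WEB and Software Technology R&D Center of the Research Institute of Information Technology, Tsinghua University), aims to solve research problems in data science and engineering that arise in China's national informatization effort. Its focus is on "chokepoint" problems, namely the development of domestic database systems capable of supporting big data and their core key technologies. Guided by the national big data development strategy and with an emphasis on sustainable development, the team actively pursues industry-academia-research exchange and cooperation in China and abroad, develops domestic database systems and big data support platforms with independent intellectual property rights, and strives to become a leader in the core technologies and support software for the digital transformation of related national industries.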

The team's research covers two directions.

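  1. Domestic database systems and core database technologies: the team proposed a sparse-index-based incremental compression method for column storage and developed Huabase, China's first columnar database. It proposed an adaptive physical optimization method for hybrid row-column storage that reaches an internationally advanced level, speeding up similarity queries by more than 2x and reducing the storage overhead of approximate queries by more than an order of magnitude; the results were published in TKDE and ICDE, both CCF (China Computer Federation) Class A venues. Commissioned by the Ministry of Railways, the team evaluated the bidder IBM's ticketing system based on its Z series, distilled its core high-load and low-latency techniques into a prototype, and thereby provided core support for the in-house development of the 12306 ticketing system. Jointly with the Shanghai Stock Exchange, it proposed a low-latency scheme based on collapsible Multi-Paxos that reduces order replication latency to the order of a hundred microseconds. It also proposed the CrowdMed-II blockchain architecture and the PVR and PPVR smart contracts, which support trustworthy blockchain-based sharing of medical and health data; whereas the Gas consumption of the baseline PPR grows linearly, theirs is nearly constant. This work received Student Best Paper Runner-up awards at ICSH and WAIM-APWeb, and 6 related invention patents have been filed.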

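  2. Domain-oriented big data management and analysis: for data- and knowledge-intensive key application domains, including healthcare, e-government, digital libraries, and online education, the team builds on general-purpose big data management to provide configurable, easily extensible data management and analysis functions tailored to each domain's data characteristics and service requirements. Specifically, its work on medical knowledge graph construction is at an internationally leading level, and case studies building medical fact graphs for two specific diseases, cardiovascular disease and knee osteoarthritis, validated the effectiveness of its construction framework; the results were published in IPM, a Zone 1 journal in the Chinese Academy of Sciences ranking. The team supported the construction of Anzhen Hospital's cardiovascular disease big data management and analysis platform, which holds 130 TB of data covering a regional population of over 11 million, more than 20 million visit records, and over 160 medical institutions. In regional collaborative emergency care, it shortened the average D2B (door-to-balloon) time at the participating medical institutions from 110 minutes to under 60 minutes, and the system is now deployed in 86 medical institutions of various levels. It proposed the EAID-PKI security model based on electronic archive identity cards, which improves on traditional models in both security and evidential value. It took part in drafting national standards and specifications, proposing a catalog and exchange system and a government informatization architecture, and standardizing the common requirements for key stages in the management and service of unstructured data, which makes up the largest share of data in the sharing and exchange domain. Finally, at the level of shared infrastructure and hardware, it proposed a high-performance secure storage method that resulted in a trusted solid-state drive solution whose performance impact stays within 3% of total runtime overhead.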

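The team has led or participated in more than 20 national and provincial/ministerial research projects, including the National Key R&D Program of China, the New Generation Artificial Intelligence Major Project, the National 973 Program, the National 863 Program, the National Key Technology R&D Program, the National Natural Science Foundation of China, the China Next Generation Internet (CNGI) demonstration project, and Ministry of Railways funds. It has published more than 200 papers, 15 of them in top journals and conferences, holds more than 20 invention patents, and has published 7 monographs. As Tsinghua's only recommended project, it was selected as a 2020 Big Data Industry Development Pilot Demonstration Project by the Ministry of Industry and Information Technology. It won the Second Prize of the 2020 China Industry-University-Research Cooperation Innovation Achievement Award. Part of its work is included in "Integrated Innovation of Blockchain and Big Data" in the Technology Innovation section of the "2020 China Blockchain Application Development Blue Book", edited by Ye Zhenzhen, Party Secretary, Chairman, and President of People's Daily Online. The executive editor, Pan Jian, Vice President of People's Daily Online, cited it as an example in the general report: "In 2019, many research institutions and industrial enterprises in China explored ways to integrate blockchain with big data, the Internet of Things, artificial intelligence, 5G, and other technologies. For example, the 2861 Internet Perception Big Data System, launched by the Research Institute of Information Technology of Tsinghua University together with partner enterprises, combines big data with blockchain technology; based on the distributed sharing of Internet perception data and trusted blockchain data, it helps form an automated monitoring system that is objective, accurate, timely, and free of human intervention."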

1.2. English

The PURPOSE of the Web and Software Technology (WeST) R&D Center is to provide technical support, in line with national strategic requirements and national economic development, for informatization software projects and related industries that are significant to the national economy and to the sustainable development of society.

The center's STRATEGY is to serve the construction of China's informatization infrastructure and the development of its information industry, emphasize sustainable development, communicate and collaborate with international partners, and develop large-scale support software and systems with independent intellectual property rights for China's informatization.

WeST hosts ten research labs, including the Digital Library/Archives Lab, Smart Traffic Lab of IoT, Big Data and Cloud Platform Lab, Electronic Commerce and Transaction Systems Lab, Defense Middleware Technologies Lab, Data Engineering Lab, Semantic Web and Knowledge Engineering Lab, and Software Engineering and Testing Lab.

WeST currently conducts research in five areas: (1) key technologies of massive data storage systems in complex application environments; (2) key technologies of domain-specific big data management and analysis; (3) key technologies of E-Commerce and E-Government that support large-scale concurrent transactions; (4) service platforms that support data-driven and knowledge engineering; and (5) a defense data support platform based on a distributed file system and service middleware.

The core achievement of WeST is the HUADING Big Data Management and Analysis Platform, which consists of four parts: (1) HUADING-C, which focuses on structured data management based on column storage; (2) HUADING-U, which focuses on unstructured data management based on data tags; (3) HUADING-S, which focuses on cluster-based distributed file management; and (4) HUADING-K, which provides knowledge management and large-scale parallel data mining and analysis.

WeST's application areas include Digital Archives/Libraries, Electronic Government, Electronic Commerce, Cloud Computing, Smart Grid, Internet Public Opinion Analysis, the Internet of Things (IoT), Smart Traffic, and Digital Medicine & Health. The WeST R&D Center currently has research and development (R&D) cooperation with more than 30 enterprises and institutions in China and abroad.

WeST emphasizes cross-disciplinary research and integration, and actively carries out R&D cooperation with domestic and foreign institutions and enterprises. It has led or participated in many national and provincial research projects and has produced a number of outstanding R&D results. The national research projects it has undertaken include the National Basic Research Program of China (973 Program), the National High Technology Research and Development Program of China (863 Program), the National Key Technology R&D Program, the National Natural Science Foundation of China, the China Next Generation Internet Program, and the Foundation of the Ministry of Railways.

2. Research and Development

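The Web and Software Technology R&D Center has built a 12-server "cloud computing" environment. It hosts 20 TB of massive raw data, mostly unstructured (such as electronic archives), from a previously completed 863 research project, together with about 30 TB of multimedia digital resources on digital cities from a 973 research project currently under way in the lab.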

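The research experience the Web and Software Technology R&D Center has accumulated in data-intensive computing, cloud computing, and e-government that is relevant to this project is summarized as follows: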

Introduction (last edited 2024-10-10 06:57:27 by ZhangYong)