Advances in photonics, flexible electronics, and emerging memories, together with the integration of Si electronics with these devices, have enabled new classes of integrated circuits and systems with enhanced functionality, higher performance, or lower power consumption. Driving greater integration of such heterogeneous hybrid chips and systems can facilitate the continued proliferation of low-cost micro- and nano-systems for a wide range of applications. However, achieving large-scale integration will require a design ecosystem and design automation tools and methodologies much like those that enabled electronic integration in previous decades.
In this talk, I will briefly introduce two recent Manufacturing Innovation Institutes, on Integrated Photonics and on Flexible Hybrid Electronics respectively, and a research center on developing 3D Hybrid CMOS-memristor circuits, which bring together academia, industry, and government partners to increase design and manufacturing competitiveness in these areas. I will then describe some of our recent results and highlight the needs, challenges and opportunities in these areas.
Multicores have been attracting much attention as a way to improve performance and reduce power consumption in computing systems ranging from embedded devices to supercomputers. To obtain high performance and low power on multicores, co-design of hardware and software is essential. Architectural support for parallelizing and power-reducing compilers is especially important. This talk first introduces a compiler that parallelizes code, optimizes memory usage, and reduces power, and reports its performance on various multicores from Intel, IBM, Arm, Fujitsu, Infineon, and Renesas for applications including multimedia, automobiles, cancer treatment, and earthquake simulation. It then explains the architectural support for the compiler, such as a global address space, data and group barrier synchronization, vector accelerators, data transfer controllers, and power control using DVFS and clock and power gating. This hardware and software co-design yields not only high performance and low power but also a short development period and low development cost for parallel software.
Place and route at advanced process nodes is becoming far more complicated than ever: meeting both timing and DRC closure requires multiple core engines to work together seamlessly. Traditionally, it has been very hard to decide partway through the flow whether a solution will ultimately work; machine learning opens the door to better-correlated results across the flow. We have seen a consistent trend of improvement in our product development.
Power density will stay a major challenge for the foreseeable future. Despite orders-of-magnitude-improved efficiency, power consumption per area is rising, mainly due to the limits of voltage scaling. To investigate the physical implications of high power densities, we must distinguish between peak and average temperatures and temporal and spatial thermal gradients because they trigger circuit-aging mechanisms and eventually jeopardize the reliability of an on-chip system.
The talk starts by presenting some basic interdependencies in the triangle of power density, circuit aging, and reliability. It continues with solutions that mitigate the problem, among them power density-aware resource management, Thermal Safe Power (TSP), efficient power budgeting, and “Aging Aware Boosting”.
Self-awareness has a long history in biology, psychology, medicine, engineering, and (more recently) computing. In the past decade this has inspired new self-aware strategies for emerging computing substrates (e.g., complex heterogeneous MPSoCs) that must cope with the (often conflicting) challenges of resiliency, energy, heat, cost, performance, security, etc. in the face of highly dynamic operational behaviors and environmental conditions. Earlier we championed the concept of CyberPhysical-Systems-on-Chip (CPSoC), a new class of sensor-actuator-rich many-core computing platforms that intrinsically couple on-chip and cross-layer sensing and actuation to enable self-awareness. Unlike traditional MPSoCs, CPSoC is distinguished by an intelligent co-design of the control, communication, and computing (C3) system that interacts with the physical environment in real time in order to modify the system’s behavior so as to adaptively achieve desired objectives and Quality-of-Service (QoS). The CPSoC design paradigm enables self-awareness (i.e., the ability of the system to observe its own internal and external behaviors so that it can make judicious decisions) and (opportunistic) adaptation using cross-layer physical and virtual sensing and actuation applied across different layers of the hardware/software system stack. The closed-loop control used for adaptation to dynamic variation, commonly known as the observe-decide-act (ODA) loop, is implemented using an adaptive, reflective middleware layer.
In this talk I will present a case study of this adaptive, reflective middleware layer, which takes a holistic approach to resource allocation and power management by leveraging concepts from reflective software. Reflection enables dynamic adaptation based on both external feedback and introspection (i.e., self-assessment). In our context, this means that resource management actuation uses both sensing information (e.g., readings from performance counters, power sensors, etc.) to assess the current system state and models that predict the behavior of other system components before an action is performed. I will summarize results from using our adaptive-reflective middleware toolchain to i) perform energy-efficient task mapping on heterogeneous architectures, ii) explore the design space of novel HMP architectures, and iii) extend the lifetime of mobile devices.
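As an illustration of the observe-decide-act loop described above, the following is a minimal sketch, not the middleware presented in the talk. The `predict` model, the `(frequency, big_core)` configuration tuples, and all numeric constants are hypothetical. It shows one ODA step that combines a sensed throughput reading (introspection) with a predictive model to choose the lowest-power configuration expected to meet a throughput target.

```python
from dataclasses import dataclass


@dataclass
class Reading:
    """One observation of the (hypothetical) platform state."""
    throughput: float  # work items per second, e.g. from performance counters
    power: float       # watts, e.g. from an on-chip power sensor


def predict(freq_ghz, big_core):
    """Toy predictive model (an assumption, not a real platform model)."""
    ipc = 2.0 if big_core else 1.0
    capacitance = 1.5 if big_core else 0.5
    return Reading(throughput=ipc * freq_ghz, power=capacitance * freq_ghz ** 3)


def oda_step(current, observed, target_throughput, candidates):
    """Run one observe-decide-act iteration and return the chosen configuration."""
    # Observe: compare the sensed throughput with what the model expected
    # for the current configuration, and derive a correction factor.
    expected = predict(*current)
    correction = observed.throughput / expected.throughput

    # Decide: keep the candidates whose corrected prediction meets the target,
    # then pick the one with the lowest predicted power.
    feasible = [c for c in candidates
                if predict(*c).throughput * correction >= target_throughput]
    if feasible:
        return min(feasible, key=lambda c: predict(*c).power)
    # No candidate meets the target: fall back to the fastest configuration.
    return max(candidates, key=lambda c: predict(*c).throughput)
    # Act: the caller applies the returned configuration to the platform.


candidates = [(1.0, False), (2.0, False), (1.0, True), (2.0, True)]
choice = oda_step(current=(1.0, False),
                  observed=Reading(throughput=1.0, power=0.5),
                  target_throughput=2.0,
                  candidates=candidates)
print(choice)  # (1.0, True): the big core at 1 GHz meets the target at lowest power
```

In this toy setting, when the sensed throughput falls below the model's expectation, the correction factor shrinks and the step selects a faster configuration, which is the role introspection plays in the loop.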
Shopping is widely considered a relaxing leisure activity. However, grocery shopping can be a frustrating experience for those with visual impairment. While getting to a grocery shop is not much of a challenge for them, locating and picking items from a grocery shelf becomes a task as challenging as finding a needle in a haystack. Imagine picking up five items for your dinner recipe from a typical grocery store in the US that carries around 35,000 unique items and can have more than 30 aisles spanning 45,000 square meters. This talk will showcase synergistic advances in algorithms, architectures, and interface design for assisting those with visual impairment with shopping. The talk will focus on multiple energy-efficient solutions that take the battery lifetime of the vision system into account.
Deep learning algorithms such as convolutional neural networks (CNNs) are fast becoming a critical part of image perception in embedded vision applications in the automotive, drone, surveillance, and industrial vision markets. Applications include multi-object detection, semantic segmentation, and image classification. However, when these networks are scaled to modern resolutions such as HD and 4K, the computational requirements of a real-time system can easily exceed 10 TFLOPS and consume hundreds of watts of power, which is simply unacceptable for most edge applications. In this talk, we will describe a network/weight pruning methodology that achieves a performance gain of more than 10 times on Zynq UltraScale+ with very small accuracy loss. The network inference running on Zynq UltraScale+ has achieved the equivalent of 19 TFLOPS on the original SSD network at less than 10 W.
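The abstract does not say which pruning criterion is used, so the following sketch assumes plain magnitude-based weight pruning, a common baseline, purely for illustration. The function names and the example weights are hypothetical, and the retraining step that normally recovers accuracy after pruning is omitted.

```python
def prune_by_magnitude(weights, sparsity):
    """Return a copy of `weights` with the smallest-magnitude fraction zeroed.

    `sparsity` is the fraction of weights to remove (0.0 keeps all, 1.0 removes all).
    """
    if not 0.0 <= sparsity <= 1.0:
        raise ValueError("sparsity must be between 0 and 1")
    n_prune = int(len(weights) * sparsity)
    # Indices ordered by absolute value, smallest first.
    order = sorted(range(len(weights)), key=lambda i: abs(weights[i]))
    pruned = list(weights)
    for i in order[:n_prune]:
        pruned[i] = 0.0
    return pruned


def multiply_count(weights):
    """Number of multiplications left if zero weights are skipped by the hardware."""
    return sum(1 for w in weights if w != 0.0)


layer = [0.9, -0.05, 0.4, 0.01, -0.7, 0.02, 0.3, -0.03]
sparse_layer = prune_by_magnitude(layer, sparsity=0.5)
print(sparse_layer)                  # [0.9, 0.0, 0.4, 0.0, -0.7, 0.0, 0.3, 0.0]
print(multiply_count(sparse_layer))  # 4
```

The saving only materializes when the inference engine can skip the zeroed weights, which is why pruning is usually paired with hardware or a data layout that exploits sparsity.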
Deep Neural Networks (DNNs) are computation intensive. Without efficient hardware implementations of DNNs, many promising AI applications will not be practically realizable. In this talk, we will analyze several challenges facing the AI community in mapping DNNs to hardware accelerators. In particular, we will evaluate the potential role of FPGAs in accelerating DNNs for both cloud and edge devices. Although FPGAs can provide desirable customized hardware solutions, they are difficult to program and optimize. We will present a series of effective design techniques for implementing DNNs on FPGAs with high performance and energy efficiency. These include automated hardware/software co-design, the use of configurable DNN IPs, resource allocation across DNN layers, smart pipeline scheduling, Winograd and FFT techniques, and DNN reduction and re-training. We showcase several design solutions, including a Long-term Recurrent Convolutional Network (LRCN) for video captioning, the Inception module (GoogLeNet) for face recognition, and Long Short-Term Memory (LSTM) for sound recognition. We will also present some of our recent work on developing new DNN models and data structures that achieve higher accuracy for several interesting applications such as crowd counting, genomics, and music synthesis.
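To make the Winograd technique mentioned above concrete, here is a minimal sketch of the one-dimensional minimal filtering algorithm F(2,3), which produces two outputs of a 3-tap filter with four multiplications instead of six. It is a textbook instance of the transform, not the FPGA implementation from the talk, and the function names are illustrative.

```python
def direct_f23(d, g):
    """Reference: two outputs of a 3-tap filter, using 6 multiplications."""
    return [d[0] * g[0] + d[1] * g[1] + d[2] * g[2],
            d[1] * g[0] + d[2] * g[1] + d[3] * g[2]]


def winograd_f23(d, g):
    """Winograd F(2,3): the same two outputs using 4 multiplications.

    `d` holds 4 input samples and `g` holds 3 filter taps. The filter-side
    terms (g[0] + g[1] + g[2]) / 2 and (g[0] - g[1] + g[2]) / 2 depend only
    on the weights, so they can be precomputed once per filter.
    """
    g_sum = (g[0] + g[1] + g[2]) / 2
    g_alt = (g[0] - g[1] + g[2]) / 2
    m1 = (d[0] - d[2]) * g[0]
    m2 = (d[1] + d[2]) * g_sum
    m3 = (d[2] - d[1]) * g_alt
    m4 = (d[1] - d[3]) * g[2]
    return [m1 + m2 + m3, m2 - m3 - m4]


data = [1.0, 2.0, 3.0, 4.0]
taps = [2.0, 4.0, 6.0]
print(direct_f23(data, taps))    # [28.0, 40.0]
print(winograd_f23(data, taps))  # [28.0, 40.0]
```

Trading multiplications for additions in this way is attractive on FPGAs, where multipliers (DSP blocks) are typically the scarcer resource.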
Over the past decades, computer technology has entered a new era of rapid growth, driven mainly by silicon-based high-performance processors. As a member of the first group of CPU design houses, Phytium Technology Co., Ltd. is a fast-growing Chinese IC design company dedicated to designing and manufacturing high-performance, low-power CPU chips and to providing services around these products.
Security has become a critical design challenge for modern electronic hardware. With the emergence of the Internet of Things (IoT) regime, which promises exciting new applications from smart cities to connected autonomous vehicles, security has come to the forefront of the system design process. Recent discoveries and reports of numerous security attacks on microchips and circuits violate the well-regarded concept of hardware trust anchors. This has prompted system designers to develop a wide array of design-for-security and test/validation solutions to achieve high security assurance for the electronic hardware that supports the software stack. At the same time, emerging security issues and countermeasures have also led to an interesting interplay between security, verification, and interoperability. Verification of hardware for security and trust at different levels of abstraction is rapidly becoming an integral part of the system design flow. The global economic trend of outsourcing design and fabrication to untrusted facilities, coupled with the prevalent practice of building systems-on-chip from untrusted third-party intellectual property blocks (IPs), has given rise to a critical need for trust verification of IPs, system-on-chip designs, and fabricated chips. The talk will also cover the spectrum of security challenges for the IoT and describe emerging solutions for creating secure, trustworthy hardware that can enable IoT security for the masses.
Thanks to continuing advances in manufacturing technology, we can implement ever more complex systems. System-level design is essential for dealing with the enormous complexity of today’s advanced systems. At the same time, fast time to market is as essential as ever. Robustness and reliability are also increasing in importance as the focus of the semiconductor industry shifts to applications with high safety requirements, such as automotive. High-level system models, known as Virtual Prototypes, are essential to meeting all these challenges. Fault injection is the most common technique for evaluating system robustness.
I will outline some recent progress in Virtual Prototypes for safety evaluations. In particular, I will discuss how high performance and high accuracy in fault injection can be achieved at the same time. Robustness evaluation has to be extended to firmware in addition to hardware. I will conclude my presentation with future challenges for system design.
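As a rough illustration of what a fault-injection campaign measures, the sketch below injects every possible single-bit flip into an 8-bit data word of a toy model and classifies each outcome as masked, detected, or corrupted. The model, the parity protection, and all names are hypothetical; a real Virtual Prototype would simulate the actual hardware and firmware instead.

```python
def parity(word):
    """Even-parity bit of an integer word."""
    return bin(word).count("1") % 2


def system(word, stored_parity, check_parity):
    """Toy model under test: outputs the upper nibble of an 8-bit word.

    Returns None when the optional parity check detects a fault.
    """
    if check_parity and parity(word) != stored_parity:
        return None
    return word >> 4


def campaign(word, check_parity):
    """Inject each single-bit flip into `word` and classify the outcomes."""
    stored_parity = parity(word)  # parity computed before the fault occurs
    golden = system(word, stored_parity, check_parity)
    outcomes = {"masked": 0, "detected": 0, "corrupted": 0}
    for bit in range(8):
        faulty_word = word ^ (1 << bit)
        result = system(faulty_word, stored_parity, check_parity)
        if result is None:
            outcomes["detected"] += 1
        elif result == golden:
            outcomes["masked"] += 1      # fault had no visible effect
        else:
            outcomes["corrupted"] += 1   # silent data corruption
    return outcomes


print(campaign(0b10110010, check_parity=False))
# {'masked': 4, 'detected': 0, 'corrupted': 4}
print(campaign(0b10110010, check_parity=True))
# {'masked': 0, 'detected': 8, 'corrupted': 0}
```

Exhaustive injection is feasible here only because the fault space has eight entries. For realistic designs the fault space is far larger, which is where the tension between simulation performance and accuracy arises.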