Vertical Arbitration-Free 3D NoCs
In a 3D network-on-chip (NoC), the vertical inter-layer communication channel plays a critical role in defining the network performance. In this work, an arbitration-free design for the vertical channels, utilizing through-silicon vias (TSVs) or inductively coupled wireless links, is proposed. The proposed vertical arbitration-free 3D NoC is compared with other 3D network architectures using both synthetic traffic traces and parallel application benchmark suites. The results of the analysis demonstrate up to a 41% increase in network throughput compared to a 3D NoC utilizing vertical arbitration. The proposed NoC provides savings of up to 10% and 22.5% in energy per flit and area, respectively.
| Speaker: | Ankit More - Drexel Univ., Philadelphia, PA |
| Authors: | Ankit More - Drexel Univ., Philadelphia, PA |
| | Siddharth Nilakantan - Drexel Univ., Philadelphia, PA |
| | Mark Hempstead - Drexel Univ., Philadelphia, PA |
| | Baris Taskin - Drexel Univ., Philadelphia, PA |
TLM Modelling of 3D Stacked Wide I/O DRAM Subsystems
Three-dimensional stacked Wide I/O DRAMs have been proposed as a promising solution to overcome the pin-limited memory performance growth, the power vs. bandwidth dilemma and the Memory Wall. This new DRAM architecture and organisation requires a new generation of DRAM memory controllers. We present a new methodology using virtual platforms to model the backend of a 3D-DRAM memory subsystem with special SystemC TLM2.0 phase extensions. This methodology enables us to explore the complete design space of memory controllers at the system level at very fast simulation speeds with precise timing accuracy.
| Speaker: | Matthias Jung - Univ. of Kaiserslautern, Kaiserslautern, Germany |
| Authors: | Matthias Jung - Univ. of Kaiserslautern, Kaiserslautern, Germany |
| | Christian Weis - Technische Univ. Kaiserslautern, Kaiserslautern, Germany |
| | Norbert Wehn - Technische Univ. Kaiserslautern, Kaiserslautern, Germany |
Coverage of Compositional Property Sets under Reactive Constraints
Modules of systems-on-chips can be comprehensively verified by property checking together with a completeness analysis for the property set (e.g., by Complete Interval Property Checking, C-IPC). These formal techniques require modeling of the legal behavior of the module’s environment through reactive environment constraints. In this paper we address the validity of complete property suites when composing modules into a system. We provide a compositional reasoning framework determining that a system is completely verified if all modules are verified with C-IPC under reactive constraints. Our method discovered issues that could not be detected by the complete verifications of the submodules alone.
| Speaker: | Binghao Bao - Univ. of Kaiserslautern, Kaiserslautern, Germany |
| Authors: | Binghao Bao - Univ. of Kaiserslautern, Kaiserslautern, Germany |
| | Joerg Bormann - OneSpin Solutions GmbH, Munich, Germany |
| | Markus Wedler - Univ. of Kaiserslautern, Kaiserslautern, Germany |
| | Dominik A. Stoffel - Univ. of Kaiserslautern, Kaiserslautern, Germany |
| | Wolfgang Kunz - Univ. of Kaiserslautern, Kaiserslautern, Germany |
Precision Timed Systems Using TickPAD Memory
Considerable research attention has been devoted to the static analysis of conventional architectures; however, the design of a predictable memory hierarchy for concurrent programs has received scant attention. In particular, there has been minimal development of caches or scratchpads (SPMs) that can exploit the features of a concurrent programming language. We have developed a precision timed (PRET) machine memory subsystem, called the TickPAD memory (TPM), for predictable and efficient execution of synchronous concurrent programs. TPMs are a hybrid between conventional caches and SPMs: they support dynamic loading of cache lines and static allocation of program points.
| Speaker: | Sidharta Andalam - Univ. of Auckland, Auckland, New Zealand |
| Authors: | Matthew Kuo - Univ. of Auckland, Auckland, New Zealand |
| | Partha S. Roop - Univ. of Auckland, Auckland, New Zealand |
| | Sidharta Andalam - Univ. of Auckland, Auckland, New Zealand |
| | Nitish Patel - Univ. of Auckland, Auckland, New Zealand |
Conducting Fast and Accurate MPSoC Virtual Platform Simulation with Parallel Out-of-Order Execution Approach
Virtual platform simulation is a popular technique for validating MPSoCs. However, the complexity of modern MPSoCs leads to unacceptably low virtual platform simulation speeds. To speed up the simulation, we propose a parallel out-of-order execution approach on multi-core host machines. The proposed approach includes a distributed memory exclusivity watch technique and a dynamic trace-driven simulation technique to reduce the synchronization overhead and achieve a high degree of parallelism. Experimental results show that the proposed approach achieves speed-ups as high as 123X compared to the conventional lock-step simulation scheduling approach.
| Speaker: | Yu-Fu Yeh - Industrial Technology Research Institute, Taipei, Taiwan |
| Authors: | Yu-Fu Yeh - Industrial Technology Research Institute, Taipei, Taiwan |
| | Hsin-Cheng Lin - National Taiwan Univ., Taipei, Taiwan |
| | Chung-Yang (Ric) Huang - National Taiwan Univ., Taipei, Taiwan |
Skew-Preserving Discretization Algorithm for Clock Networks with Continuously-Sized Buffers
Clock skew is one of the most important factors in clock network synthesis. Using optimization for buffer sizing is effective in reducing area, power and skew, but the solution includes continuous sizes that need to be discretized. However, skew can change significantly with small changes in buffer sizes. We have developed a new algorithm to discretize buffer sizes while preserving the skew. The proposed algorithm is based on the branch-and-bound technique used for integer programming and results in minimal changes in skew. Preliminary results on ISPD09 benchmarks show low increases in power and runtime.
| Speaker: | Amin Farshidi - Univ. of Calgary, Calgary, AB, Canada |
| Authors: | Amin Farshidi - Univ. of Calgary, Calgary, AB, Canada |
| | Logan Rakai - Univ. of Calgary, Calgary, AB, Canada |
| | Laleh Behjat - Univ. of Calgary, Calgary, AB, Canada |
| | David Westwick - Univ. of Calgary, Calgary, AB, Canada |
Dependability Improvement by Partial Reconfiguration in SRAM-Based FPGAs for Critical Applications
Adding fault tolerance when COTS devices are used to implement critical systems is a requirement in increasing demand. In particular, when SRAM-based FPGAs are used to this end, fault tolerance is essential because of the sensitivity of the configurable logic to charged particles. It becomes mandatory in aerospace applications, where radiation effects can alter the implemented hardware and consequently produce erroneous systems. This work deals with an architectural approach that employs dynamic partial reconfiguration as an efficient mechanism to correct faults and to increase flexibility in hardware utilization.
| Speaker: | Luis Andrés Cardona - Univ. Autònoma de Barcelona, Bellaterra, Spain |
| Authors: | Luis Andrés Cardona - Univ. Autònoma de Barcelona, Bellaterra, Spain |
| | Yi Guo - Instituto de Microelectrónica de Barcelona, Spain |
| | Carles Ferrer - Univ. Autònoma de Barcelona, Bellaterra, Spain |
Supporting the Formalization of Requirements Using Techniques from Natural Language Processing
SysML enables the modeling of structure and behavior for circuits and systems and provides a systematic way to organize requirements. Initially, these requirements are usually provided in natural language. Afterwards, they are refined until a formal representation is obtained, which is necessary for further processing, e.g., in verification. In this work, we propose an approach that supports this formalization of requirements using natural language processing techniques. Results of a case study show that the desired formal description can be derived automatically or with only a few interactions.
| Speaker: | Mathias Soeken - Univ. of Bremen, Bremen, Germany |
| Authors: | Mathias Soeken - Univ. of Bremen, Bremen, Germany |
| | Robert Wille - Univ. of Bremen, Bremen, Germany |
| | Eugen Kuksa - Univ. of Bremen, Bremen, Germany |
| | Rolf Drechsler - Univ. of Bremen, Bremen, Germany |
Troubleshooting Performance Violations at System Level using Data Mining
Diagnosing performance and throughput violations is one of the biggest challenges in transaction level modeling of systems. We use a data mining approach to infer frequent patterns from transaction traces for localizing the root causes of latency or throughput violations. Our approach involves episode mining. We also extract domain knowledge from transaction traces to restrict the search space and increase the effectiveness of the mining results. We provide a detailed case study for diagnosing performance violations of a system level experimental platform and show that our data mining approach can exactly pinpoint the root cause of various performance bottlenecks.
| Speaker: | Shobha Vasudevan - Univ. of Illinois at Urbana-Champaign, Urbana, IL |
| Authors: | Lingyi Liu - Univ. of Illinois at Urbana-Champaign, Urbana, IL |
| | Xuanyu Zhong - Univ. of Illinois at Urbana-Champaign, Urbana, IL |
| | Xiaotao Chen - Huawei Technologies Co., Ltd., Bridgewater, NJ |
| | Shobha Vasudevan - Univ. of Illinois at Urbana-Champaign, Urbana, IL |
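The abstract above names episode mining without giving details. As a rough illustration only, the following sketch counts how often a serial episode (an ordered sequence of event types) appears inside sliding time windows of a transaction trace. The `(timestamp, event)` trace format, the function names, and the window-based frequency measure are assumptions made for this example, not the authors' implementation.

```python
from typing import List, Sequence, Tuple

Event = Tuple[int, str]  # (timestamp, event type); hypothetical trace format


def contains_serial_episode(window: Sequence[str], episode: Sequence[str]) -> bool:
    """Return True if `episode` occurs in `window` as an ordered subsequence."""
    it = iter(window)
    # Each step must be found after the previous one; the shared iterator enforces order.
    return all(any(ev == step for ev in it) for step in episode)


def episode_frequency(trace: List[Event], episode: Sequence[str], width: int) -> float:
    """Fraction of sliding windows of `width` ticks that contain the episode.

    Assumes `trace` is sorted by integer timestamp.
    """
    if not trace:
        return 0.0
    start, end = trace[0][0], trace[-1][0]
    hits = total = 0
    # Slide a window [t, t + width) across the time span of the trace.
    for t in range(start, end + 1):
        window = [name for ts, name in trace if t <= ts < t + width]
        hits += contains_serial_episode(window, episode)
        total += 1
    return hits / total
```

Episodes whose frequency is unusually high in traces that violate a latency bound would then be candidates for closer inspection; how the paper ranks or filters them is not stated in the abstract.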
A Design Space Exploration Prototype with Multi-Layer Methodology for Life-Critical Biomedical Engineering Applications
We present an initial implementation of a solution for fast, automatic, multi-objective exploration of the software design space of a set of biomedical engineering applications. Our solution optimizes the design in the least time while maintaining the standard service levels of the application layer. The solution achieves these goals using our multi-layer methodology and our multi-objective system tuner (MOST). The initial application is the analysis of patients’ heart electrocardiogram (EKG) signals. The prototype can generate a cluster of operating designs, and our preliminary results show that it decreases the design time by a factor of 10X with minimal error.
| Speaker: | Iyad Al Khatib - Politecnico di Milano, Milano, Italy |
| Authors: | Iyad Al Khatib - Politecnico di Milano, Milano, Italy |
| | Edoardo Paone - Politecnico di Milano, Milano, Italy |
| | Sotirios Xydis - Politecnico di Milano, Milano, Italy |
| | Vittorio Zaccaria - Politecnico di Milano, Milano, Italy |
| | Gianluca Palermo - Politecnico di Milano, Milano, Italy |
| | Cristina Silvano - Politecnico di Milano, Milano, Italy |
A General Variable-Latency VLSI Design Methodology Based on Critical Path Identification and Completion Prediction
The ever-increasing parametric variations in the latest VLSI technologies increase performance variability and hinder performance improvement in traditional guard-banding-based synchronous VLSI design. In this paper we present a general variable-latency VLSI design methodology based on critical path identification and completion prediction. We present experimental results on MCNC benchmark circuits showing the advantages of the proposed methodology in terms of area, power consumption and performance.
| Speaker: | Quiyaam F. Mohammed - Univ. of Texas at San Antonio, TX |
| Authors: | Kokila Dodda - Univ. of Texas at San Antonio, TX |
| | Quiyaam F. Mohammed - Univ. of Texas at San Antonio, TX |
| | Bao Liu - Univ. of Texas at San Antonio, TX |
Work-in-Progress – AgiES: Agile Methods for Embedded System Development
The AgiES project aims to develop and utilize agile methods for the development of embedded systems, covering all of their parts such as electronic hardware, hardware-dependent software, and digital integrated circuit design. Agile philosophy is well known in the field of software engineering but rarely used in the development of embedded systems because of the more rigid nature of that field. Our goal is to gather a toolbox of agile practices that may be adopted by teams developing embedded systems. These practices originate from principles presented in the Agile Manifesto, and they are said to improve development team productivity and well-being at work.
| Speaker: | Ville Rantala - Univ. of Turku, Turun yliopisto, Finland |
| Authors: | Ville Rantala - Univ. of Turku, Turun yliopisto, Finland |
| | Matti Kaisti - Univ. of Turku, Turun yliopisto, Finland |
| | Tuomas Mäkilä - Univ. of Turku, Turun yliopisto, Finland |
| | Sami Hyrynsalmi - Univ. of Turku, Turun yliopisto, Finland |
| | Teijo Lehtonen - Univ. of Turku, Turun yliopisto, Finland |
Customized Physical Design for Partitioned On-Chip Memory Module in FPGA
Recently there has been an emerging trend in using FPGAs for computation because of their significant performance/power efficiency. However, the time-consuming physical design has been a major bottleneck for the entire system. While traditional physical design algorithms target random logic, circuits generated by high-level synthesis usually contain well-defined structures. In this paper, a partitioned on-chip memory module is used as an example to show the benefit of customized physical design for these well-defined generated circuits. Experimental results show that both the effectiveness and efficiency of the physical design algorithms can be improved if they are customized for partitioned on-chip memory modules.
| Speaker: | Guojie Luo - Peking Univ., Beijing, China |
| Authors: | Cong Yan - Peking Univ., Beijing, China |
| | Peng Li - Peking Univ., Beijing, China |
| | Guojie Luo - Peking Univ., Beijing, China |
Queueing Theory Analysis of Memory Architectures
This paper describes a general framework for the performance analysis of various classes and architectures of computer memory using queueing theory techniques. Although conventional single-port memory architectures have been studied within the framework of queueing theory, this classical work is insufficient to model and analyze (i) multi-port memories, due to the inherent parallelism of simultaneous read/write requests, and (ii) emerging non-volatile technologies with asymmetric read/write latencies. Our work provides a general framework to analyze and compare such advanced memory architectures in terms of performance metrics such as the average number of waiting requests and the average queue delay.
| Speaker: | David Dgien - Univ. of Pittsburgh, Pittsburgh, PA |
| Authors: | David Dgien - Univ. of Pittsburgh, Pittsburgh, PA |
| | Jiayin Li - Univ. of Pittsburgh, Pittsburgh, PA |
| | Kartik Mohanram - Univ. of Pittsburgh, Pittsburgh, PA |
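To make the two metrics named in the abstract concrete, here is a minimal sketch that computes the average number of waiting requests and the average queue delay for a textbook M/M/c queue, treating each memory port as one server. The choice of M/M/c (Poisson arrivals, exponential service) and the function name `mmc_queue_metrics` are assumptions for illustration; the abstract does not say which queueing models the paper uses, and this sketch does not capture asymmetric read/write latencies.

```python
from math import factorial


def mmc_queue_metrics(arrival_rate: float, service_rate: float, servers: int = 1):
    """Return (mean number waiting Lq, mean queueing delay Wq) for an M/M/c queue."""
    offered = arrival_rate / service_rate          # offered load a = lambda / mu
    rho = offered / servers                        # per-server utilization
    if rho >= 1.0:
        raise ValueError("unstable queue: utilization must be below 1")
    # Erlang C formula: probability that an arriving request has to wait.
    tail = offered ** servers / factorial(servers) / (1.0 - rho)
    head = sum(offered ** k / factorial(k) for k in range(servers))
    p_wait = tail / (head + tail)
    lq = p_wait * rho / (1.0 - rho)                # mean number of waiting requests
    wq = lq / arrival_rate                         # Little's law: Lq = lambda * Wq
    return lq, wq
```

With `servers=1` this reduces to the classical single-port M/M/1 result, and raising `servers` shows how additional ports shorten the queue at the same request rate.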
A Multi-Task Scheduling and Allocation Method for Reliable Network-on-Chip
As the number of nodes in a network-on-chip (NoC) increases, the probability of node failures may also increase. To continue service despite the failure of some nodes, a reliable NoC needs to be developed. This paper proposes a multi-task scheduling and allocation method to realize a reliable NoC. For a given upper bound on the number of failed NoC nodes, the proposed method enumerates the possible failure patterns. Then, for each failure pattern, multi-task scheduling and allocation based on the given multiplicities of scheduling and allocation are performed under given constraints such as the time constraint of a given application.
| Speaker: | Hiroshi Saito - Univ. of Aizu, Aizu-Wakamatsu, Japan |
| Authors: | Hiroshi Saito - Univ. of Aizu, Aizu-Wakamatsu, Japan |
| | Tomohiro Yoneda - National Institute of Informatics, Chiyoda-ku, Japan |
| | Yuichi Nakamura - NEC Corp., Kawasaki, Japan |
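The enumeration step described in the abstract can be sketched as follows: for an upper bound on the number of failed nodes, list every subset of nodes of size zero up to that bound. The function name and the integer node identifiers are hypothetical, and the per-pattern scheduling and allocation is deliberately left out because the abstract does not describe it.

```python
from itertools import combinations
from typing import FrozenSet, Iterable, Iterator


def failure_patterns(nodes: Iterable[int], max_failures: int) -> Iterator[FrozenSet[int]]:
    """Yield every set of failed nodes containing at most `max_failures` nodes."""
    node_list = list(nodes)
    for k in range(max_failures + 1):
        # All ways of choosing exactly k failed nodes (k = 0 is the fault-free case).
        for failed in combinations(node_list, k):
            yield frozenset(failed)
```

The number of patterns grows as the sum of binomial coefficients C(n, k) for k up to the bound, which is why such a bound is needed to keep the enumeration tractable.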
Extending Run-Time Resource Management to Optimize Heap Memory Utilization of Embedded Applications
Modern computer systems allow the concurrent execution of a massive number of applications. As performance and quality of service become fundamental requirements, applications need to be more aware of the runtime workload and tune their requirements accordingly. In this work we focus on the dynamic memory management of runtime-tunable applications via a system-wide runtime resource manager. We examine a multimedia application that is reconfigured at runtime in response to the variability of the system workload.
| Speaker: | Ioannis Koutras - National Technical Univ. of Athens, Athens, Greece |
| Authors: | Iraklis Anagnostopoulos - National Technical Univ. of Athens, Athens, Greece |
| | Ioannis Koutras - National Technical Univ. of Athens, Athens, Greece |
| | Patrick Bellasi - Politecnico di Milano, Milano, Italy |
| | Alexandros Bartzas - National Technical Univ. of Athens, Athens, Greece |
| | William Fornaciari - Politecnico di Milano, Milano, Italy |
| | Dimitrios Soudris - National Technical Univ. of Athens, Athens, Greece |
Optimal Dimensioning and Configuration of Electrical Energy Storage Systems for Electric Vehicles
This paper proposes a methodology for the optimal dimensioning and configuration of electrical energy storage systems for electric vehicles. We present a method for determining the optimal type of battery cells and the appropriate configuration in terms of the number of cells, based on various customized requirements including the driving range, global emissions, rated power output, and installation space. For this purpose, an Integer Quadratically-Constrained Quadratic Programming problem is formulated and solved. Experimental results give evidence that our approach successfully solves the dimensioning and configuration problem of the electrical energy storage system for electric vehicles.
| Speaker: | Martin Lukasiewycz - TUM CREATE Ltd., Singapore |
| Authors: | Wanli Chang - TUM CREATE Ltd., Singapore |
| | Martin Lukasiewycz - TUM CREATE Ltd., Singapore |
| | Sebastian Steinhorst - TUM CREATE Ltd., Singapore |
| | Samarjit Chakraborty - Technical Univ. of Munich, Munich, Germany |
Understanding the Performance Impacts of Storage Class Memory for I/O-Intensive Workloads
Emerging non-volatile memory (NVM), so-called Storage Class Memory (SCM), blurs the line between memory and storage due to its DRAM-like access latency and byte-addressability. SCM enables a new way of accessing storage and calls for a rethinking of I/O system design. In this paper, we take a quantitative approach to understanding the impact of SCM on the I/O subsystem. We conduct cycle-level full-system simulation to analyze the performance bottlenecks of an SCM-based storage system connected to the memory bus. We find that reducing software overheads and streamlining data movement are critical to fully exploiting the potential of SCM-based storage systems.
| Speaker: | Chia-Lin Yang - National Taiwan Univ., Taipei, Taiwan |
| Authors: | Shun-Chih Yu - National Taiwan Univ., Taipei, Taiwan |
| | Yung-En Hsieh - National Taiwan Univ., Taipei, Taiwan |
| | Ya-Yunn Su - National Taiwan Univ., Taipei, Taiwan |
| | Chia-Lin Yang - National Taiwan Univ., Taipei, Taiwan |
| | Hsiang-Pang Li - Macronix International Co., Ltd., Hsinchu, Taiwan |
FineDedup: A Fine-Grained Deduplication Technique for Flash-Based SSDs
Data deduplication is an effective way to improve the lifetime of flash-based solid-state drives (SSDs) by preventing redundant data from being written. Existing deduplication techniques for SSDs, however, fail to fully eliminate potentially redundant data because of their coarse granularity. In this paper, we propose a fine-grained deduplication technique for SSDs, called FineDedup, that improves the likelihood of eliminating redundant data. FineDedup also resolves the technical difficulties caused by its finer granularity, i.e., increased memory requirements and read response time. Our results show that FineDedup reduces the amount of written data by up to 32% over existing techniques with negligible overheads.
| Speaker: | Taejin Kim - Seoul National Univ., Seoul, Republic of Korea |
| Authors: | Taejin Kim - Seoul National Univ., Seoul, Republic of Korea |
| | Sungjin Lee - Seoul National Univ., Seoul, Republic of Korea |
| | Jihong Kim - Seoul National Univ., Seoul, Republic of Korea |
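The effect of deduplication granularity can be illustrated with a small sketch that fingerprints fixed-size chunks and counts only the bytes of chunks not seen before. The function name `bytes_written`, the use of SHA-256 fingerprints, and the in-memory set are assumptions for this example; they are not FineDedup's actual data structures, and the sketch ignores the memory and read-latency costs the abstract mentions.

```python
import hashlib
from typing import Iterable


def bytes_written(pages: Iterable[bytes], chunk_size: int) -> int:
    """Count bytes that reach flash when duplicates are dropped at `chunk_size` granularity."""
    seen = set()       # fingerprints of chunks already stored
    written = 0
    for page in pages:
        for off in range(0, len(page), chunk_size):
            chunk = page[off:off + chunk_size]
            digest = hashlib.sha256(chunk).digest()
            if digest not in seen:
                # First occurrence of this content: it must be written.
                seen.add(digest)
                written += len(chunk)
    return written
```

Calling the function once with `chunk_size` equal to the page size and once with a smaller value on the same page stream shows how partially identical pages, which page-level deduplication cannot merge, still save writes at the finer granularity.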
Efficient Implementation of Virtual Coarse Grained Reconfigurable Arrays on FPGAs
Fine-grained Field Programmable Gate Arrays (FPGAs) suffer from high compilation times. To avoid this problem, Virtual Coarse Grained Reconfigurable Arrays (VCGRAs), i.e., CGRAs implemented on FPGAs, have been proposed. Conventional implementations of VCGRAs use functional FPGA resources, such as lookup tables, to implement the registers, switch blocks and other components that make the VCGRA configurable. We show that this is a large overhead that can often be avoided by mapping these components directly onto lower-level FPGA resources such as switch blocks and configuration memory.
| Speaker: | Dirk Stroobandt - Ghent Univ., Gent, Belgium |
| Authors: | Karel Heyse - Ghent Univ., Ghent, Belgium |
| | Tom Davidson - Ghent Univ., Ghent, Belgium |
| | Elias Vansteenkiste - Ghent Univ., Gent, Belgium |
| | Karel Bruneel - Ghent Univ., Ghent, Belgium |
| | Dirk Stroobandt - Ghent Univ., Gent, Belgium |
A DVFS Framework for Low-Power Embedded GPUs
Computational power of embedded graphics processing units (GPUs) in mobile system-on-chips has been increasing steadily to provide high quality user experience related to 2D and 3D graphics. Moreover, the architecture of embedded GPUs is evolving from graphic accelerators into streaming multiprocessors, which enables programmers to use GPUs for general parallel processing. In this paper, we propose a dynamic voltage and frequency scaling (DVFS) framework for low-power embedded GPUs. Our experimental results show that conventional processor DVFS policies can achieve power reduction of embedded GPUs with reasonable performance degradation.
| Speaker: | Daecheol You - Hanyang Univ., Seoul, Republic of Korea |
| Authors: | Daecheol You - Hanyang Univ., Seoul, Republic of Korea |
| | Youngho Ahn - Hanyang Univ., Seoul, Republic of Korea |
| | Ki-Seok Chung - Hanyang Univ., Seoul, Republic of Korea |
BCIBench: A Benchmarking Suite for EEG-Based Brain Computer Interfaces
Increased demands for applications of brain computer interface (BCI) have led to growing attention towards their low-power embedded processing architecture design. Most clinical and entertainment applications of BCI require wearable devices. Better understanding of application characteristics can lead to effective power optimization techniques for future wearable BCIs. In this paper, we introduce BCIBench, a benchmarking suite which includes a wide range of algorithms used for pre-processing, feature extraction and classification in BCI applications. We analyze the architectural characteristics of these algorithms. We provide insights into architectural components that can reduce the power consumption of embedded systems used for these applications.
| Speaker: | Ali Ahmadi - Univ. of Texas at Dallas, TX |
| Authors: | Ali Ahmadi - Univ. of Texas at Dallas, TX |
| | Roozbeh Jafari - Univ. of Texas at Dallas, TX |
Achieving Timing Closure in Ultra-Low Voltage Designs
As the supply voltage is scaled down to ultra-low levels, timing closure becomes a serious challenge when multiple power modes are used. In this paper, we combine the synthesis of the clock network and the data path to satisfy all timing constraints in ultra-low voltage designs. Our methodology comprises two main techniques. First, we propose a low-power multi-power-mode adjustable delay buffer architecture to reduce clock skew with very small power consumption. Second, we propose the first multi-power-mode minimum padding technique to fix all hold violations in all power modes simultaneously. Experimental results show that integrating both techniques yields the best results.
| Speaker: | Wen-Pin Tu - Chung Yuan Christian Univ., Chung Li, Taiwan |
| Authors: | Wen-Pin Tu - Chung Yuan Christian Univ., Chung Li, Taiwan |
| | Chung-Han Chou - National Tsing Hua Univ., Hsinchu, Taiwan |
| | Shih-Hsu Huang - Chung Yuan Christian Univ., Chung Li, Taiwan |
| | Shih-Chieh Chang - National Tsing Hua Univ., Hsinchu, Taiwan |
| | Yow-Tyng Nieh - Industrial Technology Research Institute, Hsinchu, Taiwan |
| | Chien-Yung Chou - Industrial Technology Research Institute, Hsinchu, Taiwan |
Ant Colony Optimization for Mapping, Scheduling and Placing in Reconfigurable Systems
Modern heterogeneous embedded platforms, composed of general-purpose, digital signal and application-specific processors, also include partially dynamically reconfigurable devices. Partial dynamic reconfiguration allows time multiplexing of the device area for accelerators, but imposes severe overheads in terms of area utilization and reconfiguration latency. In this paper we propose a heuristic based on Ant Colony Optimization (ACO) that simultaneously performs scheduling, mapping and linear placing of tasks on a heterogeneous system with reconfigurable devices, hiding reconfiguration overheads through prefetching. We show that our approach is more general and finds better solutions (by 16.5% on average) than competing approaches.
| Speaker: | Francesco Regazzoni - Univ. of Lugano, Lugano, Switzerland |
| Authors: | Antonino Tumeo - Pacific Northwest National Laboratory, Richland, WA |
| | Francesco Regazzoni - Univ. of Lugano, Lugano, Switzerland |
A Hierarchical Scrubbing Method for Resistance Drift in Multi-Level Cell Phase-Change RAM
This paper aims at reducing the overhead of scrub operations required for multi-level cell phase-change RAM under the resistance drift problem. The proposed method is a hierarchical one since it manages the validity and value information at the granularity of page and cache block to reduce the frequency of scrub operations.
| Speaker: | Jina Yoon - Pohang Univ. of Science and Technology, Pohang, Republic of Korea |
| Authors: | Youngsik Kim - Pohang Univ. of Science and Technology, Pohang, Republic of Korea |
| | Jina Yoon - Pohang Univ. of Science and Technology, Pohang, Republic of Korea |
| | Sungjoo Yoo - Pohang Univ. of Science and Technology, Pohang, Republic of Korea |
| | Sunggu Lee - Pohang Univ. of Science and Technology, Pohang, Republic of Korea |
Simple and Feasible Grid Routing Method for Self-Aligned Quadruple Patterning
Self-Aligned Quadruple Patterning (SAQP) is one of the candidates for the sub-14nm node and beyond. In the SAQP process, designs must follow constraints that are stricter than those of the Litho-Etch-Litho-Etch-Litho-Etch process so that designers draw feasible SAQP layouts. We propose a new grid routing method for the spacer-type SAQP process. The wafer image in SAQP is decomposed into primary, secondary and tertiary patterns. The proposed method draws primary and secondary patterns on the grid structure after tertiary patterns are drawn. Two pattern cutting techniques are applied instead of a trimming process. We show routing results obtained with a maze router.
| Speaker: | Chikaaki Kodama - Toshiba Corp., Yokohama, Japan |
| Authors: | Chikaaki Kodama - Toshiba Corp., Yokohama, Japan |
| | Hirotaka Ichikawa - Toshiba Microelectronics Corp., Kawasaki, Japan |
| | Fumiharu Nakajima - Toshiba Corp., Yokohama, Japan |
| | Koichi Nakayama - Toshiba Corp., Yokohama, Japan |
| | Shigeki Nojima - Toshiba Corp., Yokohama, Japan |
| | Toshiya Kotani - Toshiba Corp., Yokohama, Japan |
Power Management Based on Frame Rate for Mobile Virtualization
In this paper, we propose a low-power solution using the frame rate for mobile virtualization platforms such as L4Android. Virtualization is well known to be efficient for resource management and security; however, it is hard to apply legacy power management solutions because of its system complexity. To decrease the complexity of power management, we use the frame rate as a performance measure because graphics processing is closely related to the quality of user experience (QoE) of a smartphone. If a certain level of frame rate is satisfied, it is possible to reduce the performance without degrading the QoE.
| Speaker: | Youngho Ahn - Hanyang Univ., Seoul, Republic of Korea |
| Authors: | Youngho Ahn - Hanyang Univ., Seoul, Republic of Korea |
| | Daecheol You - Hanyang Univ., Seoul, Republic of Korea |
| | Ki-Seok Chung - Hanyang Univ., Seoul, Republic of Korea |
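The idea of using frame rate as the performance measure can be sketched as a simple step governor: lower the frequency level while the measured frame rate comfortably exceeds a target, and raise it when the frame rate falls short. Everything here (the function name, discrete frequency levels, and the `headroom` factor) is a hypothetical illustration, not the policy evaluated in the paper.

```python
def next_frequency_level(level: int, measured_fps: float, target_fps: float,
                         num_levels: int, headroom: float = 1.1) -> int:
    """Pick the next frequency level (0 = slowest) from the measured frame rate."""
    if measured_fps < target_fps:
        # QoE at risk: step up, saturating at the fastest level.
        return min(level + 1, num_levels - 1)
    if measured_fps > target_fps * headroom:
        # Frame rate has slack: step down to save power, saturating at level 0.
        return max(level - 1, 0)
    # Within the tolerance band: keep the current level to avoid oscillation.
    return level
```

The tolerance band set by `headroom` is one simple way to prevent the governor from toggling between two adjacent levels on every sample.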
Feasibility Analysis for Temperature Constrained Real-Time Scheduling on Multi-Core Platforms
In this paper, we study the problem of how to determine whether a periodic voltage schedule for a multi-core system can satisfy a given maximum temperature constraint. We first develop a novel method to quickly calculate the temperature for a periodic schedule, and then develop three feasibility checking conditions to test whether a periodic voltage schedule exceeds the peak temperature limit. Our experimental results show that the proposed temperature calculation method differs by no more than 1.5°C in accuracy and is 100 times faster in computation time, and also demonstrate the effectiveness of our feasibility checking conditions.
| Speaker: | Ming Fan - Florida International Univ., Miami, FL |
| Authors: | Ming Fan - Florida International Univ., Miami, FL |
| | Vivek Chaturvedi - Florida International Univ., Miami, FL |
| | Shi Sha - Florida International Univ., Miami, FL |
| | Gang Quan - Florida International Univ., Miami, FL |
| | Meikang Qiu - Univ. of Kentucky, Lexington, KY |
Dynamic Tone Mapping on OLED Display Based on Video Classification
The OLED screen is one of the most energy-consuming modules in a smartphone. In this work, we enhance OLED power models to consider real-time factors such as refresh frequency and scene changes. Our analysis shows that the power characteristics of an OLED screen displaying video streams have obvious patterns based on video classification and can be used as a rule-based classifier. For the different video categories, we also propose corresponding dynamic tone mapping methods for OLED power reduction. Compared to other OLED power management techniques, our methods have lower computation overhead, better portability, and less dependency on hardware features.
| Speaker: | Xiang Chen - Univ. of Pittsburgh, Pittsburgh, PA |
| Authors: | Xiang Chen - Univ. of Pittsburgh, Pittsburgh, PA |
| | Zhan Ma - Samsung, Richardson, TX |
| | Felix C. Fernandes - Samsung, Richardson, TX |
| | Chun Jason Xue - City Univ. of Hong Kong, Hong Kong |
| | Yiran Chen - Univ. of Pittsburgh, Pittsburgh, PA |
Formal Representation of the Design Feature Variety in Analog Circuits
This paper presents a design knowledge representation structure to express the design feature variety in analog circuits. This insight is important for characterizing the novel and common features of a circuit compared to a set of alternative designs, the conditions under which existing design features can be reused to meet new specifications, and the exploration of new conceptual design solutions. The knowledge representation structure has three basic operators defined for pairs of circuits: circuit comparison, circuit instantiation-abstraction, and circuit feature combination. The knowledge representation structure for a set of state-of-the-art designs is discussed in the paper.
| Speaker: | Alex Doboli - State Univ. of New York, Stony Brook, NY |
| Authors: | Cristian Ferent - State Univ. of New York, Stony Brook, NY |
| | Alex Doboli - State Univ. of New York, Stony Brook, NY |
Configurable Timing Margin Monitor for Reliability, Test, and Debug
CMOS devices are susceptible to aging and process variations. We explore the design and use of in-situ sensors that actively monitor the timing margin of speed-determining paths. A warning bit is set in the status register if the margin on monitored paths is below a configurable threshold. Optionally, paths that cause the warning can be recorded and scanned out. Applications include: (1) in-field flagging of warnings to allow timely servicing in high-reliability applications, e.g., automotive and medical, and (2) cost-effective delay-fault testing and debug. We have designed and simulated novel circuits to facilitate this method. Silicon testing is expected in 2013.
| Speaker: | Puneet Sharma - Freescale Semiconductor, Inc., Austin, TX |
| Authors: | Puneet Sharma - Freescale Semiconductor, Inc., Austin, TX |
| | Magdy Abadir - Freescale Semiconductor, Inc., Austin, TX |
High Speed Cycle Accurate Simulation for Cache-Incoherent MPSoCs
We present a new high-speed cycle-accurate multicore simulator addressing an important, neglected category of multicore systems: deeply-embedded cache-incoherent systems. Many of these MPSoCs do not implement coherent caches, avoiding the hardware complexity, area, and energy overheads, and the resulting non-deterministic run-time. Exploiting this, we can parallelize the simulation while delivering accurate performance modelling and timing-accurate functional simulation. We present quantitative performance data for 1- to 64-core NoC systems, verifying simulation accuracy against FPGA implementations. We achieve average simulation rates of 4 MIPS on a real-world cluster-computing environment, with an average timing error of 3% across a range of multi-programmed embedded benchmarks.
| Speaker: | Christopher Thompson - Univ. of Edinburgh, Edinburgh, United Kingdom |
| Authors: | Christopher Thompson - Univ. of Edinburgh, Edinburgh, United Kingdom |
| | Miles Gould - Univ. of Edinburgh, Edinburgh, United Kingdom |
| | Oscar Almer - Univ. of Edinburgh, Edinburgh, United Kingdom |
| | Nigel P. Topham - Univ. of Edinburgh, Edinburgh, United Kingdom |
A Force-Directed 3D IC Partitioning Algorithm
3D IC design is one of the challenging problems of today. 3D partitioning solutions can significantly impact the manufacturability and performance of a circuit. In this work, a 3D partitioning technique is developed that reduces the number of TSVs by using a force-directed placement technique. A circuit is partitioned into several layers and a force-directed placement problem is solved to find the optimal locations of the partitions. This partitioning solution is improved by using a proposed force-directed simulated annealing technique. The proposed technique is tested on ISPD04 circuits and shows up to a 44% reduction in the number of TSVs.
| Speaker: | Aysa Fakheri Tabrizi - Univ. of Calgary, Calgary, AB, Canada |
| Authors: | Aysa Fakheri Tabrizi - Univ. of Calgary, Calgary, AB, Canada |
| | Laleh Behjat - Univ. of Calgary, Calgary, AB, Canada |
| | William Swartz - Timberwolf Systems, Inc., Dallas, TX |
Associative Processing Using Coupled Oscillators
We are using weakly coupled, non-linear oscillators as functional units to perform pattern matching in high dimensional vector spaces. Our goal is to exploit the capabilities of coupled oscillators to compute vector comparison operations and thus perform nearest neighbor (or K-nearest neighbor) search. Using coupled oscillators to perform pattern matching was introduced by Hoppensteadt and Izhikevich and shown to be able to form attractor basins at the minima of a Lyapunov energy function. Using a Hopfield learning rule, a network of oscillators described by this model can learn patterns and perform the functions of associative memory.
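As a rough illustration of the associative-memory behaviour mentioned above, the following minimal sketch applies the Hopfield (Hebbian) learning rule to a conventional binary network. It is not the oscillator model from the abstract: the function names, the two stored patterns, and the sign-based update are assumptions chosen only to show how a learned weight matrix pulls a noisy input back to a stored pattern.

```python
def hopfield_train(patterns):
    """Hebbian (Hopfield) rule: w[i][j] = (1/n) * sum over patterns of x_i * x_j."""
    n = len(patterns[0])
    weights = [[0.0] * n for _ in range(n)]
    for pattern in patterns:
        for i in range(n):
            for j in range(n):
                if i != j:  # no self-connections
                    weights[i][j] += pattern[i] * pattern[j] / n
    return weights


def hopfield_recall(weights, probe, max_sweeps=10):
    """Update units one at a time until the state stops changing."""
    state = list(probe)
    for _ in range(max_sweeps):
        previous = list(state)
        for i, row in enumerate(weights):
            field = sum(w * s for w, s in zip(row, state))
            state[i] = 1 if field >= 0 else -1
        if state == previous:
            break
    return state


# Two hypothetical, mutually orthogonal +/-1 patterns (illustrative data only).
stored = [
    [1, -1, 1, -1, 1, -1, 1, -1],
    [1, 1, 1, 1, -1, -1, -1, -1],
]
weights = hopfield_train(stored)

noisy = list(stored[0])
noisy[0] = -noisy[0]  # corrupt one element of the first pattern
recalled = hopfield_recall(weights, noisy)
```

Here `recalled` settles on the stored pattern closest to the corrupted probe, which is the nearest-neighbor behaviour the oscillator network is meant to provide in hardware.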
| Speaker: | Yan Fang - Univ. of Pittsburgh, Pittsburgh, PA |
| Authors: | Yan Fang - Univ. of Pittsburgh, Pittsburgh, PA |
| | Donald M. Chiarulli - Univ. of Pittsburgh, Pittsburgh, PA |
| | Steven P. Levitan - Univ. of Pittsburgh, Pittsburgh, PA |
Extracting Temporal Assertions from Timing Specification Diagrams
Extracting assertion properties from data traces helps reduce manual effort and improve functional coverage. However, previous techniques work at the post-design stage and assume that the design is always correct and that the behavior captured by dummy test waveforms is complete. Our proposed technique extracts the assertion properties at the pre-design stage from timing specification diagrams. These assertions act as a reference model throughout the design phase and verify the specifications against the implementation. We create, move, and expand temporal windows across the waveform and apply logic minimization techniques to learn the most precise assertions. We also evaluate our technique with a standard AMBA-APB Bus IP.
| Speaker: | Varun Jain - Univ. of California, Santa Barbara, CA |
| Authors: | Varun Jain - Univ. of California, Santa Barbara, CA |
| | Tim Sherwood - Univ. of California, Santa Barbara, CA |
A Novel Time Division Multiplexing Control Mechanism for Bidirectional On-Chip Networks
Configuring a routing path and mitigating congestion are the keys to improving the performance of an on-chip network. Conventionally, high throughput was achieved by architectural improvement without considering the overhead of costs such as design effort, die area, and power consumption. A novel dynamically self-reconfigurable Time-Division-Multiplexing (TDM) control mechanism is proposed for the bidirectional Network-on-Chip (BiNoC). Experimental results showed that TDM-BiNoC effectively exploits the hardware resources without adding extra area-demanding communication channels. Network interconnection performance was therefore enhanced by more than 21% at only 6% overhead. Using our TDM mechanism, the impact of costly hardware is noticeably alleviated.
| Speaker: | Sao-Jie Chen - National Taiwan Univ., Taipei, Taiwan |
| Authors: | Chun-jen Wei - National Taiwan Univ., Taipei, Taiwan |
| | Yi-yao Weng - National Taiwan Univ., Taipei, Taiwan |
| | Wen-Chung Tsai - Industrial Technology Research Institute, Hsinchu, Taiwan |
| | Yuhen Hu - Univ. of Wisconsin, Madison, WI |
| | Sao-Jie Chen - National Taiwan Univ., Taipei, Taiwan |
Simulation and Analysis of Advanced Network Memory Architectures
This paper describes the development of simulators for the generation and analysis of memory access traces within advanced memory architectures found in a network processor. Our simulator integrates SimpleScalar, PacketBench, and DRAMSim2 to realize cycle-accurate simulation of architectures such as virtually pipelined memory by accurately modeling the environment in which network memory operates. Our preliminary results show that the characteristics (read/write, access rate, access pattern, etc.) of memory requests are highly dependent on the application being run, demonstrating that mathematical models are insufficient for the analysis of these advanced memory architectures.
| Speaker: | Nathan A. Hunter - Univ. of Pittsburgh, Pittsburgh, PA |
| Authors: | Nathan A. Hunter - Univ. of Pittsburgh, Pittsburgh, PA |
| | Jiayin Li - Univ. of Pittsburgh, Pittsburgh, PA |
| | Kartik Mohanram - Univ. of Pittsburgh, Pittsburgh, PA |
Lazy-RTGC: A Real-Time Lazy Garbage Collection Mechanism with Jointly Optimizing Average and Worst Performance for NAND Flash Memory Storage Systems
Due to its many attractive and unique properties, NAND flash memory has been widely adopted in mission-critical hard real-time systems and some soft real-time systems. However, non-deterministic garbage collection makes it difficult to predict the system response time of each data request. This paper presents Lazy-RTGC, which adopts a free write buffer strategy and a lazy partial garbage collection technique to provide a guaranteed system response time. We evaluate and compare our scheme with representative real-time NAND flash management schemes. Experimental results show that Lazy-RTGC can significantly improve both the average and the worst-case system performance with high space utilization.
| Speaker: | Qi Zhang - Nanjing Univ., Nanjing, China |
| Authors: | Qi Zhang - Nanjing Univ., Nanjing, China |
| | Xuandong Li - Nanjing Univ., Nanjing, China |
| | Linzhang Wang - Nanjing Univ., Nanjing, China |
| | Tian Zhang - Nanjing Univ., Nanjing, China |
| | Yi Wang - Hong Kong Polytechnic Univ., Hong Kong |
| | Zili Shao - Hong Kong Polytechnic Univ., Hong Kong |
A Multi-Level Variable Lumping Scheme for Robust Data Modeling in Distributed Sensing Environments
This paper presents a multi-level state variable lumping scheme to construct robust mathematical data models from data sampled through a network of sensing devices with limited resources, such as bandwidth and buffer memory. The data models are partial differential equations. The scheme minimizes the modeling errors due to data losses and time delays. Experiments used the method for thermal modeling of the ULTRASPARC Niagara T1 architecture; however, the technique can be used to model a large variety of physical phenomena.
| Speaker: | Alex Doboli - State Univ. of New York, Stony Brook, NY |
| Authors: | Anurag Umbarkar - State Univ. of New York, Stony Brook, NY |
| | Alex Doboli - State Univ. of New York, Stony Brook, NY |
ILPc: A Novel Approach for Scalable Timing Analysis of Synchronous Programs
Worst Case Reaction Time (WCRT) analysis is essential for validating the synchrony hypothesis for synchronous programs. In this paper, we develop a new ILP-based WCRT analysis technique, called ILPc, that exploits the concurrency explicitly during the ILP formulation to avoid the state-space explosion problem. Using extensive benchmarking, we demonstrate the efficacy of the approach: for complex programs, ILPc is often orders of magnitude faster than the existing approaches, while achieving the same level of precision. This paper thus paves the way for scalable WCRT analysis of complex embedded systems designed using the synchronous approach.
| Speaker: | JiaJie Wang - Univ. of Auckland, Auckland, New Zealand |
| Authors: | JiaJie Wang - Univ. of Auckland, Auckland, New Zealand |
| | Partha S. Roop - Univ. of Auckland, Auckland, New Zealand |
| | Sidharta Andalam - TUM CREATE Ltd., Singapore |
Acceleration of SystemC/TLM simulations
The complexity of SystemC virtual prototyping is continuously increasing. Accelerating RTL/TLM SystemC simulations is essential to control future SoC development cost and time-to-market. In this paper, we present RAVES, a highly-parallel special-purpose multicore architecture that achieves simulation performance more efficiently by parallel execution of light-weight user-level threads on many small cores. We present a design study based on the hardware virtual prototype of RAVES processors running a co-designed custom SystemC kernel. Our evaluation suggests that a 64-core RAVES processor can deliver simulation performance similar to that of a high-end x86 processor with 3.5x less die area and 1.9x lower power.
| Speaker: | Nicolas Ventroux - CEA-LIST, Gif-Sur-Yvette, France |
| Authors: | Nicolas Ventroux - CEA-LIST, Gif-Sur-Yvette, France |
| | Julien Peeters - CEA-LIST, Gif-Sur-Yvette, France |
| | Tanguy Sassolas - CEA-LIST, Gif-Sur-Yvette, France |
| | James C. Hoe - Carnegie Mellon Univ., Pittsburgh, PA |
A New SAT-Based Approach for Equivalence Checking of Hardware-Dependent Low-Level Embedded System Software
This paper presents a novel approach to formally prove the equivalence of low-level hardware-dependent programs. Inspired by hardware verification techniques, a software miter is developed as the computational model to perform the verification task. Taking into account the reactive behavior the software exposes, two programs are shown to be equivalent if they exhibit the same input/output behavior. The approach relies on a bounded SAT-based paradigm which represents every handled program by a program netlist. Experimental results show the effectiveness of the proposed technique for industrial low-level software in relevant equivalence checking scenarios such as code porting and automated/manual code transformations.
| Speaker: | Carlos Villarraga - Technische Univ. Kaiserslautern, Kaiserslautern, Germany |
| Authors: | Carlos Villarraga - Technische Univ. Kaiserslautern, Kaiserslautern, Germany |
| | Bernard Schmidt - Technische Univ. Kaiserslautern, Kaiserslautern, Germany |
| | Joerg Bormann - OneSpin Solutions GmbH, Munich, Germany |
| | Dominik A. Stoffel - Univ. of Kaiserslautern, Kaiserslautern, Germany |
| | Wolfgang Kunz - Univ. of Kaiserslautern, Kaiserslautern, Germany |
Design Methodologies for 3D Mixed Signal Integrated Circuits: a Practical 8-bit SAR ADC Design Case
3D IC is a good candidate to address the design issues in conventional analog/digital mixed-signal IC design. In this work, we develop a feasible and easy-to-market design flow for 3D mixed-signal IC. By leveraging the 2D commercial EDA tools, we also propose novel mechanisms in both pre-simulation and post-simulation to evaluate the design of 3D mixed-signal IC. A differential successive approximation register analog-to-digital converter (SAR ADC) is designed on a two-layer stacked chip. The experimental results show that our 3D SAR ADC can achieve significant power reduction and provide higher SNDR and SFDR.
| Speaker: | Wulong Liu - Tsinghua Univ., Beijing, China |
| Authors: | Wulong Liu - Tsinghua Univ., Beijing, China |
| | Tao Zhang - Pennsylvania State Univ., State College, PA |
| | Xue Han - Tsinghua Univ., Beijing, China |
| | Yu Wang - Tsinghua Univ., Beijing, China |
| | Yuan Xie - Pennsylvania State Univ., Advanced Micro Devices, Inc., University Park, PA |
| | Huazhong Yang - Tsinghua Univ., Beijing, China |
An Adaptive Instruction Memory for Simultaneous Enhancement of Low Power and High Performance Computing
Handheld computing devices urgently require low-power and high-performance operations. Since billions of transistors operate in a mobile computing chip, current technology only permits a trade-off between low-power and high-performance computing. An adaptive instruction memory (AIM) system and a backend code compiler for ARM instruction sets are developed for simultaneously enhancing the energy efficiency and performance of mobile devices. The AIM system employs small, simple, and fast L1 and/or L2 caches and achieves significant energy efficiency and performance improvement rates with MiBench for three microarchitectures: Intel’s StrongARM (5.45x/3.42x), Intel’s ARM Xscale (4.97x/3.31x), and Compaq Alpha21264 (5.32x/1.23x).
| Speaker: | Yong Kyu Jung - Adaptmicrosys LLC, Erie, PA |
| Author: | Yong Kyu Jung - Adaptmicrosys LLC, Erie, PA |
Optimization for Static Timing Analysis in Test and Debug
In this work, we present and analyze optimization techniques applied to DFT-mode Static Timing Analysis (STA). The developed approach has been applied to the Graphics North-Bridge (GNB) used in the latest generation of AMD Fusion™ Accelerated Processor Unit (APU). Its application allowed a significant reduction in STA run-time. Besides its effectiveness for DFT-specific modes, the proposed solution is relevant to a wide range of STA applications. Detailed analysis of the approach and numeric results of its application are presented.
| Speaker: | David Akselrod - Advanced Micro Devices, Inc., Markham, ON, Canada |
| Authors: | Anatoly Normatov - Advanced Micro Devices, Inc., Markham, ON, Canada |
| | Pearl Liu - Advanced Micro Devices, Inc., Markham, ON, Canada |
| | Arie L. Margulis - Advanced Micro Devices, Inc., Markham, ON, Canada |
| | Rahul Shukla - Advanced Micro Devices, Inc., Markham, ON, Canada |
| | David Akselrod - Advanced Micro Devices, Inc., Markham, ON, Canada |
Dynamic Resolution in Distributed Cyber-Physical System Simulation
Cyber-physical systems challenge distributed simulation techniques for reasons of the heterogeneous tools used to model system components at different levels of abstraction, and the challenge of coordinating heterogeneous simulators with different notions of time. The SimConnect and SimTalk distributed cyber-physical system simulation tools meet the synchronization challenge of distributed simulation, but also offer dynamic resolution among coordinated simulators for tradeoffs in simulation speed versus accuracy. This paper discusses the dynamic resolution capabilities of SimConnect and SimTalk, and evaluates the tools in distributed simulation of a closed-loop motor control system. Results show selectable tradeoffs in speedup over the non-dynamic coordinations.
| Speaker: | Dylan Pfeifer - Univ. of Texas at Austin, TX |
| Authors: | Dylan Pfeifer - Univ. of Texas at Austin, TX |
| | Jonathan Valvano - Univ. of Texas at Austin, TX |
| | Andreas Gerstlauer - Univ. of Texas at Austin, TX |
Dynamic Power Reduction with Standard Cells Re-organization During "Logic Synthesis"
This work addresses logic power reduction in datapath-oriented designs during the "Logic Synthesis" process. Currently, the implementation tools in the synthesis and physical design area do not address this to the fullest possible extent, especially when the sequential and combinatorial logic are distributed irregularly. The idea is to increase the yield of multiple-input standard cell gates during and/or after synthesis. The cells are re-organized such that the gate nodal capacitance is decreased, which also improves other cost factors such as timing and area. Dynamic power was reduced by about 20%.
| Speaker: | Vinay S. Adavani - Infinera Corp., Bangalore, India |
| Author: | Vinay S. Adavani - Infinera Corp., Bangalore, India |
A Formal Approach to DC Operating Point Analysis for Large Mixed Signal Circuits: Challenges and Opportunities
DC operating point analysis is a pivotal step in most circuit simulation tasks. Current tools are geared towards finding one DC operating point and cannot guarantee finding all such points. Recent advances in SMT (Satisfiability Modulo Theory) solvers have prompted new work in formal approaches to solve this problem but have been successfully applied only to extremely small circuits. In this work, we couple various abstraction techniques with SMT based search to find all DC equilibrium points for large circuits. While the problem is by no means solved, we outline several important observations and techniques towards making this problem tractable.
| Speaker: | Parijat Mukherjee - Texas A&M Univ., College Station, TX |
| Authors: | Parijat Mukherjee - Texas A&M Univ., College Station, TX |
| | Chirayu S. Amin - Intel Corp., Hillsboro, OR |
| | Peng Li - Texas A&M Univ., College Station, TX |
CyberPhysical System-on-Chip (CPSoC) : A Self-Aware Design Paradigm with Cross-Layer Virtual Sensors and Actuators
Cyber-physical systems (CPSs) are physical and engineered systems whose operations are monitored, coordinated, controlled, and integrated by a computing and communication core. We propose cyber-physical system-on-chips (CPSoC), a new class of multiprocessor system-on-chip (MPSoC) that inherits most features of MPSoC and adds on-chip and cross-layer sensing and actuation to enable self-awareness within the observe-decide-act paradigm. CPSoC differs from traditional MPSoC designs primarily in the co-design of control, communication, and computing systems that interact with the physical environment in real time and modify their behavior so as to adaptively achieve certain objectives and QoS. The approach is illustrated with a cogent example.
| Speaker: | Santanu Sarma - Univ. of California, Irvine, CA |
| Authors: | Nikil Dutt - Univ. of California, Irvine, CA |
| | Nalini Venkatasubramanian - Univ. of California, Irvine, CA |
| | Alex Nicolau - Univ. of California, Irvine, CA |
| | Puneet Gupta - Univ. of California, Los Angeles, CA |
Implementing Wireless Communication Links For 3D ICs – The Microbump Way
Three-dimensional integrated circuit (3-D IC) technology for improving integration density has made great progress, and its wide deployment in high performance computing systems has been envisaged in recent years. We present a promising perspective of wireless communication links between 3-D ICs for reducing latency and internal pin counts. We propose a wireless link utilizing a fully Si-compatible micro-bump antenna, and demonstrate the feasibility of using micro-bumps placed between the layers of a 3-D IC as a high-efficiency, compact, and wide-bandwidth antenna. The wide-bandwidth antenna allows a highly flexible wireless link with a data capacity comparable to that of a wireline link.
| Speaker: | Julia H. Lu - Purdue Univ., West Lafayette, IN |
| Authors: | Julia H. Lu - Purdue Univ., West Lafayette, IN |
| | Wing-Fai Loke - Purdue Univ., West Lafayette, IN |
| | Dimitrios Peroulis - Purdue Univ., West Lafayette, IN |
| | Byunghoo Jung - Purdue Univ., West Lafayette, IN |
High-Level Directives for Efficiently Utilizing Scratchpad Memory
Effectively utilizing scratchpad memory (SPM) for data objects is challenging and requires programmer intervention. It involves identifying (sections of) data objects that will benefit from being allocated in SPM, specifying memory allocations, and managing data transfers into and out of SPM for the common cases where the data objects do not fit into SPM. In this paper, we propose the use of high-level, programmer-provided compiler directives for specifying data objects that should be in SPM at a given point of time. The compiler uses this information to automatically manage the required memory allocations and data transfers to effectively utilize SPM.
| Speaker: | Ayodunni Aribuki - Univ. of Houston, Houston, TX |
| Authors: | Ayodunni Aribuki - Univ. of Houston, Houston, TX |
| | Eric Stotzer - Texas Instruments, Inc., Stafford, TX |
| | Ernst Leiss - Univ. of Houston, Houston, TX |
Microchips that Repair Themselves
As we continue to scale transistors, we have to accept the issue of atomic scale defects. Atomic scale defects broaden the failure distribution while we continue to increase the number of transistors on a die. The result is an increased probability of having a few signal paths on every die that cannot meet the reliability goals for the die. One potential solution is to give the chip the ability to detect this degradation and to take action to repair the damage. This paper will describe techniques for self-repair that can be accomplished with existing CMOS technology.
| Speaker: | Timothy E. Turner - College of Nanoscale Science and Engineering, Albany, NY |
| Author: | Timothy E. Turner - College of Nanoscale Science and Engineering, Albany, NY |
Bridging EDA and Instrumentation Interfaces and Design Flows
EDA Verification and Silicon Debug are two sides of the same process: product validation. Verification addresses pre-silicon analysis, and debug is primarily concerned with analysis of the silicon end product (both hardware and software). Integrating EDA verification and the hardware instrumentation tools used for debug benefits all parties by enabling new levels of perspective and reuse. Historically, EDA verification and instrumentation debug tools have had few common interfaces, resulting in incompatible, independent pre-silicon and in-silicon verification and analysis. Converging approaches and interfaces can provide more comprehensive and reusable analysis tools and capabilities for use throughout the product development process.
| Speaker: | Neal Stollon - HDL Dynamics, Dallas, TX |
| Author: | Neal Stollon - HDL Dynamics, Dallas, TX |
Non-Deterministic Evaluation of SPICE-like Simulation Algorithms on Distributed Systems
SPICE-like simulation algorithms have many intrinsically sequential elements. Attempts to parallelize these algorithms have resulted in limited speed-ups on multi-cores. We propose to distribute the solution of circuit equations across a large number of light-weight parallel processors. The aim is to avoid the construction of a global network matrix and thus localize all communications by distributed evaluation of Kirchhoff’s Current Law at each circuit node. The processors work asynchronously, performing the calculations for each node in a random (non-deterministic) order. Although this results in redundant calculations and slower convergence per node, results suggest that overall a speed-up can be obtained.
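The per-node idea can be sketched for the simplest case, a linear resistive network, where Kirchhoff's Current Law at a free node reduces to a conductance-weighted average of its neighbours' voltages. The sketch below is a sequential emulation under stated assumptions, not the authors' distributed algorithm: `relax_nodes`, the node names, and the shuffled sweep order (standing in for asynchronous processors) are all hypothetical.

```python
import random


def relax_nodes(conductances, fixed, sweeps=60, seed=0):
    """Solve node voltages by local KCL updates visited in random order.

    conductances: {(node_a, node_b): conductance in siemens}
    fixed: {node: voltage} for nodes held by ideal sources or ground
    """
    neighbours = {}
    for (a, b), g in conductances.items():
        neighbours.setdefault(a, []).append((b, g))
        neighbours.setdefault(b, []).append((a, g))

    voltage = {node: fixed.get(node, 0.0) for node in neighbours}
    free = [node for node in neighbours if node not in fixed]
    rng = random.Random(seed)

    for _ in range(sweeps):
        rng.shuffle(free)  # random visiting order emulates asynchronous processors
        for node in free:
            # KCL: sum of g * (v_other - v_node) = 0, solved for v_node.
            total_g = sum(g for _, g in neighbours[node])
            weighted = sum(g * voltage[other] for other, g in neighbours[node])
            voltage[node] = weighted / total_g
    return voltage


# Hypothetical three-resistor chain between a 1 V source and ground.
chain = {("in", "a"): 1.0, ("a", "b"): 1.0, ("b", "gnd"): 1.0}
solution = relax_nodes(chain, fixed={"in": 1.0, "gnd": 0.0})
```

No global network matrix is built: each update touches only a node and its direct neighbours, which is what makes the scheme attractive for many light-weight processors, at the cost of more iterations than a direct solve.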
| Speaker: | Mansour R. Darabad - Univ. of Southampton, Southampton, United Kingdom |
| Authors: | Mansour R. Darabad - Univ. of Southampton, Southampton, United Kingdom |
| | Mark Zwolinski - Univ. of Southampton, Southampton, United Kingdom |
Pruning in System-Level Design Space Exploration Through MCM Analysis for PPN Networks
System-level design space exploration (DSE), which is performed early in the design process, is of eminent importance to the design of complex multi-processor embedded system architectures. During system-level DSE, system parameters are considered. Simulation-based DSE, in which different design instances are evaluated using system-level simulations, is typically computationally costly. In this paper, we present an optimization technique that significantly reduces the number of simulations needed during system-level DSE. We propose an iterative design space pruning methodology based on Maximum Cycle Mean (MCM) analysis of a graph of different application mappings for Polyhedral Process Networks (PPN).
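For readers unfamiliar with the Maximum Cycle Mean, the sketch below computes it for a small weighted directed graph using Karp's dynamic-programming formulation. It only illustrates the MCM quantity itself, not the pruning methodology of the paper; the function name, the edge-list representation, and the example graph are assumptions.

```python
def maximum_cycle_mean(num_nodes, edges):
    """Karp-style computation of the maximum cycle mean.

    edges: list of (source, destination, weight) with nodes numbered 0..num_nodes-1.
    Returns None when the graph has no cycle.
    """
    neg_inf = float("-inf")
    # best[k][v]: maximum weight of a walk with exactly k edges ending at v,
    # where the walk may start at any node (best[0][v] = 0 for all v).
    best = [[neg_inf] * num_nodes for _ in range(num_nodes + 1)]
    best[0] = [0.0] * num_nodes
    for k in range(1, num_nodes + 1):
        for src, dst, weight in edges:
            if best[k - 1][src] != neg_inf:
                best[k][dst] = max(best[k][dst], best[k - 1][src] + weight)

    result = None
    for v in range(num_nodes):
        if best[num_nodes][v] == neg_inf:
            continue  # no walk of num_nodes edges ends here
        candidate = min(
            (best[num_nodes][v] - best[k][v]) / (num_nodes - k)
            for k in range(num_nodes)
            if best[k][v] != neg_inf
        )
        if result is None or candidate > result:
            result = candidate
    return result


# Hypothetical graph: cycle 0 -> 1 -> 0 has mean (2 + 4) / 2, self-loop at 2 has mean 2.
example_edges = [(0, 1, 2.0), (1, 0, 4.0), (1, 2, 1.0), (2, 2, 2.0)]
mcm = maximum_cycle_mean(3, example_edges)
```

In dataflow-style models the MCM of the dependency graph bounds the achievable iteration period, which is why it can serve as a cheap analytical filter before running costly simulations.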
| Speaker: | Roberta Piscitelli - Univ. of Amsterdam, Amsterdam, The Netherlands |
| Authors: | Roberta Piscitelli - Univ. of Amsterdam, Amsterdam, The Netherlands |
| | Hristo Nikolov - Leiden Univ., Leiden, The Netherlands |
| | Andy D. Pimentel - Univ. of Amsterdam, Amsterdam, The Netherlands |
| | Todor Stefanov - Leiden Univ., Leiden, The Netherlands |
| | Sven van Haastregt - Leiden Univ., Leiden, The Netherlands |
The Detection of Malicious Data Attack on NAND Flash Storage System Based on Power Signature
The private information on the NAND flash storage system (NFSS) in consumer and mobile devices has become the primary target of malicious data attacks. Detecting and identifying the attacks is generally costly and significantly slows down the system response time. In this work, we reveal that the power signature of the NFSS deviates from the normal pattern during data attacks. Therefore, it is possible to detect malicious data attacks by monitoring the real-time power signature of the NFSS. Power signature analysis examples on the NFSS with the buffers built on different memory technologies are also given.
| Speaker: | Hai Li - Univ. of Pittsburgh, Pittsburgh, PA |
| Authors: | Jie Guo - Univ. of Pittsburgh, Pittsburgh, PA |
| | Guangyu Sun - Peking Univ., Beijing, China |
| | Chun Jason Xue - City Univ. of Hong Kong, Hong Kong |
| | Hai Li - Univ. of Pittsburgh, Pittsburgh, PA |
Distributed Runtime Computation of Constraints for Multiple Inner Loops
This paper presents a hardware solution for runtime computation of loop constraints and synchronizing delays for multiple inner loops in parallel distributed implementations of digital signal processing sub-systems. Methods to map and generate the runtime computation code for loop constraints and synchronizing delays are also presented. Compared to the traditional methods, the proposed solution achieves 55% average code compaction and 32.7% average performance improvement. The solution has a modest hardware cost that increases linearly with the dimension of the architecture and has no performance penalty. Results from multiple realistic examples are presented, analyzed, and compared to the traditional methods.
| Speaker: | Nasim Farahini - KTH Royal Institute of Technology, Stockholm, Sweden |
| Authors: | Nasim Farahini - KTH Royal Institute of Technology, Stockholm, Sweden |
| | Ahmed Hemani - KTH Royal Institute of Technology, Stockholm, Sweden |
| | Kolin Paul - Indian Institute of Technology, New Delhi, India |
High Density 3D Stacked RRAM Cache Designs
A conventional memristor stack design has an insulator layer for every crossbar layer stacked. In the proposed design, more crossbars can be stacked in the same space, about 2x as many in certain cases, which sets the next generation in density. While still suffering from a few shortcomings, concurrent access in particular, the new design proves itself an interesting design alternative because of its higher memory density. Another design is also proposed: a hybrid of the conventional design and the proposed extrapolated design. The hybrid design gives higher concurrency than the extrapolated design and higher density than the conventional design.
| Speaker: | Selvakumaran Vadivelmurugan - Purdue Univ., West Lafayette, IN |
| Author: | Selvakumaran Vadivelmurugan - Purdue Univ., West Lafayette, IN |
Variability Aware Efficient Post-Silicon Validation via Segmentation of Process Variation Envelopes
We propose an efficient method for incorporating the knowledge of global (worst case of on-die, across die, across wafers, and across wafer lots) and local (worst case on-die) process variations to generate vectors for identifying delay marginalities in a design during post-silicon validation. With the goal of significantly reducing the number of vectors required for validation, we propose an approach for segmenting the total global plus local process variation envelope into sub-envelopes, where each sub-envelope is guaranteed to capture worst-case total local variations along with partial global variations, and where all sub-envelopes collectively capture the worst-case total variations.
| Speaker: | Prasanjeet Das - Univ. of Southern California, Los Angeles, CA |
| Authors: | Prasanjeet Das - Univ. of Southern California, Los Angeles, CA |
| | Sandeep K. Gupta - Univ. of Southern California, Los Angeles, CA |
Post-Silicon Test Generation with Virtual Prototypes
Virtual prototypes are increasingly used in system development to enable early software development and validation before hardware prototypes become available. Virtual prototypes also have major potential to play a crucial role in post-silicon validation. We present an approach to generation of post-silicon tests with virtual prototypes. For a hardware component of interest, we exercise its virtual prototype by symbolic execution. Based on results of symbolic execution, a concrete test case is generated for each exercised path. We have applied this approach to virtual devices of five network adapters. The results show that this approach is feasible, efficient, and useful.
| Speaker: | Kai Cong - Portland State Univ., Portland, OR |
| Authors: | Kai Cong - Portland State Univ., Portland, OR |
| | Fei Xie - Portland State Univ., Portland, OR |
| | Li Lei - Portland State Univ., Portland, OR |
STT-MRAM for Non-Volatile Logic ASIC Applications
In this paper, we present a methodology for low-power non-volatile ASIC design using a hybrid CMOS/Magnetic process. This methodology is implemented using a full digital Process Design Kit that we have developed, and is illustrated on a case study of a digital FIR filter. Furthermore, we present power estimation results using advanced magnetic technologies, such as spin transfer torque in in-plane and perpendicular-to-plane magnetized tunnel junctions. These results show that non-volatile devices can be integrated in logic designs, for both security and low-power purposes, without increasing the standard cell transistor sizes and without area or power overhead.
| Speaker: | Gregory Di Pendina - Spintec, Grenoble, France |
| Authors: | Gregory Di Pendina - Spintec, Grenoble, France |
| | Guillaume Prenat - Spintec, CEA-INAC/CNRS/UJF/INPG, Grenoble, France |
| | Bernard Dieny - Spintec, Grenoble, France |
Timing Variation Tolerant Real Time Adaptive Pipelines using Wave Completion Sensing
In this paper, we propose a high-throughput, robust pipeline design to reduce or eliminate the clock frequency guard bands imposed on microprocessor pipelines for resilience against static and dynamic variations such as threshold voltage variations, supply and ground bounce fluctuations, and random delay defects, to mention a few. The design is self-adjusting, self-driven, and environment-adaptive, and is equipped with low-cost, low-overhead switching detection sensors which signal upon completion of all gate node transitions in a combinational logic block and subsequently latch the result to the next pipeline stage.
| Speaker: | Jayaram Natarajan - Georgia Institute of Technology, Atlanta, GA |
| Authors: | Jayaram Natarajan - Georgia Institute of Technology, Atlanta, GA |
| | Sahil Kapoor - Georgia Institute of Technology, Atlanta, GA |
| | Varsha Koorapati - Georgia Institute of Technology, Atlanta, GA |
| | Debesh Bhatta - Georgia Institute of Technology, Atlanta, GA |
| | Adit Singh - Auburn Univ., Auburn, AL |
| | Abhijit Chatterjee - Georgia Institute of Technology, Atlanta, GA |
P-Spectrum: A Personalized Smartphone Power Management Technique Based on Real-Time Battery and User Behavior Monitoring
Smartphone power consumption is determined by both specific hardware and user behavior. We propose “P-Spectrum” – an intelligent smartphone power management scheme based on the personal operation patterns of the user. The smartphone application power profiles are captured and guide the run-time application scheduling to meet the battery life budget. The impacts on the user experience are minimized by fitting the updated application schedule to the user’s personal preference to the maximum extent. A battery usage monitor is also designed to precisely trace the power consumption. Our initial measurements on Galaxy S3 show that P-Spectrum can reduce the energy effectively.
| Speaker: | Xiang Chen - Univ. of Pittsburgh, Pittsburgh, PA |
| Authors: | Xiang Chen - Univ. of Pittsburgh, Pittsburgh, PA |
| | Hai Li - Univ. of Pittsburgh, Pittsburgh, PA |
iVAMS: Intelligent Metamodel-Integrated Verilog-AMS for Fast Analog Block Optimization
The gap between abstraction levels in analog design is a major obstacle for advancing analog and mixed-signal design. Intelligent surrogate models for low-level analog building blocks are needed to bridge behavioral and transistor-level simulations. Parameterized behavioral models in Verilog-AMS based on artificial neural network (ANN) metamodels are presented for efficient system-level design exploration. To the best of the authors' knowledge, this is the first paper to integrate ANN metamodels in Verilog-AMS. To demonstrate iVAMS, a biologically-inspired "firefly optimization algorithm" is applied to an OP-AMP design. The optimization process is sped up by 5580X due to the use of iVAMS.
| Speaker: | Geng Zheng - Univ. of North Texas, Denton, TX |
| Authors: | Geng Zheng - Univ. of North Texas, Denton, TX |
| | Saraju Mohanty - Univ. of North Texas, Denton, TX |
| | Elias Kougianos - Univ. of North Texas, Denton, TX |
Designing Hardware in the Holodeck
We envision the design of circuits and systems in a virtual world, making use of virtual reality applications, and expect two main advantages from this approach. First, visualization is targeted, since the approach allows for literally entering the system. As a result, complex structures can be grasped much more easily in a 3D environment. A second application of this scenario is the actual design of the system in the virtual world. In the paper, we review the current state of the art in both software and hardware to discuss a possible implementation and point out ideas and applications.
| Speaker: | Mathias Soeken - Univ. of Bremen, Bremen, Germany |
| Authors: | Mathias Soeken - Univ. of Bremen, Bremen, Germany |
| | Rolf Drechsler - Univ. of Bremen, Bremen, Germany |
Merging Silicon Photonics and Electronics: A Design Challenge
Silicon Photonics is rapidly gaining maturity in high-bandwidth optical communication, with applications in datacom, access networks and I/O for bandwidth-intensive electronics, through co-integration of photonics and electronics into the same chip or multi-chip module. However, the combination introduces a new set of problems on the design side: Codesign/cosimulation of complex photonic and electronic circuits, tolerance to process and operational variability, and verification algorithms that can handle photonic circuits. We will discuss these challenges and give an outlook on how design tools need to evolve to address the needs of photonic-electronic IC designers.
| Speaker: | Wim Bogaerts - Ghent Univ., Ghent, Belgium |
| Authors: | Wim Bogaerts - Ghent Univ., Ghent, Belgium |
| | Pieter Dumon - IMEC, Gent, Belgium |
Modeling of Retention Time for High Speed Embedded Dynamic Random Access Memories
Embedded Dynamic Random Access Memories (eDRAMs) are becoming a popular choice for large cache applications due to their density, speed and power benefits. One of the crucial challenges in eDRAM design is meeting the retention time specification. Because it is implemented in a logic process, eDRAM usually suffers from poor retention time compared to commodity DRAM. The retention time of eDRAM designed in scaled technologies depends not only on bitcell leakage but also on effects like reference voltage variations, frequency-dependent writeback voltage and pattern-dependent coupling noise. This paper investigates these components and provides a simple yet accurate model of eDRAM retention time.
| Speaker: | Swaroop Ghosh - Univ. of South Florida, Tampa, FL |
| Author: | Swaroop Ghosh - Univ. of South Florida, Tampa, FL |
Floorplan Driven Architectures and High-level Synthesis Algorithm for Dynamic Multiple Supply Voltages
We propose an adaptive voltage huddle-based distributed-register architecture (AVHDR architecture) that integrates dynamic multiple supply voltages and interconnection delays into high-level synthesis. Low supply voltages are assigned to non-critical operations, and leakage power is cut off by turning off the power supply to the sleeping functional units. Next, we propose a high-level synthesis algorithm for AVHDR architectures. Our algorithm is based on iterative improvement of scheduling/binding and floorplanning. In the iteration process, huddles, each of which abstracts modules placed close to each other, are naturally generated using floorplanning. Experimental results show that our algorithm achieves 44.9% energy savings compared with conventional algorithms.
| Speaker: | Shin-ya Abe - Waseda Univ., Shinjuku-ku, Japan |
| Authors: | Shin-ya Abe - Waseda Univ., Shinjuku-ku, Japan |
| | Youhua Shi - Waseda Univ., Shinjuku-ku, Japan |
| | Kimiyoshi Usami - Shibaura Institute of Technology, Koto-ku, Japan |
| | Masao Yanagisawa - Waseda Univ., Shinjuku-ku, Japan |
| | Nozomu Togawa - Waseda Univ., Tokyo, Japan |
Design Automation for Large Scale Social Networks
The explosion in social media adoption has opened up new opportunities for next-generation personalized web and information exchange in big data scenarios. Making sense of the massive number of overlapping streams of information generated by hundreds of millions of users on large social networks requires novel analytics and scalable computational techniques. This paper shows emerging themes offered by scalable EDA tools in building next generation tools for design automation in the social world. As a first step in this nascent area, approaches and analogs based on field solvers and timing simulators are considered.
| Speaker: | Arun V. Sathanur - Univ. of Washington, Seattle, WA |
| Authors: | Arun V. Sathanur - Univ. of Washington, Seattle, WA |
| | Vikram Jandhyala - Univ. of Washington, Seattle, WA |
| | Chuanjia Xing - Univ. of Washington, Seattle, WA |
Identifying Key Elements of Variability using Heterogeneity Estimation for Fast Redesign
This paper introduces a new heterogeneity-based approach that can identify key components in a circuit or system. These key components are the controlling ones that play the main roles in affecting system and circuit behavior such as reliability, speed and power. Instead of testing numerous corners for a number of running modes, identifying components with high heterogeneity allows us to redesign or apply an ECO much faster. By combining influential weights, such as Most Significant Error bits, we can single out components or gates that, once redesigned, can lower the potential error rate.
| Speaker: | Janet M. Roveda - Univ. of Arizona, Tucson, AZ |
| Authors: | Fahd Shaikh - Univ. of Arizona, Tucson, AZ |
| | Wei He - Univ. of Arizona, Tucson, AZ |
| | Jonathan Sprinkle - Univ. of Arizona, Tucson, AZ |
| | Janet M. Roveda - Univ. of Arizona, Tucson, AZ |
Verilog-AMS-POM: Verilog-AMS Integrated POlynomial Metamodelling of a Memristor-Based Oscillator
This paper proposes a two-level framework for memristor-based mixed-signal design exploration. First, a Verilog-A memristor model is proposed which is not source-type dependent and has an advantage over existing SPICE memristor models. Second, a POlynomial Metamodel integrated Verilog-AMS (Verilog-AMS-POM) is proposed to enable fast, circuit-accurate system-level design exploration of memristor-based circuits and systems. A memristor-based programmable Schmitt trigger oscillator is proposed as a case study. The coefficients of determination of the proposed metamodels are greater than 0.99 and the RMSEs are less than 0.09. Verilog-AMS-POM simulation achieves a speedup of over 30,000X compared to SPICE simulation.
| Speaker: | Geng Zheng - Univ. of North Texas, Denton, TX |
| Authors: | Geng Zheng - Univ. of North Texas, Denton, TX |
| | Saraju Mohanty - Univ. of North Texas, Denton, TX |
| | Elias Kougianos - Univ. of North Texas, Denton, TX |
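The Verilog-AMS-POM abstract above reports metamodel accuracy as a coefficient of determination and an RMSE. The following is a minimal sketch of how these two metrics are conventionally computed for a polynomial metamodel. The function names, the example coefficients and the sample data are hypothetical and are not taken from the paper.

```python
import math


def poly_eval(coeffs, x):
    """Evaluate a polynomial with Horner's rule.

    coeffs are ordered from the highest-degree term down to the constant.
    """
    result = 0.0
    for c in coeffs:
        result = result * x + c
    return result


def rmse(actual, predicted):
    """Root-mean-square error between reference and metamodel outputs."""
    squared = [(a - p) ** 2 for a, p in zip(actual, predicted)]
    return math.sqrt(sum(squared) / len(squared))


def r_squared(actual, predicted):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    mean = sum(actual) / len(actual)
    ss_res = sum((a - p) ** 2 for a, p in zip(actual, predicted))
    ss_tot = sum((a - mean) ** 2 for a in actual)
    return 1.0 - ss_res / ss_tot


if __name__ == "__main__":
    # Hypothetical data: reference outputs (e.g. from a circuit simulator)
    # versus a quadratic metamodel 1.0*x^2 + 0.0*x + 0.5 (made-up coefficients).
    xs = [0.0, 1.0, 2.0, 3.0]
    reference = [0.6, 1.4, 4.5, 9.6]
    predicted = [poly_eval([1.0, 0.0, 0.5], x) for x in xs]
    print("R^2 :", r_squared(reference, predicted))
    print("RMSE:", rmse(reference, predicted))
```

An R^2 close to 1 together with a small RMSE indicates that the metamodel tracks the reference data closely over the sampled points.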
Do We Need Wide Flits in Networks-on-Chip?
The increasing number of on-chip cores has infused a “communication-centric” flavor into multicore CPUs. The existence of multiple cores on the same silicon die necessitates a scalable on-chip communication infrastructure. Packet-based Networks-on-Chip (NoCs) have emerged as the most viable candidate for the interconnect backbone of future CMPs. The flit size is one of the important NoC design parameters and significantly affects both performance and cost. Interestingly, most recent research on NoCs assumes a certain flit size without any justification. This paper provides a guideline for determining the flit size.
| Speaker: | Junghee Lee - Georgia Institute of Technology, Atlanta, GA |
| Authors: | Junghee Lee - Georgia Institute of Technology, Atlanta, GA |
| | Chrysostomos A. Nicopoulos - |
| | Sung Joo Park - Georgia Institute of Technology, Atlanta, GA |
| | Madhavan Swaminathan - Georgia Institute of Technology, Atlanta, GA |
| | Jongman Kim - Georgia Institute of Technology, Atlanta, GA |
An Analog Bus for Low Power On-Chip Digital Communication
We evaluate a digital-to-analog and analog-to-digital converter based inter-core communication scheme to significantly reduce the power consumption of multiple bit-line wide buses in multi-core processors and networks-on-chip. The proposed scheme replaces an n-bit wide bus running between cores with a single line, by encoding the information (that was to be carried on the n-bit bus) into 2^n levels of voltage on a single wire. Such a scheme offers the best of the two most prominent low-power inter-core communication schemes, bus encoding and differential low-voltage signaling, by encoding n lines into 1 and keeping the average signal swing to about VDD/2.
| Speaker: | Farah Naz Taher - Auburn Univ., Auburn, AL |
| Authors: | Farah Naz Taher - Auburn Univ., Auburn, AL |
| | Suraj Sindia - Auburn Univ., Auburn, AL |
| | Vishwani Agrawal - Auburn Univ., Auburn, AL |
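The analog bus abstract above describes carrying an n-bit word on one wire as one of 2^n voltage levels. A minimal sketch of that encoding idea follows. The evenly spaced levels between 0 and VDD, the ideal noise-free converters and the function names are all assumptions made for illustration, not the converter design evaluated in the paper.

```python
def encode_word(word, n_bits, vdd=1.0):
    """Idealized DAC: map an n-bit word to one of 2**n_bits evenly spaced
    voltage levels in [0, vdd]. Even spacing is an assumption."""
    levels = 2 ** n_bits
    if not 0 <= word < levels:
        raise ValueError("word does not fit in n_bits")
    return vdd * word / (levels - 1)


def decode_level(voltage, n_bits, vdd=1.0):
    """Idealized ADC: pick the nearest level index and clamp it to range."""
    levels = 2 ** n_bits
    code = round(voltage / vdd * (levels - 1))
    return max(0, min(levels - 1, code))


if __name__ == "__main__":
    n = 4  # a 4-bit bus collapses to 16 levels on a single wire
    for w in range(2 ** n):
        v = encode_word(w, n)
        assert decode_level(v, n) == w  # round trip without noise
    mean_level = sum(encode_word(w, n) for w in range(2 ** n)) / 2 ** n
    print("mean level for uniformly distributed words:", mean_level)
```

In this sketch the spacing between adjacent levels is VDD / (2^n - 1), so the noise margin per level shrinks quickly as n grows. That is the main practical limit on how many bus lines one wire can replace.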
The Invisible Shield: User Classification and Authentication for Mobile Device Based on Gesture Recognition
Intelligent mobile devices are widely used in daily life. A large amount of sensitive information is stored on the devices, raising severe concerns about data security. In this work, we propose a novel user classification and authentication scheme for mobile devices based on gesture recognition. The user’s gesture patterns are collected by the integrated sensors, and a learning algorithm is developed to recognize the mobile user and accommodate the ever-changing hardware and biometric features. The initial analysis shows that our gesture-based security scheme has great potential to reach sufficient accuracy with invisible impact on the user experience.
| Speaker: | Kent W. Nixon - Univ. of Pittsburgh, Bridgeville, PA |
| Authors: | Kent W. Nixon - Univ. of Pittsburgh, Bridgeville, PA |
| | Xiang Chen - Univ. of Pittsburgh, Pittsburgh, PA |
| | Zhi-Hong Mao - Univ. of Pittsburgh, Pittsburgh, PA |
| | Kang Li - Rutgers Univ., Piscataway, NJ |
| | Yiran Chen - Univ. of Pittsburgh, Pittsburgh, PA |
Deadlock Verification in Register Transfer Level Designs of Communication Fabrics
Communication fabrics constitute a key component of multi-core processors and systems-on-chip. Detection of message dependent deadlocks in communication fabrics is a challenge due to the large number of queues and the distributed character of control. We address the verification of deadlock freedom of Register Transfer Level designs of communication fabrics. We reduce queues to an abstract entity, and convert deadlock freedom to an SMT instance. Experimental results show that our approach scales to large fabrics, for example, an 8x8 mesh network. Our approach is sound but incomplete. Future work includes the generation of invariants to rule out false deadlocks.
| Speaker: | Sebastiaan J.C. Joosten - Open Univ. of the Netherlands, Heerlen, The Netherlands |
| Authors: | Sebastiaan J.C. Joosten - Open Univ. of the Netherlands, Heerlen, The Netherlands |
| | Julien Schmaltz - Open Univ. of the Netherlands, Heerlen, The Netherlands |
Yield and Timing Constrained Spare TSV Assignment for Three-Dimensional Integrated Circuits
The Through Silicon Via (TSV) is a critical enabling technique in 3D ICs. However, it may suffer from many reliability issues. In this paper, we focus on the structure that uses one spare TSV for a group of original TSVs, and study the optimal assignment of spare TSVs under yield and timing constraints to minimize design cost. We show that this problem can be modeled through constrained graph decomposition, and prove its NP-hardness. An efficient heuristic is then developed. Experimental results show that our heuristic can reduce the number of spare TSVs by 32.32% compared with a nearest-neighbor based heuristic.
| Speaker: | Yu-Guang Chen - National Tsing Hua Univ., Hsinchu, Taiwan |
| Authors: | Yu-Guang Chen - National Tsing Hua Univ., Hsinchu, Taiwan |
| | Yiyu Shi - Missouri Univ. of Science and Technology, Rolla, MO |
| | Kuan-Yu Lai - National Tsing Hua Univ., Hsinchu, Taiwan |
| | Ming-Chao Lee - Global Unichip Corp., Hsinchu, Taiwan |
| | Wing-Kai Hon - National Tsing Hua Univ., Hsinchu, Taiwan |
| | Shih-Chieh Chang - National Tsing Hua Univ., Hsinchu, Taiwan |
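The spare TSV abstract above trades the number of spares against a yield constraint. A minimal sketch of why group size matters follows. It assumes independent TSV failures with a common failure probability, and that the single spare can replace any one failed TSV in its group. These are textbook assumptions for illustration, not the yield or timing model used in the paper, and the function names are hypothetical.

```python
def group_yield(group_size, fail_prob):
    """Probability that a group of `group_size` original TSVs plus one spare
    is functional, i.e. at most one of the group_size + 1 TSVs fails.
    Assumes independent failures with probability `fail_prob` each."""
    total = group_size + 1
    ok = 1.0 - fail_prob
    no_failure = ok ** total
    one_failure = total * fail_prob * ok ** (total - 1)
    return no_failure + one_failure


def chip_yield(group_sizes, fail_prob):
    """Yield of a set of independent groups: the product of group yields."""
    result = 1.0
    for size in group_sizes:
        result *= group_yield(size, fail_prob)
    return result


if __name__ == "__main__":
    p = 0.01  # hypothetical per-TSV failure probability
    # 16 signal TSVs covered by 4 spares versus 2 spares.
    print("four groups of 4:", chip_yield([4, 4, 4, 4], p))
    print("two groups of 8 :", chip_yield([8, 8], p))
```

Under these assumptions, larger groups need fewer spares but give lower yield, which is the tension that an assignment under a yield constraint has to resolve.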
Static Low Power Verification in Mixed-Signal SoC Designs
Complex signal paths traversing multiple power domains through both analog and digital circuits are common in SoC designs that deploy low-power design techniques, which makes full-chip power verification based on standard power formats quite challenging. This may result in hazards when the chip transitions into or out of different power states. This paper addresses that challenge by describing a static sign-off approach for low-power verification in mixed-signal SoC designs. The methodology was successfully deployed on a 32nm accelerated processing unit (APU) design, and a case study of that deployment is included in this paper.
| Speaker: | Shubhyant Chaturvedi - Advanced Micro Devices, Inc., Austin, TX |
| Authors: | Shubhyant Chaturvedi - Advanced Micro Devices, Inc., Austin, TX |
| | Pascal Bolzhauser - Concept Engineering GmbH, Freiburg, Germany |
| | Shruti Anand - |
Perspective: Practical Application of Thousands of Cores to EDA Problems
After decades of false starts, it appears that the end of serial performance gains has truly arrived, and there is now no alternative but to embrace parallelism. In this paper, we first examine prior work in the area, and make plain the reasons parallel computing has failed to take root. We then present a set of new problems in design automation that have previously been ignored; these problems present ideal cases for parallel computation, and hold the promise of further performance gains for both serial and parallel systems.
| Speaker: | Patrick H. Madden - SUNY Binghamton, Binghamton, NY |
| Authors: | Jason Gallia - SUNY Binghamton, Johnson City, NY |
| | Patrick H. Madden - SUNY Binghamton, Binghamton, NY |
Interchangeable SystemVerilog Random Constraints
SystemVerilog constraints are declarative in nature. To modify or override a constraint, explicit details about the test bench must be known, the new constraint correctly implemented, and simulation re-compiled. This approach is time- and knowledge-expensive. We propose a standard suite of SystemVerilog constraint building blocks that are lazily and dynamically instantiated during simulation. Coupled with a front-end parser and test bench-wide resource manager, random values incur fully interchangeable constraints specified as a string and provided on the command-line or via function call. This results in easily applied and manipulated constraints with no test inheritance or instrumentation requirement or even re-compilation.
| Speaker: | Jeremy Ridgeway - LSI Corp., Milpitas, CA |
| Author: | Jeremy Ridgeway - LSI Corp., Milpitas, CA |
Analytical Task Mapping of Scientific Applications on 3D Network Topologies
Supercomputers with ever-increasing computing power are being built for scientific applications. As the system size scales up, communication in the network becomes increasingly expensive due to the large distance between nodes and network contention, leading to a scaling bottleneck. Topology-aware task mapping, which maps parallel application tasks onto processors by considering the network topology and the communication pattern of the application, is an essential technique for communication optimization. In this paper, we propose to apply the analytical placement technique of VLSI physical design to task mapping on 3D topologies. The resultant analytical task mapping algorithm is highly efficient.
| Speaker: | Xuanxing Xiong - Illinois Institute of Technology, Chicago, IL |
| Authors: | Jingjin Wu - Illinois Institute of Technology, Chicago, IL |
| | Zhiling Lan - Illinois Institute of Technology, Chicago, IL |
| | Xuanxing Xiong - Illinois Institute of Technology, Chicago, IL |
| | Jia Wang - Illinois Institute of Technology, Chicago, IL |