GoodKook's Place: October 2023

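Thursday, October 05, 2023

[English Study] Trends in High-Level Synthesis Automation Tools

[Translator's note] This is a survey of the latest design techniques in semiconductor design. English study should not be a picky diet either, so an article at this level is worth reading as general knowledge. It runs to about 30 pages and the body is fairly specialised, so I will excerpt and read the abstract and the introduction. I will add my own notes to the technical terms that come up along the way.

[Source] IEEE Access, Vol. 8, 2020, https://ieeexplore.ieee.org/abstract/document/9195872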

---------------------------------------------------------------------------

Towards Automatic High-Level Code Deployment on Reconfigurable Platforms: A Survey of High-Level Synthesis Tools and Toolchains

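The title alone is long and hard to parse. The individual words are basic, but so much is omitted and compressed that, unless you know which journal published the paper, it is hard to guess what it is about. With Google Translate's help, the Korean came out like this.

"재구성 가능한 플랫폼의 자동 고급 코드 배포를 향하여: 고급 합성 도구 및 도구 체인에 대한 조사" (roughly, "Toward automatic advanced code deployment on reconfigurable platforms: a survey of advanced synthesis tools and tool chains")

Setting aside the quality of the translation for now, a few foreign words stand out. Does leaving 'platform' (플랫폼), 'code' (코드), and 'chain' (체인) transliterated mean they have become loanwords fully absorbed into Korean? If so, we should be able to use these loanwords appropriately. Given that this paper appeared in IEEE Access, a journal for electrical and electronics engineers hosted on IEEE Xplore:

'platform' refers to an FPGA (Field Programmable Gate Array), a device on which logic-circuit elements are laid out in advance, and
'code' refers to source code that describes an algorithm in a programming language at a high level of abstraction. So I rendered the title like this.

"On automated methods for implementing design code written at a high level of abstraction on a reconfigurable semiconductor substrate: a report on trends in high-level synthesis and the tools that accompany it"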

MOSTAFA W. NUMAN[1], BRADEN J. PHILLIPS[2], (Member, IEEE), GAVIN S. PUDDY[1], (Associate Member, IEEE), AND KATRINA FALKNER[1]
Corresponding author: Mostafa W. Numan (mostafa.numan@adelaide.edu.au)

[1] School of Computer Science, The University of Adelaide, Adelaide, SA 5005, Australia
[2] School of Electrical and Electronic Engineering, The University of Adelaide, Adelaide, SA 5005, Australia

This work was supported by the Maritime Division of the Defence Science and Technology Group, Australia.

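[Translation] This work was carried out with the support of the Maritime Division of the Defence Science and Technology Group, Australia.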

---------------------------------------------------------------------

ABSTRACT

[Translation] Abstract

Heterogeneous computing systems with tightly coupled processors and reconfigurable logic blocks provide great scope to improve software performance by executing each section of code on the processor or custom hardware accelerator that best matches its requirements and the system optimisation goals.

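[Translation] A heterogeneous computing system, in which general-purpose processors and reconfigurable logic blocks are tightly coupled, can deliver high execution performance by running each section of code on either the general-purpose processor or an accelerator built from dedicated logic, whichever suits it best, and by optimising at the system level.

heterogeneous computing system: a computing system that combines different kinds of processing units

processors: general-purpose processors (CPUs)
reconfigurable logic blocks: dedicated computing hardware (FPGAs)

'tightly coupled' emphasises the close coupling between the general-purpose processor and the reconfigurable dedicated hardware.
'Reconfigurable' is used here in the same sense as 'programmable'.
'improve software performance' is the goal of HLS.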

This article is motivated by the idea of a software tool that can automatically accomplish the task of deploying code, originally written for a conventional computer, to the processors and reconfigurable logic blocks in a heterogeneous system.

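[Translation] This article is motivated by the idea of a software tool that could automatically deploy code, originally written to run on a conventional computer, onto a heterogeneous system made up of general-purpose processors and reconfigurable logic blocks.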

We undertake an extensive survey of high-level synthesis tools to determine how close we are to this vision, and to identify any capability gaps.

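[Translation] We surveyed high-level synthesis tools in depth to find out how close they have come to this vision and how large the remaining gaps are.

high-level synthesis (HLS): synthesis that starts from a high level of abstraction
gaps: the capability gaps of the tools, which stem largely from the difference in abstraction level between the ways an algorithm can be described.

General-purpose programming languages have a high level of abstraction.
Dedicated hardware is described at RTL (Register-Transfer Level), a low level of abstraction.

RTL (Register-Transfer Level): a description that spells out details such as the number of clock cycles and the bit-width of each signal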
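To make the abstraction gap in the notes above concrete, here is a minimal sketch that contrasts a high-level sum with a register-transfer-style model of the same computation. The function name `accumulate_rtl` and the 8-bit width are assumptions chosen for illustration; they do not come from the paper, and real RTL would be written in an HDL rather than Python.

```python
def accumulate_rtl(samples, width=8):
    """Model a `width`-bit accumulator register that adds one sample per clock."""
    mask = (1 << width) - 1      # keeps only the low `width` bits, like a fixed-size register
    acc = 0                      # register value after reset
    clocks = 0                   # number of clock cycles consumed
    for s in samples:
        acc = (acc + s) & mask   # one register transfer per clock; overflow wraps around
        clocks += 1
    return acc, clocks


# High-level view: unbounded integers and no notion of time.
print(sum([100, 100, 100]))              # 300
# RTL-style view: an 8-bit register and three clock cycles.
print(accumulate_rtl([100, 100, 100]))   # (44, 3)
```

The high-level version never has to decide how wide the result is or how many cycles it takes. At RTL both decisions are explicit, and a wrong bit-width silently changes the answer.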

The survey is structured according to a new framework that clearly expresses the relationships between the many tools surveyed.

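[Translation] The survey is organised according to a new framework that makes the relationships among the many tools surveyed clear.

new framework: rather than ranking the tools or listing their pros and cons, the survey identifies the characteristics of each tool and organises the relationships among them.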

We find that none of the existing tools can deploy general high-level code without manual intervention.

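[Translation] In this survey we found that no existing tool can convert general high-level code into low-level RTL hardware without manual intervention.

Even HDLs (Hardware Description Languages), which have reached a mature stage, can only be synthesised when the code stays within a synthesizable subset. How much more so for C++, which sits at a higher level of abstraction.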

Logic synthesis from arbitrary high-level code remains an open problem with dynamic data structures, function pointers and recursion all presenting challenges.

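[Translation] Logic synthesis from arbitrary high-level code remains an unsolved problem; dynamic data structures, function pointers, and recursion all still pose challenges.

dynamic data structures: memory allocation, linked lists, stacks, and so on.
function pointers: the technique of assigning a function's address to a pointer variable and calling through it
recursion: a function calling itself, as in fractal algorithms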
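The sketch below illustrates why recursion and dynamic data structures are awkward for hardware, and the kind of manual restructuring that is commonly applied before synthesis. It is an illustration under assumptions, not a method taken from the paper: the array names, `MAX_NODES`, and both functions are hypothetical. Python is used only for brevity; the same rewrite applies to C or C++.

```python
MAX_NODES = 8   # fixed capacity, standing in for a statically sized on-chip memory

# Tree stored in fixed-size arrays instead of heap-allocated nodes; -1 means "no child".
VALUE = [5, 3, 8, 1, 4, 0, 0, 0]
LEFT = [1, 3, -1, -1, -1, -1, -1, -1]
RIGHT = [2, 4, -1, -1, -1, -1, -1, -1]


def tree_sum_recursive(node):
    """Natural software version: the call depth depends on the data."""
    if node == -1:
        return 0
    return VALUE[node] + tree_sum_recursive(LEFT[node]) + tree_sum_recursive(RIGHT[node])


def tree_sum_bounded(root):
    """Restructured version: explicit fixed-size stack and a loop with a known bound."""
    if root == -1:
        return 0
    stack = [-1] * MAX_NODES        # preallocated storage, never resized
    top = 0
    stack[top] = root
    top += 1
    total = 0
    for _ in range(MAX_NODES):      # each node is visited at most once
        if top == 0:
            break
        top -= 1
        node = stack[top]
        total += VALUE[node]
        for child in (LEFT[node], RIGHT[node]):
            if child != -1:
                stack[top] = child
                top += 1
    return total


print(tree_sum_recursive(0))  # 21
print(tree_sum_bounded(0))    # 21
```

The second version has a fixed memory footprint and a loop bound known before execution, which is what lets a tool size the hardware in advance. The first version offers neither guarantee.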

Other challenges include automating the tasks of code partitioning, optimisation and design space exploration.

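[Translation] Other difficult challenges include automating code partitioning, optimisation, and design space exploration.

Recall the many machine-code generation options of a C++ compiler:

optimising for code size or execution speed
inlining
calling conventions
data type conversion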
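Design space exploration, one of the challenges named in the abstract, can be pictured as a search over such options. The sketch below enumerates loop unroll factors and picks the cheapest one that meets a latency target. The cost model (latency = ceil(N / unroll), area = unroll) is a deliberately crude assumption made for this example, and the function `explore` is hypothetical; real tools estimate latency and area from the synthesised design.

```python
import math


def explore(n_iterations, unroll_factors, max_latency):
    """Return (unroll, latency, area) with the smallest area meeting max_latency, or None."""
    best = None
    for u in unroll_factors:
        latency = math.ceil(n_iterations / u)  # toy model: u iterations complete per cycle
        area = u                               # toy model: one functional unit per unrolled copy
        if latency <= max_latency and (best is None or area < best[2]):
            best = (u, latency, area)
    return best


print(explore(64, [1, 2, 4, 8, 16], max_latency=10))  # (8, 8, 8)
```

Even in this toy form the trade-off is visible: unrolling by 16 is faster still but costs twice the area, so the search settles on 8. With several interacting options the space grows quickly, which is why automating the search is hard.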

-------------------------------------------------------------------------

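Even the abstract alone raises a great many topics for discussion. The paper is somewhat long, but if you are interested in the latest semiconductor design techniques, I recommend reading it in full.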

SECTION I. Introduction

For the last four decades, Moore's law and Dennard scaling have relentlessly delivered improvements in computing performance [1]. Since the early 2000s their impact has begun to wane and alternative ways to improve performance have begun to emerge.

Heterogeneous computing is a promising approach in which a group of processing nodes execute a workload in parallel. Given different kinds of nodes including multi-core CPUs, real-time processors, DSPs, GPUs, and accelerators on FPGAs or ASICs, the computing workload can be partitioned such that each part is executed on a processor that is well-matched to its requirements and the performance optimisation goals.
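As a rough illustration of the partitioning idea in the paragraph above, the sketch below assigns each task to the node type with the lowest estimated runtime. The task names and all runtime figures are invented for this example, and the greedy rule ignores data-transfer cost and contention between tasks, both of which matter in a real system.

```python
# Hypothetical estimated runtimes in milliseconds for each task on each node type.
COST_MS = {
    "parse_input": {"cpu": 2.0, "gpu": 9.0, "fpga": 12.0},
    "fir_filter": {"cpu": 40.0, "gpu": 8.0, "fpga": 3.0},
    "matrix_mult": {"cpu": 90.0, "gpu": 6.0, "fpga": 15.0},
}


def partition(cost_table):
    """Greedy partition: map each task to the node with the smallest estimated runtime."""
    return {task: min(costs, key=costs.get) for task, costs in cost_table.items()}


print(partition(COST_MS))
# {'parse_input': 'cpu', 'fir_filter': 'fpga', 'matrix_mult': 'gpu'}
```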

This article is concerned with the engineering task of writing software for a heterogeneous system and considers how close existing tools and technologies are to a fully automatic system in which high-level source code is partitioned and deployed to heterogeneous nodes with a minimum of human intervention. This is an ambitious scope so we constrain ourselves, in this article, to the task of deploying source code blocks onto custom FPGA logic.

It is possible, of course, to write software specifically for a particular heterogeneous system by manually partitioning tasks among the processors, and using the most appropriate programming language for each of the different processors. For example a Hardware Description Language (HDL) such as Verilog could be used for tasks executing on an FPGA, and CUDA for those on a GPU. An alternative, which has seen a great deal of research activity in recent years, is to use High-Level Synthesis (HLS) for generating hardware modules from code written in a High-Level Language (HLL) (such as C, C++ or Python).

There are benefits of using HLS instead of HDL so that the entire application is in a high-level language: simulation speed is generally faster; debugging is less difficult; it is easier to explore and evaluate design alternatives; and the high-level language may include features that cannot be easily expressed in a HDL [2].

Although current HLS tools do not always produce performance-optimised implementations, applications without stringent performance requirements can be more quickly and easily developed using HLS. HLS software developers do NOT necessarily need to be FPGA or HDL experts, and optimisation opportunities can be exposed to the designer that cannot be easily explored via HDL approaches. In some cases, a project that would not have been practical in HDL, given its complexity, limited time frame and small development team, can be feasible in HLS at a low performance cost compared to an HDL-based approach [3]–[5].

An HLS-based design for a heterogeneous system could be started from scratch, or use pre-existing code originally written for a conventional CPU. Either way, to effectively use current HLS technology, system developers require considerable knowledge and experience in the application domain, computer programming, and HLS design flow.

Deploying pre-existing code written for a conventional CPU onto a heterogeneous system with the aim of improving performance or efficiency is even more difficult. The code needs to be substantially restructured to be synthesisable, and to produce optimised hardware. This needs to be done for all the code: not just the application source code but also any library functions it uses. To date Automatic Code Deployment (ACD) tools capable of performing this challenging task without human intervention have been the subject of limited research (e.g. [6], [7]) but a considerable amount of work has been done on automating some of the more challenging, tedious and time-consuming steps in this process (e.g. [8]–[11]).

This article surveys recent toolchains and workflows for high-level synthesis to FPGA with a focus on technologies that might eventually be used for automatic code deployment. Section I introduces HLS and the motivation for heterogeneous computing based on HLS. Section II categorises different contemporary approaches to deploy compute-intensive code segments to FPGA hardware accelerators. The categories introduced in Section II are used to organise a thorough survey, in Sections III and IV, of approaches that take a candidate function expressed in a HLL and produce low-level HDL suitable for FPGA deployment. The arguments in these sections focus on contemporary HLS tools currently used in academia or industry; legacy tools are included in summary tables for completeness. Specification of a hypothetical tool for ACD to FPGA, as well as a brief summary of progress reported in the literature towards making HLS-based FPGA code deployment less dependent on human judgement and proficiency, are provided in Section V.

SECTION II. High-Level Code Deployment Approaches

FIGURE 1. Design flows for high-level code deployment.

SECTION III. Behavioural Approach for FPGA Synthesis

FIGURE 2. A generic HLS design flow.

TABLE 1 Currently Available Commercial HLS Tools

TABLE 2 Currently Available Academic HLS Tools

----------------------------------------------------------------------

The original paper briefly explains how each HLS tool works, so do look it up. It is worth noting which techniques are being used.

SECTION VI. Conclusion

The motivation for this article was a vision of a software tool that could automatically deploy sections of code, originally written for a conventional CPU, to FPGA accelerators to achieve an implementation advantage, whether that be latency, throughput, energy or some other optimisation goal. How close are existing tools to delivering this vision, and where are the capability gaps?

To survey and evaluate existing tools we provided a classification of design flows in FIGURE 1. This classification has neatly expressed the relationships between many different hardware synthesis tools surveyed under three broad approaches: manual re-coding, behavioural synthesis, and dataflow synthesis, as well as variations of these.

Sections III and IV surveyed many commercial and research tools for code deployment, organised according to the framework in FIGURE 1. Wherever possible we have identified the pedigree of these tools and their relationship to other tools in the survey. For the more widely used or surveyed tools we have provided an overview of salient features and capabilities.

None of the existing tools are able to fulfil the vision of fully automatic deployment of general C/C++ code to a heterogeneous system of FPGAs and CPUs. Capability gaps include the generation of synthesisable HLS code from HLL code that uses pointers to pointers or functions, recursive functions, or dynamic memory allocations. Other challenges include efficient partitioning of the code, optimisation of generated hardware, and design space exploration. All of these are the subject of active research efforts as surveyed in Section V-B.

This work has also identified a trend that is somewhat orthogonal to the idea of automatic code deployment. There is currently significant work in approaches to express the application in a high level language that is more amenable for execution on diverse platforms. Examples include the growing proliferation of domain-specific languages or dataflow representations.

Wednesday, October 04, 2023

[Yangpyeong House] September 2023: Where Did Godeungeo Go?

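When I step out into the yard in the morning, I feel a chill. Autumn has arrived right on schedule. 'Godeungeo' (Mackerel), a cat born in the shed the year before last who spent last year and all of this summer in the yard, left home last month and there has been no word for a month now. We have two cats in the house, and Godeungeo is one of the five that lived in the yard and on the terrace. With the weather turning cold, I find myself worrying whether Godeungeo is getting regular meals somewhere. I wonder whether Kkori thinks about 'Godeungeo', who used to romp around with him. Leaving home only brings hardship...

Kkori and Kkobugi go in and out between the yard and the house. Sometimes they bring bugs in with them. It would be nice if they at least shook off their paws on the way in, but if they knew to do that, they would not be cats. Watching one of them quietly climb onto the sofa and fall asleep with its head on my arm, I find peace of mind.

In the vegetable patch the peppers, cucumbers, and eggplants have done their part, and the kabocha squash and millstone pumpkins remain to greet the autumn. I am waiting for these pumpkins to ripen so that I can make sikhye.

This year's sweet potato harvest was at least a little better than last year's. Sweet potatoes are said to do well in loess soil, so they will probably always struggle in my decomposed-granite patch. Even so, I will no doubt plant them again next year.

The kimchi cabbages in the patch are growing vigorously. I planted the seedlings about a week earlier than in previous years, and the difference in how they are growing shows. Out in the fields the rice ripens a little more each day. It seems everything in life has its season.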

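The semester has started and my lectures are scheduled in the afternoon, so I pack a boxed dinner. I make sandwiches with vegetables and potatoes from the yard patch. Do try growing arugula and chicory. It is a rediscovery of garden vegetables.

The autumn flower bed begins with gujeolcho, a wild Korean chrysanthemum. Soon the chrysanthemums will take over the whole yard.

I deadheaded the summer-flowering phlox and it is putting on another show, and the double asters are lovely.

The cypress vine and Japanese anemones are also brightening their corners.

The autumn sunsets are beautiful. As the night deepens, quite a few fireflies fly about. I suppose that is because the area around the house is not yet built up and remains unspoiled. I hope this environment lasts a long time.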

[English Study] An Insect-Sized Robot with an Internal Combustion Engine

Soft Robot Walks by Repeatedly Blowing Itself Up. Rapid explosive actuation powers this insect-scale robot’s jumps

EVAN ACKERMAN, 14 SEP. 2023, https://spectrum.ieee.org/explosive-robot-insect

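Electricity is the first thing that comes to mind as a power source for robots, but it cannot actually match chemical fuel. Batteries are quiet and pair well with all kinds of electronics, yet their energy density falls far short of chemical fuels. On top of that, there is a limit to how small they can be made.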

It’s hard to beat the energy density of chemical fuels. Batteries are quiet and clean and easy to integrate with electrically powered robots, but they’re 20 to 50 times less energy dense than a chemical fuel like methanol or butane. This is fine for most robots that can afford to just carry around a whole bunch of batteries, but as you start looking at robots that are insect-size or smaller, batteries simply don’t scale down very well. And it’s not just the batteries—electric actuators don’t scale down well either, especially if you’re looking for something that can generate a lot of power.

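The September 14 issue of Science introduced an insect-sized robot driven by internal combustion. [ Powerful, soft combustion actuators for insect-scale robots, https://www.science.org/doi/10.1126/science.adg5067 ]

A small combustion chamber is filled with gas (butane) and ignited; the explosion makes a highly elastic rubber membrane (elastomer) inflate in an instant, and that produces the force.

The engine weighs 325 milligrams and delivers about 9.5 newtons of force while firing 100 times per second. The elasticity of the rubber membrane is the key, and it reportedly withstands about 750,000 cycles.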
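A quick back-of-the-envelope check puts these figures in perspective. The sketch uses only the numbers quoted above plus standard gravity. It assumes the actuator fires continuously at 100 Hz until it reaches the 750,000-cycle figure, which the post does not state.

```python
G = 9.81                    # standard gravity in m/s^2
mass_kg = 325e-6            # 325 mg actuator
force_n = 9.5               # peak force from the post
rate_hz = 100               # explosions per second
lifetime_cycles = 750_000   # cycles the membrane reportedly withstands

weight_n = mass_kg * G                  # the actuator's own weight in newtons
force_to_weight = force_n / weight_n    # how many times its own weight it can push
runtime_s = lifetime_cycles / rate_hz   # assumed continuous operation at 100 Hz

print(round(force_to_weight))   # 2980
print(runtime_s / 60)           # 125.0
```

Under these assumptions the actuator pushes with nearly 3,000 times its own weight, and the membrane would last about two hours of continuous firing.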

"It took a lot of care, iterations, and intelligence to come up with this steerable, insect-scale robot," Shepherd told us. "Does it have to have legs? No. It could be a speedy slug, or a flapping bee. The amplitudes and frequencies possible with this system allow for all of these possibilities. In fact, the real issue we have is making things move slowly."

The key, they say, is controlling the explosion rate (slowing it down).

Getting these actuators to slow down a bit is one of the things that the researchers are looking at next. By trading speed for force, the idea is to make robots that can walk as well as run and jump. And of course finding a way to untether these systems is a natural next step. Some of the other stuff that they’re thinking about is pretty wild, as Shepherd tells us: "One idea we want to explore in the future is using aggregates of these small and powerful actuators as large, variable recruitment musculature in large robots. Putting thousands of these actuators in bundles over a rigid endoskeleton could allow for dexterous and fast land-based hybrid robots." Personally, I’m having trouble even picturing a robot like that, but that’s what’s exciting about it, right? A large robot with muscles powered by thousands of tiny explosions—wow.

They say they want to use these actuators as muscles that deliver bursts of force in large robots.