Seeking Alpha

Matthias Waldhauer

  • AMD Update: What's New, And What Is Not [View article]
    @RandSec: "Each Fiji has 4GB on-chip. AMD has shown the dual Fiji card which apparently is running the Quantum demo unit. So 4GB + 4GB is 8GB, and it is coming."

    In dual-GPU setups the separate memories usually hold redundant data, so in effect the amount of RAM doesn't add up (but the bandwidth does). But as I pointed out above, such fast RAM also helps reduce the performance-hitting effects of PCIe transfers. One could also say that there are diminishing returns from adding 2x or 4x as much RAM. It also costs power, BTW, which then can't be used for shaders (given a fixed PCB power budget).
    Jun 20, 2015, 12:03 PM
  • AMD Update: What's New, And What Is Not [View article]
    Fury X is roughly 10% faster on average than the 980 Ti. So at least it is not behind.
    Jun 19, 2015, 11:18 AM | 1 Like
  • AMD Update: What's New, And What Is Not [View article]
    There are graphics cards with 1, 1.5, 2, 3, 4 (3.5), 6 GB and so on (from Nvidia and AMD). I doubt that any game company would optimize for a specific GDDR amount just to fill it; that would only constrain the creativity of the level designers. Visualizing content is a very dynamic process and might change on a per-frame basis while turning the view, so the higher-res textures have to be updated dynamically. For that there is also a lot of advanced tech available, like compression, tiling, and prefetching. The remainder goes over the PCIe connection.

    Higher bandwidth also helps with these fetches: the memory blocks being worked on are read and written faster and thus occupy the buses for a shorter time, which allows for more background PCIe update traffic.
    Jun 19, 2015, 11:15 AM | 2 Likes
  • AMD Update: What's New, And What Is Not [View article]
    14nm
    Jun 19, 2015, 11:08 AM | 1 Like
  • AMD's Second Half Guidance Seems Unrealistic: Zen Will Not Topple Skylake [View article]
    Aricool,

    you might look for such heterogeneous computing support in visual programming tools. There are some efforts:
    http://bit.ly/1GknPdK
    http://bit.ly/1GknPdO

    Otherwise, part of the solution is to use good visual tools to analyze your parallel code, for example Nvidia Nsight or AMD's CodeXL.

    At work we also have Matlab Simulink developers and C/C++ developers. Both ways have their pros and cons.

    Also check
    http://bit.ly/1GknN5L
    http://bit.ly/1GknN5N
    The latter converted a Simulink model to parallel C code.
    Jun 3, 2015, 10:15 AM | 1 Like
  • AMD's Second Half Guidance Seems Unrealistic: Zen Will Not Topple Skylake [View article]
    I'm only trading this way. Automated trading based on parallelized ML techniques is one project of mine; following the tech in itself, just for the interesting facts, is another. No long-term holding with me, since markets are rigged. :)

    BTW, May 26th was the last day with significantly higher sell volume in AMD. Today (06/02) and on the 28th the buy volume is much stronger. Today due to the Excavator news:
    - double-digit IPC growth already before Zen,
    - higher frequency in the same power envelope,
    - but optimized for 15W, so no high-performance parts are to be expected
    Jun 2, 2015, 12:43 PM | 2 Likes
  • AMD's Second Half Guidance Seems Unrealistic: Zen Will Not Topple Skylake [View article]
    Python with PyOpenCL is good for a start, but you still have to write the OpenCL kernels in C, like everywhere else. Regardless of the available frameworks and languages, the most important thing is to understand, and to be able to imagine, how the parallel code processes the data, where things should be synchronized, and so on. This also helps in spotting potential bottlenecks.
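    To make that mental model concrete, here is a minimal pure-Python sketch (no OpenCL installation assumed; all names here are illustrative, not PyOpenCL API) of how a kernel is conceptually applied once per work item over a global index range:

```python
# Pure-Python sketch of the OpenCL execution model: a "kernel" is a
# plain function conceptually launched once per work item. In real
# OpenCL the kernel body is written in OpenCL C, and the loop below is
# replaced by the GPU scheduling work groups in parallel.

def vec_add_kernel(gid, a, b, out):
    # gid plays the role of get_global_id(0) in OpenCL C
    out[gid] = a[gid] + b[gid]

def launch(kernel, global_size, *args):
    # The runtime maps the kernel over the whole index space; on a GPU
    # these iterations run in parallel, not sequentially as here.
    for gid in range(global_size):
        kernel(gid, *args)

a = [1.0, 2.0, 3.0, 4.0]
b = [10.0, 20.0, 30.0, 40.0]
out = [0.0] * 4
launch(vec_add_kernel, 4, a, b, out)
print(out)  # [11.0, 22.0, 33.0, 44.0]
```

    Note there is no loop inside the kernel itself; it handles exactly one data element, which is the point of the mapping style.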
    Jun 2, 2015, 12:38 PM
  • AMD's Second Half Guidance Seems Unrealistic: Zen Will Not Topple Skylake [View article]
    @Nemesis,
    Nice story. MorphCore doesn't help single-core performance, which is still the main driver for many desktop tasks; it actually lowers it relative to the level without MorphCore. And it lowers per-thread MT performance (8 in-order threads sharing one core).

    Phi, Pascal, GCNx: parallelism everywhere. This is the future, but not for all available tasks.
    Jun 2, 2015, 12:32 PM
  • AMD's Second Half Guidance Seems Unrealistic: Zen Will Not Topple Skylake [View article]
    @gofx:

    Since I wasn't up to date regarding GF's processes, I did some research. So far I haven't found any directly comparable results for the 28SHP (Excavator) and GF28A (Steamroller) processes. I also found no comparisons of 28SHP to 20SHP, 28HPP, or 28SLP. This makes it a bit difficult, as there are still many variables, or degrees of freedom: frequency, power, leakage, area, and cost. 28SHP is likely tuned towards frequency and area (gate-first was said to save 10% here).

    So if 28SHP were better in frequency but worse in dynamic power and leakage compared to the other 28nm processes, the improvements would be lower for frequency but better regarding the power metrics, and maybe in the end also in performance per watt (compared to the given comparisons).
    May 28, 2015, 06:46 AM
  • AMD's Second Half Guidance Seems Unrealistic: Zen Will Not Topple Skylake [View article]
    @Aricool,
    of course, there is a lot of legacy code out there. It often covers stuff like GUI handling, data structure organization, software engineering requirements, and so on, and this often comes with not-so-nice inherent serial behaviour due to a lot of dependencies. But there is enough code that spends a lot of time in a couple of lines or calls into some library functions. These are the points where some parallelization already helps a lot, maybe even indirectly, by using a newer version of a library.

    By "OpenCL, CUDA = SIMD code" I mean that you have compute kernels which are applied to a lot of data. There is no kernel loop; it's more like a mapping. To exploit this, GPUs have a kind of SIMD unit (e.g. AMD GCN with 16 ALUs per SIMD unit). Such a unit takes 16 work items (single data) at once (a quarter of a 64-item wavefront) and processes them according to one instruction and an execution mask. It's simply a natural trade-off between ALU logic and control logic: a CPU core has a lot of control logic, making it more flexible for single data elements, but at a cost.

    This is also the reason why AVX doesn't increase IPC, but throughput. ;)
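    A toy sketch of that execution model (widths and names are illustrative; this is not a GCN simulator): one instruction is applied across all lanes at once, with an execution mask deciding which lanes take effect.

```python
# Toy model of a 16-wide SIMD unit executing one instruction under an
# execution mask. One instruction updates all active lanes at once;
# masked-off lanes simply keep their old values.

WIDTH = 16

def simd_add(dst, src, mask):
    # One "instruction": dst[i] += src[i] for every active lane i.
    return [d + s if m else d for d, s, m in zip(dst, src, mask)]

regs = list(range(WIDTH))                   # lane-private values 0..15
ones = [1] * WIDTH
mask = [i % 2 == 0 for i in range(WIDTH)]   # only even lanes active

regs = simd_add(regs, ones, mask)
print(regs[:4])  # [1, 1, 3, 3] -> lanes 0 and 2 updated, 1 and 3 not
```

    The mask is how GPUs handle branches: both sides of an `if` are executed, each with the complementary mask, which is why divergent branches cost throughput.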

    Some auto-parallelizing compilers I remember from SPEC submissions are/were the compilers from Sun, Intel, PathScale, and GCC with Graphite (I think). The first three already do a good job. But instead of adapting the hotspot loops to the compilers' loop pattern detectors, one might simply add OpenMP to the code for some easily implemented multithreading. Of course, writing parallel code for CPUs, GPUs, etc. requires a good understanding of what's happening there.
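    In C the OpenMP route is often just a `#pragma omp parallel for` above the hot loop. A rough standard-library Python analogue of the same move (a sketch with a made-up loop body, not the OpenMP API): hand the embarrassingly parallel hotspot loop to a pool's `map`.

```python
# Rough analogue of putting "#pragma omp parallel for" on a hotspot
# loop: the embarrassingly parallel loop is handed to a thread pool.
# (Sketch only; for CPU-bound pure-Python work a process pool would be
# the better fit because of the GIL.)
from concurrent.futures import ThreadPoolExecutor

def expensive(x):
    # stand-in for the loop body / library call in the hot loop
    return x * x

data = list(range(8))

# serial version: results = [expensive(x) for x in data]
with ThreadPoolExecutor(max_workers=4) as pool:
    results = list(pool.map(expensive, data))

print(results)  # [0, 1, 4, 9, 16, 25, 36, 49]
```

    The appeal in both cases is the same: the loop body stays untouched, only the scheduling around it changes.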

    Finally, besides OpenCL, CUDA, and HSA, there are frameworks like AMP, Bolt, and OpenACC, which should also help adoption. As long as single-core performance continues to improve this slowly, more developers will turn to better parallelization. In the end it's also a matter of power efficiency; some compilers can already generate more power-efficient code.
    May 27, 2015, 01:16 PM
  • AMD's Second Half Guidance Seems Unrealistic: Zen Will Not Topple Skylake [View article]
    @Fiberton:
    It's not 130nm. But technology and tools improve over time too, so if you work with Cadence, Synopsys, or others, apply the given rules, and get support from some resident engineers, it'll work in the end.

    I'm in the ADAS and autonomous driving R&D business and it's not easy either. But in the end it works.

    So does GP on GPGPU work for trading-system generation etc. There's a lot of old "knowledge" out there which still seems to work for many (due to our natural and/or disturbed psychological conditions), but it doesn't stand up to hard analysis.
    May 26, 2015, 02:32 AM | 1 Like
  • AMD's Second Half Guidance Seems Unrealistic: Zen Will Not Topple Skylake [View article]
    @Aricool,

    there is the simple problem that a lot of the code out there can't be parallelized that easily, so it wouldn't be useful to force this programming paradigm onto everything. :)

    BTW, the OpenCL, CUDA, etc. code is actually SIMD code: single instruction, multiple data. So it increases throughput drastically while still keeping IPC low. It's just a matter of definitions.

    The parallel code train is on its way and compilers (already capable of autoparallelization) might also help further in this regard.

    High throughput also doesn't mean responsive software. On a Hyper-Threading core running 2 threads (1 GUI, 1 background), each thread's response performance is actually only about 60-70% of the same core without Hyper-Threading or SMT.
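    That 60-70% figure follows from typical SMT scaling: if two threads together achieve roughly 1.2-1.4x the throughput of one thread on the core, each thread runs at about 60-70% of its solo speed. A back-of-the-envelope check (the scaling factors are assumed typical values, not measurements):

```python
# Back-of-envelope: split the SMT aggregate speedup across the threads
# sharing the core to get each thread's fraction of its solo speed.
def per_thread_fraction(smt_aggregate_speedup, threads=2):
    # Assumes both threads get an equal share of the aggregate throughput.
    return smt_aggregate_speedup / threads

low = per_thread_fraction(1.2)   # 1.2x aggregate -> 60% of solo speed
high = per_thread_fraction(1.4)  # 1.4x aggregate -> 70% of solo speed
print(low, high)  # 0.6 0.7
```

    So the GUI thread's latency-relevant speed drops even though the core's total throughput goes up.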
    May 24, 2015, 12:32 PM
  • AMD's Second Half Guidance Seems Unrealistic: Zen Will Not Topple Skylake [View article]
    @whiteknave,

    there is no such direct relation between draw calls and fps. It depends on the engine, the scene's attributes, the average number of draw calls, the related API overhead, the CPU-side code scheduling and thread synchronization, etc.

    One example: if the slow DX11 draw calls of many objects in a scene take up 25% of total frame time, then a tenfold draw-call speedup (reducing frame time to 77.5%) results in ~29% higher fps. Somewhat simpler 3D engine code might also help to gain a few additional percent of performance.
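    That example is an Amdahl's-law-style calculation; as a quick sketch:

```python
# Worked version of the draw-call example above (Amdahl-style): speed
# up only the fraction of frame time spent in draw calls and see what
# happens to fps.
def fps_gain(drawcall_fraction, speedup):
    # New frame time relative to old: untouched part + sped-up part.
    new_frametime = (1 - drawcall_fraction) + drawcall_fraction / speedup
    return 1 / new_frametime - 1  # relative fps increase

gain = fps_gain(0.25, 10)
print(round(gain * 100, 1))  # 29.0 -> ~29% higher fps
```

    The untouched 75% of the frame caps the benefit: even an infinite draw-call speedup would only yield 1/0.75 - 1 ≈ 33% more fps.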
    May 24, 2015, 12:25 PM
  • AMD's Second Half Guidance Seems Unrealistic: Zen Will Not Topple Skylake [View article]
    @Aricool,

    there are risks everywhere, but in the case of the process they lie elsewhere (GF, Samsung), and in the case of the microarchitecture they can be reduced by reusing many existing components (as Keller said), by exact simulations, and by using new components or modifications only if they've been thoroughly researched, like ASF (proposed long before Intel came out with TSX), which is still MIA.
    May 20, 2015, 11:19 AM
  • AMD's Second Half Guidance Seems Unrealistic: Zen Will Not Topple Skylake [View article]
    @gofx,

    as I remember, Glew was happy to see his baby come to life in some way, but would have wished for more credit where credit was due.

    The speculations about his hidden agenda also came up back then already. ;)
    May 20, 2015, 11:16 AM | 1 Like
COMMENTS STATS
40 Comments
18 Likes