Let’s go for my web review for the week 2024-23.

Alice in Wonderland: Simple Tasks Showing Complete Reasoning Breakdown in State-Of-the-Art Large Language Models

Tags: tech, ai, gpt, machine-learning, safety, research

Another cruel reminder that basic reasoning is not to be expected from LLMs. Here is a quote from the conclusion of the paper which makes it clear:

“We think that observations made in our study should serve as strong reminder that current SOTA LLMs are not capable of sound, consistent reasoning, as shown here by their breakdown on even such a simple task as the presented AIW problem, and enabling such reasoning is still subject of basic research. This should be also a strong warning against overblown claims for such models beyond being basic research artifacts to serve as problem solvers in various real world settings, which are often made by different commercial entities in attempt to position their models as a strong mature product for end-users. […] Observed breakdown of basic reasoning capabilities, coupled with such public claims (which are also based on standardized benchmarks), present an inherent safety problem. Models with insufficient basic reasoning are inherently unsafe, as they will produce wrong decisions in various important scenarios that do require intact reasoning.”

A critique of the ‘No AI’ Instagram and Artstation copycat child. - David Revoy

Tags: tech, social-media, art, criticism

Interesting critique of this new platform… it’s the beginning of the hype cycle but it will probably exhibit the same decay phenomenon as other platforms.

Why are vulnerabilities out of control in 2024? – Open Source Security

Tags: tech, foss, security, data, data-science

The more releases are out there, the more vulnerabilities are (and could be) discovered. Some actions are necessary to get things under control properly.

Engineering for Slow Internet

Tags: tech, networking, reliability

A good reminder of everything that might go wrong when connectivity is bad. Most tools let you down in such a case.

Why do CPUs have multiple cache levels? | The ryg blog

Tags: tech, hardware, cpu

Very nice explanation and metaphors on how CPU cache levels work.

BenchExec: A Framework for Reliable Benchmarking and Resource Measurement

Tags: tech, benchmarking, tools

Looks like an interesting benchmarking tool. One to keep an eye on.

TIL #099 – order values of dictionary by iterable of keys with operator.itemgetter | mathspp

Tags: tech, programming, python

Definitely a nice Python trick. Fairly elegant, I’ll try to remember it.
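
For reference, here is a minimal sketch of the trick as I understand it from the title. The dictionary and key names are made up for illustration. Note that itemgetter returns a bare value instead of a tuple when given a single key.

```python
from operator import itemgetter

# Hypothetical data, only here to illustrate the idea.
prices = {"apple": 1.2, "banana": 0.5, "cherry": 3.0}
order = ["cherry", "apple", "banana"]

# itemgetter(*order) builds a callable that fetches each key in turn,
# so the values come back as a tuple in the order of `order`.
ordered_values = itemgetter(*order)(prices)
print(ordered_values)
```

It is equivalent to tuple(prices[k] for k in order), just more compact.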

Message authentication codes for safer distributed transactions

Tags: tech, filesystem, distributed, safety, cryptography

Interesting use of cryptography without a security concern. It’s more about safety and ensuring something wasn’t missed by mistake.
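
To make the idea concrete, here is a rough sketch of how a MAC can act as proof that a required step really ran. This is not the article's code: the key, the step name and the function names are all assumptions of mine, using only hmac and hashlib from the Python standard library.

```python
import hashlib
import hmac

# Hypothetical shared key; a real system would load it from configuration.
SECRET_KEY = b"example-key-not-for-production"

def issue_token(step: str, resource_id: str) -> str:
    """Return a MAC attesting that `step` ran for `resource_id`."""
    message = f"{step}:{resource_id}".encode()
    return hmac.new(SECRET_KEY, message, hashlib.sha256).hexdigest()

def verify_token(step: str, resource_id: str, token: str) -> bool:
    """Check that `token` was issued for exactly this step and resource."""
    expected = issue_token(step, resource_id)
    # compare_digest performs a constant-time comparison.
    return hmac.compare_digest(expected, token)

# The precheck hands out a token; the later step refuses to run without it.
token = issue_token("precheck", "file-42")
print(verify_token("precheck", "file-42", token))
print(verify_token("precheck", "file-43", token))
```

The point is that a caller cannot accidentally skip the precheck, since it has no way to produce a valid token for the later step on its own.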

The state of Vulkan apps in 2024

Tags: tech, 3d, vulkan, portability

The difficult path for Vulkan. The data is obviously biased since it includes games, and most of them still target Windows and therefore DirectX. I’d be curious to see something similar excluding games (and so focusing on medical, industrial, etc.).

How I learned Vulkan and wrote a small game engine with it

Tags: tech, 3d, game, vulkan

Interesting dive into the experience of writing a small Vulkan engine (almost) from scratch.

How to Build Engineering Strategy

Tags: tech, management, strategy, vision

Packed with useful information. Clearly some things I’m eager to test in there.

xkcd: Earth Temperature Timeline

Tags: science, history, data-visualization

So yes, the climate changed before… now slowly scroll to the end to appreciate how brutal it is this time.

Bye for now!