Symbol Crash

Interview with Lei

This entry is part [part not set] of 24 in the series Hack the Planet

In this episode of the Hack the Planet Podcast:

We talk with Lei, long-time Defcon goon and founder of Disconnect Camp, about how to recover from infosec burnout, the origin story of Disconnect Camp, some war stories from his tenure as a Defcon goon, and how to keep your cool in a pandemic when you’ve already been dealing with burnout for years.

Lei’s links:
Disconnect Camp: https://disconnect.camp/
Twitter: https://twitter.com/disconnectcamp

Frustration-Aggression Hypothesis: https://en.wikipedia.org/wiki/Frustration%E2%80%93aggression_hypothesis

Be a guest on the show! We want your hacker rants! Give us a call on the Hacker Helpline: PSTN 206-486-NARC (6272) and leave a message, or send an audio email to podcast@symbolcrash.com.

Original music produced by Symbol Crash. Warning: Some explicit language and adult themes.

Disable Unmuted Autoplay in Chrome version 62 and above

Does it seem like Chrome used to do a better job at NOT automatically playing video? Having problems with unmuted video or audio automatically playing when you visit certain sites? Have you gone looking for the old autoplay settings only to discover they’re not in Chrome at all anymore?

You are not alone. In this post, we will first tell you how to fix it, then if you’re interested, keep reading for details about the changes made in Chrome, why they suck, and how we figured out how to disable them. That way, when they change things again, you’ll be able to work out how to handle it.

The Solution:

For Windows, right-click on the icon you click on to start Chrome. If this is on the taskbar on Windows 10, right-click on the taskbar icon, then move up to Google Chrome and right-click on that as well. Then click on Properties, which will open the Google Chrome Properties dialog.

In the Target field, you should see this:

"c:\Program Files\Google\Chrome\Application\chrome.exe"

Click to edit the Target field, Ctrl-A to select everything, hit Backspace to delete it all, and paste the following:

"c:\Program Files\Google\Chrome\Application\chrome.exe" --disable-features=PreloadMediaEngagementData,MediaEngagementBypassAutoplayPolicies,RecordMediaEngagementScores,RecordWebAudioEngagement

Click Apply, Click OK, and you’re done.

For Linux/OSX, the solution is the same. Find the icon you’re clicking on to start Chrome, edit the properties, and add the same command line flags after the location of the Chrome binary:

--disable-features=PreloadMediaEngagementData,MediaEngagementBypassAutoplayPolicies,RecordMediaEngagementScores,RecordWebAudioEngagement

That’s it. Restart Chrome and you should have the old behavior back. Videos might still autoplay on some sites, but they should always be muted until you click on them.

You can make sure that it’s off by checking that the chrome://media-engagement/ link stops working! Without this fix, that link will show your current Media Engagement settings and what data has been logged.

What changed? Why does this fix work?

Back in version 62 of Chrome, they added a feature they called Media Engagement Index (MEI), which keeps a log of how many times you actually click on video and audio on various sites. Once you’ve actually clicked on a video on a site a certain number of times, it AUTOMATICALLY DISABLES AUTOPLAY PROTECTIONS for that site. What’s worse than that, they preload a list of sites that get a free bypass of autoplay protections, which includes many porn sites.

Deciding that they did such a good job with this feature, they then proceeded to remove the autoplay settings from the interface in the browser.

Kind of shitty behavior. I guess they never figured that people might want autoplay disabled all the time, even on sites they use frequently or even on the magical list of sites that Google decided get a free pass. Maybe they were just trying to get more people to accidentally blast the audio from porn sites? Otherwise I’m not sure why anyone thought this was a good idea.

Fortunately, you can still disable these features from the command line using the --disable-features flag.

Our recommended fix disables four features, which restore the old autoplay behavior, disable the preloaded bypass list, and completely disable the extra tracking of your media consumption:

PreloadMediaEngagementData – Disabling this feature will disable the list of sites that Google has pre-determined should be able to bypass autoplay protections.

MediaEngagementBypassAutoplayPolicies – Disabling this feature disallows sites that you use regularly to bypass autoplay protections.

RecordMediaEngagementScores – Disabling this feature turns off the Media Engagement tracking altogether.

RecordWebAudioEngagement – Disabling this feature turns off the Media Engagement tracking for web audio.

Try enabling and disabling those features individually if you want to further tune this behavior.
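If you want to experiment with those features one at a time, a small script can save some copy-paste errors. The following is a minimal Python sketch, not anything Chrome ships: the four feature names and the Windows path come from this post, while the helper names `chrome_command` and `launch` are made up for illustration. It only builds and prints the command; `launch` is there if you prefer starting Chrome from a script instead of a shortcut.

```python
import subprocess

# Feature names taken from the fix described in this post.
MEI_FEATURES = [
    "PreloadMediaEngagementData",
    "MediaEngagementBypassAutoplayPolicies",
    "RecordMediaEngagementScores",
    "RecordWebAudioEngagement",
]

# Path from the Windows example in this post; adjust for your install.
CHROME = r"c:\Program Files\Google\Chrome\Application\chrome.exe"


def chrome_command(chrome_path, disabled_features):
    """Return the argv list that starts Chrome with the given features disabled."""
    if not disabled_features:
        return [chrome_path]
    return [chrome_path, "--disable-features=" + ",".join(disabled_features)]


def launch(disabled_features=MEI_FEATURES, chrome_path=CHROME):
    """Hypothetical helper: start Chrome with the flag (quit Chrome completely first)."""
    return subprocess.Popen(chrome_command(chrome_path, disabled_features))


# Show the command with only the first two features disabled.
print(chrome_command(CHROME, MEI_FEATURES[:2]))
```

Passing a shorter list to `chrome_command` is the scripted equivalent of deleting names from the comma-separated flag by hand.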

Don’t Take Our Word For It – Look at the Code!

You can search the Chromium source code here: https://source.chromium.org/chromium

This can show you all the other features you might want to disable or enable from the command line. For example, searching for one of our flags, PreloadMediaEngagementData, brings us to a file called media_switches.cc in the Chrome source. This is how we found the flags to disable the whole MEI system, and there are many other feature flags in there you might want to play with.

You can also use the Chromium code search to find out how these feature flags are actually used. Searching again for our flag, we can also see the file media_engagement_contents_observer.cc, which has all of the logic for the MEI features and exactly how and when these flags are used!

If things change in the future, check back on these two files to see if they’ve added more features or logic you need to disable.

What is up with the Preload List? Porn gets to bypass autoplay? Really?

From the Chromium code search, we searched for PreloadMediaEngagementData and found where it loads the list of sites that get to bypass autoplay. It’s coming from a protobuf file called preloaded_data.pb which you can find in your Chrome application folder. On our test machine (version 88), this was at:

C:\Program Files\Google\Chrome\Application\88.0.4324.104\MEIPreload\preloaded_data.pb

Protobuf is a binary data encoding from Google, so you can’t just read it. Being lazy, we just searched Github for preloaded_data.pb and found this nice Python script, courtesy of NeatMonster, to decode this file to plain text (mirror).

Included in that gist is the preloaded list of sites that can bypass autoplay, and you can see sites like pornhub and xhamster in there, among a bunch of other questionable sites for this privilege.

But again, don’t take our word for it, you can run this yourself. Copy your preloaded_data.pb file out of the Chrome folder and into a temporary folder (or Downloads, etc.), save the unpack_dafsa.py file to the same folder, and run it from the command line (requires Python):

python unpack_dafsa.py

That will spit out the current contents of the autoplay bypass list for your installed version of Chrome.
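Because the version number is part of the path, the file moves every time Chrome updates. As a rough sketch of the "copy it out of the Chrome folder" step, the Python below globs for the file under every version directory and copies one match to a folder of your choice. The application folder is the one from our test machine; the function names are our own, and picking the "last" match by plain string sort is a simplification, not a real version comparison.

```python
import glob
import os
import shutil

# Application folder from the Windows example in this post.
APP_DIR = r"C:\Program Files\Google\Chrome\Application"


def find_preload_files(app_dir=APP_DIR):
    """Return every <version>/MEIPreload/preloaded_data.pb under the Chrome folder."""
    pattern = os.path.join(app_dir, "*", "MEIPreload", "preloaded_data.pb")
    return sorted(glob.glob(pattern))


def copy_latest(dest_dir, app_dir=APP_DIR):
    """Copy the last match (by path sort order) into dest_dir.

    Returns the path of the copy, or None if no preload file was found.
    """
    matches = find_preload_files(app_dir)
    if not matches:
        return None
    os.makedirs(dest_dir, exist_ok=True)
    return shutil.copy(matches[-1], dest_dir)


# List whatever is installed on this machine (empty list if Chrome is elsewhere).
print(find_preload_files())
```

After `copy_latest("some_temp_folder")`, drop unpack_dafsa.py into the same folder and run it as shown above.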

Not exactly a list of sites you want to have just blast audio without your explicit permission, is it?

Interview with Vi Grey

In this episode of the Hack the Planet Podcast:

We meet with Vi Grey, who answers all the questions we’ve had about the Nintendo Entertainment System since we were kids but were too afraid to ask. A prolific developer of homebrew NES ROMs, Vi Grey helps us understand the present and future of innovation on the NES platform. We also discuss his work with polyglot files featured in PoC||GTFO. This episode itself is in fact a polyglot; check the mp3 metadata of the file on the RSS feed for more information.

Vi Grey’s links:
I Dream of Game Genies (HOPE 2018 talk): https://www.youtube.com/watch?v=0rcKWQVMQ5w
Twitch Stream: https://www.twitch.tv/ViGreyTech
More at https://vigrey.com/

NESmaker: https://www.thenew8bitheroes.com/
Brad Smith on Light Guns on modern TVs: https://www.youtube.com/watch?v=qCZ-Z-OZFUs
Damien Yerrick (more homebrew tools): https://pineight.com/
Tom7 (more NES hacks): http://tom7.org/

CypherCon: https://cyphercon.com/

Swarm Intelligence with Pongolyn

In this episode of the Hack the Planet Podcast:

We have a chat with Pongolyn, a community organizer and strategist for the Pacific Northwest Enlightened, one of the largest teams in the augmented reality game Ingress. We discuss the key elements needed to develop swarm intelligence and how they were applied to continent-spanning efforts.

Pongo has spent years deconstructing her experience into a valuable set of strategies for anyone organizing large numbers of volunteers, and expertly up-levelling them into easily digestible lessons on swarm-based strategies, gamification, and game theory for people that never played Ingress.

If you’ve ever had to organize a protest or a podcast, this episode is for you!

Pongolyn’s talks:
BSides Portland 2019 – https://www.youtube.com/watch?v=Eq33S_Rz4qo
Toorcamp 2018 – https://www.youtube.com/watch?v=UfYg3EVn_Jg
Defcon 26 – https://www.youtube.com/watch?v=bPTymsk1I_E

SwarmWise – The Tactical Manual to Changing the World by Rick Falkvinge
https://docs.google.com/file/d/0Bz8cVS8LoO7OOHhJUUF5akJ4RHc

Hannah Fry TED Talk – Is life really that complex?
https://www.ted.com/talks/hannah_fry_is_life_really_that_complex

Screeps – https://screeps.com/

Threat Modeling: None of Your Security Tools Help me Get More Money for my Security Program

In this episode of the Hack the Planet Podcast:

For too long, the confusion caused by the Adam Shostack/MS threat modeling “methodology” has prevented security teams from doing any productive risk analysis. That ends now. We clear up the confusion around what a threat model is, what it’s for, how best to go about developing one, what is so very very wrong with the Adam Shostack/MS method of threat modeling, and how to achieve better results with less effort and arguing.

Check out the links for useful templates and examples. And remember: a dataflow diagram is an important piece of design documentation, but it is not and can never be an effective threat model.

Threat Modeling Template Examples from SymbolCrash, adjust these to suit!

Simple Threat Model Example:
https://www.symbolcrash.com/wp-content/uploads/2020/10/Threat-Model-Template-Simple.xlsx

CVSS 3.1 Auto-calculating Model with Automatic Coloring by Severity:
https://www.symbolcrash.com/wp-content/uploads/2020/10/Threat-Model-Template-CVSS-3.1.xlsx

“How to measure anything in cybersecurity risk”
https://www.howtomeasureanything.com/cybersecurity/

CVSS 3.1 Calculator at first.org
https://www.first.org/cvss/calculator/3.1

Automated Secrets Detection:
https://github.com/Yelp/detect-secrets
https://github.com/anshumanbh/git-all-secrets
https://github.com/dxa4481/truffleHog

Old-School SANS Threat Modeling Template Example:
https://www.sans.org/blog/practical-risk-analysis-and-threat-modeling-spreadsheet/

Mentioned Tools:
https://github.com/lyft/cartography
https://github.com/nccgroup/ScoutSuite

C4 model:
https://c4model.com/

What is the Actual Financial Impact of a Breach?
https://www.nber.org/digest/jun18/economic-and-financial-consequences-corporate-cyberattacks
https://www.nber.org/papers/w24409

Threat Modeling Tools that uselessly force everything into a DFD (not recommended):
ThreatModeler – https://threatmodeler.com/
Irius Risk – https://iriusrisk.com/
OWASP ThreatDragon – https://owasp.org/www-project-threat-dragon/
MS Threat Modeling Tool – https://www.microsoft.com/en-us/download/details.aspx?id=49168

Golang Offensive Tools with C-Sto and capnspacehook

In this episode of the Hack the Planet Podcast:

We talk with some of the most prolific developers of Golang offensive tools, from opposite points on the globe, about why they use Go, what they’ve been working on, how to work around some of Go’s challenges for red teams, and where things are going in the near future with Go malware. Featuring C-Sto (bananaphone/goWMIexec) and capnspacehook (pandorasbox/garble).

List of Golang Security Tools:
https://github.com/Binject/awesome-go-security

C-Sto:
https://github.com/c-sto/goWMIExec
https://github.com/C-Sto/BananaPhone
https://github.com/C-Sto/gosecretsdump

capnspacehook:
https://github.com/capnspacehook/pandorasbox
https://github.com/capnspacehook/taskmaster

Misc:
https://github.com/moonD4rk/HackBrowserData
https://github.com/emperorcow/go-netscan
https://github.com/CUCyber/ja3transport
https://github.com/EgeBalci/sgn
https://github.com/sassoftware/relic
https://github.com/swarley7/padoracle
https://github.com/gen0cide/gscript

Command and Control:
https://github.com/BishopFox/sliver
https://github.com/DeimosC2/DeimosC2
https://github.com/t94j0/satellite

Obfuscation/RE:
https://github.com/goretk/redress
https://github.com/unixpickle/gobfuscate
https://github.com/mvdan/garble

Of interest, but breaks Docker & Terraform:
https://github.com/unsecureio/gokiller

Interview with Josh Pitts

In this episode of the Hack the Planet Podcast:

We talk with Josh Pitts, creator of The Backdoor Factory, ebowla, and SigThief, about the backstory of some of these tools and the offensive open-source tools debate. Featuring Vyrus and fast Dan.

Pitts Links:
https://github.com/sponsors/secretsquirrel
https://github.com/secretsquirrel/the-backdoor-factory
https://github.com/Genetic-Malware/Ebowla
https://github.com/secretsquirrel/SigThief
https://sec.okta.com/articles/2018/06/issues-around-third-party-apple-code-signing-checks
https://github.com/golang/go/issues/16292

Golang rewrite:
https://binject.github.io/backdoorfactory
https://github.com/Binject/debug

BananaPhone / Hell’s Gate:
https://github.com/C-Sto/BananaPhone

More Code Signature Bypasses:
https://www.securityinbits.com/malware-analysis/interesting-tactic-by-ratty-adwind-distribution-of-jar-appended-to-signed-msi/
dylib TOCTOU: http://powerofcommunity.net/poc2015/pangu.pdf
linux by design: https://bugs.launchpad.net/ubuntu/+source/linux/+bug/1883949

Copy-Paste Compromises: https://www.cyber.gov.au/sites/default/files/2020-06/ACSC-Advisory-2020-008-Copy-Paste-Compromises.pdf

Other:
https://github.com/vyrus001/go-mimikatz

Using Binject

Binject is a sweet multipart library, made up of several tools for code-caving and backdooring binaries via golang. The project was originally inspired as a rewrite of the backdoor factory in go, and now that it’s functional, this post will show you how to use it.

In this post we are going to explore how you can use the library operationally for a number of tasks. We will start with an example of using some of the command line tools included with the project for the arbitrary backdooring of files. Next we will look at using the library to backdoor a file programmatically. Finally we will use the bdf caplet with bettercap to backdoor some binaries being transmitted on the network, on-the-fly.

I want to give a shout out to the homie Vyrus, as a lot of this was inspired by him but in non-public projects, so I can’t link to his stuff. I also want to give a shout out to Awgh, as he’s been an awesome mentor and powerhouse in implementing a lot of the Binject features.

Below you can see the binjection command line tool being used to backdoor an arbitrary Windows PE, on Linux. In the next section we will explore some of the command line features of Binject.

Using the command line tools included with Binject is pretty straightforward; the main library Binject/binjection contains a command line interface tool that exposes all of the existing functionality for backdooring files on macOS, Windows, and Linux. Above we can see go-donut being used to turn a gscript program into position independent shellcode, then we use the binjection command line tool to backdoor a Windows PE (a .exe file), all on a Linux OS.

The binjection cli tool takes 3 main command line flags: “-f” to specify the target file to backdoor, “-s” to specify a file containing your shellcode in a raw bytecode format, and “-o” to specify where to write your new backdoored file. Optionally you can give a “-l” to write the output to a logfile instead of standard out. You can also specify the injection method to use, although the tool currently supports only a very limited and mostly default set. The binjection cli tool will automatically detect the executable type and backdoor it accordingly.

Another library and command line tool included with the framework is Binject/go-donut, which is essentially just a port of TheWover/donut. We can see this being used above to prepare another program to be embedded in our target executable. I really like both of these command line tools because it’s easy to cross compile them for linux or macOS, giving me a really convenient way to generate my target shellcode regardless of what OS I’m operating from. Having the entire tool chain in go allows me to easily move my tools to whatever operating system I need or use them all together in the same codebase. Even if you’re not familiar with go, you can just as easily compile the cli tools and script them together with something like bash or powershell.

Below we can see the binjection cli tool being used to backdoor ELF executables on Linux.
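The flags described above are easy to drive from a script. Here is a minimal Python sketch that builds the argument list for the cli tool; it assumes the compiled tool is on your PATH under the name `binjection`, and the function names and file names are placeholders of ours, not part of the project. Only the “-f”, “-s”, “-o”, and “-l” flags mentioned in this post are used.

```python
import subprocess


def binjection_command(target, shellcode, output, logfile=None, binary="binjection"):
    """Build the argv for the binjection cli using the flags described in this post."""
    cmd = [binary, "-f", target, "-s", shellcode, "-o", output]
    if logfile is not None:
        cmd += ["-l", logfile]   # optional: log to a file instead of standard out
    return cmd


def run_binjection(target, shellcode, output, logfile=None):
    """Run the tool; raises CalledProcessError if it exits with a non-zero status."""
    subprocess.run(binjection_command(target, shellcode, output, logfile), check=True)


# Placeholder file names, printed rather than executed.
print(binjection_command("target.exe", "shellcode.bin", "output.exe"))
```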

Using binjection programmatically as a go library is also super simple and arguably far more useful because you can now integrate it into so many more projects. The library calls are just as straightforward, basically a single function call depending on the binary type you’re backdooring. Here we can see it as a standalone example for others to use. We can also see it being implemented here for Windows in Sliver, a golang based c2 framework with tons of features. We can also use binjection in gscript, although it requires this embarrassingly small shim interface. This is insanely powerful functionality to be able to ship in an implant binary, as the implant can now backdoor legitimate binaries that are already persisted on the target system.

You can even break down the supporting libraries and use other parts of Binject, like Binject/debug, as a triage tool, which we demonstrate with bintriage.

Finally, to bring the project full circle, Binject has been integrated with bettercap for the on-the-fly backdooring of files on the network. It currently accomplishes this using bettercap’s ARP spoofing module, the network proxy module, and a helper tool to manage the file queue, making the whole process really clean. Using the integration is easy with the Binject/backdoorfactory helper tool. Simply follow these usage instructions, which just involve installing all of the necessary prerequisite tools, and then Binject/backdoorfactory will spit out the caplet and command you need for bettercap. You can see a demo of all of this together in the video at the end.

So now you have a pretty good idea of some different ways you can use Binject. We also encourage people to submit pull requests to the library with new injection methods or even further enumerating the executable types. There is still a lot of work to be done here but you can use the library currently to great effect.

Protesters and Technology feat. Will Scott and Vyrus

In this episode of the Hack the Planet Podcast:

We are joined in the studio by Vyrus and privacy researcher Will Scott to talk about the dual-edged sword of technology in the context of protests. We dive deep on technical innovations from the Black Lives Matter protests, especially in the areas of software defined radio and crowd-sourcing. Then things slide off the rails in the usual manner.

Radio Links:
https://openmhz.com/
https://github.com/robotastic/trunk-recorder/wiki
https://github.com/szpajder/rtlsdr-airband/wiki
https://www.rtl-sdr.com/using-a-kerberossdr-to-monitor-air-traffic-control-voice-ads-b-acars-vdl2-simultaneously-on-a-raspberry-pi-3b/
https://github.com/unsynchronized/gr-mixalot
https://www.usenix.org/blog/security-analysis-apco-project-25-two-way-radio-system
https://tar1090.adsbexchange.com/

EFF Protest Guide https://ssd.eff.org/en/module/attending-protest
A Good American https://youtu.be/666wsDcoNrU

NFS server https://github.com/willscott/go-nfs
Will at CCC https://media.ccc.de/v/36c3-10565-what_s_left_for_private_messaging

Making Every Cycle Count in the Fight Against COVID

I have asthma, and the quicker we determine the right set of biochemical properties of SARS-CoV-2 needed to develop an antiviral or vaccine at mass scale, the quicker I go back to some semblance of normal life. I’ll do quite a bit if I can to help the cause, which is why I’ve gone fairly deep on optimizing how I can increase my Folding@Home throughput. What follows is a whirlwind tour of the best ways I’ve found to increase folding performance, and by extension make my htop graphs look like this:

12 CPU folding threads and 1 GPU folding thread optimally scheduled

Folding@Home is a distributed computing project that simulates biological proteins as their atoms interact with each other. This involves a great number of floating point and vector operations. We’ve discussed this in detail on our previous podcast episode, “Fold, Baby, Fold.”

The high level strategies for increasing throughput include:

Optimizing thread scheduling
Reducing cache contention
Increasing instructions retired per cycle
Maximizing memory and I/O performance relative to NUMA layout
Eliminating wasteful overhead

Overclock Everything

This first point almost goes without saying, but overclock your CPU, your memory, and your GPU wherever possible without overheating or losing stability. (There’s an entirely separate post to be written about my heat management journey to date.)

CPU Isolation

The rest of this post is gonna assume you’re in this to win this and willing to sacrifice nearly all of your computing cycles to the viral alchemical overlords who manage the big work unit in the sky. Me? I have 12 physical cores and 24 logical cores to work with on my Threadripper 1920x.

My normal usermode processes get 1 core: core 0. Why core 0? There are multiple IRQs that can’t be moved from that core, and it’s gonna have more context switches than the average core to begin with. Everything else is gonna be managed by me. I use the kernel boot arg “isolcpus=1-23”. I also set “nohz_full=1-23” to prevent the “scheduler tick” from running, which supposedly helps reduce context switches.
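After rebooting with those arguments it’s worth checking that the kernel actually isolated what you asked for. Here’s a minimal Python sketch that reads the kernel’s isolated CPU list from sysfs and compares it with the “1-23” range used above. The helper names are mine, and the parser only handles the plain “a,b-c” list syntax, which is all I’ve needed.

```python
def parse_cpu_list(text):
    """Expand a kernel CPU list such as "1-23" or "0,8,12-22" into a set of ints."""
    cpus = set()
    for part in text.strip().split(","):
        if not part:
            continue
        if "-" in part:
            lo, hi = part.split("-")
            cpus.update(range(int(lo), int(hi) + 1))
        else:
            cpus.add(int(part))
    return cpus


def isolated_cpus(path="/sys/devices/system/cpu/isolated"):
    """Read the kernel's isolated CPU list; empty set if none or the file is missing."""
    try:
        with open(path) as f:
            return parse_cpu_list(f.read())
    except OSError:
        return set()


# Compare what the kernel reports with the isolcpus=1-23 boot argument.
expected = parse_cpu_list("1-23")
print("isolated as expected:", isolated_cpus() == expected)
```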

NUMA NUMA

It’s no longer 2004 and viral Europop pantomime videos are dead. It’s 2020 and consumer CPUs are letting some abstractions run a little closer to the physical layout of cores on die. Some of your cores might have priority access to one memory channel over another. Various devices on the PCI bus are also assigned priority access to certain cores. You can run the lstopo utility to check out your own configuration. Here’s mine:

Pay special attention to L3 cache layout and PCI device priority

In order for each node to properly be allocated half of the RAM in my system, I had to move one of my two DIMMs to the other side of the motherboard. This was discovered only after searching through random forum posts. There’s no great documentation here from AMD!

NUMA node #1 has access to the GPU on the PCI bus, so any threads managing GPU folding will be assigned to a core in that node.

Pinning the GPU Core

Folding@Home can give you work units to stress the CUDA cores on your overpriced GPU. You deserve to get the most out of your investment. In order to move data around all those CUDA cores, there’s one usermode thread that needs as much processing power as it can get. I pin the GPU folding threads to logical cores 11 and 23 (both on NUMA node 1). Nothing else will run on those two logical cores (one physical core; logical core number modulo 12 is the physical core number for my Threadripper).

If you pin cores as above, that’s the single best way to improve GPU folding performance without tweaking a single GPU setting. You can do some stuff with renicing processes, but I couldn’t tell you how much that does compared to just giving a whole physical core to GPU folding.
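For illustration, here’s roughly what that pinning looks like if you do it from Python instead of taskset. This is a sketch under a couple of assumptions: that the GPU work runs in a process whose name starts with “FahCore_22”, and that you run it as root or as the user that owns the folding processes. The function names are mine. Affinity is per-thread on Linux, so the sketch walks every thread of the process.

```python
import os

GPU_CORES = {11, 23}  # the two logical cores reserved for GPU folding above


def pids_named(prefix):
    """Find PIDs whose /proc/<pid>/comm starts with `prefix` (e.g. "FahCore_22")."""
    found = []
    for entry in os.listdir("/proc"):
        if entry.isdigit():
            try:
                with open(f"/proc/{entry}/comm") as f:
                    if f.read().strip().startswith(prefix):
                        found.append(int(entry))
            except OSError:
                pass  # the process exited while we were scanning
    return found


def pin_all_threads(pid, cpus):
    """Pin every thread of a process to the given logical CPUs (Linux only)."""
    for tid in os.listdir(f"/proc/{pid}/task"):
        os.sched_setaffinity(int(tid), cpus)


# Usage, assuming the process name above:
#   for pid in pids_named("FahCore_22"):
#       pin_all_threads(pid, GPU_CORES)
```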

CPU Pinning

Hyperthreading is a convenient white lie. You can try and run two CPU folding threads on one physical core, but unless you have very specialized hardware chances are the floating point unit in each core is going to be a bottleneck that prevents you from actually getting double the performance. Here’s what the backend instruction pipeline looks like for the Zen architecture, which is what each of the cores on my TR1920x is based on:

You get a max of four 128-bit floating point operations per cycle. AVX256 instructions are 256 bits wide, so each one takes up two of those 128-bit slots. That’s your bottleneck. Empirically, I’m not getting much better folding throughput per core by running 2 CPU threads versus 1. Your mileage may vary, but I’ve stuck with scheduling only 1 folding thread per physical CPU core except where necessary.

Specifically, of logical cores 0-23, CPU folding currently occurs on core 8 and cores 12-22. I chose core 8 because it does not share an L3 cache with any cores that do GPU folding. I would like to keep L3 cache pressure as low as reasonably possible for those cores. This means physical core 8 (logical cores 8 and 20) has two CPU folding threads scheduled while every other physical core has at most 1. This, in my experience, has been a better arrangement than turning off hyperthreading/SMT in the BIOS settings.
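If the logical-versus-physical bookkeeping is hard to follow, this small Python sketch works through it using the modulo 12 rule for my Threadripper. The core numbers are the ones from this post; the variable names are just for illustration, and the mapping will be different on other CPUs.

```python
from collections import Counter

PHYSICAL_CORES = 12  # logical core N and N + 12 share one physical core here


def physical_core(logical):
    """Map a logical core number to its physical core (modulo 12 on this CPU)."""
    return logical % PHYSICAL_CORES


cpu_folding = [8] + list(range(12, 23))   # core 8 and cores 12-22
gpu_folding = [11, 23]

# Count how many CPU folding threads land on each physical core.
threads_per_core = Counter(physical_core(c) for c in cpu_folding)
doubled = sorted(core for core, n in threads_per_core.items() if n > 1)
print("physical cores with two CPU folding threads:", doubled)  # -> [8]

# The physical core reserved for GPU folding should carry no CPU folding at all.
gpu_physical = {physical_core(c) for c in gpu_folding}
print("overlap with GPU core:", gpu_physical & set(threads_per_core))  # -> set()
```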

Again, you can do some stuff with scheduler hints, but I can’t tell you if it’s worth it.

numactl

Linux processes can have hints for how to allocate new memory pages to a process relative to the NUMA layout of a system. The one we want for CPU folding is “localalloc” which says that physical memory must be provided from the local NUMA node of the calling thread. This helps to ensure optimal memory performance. The easiest way to set this for a process is to use the numactl command.
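As a small sketch of what that looks like when launching a process from Python, the helper below just prefixes a command with numactl and its --localalloc option. The function names are mine, and the fallback of running the command unchanged when numactl isn’t installed is a convenience I added, not something Folding@Home does.

```python
import shutil
import subprocess


def with_localalloc(cmd):
    """Prefix a command with numactl so its memory policy is local allocation."""
    return ["numactl", "--localalloc", "--"] + list(cmd)


def run_localalloc(cmd):
    """Run cmd under numactl if it is installed, otherwise run it unchanged."""
    if shutil.which("numactl"):
        cmd = with_localalloc(cmd)
    return subprocess.run(cmd, check=True)


# Show the resulting command line for a folding core binary.
print(with_localalloc(["./FahCore_a7.orig"]))
```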

Bypassing Needless Syscalls

If you are writing a multithreaded application, one way you can have one thread wait for another to complete an action is to check as often as possible if that action is done yet. Another is to say “I’m gonna let some other thread do some work, wake me when it’s done and I’ll do another check.” The latter is what happens when you call the “sched_yield” syscall. Folding@Home CPU cores call that a lot, probably to be nice to other processes (this is meant to be run in the background after all).

Do 11% of cycles really need to be spent in entry_SYSCALL_64?

The calls are initiated from userland via the sched_yield libc function which is a wrapper for the syscall. Because sched_yield is a dynamically loaded symbol, we can hook the loading with an appropriate LD_PRELOAD setting and force all calls to that function to immediately return 0 without ever yielding to the kernel. This empirically boosts folding throughput by a noticeable amount once you have threads pinned appropriately.

Disable Spectre Mitigations

Cycles spent on preventing Spectre attacks are cycles not spent folding proteins. There can be a non-trivial number of these cycles. The image above showed something like 7% of cycles spent in __x86_indirect_thunk_rax which is a Spectre mitigation construct.

Get rid of them by setting the “mitigations=off” kernel boot argument. Does this argument affect microcode, or do I need to downgrade microcode to fully disable mitigations? I don’t know–the documentation kinda sucks and I haven’t been able to find out!

Keep Your House in Order

Put kernel threads and IRQs on unused cores.

Keep your other usermode threads on core 0 unless you need more parallelism.

Putting It Together

Folding@Home distributes binary files named “FahCore_a7” and “FahCore_22” which nearly all folding work units are processed by. If you rename one to something like “FahCore_a7.orig” and replace the file on disk where “FahCore_a7” originally was with an executable shell script, you can run shell commands on the start of each new work unit. For example, you can set how the folding processes run without needing to poll for created processes. This also allows for LD_PRELOADing libraries into subprocesses.
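My actual wrapper is a shell script, but the idea fits in a few lines of any language. Here’s a hedged Python sketch of the same trick: work out the path of the renamed “.orig” binary from the wrapper’s own path, add a preload library to the environment, and exec the original with the same arguments. The library path is a made-up placeholder for whatever shared object you built to override sched_yield, and the function names are mine.

```python
#!/usr/bin/env python3
import os
import sys

# Hypothetical path to a shared library that overrides sched_yield.
PRELOAD_LIB = "/usr/local/lib/libnoyield.so"


def wrapped_invocation(argv, environ, preload=PRELOAD_LIB):
    """Return (program, argv, env) for exec'ing the renamed original core."""
    original = argv[0] + ".orig"          # e.g. FahCore_a7 -> FahCore_a7.orig
    env = dict(environ)
    existing = env.get("LD_PRELOAD")
    # Keep any library that was already being preloaded.
    env["LD_PRELOAD"] = preload if not existing else preload + ":" + existing
    return original, [original] + list(argv[1:]), env


def main():
    program, argv, env = wrapped_invocation(sys.argv, os.environ)
    os.execve(program, argv, env)         # replaces this process; does not return


# A real wrapper would call main() here; this line only shows what would be run.
print(wrapped_invocation(["/path/to/FahCore_a7", "arg1"], {})[:2])
```

Because the wrapper execs the original binary, the folding process keeps the PID the client expects, and anything set up before the exec (environment, affinity, memory policy) carries over to it.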

Additionally, I run a shell script on a cron job set for every 10 minutes that ensures folding processes are properly pinned. A script moves kernel threads and IRQs to unused cores every half hour.

All of this is public at https://github.com/mitchellharper12/folding-scripts

Useful Tuning Tools:

cat for writing things to sysfs
taskset for pinning specific threads to specified CPUs
cset/cpuset for creating meta constructs for managing sets of threads and cores and memory nodes
numactl for setting NUMA flags for processes
renice for changing thread scheduler priority
ionice for changing io priority for threads
chrt for changing which part of the scheduler subsystem is used to manage a given thread
perf stat for getting periodic instructions per cycle data (I like to run watch -n 1 perf stat -C 8,12-22 sleep 1 in a root shell)
perf record for capturing trace data (don’t sleep on the -g flag for collecting stack traces)
perf report for displaying the data from perf record
Reading the perf examples blog post
htop for point-in-time visualization of workload distribution across cores (explanation of colors and symbols)

Applications to Fuzzing and Red Teaming

If you have a spare fuzzing rig or password cracker, running Folding@Home and optimizing thread scheduling is a great way to learn about how your kernel scheduler works. This can help you learn how to schedule threads for your workloads in order to maximize your iterations per second.

Additionally, this might give you some leverage to run untrusted workloads in a VM or container to mitigate Spectre without needing to take the performance hit of kernel mitigations (note we make no claims or warranties on this point).

The Linux perf tools allow for sampling of the behavior of a target thread without attaching a debugger.

Optimize for Throughput

The number 1 most predictive statistic for how well your optimizations are working is “the change over time in how long it takes the Folding@Home logs to update percent complete.” I’ve been running tail -F /var/lib/fahclient/log.txt and counting the average delta between timestamp updates for a given work unit. There are other stats you can try and optimize for, like instructions retired per cycle as reported by perf stat, but that can be misleading if you start over-optimizing for that. Note that when you restart the Folding@Home client, the first reported update in percentage complete needs to be ignored (a work unit can be checkpointed in between percentage updates).
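Counting those deltas by eye gets old fast, so here’s a rough Python sketch that does it. It assumes progress lines shaped like the sample in the comment (timestamp, work unit, slot, then “Completed N out of M steps (P%)”); check that against your own log before trusting the numbers. It only measures gaps between consecutive updates, so feed it lines from after your most recent client restart.

```python
import re
from datetime import datetime, timedelta

# Assumed log line shape (verify against your own log.txt):
#   12:00:00:WU00:FS00:0xa7:Completed 2500 out of 250000 steps (1%)
LINE = re.compile(
    r"^(\d\d:\d\d:\d\d):(WU\d+):(FS\d+):.*Completed \d+ out of \d+ steps \((\d+)%\)"
)


def seconds_per_percent(lines):
    """Average seconds per percent between consecutive updates, per (WU, FS) pair."""
    last = {}     # (wu, fs) -> (timestamp, percent) of the previous update
    samples = {}  # (wu, fs) -> list of seconds-per-percent measurements
    for line in lines:
        m = LINE.match(line)
        if not m:
            continue
        stamp = datetime.strptime(m.group(1), "%H:%M:%S")
        key, pct = (m.group(2), m.group(3)), int(m.group(4))
        if key in last and pct > last[key][1]:
            gap = stamp - last[key][0]
            if gap < timedelta(0):
                gap += timedelta(days=1)  # timestamps have no date and wrap at midnight
            samples.setdefault(key, []).append(gap.total_seconds() / (pct - last[key][1]))
        last[key] = (stamp, pct)
    return {key: sum(vals) / len(vals) for key, vals in samples.items()}


sample = [
    "12:00:00:WU00:FS00:0xa7:Completed 2500 out of 250000 steps (1%)",
    "12:02:00:WU00:FS00:0xa7:Completed 5000 out of 250000 steps (2%)",
    "12:06:00:WU00:FS00:0xa7:Completed 7500 out of 250000 steps (3%)",
]
print(seconds_per_percent(sample))
```

To use it on a real log, pass `open("/var/lib/fahclient/log.txt")` instead of the sample list and compare the averages before and after each tuning change.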

Although these techniques were developed on a Threadripper, they apply to all the Intel Core series laptops I have scattered around my apartment running Folding@Home. You’ll know you’re making progress when the amount of time red bars are visible on htop CPU graphs significantly changes.

A Parting Message

Finally, you can help Symbol Crash help protein researchers by registering your client with folding team 244374. Remember that if you register for a user passkey you are eligible for a quick return bonus.

This post is a bit terse, but that’s because half the fun is building a mental model of how all the pieces fit together! Here’s a thread documenting some of my intermediate progress. The intro to our DMA special has some backstory on this whole effort.

If you have any follow up questions or want to rave about your own performance, you can call the Hacker HelpLine at (206) 486-6272 or send an email to podcast@symbolcrash.com to be featured on an upcoming episode of Hack the Planet!