Problem
- For security, we need better isolation of external binaries from MediaWiki.
- If we run MediaWiki itself under Kubernetes, the resulting container should be as small as possible, so it should ideally exclude unnecessary binaries.
- It's difficult to deploy bleeding-edge versions of external binaries when they necessarily share an OS with MediaWiki.
Proposal
Have a PHP microservice, accessible via HTTP, which takes POSTed inputs, writes them to the container's filesystem as temporary files, runs a shell command, and responds with gathered output files.
The client and server components, as well as abstractions and local execution, shall be in a new composer library called Shellbox.
The key new abstraction is called BoxedCommand. This is essentially a Command with files attached. Executing a BoxedCommand sets up a temporary directory, places input files inside that directory, executes the command, reads output files, and then cleans up the temporary directory. BoxedCommand can be instructed to either return an output file as a string, or to copy/move the file to another location.
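The lifecycle described above (create a temporary directory, stage input files, run the command, collect outputs, clean up) can be sketched in a few lines. This is an illustrative Python sketch of the idea, not the Shellbox API; the function name and signature are made up:

```python
import shutil
import subprocess
import tempfile
from pathlib import Path

def run_boxed(argv, input_files, output_names):
    """Run argv in a private working directory.

    input_files: dict mapping box-relative name -> bytes content.
    output_names: list of box-relative names to read back after the run.
    Returns (exit_code, stdout, outputs). The temporary directory is
    always removed, mirroring BoxedCommand's cleanup step.
    """
    work = Path(tempfile.mkdtemp(prefix='box-'))
    try:
        # Stage inputs inside the box.
        for name, content in input_files.items():
            (work / name).write_bytes(content)
        # Execute with the box as the working directory.
        proc = subprocess.run(argv, cwd=work, capture_output=True)
        # Gather whichever output files the command produced.
        outputs = {}
        for name in output_names:
            path = work / name
            if path.exists():
                outputs[name] = path.read_bytes()
        return proc.returncode, proc.stdout, outputs
    finally:
        shutil.rmtree(work)
```

For example, `run_boxed(['cp', 'in.txt', 'out.txt'], {'in.txt': b'hello'}, ['out.txt'])` stages one input, runs the command in the box, and returns the resulting output file.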
A BoxedResult object is returned, which is similar to the existing Result object except that it also encapsulates output files.
A BoxedCommand can be executed remotely, via the client library, or locally. The execution code will be moved out to a separate Executor hierarchy.
```php
class BoxedCommand {
	// ...

	public function setExecutor( BoxedExecutor $executor ) {
		$this->executor = $executor;
	}

	public function execute() {
		return $this->executor->execute( $this );
	}
}
```
So for callers in MediaWiki, the interface will look like this:
```php
$result = MediaWikiServices::getInstance()
	->getCommandFactory()
	->getBoxedCommand( ... params ... )
	->execute();
if ( $result->getExitCode() === 0 ) {
	$output = $result->getStdout();
}
```
The existing Command class is not quite suitable as a parent class of BoxedCommand, because BoxedCommand will be in a composer library whereas Command has backwards-compatibility considerations and some MediaWiki dependencies. MediaWiki's Command can probably become a subclass of a loosely-coupled UnboxedCommand class within Shellbox.
I considered having the client post the data as multipart/form-data, so that the server could just use $_FILES. The problem with that is that for security, we want to have HMAC authentication of the POST contents. So my current plan is to post the data as multipart/mixed -- an efficient binary format used by email, essentially identical to multipart/form-data -- and to parse it in PHP. That allows HMAC authentication to be done without having to reconstruct PHP's $_FILES logic.
I previously considered making the service be aware of Swift, but I don't think there is much benefit after T260504 is done.
multipart/mixed is a nice format for file transfer because it is binary-safe. JSON is officially UTF-8, so using it to transfer binary data without base64 encoding is quite dodgy.
Structured parameters within the multipart/mixed request are transferred as a JSON part.
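To make the HMAC idea concrete, here is a hedged Python sketch: the client signs the raw bytes of the multipart body with a shared secret and sends the digest alongside the request, and the server recomputes and compares the digest before parsing anything. The header name, key, and signing scheme here are illustrative, not Shellbox's actual protocol:

```python
import hashlib
import hmac

# In production this would be a long random key from configuration.
SECRET = b'shared-secret'

def sign_body(body: bytes) -> str:
    # Sign the raw multipart/mixed body bytes, before any parsing,
    # so the signature covers files and structured parameters alike.
    return hmac.new(SECRET, body, hashlib.sha256).hexdigest()

def verify_body(body: bytes, claimed_digest: str) -> bool:
    # compare_digest is a constant-time comparison, which avoids
    # leaking the expected digest through response timing.
    return hmac.compare_digest(sign_body(body), claimed_digest)
```

Because the signature is over the opaque body, the server can reject unauthenticated requests cheaply, before doing any multipart parsing.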
File API
File names within the box are relative paths under a working directory. Running a command with files might look like this:
```php
$result = $commandFactory->getBoxedCommand()
	->inputFileFromString( 'input1.svg', $inputString )
	->inputFileFromFile( 'config.json', __DIR__ . '/config.json' )
	->outputFileToFile( 'out1.png', TempFSFile::factory( 'out', 'png' )->getPath() )
	->params( 'transform', "--input=input1.svg", "--conf=config.json", "--output=out1.png" )
	->execute();
$outputStr = $result->getFileContents( 'out1.png' );
```
Backwards compatibility
Shell::command() and wfShellExec() will continue to execute commands locally, without using the service, even in WMF production. MediaWiki's shell command handling will be moved to the Shellbox library, which will be required by MediaWiki core as a composer dependency. There will be some refactoring but "unboxed" commands will continue to work roughly as they always have.
The new BoxedCommand abstraction will be available via CommandFactory. BoxedCommand can either execute commands locally (the default) or it can execute commands remotely using the Shellbox service.
PHP RPC
It's trivial to add an endpoint to the server which runs arbitrary PHP code. This allows PHP execution to be done with time and memory limits, for example for T240884. There will be a shared client library responsible for the protocol; the shell and RPC endpoints will wrap this client library.
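The main benefit of running code in a separate process is that limits can be enforced externally, from outside the untrusted execution. A minimal Python sketch of that idea (not the Shellbox RPC protocol; `python3` stands in for a PHP worker here):

```python
import subprocess

def rpc_call(code: str, timeout: float = 5.0) -> str:
    """Run a code snippet in a fresh interpreter process.

    The caller's process enforces a wall-clock limit externally;
    subprocess.run raises TimeoutExpired if it is exceeded. On POSIX,
    a memory limit could be added with resource.setrlimit in a
    preexec_fn before exec.
    """
    proc = subprocess.run(
        ['python3', '-c', code],
        capture_output=True, text=True, timeout=timeout,
    )
    return proc.stdout
```

The HTTP service plays the same role as the fresh process here: the limits live in the supervisor, not in the code being limited.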
Caller survey
- Media transformation
- Subclasses get a local copy of a file from Swift and transform it, with the output being placed in a local directory for the framework to upload to Swift
- Source download in File::getThumbnailSource() has a PoolCounter to avoid network saturation
- Thumbor theoretically replaces a lot of this, but it is loosely integrated so these code paths may end up getting hit anyway.
- BitmapHandler
- Scaling: runs ImageMagick with environment overrides
- Also standalone rotation without scaling
- Also “convert -version” with cache
- DjVuHandler
- Uses a pipeline ddjvu | pnmtojpeg
- SvgHandler
- rsvg with weird command template config
- JpegHandler
- Postprocess with exiftool
- Rotation with jpegtran
- PagedTiffHandler
- ImageMagick
- TimedMediaHandler
- ffmpeg
- fluidsynth
- PdfHandler
- gs | convert
- VipsScaler
- vips, exiv2
- Puts a bunch of vips commands into an array, runs them all
- The commands talk to each other via temporary files
- 3D
- xvfb-run 3d2png.js
- With environment
- Media metadata
- Framework provides a local source path, IIRC this is prior to Swift publication so the file is already local.
- Output goes into the DB
- DjVuImage
- Runs djvudump and djvutext on the source file
- JpegHandler
- exiftool
- PagedTiffHandler
- tiffinfo
- PdfHandler
- pdfinfo, pdftotext
- Video transcode
- TMH does a pcntl_fork(), then the pcntl_exec() is commented out and replaced by plain old wfShellExec()
- The parent monitors the output file size and provides progress updates
- Runs kill, ps directly with exec(), not via MW wrapper
- ffmpeg, fluidsynth
- @Joe notes: this case is complex enough to warrant its own separate service
- Special pages
- SecurePoll
- gpg for encryption
- CodeReview
- svn
- OpenStackManager
- ssh-keygen via proc_open
- SecurePoll
- Parser
- EasyTimeline
- Perl runs Ploticus
- SyntaxHighlight_GeSHi
- Poorly named extension runs pygmentize
- Score
- lilypond, gs, convert, fluidsynth
- EasyTimeline
- ResourceLoaderImage::rasterize()
- Runs rsvg with proc_open
- GlobalIdGenerator
- Runs ifconfig to get the MAC address, with local file cache
- GitInfo
- Uses git to inspect the local source tree. For Special:Version etc.
- Things that run MediaWiki maintenance scripts
- SiteConfiguration::getConfig()
- CirrusSearch/includes/Api/SuggestIndex.php
- FlaggedRevs Special:ValidationStatistics
- wfMerge()
- Runs diff3 for conflict merging on page save
- Uses popen() directly, no firejail
- Makes temporary input files, reads output from pipe
- Maintenance scripts
- Maintenance::getTermSize(): stty
- Maintenance::readlineEmulation(): readline
- psysh: which less
- mysql
- other maintenance scripts
- noc
- scap wikiversions-inuse
- Some things that are not used in WMF production:
- $wgAntivirus
- $wgExternalDiffEngine
- Installer::envCheckShellLocale()
- RandomImageGenerator
- FSFileBackend async mode
- cp, mv, test, chmod
Caller survey summary
Binaries used in the more tractable cases:
- convert
- ddjvu
- diff3
- djvudump
- djvutext
- exiftool
- exiv2
- ffmpeg
- fluidsynth
- gpg
- gs
- jpegtran
- lilypond
- pdfinfo
- pdftotext
- perl
- pnmtojpeg
- pygmentize
- rsvg
- ssh-keygen
- svn
- tiffinfo
- vips
- xvfb-run
More difficult or not applicable binaries
- git
- php
- ps
- kill
- stty
- readline
- ifconfig
A day in the life of a typical MediaWiki shell command
- Where is the binary? Usually it is statically configured but sometimes the PATH is used. If it’s not installed, I raise an error. Binary path configuration may be false or null, and if so, I do something else.
- I want to know the binary version, so I run the command with --version and cache the result.
- Here’s an input as a string. I write it to a temporary file.
- Here’s an input as a File object. I ask it for a local copy.
- Here’s an input which is a local file already. Easy!
- My command uses a lot of memory. I override the memory limits.
- I override some environment variables.
- My command needs to run with the working directory equal to the directory containing the input and output files. I call chdir().
- I run a command with some input files.
- The command generates some output files. I run another command that uses those output files as an input.
- I don’t know how many output files were generated. I need to find and gather them.
- One of the input files was modified by the command and now I want its new contents.
- What is the exit status?
- Something went wrong -- an output file I expected is not there. Maybe there is something in stderr or stdout that I can show to the user.
- All done, now I need to clean up the temporary files. Sometimes I know their names, sometimes I search for them.
Performance considerations
- Startup overhead impacts fast but common commands. We presumably win by forking a smaller process prior to exec() but lose by network stack overhead and container setup.
- Copying overhead impacts commands that run on large files
- OS-level containers are large and defeat library sharing. However, due to the way Docker works, the containers for the different routes will be very similar and will use the same base image, so will not use much additional disk space.
- Prefer a binary-safe multipart encoding over ASCII-safe encodings such as base64, which inflate payload size.
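The encoding point above can be quantified: base64 expands binary data by one third (every 3 input bytes become 4 output bytes), which a binary-safe multipart body avoids entirely. A quick check:

```python
import base64

payload = bytes(range(256)) * 1024          # 256 KiB of arbitrary binary data
encoded = base64.b64encode(payload)
overhead = len(encoded) / len(payload)      # 4/3, i.e. ~33% larger
```

For the large media files in the caller survey, that overhead applies to every transfer in both directions, so it matters more than it would for small control messages.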
Security considerations
- Untrusted code can conduct timing attacks, since RDTSC is unprivileged. This implies that you need either application-level security (disallow arbitrary machine code) or separate hardware.
- Linux is buggy and has a wide attack surface. Review the need for /proc, /sys, /dev.
- Read-only everything, or use an overlay
- Private /tmp
- Requests signed with a shared secret (HMAC) or asymmetric keys
Credits
This proposal is based on @Joe's investigations into running MediaWiki on Kubernetes and originated with a conversation between @Joe and @tstarling.
To do
Work in progress is in the mediawiki/libs/Shellbox project in Gerrit.
Shellbox:
- Port FirejailCommandIntegrationTest and FirejailCommandTest from MediaWiki to Shellbox.
- Testing of Windows support
- Testing of systemd-run and firejail support.
- Enable PHPUnit tests in CI
- Parsing of shell commands to assist command validation.
- Per-route command validation.
- Decide whether to make Guzzle a hard dependency. If it is a hard dependency then we can provide a default HTTP client instead of just an interface.
MediaWiki integration: now tracked at T267532
Deployment:
- Routing
- Container creation
- ???
- wmf-config, service discovery etc.
- Score as a pilot