The mock discussion still misses the real solution, which is to refactor the code so that you have a function that simply reads the file and returns JSON, essentially a wrapper around open that doesn't need to be tested.
Then have your main function take that JSON as a parameter (or a class wrapping that JSON).
Then your code becomes the ideal code: stateless, with no interaction with the outside world. It's trivial to test, just like any other function that simply translates inputs to outputs (i.e. pure).
Every time you see the need for a mock, your first thought should be "how can I take the 90% or 95% of this function that is pure and pull it out, and separate out the impure portion (side effects and/or stateful), which now has almost no logic or complexity left in it, and push it to the boundary of my codebase?"
Then the complex pure part you test the heck out of, and the stateful/side effectful impure part becomes barely a wrapper over system APIs.
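A minimal sketch of that split, with hypothetical names (read_config, build_options) standing in for whatever the real code does:

    import json


    def read_config(path):
        """Thin impure wrapper: just opens the file and parses JSON. Barely worth testing."""
        with open(path) as f:
            return json.load(f)


    def build_options(config: dict) -> dict:
        """Pure: takes parsed config in, returns derived options. Easy to test exhaustively."""
        return {
            "retries": int(config.get("retries", 3)),
            "verbose": bool(config.get("verbose", False)),
        }


    # In a test, no mock is needed: just call the pure function with a dict.
    def test_build_options_defaults():
        assert build_options({}) == {"retries": 3, "verbose": False}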
Funnily enough, I am preparing a simple presentation at work to speak about exactly that. The idea of separating "logic" from I/O and side effects is an old one and can be found in many architectures (like hexagonal architecture). There is plenty of benefit doing this, but testing is a big one.
It should be obvious, but this is not something that seems to be taught in school or in most workplaces, and when it is, it's often through the lens of functional programming, which most just treat as a curiosity and not a practical thing to use at work. So I started to teach this simple design principle to all my junior devs, because it is actually quite easy to implement, does not need a complete shift of architecture or a big refactor when working on existing code, and is actually practical and useful.
> Then the complex pure part you test the heck out of, and the stateful/side effectful impure part becomes barely a wrapper over system APIs.
In practice, the issues I see with this are that the "side effect" part is usually either extensive enough to still justify mocking when testing it, or intertwined enough with your logic that it's hard to pull all the "pure" logic out. I rarely see 90-95% of a function being pure logic vs side effects.
E.g. for the first, you could have an action that requires several sequenced side effects, and then your "wrapper over APIs" still needs validation that it calls the right APIs in the right order with the right params, for various scenarios. Enter mocks or fakes. (And sometimes people will get clever and say use pubsub or events for this, but... you're usually just making the full-system-level testing there harder, as well as introducing less determinism around your consistency.)
For the second, something like "do steps I and J. If the API you call in step J fails, unwind the change in I." Now you've got some logic back in there. And it's not uncommon for the branching to get more complex. Were you building everything in the system from first principles, you could try to architect something where I and J can be combined or consolidated in a way to work around this; when I and J are third party dependencies, that gets harder.
I agree with you; however, convincing an entire team of devs to explicitly separate out the interface of the impure parts of the code is very difficult.
If you introduce a mocking library to the test portion of the codebase, most developers will start to use it as a way to shortcut any refactoring they don't want to do. I think articles like this that try to explain how to better use mocks in tests are useful, although I wish they weren't necessary.
"sans-I/O" is one term for that style. I like it a lot but it's not a free lunch.
I always liked the phrase 'Hoist your I/O' [1], but yes, you can only hoist it up so many times until it's outside of your application completely (making it completely pure, and now someone else's responsibility).
[1] https://www.youtube.com/watch?v=PBQN62oUnN8
Or "functional core, imperative shell".
This blog post talks as if mocking the `open` function is a good thing that people should be told how to do. If you are mocking anything in the standard library your code is probably structured poorly.
In the example the author walks through, a cleaner way would be to have the second function take the Options as a parameter and decouple those two functions. You can then test both in isolation.
> If you are mocking anything in the standard library your code is probably structured poorly.
I like Hynek Schlawack's 'Don’t Mock What You Don’t Own' [1] phrasing, and while I'm not a fan of adding too many layers of abstraction to an application that hasn't proved that it needs them, the one structure I find consistently useful is to add a very thin layer over the parts that do I/O, converting to/from types that you own to whatever is needed for the actual thing.
These layers should be boring and narrow (for example, never mock past validation you depend upon), doing as little conversion as possible. You can also rephrase the general purpose open()-type usage into application/purpose-specific usages of that.
Then you can either unittest.mock.patch these or provide alternate stub implementations for tests in a different way, with this approach also translating easily to other languages that don't have the (double-edged sword) flexibility of Python's own unittest.mock.
[1] https://hynek.me/articles/what-to-mock-in-5-mins/
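A rough sketch of what such a thin, owned layer can look like (the Settings type and the *ConfigSource names are mine, purely illustrative):

    from dataclasses import dataclass
    import json


    @dataclass
    class Settings:
        """A type we own; the rest of the app never sees raw file contents."""
        retries: int
        verbose: bool


    class FileConfigSource:
        """Boring, narrow I/O layer: reads a path, converts to our own type, nothing else."""

        def __init__(self, path):
            self.path = path

        def load(self) -> Settings:
            with open(self.path) as f:
                raw = json.load(f)
            return Settings(retries=raw.get("retries", 3), verbose=raw.get("verbose", False))


    class StubConfigSource:
        """Test stand-in: no patching needed, just a different implementation of load()."""

        def __init__(self, settings: Settings):
            self._settings = settings

        def load(self) -> Settings:
            return self._settings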
> This blog post talks as if mocking the `open` function is a good thing that people should be told how to do. If you are mocking anything in the standard library your code is probably structured poorly.
Valgrind is a mock of standard library/OS functions and I think its existence is a good thing. Simulating OOM is also only possible by mocking stuff like open.
All rules exist to be broken in the right circumstances. But in 99.9% of test code, there's no reason to do any of that.
I think when testing code with an open call, it is a good idea to test what happens on different return values of open. If that is not what you intend to test for this method, then that method shouldn't contain open at all, as already pointed out by other comments.
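For instance, a sketch of exercising both a successful read and a failure; myapp.config and load_options are hypothetical names standing in for the code under test:

    from unittest import mock

    import pytest

    from myapp import config  # hypothetical module containing load_options()


    def test_load_options_reads_file():
        # mock_open supplies a file object with canned contents
        with mock.patch("myapp.config.open", mock.mock_open(read_data='{"verbose": true}')):
            assert config.load_options()["verbose"] is True


    def test_load_options_missing_file():
        # simulate open() failing entirely
        with mock.patch("myapp.config.open", side_effect=FileNotFoundError):
            with pytest.raises(FileNotFoundError):
                config.load_options()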
> This blog post talks as if mocking the `open` function is a good thing that people should be told how to do.
It does. And this is exactly the problem, here!
> TFA: The thing we want to avoid is opening a real file
No! No, no, no! You do not 'want to avoid opening a real file' in a test.
It's completely fine to open a real file in a test! If your code depends on reading input files, then your test should include real input files in it! There's no reason to mock any of this. All of this stuff is easy to set up in any unit test library worth its salt.
Details matter, but good test doubles here are important. You want to capture all calls to IO and do something different. You don't want tests to break because someone has a different filesystem, didn't set their home directory up the way you expect, or, worse, is running two tests at the same time and one test is changing files the other wants.
Note that I said test doubles. Mocks are a bit over-specific - they are about verifying functions are called at the right time with the right arguments, but the easy ability to set return values makes it easy to abuse them for other things (this abuse is good, but it is still abuse of the intent).
In this case you want a fake: a smart service that, when you are in a test, sets up a temporary directory tree containing all the files in the state that particular test needs, and destroys it when the test is done (with an optional mode to keep it, which is useful for debugging when a test fails). Depending on your situation you may need something similar for network services, time, or other such things. Note that in most cases the filesystem itself is more than fast enough to use in tests, but you need isolation from other tests. There are a number of ways to create this fake: overriding open, or providing a GetMyProgramDir function that you override, are two that I can think of.
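A minimal pytest sketch of such a fake, assuming the code under test accepts a base directory (the program_dir fixture and load_state function are made up for illustration):

    import json

    import pytest


    @pytest.fixture
    def program_dir(tmp_path):
        # build the directory tree this particular test needs; tmp_path is isolated per test
        (tmp_path / "state").mkdir()
        (tmp_path / "state" / "session.json").write_text(json.dumps({"user": "alice"}))
        yield tmp_path
        # pytest removes tmp_path automatically; add explicit teardown here if needed


    def test_reads_session(program_dir):
        from myapp.state import load_state  # hypothetical function taking a base dir
        assert load_state(program_dir)["user"] == "alice"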
Your tests are either hermetic, or they're flaky.
That means the test environment needs to be defined and versioned with the code.
Even in the case you mention you really shouldn't be overriding these methods. Your load settings method should take the path of the settings file as an argument, and then your test can set up all the fake files you want with something like Python's tempfile package.
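Something along those lines, where load_settings is a hypothetical function assumed to take a path:

    import json
    import tempfile
    from pathlib import Path

    from myapp.settings import load_settings  # hypothetical: load_settings(path) -> dict


    def test_load_settings_from_real_file():
        with tempfile.TemporaryDirectory() as tmp:
            settings_path = Path(tmp) / "settings.json"
            settings_path.write_text(json.dumps({"theme": "dark"}))
            assert load_settings(settings_path)["theme"] == "dark"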
> In Why your mock doesn’t work I explained this rule of mocking:
> Mock where the object is used, not where it’s defined.
For anyone looking for generic advice, this is a quirk of Python due to how imports work in that language (details in the linked post) and shouldn't be considered universal.
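A small sketch of what that rule means in Python, using a hypothetical app.config module: after "from os.path import exists", app.config holds its own reference to exists, so you patch the name where it is used, not where it is defined.

    # app/config.py (hypothetical)
    from os.path import exists

    def has_config(path):
        return exists(path)


    # test_config.py
    from unittest import mock

    from app import config


    def test_has_config():
        # patch "app.config.exists" (where it is used), not "os.path.exists" (where it is defined)
        with mock.patch("app.config.exists", return_value=True):
            assert config.has_config("/no/such/file") is True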
If you make the function pure it will be easier to test. Pass the moving parts in as function parameters; then you can pass mocks in for those parameters when testing. Example:
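Something along these lines, where a and b are the injected moving parts (an illustrative sketch, not the commenter's exact snippet):

    def process(path, a=open, b=print):
        """The moving parts (file access, output) are parameters with real defaults."""
        with a(path) as f:
            contents = f.read()
        b(contents.upper())
        return contents


    # refactored for easier testing: in your test you can now mock a and b
    def test_process():
        from unittest import mock
        fake_open = mock.mock_open(read_data="hello")
        fake_print = mock.Mock()
        assert process("ignored.txt", a=fake_open, b=fake_print) == "hello"
        fake_print.assert_called_once_with("HELLO")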
Great article. In addition, updating your mocking code can often be time-consuming. To try to make this easier, I built mock[1], which streamlines the process of setting up mock services for testing.
https://dhuan.github.io/mock/latest/examples.html
If you’re doing TDD, you could just view this as moving the “open” call to your unit test. As others point out, that encourages pure functions that can pipe in input from other sources than just file paths.
Honestly I don't buy it. Worse, this is one of the reasons I prefer to do "minimal integration tests" instead of unit tests. Take the example snippet of code and the very first comment just below it:
> The thing we want to avoid is opening a real file
and then the article goes and goes around patching stdlib stuff etc.
But instead I would suggest the real way to test it is to actually create the damn file, fill it with the "normal" (fixed) content and then run the damn test.
This is because after years of battling against mocks of various sorts I find that creating the "real" resource is actually less finicky than monkeypatching stuff around.
Apart from that: yeah, sure, the code should be refactored and the paths / resources moved out of the "pure logical" steps, but 1) this is an example and 2) this is the reality of most actual code, just 10x more complex and 100x more costly to refactor.
That works fine for files, but what if the integration is with a third party service for example?
You can create an actual mock networked service but it's much more work.
I think this is an example explaining what seems like a good practice for using mocks in python to me, the actual code in the post is barely "supporting cast".
If it's HTTP you can create the fixtures and serve them with a mock server. I'm a frontend dev, so backend APIs are like 3rd parties to me.
I use a browser extension for scraping actual backend responses, which downloads them with a filename convention the mock server understands. I mostly use it for development, but also for setting up screenshot tests. For example, screenshot the app and pixel diff it.
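A bare-bones sketch of that kind of fixture server; the filename convention here (path segments joined with "__") is made up purely for illustration:

    # serve saved API responses from ./fixtures, mapping e.g. GET /api/users
    # to fixtures/api__users.json (the naming convention is invented here)
    import http.server
    import pathlib

    FIXTURES = pathlib.Path("fixtures")


    class FixtureHandler(http.server.BaseHTTPRequestHandler):
        def do_GET(self):
            name = self.path.strip("/").replace("/", "__") + ".json"
            body = (FIXTURES / name).read_bytes()
            self.send_response(200)
            self.send_header("Content-Type", "application/json")
            self.end_headers()
            self.wfile.write(body)


    if __name__ == "__main__":
        http.server.HTTPServer(("127.0.0.1", 8080), FixtureHandler).serve_forever()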
> I use a browser extension for scraping actual backend responses
Can you tell the name of the extension?
Arguably this is a problem of when the patch is unapplied.
Presumably in the coverage case it’s being called by a trace function, which inevitably runs during test execution — and while we want the trace function to be called during the test function, we really want it without any patches the test function is using. But this arguably requires both an ability for the trace function to opt-out of patches and for the patcher to provide a way to temporarily disable all of them.
Why even mock anything in this example? You need to read the source code to work out what to mock, reaching deep inside the code to name some method to mock.
But what if you just passed in the contents of the file or something?
Edit: oh wait actually this is what the very last line in the blog post says! But I think it should be emphasized more!
I feel like the #1 reason mocks break looks nothing like this and instead looks like: you change the internal behaviors of a function/method and now the mocks interact differently with the underlying code, forcing you to change the mocks. Which highlights how awful mocking as a concept is; it is of truly limited usefulness for anything but the most brittle of tests.
Don't test the wrong things; if you care about some precondition, that should be an input. If you need to measure a side effect, that should be an output. Don't tweak global state to do your testing.
> you change the internal behaviors of a function/method and now the mocks interact differently with the underlying code, forcing you to change the mocks
Rarely should a mock be “interacting with the underlying code”, because it should be a dead end that returns canned data and makes no other calls.
If your mock is calling back into other code you’ve probably not got a mock but some other kind of “test double”. Maybe a “fake” in Martin Fowler’s terminology.
If you have test doubles that are involved in a bunch of calls back and forth between different pieces of code then there’s a good chance you have poorly factored code and your doubles are complex because of that.
Now, I won’t pretend changes don’t regularly break test doubles, but for mocks it’s usually method changes or additions and the fix is mechanical (though annoying). If your mocks are duplicating a bunch of logic, though, then something else is going on.
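A tiny illustration of the "dead end" idea, with UserRepo standing in for some made-up dependency:

    from unittest import mock

    # spec= keeps the mock honest about which methods exist on the real dependency
    repo = mock.Mock(spec=["get_user"])
    repo.get_user.return_value = {"id": 1, "name": "alice"}


    def greeting(repo, user_id):
        return f"Hello, {repo.get_user(user_id)['name']}!"


    # the mock never calls back into real code; it only hands back canned data
    assert greeting(repo, 1) == "Hello, alice!"
    repo.get_user.assert_called_once_with(1)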
Most of the real world is about manipulating the real world. For algorithms it is fine to depend on pure inputs/outputs. However, what we care about is that global state is manipulated correctly, and so the integration tests that verify that are what matter. In most cases your algorithm shouldn't be unit tested separately, since it is only used in one place and changes when its users change: there is no point in extra tests. If the algorithm is used in many places, comprehensive unit tests are important, but they get in the way when the algorithm is used only once; the tests just inhibit changes to the algorithm as requirements change (you have to change the user, the integration tests, and the unit tests that are redundant).
As such I disagree. Global state is what you should be testing - but you need to be smart about it. How you set up and verify global state matters. Don't confuse global state above with global state of variables; I mean the external state of the program before and after, which means network, file, time, and other IO things.
IO and global state is also just inputs that can be part of arrange-act-assert. Instead of mocking your database call to always return "foo" when the word "SELECT" is in the query, insert a real "foo" in a real test database and perform a real query.
Again I've heard "but what if my database/table changes so rapidly that I need the mock so I don't need to change the query all the time", in which case you ought to take a moment to write down what you're trying to accomplish, rather than using mocks to pave over poor architectural decisions. Eventually, the query fails and the mock succeeds, because they were completely unrelated.
So far I've only seen mocks fail eventually and mysteriously. With setups and DI you can treat things mostly as a black box from a testing point of view, but when mocks are involved you need surgical precision to hit the right target at the right time.
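For example, a small sketch with an in-memory SQLite database standing in for "a real test database":

    import sqlite3


    def find_names(conn):
        # the code under test runs a real query instead of hitting a mocked cursor
        return [row[0] for row in conn.execute("SELECT name FROM items ORDER BY name")]


    def test_find_names():
        conn = sqlite3.connect(":memory:")
        conn.execute("CREATE TABLE items (name TEXT)")
        conn.execute("INSERT INTO items (name) VALUES ('foo')")
        assert find_names(conn) == ["foo"]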
The main reason why your mock breaks later is because you refactored the code. You did the one thing tests are supposed to help you do, and the tests broke. If code was never modified, you wouldn't need automated tests. You'd just test it manually one time and never touch it again. The whole point of tests is you probably will rewrite internals later as you add new features, improve performance or just figure out better ways to write things. Mock-heavy tests are completely pointless in this respect. You end up rewriting the code and the test every time you touch it.
There are really only a few reasons to use mocks at all. Like avoiding network services, nondeterminism, or performance reasons. If you need to do a lot of mocking in your tests this is a red flag and a sign that you could write your code differently. In this case you could just make the config file location an optional argument and set up one in a temp location in the tests. No mocking required and you're testing the real API of the config file module.
It is worth pointing out that you can often use containerized services as an alternative to mocking.
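For instance, a sketch using the testcontainers package, assuming it, SQLAlchemy, and a running Docker daemon are available:

    # assumes: pip install "testcontainers[postgres]" sqlalchemy psycopg2-binary
    from sqlalchemy import create_engine, text
    from testcontainers.postgres import PostgresContainer


    def test_against_real_postgres():
        # spins up a throwaway Postgres container instead of mocking the database
        with PostgresContainer("postgres:16") as pg:
            engine = create_engine(pg.get_connection_url())
            with engine.connect() as conn:
                assert conn.execute(text("SELECT 1")).scalar() == 1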