31 Commits

Author SHA1 Message Date
Bert Blommers
3741058242
Techdebt: Replace sure with regular asserts in Batch (#6413) 2023-06-16 10:42:07 +00:00
Bert Blommers
24ed6c8d34
Add support for AWS China endpoints (#3661) 2021-10-18 16:13:08 +00:00
Bert Blommers
ee6f20e376
Batch - Test rework (#4134) 2021-08-04 13:40:10 +01:00
Thomas Maschler
d635c78bd1
AWS Batch enhancements (#3956)
* Check exit status of container

* Added support for job dependencies

* batch container overrides

* add AWS_BATCH_JOB_ID to container env variables

* lint with black

* refactor batch dependency test

* refactor batch dependency test

* fix index

Co-authored-by: jterry64 <justin.terry@wri.org>
Co-authored-by: Daniel Mannarino <daniel.mannarino@gmail.com>
2021-05-26 08:52:09 +01:00
Brian Pandola
f7467164e4
Fix Race Condition in batch:SubmitJob (#3480)
* Extract Duplicate Code into Helper Method

DRY up the tests and replace the arbitrary `sleep()` calls with a more
explicit check before progressing.

* Improve Testing of batch:TerminateJob

The test now confirms that the job was terminated by sandwiching a `sleep`
command between two `echo` commands.  In addition to the original checks
of the terminated job status/reason, the test now asserts that only the
first echo command succeeded, confirming that the job was indeed terminated
while in progress.

* Fix Race Condition in batch:SubmitJob

The `test_submit_job` in `test_batch.py` kicks off a job, calls `describe_jobs`
in a loop until the job status returned is SUCCEEDED, and then asserts against
the logged events.

The backend code that runs the submitted job does so in a separate thread. If
the job was successful, the job status was being set to SUCCEEDED *before* the
event logs had been written to the logging backend.

As a result, it was possible for the primary thread running the test to detect
that the job was successful immediately after the secondary thread had updated
the job status but before the secondary thread had written the logs to the
logging backend.  Under the right conditions, this could cause the subsequent
logging assertions in the primary thread to fail.

Additionally, the code that collected the logs from the container was using
a "dodgy hack" of time.sleep() and a modulo-based conditional that was
ultimately non-deterministic and could result in log messages being dropped
or duplicated in certain scenarios.

In order to address these issues, this commit does the following:

* Carefully re-orders any code that sets a job status or timestamp
  to avoid any obvious race conditions.
* Removes the "dodgy hack" in favor of a much more straightforward
  (and less error-prone) method of collecting logs from the container.
* Removes arbitrary and unnecessary calls to time.sleep()

Before applying any changes, the flaky test was failing about 12% of the
time.  Putting a sleep() call between setting the `job_status` to SUCCEEDED
and collecting the logs, resulted in a 100% failure rate.  Simply moving
the code that sets the job status to SUCCEEDED to the end of the code block,
dropped the failure rate to ~2%.  Finally, removing the log collection
hack allowed the test suite to run ~1000 times without a single failure.

Taken in aggregate, these changes make the batch backend more deterministic
and should put the nail in the coffin of this flaky test.

Closes #3475
2020-11-18 10:49:25 +00:00
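The ordering fix described in f7467164e4 above can be illustrated with a minimal sketch. This is not moto's actual backend code: `FakeJob`, `run`, and `wait_for_status` are hypothetical names invented for illustration. The only point it demonstrates is that the worker thread writes its logs first and publishes the SUCCEEDED status last, so a poller that sees SUCCEEDED can rely on the logs being complete.

```python
import threading
import time


class FakeJob:
    """Hypothetical stand-in for a batch job (not moto's real class)."""

    def __init__(self):
        self.status = "RUNNING"
        self.logs = []
        self._lock = threading.Lock()

    def run(self, lines):
        # Collect and store all log lines first.
        collected = list(lines)
        with self._lock:
            self.logs.extend(collected)
        # Publish the terminal status last, so that any observer that sees
        # SUCCEEDED can assume the logs have already been written.
        self.status = "SUCCEEDED"


def wait_for_status(job, wanted, timeout=5.0):
    """Poll until the job reaches the wanted status instead of sleeping blindly."""
    deadline = time.monotonic() + timeout
    while time.monotonic() < deadline:
        if job.status == wanted:
            return True
        time.sleep(0.01)
    return False


job = FakeJob()
worker = threading.Thread(target=job.run, args=(["hello", "world"],))
worker.start()

# Primary thread: wait for the status, then read the logs immediately.
reached = wait_for_status(job, "SUCCEEDED")
observed_logs = list(job.logs)
worker.join()
```

Had `run` set the status before extending `logs`, the primary thread could observe SUCCEEDED with an empty or partial log list, which is the kind of intermittent failure the commit message describes.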
Bert Blommers
273ca63d59 Linting 2020-11-11 15:55:37 +00:00
Bert Blommers
cb6731f340 Convert fixtures/exceptions to Pytest 2020-11-11 15:54:01 +00:00
Matěj Cepl
ea489bce6c Finish porting from nose to pytest. 2020-11-10 08:25:05 +01:00
Matěj Cepl
77dc60ea97 Port test suite from nose to pytest.
This just eliminates all errors on the tests collection. Elimination of
failures is left to the next commit.
2020-11-10 08:23:44 +01:00
Bert Blommers
bed769a387
Tech debt - increase test timeouts to remove intermittent test failures (#3146) 2020-07-17 12:11:47 +01:00
Bert Blommers
1b031aeeb0 Linting 2020-03-12 14:07:34 +00:00
Bert Blommers
bb5a54ca4b Batch - Fix tests 2020-03-12 13:37:46 +00:00
mzgierski
ad5314ad06 Enable the test that AWS-Batch describe_jobs fails at. 2020-03-12 12:27:22 +00:00
Mike Grima
7b63b2818e
Merge pull request #2631 from kislyuk/patch-2
Batch: computeResources.instanceRole is an instance profile
2019-12-14 13:46:08 -08:00
Andrey Kislyuk
624dafcb5c Fix tests 2019-12-13 18:46:26 +00:00
Andrey Kislyuk
777ff5a62a Add test 2019-12-13 05:07:29 +00:00
Asher Foa
96e5b1993d Run black on moto & test directories. 2019-10-31 10:36:05 -07:00
William Harvey
0d4d2b7041 Fix/tighten AWS Batch test_reregister_task_definition unit test 2019-09-10 14:24:00 -04:00
Berislav Kovacki
a35a55ec26 Add option to call batch submit_job with job definition name only
* Add option to call batch submit_job with job definition name only
* Fix bug which causes register_job_definition not to increment job
revision number after a second revision
2019-08-06 22:13:52 +02:00
William Rubel
e35d99ff09 Trying to improve coverage 2019-02-17 09:25:35 -06:00
Terry Cain
e3024ae1ba
Implemented Terminate, Cancel and List jobs 2017-10-11 23:46:27 +01:00
Terry Cain
e135344f0c
Added simple SubmitJob and DescribeJobs 2017-10-06 01:21:29 +01:00
Terry Cain
0ca3fcc7a2
Added DescribeJobDefinitions 2017-10-05 00:00:40 +01:00
Terry Cain
558f246115
Added RegisterJobDefinition 2017-10-04 20:17:29 +01:00
Terry Cain
4a45acc216
Implemented Update and Delete job queue 2017-10-04 18:52:12 +01:00
Terry Cain
b8f24298fd
Added filtering test part 2017-10-03 23:28:10 +01:00
Terry Cain
15218df12f
Added CreateJobQueue and DescribeJobQueue 2017-10-03 23:21:06 +01:00
Terry Cain
88a11b21ae
Added DeleteComputeEnvironment and UpdateComputeEnvironment 2017-10-03 22:35:30 +01:00
Terry Cain
f95d72c37c
Finalised create compute environment + describe environments 2017-09-29 23:29:36 +01:00
Terry Cain
56e4300ad4
Added preliminary CreateComputeEnvironment 2017-09-26 22:22:59 +01:00
Terry Cain
bba6d23eae Started on batch 2017-09-26 17:37:26 +01:00