Docker Image Guide: How to Build, Run, and Manage Images

[root@prod-node-04 ~]# docker pull registry.internal.corp/analytics/data-cruncher:latest
latest: Pulling from analytics/data-cruncher
d5a1f291072d: Already exists
f23a467d5e21: Pull complete
4f4fb5514a3d: Pull complete
7d23456789ab: Extracting [==================================================>] 2.1GB/2.1GB
8e34567890cd: Extracting [========================> ] 1.2GB/2.4GB
failed to register layer: Error processing tar file(exit status 1): write /usr/lib/x86_64-linux-gnu/libLLVM-15.so.1: no space left on device
[root@prod-node-04 ~]# df -h /var/lib/docker
Filesystem Size Used Avail Use% …

Top DevOps Best Practices for Faster Software Delivery

Incident ID: #8829-OMEGA.
Status: Resolved (Barely).
Subject: The day the load balancer decided to become a random number generator.
Incident Summary
* Duration: 02:04 UTC to 06:12 UTC (4 hours, 8 minutes).
* Impact: Total loss of ingress traffic for the api.production.internal and checkout.production.internal zones. Estimated revenue loss: $2.1M.
* Root Cause: A “minor” update …

Docker Best Practices: Build Faster, Secure Containers

POST-MORTEM REPORT: THE DAY THE LAYERS COLLAPSED
DATE: October 14, 2023
AUDITOR: Lead Infrastructure Engineer (Hardened Systems Division)
STATUS: CRITICAL / FORENSIC COMPLETE
INCIDENT REF: #882-ALPHA-FAILURE
I’ve spent the last 72 hours staring at hex dumps and cleaning up the radioactive sludge left behind by a “standard” deployment. My eyes are bloodshot, my caffeine intake …

Top Artificial Intelligence Best Practices for Success

[2023-10-27T14:22:01.442Z] kernel: [12409.552101] python3[14201]: segfault at 0 ip 00007f8e12a34b12 sp 00007ffc8e12a340 error 4 in libtorch_cuda.so[7f8e10000000+12a34000]
[2023-10-27T14:22:01.443Z] torch.cuda.OutOfMemoryError: CUDA out of memory. Tried to allocate 12.50 GiB (GPU 0; 23.65 GiB total capacity; 18.21 GiB already allocated; 4.12 GiB free; 19.00 GiB reserved in total by PyTorch) If reserved memory is >> allocated memory try …

Master Docker Compose: Simplify Multi-Container Workflows

[2024-05-24T03:14:22.891Z] ERROR: worker-node-04 kernel: [192834.12] Out of memory: Killed process 28491 (python3) total-vm:4.2GB, anon-rss:3.8GB, file-rss:0B, shmem-rss:0B, uid:1000 pgtables:8420kB oom_score_adj:0
[2024-05-24T03:14:23.002Z] CRITICAL: container_id=f3a2b1c0d9e8 exited with code 137.
[2024-05-24T03:14:23.450Z] DEBUG: Attempting manual restart of service 'api-gateway'…
[2024-05-24T03:14:23.501Z] ERROR: docker: Error response from daemon: driver failed programming external connectivity on endpoint api-gateway (hash): Bind for 0.0.0.0:8080 failed: …

10 Docker Best Practices to Optimize Your Containers

It is 05:42 AM. I have consumed four double-espressos, two cold slices of pepperoni pizza, and enough adrenaline to stop a rhino’s heart. The production cluster is finally stable, no thanks to the “optimization” PR merged by a junior developer who thought they knew better than the last ten years of containerization history. I’m writing …

Master Docker Compose: The Ultimate Guide for Developers

[2024-05-22T03:14:02.881Z] ERROR: Container "api-gateway" exited with code 137 (OOMKilled)
[2024-05-22T03:14:05.112Z] CRITICAL: Service "auth-provider" failed to bind to 0.0.0.0:8080. Address already in use.
[2024-05-22T03:14:05.115Z] FATAL: Dependency check failed. "postgres-db" not reachable at 172.17.0.2:5432.
[2024-05-22T03:14:05.118Z] STACK_TRACE: deploy.sh: line 44: docker run -d --name api-gateway …
[2024-05-22T03:14:05.120Z] SYSTEM_STATE: Load Average 45.12, 38.01, 22.10. Disk I/O 98% saturated. …