20 seconds difference between NTP-synced servers - CentOS

I've got several CentOS 6 servers synced to pool.ntp.org time servers.
But sometimes the time on them drifts out of sync by 20-30 seconds, which causes errors in my app.
What can be the cause of this, and where should I look for it?
Config
tinker panic 1000 allan 1500 dispersion 15 step 0.128 stepout 900
statsdir /var/log/ntpstats/
leapfile /etc/ntp.leapseconds
driftfile /var/lib/ntp/ntp.drift
statistics loopstats peerstats clockstats
filegen loopstats file loopstats type day enable
filegen peerstats file peerstats type day enable
filegen clockstats file clockstats type day enable
disable monitor
server 0.pool.ntp.org iburst minpoll 6 maxpoll 10
restrict 0.pool.ntp.org nomodify notrap noquery
server 1.pool.ntp.org iburst minpoll 6 maxpoll 10
restrict 1.pool.ntp.org nomodify notrap noquery
server 2.pool.ntp.org iburst minpoll 6 maxpoll 10
restrict 2.pool.ntp.org nomodify notrap noquery
server 3.pool.ntp.org iburst minpoll 6 maxpoll 10
restrict 3.pool.ntp.org nomodify notrap noquery
restrict default kod notrap nomodify nopeer noquery
restrict 127.0.0.1 nomodify
restrict -6 default kod notrap nomodify nopeer noquery
restrict -6 ::1 nomodify
server 127.127.1.0 # local clock
fudge 127.127.1.0 stratum 10
ntpq -p on srv1:
remote refid st t when poll reach delay offset jitter
==============================================================================
server01.coloce .STEP. 16 u 11d 1024 0 0.000 0.000 0.000
+mt4.raqxs.net 193.190.230.66 2 u 510 1024 377 6.367 5.984 7.433
+16-164-ftth.ons 193.79.237.14 2 u 217 1024 375 11.339 -0.028 4.564
*services.freshd 213.136.0.252 2 u 419 1024 377 6.735 2.048 4.321
LOCAL(0) .LOCL. 10 l - 64 0 0.000 0.000 0.000
ntpq -p on srv2:
remote refid st t when poll reach delay offset jitter
==============================================================================
+ntp2.edutel.nl 80.94.65.10 2 u 527 1024 377 11.924 1.469 0.753
-95.211.224.12 193.67.79.202 2 u 364 1024 377 12.989 4.930 0.628
+app.kingsquare. 193.79.237.14 2 u 339 1024 377 5.485 0.493 0.591
*ntp.bserved.nl 193.67.79.202 2 u 206 1024 377 7.007 0.539 0.420
LOCAL(0) .LOCL. 10 l - 64 0 0.000 0.000 0.000
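(For reference, a quick way to compare how far each box has drifted; "srv1" and "srv2" are placeholders for the real hostnames:)

# on each server: current offset reported by the local ntpd
ntpq -c rv | grep -o 'offset=[^,]*'

# from one machine: compare each server's clock against this one
for h in srv1 srv2; do
    ntpdate -q "$h"
done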

Related

Liferay 7, Hikari-Pool connection not available error, happen on production but not on pre-production environment

First of all, I have two environments configured strictly identically (except IPs), with two VMs each: one in pre-production and one in production (currently in the configuration phase). In each environment there is one VM with a Liferay 7.0.6 Tomcat bundle (built from the 7.0.6 cumulative patch of the community-security-team) and another with PostgreSQL 9.4.26.
Everything works fine on pre-production environment.
On the production environment, a few hours after I began creating users in Liferay, I ran into this error (full stack trace at the end):
Caused by: java.sql.SQLTransientConnectionException: HikariPool-2 - Connection is not available, request timed out after 937980ms.
at com.zaxxer.hikari.pool.HikariPool.createTimeoutException(HikariPool.java:591)
at com.zaxxer.hikari.pool.HikariPool.getConnection(HikariPool.java:194)
at com.zaxxer.hikari.pool.HikariPool.getConnection(HikariPool.java:146)
at com.zaxxer.hikari.HikariDataSource.getConnection(HikariDataSource.java:112)
at org.springframework.jdbc.datasource.LazyConnectionDataSourceProxy$LazyConnectionInvocationHandler.getTargetConnection(LazyConnectionDataSourceProxy.java:403)
at org.springframework.jdbc.datasource.LazyConnectionDataSourceProxy$LazyConnectionInvocationHandler.invoke(LazyConnectionDataSourceProxy.java:376)
at com.sun.proxy.$Proxy7.prepareStatement(Unknown Source)
at org.hibernate.jdbc.AbstractBatcher.getPreparedStatement(AbstractBatcher.java:534)
at org.hibernate.jdbc.AbstractBatcher.getPreparedStatement(AbstractBatcher.java:452)
at org.hibernate.jdbc.AbstractBatcher.prepareQueryStatement(AbstractBatcher.java:161)
at org.hibernate.loader.Loader.prepareQueryStatement(Loader.java:1700)
at org.hibernate.loader.Loader.doQuery(Loader.java:801)
at org.hibernate.loader.Loader.doQueryAndInitializeNonLazyCollections(Loader.java:274)
at org.hibernate.loader.Loader.loadEntity(Loader.java:2037)
... 50 more
So I checked for differences between my Liferay configuration on pre-production and the one in production using comparison software, and apart from the IPs I found nothing. Same for the PostgreSQL configurations on both environments.
I also checked time synchronization between the VMs; they are both synchronized via NTP against the Debian pool servers.
Database VM:
$ ntpq -p
remote refid st t when poll reach delay offset jitter
==============================================================================
0.debian.pool.n .POOL. 16 p - 64 0 0.000 0.000 0.002
1.debian.pool.n .POOL. 16 p - 64 0 0.000 0.000 0.002
2.debian.pool.n .POOL. 16 p - 64 0 0.000 0.000 0.002
3.debian.pool.n .POOL. 16 p - 64 0 0.000 0.000 0.002
+mail.klausen.dk 193.67.79.202 2 u 116 1024 377 14.435 -0.214 0.358
+any.time.nl 85.199.214.99 2 u 871 1024 377 1.666 -0.183 0.258
-rag.9t4.net 131.188.3.221 2 u 102 1024 377 16.491 -3.769 0.571
*ntp1.m-online.n 212.18.1.106 2 u 318 1024 377 16.608 -0.263 0.240
-tor-relais1.lin 131.188.3.223 2 u 199 1024 377 14.149 0.272 0.661
-www.kashra.com .DCFp. 1 u 150 1024 377 22.623 1.126 0.816
and the Liferay VM:
$ ntpq -p
remote refid st t when poll reach delay offset jitter
==============================================================================
0.debian.pool.n .POOL. 16 p - 64 0 0.000 0.000 0.002
1.debian.pool.n .POOL. 16 p - 64 0 0.000 0.000 0.002
2.debian.pool.n .POOL. 16 p - 64 0 0.000 0.000 0.002
3.debian.pool.n .POOL. 16 p - 64 0 0.000 0.000 0.002
+stratum2-4.ntp. 129.70.137.82 2 u 200 1024 377 30.471 1.069 2.414
+138.201.16.225 131.188.3.221 2 u 696 1024 377 16.613 1.357 0.397
-kuehlich.com 131.188.3.221 2 u 373 1024 377 22.656 2.025 0.885
-time.cloudflare 10.21.8.19 3 u 566 1024 377 8.123 -0.640 0.277
*a.chl.la 131.188.3.222 2 u 167 1024 377 14.472 1.033 2.448
+195.50.171.101 145.253.3.52 2 u 266 1024 377 10.804 0.928 0.395
I also noticed an error line in the PostgreSQL log about a cancel request for an unknown PID, occurring exactly 937980 ms before the timeout error in Liferay:
LOG: PID 1767 in the cancel request does not match any process
I have tried re-installing Liferay from scratch, but nothing changed.
There must be some difference between pre-production and production, because it works fine on pre-production, but I can't find it.
The HikariCP configuration in Liferay is the default on both environments:
jdbc.default.connectionTimeout=30000
jdbc.default.driverClassName=org.postgresql.Driver
jdbc.default.idleConnectionTestPeriod=60
jdbc.default.idleTimeout=600000
jdbc.default.initialPoolSize=10
jdbc.default.liferay.pool.provider=hikaricp
jdbc.default.maxActive=100
jdbc.default.maxIdleTime=3600
jdbc.default.maxLifetime=0
jdbc.default.maxPoolSize=100
jdbc.default.maximumPoolSize=100
jdbc.default.minIdle=10
jdbc.default.minPoolSize=10
jdbc.default.minimumIdle=10
And the same for PostgreSQL:
max_connections = 100
Full HikariPool stack trace from Liferay:
2021-06-30 00:12:32.397 ERROR [liferay/scheduler_dispatch-6][JDBCExceptionReporter:234] HikariPool-2 - Connection is not available, request timed out after 937980ms.
2021-06-30 00:12:32.401 ERROR [liferay/scheduler_dispatch-6][BasePersistenceImpl:264] Caught unexpected exception
com.liferay.portal.kernel.exception.SystemException: com.liferay.portal.kernel.dao.orm.ORMException: org.hibernate.exception.GenericJDBCException: could not load an entity: [com.liferay.counter.model.impl.CounterImpl#com.liferay.counter.kernel.model.Counter]
at com.liferay.portal.kernel.service.persistence.impl.BasePersistenceImpl.processException(BasePersistenceImpl.java:270)
at com.liferay.counter.service.persistence.impl.CounterFinderImpl._obtainIncrement(CounterFinderImpl.java:391)
at com.liferay.counter.service.persistence.impl.CounterFinderImpl._competeIncrement(CounterFinderImpl.java:339)
at com.liferay.counter.service.persistence.impl.CounterFinderImpl._competeIncrement(CounterFinderImpl.java:325)
at com.liferay.counter.service.persistence.impl.CounterFinderImpl.increment(CounterFinderImpl.java:111)
at com.liferay.counter.service.persistence.impl.CounterFinderImpl.increment(CounterFinderImpl.java:100)
at com.liferay.counter.service.persistence.impl.CounterFinderImpl.increment(CounterFinderImpl.java:95)
at com.liferay.counter.service.impl.CounterLocalServiceImpl.increment(CounterLocalServiceImpl.java:42)
at sun.reflect.GeneratedMethodAccessor638.invoke(Unknown Source)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:498)
at com.liferay.portal.spring.aop.ServiceBeanMethodInvocation.proceed(ServiceBeanMethodInvocation.java:163)
at com.liferay.portal.spring.transaction.CounterTransactionExecutor.execute(CounterTransactionExecutor.java:50)
at com.liferay.portal.spring.transaction.TransactionInterceptor.invoke(TransactionInterceptor.java:58)
at com.liferay.portal.spring.aop.ServiceBeanMethodInvocation.proceed(ServiceBeanMethodInvocation.java:137)
at com.liferay.portal.spring.aop.ServiceBeanAopProxy.invoke(ServiceBeanAopProxy.java:169)
at com.sun.proxy.$Proxy78.increment(Unknown Source)
at com.liferay.counter.kernel.service.CounterLocalServiceUtil.increment(CounterLocalServiceUtil.java:238)
at com.liferay.portal.kernel.systemevent.SystemEventHierarchyEntryThreadLocal.push(SystemEventHierarchyEntryThreadLocal.java:134)
at com.liferay.portal.kernel.systemevent.SystemEventHierarchyEntryThreadLocal.push(SystemEventHierarchyEntryThreadLocal.java:96)
at com.liferay.portal.repository.capabilities.TemporaryFileEntriesCapabilityImpl._runWithoutSystemEvents(TemporaryFileEntriesCapabilityImpl.java:313)
at com.liferay.portal.repository.capabilities.TemporaryFileEntriesCapabilityImpl.deleteExpiredTemporaryFileEntries(TemporaryFileEntriesCapabilityImpl.java:113)
at com.liferay.document.library.web.internal.messaging.TempFileEntriesMessageListener.deleteExpiredTemporaryFileEntries(TempFileEntriesMessageListener.java:111)
at com.liferay.document.library.web.internal.messaging.TempFileEntriesMessageListener$1.performAction(TempFileEntriesMessageListener.java:134)
at com.liferay.document.library.web.internal.messaging.TempFileEntriesMessageListener$1.performAction(TempFileEntriesMessageListener.java:130)
at com.liferay.portal.kernel.dao.orm.DefaultActionableDynamicQuery.performAction(DefaultActionableDynamicQuery.java:405)
at com.liferay.portal.kernel.dao.orm.DefaultActionableDynamicQuery$1.call(DefaultActionableDynamicQuery.java:315)
at com.liferay.portal.kernel.dao.orm.DefaultActionableDynamicQuery$1.call(DefaultActionableDynamicQuery.java:277)
at com.liferay.portal.kernel.dao.orm.DefaultActionableDynamicQuery.doPerformActions(DefaultActionableDynamicQuery.java:335)
at com.liferay.portal.kernel.dao.orm.DefaultActionableDynamicQuery.performActions(DefaultActionableDynamicQuery.java:86)
at com.liferay.document.library.web.internal.messaging.TempFileEntriesMessageListener.doReceive(TempFileEntriesMessageListener.java:139)
at com.liferay.portal.kernel.messaging.BaseMessageListener.receive(BaseMessageListener.java:26)
at com.liferay.portal.kernel.scheduler.messaging.SchedulerEventMessageListenerWrapper.receive(SchedulerEventMessageListenerWrapper.java:66)
at com.liferay.portal.kernel.messaging.InvokerMessageListener.receive(InvokerMessageListener.java:74)
at com.liferay.portal.kernel.messaging.ParallelDestination$1.run(ParallelDestination.java:52)
at com.liferay.portal.kernel.concurrent.ThreadPoolExecutor$WorkerTask._runTask(ThreadPoolExecutor.java:756)
at com.liferay.portal.kernel.concurrent.ThreadPoolExecutor$WorkerTask.run(ThreadPoolExecutor.java:667)
at java.lang.Thread.run(Thread.java:748)
Caused by: com.liferay.portal.kernel.dao.orm.ORMException: org.hibernate.exception.GenericJDBCException: could not load an entity: [com.liferay.counter.model.impl.CounterImpl#com.liferay.counter.kernel.model.Counter]
at com.liferay.portal.dao.orm.hibernate.ExceptionTranslator.translate(ExceptionTranslator.java:34)
at com.liferay.portal.dao.orm.hibernate.SessionImpl.get(SessionImpl.java:205)
at com.liferay.portal.kernel.dao.orm.ClassLoaderSession.get(ClassLoaderSession.java:326)
at com.liferay.counter.service.persistence.impl.CounterFinderImpl._obtainIncrement(CounterFinderImpl.java:369)
... 36 more
Caused by: org.hibernate.exception.GenericJDBCException: could not load an entity: [com.liferay.counter.model.impl.CounterImpl#com.liferay.counter.kernel.model.Counter]
at org.hibernate.exception.SQLStateConverter.handledNonSpecificException(SQLStateConverter.java:140)
at org.hibernate.exception.SQLStateConverter.convert(SQLStateConverter.java:128)
at org.hibernate.exception.JDBCExceptionHelper.convert(JDBCExceptionHelper.java:66)
at org.hibernate.loader.Loader.loadEntity(Loader.java:2041)
at org.hibernate.loader.entity.AbstractEntityLoader.load(AbstractEntityLoader.java:86)
at org.hibernate.loader.entity.AbstractEntityLoader.load(AbstractEntityLoader.java:76)
at org.hibernate.persister.entity.AbstractEntityPersister.load(AbstractEntityPersister.java:3294)
at org.hibernate.event.def.DefaultLoadEventListener.loadFromDatasource(DefaultLoadEventListener.java:496)
at org.hibernate.event.def.DefaultLoadEventListener.doLoad(DefaultLoadEventListener.java:477)
at org.hibernate.event.def.DefaultLoadEventListener.load(DefaultLoadEventListener.java:227)
at org.hibernate.event.def.DefaultLoadEventListener.lockAndLoad(DefaultLoadEventListener.java:403)
at org.hibernate.event.def.DefaultLoadEventListener.onLoad(DefaultLoadEventListener.java:155)
at org.hibernate.impl.SessionImpl.fireLoad(SessionImpl.java:1090)
at org.hibernate.impl.SessionImpl.get(SessionImpl.java:1075)
at org.hibernate.impl.SessionImpl.get(SessionImpl.java:1066)
at com.liferay.portal.dao.orm.hibernate.SessionImpl.get(SessionImpl.java:201)
... 38 more
Caused by: java.sql.SQLTransientConnectionException: HikariPool-2 - Connection is not available, request timed out after 937980ms.
at com.zaxxer.hikari.pool.HikariPool.createTimeoutException(HikariPool.java:591)
at com.zaxxer.hikari.pool.HikariPool.getConnection(HikariPool.java:194)
at com.zaxxer.hikari.pool.HikariPool.getConnection(HikariPool.java:146)
at com.zaxxer.hikari.HikariDataSource.getConnection(HikariDataSource.java:112)
at org.springframework.jdbc.datasource.LazyConnectionDataSourceProxy$LazyConnectionInvocationHandler.getTargetConnection(LazyConnectionDataSourceProxy.java:403)
at org.springframework.jdbc.datasource.LazyConnectionDataSourceProxy$LazyConnectionInvocationHandler.invoke(LazyConnectionDataSourceProxy.java:376)
at com.sun.proxy.$Proxy7.prepareStatement(Unknown Source)
at org.hibernate.jdbc.AbstractBatcher.getPreparedStatement(AbstractBatcher.java:534)
at org.hibernate.jdbc.AbstractBatcher.getPreparedStatement(AbstractBatcher.java:452)
at org.hibernate.jdbc.AbstractBatcher.prepareQueryStatement(AbstractBatcher.java:161)
at org.hibernate.loader.Loader.prepareQueryStatement(Loader.java:1700)
at org.hibernate.loader.Loader.doQuery(Loader.java:801)
at org.hibernate.loader.Loader.doQueryAndInitializeNonLazyCollections(Loader.java:274)
at org.hibernate.loader.Loader.loadEntity(Loader.java:2037)
... 50 more
I'm out of other ideas to test; any help would be appreciated.
Finally we found the solution.
In the pre-production environment the two VMs are on the same VLAN, which was not the case in production.
Solution: putting the VMs on the same VLAN solved the problem.
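(For anyone hitting something similar: comparing the network path between the two VMs in each environment can reveal this kind of difference. A sketch, where "db-vm" is a hypothetical hostname for the database VM:)

traceroute db-vm               # extra hops in production point to inter-VLAN routing
ping -c 100 db-vm | tail -2    # compare latency and loss with pre-production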

Ceph OSDs are full, but I have not stored that much of data

I have a Ceph cluster running with 18 x 600 GB OSDs. There are three pools (size: 3, pg_num: 64), each with a 200 GB image, and there are 6 servers connected to these images via iSCSI, storing about 20 VMs on them. Here is the output of "ceph df":
POOLS:
POOL ID STORED OBJECTS USED %USED MAX AVAIL
cephfs_data 1 0 B 0 0 B 0 0 B
cephfs_metadata 2 17 KiB 22 1.5 MiB 100.00 0 B
defaults.rgw.buckets.data 3 0 B 0 0 B 0 0 B
defaults.rgw.buckets.index 4 0 B 0 0 B 0 0 B
.rgw.root 5 2.0 KiB 5 960 KiB 100.00 0 B
default.rgw.control 6 0 B 8 0 B 0 0 B
default.rgw.meta 7 393 B 2 384 KiB 100.00 0 B
default.rgw.log 8 0 B 207 0 B 0 0 B
rbd 9 150 GiB 38.46k 450 GiB 100.00 0 B
rbd3 13 270 GiB 69.24k 811 GiB 100.00 0 B
rbd2 14 150 GiB 38.52k 451 GiB 100.00 0 B
Based on this, I expect about 1.7 TB of raw capacity usage, BUT it is currently about 9 TB!
RAW STORAGE:
CLASS SIZE AVAIL USED RAW USED %RAW USED
hdd 9.8 TiB 870 GiB 9.0 TiB 9.0 TiB 91.35
TOTAL 9.8 TiB 870 GiB 9.0 TiB 9.0 TiB 91.35
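(For reference, the ~1.7 TB expectation presumably comes from the USED column of the three rbd pools above: 450 GiB + 811 GiB + 451 GiB ≈ 1.7 TiB, yet RAW USED reports 9.0 TiB.)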
And the cluster is down because there is very little capacity remaining. I wonder what causes this and how I can get it fixed.
Your help is much appreciated.
The problem was mounting the iSCSI target without the discard option.
Since I am using Red Hat Virtualization, I just modified all storage domains created on top of Ceph and enabled "discard" on them. After just a few hours, about 1 TB of storage was released. Now, about 12 hours later, 5 TB of storage has been released.
Thanks
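(For reference, outside of RHV the equivalent on a plain Linux iSCSI initiator would be mounting the filesystem with the discard option, or trimming it periodically. A sketch; /dev/sdX and /mnt/vmstore are placeholders:)

mount -o discard /dev/sdX /mnt/vmstore    # continuous discard as blocks are freed
# or, without the mount option:
fstrim -v /mnt/vmstore                    # one-shot trim; can be run from cron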

What read throughput should be expected out of google cloud storage from a compute engine instance?

I am trying to get a feel for what I should expect in terms of performance from cloud storage.
I just ran gsutil perfdiag from a Compute Engine instance in the same location (US) and the same project as my Cloud Storage bucket.
For Nearline storage, I get 25 Mibit/s read and 353 Mibit/s write. Is that low / high / average, and why such a discrepancy between read and write?
==============================================================================
DIAGNOSTIC RESULTS
==============================================================================
------------------------------------------------------------------------------
Latency
------------------------------------------------------------------------------
Operation Size Trials Mean (ms) Std Dev (ms) Median (ms) 90th % (ms)
========= ========= ====== ========= ============ =========== ===========
Delete 0 B 5 112.0 52.9 78.2 173.6
Delete 1 KiB 5 94.1 17.5 90.8 115.0
Delete 100 KiB 5 80.4 2.5 79.9 83.4
Delete 1 MiB 5 86.7 3.7 88.2 90.4
Download 0 B 5 58.1 3.8 57.8 62.2
Download 1 KiB 5 2892.4 1071.5 2589.1 4111.9
Download 100 KiB 5 1955.0 711.3 1764.9 2814.3
Download 1 MiB 5 2679.4 976.2 2216.2 3869.9
Metadata 0 B 5 69.1 57.0 42.8 129.3
Metadata 1 KiB 5 37.4 1.5 37.1 39.0
Metadata 100 KiB 5 64.2 47.7 40.9 113.0
Metadata 1 MiB 5 45.7 9.1 49.4 55.1
Upload 0 B 5 138.3 21.0 122.5 164.8
Upload 1 KiB 5 170.6 61.5 139.4 242.0
Upload 100 KiB 5 387.2 294.5 245.8 706.1
Upload 1 MiB 5 257.4 51.3 228.4 319.7
------------------------------------------------------------------------------
Write Throughput
------------------------------------------------------------------------------
Copied a 1 GiB file 5 times for a total transfer size of 5 GiB.
Write throughput: 353.13 Mibit/s.
------------------------------------------------------------------------------
Read Throughput
------------------------------------------------------------------------------
Copied a 1 GiB file 5 times for a total transfer size of 5 GiB.
Read throughput: 25.16 Mibit/s.
------------------------------------------------------------------------------
System Information
------------------------------------------------------------------------------
IP Address:
##.###.###.##
Temporary Directory:
/tmp
Bucket URI:
gs://pl_twitter/
gsutil Version:
4.12
boto Version:
2.30.0
Measurement time:
2015-05-11 07:03:26 PM
Google Server:
Google Server IP Addresses:
##.###.###.###
Google Server Hostnames:
Google DNS thinks your IP is:
CPU Count:
4
CPU Load Average:
[0.16, 0.05, 0.06]
Total Memory:
14.38 GiB
Free Memory:
11.34 GiB
TCP segments sent during test:
5592296
TCP segments received during test:
2417850
TCP segments retransmit during test:
3794
Disk Counter Deltas:
disk reads writes rbytes wbytes rtime wtime
sda1 31 5775 126976 1091674112 856 1603544
TCP /proc values:
wmem_default = 212992
wmem_max = 212992
rmem_default = 212992
tcp_timestamps = 1
tcp_window_scaling = 1
tcp_sack = 1
rmem_max = 212992
Boto HTTPS Enabled:
True
Requests routed through proxy:
False
Latency of the DNS lookup for Google Storage server (ms):
2.5
Latencies connecting to Google Storage server IPs (ms):
##.###.###.### = 1.1
------------------------------------------------------------------------------
In-Process HTTP Statistics
------------------------------------------------------------------------------
Total HTTP requests made: 94
HTTP 5xx errors: 0
HTTP connections broken: 0
Availability: 100%
For standard storage I get:
==============================================================================
DIAGNOSTIC RESULTS
==============================================================================
------------------------------------------------------------------------------
Latency
------------------------------------------------------------------------------
Operation Size Trials Mean (ms) Std Dev (ms) Median (ms) 90th % (ms)
========= ========= ====== ========= ============ =========== ===========
Delete 0 B 5 121.9 34.8 105.1 158.9
Delete 1 KiB 5 159.3 58.2 126.0 232.3
Delete 100 KiB 5 106.8 17.0 103.3 125.7
Delete 1 MiB 5 167.0 77.3 145.1 251.0
Download 0 B 5 87.2 10.3 81.1 100.0
Download 1 KiB 5 95.5 18.0 92.4 115.6
Download 100 KiB 5 156.7 20.5 155.8 179.6
Download 1 MiB 5 219.6 11.7 213.4 232.6
Metadata 0 B 5 59.7 4.5 57.8 64.4
Metadata 1 KiB 5 61.0 21.8 49.6 85.4
Metadata 100 KiB 5 55.3 10.4 50.7 67.7
Metadata 1 MiB 5 75.6 27.8 67.4 109.0
Upload 0 B 5 162.7 37.0 139.0 207.7
Upload 1 KiB 5 165.2 23.6 152.3 194.1
Upload 100 KiB 5 392.1 235.0 268.7 643.0
Upload 1 MiB 5 387.0 79.5 340.9 486.1
------------------------------------------------------------------------------
Write Throughput
------------------------------------------------------------------------------
Copied a 1 GiB file 5 times for a total transfer size of 5 GiB.
Write throughput: 515.63 Mibit/s.
------------------------------------------------------------------------------
Read Throughput
------------------------------------------------------------------------------
Copied a 1 GiB file 5 times for a total transfer size of 5 GiB.
Read throughput: 123.14 Mibit/s.
------------------------------------------------------------------------------
System Information
------------------------------------------------------------------------------
IP Address:
10.240.133.190
Temporary Directory:
/tmp
Bucket URI:
gs://test_throughput_standard/
gsutil Version:
4.12
boto Version:
2.30.0
Measurement time:
2015-05-21 11:08:50 AM
Google Server:
Google Server IP Addresses:
##.###.##.###
Google Server Hostnames:
Google DNS thinks your IP is:
CPU Count:
8
CPU Load Average:
[0.28, 0.18, 0.08]
Total Memory:
49.91 GiB
Free Memory:
47.9 GiB
TCP segments sent during test:
5165461
TCP segments received during test:
1881727
TCP segments retransmit during test:
3423
Disk Counter Deltas:
disk reads writes rbytes wbytes rtime wtime
dm-0 0 0 0 0 0 0
loop0 0 0 0 0 0 0
loop1 0 0 0 0 0 0
sda1 0 4229 0 1080618496 0 1605286
TCP /proc values:
wmem_default = 212992
wmem_max = 212992
rmem_default = 212992
tcp_timestamps = 1
tcp_window_scaling = 1
tcp_sack = 1
rmem_max = 212992
Boto HTTPS Enabled:
True
Requests routed through proxy:
False
Latency of the DNS lookup for Google Storage server (ms):
1.2
Latencies connecting to Google Storage server IPs (ms):
##.###.##.### = 1.3
------------------------------------------------------------------------------
In-Process HTTP Statistics
------------------------------------------------------------------------------
Total HTTP requests made: 94
HTTP 5xx errors: 0
HTTP connections broken: 0
Availability: 100%
==============================================================================
DIAGNOSTIC RESULTS
==============================================================================
------------------------------------------------------------------------------
Latency
------------------------------------------------------------------------------
Operation Size Trials Mean (ms) Std Dev (ms) Median (ms) 90th % (ms)
========= ========= ====== ========= ============ =========== ===========
Delete 0 B 5 145.1 59.4 117.8 215.2
Delete 1 KiB 5 178.0 51.4 190.6 224.3
Delete 100 KiB 5 98.3 5.0 96.6 104.3
Delete 1 MiB 5 117.7 19.2 112.0 140.2
Download 0 B 5 109.4 38.9 91.9 156.5
Download 1 KiB 5 149.5 41.0 141.9 192.5
Download 100 KiB 5 106.9 20.3 108.6 127.8
Download 1 MiB 5 121.1 16.0 112.2 140.9
Metadata 0 B 5 70.0 10.8 76.8 79.9
Metadata 1 KiB 5 113.8 36.6 124.0 148.7
Metadata 100 KiB 5 63.1 20.2 55.7 86.5
Metadata 1 MiB 5 59.2 4.9 61.3 62.9
Upload 0 B 5 127.5 22.6 117.4 153.6
Upload 1 KiB 5 215.2 54.8 221.4 270.4
Upload 100 KiB 5 229.8 79.2 171.6 329.8
Upload 1 MiB 5 489.8 412.3 295.3 915.4
------------------------------------------------------------------------------
Write Throughput
------------------------------------------------------------------------------
Copied a 1 GiB file 5 times for a total transfer size of 5 GiB.
Write throughput: 503 Mibit/s.
------------------------------------------------------------------------------
Read Throughput
------------------------------------------------------------------------------
Copied a 1 GiB file 5 times for a total transfer size of 5 GiB.
Read throughput: 1.05 Gibit/s.
------------------------------------------------------------------------------
System Information
------------------------------------------------------------------------------
IP Address:
################
Temporary Directory:
/tmp
Bucket URI:
gs://test_throughput_standard/
gsutil Version:
4.12
boto Version:
2.30.0
Measurement time:
2015-05-21 06:20:49 PM
Google Server:
Google Server IP Addresses:
#############
Google Server Hostnames:
Google DNS thinks your IP is:
CPU Count:
8
CPU Load Average:
[0.08, 0.03, 0.05]
Total Memory:
49.91 GiB
Free Memory:
47.95 GiB
TCP segments sent during test:
4958020
TCP segments received during test:
2326124
TCP segments retransmit during test:
2163
Disk Counter Deltas:
disk reads writes rbytes wbytes rtime wtime
dm-0 0 0 0 0 0 0
loop0 0 0 0 0 0 0
loop1 0 0 0 0 0 0
sda1 0 4202 0 1080475136 0 1610000
TCP /proc values:
wmem_default = 212992
wmem_max = 212992
rmem_default = 212992
tcp_timestamps = 1
tcp_window_scaling = 1
tcp_sack = 1
rmem_max = 212992
Boto HTTPS Enabled:
True
Requests routed through proxy:
False
Latency of the DNS lookup for Google Storage server (ms):
1.6
Latencies connecting to Google Storage server IPs (ms):
############ = 1.3
2nd Run:
==============================================================================
DIAGNOSTIC RESULTS
==============================================================================
------------------------------------------------------------------------------
Latency
------------------------------------------------------------------------------
Operation Size Trials Mean (ms) Std Dev (ms) Median (ms) 90th % (ms)
========= ========= ====== ========= ============ =========== ===========
Delete 0 B 5 91.5 14.0 85.1 106.0
Delete 1 KiB 5 125.4 76.2 91.7 203.3
Delete 100 KiB 5 104.4 15.9 99.0 123.2
Delete 1 MiB 5 128.2 36.0 116.4 170.7
Download 0 B 5 60.2 8.3 63.0 68.7
Download 1 KiB 5 62.6 11.3 61.6 74.8
Download 100 KiB 5 103.2 21.3 110.7 123.8
Download 1 MiB 5 137.1 18.5 130.3 159.8
Metadata 0 B 5 73.4 35.9 62.3 114.2
Metadata 1 KiB 5 55.9 18.1 55.3 75.6
Metadata 100 KiB 5 45.7 11.0 42.5 59.1
Metadata 1 MiB 5 49.9 7.9 49.2 58.8
Upload 0 B 5 128.2 24.6 115.5 158.8
Upload 1 KiB 5 153.5 44.1 132.4 206.4
Upload 100 KiB 5 176.8 26.8 165.1 209.7
Upload 1 MiB 5 277.9 80.2 214.7 378.5
------------------------------------------------------------------------------
Write Throughput
------------------------------------------------------------------------------
Copied a 1 GiB file 5 times for a total transfer size of 5 GiB.
Write throughput: 463.76 Mibit/s.
------------------------------------------------------------------------------
Read Throughput
------------------------------------------------------------------------------
Copied a 1 GiB file 5 times for a total transfer size of 5 GiB.
Read throughput: 184.96 Mibit/s.
------------------------------------------------------------------------------
System Information
------------------------------------------------------------------------------
IP Address:
#################
Temporary Directory:
/tmp
Bucket URI:
gs://test_throughput_standard/
gsutil Version:
4.12
boto Version:
2.30.0
Measurement time:
2015-05-21 06:24:31 PM
Google Server:
Google Server IP Addresses:
####################
Google Server Hostnames:
Google DNS thinks your IP is:
CPU Count:
8
CPU Load Average:
[0.19, 0.17, 0.11]
Total Memory:
49.91 GiB
Free Memory:
47.9 GiB
TCP segments sent during test:
5180256
TCP segments received during test:
2034323
TCP segments retransmit during test:
2883
Disk Counter Deltas:
disk reads writes rbytes wbytes rtime wtime
dm-0 0 0 0 0 0 0
loop0 0 0 0 0 0 0
loop1 0 0 0 0 0 0
sda1 0 4209 0 1080480768 0 1604066
TCP /proc values:
wmem_default = 212992
wmem_max = 212992
rmem_default = 212992
tcp_timestamps = 1
tcp_window_scaling = 1
tcp_sack = 1
rmem_max = 212992
Boto HTTPS Enabled:
True
Requests routed through proxy:
False
Latency of the DNS lookup for Google Storage server (ms):
3.5
Latencies connecting to Google Storage server IPs (ms):
################ = 1.1
------------------------------------------------------------------------------
In-Process HTTP Statistics
------------------------------------------------------------------------------
Total HTTP requests made: 94
HTTP 5xx errors: 0
HTTP connections broken: 0
Availability: 100%
3rd run:
==============================================================================
DIAGNOSTIC RESULTS
==============================================================================
------------------------------------------------------------------------------
Latency
------------------------------------------------------------------------------
Operation Size Trials Mean (ms) Std Dev (ms) Median (ms) 90th % (ms)
========= ========= ====== ========= ============ =========== ===========
Delete 0 B 5 157.0 78.3 101.5 254.9
Delete 1 KiB 5 153.5 49.1 178.3 202.5
Delete 100 KiB 5 152.9 47.5 168.0 202.6
Delete 1 MiB 5 110.6 20.4 105.7 134.5
Download 0 B 5 104.4 50.5 66.8 167.6
Download 1 KiB 5 68.1 11.1 68.7 79.2
Download 100 KiB 5 85.5 5.8 86.0 90.8
Download 1 MiB 5 126.6 40.1 100.5 175.0
Metadata 0 B 5 67.9 16.2 61.0 86.6
Metadata 1 KiB 5 49.3 8.6 44.9 59.5
Metadata 100 KiB 5 66.6 35.4 44.2 107.8
Metadata 1 MiB 5 53.9 13.2 52.1 69.4
Upload 0 B 5 136.7 37.1 114.4 183.5
Upload 1 KiB 5 145.5 58.3 116.8 208.2
Upload 100 KiB 5 227.3 37.6 233.3 259.3
Upload 1 MiB 5 274.8 45.2 261.8 328.5
------------------------------------------------------------------------------
Write Throughput
------------------------------------------------------------------------------
Copied a 1 GiB file 5 times for a total transfer size of 5 GiB.
Write throughput: 407.03 Mibit/s.
------------------------------------------------------------------------------
Read Throughput
------------------------------------------------------------------------------
Copied a 1 GiB file 5 times for a total transfer size of 5 GiB.
Read throughput: 629.07 Mibit/s.
------------------------------------------------------------------------------
System Information
------------------------------------------------------------------------------
IP Address:
###############
Temporary Directory:
/tmp
Bucket URI:
gs://test_throughput_standard/
gsutil Version:
4.12
boto Version:
2.30.0
Measurement time:
2015-05-21 06:32:48 PM
Google Server:
Google Server IP Addresses:
################
Google Server Hostnames:
Google DNS thinks your IP is:
CPU Count:
8
CPU Load Average:
[0.11, 0.13, 0.13]
Total Memory:
49.91 GiB
Free Memory:
47.94 GiB
TCP segments sent during test:
5603925
TCP segments received during test:
2438425
TCP segments retransmit during test:
4586
Disk Counter Deltas:
disk reads writes rbytes wbytes rtime wtime
dm-0 0 0 0 0 0 0
loop0 0 0 0 0 0 0
loop1 0 0 0 0 0 0
sda1 0 4185 0 1080353792 0 1603851
TCP /proc values:
wmem_default = 212992
wmem_max = 212992
rmem_default = 212992
tcp_timestamps = 1
tcp_window_scaling = 1
tcp_sack = 1
rmem_max = 212992
Boto HTTPS Enabled:
True
Requests routed through proxy:
False
Latency of the DNS lookup for Google Storage server (ms):
2.2
Latencies connecting to Google Storage server IPs (ms):
############## = 1.6
All things being equal, write performance is generally higher for modern storage systems because of the presence of a caching layer between the application and the disks. That said, what you are seeing is within the expected range for "nearline" storage.
I have observed far superior throughput when using "Standard" storage buckets, though latency did not improve much. Consider using a "Standard" bucket if your application requires high throughput. If your application is sensitive to latency, then using local storage as a cache (or scratch space) may be the only option.
Here is a snippet from one of my experiments on "Standard" buckets:
------------------------------------------------------------------------------
Latency
------------------------------------------------------------------------------
Operation Size Trials Mean (ms) Std Dev (ms) Median (ms) 90th % (ms)
========= ========= ====== ========= ============ =========== ===========
Delete 0 B 10 91.5 12.4 89.0 98.5
Delete 1 KiB 10 96.4 9.1 95.6 105.6
Delete 100 KiB 10 92.9 22.8 85.3 102.4
Delete 1 MiB 10 86.4 9.1 84.1 93.2
Download 0 B 10 54.2 5.1 55.4 58.8
Download 1 KiB 10 83.3 18.7 78.4 94.9
Download 100 KiB 10 75.2 14.5 68.6 92.6
Download 1 MiB 10 95.0 19.7 86.3 126.7
Metadata 0 B 10 33.5 7.9 31.1 44.8
Metadata 1 KiB 10 36.3 7.2 35.8 46.8
Metadata 100 KiB 10 37.7 9.2 36.6 44.1
Metadata 1 MiB 10 116.1 231.3 36.6 136.1
Upload 0 B 10 151.4 67.5 122.9 195.9
Upload 1 KiB 10 134.2 22.4 127.9 149.3
Upload 100 KiB 10 168.8 20.5 168.6 188.6
Upload 1 MiB 10 213.3 37.6 200.2 262.5
------------------------------------------------------------------------------
Write Throughput
------------------------------------------------------------------------------
Copied 5 1 GiB file(s) for a total transfer size of 10 GiB.
Write throughput: 3.46 Gibit/s.
Parallelism strategy: both
------------------------------------------------------------------------------
Write Throughput With File I/O
------------------------------------------------------------------------------
Copied 5 1 GiB file(s) for a total transfer size of 10 GiB.
Write throughput: 3.9 Gibit/s.
Parallelism strategy: both
------------------------------------------------------------------------------
Read Throughput
------------------------------------------------------------------------------
Copied 5 1 GiB file(s) for a total transfer size of 10 GiB.
Read throughput: 7.04 Gibit/s.
Parallelism strategy: both
------------------------------------------------------------------------------
Read Throughput With File I/O
------------------------------------------------------------------------------
Copied 5 1 GiB file(s) for a total transfer size of 10 GiB.
Read throughput: 1.64 Gibit/s.
Parallelism strategy: both
Hope that is helpful.
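(For anyone wanting to reproduce this: the figures come from gsutil perfdiag, invoked roughly as below. The bucket name is a placeholder, the -c/-k values are just example settings, and -p selects the "Parallelism strategy" shown in the snippet, which needs a reasonably recent gsutil.)

# latency plus single-stream throughput, 5 x 1 GiB objects as in the runs above
gsutil perfdiag -t lat,rthru,wthru -n 5 -s 1GiB gs://your-test-bucket

# parallel throughput, roughly matching the "Parallelism strategy: both" snippet
gsutil perfdiag -t rthru,wthru -n 5 -s 1GiB -c 8 -k 8 -p both gs://your-test-bucket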

CEPH raw space usage

I can't understand where my Ceph raw space has gone.
cluster 90dc9682-8f2c-4c8e-a589-13898965b974
health HEALTH_WARN 72 pgs backfill; 26 pgs backfill_toofull; 51 pgs backfilling; 141 pgs stuck unclean; 5 requests are blocked > 32 sec; recovery 450170/8427917 objects degraded (5.341%); 5 near full osd(s)
monmap e17: 3 mons at {enc18=192.168.100.40:6789/0,enc24=192.168.100.43:6789/0,enc26=192.168.100.44:6789/0}, election epoch 734, quorum 0,1,2 enc18,enc24,enc26
osdmap e3326: 14 osds: 14 up, 14 in
pgmap v5461448: 1152 pgs, 3 pools, 15252 GB data, 3831 kobjects
31109 GB used, 7974 GB / 39084 GB avail
450170/8427917 objects degraded (5.341%)
18 active+remapped+backfill_toofull
1011 active+clean
64 active+remapped+wait_backfill
8 active+remapped+wait_backfill+backfill_toofull
51 active+remapped+backfilling
recovery io 58806 kB/s, 14 objects/s
OSD tree (each host has 2 OSDs):
# id weight type name up/down reweight
-1 36.45 root default
-2 5.44 host enc26
0 2.72 osd.0 up 1
1 2.72 osd.1 up 0.8227
-3 3.71 host enc24
2 0.99 osd.2 up 1
3 2.72 osd.3 up 1
-4 5.46 host enc22
4 2.73 osd.4 up 0.8
5 2.73 osd.5 up 1
-5 5.46 host enc18
6 2.73 osd.6 up 1
7 2.73 osd.7 up 1
-6 5.46 host enc20
9 2.73 osd.9 up 0.8
8 2.73 osd.8 up 1
-7 0 host enc28
-8 5.46 host archives
12 2.73 osd.12 up 1
13 2.73 osd.13 up 1
-9 5.46 host enc27
10 2.73 osd.10 up 1
11 2.73 osd.11 up 1
Real usage:
/dev/rbd0 14T 7.9T 5.5T 59% /mnt/ceph
Pool size:
osd pool default size = 2
Pools:
ceph osd lspools
0 data,1 metadata,2 rbd,
rados df
pool name category KB objects clones degraded unfound rd rd KB wr wr KB
data - 0 0 0 0 0 0 0 0 0
metadata - 0 0 0 0 0 0 0 0 0
rbd - 15993591918 3923880 0 444545 0 82936 1373339 2711424 849398218
total used 32631712348 3923880
total avail 8351008324
total space 40982720672
Raw usage is about 4x the real usage (31109 GB raw used vs. 7.9T shown by df on the mapped image). As I understand it, it should be 2x?
Yes, it should be 2x. I'm not really sure that the real raw usage is 7.9T. Why do you check this value on the mapped disk?
These are my pools:
pool name KB objects clones degraded unfound rd rd KB wr wr KB
admin-pack 7689982 1955 0 0 0 693841 3231750 40068930 353462603
public-cloud 105432663 26561 0 0 0 13001298 638035025 222540884 3740413431
rbdkvm_sata 32624026697 7968550 31783 0 0 4950258575 232374308589 12772302818 278106113879
total used 98289353680 7997066
total avail 34474223648
total space 132763577328
You can see that the total amount of used space is roughly 3 times the used space in the rbdkvm_sata pool.
ceph -s shows the same result too:
pgmap v11303091: 5376 pgs, 3 pools, 31220 GB data, 7809 kobjects
93736 GB used, 32876 GB / 123 TB avail
I don't think you have just one RBD image. The result of "ceph osd lspools" indicates that you have 3 pools, and one of the pools is named "metadata" (maybe you were using CephFS). /dev/rbd0 appeared because you mapped that image, but you could have other images as well. To list the images you can use "rbd list -p <pool-name>", and you can see the details of an image with "rbd info -p <pool-name> <image-name>".
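(A concrete sketch of those commands, assuming the image lives in the "rbd" pool from the lspools output; the image name is a placeholder:)

rbd list -p rbd                  # list every image in the pool
rbd info -p rbd <image-name>     # provisioned size and object layout for one image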

uwsgi long timeouts

I am using Ubuntu 12, nginx, uWSGI 1.9 with a socket, and Django 1.5.
Config:
[uwsgi]
base_path = /home/someuser/web/
module = server.manage_uwsgi
uid = www-data
gid = www-data
virtualenv = /home/someuser
master = true
vacuum = true
harakiri = 20
harakiri-verbose = true
log-x-forwarded-for = true
profiler = true
no-orphans = true
max-requests = 10000
cpu-affinity = 1
workers = 4
reload-on-as = 512
listen = 3000
Client tests from Windows 7:
C:\Users\user>C:\AppServ\Apache2.2\bin\ab.exe -c 255 -n 5000 http://www.someweb.com/about/
This is ApacheBench, Version 2.0.40-dev <$Revision: 1.146 $> apache-2.0
Copyright 1996 Adam Twiss, Zeus Technology Ltd, http://www.zeustech.net/
Copyright 2006 The Apache Software Foundation, http://www.apache.org/
Benchmarking www.someweb.com (be patient)
Completed 500 requests
Completed 1000 requests
Completed 1500 requests
Completed 2000 requests
Completed 2500 requests
Completed 3000 requests
Completed 3500 requests
Completed 4000 requests
Completed 4500 requests
Finished 5000 requests
Server Software: nginx
Server Hostname: www.someweb.com
Server Port: 80
Document Path: /about/
Document Length: 1881 bytes
Concurrency Level: 255
Time taken for tests: 66.669814 seconds
Complete requests: 5000
Failed requests: 1
(Connect: 1, Length: 0, Exceptions: 0)
Write errors: 0
Total transferred: 10285000 bytes
HTML transferred: 9405000 bytes
Requests per second: 75.00 [#/sec] (mean)
Time per request: 3400.161 [ms] (mean)
Time per request: 13.334 [ms] (mean, across all concurrent requests)
Transfer rate: 150.64 [Kbytes/sec] received
Connection Times (ms)
min mean[+/-sd] median max
Connect: 0 8 207.8 1 9007
Processing: 10 3380 11480.5 440 54421
Waiting: 6 1060 3396.5 271 48424
Total: 11 3389 11498.5 441 54423
Percentage of the requests served within a certain time (ms)
50% 441
66% 466
75% 499
80% 519
90% 3415
95% 36440
98% 54407
99% 54413
100% 54423 (longest request)
I have set the following options too:
echo 3000 > /proc/sys/net/core/netdev_max_backlog
echo 3000 > /proc/sys/net/core/somaxconn
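(Those echo commands do not survive a reboot; a persistent equivalent, assuming the usual sysctl layout, would be:)

# in /etc/sysctl.conf (or a file under /etc/sysctl.d/)
net.core.somaxconn = 3000
net.core.netdev_max_backlog = 3000
# apply with: sysctl -p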
So:
1) The first ~3000 requests are super fast. I see progress in ab and in the uWSGI request logs:
[pid: 5056|app: 0|req: 518/4997] 80.114.157.139 () {30 vars in 378 bytes} [Thu Mar 21 12:37:31 2013] GET /about/ => generated 1881 bytes in 4 msecs (HTTP/1.0 200) 3 headers in 105 bytes (1 switches on core 0)
[pid: 5052|app: 0|req: 512/4998] 80.114.157.139 () {30 vars in 378 bytes} [Thu Mar 21 12:37:31 2013] GET /about/ => generated 1881 bytes in 4 msecs (HTTP/1.0 200) 3 headers in 105 bytes (1 switches on core 0)
[pid: 5054|app: 0|req: 353/4999] 80.114.157.139 () {30 vars in 378 bytes} [Thu Mar 21 12:37:31 2013] GET /about/ => generated 1881 bytes in 4 msecs (HTTP/1.0 200) 3 headers in 105 bytes (1 switches on core 0)
I don't have any broken pipes or worker respawns.
2) The next requests run very slowly or time out. It looks like some buffer fills up and I am waiting for it to drain.
3) The buffer empties.
4) ~500 requests are processed super fast.
5) Some timeout.
6) see Nr. 4
7) see Nr. 5
8) see Nr. 4
9) see Nr. 5
....
....
I need your help.
Check with netstat and dmesg. You have probably exhausted ephemeral ports or filled the conntrack table.
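(For example, a sketch of those checks, not output from this system:)

dmesg | tail -50                                          # look for "nf_conntrack: table full, dropping packet"
netstat -ant | awk 'NR>2 {print $6}' | sort | uniq -c     # connection states; lots of TIME_WAIT hints at port exhaustion
cat /proc/sys/net/ipv4/ip_local_port_range                # size of the ephemeral port range
sysctl net.netfilter.nf_conntrack_count net.netfilter.nf_conntrack_max   # conntrack usage vs. limit (if the module is loaded)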